
Suppose we are working on money laundering detection, and the only identifiers for customers and businesses are their bank account numbers. How can we encode them as input to a neural network? One-hot encoding can be too sparse.

Thanks!

  • Use an embedding layer. – Sycorax Feb 09 '22 at 13:20
  • Why do you think those would be useful as features? In general, [things like user IDs are usually not useful as features](https://stats.stackexchange.com/questions/535931/encoding-id-variables-for-machine-learning/538383#538383). – Tim Feb 09 '22 at 13:29
  • @Tim, how else would you find anomalous behavior for a specific person? – Saif Ali Khan Feb 09 '22 at 18:38
  • @SaifAliKhan by having a model that gives the person some kind of score. With user ID as a feature, the model cannot generalize to people who were not in your training set. Moreover, I assume it is a supervised learning problem, so you need to have those users tagged. In that case, you already know they are fraudulent, so you don't need the model; just use your labels to blacklist them. – Tim Feb 09 '22 at 18:47
  • @Tim, it is unsupervised. I am using autoencoders for anomaly/fraud detection. The neural network would have to keep track of every individual and their activity. For example, if a person spends much more than usual, it should raise an alarm in the network. That is only possible if the model has the account number as a feature. – Saif Ali Khan Feb 09 '22 at 18:57
  • @Sycorax, is this the best solution? – Saif Ali Khan Feb 09 '22 at 18:59
  • You'd have to do an experiment to find out if it solves your specific problem satisfactorily. If you don't want to do an experiment, you'll have to trust the research produced by others. What have you learned by reading prior papers that use NNs to detect money laundering? – Sycorax Feb 09 '22 at 19:01
  • @Sycorax, haven't learned anything tbh. It is a task from the company I am applying to, for a job. Will have to return it tomorrow. They haven't asked for code, just logical reasoning of what should be done in this case. – Saif Ali Khan Feb 09 '22 at 19:53
  • "For example if a person spends too much than usual, it should raise an alarm in the network. That can only be possible the model has account number as the feature." No, that only requires the time spent as a feature. The user identifier (account number) is typically just used as a grouping mechanism, to compute features *for that given id*. This is typically done outside the outlier detector. – Jon Nordby Feb 16 '22 at 16:04
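
The grouping idea from the last comment can be sketched roughly as follows. This is only an illustration under assumptions, not anything the commenters posted: the function name `per_account_zscores`, the `(account_id, amount)` tuple layout, and the choice of a per-account z-score as the feature are all hypothetical. The account number is used only to group transactions; the detector would then see the derived score rather than the ID itself.

```python
from collections import defaultdict
from statistics import mean, pstdev


def per_account_zscores(transactions):
    """Score each transaction relative to its own account's history.

    transactions: list of (account_id, amount) tuples (hypothetical layout).
    Returns a list of z-scores in the same order as the input.
    """
    # Group amounts by account; the ID is only a grouping key here.
    by_account = defaultdict(list)
    for account_id, amount in transactions:
        by_account[account_id].append(amount)

    # Per-account mean and population standard deviation.
    stats = {acc: (mean(vals), pstdev(vals)) for acc, vals in by_account.items()}

    scores = []
    for account_id, amount in transactions:
        mu, sigma = stats[account_id]
        # An account with no variation gets a neutral score of 0.
        scores.append((amount - mu) / sigma if sigma > 0 else 0.0)
    return scores


# Made-up data: 500 is unusual for ACC-1 but ordinary for ACC-2.
transactions = [
    ("ACC-1", 20.0), ("ACC-1", 25.0), ("ACC-1", 22.0), ("ACC-1", 500.0),
    ("ACC-2", 480.0), ("ACC-2", 510.0), ("ACC-2", 495.0), ("ACC-2", 500.0),
]
scores = per_account_zscores(transactions)
```

Here the same amount gets a high score for one account and a low score for the other, which is the "more than usual for this person" signal, and the account number never enters the model as a feature. A real pipeline would probably compute the statistics from past transactions only, so that a large transaction does not inflate its own baseline.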

0 Answers