
I have data like the sample shown below. I am working on a classification problem using both traditional classification and deep learning (DL) based approaches.

[Image: sample of the data]

In feature engineering tutorials (and tools) here and here, I see that they usually compute basic statistical features from a numeric column, such as max(loan amount), min(loan amount), sum(loan amount), stddev(loan amount), average(loan amount), etc.
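To make the setup concrete, here is a minimal sketch of the kind of aggregation I mean. It assumes one row per loan and a hypothetical `customer_id` grouping key; the function name, the feature names, and the values are made up for illustration and are not taken from my data or from the linked tools.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical loan records: (customer_id, loan_amount). Illustrative values only.
loans = [
    ("c1", 1000.0),
    ("c1", 3000.0),
    ("c1", 2000.0),
    ("c2", 500.0),
    ("c2", 500.0),
]


def aggregate_loan_features(records):
    """Collapse per-loan rows into one row of summary statistics per customer."""
    amounts_by_customer = defaultdict(list)
    for customer_id, amount in records:
        amounts_by_customer[customer_id].append(amount)

    features = {}
    for customer_id, amounts in amounts_by_customer.items():
        features[customer_id] = {
            "max_loan_amount": max(amounts),
            "min_loan_amount": min(amounts),
            "sum_loan_amount": sum(amounts),
            "mean_loan_amount": mean(amounts),
            # Sample standard deviation needs at least two loans;
            # falling back to 0.0 for a single loan is an assumption of this sketch.
            "stddev_loan_amount": stdev(amounts) if len(amounts) > 1 else 0.0,
        }
    return features


features = aggregate_loan_features(loans)
```

Each customer then contributes a single row to the model, so max_loan_amount is the largest single loan that customer has taken and stddev_loan_amount is how much that customer's loan sizes vary.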

I understand all these are done in an attempt to increase the predictive power of the model.

However, my question is:

What does it mean when max(loan amount) or stddev(loan amount) is an important feature? Can you help me understand what insight it conveys and how to interpret such a feature?

Let's assume that model A returns max(loan_amount) as an important feature in predicting loan_default. What does that mean?

As a second example, let's assume that in model B max(loan_amount) has a high positive coefficient in explaining loan_default. What does that mean?

I could go on about different models, but I am only trying to understand what max(loan_amount) means. What insight does it communicate? Can you explain it in simple English?

The Great
  • @Dave - If I add the `Deep learning` tag, it is automatically converted to neural networks, just FYI, even though the `deep learning` tag exists. I don't know why it works this way. So it's not me who is adding the `neural networks` tag. I would still like to use the `deep learning` tag because I am trying DL based approaches as well. – The Great Feb 24 '22 at 04:41
  • [tag:deep-learning] is a synonym to [tag:neural-networks]. Here's the meta thread discussing why. https://stats.meta.stackexchange.com/q/5639/22311 – Sycorax Feb 25 '22 at 01:12
  • sure @Sycorax. understand. would you like to share anything on this post? – The Great Feb 25 '22 at 02:17
  • Interpreting a model depends on what the model is. Linear models are interpreted differently from, say, a random forest. What model are you using? – Sycorax Feb 25 '22 at 02:37
  • I am trying both random forest and logistic regression. But this post is mainly about interpreting features that are created using aggregate functions. Meaning, how should I interpret it if `max(loan_amount)`, `min(loan_amount)`, or `stddev(loan_amount)` comes out as an important feature? What insight does it convey? – The Great Feb 25 '22 at 02:39
  • The feature improves prediction in the sense that its inclusion reduces the loss on the training data. You can’t get much more specific unless you are specific about the model. – Sycorax Feb 25 '22 at 02:53
  • Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/134411/discussion-between-the-great-and-sycorax). – The Great Feb 25 '22 at 03:09
  • I don't understand why this post is marked as duplicate when it has no relation with the linked post. It is not talking about any model in specific. It is mainly about feature interpretation (linguistic) – The Great Feb 25 '22 at 03:29
  • "I am trying both random forest and logistic regression." – Sycorax Feb 25 '22 at 03:32
  • But still this post has nothing to do with modelling approach. I said random forest and logistic regression because you asked that info. But I would still say, my objective of creating this post is to understand the agg features (and they are used across all models). So, question is about agg features – The Great Feb 25 '22 at 03:34
  • Feature importance interpretation has everything to do with the choice of model. There's no feature importance interpretation that's generic to all models. I could change the closure reason to "needs more detail" instead, but I don't think that would make much difference. – Sycorax Feb 25 '22 at 03:36

0 Answers