
I have a regression problem. The aim is to estimate the best-fitting curve from a set of features. I have extracted a set of features that are relevant based on the literature.

The performance with the present set of features is not satisfactory. Feature engineering is a possible next step. How do I go about it? I could take products of the features or polynomial features (quadratic, cubic, etc.), but won't this make the system non-linear and prone to overfitting?
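For example, something along these lines (a sketch using scikit-learn; the degree, the Ridge penalty, and the synthetic data are just placeholders for illustration):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))  # stand-in for my extracted features
y = X[:, 0] * X[:, 1] + X[:, 2] ** 2 + rng.normal(scale=0.1, size=200)

# degree=2 adds all pairwise products and squares of the original features
model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    StandardScaler(),
    Ridge(alpha=1.0),  # regularization to rein in the extra terms
)
print(cross_val_score(model, X, y, cv=5).mean())  # mean R^2 across folds
```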

prashanth

1 Answer


The area you are interested in is called feature learning: an automatic way to do feature engineering.

Please note it is not an easy task. From a theoretical point of view, feature learning is as hard as learning itself: you can reduce learning a concept to learning a feature that is equivalent to the concept.

One of the promises of deep learning is the automation of feature learning. You might also be interested in MIT's Data Science Machine. That said, most feature learning is done manually and requires domain knowledge.

Since you wrote that you are interested in curve fitting, I guess these ideas would take you too far from your current work.

You might instead be interested in a technique that enables you to check whether your features are equivalent to the raw data you have and, if not, how much information you lost.

The idea of estimating the loss due to feature engineering is based on the paper "PAC learning with irrelevant attributes". For simplicity, suppose our concept is binary, so we can split the samples into positives and negatives; we will get to regression at the end.

For each pair of a negative and a positive sample, the difference in concept might be explained by a difference in one of the features (or else it is not explainable by the given features at all). The set of feature differences is therefore the set of possible explanations for the concept difference, and hence the data a learning algorithm can use to determine the concept. If we engineer some features and still get the same set of explanations for every pair, we did not lose any information needed (with respect to learning algorithms that work by such comparisons). If our features do lose information, we will get a smaller set of possible explanations, but we will be able to measure accurately how much we lose and where.
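Here is a minimal sketch of that comparison, assuming features that can be compared for equality (binary or binned values); the function names and the pairwise enumeration are mine, not from the paper:

```python
import numpy as np

def explanation_sets(X, y):
    """For each (positive, negative) pair, return the set of feature
    indices on which the two samples differ -- the candidate
    explanations for their different labels. An empty set means the
    pair's label difference is not explainable by these features."""
    pos, neg = X[y == 1], X[y == 0]
    return [frozenset(np.flatnonzero(p != n)) for p in pos for n in neg]

def fraction_lost(X_raw, X_eng, y):
    """Fraction of pairs that the raw features can explain but the
    engineered features cannot -- how much information was lost, and
    (by inspecting the pairs) where."""
    raw = explanation_sets(X_raw, y)
    eng = explanation_sets(X_eng, y)
    explainable = [(r, e) for r, e in zip(raw, eng) if r]
    if not explainable:
        return 0.0
    return sum(1 for _, e in explainable if not e) / len(explainable)
```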

For regression problems, treat pairs of items whose target values are close enough as having the same label, and pairs whose values are far apart as having different labels.
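In code, the only change from the sketch above is how the pairs are formed; the threshold `tau` is an assumption you would tune for your data:

```python
import numpy as np

def far_pairs(y, tau):
    """Regression analogue of (positive, negative) pairs: indices (i, j)
    whose target values differ by more than tau."""
    n = len(y)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(y[i] - y[j]) > tau]

def explanation_sets_regression(X, y, tau):
    """As in the binary sketch: for each far-apart pair, the set of
    features on which the two samples differ."""
    return [frozenset(np.flatnonzero(X[i] != X[j]))
            for i, j in far_pairs(y, tau)]
```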

This method won't enable you to build features, but it will let you evaluate them (it is also powerful for feature selection).

DaL