Highest Voted 'feature-engineering' Questions - Statistical Analysis Stack Exchange

93

votes

6 answers

Principled way of collapsing categorical variables with many levels?

What techniques are available for collapsing (or pooling) many categories to a few, for the purpose of using them as an input (predictor) in a statistical model? Consider a variable like college student major (discipline chosen by an undergraduate…

asked Apr 17 '15 at 13:31

shadowtalker


35

votes

8 answers

How to represent geography or zip code in a machine learning model or recommender system?

I am building a model and I think that geographic location is likely to be very good at predicting my target variable. I have the zip code of each of my users. I am not entirely sure about the best way to include zip code as a predictor feature in…

machine-learning feature-engineering many-categories

asked Apr 23 '14 at 18:10

captain_ahab


33

votes

4 answers

Maximum Mean Discrepancy (distance distribution)

I have two data sets (source and target data) which follow different distributions. I am using MMD, a non-parametric distance between distributions, to compare the marginal distributions of the source and target data. Source data, Xs; target data,…

machine-learning distributions distance feature-engineering domain-adaptation

asked Apr 28 '17 at 15:45

Mahsa

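As a rough illustration of the quantity the MMD question refers to, the sketch below computes a biased (V-statistic) estimate of squared MMD between two small samples. The Gaussian kernel, the `gamma` value, and the toy points `Xs` and `Xt` are assumptions made for illustration; they are not taken from the question.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def mean_kernel(xs, ys, gamma):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

def mmd_squared(source, target, gamma=1.0):
    """Biased (V-statistic) estimate of squared MMD between two samples."""
    return (mean_kernel(source, source, gamma)
            + mean_kernel(target, target, gamma)
            - 2.0 * mean_kernel(source, target, gamma))

# Hypothetical toy samples: the target is the source shifted by about (2, 2).
Xs = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0]]
Xt = [[2.0, 2.1], [1.9, 2.2], [2.1, 1.8]]

same = mmd_squared(Xs, Xs)     # identical samples
shifted = mmd_squared(Xs, Xt)  # clearly different samples
```

The estimate is zero when a sample is compared with itself and grows as the two samples move apart. In practice the result depends heavily on the kernel bandwidth, so `gamma` should be treated as a tuning choice.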

31

votes

3 answers

Utility of feature-engineering: Why create new features based on existing features?

I often see people create new features based on existing features in a machine learning problem. For example, here: https://triangleinequality.wordpress.com/2013/09/08/basic-feature-engineering-with-the-titanic-data/ people have considered the size…

machine-learning feature-engineering

asked Jun 07 '18 at 08:54

Matthieu Veron


29

votes

2 answers

When should we discretize/bin continuous independent variables/features and when should we not?

When should we discretize/bin independent variables/features and when should we not? My attempts to answer the question: In general, we should not bin, because binning loses information. Binning actually increases the degrees of freedom of the…

machine-learning continuous-data feature-engineering binning

asked Aug 19 '16 at 17:31

Haitao Du


29

votes

2 answers

How to initialize the elements of the filter matrix?

I'm trying to better understand convolutional neural networks by writing up Python code that doesn't depend on libraries (like Convnet or TensorFlow), and I'm getting stuck in the literature on how to choose values for the kernel matrix, when…

machine-learning neural-networks deep-learning feature-engineering conv-neural-network

asked Mar 08 '16 at 10:32

Kai Kuspa


25

votes

1 answer

What is "feature space"?

What is the definition of "feature space"? For example, when reading about SVMs, I read about "mapping to feature space". When reading about CART, I read about "partitioning to feature space". I understand what's going on, especially for CART, but I…

machine-learning svm feature-selection cart feature-engineering

asked Dec 22 '12 at 06:42

power


25

votes

2 answers

Autoencoders can't learn meaningful features

I have 50,000 images such as these two: They depict graphs of data. I wanted to extract features from these images so I used autoencoder code provided by Theano (deeplearning.net). The problem is, these autoencoders don't seem to learn any…

machine-learning neural-networks feature-engineering restricted-boltzmann-machine autoencoders

asked Dec 31 '14 at 07:08

b93dh44


24

votes

1 answer

Optimal construction of day feature in neural networks

Working on a regression problem, I started to think about the representation of the "day of the week" feature. I wonder which approach would perform better: one feature, with value 1/7 for Monday, 2/7 for Tuesday...; or 7 features: (1, 0, 0, 0, 0, 0, 0) for Monday; (0,…

machine-learning neural-networks feature-engineering

asked Dec 01 '14 at 23:52

Oepas Dost

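The two representations named in the day-of-week question can be sketched as follows. The function names and the `DAYS` list are hypothetical, chosen only to make the comparison concrete.

```python
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def scalar_encoding(day):
    """One feature: 1/7 for Monday, 2/7 for Tuesday, ..., 7/7 for Sunday."""
    position = DAYS.index(day) + 1
    return [position / 7]

def one_hot_encoding(day):
    """Seven features: a 1 in the slot for the given day, 0 elsewhere."""
    vec = [0] * len(DAYS)
    vec[DAYS.index(day)] = 1
    return vec

monday_scalar = scalar_encoding("Mon")
monday_one_hot = one_hot_encoding("Mon")
sunday_scalar = scalar_encoding("Sun")
```

The scalar form imposes an ordering and places Sunday as far as possible from Monday, even though the two days are adjacent in the weekly cycle. The one-hot form makes no ordering assumption at the cost of six extra inputs. Which one performs better depends on the data and model.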

22

votes

3 answers

Why do neural networks need feature selection / engineering?

Particularly in the context of Kaggle competitions I have noticed that a model's performance is all about feature selection / engineering. While I can fully understand why that is the case when dealing with the more conventional / old-school ML…

neural-networks deep-learning feature-selection feature-engineering

asked May 31 '18 at 09:50

piotrwiercinski


22

votes

5 answers

Why does feature engineering work?

Recently I have learned that one of the ways to find better solutions to ML problems is to create new features. One can do that by, for example, summing two features. Suppose we have two features, "attack" and "defense", for some kind of hero.…

machine-learning feature-engineering

asked Dec 28 '17 at 14:55

MrKadek750

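A minimal sketch of the kind of derived feature this question describes, summing two existing columns into a new one. The rows, the helper `add_sum_feature`, and the column name `attack_plus_defense` are hypothetical.

```python
# Hypothetical rows; the values are made up for illustration.
heroes = [
    {"attack": 7, "defense": 3},
    {"attack": 2, "defense": 9},
]

def add_sum_feature(rows, a="attack", b="defense", name="attack_plus_defense"):
    """Return copies of the rows with an extra column holding a + b."""
    return [{**row, name: row[a] + row[b]} for row in rows]

augmented = add_sum_feature(heroes)
```

The original rows are left unchanged, so the raw features stay available to the model alongside the derived one.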

21

votes

2 answers

Tutorials for feature engineering

As is known to all, feature engineering is extremely important to machine learning; however, I have found few materials on this area. I have participated in several Kaggle competitions and believe that good features may even be more important…

machine-learning references feature-engineering

asked Nov 11 '12 at 21:50

FindBoat


19

votes

5 answers

Is it better to do exploratory data analysis on the training dataset only?

I'm doing exploratory data analysis (EDA) on a dataset. Then I will select some features to predict a dependent variable. The question is: Should I do the EDA on my training dataset only? Or should I join the training and test datasets together…

dataset feature-selection feature-engineering exploratory-data-analysis

asked Jan 07 '16 at 10:47

Aboelnour


16

votes

2 answers

Mixing continuous and binary data with linear SVM?

So I've been playing around with SVMs and I wonder if this is a good thing to do: I have a set of continuous features (0 to 1) and a set of categorical features that I converted to dummy variables. In this particular case, I encode the date of the…

categorical-data svm feature-selection linear-model feature-engineering

asked Jan 21 '14 at 16:42

user3010273


14

votes

1 answer

Feature construction and normalization in machine learning

Let's say I want to create a Logistic Classifier for a movie M. My features would be something like age of the person, gender, occupation, location. So training set would be something like: Age Gender Occupation Location Like(1)/Dislike(0) 23 …

machine-learning feature-engineering

asked Oct 19 '12 at 19:49

snow_leopard

