Questions tagged [kaggle]

Pertaining to questions arising from competitions hosted on Kaggle.com. Use this tag only for topics SPECIFIC to Kaggle, not just because your data come from Kaggle.

7 questions
59 votes, 7 answers

Industry vs Kaggle challenges. Is collecting more observations and having access to more variables more important than fancy modelling?

I'd hope the title is self-explanatory. On Kaggle, most winners use stacking, sometimes with hundreds of base models, to squeeze out a few extra percent of MSE, accuracy... In general, in your experience, how important is fancy modelling such as stacking vs…
Tom
13 votes, 2 answers

Are Kaggle competitions just won by chance?

Kaggle competitions determine final rankings based on a held-out test set. A held-out test set is a sample; it may not be representative of the population being modeled. Since each submission is like a hypothesis, the algorithm that won the…
sjw
3 votes, 2 answers

How to make train/test split with given class weights

I am working on a simple multi-class classification ML problem. I was given training data with perfectly balanced classes. However, the data I must predict on is not balanced. I was able to deduce the class proportions of the test data. Is there a way to split…
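One way to approach the question above is to sample the validation set class by class, so that it follows the known test proportions while the rest stays in training. The following is only a minimal sketch under stated assumptions: the function name `split_with_class_proportions`, the toy labels, and the proportions 0.6/0.3/0.1 are all hypothetical and not taken from the question.

```python
import random
from collections import Counter, defaultdict

def split_with_class_proportions(labels, target_props, val_size, seed=0):
    """Return (train_idx, val_idx); the validation part follows target_props."""
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    val_idx = []
    for label, prop in target_props.items():
        n_take = round(val_size * prop)
        if n_take > len(by_class[label]):
            raise ValueError(f"not enough rows of class {label!r}")
        # Sample without replacement inside each class.
        val_idx.extend(rng.sample(by_class[label], n_take))

    val_set = set(val_idx)
    train_idx = [i for i in range(len(labels)) if i not in val_set]
    return train_idx, sorted(val_idx)

# Hypothetical data: 3 perfectly balanced classes, 100 rows each.
labels = [0] * 100 + [1] * 100 + [2] * 100
# Hypothetical test-set class proportions (assumed, for illustration only).
target_props = {0: 0.6, 1: 0.3, 2: 0.1}

train_idx, val_idx = split_with_class_proportions(labels, target_props, val_size=50)
val_counts = Counter(labels[i] for i in val_idx)
print(val_counts)
```

Note that the remaining training rows are no longer perfectly balanced after this split, which may or may not matter depending on the model.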
1 vote, 1 answer

What are the best practices for selecting your cross validation strategy?

I am new to Kaggle competitions and want to know if there are best practices for selecting a robust CV strategy.
Kurtis Pykes
1 vote, 1 answer

Cross validation best practice for competition purpose

I'm fairly new to the DS scene and have been learning the theory and practicing on Kaggle and in private competitions. For real-world problems, my understanding is that you split out a test set from what you have, use the training set for…
bchoiNY
1 vote, 0 answers

What do they mean by Robust Cross-Validation?

I was reading a Kaggler interview article, and the interviewees kept stressing the importance of a stable, reliable cross-validation setup for winning their competitions. What do they mean by that? I usually just use cross_val_score, and that's enough for me.
Chipmunkafy
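One possible reading of a "stable" cross-validation, offered here as an assumption rather than as what the interviewees meant, is that the score barely moves when the folds are reshuffled. The sketch below repeats k-fold several times in plain Python and reports the mean and spread of the fold scores. The toy midpoint classifier, the synthetic data, and all function names are hypothetical.

```python
import random
import statistics

def fit_midpoint(xs, ys):
    """Toy 1-D classifier: threshold halfway between the two class means."""
    mean0 = statistics.mean(x for x, label in zip(xs, ys) if label == 0)
    mean1 = statistics.mean(x for x, label in zip(xs, ys) if label == 1)
    return (mean0 + mean1) / 2

def predict_midpoint(threshold, x):
    # Assumes class 1 tends to have larger values than class 0.
    return int(x > threshold)

def repeated_kfold_scores(X, y, fit, predict, k=5, repeats=3, seed=0):
    """Accuracy on every held-out fold, over several reshuffled k-fold runs."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        idx = list(range(len(y)))
        rng.shuffle(idx)
        # Partition the shuffled indices into k disjoint folds.
        folds = [idx[i::k] for i in range(k)]
        for held_out in folds:
            held = set(held_out)
            train = [i for i in idx if i not in held]
            model = fit([X[i] for i in train], [y[i] for i in train])
            correct = sum(predict(model, X[i]) == y[i] for i in held_out)
            scores.append(correct / len(held_out))
    return scores

# Synthetic data (assumed): two overlapping 1-D Gaussian classes.
data_rng = random.Random(42)
X = [data_rng.gauss(0, 1) for _ in range(100)] + [data_rng.gauss(2, 1) for _ in range(100)]
y = [0] * 100 + [1] * 100

scores = repeated_kfold_scores(X, y, fit_midpoint, predict_midpoint)
print(f"mean accuracy: {statistics.mean(scores):.3f}")
print(f"std across folds and repeats: {statistics.stdev(scores):.3f}")
```

A large spread relative to the gaps between competing models suggests the CV estimate is too noisy to rank them. In scikit-learn, a `RepeatedStratifiedKFold` splitter can be passed as the `cv` argument of `cross_val_score` to get a similar set of scores.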
0 votes, 1 answer

SVC doing great on validation & test data but scored very low on predicted data

First of all, this is my first machine learning project after taking Andrew Ng's course, so please bear with me. I'm working on the most famous dataset, the Titanic data. First, I split the dataset into training and testing sets: training, testing =…