Pertaining to questions arising from competitions hosted on Kaggle.com. Use this tag only for topics SPECIFIC to Kaggle, not just because your data come from Kaggle.
Questions tagged [kaggle]
7 questions
59 votes, 7 answers
Industry vs Kaggle challenges. Is collecting more observations and having access to more variables more important than fancy modelling?
I'd hope the title is self-explanatory. On Kaggle, most winners use stacking, sometimes with hundreds of base models, to squeeze out a few extra percent of MSE, accuracy... In general, in your experience, how important is fancy modelling such as stacking vs…

Tom
13 votes, 2 answers
Are Kaggle competitions just won by chance?
Kaggle competitions determine final rankings based on a held-out test set.
A held-out test set is a sample; it may not be representative of the population being modeled. Since each submission is like a hypothesis, the algorithm that won the…

sjw
3 votes, 2 answers
How to make train/test split with given class weights
I am working on a simple multi-class classification problem.
I was given training data with perfectly balanced classes. However, the data I must predict on is not balanced. I was able to deduce the class proportions of the test data.
Is there a way to split…

Dmitry Petrov
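One way to read the question above is as a request for a validation set whose class mix mirrors the estimated test proportions. The sketch below assumes that reading and uses only the Python standard library. The function name `split_with_class_proportions`, the example labels, and the proportions are all hypothetical and are not taken from the original post.

```python
import random
from collections import defaultdict


def split_with_class_proportions(labels, target_props, val_size, seed=0):
    """Return (train_idx, val_idx); val_idx approximates target_props.

    labels       -- sequence of class labels, one per training row
    target_props -- dict mapping class label -> desired share in validation
    val_size     -- approximate number of rows in the validation set
    """
    rng = random.Random(seed)

    # Group row indices by class.
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    # Draw the required number of rows from each class without replacement.
    val_idx = []
    for cls, prop in target_props.items():
        n_cls = round(prop * val_size)
        if n_cls > len(by_class[cls]):
            raise ValueError(f"not enough rows of class {cls!r}")
        val_idx.extend(rng.sample(by_class[cls], n_cls))

    # Everything not drawn for validation stays in the training set.
    val_set = set(val_idx)
    train_idx = [i for i in range(len(labels)) if i not in val_set]
    return train_idx, sorted(val_idx)


# Hypothetical example: 3 balanced classes, skewed target proportions.
labels = [c for c in "abc" for _ in range(100)]
target = {"a": 0.6, "b": 0.3, "c": 0.1}
train_idx, val_idx = split_with_class_proportions(labels, target, val_size=50)
```

Because per-class counts are rounded, the validation set can differ from `val_size` by a row or two for other inputs. The training set left over is no longer balanced, which may or may not matter for the model being fitted.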
1 vote, 1 answer
What are the best practices for selecting your cross validation strategy?
I am new to Kaggle competitions and want to know whether there are best practices for selecting a robust CV strategy.

Kurtis Pykes
1 vote, 1 answer
Cross validation best practice for competition purpose
I'm fairly new to the DS scene and have been learning the theory and practicing on Kaggle and in private competitions.
For real-world problems, my understanding is that you split out a test set from what you have, use the training set for…

bchoiNY
1 vote, 0 answers
What do they mean by Robust Cross-Validation?
I was reading an interview with a Kaggler, and they kept stressing the importance of stable, reliable cross-validation for winning competitions. What do they mean by that? I usually just use cross_val_score, and that's enough for me.

Chipmunkafy
0 votes, 1 answer
SVC doing great on validation & test data but scored very low on predicted data
First of all, this is my first machine learning project after taking Andrew Ng's course, so please bear with me.
I'm working on the most famous dataset, the Titanic data.
First, I split the dataset into training and testing sets:
training, testing =…

Blaze Tama