Highest Voted 'train' Questions - Statistical Analysis Stack Exchange

331 votes · 5 answers

What is the trade-off between batch size and number of iterations to train a neural network?

When training a neural network, what difference does it make to set: batch size to $a$ and number of iterations to $b$ vs. batch size to $c$ and number of iterations to $d$ where $ab = cd$? To put it otherwise, assuming that we train the neural…

neural-networks train

asked Aug 05 '15 at 21:19 by Franck Dernoncourt (reputation 42,093)
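A quick arithmetic sketch of the constraint in this question, under the assumption that "iteration" means one parameter update on one batch: with $ab = cd$, both settings feed the same total number of examples to the optimizer, and therefore the same number of epochs over a dataset of $N$ examples. What differs is how many updates are made and how many examples each gradient estimate averages over. The dataset size and the concrete values of $a, b, c, d$ below are hypothetical.

```python
def training_budget(batch_size: int, iterations: int, n_examples: int) -> dict:
    """Summarize a (batch size, iterations) setting for a dataset of n_examples."""
    examples_seen = batch_size * iterations  # total examples fed to the optimizer
    return {
        "updates": iterations,                 # one parameter update per iteration
        "examples_seen": examples_seen,
        "epochs": examples_seen / n_examples,  # full passes over the data
    }

N = 10_000  # hypothetical dataset size
small = training_budget(batch_size=32, iterations=5_000, n_examples=N)  # a=32, b=5000
large = training_budget(batch_size=256, iterations=625, n_examples=N)   # c=256, d=625

print(small)  # {'updates': 5000, 'examples_seen': 160000, 'epochs': 16.0}
print(large)  # {'updates': 625, 'examples_seen': 160000, 'epochs': 16.0}
```

This only covers the bookkeeping: the two settings see the same data volume but differ by a factor of 8 in the number of updates. It says nothing about which one generalizes better, which is the substance of the question.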

32 votes · 1 answer

Benefits of stratified vs random sampling for generating training data in classification

I would like to know if there are any/some advantages of using stratified sampling instead of random sampling, when splitting the original dataset into training and testing set for classification. Also, does stratified sampling introduce more bias…

classification cross-validation random-forest train stratification

asked Dec 07 '16 at 21:24 by gc5 (reputation 877)
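To make the distinction in this question concrete, here is a minimal standard-library sketch (not taken from the question) contrasting a plain random split with a stratified one. The stratified version applies the same test fraction inside each class, so class proportions in the test set match the full dataset; the random version only matches them on average. The label list is hypothetical. In scikit-learn, the same effect is usually obtained with the `stratify` argument of `train_test_split`.

```python
import random
from collections import Counter, defaultdict

def random_split(labels, test_frac, seed=0):
    """Plain random split: shuffle all indices, take the first test_frac as test."""
    rng = random.Random(seed)
    idx = list(range(len(labels)))
    rng.shuffle(idx)
    n_test = round(test_frac * len(idx))
    return idx[n_test:], idx[:n_test]  # (train, test)

def stratified_split(labels, test_frac, seed=0):
    """Stratified split: apply the same test fraction within each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(test_frac * len(idx))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return train, test

labels = ["a"] * 90 + ["b"] * 10  # hypothetical imbalanced labels (10% minority)
_, test_strat = stratified_split(labels, test_frac=0.2)
_, test_rand = random_split(labels, test_frac=0.2)
print(Counter(labels[i] for i in test_strat))  # Counter({'a': 18, 'b': 2})
print(Counter(labels[i] for i in test_rand))   # minority count depends on the seed
```

With only 10 minority examples, an unlucky random split can leave the test set with very few of them, which is one practical reason stratification is often preferred for small or imbalanced datasets.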

30 votes · 3 answers

Imputation before or after splitting into train and test?

I have a data set with N ~ 5000 and about 1/2 missing on at least one important variable. The main analytic method will be Cox proportional hazards. I plan to use multiple imputation. I will also be splitting into a train and test set. Should I…

cross-validation survival multiple-imputation train

asked Apr 24 '14 at 18:55 by Peter Flom (reputation 94,055)
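The leakage concern behind this question can be illustrated with a deliberately simplified sketch. It uses single mean imputation, not the multiple imputation the asker plans, purely to show one commonly recommended pattern: the imputation model (here just a mean) is estimated from the training rows only and then reused unchanged on the test rows, so the test set has no influence on what the training data looks like. The column values are hypothetical, and `None` stands in for a missing value.

```python
from statistics import mean

def fit_mean_imputer(train_col):
    """Learn the fill value from observed (non-None) training values only."""
    return mean(v for v in train_col if v is not None)

def impute(col, fill_value):
    """Replace missing entries (None) with a previously learned fill value."""
    return [fill_value if v is None else v for v in col]

train_age = [50, None, 70, 60, None]  # hypothetical training column
test_age = [None, 80, None]           # hypothetical test column

fill = fit_mean_imputer(train_age)    # 60, computed without looking at the test set
print(impute(train_age, fill))  # [50, 60, 70, 60, 60]
print(impute(test_age, fill))   # [60, 80, 60]
```

Note that the observed test value 80 does not affect the fill value; imputing before splitting would have pulled it into the estimate.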

23 votes · 2 answers

Scikit correct way to calibrate classifiers with CalibratedClassifierCV

Scikit has CalibratedClassifierCV, which allows us to calibrate our models on a particular X, y pair. It also states clearly that data for fitting the classifier and for calibrating it must be disjoint. If they must be disjoint, is it legitimate to…

cross-validation scikit-learn validation train calibration

asked Feb 22 '17 at 12:02 by sapo_cosmico (reputation 374)

17 votes · 5 answers

Can increasing the amount of training data make overfitting worse?

Suppose I train a neural network on dataset A and evaluate on dataset B (that has a different feature distribution than dataset A). If I increase the amount of data in dataset A by a factor of 10, is it likely to decrease accuracy on dataset B?

machine-learning neural-networks validation overfitting train

asked Nov 14 '19 at 07:08 by asdfaefi (reputation 171)

15 votes · 2 answers

Can I (justifiably) train a second model only on the observations that a previous model predicted poorly?

Say I commit the following sins while building a predictive model: I take my dataset and split it into four subsets: Three for training (Train_A, Train_B, and Train_C) and one for validation. I train an initial model (Model_A) on Train_A. Because…

predictive-models ensemble-learning train bias-variance-tradeoff weighted-data

asked May 05 '21 at 14:52 by Jdclark (reputation 155)

15 votes · 2 answers

Is there a way to incorporate new data into an already trained neural network without retraining on all my data in Keras?

I have already trained a neural network on my data. In the future, I will receive some more data. How can I incorporate this data into my model without rebuilding it from scratch?

neural-networks train keras

asked Jun 22 '18 at 19:00 by yalpsid eman (reputation 273)

14 votes · 3 answers

Training, testing, validating in a survival analysis problem

I've been browsing various threads here, but I don't think my exact question is answered. I have a dataset of ~50,000 students and their time to dropout. I am going to be performing proportional hazards regression with a large number of potential…

cross-validation survival train

asked Apr 02 '14 at 17:17 by Peter Flom (reputation 94,055)

14 votes · 2 answers

Different results from randomForest via caret and the basic randomForest package

I am a bit confused: how can the results of a model trained via caret differ from those of the model in the original package? I read "Whether preprocessing is needed before prediction using FinalModel of RandomForest with caret package?" but I do not use any…

r machine-learning random-forest caret train

asked Mar 12 '15 at 16:26 by Malte (reputation 263)

13 votes · 1 answer

How to know if a learning curve from SVM model suffers from bias or variance?

I created this learning curve and I want to know if my SVM model suffers from bias or variance? How can I conclude that from this graph?

machine-learning svm bias train

asked Jun 27 '16 at 11:11 by Afke (reputation 267)

12 votes · 4 answers

TfidfVectorizer: should it be used on train only or train+test

When training a model it is possible to train the Tfidf on the corpus of only the training set or also on the test set. It seems not to make sense to include the test corpus when training the model, though since it is not supervised, it is also…

python train scikit-learn

asked May 29 '15 at 21:19 by PascalVKooten (reputation 2,127)
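For reference, the pattern usually recommended in scikit-learn terms is to call `fit` (or `fit_transform`) on the training documents only and `transform` on the test documents. The standard-library sketch below mimics that split of responsibilities; it is an illustration, not scikit-learn's implementation. It assumes whitespace tokenization (scikit-learn uses a regex token pattern), uses the smoothed IDF formula $\mathrm{idf}(t) = \ln\frac{1+n}{1+\mathrm{df}(t)} + 1$ that `TfidfVectorizer` applies with its default `smooth_idf=True`, and omits the L2 row normalization scikit-learn applies by default. The corpus is hypothetical.

```python
import math
from collections import Counter

def fit_idf(train_docs):
    """Learn vocabulary and IDF weights from the training corpus only,
    using the smoothed formula idf(t) = ln((1 + n) / (1 + df(t))) + 1."""
    n = len(train_docs)
    df = Counter()
    for doc in train_docs:
        df.update(set(doc.lower().split()))  # document frequency: each term once per doc
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}

def transform(doc, idf):
    """Raw term counts times the training IDF; terms unseen in training are dropped."""
    tf = Counter(w for w in doc.lower().split() if w in idf)
    return {t: c * idf[t] for t, c in tf.items()}

train = ["the cat sat", "the dog sat", "the cat ran"]  # hypothetical corpus
test = "the bird sat"

idf = fit_idf(train)         # the test document plays no part here
vec = transform(test, idf)
print(sorted(vec))           # ['sat', 'the'] ('bird' was never seen in training)
print(round(vec["the"], 3))  # 1.0
```

Fitting on train only means a test document is represented exactly as a genuinely new document would be at prediction time: unseen words are ignored and IDF weights reflect only the training corpus.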

11 votes · 5 answers

Good examples/books/resources to learn about applied machine learning (not just ML itself)

I've taken an ML course previously, but now that I am working with ML related projects at my job, I am struggling quite a bit to actually apply it. I'm sure the stuff I'm doing has been researched/dealt with before, but I can't find specific…

machine-learning references train application

asked Jun 08 '16 at 15:49 by stoneman (reputation 11)

10 votes · 4 answers

I've already used my entire dataset in a regression, should I not use that as a prediction model?

At the hospital I work at we were writing a paper on what variables about a patient predict whether they'll return for a follow-up visit. We included variables such as age, gender, distance from their home to the hospital, mechanism of injury and…

multiple-regression modeling train train-test-split

asked Oct 25 '21 at 19:29 by Joe Crozier (reputation 247)

10 votes · 3 answers

Is it in general helpful to add "external" datasets to the training dataset?

Several people have already asked "is more data helpful?": What impact does increasing the training data have on the overall system accuracy? Can increasing the amount of training data make overfitting worse? Will a model always score better on the…

neural-networks dataset train

asked Jun 29 '20 at 14:30 by gebbissimo (reputation 410)

10 votes · 3 answers

Approaches when learning from huge datasets?

Basically, there are two common ways to learn against huge datasets (when you're confronted by time/space restrictions): Cheating :) - use just a "manageable" subset for training. The loss of accuracy may be negligible because of the law of…

machine-learning large-data model-evaluation train

asked Feb 16 '12 at 07:33 by andreister (reputation 3,257)
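One way to implement the "manageable subset" option when the data is too large to load at once is reservoir sampling (Algorithm R). This is a technique swapped in for illustration, not one named in the question: it draws a uniform random sample of $k$ rows in a single pass with $O(k)$ memory, without knowing the dataset size in advance. The generator below is a hypothetical stand-in for rows read lazily from disk.

```python
import random

def reservoir_sample(stream, k, seed=0):
    """Algorithm R: uniform random sample of k items from an iterable of
    unknown length, using one pass and O(k) memory."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)    # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)  # inclusive on both ends: i + 1 possible values
            if j < k:
                sample[j] = item   # keep item i with probability k / (i + 1)
    return sample

# Hypothetical stream: a generator stands in for rows read lazily from disk.
rows = (row_id for row_id in range(100_000))
subset = reservoir_sample(rows, k=1_000)
print(len(subset))  # 1000
```

Every row ends up in the sample with the same probability $k/n$, so the subset is a fair basis for the law-of-large-numbers argument in the question; whether the resulting accuracy loss is acceptable still has to be checked empirically.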
