A parameter that is not strictly for the statistical model (or data generating process), but a parameter for the statistical method. It could be a parameter for: a family of prior distributions, smoothing, a penalty in regularization methods, or an optimization algorithm.
Questions tagged [hyperparameter]
561 questions
56 votes · 6 answers
Practical hyperparameter optimization: Random vs. grid search
I'm currently going through Bergstra and Bengio's Random Search for Hyper-Parameter Optimization [1], in which the authors claim that random search is more efficient than grid search at achieving approximately equal performance.
My question is: Do people…

Bar · 2,492
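The core of Bergstra and Bengio's argument shows up even in a toy sketch: with a budget of 16 evaluations, a 4×4 grid tries only 4 distinct values of each hyperparameter, while random search tries 16. The objective below is a made-up stand-in for a validation loss in which only the learning rate matters much; none of the names come from the paper.

```python
import random

def objective(lr, reg):
    # Hypothetical validation loss: only the learning rate matters much,
    # a pattern Bergstra and Bengio argue is common in practice.
    return (lr - 0.3) ** 2 + 0.001 * reg

budget = 16

# Grid search: a 4 x 4 grid gives only 4 distinct values per hyperparameter.
grid_lr = [0.1, 0.4, 0.7, 1.0]
grid_reg = [0.01, 0.1, 1.0, 10.0]
grid_trials = [(lr, reg) for lr in grid_lr for reg in grid_reg]

# Random search: the same 16 trials give 16 distinct values per hyperparameter.
rng = random.Random(0)
random_trials = [(rng.uniform(0.0, 1.0), 10 ** rng.uniform(-2, 1)) for _ in range(budget)]

best_grid = min(grid_trials, key=lambda t: objective(*t))
best_random = min(random_trials, key=lambda t: objective(*t))
```

Because the grid wastes trials re-testing the same few learning rates against an unimportant axis, random search covers the important axis far more densely at equal cost.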
52 votes · 1 answer
Do we have to tune the number of trees in a random forest?
Software implementations of random forest classifiers have a number of parameters to allow users to fine-tune the algorithm's behavior, including the number of trees $T$ in the forest. Is this a parameter that needs to be tuned, in the same way as…

Sycorax · 76,417
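A toy simulation (not a real random forest) of why $T$ behaves differently from ordinary hyperparameters: averaging more trees can only shrink the independent part of the ensemble's error, so performance plateaus rather than peaks — there is nothing to overshoot by setting $T$ "too large". The shared/independent noise split below is an assumption for illustration only.

```python
import random
import statistics

rng = random.Random(42)

def ensemble_spread(n_trees, n_sims=2000):
    """Std. dev. of the ensemble's average prediction around the truth (0 here)."""
    means = []
    for _ in range(n_sims):
        shared = rng.gauss(0.0, 0.5)  # error component common to all trees
        trees = [shared + rng.gauss(0.0, 1.0) for _ in range(n_trees)]  # plus independent noise
        means.append(statistics.fmean(trees))
    return statistics.pstdev(means)

# More trees monotonically shrink the independent noise toward the shared floor,
# so there is a point of diminishing returns but no optimum to tune for.
spread = {t: ensemble_spread(t) for t in (1, 10, 100)}
```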
39 votes · 3 answers
Guidelines for selecting hyperparameters in deep learning
I'm looking for a paper that could help give a guideline on how to choose the hyperparameters of a deep architecture, like stacked auto-encoders or deep belief networks. There are a lot of hyperparameters and I'm very confused about how to choose…

Jack Twain · 7,781
32 votes · 3 answers
How to build the final model and tune probability threshold after nested cross-validation?
Firstly, apologies for posting a question that has already been discussed at length here, here, here, here, here, and for reheating an old topic. I know @DikranMarsupial has written about this topic at length in posts and journal papers, but I'm…

Andrew John Lowe · 421
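A minimal sketch of the usual resolution, with a hypothetical one-hyperparameter model that needs no fitting (so the cross-validation bookkeeping stays visible): the outer loop scores the whole tuning procedure, and the final model comes from rerunning that same procedure on all the data.

```python
import random

rng = random.Random(0)
X = [rng.gauss(0.0, 1.0) for _ in range(60)]
y = [int(x > 0) for x in X]

# Hypothetical one-hyperparameter "model": predict 1 when x exceeds a threshold.
def accuracy(indices, threshold):
    return sum((X[i] > threshold) == y[i] for i in indices) / len(indices)

def kfold(indices, k=3):
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        yield [j for f in folds[:i] + folds[i + 1:] for j in f], folds[i]

grid = [-0.5, 0.0, 0.5]
idx = list(range(len(X)))

def tune(indices):
    """Inner CV: choose the hyperparameter using only the given data."""
    return max(grid, key=lambda h: sum(accuracy(te, h) for _, te in kfold(indices)))

# Outer CV: scores the whole procedure (tuning included), not one fixed model.
outer_scores = [accuracy(te, tune(tr)) for tr, te in kfold(idx)]

# Final model: rerun the same tuning on ALL the data and fit once with the winner.
final_h = tune(idx)
```

The outer scores estimate how well `tune` generalizes, so they are an honest estimate for the final model even though that exact model never passes through the outer loop itself.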
32 votes · 2 answers
What is the reason that the Adam optimizer is considered robust to the value of its hyperparameters?
I was reading about the Adam optimizer for deep learning and came across the following sentence in the new book Deep Learning by Goodfellow, Bengio, and Courville:
Adam is generally regarded as being fairly robust to the choice of hyperparameters,…

Charlie Parker · 5,836
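One concrete piece of that robustness can be demonstrated with a textbook Adam implementation (Kingma and Ba's defaults): the per-coordinate update `m_hat / sqrt(v_hat)` is invariant to rescaling the gradient, so the same learning rate behaves similarly on objectives whose gradients differ in scale by 1000×. This is one illustration, not the book's full argument.

```python
import math

def adam_minimize(grad, x0, steps, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """Textbook Adam on a single scalar parameter (Kingma and Ba's defaults)."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = b1 * m + (1 - b1) * g        # first-moment (mean) estimate
        v = b2 * v + (1 - b2) * g * g    # second-moment estimate
        m_hat = m / (1 - b1 ** t)        # bias correction for zero initialization
        v_hat = v / (1 - b2 ** t)
        # The ratio m_hat / sqrt(v_hat) is invariant to rescaling grad, so the
        # effective step stays near lr whatever the gradient's raw magnitude.
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Same quadratic, gradients differing in scale by 1000x, identical defaults.
x_small = adam_minimize(lambda x: 2.0 * (x - 1.0), x0=0.0, steps=2000)
x_large = adam_minimize(lambda x: 2000.0 * (x - 1.0), x0=0.0, steps=2000)
```

Both runs approach the minimizer at $x = 1$ without touching the default hyperparameters, which plain gradient descent with a fixed step could not do across that range of scales.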
27 votes · 4 answers
How should Feature Selection and Hyperparameter optimization be ordered in the machine learning pipeline?
My objective is to classify sensor signals.
My approach so far is:
i) Engineering features from the raw signal
ii) Selecting relevant features with ReliefF and a clustering approach
iii) Applying a neural network, random forest, and SVM
However I am…

Grunwalski · 495
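One common resolution is to treat the number of selected features as just another hyperparameter and search the joint grid inside a single cross-validation loop, so selection never sees a test fold. A toy sketch, assuming a hypothetical threshold model and a crude correlation-based selector standing in for ReliefF:

```python
import random

rng = random.Random(1)
n, d = 120, 5
X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
y = [int(row[0] > 0) for row in X]  # only feature 0 is informative

def kfold(indices, k=3):
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        yield [j for f in folds[:i] + folds[i + 1:] for j in f], folds[i]

def select_features(train_idx, n_keep):
    # Rank features by |correlation with the label| on the TRAINING part only
    # (a crude stand-in for ReliefF).
    def relevance(f):
        return abs(sum(X[i][f] * (2 * y[i] - 1) for i in train_idx))
    return sorted(range(d), key=relevance, reverse=True)[:n_keep]

def accuracy(test_idx, feats, threshold):
    # Hypothetical model: predict 1 when the selected features sum above a threshold.
    return sum((sum(X[i][f] for f in feats) > threshold) == y[i] for i in test_idx) / len(test_idx)

idx = list(range(n))

def cv_score(n_keep, threshold):
    return sum(accuracy(te, select_features(tr, n_keep), threshold) for tr, te in kfold(idx)) / 3

# Joint search: feature count and model hyperparameter tuned in ONE loop.
best = max(((k, t) for k in (1, 3, 5) for t in (-0.5, 0.0, 0.5)), key=lambda p: cv_score(*p))
```

Because selection runs inside every fold, the chosen feature count is validated the same way as any other hyperparameter rather than being fixed beforehand on the full data.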
27 votes · 6 answers
Is hyperparameter tuning on sample of dataset a bad idea?
I have a dataset of 140,000 examples and 30 features on which I am training several classifiers for binary classification (SVM, logistic regression, random forest, etc.).
In many cases hyperparameter tuning on the whole dataset using either Grid or…

LetsPlayYahtzee · 528
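Whether subsample tuning is safe depends mostly on how flat the validation score is near the optimum, and that is cheap to measure: tune on the subsample, then check how much full-data score the subsample's choice gives up. A sketch with a made-up thresholding task (all names are illustrative, not a real pipeline):

```python
import random

rng = random.Random(7)
N = 5000
X = [rng.gauss(0.0, 1.0) for _ in range(N)]
# Noisy labels: the best threshold is near 0, but no threshold is perfect.
y = [int(x + rng.gauss(0.0, 0.8) > 0) for x in X]

def accuracy(indices, threshold):
    return sum((X[i] > threshold) == y[i] for i in indices) / len(indices)

grid = [g / 10 for g in range(-10, 11)]   # candidate thresholds -1.0 .. 1.0
all_idx = list(range(N))
sample_idx = rng.sample(all_idx, 500)     # tune on a 10% subsample

best_full = max(grid, key=lambda h: accuracy(all_idx, h))
best_sample = max(grid, key=lambda h: accuracy(sample_idx, h))

# Regret: full-data accuracy given up by tuning on the subsample instead.
regret = accuracy(all_idx, best_full) - accuracy(all_idx, best_sample)
```

When the score surface is flat near its peak, as here, the regret is small and subsample tuning is a reasonable economy; a sharply peaked surface would show up as large regret.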
24 votes · 2 answers
Natural interpretation for LDA hyperparameters
Can somebody explain the natural interpretation of the LDA hyperparameters? $\alpha$ and $\beta$ are the parameters of the Dirichlet distributions for the (per-document) topic and (per-topic) word distributions, respectively. However, can someone explain what it…

abhinavkulkarni · 778
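The usual interpretation — small $\alpha$ means each document concentrates its mass on a few topics, large $\alpha$ spreads it over many (and likewise $\beta$ for words within a topic) — can be checked directly by sampling symmetric Dirichlets, here via the standard normalized-Gamma construction:

```python
import random

rng = random.Random(3)

def dirichlet(alpha, k):
    """Symmetric Dirichlet sample via the standard normalized-Gamma construction."""
    g = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(g)
    return [v / total for v in g]

def mean_largest_topic(alpha, k=10, n=500):
    # Average mass of a document's single biggest topic:
    # near 1 means sparse mixtures, near 1/k means flat ones.
    return sum(max(dirichlet(alpha, k)) for _ in range(n)) / n

sparse = mean_largest_topic(0.1)   # small alpha: a few topics dominate each document
flat = mean_largest_topic(10.0)    # large alpha: mass spread over many topics
```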
23 votes · 4 answers
How bad is hyperparameter tuning outside cross-validation?
I know that performing hyperparameter tuning outside of cross-validation can lead to biased-high estimates of external validity, because the dataset that you use to measure performance is the same one you used to tune the features.
What I'm…

Ben Kuhn · 5,373
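How bad it is can be simulated: on pure noise, where no hyperparameter setting can truly beat 50% accuracy, taking the max over many settings on the same data you then report from inflates the score, while a held-out evaluation of the chosen setting does not. The 50 random prediction vectors below are stand-ins for 50 candidate settings, an assumption made purely for illustration:

```python
import random

rng = random.Random(5)
n, n_settings = 300, 50
y_tune = [rng.randint(0, 1) for _ in range(n)]
y_test = [rng.randint(0, 1) for _ in range(n)]

# Each "hyperparameter setting" yields pure-noise predictions, so no setting
# can genuinely do better than 50% accuracy.
pred_tune = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n_settings)]
pred_test = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n_settings)]

def acc(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

# Selecting AND reporting on the same data: a max over 50 noisy scores.
best = max(range(n_settings), key=lambda i: acc(pred_tune[i], y_tune))
reported = acc(pred_tune[best], y_tune)

# Honest estimate: score the chosen setting on data the selection never saw.
honest = acc(pred_test[best], y_test)
```

The gap between `reported` and `honest` grows with the number of settings compared, which is why the bias worsens as hyperparameter searches get larger.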
22 votes · 3 answers
How to get hyperparameters in nested cross-validation?
I have read the following posts on nested cross-validation and am still not 100% sure what I am to do for model selection with nested cross-validation:
Nested cross validation for model selection
Model selection and cross-validation: The right…

Heavy Breathing · 431
19 votes · 2 answers
Is decision threshold a hyperparameter in logistic regression?
Predicted classes from (binary) logistic regression are determined by applying a threshold to the class membership probabilities generated by the model. As I understand it, 0.5 is the typical default.
But varying the threshold will change the…

Nick · 393
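One way to see why many answers treat the threshold as a decision rule rather than a model hyperparameter: sweeping it moves along a precision/recall trade-off without refitting anything. A sketch with synthetic, calibrated probabilities standing in for a fitted model's output:

```python
import random

rng = random.Random(2)
# Synthetic, calibrated probabilities standing in for a fitted model's output.
probs = [rng.random() for _ in range(1000)]
labels = [int(rng.random() < p) for p in probs]

def confusion(threshold):
    tp = sum(1 for p, l in zip(probs, labels) if l and p >= threshold)
    fp = sum(1 for p, l in zip(probs, labels) if not l and p >= threshold)
    fn = sum(1 for p, l in zip(probs, labels) if l and p < threshold)
    return tp, fp, fn

# The fitted model (the probabilities) never changes; only the decision rule does.
metrics = {}
for t in (0.2, 0.5, 0.8):
    tp, fp, fn = confusion(t)
    metrics[t] = (tp / (tp + fp), tp / (tp + fn))  # (precision, recall)
```

Raising the threshold trades recall for precision; which point on that curve to use depends on the costs of the two error types, not on the fit of the model.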
19 votes · 5 answers
What's in a name: hyperparameters
So in a normal distribution, we have two parameters: mean $\mu$ and variance $\sigma^2$. In the book Pattern Recognition and Machine Learning, there suddenly appears a hyperparameter $\lambda$ in the regularization terms of the error function.
What…

cgo · 7,445
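For reference, the $\lambda$ in question enters PRML's regularized sum-of-squares error for a model $y(x, \mathbf{w})$ with targets $t_n$:

```latex
\widetilde{E}(\mathbf{w}) \;=\; \frac{1}{2}\sum_{n=1}^{N}\bigl\{\,y(x_n,\mathbf{w}) - t_n\,\bigr\}^{2} \;+\; \frac{\lambda}{2}\,\lVert\mathbf{w}\rVert^{2}
```

Unlike $\mu$ and $\sigma^2$, $\lambda$ sits outside the data-generating model: it controls the fitting procedure, which matches the tag definition at the top of this page. In the Bayesian reading it is a parameter of the prior — with a Gaussian prior of precision $\alpha$ on $\mathbf{w}$ and noise precision $\beta$, the MAP solution minimizes this error with $\lambda = \alpha/\beta$.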
19 votes · 2 answers
Advantages of Particle Swarm Optimization over Bayesian Optimization for hyperparameter tuning?
There's substantial contemporary research on Bayesian Optimization (1) for tuning ML hyperparameters. The driving motivation here is that a minimal number of data points are required to make informed choices about what points are worthwhile to try…

Sycorax · 76,417
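For concreteness, a minimal PSO loop in its standard inertia/cognitive/social form, with toy coefficients and a smooth stand-in objective. Note the budget profile: every iteration spends one objective evaluation per particle, which is exactly what makes one-point-at-a-time Bayesian optimization attractive when each evaluation is an expensive model fit.

```python
import random

rng = random.Random(0)

def f(x, y):
    # Toy objective standing in for an expensive cross-validation score (lower is better).
    return (x - 0.3) ** 2 + (y + 0.5) ** 2

n_particles, n_iters = 12, 60
w, c1, c2 = 0.7, 1.5, 1.5   # inertia, cognitive, and social weights (illustrative values)
pos = [[rng.uniform(-2, 2), rng.uniform(-2, 2)] for _ in range(n_particles)]
vel = [[0.0, 0.0] for _ in range(n_particles)]
pbest = [p[:] for p in pos]               # each particle's best-seen point
gbest = min(pos, key=lambda p: f(*p))[:]  # the swarm's best-seen point

for _ in range(n_iters):
    for i in range(n_particles):
        for d in range(2):
            vel[i][d] = (w * vel[i][d]
                         + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                         + c2 * rng.random() * (gbest[d] - pos[i][d]))
            pos[i][d] += vel[i][d]
        val = f(*pos[i])
        if val < f(*pbest[i]):
            pbest[i] = pos[i][:]
        if val < f(*gbest):
            gbest = pos[i][:]
```

PSO's appeal is the opposite trade-off: trivially parallel evaluations and no surrogate model to fit, at the cost of many more objective calls.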
18 votes · 2 answers
How to use xgb.cv with hyperparameter optimization?
I want to optimize the hyperparameters of XGBoost using cross-validation. However, it is not clear how to obtain a model from xgb.cv.
For instance, I call objective(params) from fmin. Then the model is fitted on dtrain and validated on dvalid. What if I…

Klausos · 499
17 votes · 3 answers
Hyperparameter tuning: random search vs. Bayesian optimization
So, we know that random search works better than grid search, but a more recent approach is Bayesian optimization (using Gaussian processes). I've looked for a comparison between the two and found nothing. I know that at Stanford's cs231n they…

Yoni Keren · 526