Most Popular

1500 questions
32
votes
3 answers

How to rigorously define the likelihood?

The likelihood could be defined in several ways, for instance: the function $L$ from $\Theta\times{\cal X}$ which maps $(\theta,x)$ to $L(\theta \mid x)$, i.e. $L:\Theta\times{\cal X} \rightarrow \mathbb{R}$; the random function $L(\cdot \mid…
Stéphane Laurent
  • 17,425
  • 5
  • 59
  • 101
32
votes
2 answers

What does kernel size mean?

When people talk about neural networks, what do they mean when they say "kernel size"? Kernels are similarity functions, but what does that say about kernel size?
quil
  • 433
  • 1
  • 4
  • 6
32
votes
5 answers

What problem does oversampling, undersampling, and SMOTE solve?

In a recent, well-received question, Tim asks: when is unbalanced data really a problem in Machine Learning? The premise of the question is that there is a lot of machine learning literature discussing class balance and the problem of imbalanced…
32
votes
1 answer

Can degrees of freedom be a non-integer number?

When I use GAM, it reports a residual DF of $26.6$ (last line in the code). What does that mean? Going beyond the GAM example, in general, can the number of degrees of freedom be a non-integer number? > library(gam) >…
Haitao Du
  • 32,885
  • 17
  • 118
  • 213
32
votes
5 answers

Modelling longitudinal data where the effect of time varies in functional form between individuals

Context: Imagine you had a longitudinal study which measured a dependent variable (DV) once a week for 20 weeks on 200 participants. Although I'm interested in general, typical DVs that I'm thinking of include job performance following hire or…
Jeromy Anglim
  • 42,044
  • 23
  • 146
  • 250
32
votes
6 answers

Sample size for logistic regression?

I want to make a logistic model from my survey data. It is a small survey of four residential colonies in which only 154 respondents were interviewed. My dependent variable is "satisfactory transition to work". I found that, of the 154 respondents,…
32
votes
3 answers

What stop-criteria for agglomerative hierarchical clustering are used in practice?

I have found extensive literature proposing all sorts of criteria (e.g. Glenn et al. 1985 (pdf) and Jung et al. 2002 (pdf)). However, most of these are not that easy to implement (at least from my perspective). I am using scipy.cluster.hierarchy to…
Björn Pollex
  • 1,223
  • 2
  • 15
  • 18
32
votes
8 answers

Should I teach Bayesian or frequentist statistics first?

I am helping my boys, currently in high school, understand statistics, and I am considering beginning with some simple examples without neglecting some glimpses of theory. My goal would be to give them the most intuitive yet instrumentally…
32
votes
1 answer

Benefits of stratified vs random sampling for generating training data in classification

I would like to know if there are any advantages of using stratified sampling instead of random sampling when splitting the original dataset into training and testing sets for classification. Also, does stratified sampling introduce more bias…
gc5
  • 877
  • 2
  • 12
  • 23
32
votes
3 answers

Is hour of day a categorical variable?

Is "hour of the day" where the value can be 0, 1, 2, ..., 23 a categorical variable? I would be tempted to say no, since 5, for example, is 'closer' to 4 or 6 than it is to 3 or 7. On the other hand, there is the discontinuity between 23 and 0. So…
Paul Reiners
  • 747
  • 2
  • 8
  • 11
32
votes
4 answers

LASSO with interaction terms - is it okay if main effects are shrunk to zero?

LASSO regression shrinks coefficients towards zero, thus effectively providing model selection. I believe that in my data there are meaningful interactions between nominal and continuous covariates. Not necessarily, however, are the 'main effects'…
tomka
  • 5,874
  • 3
  • 30
  • 71
32
votes
1 answer

Derivation of change of variables of a probability density function?

The book Pattern Recognition and Machine Learning (formula 1.27) gives $$p_y(y)=p_x(x) \left| \frac{dx}{dy} \right| = p_x(g(y)) \, |g'(y)|$$ where $x=g(y)$, $p_x(x)$ is the pdf that corresponds to $p_y(y)$ with respect to the change of the…
dontloo
  • 13,692
  • 7
  • 51
  • 80
32
votes
2 answers

PCA in numpy and sklearn produces different results

Am I misunderstanding something? This is my code using sklearn: import numpy as np import matplotlib.pyplot as plt from mpl_toolkits.mplot3d import Axes3D from sklearn import decomposition from sklearn import datasets from sklearn.preprocessing…
aceminer
  • 813
  • 1
  • 9
  • 20
32
votes
3 answers

How to build the final model and tune probability threshold after nested cross-validation?

Firstly, apologies for posting a question that has already been discussed at length here, here, here, here, here, and for reheating an old topic. I know @DikranMarsupial has written about this topic at length in posts and journal papers, but I'm…
32
votes
2 answers

What is the reason that the Adam Optimizer is considered robust to the value of its hyper parameters?

I was reading about the Adam optimizer for Deep Learning and came across the following sentence in the new book Deep Learning by Bengio, Goodfellow and Courville: Adam is generally regarded as being fairly robust to the choice of hyperparameters,…