Most Popular (1500 questions)
38 votes · 8 answers
Is it OK to remove outliers from data?
I looked for a way to remove outliers from a dataset and I found this question.
In some of the comments and answers to this question, however, people mentioned that it is bad practice to remove outliers from the data.
In my dataset I have several…

Sininho (501 rep · badges 1/4/7)
38 votes · 3 answers
What is pre-training a neural network?
Well, the question says it all.
What is meant by "pre-training a neural network"? Can someone explain in plain, simple English?
I can't seem to find any resources on it; it would be great if someone could point me to some.

Machina333 (863 rep · badges 2/9/10)
38 votes · 7 answers
Is there an accepted definition for the median of a sample on the plane, or higher ordered spaces?
If so, what?
If not, why not?
For a sample on the line, the median minimizes the total absolute deviation. It would seem natural to extend the definition to $\mathbb{R}^2$, etc., but I've never seen it. But then, I've been out in left field for a long time.

phv3773 (481 rep · badges 4/4)
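One natural extension the question above is asking about is the geometric (spatial) median: the point minimizing the total Euclidean distance to the sample, which reduces to the ordinary median on the line. A minimal sketch of Weiszfeld's algorithm, the standard iteration for computing it (function and variable names are mine):

```python
import numpy as np

def geometric_median(points, tol=1e-8, max_iter=1000):
    """Weiszfeld's algorithm: an iteratively re-weighted mean that
    converges to the point minimizing the sum of Euclidean distances."""
    pts = np.asarray(points, dtype=float)
    guess = pts.mean(axis=0)  # start from the centroid
    for _ in range(max_iter):
        dists = np.linalg.norm(pts - guess, axis=1)
        if np.any(dists < tol):  # guess coincides with a data point
            return guess
        weights = 1.0 / dists
        new_guess = (weights[:, None] * pts).sum(axis=0) / weights.sum()
        if np.linalg.norm(new_guess - guess) < tol:
            return new_guess
        guess = new_guess
    return guess

# For four points in convex position, the geometric median is the
# intersection of the diagonals; here that is (40/19, 24/19).
gm = geometric_median([[0, 0], [4, 0], [5, 3], [1, 2]])
```

Unlike the coordinate-wise median, this point is rotation-invariant, which is why it is the usual answer to "median in the plane".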
38 votes · 3 answers
Do null and alternative hypotheses have to be exhaustive or not?
I have often seen claims that they must be exhaustive (and the examples in such books were always set up so that they indeed were); on the other hand, I have also often seen books stating that they should be exclusive (for example…

greenoldman (593 rep · badges 5/10)
38 votes · 1 answer
Why do we need to normalize the images before we put them into CNN?
I am not clear on why we normalise images for a CNN as (image - mean_image). Thanks!

Zhi Lu (717 rep · badges 3/8/11)
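The (image - mean_image) step in the question above is plain mean subtraction: zero-centring the inputs so early-layer activations and gradients sit on a comparable scale. A toy sketch with synthetic data (array shapes and names are mine):

```python
import numpy as np

# Toy "dataset": 100 RGB images of size 8x8, pixel values in [0, 255].
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(100, 8, 8, 3)).astype(np.float32)

# mean_image is the pixel-wise average over the training set.
mean_image = images.mean(axis=0)

# Zero-centre every image: inputs become roughly symmetric around 0
# instead of all-positive, which tends to make optimization better behaved.
normalized = images - mean_image
```

The same `mean_image` (computed on the training set only) would also be subtracted from validation and test images.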
38 votes · 7 answers
Why is the null hypothesis often sought to be rejected?
I hope I am making sense with the title. Often, the null hypothesis is formed with the intention of rejecting it. Is there a reason for this, or is it just a convention?

Prometheus (786 rep · badges 8/19)
38 votes · 7 answers
Should parsimony really still be the gold standard?
Just a thought:
Parsimonious models have always been the default go-to in model selection, but to what degree is this approach outdated? I'm curious about how much our tendency toward parsimony is a relic of a time of abaci and slide rules (or, more…

theforestecologist (1,777 rep · badges 3/21/40)
38 votes · 1 answer
Doing principal component analysis or factor analysis on binary data
I have a dataset with a large number of Yes/No responses. Can I use principal component analysis (PCA) or another data-reduction method (such as factor analysis) on this type of data? Please advise on how to do this in SPSS.

Cathy (381 rep · badges 1/4/3)
38 votes · 5 answers
Do working statisticians care about the difference between frequentist and Bayesian inference?
As an outsider, it appears that there are two competing views on how one should perform statistical inference.
Are the two different methods both considered valid by working statisticians?
Is choosing one considered more of a philosophical…

Jonathan Fischoff (231 rep · badges 3/7)
38 votes · 3 answers
Do we need gradient descent to find the coefficients of a linear regression model?
I was trying to learn machine learning using the Coursera material. In this lecture, Andrew Ng uses the gradient descent algorithm to find the coefficients of the linear regression model that minimize the error function (cost function).
For linear…

Victor (5,925 rep · badges 13/43/67)
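The short answer to the question above is no: ordinary least squares has a closed-form solution (the normal equations), and gradient descent is one of several alternatives that scale better to huge or streaming data. A sketch checking that both routes give the same coefficients on simulated data (the data-generating setup is mine):

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])  # intercept + 2 features
true_beta = np.array([3.0, -2.0, 0.5])
y = X @ true_beta + rng.normal(scale=0.1, size=200)

# Closed form: beta = (X'X)^{-1} X'y; lstsq solves it in a numerically stable way.
beta_closed, *_ = np.linalg.lstsq(X, y, rcond=None)

# Plain batch gradient descent on the mean-squared-error cost.
beta_gd = np.zeros(3)
lr = 0.1
for _ in range(2000):
    grad = X.T @ (X @ beta_gd - y) / len(y)
    beta_gd -= lr * grad

print(np.allclose(beta_closed, beta_gd, atol=1e-4))  # True: the two agree
```

Gradient descent earns its keep when X'X is too large to factor, or when the loss (e.g. in logistic regression or neural networks) has no closed-form minimizer.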
38 votes · 3 answers
Regression coefficients that flip sign after including other predictors
Imagine:
- You run a linear regression with four numeric predictors (IV1, ..., IV4).
- When only IV1 is included as a predictor, the standardised beta is +.20.
- When you also include IV2 to IV4, the sign of the standardised regression coefficient of IV1…

Jeromy Anglim (42,044 rep · badges 23/146/250)
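The sign flip in the question above (a suppression/confounding effect) is easy to reproduce by simulation: let a second predictor drive both IV1 and the outcome, while IV1's direct effect is slightly negative. A sketch with a data-generating process of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# IV2 drives both IV1 and y; IV1 itself has a small *negative* direct effect.
iv2 = rng.normal(size=n)
iv1 = 0.8 * iv2 + rng.normal(scale=0.6, size=n)
y = -0.2 * iv1 + 1.0 * iv2 + rng.normal(scale=1.0, size=n)

def slopes(X, y):
    """OLS slopes (intercept dropped)."""
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

b_simple = slopes(iv1.reshape(-1, 1), y)[0]                # IV1 alone: positive
b_full = slopes(np.column_stack([iv1, iv2]), y)[0]         # with IV2: negative
print(b_simple > 0, b_full < 0)
```

Marginally, IV1 inherits IV2's positive effect (cov(IV1, y) = -0.2 + 0.8 = 0.6 > 0); once IV2 is held fixed, only IV1's direct effect of -0.2 remains, so the coefficient flips sign.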
38 votes · 4 answers
Information gain, mutual information and related measures
Andrew More defines information gain as:
$IG(Y|X) = H(Y) - H(Y|X)$
where $H(Y|X)$ is the conditional entropy. However, Wikipedia calls the above quantity mutual information.
Wikipedia on the other hand defines information gain as the…

Amelio Vazquez-Reina (17,546 rep · badges 26/74/110)
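The two quantities in the excerpt above are in fact the same: $H(Y) - H(Y|X)$ equals the symmetric mutual information $\sum_{x,y} p(x,y)\log\frac{p(x,y)}{p(x)p(y)}$. A quick numerical check on a toy joint distribution (the probability table is mine):

```python
import numpy as np

# Toy joint distribution p(x, y) over X in {0,1} (rows), Y in {0,1} (cols).
p_xy = np.array([[0.3, 0.1],
                 [0.2, 0.4]])

def H(p):
    """Shannon entropy in bits of a probability vector."""
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)

# Information gain: IG(Y|X) = H(Y) - sum_x p(x) H(Y | X = x)
H_y_given_x = sum(p_x[i] * H(p_xy[i] / p_x[i]) for i in range(2))
ig = H(p_y) - H_y_given_x

# Symmetric form: I(X;Y) = sum_{x,y} p(x,y) log2( p(x,y) / (p(x) p(y)) )
mi = (p_xy * np.log2(p_xy / np.outer(p_x, p_y))).sum()

print(abs(ig - mi) < 1e-12)  # True: the two definitions coincide
```

So "information gain" (the decision-tree term) and "mutual information" (the information-theory term) name the same quantity; the naming just differs by community.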
38 votes · 4 answers
When is the bootstrap estimate of bias valid?
It is often claimed that bootstrapping can provide an estimate of the bias in an estimator.
If $\hat t$ is the estimate for some statistic, and $\tilde t_i$ are the bootstrap replicas (with $i\in\{1,\cdots,N\}$), then the bootstrap estimate of bias…

Bootstrapped (381 rep · badges 1/3/5)
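The bias estimate the excerpt above refers to is $\widehat{\text{bias}} = \frac{1}{N}\sum_i \tilde t_i - \hat t$. A sketch using the plug-in (1/n) variance estimator, whose downward bias the bootstrap should detect (sample size, seed, and N are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=50)

t_hat = np.var(x)  # plug-in (1/n) variance estimator, biased low by a factor (n-1)/n

# Bootstrap replicas: recompute the statistic on resamples drawn with replacement.
N = 5000
replicas = np.array([np.var(rng.choice(x, size=len(x), replace=True))
                     for _ in range(N)])

bias_boot = replicas.mean() - t_hat   # expected to be negative here
t_corrected = t_hat - bias_boot       # bias-corrected estimate
```

For this statistic the bootstrap expectation is exactly ((n-1)/n) t_hat, so the estimated bias is about -t_hat/n (plus Monte Carlo noise), matching the known analytic bias.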
38 votes · 7 answers
How to interpret the coefficient of variation?
I am trying to understand the Coefficient of Variation. When I try to apply it to the following two samples of data I am unable to understand how to interpret the results.
Let's say sample 1 is ${0, 5, 7, 12, 11, 17}$
and sample 2 is ${10, 15, 17…

Durin (964 rep · badges 2/7/17)
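The coefficient of variation is simply CV = s / x̄: the standard deviation expressed as a fraction of the mean, so it is unit-free. Computing it for sample 1 from the excerpt above (sample 2 is truncated, so I only illustrate the key property, shift-sensitivity, with a hypothetical shifted copy of sample 1):

```python
import numpy as np

sample1 = np.array([0, 5, 7, 12, 11, 17], dtype=float)

def cv(x):
    # Sample standard deviation (ddof=1) divided by the mean.
    return x.std(ddof=1) / x.mean()

print(round(cv(sample1), 3))  # 0.687

# CV is not shift-invariant: adding a constant leaves the spread (sd)
# unchanged but increases the mean, so the CV shrinks.
shifted = sample1 + 10  # hypothetical; the question's sample 2 begins 10, 15, 17…
print(cv(shifted) < cv(sample1))  # True: same sd, larger mean
```

This is why the CV only makes sense for ratio-scale data with a meaningful zero: for data that differ only by a location shift, the two samples are "equally spread out" yet their CVs differ.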
38 votes · 4 answers
What are the differences between sparse coding and autoencoder?
Sparse coding is defined as learning an over-complete set of basis vectors to represent input vectors (why would we want this?). What are the differences between sparse coding and an autoencoder? When would we use sparse coding, and when an autoencoder?

RockTheStar (11,277 rep · badges 31/63/89)