Most Popular
1500 questions
80 votes, 3 answers
What is the intuition behind SVD?
I have read about singular value decomposition (SVD). Almost all textbooks mention that it factorizes a matrix into three matrices with specific properties.
But what is the intuition behind splitting the matrix in this form? PCA and…

SHASHANK GUPTA
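A minimal NumPy sketch of the factorization the question refers to, using a small made-up matrix `A`: `numpy.linalg.svd` returns `U`, the singular values `s` (in descending order), and `Vt` such that `A = U @ diag(s) @ Vt`. Geometrically, `Vt` rotates (or reflects) the input, `diag(s)` stretches along orthogonal axes, and `U` maps the result into the output space. Keeping only the largest singular values gives the best low-rank approximation in the least-squares sense, which is the link to PCA.

```python
import numpy as np

# Hypothetical 4x3 example matrix (any real matrix works).
A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0],
              [2.0, 0.0, 2.0],
              [0.0, 1.0, 4.0]])

# Thin SVD: U is 4x3 with orthonormal columns, s holds 3 singular values
# in descending order, Vt is 3x3 with orthonormal rows.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# A is recovered (up to rounding) as rotate -> scale -> rotate.
A_rebuilt = U @ np.diag(s) @ Vt

# Rank-1 approximation: keep only the largest singular value and its directions.
A_rank1 = s[0] * np.outer(U[:, 0], Vt[0, :])

print("singular values:", s)
print("max reconstruction error:", np.abs(A - A_rebuilt).max())
print("rank-1 Frobenius error:", np.linalg.norm(A - A_rank1))
```

The rank-1 error equals the square root of the sum of the discarded squared singular values, so the size of each singular value says how much of the matrix its direction accounts for.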
80 votes, 6 answers
Is there any good reason to use PCA instead of EFA? Also, can PCA be a substitute for factor analysis?
In some disciplines, PCA (principal component analysis) is systematically used without any justification, and PCA and EFA (exploratory factor analysis) are treated as synonyms.
I therefore recently used PCA to analyse the results of a scale…

Carine
80 votes, 9 answers
Skills hard to find in machine learners?
It seems that data mining and machine learning have become so popular that almost every CS student now knows about classifiers, clustering, statistical NLP, etc. So finding data miners does not seem hard nowadays.
My question is:
What…

Jack Twain
79 votes, 5 answers
What are good RMSE values?
Suppose I have a dataset and fit a regression model to it. I also have a separate test dataset, on which I evaluate the regression and compute the RMSE. How should I conclude that my learning algorithm has done well, i.e., what properties…

Shishir Pandey
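One common way to make an RMSE interpretable, sketched in plain Python with made-up numbers: RMSE is in the units of the response, so compare it with the RMSE of a trivial baseline that always predicts the training mean. A model whose test RMSE is not clearly below the baseline's has learned little. The names `rmse`, `y_train`, `y_test`, and `y_pred` are illustrative.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as y."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical data: training responses, test responses, and model predictions.
y_train = [10.0, 12.0, 9.0, 11.0, 13.0, 10.0]
y_test = [11.0, 14.0, 9.0, 12.0]
y_pred = [10.5, 13.2, 9.4, 12.3]

# Baseline: predict the training mean for every test point.
train_mean = sum(y_train) / len(y_train)
baseline_rmse = rmse(y_test, [train_mean] * len(y_test))
model_rmse = rmse(y_test, y_pred)

# A ratio well below 1 means the model beats the naive baseline.
print(f"model RMSE    = {model_rmse:.3f}")
print(f"baseline RMSE = {baseline_rmse:.3f}")
print(f"ratio         = {model_rmse / baseline_rmse:.3f}")
```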
79 votes, 8 answers
Is there a name for the phenomenon of false positives counterintuitively outstripping true positives?
It seems very counterintuitive to many people that a diagnostic test with very high accuracy (say 99%) can generate far more false positives than true positives in some situations, namely where the population of true positives is very…

Roger Heathcote
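The arithmetic behind this effect, often called the base rate fallacy or the false positive paradox, can be checked with exact fractions. The numbers below (99% sensitivity, 99% specificity, 0.1% prevalence, a population of 100,000) are assumptions chosen for illustration.

```python
from fractions import Fraction

# Assumed test characteristics and prevalence (illustrative only).
sensitivity = Fraction(99, 100)   # P(test positive | condition present)
specificity = Fraction(99, 100)   # P(test negative | condition absent)
prevalence = Fraction(1, 1000)    # P(condition present)
population = 100_000

with_condition = population * prevalence          # 100 people
without_condition = population - with_condition   # 99,900 people

true_positives = sensitivity * with_condition            # 99
false_positives = (1 - specificity) * without_condition  # 999

# Positive predictive value: P(condition present | test positive).
ppv = true_positives / (true_positives + false_positives)

print("true positives: ", true_positives)
print("false positives:", false_positives)
print(f"PPV = {float(ppv):.3f}")
```

With the same test, raising the prevalence to 10% pushes the PPV above 0.9, which is why the effect shows up only when the condition is rare.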
79 votes, 2 answers
Bayes regression: how is it done in comparison to standard regression?
I have some questions about Bayesian regression:
Given a standard regression as $y = \beta_0 + \beta_1 x + \varepsilon$.
If I want to change this into a Bayesian regression, do I need prior distributions both for $\beta_0$ and $\beta_1$ (or…

TinglTanglBob
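A minimal conjugate sketch in NumPy, under two assumptions that are not in the question: the noise standard deviation `sigma` is known, and $\beta_0, \beta_1$ get independent normal priors with mean 0 and standard deviation `tau`. Under those assumptions the posterior over $(\beta_0, \beta_1)$ is normal with a closed-form mean and covariance. If $\sigma$ is unknown it needs its own prior as well (for example an inverse-gamma), which this sketch does not cover.

```python
import numpy as np

def bayes_linear_regression(x, y, sigma, tau):
    """Posterior mean and covariance of (beta0, beta1) for y = beta0 + beta1*x + eps.

    Assumes eps ~ N(0, sigma^2) with sigma known, and independent
    N(0, tau^2) priors on beta0 and beta1.
    """
    X = np.column_stack([np.ones_like(x), x])      # design matrix with intercept column
    prior_precision = np.eye(2) / tau**2
    post_precision = prior_precision + X.T @ X / sigma**2
    post_cov = np.linalg.inv(post_precision)
    post_mean = post_cov @ (X.T @ y) / sigma**2    # prior mean is zero
    return post_mean, post_cov

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, size=50)   # simulated data, true betas (2, 0.5)

mean_weak, cov_weak = bayes_linear_regression(x, y, sigma=1.0, tau=100.0)
mean_strong, _ = bayes_linear_regression(x, y, sigma=1.0, tau=0.1)
ols, *_ = np.linalg.lstsq(np.column_stack([np.ones_like(x), x]), y, rcond=None)

print("OLS estimate:          ", ols)
print("posterior mean, weak:  ", mean_weak)     # vague prior: close to OLS
print("posterior mean, strong:", mean_strong)   # tight prior: shrunk toward 0
print("posterior sd (weak):   ", np.sqrt(np.diag(cov_weak)))
```

With a vague prior the posterior mean essentially reproduces the least-squares fit; a tight prior pulls the coefficients toward the prior mean, exactly like ridge regression with penalty $\sigma^2/\tau^2$.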
79 votes, 9 answers
What algorithm should I use to detect anomalies on time-series?
Background
I work in a Network Operations Center; we monitor computer systems and their performance. One of the key metrics to monitor is the number of visitors/customers currently connected to our servers. To make it visible, we (the Ops team) collect…

Ilya Khadykin
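As one simple baseline (not necessarily what the answers recommend), a point can be flagged when it lies more than `k` standard deviations from the mean of a trailing window. The sketch below uses only the Python standard library; `window`, `k`, and the synthetic connection counts are illustrative assumptions, and real traffic with daily or weekly seasonality would need the seasonal pattern removed first.

```python
from statistics import mean, stdev

def rolling_zscore_anomalies(series, window=12, k=3.0):
    """Return indices whose value is more than k trailing-window SDs from the trailing mean."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]        # only past values, so it can run online
        mu, sd = mean(history), stdev(history)
        if sd > 0 and abs(series[i] - mu) > k * sd:
            anomalies.append(i)
    return anomalies

# Synthetic "connected users" counts with one injected spike at index 30.
counts = [100 + (i % 5) for i in range(60)]
counts[30] = 160

print(rolling_zscore_anomalies(counts))
```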
79 votes, 1 answer
How to interpret coefficients in a Poisson regression?
How can I interpret the main effects (coefficients for dummy-coded factor) in a Poisson regression?
Assume the following example:
treatment <- factor(rep(c(1, 2), c(43, 41)),
levels = c(1, 2),
…
user734124
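With a single dummy-coded factor and the log link, the model is $\log \mu = \beta_0 + \beta_1 \cdot [\text{treatment} = 2]$, so $e^{\beta_0}$ is the expected count under treatment 1 and $e^{\beta_1}$ is the ratio of expected counts (the rate ratio) of treatment 2 to treatment 1. The plain-Python Newton-Raphson fit below, on hypothetical counts rather than the question's data, illustrates this: the fitted coefficients reproduce the log group means.

```python
import math

def fit_poisson_dummy(y, x, iters=25):
    """Newton-Raphson MLE for log(mu_i) = b0 + b1 * x_i, with x_i a 0/1 dummy."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0    # start at the overall log mean
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score (gradient of the log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Information matrix [[h00, h01], [h01, h11]]; for the log link it is the negative Hessian.
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        # Newton step, solving the 2x2 system with its explicit inverse.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical counts: treatment 1 (x = 0) and treatment 2 (x = 1).
y = [2, 3, 1, 4, 5, 6, 4, 7]
x = [0, 0, 0, 0, 1, 1, 1, 1]

b0, b1 = fit_poisson_dummy(y, x)
print(f"exp(b0) = {math.exp(b0):.3f}  (mean count, treatment 1)")
print(f"exp(b1) = {math.exp(b1):.3f}  (rate ratio, treatment 2 vs 1)")
```

In R, `exp(coef(glm(y ~ treatment, family = poisson)))` returns the same two quantities for a count response `y`.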
79 votes, 7 answers
Rules of thumb for minimum sample size for multiple regression
Within the context of a research proposal in the social sciences, I was asked the following question:
I have always gone by 100 + m (where m is the number of predictors) when determining minimum sample size for multiple regression. Is…

Jeromy Anglim
79 votes, 5 answers
How exactly did statisticians agree to using (n-1) as the unbiased estimator for population variance without simulation?
The formula for computing the sample variance has $(n-1)$ in the denominator:
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$
I've always wondered why. After reading about it and watching a few good videos on "why", it seems that $(n-1)$ is a good unbiased…

PhD
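The unbiasedness of the $(n-1)$ version can be shown algebraically, with no simulation, assuming $x_1, \dots, x_n$ are i.i.d. with mean $\mu$ and variance $\sigma^2$. Expanding around $\mu$ and using $\sum_{i=1}^{n} (x_i - \mu) = n(\bar{x} - \mu)$:

$$
\begin{aligned}
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} (x_i - \mu)^2 - 2(\bar{x} - \mu)\sum_{i=1}^{n} (x_i - \mu) + n(\bar{x} - \mu)^2 \\
  &= \sum_{i=1}^{n} (x_i - \mu)^2 - n(\bar{x} - \mu)^2, \\
\mathbb{E}\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]
  &= n\sigma^2 - n \cdot \frac{\sigma^2}{n} = (n-1)\sigma^2 .
\end{aligned}
$$

The last line uses $\mathbb{E}[(x_i - \mu)^2] = \sigma^2$ and $\operatorname{Var}(\bar{x}) = \sigma^2/n$. Dividing by $n-1$ therefore gives $\mathbb{E}[s^2] = \sigma^2$, whereas dividing by $n$ gives $\frac{n-1}{n}\sigma^2$.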
78 votes, 2 answers
Basic question about Fisher Information matrix and relationship to Hessian and standard errors
OK, this is quite a basic question, but I am a little confused. In my thesis I write:
The standard errors can be found by calculating the inverse of the square root of the diagonal elements of the (observed) Fisher Information…

Jen Bohold
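For reference, the usual large-sample recipe takes the square roots of the diagonal of the inverse of the observed Fisher information, i.e. of the inverse Hessian of the negative log-likelihood at the MLE. The NumPy sketch below assumes a normal model with parameters $(\mu, \sigma)$ and simulated data; it uses a finite-difference Hessian (the helper `numerical_hessian` is written here for illustration) and compares the result with the analytic values $\hat\sigma/\sqrt{n}$ and $\hat\sigma/\sqrt{2n}$.

```python
import numpy as np

def neg_log_lik(theta, x):
    """Negative log-likelihood of N(mu, sigma^2), dropping the constant term."""
    mu, sigma = theta
    return len(x) * np.log(sigma) + np.sum((x - mu) ** 2) / (2 * sigma**2)

def numerical_hessian(f, theta, h=1e-4):
    """Central finite-difference Hessian of a scalar function f at theta."""
    k = len(theta)
    H = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            ei = np.zeros(k)
            ei[i] = h
            ej = np.zeros(k)
            ej[j] = h
            H[i, j] = (f(theta + ei + ej) - f(theta + ei - ej)
                       - f(theta - ei + ej) + f(theta - ei - ej)) / (4 * h * h)
    return H

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=2.0, size=200)   # simulated data
n = len(x)

# Closed-form MLEs for the normal model (sigma_hat divides by n, not n - 1).
theta_hat = np.array([x.mean(), x.std()])

# Observed Fisher information = Hessian of the negative log-likelihood at the MLE.
info = numerical_hessian(lambda t: neg_log_lik(t, x), theta_hat)

# Standard errors: square roots of the diagonal of the INVERSE information matrix.
se = np.sqrt(np.diag(np.linalg.inv(info)))

print("numerical SEs:", se)
print("analytic SEs: ", theta_hat[1] / np.sqrt(n), theta_hat[1] / np.sqrt(2 * n))
```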
78 votes, 6 answers
What are good initial weights in a neural network?
I have just heard that it's a good idea to choose the initial weights of a neural network from the range $\left(-\frac{1}{\sqrt d}, \frac{1}{\sqrt d}\right)$, where $d$ is the number of inputs to a given neuron. It is assumed that the sets are normalized - mean…

elmes
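A NumPy sketch of why the bound scales with $1/\sqrt{d}$, assuming standardized (mean 0, variance 1) and independent inputs: a weight drawn from $U(-1/\sqrt{d}, 1/\sqrt{d})$ has variance $1/(3d)$, so a neuron's pre-activation $z = \sum_{i=1}^{d} w_i x_i$ has variance $d \cdot \frac{1}{3d} = \frac{1}{3}$ whatever the fan-in $d$. The layer width, sample count, and the helper name `uniform_fan_in_init` are arbitrary choices for the demonstration.

```python
import numpy as np

def uniform_fan_in_init(d, n_units, rng):
    """Draw a (d, n_units) weight matrix from U(-1/sqrt(d), 1/sqrt(d))."""
    bound = 1.0 / np.sqrt(d)
    return rng.uniform(-bound, bound, size=(d, n_units))

rng = np.random.default_rng(0)
variances = {}
for d in (10, 100, 1000):
    W = uniform_fan_in_init(d, n_units=200, rng=rng)
    X = rng.standard_normal(size=(2000, d))   # standardized inputs
    z = X @ W                                  # pre-activations, shape (2000, 200)
    # The variance stays near 1/3 for every d, so the scale of z does not depend on fan-in.
    variances[d] = z.var()
    print(f"d={d:5d}  Var(z) = {variances[d]:.3f}")
```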
78 votes, 3 answers
Diagnostics for logistic regression?
For linear regression, we can examine diagnostic plots (residual plots, normal Q-Q plots, etc.) to check whether the assumptions of linear regression are violated.
For logistic regression, I am having trouble finding resources that explain how to…

ialm
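Two residual types commonly used for logistic models, sketched in plain Python: Pearson residuals $(y - \hat p)/\sqrt{\hat p(1-\hat p)}$ and deviance residuals, plus a binned residual summary (the average of $y - \hat p$ within groups of similar fitted probability). Raw residuals of 0/1 data form uninformative bands, so binning makes miscalibration easier to see. The function names, the default number of bins, and the toy `ys`/`ps` values are illustrative; in practice `ps` would come from the fitted model being checked.

```python
import math

def pearson_residual(y, p):
    """(y - p) / sqrt(p (1 - p)) for a 0/1 outcome y and fitted probability p."""
    return (y - p) / math.sqrt(p * (1 - p))

def deviance_residual(y, p):
    """sign(y - p) times the square root of the observation's deviance contribution."""
    contribution = -2 * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return math.copysign(math.sqrt(contribution), y - p)

def binned_residuals(ys, ps, n_bins=4):
    """Sort by fitted probability, split into bins, return (mean p, mean y - p) per bin."""
    pairs = sorted(zip(ps, ys))
    size = math.ceil(len(pairs) / n_bins)
    out = []
    for start in range(0, len(pairs), size):
        chunk = pairs[start:start + size]
        mean_p = sum(p for p, _ in chunk) / len(chunk)
        mean_resid = sum(y - p for p, y in chunk) / len(chunk)
        out.append((mean_p, mean_resid))
    return out

# Hypothetical outcomes and fitted probabilities.
ys = [0, 0, 1, 0, 1, 1, 0, 1]
ps = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

for mean_p, r in binned_residuals(ys, ps):
    print(f"mean p = {mean_p:.2f}   mean residual = {r:+.2f}")
```

For a well-calibrated model the binned averages should scatter around zero with no trend in the fitted probability.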
78 votes, 12 answers
Famous statistical wins and horror stories for teaching purposes
I am designing a one-year program in data analysis with a local community college. The program aims to prepare students to handle basic tasks in data analysis, visualization, and summarization, and to build advanced Excel and R programming skills.
I would like…

Placidia
78 votes, 1 answer
How does a simple logistic regression model achieve a 92% classification accuracy on MNIST?
Even though all the images in the MNIST dataset are centered, similarly scaled, and upright with no rotations, the handwriting varies so much that it puzzles me how a linear model achieves such high classification accuracy.
As far…

Nitish Agarwal
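One way to see why a linear model can do well: multinomial logistic (softmax) regression learns one weight vector per class and predicts the class whose weighted sum of pixel intensities is largest, a kind of learned template matching. The NumPy sketch below trains such a model by batch gradient descent on synthetic "prototype plus noise" data standing in for MNIST (it does not load MNIST); the data generator, learning rate, and epoch count are assumptions.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_softmax_regression(X, y, n_classes, lr=0.1, epochs=200):
    """Batch gradient descent on the mean cross-entropy of a linear softmax model."""
    n, d = X.shape
    W = np.zeros((d, n_classes))   # one weight vector ("template") per class
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]       # one-hot targets
    for _ in range(epochs):
        P = softmax(X @ W + b)
        G = (P - Y) / n            # gradient of the mean loss w.r.t. the logits
        W -= lr * X.T @ G
        b -= lr * G.sum(axis=0)
    return W, b

def predict(X, W, b):
    """Pick the class with the largest linear score."""
    return np.argmax(X @ W + b, axis=1)

# Synthetic stand-in for digit images: 3 class prototypes in 20 dimensions plus noise.
rng = np.random.default_rng(0)
prototypes = rng.normal(size=(3, 20))
y = rng.integers(0, 3, size=600)
X = prototypes[y] + 0.5 * rng.normal(size=(600, 20))

W, b = train_softmax_regression(X, y, n_classes=3)
accuracy = np.mean(predict(X, W, b) == y)
print(f"training accuracy: {accuracy:.3f}")
```

On real MNIST the same model would have 784 inputs and 10 classes, and each column of `W` can be reshaped to 28x28 and viewed as the template for one digit.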