Most Popular
1500 questions
90 votes, 1 answer
When to use an offset in a Poisson regression?
Does anybody know why an offset is used in Poisson regression? What do you achieve by this?

MarkDollar
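
An offset is a term in the linear predictor whose coefficient is fixed at 1 instead of being estimated. A typical use is counts observed over unequal exposures (time at risk, area, population size). With $\log E[y_i] = \log t_i + \mathbf{x}_i^\top \beta$, the coefficients describe the log of the rate $E[y_i]/t_i$ rather than of the raw count. The following is a minimal sketch, assuming NumPy and simulated data. `fit_poisson_with_offset` is a hypothetical helper that fits the model by Newton-Raphson, not a library function.

```python
import numpy as np

def fit_poisson_with_offset(X, y, exposure, n_iter=50, tol=1e-10):
    """Fit log E[y] = log(exposure) + X @ beta by Newton-Raphson.

    log(exposure) is the offset: it enters the linear predictor with its
    coefficient fixed at 1, so beta describes the log *rate* per unit of
    exposure rather than the log of the raw count.
    """
    offset = np.log(exposure)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(offset + X @ beta)        # fitted expected counts
        grad = X.T @ (y - mu)                 # score (gradient of log-likelihood)
        info = X.T @ (X * mu[:, None])        # Fisher information = -Hessian
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Hypothetical data: counts recorded over unequal exposure times.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
exposure = rng.uniform(0.5, 10.0, size=n)    # e.g. years of follow-up
X = np.column_stack([np.ones(n), x])
true_beta = np.array([-1.0, 0.5])
y = rng.poisson(exposure * np.exp(X @ true_beta))

beta_hat = fit_poisson_with_offset(X, y, exposure)
print(beta_hat)  # should be close to true_beta
```

Without the offset, differences in exposure would leak into the estimated coefficients. In R's `glm`, the same term is supplied through the `offset` argument (or `offset(log(exposure))` in the formula).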
89 votes, 1 answer
What correlation makes a matrix singular and what are implications of singularity or near-singularity?
I am doing some calculations on different matrices (mainly in logistic regression) and I commonly get the error "Matrix is singular", after which I have to go back and remove the correlated variables. My question here is what would you consider a "highly"…

Error404
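
Singularity comes from exact linear dependence among the columns of the design matrix, not from any single pairwise correlation. $X^\top X$ (or $X^\top W X$ in logistic regression) is singular exactly when some column is a linear combination of the others. It is nearly singular, with a huge condition number, when that holds up to small noise. The minimal NumPy sketch below uses simulated columns, and `design_diagnostics` is a hypothetical helper name. The dependent column `2*x1 - x2` has a correlation of only about 0.89 with `x1`, yet it still makes the matrix singular.

```python
import numpy as np

def design_diagnostics(X):
    """Return (rank of X, condition number of X^T X).

    X^T X is singular iff rank(X) is smaller than the number of columns;
    a very large condition number signals near-singularity.
    """
    return np.linalg.matrix_rank(X), np.linalg.cond(X.T @ X)

rng = np.random.default_rng(1)
n = 200
ones = np.ones(n)
x1, x2, noise_col = rng.normal(size=(3, n))
combo = 2 * x1 - x2  # population correlation with x1 is 2/sqrt(5), about 0.89

designs = {
    "exact": np.column_stack([ones, x1, x2, combo]),
    "near": np.column_stack([ones, x1, x2, combo + 1e-4 * rng.normal(size=n)]),
    "ok": np.column_stack([ones, x1, x2, noise_col]),
}
for name, X in designs.items():
    rank, cond = design_diagnostics(X)
    print(f"{name:5s} rank={rank} cond(X^T X)={cond:.2e}")
```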
89 votes, 4 answers
How to produce a pretty plot of the results of k-means cluster analysis?
I'm using R to do K-means clustering. I'm using 14 variables to run K-means.
What is a pretty way to plot the results of K-means?
Are there any existing implementations?
Does having 14 variables complicate plotting the results?
I found something…

JEquihua
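
With 14 variables the clusters can't be plotted directly. One common approach is to project the observations onto their first two principal components and colour each point by its cluster label. The following is a minimal NumPy sketch. The three simulated groups stand in for real data, their labels stand in for k-means output, and `pca_2d` is a hypothetical helper. The actual drawing is left to whatever scatter-plot function you use.

```python
import numpy as np

def pca_2d(X):
    """Project the rows of X onto its first two principal components.

    Returns the 2-D coordinates and the share of total variance
    explained by each of the two components.
    """
    Xc = X - X.mean(axis=0)                       # centre each variable
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    coords = Xc @ Vt[:2].T                        # scores on PC1 and PC2
    explained = s[:2] ** 2 / np.sum(s ** 2)
    return coords, explained

# Hypothetical data: 3 groups of 100 points in 14 dimensions.
rng = np.random.default_rng(5)
centres = rng.normal(scale=4.0, size=(3, 14))
labels = np.repeat(np.arange(3), 100)             # stand-in for k-means labels
X = centres[labels] + rng.normal(size=(300, 14))

coords, explained = pca_2d(X)
for k in range(3):
    print(k, coords[labels == k].mean(axis=0))    # cluster centres in PC space
print("variance explained by PC1, PC2:", explained)
```

Reporting the explained-variance shares next to the plot shows how faithful the 2-D picture is. If the 14 variables are on different scales, standardise the columns before projecting.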
89 votes, 5 answers
How to plot ROC curves in multiclass classification?
In other words, instead of having a two-class problem, I am dealing with 4 classes and would still like to assess performance using AUC.

CLOCK
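
One common way to extend ROC analysis to 4 classes is one-vs-rest. For each class $k$, treat "class $k$" as positive and all other classes as negative, use the predicted probability of $k$ as the score, and compute an ordinary binary ROC curve and AUC. The per-class AUCs can then be averaged (macro-average). Other schemes exist, such as averaging over all pairs of classes. The sketch below assumes NumPy and a matrix `proba` of predicted class probabilities, and the function names are hypothetical. AUC is computed as the probability that a random positive scores higher than a random negative, with ties counted as one half.

```python
import numpy as np

def binary_auc(scores, is_pos):
    """AUC = P(score of random positive > score of random negative),
    with ties counted as 1/2 (the Mann-Whitney formulation)."""
    pos, neg = scores[is_pos], scores[~is_pos]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def one_vs_rest_auc(y_true, proba):
    """Per-class AUCs, treating each class in turn as the positive one."""
    return np.array([binary_auc(proba[:, k], y_true == k)
                     for k in range(proba.shape[1])])

# Hypothetical 4-class example with noisy predicted probabilities.
rng = np.random.default_rng(11)
y_true = rng.integers(0, 4, size=400)
logits = 2.0 * np.eye(4)[y_true] + rng.normal(size=(400, 4))
proba = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

per_class = one_vs_rest_auc(y_true, proba)
print(per_class, per_class.mean())  # per-class AUCs and their macro-average
```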
89 votes, 5 answers
On the importance of the i.i.d. assumption in statistical learning
In statistical learning, implicitly or explicitly, one always assumes that the training set $\mathcal{D} = \{\mathbf{X}, \mathbf{y}\}$ is composed of $N$ input/response tuples $(\mathbf{X}_i, y_i)$ that are independently drawn from the same joint…

Quantuple
89 votes, 5 answers
Relationship between Poisson and exponential distribution
The waiting times between events in a Poisson process follow an exponential distribution with parameter $\lambda$. But I don't understand it. The Poisson distribution models, for example, the number of arrivals per unit of time. How is this related to the exponential distribution? Let's say…

user862
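
The link can be checked by simulation. Generate arrivals whose gaps are independent exponential draws with rate $\lambda$ (mean gap $1/\lambda$), then count the arrivals in each unit-length interval. The counts should behave like Poisson($\lambda$) draws, with mean and variance close to $\lambda$. An interval is empty when the wait for the next arrival exceeds 1, which has probability $e^{-\lambda}$. That value is also the Poisson probability of zero arrivals. The following is a NumPy sketch, and `counts_per_unit_time` is a hypothetical helper.

```python
import numpy as np

def counts_per_unit_time(lam, horizon, rng):
    """Simulate a Poisson process by summing exponential waiting times,
    then count how many arrivals fall in each unit-length interval."""
    # Draw more gaps than needed; the expected number of arrivals is lam * horizon.
    n_gaps = int(lam * horizon * 1.5) + 100
    gaps = rng.exponential(scale=1.0 / lam, size=n_gaps)  # mean gap = 1/lam
    arrival_times = np.cumsum(gaps)
    assert arrival_times[-1] > horizon, "increase n_gaps"
    arrival_times = arrival_times[arrival_times < horizon]
    # Interval k is [k, k+1); bincount tallies the arrivals in each one.
    return np.bincount(arrival_times.astype(int), minlength=horizon)

rng = np.random.default_rng(42)
lam = 3.0
counts = counts_per_unit_time(lam, horizon=100_000, rng=rng)
print(counts.mean(), counts.var())           # both should be close to lam
print((counts == 0).mean(), np.exp(-lam))    # share of empty intervals vs e^(-lam)
```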
89 votes, 10 answers
How should outliers be dealt with in linear regression analysis?
Oftentimes a statistical analyst is handed a dataset and asked to fit a model using a technique such as linear regression. Very frequently the dataset is accompanied by a disclaimer similar to "Oh yeah, we messed up collecting some of these…

Sharpie
89 votes, 10 answers
What is a complete list of the usual assumptions for linear regression?
What are the usual assumptions for linear regression?
Do they include:
- a linear relationship between the independent and dependent variables
- independent errors
- normal distribution of errors
- homoscedasticity
Are there any others?

tony
89 votes, 2 answers
Resampling / simulation methods: Monte Carlo, bootstrapping, jackknifing, cross-validation, randomization tests, and permutation tests
I am trying to understand the differences between various resampling methods (Monte Carlo simulation, parametric bootstrapping, non-parametric bootstrapping, jackknifing, cross-validation, randomization tests, and permutation tests) and their…

Ram Sharma
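
Three of these procedures differ mainly in how they reuse the sample. The non-parametric bootstrap draws $n$ points with replacement, and the jackknife leaves out one point at a time. A permutation test shuffles group labels to mimic the null hypothesis that the labels don't matter. The following is a minimal NumPy sketch with simulated data, and the helper names are hypothetical. For the sample mean, the jackknife standard error reduces exactly to the textbook $s/\sqrt{n}$, which makes a convenient check.

```python
import numpy as np

def bootstrap_se(x, stat, rng, n_boot=5000):
    """Non-parametric bootstrap: resample n points WITH replacement."""
    n = len(x)
    reps = np.array([stat(x[rng.integers(0, n, size=n)]) for _ in range(n_boot)])
    return reps.std(ddof=1)

def jackknife_se(x, stat):
    """Jackknife: recompute the statistic leaving out one point at a time."""
    n = len(x)
    reps = np.array([stat(np.delete(x, i)) for i in range(n)])
    return np.sqrt((n - 1) / n * np.sum((reps - reps.mean()) ** 2))

def permutation_pvalue(a, b, rng, n_perm=5000):
    """Two-sided permutation test for a difference in means."""
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)  # reassign group labels at random
        diff = abs(shuffled[:len(a)].mean() - shuffled[len(a):].mean())
        hits += int(diff >= observed)
    return (hits + 1) / (n_perm + 1)  # count the observed split as well

rng = np.random.default_rng(7)
x = rng.normal(loc=10.0, scale=2.0, size=100)
print(bootstrap_se(x, np.mean, rng), jackknife_se(x, np.mean),
      x.std(ddof=1) / np.sqrt(len(x)))

a = rng.normal(0.0, 1.0, size=50)
b = rng.normal(1.5, 1.0, size=50)
print(permutation_pvalue(a, b, rng))
```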
88 votes, 8 answers
When is unbalanced data really a problem in Machine Learning?
We have already had multiple questions about unbalanced data when using logistic regression, SVM, decision trees, bagging and a number of other similar questions, which makes it a very popular topic! Unfortunately, each of the questions seems to be…

Tim
88 votes, 24 answers
Rules of thumb for "modern" statistics
I like G. van Belle's book on Statistical Rules of Thumb, and to a lesser extent Common Errors in Statistics (and How to Avoid Them) from Phillip I. Good and James W. Hardin. They address common pitfalls when interpreting results from experimental and…

chl
88 votes, 3 answers
What is the lasso in regression analysis?
I'm looking for a non-technical definition of the lasso and what it is used for.

Paul Vogt
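
In brief, the lasso fits a linear regression while penalising the sum of the absolute values of the coefficients. This shrinks the coefficients toward zero and sets some of them exactly to zero, so the lasso also performs variable selection. The following is a minimal NumPy sketch of one standard way to fit it, cyclic coordinate descent with soft-thresholding. It assumes centred data and no intercept. The helper names and the objective scaling $\frac{1}{2n}\lVert y - Xb\rVert^2 + \lambda\lVert b\rVert_1$ are choices made for this sketch. The soft-thresholding step is what produces exact zeros.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward 0 by t, returning exactly 0 when |z| <= t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=500, tol=1e-10):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic
    coordinate descent. Assumes X and y are centred (no intercept)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y - X @ b                                   # current residual
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = b[j]
            # Correlation of column j with the residual that excludes b[j].
            rho = X[:, j] @ r / n + col_sq[j] * old
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            if b[j] != old:
                r -= X[:, j] * (b[j] - old)
                max_change = max(max_change, abs(b[j] - old))
        if max_change < tol:
            break
    return b

# Hypothetical data: only the first two of five predictors matter.
rng = np.random.default_rng(2)
n, p = 100, 5
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 0.0]) + rng.normal(size=n)
y -= y.mean()

lam_max = np.max(np.abs(X.T @ y)) / n   # smallest penalty giving all-zero b
for lam in [0.0, 0.1, 0.5, 1.01 * lam_max]:
    print(lam, np.round(lasso_cd(X, y, lam), 3))
```

With `lam = 0` the fit is ordinary least squares. Increasing `lam` moves more coefficients to exactly zero until all of them vanish at `lam_max`.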
88 votes, 7 answers
Calculating the parameters of a Beta distribution using the mean and variance
How can I calculate the $\alpha$ and $\beta$ parameters for a Beta distribution if I know the mean and variance that I want the distribution to have? Examples of an R command to do this would be most helpful.

Dave Kincaid
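
Matching the Beta mean $\mu = \alpha/(\alpha+\beta)$ and variance $\sigma^2 = \alpha\beta/[(\alpha+\beta)^2(\alpha+\beta+1)]$ and solving gives $\alpha + \beta = \mu(1-\mu)/\sigma^2 - 1$. Hence $\alpha = \mu\left(\frac{\mu(1-\mu)}{\sigma^2} - 1\right)$ and $\beta = (1-\mu)\left(\frac{\mu(1-\mu)}{\sigma^2} - 1\right)$. The solution is valid only when $0 < \sigma^2 < \mu(1-\mu)$. The following is a Python sketch, and the function names are hypothetical. The arithmetic carries over to R line for line.

```python
def beta_params_from_moments(mean, var):
    """Method-of-moments alpha and beta for a Beta distribution."""
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if not 0.0 < var < mean * (1.0 - mean):
        raise ValueError("var must satisfy 0 < var < mean * (1 - mean)")
    total = mean * (1.0 - mean) / var - 1.0   # this is alpha + beta
    return mean * total, (1.0 - mean) * total

def beta_moments(alpha, beta):
    """Mean and variance of Beta(alpha, beta), used to check the inversion."""
    total = alpha + beta
    return alpha / total, alpha * beta / (total ** 2 * (total + 1.0))

alpha, beta = beta_params_from_moments(0.3, 0.01)
print(alpha, beta)                # approximately 6 and 14
print(beta_moments(alpha, beta))  # approximately (0.3, 0.01)
```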
88 votes, 6 answers
How to tell if data is "clustered" enough for clustering algorithms to produce meaningful results?
How would you know if your (high-dimensional) data exhibits enough clustering that the results from k-means or another clustering algorithm are actually meaningful?
For the k-means algorithm in particular, how much of a reduction in within-cluster variance…

xuexue
87 votes, 3 answers
Shape of confidence interval for predicted values in linear regression
I have noticed that the confidence interval for predicted values in a linear regression tends to be narrow around the mean of the predictor and wide around the minimum and maximum values of the predictor. This can be seen in plots of these 4 linear…

luciano
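
For simple linear regression, the standard error of the fitted mean at a new predictor value $x_0$ is $s\sqrt{1/n + (x_0-\bar{x})^2/S_{xx}}$. Here $s$ is the residual standard error and $S_{xx} = \sum_i (x_i - \bar{x})^2$. The second term is zero at $x_0 = \bar{x}$ and grows quadratically with distance from it. This happens because any error in the estimated slope is magnified the further you move from $\bar{x}$, the point the fitted line pivots around. The confidence band (a $t$ quantile times this standard error on either side of the fit) is therefore narrowest at the mean and flares out toward the extremes. The following is a NumPy sketch on simulated data, and `mean_prediction_se` is a hypothetical helper.

```python
import numpy as np

def mean_prediction_se(x, y, x0):
    """Standard error of the fitted mean at x0 in simple linear regression."""
    n = len(x)
    x_bar = x.mean()
    sxx = np.sum((x - x_bar) ** 2)
    slope = np.sum((x - x_bar) * (y - y.mean())) / sxx
    intercept = y.mean() - slope * x_bar
    resid = y - (intercept + slope * x)
    s = np.sqrt(np.sum(resid ** 2) / (n - 2))   # residual standard error
    # s^2/n: uncertainty in the height of the line at x_bar;
    # (x0 - x_bar)^2 * s^2/sxx: slope uncertainty, amplified by distance.
    return s * np.sqrt(1.0 / n + (x0 - x_bar) ** 2 / sxx)

# Hypothetical data.
rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=50)
grid = np.array([x.min(), x.mean(), x.max()])
se = mean_prediction_se(x, y, grid)
print(se)  # smallest at the mean of x, larger at both ends
```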