Highest Voted Questions - Statistical Analysis Stack Exchange

90

votes

1 answer

When to use an offset in a Poisson regression?

Does anybody know why offset in a Poisson regression is used? What do you achieve by this?

poisson-regression offset

asked May 24 '11 at 08:12

MarkDollar

5,575
14
44
60
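
As a hedged sketch for the offset question above: in a Poisson regression with a log link, an offset is usually the log of an exposure (time at risk, population, area), entered with its coefficient fixed at 1 so that the model describes a rate and not a raw count. The symbols $t_i$ (exposure), $\mathbf{x}_i$ (covariates), and $\boldsymbol{\beta}$ (coefficients) are notation assumed here, not taken from the question.

$$
\log \mathbb{E}[Y_i] = \log t_i + \mathbf{x}_i^\top \boldsymbol{\beta}
\quad\Longleftrightarrow\quad
\log \frac{\mathbb{E}[Y_i]}{t_i} = \mathbf{x}_i^\top \boldsymbol{\beta}
$$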

89

votes

1 answer

What correlation makes a matrix singular and what are implications of singularity or near-singularity?

I am doing some calculations on different matrices (mainly in logistic regression) and I commonly get the error "Matrix is singular", where I have to go back and remove the correlated variables. My question here is what would you consider a "highly"…

regression correlation matrix multicollinearity singular

asked Sep 24 '13 at 10:55

Error404

1,261
2
13
18
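
A minimal sketch related to the singularity question above, assuming made-up data: a correlation matrix becomes singular whenever one variable is an exact linear combination of the others, even if no single pairwise correlation equals 1 in absolute value. The variables `x1`, `x2`, `x3` and the helper names are hypothetical.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


# Made-up data: x3 is exactly x1 + x2, so the three columns are linearly dependent.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
x3 = [a + b for a, b in zip(x1, x2)]

cols = [x1, x2, x3]
corr = [[pearson(u, v) for v in cols] for u in cols]
largest_offdiag = max(abs(corr[i][j]) for i in range(3) for j in range(3) if i != j)

print("largest pairwise |r|:", largest_offdiag)    # strictly below 1
print("determinant of corr matrix:", det3(corr))   # zero up to rounding error
```

So checking pairwise correlations alone can miss the dependence; in practice, near-singularity is often diagnosed with variance inflation factors or the condition number of the design matrix.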

89

votes

4 answers

How to produce a pretty plot of the results of k-means cluster analysis?

I'm using R to do K-means clustering. I'm using 14 variables to run K-means. What is a pretty way to plot the results of K-means? Are there any existing implementations? Does having 14 variables complicate plotting the results? I found something…

data-visualization classification k-means unsupervised-learning

asked Jun 25 '12 at 17:47

JEquihua

3,525
2
24
44

89

votes

5 answers

How to plot ROC curves in multiclass classification?

In other words, instead of having a two class problem I am dealing with 4 classes and still would like to assess performance using AUC.

classification roc

asked Aug 27 '10 at 01:56

CLOCK
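
One common way to carry AUC over to the 4-class setting in the question above is one-vs-rest: compute a binary AUC for each class against all others, then average. The sketch below is only an illustration under that assumption (other schemes, such as pairwise averaging, exist); the function names, labels, and scores are made up.

```python
def binary_auc(scores, is_positive):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(score_rows, labels, classes):
    """Unweighted mean of the one-vs-rest AUCs, one per class."""
    aucs = []
    for k, cls in enumerate(classes):
        column = [row[k] for row in score_rows]  # predicted score for class k
        aucs.append(binary_auc(column, [lab == cls for lab in labels]))
    return sum(aucs) / len(aucs)


# Made-up example: one row of class scores per observation.
classes = ["a", "b", "c", "d"]
labels = ["a", "b", "c", "d", "a", "b"]
score_rows = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.7],
    [0.4, 0.3, 0.2, 0.1],
    [0.3, 0.4, 0.2, 0.1],
]
print(macro_ovr_auc(score_rows, labels, classes))
```

Plotting then amounts to drawing one ROC curve per class from the same one-vs-rest split.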

89

votes

5 answers

On the importance of the i.i.d. assumption in statistical learning

In statistical learning, implicitly or explicitly, one always assumes that the training set $\mathcal{D} = \{ \bf {X}, \bf{y} \}$ is composed of $N$ input/response tuples $({\bf{X}}_i,y_i)$ that are independently drawn from the same joint…

machine-learning cross-validation non-independent iid

asked May 19 '16 at 13:28

Quantuple

1,296
1
8
20

89

votes

5 answers

Relationship between poisson and exponential distribution

The waiting time between events in a Poisson process follows an exponential distribution with parameter lambda. But I don't understand it. Poisson models the number of arrivals per unit of time, for example. How is this related to the exponential distribution? Let's say…

distributions poisson-distribution exponential-distribution

asked Aug 25 '10 at 08:33

user862

2,339
4
27
24
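
A small simulation can make the link in the question above concrete: if the gaps between arrivals are independent Exponential(lambda) draws, the number of arrivals in a unit interval should behave like a Poisson(lambda) count, whose mean and variance both equal lambda. This is a sketch with an arbitrary rate and seed, and `count_arrivals` is a hypothetical helper.

```python
import random


def count_arrivals(rate, horizon, rng):
    """Count arrivals in [0, horizon] when gaps between arrivals are Exponential(rate)."""
    t = rng.expovariate(rate)  # time of the first arrival
    n = 0
    while t <= horizon:
        n += 1
        t += rng.expovariate(rate)  # add the next waiting time
    return n


rng = random.Random(0)  # fixed seed, chosen arbitrarily
rate = 3.0              # assumed arrival rate (lambda)
counts = [count_arrivals(rate, 1.0, rng) for _ in range(20000)]

mean_count = sum(counts) / len(counts)
var_count = sum((c - mean_count) ** 2 for c in counts) / (len(counts) - 1)
print(mean_count, var_count)  # both should land near the rate
```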

89

votes

10 answers

How should outliers be dealt with in linear regression analysis?

Oftentimes a statistical analyst is handed a dataset and asked to fit a model using a technique such as linear regression. Very frequently the dataset is accompanied by a disclaimer similar to "Oh yeah, we messed up collecting some of these…

regression outliers

asked Jul 19 '10 at 23:39

Sharpie

4,126
5
21
18

89

votes

10 answers

What is a complete list of the usual assumptions for linear regression?

What are the usual assumptions for linear regression? Do they include: a linear relationship between the independent and dependent variable; independent errors; normal distribution of errors; homoscedasticity? Are there any others?

regression assumptions

asked Oct 03 '11 at 04:19

tony

899
2
7
3

89

votes

2 answers

Resampling / simulation methods: monte carlo, bootstrapping, jackknifing, cross-validation, randomization tests, and permutation tests

I am trying to understand the difference between different resampling methods (Monte Carlo simulation, parametric bootstrapping, non-parametric bootstrapping, jackknifing, cross-validation, randomization tests, and permutation tests) and their…

r bootstrap resampling jackknife permutation-test

asked Jun 19 '14 at 17:59

Ram Sharma

2,226
3
20
24

88

votes

8 answers

When is unbalanced data really a problem in Machine Learning?

We have already had multiple questions about unbalanced data when using logistic regression, SVM, decision trees, bagging, and a number of other similar questions, which makes it a very popular topic! Unfortunately, each of the questions seems to be…

machine-learning classification predictive-models unbalanced-classes

asked Jun 02 '17 at 12:08

Tim

108,699
20
212
390

88

votes

24 answers

Rules of thumb for "modern" statistics

I like G van Belle's book on Statistical Rules of Thumb, and to a lesser extent Common Errors in Statistics (and How to Avoid Them) from Phillip I Good and James W. Hardin. They address common pitfalls when interpreting results from experimental and…

modeling exploratory-data-analysis rule-of-thumb

asked Sep 16 '10 at 10:21

chl

50,972
18
205
364

88

votes

3 answers

What is the lasso in regression analysis?

I'm looking for a non-technical definition of the lasso and what it is used for.

regression lasso regularization

asked Oct 19 '11 at 04:24

Paul Vogt

881
1
7
3
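
For reference alongside the question above, the lasso is ordinary least squares with an added penalty on the sum of the absolute values of the coefficients, which shrinks them and can set some exactly to zero. The notation below ($n$ observations, $p$ predictors, penalty weight $\lambda \ge 0$, and the $1/(2n)$ scaling) is one common convention assumed here, not something stated in the question.

$$
\hat{\boldsymbol{\beta}}^{\text{lasso}}
= \arg\min_{\boldsymbol{\beta}}
\; \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - \mathbf{x}_i^\top \boldsymbol{\beta} \right)^2
+ \lambda \sum_{j=1}^{p} |\beta_j|
$$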

88

votes

7 answers

Calculating the parameters of a Beta distribution using the mean and variance

How can I calculate the $\alpha$ and $\beta$ parameters for a Beta distribution if I know the mean and variance that I want the distribution to have? Examples of an R command to do this would be most helpful.

r distributions estimation beta-distribution

asked Jun 22 '11 at 17:17

Dave Kincaid

1,458
1
12
18
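
For the Beta-distribution question above, the method-of-moments solution can be sketched as follows. The question asks for R; this sketch uses Python, and the function name is hypothetical. It relies on the standard identities mean = $\alpha/(\alpha+\beta)$ and variance = $\alpha\beta/((\alpha+\beta)^2(\alpha+\beta+1))$, which have a solution only when the variance is below mean × (1 − mean).

```python
def beta_params_from_moments(mean, variance):
    """Method-of-moments alpha and beta for a Beta distribution on (0, 1)."""
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if not 0.0 < variance < mean * (1.0 - mean):
        raise ValueError("variance must be positive and below mean * (1 - mean)")
    # total = alpha + beta, obtained by solving the variance identity
    total = mean * (1.0 - mean) / variance - 1.0
    return mean * total, (1.0 - mean) * total


alpha, beta = beta_params_from_moments(0.25, 0.01)

# Round-trip check: recompute the moments from the fitted parameters.
check_mean = alpha / (alpha + beta)
check_var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1.0))
print(alpha, beta, check_mean, check_var)
```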

88

votes

6 answers

How to tell if data is "clustered" enough for clustering algorithms to produce meaningful results?

How would you know if your (high dimensional) data exhibits enough clustering so that results from k-means or another clustering algorithm are actually meaningful? For the k-means algorithm in particular, how much of a reduction in within-cluster variance…

clustering k-means

asked Jun 08 '11 at 00:04

xuexue

2,098
2
16
11

87

votes

3 answers

Shape of confidence interval for predicted values in linear regression

I have noticed that the confidence interval for predicted values in a linear regression tends to be narrow around the mean of the predictor and fat around the minimum and maximum values of the predictor. This can be seen in plots of these 4 linear…

regression confidence-interval linear-model standard-error

asked Feb 06 '14 at 00:15

luciano

12,197
30
87
119
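
The shape described in the question above follows from the standard error of the fitted mean in simple linear regression, $\hat\sigma \sqrt{1/n + (x_0 - \bar{x})^2 / S_{xx}}$, which is smallest at $x_0 = \bar{x}$ and grows as $x_0$ moves away from it. A minimal sketch with made-up data and a hypothetical helper name:

```python
import math


def fitted_mean_se(x, y, x0):
    """Standard error of the fitted mean at x0 in a simple least-squares regression."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(rss / (n - 2))  # residual standard error
    return sigma_hat * math.sqrt(1.0 / n + (x0 - xbar) ** 2 / sxx)


# Made-up data; the predictor mean is 4.0.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [1.1, 2.3, 2.8, 4.2, 4.9, 6.3, 6.8]

for x0 in (1.0, 4.0, 7.0):
    print(x0, fitted_mean_se(x, y, x0))  # narrowest at the predictor mean
```

The confidence band is the fitted line plus or minus a t-quantile times this standard error, so it inherits the same narrow-in-the-middle shape.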
