Highest Voted Questions - Statistical Analysis Stack Exchange

107

votes

7 answers

T-test for non normal when N>50?

Long ago I learnt that a normal distribution was necessary to use a two-sample t-test. Today a colleague told me that she learnt that for N>50 a normal distribution was not necessary. Is that true? If so, is that because of the central limit theorem?

hypothesis-testing normal-distribution t-test inference central-limit-theorem

asked Apr 14 '11 at 21:55

even

2,147
6
18
13

107

votes

3 answers

Does an unbalanced sample matter when doing logistic regression?

Okay, so I think I have a decent enough sample, taking into account the 20:1 rule of thumb: a fairly large sample (N=374) for a total of 7 candidate predictor variables. My problem is the following: whatever set of predictor variables I use, the…

regression logistic sample-size unbalanced-classes

asked Jan 07 '11 at 16:48

Michiel

1,173
3
8
5

107

votes

4 answers

What is rank deficiency, and how to deal with it?

Fitting a logistic regression using lme4 ends with Error in mer_finalize(ans) : Downdated X'X is not positive definite. A likely cause of this error is apparently rank deficiency. What is rank deficiency, and how should I address it?

r logistic lme4-nlme

asked Aug 25 '12 at 06:30

Jack Tanner

4,552
3
27
39

107

votes

2 answers

What is covariance in plain language?

What is covariance in plain language and how is it linked to the terms dependence, correlation and variance-covariance structure with respect to repeated-measures designs?

correlation repeated-measures terminology covariance independence

asked Jun 03 '12 at 05:01

abc

1,747
3
17
32

107

votes

12 answers

When should linear regression be called "machine learning"?

In a recent colloquium, the speaker's abstract claimed they were using machine learning. During the talk, the only thing related to machine learning was that they perform linear regression on their data. After calculating the best-fit coefficients…

regression machine-learning multiple-regression terminology definition

asked Mar 20 '17 at 22:10

jvriesem

1,399
2
9
14

107

votes

15 answers

US Election results 2016: What went wrong with prediction models?

First it was Brexit, now the US election. Many model predictions were off by a wide margin. Are there lessons to be learned here? As late as 4 pm PST yesterday, the betting markets were still favoring Hillary 4 to 1. I take it that the betting…

predictive-models ensemble-learning confounding

asked Nov 09 '16 at 18:08

horaceT

3,162
3
15
19

107

votes

4 answers

How to select kernel for SVM?

When using SVM, we need to select a kernel. I wonder how to select a kernel. Any criteria on kernel selection?

machine-learning svm kernel-trick

asked Nov 07 '11 at 11:12

xiaohan2012

6,819
5
18
18

106

votes

17 answers

What is the role of the logarithm in Shannon's entropy?

Shannon's entropy is the negative of the sum of the probabilities of each outcome multiplied by the logarithm of probabilities for each outcome. What purpose does the logarithm serve in this equation? An intuitive or visual answer (as opposed to a…

intuition entropy information-theory sequence-analysis diversity

asked Feb 19 '14 at 17:33

histelheim

2,465
4
23
40
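The verbal definition in the question above (the negative of the sum, over outcomes, of each probability times the logarithm of that probability) can be written out as a minimal Python sketch. The function name `shannon_entropy` and the default base of 2 (entropy in bits) are illustrative assumptions, not something the question specifies.

```python
import math


def shannon_entropy(probs, base=2.0):
    """Return -sum(p * log(p)) over all outcomes.

    `probs` is a sequence of outcome probabilities that sums to 1.
    Outcomes with probability 0 are skipped, following the usual
    convention that 0 * log(0) is treated as 0.
    The default base of 2 (bits) is an assumption for illustration.
    """
    if any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(probs), 1.0, rel_tol=1e-9, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log(p, base) for p in probs if p > 0)


# A fair coin and a certain outcome, as two simple reference points.
fair_coin = shannon_entropy([0.5, 0.5])
certain = shannon_entropy([1.0, 0.0])
```

Changing `base` only rescales the result by a constant factor, so the choice of logarithm base sets the unit (bits, nats, and so on) rather than the shape of the measure.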

106

votes

10 answers

Validation Error less than training error?

I found two questions here and here about this issue, but there is no obvious answer or explanation yet. I face the same problem, where the validation error is less than the training error in my convolutional neural network. What does that mean?

machine-learning mathematical-statistics neural-networks cross-validation

asked Dec 17 '15 at 22:04

Bido

1,163
2
8
5

106

votes

1 answer

Conditional inference trees vs traditional decision trees

Can anyone explain the primary differences between conditional inference trees (ctree from the party package in R) and the more traditional decision tree algorithms (such as rpart in R)? What makes CI trees different? Strengths and…

r machine-learning cart

asked Jun 20 '11 at 21:45

B_Miner

7,560
20
81
144

106

votes

11 answers

"Best" series of colors to use for differentiating series in publication-quality plots

Has any study been done on what is the best set of colors to use for showing multiple series on the same plot? I've just been using the defaults in matplotlib, and they look a little childish since they're all bright, primary colors.

data-visualization

asked Oct 06 '14 at 14:33

Daisy Sophia Hollman

1,203
2
9
7

105

votes

19 answers

How to annoy a statistical referee?

I recently asked a question regarding general principles around reviewing statistics in papers. What I would now like to ask is what particularly irritates you when reviewing a paper, i.e. what's the best way to really annoy a statistical…

references referee

asked Oct 20 '10 at 19:09

csgillespie

11,849
9
56
85

105

votes

7 answers

Is it necessary to scale the target value in addition to scaling features for regression analysis?

I'm building regression models. As a preprocessing step, I scale my feature values to have mean 0 and standard deviation 1. Is it necessary to normalize the target values also?

regression machine-learning

asked Aug 11 '14 at 14:44

user2806363

2,313
3
17
27
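The preprocessing step described in the question above (scaling each feature to mean 0 and standard deviation 1) might look like the following minimal sketch. It uses only the Python standard library; the helper name `standardize` and the use of the population standard deviation are assumptions made for illustration, not details given in the question.

```python
import statistics


def standardize(values):
    """Rescale a column of numbers to mean 0 and standard deviation 1.

    Uses the population standard deviation (an assumption; a sample
    standard deviation would give a slightly different scale).
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mean) / sd for v in values]


# Hypothetical feature column, used only to show the transformation.
feature = [1.0, 2.0, 3.0, 4.0]
scaled = standardize(feature)
```

The same helper could be applied to a target column; whether doing so is necessary is exactly what the question asks, and this sketch does not settle it.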

104

votes

4 answers

What is the difference between zero-inflated and hurdle models?

I wonder if there is a clear-cut difference between the so-called zero-inflated distributions (models) and so-called hurdle-at-zero distributions (models)? The terms occur quite often in the literature and I suspect they are not the same, but would…

zero-inflation

asked Jan 07 '14 at 04:46

skulker

1,268
2
9
6

104

votes

13 answers

Simple algorithm for online outlier detection of a generic time series

I am working with a large number of time series. These time series are basically network measurements coming every 10 minutes, and some of them are periodic (e.g. the bandwidth), while others aren't (e.g. the amount of routing traffic). I would…

time-series outliers mathematical-statistics real-time

asked Aug 02 '10 at 20:37

gianluca

1,921
4
16
9
