Most Popular

1500 questions
34 votes, 3 answers

In caret what is the real difference between cv and repeatedcv?

This is similar to the question "Caret re-sampling methods", although that question never really answered this part in an agreed-upon way. caret's train function offers cv and repeatedcv. What is the difference in say…
Brian Feeny (501 reputation)
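Here is a minimal sketch of the two resampling schemes. It is written in plain Python rather than caret and assumes the usual reading of caret's `trainControl()` settings. With `method = "cv"` and `number = k`, caret makes one random k-fold partition. With `method = "repeatedcv"` and `number = k, repeats = r`, it makes r independent k-fold partitions, and the resampled performance is averaged over all k × r held-out folds. The helper names below are hypothetical, and this is not caret's implementation.

```python
import random

def kfold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 once and cut them into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cv_splits(n, k, seed=0):
    """Plain k-fold CV: one random partition, giving k (train, test) pairs."""
    rng = random.Random(seed)
    all_idx = set(range(n))
    return [(sorted(all_idx - set(f)), sorted(f)) for f in kfold_indices(n, k, rng)]

def repeated_cv_splits(n, k, repeats, seed=0):
    """Repeated k-fold CV: `repeats` independent partitions, k * repeats pairs.
    Each repeat reshuffles, so the folds differ from one repeat to the next."""
    rng = random.Random(seed)
    all_idx = set(range(n))
    splits = []
    for _ in range(repeats):
        for f in kfold_indices(n, k, rng):
            splits.append((sorted(all_idx - set(f)), sorted(f)))
    return splits

# 20 observations, 5 folds: 5 resamples for cv, 15 for 3 repeats of cv.
print(len(cv_splits(20, 5)), len(repeated_cv_splits(20, 5, 3)))
```

Repeating the partition does not change how much data each model is trained on. It averages over the randomness of the fold assignment, which typically lowers the variance of the performance estimate at r times the computational cost.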
34 votes, 2 answers

Distributions other than the normal where mean and variance are independent

I was wondering if there are any distributions besides the normal where the mean and variance are independent of each other (or in other words, where the variance is not a function of the mean).
Wolfgang (15,542 reputation)
34 votes, 6 answers

What is the difference between logistic regression and neural networks?

How do we explain the difference between logistic regression and neural networks to an audience that has no background in statistics?
user16789 (740 reputation)
34 votes, 4 answers

Why isn't RANSAC most widely used in statistics?

Coming from the field of computer vision, I've often used the RANSAC (Random Sample Consensus) method for fitting models to data with lots of outliers. However, I've never seen it used by statisticians, and I've always been under the impression…
Bossykena (667 reputation)
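For readers coming from the statistics side, here is a minimal sketch of the RANSAC idea the question refers to, applied to a straight-line fit. The loop repeatedly fits a line through two randomly sampled points and counts the points within a tolerance of it (the consensus set). It keeps the largest consensus set and finally refits by ordinary least squares on that set. The tolerance `tol`, the iteration count `n_iter`, and the function names are illustrative assumptions, not any library's API.

```python
import random

def line_through(p, q):
    """Slope and intercept of the line through points p and q (None if vertical)."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        return None
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1

def ransac_line(points, n_iter=200, tol=0.5, seed=0):
    """Minimal RANSAC line fit: keep the 2-point model with the largest
    consensus set (|residual| <= tol), then refit by least squares on it."""
    rng = random.Random(seed)
    best = []
    for _ in range(n_iter):
        model = line_through(*rng.sample(points, 2))
        if model is None:
            continue
        a, b = model
        consensus = [(x, y) for x, y in points if abs(y - (a * x + b)) <= tol]
        if len(consensus) > len(best):
            best = consensus
    if len(best) < 2:
        raise ValueError("no usable consensus set found")
    # Ordinary least squares on the consensus set only.
    n = len(best)
    mx = sum(x for x, _ in best) / n
    my = sum(y for _, y in best) / n
    sxx = sum((x - mx) ** 2 for x, _ in best)
    sxy = sum((x - mx) * (y - my) for x, y in best)
    slope = sxy / sxx
    return slope, my - slope * mx, best

# Ten points on y = 2x + 1 plus three gross outliers (made-up data).
pts = [(x, 2 * x + 1) for x in range(10)] + [(3, 40), (5, -30), (7, 50)]
slope, intercept, inliers = ransac_line(pts)
print(slope, intercept, len(inliers))
```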
34 votes, 3 answers

When to use a GAM vs GLM

I realize this may be a potentially broad question, but I was wondering whether there are assumptions that indicate the use of a GAM (Generalized additive model) over a GLM (Generalized linear model)? Someone recently told me that GAMs should only…
34 votes, 7 answers

How to get started with neural networks

I'm completely new to neural networks but highly interested in understanding them. However, it's not easy at all to get started. Could anyone recommend a good book or any other kind of resource? Is there a must-read? I'm thankful for any kind of tip.
Claudio Albertin (443 reputation)
34 votes, 3 answers

Propensity score matching after multiple imputation

I refer to this paper: Hayes JR, Groner JI. "Using multiple imputation and propensity scores to test the effect of car seats and seat belt usage on injury severity from trauma registry data." J Pediatr Surg. 2008 May;43(5):924-7. In this study,…
Joe King (3,024 reputation)
34 votes, 2 answers

Performing a statistical test after visualizing data - data dredging?

I'll propose this question by means of an example. Suppose I have a data set, such as the Boston housing price data set, in which I have continuous and categorical variables. Here, we have a "quality" variable, from 1 to 10, and the sale price. I…
34 votes, 2 answers

Interpretation of simple predictions to odds ratios in logistic regression

I'm somewhat new to using logistic regression, and a bit confused by a discrepancy between my interpretations of the following values, which I thought would be the same: exponentiated beta values; predicted probability of the outcome using beta…
mike (767 reputation)
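One common source of the discrepancy described above is that an exponentiated coefficient $e^{\beta_1}$ is an odds ratio, not a probability or a ratio of probabilities. The minimal sketch below uses a single-predictor model with hypothetical coefficients `b0` and `b1` that do not come from the question. It shows that the ratio of predicted odds at $x = 1$ versus $x = 0$ reproduces $e^{\beta_1}$, while the ratio of predicted probabilities generally does not.

```python
import math

def predicted_probability(b0, b1, x):
    """Inverse logit of the linear predictor b0 + b1 * x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

# Hypothetical fitted coefficients, for illustration only.
b0, b1 = -2.0, 0.7

p0 = predicted_probability(b0, b1, 0.0)   # P(outcome | x = 0)
p1 = predicted_probability(b0, b1, 1.0)   # P(outcome | x = 1)

odds_ratio = odds(p1) / odds(p0)   # equals exp(b1) up to floating-point rounding
prob_ratio = p1 / p0               # ratio of probabilities; not exp(b1) in general

print(f"exp(b1)           = {math.exp(b1):.4f}")
print(f"odds ratio        = {odds_ratio:.4f}")
print(f"probability ratio = {prob_ratio:.4f}")
```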
34 votes, 7 answers

Real-life examples of common distributions

I am a grad student developing an interest in statistics. I like the material overall, but I sometimes have a hard time thinking about applications to real life. Specifically, my question is about commonly used statistical distributions (normal -…
34 votes, 4 answers

Origin of "5$\sigma$" threshold for accepting evidence in particle physics?

News reports say that CERN will announce tomorrow that the Higgs boson has been experimentally detected with 5$\sigma$ evidence. According to that article: 5$\sigma$ equates to a 99.99994% chance that the data the CMS and ATLAS detectors are…
Harvey Motulsky (14,903 reputation)
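The 99.99994% figure in the excerpt matches the probability that a standard normal variable lies within ±5 standard deviations, which is one minus the two-sided tail area. The minimal standard-library sketch below reproduces that arithmetic. For comparison, it also prints the one-sided tail area $P(Z \ge 5) \approx 2.9 \times 10^{-7}$. The calculation alone does not settle which of the two conventions a given analysis uses.

```python
import math

def one_sided_tail(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def two_sided_tail(z):
    """P(|Z| >= z) for a standard normal Z."""
    return math.erfc(z / math.sqrt(2.0))

z = 5.0
print(f"P(|Z| < {z})  = {1 - two_sided_tail(z):.8%}")
print(f"P(|Z| >= {z}) = {two_sided_tail(z):.3e}")
print(f"P(Z >= {z})   = {one_sided_tail(z):.3e}")
```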
34 votes, 6 answers

Why is the expected value named so?

I understand how we get 3.5 as the expected value for rolling a fair 6-sided die. But intuitively, I can expect each face with equal chance of 1/6. So shouldn't the expected value of rolling a die be any of the numbers from 1 to 6 with equal…
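The 3.5 in the excerpt is the probability-weighted average of the faces, $\sum_{k=1}^{6} k \cdot \tfrac{1}{6} = \tfrac{21}{6} = 3.5$. No single roll can show 3.5, because the number describes the long-run average of many rolls. The minimal sketch below shows both views. The seed and the number of rolls are arbitrary choices.

```python
from fractions import Fraction
import random

# Exact expectation: each face k = 1..6 weighted by its probability 1/6.
expected = sum(Fraction(1, 6) * k for k in range(1, 7))
print(expected)  # 7/2

# Simulation: the average of many fair rolls settles near the expectation.
rng = random.Random(42)
n_rolls = 100_000
mean_roll = sum(rng.randint(1, 6) for _ in range(n_rolls)) / n_rolls
print(mean_roll)
```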
34 votes, 6 answers

Can somebody offer an example of a unimodal distribution which has a skewness of zero but which is not symmetrical?

In May 2010 Wikipedia user Mcorazao added a sentence to the skewness article that "A zero value indicates that the values are relatively evenly distributed on both sides of the mean, typically but not necessarily implying a symmetric distribution."…
Andy McKenzie (1,299 reputation)
34 votes, 2 answers

Raw residuals versus standardised residuals versus studentised residuals - what to use when?

This looks like a similar question and didn't get many responses. Omitting tests such as Cook's D, and just looking at residuals as a group, I am interested in how others use residuals when assessing goodness-of-fit. I use the raw residuals: in a…
Michelle (3,640 reputation)
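As a reference point for the three terms, here is a minimal plain-Python sketch for a simple linear regression with intercept (p = 2 parameters). It computes three kinds of residual:

- raw residuals $e_i$;
- standardized (internally studentized) residuals $e_i / (s\sqrt{1-h_{ii}})$;
- externally studentized residuals $e_i / (s_{(i)}\sqrt{1-h_{ii}})$.

Here $h_{ii}$ is the leverage and $s_{(i)}$ is the residual standard error with observation $i$ left out. Naming conventions differ between sources. In R, `rstandard()` and `rstudent()` correspond to the last two. The example data are made up.

```python
import math

def residual_types(x, y):
    """Raw, standardized (internally studentized) and externally studentized
    residuals for the simple linear regression y ~ a + b*x."""
    n, p = len(x), 2
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    raw = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Leverage (hat value) of each observation in simple regression.
    h = [1 / n + (xi - mx) ** 2 / sxx for xi in x]
    s2 = sum(e ** 2 for e in raw) / (n - p)  # residual variance s^2
    standardized = [e / math.sqrt(s2 * (1 - hi)) for e, hi in zip(raw, h)]
    # s_(i)^2: residual variance after deleting observation i (closed form).
    s2_del = [((n - p) * s2 - e ** 2 / (1 - hi)) / (n - p - 1)
              for e, hi in zip(raw, h)]
    studentized = [e / math.sqrt(s2i * (1 - hi))
                   for e, hi, s2i in zip(raw, h, s2_del)]
    return raw, standardized, studentized

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.2, 6.8, 12.0]  # last point is unusual
raw, std, stu = residual_types(x, y)
for e, r, t in zip(raw, std, stu):
    print(f"{e:8.3f} {r:8.3f} {t:8.3f}")
```

The last two divide each residual by an estimate of its own standard deviation, so they share a common scale across observations. Raw residuals do not share a common scale when leverages differ.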
34 votes, 2 answers

How to plot decision boundary of a k-nearest neighbor classifier from Elements of Statistical Learning?

I want to generate the plot described in the book ElemStatLearn "The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second Edition" by Trevor Hastie & Robert Tibshirani & Jerome Friedman. The plot is: I am wondering how I…
littleEinstein (523 reputation)
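A common way to draw such a boundary is to classify every point of a fine grid and then shade or contour the predicted classes. The minimal plain-Python sketch below covers that step. It assumes two-dimensional inputs stored as `(x, y, label)` tuples. The training points and grid are made up rather than taken from the book's data, and the plotting call itself is left to whatever library is at hand.

```python
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training points nearest to `query`
    (squared Euclidean distance; equal counts go to the first label seen)."""
    qx, qy = query
    nearest = sorted(train, key=lambda t: (t[0] - qx) ** 2 + (t[1] - qy) ** 2)[:k]
    return Counter(label for _, _, label in nearest).most_common(1)[0][0]

def decision_grid(train, k, xs, ys):
    """Predicted class at every grid point; row i is y = ys[i], column j is x = xs[j]."""
    return [[knn_predict(train, (x, y), k) for x in xs] for y in ys]

def boundary_cells(grid):
    """Grid cells whose right or lower neighbour is predicted as a different class."""
    cells = []
    for i, row in enumerate(grid):
        for j, c in enumerate(row):
            right = j + 1 < len(row) and row[j + 1] != c
            down = i + 1 < len(grid) and grid[i + 1][j] != c
            if right or down:
                cells.append((i, j))
    return cells

train = [(0, 0, "A"), (0, 1, "A"), (1, 0, "A"), (3, 3, "B"), (3, 4, "B"), (4, 3, "B")]
xs = ys = [0, 1, 2, 3, 4]
grid = decision_grid(train, k=3, xs=xs, ys=ys)
for row in grid:
    print(" ".join(row))
print(boundary_cells(grid))
```

With two classes, an odd k avoids ties in the vote. A finer grid gives a smoother-looking boundary at the cost of more distance computations.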