Most Popular
1500 questions
41 votes
3 answers
How to interpret OOB and confusion matrix for random forest?
I got an R script from someone to run a random forest model. I modified it and ran it with some employee data. We are trying to predict voluntary separations.
Here is some additional info: this is a classification model where 0 = employee stayed, 1=…

daniellopez46
41 votes
8 answers
How to test hypothesis of no group differences?
Imagine you have a study with two groups (e.g., males and females) looking at a numeric dependent variable (e.g., intelligence test scores) and you have the hypothesis that there are no group differences.
Question:
What is a good way to test…

Jeromy Anglim
41 votes
10 answers
Why is 600 out of 1000 more convincing than 6 out of 10?
Look at this excerpt from "The study skills handbook", Palgrave, 2012, by Stella Cottrell, page 155:
Percentages Notice when percentages are given.
Suppose, instead, the statement above read:
60% of people preferred oranges; 40% said they…

Juya
41 votes
5 answers
Good games for learning statistical thinking?
Are there any games that get the player to "think like a statistician"?
For example, lightbot gets you to "think like a programmer" (in a very basic way). Are there any games - designed for entertainment or teaching - that can help get one comfortable…

Emile
41 votes
8 answers
When should one include a variable in a regression despite it not being statistically significant?
I am an economics student with some experience with econometrics and R. I would like to know if there is ever a situation where we should include a variable in a regression in spite of it not being statistically significant?

EconJohn
41 votes
2 answers
What exactly is the alpha in the Dirichlet distribution?
I'm fairly new to Bayesian statistics and I came across a corrected correlation measure, SparCC, that uses the Dirichlet process in the backend of its algorithm. I have been trying to go through the algorithm step-by-step to really understand what…

O.rka
41 votes
1 answer
Why are non zero-centered activation functions a problem in backpropagation?
I read here the following:
Sigmoid outputs are not zero-centered. This is undesirable since neurons in later layers of processing in a Neural Network (more on this soon) would be receiving data that is not zero-centered. This has implications…

Amelio Vazquez-Reina
41 votes
4 answers
How do I fit a constrained regression in R so that coefficients total = 1?
I see a similar constrained regression here:
Constrained linear regression through a specified point
but my requirement is slightly different. I need the coefficients to add up to 1. Specifically I am regressing the returns of 1 foreign exchange…

Thomas Browne
41 votes
4 answers
How to sample from a normal distribution with known mean and variance using a conventional programming language?
I've never had a course in statistics, so I hope I'm asking in the right place here.
Suppose I have only two data describing a normal distribution: the mean $\mu$ and variance $\sigma^2$. I want to use a computer to randomly sample from this…

Fixee
41 votes
2 answers
Difference between LOESS and LOWESS
What is the difference between LOESS (locally estimated scatterplot smoothing) and LOWESS (locally weighted scatterplot smoothing)? From Wikipedia I can only see that LOESS is a generalization of LOWESS. Do they have slightly different parameters?

pir
41 votes
6 answers
Intuitive explanation of convergence in distribution and convergence in probability
What is the intuitive difference between a random variable converging in probability versus a random variable converging in distribution?
I've read numerous definitions and mathematical equations, but that does not really help. (Please keep in mind,…

nicefella
41 votes
2 answers
Finding Quartiles in R
I'm working through a statistics textbook while learning R and I've run into a stumbling block on the following example:
After looking at ?quantile I attempted to recreate this in R with the following:
> nuclear <- c(7, 20, 16, 6, 58, 9, 20, 50,…
user60305
41 votes
4 answers
For plotting with R, should I learn ggplot2 or ggvis?
For plotting with R, should I learn ggplot2 or ggvis? I don't necessarily want to learn both if one of them is superior in any regard. Why does the R community keep creating new packages with overlapping functionality? The introduction blog post does…

qazwsx
40 votes
6 answers
How does cross-validation overcome the overfitting problem?
Why does a cross-validation procedure overcome the problem of overfitting a model?

user3269
40 votes
5 answers
Cross-validating time-series analysis
I've been using the caret package in R to build predictive models for classification and regression. Caret provides a unified interface to tune model hyper-parameters by cross-validation or bootstrapping. For example, if you are building a simple…

Zach