Most Popular

1500 questions
41
votes
3 answers

How to interpret OOB and confusion matrix for random forest?

I got an R script from someone to run a random forest model. I modified it and ran it on some employee data. We are trying to predict voluntary separations. Here is some additional info: this is a classification model where 0 = employee stayed, 1=…
daniellopez46
41
votes
8 answers

How to test hypothesis of no group differences?

Imagine you have a study with two groups (e.g., males and females) looking at a numeric dependent variable (e.g., intelligence test scores) and you have the hypothesis that there are no group differences. Question: What is a good way to test…
Jeromy Anglim
41
votes
10 answers

Why is 600 out of 1000 more convincing than 6 out of 10?

Look at this excerpt from "The study skills handbook", Palgrave, 2012, by Stella Cottrell, page 155: Percentages Notice when percentages are given. Suppose, instead, the statement above read: 60% of people preferred oranges; 40% said they…
Juya
41
votes
5 answers

Good games for learning statistical thinking?

Are there any games that get the player to "think like a statistician"? For example, lightbot gets you to "think like a programmer" (in a very basic way). Are there any games - designed for entertainment or teaching - that can help get one comfortable…
Emile
41
votes
8 answers

When should one include a variable in a regression despite it not being statistically significant?

I am an economics student with some experience with econometrics and R. I would like to know if there is ever a situation where we should include a variable in a regression in spite of it not being statistically significant?
EconJohn
41
votes
2 answers

What exactly is the alpha in the Dirichlet distribution?

I'm fairly new to Bayesian statistics and I came across a corrected correlation measure, SparCC, that uses the Dirichlet process in the backend of its algorithm. I have been trying to go through the algorithm step-by-step to really understand what…
O.rka
41
votes
1 answer

Why are non zero-centered activation functions a problem in backpropagation?

I read here the following: Sigmoid outputs are not zero-centered. This is undesirable since neurons in later layers of processing in a Neural Network (more on this soon) would be receiving data that is not zero-centered. This has implications…
Amelio Vazquez-Reina
41
votes
4 answers

How do I fit a constrained regression in R so that coefficients total = 1?

I see a similar constrained regression here: "Constrained linear regression through a specified point", but my requirement is slightly different. I need the coefficients to add up to 1. Specifically I am regressing the returns of 1 foreign exchange…
Thomas Browne
41
votes
4 answers

How to sample from a normal distribution with known mean and variance using a conventional programming language?

I've never had a course in statistics, so I hope I'm asking in the right place here. Suppose I have only two values describing a normal distribution: the mean $\mu$ and variance $\sigma^2$. I want to use a computer to randomly sample from this…
Fixee
41
votes
2 answers

Difference between LOESS and LOWESS

What is the difference between LOESS (locally estimated scatterplot smoothing) and LOWESS (locally weighted scatterplot smoothing)? From Wikipedia I can only see that LOESS is a generalization of LOWESS. Do they have slightly different parameters?
pir
41
votes
6 answers

Intuitive explanation of convergence in distribution and convergence in probability

What is the intuitive difference between a random variable converging in probability versus a random variable converging in distribution? I've read numerous definitions and mathematical equations, but they do not really help. (Please keep in mind,…
nicefella
41
votes
2 answers

Finding Quartiles in R

I'm working through a statistics textbook while learning R and I've run into a stumbling block on the following example: After looking at ?quantile I attempted to recreate this in R with the following: > nuclear <- c(7, 20, 16, 6, 58, 9, 20, 50,…
user60305
41
votes
4 answers

For plotting with R, should I learn ggplot2 or ggvis?

For plotting with R, should I learn ggplot2 or ggvis? I don't necessarily want to learn both if one of them is superior in any regard. Why does the R community keep creating new packages with overlapping functionality? The introduction blog post does…
qazwsx
40
votes
6 answers

How does cross-validation overcome the overfitting problem?

Why does a cross-validation procedure overcome the problem of overfitting a model?
user3269
40
votes
5 answers

Cross-validating time-series analysis

I've been using the caret package in R to build predictive models for classification and regression. Caret provides a unified interface to tune model hyper-parameters by cross-validation or bootstrapping. For example, if you are building a simple…
Zach