Highest Voted Questions - Statistical Analysis Stack Exchange

34 votes, 5 answers

How to split dataset for time-series prediction?

I have historic sales data from a bakery (daily, over 3 years). Now I want to build a model to predict future sales (using features like weekday, weather variables, etc.). How should I split the dataset for fitting and evaluating the models? Does…

cross-validation partitioning

asked Sep 30 '14 at 16:23

tobip
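
A common way to approach this is to keep every split chronological, so that each model is scored only on days that come after the ones it was fitted on (often called rolling-origin or expanding-window evaluation). The sketch below generates such splits over row indices. It assumes the rows are already sorted by date; the function name `rolling_origin_splits` and the window sizes are illustrative choices, not taken from the question.

```python
from typing import Iterator, List, Tuple

def rolling_origin_splits(
    n_obs: int, initial_train: int, horizon: int, step: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs for time-ordered rows.

    Each training window starts at index 0 and ends just before its test
    window, so no test observation precedes any training observation.
    """
    end = initial_train
    while end + horizon <= n_obs:
        train = list(range(0, end))
        test = list(range(end, end + horizon))
        yield train, test
        end += step  # move the forecast origin forward in time

# Hypothetical setup: ~3 years of daily rows, first 2 years for the initial fit,
# then 28-day-ahead test windows, advancing the origin by 28 days each time.
splits = list(rolling_origin_splits(n_obs=3 * 365, initial_train=2 * 365, horizon=28, step=28))
for train, test in splits[:2]:
    print(len(train), test[0], test[-1])
```

Each successive training window contains all earlier days, which mimics how the model would be refitted and used in practice; a fixed-length sliding window is a common variant when older data are thought to be less relevant.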

34 votes, 10 answers

How to represent an unbounded variable as number between 0 and 1

I want to represent a variable as a number between 0 and 1. The variable is a non-negative integer with no inherent bound. I map 0 to 0, but what can I map to 1 or to numbers between 0 and 1? I could use the history of that variable to provide the…

normalization

asked Aug 02 '10 at 14:38

Russell Gallop
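
Two standard bounded transforms satisfy the constraint in the question (0 maps to 0, and larger values approach, but never reach, 1): the saturating ratio x / (x + k) and 1 - exp(-x / k). Here k is a hypothetical scale parameter that has to be chosen; taking it from the variable's history, as the question suggests, is one option, and in the ratio form k is exactly the value mapped to 0.5. The function names and the example history below are illustrative.

```python
import math

def saturating_map(x: int, k: float) -> float:
    """Map a non-negative value into [0, 1): 0 -> 0, k -> 0.5, large x -> close to 1."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if k <= 0:
        raise ValueError("k must be positive")
    return x / (x + k)

def exponential_map(x: int, k: float) -> float:
    """Alternative: 1 - exp(-x / k), also 0 at x = 0 and approaching 1 as x grows."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if k <= 0:
        raise ValueError("k must be positive")
    return 1.0 - math.exp(-x / k)

# k is a hypothetical scale; here it is the median of a made-up history.
history = [0, 2, 3, 5, 8, 13, 40]
k = sorted(history)[len(history) // 2]  # median of an odd-length list
for x in (0, 5, 50, 500):
    print(x, round(saturating_map(x, k), 3), round(exponential_map(x, k), 3))
```

Both maps are monotone, so they preserve the ordering of the original values; they differ in how quickly they saturate, with the exponential form compressing large values much more strongly.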

34 votes, 2 answers

Satterthwaite vs. Kenward-Roger approximations for the degrees of freedom in mixed models

The lmerTest package provides an anova() function for linear mixed models with optionally Satterthwaite's (default) or Kenward-Roger's approximation of the degrees of freedom (df). What is the difference between these two approaches? When to choose…

r anova mixed-model lme4-nlme degrees-of-freedom

asked Jul 16 '14 at 13:30

doko

34 votes, 1 answer

Relation between variational Bayes and EM

I read somewhere that the Variational Bayes method is a generalization of the EM algorithm. Indeed, the iterative parts of the algorithms are very similar. In order to test whether the EM algorithm is a special version of Variational Bayes, I tried…

bayesian expectation-maximization variational-bayes

asked Jul 03 '14 at 12:44

Ufuk Can Bicici

34 votes, 6 answers

Data mining: How should I go about finding the functional form?

I'm curious about repeatable procedures that can be used to discover the functional form of the function y = f(A, B, C) + error_term where my only input is a set of observations (y, A, B and C). Please note that the functional form of f is…

regression machine-learning algorithms model-selection data-mining

asked May 05 '11 at 16:26

knorv

33 votes, 5 answers

What are the relative merits of Winsorizing vs. Trimming data?

Winsorizing data means to replace the extreme values of a data set with a certain percentile value from each end, while Trimming or Truncating involves removing those extreme values. I always see both methods discussed as viable options to lessen…

mean truncation winsorizing trimmed-mean types-of-averages

asked Mar 18 '14 at 14:25

Brian
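
The two definitions in the question translate directly into code. The sketch below cuts the same proportion from each end; the helper names and the example data are illustrative. Note that `trim` returns the retained values in sorted order, while `winsorize` keeps the original order and sample size.

```python
from typing import List

def _cut_count(n: int, proportion: float) -> int:
    """Number of observations affected at EACH end of the sorted data."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    return int(n * proportion)

def trim(data: List[float], proportion: float) -> List[float]:
    """Trimming: drop the g smallest and the g largest values (result is sorted)."""
    xs = sorted(data)
    g = _cut_count(len(xs), proportion)
    return xs[g:len(xs) - g]

def winsorize(data: List[float], proportion: float) -> List[float]:
    """Winsorizing: clamp values below the (g+1)-th smallest or above the (g+1)-th largest."""
    xs = sorted(data)
    n = len(xs)
    g = _cut_count(n, proportion)
    low, high = xs[g], xs[n - g - 1]
    return [min(max(x, low), high) for x in data]  # keeps original order and n

def mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(trim(data, 0.1))       # 8 values remain
print(winsorize(data, 0.1))  # still 10 values; 1 -> 2 and 100 -> 9
print(mean(data), mean(trim(data, 0.1)), mean(winsorize(data, 0.1)))
```

On this symmetric-core example both adjusted means equal 5.5 against a raw mean of 14.5, but in general they differ: trimming reduces n and discards the information that extreme values were present, while winsorizing keeps n and pulls the extremes in to the cut-off values.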

33 votes, 5 answers

Why do political polls have such large sample sizes?

When I watch the news, I've noticed that the Gallup polls for things like presidential elections have [I assume random] sample sizes of well over 1,000. What I remember from college statistics is that a sample size of 30 was a "significantly…

sampling sample-size statistical-power

asked Feb 24 '14 at 22:23

samplesize999
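
The n = 30 rule of thumb is usually quoted for when a normal approximation to the sampling distribution of a mean becomes reasonable; it says nothing about how precise the estimate is. For a poll, precision is usually summarized by the margin of error, which shrinks only like 1/sqrt(n). A quick calculation, assuming simple random sampling, a 95% normal-approximation interval (z = 1.96) and the worst case p = 0.5:

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion under simple random sampling."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (30, 100, 1000, 2400):
    print(f"n={n:5d}: +/- {100 * margin_of_error(n):.1f} percentage points")
```

This gives roughly ±17.9 points for n = 30, ±9.8 for n = 100, ±3.1 for n = 1000 and ±2.0 for n = 2400, which is why samples in the low thousands are needed to separate candidates a few points apart. Real polls rarely use simple random samples, so these figures are only a baseline.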

33 votes, 3 answers

How to interpret the dendrogram of a hierarchical cluster analysis

Consider the R example below: plot( hclust(dist(USArrests), "ave") ) What exactly does the y-axis "Height" mean? Looking at North Carolina and California (rather on the left). Is California "closer" to North Carolina than Arizona? Can I make this…

interpretation hierarchical-clustering dendrogram

asked Jan 15 '14 at 11:04

Richi W
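
In R's `hclust`, "Height" is the value of the linkage criterion at which two clusters are merged; with `"ave"` (average linkage), that is the mean distance over all pairs of points taken one from each cluster. The naive sketch below reproduces this rule on made-up 1-D data (not `USArrests`) so the heights can be checked by hand; it illustrates the criterion and is not how `hclust` is implemented.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

def average_linkage(points: Dict[str, float]) -> List[Tuple[FrozenSet[str], FrozenSet[str], float]]:
    """Naive agglomerative clustering with average linkage on 1-D points.

    Returns merges in order as (cluster_a, cluster_b, height), where height is
    the mean pairwise distance between members of the two merged clusters.
    """
    clusters = [frozenset([name]) for name in points]
    merges = []

    def avg_dist(a: FrozenSet[str], b: FrozenSet[str]) -> float:
        total = sum(abs(points[i] - points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Merge the pair of clusters with the smallest average distance.
        a, b = min(combinations(clusters, 2), key=lambda pair: avg_dist(*pair))
        merges.append((a, b, avg_dist(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges

# Hypothetical data: A and B are close, C is a bit farther, D is far away.
data = {"A": 0.0, "B": 1.0, "C": 4.0, "D": 10.0}
for a, b, h in average_linkage(data):
    print(sorted(a), sorted(b), round(h, 3))
```

The merges come out at heights 1.0, 3.5 and 8.333. How close two leaves are is read off the height of the node where they are first joined (A and C first meet at 3.5), not from how near each other they happen to be drawn along the horizontal axis.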

33 votes, 6 answers

What would a robust Bayesian model for estimating the scale of a roughly normal distribution be?

There are a number of robust estimators of scale. A notable example is the median absolute deviation, which for normally distributed data relates to the standard deviation as $\sigma = \mathrm{MAD}\cdot1.4826$. In a Bayesian framework there exist a number of ways to robustly…

r bayesian estimation standard-deviation robust

asked Jan 13 '14 at 16:08

Rasmus Bååth
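
The constant in the question comes from the normal distribution: for normal data the MAD converges to $\sigma\,\Phi^{-1}(0.75) \approx 0.6745\,\sigma$, so $\sigma \approx \mathrm{MAD}/0.6745 \approx 1.4826\cdot\mathrm{MAD}$. The sketch below computes that constant and compares the ordinary SD with the scaled MAD on simulated data with a few gross outliers. It shows only the classical (non-Bayesian) estimator the question starts from, not an answer to the Bayesian part; the simulated data and seed are arbitrary.

```python
import random
import statistics
from typing import List

# Consistency constant for the normal distribution: 1 / Phi^{-1}(0.75).
K = 1 / statistics.NormalDist().inv_cdf(0.75)

def mad_scale(xs: List[float]) -> float:
    """Robust scale estimate: K * median(|x - median(x)|)."""
    med = statistics.median(xs)
    return K * statistics.median([abs(x - med) for x in xs])

rng = random.Random(42)
clean = [rng.gauss(0.0, 2.0) for _ in range(1000)]  # simulated data, true sigma = 2
contaminated = clean + [50.0] * 20                  # add about 2% gross outliers

print(round(K, 4))
for name, xs in (("clean", clean), ("contaminated", contaminated)):
    print(f"{name:>12}: sd={statistics.stdev(xs):.2f}  scaled MAD={mad_scale(xs):.2f}")
```

On the clean sample both estimates should land near the true σ = 2; adding the outliers inflates the SD several-fold, while the scaled MAD barely moves.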

33 votes, 8 answers

Replacing outliers with mean

This question was asked by a friend of mine who is not internet-savvy. I have no statistics background, and I've been searching around the internet for this question. The question is: is it possible to replace outliers with the mean value? If it's possible, is…

mean outliers robust winsorizing

asked Nov 29 '13 at 14:08

Alun

33 votes, 5 answers

How to change data between wide and long formats in R?

You can have data in wide format or in long format. This is quite an important thing, as the usable methods are different depending on the format. I know you have to work with melt() and cast() from the reshape package, but there seem to be some things…

data-transformation r

asked Feb 21 '11 at 10:27

Mien
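
The two layouts are easiest to see in a language-neutral sketch: "melting" turns each non-identifier column of a wide table into (id, variable, value) rows, and "casting" reverses it. The sketch below uses plain Python dictionaries; the function names, column names and data are hypothetical, and it illustrates the idea behind the `melt()`/`cast()` operations the question mentions rather than their actual interfaces.

```python
from typing import Dict, List

Row = Dict[str, object]

def wide_to_long(rows: List[Row], id_col: str,
                 var_name: str = "variable", value_name: str = "value") -> List[Row]:
    """'Melt': emit one row per (id, measured column) pair."""
    long_rows: List[Row] = []
    for row in rows:
        for col, val in row.items():
            if col != id_col:
                long_rows.append({id_col: row[id_col], var_name: col, value_name: val})
    return long_rows

def long_to_wide(rows: List[Row], id_col: str,
                 var_name: str = "variable", value_name: str = "value") -> List[Row]:
    """'Cast': one row per id, one column per distinct variable name."""
    wide: Dict[object, Row] = {}
    for row in rows:
        record = wide.setdefault(row[id_col], {id_col: row[id_col]})
        record[str(row[var_name])] = row[value_name]
    return list(wide.values())

# Hypothetical repeated-measures data: one row per subject, one column per time point.
wide_table = [
    {"subject": 1, "t1": 5.0, "t2": 6.5},
    {"subject": 2, "t1": 4.2, "t2": 4.8},
]
long_table = wide_to_long(wide_table, id_col="subject", var_name="time")
for r in long_table:
    print(r)
print(long_to_wide(long_table, id_col="subject", var_name="time"))
```

The round trip recovers the original rows because each (id, variable) pair occurs exactly once in the long table; when pairs repeat, casting needs an aggregation rule.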

33 votes, 3 answers

Why not report the mean of a bootstrap distribution?

When one bootstraps a parameter to get the standard error, one gets a distribution of the parameter. Why don't we use the mean of that distribution as a result or estimate for the parameter we are trying to get? Shouldn't the distribution approximate…

distributions bootstrap standard-error expected-value

asked Sep 28 '13 at 22:32

Guillermo Perez
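
One standard way to see the issue is that the centre of the bootstrap distribution tracks the estimator's bias rather than removing it: mean(θ*) - θ̂ is the usual bootstrap estimate of bias, so for a biased estimator the bootstrap mean tends to sit even further from the true value than θ̂ itself. The sketch below shows this for the plug-in variance (dividing by n), a deliberately biased statistic; the sample size, seed and number of replicates are arbitrary.

```python
import random
import statistics
from typing import Callable, List

def plugin_variance(xs: List[float]) -> float:
    """Biased 'plug-in' variance: divides by n, not n - 1."""
    return statistics.pvariance(xs)

def bootstrap(xs: List[float], stat: Callable[[List[float]], float],
              n_boot: int, rng: random.Random) -> List[float]:
    """Return n_boot bootstrap replicates of stat, resampling xs with replacement."""
    n = len(xs)
    return [stat([xs[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)]

rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(10)]  # hypothetical data, true variance 1

theta_hat = plugin_variance(sample)
reps = bootstrap(sample, plugin_variance, n_boot=5000, rng=rng)

se = statistics.stdev(reps)                 # bootstrap standard error
bias = statistics.mean(reps) - theta_hat    # bootstrap estimate of bias
print(f"estimate={theta_hat:.3f}  boot mean={statistics.mean(reps):.3f}  SE={se:.3f}  bias={bias:.3f}")
print(f"bias-corrected estimate={theta_hat - bias:.3f}")
```

For the plug-in variance, the bootstrap mean is expected to be about (n - 1)/n = 0.9 times θ̂ here, i.e. shifted in the same direction as the estimator's own bias; subtracting the estimated bias (giving 2θ̂ - mean(θ*)) moves the estimate the other way, close to the usual n/(n - 1) correction.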

33 votes, 4 answers

How do I fit a multilevel model for over-dispersed Poisson outcomes?

I want to fit a multilevel GLMM with a Poisson distribution (with over-dispersion) using R. At the moment I am using lme4 but I noticed that recently the quasipoisson family was removed. I've seen elsewhere that you can model additive…

r mixed-model poisson-distribution lme4-nlme overdispersion

asked Feb 08 '11 at 13:02

George Michaelides

33 votes, 2 answers

Drawing from Dirichlet distribution

Let's say we have a Dirichlet distribution with $K$-dimensional vector parameter $\vec\alpha = [\alpha_1, \alpha_2,...,\alpha_K]$. How can I draw a sample (a $K$-dimensional vector) from this distribution? I need a (possibly) simple explanation.

sampling dirichlet-distribution

asked Sep 04 '13 at 16:15

user1315305
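
A standard construction, and probably the simplest one to explain: draw $K$ independent Gamma variables $y_k \sim \mathrm{Gamma}(\alpha_k, 1)$ and divide each by their sum; the normalized vector $(y_1, \dots, y_K)/\sum_k y_k$ follows $\mathrm{Dir}(\vec\alpha)$. The sketch below uses Python's `random.gammavariate`, whose second argument is the scale; the $\alpha$ values and seed are just examples.

```python
import random
from typing import List, Sequence

def sample_dirichlet(alpha: Sequence[float], rng: random.Random) -> List[float]:
    """Draw one K-dimensional vector from Dirichlet(alpha) via normalized Gamma draws."""
    if any(a <= 0 for a in alpha):
        raise ValueError("all alpha_k must be positive")
    ys = [rng.gammavariate(a, 1.0) for a in alpha]  # y_k ~ Gamma(shape=alpha_k, scale=1)
    total = sum(ys)
    return [y / total for y in ys]

rng = random.Random(1)
alpha = [2.0, 3.0, 5.0]
draws = [sample_dirichlet(alpha, rng) for _ in range(20000)]
means = [sum(d[k] for d in draws) / len(draws) for k in range(len(alpha))]
print([round(m, 3) for m in means])  # should be close to alpha / sum(alpha)
```

The empirical means should come out close to $\alpha_k / \sum_j \alpha_j$, here [0.2, 0.3, 0.5], which is a quick sanity check on the sampler.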

33 votes, 8 answers

What math subjects would you suggest to prepare for data mining and machine learning?

I'm trying to put together a self-directed math curriculum to prepare for learning data mining and machine learning. This is motivated by starting Andrew Ng's machine learning class on Coursera and feeling that before proceeding I needed to improve…

machine-learning references data-mining

asked Aug 30 '13 at 17:30

measureallthethings