Highest Voted Questions - Statistical Analysis Stack Exchange

74 votes, 16 answers

Practical thoughts on explanatory vs. predictive modeling

Back in April, I attended a talk at the UMD Math Department Statistics group seminar series called "To Explain or To Predict?". The talk was given by Prof. Galit Shmueli, who teaches at UMD's Smith Business School. Her talk was based on research she…

predictive-models

asked Aug 03 '10 at 20:19

wahalulu

74 votes, 2 answers

What is the difference between ZCA whitening and PCA whitening?

I am confused about ZCA whitening and normal whitening (which is obtained by dividing principal components by the square roots of PCA eigenvalues). As far as I know, $$\mathbf x_\mathrm{ZCAwhite} = \mathbf U \mathbf x_\mathrm{PCAwhite},$$ where…

pca dimensionality-reduction image-processing

asked Oct 01 '14 at 07:22

RockTheStar

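The relation $\mathbf x_\mathrm{ZCAwhite} = \mathbf U \mathbf x_\mathrm{PCAwhite}$ quoted in the excerpt can be checked numerically. A minimal NumPy sketch, assuming data stored column-wise (features × samples), $\mathbf U$ and the eigenvalues taken from the sample covariance, and a small `eps` added to the eigenvalues for numerical safety; the function name `pca_and_zca_whiten` and the toy data are hypothetical:

```python
import numpy as np

def pca_and_zca_whiten(X, eps=1e-12):
    """Whiten X (features x samples). Returns (X_pca_white, X_zca_white, U)."""
    Xc = X - X.mean(axis=1, keepdims=True)      # center each feature
    cov = Xc @ Xc.T / Xc.shape[1]               # sample covariance
    eigvals, U = np.linalg.eigh(cov)            # cov = U diag(eigvals) U^T
    # PCA whitening: rotate onto the principal axes, then divide by sqrt(eigenvalue)
    X_pca = np.diag(1.0 / np.sqrt(eigvals + eps)) @ U.T @ Xc
    # ZCA whitening: rotate the PCA-whitened data back with U
    X_zca = U @ X_pca
    return X_pca, X_zca, U

rng = np.random.default_rng(0)
mix = np.array([[2.0, 0.0], [1.5, 0.5]])
X = mix @ rng.standard_normal((2, 1000))        # correlated toy 2-D data

X_pca, X_zca, U = pca_and_zca_whiten(X)
n = X.shape[1]
print(np.round(X_pca @ X_pca.T / n, 6))         # approximately the identity
print(np.round(X_zca @ X_zca.T / n, 6))         # also approximately the identity
```

Both results have identity covariance; ZCA only adds the rotation by $\mathbf U$, which maps the data back onto the original feature axes. That is why ZCA-whitened images still resemble the input images.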

73 votes, 15 answers

Good GUI for R suitable for a beginner wanting to learn programming in R?

Is there any GUI for R that makes it easier for a beginner to start learning and programming in that language?

r

asked Dec 09 '10 at 13:49

mariana soffer

73 votes, 7 answers

Where to cut a dendrogram?

Hierarchical clustering can be represented by a dendrogram. Cutting a dendrogram at a certain level gives a set of clusters. Cutting at another level gives another set of clusters. How would you pick where to cut the dendrogram? Is there something…

clustering dendrogram

asked Oct 17 '10 at 21:57

Eduardas

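The premise of the question (different cut heights give different sets of clusters) can be made concrete with a small pure-Python sketch. It assumes single linkage on 1-D points, which the question does not specify; `single_linkage_cut` and the sample data are hypothetical. For single linkage, cutting the dendrogram at height h is the same as taking connected components of the graph that links every pair of points at distance ≤ h:

```python
from itertools import combinations

def single_linkage_cut(points, height):
    """Clusters from cutting a single-linkage dendrogram at `height` (1-D points)."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent pointers to the root, halving the path as we go
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair closer than the cut; chains of merges form the clusters
    for i, j in combinations(range(len(points)), 2):
        if abs(points[i] - points[j]) <= height:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(points[i])
    return sorted(groups.values())

data = [1.0, 1.2, 1.9, 5.0, 5.3, 9.0]
print(single_linkage_cut(data, 0.5))  # [[1.0, 1.2], [1.9], [5.0, 5.3], [9.0]]
print(single_linkage_cut(data, 1.0))  # [[1.0, 1.2, 1.9], [5.0, 5.3], [9.0]]
print(single_linkage_cut(data, 4.0))  # [[1.0, 1.2, 1.9, 5.0, 5.3, 9.0]]
```

This only illustrates what a cut does; it says nothing about how to choose the height, which is what the question asks.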

73 votes, 11 answers

Having a job in data-mining without a PhD

I've been very interested in data-mining and machine-learning for a while, partly because I majored in that area at school, but also because I am truly much more excited trying to solve problems that require a bit more thought than just programming…

machine-learning data-mining careers phd

asked May 01 '12 at 23:39

Charles Menguy

73 votes, 3 answers

One-hot vs dummy encoding in Scikit-learn

There are two different ways to encode categorical variables. Say, one categorical variable has n values. One-hot encoding converts it into n variables, while dummy encoding converts it into n-1 variables. If we have k categorical variables, each…

regression categorical-data data-transformation scikit-learn data-preprocessing

asked Jul 16 '16 at 04:26

Munichong

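The n versus n-1 distinction in the excerpt can be shown with a plain-Python sketch; the helper names `one_hot` and `dummy`, the colour data, and the choice of the first level as the reference are all assumptions for illustration (in recent scikit-learn versions, `OneHotEncoder(drop='first')` gives the n-1 variant):

```python
def one_hot(values, levels):
    """n indicator columns, one per level."""
    return [[1 if v == level else 0 for level in levels] for v in values]

def dummy(values, levels):
    """n-1 indicator columns: the first level is the reference (all zeros)."""
    return [row[1:] for row in one_hot(values, levels)]

colors = ["red", "green", "blue", "green"]
levels = ["blue", "green", "red"]   # n = 3 levels; "blue" is the reference

print(one_hot(colors, levels))  # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
print(dummy(colors, levels))    # [[0, 1], [1, 0], [0, 0], [1, 0]]
```

Every one-hot row sums to 1, so next to an intercept column the n indicators are perfectly collinear; dropping one level removes that redundancy, which is why regression usually uses the n-1 form.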

73 votes, 10 answers

What is the difference between discrete data and continuous data?

continuous-data discrete-data

asked Jul 20 '10 at 03:53

Albort

73 votes, 15 answers

Complete substantive examples of reproducible research using R

The Question: Are there any good examples of reproducible research using R that are freely available online? Ideal Example: Specifically, ideal examples would provide: The raw data (and ideally meta data explaining the data), All R code including…

r references reproducible-research

asked Aug 21 '10 at 04:58

Jeromy Anglim

73 votes, 6 answers

Optimization when Cost Function Slow to Evaluate

Gradient descent and many other methods are useful for finding local minima in cost functions. They can be efficient when the cost function can be evaluated quickly at each point, whether numerically or analytically. I have what appears to me to…

gradient-descent optimization bayesian-optimization

asked Jan 31 '16 at 04:04

Jared Becksfort

73 votes, 15 answers

Why would parametric statistics ever be preferred over nonparametric?

Can someone explain to me why anyone would choose a parametric over a nonparametric statistical method for hypothesis testing or regression analysis? In my mind, it's like going rafting and choosing a non-water-resistant watch, because you may…

regression hypothesis-testing mathematical-statistics estimation nonparametric

asked Jul 30 '15 at 11:48

en1

73 votes, 4 answers

A psychology journal banned p-values and confidence intervals; is it indeed wise to stop using them?

On 25 February 2015, the journal Basic and Applied Social Psychology issued an editorial banning $p$-values and confidence intervals from all future papers. Specifically, they say (formatting and emphasis are mine): [...] prior to publication,…

hypothesis-testing confidence-interval p-value effect-size psychology

asked Feb 25 '15 at 19:01

amoeba

73 votes, 3 answers

How to use Pearson correlation correctly with time series

I have 2 time-series (both smooth) that I would like to cross-correlate to see how correlated they are. I intend to use the Pearson correlation coefficient. Is this appropriate? My second question is that I can choose to sample the 2 time-series as…

time-series correlation pearson-r smoothing

asked Jan 12 '15 at 20:59

user1551817

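One reason the question is subtle: Pearson's r on raw levels is dominated by any trend the two series share. A minimal sketch with made-up, deterministic series (hypothetical data, not from the question): both rise together, but their period-to-period changes move in exactly opposite directions. Correlating first differences is shown only as one common check, not as a full answer:

```python
from math import sqrt

def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def diff(s):
    """First differences s[t+1] - s[t]."""
    return [b - a for a, b in zip(s, s[1:])]

# Shared linear trend plus opposite period-2 wiggles
t = range(20)
x = [k + (-1) ** k for k in t]
y = [k - (-1) ** k for k in t]

print(round(pearson(x, y), 3))              # 0.942  (levels: driven by the trend)
print(round(pearson(diff(x), diff(y)), 3))  # -1.0   (changes move in opposition)
```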

73 votes, 5 answers

Covariance and independence?

I read in my textbook that $\text{cov}(X,Y)=0$ does not guarantee that $X$ and $Y$ are independent. But if they are independent, their covariance must be 0. I could not think of a proper example yet; could someone provide one?

independence covariance

asked Jul 09 '11 at 19:47

Flying pig

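The textbook claim can be checked with a standard counterexample, chosen here for illustration: $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$. A small sketch using exact `fractions.Fraction` arithmetic (the names `dist`, `E`, `px`, `py` are just local helpers) shows zero covariance even though $Y$ is a function of $X$:

```python
from fractions import Fraction

# X uniform on {-1, 0, 1}; Y = X**2 is completely determined by X
dist = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}  # joint pmf of (X, Y)

def E(f):
    """Expectation of f(X, Y) under the joint pmf."""
    return sum(prob * f(x, y) for (x, y), prob in dist.items())

cov = E(lambda x, y: x * y) - E(lambda x, y: x) * E(lambda x, y: y)
print(cov)  # 0

# Independence would need P(X=x, Y=y) == P(X=x) * P(Y=y) for every pair
px = {x: sum(pr for (a, _), pr in dist.items() if a == x) for x in (-1, 0, 1)}
py = {y: sum(pr for (_, b), pr in dist.items() if b == y) for y in (0, 1)}
print(dist.get((0, 1), 0), px[0] * py[1])  # 0 2/9
```

Covariance only measures linear association; here $E[XY] = E[X^3] = 0$ by symmetry, while the dependence is purely nonlinear.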

72 votes, 9 answers

If A and B are correlated with C, why are A and B not necessarily correlated?

I know empirically that is the case. I have just developed models that run into this conundrum. I also suspect it is not necessarily a yes/no answer. I mean by that if both A and B are correlated with C, this may have some implication regarding…

correlation cross-correlation

asked Dec 25 '10 at 19:24

Sympa

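A minimal illustration, assuming the simplest case (not taken from the question): A and B are independent fair ±1 variables and C = A + B. Enumerating the four equally likely outcomes gives each of A and B a correlation of 1/√2 ≈ 0.707 with C, while A and B are uncorrelated:

```python
from itertools import product
from math import sqrt

# Four equally likely outcomes: A, B independent fair ±1 variables, C = A + B
outcomes = [(a, b, a + b) for a, b in product((-1, 1), repeat=2)]

def corr(i, j):
    """Population correlation between columns i and j of `outcomes`."""
    n = len(outcomes)
    u = [o[i] for o in outcomes]
    v = [o[j] for o in outcomes]
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v)) / n
    var_u = sum((p - mu) ** 2 for p in u) / n
    var_v = sum((q - mv) ** 2 for q in v) / n
    return cov / sqrt(var_u * var_v)

A, B, C = 0, 1, 2  # column indices
print(round(corr(A, C), 4), round(corr(B, C), 4), corr(A, B))  # 0.7071 0.7071 0.0
```

Correlations with C do constrain corr(A, B) once they are strong: positive semidefiniteness of the correlation matrix implies $\rho_{AB} \ge \rho_{AC}\rho_{BC} - \sqrt{(1-\rho_{AC}^2)(1-\rho_{BC}^2)}$. With $\rho_{AC} = \rho_{BC} = 1/\sqrt{2}$ that lower bound is exactly 0, which this example attains.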

72 votes, 3 answers

How to actually plot a sample tree from randomForest::getTree()?

Anyone got library or code suggestions on how to actually plot a couple of sample trees from: getTree(rfobj, k, labelVar=TRUE) (Yes I know you're not supposed to do this operationally, RF is a blackbox, etc etc. I want to visually sanity-check a…

r data-visualization random-forest cart

asked Oct 29 '12 at 19:43

smci
