Most Popular
1500 questions
74
votes
16 answers
Practical thoughts on explanatory vs. predictive modeling
Back in April, I attended a talk at the UMD Math Department Statistics group seminar series called "To Explain or To Predict?". The talk was given by Prof. Galit Shmueli, who teaches at UMD's Smith Business School. Her talk was based on research she…

wahalulu
- 171
- 1
- 3
- 7
74
votes
2 answers
What is the difference between ZCA whitening and PCA whitening?
I am confused about ZCA whitening and normal whitening (which is obtained by dividing principal components by the square roots of PCA eigenvalues). As far as I know,
$$\mathbf x_\mathrm{ZCAwhite} = \mathbf U \mathbf x_\mathrm{PCAwhite},$$ where…

RockTheStar
- 11,277
- 31
- 63
- 89
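For reference, the relation quoted in the excerpt can be written out as a short sketch. It assumes (the excerpt is truncated before it says so) that $\mathbf x$ is centered and that the covariance matrix has the eigendecomposition $\mathbf\Sigma = \mathbf U \mathbf\Lambda \mathbf U^\top$, with $\mathbf U$ orthogonal and $\mathbf\Lambda$ the diagonal matrix of eigenvalues:

$$\mathbf x_\mathrm{PCAwhite} = \mathbf\Lambda^{-1/2} \mathbf U^\top \mathbf x, \qquad \mathbf x_\mathrm{ZCAwhite} = \mathbf U \mathbf\Lambda^{-1/2} \mathbf U^\top \mathbf x = \mathbf U \mathbf x_\mathrm{PCAwhite}.$$

Under these assumptions both outputs have identity covariance, since $\mathbf\Lambda^{-1/2} \mathbf U^\top \mathbf\Sigma \mathbf U \mathbf\Lambda^{-1/2} = \mathbf I$ and $\mathbf U \mathbf I \mathbf U^\top = \mathbf I$; the two differ only by the rotation $\mathbf U$ back into the original coordinates.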
73
votes
15 answers
Good GUI for R suitable for a beginner wanting to learn programming in R?
Is there any GUI for R that makes it easier for a beginner to start learning and programming in that language?

mariana soffer
- 1,091
- 2
- 15
- 18
73
votes
7 answers
Where to cut a dendrogram?
Hierarchical clustering can be represented by a dendrogram. Cutting a dendrogram at a certain level gives a set of clusters. Cutting at another level gives another set of clusters. How would you pick where to cut the dendrogram? Is there something…

Eduardas
- 2,239
- 4
- 23
- 22
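The idea that different cut heights give different clusterings can be illustrated with a minimal sketch. It assumes single linkage on one-dimensional points, a special case chosen here only because cutting the dendrogram at a height then reduces to splitting the sorted points wherever a gap exceeds that height. The function name and the data are hypothetical, and the sketch does not answer where the cut should be made.

```python
def cut_single_linkage_1d(points, height):
    """Clusters from cutting a single-linkage dendrogram of 1-D points at `height`.

    Assumption: single linkage in one dimension, where two neighbouring
    points end up in the same cluster exactly when their gap is <= height.
    """
    ordered = sorted(points)
    if not ordered:
        return []
    clusters = [[ordered[0]]]
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous <= height:
            clusters[-1].append(current)   # gap small enough: same cluster
        else:
            clusters.append([current])     # gap too large: start a new cluster
    return clusters


points = [10, 12, 15, 50, 53, 90]          # illustrative data
low_cut = cut_single_linkage_1d(points, 5)   # three clusters
high_cut = cut_single_linkage_1d(points, 36) # two clusters
```

Cutting low keeps the three tight groups apart, while the higher cut merges the first two groups and leaves only the outlying point on its own.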
73
votes
11 answers
Having a job in data-mining without a PhD
I've been very interested in data-mining and machine-learning for a while, partly because I majored in that area at school, but also because I am much more excited by problems that require a bit more thought than just programming…

Charles Menguy
- 2,277
- 4
- 15
- 16
73
votes
3 answers
One-hot vs dummy encoding in Scikit-learn
There are two different ways to encode categorical variables. Say one categorical variable has n values. One-hot encoding converts it into n variables, while dummy encoding converts it into n-1 variables. If we have k categorical variables, each…

Munichong
- 1,645
- 3
- 15
- 26
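The n versus n-1 column count described in the excerpt can be seen in a small plain-Python sketch. This is not Scikit-learn's API; the helper names `one_hot` and `dummy` are hypothetical, and dropping the alphabetically first level as the reference category is an arbitrary choice made for illustration.

```python
def one_hot(values):
    """Return (levels, rows) with one 0/1 column per distinct level."""
    levels = sorted(set(values))
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


def dummy(values):
    """Return (levels, rows) with the first level dropped as the reference."""
    kept = sorted(set(values))[1:]   # assumption: first level is the baseline
    rows = [[1 if v == level else 0 for level in kept] for v in values]
    return kept, rows


colors = ["red", "green", "blue", "green"]   # illustrative data, n = 3 levels
one_hot_levels, one_hot_rows = one_hot(colors)
dummy_levels, dummy_rows = dummy(colors)
```

With three levels, `one_hot` yields three columns and `dummy` yields two; the dropped level ("blue" here) is encoded as a row of zeros.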
73
votes
10 answers
What is the difference between discrete data and continuous data?
What is the difference between discrete data and continuous data?

Albort
- 881
- 1
- 9
- 10
73
votes
15 answers
Complete substantive examples of reproducible research using R
The Question: Are there any good examples of reproducible research using R that are freely available online?
Ideal Example:
Specifically, ideal examples would provide:
The raw data (and ideally meta data explaining the data),
All R code including…

Jeromy Anglim
- 42,044
- 23
- 146
- 250
73
votes
6 answers
Optimization when Cost Function Slow to Evaluate
Gradient descent and many other methods are useful for finding local minima in cost functions. They can be efficient when the cost function can be evaluated quickly at each point, whether numerically or analytically.
I have what appears to me to…

Jared Becksfort
- 943
- 1
- 7
- 12
73
votes
15 answers
Why would parametric statistics ever be preferred over nonparametric?
Can someone explain to me why anyone would choose a parametric over a nonparametric statistical method for hypothesis testing or regression analysis?
In my mind, it's like going rafting and choosing a watch that is not water-resistant, because you may…

en1
- 877
- 1
- 7
- 9
73
votes
4 answers
A psychology journal banned p-values and confidence intervals; is it indeed wise to stop using them?
On 25 February 2015, the journal Basic and Applied Social Psychology issued an editorial banning $p$-values and confidence intervals from all future papers.
Specifically, they say (formatting and emphasis are mine):
[...] prior to publication,…

amoeba
- 93,463
- 28
- 275
- 317
73
votes
3 answers
How to use Pearson correlation correctly with time series
I have 2 time-series (both smooth) that I would like to cross-correlate to see how correlated they are.
I intend to use the Pearson correlation coefficient. Is this appropriate?
My second question is that I can choose to sample the 2 time-series as…

user1551817
- 1,007
- 1
- 8
- 11
73
votes
5 answers
Covariance and independence?
I read in my textbook that $\text{cov}(X,Y)=0$ does not guarantee that $X$ and $Y$ are independent. But if they are independent, their covariance must be 0. I have not been able to think of a proper example yet; could someone provide one?

Flying pig
- 5,689
- 11
- 32
- 31
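One standard example of the kind asked for above, sketched here with exact arithmetic: take $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$. The variable names are illustrative, and this is only one of many possible constructions.

```python
from fractions import Fraction

# X is uniform on {-1, 0, 1}; Y = X**2 is a deterministic function of X.
p = Fraction(1, 3)
pairs = [(x, x * x) for x in (-1, 0, 1)]

e_x = sum(p * x for x, _ in pairs)        # E[X]
e_y = sum(p * y for _, y in pairs)        # E[Y]
e_xy = sum(p * x * y for x, y in pairs)   # E[XY] = E[X**3]
covariance = e_xy - e_x * e_y

# Independence would require P(X=0, Y=0) == P(X=0) * P(Y=0).
p_joint = sum(p for x, y in pairs if x == 0 and y == 0)
p_x0 = sum(p for x, _ in pairs if x == 0)
p_y0 = sum(p for _, y in pairs if y == 0)
independent_at_zero = p_joint == p_x0 * p_y0
```

Here the covariance is exactly zero, yet the joint probability at $(0, 0)$ is $1/3$ while the product of the marginals is $1/9$, so $X$ and $Y$ are not independent.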
72
votes
9 answers
If A and B are correlated with C, why are A and B not necessarily correlated?
I know empirically that this is the case. I have just developed models that run into this conundrum. I also suspect it is not necessarily a yes/no answer. By that I mean that if both A and B are correlated with C, this may have some implication regarding…

Sympa
- 6,862
- 3
- 30
- 56
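A small constructed case shows how the situation in the title can arise. The sketch assumes a hypothetical setup in which A and B are uncorrelated by design and C is defined as their sum; the `pearson` helper is written out by hand rather than taken from a library.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cross = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    spread_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    spread_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    return cross / (spread_u * spread_v)


# All four sign combinations once each, so A and B are exactly uncorrelated.
a = [-1, -1, 1, 1]
b = [-1, 1, -1, 1]
c = [x + y for x, y in zip(a, b)]   # assumption: C is built as A + B

r_ab = pearson(a, b)
r_ac = pearson(a, c)
r_bc = pearson(b, c)
```

In this construction both A and B have correlation $1/\sqrt{2} \approx 0.71$ with C, while their correlation with each other is zero.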
72
votes
3 answers
How to actually plot a sample tree from randomForest::getTree()?
Anyone got library or code suggestions on how to actually plot a couple of sample trees from:
getTree(rfobj, k, labelVar=TRUE)
(Yes I know you're not supposed to do this operationally, RF is a blackbox, etc etc. I want to visually sanity-check a…

smci
- 1,456
- 1
- 13
- 20