Most Popular

1500 questions
70 votes, 3 answers

Proper way of using recurrent neural network for time series analysis

Recurrent neural networks differ from "regular" ones in that they have a "memory" layer. Because of this layer, recurrent NNs are supposed to be useful in time series modelling. However, I'm not sure I understand correctly how to use…
Boris Gorelik
70 votes, 6 answers

Do the predictions of a Random Forest model have a prediction interval?

If I run a randomForest model, I can then make predictions based on the model. Is there a way to get a prediction interval for each of the predictions, so that I know how "sure" the model is of its answer? If this is possible, is it simply based on…
Dean MacGregor
70 votes, 2 answers

What is the difference between a neural network and a deep belief network?

I am getting the impression that when people refer to a 'deep belief' network, they basically mean a neural network that is very large. Is this correct, or does a deep belief network also imply that the algorithm itself is different (i.e., no…
70 votes, 5 answers

Why is the Jeffreys prior useful?

I understand that the Jeffreys prior is invariant under re-parameterization. However, what I don't understand is why this property is desired. Why wouldn't you want the prior to change under a change of variables?
tskuzzy
70 votes, 3 answers

When are Log scales appropriate?

I've read that using log scales when charting/graphing is appropriate in certain circumstances, like the y-axis in a time series chart. However, I've not been able to find a definitive explanation as to why that's the case, or when else it would be…
dav
70 votes, 3 answers

What's the difference between feed-forward and recurrent neural networks?

What is the difference between a feed-forward and recurrent neural network? Why would you use one over the other? Do other network topologies exist?
70 votes, 3 answers

How is the minimum of a set of IID random variables distributed?

If $X_1, ..., X_n$ are independent identically-distributed random variables, what can be said about the distribution of $\min(X_1, ..., X_n)$ in general?
Simon Nickerson
70 votes, 6 answers

Why is multicollinearity not checked in modern statistics/machine learning

In traditional statistics, while building a model, we check for multicollinearity using methods such as estimates of the variance inflation factor (VIF), but in machine learning, we instead use regularization for feature selection and don't seem to…
69 votes, 8 answers

What are good basic statistics to use for ordinal data?

I have some ordinal data gained from survey questions. In my case they are Likert style responses (Strongly Disagree-Disagree-Neutral-Agree-Strongly Agree). In my data they are coded as 1-5. I don't think means would mean much here, so what basic…
PaulHurleyuk
69 votes, 8 answers

Is PCA followed by a rotation (such as varimax) still PCA?

I have tried to reproduce some research (using PCA) from SPSS in R. In my experience, the principal() function from the psych package was the only function that came close (or, if my memory serves me right, dead on) to matching the output. To match the same…
Roman Luštrik
69 votes, 2 answers

What is the relationship between independent component analysis and factor analysis?

I am new to Independent Component Analysis (ICA) and have just a rudimentary understanding of the method. It seems to me that ICA is similar to Factor Analysis (FA) with one exception: ICA assumes that the observed random variables are a linear…
69 votes, 4 answers

Reduce Classification Probability Threshold

I have a question regarding classification in general. Let $f$ be a classifier that outputs a set of probabilities given some data $D$. Normally, one would say: well, if $P(c|D) > 0.5$, we will assign class 1, otherwise 0 (let this be a binary…
sdgaw erzswer
69 votes, 1 answer

What are the shortcomings of the Mean Absolute Percentage Error (MAPE)?

The Mean Absolute Percentage Error (MAPE) is a common accuracy or error measure for time series or other predictions, $$ \text{MAPE} = \frac{100}{n}\sum_{t=1}^n\frac{|A_t-F_t|}{A_t}\%,$$ where $A_t$ are actuals and $F_t$ corresponding forecasts or…
Stephan Kolassa
69 votes, 7 answers

What is a "saturated" model?

What is meant when we say we have a saturated model?
Graham Cookson
69 votes, 5 answers

What problem do shrinkage methods solve?

The holiday season has given me the opportunity to curl up next to the fire with The Elements of Statistical Learning. Coming from a (frequentist) econometrics perspective, I'm having trouble grasping the uses of shrinkage methods like ridge…
Charlie