Most Popular

1500 questions
40
votes
2 answers

What is meant by the $\sigma$-algebra generated by a random variable?

Often, in the course of my (self-)study of statistics, I've met the terminology "$\sigma$-algebra generated by a random variable". I don't understand the definition on Wikipedia, but most importantly I don't get the intuition behind it. Why/when do…
DeltaIV
  • 15,894
  • 4
  • 62
  • 104
40
votes
3 answers

Why are Decision Trees not computationally expensive?

In An Introduction to Statistical Learning with Applications in R, the authors write that fitting a decision tree is very fast, but this doesn't make sense to me. The algorithm has to go through every feature and partition it in every way possible…
matt_js
  • 451
  • 4
  • 6
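The worry in the excerpt above is that every feature must be partitioned "in every way possible". As a minimal sketch (not the implementation used in the book, and with a hypothetical helper name `best_split`), one common approach for a numeric feature and a squared-error criterion is to sort the values once and scan the candidate thresholds with running sums, so each feature costs one sort plus a linear pass rather than an exhaustive search:

```python
def best_split(xs, ys):
    """Return (threshold, sse) for the best binary split of one numeric feature.

    Assumes a regression tree with a squared-error criterion; `best_split`
    is a hypothetical helper written for illustration only.
    """
    pairs = sorted(zip(xs, ys))          # one O(n log n) sort per feature
    n = len(pairs)
    total_sum = sum(y for _, y in pairs)
    total_sq = sum(y * y for _, y in pairs)

    left_sum = 0.0
    left_sq = 0.0
    best = (None, float("inf"))
    for i in range(1, n):                # only n - 1 candidate thresholds
        x_prev, y_prev = pairs[i - 1]
        left_sum += y_prev
        left_sq += y_prev * y_prev
        if pairs[i][0] == x_prev:
            continue                     # no threshold fits between equal values
        right_sum = total_sum - left_sum
        right_sq = total_sq - left_sq
        # Sum of squared errors around each side's mean, from running sums.
        sse = (left_sq - left_sum ** 2 / i) + (right_sq - right_sum ** 2 / (n - i))
        if sse < best[1]:
            best = ((x_prev + pairs[i][0]) / 2, sse)
    return best


# Made-up data with an obvious break between x = 3 and x = 10.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
threshold, sse = best_split(xs, ys)
```

Repeating this scan over every feature at every node is still greedy: the tree never revisits earlier splits, which is a large part of why fitting stays cheap.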
40
votes
3 answers

Significance contradiction in linear regression: significant t-test for a coefficient vs non-significant overall F-statistic

I'm fitting a multiple linear regression model between 4 categorical variables (with 4 levels each) and a numerical output. My dataset has 43 observations. Regression gives me the following $p$-values from the $t$-test for every slope coefficient:…
40
votes
3 answers

Why does the law of large numbers not apply in the case of Apple's share price?

Here is an article in the NY Times called "Apple confronts the law of large numbers". It tries to explain the rise in Apple's share price using the law of large numbers. What statistical (or mathematical) errors does this article make?
40
votes
6 answers

Effect size as the hypothesis for significance testing

Today, at the Cross Validated Journal Club (why weren't you there?), @mbq asked: Do you think we (modern data scientists) know what significance means? And how it relates to our confidence in our results? @Michelle replied as some (including me)…
Carlos Accioly
  • 4,715
  • 4
  • 25
  • 28
40
votes
3 answers

Why do naive Bayesian classifiers perform so well?

Naive Bayes classifiers are a popular choice for classification problems. There are many reasons for this, including: "Zeitgeist" (widespread awareness after the success of spam filters about ten years ago); easy to write; the classifier model is…
winwaed
  • 1,103
  • 1
  • 9
  • 11
40
votes
2 answers

How to draw valid conclusions from "big data"?

"Big data" is everywhere in the media. Everybody says that "big data" is the big thing for 2012, e.g. KDNuggets poll on hot topics for 2012. However, I have deep concerns here. With big data, everybody seems to be happy just to get anything out. But…
Has QUIT--Anony-Mousse
  • 39,639
  • 7
  • 61
  • 96
40
votes
1 answer

Step-by-step example of reverse-mode automatic differentiation

Not sure if this question belongs here, but it's closely related to gradient methods in optimization, which seems to be on-topic here. Anyway, feel free to migrate if you think some other community has better expertise in the topic. In short, I'm…
ffriend
  • 9,380
  • 5
  • 24
  • 29
40
votes
14 answers

Regression to the mean vs gambler's fallacy

On the one hand, I have the regression to the mean and on the other hand I have the gambler's fallacy. Gambler's fallacy is defined by Miller and Sanjurjo (2019) as "the mistaken belief that random sequences have a systematic tendency towards…
Luis P.
  • 731
  • 1
  • 5
  • 12
40
votes
1 answer

Training loss goes down and up again. What is happening?

My training loss goes down and then up again. It is very weird. The cross-validation loss tracks the training loss. What is going on? I have two stacked LSTMs as follows (in Keras): model = Sequential() model.add(LSTM(512, return_sequences=True,…
patapouf_ai
  • 503
  • 1
  • 5
  • 7
40
votes
8 answers

Under what conditions should one use multilevel/hierarchical analysis?

Under which conditions should someone consider using multilevel/hierarchical analysis as opposed to more basic/traditional analyses (e.g., ANOVA, OLS regression, etc.)? Are there any situations in which this could be considered mandatory? Are there…
Patrick
  • 723
  • 1
  • 8
  • 12
40
votes
1 answer

When is nested cross-validation really needed and can make a practical difference?

When using cross-validation to do model selection (e.g. hyperparameter tuning) and to assess the performance of the best model, one should use nested cross-validation. The outer loop is to assess the performance of the model, and the inner…
amoeba
  • 93,463
  • 28
  • 275
  • 317
40
votes
4 answers

What are the advantages of stacking multiple LSTMs?

What are the advantages of using multiple LSTMs, stacked side by side, in a deep network? I am using an LSTM to represent a sequence of inputs as a single input. So once I have that single representation, why would I pass it through…
40
votes
2 answers

Variance of product of dependent variables

What is the formula for the variance of the product of dependent variables? In the case of independent variables the formula is simple: $$ {\rm var}(XY) = E(X^{2}Y^{2}) - E(XY)^{2} = {\rm var}(X){\rm var}(Y) + {\rm var}(X)E(Y)^2 + {\rm var}(Y)E(X)^2 $$ But…
Riga
  • 103
  • 1
  • 5
  • 6
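The independent-case identity quoted in the excerpt above can be checked exactly on small discrete distributions. The sketch below is only an illustration: the two distributions and the helper names (`moments`, `product_variance_direct`, `product_variance_formula`) are made up for the example, and it says nothing about the dependent case the question asks about.

```python
from fractions import Fraction
from itertools import product


def moments(dist):
    """Return (mean, variance) of a discrete distribution {value: probability}."""
    mean = sum(p * v for v, p in dist.items())
    var = sum(p * (v - mean) ** 2 for v, p in dist.items())
    return mean, var


def product_variance_direct(dist_x, dist_y):
    """var(XY) = E(X^2 Y^2) - E(XY)^2, summed over the joint of independent X, Y."""
    joint = list(product(dist_x.items(), dist_y.items()))
    e_xy = sum(px * py * x * y for (x, px), (y, py) in joint)
    e_x2y2 = sum(px * py * (x * y) ** 2 for (x, px), (y, py) in joint)
    return e_x2y2 - e_xy ** 2


def product_variance_formula(dist_x, dist_y):
    """var(X)var(Y) + var(X)E(Y)^2 + var(Y)E(X)^2, valid for independent X, Y."""
    mean_x, var_x = moments(dist_x)
    mean_y, var_y = moments(dist_y)
    return var_x * var_y + var_x * mean_y ** 2 + var_y * mean_x ** 2


# Hypothetical example distributions; exact fractions avoid rounding error.
dist_x = {1: Fraction(1, 2), 3: Fraction(1, 2)}
dist_y = {0: Fraction(1, 4), 2: Fraction(3, 4)}

direct = product_variance_direct(dist_x, dist_y)
formula = product_variance_formula(dist_x, dist_y)
```

For these two distributions both routes give the same value, as the identity requires.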
40
votes
7 answers

Combining probabilities/information from different sources

Let's say I have three independent sources and each of them makes predictions for the weather tomorrow. The first one says that the probability of rain tomorrow is 0, then the second one says that the probability is 1, and finally the last one says…