Highest Voted Questions - Statistical Analysis Stack Exchange

190

votes

5 answers

How exactly does one “control for other variables”?

Here is the article that motivated this question: Does impatience make us fat? I liked this article, and it nicely demonstrates the concept of “controlling for other variables” (IQ, career, income, age, etc) in order to best isolate the true…

regression causality confounding controlling-for-a-variable statistics-in-media

asked Oct 20 '11 at 20:52

JackOfAll


188

votes

9 answers

How to summarize data by group in R?

I have an R data frame like this:

           age group
    1  23.0883     1
    2  25.8344     1
    3  29.4648     1
    4  32.7858     2
    5  33.6372     1
    6  34.9350     1
    7  35.2115     2
    8  35.2115     2
    9  35.2115     2
    10 36.7803     1
    ...

I need to get…

r data-transformation

asked Mar 13 '11 at 12:02

Yuriy Petrovskiy


188

votes

2 answers

How do I get the number of rows of a data.frame in R?

After reading a dataset: dataset <- read.csv("forR.csv") How can I get R to give me the number of cases it contains? Also, will the returned value include or exclude cases omitted with na.omit(dataset)?

r

asked Dec 08 '10 at 12:16

Tom Wright


187

votes

10 answers

Why the sudden fascination with tensors?

I've noticed lately that a lot of people are developing tensor equivalents of many methods (tensor factorization, tensor kernels, tensors for topic modeling, etc.). I'm wondering, why is the world suddenly fascinated with tensors? Are there recent…

machine-learning references matrix linear-algebra tensor

asked Feb 23 '16 at 09:38

Y. S.


186

votes

8 answers

What intuitive explanation is there for the central limit theorem?

In several different contexts we invoke the central limit theorem to justify whatever statistical method we want to adopt (e.g., approximate the binomial distribution by a normal distribution). I understand the technical details as to why the…

intuition central-limit-theorem

asked Oct 19 '10 at 02:14

user28
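
One way to build intuition for the question above is a small simulation. The sketch below is only an illustration under assumed settings (uniform draws on [0, 1), a sample size of 30, 5000 repetitions, a fixed seed); none of these choices come from the question itself. It checks that sample means cluster around the population mean with a spread close to sigma / sqrt(n), and that roughly the normal share of them falls within one predicted standard deviation.

```python
import math
import random
import statistics

random.seed(0)        # fixed seed so the run is repeatable (arbitrary choice)
SAMPLE_SIZE = 30      # assumed sample size n
N_TRIALS = 5000       # assumed number of repeated samples

# Draw N_TRIALS samples from a clearly non-normal (uniform) distribution
# and record the mean of each sample.
sample_means = [
    statistics.fmean([random.random() for _ in range(SAMPLE_SIZE)])
    for _ in range(N_TRIALS)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# Uniform(0, 1) has mean 0.5 and standard deviation sqrt(1/12);
# the CLT predicts the sample mean has spread sigma / sqrt(n).
predicted_sd = math.sqrt(1 / 12) / math.sqrt(SAMPLE_SIZE)

# Share of sample means within one predicted standard deviation of 0.5;
# for a normal distribution this share is about 68%.
share_within_one_sd = sum(
    abs(m - 0.5) <= predicted_sd for m in sample_means
) / N_TRIALS

print(mean_of_means, sd_of_means, predicted_sd, share_within_one_sd)
```

Swapping the uniform draws for another distribution with finite variance should give a similar picture, which is the point of the theorem.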

183

votes

2 answers

How to determine which distribution fits my data best?

I have a dataset and would like to figure out which distribution fits my data best. I used the fitdistr() function to estimate the necessary parameters to describe the assumed distribution (i.e. Weibull, Cauchy, Normal). Using those parameters I…

r distributions goodness-of-fit kolmogorov-smirnov-test distribution-identification

asked Jan 08 '15 at 09:37

tobibo


182

votes

4 answers

Why do we need sigma-algebras to define probability spaces?

We have a random experiment with different outcomes forming the sample space $\Omega,$ on which we look with interest at certain patterns, called events $\mathscr{F}.$ Sigma-algebras (or sigma-fields) are made up of events to which a probability…

probability intuition measure-theory sigma-algebra

asked Mar 01 '16 at 09:44

Antoni Parellada


178

votes

78 answers

Statistics Jokes

Well, we've got favourite statistics quotes. What about statistics jokes?

references humor

asked Aug 06 '10 at 01:53

Thylacoleo


178

votes

5 answers

Training on the full dataset after cross-validation?

TL;DR: Is it ever a good idea to train an ML model on all the data available before shipping it to production? Put another way, is it ever ok to train on all data available and not check if the model overfits, or get a final read of the expected…

machine-learning cross-validation model-selection

asked Jun 05 '11 at 16:50

Amelio Vazquez-Reina


177

votes

8 answers

What does 1x1 convolution mean in a neural network?

I am currently doing the Udacity Deep Learning Tutorial. In Lesson 3, they talk about a 1x1 convolution. This 1x1 convolution is used in Google Inception Module. I'm having trouble understanding what is a 1x1 convolution. I have also seen this post…

neural-networks deep-learning convolution conv-neural-network

asked Feb 05 '16 at 03:33

jkschin

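
A 1x1 convolution can be read as the same small linear map applied independently at every pixel, mixing channels without looking at neighbouring positions. The following is a minimal pure-Python sketch of that reading, not the Udacity or Inception code; the function name `conv1x1`, the nested-list layout and the toy numbers are all assumptions made for illustration, and bias and activation are left out.

```python
def conv1x1(image, weights):
    """Apply a 1x1 convolution (no bias, no activation).

    image:   nested lists indexed as image[row][col][c_in]
    weights: nested lists indexed as weights[c_in][c_out]
    returns: nested lists indexed as out[row][col][c_out]
    """
    n_out = len(weights[0])
    return [
        [
            # Each output channel is a weighted sum over the input channels
            # of this single pixel; no neighbouring pixels are involved.
            [
                sum(pixel[c] * weights[c][o] for c in range(len(pixel)))
                for o in range(n_out)
            ]
            for pixel in row
        ]
        for row in image
    ]


# Toy 2x2 image with 3 input channels (hypothetical values).
image = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [0, 1, 0]],
]
# Hypothetical 3 -> 2 channel mixing weights.
weights = [
    [1, 0],
    [0, 1],
    [1, 1],
]

output = conv1x1(image, weights)
print(output)  # spatial size stays 2x2, channel count drops from 3 to 2
```

The spatial dimensions are unchanged while the channel count goes from 3 to 2, which is why the operation is often used to reduce the number of channels cheaply.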

176

votes

8 answers

What is the influence of C in SVMs with linear kernel?

I am currently using an SVM with a linear kernel to classify my data. There is no error on the training set. I tried several values for the parameter $C$ ($10^{-5}, \dots, 10^2$). This did not change the error on the test set. Now I wonder: is…

machine-learning svm libsvm

asked Jun 23 '12 at 19:54

alfa


174

votes

6 answers

Can a probability distribution value exceeding 1 be OK?

On the Wikipedia page about naive Bayes classifiers, there is this line: $p(\mathrm{height}|\mathrm{male}) = 1.5789$ (A probability distribution over 1 is OK. It is the area under the bell curve that is equal to 1.) How can a value $>1$ be OK? I…

probability distributions normal-distribution density-function faq

asked Nov 05 '10 at 01:25

babelproofreader

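
The distinction between a density value and a probability can be checked numerically. The sketch below assumes a normal density with mean 5.855 and variance 0.035033 evaluated at a height of 6; those parameters are recalled from the Wikipedia example and are not given in the excerpt, so treat them as assumptions. It shows the density exceeding 1 at that point while the area under the curve stays at about 1.

```python
import math


def normal_pdf(x, mean, variance):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(
        2 * math.pi * variance
    )


# Assumed parameters (recalled from the Wikipedia example, not from the excerpt).
MEAN = 5.855
VARIANCE = 0.035033

# A density is a height of the curve, not a probability, so it may exceed 1
# when the distribution is narrow.
density_at_6 = normal_pdf(6.0, MEAN, VARIANCE)


def midpoint_area(lo, hi, steps=100_000):
    """Approximate the area under the density on [lo, hi] by the midpoint rule."""
    width = (hi - lo) / steps
    return width * sum(
        normal_pdf(lo + (i + 0.5) * width, MEAN, VARIANCE) for i in range(steps)
    )


# Integrate over more than ten standard deviations either side of the mean.
area = midpoint_area(MEAN - 2.0, MEAN + 2.0)

print(density_at_6, area)
```

Probabilities come from integrating the density over an interval, and those integrals can never exceed the total area of 1.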

173

votes

4 answers

Choice of K in K-fold cross-validation

I've been using the $K$-fold cross-validation a few times now to evaluate performance of some learning algorithms, but I've always been puzzled as to how I should choose the value of $K$. I've often seen and used a value of $K = 10$, but this seems…

machine-learning classification cross-validation

asked May 04 '12 at 03:52

Charles Menguy

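
To make the role of $K$ concrete, here is a minimal sketch of how $K$-fold splitting partitions sample indices. The helper name `k_fold_indices` is hypothetical and the split is contiguous with no shuffling, which is a simplification; library implementations typically offer shuffling and stratification.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds.

    Folds are contiguous and unshuffled; the first n_samples % k folds
    receive one extra sample so that every sample is used exactly once
    for validation.
    """
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


splits = list(k_fold_indices(n_samples=10, k=3))
for train, validation in splits:
    print(len(train), validation)
```

Each model is trained on about $(K-1)/K$ of the data, so a larger $K$ means larger training sets but more model fits.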

173

votes

3 answers

How does Keras 'Embedding' layer work?

I need to understand how the 'Embedding' layer in the Keras library works. I execute the following code in Python:

    import numpy as np
    from keras.models import Sequential
    from keras.layers import Embedding
    model = Sequential()
    model.add(Embedding(5, 2,…

text-mining word-embeddings keras

asked Mar 29 '17 at 12:47

prashanth

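
Conceptually, an embedding layer behaves like a lookup table: each integer token id selects one row of a weight matrix. The sketch below imitates that with plain Python lists and does not call Keras. The table values are made up for illustration (a real layer starts from random weights and learns them), and the 5 x 2 shape mirrors the `Embedding(5, 2,…` call in the excerpt, where the first two arguments are the vocabulary size and the vector size.

```python
# Hypothetical 5 x 2 weight table: 5 possible token ids, 2 numbers per token.
EMBEDDING_TABLE = [
    [0.10, -0.20],   # vector for token id 0
    [0.03, 0.40],    # vector for token id 1
    [-0.50, 0.25],   # vector for token id 2
    [0.70, 0.00],    # vector for token id 3
    [-0.15, -0.35],  # vector for token id 4
]


def embed(token_ids):
    """Replace every integer token id by its row in the table."""
    return [EMBEDDING_TABLE[token_id] for token_id in token_ids]


sequence = [0, 3, 3, 1]       # a toy input sequence of token ids
vectors = embed(sequence)     # one 2-dimensional vector per token
print(vectors)
```

The same id always maps to the same vector, so the repeated id 3 yields two identical rows in the output.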

172

votes

7 answers

What's the difference between variance and standard deviation?

I was wondering what the difference between the variance and the standard deviation is. If you calculate the two values, it is clear that you get the standard deviation out of the variance, but what does that mean in terms of the distribution you…

variance mathematical-statistics standard-deviation

asked Aug 26 '12 at 12:31

Le Max

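
A small numeric example can make the relationship concrete. The data below are made up for illustration and treated as a whole population, so the population formulas from Python's `statistics` module are used; if the values were measured in centimetres, the variance would be in squared centimetres and the standard deviation in centimetres.

```python
import math
import statistics

# Made-up data, treated as a complete population.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = statistics.fmean(data)
variance = statistics.pvariance(data)   # mean squared deviation from the mean
std_dev = statistics.pstdev(data)       # square root of the variance

# The standard deviation is the square root of the variance, which puts
# it back on the same scale and in the same units as the data.
assert math.isclose(std_dev, math.sqrt(variance))

print(mean, variance, std_dev)
```

Because it is on the original scale, the standard deviation is the one usually quoted next to the mean, while the variance is more convenient in algebra (for example, variances of independent variables add).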
