Highest Voted Questions - Statistical Analysis Stack Exchange

55

votes

6 answers

Why on average does each bootstrap sample contain roughly two thirds of observations?

I have run across the assertion that each bootstrap sample (or bagged tree) will contain on average approximately $2/3$ of the observations. I understand that the chance of not being selected in any of $n$ draws from $n$ samples with replacement is…
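As a rough illustration of the "roughly two thirds" figure in the question title, the sketch below compares the exact expected fraction of distinct observations, $1 - (1 - 1/n)^n$, with a small simulation. The function names, the number of repetitions, and the seed are illustrative assumptions, not taken from the question or its answers.

```python
import math
import random


def expected_unique_fraction(n: int) -> float:
    """Exact expected fraction of distinct observations in one bootstrap sample.

    Each observation is missed by a single draw with probability 1 - 1/n,
    so it is missed by all n draws with probability (1 - 1/n) ** n.
    """
    return 1.0 - (1.0 - 1.0 / n) ** n


def simulated_unique_fraction(n: int, reps: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity (reps and seed are arbitrary)."""
    rng = random.Random(seed)
    total_unique = 0
    for _ in range(reps):
        # Draw n indices with replacement and count the distinct ones.
        sample = [rng.randrange(n) for _ in range(n)]
        total_unique += len(set(sample))
    return total_unique / (reps * n)


if __name__ == "__main__":
    n = 100
    print("exact:    ", expected_unique_fraction(n))
    print("simulated:", simulated_unique_fraction(n))
    print("limit:    ", 1.0 - math.exp(-1.0))  # 1 - 1/e as n grows
```

For large $n$ the exact value approaches $1 - 1/e \approx 0.632$, which is where the "about two thirds" shorthand comes from.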

bootstrap

asked Mar 06 '14 at 01:43

xyzzy

823
2
8
7

55

votes

4 answers

How to visualize a fitted multiple regression model?

I am currently writing a paper with several multiple regression analyses. While visualizing univariate linear regression is easy via scatter plots, I was wondering whether there is any good way to visualize multiple linear regressions? I am…

regression multiple-regression data-visualization reporting

asked Oct 20 '13 at 21:46

Shawn Wang

1,245
3
12
12

55

votes

5 answers

Using deep learning for time series prediction

I'm new to the area of deep learning, and my first step was to read interesting articles on the deeplearning.net site. In papers about deep learning, Hinton and others mostly talk about applying it to image problems. Can someone tell me whether it…

time-series machine-learning prediction deep-learning deep-belief-networks

asked Aug 29 '13 at 11:37

Vedran

651
1
6
4

55

votes

4 answers

How to identify a bimodal distribution?

I understand that once we plot the values as a chart, we can identify a bimodal distribution by observing the twin-peaks, but how does one find it programmatically? (I am looking for an algorithm.)

distributions

asked Jan 04 '11 at 13:03

venkasub

683
1
6
7

55

votes

11 answers

Is there a 1 in 20 or 1 in 400 chance of guessing the outcome of a d20 roll before it happens?

My friends are in a bit of an argument over Dungeons & Dragons. My player managed to guess the outcome of a D20 roll before it happened, and my friend said that his chance of guessing the number was 1 in 20. Another friend argues that his chance of…
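One way to see where the two figures in the title can come from is to enumerate every (guess, roll) pair. The sketch below assumes a fair die and a guess made independently of the roll; the function names are hypothetical and the framing is an illustration, not a summary of the posted answers.

```python
from fractions import Fraction
from itertools import product


def match_probability(sides: int = 20) -> Fraction:
    """Probability that a guess equals the roll, counting all (guess, roll) pairs.

    Assumes a fair die and a guess chosen independently of the roll.
    """
    faces = range(1, sides + 1)
    pairs = list(product(faces, repeat=2))  # all (guess, roll) combinations
    matches = sum(1 for guess, roll in pairs if guess == roll)
    return Fraction(matches, len(pairs))


def specific_pair_probability(sides: int = 20) -> Fraction:
    """Probability of one pre-named pair, e.g. 'the guess is 7 AND the roll is 7',
    when the guess itself is also treated as a uniform random draw."""
    return Fraction(1, sides * sides)


if __name__ == "__main__":
    print(match_probability())          # chance that guess and roll agree
    print(specific_pair_probability())  # chance of one particular agreeing pair
```

Under these assumptions, 20 of the 400 equally likely pairs are matches, so the chance of a correct guess is 20/400; the smaller figure applies only to a single pair named in advance.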

probability dice

asked Nov 03 '21 at 17:18

Theguy Whatguys

633
2
7

55

votes

4 answers

How do we decide when a small sample is statistically significant or not?

Sorry if the title isn't clear, I'm not a statistician, and am not sure how to phrase this. I was looking at the global coronavirus statistics on worldometers, and sorted the table by cases per million population to get an idea of how different…

statistical-significance population

asked Oct 26 '20 at 20:11

Avrohom Yisroel

673
5
7

55

votes

19 answers

Mathematical Statistics Videos

A question previously sought recommendations for textbooks on mathematical statistics. Does anyone know of any good online video lectures on mathematical statistics? The closest that I've found are: Machine Learning, Econometrics. UPDATE: A number…

mathematical-statistics references

asked Jul 22 '10 at 10:08

Jeromy Anglim

42,044
23
146
250

55

votes

12 answers

Is the COVID-19 pandemic curve a Gaussian curve?

We've all heard a lot about "flattening the curve". I was wondering whether these curves, which look like bells, can be described as Gaussian despite the fact that there is a temporal dimension.

normal-distribution spatio-temporal epidemic-curve

asked Mar 22 '20 at 15:14

Samos

804
1
8
17

55

votes

8 answers

Is sampling relevant in the time of 'big data'?

Or, more to the point, will it be? Big Data makes statistics and relevant knowledge all the more important but seems to underplay sampling theory. I've seen the hype around 'Big Data' and can't help but wonder why I would want to analyze everything?…

sampling data-mining large-data

asked Sep 09 '12 at 19:58

PhD

13,429
19
45
47

55

votes

4 answers

Normalization vs. scaling

What is the difference between data 'Normalization' and data 'Scaling'? Until now I thought both terms referred to the same process, but now I realize there is something more that I don't know or understand. Also, if there is a difference between Normalization…
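Usage of these terms varies between fields, so the following is only a sketch of two transformations that are often meant by them: min-max rescaling to the interval [0, 1] and z-score standardization. Which term attaches to which operation is an assumption here, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot rescale constant data")
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Center values at 0 and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize constant data")
    return [(v - mu) / sigma for v in values]


if __name__ == "__main__":
    data = [2, 4, 6]
    print(min_max_scale(data))  # bounded range, shape of the data unchanged
    print(z_score(data))        # mean 0 and unit variance, range not bounded
```

Neither transformation makes the data normally distributed; both only shift and rescale the values.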

data-transformation scales normality-assumption normalization

asked Sep 03 '12 at 08:56

d.putto

901
2
10
13

55

votes

3 answers

Where does the misconception that Y must be normally distributed come from?

Seemingly reputable sources claim that the dependent variable must be normally distributed: Model assumptions: $Y$ is normally distributed, errors are normally distributed, $e_i \sim N(0,\sigma^2)$, and independent, and $X$ is fixed, and …

regression least-squares linear-model dependent-variable

asked Apr 25 '18 at 20:14

colorlace

1,010
11
25

55

votes

6 answers

How to determine best cutoff point and its confidence interval using ROC curve in R?

I have data from a test that could be used to distinguish normal and tumor cells. According to the ROC curve, it looks good for this purpose (area under the curve is 0.9). My questions are: How to determine the cutoff point for this test and its confidence…

r data-visualization confidence-interval roc ggplot2

asked Jun 03 '12 at 11:07

Yuriy Petrovskiy

4,081
7
25
30

55

votes

7 answers

Best PCA algorithm for huge number of features (>10K)?

I previously asked this on StackOverflow, but it seems like it might be more appropriate here, given that it didn't get any answers on SO. It's kind of at the intersection between statistics and programming. I need to write some code to do PCA…

pca algorithms model-evaluation high-dimensional

asked Sep 18 '10 at 02:08

dsimcha

7,375
7
32
29

55

votes

5 answers

Statistical inference when the sample "is" the population

Imagine you have to report on the number of candidates who take a given test each year. It seems rather difficult to generalize the observed % of success, for instance, to a wider population due to the specificity of the target population. So you may…

hypothesis-testing population sampling

asked Sep 13 '10 at 18:35

pbneau

1,161
4
13
17

55

votes

3 answers

How to select a clustering method? How to validate a cluster solution (to warrant the method choice)?

One of the biggest issues with cluster analysis is that we may have to draw different conclusions depending on the clustering method used (including different linkage methods in hierarchical clustering). I would like to know your…

clustering validation model-evaluation hierarchical-clustering

asked Feb 13 '16 at 23:19

Learner

789
1
7
16
