Highest Voted Questions - Statistical Analysis Stack Exchange

59 votes, 7 answers

Industry vs Kaggle challenges. Is collecting more observations and having access to more variables more important than fancy modelling?

I'd hope the title is self-explanatory. In Kaggle, most winners use stacking, sometimes with hundreds of base models, to squeeze out a few extra percent of MSE, accuracy, and so on. In general, in your experience, how important is fancy modelling such as stacking vs…

large-data stacking collecting-data kaggle

asked Jul 10 '18 at 12:42

Tom


59 votes, 5 answers

Best practice when analysing pre-post treatment-control designs

Imagine the following common design: 100 participants are randomly allocated to either a treatment or a control group; the dependent variable is numeric and measured pre- and post-treatment. Three obvious options for analysing such data are: Test…

anova ancova clinical-trials change-scores faq

asked Oct 10 '10 at 13:04

Jeromy Anglim

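A minimal sketch of two analyses often discussed for this design (the question's own list is truncated, so these are assumptions rather than its exact options): the difference in mean change scores, and an ANCOVA-style estimate that adjusts post-test means for pre-test means using a pooled within-group slope. The function names and toy data are hypothetical.

```python
from statistics import mean


def change_score_effect(pre_t, post_t, pre_c, post_c):
    """Treatment effect as the difference in mean (post - pre) change between groups."""
    change_t = mean(b - a for a, b in zip(pre_t, post_t))
    change_c = mean(b - a for a, b in zip(pre_c, post_c))
    return change_t - change_c


def ancova_effect(pre_t, post_t, pre_c, post_c):
    """Treatment effect from post = a + slope * pre + effect * treated (common slope).

    The slope is the pooled within-group regression slope of post on pre;
    the effect is the difference in post means adjusted for the pre means.
    """
    sxy = 0.0
    sxx = 0.0
    for pre, post in ((pre_t, post_t), (pre_c, post_c)):
        mx, my = mean(pre), mean(post)
        sxy += sum((x - mx) * (y - my) for x, y in zip(pre, post))
        sxx += sum((x - mx) ** 2 for x in pre)
    slope = sxy / sxx
    return (mean(post_t) - mean(post_c)) - slope * (mean(pre_t) - mean(pre_c))


if __name__ == "__main__":
    # Hypothetical noise-free data: post = 1 + 0.5 * pre + 3 * treated.
    pre_t, pre_c = [1.0, 2.0, 3.0], [2.0, 3.0, 4.0]
    post_t = [1 + 0.5 * x + 3 for x in pre_t]
    post_c = [1 + 0.5 * x for x in pre_c]
    print(change_score_effect(pre_t, post_t, pre_c, post_c))
    print(ancova_effect(pre_t, post_t, pre_c, post_c))
```

On this toy data the groups differ at baseline and the true pre-post slope is 0.5, so the change-score estimate is 3.5 while the ANCOVA estimate recovers the true effect of 3; when the estimated pooled slope is exactly 1, the two estimates coincide.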

59 votes, 7 answers

Are residuals "predicted minus actual" or "actual minus predicted"

I've seen "residuals" defined variously as being either "predicted minus actual values" or "actual minus predicted values". For illustration purposes, to show that both formulas are widely used, compare the following Web searches: residual…

residuals terminology error

asked Apr 24 '18 at 13:03

Tripartio

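A small sketch (hypothetical data and function names) of what the choice of convention does and does not change: the two definitions only flip the sign of each residual, so the sum of squared residuals, and therefore least-squares fits and RMSE, is identical either way, while anything sign-dependent, such as a residual plot, is mirrored.

```python
def residuals_actual_minus_predicted(actual, predicted):
    return [a - p for a, p in zip(actual, predicted)]


def residuals_predicted_minus_actual(actual, predicted):
    return [p - a for a, p in zip(actual, predicted)]


if __name__ == "__main__":
    actual = [3.0, 5.0, 7.5]     # hypothetical observations
    predicted = [2.5, 5.5, 7.0]  # hypothetical fitted values
    r1 = residuals_actual_minus_predicted(actual, predicted)
    r2 = residuals_predicted_minus_actual(actual, predicted)
    print(r1)  # [0.5, -0.5, 0.5]
    print(r2)  # [-0.5, 0.5, -0.5]
    # The sum of squared residuals does not depend on the convention.
    print(sum(r * r for r in r1) == sum(r * r for r in r2))  # True
```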

59 votes, 2 answers

How should one interpret the comparison of means from different sample sizes?

Take the case of book ratings on a website. Book A is rated by 10,000 people with an average rating of 4.25 and a standard deviation of $\sigma = 0.5$. Similarly, Book B is rated by 100 people and has an average rating of 4.5 with $\sigma = 0.25$. Now because of the…

t-test mean sample-size rating

asked Jun 29 '12 at 01:24

PhD

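A worked sketch of Welch's two-sample t statistic for the book figures, assuming each quoted $\sigma$ is the standard deviation of the individual ratings (an assumption; the question's wording is ambiguous). The function name is hypothetical; the formulas are the standard Welch statistic and Welch-Satterthwaite degrees of freedom.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic (B minus A) and Welch-Satterthwaite degrees of freedom."""
    var_a = sd_a ** 2 / n_a  # squared standard error of mean A
    var_b = sd_b ** 2 / n_b  # squared standard error of mean B
    t = (mean_b - mean_a) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df


if __name__ == "__main__":
    # Figures from the question, reading each sigma as a standard deviation.
    t, df = welch_t(4.25, 0.5, 10_000, 4.5, 0.25, 100)
    print(f"standard error A = {0.5 / math.sqrt(10_000):.4f}")
    print(f"standard error B = {0.25 / math.sqrt(100):.4f}")
    print(f"t = {t:.2f}, df = {df:.1f}")
```

Because the standard error of a mean scales like $\sigma/\sqrt{n}$, Book B's average (standard error 0.025) is five times less precise than Book A's (0.005), even though Book B's ratings are less spread out.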

59 votes, 5 answers

Which loss function is correct for logistic regression?

I have read about two versions of the loss function for logistic regression. Which of them is correct, and why? From Machine Learning, Zhou Z.H (in Chinese), with $\beta = (w, b)\text{ and }\beta^Tx=w^Tx +b$: $$l(\beta) =…

logistic loss-functions

asked Dec 11 '16 at 17:05

xtt

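A frequent source of this confusion is label coding. Assuming (the question's formulas are truncated, so this is an assumption) that one version uses labels $y \in \{0, 1\}$ and the other $t \in \{-1, +1\}$, the two losses agree: with $t = 2y - 1$ and score $z = \beta^T x$, $-yz + \ln(1 + e^{z}) = \ln(1 + e^{-tz})$. A sketch checking this numerically (function names are hypothetical):

```python
import math


def loss_01(y, z):
    """Negative log-likelihood for a label y in {0, 1} and linear score z = beta^T x."""
    # Fine for moderate z; math.exp overflows for very large z.
    return -y * z + math.log1p(math.exp(z))


def loss_pm1(t, z):
    """Logistic loss for a label t in {-1, +1} and the same linear score z."""
    return math.log1p(math.exp(-t * z))


if __name__ == "__main__":
    for z in (-2.0, -0.3, 0.0, 1.7):
        for y in (0, 1):
            t = 2 * y - 1  # map {0, 1} -> {-1, +1}
            print(y, z, loss_01(y, z), loss_pm1(t, z))
```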

59 votes, 7 answers

Binary classification with strongly unbalanced classes

I have a data set in the form of (features, binary output 0 or 1), but 1 happens pretty rarely, so just by always predicting 0, I get accuracy between 70% and 90% (depending on the particular data I look at). The ML methods give me about the same…

machine-learning classification binary-data unbalanced-classes

asked Sep 19 '16 at 18:39

LazyCat

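With rare positives, plain accuracy mostly reflects the class balance. A sketch with hypothetical data comparing accuracy and balanced accuracy (the mean of the per-class recalls) for the trivial always-0 predictor; balanced accuracy is one of several alternatives, not the only remedy.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of the recall on class 0 and the recall on class 1."""
    recalls = []
    for cls in (0, 1):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / 2


if __name__ == "__main__":
    y_true = [0] * 8 + [1] * 2       # hypothetical: 20% positives
    always_zero = [0] * len(y_true)  # the trivial majority-class predictor
    print(accuracy(y_true, always_zero))           # 0.8
    print(balanced_accuracy(y_true, always_zero))  # 0.5
```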

59 votes, 5 answers

Apply word embeddings to entire document, to get a feature vector

How do I use a word embedding to map a document to a feature vector, suitable for use with supervised learning? A word embedding maps each word $w$ to a vector $v \in \mathbb{R}^d$, where $d$ is some not-too-large number (e.g., 500). Popular word…

classification natural-language supervised-learning word2vec word-embeddings

asked Jul 01 '16 at 17:16

D.W.

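One simple and common baseline (not the only answer) is to average the vectors of the words in the document, which yields a fixed-length vector in $\mathbb{R}^d$ regardless of document length. A sketch with a hypothetical toy embedding table and function name:

```python
def average_embedding(tokens, embeddings, dim):
    """Mean of the vectors of in-vocabulary tokens; zero vector if none are known."""
    known = [embeddings[w] for w in tokens if w in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[k] for vec in known) / len(known) for k in range(dim)]


if __name__ == "__main__":
    # Hypothetical 3-dimensional embeddings; real ones typically have hundreds of dimensions.
    embeddings = {
        "good": [1.0, 0.0, 0.5],
        "movie": [0.0, 1.0, 0.5],
    }
    doc = ["a", "good", "movie"]  # "a" is out of vocabulary and skipped
    print(average_embedding(doc, embeddings, dim=3))  # [0.5, 0.5, 0.5]
```

Averaging discards word order; weighting each word vector (for example by TF-IDF) is a frequent refinement.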

59 votes, 1 answer

Bootstrap vs. jackknife

Both bootstrap and jackknife methods can be used to estimate the bias and standard error of an estimate, and the mechanisms of the two resampling methods are not hugely different: sampling with replacement vs. leaving out one observation at a time. However,…

r confidence-interval bootstrap jackknife

asked Jan 13 '12 at 03:09

Tu.2

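A sketch of the two resampling schemes the question contrasts, applied to the standard error of a sample mean (function names and data are hypothetical). For the mean, the jackknife reproduces the textbook $s/\sqrt{n}$ exactly, whereas the bootstrap estimate is random and, for the mean, centres near the plug-in value $\hat\sigma/\sqrt{n}$ with divisor $n$ in $\hat\sigma$.

```python
import math
import random
from statistics import mean


def jackknife_se(data, stat=mean):
    """Leave-one-out jackknife standard error of `stat`."""
    n = len(data)
    loo = [stat(data[:i] + data[i + 1:]) for i in range(n)]  # leave out one point
    loo_mean = mean(loo)
    return math.sqrt((n - 1) / n * sum((v - loo_mean) ** 2 for v in loo))


def bootstrap_se(data, stat=mean, n_boot=2000, seed=0):
    """Nonparametric bootstrap standard error of `stat` (resampling with replacement)."""
    rng = random.Random(seed)
    n = len(data)
    reps = [stat([data[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)]
    m = mean(reps)
    return math.sqrt(sum((v - m) ** 2 for v in reps) / (n_boot - 1))


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical sample
    print(jackknife_se(data))  # sqrt(0.5), i.e. s / sqrt(n) for the sample mean
    print(bootstrap_se(data))  # random, close to sqrt(0.4) for the sample mean
```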

59 votes, 10 answers

Measuring entropy/information/patterns of a 2D binary matrix

I want to measure the entropy/information density/pattern-likeness of a two-dimensional binary matrix. Let me show some pictures for clarification. This display should have a rather high entropy: [picture A] This should have medium entropy: [picture B] These…

algorithms binary-data entropy pattern-recognition information-theory

asked Oct 17 '11 at 12:39

Felix S

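One simple heuristic, offered as an assumption rather than a standard answer: count the $k \times k$ sliding-window patterns and take the Shannon entropy of their empirical distribution. Unlike the entropy of the raw 0/1 frequencies, this separates a regular checkerboard (half ones, but only two local patterns) from noise. The function name is hypothetical.

```python
import math
from collections import Counter


def block_entropy(matrix, k=2):
    """Shannon entropy (bits) of the k-by-k sliding-window patterns in a binary matrix."""
    rows, cols = len(matrix), len(matrix[0])
    patterns = Counter(
        tuple(tuple(matrix[r + i][c:c + k]) for i in range(k))
        for r in range(rows - k + 1)
        for c in range(cols - k + 1)
    )
    total = sum(patterns.values())
    return sum((n / total) * math.log2(total / n) for n in patterns.values())


if __name__ == "__main__":
    uniform = [[0] * 4 for _ in range(4)]
    checker = [[(r + c) % 2 for c in range(4)] for r in range(4)]
    print(block_entropy(uniform))  # 0.0: a single repeated pattern
    print(block_entropy(checker))  # about 0.99: two alternating patterns
```

For $k = 2$ the entropy is at most 4 bits (16 possible patterns), which gives a natural scale for comparing matrices.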

59 votes, 6 answers

Difference between "kernel" and "filter" in CNN

What is the difference between the terms "kernel" and "filter" in the context of convolutional neural networks?

neural-networks terminology deep-learning conv-neural-network

asked May 31 '15 at 06:19

ryguy


59 votes, 4 answers

Recurrent vs Recursive Neural Networks: Which is better for NLP?

There are Recurrent Neural Networks and Recursive Neural Networks. Both are usually denoted by the same acronym: RNN. According to Wikipedia, Recurrent NN are in fact Recursive NN, but I don't really understand the explanation. Moreover, I don't…

machine-learning neural-networks deep-learning natural-language

asked May 22 '15 at 17:50

crscardellino

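A toy sketch of the sense in which the Wikipedia statement holds: a recursive network applies one shared composition function bottom-up over a tree, and a recurrent network is that same idea restricted to a left-branching chain, folding in one new input per step. Scalars stand in for vectors, and the weights and function names are hypothetical.

```python
import math

W_LEFT, W_RIGHT = 0.8, -0.4  # hypothetical scalar weights shared across all nodes


def compose(left, right):
    """One shared composition step: combine two child states into a parent state."""
    return math.tanh(W_LEFT * left + W_RIGHT * right)


def recursive_net(tree):
    """Recursive NN: apply `compose` bottom-up over an arbitrary binary tree.

    A leaf is a number (a 1-dimensional stand-in for a word vector);
    an internal node is a (left, right) pair.
    """
    if isinstance(tree, tuple):
        return compose(recursive_net(tree[0]), recursive_net(tree[1]))
    return tree


def recurrent_net(sequence, h0=0.0):
    """Recurrent NN: apply the same `compose` step left to right over a sequence."""
    h = h0
    for x in sequence:
        h = compose(h, x)  # previous state on the left, new input on the right
    return h


if __name__ == "__main__":
    words = [0.5, -1.0, 2.0]
    # The recurrent net equals the recursive net on a left-branching chain.
    chain = (((0.0, 0.5), -1.0), 2.0)
    print(recurrent_net(words), recursive_net(chain))
    # A recursive net can also follow a parse tree, e.g. (w1, (w2, w3)).
    print(recursive_net((0.5, (-1.0, 2.0))))
```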

58 votes, 6 answers

How can a distribution have infinite mean and variance?

It would be appreciated if the following examples could be given: a distribution with infinite mean and infinite variance; a distribution with infinite mean and finite variance; a distribution with finite mean and infinite variance; a distribution…

distributions variance mean

asked Mar 27 '14 at 04:49

user1205901 - Reinstate Monica

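The Pareto family shows three of the listed combinations at once: with shape $\alpha$ and scale $x_m$, the mean is finite only for $\alpha > 1$ and the variance only for $\alpha > 2$. (A finite variance forces a finite mean, so no shape gives the remaining combination.) A sketch of the standard closed-form moments; the function names are hypothetical.

```python
import math


def pareto_mean(alpha, x_m=1.0):
    """Mean of a Pareto(alpha, x_m) distribution; infinite when alpha <= 1."""
    return alpha * x_m / (alpha - 1) if alpha > 1 else math.inf


def pareto_variance(alpha, x_m=1.0):
    """Variance of a Pareto(alpha, x_m); infinite (second moment diverges) when alpha <= 2."""
    if alpha <= 2:
        return math.inf
    return x_m ** 2 * alpha / ((alpha - 1) ** 2 * (alpha - 2))


if __name__ == "__main__":
    for alpha in (0.8, 1.5, 3.0):
        print(alpha, pareto_mean(alpha), pareto_variance(alpha))
    # alpha = 0.8: infinite mean and infinite variance
    # alpha = 1.5: finite mean (3.0), infinite variance
    # alpha = 3.0: finite mean (1.5) and finite variance (0.75)
```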

58 votes, 6 answers

Warning in R - Chi-squared approximation may be incorrect

I have data showing firefighter entrance exam results. I am testing the hypothesis that exam results and ethnicity are not mutually independent. To test this, I ran a Pearson chi-square test in R. The results show what I expected, but it gave a…

r categorical-data chi-squared-test small-sample error-message

asked Jan 07 '14 at 12:00

ferrelwill

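As far as I know, R's `chisq.test` issues this warning when at least one expected cell count is below 5, the usual rule of thumb for the chi-squared approximation. A small Python sketch (hypothetical data and function names) that computes the expected counts under independence so you can see which cells trigger it:

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]


def small_cells(table, threshold=5):
    """Positions (row, col) of cells whose expected count is below `threshold`."""
    expected = expected_counts(table)
    return [
        (i, j)
        for i, row in enumerate(expected)
        for j, e in enumerate(row)
        if e < threshold
    ]


if __name__ == "__main__":
    # Hypothetical pass/fail counts for two groups.
    table = [[10, 2],
             [3, 1]]
    print(expected_counts(table))  # [[9.75, 2.25], [3.25, 0.75]]
    print(small_cells(table))      # [(0, 1), (1, 0), (1, 1)]
```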

58 votes, 8 answers

Examples where method of moments can beat maximum likelihood in small samples?

Maximum likelihood estimators (MLE) are asymptotically efficient; we see the practical upshot in that they often do better than method of moments (MoM) estimates (when they differ), even at small sample sizes. Here, 'better than' means in the sense…

estimation maximum-likelihood mse method-of-moments efficiency

asked Dec 22 '13 at 23:30

Glen_b

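A Monte Carlo harness for comparing two estimators by mean squared error. The Uniform$(0, \theta)$ model is a hypothetical illustration, not an example where MoM wins: its MoM estimator is $2\bar{x}$ and its MLE is $\max_i x_i$, with exact MSEs $\theta^2/(3n)$ and $2\theta^2/((n+1)(n+2))$, which tie at $n = 1$ and $n = 2$ and favour the MLE for $n \ge 3$. Other estimators can be swapped in to look for counterexamples; the function names are hypothetical.

```python
import random


def mom_uniform(sample):
    """Method-of-moments estimate of theta for Uniform(0, theta): twice the sample mean."""
    return 2 * sum(sample) / len(sample)


def mle_uniform(sample):
    """Maximum-likelihood estimate of theta for Uniform(0, theta): the sample maximum."""
    return max(sample)


def simulated_mse(estimator, theta, n, reps=20_000, seed=1):
    """Monte Carlo mean squared error of `estimator` at sample size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sample = [rng.uniform(0, theta) for _ in range(n)]
        total += (estimator(sample) - theta) ** 2
    return total / reps


if __name__ == "__main__":
    theta = 1.0
    for n in (2, 5, 20):
        print(n, simulated_mse(mom_uniform, theta, n), simulated_mse(mle_uniform, theta, n))
```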

58 votes, 3 answers

What is the intuition behind conditional Gaussian distributions?

Suppose that $\mathbf{X} \sim N_{2}(\mathbf{\mu}, \mathbf{\Sigma})$. Then the conditional distribution of $X_1$ given that $X_2 = x_2$ is normally distributed with mean: $$ E[X_1 \mid X_2 = x_2] =…

normal-distribution multivariate-analysis intuition

asked Sep 27 '13 at 14:37

eroeijr

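A sketch of the standard bivariate result: $E[X_1 \mid X_2 = x_2] = \mu_1 + \frac{\Sigma_{12}}{\Sigma_{22}}(x_2 - \mu_2)$ and $\operatorname{Var}(X_1 \mid X_2 = x_2) = \Sigma_{11} - \Sigma_{12}^2/\Sigma_{22}$. One intuition is that the conditional mean is the linear regression of $X_1$ on $X_2$, and conditioning removes the part of the variance explained by $X_2$, whatever value is observed. The function name is hypothetical.

```python
def conditional_normal(mu, sigma, x2):
    """Mean and variance of X1 given X2 = x2 for a bivariate normal.

    mu = (mu1, mu2); sigma = [[s11, s12], [s21, s22]] is the covariance matrix.
    """
    (mu1, mu2), ((s11, s12), (_, s22)) = mu, sigma
    beta = s12 / s22                 # regression coefficient of X1 on X2
    cond_mean = mu1 + beta * (x2 - mu2)
    cond_var = s11 - s12 ** 2 / s22  # does not depend on x2
    return cond_mean, cond_var


if __name__ == "__main__":
    mu = (0.0, 0.0)
    sigma = [[1.0, 0.5], [0.5, 1.0]]
    print(conditional_normal(mu, sigma, x2=2.0))  # (1.0, 0.75)
```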
