Most Popular
1500 questions
59 votes, 7 answers
Industry vs Kaggle challenges. Is collecting more observations and having access to more variables more important than fancy modelling?
I'd hope the title is self-explanatory. In Kaggle, most winners use stacking with sometimes hundreds of base models, to squeeze a few extra % of MSE, accuracy... In general, in your experience, how important is fancy modelling such as stacking vs…

Tom
59 votes, 5 answers
Best practice when analysing pre-post treatment-control designs
Imagine the following common design:
- 100 participants are randomly allocated to either a treatment or a control group
- the dependent variable is numeric and measured pre- and post-treatment
Three obvious options for analysing such data are:
Test…

Jeromy Anglim
59 votes, 7 answers
Are residuals "predicted minus actual" or "actual minus predicted"
I've seen "residuals" defined variously as being either "predicted minus actual values" or "actual minus predicted values". For illustration purposes, to show that both formulas are widely used, compare the following Web searches:
residual…

Tripartio
59 votes, 2 answers
How should one interpret the comparison of means from different sample sizes?
Take the case of book ratings on a website. Book A is rated by 10,000 people with an average rating of 4.25 and the variance $\sigma = 0.5$. Similarly Book B is rated by 100 people and has a rating of 4.5 with $\sigma = 0.25$.
Now because of the…

PhD
59 votes, 5 answers
Which loss function is correct for logistic regression?
I read about two versions of the loss function for logistic regression, which of them is correct and why?
From Machine Learning, Zhou Z.H (in Chinese), with $\beta = (w, b)\text{ and }\beta^Tx=w^Tx +b$:
$$l(\beta) =…

xtt
59 votes, 7 answers
Binary classification with strongly unbalanced classes
I have a data set in the form of (features, binary output 0 or 1), but 1 happens pretty rarely, so just by always predicting 0, I get accuracy between 70% and 90% (depending on the particular data I look at). The ML methods give me about the same…

LazyCat
59 votes, 5 answers
Apply word embeddings to entire document, to get a feature vector
How do I use a word embedding to map a document to a feature vector, suitable for use with supervised learning?
A word embedding maps each word $w$ to a vector $v \in \mathbb{R}^d$, where $d$ is some not-too-large number (e.g., 500). Popular word…

D.W.
59 votes, 1 answer
Bootstrap vs. jackknife
Both bootstrap and jackknife methods can be used to estimate the bias and standard error of an estimate, and the mechanisms of the two resampling methods are not hugely different: sampling with replacement vs. leaving out one observation at a time. However,…

Tu.2
59 votes, 10 answers
Measuring entropy/ information/ patterns of a 2d binary matrix
I want to measure the entropy/ information density/ pattern-likeness of a two-dimensional binary matrix. Let me show some pictures for clarification:
This display should have a rather high entropy:
A)
This should have medium entropy:
B)
These…

Felix S
59 votes, 6 answers
Difference between "kernel" and "filter" in CNN
What is the difference between the terms "kernel" and "filter" in the context of convolutional neural networks?

ryguy
59 votes, 4 answers
Recurrent vs Recursive Neural Networks: Which is better for NLP?
There are Recurrent Neural Networks and Recursive Neural Networks. Both are usually denoted by the same acronym: RNN. According to Wikipedia, Recurrent NN are in fact Recursive NN, but I don't really understand the explanation.
Moreover, I don't…

crscardellino
58 votes, 6 answers
How can a distribution have infinite mean and variance?
It would be appreciated if the following examples could be given:
- A distribution with infinite mean and infinite variance.
- A distribution with infinite mean and finite variance.
- A distribution with finite mean and infinite variance.
- A distribution…

user1205901 - Reinstate Monica
58 votes, 6 answers
Warning in R - Chi-squared approximation may be incorrect
I have data showing firefighter entrance exam results. I am testing the hypothesis that exam results and ethnicity are not mutually independent. To test this, I ran a Pearson chi-square test in R. The results show what I expected, but it gave a…

ferrelwill
58 votes, 8 answers
Examples where method of moments can beat maximum likelihood in small samples?
Maximum likelihood estimators (MLE) are asymptotically efficient; we see the practical upshot in that they often do better than method of moments (MoM) estimates (when they differ), even at small sample sizes.
Here 'better than' means in the sense…

Glen_b
58 votes, 3 answers
What is the intuition behind conditional Gaussian distributions?
Suppose that $\mathbf{X} \sim N_{2}(\mathbf{\mu}, \mathbf{\Sigma})$. Then the conditional distribution of $X_1$ given that $X_2 = x_2$ is multivariate normally distributed with mean:
$$ E[P(X_1 | X_2 = x_2)] =…

eroeijr