Most Popular
1500 questions
38
votes
2 answers
A fair die is rolled 1,000 times. What is the probability of rolling the same number 5 times in a row?
A fair die is rolled 1,000 times. What is the probability of rolling the same number 5 times in a row? How do you solve this type of question for variable number of throws and number of repeats?

Teodor Dyakov
- 615
- 5
- 9
38
votes
3 answers
What percentage of a population needs a test in order to estimate prevalence of a disease? Say, COVID-19
A group of us got to discussing what percentage of a population needs to be tested for COVID-19 in order to estimate the true prevalence of the disease. It got complicated, and we ended the night (over Zoom) arguing about signal detection and…

Industrademic
- 501
- 5
- 6
38
votes
4 answers
Why squared residuals instead of absolute residuals in OLS estimation?
Why are we using the squared residuals instead of the absolute residuals in OLS estimation?
My idea was that we use the square of the error values so that residuals below the fitted line (which are then negative) would still have to be able to be…

PascalVKooten
- 2,127
- 5
- 22
- 34
38
votes
2 answers
Why is the Dirichlet distribution the prior for the multinomial distribution?
In the LDA topic model algorithm, I saw this assumption, but I don't know why the Dirichlet distribution was chosen. Could we pair a uniform distribution with the multinomial instead?

ColinBinWang
- 535
- 1
- 5
- 5
38
votes
4 answers
Simple way to algorithmically identify a spike in recorded errors
We need an early warning system. I am dealing with a server that is known to have performance issues under load. Errors are recorded in a database along with a timestamp. There are some manual intervention steps that can be taken to decrease the…

dbenton
- 383
- 1
- 4
- 5
38
votes
4 answers
Is a strong background in maths a total requisite for ML?
I'm starting to want to advance my own skillset and I've always been fascinated by machine learning. However, six years ago, instead of pursuing this, I decided to take a degree completely unrelated to computer science.
I have been developing…

Layke
- 503
- 1
- 6
- 8
38
votes
7 answers
Can cross validation be used for causal inference?
In every context where I have encountered cross-validation, it is used solely with the goal of increasing predictive accuracy. Can the logic of cross-validation be extended to estimating the unbiased relationships between variables?
While this paper by…

Andy W
- 15,245
- 8
- 69
- 191
38
votes
2 answers
What is the difference between the Shapiro–Wilk test of normality and the Kolmogorov–Smirnov test of normality?
What is the difference between the Shapiro–Wilk test of normality and the Kolmogorov–Smirnov test of normality? When will results from these two methods differ?

russellpierce
- 17,079
- 16
- 67
- 98
38
votes
3 answers
How to prove that the radial basis function is a kernel?
How to prove that the radial basis function $k(x, y) = \exp\left(-\frac{\|x-y\|^2}{2\sigma^2}\right)$ is a kernel? As far as I understand, in order to prove this we have to prove either of the following:
For any set of vectors $x_1, x_2, ..., x_n$ matrix…

Leo
- 2,484
- 3
- 22
- 29
38
votes
6 answers
If a credible interval has a flat prior, is a 95% confidence interval equal to a 95% credible interval?
I'm very new to Bayesian statistics, and this may be a silly question. Nevertheless:
Consider a credible interval with a prior that specifies a uniform distribution. For example, from 0 to 1, where 0 to 1 represents the full range of possible values…

pomodoro
- 723
- 5
- 15
38
votes
3 answers
Maximum Likelihood Estimators - Multivariate Gaussian
Context
The Multivariate Gaussian appears frequently in Machine Learning and the following results are used in many ML books and courses without the derivations.
Given data in the form of a matrix $\mathbf{X}$ of dimensions $m \times p$, if we…

Xavier Bourret Sicotte
- 7,986
- 3
- 40
- 72
38
votes
7 answers
Why doesn't regularization solve Deep Neural Nets hunger for data?
An issue I've seen frequently brought up in the context of Neural Networks in general, and Deep Neural Networks in particular, is that they're "data hungry" - that is they don't perform well unless we have a large data set with which to train the…

Skander H.
- 10,602
- 2
- 33
- 81
38
votes
1 answer
What is the derivative of the ReLU activation function?
What is the derivative of the ReLU activation function defined as:
$$ \mathrm{ReLU}(x) = \mathrm{max}(0, x)$$
What about the special case at $x=0$, where the function is continuous but not differentiable?

Tom Hale
- 2,231
- 3
- 13
- 31
38
votes
14 answers
What is the intuition behind the formula for conditional probability?
The formula for the conditional probability of $\text{A}$ happening given that $\text{B}$ has happened is:
$$
P\left(\text{A}~\middle|~\text{B}\right)=\frac{P\left(\text{A} \cap \text{B}\right)}{P\left(\text{B}\right)}.
$$
My textbook explains the…

WorldGov
- 705
- 7
- 14
38
votes
3 answers
Mode, Class and Type of R objects
I was wondering what the differences are between the mode, class, and type of R objects.
The type of an R object can be obtained with the typeof() function, the mode with mode(), and the class with class().
Are there any other similar functions and concepts that I missed?
Thanks…

Tim
- 1
- 29
- 102
- 189