Most Popular

1500 questions
35
votes
4 answers

Feature map for the Gaussian kernel

In SVMs, the Gaussian kernel is defined as: $$K(x,y)=\exp\left(-\frac{\|x-y\|_2^2}{2\sigma^2}\right)=\phi(x)^T\phi(y)$$ where $x, y\in \mathbb{R}^n$. I do not know the explicit form of $\phi$ and would like to know it. I also want to know…
Vivian
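As a rough illustration of the identity $K(x,y)=\phi(x)^T\phi(y)$ in the question above, the sketch below handles only the scalar case ($n=1$), where expanding $\exp(xy/\sigma^2)$ as a power series gives coordinates $\phi_k(x)=e^{-x^2/(2\sigma^2)}\,(x/\sigma)^k/\sqrt{k!}$ for $k=0,1,2,\dots$ The function names and the truncation length `n_terms` are illustrative assumptions, not part of the question.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel for scalar inputs x and y."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))

def truncated_feature_map(x, sigma=1.0, n_terms=30):
    """First n_terms coordinates of the infinite-dimensional feature map (scalar x).

    Coordinate k is exp(-x^2 / (2 sigma^2)) * (x / sigma)^k / sqrt(k!).
    """
    scale = math.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return [scale * (x / sigma) ** k / math.sqrt(math.factorial(k))
            for k in range(n_terms)]

def approx_kernel(x, y, sigma=1.0, n_terms=30):
    """Inner product of the truncated feature vectors of x and y."""
    phi_x = truncated_feature_map(x, sigma, n_terms)
    phi_y = truncated_feature_map(y, sigma, n_terms)
    return sum(a * b for a, b in zip(phi_x, phi_y))

# The truncated inner product should agree closely with the exact kernel.
exact = gaussian_kernel(0.7, -0.3)
approx = approx_kernel(0.7, -0.3)
```

The truncation is only accurate when $|xy|/\sigma^2$ is small relative to `n_terms`; the exact feature map is infinite-dimensional.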
35
votes
1 answer

How are the standard errors computed for the fitted values from a logistic regression?

When you predict a fitted value from a logistic regression model, how are standard errors computed? I mean for the fitted values, not for the coefficients (which involves Fisher's information matrix). I only found out how to get the numbers with R…
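One standard way to get such a standard error is the delta method: compute the variance of the linear predictor $x^T\beta$ from the coefficient covariance matrix, then scale by the derivative of the inverse logit. The sketch below assumes the fitted coefficients `beta` and their covariance matrix `cov` are already available; the function name and the example numbers are hypothetical.

```python
import math

def fitted_value_with_se(beta, cov, x):
    """Delta-method standard error for a logistic regression fitted value.

    beta: coefficient estimates; cov: their covariance matrix (list of lists);
    x: covariate vector for the new observation (include 1 for the intercept).
    """
    eta = sum(b * xi for b, xi in zip(beta, x))            # linear predictor
    var_eta = sum(x[i] * cov[i][j] * x[j]
                  for i in range(len(x)) for j in range(len(x)))
    se_link = math.sqrt(var_eta)                           # SE on the logit scale
    p = 1.0 / (1.0 + math.exp(-eta))                       # fitted probability
    se_response = p * (1.0 - p) * se_link                  # SE on the probability scale
    return p, se_link, se_response

# Hypothetical numbers, for illustration only.
p, se_link, se_response = fitted_value_with_se(
    beta=[0.0, 0.0], cov=[[0.04, 0.0], [0.0, 0.01]], x=[1.0, 2.0]
)
```

A confidence interval is often built on the logit scale from `se_link` and then transformed, which keeps it inside (0, 1).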
35
votes
6 answers

How do I calculate a weighted standard deviation? In Excel?

So, I have a data set of percentages like so: 100 / 10000 = 1% (0.01); 2 / 5 = 40% (0.4); 4 / 3 = 133% (1.3); 1000 / 2000 = 50% (0.5). I want to find the standard deviation of the percentages, but weighted for their…
Yahel
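A minimal sketch of one common definition of a weighted standard deviation, using the four ratios from the question. Using each ratio's denominator as its weight is an assumption here (the excerpt is cut off before it says what the weights are), and the formula divides by the total weight with no small-sample correction.

```python
import math

def weighted_mean_std(values, weights):
    """Weighted mean and weighted (population-style) standard deviation."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, math.sqrt(var)

# Ratios from the question; denominators used as weights (an assumption).
ratios = [100 / 10000, 2 / 5, 4 / 3, 1000 / 2000]
weights = [10000, 5, 3, 2000]
mean, std = weighted_mean_std(ratios, weights)
```

With these weights the weighted mean equals the sum of the numerators divided by the sum of the denominators, so the large-denominator rows dominate.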
35
votes
5 answers

How to translate the results from lm() to an equation?

We can use lm() to predict a value, but in some cases we still need the equation of the fitted model, for example to add the equation to a plot.
user27736
35
votes
2 answers

How are Bayesian Priors Decided in Real Life?

I have always had the following question: How are Bayesian priors decided in real life? I created the following scenario to pose my question: Suppose you are a researcher and you are interested in studying if the age of a giraffe can be predicted by the…
stats_noob
35
votes
1 answer

Reference: who introduced the tilde "~" notation to mean "has probability distribution..."?

[Note: although this question has an accepted answer, the investigation is not finished yet. I encourage you to post your findings.] Who first introduced the notation "$X \sim Q$", meaning that $Q$ is the probability distribution for $X$, and its…
pglpm
35
votes
3 answers

Can AIC compare across different types of model?

I'm using AIC (Akaike's Information Criterion) to compare non-linear models in R. Is it valid to compare the AICs of different types of model? Specifically, I'm comparing a model fitted by glm versus a model with a random effect term fitted by glmer…
Thomas K
35
votes
1 answer

Maximum likelihood estimators for a truncated distribution

Consider $N$ independent samples $S$ obtained from a random variable $X$ that is assumed to follow a truncated distribution (e.g. a truncated normal distribution) of known (finite) minimum and maximum values $a$ and $b$ but of unknown parameters…
35
votes
3 answers

Gradient of Hinge loss

I'm trying to implement basic gradient descent and I'm testing it with a hinge loss function i.e. $l_{\text{hinge}} = \max(0,1-y\ \boldsymbol{x}\cdot\boldsymbol{w})$. However, I'm confused about the gradient of the hinge loss. I'm under the…
brcs
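For the hinge loss $\max(0,1-y\ \boldsymbol{x}\cdot\boldsymbol{w})$ in the question above, a subgradient with respect to $\boldsymbol{w}$ is $-y\,\boldsymbol{x}$ when the margin $y\ \boldsymbol{x}\cdot\boldsymbol{w}$ is below 1 and zero otherwise. The sketch below covers a single example; the function names are illustrative, and returning zero exactly at the kink (margin equal to 1) is one valid choice among several.

```python
def hinge_loss(w, x, y):
    """Hinge loss max(0, 1 - y * (x . w)) for one example with label y in {-1, +1}."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    return max(0.0, 1.0 - margin)

def hinge_subgradient(w, x, y):
    """A subgradient of the hinge loss with respect to w for one example."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    if margin < 1.0:
        return [-y * xi for xi in x]   # loss is active: slope of 1 - y * (x . w)
    return [0.0] * len(x)              # loss is flat (zero chosen at the kink)

# One gradient-descent step on a single example, with a hypothetical step size.
w = [0.0, 0.0]
x, y, step = [1.0, 2.0], 1, 0.1
g = hinge_subgradient(w, x, y)
w_new = [wi - step * gi for wi, gi in zip(w, g)]
```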
35
votes
3 answers

Are the digits of $\pi$ statistically random?

Suppose you observe the sequence: 7, 9, 0, 5, 5, 5, 4, 8, 0, 6, 9, 5, 3, 8, 7, 8, 5, 4, 0, 0, 6, 6, 4, 5, 3, 3, 7, 5, 9, 8, 1, 8, 6, 2, 8, 4, 6, 4, 1, 9, 9, 0, 5, 2, 2, 0, 4, 5, 2, 8 ... What statistical tests would you apply to determine if this…
Cam.Davidson.Pilon
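One simple test that could be applied to the sequence above is a chi-square goodness-of-fit test on the digit frequencies, with every digit expected equally often. The sketch below computes only the test statistic; it checks frequencies and nothing else (not ordering or serial dependence), and the function name is illustrative.

```python
from collections import Counter

def digit_chi_square(digits):
    """Chi-square statistic for digit counts against a uniform distribution on 0..9."""
    n = len(digits)
    expected = n / 10.0
    counts = Counter(digits)
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))

# The 50 digits quoted in the question.
observed = [7, 9, 0, 5, 5, 5, 4, 8, 0, 6, 9, 5, 3, 8, 7, 8, 5, 4, 0, 0,
            6, 6, 4, 5, 3, 3, 7, 5, 9, 8, 1, 8, 6, 2, 8, 4, 6, 4, 1, 9,
            9, 0, 5, 2, 2, 0, 4, 5, 2, 8]
statistic = digit_chi_square(observed)
# Compare with a chi-square distribution on 9 degrees of freedom
# (the 5% critical value is about 16.92).
```

With only 50 digits the expected count per cell is 5, so the chi-square approximation is rough.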
35
votes
3 answers

Kernel logistic regression vs SVM

As is well known, SVMs can use the kernel method to project data points into a higher-dimensional space so that the points can be separated linearly. But we can also use logistic regression to choose this boundary in the kernel space, so what are the advantages…
FindBoat
35
votes
3 answers

How to calculate pooled variance of two or more groups given known group variances, means, and sample sizes?

Say there are $m+n$ elements split into two groups ($m$ and $n$). The variance of the first group is $\sigma_m^2$ and the variance of the second group is $\sigma^2_n$. The elements themselves are assumed to be unknown but I know the means $\mu_m$…
user1809989
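A sketch of how the variance of the combined $m+n$ elements can be recovered from the group summaries alone. It assumes $\sigma_m^2$ and $\sigma_n^2$ are population variances (dividing by the group size), which the excerpt does not confirm; with sample variances (dividing by size minus one) the weights would change. The function name is illustrative.

```python
def combined_mean_variance(m, mean_m, var_m, n, mean_n, var_n):
    """Mean and population variance of two groups merged into one.

    The combined variance is the size-weighted average of the group variances
    plus a term for the spread between the two group means.
    """
    total = m + n
    mean = (m * mean_m + n * mean_n) / total
    within = (m * var_m + n * var_n) / total
    between = m * n * (mean_m - mean_n) ** 2 / total ** 2
    return mean, within + between

# Groups [1, 2, 3] and [10, 12], described only by their summaries.
mean, var = combined_mean_variance(3, 2.0, 2.0 / 3.0, 2, 11.0, 1.0)
```

Applying the function repeatedly (merge two groups, then merge the result with a third) extends this to more than two groups.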
35
votes
5 answers

Measuring the "distance" between two multivariate distributions

I'm looking for some good terminology to describe what I'm trying to do, to make it easier to look for resources. So, say I have two clusters of points A and B, each associated to two values, X and Y, and I want to measure the "distance" between A…
Emile
35
votes
5 answers

What if my linear regression data contains several co-mingled linear relationships?

Let's say I am studying how daffodils respond to various soil conditions. I have collected data on the pH of the soil versus the mature height of the daffodil. I'm expecting a linear relationship, so I go about running a linear…
SlowMagic
35
votes
5 answers

How to handle a "self defeating" prediction model?

I was watching a presentation by an ML specialist from a major retailer, where they had developed a model to predict out-of-stock events. Let's assume for a moment that, over time, their model becomes very accurate. Wouldn't that somehow be…
Skander H.