Most Popular

1500 questions
33
votes
1 answer

Why does glmnet use "naive" elastic net from the Zou & Hastie original paper?

The original elastic net paper, Zou & Hastie (2005), Regularization and variable selection via the elastic net, introduced the elastic net loss function for linear regression (here I assume all variables are centered and scaled to unit variance):…
amoeba
33
votes
10 answers

How to teach students who fear statistics?

I am about to help teach statistics to medical students this semester. I've heard many horror stories about these students' fear of learning statistics. Can anyone suggest how to deal with this fear? (Either links to people who are discussing…
Tal Galili
33
votes
6 answers

Why study convex optimization for theoretical machine learning?

I am working on theoretical machine learning — on transfer learning, to be specific — for my Ph.D. Out of curiosity, why should I take a course on convex optimization? What take-aways from convex optimization can I use in my research on…
Upendra01
33
votes
2 answers

What did my neural network just learn? What features does it care about and why?

A neural net learns features of a data set as a means of achieving some goal. When it is done, we may want to know what the neural net learned. What were the features, and why did it care about them? Can someone give some references on the body…
user442920
33
votes
1 answer

If I generate a random symmetric matrix, what's the chance it is positive definite?

I came across a strange question while experimenting with some convex optimization. The question is: suppose I randomly (say, from a standard normal distribution) generate an $N \times N$ symmetric matrix (for example, I generate an upper triangular matrix, and fill…
Haitao Du
33
votes
1 answer

Differences between a statistical model and a probability model?

Applied probability is an important branch of probability, including computational probability. Since statistics uses probability theory to construct models to deal with data, as I understand it, I am wondering what the essential difference…
Honglang Wang
33
votes
4 answers

What is a manifold?

In dimensionality reduction techniques such as Principal Component Analysis, LDA, etc., the term manifold is often used. What is a manifold in non-technical terms? If a point $x$ belongs to a sphere whose dimension I want to reduce, and if there is a…
Ria George
33
votes
3 answers

Why is max pooling necessary in convolutional neural networks?

Most common convolutional neural networks contain pooling layers to reduce the dimensions of the output features. Why couldn't I achieve the same thing by simply increasing the stride of the convolutional layer? What makes the pooling layer necessary?
user3667089
33
votes
5 answers

Is an overfitted model necessarily useless?

Assume that a model has 100% accuracy on the training data, but 70% accuracy on the test data. Is the following argument true about this model? It is obvious that this is an overfitted model. The test accuracy can be enhanced by reducing the…
Hossein
33
votes
4 answers

Maximum Mean Discrepancy (distance distribution)

I have two data sets (source and target data) which follow different distributions. I am using MMD, a non-parametric distance between distributions, to compare the marginal distributions of the source and target data. Source data, Xs; target data,…
33
votes
4 answers

What is the fiducial argument and why has it not been accepted?

One of the late contributions of R.A. Fisher was fiducial intervals and fiducial principled arguments. This approach, however, is nowhere near as popular as frequentist or Bayesian principled arguments. What is the fiducial argument and why has it…
JohnRos
33
votes
4 answers

What exactly is the difference between a parametric and non-parametric model?

I am confused by the definition of a non-parametric model after reading this link, Parametric vs Nonparametric Models, and the answers and comments on another question of mine. Originally I thought "parametric vs non-parametric" means if we have distribution…
Haitao Du
33
votes
3 answers

How to do logistic regression in R when outcome is fractional (a ratio of two counts)?

I'm reviewing a paper which has the following biological experiment. A device is used to expose cells to varying amounts of fluid shear stress. As greater shear stress is applied to the cells, more of them start to detach from the substrate. At each…
thecity2
33
votes
3 answers

How can the regression error term ever be correlated with the explanatory variables?

The first sentence of this wiki page claims that "In econometrics, an endogeneity problem occurs when an explanatory variable is correlated with the error term." My question is: how can this ever happen? Isn't regression beta chosen such that…
33
votes
1 answer

What are some useful guidelines for GBM parameters?

What are some useful guidelines for testing parameters (i.e. interaction depth, minchild, sample rate, etc.) using GBM? Let's say I have 70-100 features, a population of 200,000 and I intend to test interaction depth of 3 and 4. Clearly I need to do…
Ram Ahluwalia