Highest Voted Questions - Statistical Analysis Stack Exchange

33

votes

1 answer

Why does glmnet use "naive" elastic net from the Zou & Hastie original paper?

The original elastic net paper, Zou & Hastie (2005), "Regularization and variable selection via the elastic net", introduced the elastic net loss function for linear regression (here I assume all variables are centered and scaled to unit variance):…

regression glmnet elastic-net regularization

asked Feb 02 '18 at 10:12

amoeba

93,463
28
275
317

33

votes

10 answers

How to teach students who fear statistics?

I am about to help teach statistics to medical students this semester. I've heard many horror stories about these students' fear of learning statistics. Can anyone suggest what to do about this fear? (Either links to people who are discussing…

teaching

asked Oct 02 '10 at 17:06

Tal Galili

19,935
32
133
195

33

votes

6 answers

Why study convex optimization for theoretical machine learning?

I am working on theoretical machine learning (on transfer learning, to be specific) for my Ph.D. Out of curiosity, why should I take a course on convex optimization? What takeaways from convex optimization can I use in my research on…

machine-learning optimization convex transfer-learning

asked Jan 25 '18 at 08:23

Upendra01

1,566
4
18
28

33

votes

2 answers

What did my neural network just learn? What features does it care about and why?

A neural net learns features of a data set as a means of achieving some goal. When it is done, we may want to know what the neural net learned. What were the features, and why did it care about them? Can someone give some references on the body…

neural-networks deep-learning

asked Jan 11 '18 at 17:00

user442920

533
5
14

33

votes

1 answer

If I generate a random symmetric matrix, what's the chance it is positive definite?

A strange question came up while I was experimenting with some convex optimization problems. The question is: Suppose I randomly (say, from a standard normal distribution) generate an $N \times N$ symmetric matrix (for example, I generate an upper triangular matrix, and fill…

probability matrix random-generation eigenvalues random-matrix

asked Jan 08 '18 at 18:54

Haitao Du

32,885
17
118
213
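The random symmetric matrix question above lends itself to a quick simulation. What follows is only a minimal sketch under stated assumptions, not the asker's own procedure: every entry on and above the diagonal is drawn i.i.d. from a standard normal and mirrored below the diagonal, and positive definiteness is tested with a hand-written Cholesky factorization instead of an eigenvalue computation. The function names, trial count, and seed are illustrative choices.

```python
import math
import random


def random_symmetric(n, rng):
    """Draw the upper triangle (diagonal included) i.i.d. from N(0, 1) and mirror it."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            a[i][j] = a[j][i] = rng.gauss(0.0, 1.0)
    return a


def is_positive_definite(a):
    """Attempt a Cholesky factorization; a symmetric matrix is positive
    definite exactly when every pivot encountered is strictly positive."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                lower[i][j] = math.sqrt(s)
            else:
                lower[i][j] = s / lower[j][j]
    return True


def estimate_pd_probability(n, trials=20000, seed=0):
    """Monte Carlo estimate of P(matrix is positive definite) for size n."""
    rng = random.Random(seed)
    hits = sum(is_positive_definite(random_symmetric(n, rng)) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, estimate_pd_probability(n))
```

For $N = 1$ the estimate should sit near 0.5, since a single standard normal draw is positive half the time; the estimates then shrink as $N$ grows.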

33

votes

1 answer

Differences between a statistical model and a probability model?

Applied probability is an important branch of probability, including computational probability. Since statistics uses probability theory to construct models to deal with data, as I understand it, I am wondering what's the essential difference…

probability mathematical-statistics

asked Jun 23 '12 at 18:40

Honglang Wang

915
3
9
16

33

votes

4 answers

What is a manifold?

In dimensionality reduction techniques such as Principal Component Analysis, LDA, etc., the term manifold is often used. What is a manifold in non-technical terms? If a point $x$ belongs to a sphere whose dimension I want to reduce, and if there is a…

terminology manifold-learning

asked Jul 08 '17 at 06:13

Ria George

1,375
2
14
31

33

votes

3 answers

Why is max pooling necessary in convolutional neural networks?

Most common convolutional neural networks contain pooling layers to reduce the dimensions of output features. Why couldn't I achieve the same thing by simply increasing the stride of the convolutional layer? What makes the pooling layer necessary?

deep-learning conv-neural-network pooling

asked Jul 01 '17 at 01:35

user3667089

443
1
4
6
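The two downsampling options compared in the max pooling question above can be put side by side on a toy 1D signal. This is a minimal sketch in plain Python, not a statement about any particular framework: the helper names `conv1d` and `max_pool1d`, the input signal, and the fixed filter are all hypothetical and chosen only for illustration.

```python
def conv1d(signal, kernel, stride=1):
    """Valid (no padding) 1D cross-correlation with the given stride."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]


def max_pool1d(signal, size=2, stride=2):
    """Maximum over each window of `size` elements, moving by `stride`."""
    return [
        max(signal[i:i + size])
        for i in range(0, len(signal) - size + 1, stride)
    ]


signal = [1, 3, 2, 5, 4, 6, 0, 7, 8, 2]
kernel = [1, 0, -1]  # hypothetical fixed filter, for illustration only

# Option A: a single convolution with stride 2.
strided = conv1d(signal, kernel, stride=2)

# Option B: a stride-1 convolution followed by 2-wide max pooling.
pooled = max_pool1d(conv1d(signal, kernel, stride=1), size=2, stride=2)

print(strided)  # [-1, -2, 4, -8]
print(pooled)   # [-1, -1, 4, 5]
```

Both outputs have length 4, so the reduction in size is the same. The values differ because the strided convolution keeps every second filter response, while pooling keeps the largest response in each pair.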

33

votes

5 answers

Is an overfitted model necessarily useless?

Assume that a model has 100% accuracy on the training data, but 70% accuracy on the test data. Is the following argument true about this model? It is obvious that this is an overfitted model. The test accuracy can be enhanced by reducing the…

model accuracy overfitting

asked May 11 '17 at 06:18

Hossein

3,170
1
16
32

33

votes

4 answers

Maximum Mean Discrepancy (distance distribution)

I have two data sets (source and target data) which follow different distributions. I am using MMD, which is a non-parametric distribution distance, to compute the marginal distribution between the source and target data. Source data, Xs; target data,…

machine-learning distributions distance feature-engineering domain-adaptation

asked Apr 28 '17 at 15:45

Mahsa

431
1
5
5

33

votes

4 answers

What is the fiducial argument and why has it not been accepted?

One of the late contributions of R.A. Fisher was fiducial intervals and fiducial principled arguments. This approach, however, is nowhere near as popular as frequentist or Bayesian principled arguments. What is the fiducial argument and why has it…

inference philosophical fiducial

asked Apr 24 '12 at 06:53

JohnRos

5,336
26
56

33

votes

4 answers

What exactly is the difference between a parametric and non-parametric model?

I am confused about the definition of a non-parametric model after reading this link, Parametric vs Nonparametric Models, and the answer comments on another question of mine. Originally I thought "parametric vs non-parametric" means whether we have distribution…

machine-learning neural-networks terminology nonparametric

asked Mar 20 '17 at 13:54

Haitao Du

32,885
17
118
213

33

votes

3 answers

How to do logistic regression in R when outcome is fractional (a ratio of two counts)?

I'm reviewing a paper which has the following biological experiment. A device is used to expose cells to varying amounts of fluid shear stress. As greater shear stress is applied to the cells, more of them start to detach from the substrate. At each…

r logistic multinomial-distribution

asked Apr 19 '12 at 14:46

thecity2

1,485
2
15
22

33

votes

3 answers

How can the regression error term ever be correlated with the explanatory variables?

The first sentence of this wiki page claims that "In econometrics, an endogeneity problem occurs when an explanatory variable is correlated with the error term." My question is: how can this ever happen? Isn't the regression beta chosen such that…

regression

asked Feb 22 '17 at 00:27

denizen of the north

848
1
8
15

33

votes

1 answer

What are some useful guidelines for GBM parameters?

What are some useful guidelines for testing parameters (i.e. interaction depth, minchild, sample rate, etc.) using GBM? Let's say I have 70-100 features, a population of 200,000 and I intend to test interaction depth of 3 and 4. Clearly I need to do…

r hypothesis-testing cart boosting

asked Apr 03 '12 at 03:27

Ram Ahluwalia

3,003
6
27
38
