Most Popular

1500 questions
56 votes
6 answers

What are alternatives of Gradient Descent?

Gradient descent has the problem of getting stuck in local minima. We need to run gradient descent exponentially many times in order to find the global minimum. Can anybody tell me about any alternatives to gradient descent as applied in neural network learning,…
Tropa
56 votes
13 answers

Mean absolute deviation vs. standard deviation

In the textbook "New Comprehensive Mathematics for O Level" by Greer (1983), I see averaged deviation calculated like this: Sum up absolute differences between single values and the mean. Then get its average. Throughout the chapter the term mean…
itsols
56 votes
5 answers

How to derive the ridge regression solution?

I am having some issues with the derivation of the solution for ridge regression. I know the regression solution without the regularization term: $$\beta = (X^TX)^{-1}X^Ty.$$ But after adding the L2 term $\lambda\|\beta\|_2^2$ to the cost function,…
user34790
56 votes
13 answers

What are the breakthroughs in Statistics of the past 15 years?

I still remember the Annals of Statistics paper on Boosting by Friedman-Hastie-Tibshirani, and the comments in that same issue by other authors (including Freund and Schapire). At that time, clearly Boosting was viewed as a breakthrough in many…
gappy
56 votes
3 answers

Standard deviation of standard deviation

What is an estimator of standard deviation of standard deviation if normality of data can be assumed?
user88
56 votes
7 answers

Graph for relationship between two ordinal variables

What is an appropriate graph to illustrate the relationship between two ordinal variables? A few options I can think of: Scatter plot with added random jitter to stop points hiding each other. Apparently a standard graphic - Minitab calls this an…
Silverfish
56 votes
8 answers

R libraries for deep learning

I was wondering if there are any good R libraries out there for deep learning neural networks? I know there are nnet, neuralnet, and RSNNS, but none of these seem to implement deep learning methods. I'm especially interested in unsupervised…
56 votes
5 answers

What is the difference between GARCH and ARMA?

I am confused. I don't understand the difference between an ARMA and a GARCH process. To me they are the same, no? Here is the (G)ARCH(p, q) process $$\sigma_t^2 = \underbrace{ \underbrace{ \alpha_0 + \sum_{i=1}^q \alpha_ir_{t-i}^2} …
John
56 votes
9 answers

Is it wrong to rephrase "1 in 80 deaths is caused by a car accident" as "1 in 80 people die as a result of a car accident?"

Statement One (S1): "One in 80 deaths is caused by a car accident." Statement Two (S2): "One in 80 people dies as a result of a car accident." Now, I personally don't see very much difference at all between these two statements. When writing, I…
faulty_ram_sticks
56 votes
4 answers

Random forest computing time in R

I am using the party package in R with 10,000 rows and 34 features, and some factor features have more than 300 levels. The computing time is too long. (It has taken 3 hours so far and it hasn't finished yet.) I want to know what elements have a…
Chenghao Liu
56 votes
4 answers

What should I do when my neural network doesn't generalize well?

I'm training a neural network and the training loss decreases, but the validation loss doesn't, or it decreases much less than what I would expect, based on references or experiments with very similar architectures and data. How can I fix this? As…
DeltaIV
56 votes
7 answers

Is there any gold standard for modeling irregularly spaced time series?

In the field of economics (I think) we have ARIMA and GARCH for regularly spaced time series, and Poisson and Hawkes for modeling point processes. So how about attempts at modeling irregularly (unevenly) spaced time series: are there (at least) any…
56 votes
5 answers

Regression when the OLS residuals are not normally distributed

There are several threads on this site discussing how to determine if the OLS residuals are asymptotically normally distributed. Another way to evaluate the normality of the residuals with R code is provided in this excellent answer. This is another…
56 votes
4 answers

What is the definition of a "feature map" (aka "activation map") in a convolutional neural network?

Intro / Background: Within a convolutional neural network, we usually have a general structure / flow that looks like this: input image (i.e. a 2D vector x) (1st Convolutional layer (Conv1) starts here...) convolve a set of filters (w1) along the…
Atlas7
56 votes
4 answers

Regression for an outcome (ratio or fraction) between 0 and 1

I am thinking of building a model predicting a ratio $a/b$, where $a \le b$ and $a > 0$ and $b > 0$. So, the ratio would be between $0$ and $1$. I could use linear regression, although it doesn't naturally limit to 0..1. I have no reason to believe…