Most Popular

1500 questions
38
votes
6 answers

What is the connection between credible regions and Bayesian hypothesis tests?

In frequentist statistics, there is a close connection between confidence intervals and tests. Using inference about $\mu$ in the $\rm N(\mu,\sigma^2)$ distribution as an example, the $1-\alpha$ confidence interval $$\bar{x}\pm…
38
votes
4 answers

What is the difference between a stationary test and a unit root test?

What is the difference between the Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test and the augmented Dickey-Fuller (ADF) test? Are they testing the same thing? Or do we need to use them in different situations?
38
votes
5 answers

Clustering a dataset with both discrete and continuous variables

I have a dataset X which has 10 dimensions, 4 of which are discrete values. In fact, those 4 discrete variables are ordinal, i.e. a higher value implies a higher/better semantic. 2 of these discrete variables are categorical in the sense that for…
38
votes
5 answers

How is the cost function from Logistic Regression differentiated

I am doing the Machine Learning Stanford course on Coursera. In the chapter on Logistic Regression, the cost function is this: Then, it is differentiated here: I tried getting the derivative of the cost function, but I got something completely…
octavian
  • 909
  • 2
  • 11
  • 18
38
votes
3 answers

Understanding input_shape parameter in LSTM with Keras

I'm trying to use the example described in the Keras documentation named "Stacked LSTM for sequence classification" (see code below) and can't figure out the input_shape parameter in the context of my data. I have as input a matrix of sequences of…
mazieres
  • 597
  • 1
  • 5
  • 9
38
votes
10 answers

Why are survival times assumed to be exponentially distributed?

I am learning survival analysis from this post on UCLA IDRE and got tripped up at section 1.2.1. The tutorial says: ... if the survival times were known to be exponentially distributed, then the probability of observing a survival time ... Why…
Haitao Du
  • 32,885
  • 17
  • 118
  • 213
38
votes
4 answers

Is there any supervised-learning problem that (deep) neural networks obviously couldn't outperform any other methods?

I have seen people put a lot of effort into SVMs and kernels, and they look pretty interesting as a starting point in machine learning. But if we expect that we could almost always find a better-performing solution with a (deep) neural network, what is…
Robin
  • 585
  • 1
  • 6
  • 9
38
votes
2 answers

Are mixed models useful as predictive models?

I am a bit confused about the advantages of mixed models with regard to predictive modelling. Since predictive models are usually meant to predict values of previously unknown observations, it seems obvious to me that the only way a mixed model may be…
sztal
  • 1,009
  • 1
  • 9
  • 14
38
votes
5 answers

Estimating same model over multiple time series

I have a novice background in time series (some ARIMA estimation/forecasting) and am facing a problem I don't fully understand. Any help would be greatly appreciated. I am analyzing multiple time series, all over the same time interval and all of…
sparc_spread
  • 755
  • 1
  • 8
  • 18
38
votes
4 answers

Why use colormap viridis over jet?

As announced in https://www.youtube.com/watch?v=xAoljeRJ3lU, Matplotlib changes the default colormap from jet to viridis. However, I don't understand it very well. Maybe because I'm color blind? The original colormap jet looks very strong, I can…
cqcn1991
  • 1,145
  • 1
  • 10
  • 16
38
votes
7 answers

How to deal with hierarchical / nested data in machine learning

I'll explain my problem with an example. Suppose you want to predict the income of an individual given some attributes: {Age, Gender, Country, Region, City}. You have a training dataset like so train <- data.frame(CountryID=c(1,1,1,1, 2,2,2,2,…
Ben
  • 1,612
  • 3
  • 17
  • 30
38
votes
3 answers

Meaning (and proof) of "RNN can approximate any algorithm"

Recently I read that a recurrent neural network can approximate any algorithm. So my question is: what does this exactly mean and can you give me a reference where this is proved?
user3726947
  • 483
  • 1
  • 5
  • 6
38
votes
3 answers

What does the Akaike Information Criterion (AIC) score of a model mean?

I have seen some questions here about what it means in layman's terms, but these are too layman for my purpose here. I am trying to understand mathematically what the AIC score means. But at the same time, I don't want a rigorous proof that…
caveman
  • 2,431
  • 1
  • 16
  • 32
38
votes
3 answers

How do bottleneck architectures work in neural networks?

We define a bottleneck architecture as the type found in the ResNet paper where [two 3x3 conv layers] are replaced by [one 1x1 conv, one 3x3 conv, and another 1x1 conv layer]. I understand that the 1x1 conv layers are used as a form of dimension…
derekchen14
  • 545
  • 1
  • 5
  • 8
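
To make the bottleneck question above concrete, here is a rough weight count for the two designs it describes. The channel widths (256 in and out, 64 inside the bottleneck) are assumptions chosen for illustration and are not stated in the excerpt; biases and batch-norm parameters are ignored, and `conv_params` is a hypothetical helper.

```python
def conv_params(kernel, c_in, c_out):
    """Number of weights in a kernel x kernel convolution, ignoring biases."""
    return kernel * kernel * c_in * c_out

WIDE = 256    # assumed block input/output channels
NARROW = 64   # assumed reduced width inside the bottleneck

# Plain block: two 3x3 convolutions at full width.
plain = 2 * conv_params(3, WIDE, WIDE)

# Bottleneck block: 1x1 reduce, 3x3 at reduced width, 1x1 expand.
bottleneck = (
    conv_params(1, WIDE, NARROW)
    + conv_params(3, NARROW, NARROW)
    + conv_params(1, NARROW, WIDE)
)

print(f"plain: {plain:,} weights")
print(f"bottleneck: {bottleneck:,} weights")
```

Under these assumed widths, the expensive 3x3 convolution runs on far fewer channels, which is where most of the saving comes from.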
38
votes
5 answers

Is there an explanation for why there are so many natural phenomena that follow normal distribution?

I think this is a fascinating topic and I do not fully understand it. What law of physics makes it so that so many natural phenomena have a normal distribution? It would seem more intuitive that they would have a uniform distribution. It is so hard for me…