Most Popular
1500 questions
35
votes
3 answers
Why does minimizing the MAE lead to forecasting the median and not the mean?
From the Forecasting: Principles and Practice textbook by Rob J Hyndman and George Athanasopoulos, specifically the section on accuracy measurement:
A forecast method that minimizes the MAE will lead to forecasts of the
median, while minimizing…

Brans Ds
- 1,192
- 1
- 10
- 16
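For the MAE-versus-median question above, the usual argument is a derivative. For a constant forecast $c$, $\frac{d}{dc}\sum_i |y_i - c|$ equals (number of $y_i$ below $c$) minus (number of $y_i$ above $c$), and this is zero at the median. By contrast, $\frac{d}{dc}\sum_i (y_i - c)^2 = -2\sum_i (y_i - c)$ is zero at the mean.

A minimal numerical check follows. It searches a plain grid of constants, which is a choice made for transparency and not how forecasting software optimizes. The skewed sample is hypothetical.

```python
from statistics import mean, median

def mae(values, c):
    """Mean absolute error of the constant forecast c."""
    return sum(abs(v - c) for v in values) / len(values)

def mse(values, c):
    """Mean squared error of the constant forecast c."""
    return sum((v - c) ** 2 for v in values) / len(values)

def best_constant(values, loss, step=0.01):
    """Grid-search the constant in [min, max] that minimizes `loss`."""
    lo, hi = min(values), max(values)
    n = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n + 1)]
    return min(grid, key=lambda c: loss(values, c))

data = [1, 2, 3, 4, 100]  # hypothetical skewed sample: mean 22, median 3
print(best_constant(data, mae), median(data))  # MAE optimum sits at the median
print(best_constant(data, mse), mean(data))    # MSE optimum sits at the mean
```

The outlier at 100 pulls the MSE-optimal forecast toward it. The MAE-optimal forecast does not move.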
35
votes
3 answers
What exactly is a seed in a random number generator?
I tried the usual Google searches, but most of the answers I find are either somewhat ambiguous or language/library specific, such as for Python or C++ stdlib.h. I am looking for a language-agnostic, mathematical answer, not the specifics of a…

Della
- 453
- 1
- 4
- 6
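For the seed question above, a language-agnostic view treats a pseudo-random generator as a deterministic recurrence. The seed is simply its initial state, and every later number is a fixed function of that state.

The toy linear congruential generator below illustrates this. It uses $x_{n+1} = (a x_n + c) \bmod m$ with a commonly quoted parameter set ($a = 1664525$, $c = 1013904223$, $m = 2^{32}$). It is not suitable for cryptography or serious simulation.

```python
class LCG:
    """Toy linear congruential generator: x_{n+1} = (a * x_n + c) mod m.

    The seed is the initial state x_0; everything after it is deterministic.
    """

    def __init__(self, seed, a=1664525, c=1013904223, m=2**32):
        self.state = seed % m
        self.a, self.c, self.m = a, c, m

    def next_int(self):
        """Advance the state one step and return it."""
        self.state = (self.a * self.state + self.c) % self.m
        return self.state

    def next_float(self):
        """Uniform-looking float in [0, 1)."""
        return self.next_int() / self.m

def first_n(seed, n=5):
    """First n outputs of a fresh generator started from `seed`."""
    g = LCG(seed)
    return [g.next_int() for _ in range(n)]

print(first_n(42) == first_n(42))  # same seed, same sequence
print(first_n(42) == first_n(43))  # different seed, different sequence
```

Reproducibility ("set the seed") and the need for a well-chosen seed both follow from this picture: fix $x_0$ and you fix the whole stream.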
35
votes
1 answer
Why is KL divergence non-negative?
Why is KL divergence non-negative?
From the perspective of information theory, I have the following intuitive understanding:
Say there are two ensembles $A$ and $B$ which are composed of the same set of elements labeled by $x$. $p(x)$ and $q(x)$ are…

meTchaikovsky
- 1,414
- 1
- 9
- 23
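For the KL question above, the standard argument (Gibbs' inequality) applies Jensen's inequality to the concave logarithm. Assume discrete $p(x)$ and $q(x)$ over the same elements $x$, with sums taken over $x$ where $p(x) > 0$ and $q(x) > 0$ at those points:

$$
-D_{\mathrm{KL}}(p \,\|\, q) = \sum_{x} p(x) \log \frac{q(x)}{p(x)} \;\le\; \log \sum_{x} p(x) \frac{q(x)}{p(x)} = \log \sum_{x:\,p(x)>0} q(x) \;\le\; \log 1 = 0.
$$

Hence $D_{\mathrm{KL}}(p \,\|\, q) \ge 0$. The Jensen step is tight only when $q(x)/p(x)$ is constant on the support of $p$, and the last step is tight only when $q$ puts all its mass there. Together these give equality exactly when $p(x) = q(x)$ for all $x$.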
35
votes
3 answers
Training loss increases with time
I am training a model (a recurrent neural network) to classify 4 types of sequences. As I run my training, I see the training loss going down until the point where I correctly classify over 90% of the samples in my training batches. However, a couple of…

dins2018
- 353
- 1
- 3
- 5
35
votes
4 answers
How is the Poisson distribution different from the normal distribution?
I have generated a vector which has a Poisson distribution, as follows:
x = rpois(1000,10)
If I make a histogram using hist(x), the distribution looks like the familiar bell-shaped normal distribution. However, a Kolmogorov-Smirnov test…

luciano
- 12,197
- 30
- 87
- 119
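For the Poisson-versus-normal question above, a Poisson($\lambda$) variable has three properties that set it apart from a normal variable:

- it is discrete and takes only non-negative integer values;
- its mean and variance are both $\lambda$;
- its skewness is $1/\sqrt{\lambda}$.

The normal distribution is continuous and symmetric. For $\lambda = 10$ (the rate in `rpois(1000, 10)`) the two look alike in a histogram but are not the same distribution. The sketch below computes the exact Poisson moments from the pmf. It truncates the support at 60, an assumption that is harmless for $\lambda = 10$.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), computed in log space to avoid overflow."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

lam = 10
support = range(0, 60)  # tail mass beyond 59 is negligible for lam = 10
pmf = [poisson_pmf(k, lam) for k in support]

mean = sum(k * p for k, p in zip(support, pmf))
var = sum((k - mean) ** 2 * p for k, p in zip(support, pmf))
skew = sum((k - mean) ** 3 * p for k, p in zip(support, pmf)) / var ** 1.5
print(f"mean={mean:.4f} var={var:.4f} skew={skew:.4f} (theory: 10, 10, {1 / math.sqrt(lam):.4f})")

# Side by side: Poisson probabilities at integers vs the N(10, 10) density there.
for k in (4, 10, 16):
    print(k, round(pmf[k], 4), round(normal_pdf(k, lam, math.sqrt(lam)), 4))
```

The positive skewness is small at $\lambda = 10$, which is why the histogram looks bell-shaped. The discreteness, however, never goes away.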
35
votes
5 answers
How do I use the SVD in collaborative filtering?
I'm a bit confused about how the SVD is used in collaborative filtering. Suppose I have a social graph, build an adjacency matrix from the edges, and then take an SVD (let's forget about regularization, learning rates, sparsity optimizations, etc.),…

Vishal
- 1,101
- 3
- 12
- 19
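For the SVD question above, one common reading of "use the SVD" is as a low-rank approximation of the adjacency matrix. The entries of that approximation then serve as predicted affinities, and a large value on a missing edge suggests a recommendation.

The sketch below keeps only rank 1 and computes the top singular triplet by power iteration, so it needs no third-party library. With NumPy one would normally call `numpy.linalg.svd` and keep the top $k$ components. The four-user graph is hypothetical.

```python
import math

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def top_singular_triplet(A, iters=200):
    """Leading singular value and vectors of A via power iteration on A^T A.

    Assumes the largest singular value is strictly larger than the second one.
    """
    At = transpose(A)
    v = [1.0] * len(A[0])
    for _ in range(iters):
        w = matvec(At, matvec(A, v))   # w = A^T A v
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]      # renormalize to unit length
    Av = matvec(A, v)
    sigma = math.sqrt(sum(x * x for x in Av))  # sigma = ||A v||
    u = [x / sigma for x in Av]
    return sigma, u, v

# Hypothetical social graph: users 0, 1, 2 form a triangle; user 3 only knows user 2.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
sigma, u, v = top_singular_triplet(A)
# Rank-1 reconstruction: scores[i][j] = sigma * u[i] * v[j]. A large score where
# A has a zero (a missing edge) marks a candidate link.
scores = [[sigma * ui * vj for vj in v] for ui in u]
print(round(sigma, 3))
print(round(scores[0][3], 3), round(scores[1][3], 3))
```

Keeping more components (rank $k > 1$) captures more structure. In practice, regularization and the handling of unobserved entries matter a great deal.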
35
votes
5 answers
Why is Poisson regression used for count data?
I understand that it performs better for certain datasets, such as voting. Why is Poisson regression used instead of ordinary linear regression or logistic regression? What is the mathematical motivation for it?

zaxtax
- 523
- 1
- 5
- 8
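For the Poisson regression question above, the usual motivation is that counts are non-negative integers whose variance tends to grow with the mean. Poisson regression models $\log E[y] = b_0 + b_1 x$, which has two consequences:

- predicted means stay positive;
- the likelihood assumes $\mathrm{Var}(y) = E[y]$.

Ordinary linear regression allows negative predictions and assumes constant variance. Logistic regression models a probability bounded by 1 rather than an unbounded count.

The sketch below fits the two-parameter model by Newton-Raphson. With the canonical log link this coincides with Fisher scoring / IRLS. It is a minimal illustration with no offsets, regularization, or convergence checks. In practice one would use R's `glm(y ~ x, family = poisson)` or statsmodels' `GLM` with `sm.families.Poisson()`. The data are hypothetical.

```python
import math

def fit_poisson_regression(x, y, iters=50):
    """Fit log(E[y]) = b0 + b1 * x by Newton-Raphson on the Poisson log-likelihood."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score (gradient of the log-likelihood): sum of (y - mu) * [1, x]
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix: sum of mu * [1, x][1, x]^T
        i00 = sum(mu)
        i01 = sum(mi * xi for xi, mi in zip(x, mu))
        i11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step b += I^{-1} g, with the 2x2 inverse written out
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (-i01 * g0 + i00 * g1) / det
    return b0, b1

# Hypothetical counts whose expected value grows multiplicatively with x.
x = [0, 1, 2, 3, 4, 5]
y = [1, 2, 2, 5, 8, 13]
b0, b1 = fit_poisson_regression(x, y)
print(f"b0={b0:.3f} b1={b1:.3f}  rate ratio per unit of x = {math.exp(b1):.3f}")
```

Because of the log link, $e^{b_1}$ is read as a multiplicative change in the expected count per unit of $x$.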
35
votes
6 answers
Sampling for Imbalanced Data in Regression
There have been good questions on handling imbalanced data in the classification context, but I am wondering what people do to sample for regression.
Say the problem domain is very sensitive to the sign but only somewhat sensitive to the magnitude…

someben
- 738
- 1
- 6
- 11
35
votes
5 answers
Can you overfit by training machine learning algorithms using CV/Bootstrap?
This question may well be too open-ended to get a definitive answer, but hopefully not.
Machine learning algorithms such as SVM, GBM, and random forests generally have some free parameters that, beyond some rule-of-thumb guidance, need to be tuned…

Bogdanovist
- 6,059
- 1
- 23
- 28
35
votes
2 answers
Scale a number into a range
I have been trying to build a system that can scale a number down so that it falls between two bounds, but I am stuck on the mathematical part.
What I'm thinking is: say the number 200 is to be normalized so that it falls within a range, say 0 to…

Saneesh B
- 453
- 1
- 5
- 5
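For the range-scaling question above, the usual tool is a linear (min-max) map. Find where the value sits in its original range as a fraction from 0 to 1, then stretch that fraction onto the target range:

$$\text{new} = \text{new}_{\min} + \frac{\text{value} - \text{old}_{\min}}{\text{old}_{\max} - \text{old}_{\min}} \, (\text{new}_{\max} - \text{new}_{\min}).$$

The question's target range is cut off, so the ranges in the example call below are hypothetical.

```python
def rescale(value, old_min, old_max, new_min, new_max):
    """Linearly map value from [old_min, old_max] onto [new_min, new_max]."""
    if old_max == old_min:
        raise ValueError("old range has zero width")
    fraction = (value - old_min) / (old_max - old_min)  # position in the old range, 0..1
    return new_min + fraction * (new_max - new_min)

# Hypothetical example: 200 on an assumed original scale of 0-1000, mapped to 0-10.
print(rescale(200, 0, 1000, 0, 10))
```

Values outside the old range map outside the new range. Clamp the result if that is not wanted.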
35
votes
7 answers
How to generate numbers based on an arbitrary discrete distribution?
How do I generate numbers based on an arbitrary discrete distribution?
For example, I have a set of numbers that I want to generate. Say they are labelled from 1-3 as follows.
1: 4%, 2: 50%, 3: 46%
Basically, the percentages are probabilities that…

FurtiveFelon
- 521
- 2
- 5
- 6
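For the discrete-distribution question above, a standard approach is inverse transform sampling:

1. Build the cumulative probabilities, here 0.04, 0.54, 1.00.
2. Draw $u$ uniformly from $[0, 1)$.
3. Return the first value whose cumulative probability exceeds $u$.

The sketch below uses `bisect` for the lookup. Python's standard library also offers `random.choices(values, weights=probs)`, which does the same job directly.

```python
import bisect
import itertools
import random

def make_sampler(values, probs):
    """Inverse-CDF sampler for a finite discrete distribution."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    cdf = list(itertools.accumulate(probs))  # e.g. [0.04, 0.54, 1.0]

    def sample(u=None):
        u = random.random() if u is None else u  # u ~ Uniform[0, 1)
        i = bisect.bisect_right(cdf, u)          # first index with cdf[i] > u
        return values[min(i, len(values) - 1)]   # guard against float round-off at the top

    return sample

sample = make_sampler([1, 2, 3], [0.04, 0.50, 0.46])
draws = [sample() for _ in range(10_000)]
print({v: draws.count(v) / len(draws) for v in (1, 2, 3)})
```

Each draw costs a binary search over the cumulative table. For very many draws from a fixed distribution, the alias method gives constant time per draw.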
35
votes
2 answers
Calculate Transition Matrix (Markov) in R
Is there a way in R (a built-in function) to calculate the transition matrix of a Markov chain from a set of observations?
For example, take a data set like the following and calculate the first-order transition…

B_Miner
- 7,560
- 20
- 81
- 144
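For the transition-matrix question above, the maximum-likelihood estimate from one observed sequence is a counting exercise. Divide the number of transitions $i \to j$ by the total number of transitions out of $i$.

This is a language-agnostic sketch, not an R built-in. In R, the same counts can be obtained with `table()` on the consecutive pairs `x[-length(x)]` and `x[-1]`, then normalized by row with `prop.table(tab, 1)`. The example sequence is hypothetical.

```python
from collections import Counter, defaultdict

def transition_matrix(sequence):
    """First-order Markov transition probabilities estimated from one sequence.

    Returns (states, P) where P[i][j] = count(states[i] -> states[j]) / count(states[i] -> anything).
    """
    states = sorted(set(sequence))
    counts = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    P = []
    for s in states:
        total = sum(counts[s].values())
        # A state seen only at the end of the sequence has no outgoing transitions.
        P.append([counts[s][t] / total if total else 0.0 for t in states])
    return states, P

states, P = transition_matrix(list("AABBBCAACB"))
for s, row in zip(states, P):
    print(s, [round(p, 2) for p in row])
```

With several independent sequences, sum the counts across sequences before normalizing.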
35
votes
5 answers
Can SVM do stream learning one example at a time?
I have a streaming data set in which examples become available one at a time, and I need to do multi-class classification on them. As soon as I feed a training example to the learning process, I have to discard it. Concurrently, I'm also using the…

siamii
- 1,767
- 5
- 21
- 29
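For the streaming SVM question above, a linear SVM can be trained online by stochastic sub-gradient descent on the regularized hinge loss, so each example is used once and then dropped. The sketch below follows the Pegasos update with step size $1/(\lambda t)$.

Several details are choices made for illustration, not part of the question:

- multiple classes are handled one-vs-rest;
- $\lambda = 0.01$;
- a constant feature supplies the bias, which is therefore regularized too;
- the three-cluster stream is hypothetical.

Kernel SVMs do not carry over this simply.

```python
import itertools

class OnlineLinearSVM:
    """One-vs-rest linear SVM trained by Pegasos-style SGD on the hinge loss.

    Each partial_fit call sees a single (x, label) pair, which can be discarded afterwards.
    """

    def __init__(self, classes, n_features, lam=0.01):
        self.classes = list(classes)
        self.lam = lam
        self.t = 0
        # One weight vector per class; the extra slot is the bias (constant feature 1.0).
        self.w = {c: [0.0] * (n_features + 1) for c in self.classes}

    @staticmethod
    def _dot(w, x):
        return sum(wi * xi for wi, xi in zip(w, x))

    def partial_fit(self, x, label):
        x = list(x) + [1.0]
        self.t += 1
        eta = 1.0 / (self.lam * self.t)  # Pegasos step size
        for c, w in self.w.items():
            y = 1.0 if c == label else -1.0
            margin = y * self._dot(w, x)  # computed with the pre-update weights
            shrink = 1.0 - eta * self.lam  # step on the L2 regularizer
            for i in range(len(w)):
                w[i] *= shrink
                if margin < 1.0:  # hinge loss active: step toward the example
                    w[i] += eta * y * x[i]

    def predict(self, x):
        x = list(x) + [1.0]
        return max(self.classes, key=lambda c: self._dot(self.w[c], x))

# Hypothetical stream: three well-separated 2-D clusters arriving one example at a time.
stream = itertools.islice(
    itertools.cycle([((4, 0), "a"), ((-4, 0), "b"), ((0, 4), "c")]), 900
)
clf = OnlineLinearSVM(classes="abc", n_features=2)
for x, label in stream:
    clf.partial_fit(x, label)  # the example is not stored after this call
print(clf.predict((3.5, 0.5)), clf.predict((-3, 1)), clf.predict((0.5, 3)))
```

Memory stays fixed at one weight vector per class, no matter how long the stream runs.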
35
votes
8 answers
What is a standard deviation?
What is a standard deviation, how is it calculated and what is its use in statistics?

Oren Hizkiya
- 851
- 2
- 11
- 10
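For the standard-deviation question above, the standard deviation is the square root of the variance, that is, of the average squared deviation from the mean. It measures spread in the same units as the data. Dividing by $n$ treats the values as a whole population. Dividing by $n - 1$ (Bessel's correction) gives the usual estimate from a sample.

The sketch below computes both forms by hand and cross-checks them against Python's `statistics.pstdev` and `statistics.stdev`. The data are hypothetical.

```python
import math
import statistics

def std_dev(values, sample=True):
    """Square root of the mean squared deviation from the mean.

    sample=True divides by n - 1 (sample estimate); sample=False divides by n (population).
    """
    n = len(values)
    if sample and n < 2:
        raise ValueError("need at least two values for a sample standard deviation")
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_deviations / (n - 1 if sample else n))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5; squared deviations sum to 32
print(std_dev(data, sample=False), statistics.pstdev(data))  # population SD: sqrt(32 / 8) = 2
print(std_dev(data), statistics.stdev(data))                  # sample SD: sqrt(32 / 7)
```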
35
votes
6 answers
Changing the scale of a variable to 0-100
I have constructed a social capital index using PCA. The index has both positive and negative values. I want to transform it to a 0-100 scale to make it easier to interpret. What is the easiest way to do so?

Sohail Akram
- 359
- 1
- 4
- 3
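For the 0-100 question above, a common choice is min-max rescaling on the observed scores. The smallest score maps to 0 and the largest to 100, and the ordering and relative spacing of all scores are preserved.

One caveat: the endpoints are tied to this particular sample. New observations beyond the old minimum or maximum would fall outside 0-100 unless fixed bounds are chosen instead. The index values below are hypothetical.

```python
def to_0_100(scores):
    """Min-max rescale a list of scores (positive or negative) onto 0-100."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        raise ValueError("all scores are equal; the scale is undefined")
    return [100 * (s - lo) / (hi - lo) for s in scores]

# Hypothetical PCA index values, mixing negative and positive scores.
index = [-2.5, -0.4, 0.0, 1.1, 3.5]
print([round(v, 1) for v in to_0_100(index)])
```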