Most Popular

1500 questions
46
votes
6 answers

Why don't linear regression assumptions matter in machine learning?

When I learned linear regression in my statistics class, we were asked to check a few assumptions that need to be true for linear regression to make sense. I won't delve deep into those assumptions; however, these assumptions don't appear when…
46
votes
3 answers

Empirical relationship between mean, median and mode

For a unimodal distribution that is moderately skewed, we have the following empirical relationship between the mean, median and mode: $$ \text{(Mean - Mode)}\sim 3\,\text{(Mean - Median)} $$ How was this relationship derived? Did Karl Pearson…
46
votes
10 answers

How to plot trends properly

I am creating a graph to show trends in death rates (per 1,000 people) in different countries, and the story that should come from the plot is that Germany (light blue line) is the only one whose trend is increasing after 1932. This is my first (basic)…
PhDing
46
votes
1 answer

How is softmax_cross_entropy_with_logits different from softmax_cross_entropy_with_logits_v2?

Specifically, I suppose I wonder about this statement: Future major versions of TensorFlow will allow gradients to flow into the labels input on backprop by default. Which is shown when I use tf.nn.softmax_cross_entropy_with_logits. In the…
46
votes
8 answers

How to do community detection in a weighted social network/graph?

I'm wondering if someone could suggest good starting points for performing community detection/graph partitioning/clustering on a graph that has weighted, undirected edges. The graph in question has approximately 3 million…
46
votes
1 answer

Variational inference versus MCMC: when to choose one over the other?

I think I get the general idea of both VI and MCMC, including the various flavors of MCMC like Gibbs sampling, Metropolis-Hastings, etc. This paper provides a wonderful exposition of both methods. I have the following questions: If I wish to do…
46
votes
2 answers

How do you do bootstrapping with time series data?

I recently learned about using bootstrapping techniques to calculate standard errors and confidence intervals for estimators. What I learned was that if the data is IID, you can treat the sample data as the population, and do sampling with…
statnub
46
votes
6 answers

Why do we need multivariate regression (as opposed to a bunch of univariate regressions)?

I just browsed through this wonderful book: Applied Multivariate Statistical Analysis by Johnson and Wichern. The irony is that I am still not able to understand the motivation for using multivariate (regression) models instead of separate univariate…
46
votes
4 answers

What references should be cited to support using 30 as a large enough sample size?

I have read/heard many times that a sample size of at least 30 units is considered a "large sample" (the normality assumption for means usually holds approximately due to the CLT, ...). Therefore, in my experiments, I usually generate samples of 30…
46
votes
10 answers

Deriving Bellman's Equation in Reinforcement Learning

I see the following equation in "Reinforcement Learning: An Introduction", but don't quite follow the step I have highlighted in blue below. How exactly is this step derived?
Amelio Vazquez-Reina
46
votes
6 answers

Does the reciprocal of a probability represent anything?

I was wondering if the reciprocal of P(X = 1) represents anything in particular?
A. Fleming
46
votes
7 answers

How to choose between ROC AUC and F1 score?

I recently completed a Kaggle competition in which the ROC AUC score was used, as per the competition requirements. Before this project, I normally used the F1 score as the metric to measure model performance. Going forward, I wonder how I should choose between…
George Liu
46
votes
3 answers

How are Random Forests not sensitive to outliers?

I've read in a few sources, including this one, that Random Forests are not sensitive to outliers (in the way that Logistic Regression and other ML methods are, for example). However, two pieces of intuition tell me otherwise: Whenever a decision…
makansij
46
votes
2 answers

Poisson regression to estimate relative risk for binary outcomes

Brief Summary: Why is it more common for logistic regression (with odds ratios) to be used in cohort studies with binary outcomes, as opposed to Poisson regression (with relative risks)? Background: Undergraduate and graduate statistics and…
46
votes
1 answer

What is the difference between Metropolis-Hastings, Gibbs, Importance, and Rejection sampling?

I have been trying to learn MCMC methods and have come across Metropolis-Hastings, Gibbs, Importance, and Rejection sampling. While some of these differences are obvious, e.g., how Gibbs is a special case of Metropolis-Hastings when we have the full…