Most Popular

1500 questions
270 votes, 6 answers

What is batch size in neural network?

I'm using the Python Keras package for neural networks. This is the link. Is batch_size equal to the number of test samples? From Wikipedia we have this information: However, in other cases, evaluating the sum-gradient may require expensive evaluations…
user2991243
269 votes, 6 answers

Is $R^2$ useful or dangerous?

I was skimming through some lecture notes by Cosma Shalizi (in particular, section 2.1.1 of the second lecture), and was reminded that you can get very low $R^2$ even when you have a completely linear model. To paraphrase Shalizi's example: suppose…
raegtin
265 votes, 13 answers

Is there any reason to prefer the AIC or BIC over the other?

The AIC and BIC are both methods of assessing model fit penalized for the number of estimated parameters. As I understand it, BIC penalizes models more for free parameters than does AIC. Beyond a preference based on the stringency of the criteria,…
russellpierce
262 votes, 10 answers

How would you explain covariance to someone who understands only the mean?

...assuming that I'm able to augment their knowledge about variance in an intuitive fashion (Understanding "variance" intuitively) or by saying: It's the average distance of the data values from the 'mean' - and since variance is in square units,…
PhD
261 votes, 11 answers

How would you explain Markov Chain Monte Carlo (MCMC) to a layperson?

Maybe the concept, why it's used, and an example.
Neil McGuigan
254 votes, 3 answers

How to know that your machine learning problem is hopeless?

Imagine a standard machine-learning scenario: You are confronted with a large multivariate dataset and you have a pretty blurry understanding of it. What you need to do is to make predictions about some variable based on what you have. As…
Tim
252 votes, 15 answers

What are the differences between Factor Analysis and Principal Component Analysis?

It seems that a number of the statistical packages that I use wrap these two concepts together. However, I'm wondering if there are different assumptions or data 'formalities' that must be true to use one over the other. A real example would be…
Brandon Bertelsen
247 votes, 46 answers

What are common statistical sins?

I'm a grad student in psychology, and as I pursue more and more independent studies in statistics, I am increasingly amazed by the inadequacy of my formal training. Both personal and second-hand experience suggest that the paucity of statistical…
Mike Lawrence
242 votes, 7 answers

How to choose a predictive model after k-fold cross-validation?

I am wondering how to choose a predictive model after doing K-fold cross-validation. This may be awkwardly phrased, so let me explain in more detail: whenever I run K-fold cross-validation, I use K subsets of the training data, and end up with K…
Berk U.
231 votes, 38 answers

What is the best introductory Bayesian statistics textbook?

Which is the best introductory textbook for Bayesian statistics? One book per answer, please.
Shane
228 votes, 8 answers

Algorithms for automatic model selection

I would like to implement an algorithm for automatic model selection. I am thinking of doing stepwise regression but anything will do (it has to be based on linear regressions though). My problem is that I am unable to find a methodology, or an…
S4M
226 votes, 4 answers

ROC vs precision-and-recall curves

I understand the formal differences between them; what I want to know is when it is more relevant to use one vs. the other. Do they always provide complementary insight about the performance of a given classification/detection system? When is it…
Amelio Vazquez-Reina
222 votes, 4 answers

When (and why) should you take the log of a distribution (of numbers)?

Say I have some historical data, e.g., past stock prices, airline ticket price fluctuations, past financial data of the company... Now someone (or some formula) comes along and says "let's take/use the log of the distribution" and here's where I go…
PhD
219 votes, 13 answers

What is the difference between data mining, statistics, machine learning and AI?

What is the difference between data mining, statistics, machine learning and AI? Would it be accurate to say that they are 4 fields attempting to solve very similar problems but with different approaches? What exactly do they have in common and…
Olivier Lalonde
219 votes, 9 answers

Why is Newton's method not widely used in machine learning?

This is something that has been bugging me for a while, and I couldn't find any satisfactory answers online, so here goes: After reviewing a set of lectures on convex optimization, Newton's method seems to be a far superior algorithm to gradient…
Fei Yang