What should be the optimal ratio of sample size to the number of parameters in a multiple regression model?

I ask because I would like to improve prediction accuracy. Some sources suggest a ratio of 3 to 1, whereas others suggest 10 to 1. Are there other recommendations?

Jeromy Anglim
beginner
    That depends on many things, for example, your exact definition of optimal or how strongly correlated the variables are. As a consequence, I don't think there can be a general answer to that question. – Maarten Buis Jun 05 '13 at 13:17
    What are you trying to optimize, exactly? – Glen_b Jun 05 '13 at 13:35
    See also this previous question on [sample sizes for multiple regression](http://stats.stackexchange.com/questions/10079/rules-of-thumb-for-minimum-sample-size-for-multiple-regression) – Jeromy Anglim Aug 05 '13 at 07:47
    The whole population is only 76 subjects and is defined as a finalized population. I performed a multiple regression. How many variables can I test on this population? Must I also have a ratio of at least 10 subjects for each variable tested here? Thanks, MS –  Dec 28 '13 at 10:43

1 Answer

"Should" is a dangerous word. It presupposes a measure of goodness without stating it explicitly.

If you have a noiseless system, then a 1:1 ratio is sometimes acceptable. If you are trying to prove, using pass-fail tests with a 95% CI, that your maximum error rate is under 2%, then you might want at least 400 samples, giving you a ratio in the hundreds.
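As a rough illustration of where sample counts like this come from, the sketch below computes the smallest number of pass-fail trials needed to bound an error rate, assuming an exact one-sided binomial test. The answer does not say how the 400 figure was derived, so this assumption, the function names `binom_cdf` and `min_samples`, and the `max_failures` parameter are illustrative only.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def min_samples(p_max, max_failures=0, confidence=0.95):
    """Smallest n such that seeing at most `max_failures` failures in n
    trials would be too unlikely (probability <= 1 - confidence) if the
    true error rate were p_max.

    Assumption: independent trials and an exact one-sided binomial bound.
    """
    alpha = 1.0 - confidence
    n = max_failures + 1
    # Increase n until the observed outcome is inconsistent with p_max.
    while binom_cdf(max_failures, n, p_max) > alpha:
        n += 1
    return n


print(min_samples(0.02))                  # zero failures allowed: 149
print(min_samples(0.02, max_failures=3))  # a few failures allowed: more trials
```

Under these assumptions, a run with zero observed failures needs 149 trials to support "error rate under 2%" at 95% confidence, and tolerating a few observed failures raises the requirement further into the hundreds.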

If you have a slow-learning Kalman filter and the right (that is, wrong) sorts of noise, then you might want thousands of measurements for a few parameters.

Bottom line: your mileage is going to vary depending on the nature of the information you get from each sample and what you are trying to do with it. My personal rule of thumb is that I prefer to get 30 samples per parameter.
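To give the competing rules of thumb some scale for prediction, here is a minimal sketch under strong textbook assumptions that the answer does not state: ordinary least squares, no intercept, zero-mean Gaussian predictors, and independent noise with constant variance. In that setting the expected out-of-sample squared error is the noise variance times 1 + p/(n - p - 1) for n > p + 1. The function name `risk_inflation` and the choice of p = 10 are hypothetical.

```python
def risk_inflation(n, p):
    """Expected out-of-sample squared error of OLS, as a multiple of the
    noise variance, assuming zero-mean Gaussian predictors, no intercept,
    and a correctly specified linear model."""
    if n <= p + 1:
        raise ValueError("formula requires n > p + 1")
    return 1.0 + p / (n - p - 1)


p = 10  # hypothetical number of parameters
for ratio in (3, 10, 30):
    n = ratio * p
    print(f"{ratio}:1 (n={n}) -> {risk_inflation(n, p):.3f}")
# 3:1 (n=30) -> 1.526
# 10:1 (n=100) -> 1.112
# 30:1 (n=300) -> 1.035
```

With these assumptions, going from 3:1 to 30:1 shrinks the excess prediction error from roughly 53% to about 3% of the noise variance. Correlated predictors, non-Gaussian data, or a misspecified model would change the numbers.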

EngrStudent