
I ran a hedonic regression on a dataset in R. All the variables are highly significant, and $R^2 = 0.6123$.

I then ran the same regression on a subset of this dataset, namely the observations that meet a certain condition. Many of the variables that were previously significant no longer are, and $R^2 = 0.2509$.

The problem is that, in my view, these variables should not be affected this much.

Can anyone explain this? Does it simply mean that these variables are no longer significant in this case, or could it point to a deeper problem?

Thank you.

  • Here is a test you could perform: randomly extract several subsets of the same size as this one and compare the regression results. – James Phillips Aug 16 '19 at 17:03
  • @JamesPhillips I have done what you suggested and edited the question to add screenshots of the results. – Ricardo T. Aug 16 '19 at 18:09
  • Please see the example I posted at https://stats.stackexchange.com/a/13317/919. – whuber Aug 16 '19 at 20:17
  • @whuber First of all, thank you very much for sharing this post; I hope I have understood everything you meant to explain in it. I understand, then, that $R^2$ decreases when the regression is run on different subsets of the data. Is it normal, when studying these subsets, for some variables to lose significance, contrary to what was expected a priori? Should I worry about this lack of significance in my key variables, or accept it as a valid result even though it differs from what was expected? Is there a way to test this? – Ricardo T. Aug 17 '19 at 11:31
  • I didn't mean to indicate $R^2$ decreases on subsets: it very well could increase. For instance, it will always equal $1$ on any two-element subset. The point worth learning is that $R^2$ depends greatly on the ranges of the regressor variables. As far as individual variable significance goes, you need to understand "significant" as meaning "detectable," which ought to make it clear that when you limit the data, your ability to detect values decreases. Thus it's no surprise that some of the variables are not found significant in the subset regressions. – whuber Aug 17 '19 at 14:00
  • @whuber Perfect, thank you very much for your explanation and time – Ricardo T. Aug 18 '19 at 13:43
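
The two suggestions in the comments (compare against random subsets of the same size, and note that $R^2$ depends on the range of the regressors) can be tried on simulated data. The following is only a minimal sketch, not the asker's hedonic model: it uses Python's standard library rather than R, a single regressor, and made-up coefficients, noise level, and condition (`4 <= x <= 6`). The helper `ols_simple` and all variable names are hypothetical.

```python
import random

def ols_simple(x, y):
    """Fit y = a + b*x by least squares; return (slope, t_statistic, r_squared)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    r2 = 1 - sse / syy
    if n > 2 and sse > 0:
        # Standard error of the slope uses n - 2 residual degrees of freedom.
        t_stat = slope / (sse / (n - 2) / sxx) ** 0.5
    else:
        # With two points the fit is exact and no t statistic is defined.
        t_stat = float("inf")
    return slope, t_stat, r2

# Simulated data (assumed values): the same linear relationship holds everywhere.
random.seed(0)
n = 2000
x = [random.uniform(0, 10) for _ in range(n)]
y = [2 + 0.5 * xi + random.gauss(0, 2) for xi in x]

full = ols_simple(x, y)

# Subset defined by a condition that narrows the range of the regressor.
idx_cond = [i for i in range(n) if 4 <= x[i] <= 6]
cond = ols_simple([x[i] for i in idx_cond], [y[i] for i in idx_cond])

# Random subset of the same size, which keeps the full range of the regressor.
idx_rand = random.sample(range(n), len(idx_cond))
rand = ols_simple([x[i] for i in idx_rand], [y[i] for i in idx_rand])

# Any two-element subset is fitted exactly by a line.
two = ols_simple(x[:2], y[:2])

for name, (b, t, r2) in [("full", full), ("condition", cond),
                         ("random", rand), ("two points", two)]:
    print(f"{name:>10}: slope={b:.3f}  t={t:.2f}  R^2={r2:.4f}")
```

In this setup the true slope is identical in every subset, yet the conditional subset should show a much lower $R^2$ and a much weaker t statistic than a random subset of the same size, because the condition shrinks the spread of the regressor. If random subsets of your data behave like the full sample while the conditional subset does not, a restricted range is one plausible explanation, though it does not rule out a genuinely different relationship in that subset.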
