
I'm fitting a generalised linear model with a binomial family for two variables, A and B. From looking at the data, it appears that the ratio A/B may have a discriminatory effect. Is it sensible to include A/B as an additional term in the model? Thinking about this myself, this term would be the product of A with the inverse of B (A multiplied by 1/B), and is therefore not the same as including the interaction term A:B.

EDIT 1:

Having thought about this a little bit more, if I include variables on a log scale would this be the same as including a ratio when the parameter estimates are negative?
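
One way to look at this, as a sketch only: assume a logit link and strictly positive A and B, and write $b_0, b_1, b_2$ for the coefficients (notation introduced here, not taken from any fitted model). Then

$$\operatorname{logit}(p) = b_0 + b_1\ln A + b_2\ln B = b_0 + b_1\ln\frac{A}{B} + (b_1 + b_2)\ln B$$

so a model in the log-ratio alone is the special case $b_2 = -b_1$. With $b_1 > 0$ and $b_2 < 0$ in general, the linear predictor is $b_0 + \ln\left(A^{b_1}/B^{|b_2|}\right)$, a ratio of powers on the log scale, which is related to but not the same as adding the raw ratio A/B as a term.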

EDIT 2:

I am experimenting with machine learning and have been exploring the example data sets in R packages. I am currently using data(crabs) from the MASS package and trying to predict crab sex and species from morphological variables. I first ran a PCA on the variables to identify underlying structure in the data. The loadings of the first component are all positive, and from my understanding of PCA this means the variables are all correlated along this component. The loadings of the second component include both positive and negative values, specifically for carapace width and rear width. A dot plot and density plot of these variables indicate two possible distributions.
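
For reference, a PCA of this kind could be run roughly as follows. This is a minimal sketch, not necessarily the exact call used here: it assumes the five morphological columns FL, RW, CL, CW and BD are the inputs and that the variables are centred and scaled. Note that the sign of each component returned by `prcomp` is arbitrary, so "all positive" and "all negative" loadings on the first component mean the same thing.

```r
library(MASS)  # provides the crabs data set
data(crabs)

# Assumption: the five morphological measurements are the PCA inputs
morph <- crabs[, c("FL", "RW", "CL", "CW", "BD")]

# Assumption: variables are centred and scaled before the PCA
pca <- prcomp(morph, center = TRUE, scale. = TRUE)

# Loadings of the first two components (component signs are arbitrary)
print(round(pca$rotation[, 1:2], 3))

# Proportion of variance explained by each component
print(summary(pca))
```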

[Figure: plot of second component] [Figure: density plot of second component]

Is it therefore sensible to use the following logistic model for predicting species:

species ~ CW + RW + I(CW/RW)
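
In R this could be fitted along the following lines. It is a sketch under the assumption that a plain `glm` with `family = binomial` is what is meant; in `MASS::crabs` the species column is called `sp`, so that name replaces `species` in the formula. The object names `fit_base` and `fit_ratio` are made up for illustration.

```r
library(MASS)  # provides the crabs data set
data(crabs)

# Baseline logistic model with the two widths only
fit_base <- glm(sp ~ CW + RW, family = binomial, data = crabs)

# Same model with the ratio added; I() protects the division inside the formula
fit_ratio <- glm(sp ~ CW + RW + I(CW / RW), family = binomial, data = crabs)

# Likelihood-ratio test of the nested models: does the ratio term add anything?
print(anova(fit_base, fit_ratio, test = "Chisq"))
```

Because CW, RW and CW/RW are likely to be strongly correlated, the individual coefficients may be hard to interpret even if the comparison favours the larger model.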

As Stuart mentioned in the comments, these variables may be correlated with each other and therefore non-independent. I used bootstrapping of the crab data to test the models' accuracy. For the case of predicting sex, including the ratio term does seem to improve the model's predictive accuracy slightly.
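
The bootstrap code itself is not shown above, so the following is only one possible version of such a check, assuming out-of-bag evaluation: each model is fitted on a bootstrap resample and scored on the rows left out of that resample. The helper `oob_accuracy`, the 0.5 cut-off and the 200 replicates are all choices made for this sketch.

```r
library(MASS)  # provides the crabs data set
data(crabs)
set.seed(1)    # arbitrary seed, only for reproducibility

# Hypothetical helper: mean out-of-bag classification accuracy of a logistic model
oob_accuracy <- function(formula, data, n_boot = 200) {
  resp <- all.vars(formula)[1]   # name of the response column
  lev  <- levels(data[[resp]])   # glm models P(response == lev[2])
  acc  <- numeric(n_boot)
  for (b in seq_len(n_boot)) {
    idx   <- sample(nrow(data), replace = TRUE)
    train <- data[idx, ]
    test  <- data[-unique(idx), ]  # rows not drawn into this resample
    # Warnings (e.g. fitted probabilities of 0 or 1) are suppressed in this sketch
    fit   <- suppressWarnings(glm(formula, family = binomial, data = train))
    p     <- predict(fit, newdata = test, type = "response")
    pred  <- ifelse(p > 0.5, lev[2], lev[1])
    acc[b] <- mean(pred == test[[resp]])
  }
  mean(acc)
}

acc_base  <- oob_accuracy(sex ~ CW + RW, crabs)
acc_ratio <- oob_accuracy(sex ~ CW + RW + I(CW / RW), crabs)
print(c(base = acc_base, with_ratio = acc_ratio))
```

Comparing the two numbers gives a rough idea of whether the ratio term helps prediction; the spread of the per-replicate accuracies would be needed to judge whether a small difference is meaningful.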

[Figure: model bootstrapping]

I should also note that including all available measured variables improves the model accuracy but I would like to try and "infer" what the important variables may be using PCA.

kjetil b halvorsen
Michael Barton
  • 1
    You may want to look at [How to ask a statistics question](http://www.statisticalanalysisconsulting.com/how-to-ask-a-statistics-question/). As is, it's hard to answer this. Sometimes a ratio could make sense, I guess. But what's your DV? What's your IV? etc. – Peter Flom Oct 31 '12 at 18:37
  • How would adding variables on a log scale be like adding a ratio of two variables (regardless of sign)? Adding the log of the ratio would be like adding the difference of the logged variables, but that doesn't seem to be what you are saying. – Peter Flom Oct 31 '12 at 18:39
  • 2
    In social sciences ratios are very often included in regressions. For example, household per capita income = household income / number of people in the household. But what I *think* you're asking is about including the ratio, numerator and denominator in the same model. That would be a bit less common. One concern would be that one or other of the numerator or denominator might be strongly correlated with the ratio. Maybe say a bit more about the context and what variables you're thinking of using to get a better answer. – Stuart Nov 01 '12 at 00:48
  • 2
    On using logs, modelling $y = b_1\ln(x_1) + b_2\ln(x_2) + b_3$ would be equivalent to $e^y = x_1^{b_1}x_2^{b_2}e^{b_3}$ and if you estimated this equation you *might* find evidence consistent with, or contradicting, variation in y being explained by variation in the ratio $x_1/x_2$. It would seem simpler just to include the ratio in a linear regression. Again though you may need to be clearer about what you're trying to test. – Stuart Nov 01 '12 at 00:58
  • 2
    If the aim is only to maximise predictive power, rather than accurately calculate coefficients on the individual variables, then I don't think you have to worry about the correlation among the predictor variables and your ratio model could make sense. – Stuart Nov 01 '12 at 21:16
  • Maybe is answered here: https://stats.stackexchange.com/questions/58664/ratios-in-regression-aka-questions-on-kronmal/410465#410465 – kjetil b halvorsen Dec 26 '20 at 21:42
