I am interested in modeling a continuous variable (e.g., second language learners' English proficiency measured in TOEFL scores) as a function of several predictors, some of which are continuous while others are count-based proportions or ratios (e.g., the number of nouns per sentence or the proportion of infrequent words in their essays). Intuition tells me that, when estimating the coefficient associated with such a predictor, observations whose proportions or ratios have larger denominators (e.g., 10/200) should be weighted more heavily than those with smaller denominators (e.g., 1/20). Is there a way to account for this difference in the uncertainty of the predictor values? I thought it might be related to measurement error models, but I have not been able to find relevant information.
-
Thank you for the relevant post. I now see that Kronmal (1993) was generally against using ratios/proportions in regression modeling. When they are the dependent variable, though, I thought one could model them via Poisson (or negative binomial) regression that predicts the numerator and includes the log-transformed denominator as an offset term. If this is acceptable (is it not?), I thought there might be a similar technique to account for denominator differences in predictor variables as well. – Akira Murakami Apr 21 '20 at 02:14
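For readers unfamiliar with the offset idea, it can be written out as follows. This is only a sketch; the symbols $N_i$ (numerator count), $D_i$ (denominator), $\mathbf{x}_i$ (predictors) and $\boldsymbol\beta$ (coefficients) are notation assumed here, not taken from the paper.

$$
N_i \sim \text{Poisson}(\mu_i), \qquad \log \mu_i = \log D_i + \mathbf{x}_i^\top \boldsymbol\beta
\quad\Longleftrightarrow\quad
\frac{\mu_i}{D_i} = \exp\!\left(\mathbf{x}_i^\top \boldsymbol\beta\right)
$$

The coefficient on $\log D_i$ is fixed at 1, so the model describes the expected rate $\mu_i / D_i$, while the Poisson variance lets observations with larger denominators carry more information.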
-
Look again at the paper; it discusses both use cases (as dependent and as independent variable). The use of Poisson regression is clearly acceptable, but that technique cannot be extended to independent variables. See the linked Q and the paper; it is discussed there. – kjetil b halvorsen Apr 21 '20 at 02:34
-
I'm aware Poisson regression and its variants cannot be extended to predictors, but since having a ratio or proportion as a predictor is perhaps not very uncommon, I wonder what people do in those cases (apart from blindly throwing them into models). I actually expected many similar questions and relevant discussion here and elsewhere on the web, but have hardly found any except for the post you linked to. – Akira Murakami Apr 21 '20 at 03:05
-
But did you read those? Maybe try the idea from bullet point 3 of the Q: *Include numerator and (inverse) denominator as main effects, ratio as interaction term.* – kjetil b halvorsen Apr 21 '20 at 03:27
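A minimal sketch of what that bullet point could look like in practice, assuming an ordinary least-squares fit with four columns: intercept, numerator, inverse denominator, and their product (which is the ratio itself). Everything here is hypothetical illustration: the function names (`solve_linear`, `fit_ols`, `design_row`), the simulated counts, and the "true" coefficients. The outcome is generated without noise, purely to show that the design matrix is well defined. Only the Python standard library is used; in real work a regression package would be the natural choice.

```python
import random


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear(xtx, xty)


def design_row(numerator, denominator):
    """Intercept, numerator, inverse denominator, and their product (the ratio)."""
    inv = 1.0 / denominator
    return [1.0, float(numerator), inv, numerator * inv]


# Hypothetical data: (numerator, denominator) pairs, e.g. infrequent words / total words.
rng = random.Random(0)
counts = [(rng.randint(1, 15), rng.randint(20, 200)) for _ in range(200)]

# Assumed coefficients, used only to simulate a noise-free outcome.
true_beta = [1.0, 0.5, -3.0, 4.0]
X = [design_row(n, d) for n, d in counts]
y = [sum(b * x for b, x in zip(true_beta, row)) for row in X]

beta_hat = fit_ols(X, y)
print([round(b, 4) for b in beta_hat])
```

The last coefficient is the effect of the ratio after adjusting for the numerator and the inverse denominator separately. Note that this addresses the interpretation problem with ratio predictors, not the differing precision of the ratios themselves.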
-
Sorry, I was mainly reading your comments on the linked question, as I thought the math might be beyond my understanding. I will have a look at the paper more closely, though. Thank you again for your pointer. – Akira Murakami Apr 21 '20 at 03:49