Are ratio response variables problematic in linear regression?

Question

Is it problematic to use the ratio of two measurements as a response variable in linear regression?

Motivating Example: Modeling Differences

My motivation comes from the following understanding of differences of two measurements as response variables:

To understand how something changes over time using two measurements in series, people often model as:

(T2_i - T1_i) = b0 + e_i

By rearrangement, this is equivalent to:

T2_i = b0 + 1.0 * T1_i + e_i

So the slope relating T2 to T1 above is fixed at 1 (a 1:1 relationship) rather than estimated.

Better would be:

T2_i = b0 + b1 * T1_i + e_i

so that the relationship between T1 and T2 is estimated, not assumed. If indeed the relationship is not 1:1, this model will be better than the difference model.
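To make the comparison concrete, here is a minimal simulation sketch. The data are simulated, and the true slope of 0.5, the intercept of 3.0, the noise level, and the variable names are all assumptions chosen for illustration. It fits the difference model (slope fixed at 1) and the estimated-slope model to the same data and compares their residual sums of squares.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated measurements; the true slope (0.5) is an assumption for illustration.
t1 = rng.normal(10.0, 2.0, size=n)
t2 = 3.0 + 0.5 * t1 + rng.normal(0.0, 1.0, size=n)

# Difference model: (T2 - T1) = b0 + e, which fixes the slope on T1 at 1.
b0_diff = np.mean(t2 - t1)
rss_diff = np.sum((t2 - t1 - b0_diff) ** 2)

# Estimated-slope model: T2 = b0 + b1 * T1 + e, fit by ordinary least squares.
X = np.column_stack([np.ones(n), t1])
coef = np.linalg.lstsq(X, t2, rcond=None)[0]
rss_full = np.sum((t2 - X @ coef) ** 2)

print("estimated slope b1:", coef[1])
print("RSS, difference model:", rss_diff)
print("RSS, estimated-slope model:", rss_full)
```

Because the difference model is the estimated-slope model with b1 constrained to 1, its residual sum of squares can never be smaller; how much larger it is depends on how far the true slope is from 1.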

My Question: What About Ratios?

Suppose we are analyzing the quantities of two parts that make a whole as a ratio, or a density in which we divide count data by effort data. So ...

(P1_i / P2_i) = b0 + e_i

By rearrangement:

P1_i = P2_i * (b0 + e_i)

If we distribute:

P1_i = b0 * P2_i + e_i * P2_i

What (on Earth) does this assume?
Is it problematic?
What if we have a linear predictor in the model? We end up with:

P1_i = b0 * P2_i + b1 * X1_i * P2_i + e_i * P2_i

Is it now true that b0 is an estimate of the relationship between P2 and P1, and that b1 is an estimate of the interactive effect of X1 and P2? It shouldn't be, because now there is no intercept, and no estimate of the main effect of X1 is made. And what of the error term being multiplied by P2 -- that can't be good ... ?
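A small simulation sketch of what the multiplied error term implies, for the intercept-only ratio model. All values (b0 = 2.0, the range of P2, the error standard deviation) are made up for illustration. If e_i has constant variance on the ratio scale, then on the P1 scale the error e_i * P2_i has a spread that grows with P2. Algebraically, the mean of the ratios equals a no-intercept weighted least squares fit of P1 on P2 with weights 1 / P2^2, since minimizing sum((P1 - b * P2)^2 / P2^2) is the same as minimizing sum((P1 / P2 - b)^2).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Simulated data from the ratio model P1 / P2 = b0 + e (all values assumed).
p2 = rng.uniform(1.0, 10.0, size=n)
e = rng.normal(0.0, 0.2, size=n)
b0 = 2.0
p1 = p2 * (b0 + e)

# Intercept-only ratio model: the estimate of b0 is the mean ratio.
ratio_mean = np.mean(p1 / p2)

# No-intercept weighted least squares of P1 on P2 with weights 1 / P2^2.
w = 1.0 / p2**2
b_wls = np.sum(w * p2 * p1) / np.sum(w * p2**2)

# Residuals on the P1 scale: their spread should grow with P2.
resid = p1 - ratio_mean * p2
sd_low = resid[p2 < 4.0].std()
sd_high = resid[p2 > 7.0].std()

print("mean ratio:", ratio_mean, " WLS slope:", b_wls)
print("residual SD for small P2:", sd_low, " for large P2:", sd_high)
```

So, under these assumptions, the ratio model is a regression through the origin with error variance proportional to P2^2. Whether that is a problem depends on whether the data plausibly have no intercept and that variance structure.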

A better model would appear to be:

P1_i = b0 + b1 * P2_i + b2 * X1_i + e_i

Now we are predicting P1 controlling for P2. Don't we get the same information here?

It is clearer that we do get the information we're looking for when thinking instead about density, where P1 is, say, the number of objects observed and P2 is the effort spent sampling for them. Controlling for effort makes perfect sense, but is it equally valid to divide by effort to create the response variable?
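A hedged sketch of the density case, using simulated counts. The Poisson assumption, the true density of 3.0 objects per unit effort, and the variable names are mine, not part of any real data set. It compares the mean of count / effort with the slope from regressing count on effort with an intercept.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000

# Simulated sampling: expected count is proportional to effort (assumed).
effort = rng.uniform(1.0, 20.0, size=n)
true_density = 3.0
count = rng.poisson(true_density * effort)

# Ratio response: density = count / effort, intercept-only model.
density_hat = np.mean(count / effort)

# Covariate approach: count = b0 + b1 * effort + e, fit by least squares.
X = np.column_stack([np.ones(n), effort])
coef = np.linalg.lstsq(X, count, rcond=None)[0]
intercept, slope = coef[0], coef[1]

print("mean of count / effort:", density_hat)
print("intercept:", intercept, " slope on effort:", slope)
```

When counts really are proportional to effort, as simulated here, the two approaches recover similar information: the slope on effort is close to the mean density and the intercept is close to zero. They can diverge when the true relationship has a nonzero intercept or is not linear in effort, and that is the case in which dividing by effort is the stronger assumption.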

See: https://stats.stackexchange.com/questions/58664/ratios-in-regression-aka-questions-on-kronmal/410465#410465 — kjetil b halvorsen, Feb 02 '20 at 02:51
