I have collected people's reactions to pictures (rating 0-100). The pictures are either greyscale or in colour (categorical predictor) and they vary in resolution (continuous predictor). I want to use colour and resolution to predict the rating with a linear model:

lm(rating ~ colour * resolution)

However, when I performed a t-test to see whether colour predicts resolution, I found a significant difference with greyscale pictures having a higher resolution than colour pictures.

Is the dependence of one predictor on the other a problem for the linear model?

If yes, what could I do to rectify this situation?

If not, and I can still use the linear model, can I interpret a significant main effect of colour as an effect that is independent of resolution?

kjetil b halvorsen
Max
  • What do you want to use your model to do? If you want to use it to say that greyscale pictures are rated lower than color pictures, but you know that high-resolution pictures tend to be rated higher than low-resolution pictures, then, yes, you could confuse your model by having higher resolution greyscale images than color images. The predicted ratings will be legitimate (probably), but the interpretation gets messy. If all that matters to you is prediction, then there is less of a problem, but it sounds like you want to interpret the "colour" coefficient. – Dave Nov 03 '21 at 14:19
  • Yes, I would like to interpret the "colour" coefficient. In an ideal scenario, I would like to be able to tell apart the effect of "colour", "resolution" and the interaction of both. – Max Nov 05 '21 at 13:39

1 Answer

Your linear regression models rating conditional on colour and resolution. Dependencies (correlations, associations) between the predictor variables are not in themselves a problem, provided they are not too severe (such as a correlation of $\pm 1$).
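
As a rough illustration, here is a minimal simulation sketch in R. All data and effect sizes are made up for the example (they are assumptions, not the asker's data), and the column name `resolution_c` is hypothetical. It builds predictors that are associated, as in the question, and fits the model from the question with resolution centred. With an interaction in the model, the colour coefficient is the colour difference at `resolution_c = 0`, so centring makes it the difference at the average resolution rather than at a resolution of zero.

```r
set.seed(1)
n <- 200

# Hypothetical data: half the pictures greyscale, half colour
colour <- factor(rep(c("grey", "colour"), each = n / 2),
                 levels = c("colour", "grey"))

# Greyscale pictures get a higher mean resolution (assumed values)
resolution <- rnorm(n, mean = ifelse(colour == "grey", 60, 50), sd = 10)

# Assumed true model: additive effects of colour and resolution plus noise
rating <- 40 + 5 * (colour == "grey") + 0.3 * resolution + rnorm(n, sd = 5)
d <- data.frame(rating, colour, resolution)

# The predictors are associated, as found in the question
print(t.test(resolution ~ colour, data = d))

# Centre resolution so that the colour coefficient refers to the
# colour difference at the mean resolution
d$resolution_c <- d$resolution - mean(d$resolution)

fit <- lm(rating ~ colour * resolution_c, data = d)
print(summary(fit))
print(confint(fit))
```

The association between the predictors mainly shows up as wider confidence intervals for the coefficients than you would get with unrelated predictors; each coefficient is still an estimate of that predictor's effect conditional on the other.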

You can look up other posts here about correlated predictors (multicollinearity).

kjetil b halvorsen