1

The task is to build a regression model for individuals. I have all the independent variables for each individual, but the dependent variable is only available as an aggregate at the group level.

Let's say I am trying to predict the score a student will achieve on some test. I have information about each student that can be used as a predictor variable (e.g. time spent studying). But the test results are only given as aggregated sums for each class. I can link every student to a class, but I don't know the individual test results.

One approach I can think of would be to aggregate the independent variables too and run the regression entirely on the aggregated data. But the correlation at the aggregated level is probably rarely the same as at the individual level, so I don't know how to judge the validity of such an approach.
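To make the aggregated approach concrete, here is a minimal sketch in Python on simulated data. It assumes the individual-level model really is linear with independent noise (score = beta0 + beta1 * hours + error), which is an assumption and not something known about the real data. Under that assumption the class sum equals beta0 * class_size + beta1 * sum(hours) + summed noise, so regressing the class sums on class size and summed hours targets the individual-level coefficients. All names (`simulate_classes`, `fit_on_class_sums`, `hours`) and all parameter values are hypothetical.

```python
import numpy as np


def simulate_classes(n_classes=500, beta0=2.0, beta1=1.5, noise_sd=1.0, seed=0):
    """Simulate students whose scores are only observed as per-class sums."""
    rng = np.random.default_rng(seed)
    sizes = rng.integers(15, 35, size=n_classes)          # students per class
    class_id = np.repeat(np.arange(n_classes), sizes)     # class label per student
    hours = rng.uniform(0.0, 10.0, size=class_id.size)    # individual predictor
    # ASSUMPTION: a linear individual-level model with independent noise.
    score = beta0 + beta1 * hours + rng.normal(0.0, noise_sd, size=class_id.size)
    score_sum = np.bincount(class_id, weights=score)      # all we get to observe
    return class_id, hours, score_sum


def fit_on_class_sums(class_id, hours, score_sum):
    """Least squares of class score sums on [class size, summed hours]."""
    sizes = np.bincount(class_id)                    # multiplies the intercept
    hours_sum = np.bincount(class_id, weights=hours) # multiplies the slope
    X = np.column_stack([sizes, hours_sum])
    coef, *_ = np.linalg.lstsq(X, score_sum, rcond=None)
    return coef                                      # [beta0_hat, beta1_hat]


class_id, hours, score_sum = simulate_classes()
beta0_hat, beta1_hat = fit_on_class_sums(class_id, hours, score_sum)
print("estimated intercept:", beta0_hat, "estimated slope:", beta1_hat)
```

This only works because the simulation is linear by construction. If the true individual relationship is nonlinear, or the slope differs between classes, the class-level fit need not match the individual-level relationship, which is exactly the concern raised above.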

Is there any 'good' (or less bad) approach to this problem?

Bobipuegi

2 Answers

2

Okay, I found out that what I was looking for is called a hierarchical linear model (see Wikipedia). Just dropping that here in case someone else encounters a similar problem.

Bobipuegi
1

I had the same question as the original poster and looked into multilevel models (hierarchical linear models). A multilevel model does not seem to be a solution for this problem.

As I understand it, multilevel models are used when the independent variables are at different levels. In the question, it is the dependent variable that is aggregated, while all the independent variables are at the same (individual) level.

IISsENII
  • If the relationship between "time studying" and an individual's test score is affine, then why not just average "time studying" over the class and use that as an explanatory variable in a class-level regression? – Estacionario Jul 21 '21 at 14:37
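The reasoning behind the comment's suggestion can be written out as a short derivation. The notation is assumed here and does not come from the thread: student $i$ in class $c$ has study time $x_{ic}$ and score $y_{ic}$, class $c$ has $n_c$ students, and the errors $\varepsilon_{ic}$ are independent with mean zero and variance $\sigma^2$.

```latex
% Assumed affine individual-level model:
y_{ic} = a + b\,x_{ic} + \varepsilon_{ic}

% Averaging over the n_c students of class c keeps the same a and b:
\bar{y}_c = \frac{1}{n_c}\sum_{i=1}^{n_c} y_{ic}
          = a + b\,\bar{x}_c + \bar{\varepsilon}_c ,
\qquad
\bar{x}_c = \frac{1}{n_c}\sum_{i=1}^{n_c} x_{ic}

% The averaged error has smaller variance in larger classes:
\operatorname{Var}(\bar{\varepsilon}_c) = \frac{\sigma^2}{n_c}
```

Since only class sums are given, the class mean would be the sum divided by $n_c$. Because the error variance of the class mean shrinks with $n_c$, weighting each class by its size in the class-level regression is a natural choice under these assumptions. The equality of coefficients relies on the relationship being affine; it does not carry over to a nonlinear individual-level relationship.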