Is it okay to include the dependent variable as an input variable to the higher-level regression model in a hierarchical / multi-level setup?

Question

Let's say I have a hierarchical dataset of student scores, with students nested within schools. When modelling a varying intercept, would it be okay to include the school's average student score as an input variable in the model for the school-level intercept? Wouldn't this count as "data leakage"?

In some sense, I am using the dependent variable as an input to predict the dependent variable itself.
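To make the concern concrete, here is a minimal sketch of the simplest case. It assumes simulated data, no student-level predictors, and unpooled (per-school) intercept estimates; all names and numbers are hypothetical. Under those assumptions the estimated school intercept is the school mean itself, so regressing it on the school mean fits perfectly and tells us nothing new.

```python
import numpy as np

# Hypothetical simulated data: 20 schools, 30 students each.
rng = np.random.default_rng(0)
n_schools, n_students = 20, 30
true_intercepts = rng.normal(50.0, 5.0, size=n_schools)
scores = true_intercepts[:, None] + rng.normal(0.0, 10.0, size=(n_schools, n_students))

# Proposed school-level predictor: the mean of the outcome within each school.
school_mean = scores.mean(axis=1)

# With no student-level predictors, the unpooled intercept estimate
# for each school is also just that school's mean score.
unpooled_intercept = scores.mean(axis=1)

# Regress the intercept estimates on the proposed predictor.
# np.polyfit returns coefficients from highest degree to lowest.
slope, intercept = np.polyfit(school_mean, unpooled_intercept, deg=1)
print(f"slope={slope:.6f}, intercept={intercept:.6f}")
```

In a partially pooled model with student-level covariates the fit would not be exactly perfect, but the predictor would still be built from the same scores it is meant to explain.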

This might make more sense if you have data over time and you use the average score for the students of a school at the previous time period. — Antoine Vernet, Sep 14 '18 at 09:28
Let's say I have three years of monthly data. Do I average over all three years? Because that would end up using information from the "current" and "future" periods. How do I use only the "previous" period, as I can't input different values for different periods when modelling the intercept for each student? Or can I? — infinitesimal, Sep 14 '18 at 09:33
With three years of data, including the average score by school of the previous period would force you to drop the first period and only use period 2 and 3 for analysis, which might not be desirable. However, in theory, there is nothing that prevents you from having a time varying variable at the school level in your HLM (with three levels: students nested in schools, nested in time periods). — Antoine Vernet, Sep 14 '18 at 15:54
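The previous-period idea from the comments could be prepared roughly as follows. This is only a sketch assuming pandas and a hypothetical long-format table with columns `school`, `student`, `period`, and `score`; it builds the lagged school mean and drops the first period, as noted above, but does not fit the multilevel model itself.

```python
import pandas as pd

# Hypothetical long-format data: one row per student per period.
df = pd.DataFrame({
    "school":  ["A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"],
    "student": [1, 2, 1, 2, 1, 2, 3, 4, 3, 4, 3, 4],
    "period":  [1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3],
    "score":   [60.0, 70.0, 62.0, 72.0, 66.0, 74.0,
                80.0, 90.0, 82.0, 88.0, 85.0, 91.0],
})

# Mean of the outcome for each school in each period.
school_means = (
    df.groupby(["school", "period"], as_index=False)["score"]
      .mean()
      .rename(columns={"score": "school_mean"})
      .sort_values(["school", "period"])
)

# Shift within each school so that period t carries the mean from period t-1.
school_means["prev_school_mean"] = (
    school_means.groupby("school")["school_mean"].shift(1)
)

# Attach the lagged mean to the student rows. The first period has no
# previous mean, so those rows are dropped.
model_df = (
    df.merge(school_means[["school", "period", "prev_school_mean"]],
             on=["school", "period"])
      .dropna(subset=["prev_school_mean"])
      .reset_index(drop=True)
)
print(model_df)
```

Here `prev_school_mean` varies by school and period, so it uses only information available before the scores being modelled. The cost is the lost first period.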

