I am trying to work out the best way to handle correlated predictor variables in a statistical analysis.
I am helping to analyse the results of a randomised controlled trial of an educational intervention for poor pre-school children in three different localities. In each location we set up an experimental group and an intervention control group, and children were randomly assigned to one or the other.
We see a strong association between the test results and both the children's age and the time they spent on the activities (participation was voluntary). However, the most active children ended up concentrated in one of the locations. We have qualitative explanations for why that is the case, but we would like to see whether we can tease out the effect of time spent on the activities separately from the effect of location.
Working in Python, I have tried two approaches: hierarchical regression, where I am unsure how to enter the individual variables, and multiple regression, where I am uncertain how valid the approach is for binary categorical variables (this also applies to the hierarchical version). What is the appropriate analysis for two continuous variables (age, time at activities) and one categorical variable (three locations) for this problem?
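To make the question concrete, this is roughly how I understand the location variable would enter a multiple regression: as dummy (indicator) columns, with one location acting as the reference level. It is only a sketch in plain Python. The records, the location labels and the names `age_months` and `hours_active` are made-up placeholders, not our trial data.

```python
# Hypothetical records: (location, group, age_months, hours_active).
# All names and values are invented for illustration; they are not trial data.
records = [
    ("A", "control", 52, 3.0),
    ("B", "treatment", 60, 7.5),
    ("C", "treatment", 48, 12.0),
]

LOCATIONS = ["A", "B", "C"]  # "A" serves as the reference level


def design_row(location, group, age_months, hours_active):
    """Return one row of the design matrix for a multiple regression.

    Columns: intercept, treatment indicator, age, hours, loc_B, loc_C.
    """
    if location not in LOCATIONS:
        raise ValueError(f"unknown location: {location!r}")
    treated = 1.0 if group == "treatment" else 0.0
    # k levels -> k - 1 indicator columns; the reference level is all zeros.
    location_dummies = [1.0 if location == level else 0.0 for level in LOCATIONS[1:]]
    return [1.0, treated, float(age_months), float(hours_active)] + location_dummies


X = [design_row(*record) for record in records]
for row in X:
    print(row)
```

With three locations this gives two indicator columns, so each location coefficient would be read as a difference from location "A". My question is whether a model built on a design like this is a valid way to separate the effect of time from the effect of location.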