I work for a government department. We fund a program that provides case management support for adult students in vocational training. Around 1,000 students per year access the support, and around 100,000 other students per year do not.
I'm trying to determine whether their participation in the support program has an impact on whether they complete the course.
I think what I need to do is identify a group of unsupported students with characteristics similar to those of the students who get support, and then compare course completion across the two groups.
I've been working with R for a little bit and I've been doing some research. I think I'd need to develop a multilevel model that gives a propensity score. I've also been looking at tutorials on decision trees, but I'm not sure if this would give me what I'm looking for.
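To make the idea concrete, here is a rough sketch of what I have in mind, using base R only and simulated data. The data frame `students`, its column names, and the rules used to generate it are all placeholders I made up, not our real data. It fits an ordinary (single-level) logistic regression for the propensity score, then does a simple greedy 1:1 nearest-neighbour match without replacement. I'm not sure this is the right way to do it, which is partly why I'm asking.

```r
set.seed(42)

# Simulated stand-in for the real data; every column name here is hypothetical.
n <- 5000
students <- data.frame(
  age         = sample(18:60, n, replace = TRUE),
  female      = rbinom(n, 1, 0.5),
  regional    = rbinom(n, 1, 0.3),
  disability  = rbinom(n, 1, 0.1),
  prior_study = sample(1:5, n, replace = TRUE)
)

# Made-up assignment rule: support is more likely for regional students
# and students with a disability.
p_support <- plogis(-4 + 1.5 * students$disability + 0.8 * students$regional)
students$support <- rbinom(n, 1, p_support)

# Made-up outcome: completion depends on prior study and disability only.
students$completed <- rbinom(
  n, 1, plogis(-0.5 + 0.2 * students$prior_study - 0.5 * students$disability)
)

# Step 1: propensity score = estimated probability of receiving support.
ps_model <- glm(
  support ~ age + female + regional + disability + prior_study,
  data = students, family = binomial()
)
students$pscore <- fitted(ps_model)

# Step 2: greedy 1:1 nearest-neighbour matching on the propensity score,
# without replacement (each unsupported student is used at most once).
treated <- which(students$support == 1)
pool    <- which(students$support == 0)
matched <- integer(length(treated))
for (i in seq_along(treated)) {
  d <- abs(students$pscore[pool] - students$pscore[treated[i]])
  j <- which.min(d)
  matched[i] <- pool[j]
  pool <- pool[-j]
}

# Step 3: compare completion rates in the matched sample.
matched_data <- students[c(treated, matched), ]
completion_by_group <- tapply(matched_data$completed, matched_data$support, mean)
print(completion_by_group)
```

With the real data I would presumably also need to check that the matched groups are actually balanced on each variable before comparing completion rates.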
My questions are:
What approaches can be used to select a comparison group from the 100,000 students that mirrors the characteristics of the 1,000 who received support? The variables are a mixture of age, gender, regional/metro location, disability, and previous study level. Links to examples or tutorials would be helpful.
Is a propensity score the right tack for examining the relationship between support and completion? Again, please suggest any tutorials, examples, or alternatives.
Thanks in advance