distance metric for student course schedules

Question

I'm doing an exploratory clustering analysis of student course schedules at a college. Interpretability by humans is paramount: we're trying to inform future research questions and possibly scheduling decisions. Success for this project would include observations like "many students take classes only on Monday/Wednesday" or "hey, look, there are a bunch of students who take morning classes at the Springfield campus and go over to the Franklin campus in the afternoon".

For each student, for each class the student took, I have the start time, end time, days of the week, and campus. I'm working with data from a single semester.

As a first pass, I'm doing agglomerative clustering in sklearn and exploring the dendrogram by hand. The hard part so far is the question I'm posting here: what would be a reasonable distance metric between two students based on their course schedules?

My first attempt was to divide the week into 30-minute intervals and assign each student a 1 for every slot where the student was in class and a 0 elsewhere. For example, a student with a morning MW class, an afternoon TR class, and a Friday seminar would look like this:

Then I can compute the overlap between two course schedules using the Jaccard distance. But I'm not fully satisfied with this approach:

Non-overlapping classes are all equally dissimilar. A class that starts at 9:00 is just as different from one that starts at 11:00 as it is from one that starts at 16:00.
There's no "bonus" for classes at the same time on different days, or for classes at different times on the same day.
It's not obvious how to extend this to include online classes (although I suppose I could make a new "day" just for online classes) or the campus where the class was located.
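Here is a minimal sketch of the 30-minute-slot encoding and the Jaccard distance described above, in plain Python. It stores each student as a set of occupied slots instead of a 0/1 vector, which gives the same Jaccard distance. Assumptions not in the question: meetings are given as (days, start, end) tuples with one-letter day codes (R = Thursday, as in "TR"), the grid starts at a hypothetical 08:00, and a slot counts as occupied if a class overlaps any part of it. The last lines illustrate the first objection: disjoint schedules all get distance 1.0, however far apart they are.

```python
from itertools import combinations

SLOT_MINUTES = 30      # the 30-minute granularity from the question
DAY_START = 8 * 60     # hypothetical grid origin (08:00); any shared origin works

def to_minutes(hhmm: str) -> int:
    """Convert 'HH:MM' to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)

def schedule_slots(meetings) -> set:
    """Return the set of (day, slot_index) pairs a schedule occupies.

    `meetings` is a list of (days, start, end) tuples such as ("MW", "09:00", "10:15"),
    with one letter per day (R = Thursday). A slot counts as occupied if the
    class overlaps any part of it.
    """
    slots = set()
    for days, start, end in meetings:
        first = (to_minutes(start) - DAY_START) // SLOT_MINUTES
        last = (to_minutes(end) - 1 - DAY_START) // SLOT_MINUTES
        for day in days:
            for slot in range(first, last + 1):
                slots.add((day, slot))
    return slots

def jaccard_distance(a: set, b: set) -> float:
    """1 - |A & B| / |A | B|; two empty schedules are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def distance_matrix(schedules) -> list:
    """Symmetric matrix of pairwise Jaccard distances between slot sets."""
    n = len(schedules)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = jaccard_distance(schedules[i], schedules[j])
    return dist

# The first objection in action: disjoint schedules are all maximally distant.
at_9 = schedule_slots([("MW", "09:00", "10:00")])
at_11 = schedule_slots([("MW", "11:00", "12:00")])
at_16 = schedule_slots([("MW", "16:00", "17:00")])
print(jaccard_distance(at_9, at_11), jaccard_distance(at_9, at_16))  # 1.0 1.0
```

For clustering, the matrix should be usable with scikit-learn's `AgglomerativeClustering(metric="precomputed", linkage="average")` (the parameter was named `affinity` before version 1.2, and `linkage="ward"` does not accept precomputed distances), or it can be condensed with `scipy.spatial.distance.squareform` and passed to `scipy.cluster.hierarchy.linkage` to draw a dendrogram.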

score 1 · Accepted Answer · answered Aug 05 '19 at 07:41

The problem with clustering in this context is that these methods usually assume that:

All students are typical, i.e., must belong to one of the clusters
Each student exhibits exactly one behavior

I doubt that these implicit assumptions hold for this data.

Instead, consider a frequent itemset mining and association rule mining approach to find patterns such as "takes classes on Monday morning => takes classes on Tuesday morning". This is beneficial because students may behave in non-standard ways and may combine multiple typical behaviors. But it also comes with challenges, such as removing trivial patterns and picking out the patterns that are actually interesting.
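A toy sketch of what frequent itemset and association rule mining computes, assuming each student has been reduced to a set of hypothetical coarse items such as "Mon-AM" (the student data below is made up for illustration). It brute-forces itemsets of up to two items instead of running Apriori or FP-growth, which prune the search; packages such as mlxtend provide those (`apriori`, `fpgrowth`, `association_rules`). Support is the fraction of students whose item set contains an itemset, and the confidence of X => Y is supp(X ∪ Y) / supp(X).

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force count of itemsets with up to `max_size` items.

    Returns {frozenset(items): support}, where support is the fraction of
    transactions containing every item in the set.
    """
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(t), k):
                counts[frozenset(combo)] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}

def association_rules(itemsets, min_confidence):
    """Single-item rules X => Y built from frequent 2-itemsets.

    A frequent pair guarantees both single items are frequent too
    (downward closure), so their supports are always present.
    """
    rules = []
    for items, supp in itemsets.items():
        if len(items) != 2:
            continue
        for x in items:
            (y,) = items - {x}
            conf = supp / itemsets[frozenset([x])]
            if conf >= min_confidence:
                rules.append((x, y, supp, conf))
    return sorted(rules, key=lambda r: -r[3])

# Hypothetical students, each reduced to coarse day/part-of-day items.
students = [
    {"Mon-AM", "Tue-AM", "Wed-AM"},
    {"Mon-AM", "Tue-AM"},
    {"Mon-AM", "Wed-PM"},
    {"Tue-PM", "Thu-PM"},
    {"Mon-AM", "Tue-AM", "Fri-AM"},
]
for x, y, supp, conf in association_rules(frequent_itemsets(students, 0.4), 0.7):
    print(f"{x} => {y}  support={supp:.2f}  confidence={conf:.2f}")
```

Note that both directions of the same pair show up as separate rules with different confidences; deciding which of these are worth a human's attention is the filtering challenge mentioned above.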

Has QUIT--Anony-Mousse

Thanks; frequent itemset mining does look like a better approach. I assume each "item" would need to be a single course, or possibly course meeting, right? In that case, is there a version of this approach that allows items to have features (e.g., "Monday" or "9:00") that it can use in constructing rules, instead of treating each item as an atom? – A. S. K. Aug 05 '19 at 17:26
Make Monday an item and 9am another item? – Has QUIT--Anony-Mousse Aug 05 '19 at 23:36
That sounds good for a single class, but if the student is taking multiple classes, I think that would lead to ambiguity. {"Monday", "9:00", "Tuesday", "11:00"} is compatible both with a 9:00 Monday class plus an 11:00 Tuesday class and with an 11:00 Monday class plus a 9:00 Tuesday class. All the examples of frequent itemset mining I've seen so far use the "market basket" analogy; do actual retailers use this kind of analysis? And if they do, how do they deal with, e.g., different brands of bread or flavors of ice cream? – A. S. K. Aug 06 '19 at 05:10
If you want to detect patterns such as "always at 9" you'll need this ambiguity. You can add overlapping symbols, but you'll then need to ignore trivial rules such as "Monday + at9 => Monday-9". – Has QUIT--Anony-Mousse Aug 06 '19 at 06:38
This rule induction approach reminds me of similar algorithms from theoretical linguistics (e.g., the [Minimal Generalization Learner](http://www.mit.edu/~albright/mgl/) or [MaxEnt](http://roa.rutgers.edu/files/858-0806/858-HAYES-0-0.PDF)). They explicitly treat sounds as bundles of features rather than atoms, which would be useful here. But in other ways their architecture doesn't quite work for this particular problem. – A. S. K. Aug 06 '19 at 15:00
MaxEnt etc. need sequences to make sense, as they model state transitions. There is a relationship but it is fairly universal to all probabilistic approaches. – Has QUIT--Anony-Mousse Aug 06 '19 at 20:12
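A sketch of the overlapping-symbol encoding discussed in the comments: each class contributes a day item ("Mon"), a time item ("at9"), and a combined item ("Mon-9"), following the notation of the "Monday + at9 => Monday-9" comment. The item names, the use of the start hour only, and the `is_trivial` filter are illustrative assumptions, not an established method. The combined items keep apart the two schedules from the ambiguity example, and the filter flags rules that merely restate the encoding.

```python
DAY_NAMES = {"M": "Mon", "T": "Tue", "W": "Wed", "R": "Thu", "F": "Fri"}

def meeting_items(days: str, start: str) -> set:
    """Items for one class: a time item ('at9') plus 'Mon' and 'Mon-9' per day."""
    hour = int(start.split(":")[0])  # assumption: whole-hour granularity
    items = {f"at{hour}"}
    for d in days:
        name = DAY_NAMES[d]
        items.add(name)
        items.add(f"{name}-{hour}")
    return items

def student_items(meetings) -> frozenset:
    """Union of the items of all (days, start) meetings of one student."""
    items = set()
    for days, start in meetings:
        items |= meeting_items(days, start)
    return frozenset(items)

def components(item: str):
    """Split a combined item like 'Mon-9' into {'Mon', 'at9'}; None otherwise."""
    if "-" not in item:
        return None
    day, hour = item.split("-")
    return {day, f"at{hour}"}

def is_trivial(antecedent, consequent: str) -> bool:
    # Case 1: the consequent is part of a combined item in the antecedent,
    # e.g. {'Mon-9'} => 'Mon'. This holds by construction (confidence 1).
    for a in antecedent:
        parts = components(a)
        if parts and consequent in parts:
            return True
    # Case 2: the consequent combines parts already in the antecedent,
    # e.g. {'Mon', 'at9'} => 'Mon-9'. Not guaranteed to hold, but largely
    # an artifact of the overlapping encoding.
    parts = components(consequent)
    return bool(parts) and parts <= set(antecedent)

mon9_tue11 = student_items([("M", "09:00"), ("T", "11:00")])
mon11_tue9 = student_items([("M", "11:00"), ("T", "09:00")])
print(sorted(mon9_tue11 & mon11_tue9))  # ['Mon', 'Tue', 'at11', 'at9']
print(sorted(mon9_tue11 ^ mon11_tue9))  # ['Mon-11', 'Mon-9', 'Tue-11', 'Tue-9']
```

The shared items are exactly the ambiguous ones; the combined items are what distinguish the two students, at the price of generating the redundant rules that `is_trivial` is meant to catch.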