Method to compare ratings from multiple different sources with missing data

Question

I want a method to compare ratings from multiple sources and find a single measure that best reflects all of them. To give a specific example, let's call it "The fellowship review committee problem" (please don't hate me; I am not on a fellowship review committee, this is just an example):

A college fellowship review committee is trying to decide which of its students most deserves a fellowship. It reviews 1000 student applications and wants to focus on each student's grades in up to 5 specific courses (call them A, B, C, D, and E). Students are not required to have taken all 5 courses: some have taken 3, some 4, and some 5 (every applicant has taken at least 3 of A, B, C, D, E). What single number best measures which student is stronger?

My initial idea was to use principal components analysis (PCA). If every applicant had taken all 5 courses, that would work well. However, I see in this post that PCA does not handle missing data well. I have two questions:

If there is no missing data (all students have taken all courses), is there a better way than PCA to find a single measure?
If I do have missing data, is there a better solution than the so-called DINEOF procedure described in the link I posted above?
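For the complete-data case in the first question, here is a minimal sketch of the PCA idea: standardize each course, take the first principal component, and score each student by their projection onto it. The function name `first_pc_scores` and the simulated grades are hypothetical, and standardizing the courses (rather than using raw covariances) is an assumption; the share of variance explained is worth checking, since a low value means one number summarizes the grades poorly.

```python
import numpy as np

def first_pc_scores(grades):
    """Score each student by the first principal component of standardized grades.

    grades: (n_students, n_courses) array with no missing values (hypothetical helper).
    """
    X = np.asarray(grades, dtype=float)
    # Standardize each course so courses with wider grade spreads do not dominate.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal directions (unit-length loadings).
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[0]
    # The sign of a principal component is arbitrary; flip it so that
    # higher grades give a higher score.
    if loadings.sum() < 0:
        loadings = -loadings
    scores = Z @ loadings
    explained = s[0] ** 2 / np.sum(s ** 2)  # share of variance in the first PC
    return scores, loadings, explained

# Hypothetical data: 1000 students, courses A-E, a shared ability plus noise.
rng = np.random.default_rng(0)
ability = rng.normal(size=1000)
grades = 3.0 + 0.5 * ability[:, None] + 0.3 * rng.normal(size=(1000, 5))
scores, loadings, explained = first_pc_scores(grades)
ranking = np.argsort(-scores)  # student indices, best first
```

With standardized courses the first-PC score is just a weighted sum of z-scores, so it is easy to explain to a committee.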

score 2 · Accepted Answer · answered Jun 26 '14 at 21:46

I've thought about this issue before, but I've never actually implemented my approach, so there may be better ways to do this.

One thing you can do is assume some unobserved quality for each thing you're evaluating. Here you are evaluating students, so this might be something correlated with IQ, standardized test scores, or conscientiousness. However, we'll assume you can't observe these variables (or, at best, that the quality variable can be assumed to be a function of standardized test scores if you have them). With unobserved variables you typically need an EM algorithm, though a Bayesian approach (Gibbs sampling, or Stan) might be easier.
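As a minimal sketch of the Bayesian route, here is a Gibbs sampler for one specific latent-quality model that I am assuming for illustration (a linear-Gaussian one-factor model, i.e. Bayesian factor analysis): grade = course intercept + course loading × student quality + noise, with quality ~ N(0, 1). Courses a student did not take are simply left out of the likelihood, which is valid if grades are missing at random. The function name, priors (`tau2`, `a0`, `b0`), and simulated data are hypothetical; a linear link will not capture convex or concave grade distributions, for which an ordinal or nonlinear link (e.g. in Stan) would be needed.

```python
import numpy as np

def gibbs_one_factor(Y, n_iter=2000, burn=500, seed=0, tau2=100.0, a0=1.0, b0=1.0):
    """Gibbs sampler for y_ij = mu_j + lam_j * theta_i + e_ij, e_ij ~ N(0, s2_j).

    Y: (n_students, n_courses) array with np.nan for courses not taken.
    Returns the posterior mean and standard deviation of each student's theta.
    """
    rng = np.random.default_rng(seed)
    M = ~np.isnan(Y)                # observed-grade mask
    Y0 = np.where(M, Y, 0.0)        # placeholder zeros, always masked out below
    n, J = Y.shape
    # Start theta at each student's average standardized grade.
    Z = (Y - np.nanmean(Y, axis=0)) / np.nanstd(Y, axis=0)
    theta = np.nanmean(Z, axis=1)
    theta = (theta - theta.mean()) / theta.std()
    mu, lam, s2 = np.zeros(J), np.ones(J), np.ones(J)
    draws = []
    for it in range(n_iter):
        for j in range(J):
            # 1) Intercept and loading: Bayesian regression of observed grades
            #    in course j on [1, theta], prior N(0, tau2 * I).
            X = np.column_stack([np.ones(M[:, j].sum()), theta[M[:, j]]])
            y = Y0[M[:, j], j]
            cov = np.linalg.inv(X.T @ X / s2[j] + np.eye(2) / tau2)
            mean = cov @ (X.T @ y) / s2[j]
            mu[j], lam[j] = rng.multivariate_normal(mean, cov)
            # 2) Noise variance: conjugate inverse-gamma draw.
            resid = y - mu[j] - lam[j] * theta[M[:, j]]
            s2[j] = 1.0 / rng.gamma(a0 + resid.size / 2, 1.0 / (b0 + resid @ resid / 2))
        # 3) Latent quality: normal conditional using only observed courses.
        prec = 1.0 + (M * (lam ** 2 / s2)).sum(axis=1)
        num = (M * (Y0 - mu) * (lam / s2)).sum(axis=1)
        theta = rng.normal(num / prec, 1.0 / np.sqrt(prec))
        # The likelihood is unchanged if theta and lam both flip sign;
        # fix the sign so that larger theta means better grades.
        if lam.sum() < 0:
            lam, theta = -lam, -theta
        if it >= burn:
            draws.append(theta)
    draws = np.array(draws)
    return draws.mean(axis=0), draws.std(axis=0)

# Hypothetical data: each student skips 0, 1, or 2 randomly chosen courses.
rng = np.random.default_rng(1)
n, J = 300, 5
ability = rng.normal(size=n)
Y = 3.0 + 0.5 * ability[:, None] + 0.3 * rng.normal(size=(n, J))
for i in range(n):
    Y[i, rng.choice(J, size=rng.integers(0, 3), replace=False)] = np.nan
post_mean, post_sd = gibbs_one_factor(Y, n_iter=1000, burn=200)
ranking = np.argsort(-post_mean)
```

The posterior standard deviations come out of the same run, so ranking can account for how much each student's estimate is uncertain.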

The second part of the model predicts the grade in each class from this quality measure. There are a number of ways to do this, which I won't go into (given the quality measure, this part is standard).

Once you have estimated the quality measure, you can rank the students. Ideally, though, you should incorporate some measure of uncertainty into the evaluation. For instance, if you assume the quality measure is normally distributed, students with 5 classes might have lower standard deviations than students with 3 classes.
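To see why more classes should mean less uncertainty, assume (for illustration) the linear-Gaussian version of the model: $y_{ij} = \mu_j + \lambda_j\theta_i + \varepsilon_{ij}$ with $\varepsilon_{ij} \sim N(0, \sigma_j^2)$ and $\theta_i \sim N(0, 1)$, treat the course parameters as known, and let $O_i$ be the set of courses student $i$ took. The posterior for student $i$'s quality is then

$$
\theta_i \mid y_{i,O_i} \sim N\!\left(\frac{\sum_{j \in O_i} \lambda_j (y_{ij} - \mu_j)/\sigma_j^2}{1 + \sum_{j \in O_i} \lambda_j^2/\sigma_j^2},\; \frac{1}{1 + \sum_{j \in O_i} \lambda_j^2/\sigma_j^2}\right)
$$

Each extra course adds a positive term to the denominator of the variance. With hypothetical values $\lambda_j = 0.5$ and $\sigma_j = 0.3$ for every course, $\lambda_j^2/\sigma_j^2 \approx 2.78$, so a student with 3 courses has variance $1/9.33 \approx 0.107$ (sd $\approx 0.33$), while one with 5 courses has $1/14.89 \approx 0.067$ (sd $\approx 0.26$).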

Of course, there are simpler and quicker techniques, such as multiply imputing the missing scores and then taking a weighted average based on the importance or difficulty of the classes. However, if it were really important to get the rankings right, the latent-quality approach is what I would use.
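A minimal sketch of the simpler route, under assumptions I am choosing for illustration: grades are jointly normal across courses and missing at random, the mean and covariance are estimated from students who took all 5 courses, and each imputation draws a student's missing grades from the conditional normal given their observed grades. The course weights are hypothetical (the committee would supply them). Because the mean and covariance are held fixed rather than redrawn per imputation, this is a simplified ("improper") multiple imputation and slightly understates uncertainty.

```python
import numpy as np

def impute_and_average(Y, weights, m=20, seed=0):
    """Multiply impute missing grades, then take a weighted average per student.

    Y: (n_students, n_courses) with np.nan for courses not taken.
    weights: per-course weights (hypothetical; normalized to sum to 1 here).
    Returns the mean score over imputations and its spread across imputations.
    """
    rng = np.random.default_rng(seed)
    M = ~np.isnan(Y)
    complete = Y[M.all(axis=1)]
    mu = complete.mean(axis=0)                # fixed across imputations (simplification)
    S = np.cov(complete, rowvar=False)
    w = np.asarray(weights, dtype=float) / np.sum(weights)
    scores = np.empty((m, Y.shape[0]))
    for k in range(m):
        Yk = Y.copy()
        for i in np.flatnonzero(~M.all(axis=1)):
            o, u = M[i], ~M[i]                # observed / unobserved course masks
            # Conditional normal of missing given observed grades:
            # mean = mu_u + S_uo S_oo^{-1} (y_o - mu_o), cov = S_uu - S_uo S_oo^{-1} S_ou
            A = S[np.ix_(u, o)] @ np.linalg.inv(S[np.ix_(o, o)])
            cmean = mu[u] + A @ (Y[i, o] - mu[o])
            ccov = S[np.ix_(u, u)] - A @ S[np.ix_(o, u)]
            ccov = (ccov + ccov.T) / 2        # remove round-off asymmetry
            Yk[i, u] = rng.multivariate_normal(cmean, ccov)
        scores[k] = Yk @ w
    # Point estimate: average over imputations. The spread across imputations
    # reflects uncertainty due to the missing grades.
    return scores.mean(axis=0), scores.std(axis=0, ddof=1)

# Hypothetical data: 500 students, each missing 0-2 of courses A-E.
rng = np.random.default_rng(1)
ability = rng.normal(size=500)
Y = 3.0 + 0.5 * ability[:, None] + 0.3 * rng.normal(size=(500, 5))
for i in range(500):
    Y[i, rng.choice(5, size=rng.integers(0, 3), replace=False)] = np.nan
# Hypothetical weights: course E counts double.
est, spread = impute_and_average(Y, weights=[1, 1, 1, 1, 2], m=20)
```

Students who took every course get zero spread, while students with imputed grades get a positive spread, which gives a rough sense of how much the missing courses matter for each ranking.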

Thanks, the Bayesian approach sounds good! It would really help if someone could point to a reference that discusses how to implement something like Gibbs sampling for this problem. In particular, how do I choose a good starting model for mapping the "quality" variable to grades, for example if I know some grade distributions are convex and others concave? Also, how should missing data be handled in this setting (or does updating the posterior with each existing grade take care of this)? Sorry if this is all too basic; I'm only learning this material now. — nikosd, Jun 27 '14 at 01:16
I wouldn't call Gibbs sampling basic; it's rather challenging stuff. If you're not familiar with it, this problem might not be the best exercise for learning it, as this is a rather difficult approach. In that case, the multiple imputation approach I describe at the end might be more appropriate. There are many resources on this site and elsewhere that describe multiple imputation or point you to books that cover it in more detail. — John, Jun 27 '14 at 01:50
