Test takers are often ranked based on their test scores (e.g., people taking a civil service exam), and those scores are often rounded before being ranked, producing many ties. How might I go about calculating a confidence interval around such ranks, in order to give the decision makers who use the ranked list an estimate of how much confidence they should have in the order of the rankings? I would like to describe and convey to users whether a difference of k ranks is likely to reflect a true difference or is likely to arise by chance. I presume the confidence interval will depend on the numeric scores underlying the ranked list. Is there a general solution, perhaps based on the mean, s.d., and number of people tested? Assume the underlying distribution is normal.
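To make the question concrete, here is a sketch of one calculation that seems possible from the mean, s.d., and N alone: treating the other examinees' scores as i.i.d. Normal draws, the number who outscore a given examinee is Binomial, which gives an interval for that examinee's rank from sampling variation. (The function name, the 95% level, and the example numbers below are illustrative; whether this sampling-variation interval is the right notion of confidence here is part of what is being asked.)

```python
from scipy.stats import binom, norm

def rank_interval(score, mu, sigma, n, level=0.95):
    """Interval for the (descending) rank of `score` among n examinees,
    treating the other n - 1 scores as i.i.d. Normal(mu, sigma).
    The count of competitors scoring above `score` is Binomial(n - 1, p)
    with p = P(X > score), and rank = 1 + that count."""
    p = norm.sf(score, loc=mu, scale=sigma)   # chance one competitor scores higher
    alpha = 1 - level
    lo = binom.ppf(alpha / 2, n - 1, p)       # few outscore you -> good (low) rank
    hi = binom.ppf(1 - alpha / 2, n - 1, p)
    return int(lo) + 1, int(hi) + 1

# e.g. a score of 115 on a test with mean 100, s.d. 15, among 100 examinees
print(rank_interval(115, mu=100, sigma=15, n=100))   # roughly ranks 10 to 24
```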
Could you clarify what such a confidence interval would represent? One can think of several distinct interpretations of your question, depending on what the objective of the analysis is: to find an interval for a particular individual or intervals for all individuals, and how exactly such intervals ought to be interpreted, for instance. Would the [analysis of confidence intervals for percentiles](https://stats.stackexchange.com/questions/122001/confidence-intervals-for-median) be appropriate for your problem? – whuber May 20 '19 at 12:52
@whuber I have clarified the question. Since the underlying score distribution may be assumed to be normal, the link you provide is not maximally useful (as those methods are distribution-free). – Joel W. May 20 '19 at 15:30
Since the question focuses on ranks, it appears likely that you will gain nothing by making such a distributional assumption. Indeed, the discrete nature of test scores produces a likelihood of ties in ranks--perhaps a high likelihood of many ties--which would never occur with a truly Normal distribution. Beware, then, that your distributional assumption could produce incorrect results. A more serious problem is that you don't have the necessary information: on the basis of just one test, how could you possibly attribute differences in test scores to individuals instead of pure chance? – whuber May 20 '19 at 15:36
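A quick simulation (not from the thread; the cohort size, mean, s.d., and whole-point rounding are illustrative choices) makes the point about ties concrete: truly Normal scores essentially never tie, but rounding them does.

```python
import numpy as np

# Illustration of the tie problem: Normal scores almost never tie,
# but rounding them to whole points ties many of them.
rng = np.random.default_rng(0)
n, trials, tied_shares = 100, 1_000, []
for _ in range(trials):
    scores = np.round(rng.normal(100, 15, size=n))    # rounded test scores
    _, counts = np.unique(scores, return_counts=True)
    tied_shares.append(counts[counts > 1].sum() / n)  # examinees sharing a score
print(f"average share of examinees involved in a tie: {np.mean(tied_shares):.0%}")
```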
@whuber I did say in the question that I thought the answer would depend on the score and the number of people tested. The advantages of assuming a normal curve are: (1) it reflects reality, and (2) it will help to more accurately reflect the non-uniform distribution of scores. You are correct about ties, especially since the ranks are often based on rounded test scores (depending on the policy of the testing organization). – Joel W. May 20 '19 at 15:58
What is most important is whether the Normal model reflects the aspects of the reality that matter to the question. Although it might be a beautiful approximation to the distribution of test scores for routine purposes, like comparing means or variances, in your special circumstances a Normal approximation might depart hugely from reality in terms of the quantities that really matter, such as the likelihood of ties. This is one huge advantage nonparametric methods have over parametric ones: they are less likely to go wrong when your assumptions aren't quite correct. – whuber May 20 '19 at 16:02
But let's not get distracted by this technical issue. If the only thing you have is one set of test results, it's possible that the ranks give you little information: they could reflect chance variation in performance among individuals who are all equally good at the test. This shows that your question can be answered only by referring to additional information, such as independent evaluations of test replicability. That alone is enough to answer your general question in the negative: there cannot possibly exist a formula based on just these test results, no matter how many people were tested. – whuber May 20 '19 at 16:05
@whuber This is the nub of the matter: "they could reflect chance variation in performance among individuals who are all equally good at the test." I would like to convey to the decision makers when the difference is likely to be based on chance and when the difference is likely to reflect a true difference in ability. As to the impact of the number tested, with a larger N there will be more people bunched around the mean, so differences in ranks around the mean will be less meaningful with large Ns. – Joel W. May 20 '19 at 16:22
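The bunching claim can be quantified directly under the normality assumption: the expected number of examinees inside a fixed score gap is N times the Normal probability mass over that gap, so a one-point gap near the mean spans many more ranks than the same gap in the tail. A minimal sketch (mean 100, s.d. 15, and N = 100 are illustrative):

```python
from scipy.stats import norm

# Expected number of examinees within a 1-point score gap, by position.
mu, sigma, n, gap = 100, 15, 100, 1.0
for score in (100, 110, 120, 130):                    # mean outward to +2 s.d.
    mass = norm.cdf(score + gap, mu, sigma) - norm.cdf(score, mu, sigma)
    print(f"score {score}: about {n * mass:.1f} examinees within {gap:g} point")
```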
@whuber Perhaps something as simple as: when N = 100, people at the highest 10 ranks are likely to reflect true score differences, but ranks between 40 and 60 are likely to be due to chance. (This would seem to require a family of tables to interpret rank score differences.) – Joel W. May 20 '19 at 18:37
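One way such a table could be generated, using only the Binomial sampling variation of ranks (N = 100 and the 95% level are illustrative; measurement error is deliberately not modeled):

```python
from scipy.stats import binom

# Rank interval for an examinee at a given true percentile, N = 100,
# from sampling variation alone (no measurement-error model).
n = 100
for pct in (0.99, 0.90, 0.75, 0.50):
    p = 1 - pct                                   # chance a competitor outscores them
    lo = int(binom.ppf(0.025, n - 1, p)) + 1      # rank = 1 + number scoring higher
    hi = int(binom.ppf(0.975, n - 1, p)) + 1
    print(f"true percentile {pct:.0%}: rank roughly {lo} to {hi}")
```

The median row of this sketch comes out close to the "between 40 and 60" intuition above.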
I won't keep repeating myself, so one last time, allow me to point out that you haven't yet described any information that would permit you to support such conclusions. To get some intuition about this, consider the possibility that the wrong test happened to have been administered--maybe a multiple-choice biology test written in Chinese was given to English-speaking managers who prepared for an exam in business analytics. They therefore guessed (or just gave up). In a Normally distributed set of results, some will have performed noticeably better than others--but that's *meaningless.* – whuber May 20 '19 at 18:42
@whuber I want to assume that the tests in question have the typical level of correlation with a relevant criterion; that is, they are valid for the intended use. Are you saying that the meaningfulness of differences in rank is also a function of the correlation between the test and the criterion? (I had not thought of that, but it seems to make sense.) – Joel W. May 20 '19 at 18:49
I think you are pointing to a key quantity, namely the variability of the difference between the test outcome and some hypothesized "correct" outcome for each examinee. If you have some quantitative expression for that variability, you should be able to make progress. – whuber May 20 '19 at 18:51
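If one did have a quantitative expression for that variability, a simulation shows how it would turn into rank intervals. A sketch assuming classical test theory (observed = true + error) with an illustrative reliability of 0.8; none of the numbers come from the thread:

```python
import numpy as np

# Classical-test-theory sketch: observed = true + error, reliability r,
# so true-score s.d. = sigma * sqrt(r) and error s.d. = sigma * sqrt(1 - r).
rng = np.random.default_rng(1)
n, r, mu, sigma, reps = 100, 0.8, 100.0, 15.0, 2_000
true = rng.normal(mu, sigma * np.sqrt(r), n)          # fixed cohort of true scores
err_sd = sigma * np.sqrt(1 - r)
ranks = np.empty((reps, n), dtype=int)
for i in range(reps):                                 # hypothetical re-administrations
    observed = np.round(true + rng.normal(0, err_sd, n))
    ranks[i] = (-observed).argsort().argsort() + 1    # descending rank (ties broken arbitrarily)
lo, hi = np.percentile(ranks, [2.5, 97.5], axis=0)
for j in np.argsort(true)[::-1][:3]:                  # the three truly best examinees
    print(f"examinee {j}: 95% of re-tests put them between rank {lo[j]:.0f} and {hi[j]:.0f}")
```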
@whuber I think it would be useful to have a set of guidelines for test-criterion correlations of .2, .3, and .4. This range captures most of the observed (corrected) predictor-criterion correlations. BTW, no one to date has considered forming "bands" of basically indistinguishable predicted criterion scores; such bands are always in terms of predictor scores. That might become the topic for a new question. – Joel W. May 20 '19 at 18:58
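For the bands idea, the arithmetic under bivariate normality is short: with predictor-criterion correlation rho, the predicted criterion z-score is rho times the predictor z-score, and the standard error of prediction is sqrt(1 - rho^2). A sketch (the one-s.e. threshold is an illustrative choice):

```python
from math import sqrt

# With correlation rho, predicted criterion z = rho * z_x and the
# standard error of prediction is sqrt(1 - rho**2). A predictor gap of
# se_pred / rho (in s.d. units) is needed before two predicted criterion
# scores differ by one prediction s.e.
for rho in (0.2, 0.3, 0.4):
    se_pred = sqrt(1 - rho ** 2)
    gap = se_pred / rho
    print(f"rho = {rho}: predictor scores within {gap:.1f} s.d. of each other "
          f"yield predicted criterion scores within one prediction s.e.")
```

At rho = .2 the band spans nearly five predictor standard deviations, i.e., most of the observed score range, which is one way of quantifying how indistinguishable the predicted criterion scores are at these correlations.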