What is an appropriate test for a relationship between two variables when one of them is itself obtained by averaging human ratings?
As an example, suppose there are good and bad bottles of wine, and the company scientist has found a possible objective measure of wine quality. To validate this measure, gather 100 different bottles and ask 10 people to rate each bottle on a 1-5 scale. Then average the ratings, so that each bottle has a single average score A. Each bottle also has an objective score B from the scientist's measure, which is on a continuous scale.
One could compute the correlation between A and B across the 100 bottles. Alternatively, one could put some of the most highly rated wines in group A1 and some of the lowest rated wines in group A2, then use a t-test for a difference in the mean of the scientist's measure B between groups A1 and A2.
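A minimal sketch of the two analyses just described, using only the Python standard library. The data are simulated and purely hypothetical: each bottle has a latent quality that drives both its ten 1-5 ratings and its objective score B. Taking the top and bottom 25 bottles by A as groups A1 and A2 is an arbitrary illustrative choice. Welch's t statistic (unequal variances) is computed by hand; for a p-value, `scipy.stats.ttest_ind(high, low, equal_var=False)` computes the same statistic. Note that both analyses use only the bottle averages A, not the spread of the individual ratings.

```python
import random
import statistics

# Hypothetical simulated data: 100 bottles, 10 raters each, 1-5 scale.
random.seed(0)
N_BOTTLES, N_RATERS = 100, 10

quality = [random.gauss(0, 1) for _ in range(N_BOTTLES)]  # latent "true quality"
ratings = [
    [min(5, max(1, round(3 + q + random.gauss(0, 1)))) for _ in range(N_RATERS)]
    for q in quality
]
A = [statistics.mean(r) for r in ratings]        # average human score per bottle
B = [q + random.gauss(0, 0.5) for q in quality]  # simulated objective measure

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def welch_t(x, y):
    """Welch's t statistic for the difference in means of x and y."""
    vx, vy = statistics.variance(x), statistics.variance(y)
    return (statistics.mean(x) - statistics.mean(y)) / (vx / len(x) + vy / len(y)) ** 0.5

# Option 1: correlation between A and B across all bottles.
r = pearson(A, B)

# Option 2: compare mean B in the 25 highest-rated vs 25 lowest-rated bottles.
order = sorted(range(N_BOTTLES), key=lambda i: A[i])
low = [B[i] for i in order[:25]]    # group A2
high = [B[i] for i in order[-25:]]  # group A1
t = welch_t(high, low)

print(f"r = {r:.3f}, Welch t = {t:.2f}")
```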
But neither approach takes into account the fact that the scores A were themselves obtained by averaging, so each one has its own variance.
(To explain the question further, suppose the bottles were rated on a 1-1000 scale rather than a 1-5 scale. Consider two bottles: the first received ratings between 498 and 502, with an average of 500, and the second has an average of 520 with a similarly small spread. The objective measure also gives the second bottle a higher score, so this pair is weak support for a relationship. But now suppose that the ratings of the first bottle ranged from 1 to 1000, still averaging 500, and the ratings of the second were also very spread out. In this case the difference in averages looks accidental, and this pair of (A, B) values should provide less support for the proposed relationship.)
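One way to quantify the uncertainty in this example is the standard error of each bottle's average rating, SD / sqrt(n). The sketch below uses hypothetical ratings (10 raters per bottle) chosen to match the averages in the example, and divides the 20-point difference in averages by its standard error. It assumes the raters are an independent sample and treats the ratings as numeric; it only measures the per-bottle uncertainty and is not itself a test of the A-B relationship.

```python
import statistics

def summarize(ratings):
    """Mean, sample SD, and standard error of the mean for one bottle's ratings."""
    mean = statistics.mean(ratings)
    sd = statistics.stdev(ratings)  # sample SD (n - 1 denominator)
    return mean, sd, sd / len(ratings) ** 0.5

def diff_over_se(r1, r2):
    """Difference in average rating (bottle 2 minus bottle 1) divided by its standard error."""
    m1, _, se1 = summarize(r1)
    m2, _, se2 = summarize(r2)
    return (m2 - m1) / (se1 ** 2 + se2 ** 2) ** 0.5

# Hypothetical ratings on a 1-1000 scale.
tight_1 = [498, 499, 500, 501, 502, 498, 499, 500, 501, 502]  # mean 500, small spread
tight_2 = [518, 519, 520, 521, 522, 518, 519, 520, 521, 522]  # mean 520, small spread
wide_1 = [1, 1000, 1, 1000, 250, 750, 300, 700, 499, 499]     # mean 500, range 1-1000
wide_2 = [1, 1000, 1, 1000, 250, 750, 300, 700, 599, 599]     # mean 520, large spread

print(f"tight pair: {diff_over_se(tight_1, tight_2):.1f}")
print(f"wide pair:  {diff_over_se(wide_1, wide_2):.2f}")
```

Both pairs differ by 20 points on average, but that difference is about 30 standard errors for the tight pair and only about 0.12 for the wide pair.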
How can I account for this?