How do I develop a metric for comparisons that involves a combination of variables at different scales?

Question

I am trying to select the best answer from a set of samples, using two variables at different scales.

In the data below, each row indicates the accuracy of a given measurement, the number of times it occurs in the sample set, and the answer that it indicates.

Accuracy:0.00886982324861, Frequency:5.0, Answer:1.0
Accuracy:0.0104663914334, Frequency:1.0, Answer:2.0
Accuracy:0.0112727390014, Frequency:1.0, Answer:2.0
Accuracy:0.0143046058573, Frequency:3.0, Answer:1.0
Accuracy:0.0251741710747, Frequency:1.0, Answer:1.0
Accuracy:0.0322055218681, Frequency:1.0, Answer:2.0

Until now, I have been ignoring the accuracy and simply selecting the answer that occurs most frequently, i.e., Answer 1 occurs 9 times and Answer 2 occurs 3 times, so I select Answer 1.

A product of frequency and accuracy produces a metric that takes both into account, but the difference in scales weights frequency disproportionately more than accuracy.

How can I calculate a metric that allows me to use both accuracy and frequency, but accounts for their different scales?

Your help is greatly appreciated, John

EDIT: Thanks for your response. In the data above, accuracy is the distance from the closest category to which I am attempting to quantize this data. For example, suppose I have category 1 and category 2 and the sample has a value of 1.4. I would describe its accuracy as .4 away from category 1 and .6 away from category 2, indicating that category 1 is the answer.

In the above data, if I multiply the frequency of sample one by its accuracy (5 * .0088), the result is .0443, while the second sample gives me .0104 and the fourth sample yields .0429. From the data, the most accurate sample occurs 5 times (frequency 5), so it should clearly be better than the fourth most accurate sample, which occurs three times, but without some type of scaling the results differ by only .0014. What if I had a sample that occurs once but has an accuracy of .9? That would have a product of .9, which is larger than that of the first sample.
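The arithmetic above can be checked with a short Python sketch. The rows are copied from the data in the question; the final row with accuracy .9 is the hypothetical sample mentioned above, not part of the data.

```python
# Rows from the question: (accuracy, frequency, answer).
# Accuracy is the distance to the nearest category, so smaller is better.
rows = [
    (0.00886982324861, 5.0, 1.0),
    (0.0104663914334, 1.0, 2.0),
    (0.0112727390014, 1.0, 2.0),
    (0.0143046058573, 3.0, 1.0),
    (0.0251741710747, 1.0, 1.0),
    (0.0322055218681, 1.0, 2.0),
]

# Naive metric: frequency multiplied by accuracy.
products = [acc * freq for acc, freq, _ in rows]
for (acc, freq, ans), p in zip(rows, products):
    print(f"answer={ans:.0f} freq={freq:.0f} accuracy={acc:.4f} product={p:.4f}")

# Gap between the first and fourth samples.
gap = products[0] - products[3]
print(f"gap between sample 1 and sample 4: {gap:.4f}")

# Hypothetical sample (not in the data): occurs once, accuracy 0.9.
hypothetical = 0.9 * 1.0
print("hypothetical beats sample 1:", hypothetical > products[0])
```

Because accuracy here is a distance, a larger product does not mean a better sample, which is part of why the raw product is hard to interpret.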

do you have anything in the way of training data? i.e. some data for which you know what the correct answer is? — user603, Jul 26 '11 at 21:08

score 0 · Answer 1 · answered Jul 26 '11 at 21:14

I'm confused because I don't understand the context. What is "accuracy"? What are you trying to measure?

If you want both to be on the "same scale" then you could just transform both to z scores and add them. Then they are on a standard deviation scale. But multiplying the two sort of gets rid of the "problem" (I am not sure if it is a problem, or what the problem is). Let's take a case that's easier to understand: the area of a rectangle is length*width, right? Now, if you have the two measured on different scales, it makes no difference to the product; it just scales the result up:

A              B            C                 D        E
ht in inches   ht in feet   width in inches   A*C      B*C

12             1            12                144      12
120            10           12                1440     120
12             1            120               1440     120
12             1            1200              14,400   1200

If you multiply A by 10, you multiply B, D and E by 10. Scale has nothing to do with it.
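A minimal sketch of the same point in Python, using the rectangle dimensions from the table. The helper `ranking` is a name introduced here for illustration only.

```python
# Rectangles from the table: (height in inches, width in inches).
rectangles = [(12, 12), (120, 12), (12, 120), (12, 1200)]

# Column D = A*C: height in inches times width in inches.
area_inches = [h * w for h, w in rectangles]
# Column E = B*C: height in feet times width in inches.
area_mixed = [(h / 12) * w for h, w in rectangles]

def ranking(scores):
    """Indices ordered from largest to smallest score (ties keep input order)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

print(area_inches)
print(area_mixed)
# Changing the unit of one factor rescales every product by the same
# constant (12 here), so the ordering of the rectangles is unchanged.
print(ranking(area_inches) == ranking(area_mixed))
```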

Variation has something to do with it, and you can make the two measures have equal sd by the z transform, if that's what you want.
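The z-transform idea might look like the sketch below, using the values from the question. Two choices here are assumptions rather than part of this answer: the sample standard deviation is used, and the accuracy z score is subtracted rather than added, because the question's edit describes accuracy as a distance where smaller is better.

```python
from statistics import mean, stdev

# Values copied from the question, one entry per row.
accuracy = [0.00886982324861, 0.0104663914334, 0.0112727390014,
            0.0143046058573, 0.0251741710747, 0.0322055218681]
frequency = [5.0, 1.0, 1.0, 3.0, 1.0, 1.0]

def z_scores(values):
    """Centre on the mean and divide by the sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]

z_acc = z_scores(accuracy)
z_freq = z_scores(frequency)

# Assumption: accuracy is a distance (smaller is better), so its z score
# is subtracted. With a "larger is better" accuracy you would add instead.
combined = [zf - za for zf, za in zip(z_freq, z_acc)]

best = max(range(len(combined)), key=lambda i: combined[i])
print("combined scores:", [round(c, 3) for c in combined])
print("best row (0-based):", best)
```

After the transform both variables have standard deviation 1, so neither dominates the sum purely because of its units.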

score 0 · Accepted Answer · answered Jul 26 '11 at 22:04

You have to decide to what extent one variable substitutes for the other, and whether that rate of substitution is constant or whether there are "decreasing returns" to one of them. Economists typically use min[a*x, y] for perfect complementarity (no substitution), a*x + y for perfect substitution (a units of x for a unit of y, whatever y is), and a "compromise" (called the Cobb-Douglas function) of x^a * y^b, with b typically in (0,1) and equal to 1-a, so that the more y you have, the less x you need to make up for a unit of y while keeping the function at the same level.
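The three forms can be sketched in Python as follows. The function names and the `candidates` values are hypothetical and chosen only for illustration; both x and y are assumed to be positive and oriented so that larger is better.

```python
def perfect_complements(x, y, a):
    """min[a*x, y]: no substitution, the scarcer input sets the score."""
    return min(a * x, y)

def perfect_substitutes(x, y, a):
    """a*x + y: constant rate of substitution between x and y."""
    return a * x + y

def cobb_douglas(x, y, a):
    """x^a * y^b with b = 1 - a and 0 < a < 1."""
    return x ** a * y ** (1 - a)

def ranking(scores):
    """Indices ordered from largest to smallest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Hypothetical (x, y) pairs, not taken from the question.
candidates = [(1.0, 4.0), (2.0, 1.0), (3.0, 3.0)]
# The same pairs with x expressed in a unit 100 times smaller.
rescaled = [(100 * x, y) for x, y in candidates]

cd_before = ranking([cobb_douglas(x, y, 0.5) for x, y in candidates])
cd_after = ranking([cobb_douglas(x, y, 0.5) for x, y in rescaled])
lin_before = ranking([perfect_substitutes(x, y, 1.0) for x, y in candidates])
lin_after = ranking([perfect_substitutes(x, y, 1.0) for x, y in rescaled])

print("Cobb-Douglas ranking unchanged:", cd_before == cd_after)
print("Linear ranking unchanged:", lin_before == lin_after)
```

In this example the Cobb-Douglas ranking survives a change of units in x, because rescaling x multiplies every score by the same constant, while the linear form with a fixed a does not; for the linear form the weight a has to absorb the units.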
