I have data from a medical experiment and want to run a linear regression. More precisely, I have disease severity grades for two independent groups (treatment and control). The severity grades are ordered categories with values {0,1,2,3} (0 meaning no disease, 3 meaning widespread disease). In each session, a number of samples is taken from a subject, and each sample is rated according to its severity grade. The ratings are then summarized per subject as the proportion of samples at each severity grade. Hence, the results for two subjects could look like this:
---------------------------------------------
| Group     | 0     | 1     | 2     | 3     |
---------------------------------------------
| Control   | 0.195 | 0.5   | 0.3   | 0.005 |
| Treatment | 0.499 | 0.4   | 0.1   | 0.001 |
---------------------------------------------
For the regression analysis, I want to code the severities into a single score. A first thought would be to sum the proportions for severities 1 to 3. That is, the score for the first row of the table would be $\mathrm{score}(0.5, 0.3, 0.005) = 0.5 + 0.3 + 0.005 = 0.805$. But this is not a good score function, since it disregards the ordinal nature of the severities (e.g. $\mathrm{score}(0.005, 0.3, 0.5)$ would yield the same value even though it describes a far more severe case).
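To make the problem with the plain sum concrete, here is a minimal sketch. The function name `sum_score` is only illustrative, and the second profile is the hypothetical permuted example from the text, not real data.

```python
import math

def sum_score(s1, s2, s3):
    """Unweighted score: total proportion of samples at severity 1, 2 or 3."""
    return s1 + s2 + s3

mild = sum_score(0.5, 0.3, 0.005)    # control row: mostly severity 1
severe = sum_score(0.005, 0.3, 0.5)  # permuted profile: mostly severity 3

# Both profiles collapse to the same score (up to floating-point rounding),
# so the ordering of the severity levels is lost.
same = math.isclose(mild, severe)
print(mild, severe, same)
```

The sum only measures how many samples show any disease at all, i.e. one minus the proportion at severity 0.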
The next idea is to weight the different levels, i.e. $\mathrm{score}(s_1, s_2, s_3) = w_1 s_1 + w_2 s_2 + w_3 s_3$ (in the case of a linear function). Obviously, the weights should satisfy $w_1 \le w_2 \le w_3$. Is there any good (maybe even common) way to determine such weights?
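For illustration, a minimal sketch of such a linear score. The weights (1, 2, 3) are an assumption made only to have something to compute with; whether equal spacing is appropriate is exactly the open question. The function name `weighted_score` is likewise hypothetical.

```python
import math

def weighted_score(proportions, weights=(1.0, 2.0, 3.0)):
    """Linear score w_1*s_1 + w_2*s_2 + w_3*s_3 for severities 1 to 3.

    The default weights are an illustrative assumption (equal spacing),
    not a recommended or standard choice.
    """
    if len(proportions) != len(weights):
        raise ValueError("need one weight per severity level")
    # Enforce the ordering constraint w_1 <= w_2 <= w_3.
    if any(a > b for a, b in zip(weights, weights[1:])):
        raise ValueError("weights must be non-decreasing")
    return sum(w * s for w, s in zip(weights, proportions))

control = weighted_score((0.5, 0.3, 0.005))    # 0.5 + 0.6 + 0.015
treatment = weighted_score((0.4, 0.1, 0.001))  # 0.4 + 0.2 + 0.003
permuted = weighted_score((0.005, 0.3, 0.5))   # 0.005 + 0.6 + 1.5

print(control, treatment, permuted)
```

With weights (1, 2, 3) and an implicit weight of 0 for severity 0, the score is simply the mean severity rating per sample, and unlike the plain sum it separates the permuted profile from the control row.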