I am studying the responses of 500 subjects to increasing temperature by fitting, for each subject, a linear regression of the response on temperature (from 10°C to 28°C). This gave me an intercept and a slope for each of the 500 subjects.
I am interested in identifying three groups of individuals that are "statistically identical":
- Group 1: those who are robust, i.e. whose response does not depend on temperature. These are the subjects with a slope of approximately zero.
- Group 2: those with negative (downward) slopes.
- Group 3: those with positive (upward) slopes.
I have computed the slope for each subject and afterwards sorted the data from lowest to highest slope.
Can you please help me choose the most relevant statistical parameters and tools for identifying the (statistically homogeneous) individuals in each of these three groups?
I considered using a t-test, clustering techniques, PCA and discriminant analysis, but I need help choosing among them and applying the chosen method in SAS or R.
For better illustration, here is part of the data:
Sub. C15 C16 C17 C18 C19 Slope
1 30,55 30,05 29,56 29,07 28,58 -0,49
2 22,22 21,83 21,44 21,05 20,67 -0,39
3 20,16 19,78 19,39 19,01 18,63 -0,38
4 61,07 60,69 60,31 59,93 59,55 -0,38
5 49,29 48,92 48,55 48,18 47,81 -0,37
6 52,54 52,18 51,81 51,44 51,08 -0,37
238 18,19 18,18 18,18 18,18 18,18 -0,0017
239 -10,23 -10,23 -10,23 -10,24 -10,24 -0,0010
240 -14,44 -14,44 -14,44 -14,44 -14,44 -0,0006
241 19,76 19,75 19,75 19,75 19,75 -0,0006
242 13,55 13,55 13,55 13,55 13,55 0,0010
243 19,93 19,93 19,93 19,93 19,94 0,0012
244 55,69 55,69 55,69 55,69 55,69 0,0016
495 -28,70 -28,43 -28,16 -27,90 -27,63 0,27
496 -9,71 -9,40 -9,10 -8,80 -8,49 0,30
497 -12,29 -11,98 -11,67 -11,35 -11,04 0,31
498 -43,85 -43,48 -43,11 -42,74 -42,37 0,37
499 -29,41 -28,97 -28,52 -28,07 -27,62 0,45
500 -8,54 -7,98 -7,43 -6,87 -6,31 0,56
Columns C15-C19 (for example) are the predicted values of the trait of interest at 15°C to 19°C. These values were obtained from a reaction norm model. The slope for each subject was computed by linear regression, and the data were then sorted by slope.
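To make the slope computation concrete, here is a minimal sketch of it in Python (not our actual SAS code). It assumes the slope is an ordinary least-squares slope and uses only the five columns shown above (15°C to 19°C) rather than the full 10°C to 28°C range; the names `ols_slope`, `temps` and `rows` are made up for this illustration.

```python
from statistics import mean

def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

# Temperatures matching columns C15..C19 (assumption: one column per degree).
temps = [15, 16, 17, 18, 19]

# Two rows copied from the table above, decimal commas included.
rows = {
    1: "30,55 30,05 29,56 29,07 28,58",
    500: "-8,54 -7,98 -7,43 -6,87 -6,31",
}

slopes = {}
for sub, text in rows.items():
    # Convert decimal commas to decimal points before parsing.
    values = [float(v.replace(",", ".")) for v in text.split()]
    slopes[sub] = ols_slope(temps, values)

for sub, slope in slopes.items():
    print(sub, round(slope, 2))
```

For these two subjects the result agrees with the Slope column to two decimals, which suggests the tabulated slopes are consistent with a straight-line fit through the predicted values.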
In this example:
Subjects 1 to 6 have the most negative slopes. They are of interest in our study because their response decreases as temperature increases.
Subjects 238 to 244 are examples of subjects with a slope of approximately zero. They are the most interesting in our study because they represent the robust subjects, whose performance is not affected by increasing temperature.
Subjects 495 to 500 have the highest positive slopes. We are interested in them because their performance increases even under harsh conditions and high temperature.
Now our question is:
How can we identify more precisely (using a statistical tool) which subjects belong to each of the three groups?
As a first step, we thought we could use a t-test to identify the subjects of each group. Some people advised us to apply clustering or discriminant analysis, but we were not able to figure out how to do it correctly. We have some background in SAS. We are asking how to solve this problem and would appreciate an example of how to do it in SAS.
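To show what we mean by the t-test idea, here is a rough sketch in Python (we have not managed to write the SAS equivalent). For each subject it tests the slope against zero and assigns one of three labels. Everything here is an assumption for illustration: the function names, the group labels, the use of only the five columns C15-C19, and the 5% two-sided level, for which the Student's t critical value with 3 degrees of freedom is 3.182. No correction for running 500 tests is applied.

```python
from math import sqrt
from statistics import mean

# Two-sided 5% critical value of Student's t with n - 2 = 3 degrees of freedom.
T_CRIT_DF3 = 3.182

def slope_and_se(x, y):
    """Return the OLS slope of y on x and its standard error."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = sqrt(sse / (n - 2) / sxx)
    return slope, se

def classify(x, y, t_crit=T_CRIT_DF3):
    """Label a subject 'negative', 'robust' or 'positive' from a t-test on its slope."""
    slope, se = slope_and_se(x, y)
    if se == 0.0:
        # Perfect fit: the t statistic is undefined, so fall back on the sign.
        if slope == 0.0:
            return "robust"
        return "negative" if slope < 0 else "positive"
    t = slope / se
    if t <= -t_crit:
        return "negative"
    if t >= t_crit:
        return "positive"
    return "robust"  # slope not significantly different from zero

temps = [15, 16, 17, 18, 19]
subjects = {
    1: [30.55, 30.05, 29.56, 29.07, 28.58],
    243: [19.93, 19.93, 19.93, 19.93, 19.94],
    500: [-8.54, -7.98, -7.43, -6.87, -6.31],
}
groups = {sub: classify(temps, y) for sub, y in subjects.items()}
print(groups)
```

We are unsure whether this is valid, for two reasons. The columns are predicted values from the reaction norm model rather than raw observations, so the residual variance is tiny and almost any nonzero slope comes out significant. Also, failing to reject a zero slope is not the same as showing that the slope is zero.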