After reading for hours, I am still swimming.
I have a set of plasma concentration results from different labs:
value (mg/L), standard deviation, range, sample count
For example (in reality, I have a lot more rows):
1. 25.8 mg/L, +/- 8.0, 13.0-41.0, 24
2. 55.0 mg/L, +/- 7.9, ?-100.0, 10
For the second row, only the highest measured concentration is available, which I entered as the upper end of the range. For the unknown lower end, I put a "?".
I would like to calculate an average of these two rows to summarize the results in a single row. Further data is not available.
Update: I just realized that the values could be means or medians; I don't even know which.
Distribution:
The lab values are from different labs, and they might use different techniques. I would think that a unit like mg/L represents an "absolute" measurement, meaning that different established techniques should give similar results within a certain range, and if not, the technique should be questioned (which is out of scope for me), but I am happy to be corrected. Anyway, I have no way of knowing which technique was used.
The results are from a somewhat uniform population: the patients have a certain untreated medical condition which appears, on average, at a certain older age. But it could happen to a much younger or much older person, too, and I don't know the patient's age for any particular value. Also, it's medical, so there could be differences in location, genetics, or other preconditions.
So, considering tristan's answer, I would say they are from different distributions, although I'd rather they were not :)
Question 1: For the value, I sum the values and divide by the number of rows: (55.0 + 25.8) / 2 = 40.4 mg/L. Is that good practice, or should I do it differently?
Question 2: What would be the statistically correct way to get an average of the standard deviations? Does that even make sense?
Question 3: How do I get an "average" of the ranges? I would think that I just take the lowest lower bound and the highest upper bound across all results (13.0-100.0). Can someone confirm or refute this?
Question 4: How should I handle missing lower bounds as in the example, and is using the maximum measured value as the upper bound even a good idea?
Question 5: I am not at all sure whether the sample counts should play a role here. If they do, how?
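To make the questions concrete, here is a minimal Python sketch of the candidate calculations I am comparing, using the two example rows. It rests on assumptions I cannot verify: that each reported value is a mean (not a median) and that each standard deviation is a sample SD with an n - 1 divisor. The function names are my own, and the combined SD shown is just one textbook way of merging group summaries, not necessarily the right one for my data.

```python
import math

# Summary rows from the question: (value mg/L, SD, low, high, sample count).
# None stands for the unknown lower bound ("?") of the second row.
ROWS = [
    (25.8, 8.0, 13.0, 41.0, 24),
    (55.0, 7.9, None, 100.0, 10),
]


def unweighted_mean(rows):
    """Question 1 as asked: every row counts equally."""
    return sum(value for value, _, _, _, _ in rows) / len(rows)


def weighted_mean(rows):
    """Question 5: weight each row's value by its sample count."""
    total_n = sum(n for _, _, _, _, n in rows)
    return sum(value * n for value, _, _, _, n in rows) / total_n


def combined_sd(rows):
    """Question 2: SD of all samples taken together.

    Assumes each value is a mean and each SD is a sample SD (n - 1 divisor).
    Adds the spread within each row and the spread between the row means.
    """
    total_n = sum(n for _, _, _, _, n in rows)
    grand_mean = weighted_mean(rows)
    sum_sq = sum(
        (n - 1) * sd ** 2 + n * (value - grand_mean) ** 2
        for value, sd, _, _, n in rows
    )
    return math.sqrt(sum_sq / (total_n - 1))


def combined_range(rows):
    """Questions 3 and 4: smallest known low and largest known high.

    Unknown bounds are skipped, so the result may understate the true range.
    """
    lows = [low for _, _, low, _, _ in rows if low is not None]
    highs = [high for _, _, _, high, _ in rows if high is not None]
    return (min(lows) if lows else None, max(highs) if highs else None)


if __name__ == "__main__":
    print(f"unweighted mean: {unweighted_mean(ROWS):.2f} mg/L")
    print(f"weighted mean:   {weighted_mean(ROWS):.2f} mg/L")
    print(f"combined SD:     {combined_sd(ROWS):.2f}")
    print(f"combined range:  {combined_range(ROWS)}")
```

With the two example rows, the unweighted and sample-weighted means differ noticeably (the first row has 24 samples, the second only 10), which is exactly why I am unsure about Question 5.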
Thanks in advance for reading and any help.