I'm not sure if this is on-topic here. If not, then please tell me and I'll delete it.
I have a question about the proper way to design biomedical experiments that involve human-administered measurement. Let's take a hypothetical scenario where we're performing an experiment that requires taking measurements of various parts of human anatomy. Furthermore, let's assume that taking such measurements is difficult and requires a significant degree of 'human judgement', so the 'measurers' must be individuals of a certain skill/experience/training/qualification/whatever. Does 'proper' experimental design, in terms of ensuring the scientific validity of the experiment/research, require that the same skilled individual perform all of the measurements? What if the measurements were taken by two skilled individuals? What about three? Etc.
I have taken some time to think about this, and it seems to me that what I'm fundamentally thinking about here is error. I am not experienced when it comes to experimental design, but it seems to me that there are a few dimensions to this:
The first, and primary, dimension is whether having more than one person taking measurements is problematic. Different individuals, even skilled ones, are bound to be of differing skill levels and to introduce varying amounts and types of error (I don't necessarily mean types of error in the statistical sense; it could be error from interpreting the anatomy in a slightly different way, or using the measurement device in a slightly different way, or just not having slept well the night before, etc.).
At the end of the research, all of the data are pooled, with no distinction between data from one measurer and data from another. This means that we are then trying to draw conclusions from data that contain a 'mishmash' of errors, rather than having a single person do all of the measurements so that all of the data contain a consistent pattern of errors.
It seems to me that the fact that all of the data have a consistent pattern of error means that the data is effectively 'normalised' in this way, and so drawing conclusions from it would be more valid/meaningful than the mishmash of errors case.
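To make the 'consistent pattern' vs 'mishmash' distinction concrete, here is a minimal simulation sketch. It rests on assumptions of mine that may well be too simple: each measurer adds a fixed offset plus random noise, and the function name, the offsets of +2 and -2, and all of the other numbers are hypothetical.

```python
import random
import statistics

def simulate_pooled(offsets, n_subjects=1000, true_mean=100.0,
                    true_sd=5.0, noise_sd=1.0, seed=0):
    """Pooled measurements when subjects are dealt round-robin to measurers.

    offsets: one hypothetical fixed offset per measurer (an assumption).
    """
    rng = random.Random(seed)
    data = []
    for i in range(n_subjects):
        true_value = rng.gauss(true_mean, true_sd)   # the subject's real value
        offset = offsets[i % len(offsets)]           # which measurer took it
        data.append(true_value + offset + rng.gauss(0.0, noise_sd))
    return data

single = simulate_pooled([2.0])        # one measurer, always reads 2 high
mixed = simulate_pooled([2.0, -2.0])   # two measurers with opposite offsets

print("single:", statistics.mean(single), statistics.stdev(single))
print("mixed: ", statistics.mean(mixed), statistics.stdev(mixed))
```

Under these assumptions the single-measurer data are all shifted by the same amount, while the pooled two-measurer data show extra spread that comes from the measurers rather than from the subjects.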
The second dimension is the number of people/subjects participating in the experiment (that is, our sample size). This is because, even if we have multiple people taking measurements and introducing their own errors, leading to inconsistent patterns of errors in the data, I wonder if just having a larger amount of data/measurements would lead to some kind of statistical effect that makes drawing conclusions more valid/meaningful.
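As a rough way to probe this, the sketch below checks how far the pooled mean lands from the true mean as the number of subjects grows. It assumes, perhaps unrealistically, that each measurer has a fixed offset (here +2 and +1) on top of random noise; the function name and all of the numbers are hypothetical.

```python
import random
import statistics

def mean_error(n_subjects, offsets, true_mean=100.0,
               true_sd=5.0, noise_sd=1.0, seed=1):
    """Difference between the pooled sample mean and the true mean."""
    rng = random.Random(seed)
    measured = [
        rng.gauss(true_mean, true_sd)      # the subject's real value
        + offsets[i % len(offsets)]        # assumed fixed measurer offset
        + rng.gauss(0.0, noise_sd)         # random measurement noise
        for i in range(n_subjects)
    ]
    return statistics.mean(measured) - true_mean

for n in (10, 100, 10000):
    print(n, round(mean_error(n, [2.0, 1.0]), 3))
```

In this toy model, more subjects average out the random noise, but the error in the mean settles near the average of the offsets (1.5 here) rather than near zero. Whether real measurer error behaves like a fixed offset is exactly what I'm unsure about.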
The third dimension is the relative/proportional number of measurements taken by each measurer. If we have a total of 10 subjects and two measurers, and each measurer measures 5 subjects, then it seems to me that the problem described under the first dimension would be maximised, whereas if one measurer measured 9 subjects and the other measured 1, then we would be getting close to the result of having just a single measurer.
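Here is a small sketch of that intuition. If the two measurers differ only by fixed offsets (an assumption), then the spread their offsets add to the pooled data is the variance of a two-valued variable, p(1 - p)(a - b)^2, where p is the fraction of subjects measured by the first measurer and a and b are the two offsets. The function name and the offsets of +2 and -2 are hypothetical.

```python
def offset_variance(n_a, n_b, offset_a, offset_b):
    """Variance added to the pooled data by two measurers' fixed offsets."""
    p = n_a / (n_a + n_b)   # fraction of subjects measured by measurer A
    return p * (1 - p) * (offset_a - offset_b) ** 2

# Every way of splitting 10 subjects between two measurers.
for n_a in range(0, 11):
    n_b = 10 - n_a
    print(n_a, n_b, round(offset_variance(n_a, n_b, 2.0, -2.0), 2))
```

With these numbers the added variance peaks at the 5/5 split and falls to zero when one measurer does everything, which matches my guess. Note that this only describes the extra spread; the shift in the pooled mean is a separate matter.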