We are designing a study in which participants will be followed longitudinally and asked a sensitive self-report question. Efforts have been made to establish trust with participants, but the data collection protocol specifies that an objective biomarker for the self-report (a drug test) will be collected at random.
We are facing a few issues in the design and proposed analyses:
1) What would be an appropriate method for randomly validating this measure and conducting inference based on the joint measure? There may be a lying process, in which people give an optimistically biased account of their own use. They might also become somewhat (not perfectly) more honest if they know they are going to be tested at some point, if they know they are going to be tested on that occasion, and/or if they have been tested before. Would some forms of measurement error models require that we confirm the measure in participants who self-report using, even though we might assume that these participants would not lie?
2) Should participants be informed before or after they self-report their weekly use that we will collect a biospecimen? Or should the biospecimen be collected before any questions are asked at all? Some notes: we don't want to deceive participants, and it would be inappropriate to collect a biospecimen if we're not going to measure it. For power reasons, we can't really take an A/B approach to this and explore a billion different ways of collecting these measures.
3) What types of measurement models are available to model the efficacy of an RCT intervention to decrease the prevalence of substance use? Obviously, biomarkers will be used for a primary endpoint. I can conceive of a latent growth model that integrates the biomarker findings and assumes that the random process of self-report may be informed by the biomarker evaluations of individuals in the same randomization arm. Obviously these ideas are a bit half baked, and I'm not well versed in the usual language around them.
I'd appreciate some cited references or ideas on how to deal with this problem.
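To make question 1 more concrete, here is a minimal simulation sketch of the simplest version of the setup. Everything in it is an assumption for illustration, not part of the actual design: the sample size, prevalence, lying probability, and validation fraction are made-up numbers; the biomarker is treated as error-free; only users misreport (so self-reported use is always true); honesty does not change with testing; and there is a single time point. The function names `simulate` and `corrected_prevalence` are hypothetical. The correction shown divides the naive self-report prevalence by the sensitivity of self-report estimated in the randomly validated subsample, which is the Rogan-Gladen estimator with specificity fixed at 1.

```python
import random


def simulate(n=20000, prevalence=0.30, lie_prob=0.40, validate_frac=0.25, seed=1):
    """Simulate one wave of self-report with a random biomarker subsample.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        uses = rng.random() < prevalence
        # Assumption: only users misreport, so specificity of self-report is 1.
        reports = uses and (rng.random() >= lie_prob)
        # Assumption: validation is completely at random and does not
        # change what the participant reports.
        tested = rng.random() < validate_frac
        # Assumption: the biomarker is an error-free measure of true use.
        biomarker = uses if tested else None
        rows.append((reports, tested, biomarker))
    return rows


def corrected_prevalence(rows):
    """Return (naive prevalence, estimated sensitivity, corrected prevalence)."""
    naive = sum(r for r, _, _ in rows) / len(rows)
    # Self-reports among validated participants whose biomarker is positive.
    reports_among_positive = [r for r, t, b in rows if t and b]
    sensitivity = sum(reports_among_positive) / len(reports_among_positive)
    # Rogan-Gladen correction with specificity assumed equal to 1.
    return naive, sensitivity, naive / sensitivity


naive, sensitivity, corrected = corrected_prevalence(simulate())
print(f"naive={naive:.3f} sensitivity={sensitivity:.3f} corrected={corrected:.3f}")
```

Under these assumptions the naive estimate is biased downward and the corrected one recovers the simulated prevalence on average. The sketch deliberately leaves out the parts I am actually unsure about: honesty that depends on being (or having been) tested, repeated measures within a participant, and differences between randomization arms.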