From your question, I gather you have two concerns:
- What happens to the control population when new cases are observed in a case-cohort study?
- What happens when not all covariates measured in some case or control subjects can be feasibly measured in all case or control subjects?
First, a very brief overview of case-cohort study design. In a cohort study, the cohort population estimates the exposure distribution of the source population (e.g., all patients at risk for myocardial infarction in the USA, or all patients at risk for myocardial infarction who would be admitted to the coronary care unit at the Mayo Clinic if they experienced an infarction). In a nested case-control study, the control population is sub-sampled from the cohort population and can be thought of as a more resource-efficient estimate of the exposure distribution of the source population. A case-cohort study is a special form of the nested case-control design in which the control population is a random sample of the cohort population at time $t_0$, so sampling is independent of an individual's contributed person-time and outcome (e.g., disease) status. Because the control population represents all subjects at risk of the outcome at the beginning of follow-up, the odds ratios from a case-cohort study estimate the risk ratios from the corresponding cohort study.
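To make the last point concrete, here is a short sketch of the arithmetic. The notation is mine, introduced only for illustration: $N_1$ and $N_0$ are the numbers of exposed and unexposed subjects in the cohort at $t_0$, $A_1$ and $A_0$ are the exposed and unexposed cases observed during follow-up, $B_1$ and $B_0$ are the exposed and unexposed members of the control population, and $f$ is the sampling fraction. Because the control population is a random sample taken at $t_0$, we expect $B_1 \approx f N_1$ and $B_0 \approx f N_0$, so

$$
OR = \frac{A_1 / A_0}{B_1 / B_0} \approx \frac{A_1 / A_0}{f N_1 / f N_0} = \frac{A_1 / N_1}{A_0 / N_0} = RR .
$$

The sampling fraction cancels, which is why the result does not depend on how large the control sample is (only the precision does).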
Regarding re-sampling of the control population as new cases are observed
Individual membership in the control population of a case-cohort study does not change during follow-up. Any individual in the control population who develops the outcome at time $t$ remains a member of the control population while also becoming a member of the case population. Thus, if $A_{t=t_1}$ individuals in the cohort have developed the outcome by time $t_1$, the odds ratio estimating the risk ratio for the outcome given exposure over the period from $t_0$ to $t_1$ is $OR_{t=t_1}$. If $A_{t=t_2}$ individuals have developed the outcome by time $t_2$, the corresponding odds ratio over the period from $t_0$ to $t_2$ is $OR_{t=t_2}$, computed with the same control population. There is no need to re-sample the control population. For more, see Kupper LL et al. A Hybrid Epidemiologic Study Design Useful in Estimating Relative Risk. J Am Stat Assoc. Vol. 70, No. 351, Sep. 1975.
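If it helps, the idea can be checked with a small simulation. This is only a sketch under assumptions I made up for illustration (cohort size, exposure prevalence, exponential event times, a rate ratio of 2, and the two follow-up times); none of these numbers come from a real study. The control population is drawn once at $t_0$ and reused at both $t_1$ and $t_2$, and its members stay in it even after they become cases.

```python
import random


def simulate_cohort(n, p_exposed, base_rate, rate_ratio, rng):
    """Return (exposed, event_time) pairs for a hypothetical cohort."""
    cohort = []
    for _ in range(n):
        exposed = rng.random() < p_exposed
        rate = base_rate * (rate_ratio if exposed else 1.0)
        cohort.append((exposed, rng.expovariate(rate)))
    return cohort


def count_cases(cohort, t):
    """Exposed and unexposed cases that occurred by time t."""
    a1 = sum(1 for exposed, time in cohort if exposed and time <= t)
    a0 = sum(1 for exposed, time in cohort if not exposed and time <= t)
    return a1, a0


def cohort_risk_ratio(cohort, t):
    """Risk ratio computed from the full cohort."""
    a1, a0 = count_cases(cohort, t)
    n1 = sum(1 for exposed, _ in cohort if exposed)
    n0 = len(cohort) - n1
    return (a1 / n1) / (a0 / n0)


def case_cohort_odds_ratio(cohort, controls, t):
    """Exposure odds in cases divided by exposure odds in the fixed controls."""
    a1, a0 = count_cases(cohort, t)
    # Controls are counted regardless of whether they later became cases.
    b1 = sum(1 for exposed, _ in controls if exposed)
    b0 = len(controls) - b1
    return (a1 / a0) / (b1 / b0)


rng = random.Random(2024)
cohort = simulate_cohort(200_000, 0.3, 0.02, 2.0, rng)
controls = rng.sample(cohort, 20_000)  # drawn once at t_0, never re-sampled

results = {}
for t in (1.0, 5.0):  # hypothetical t_1 and t_2
    results[t] = (case_cohort_odds_ratio(cohort, controls, t),
                  cohort_risk_ratio(cohort, t))
    print(f"t={t}: case-cohort OR={results[t][0]:.3f}, cohort RR={results[t][1]:.3f}")
```

At both follow-up times the odds ratio from the fixed control sample should land close to the full-cohort risk ratio, with the remaining gap due to sampling error in the controls.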
Regarding the inability to measure all covariates of the new cases and controls
A variant of the case-control study is the two-stage or two-phase sampling design. My understanding of two-phase sampling is limited, but I cautiously offer the following brief summary: In a two-phase case-control study, inexpensive covariates are measured for all subjects in the control and case populations. Expensive covariates are measured only on a sub-sample of the control and case populations. Analytical methods attempt to treat the inexpensive covariates as surrogates for the expensive covariates, which can allow statistically efficient estimates of associations between the expensive covariates and the outcome.
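As a rough illustration of the sampling idea, here is a sketch using a simple inverse-probability-weighted estimator. That is one design-based way to analyze two-phase data, and not necessarily what any particular package does. Everything in it is hypothetical: the covariate names, the sampling probabilities, and the simulated data. Phase two keeps every case and sub-samples controls at rates that depend on the cheap covariate; weighting each sampled subject by the inverse of its sampling probability recovers the phase-one mean of the expensive covariate, whereas the unweighted phase-two mean does not.

```python
import random

rng = random.Random(7)

# Phase one: case status and a cheap binary covariate are known for everyone.
# The expensive covariate is simulated for everyone here only so that the
# estimates can be compared with the "truth"; in practice it would be
# unobserved outside phase two.
phase1 = []
for _ in range(50_000):
    cheap = rng.random() < 0.3
    is_case = rng.random() < (0.15 if cheap else 0.05)
    expensive = (1.0 if cheap else 0.0) + rng.gauss(0.0, 1.0)
    phase1.append({"case": is_case, "cheap": cheap, "expensive": expensive})


def phase2_probability(subject):
    """Hypothetical phase-two sampling probability for each stratum."""
    if subject["case"]:
        return 1.0  # measure the expensive covariate on every case
    return 0.5 if subject["cheap"] else 0.05


# Phase two: sub-sample subjects and record each one's sampling probability.
phase2 = []
for subject in phase1:
    p = phase2_probability(subject)
    if rng.random() < p:
        phase2.append((subject, p))

# Unweighted mean ignores the unequal sampling probabilities.
naive_mean = sum(s["expensive"] for s, _ in phase2) / len(phase2)

# Inverse-probability-weighted mean: each subject stands in for 1/p subjects.
ipw_mean = (sum(s["expensive"] / p for s, p in phase2)
            / sum(1.0 / p for _, p in phase2))

# Reference value, available only because the data are simulated.
full_mean = sum(s["expensive"] for s in phase1) / len(phase1)

print(f"phase-two sample size: {len(phase2)} of {len(phase1)}")
print(f"full={full_mean:.3f}  naive={naive_mean:.3f}  ipw={ipw_mean:.3f}")
```

The same weights could be passed to a weighted regression of outcome on the expensive covariate; the mean is used here only because it keeps the example short.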
Here are a few references, which I have only glanced over:
The twophase() function from the R survey package implements two-phase analysis methods. A short vignette on two-phase analyses is available at the survey webpage on CRAN.
If you have not already read it, an excellent reference for study design and other concerns in epidemiology is Rothman, Greenland, and Lash's Modern Epidemiology.