
I have a dataset of $N=3000$ biopsies from humans, each of which has an outcome that I am trying to examine using covariates of the patient who provided the biopsy. Some biopsies from the same patient are positive whereas others are negative, which is why I am using the biopsy (and not the patient) as my unit of analysis. A single patient has approximately 5-20 biopsy data points (rows) in my Excel sheet, which I think makes my data non-independent, or at least highly clustered. My outcome is binary (positive/negative).

Therefore, since I have 3000 biopsies from 150-600 patients and my unit of analysis is the biopsy, how can I model these data without violating the independence assumption of regression analysis?

kjetil b halvorsen
Anthony

1 Answer


Generalized linear mixed models (GLMMs) can be used. The classical regression model is the "fixed effect" part, and the "random effect" part captures the patient-specific patterns.
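
For the binary biopsy outcome in the question, one common special case is a logistic model with a random intercept per patient. As a sketch, with notation assumed here rather than taken from the tutorial ($y_{ij}$ is the result of biopsy $i$ from patient $j$, $\mathbf{x}_{ij}$ its covariates, and $u_j$ the patient effect):

$$ \operatorname{logit} \Pr(y_{ij} = 1 \mid u_j) \quad = \quad \mathbf{x}_{ij}^{\top} \boldsymbol{\beta} \; + \; u_j, \qquad u_j \sim \mathcal{N}(0, \sigma_u^2) $$

Biopsies from the same patient share the same $u_j$, which is what induces the within-patient correlation; conditional on $u_j$ they are treated as independent.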

Here is one good tutorial with examples.

https://stats.idre.ucla.edu/other/mult-pkg/introduction-to-generalized-linear-mixed-models/

In the example at the above link, there are $8525$ patients and $407$ doctors. The regression target is an $8525 \times 1$ vector. The term $\mathbf{X}\boldsymbol{\beta}$ is the classical regression model, which assumes that the samples are independent. The term $\mathbf{Z}\boldsymbol{u}$ encodes the cluster assignments, i.e., it assigns patients to doctors. In your case, biopsies play the role of the patients and patients play the role of the doctors.

$$ \overbrace{\mathbf{y}}^{\mbox{8525 x 1}} \quad = \quad \overbrace{\underbrace{\mathbf{X}}_{\mbox{8525 x 6}} \quad \underbrace{\boldsymbol{\beta}}_{\mbox{6 x 1}}}^{\mbox{8525 x 1}} \quad + \quad \overbrace{\underbrace{\mathbf{Z}}_{\mbox{8525 x 407}} \quad \underbrace{\boldsymbol{u}}_{\mbox{407 x 1}}}^{\mbox{8525 x 1}} \quad + \quad \overbrace{\boldsymbol{\varepsilon}}^{\mbox{8525 x 1}} $$
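
To make the role of $\mathbf{Z}$ concrete, here is a minimal sketch in plain Python. The six biopsies, the three patient IDs, and the values in `u` are made up for illustration, and the helper names `indicator_matrix` and `matvec` are hypothetical. It only builds the indicator matrix and multiplies it by `u`; it does not fit a model.

```python
# Hypothetical toy data: 6 biopsies from 3 patients (IDs are made up).
patient_of_biopsy = ["A", "A", "B", "B", "B", "C"]


def indicator_matrix(cluster_ids):
    """Return (Z, clusters), where Z[i][j] is 1 if row i belongs to cluster j."""
    clusters = sorted(set(cluster_ids))
    column = {c: j for j, c in enumerate(clusters)}
    Z = [[0] * len(clusters) for _ in cluster_ids]
    for i, c in enumerate(cluster_ids):
        Z[i][column[c]] = 1
    return Z, clusters


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


Z, clusters = indicator_matrix(patient_of_biopsy)

# Hypothetical random intercepts, one per patient (assumed values, not estimates).
u = [0.8, -0.5, 0.1]

# Each biopsy picks up the intercept of the patient it came from.
Zu = matvec(Z, u)

for row in Z:
    print(row)
print(Zu)
```

Each row of `Z` contains exactly one 1, so `Zu` simply repeats a patient's intercept for every biopsy taken from that patient. With 3000 biopsies and, say, 300 patients, `Z` would be $3000 \times 300$.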

Related question:

When to use mixed effect model?

Haitao Du
  • Dirichlet Regression: https://www.cambridge.org/core/journals/political-analysis/article/modeling-contextdependent-latent-effect-heterogeneity/B7B0AF067DF97A1A8F0B50646EF64F24 – Diogo Feb 28 '20 at 07:30