Cluster sampling is a sampling design in which observation units are grouped into clusters, usually for logistical reasons (e.g., students clustered in schools or households clustered in a geographic area), and the clusters themselves are sampled. Typically, cluster samples are multistage samples: geographic areas are selected in the first stage and households in a subsequent stage.
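As a rough illustration of the two-stage idea described above, here is a minimal Python sketch. It assumes simple random sampling without replacement at both stages; the function name `two_stage_sample`, the frame layout, and the area and household labels are all hypothetical, and real surveys often use other selection schemes (e.g., probability proportional to size).

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Draw a two-stage cluster sample (hypothetical helper).

    clusters: dict mapping a cluster label to a list of its units.
    Stage 1 picks clusters, stage 2 picks units within each picked cluster,
    both by simple random sampling without replacement (an assumption).
    """
    rng = random.Random(seed)
    # Stage 1: select clusters (e.g., geographic areas).
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for label in chosen:
        units = clusters[label]
        # Stage 2: select units (e.g., households) within the cluster,
        # taking every unit if the cluster is smaller than requested.
        k = min(n_per_cluster, len(units))
        sample[label] = rng.sample(units, k)
    return sample

# Hypothetical sampling frame: 5 areas with 10 households each.
frame = {f"area_{a}": [f"hh_{a}_{h}" for h in range(10)] for a in range(5)}
picked = two_stage_sample(frame, n_clusters=2, n_per_cluster=3, seed=0)
```

Here `picked` maps each selected area to the households drawn from it. Because units are only drawn from the selected clusters, observations within a cluster tend to be correlated, which is why analyses of such data usually account for the design.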
Questions tagged [cluster-sample]
126 questions
14
votes
3 answers
Fitting multilevel models to complex survey data in R
I'm looking for advice on how to analyze complex survey data with multilevel models in R. I've used the survey package to weight for unequal probabilities of selection in one-level models, but this package does not have functions for multilevel…

Eric Green
- 629
- 9
- 20
7
votes
5 answers
Are the differences between sampling clusters and sampling strata conceptual, methodological, neither, or both?
I am fuzzy on the distinctions between sampling strata and sampling clusters. Both seem aimed at designs that create useful estimates of between/within group (strata, cluster) variation, and in particular, seem to be driven by homogeneity…

Alexis
- 26,219
- 5
- 78
- 131
7
votes
2 answers
Cluster Bootstrap with Unequally Sized Clusters
I need to perform a bootstrap for variance estimation on a GEE model for clustered data that I am analyzing. I understand that I need to use a clustered bootstrap for this, which is pretty much the same thing as the usual nonparametric bootstrap,…

StatsStudent
- 10,205
- 4
- 37
- 68
6
votes
2 answers
Agreement in clustered sample data
I have analyzed several data curves from a group of patients (16 curves per patient) with different analysis methods and want to test for the agreement of the methods.
So far, I have neglected the potential correlation within the patients and was…

user30248
- 61
- 1
6
votes
3 answers
Why are samples within a cluster less informative than randomly chosen ones from the entire population?
Please give me a mathematical explanation if possible. Also, the book Kothari 2004 says:
There is also not as much information in ‘n’ observations within a cluster as there happens to be in ‘n’ randomly drawn observations.
Can you also…

spartacus
- 181
- 3
6
votes
2 answers
Simple way to cluster histograms
I'm trying to cluster a set of histograms. The histograms represent the frequency distribution of numbers from 1 to 5. The following figure shows two samples of my data.
I have 10,000 histograms with a fixed number of bins (5) and I'm…

Omar14
- 399
- 1
- 5
- 11
5
votes
1 answer
What regression model to use for both repeated measurements and cluster data (and how to do it in R)?
My research question involves looking at the association between the characteristics of neighborhoods (% male, % female, income, % young people, % old people) and the participation rate in a programme (%, continuous).
The participation rate for each…

Ngan_Tran
- 53
- 3
5
votes
1 answer
Does clustering lead to overdispersion?
TL;DR
Clustering is often cited as a source of overdispersion in count data. However, I seem to arrive at the conclusion that clustering actually reduces the dispersion.
Could someone confirm this or show me where I am wrong?
Model
Here's an…
user97654
5
votes
1 answer
Survival analysis for patients that have been subjected to multiple treatments
I have data for patients who were subjected to either one treatment or multiple treatments at various points in time, and I need to analyse their survival times after treatment. This of course means that some patients appear only once in the…

sztal
- 1,009
- 1
- 9
- 14
4
votes
1 answer
Power calculation for cluster-level analysis in cluster randomized trials
I would like to solve for $\pi_1$ in equation 7.14 of Hayes and Moulton's Cluster Randomized Trials. I can't for the life of me remember how to do so. Here is a link to the equation.
$$
c =…

eg_23611
- 39
- 1
4
votes
3 answers
Regression model with aggregated targets
As in this self-answered question, I want to ask about possible approaches for modelling data with aggregated targets, i.e. things like
$$
\bar y_{j[i]} = \alpha + \beta x_i + \varepsilon_i
$$
where $j[i]$ is the $j$-th group, where $i$-th…

Tim
- 108,699
- 20
- 212
- 390
3
votes
1 answer
Defining PSU in "Sampled with Replacement" Cluster Samples
I am trying to decide on an approach to estimate the design effect for a multistage cluster survey. The clusters were selected with probability-proportional-to-size sampling WITH replacement. The primary sampling units (districts) are large enough…

david rae
- 145
- 6
3
votes
1 answer
How can I model clustered data in a regression?
I have a dataset of $N=3000$ biopsies from humans, each of which has an outcome I am trying to examine using covariates of the patient who provided the biopsy. Some biopsies from the same patient are positive whereas others from the same patient…

Anthony
- 33
- 4
3
votes
1 answer
Standard Error For Cluster Sampling
A problem I've found and been thinking about for a while but not sure I can articulate properly. Any help appreciated.
A survey is being planned with the goal of interviewing $n$ people in some number $J$ of clusters. For…

Gosset's Student
- 348
- 2
- 11
3
votes
1 answer
Analysis of a cluster crossover design
I am analyzing data from a cluster randomized crossover trial. There are 9 clusters and 3 periods in the data. The clusters are dental clinics, and in each period different patients are sampled. I have three interventions (control, treatment1,…

Jonathan Marin
- 31
- 3