Questions tagged [cluster-sample]

Cluster sampling is a sampling design in which observation units are sampled in groups, usually for logistical reasons (e.g., students clustered in schools or households clustered in a geographic area). Cluster samples are typically multistage samples: geographic areas are selected in the first stage and households within them in a subsequent stage.
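
As a rough illustration of the two-stage idea in the description above, here is a minimal Python sketch. It assumes simple random sampling without replacement at both stages; the function name `two_stage_sample` and the toy frame of areas and households are hypothetical, and real surveys often use unequal selection probabilities instead.

```python
import random


def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Draw a two-stage cluster sample.

    clusters: dict mapping a cluster id to the list of units in that cluster.
    Returns a dict mapping each selected cluster id to its sampled units.
    """
    rng = random.Random(seed)
    # Stage 1: simple random sample of clusters, without replacement.
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for cid in chosen:
        units = clusters[cid]
        # Stage 2: simple random sample of units within each chosen cluster.
        k = min(n_per_cluster, len(units))
        sample[cid] = rng.sample(units, k)
    return sample


# Hypothetical sampling frame: 5 areas holding 4 to 8 households each.
frame = {f"area_{a}": [f"hh_{a}_{h}" for h in range(4 + a)] for a in range(5)}
picked = two_stage_sample(frame, n_clusters=2, n_per_cluster=3, seed=0)
```

Only households in the selected areas can enter the sample, which is what makes fieldwork cheaper and also what makes the observations correlated.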

126 questions
14
votes
3 answers

Fitting multilevel models to complex survey data in R

I'm looking for advice on how to analyze complex survey data with multilevel models in R. I've used the survey package to weight for unequal probabilities of selection in one-level models, but this package does not have functions for multilevel…
Eric Green
7
votes
5 answers

Are the differences between sampling clusters and sampling strata conceptual, methodological, neither, or both?

I am fuzzy on the distinctions between sampling strata and sampling clusters. Both seem to be aimed at creating useful estimates of between/within-group (strata, cluster) variation and, in particular, seem to be driven by homogeneity…
Alexis
7
votes
2 answers

Cluster Bootstrap with Unequally Sized Clusters

I need to perform a bootstrap for variance estimation on a GEE model for clustered data that I am analyzing. I understand that I need to use a clustered bootstrap for this, which is pretty much the same thing as the usual nonparametric bootstrap,…
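
For readers unfamiliar with the idea, here is a minimal sketch of a cluster bootstrap in Python. It assumes whole clusters are resampled with replacement and kept intact; the function `cluster_bootstrap_se`, the toy data, and the use of a plain mean in place of a GEE fit are all illustrative assumptions, not the asker's method.

```python
import random
from statistics import mean, stdev


def cluster_bootstrap_se(data, estimator, n_boot=1000, seed=None):
    """Bootstrap standard error that resamples clusters, not observations.

    data: dict mapping a cluster id to the list of observations in it.
    estimator: function taking a flat list of observations.
    """
    rng = random.Random(seed)
    ids = list(data)
    estimates = []
    for _ in range(n_boot):
        # Draw as many clusters as in the original data, with replacement.
        drawn = rng.choices(ids, k=len(ids))
        # Keep every observation of each drawn cluster together.
        pooled = [y for cid in drawn for y in data[cid]]
        estimates.append(estimator(pooled))
    # Spread of the replicate estimates approximates the standard error.
    return stdev(estimates)


# Hypothetical clustered data with unequal cluster sizes.
toy = {"a": [1.0, 1.2, 0.9], "b": [2.1, 2.0], "c": [0.5, 0.7, 0.6, 0.4], "d": [1.5]}
se = cluster_bootstrap_se(toy, mean, n_boot=500, seed=1)
```

With unequally sized clusters, the number of observations in each replicate varies from draw to draw, which is one reason this case attracts questions.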
6
votes
2 answers

Agreement in clustered sample data

I have analyzed several data curves from a group of patients (16 curves per patient) with different analysis methods and want to test for the agreement of the methods. So far, I have neglected the potential correlation within the patients and was…
6
votes
3 answers

Why are samples within a cluster less informative than randomly chosen ones from entire population?

Please give me a mathematical explanation if possible. Also, the book Kothari (2004) says: "There is also not as much information in ‘n’ observations within a cluster as there happens to be in ‘n’ randomly drawn observations." Can you also…
spartacus
6
votes
2 answers

Simple way to cluster histograms

I'm trying to cluster a set of histograms. The histograms represent the frequency distribution of numbers from 1 to 5. The following figure shows two samples of my data. I have 10,000 histograms with a fixed number of bins (5) and I'm…
5
votes
1 answer

What regression model to use for both repeated measurements and cluster data (and how to do it in R)?

My research question involves looking at the association between the characteristics of neighborhoods (% male, % female, income, % young people, % old people) and the participation rate in a programme (%, continuous). The participation rate for each…
Ngan_Tran
5
votes
1 answer

Does clustering lead to overdispersion?

TL;DR Clustering is often cited as a source of overdispersion in count data. However, I seem to arrive at the conclusion that clustering actually reduces the dispersion. Could someone confirm this or show me where I am wrong? Model Here's an…
5
votes
1 answer

Survival analysis for patients that have been subjected to multiple treatments

I have data for patients who were subjected to either one treatment or multiple treatments at various points in time, and I need to analyse their survival times after treatment. This of course means that some patients appear only once in the…
sztal
4
votes
1 answer

Power calculation for cluster-level analysis in cluster randomized trials

I would like to solve for $\pi_1$ in equation 7.14 of Hayes and Moulton's Cluster Randomized Trials. I can't for the life of me remember how to do so. Here is a link to the equation. $$ c =…
4
votes
3 answers

Regression model with aggregated targets

Similarly to this self-answered question, I want to ask about possible approaches for modelling data with aggregated targets, i.e. things like $$ \bar y_{j[i]} = \alpha + \beta x_i + \varepsilon_i $$ where $j[i]$ is the $j$-th group, where $i$-th…
Tim
3
votes
1 answer

Defining PSU in "Sampled with Replacement" Cluster Samples

I am trying to decide on an approach to estimate the design effect for a multistage cluster survey. The clusters were selected with probability-proportional-to-size sampling WITH replacement. The primary sampling units (districts) are large enough…
david rae
3
votes
1 answer

How can I model clustered data in a regression?

I have a dataset of $N=3000$ biopsies from humans, each of which has an outcome I am trying to examine using covariates of the patient who provided the biopsy. Some biopsies from the same patient are positive whereas others from the same patient…
3
votes
1 answer

Standard Error For Cluster Sampling

I've been thinking about this problem for a while but am not sure I can articulate it properly. Any help is appreciated. A survey is being planned with the goal of interviewing $n$ people in some number $J$ of clusters. For…
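
As background for this kind of planning question, here is a minimal Python sketch of the commonly used design-effect approximation, deff = 1 + (m - 1) * ICC, where m = n / J is the number of interviews per cluster. It assumes equal-sized clusters, equal weights, and no finite population correction; the function names and the numeric inputs (including the intraclass correlation of 0.05) are hypothetical and not taken from the question.

```python
import math


def design_effect(m, icc):
    """Approximate design effect for clusters of equal size m."""
    return 1.0 + (m - 1.0) * icc


def cluster_se_of_mean(sd, n, n_clusters, icc):
    """Approximate standard error of a mean from n people in n_clusters clusters."""
    m = n / n_clusters  # interviews per cluster
    deff = design_effect(m, icc)
    # Simple-random-sampling standard error inflated by sqrt(deff).
    return sd / math.sqrt(n) * math.sqrt(deff)


# Hypothetical inputs: unit standard deviation, n = 400, ICC = 0.05.
srs_se = cluster_se_of_mean(sd=1.0, n=400, n_clusters=400, icc=0.05)  # one person per cluster
clu_se = cluster_se_of_mean(sd=1.0, n=400, n_clusters=20, icc=0.05)   # 20 people per cluster
```

Under these assumptions, spreading the same $n$ over fewer clusters raises the standard error, because each additional interview in a cluster adds less independent information.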
3
votes
1 answer

Analysis of a cluster crossover design

I am analyzing data from a cluster randomized crossover trial. There are 9 clusters and 3 periods in the data. The clusters are dental clinics, and in each period different patients are sampled. I have three interventions (control, treatment1,…