To simulate a PCA, start with the desired result and work backwards: add some random error in the orthogonal directions and rotate that randomly.
The following example stipulates a two-dimensional result (i.e., two principal components) with two "blobs"; it is readily extended to more blobs. (More dimensions would take a bit more work in modifying sigma
to accommodate higher-dimensional covariance matrices.)
Let's start with a random rotation matrix.
set.seed(17)
p <- 5 # dimensions
rot <- qr.Q(qr(matrix(rnorm(p^2), p)))
We generate the blobs separately from different multivariate normal distributions. The parameters (their means and covariance matrices) are buried in the mvrnorm
arguments. To make it easy and reliable to specify the shape of such a distribution, we create a small function sigma
to convert the angle of the principal axis and the two variances into a covariance matrix.
sigma <- function(theta=0, lambda=c(1,1)) {
cos.t <- cos(theta); sin.t <- sin(theta)
a <- matrix(c(cos.t, sin.t, -sin.t, cos.t), ncol=2)
t(a) %*% diag(lambda) %*% a
}
library(MASS)
n1 <- 50 # First group population
n2 <- 75 # Second group population
x <- rbind(mvrnorm(n1, c(-2,-1), sigma(0, c(1/2,1))),
mvrnorm(n2, c(0,1), sigma(pi/3, c(1, 1/3))))
Adjoin the orthogonal error and rotate:
eps <- 0.25 # Error SD should be small compared to the SDs for the blobs
x <- cbind(x, matrix(rnorm(dim(x)[1]*(p-2), sd=eps), ncol=p-2))
y <- x %*% rot
That's the simulated dataset. To check it, apply PCA:
fit <- prcomp(y) # PCA
summary(fit) # Brief summary showing two principal components
par(mfrow=c(2,2)) # Prepare to plot
plot(fit$x[, 1:2], asp=1) # Display the first two components $
plot(x[, 1:2], asp=1); # Display the original data for comparison
points(x[1:n1,1:2], col="Blue") #...distinguish one of the blobs
screeplot(fit) # The usual screeplot, supporting the summary
zapsmall(rot %*% fit$rotation, digits=2) # Compare the fitted and simulated rotations.

The final check produces a matrix whose block form (an upper 2 by 2 block and lower 3 by 3 block with near-zeros elsewhere) confirms the accuracy of the PCA estimates:
PC1 PC2 PC3 PC4 PC5
[1,] 0.58 0.81 0.05 0.00 -0.01
[2,] 0.81 -0.58 0.00 0.00 0.00
[3,] -0.01 -0.01 0.23 -0.62 -0.75
[4,] 0.03 0.03 -0.96 -0.28 -0.06
[5,] 0.00 0.00 0.18 -0.73 0.66
(The upper 2 by 2 block, suitably scaled by the first two eigenvalues, approximately describes the relationship between the point in the "fit" plot and those in the "original components" plot. This one looks like a rotation and a reflection.)