This update is based on your comments below, which completely change how I read your question.
A Pearson correlation is defined for paired observations. If you make S2 a 50% random sample of S1, and make S5 the first half of S1, in order (so the two are the same length), the correlation between S2 and S5 will be very close to zero, because the random sampling has completely clobbered the pairing. The correlation will also be very close to zero if you make S2 every other element of S1.
Here is the output of some R code demonstrating this.
# This makes the results repeatable (use the same seed)
> set.seed(1188)
# Choose a random, normally distributed sample (default mean and SD are 0 and 1)
> S1 <- rnorm(1000)
# Take every other element in S1 (the logical index c(TRUE, FALSE) is recycled)
> S2 <- S1[c(TRUE, FALSE)]
# Take the first 500 elements of S1
> S5 <- S1[1:500]
# Show the first few values in S2 and S5
> head(S2)
[1] -0.5583091 0.2582470 -0.6253171 1.2863448
[5] -0.7943670 -1.0510371
> head(S5)
[1] -0.5583091 1.2792432 0.2582470 -1.4063328
[5] -0.6253171 -0.3928849
# Perform a Pearson correlation (rcorr() comes from the Hmisc package)
> library(Hmisc)
> rcorr(S2, S5, type="pearson")
     x    y
x 1.00 0.05
y 0.05 1.00

n= 500

P
  x      y
x        0.2785
y 0.2785
# The correlation is 0.05, very close to zero, and the p-value of the correlation
# test is 0.2785. Because that is > 0.05, we can't conclude that the correlation
# of 0.05 is actually different from 0.
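For the random-sample version of S2 described at the top, a similar check can be sketched with base R's `cor.test()`, which avoids the Hmisc dependency. This is only an illustration: the seed and the length of 1000 are arbitrary, and because `sample()` uses the random number stream, the exact numbers it prints will differ from the output above.

```r
# Repeatable illustration; the seed is arbitrary
set.seed(1188)
S1 <- rnorm(1000)

# S2: a 50% random sample of S1 (sample() also shuffles the order)
S2 <- sample(S1, 500)
# S5: the first half of S1, in its original order
S5 <- S1[1:500]

# Pearson correlation test on the 500 (S2[i], S5[i]) pairs
ct <- cor.test(S2, S5, method = "pearson")
ct$estimate  # sample correlation, expected to be close to 0
ct$p.value   # above 0.05 about 95% of the time when the pairs are unrelated
```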
My original answer is below.
I didn't read the "previous question" you refer to. I will read "correlation" in the way that makes sense given the question: how similar the three sets are, assuming the order of the numbers is not important. (Pearson's correlation is defined for paired observations, which is not what you have, as the sets are different lengths.)
Assuming true random number generation:
Question 1: S2 will be 1/2 the size of S1, and S3 will be 1/3 the size of S1. There will be overlaps between the values chosen for S2 and S3 (every sixth element of S1: u0, u6, u12, ...).
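The overlap for Question 1 can be checked directly on positions. This is a sketch: it uses R's 1-based indexing, so u0 in the question corresponds to position 1, and the length of 1000 is arbitrary.

```r
n <- 1000                         # arbitrary length of S1
idx2 <- seq(1, n, by = 2)         # positions in S2: every second element (u0, u2, ...)
idx3 <- seq(1, n, by = 3)         # positions in S3: every third element (u0, u3, ...)

shared <- intersect(idx2, idx3)   # positions that end up in both S2 and S3
head(shared)                      # 1 7 13 19 25 31, i.e. u0, u6, u12, ...
# 500, 334 and 167: about 1/2, 1/3 and 1/6 of n
length(idx2); length(idx3); length(shared)
```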
Questions 2 and 3: assuming S2 and S3 are 1/2 and 1/3 the size of S1, if you take S3 from [S1 - S2], S3 won't have any of the exact values S2 does (assuming random real numbers with infinite fractional digits), so there won't be any overlap between the two sets. (At 8 decimal places it's "possible" for two values in S1 to be the same, with one ending up in S2 and the other in S3.) With replacement (S3 drawn from all of S1, including the values already in S2), some of the same values will be chosen for S2 and S3: on average, 1/6 of the numbers from S1 will be found in both S2 and S3 (1/2 of 1/3).
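A small simulation sketches both cases. It assumes S2 and S3 are simple random samples of S1's positions (drawn with base R's `sample()`) of sizes n/2 and n/3, and it models "with replacement" as drawing S3 from all of S1 rather than from the positions left over after S2. The values of n, the number of repetitions, and the seed are arbitrary.

```r
set.seed(42)           # arbitrary seed for repeatability
n <- 6000              # length of S1 (a multiple of 6)
reps <- 1000           # number of simulated (S2, S3) pairs

# Fraction of S1's positions that land in both S2 and S3
overlap <- function(from_remaining) {
  s2 <- sample(n, n / 2)                                      # positions of S2
  pool <- if (from_remaining) setdiff(seq_len(n), s2) else seq_len(n)
  s3 <- pool[sample(length(pool), n / 3)]                     # positions of S3
  length(intersect(s2, s3)) / n
}

disjoint    <- replicate(reps, overlap(TRUE))    # S3 drawn from S1 - S2
overlapping <- replicate(reps, overlap(FALSE))   # S3 drawn from all of S1

max(disjoint)       # 0: no position is ever in both
mean(overlapping)   # close to 1/6 (about 0.1667)
```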
If you want 1/6 of the numbers to be chosen for both S2 and S3 every time (exactly 1/6 when the length of S1 is a multiple of 6), use the first method: pick every other number, then every third, with replacement assumed. Choosing the numbers systematically, in a way that does not depend on the values of the numbers in any of the sets, will not affect the randomness of the sets.
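As an informal check of that last point (and of the autocorrelation concern in the question), the sketch below compares the lag-1 autocorrelation of a simulated S1 with that of its every-second and every-third subsequences, using base R's `acf()`. It assumes S1 is i.i.d. normal; the seed and length are arbitrary, and a near-zero estimate is consistent with, but does not prove, the absence of dependence.

```r
set.seed(2024)                         # arbitrary seed
S1 <- rnorm(1e5)                       # i.i.d. draws, as assumed
S2 <- S1[seq(1, length(S1), by = 2)]   # every second element
S3 <- S1[seq(1, length(S1), by = 3)]   # every third element

# Lag-1 sample autocorrelation: element [1] of $acf is lag 0, element [2] is lag 1
lag1 <- function(x) acf(x, lag.max = 1, plot = FALSE)$acf[2]

ac <- sapply(list(S1 = S1, S2 = S2, S3 = S3), lag1)
ac   # all three should be within a few hundredths of 0
```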
These are the criteria I replied to:
S1 = (u0, u1, u2, u3, u4, u5, ..., un)
S2 = (u0, u2, u4, u6, ...) (every second element)
S3 = (u0, u3, u6, u9, ...) (every third element)
1. Is there any assumption that makes S1 random that will be broken if it is divided in this way? Something like introducing correlation between S2 and S3, or making either more autocorrelated.
2. If two new sequences S4 and S5 are generated as random samples (with replacement) from S1, do any of the answers to 1 still hold?
3. What if S4 and S5 were random samples without replacement, meaning that they are disjoint?