How do I work out an adequate/representative sample in the following scenario?
My intention is to use correlation and regression models to test the relationship between income level and money spent on a particular product by store type.
Scenario
There are 200 stores, with a total of 10,000 registered customers. The stores can be grouped in terms of their floor space as large (100 stores), medium (70 stores) and small (30 stores).
The large stores account for 60% of the registered customers, the medium stores account for 30% of the customers and the small stores account for 10% of the customers.
Sampling Method
Using the cluster sampling technique, I have randomly chosen 2 large, 2 medium and 2 small stores. (This is on the assumption that stores within the same size group are similar.)
Then, using stratified random sampling, I chose 50 customers from each of the six stores such that I have 100 large store customers, 100 medium store customers and 100 small store customers. (This was to ensure gender and age balance.)
My final sample is then 300 customers.
Is this a representative sample (given the scenario)?
Or, do I have to use some sort of weighting to ensure the final sample reflects:
- the store sizes, i.e. 100 are large stores, 70 are medium stores and 30 are small stores
- the customer distribution, i.e. 60% of the customers are from large stores, 30% are from medium stores and 10% are from small stores
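
For the customer distribution, such a weighting could be sketched as below. This is only an illustration, not a settled method: it assumes simple post-stratification by store size (weight = population share / sample share), uses the customer counts implied by the scenario (6,000, 3,000 and 1,000), and ignores both the clustering of customers within the six chosen stores and any differences in store size within a group. The function and variable names are hypothetical.

```python
# Customers per store-size group, implied by the scenario:
# 60%, 30% and 10% of 10,000 registered customers.
population_customers = {"large": 6000, "medium": 3000, "small": 1000}

# Customers actually sampled per group (2 stores x 50 customers each).
sample_customers = {"large": 100, "medium": 100, "small": 100}


def poststratification_weights(population, sample):
    """Return weight = population share / sample share for each group.

    Assumption: plain post-stratification on store size only; clustering
    within stores is not accounted for.
    """
    pop_total = sum(population.values())
    samp_total = sum(sample.values())
    return {
        group: (population[group] / pop_total) / (sample[group] / samp_total)
        for group in population
    }


weights = poststratification_weights(population_customers, sample_customers)
for group, w in weights.items():
    print(f"{group}: {w:.2f}")
# large: 1.80
# medium: 0.90
# small: 0.30
```

With these weights each large-store customer counts 1.8 times and each small-store customer 0.3 times, so the weighted sample still sums to 300 but follows the 60/30/10 split.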
Here is a method I propose to use: Probability Proportionate to Size (PPS) sampling. Please see my comment below.