As a math guy trying to understand Principal Component Analysis from the standpoint of Linear Algebra, I am following along with this paper I found. As I read it, I want to understand the statistics vocabulary involved, and the hierarchy/relationship of terms.
- Population
- Variable
- Experiment
- Sample
- Observation
- Outcome
I'd like to lay out my understanding of these words and their relationships, ask for corrections, and pose some clarifying questions:
- A population is the set of all possible outcomes of an experiment.
- A sample is a subset of the population, and is therefore a set of outcomes.
- A random variable is a mapping from the population of all possible real-life outcomes of an experiment to numerical values.
This leaves a couple of questions:
- What is an observation?
If I draw blood from $n$ people, and each draw yields $m$ data points (such as platelet count, plasma levels, and blood alcohol content), what is each draw of blood called? It seems like it should be a sample, but I thought samples were directly related to outcomes of a single random variable. In this case, a sample of blood contains many variables. Is this where the word observation comes in?
Is the experiment the act of drawing blood $n$ times? Or is it the act of observing the variables within each sample (medically speaking) of blood?
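To make the blood-draw setup concrete, here is a small Python sketch of how I picture the data: an $n \times m$ table with one row per draw and one column per measured quantity. The names and numbers are invented purely for illustration, and I am deliberately not labeling rows or columns with the statistics terms above, since which term applies is exactly what I am asking.

```python
# Hypothetical data: n = 3 people, m = 3 measurements per blood draw.
# All names and values below are made up for illustration only.
variables = ["platelet_count", "plasma_level", "blood_alcohol"]

draws = [
    [250.0, 3.1, 0.00],  # draw from person 1
    [310.0, 2.8, 0.02],  # draw from person 2
    [190.0, 3.4, 0.08],  # draw from person 3
]

n = len(draws)     # number of rows (blood draws)
m = len(draws[0])  # number of columns (measured quantities)

# Reading along one row gives all m values from a single draw.
row = dict(zip(variables, draws[1]))

# Reading down one column gives the n values of a single measured quantity.
column = [d[variables.index("blood_alcohol")] for d in draws]

print(n, m)
print(row)
print(column)
```

So the question becomes: what is the correct name for a row, for a column, and for the whole table?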