I have a dataset with the following characteristics and I can’t seem to wrap my head around it. “Three st.dev.s include 99.7% of the data” is what I tell myself, but that seems to be inaccurately worded.
Observations: 2246
Mean: 39
St.dev.: 3
Min: 34
Max: 46
Mean - 3*sd: 30
Mean + 3*sd: 48
This tells me that 99.7% of the data lie within 30 and 48, but a 100% of the data lie within 34 and 46 and that doesn’t make sense. Does it just mean my sample is not representative of the total population? I mean, obviously, it isn't, but let's assume I don't know that humans younger than 34 and older than 46 exist. By the way, this is from the variable age
from the Stata sample dataset nlsw88.dta
.
I have looked at this question, but it doesn't help me untie my brain knot, either. ht place to ask.
EDIT: Just realized those are many questions. Please consider the header question the one that needs an answer. The rest is pretty much just my messed up thought process unfurling.