A colleague and I had a conversation about whether the following variables are categorical or quantitative.

1) Social security numbers

2) Phone numbers

3) Postal zip codes

We agreed that all three are in fact categorical, but couldn't agree on a good reason.

The definition of a categorical variable (at least here) is: "In statistics, a categorical variable is a variable that can take on one of a limited, and usually fixed, number of possible values, thus assigning each individual to a particular group or 'category.'"

However, it seems a somewhat weak case can also be made that these variables are discrete-valued random variables. I need one good reason to convince students that these variables are not quantitative. Any ideas?

Greenparker
  • 2
    In a famous commentary on Stevens, [Frederic M. Lord (1953)](https://scholar.google.com/scholar?q=on+the+statistical+treatment+of+football+numbers) gives an example in which a statistical analysis treats football jersey numbers *very effectively* as being quantitative (even though these are archetypically categorical). One of his points--and one echoed over many decades by John Tukey--is that many data are not *inherently* "categorical" or "quantitative." These concepts may be useful for training novices, but they are misleading or worse as guides to statistical analysis. – whuber Mar 22 '16 at 19:15
  • Nick Cox's thoughtful answer to a closely related question at http://stats.stackexchange.com/questions/67551/calculate-mean-of-ordinal-variable may be relevant to your conversation. – whuber Mar 23 '16 at 15:38

1 Answer

Here's a simple test: if you 'add' two values of the variable, is the result another possible value of the variable? For income, the sum of any two incomes is another possible income. But what sense does (zipcode1) + (zipcode2) make? Ditto for SSNs and phone numbers. The bottom line is that one can make algebraic sense of numerical variables, but one can't make algebraic sense of categorical variables.
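As a rough illustration of this test, here is a minimal Python sketch. It assumes the set of possible values can be listed explicitly; the helper `closed_under_addition` and the three-element `sample_area_codes` set are hypothetical names and data made up for the example, not a complete list of area codes.

```python
from itertools import combinations_with_replacement


def closed_under_addition(values):
    """Return True if the sum of every pair of values is itself in `values`."""
    domain = set(values)
    return all(
        a + b in domain
        for a, b in combinations_with_replacement(domain, 2)
    )


# Hypothetical, deliberately tiny sample of area codes (not a complete list).
sample_area_codes = {203, 312, 515}

# A single pair can happen to land on another code ...
print((312 + 203) in sample_area_codes)

# ... but the test asks whether *every* sum is again a possible value.
print(closed_under_addition(sample_area_codes))
```

The check only works for a finite, enumerable set of values; for something like income, the argument that any sum is another possible income has to be made by reasoning rather than by enumeration.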

meh
  • 1
    Counterexample: take area codes 312 (Chicago) and 203 (Connecticut). Add them, and you get 515 (Des Moines). Does that make area code a quantitative variable? – daOnlyBG Mar 22 '16 at 18:57
  • 1
    No, but Chicago + Connecticut = Des Moines makes no sense. Hence, the addition here makes no sense. @aginensky, I think both your answer and daOnlyBG's are acceptable, and they probably make similar points. – Greenparker Mar 22 '16 at 19:11
  • 2
    According to this test, then, percentages of a whole cannot be "numerical" because (say) 50% + 60% = 110% is not possible. I doubt you really meant to imply that, which suggests you might want to qualify your simple test. If you do, ponder a little on other common forms of data such as angles, intensities of image pixels, or geographic coordinates. – whuber Mar 22 '16 at 19:38
  • @whuber, no, but 0.5 + 0.6 = 1.1 makes sense. – meh Mar 23 '16 at 15:26
  • @Greenparker @whuber It's also possible that the sum of two SSNs could be another SSN, and by the same token red + yellow = orange. More precisely, however, it is still true that categorical variables don't have algebraic operations. In particular, the sum of two area codes isn't always another area code, so area codes are not numerical. For me the principle would be: is there at least a semigroup operation? – meh Mar 23 '16 at 15:31
  • 2
    That's a fine mathematical principle--but it isn't useful for data analysis, because it arbitrarily precludes effective solutions. Indeed, your immediate two comments seem contradictory: yes, $0.5+0.6=1.1$ makes *mathematical* sense, but it has no *meaning* in the context I posited. Likewise, summing two SSNs (interpreted as nine-digit base-10 representations of integers) makes *mathematical* sense but it has no *meaning* in most conceivable contexts. – whuber Mar 23 '16 at 15:35
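whuber's objection about percentages of a whole can be made concrete with a small Python sketch. The helper `is_valid_share` is a hypothetical name, and the 0 to 100 range is my assumption about what "percentage of a whole" means in that comment.

```python
def is_valid_share(percent):
    """A percentage of a whole must lie between 0 and 100 inclusive."""
    return 0 <= percent <= 100


a, b = 50, 60
total = a + b  # 110: fine as arithmetic, impossible as a share of one whole

# Both inputs are valid shares, but their sum is not.
print(is_valid_share(a), is_valid_share(b), is_valid_share(total))
```

So a strict "the sum must be another possible value" rule would classify percentages of a whole as non-numerical, which is the qualification whuber is asking for.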