15

When I calculate entropy for the xkcd Password Strength (comic 936) I don't get nearly the amount of entropy stated in the comic.

So why doesn't the first password "Tr0ub4dor&3" have an entropy of around 50 bits? And why doesn't the passphrase "correcthorsebatterystaple" represent over 100 bits of entropy?

Maarten Bodewes
Blafasel
  • [this](https://en.wikipedia.org/wiki/Password_strength#Entropy_as_a_measure_of_password_strength) article might be a good place to begin – hunter Sep 24 '18 at 12:46
  • I've reformatted your question, but note that "I don't get nearly the amount" assumes that you calculate *less* entropy rather than *more* entropy. Please fix that if that isn't the case and your entropy calculation seems to overshoot. (2) Please try and format any followup question to the best of your abilities and include enough info so that it doesn't rely on external resources (even if that external resource is xkcd, which presumably will survive another 100 years or so). – Maarten Bodewes Sep 24 '18 at 13:21
  • This comic was discussed on security.stackexchange.com; in particular, [this answer](https://security.stackexchange.com/a/6096/655) contains a detailed analysis of the entropy calculations. – Thomas Pornin Sep 24 '18 at 13:52
  • 4
    Now I am suddenly terribly curious whether Munroe *actually* flipped coins or rolled dice to get "correct horse battery staple." Our entire password security system could be flawed! – Cort Ammon Sep 24 '18 at 16:30
  • 2
    Of course, due to that widely publicized comic the _actual_ password strength of "correcthorsebatterystaple" is now at most about 4 bits, because if you hypothetically imagine someone saying "okay, let's try brute force to see if it is one of these $2^4$ passwords" it is eminently plausible that the list that follows might include "correcthorsebatterystaple". – hmakholm left over Monica Sep 24 '18 at 21:15
  • Why do you think the entropies should be ~50 and >100, respectively? – marcelm Sep 24 '18 at 21:29
  • @marcelm For the first, I tried many different approaches, for example splitting the word "troubadou" into "trou" (French + leet), "bad" (English + leet) and "ou" (French with and without leet) and brute-forcing the last 2 characters with 92 possible characters each, or just treating it as 11 characters with 72 possibilities each (62 case-sensitive alphanumerics + the 10 most common special characters), and so on. Depending on the approach I got an entropy of 40-70 bits, so this was my ~50. For the sentence I tried a big dictionary, or just a dumb character count with a case-insensitive alphabet, and a few other variations, and I always got >100 bits. – Blafasel Sep 25 '18 at 06:22
  • @Blafasel Could you add your reasoning to the question itself? All relevant information should be in the question, and it is important that would-be answerers know your reasoning :) – marcelm Sep 25 '18 at 16:21

2 Answers

23

> I don't get nearly the amount of entropy stated in the comic.

Interestingly enough, the entropy ratings are actually justified in the comic itself by the little boxes, each of which represents 1 bit of uncertainty.

This means for Tr0ub4dor&3

  • It's estimated that the word itself, "Troubador", comes from a dictionary of about $2^{16}$ words
  • It adds one bit for each of o,a,o of the word to encode whether the letter was replaced or not
  • It adds one bit to decide whether the word was capitalized or not
  • It adds one bit for the ordering of the trailing numeral and special character
  • It adds 3 bits for the unknown numeral, approximating $10$ with $2^3$ rather than $2^4$, because $2^3$ is the more accurate of the two
  • It adds 4 bits for the unknown punctuation, i.e. which of the approximately 16 standard ones it is

This sums up to $16+3+1+1+3+4=28$ bits.
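The tally above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the per-component bit counts are the ones listed above, and the dictionary labels are just illustrative names.

```python
import math

# Bits per component, as listed above (one box in the comic = 1 bit).
components = {
    "base word (dictionary of about 2^16 words)": 16,
    "common substitutions (o, a, o)": 3,
    "capitalization": 1,
    "order of numeral and punctuation": 1,
    "numeral": 3,
    "punctuation": 4,
}

total_bits = sum(components.values())
print(total_bits)

# For comparison: a uniformly random decimal digit carries log2(10) bits,
# which is closer to 3 than to 4.
digit_bits = math.log2(10)
print(round(digit_bits, 2))
```

The total only holds if every component is picked uniformly at random and independently of the others.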

For correct horse battery staple the reasoning is that each of the four words is drawn from a dictionary of size $2^{11}$ which means $4\times 11=44$ bits of entropy.

In both cases it is assumed that the attacker knows the scheme, i.e. the set of possible choices behind the entropy estimate, and that each choice (which word, which substitution, and so on) is made uniformly at random.
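As a rough sketch of the passphrase case under exactly these assumptions: the word list below is a made-up placeholder of $2^{11}$ entries (not the comic's list), and the function names are hypothetical. The entropy depends only on the list size and the number of uniformly random picks, not on the words themselves.

```python
import math
import secrets

def passphrase_entropy_bits(dictionary_size, num_words):
    """Entropy of num_words independent, uniform picks from the dictionary."""
    return num_words * math.log2(dictionary_size)

def random_passphrase(wordlist, num_words):
    """Pick words uniformly at random using a cryptographically secure RNG."""
    return "".join(secrets.choice(wordlist) for _ in range(num_words))

# Placeholder word list with 2^11 entries (an assumption for illustration).
wordlist = [f"word{i:04d}" for i in range(2**11)]

bits = passphrase_entropy_bits(len(wordlist), 4)
print(bits)
print(random_passphrase(wordlist, 4))
```

If a human picks the words instead of a random generator, the uniformity assumption no longer holds and the figure is only an upper bound.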


If you want an even more thorough explanation of this comic, I can only recommend you read the bear's answer on this over on InfoSec.SE.

SEJPM
  • 2
    Also not explicit in the comic: How difficult is it to remember each part of the password? (Versus how much entropy that thing to remember adds.) The "troubador" part is pretty easy and gets you more entropy than other parts. That's 16 bits. The capitalization of the first letter is one more thing to memorize. The 1337-ness of the third is another. Then the fourth is one more. Then the sixth is another. Then wait, was it "or" or "our"? That's a lot to remember for... less than 4 extra bits of entropy. And that makes you way more likely to need a reset or get locked out. Not exactly economical. – Future Security Sep 24 '18 at 14:22
  • 1
    "approximating $10$ with $2^3$ instead of $2^4$ which is more accurate" ... 10 seems to be nearer to 8 than to 16, isn't it? – Paŭlo Ebermann Sep 24 '18 at 17:43
  • @FutureSecurity I thought that was the entire point of the comic, that all these "password must contain at least one of each of these sets of characters" security measures make passwords hard to remember but not much more secure. – Kamil Drakari Sep 24 '18 at 17:44
  • 1
    @PaŭloEbermann indeed which is why $2^3$ is better – SEJPM Sep 24 '18 at 17:44
  • 4
    Ah, I misread your text as saying $2^4$ would be more accurate. – Paŭlo Ebermann Sep 24 '18 at 17:46
  • 5
    @KamilDrakari That is indeed the point of the comic. However, the overwhelming majority of internet commenters whenever this comic comes up seem to "understand" a different point. A good number believe the 1337speak makes a password one bajillion times stronger. Others say "Ah, but the second method uses dictionary words, which are weak. You should try k0rr3ct ǝsɹoɥ b4TT3ry [Zszywka](https://pl.wikipedia.org/wiki/Zszywka) 1235". – Future Security Sep 24 '18 at 18:18
  • 2
    @PaŭloEbermann It's also a good idea to make a pessimistic estimate of entropy, too. Plus 3 bits might be too generous, since '1' and '2' are probably a lot more common than the rest. – Future Security Sep 24 '18 at 18:24
  • @FutureSecurity If we assume Benford distribution (AKA log-uniform, AKA first-digit law), we get entropy of 2.876 bits, or 1.99 nats. So yes, "3 bits" is 0.124 bits too generous :P – John Dvorak Sep 25 '18 at 16:18
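The Benford figure in the last comment can be reproduced with a short calculation. This sketch assumes the standard first-digit form of Benford's law, $P(d)=\log_{10}(1+1/d)$ for $d=1,\dots,9$ (so the digit 0 is excluded); the variable names are only illustrative.

```python
import math

# Benford's law for the leading digit d in 1..9 (assumed distribution).
probs = [math.log10(1 + 1 / d) for d in range(1, 10)]

# Shannon entropy in bits, then converted to nats.
entropy_bits = -sum(p * math.log2(p) for p in probs)
entropy_nats = entropy_bits * math.log(2)

# How far the comic's 3-bit allowance overshoots under this assumption.
overshoot = 3 - entropy_bits

print(round(entropy_bits, 3), round(entropy_nats, 2), round(overshoot, 3))
```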
2

One official way to estimate the strength of a user-selected password such as "Tr0ub4dor&3" is to look at NIST recommendations. Granted, it is now deprecated, but the relevant publication was NIST Special Publication 800-63 Version 1.0.2, Electronic Authentication Guideline.

Table A.1 (reproduced below in case of link rot):

[Image: Table A.1]

The reasoning behind this table is given in the document at $\S$ A.2.1 Guessing Entropy Estimate. NIST therefore estimates the entropy at 33 bits if we interpolate for 11 characters and apply the dictionary and composition rules.

The takeaway from this question is how difficult it is to assess the entropy of short sequences, particularly human-produced ones. The two current answers differ in strength by a factor of 32 ($2^{33-28}$). If we compare NIST's estimate to Blafasel's original figure of 50 bits, the estimates differ by a factor of 131,072 ($2^{50-33}$). NIST says of the above, "Readers are cautioned against interpreting the following rules as anything more than a very rough rule of thumb method". True.
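The two factors follow directly from the differences in bits, since each extra bit doubles the number of guesses. A minimal sketch of that arithmetic, using the 28-bit comic estimate, the 33-bit NIST estimate and the asker's roughly 50 bits (the helper name is hypothetical):

```python
def guess_space_ratio(bits_a, bits_b):
    """Factor by which two entropy estimates differ in number of guesses."""
    return 2 ** abs(bits_a - bits_b)

comic_bits = 28   # estimate from the comic's boxes
nist_bits = 33    # interpolated NIST estimate for 11 characters
asker_bits = 50   # Blafasel's approximate figure

print(guess_space_ratio(nist_bits, comic_bits))
print(guess_space_ratio(asker_bits, nist_bits))
```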

Another takeaway is that very few sites allow the stronger and easier-to-remember technique of choosing from a word list, as in "correcthorsebatterystaple". The UK government's online services don't, no bank I'm aware of does, and stackexchange.com doesn't.

Paul Uszak