Is the use of "utf8=✓" preferable to "utf8=true"?

514

209

I have recently seen a few URIs containing the query parameter "utf8=✓". My first impression (after thinking "mmm, looks cool") was that this could be used to detect a broken character encoding.

So, is this a better way to resolve potential problems with character encoding, or is it just a developer having fun with a hack?

Gary Rowe

Posted 2012-10-13T11:57:01.900

Reputation: 20 024

7

I disagree. There are schemes out there that look like URNs and that take query parameters - such as Bitcoin. URIs are not confined to browsers. See http://en.wikipedia.org/wiki/URI_scheme. This question may also address the general case where character encoding is required when a browser accesses a protocol handler.

Gary Rowe 2012-10-19T08:29:12.983

3

Give examples of these URLs or didn't happen.

hakre 2012-10-22T12:59:07.020

10

Off topic, but OK. Here's my personal donation Bitcoin URI: bitcoin:1KzTSfqjF2iKCduwz59nv2uqh1W2JsTxZH?amount=0.5&label=Agile%20Stack. Notice that the scheme is essentially a URN with query parameters, but it hands off to a protocol handler. This kind of URI could probably benefit from the “utf8=✓” workaround as well.

Gary Rowe 2012-10-22T17:47:16.857

Answers

761

By default, older versions of IE (<=8) will submit form data in Latin-1 encoding if possible. By including a character that can't be expressed in Latin-1, IE is forced to use UTF-8 encoding for its form submissions, which simplifies various backend processes, for example database persistence.

If the parameter were utf8=true instead, it would not trigger UTF-8 encoding in these browsers, because every character in "true" can be expressed in Latin-1.
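To make the difference concrete, here is a minimal Python sketch. It does not reproduce IE's behaviour; it only shows that "true" fits in Latin-1 while "✓" does not, and what the same query string looks like percent-encoded under each encoding. The helper name fits_latin1 and the sample parameter q=café are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode


def fits_latin1(text: str) -> bool:
    """Return True if every character of text can be encoded as Latin-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


CHECK = "\u2713"  # the check mark used in utf8=✓

# "true" is plain ASCII, so a browser that prefers Latin-1 has no reason to switch.
print(fits_latin1("true"))
# The check mark has no Latin-1 representation, so Latin-1 is not an option.
print(fits_latin1(CHECK))

# Hypothetical form data: the same field value under the two encodings.
utf8_query = urlencode({"utf8": CHECK, "q": "café"})         # UTF-8 is the default
latin1_query = urlencode({"q": "café"}, encoding="latin-1")  # what a Latin-1 submit would send

print(utf8_query)
print(latin1_query)

# A backend that assumes UTF-8 decodes the first query cleanly.
print(parse_qs(utf8_query))
```

The "é" in the sample value is sent as %C3%A9 in UTF-8 but as %E9 in Latin-1, which is the kind of ambiguity the backend would otherwise have to guess at.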

Gareth

Posted 2012-10-13T11:57:01.900

Reputation: 4 397

81

I didn't know that, really interesting!

Florian Margaine 2012-10-13T12:50:23.407

20

Ah, so it's an IE workaround... nice. Saves a lot of validation too.

Gary Rowe 2012-10-13T13:03:56.433

1

I can't quite see how it would save you from having to validate input, considering that you always have to consider malicious input for public interfaces.

Lars Viklund 2012-10-13T13:07:59.657

@LarsViklund You have to secure against malicious input, sure. But this is, to a large degree, independent of character encoding weirdness.

delnan 2012-10-13T13:47:08.413

8

@LarsViklund I should have been clearer with my comment. I meant that the validation associated with character encoding is simplified, not bypassed.

Gary Rowe 2012-10-13T13:48:54.977

3

@Lars Correct, it doesn't absolve you from having to check your input. But it does mean that encoding tweaks only become part of your security handling and don't taint the concept of your "standard processing" path.

Gareth 2012-10-14T10:08:18.443

33

Also see http://stackoverflow.com/questions/3222013/what-is-the-snowman-param-in-rails-3-forms-for/3348524#3348524. Apparently Ruby on Rails used to use a snowman character for this, and later changed it to a checkmark, which is less ambiguous but less funny.

Jack V. 2012-10-17T10:06:03.790

3

Always the deprecated Internet Explorer making developers' lives harder...

RobinJ 2012-10-18T15:39:02.310

Amusingly one of the first versions of this used the UTF-8 snowman ☃ to force the conversion.

tadman 2012-10-18T17:07:46.943

Well, &#10003;!

Matt 2012-10-18T17:33:16.197

How does the browser/parser know to evaluate this to true? Does anything other than false or 0 auto evaluate to true, or are there specific values which map to true and others to false?

JohnLBevan 2012-10-18T19:21:50.947

8

@JohnLBevan it's ignored by the receiving end; it's done its job by forcing the browser to send things in utf8 instead of latin1. I've also seen it as ie= (that's the 'pile of poo' code point, looks like it's not rendering in comments.)

cabbey 2012-10-18T19:54:13.413

Ahh sorry, just reread the question & spotted that this is in the URI; not the html/xml. So presumably putting utf8=false would be meaningless - the parameter's only purpose is to act as a hack for ie. Thanks @cabbey.

JohnLBevan 2012-10-18T20:08:55.227

2

That's just so ✓

cubsink 2012-10-18T20:29:46.057

3

@Gareth: Can you back up the statement that IE <= 8 forms do not support the document and/or form encoding?

hakre 2012-10-22T13:00:19.313

2

By default, older versions of IE (<=8)... strikes again!

Andrew 2013-01-09T08:03:32.733

That makes sense - instead of having to handle non-UTF8 you can just bomb out with an "Outdated browser not supported; please upgrade" error.

Demi 2013-12-18T05:09:57.577