I was recently sorting data in Google Sheets that involved the word "hello" in various languages. An excerpt of the sheet is below. Column C was calculated with the formula =UNICODE(LEFT(B2,1)), filled down the column. The top-left cell of the table is A1.
| Language | Hello | Unicode Value of First Character |
|---|---|---|
| English | hello | 104 |
| Armenian | Բարեւ Ձեզ | 1,330 |
| Amharic | ሰላም | 4,656 |
| Bengali | হ্যালো | 2,489 |
| Korean | 여보세요 | 50,668 |
| Japanese | こんにちは | 12,371 |
| Chinese | 你好 | 20,320 |
I then sorted the sheet from A to Z, using the Sort sheet by Column B (A to Z) menu option in the Data tab. However, the result was the order shown in the table above. This wasn't what I was expecting, because the first character of cell B6 (여) has the Unicode code point U+C5EC, which is greater than that of the first character of cell B7 (こ, U+3053). Excel Online sorted the same table from A to Z as follows:
| Language | Hello | Unicode Value of First Character |
|---|---|---|
| English | hello | 104 |
| Armenian | Բարեւ Ձեզ | 1,330 |
| Japanese | こんにちは | 12,371 |
| Bengali | হ্যালো | 2,489 |
| Amharic | ሰላም | 4,656 |
| Korean | 여보세요 | 50,668 |
| Chinese | 你好 | 20,320 |
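To make the expectation concrete, here is a minimal Python sketch of a pure code point sort of column B. Python is only used for illustration and is not what either spreadsheet runs; the names `greetings` and `by_codepoint` are made up for this example.

```python
# The "Hello" column (B2:B8) from the tables above.
greetings = ["hello", "Բարեւ Ձեզ", "ሰላም", "হ্যালো", "여보세요", "こんにちは", "你好"]

# Python compares str values code point by code point, so sorted()
# gives a plain code point ordering with no locale-aware collation.
by_codepoint = sorted(greetings)

for text in by_codepoint:
    first = ord(text[0])  # same idea as =UNICODE(LEFT(B2,1)) in column C
    print(f"{first:>6,}  U+{first:04X}  {text}")
```

Under that ordering the rows would come out as English, Armenian, Bengali, Amharic, Japanese, Chinese, Korean, which matches neither the Google Sheets result nor the Excel Online result.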
I previously believed that Google Sheets' and Excel's alphabetical ordering was based on the Unicode code point of each character, but this appears to be incorrect. Here are some other notable findings:
- Characters that look similar appear to be grouped together, such as both `£ U+00A3 POUND SIGN` and `¥ U+00A5 YEN SIGN` coming before `A U+0041 LATIN CAPITAL LETTER A`.
- Some CJK characters, like `㈮ U+322E PARENTHESIZED IDEOGRAPH METAL`, are alphabetically before almost every other character, including most Latin characters.
- Many look-alike characters are alphabetically adjacent to their Basic Latin counterparts, like `Å U+212B ANGSTROM SIGN` and `A U+0041 LATIN CAPITAL LETTER A` both being before `B`, but not all look-alikes are, such as `Α U+0391 GREEK CAPITAL LETTER ALPHA` coming after `Z`.
A link with all of these aforementioned Google Sheets tests is available here.
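For reference, the raw code point order of the characters mentioned in the findings can be checked with a short Python sketch. Again, this is only an illustration of the ordering I had assumed, not of how either spreadsheet works, and the names `chars` and `codepoint_order` are hypothetical.

```python
import unicodedata

# Characters from the findings, written as escapes so that look-alikes
# (for example U+0041 and U+0391) cannot be confused in the source.
chars = ["\u00A3", "\u00A5", "\u0041", "\u322E",
         "\u212B", "\u0042", "\u0391", "\u005A"]

# Plain code point ordering, lowest first.
codepoint_order = sorted(chars)

for ch in codepoint_order:
    print(f"U+{ord(ch):04X}  {ch}  {unicodedata.name(ch)}")
```

By code point alone, `A`, `B`, and `Z` would sort first, followed by `£` and `¥`, then the Greek alpha, the angstrom sign, and finally `㈮`. Only the Greek alpha landing after `Z` agrees with what the spreadsheets do.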
I personally can't find much logic in this ordering. How do Google Sheets and Excel alphabetize text?