1

I need guidance on the exact distinction, if any, between LC_COLLATE='C' and LC_COLLATE='C.UTF-8' in PostgreSQL. It seems to me that they behave exactly the same and that the codeset (with bearing on character classification for character-case conversion, etc.) is picked up from LC_CTYPE alone.

Here is an experiment with PostgreSQL 9.6 that seems to support this assessment ('Ä' is the upper-case version of the German Umlaut 'ä'):

SHOW LC_COLLATE; -- 'en_US.UTF-8': cluster default
SHOW LC_CTYPE; -- 'en_US.UTF-8': cluster default

CREATE DATABASE test WITH TEMPLATE='template0' ENCODING='UTF8' 
  LC_COLLATE='C.UTF-8' LC_CTYPE='C';
\c test
SELECT upper('ä'); -- 'ä': codeset 'UTF-8' apparently not picked up from LC_COLLATE
\c postres
DROP DATABASE test;

CREATE DATABASE test WITH TEMPLATE='template0' ENCODING='UTF8' 
  LC_COLLATE='C' LC_CTYPE='C.UTF-8';
\c test
SELECT upper('ä'); -- 'Ä': codeset 'UTF-8' apparently picked up from LC_CTYPE
\c postres
DROP DATABASE test;

CREATE DATABASE test WITH TEMPLATE='template0' ENCODING='UTF8' 
  LC_COLLATE='C.UTF-8';
SELECT datcollate, datctype FROM pg_database 
  WHERE datname='test'; -- 'C.UTF-8', 'en_US.UTF-8'
\c test
SELECT upper('ä'); -- 'Ä': codeset 'UTF-8' apparently picked up from cluster default `LC_CTYPE`
\c postres
DROP DATABASE test;

So it would seem to me that LC_COLLATE='C' and LC_COLLATE='C.UTF-8' are synonymous. The rationale could be that LC_COLLATE determines string sort order, which does not depend on any codeset in the case of C. Am I correct or is there a ready counter-example?

rookie099
  • 258
  • 1
  • 2
  • 8
  • `locale -a | grep "^C"`, on my system, shows `C` and `C.utf8`. There does not seem to be a `C.UTF-8`? or is this just a configuration on my system? – Luuk Jan 02 '20 at 18:22
  • 1
    oh sorry, i should have Googled first to find [Differences between en_US.utf8 and en_US.UTF-8?](https://superuser.com/questions/999133/differences-between-en-us-utf8-and-en-us-utf-8) – Luuk Jan 02 '20 at 18:29
  • You should add a semi-colon after `DROP DATABASE test`, than the DROP DATABASE starts working, – Luuk Jan 02 '20 at 18:45
  • @Luuk Thx for spotting the typo (re semicolon). – rookie099 Jan 03 '20 at 09:40
  • That doesn't differ from [PostgreSQL: difference between collations 'C' and 'C.UTF-8'](https://dba.stackexchange.com/questions/240930) you asked 6 months ago? I already submitted a counter-example there showing that `C` and `C.UTF-8` produce different results in string comparisons. – Daniel Vérité Jan 03 '20 at 16:55
  • @DanielVérité I missed that so thanks for pointing that out. The difference between `C` and `C.UTF-8` (in assignments to `LC_COLLATE`) is still a bit puzzling to me, though. Will re-digest those answers (and mark this other question as duplicate in the meantime). – rookie099 Jan 05 '20 at 07:36
  • Does this answer your question? [PostgreSQL: difference between collations 'C' and 'C.UTF-8'](https://dba.stackexchange.com/questions/240930/postgresql-difference-between-collations-c-and-c-utf-8) – rookie099 Jan 05 '20 at 07:37

0 Answers0