Language tags in IPP (was: Re: [Suppress-Script] Initial list of 300 languages)
cowan at ccil.org
Mon Mar 13 01:57:59 CET 2006
Doug Ewell scripsit:
> It also seems to have some interrelationship with the character set of
> the print job, which seems wrong to me; figuring out which character
> repertoires are necessary for which natural languages is a decidedly
> non-trivial effort (ask Michael, who has done this work for the European
Yes. In particular, if the charset and the language don't agree, according
to the printer's notion of "agree", the printer is free to print mojibake.
> This strongly suggests to me that when we are considering adding
> Suppress-Script values for up to 300 languages, we should focus
> primarily on those languages that are most likely to be used with a
> region subtag, and spend much less time worrying about the rest.
An excellent point. Based on Ethnologue data on national and official
languages, I find that the following languages are national or official
in more than one country:
ar Arabic, bn Bengali, ch Chamorro, da Danish, de German, el Greek, en English,
es Spanish, fr French, hr Croatian, hu Hungarian, it Italian, ko Korean, ln Lingala,
ms Malay, nl Dutch, pt Portuguese, sd Sindhi, sr Serbian, ss Swati, sv Swedish,
sw Swahili, ta Tamil, tn Tswana, tr Turkish, ur Urdu, zh Mandarin Chinese.
Of these, all have Suppress-Script values except Korean (should be Hang),
Mandarin (should be Hani), Sindhi (multiple scripts: Arab or Guru),
and Serbian (Cyrl or Latn).
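In the language subtag registry, a Suppress-Script value names the script a language is almost always written in, so a tag for that language should normally omit the script subtag. The sketch below shows that rule in use. It assumes a hypothetical, partial table. The ko and zh entries are the values proposed in this message, not registry data. Sindhi, Serbian, and Santali (sat) deliberately get no entry because they are written in more than one script. The parser is simplified and handles only plain language[-script][-...] tags.

```python
# Hypothetical, partial Suppress-Script table, for illustration only.
# "proposed" entries follow the suggestions in this message.
SUPPRESS_SCRIPT = {
    "en": "Latn",
    "de": "Latn",
    "fr": "Latn",
    "el": "Grek",
    "ar": "Arab",
    "ta": "Taml",
    "ko": "Hang",  # proposed
    "zh": "Hani",  # proposed
    # sd, sr, sat: multiple scripts in use, so no Suppress-Script entry
}


def drop_suppressed_script(tag: str) -> str:
    """Remove a script subtag that merely repeats the language's Suppress-Script.

    Simplified sketch: assumes the tag looks like language[-script][-...] and
    ignores extlang, private-use, and grandfathered tags.
    """
    parts = tag.split("-")
    # A script subtag is exactly four letters in the second position.
    if len(parts) >= 2 and len(parts[1]) == 4 and parts[1].isalpha():
        lang = parts[0].lower()
        script = parts[1].title()  # compare in the registry's titlecase form
        if SUPPRESS_SCRIPT.get(lang) == script:
            del parts[1]
    return "-".join(parts)
```

For example, "en-Latn-US" collapses to "en-US". "sr-Latn-RS" and "sat-Olck" keep their script subtags, because Serbian and Santali have no single default script.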
> Santali, which is spoken in multiple regions, and for which a "default"
> script assignment is not obvious.
Fortunately, the answer for Santali is known: "multiple scripts, the users
of which are at each other's throats." No S-S value for Santali.
Take two turkeys, one goose, four cabbages, but no duck, and mix them
together. After one taste, you'll duck soup the rest of your life.

John Cowan
http://www.ccil.org/~cowan
cowan at ccil.org
http://www.ap.org