Macrolanguages, countries & orthographies
lars at aronsson.se
Tue Feb 13 23:30:39 CET 2007
Mark Davis wrote:
> Assume that old Czech is as different from modern as fro is from fr.
But is this a real problem? How much literature in total is written
and available in the different varieties of Czech? My prejudice says
that as a nation with a language and literature of its own, Czech
is about as young as Finnish, Norwegian or Serbian, i.e. it dates
from the 19th century. Can you give any concrete examples where not
having a separate *code* for pre-Renaissance Czech is a practical problem?
Linguists of course have *names* for Swedish of all ages, but I
see no real use for having ISO or the IETF specify language
*codes* for them. I could be wrong, but if so please enlighten and
correct me. Nobody is going to translate OpenOffice or Mozilla into the
language spoken by the Vikings (Old Norse) or the Swedish used during
the Lutheran Reformation (called New Swedish, ironically).
Yes, there is now a branch of Wikipedia in Old English
(ang.wikipedia.org), but that is a rare exception. I don't expect
the same to happen for other languages. The ang edition now has 744
articles, compared with the 11,000 articles of the Latin Wikipedia.
I'm scanning old books, and I'm starting to see a practical
problem with different orthographies and spelling reforms, similar
to the one addressed by the IETF-defined codes for German, de-1901
and de-1996. Analogous to these codes, we could perhaps find use
for sv-1801, sv-1889, sv-1906, da-1775, da-1892 and da-1948,
because we now have *significant amounts* of text online in each
of these language versions. But before 1775/1801 the orthography
of Swedish and Danish varies so much from one work to the next that
it becomes pretty much useless to try to identify more versions.
Moreover, so little literature from before that time is available
that any automatic processing of it becomes insignificant.
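As a rough illustration of how such orthography tags might be used when
processing scanned books, here is a minimal Python sketch. It assumes,
naively and hypothetically, that every text follows the most recent
spelling reform at or before its publication year, which real books
often don't. Only de-1901 and de-1996 are registered IETF variants; the
sv-* and da-* years are just the proposals from this message. The names
`tag_for_year` and `split_tag` are illustrative, and the tag check is a
simplification, not a full BCP 47 parser.

```python
import bisect
import re

# Reform years from the tags discussed above. de-1901 and de-1996 are
# registered variants; the sv-* and da-* years are hypothetical proposals.
REFORMS = {
    "de": [1901, 1996],
    "sv": [1801, 1889, 1906],
    "da": [1775, 1892, 1948],
}

# Simplified tag shape: a primary language subtag plus optional variant
# subtags (5-8 alphanumerics, or 4 characters starting with a digit).
# Script, region, extension and private-use subtags are NOT handled.
TAG_RE = re.compile(r"^[a-z]{2,3}(?:-(?:[0-9][a-z0-9]{3}|[a-z0-9]{5,8}))*$")


def tag_for_year(lang, year):
    """Guess an orthography tag for a text published in `year`.

    Naive assumption: a text follows the latest reform at or before its
    publication year. Returns None before the first known reform, where
    spelling varies too much from work to work to be worth tagging.
    """
    years = REFORMS.get(lang, [])
    i = bisect.bisect_right(years, year)  # count of reforms <= year
    if i == 0:
        return None
    return f"{lang}-{years[i - 1]}"


def split_tag(tag):
    """Split e.g. 'sv-1889' into ('sv', ['1889']); reject other shapes."""
    tag = tag.lower()
    if not TAG_RE.match(tag):
        raise ValueError(f"unsupported tag: {tag!r}")
    lang, *variants = tag.split("-")
    return lang, variants
```

For example, a Swedish book from 1850 would be tagged `sv-1801`, while a
Danish book from 1700 gets no tag at all, mirroring the point above that
pre-reform texts are too irregular to classify.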
Lars Aronsson (lars at aronsson.se)
Aronsson Datateknik - http://aronsson.se
More information about the Ietf-languages mailing list