Mostly copied from UzConverter.
This is a very simple converter, with bidirectional one to one
correspondence: for every Latin letter there is a corresponding
Cyrillic letter and vice versa. There are no digraphs or punctuation
to convert.
The Latin alphabet is the primary one used for this language today,
and will probably remain so for the foreseeable future, so "tly" remains
the usual code, and "tly-cyrl" is added for Cyrillic.
Language name is changed:
* The main language name is now Latin.
* The word "language" ("зывон") is removed.
* The spelling of the word "Talysh" is based on the Pireyko dictionary.
Bug: T258975
Change-Id: I552e07967ea82e03c413a0b10b129a846aa007c7
This language is known in English as Shilha, Tashelhit,
Tachelhit, and also by some other names.
The most authoritative modern name in the language itself
is Taclḥiyt, with the letter c. It appears in the textbook
"Méthode de tachelhit, langue amazighe (berbère) du sud du Maroc"
by Abdallah El Mountassir and also in the online
"Dictionnaire Général de la Langue Amazighe Informatisé"
( https://tal.ircam.ma/dglai/lexieam.php ). It also directly
corresponds to the spelling in the Tifinagh script.
This change was requested by the localizers in translatewiki
and authors in the Wikipedia Incubator in this language.
They asked to define the Latin script as the primary.
The name of the variants are updated in the i18n file:
* The Latin names are changed to Taclḥit, as the Latin autonym.
* The Tifinagh name is added: it was missing, which caused
the item in the dropdown to be invisible.
Change-Id: I23689a90460110570c54a05587e5084607268035
Making it the same as in language-data. This is the main
variety, so there's no need to call it "Southern".
Change-Id: I8c667b55d59b9c40b00e835f5ff1da6dbdf9bdf7
This change adds ban-bali (Balinese language in Balinese script), falling back to ban (Balinese language in Latin script).
The language code ban-bali is already known to UniversalLanguageSelector and will be used on Balinese Wikisource (currently in development, hosted on Multilingual Wikisource). ban-bali will likely be used as a page language only for now, or for pieces of text within pages. It's important to make it available as a page language so that ULS can apply the correct font.
Bug: T245359
Change-Id: Icaff92904e9d1250c8c84bfcc6cfa3ebcb5230bd
Autonym in lower case: anarâškielâ
https://iso639-3.sil.org/code/smnhttps://www.ethnologue.com/language/smn
Also change the autonym of Southern Sami (sma) from upper case
(Åarjelsaemien) to lower case (åarjelsaemien) to be consistent to
langdb.yaml
Bug: T248299
Change-Id: I69f3322b1414a22dfa707792779d9d59cd7b837d
These are static methods that have to do with processing language names
and codes. I didn't include fallback behavior, because that would mean a
circular dependency with LocalisationCache.
In the new class, I renamed AS_AUTONYMS to AUTONYMS, and added a class
constant DEFINED for 'mw' to match the existing SUPPORTED and ALL. I
also renamed fetchLanguageName(s) to getLanguageName(s).
There is 100% test coverage for the code in the new class.
This was previously committed as 2e52f48c2e and reverted because it
depended on e4468a1d6b, which had to be reverted for performance
issues. There should be no changes other than rebasing.
Bug: T201405
Change-Id: Ifa346c8a92bf1eb57dc5e79458b32b7b26f1ee8a
This is done for consistency with how it is in language-data,
and with how most other names of languages of Indonesia are written.
The word "Basa" means "language" and is usually not necessary
in list of languages.
Change-Id: I1b5195a0d00c21b51ddb1a10cd3ed7b787363fa3
These are static methods that have to do with processing language names
and codes. I didn't include fallback behavior, because that would mean a
circular dependency with LocalisationCache.
In the new class, I renamed AS_AUTONYMS to AUTONYMS, and added a class
constant DEFINED for 'mw' to match the existing SUPPORTED and ALL. I
also renamed fetchLanguageName(s) to getLanguageName(s).
There is 100% test coverage for the code in the new class.
Change-Id: I245ae94bfc1f62b6af75ea57525139adf2539fe6
This change adds the Eastern Pwo language with ISO 639-3 code 'kjp' with
'bo' (Tibetian) fallback. The default script is the Burmese script.
Bug: T203908
Change-Id: Ic69c5f1398bcd96674254b69f678f21b71feb475
MediaWiki uses a number of nonstandard codes which do not validate
according to the IANA language subtag registry. Some of them have
the wrong semantics entirely: MediaWiki's `sr-ec` variant maps to
BCP 47 `sr-EC` which is "Serbian as used in Ethiopia" (!).
Extend LanguageCode::bcp47() to map our nonstandard codes to valid
BCP 47 language codes. Export the mapping so that it can be used
in JavaScript's corresponding mw.language.bcp47() implementation
as well, and return the standard BCP 47 codes in the siteinfo
API.
Thanks to TheDJ (I10b4473c7e53f027812bbccf26bb47aec15fddfd) and
Fomafix (I93efc190714ba76247d30ba49fc21ae872fc3555) for previous
attempts at this!
Also removed a fixme for the name of 'Twi', dating back to 2004
(f59c3be23b) -- checking
tw.wikipedia.org it certainly appears that the autonym of 'Twi'
is correctly 'Twi'.
Tracking bugs for invalid language codes are T125073 and T145535.
Discussion of zh-XX => zh-HanX-XX mapping is at T198419.
This is a replay of an earlier merged patch,
8380f0173e, which had to be reverted
because it caused regressions in the Babel extension (T199941).
Bug: T34483
Bug: T106367
Bug: T120847
Depends-On: I27a5b8e45b34c6b57c1b612b11548001c88cd483
Change-Id: Iebbc604af21d7f2af9c1f1ab2574cb5f309bf6ed
The Armenian autonym should not have a capital
initial, as names of languages are not proper
nouns in that language.
Bug: T202611
Change-Id: I17cd8706f5fee2f39255c3407b758103e4cb5455
MediaWiki uses a number of nonstandard codes which do not validate
according to the IANA language subtag registry. Some of them have
the wrong semantics entirely: MediaWiki's `sr-ec` variant maps to
BCP 47 `sr-EC` which is "Serbian as used in Ethiopia" (!).
Extend LanguageCode::bcp47() to map our nonstandard codes to valid
BCP 47 language codes. Export the mapping so that it can be used
in JavaScript's corresponding mw.language.bcp47() implementation
as well.
Thanks to TheDJ (I10b4473c7e53f027812bbccf26bb47aec15fddfd) and
Fomafix (I93efc190714ba76247d30ba49fc21ae872fc3555) for previous
attempts at this!
Also removed a fixme for the name of 'Twi', dating back to 2004
(f59c3be23b) -- checking
tw.wikipedia.org it certainly appears that the autonym of 'Twi'
is correctly 'Twi'.
Tracking bugs for invalid language codes are T125073 and T145535.
Discussion of zh-XX => zh-HanX-XX mapping is at T198419.
Bug: T34483
Bug: T106367
Bug: T120847
Change-Id: I807dd55d49e9bd19443329231326a5b0d3e6c453
This code is useful for targeting Spanish spoken in the Latin America
and the Caribbean region. There are no plans to make this available as
an interface language, hence I am not adding a language file with a
fallback to 'es'.
Bug: T112889
Change-Id: If7f0ed7a13f1cc86985ce5ce509dcf543cc1c0ff
This change adds the Standard Moroccan Amazigh language with ISO
639-3 code 'zgh' with 'kab' (Kabyle) fallback. The default script is the
Neo-Tifinagh script.
Bug: T137491
Change-Id: Idd13f92d7ae05cd47267558c8ff4fa368b701e24
In cases where we're operating on text data (and not binary data),
use e.g. "\u{00A0}" to refer directly to the Unicode character
'NO-BREAK SPACE' instead of "\xc2\xa0" to specify the bytes C2h A0h
(which correspond to the UTF-8 encoding of that character). This
makes it easier to look up those mysterious sequences, as not all
are as recognizable as the no-break space.
This is not enforced by PHP, but I think we should write those in
uppercase and zero-padded to at least four characters, like the
Unicode standard does.
Note that not all "\xNN" escapes can be automatically replaced:
* We can't use Unicode escapes for binary data that is not UTF-8
(e.g. in code converting from legacy encodings or testing the
handling of invalid UTF-8 byte sequences).
* '\xNN' escapes in regular expressions in single-quoted strings
are actually handled by PCRE and have to be dealt with carefully
(those regexps should probably be changed to use the /u modifier).
* "\xNN" referring to ASCII characters ("\x7F" and lower) should
probably be left as-is.
The replacements in this commit were done semi-manually by piping
the existing "\xNN" escapes through the following terrible Ruby
script I devised:
chars = eval('"' + ARGV[0] + '"').force_encoding('utf-8')
puts chars.split('').map{|char|
'\\u{' + char.ord.to_s(16).upcase.rjust(4, '0') + '}'
}.join('')
Change-Id: Idc3dee3a7fb5ebfaef395754d8859b18f1f8769a