This change adds the Eastern Pwo language with ISO 639-3 code 'kjp' with
'bo' (Tibetan) fallback. The default script is the Burmese script.
Bug: T203908
Change-Id: Ic69c5f1398bcd96674254b69f678f21b71feb475
MediaWiki uses a number of nonstandard codes which do not validate
according to the IANA language subtag registry. Some of them have
the wrong semantics entirely: MediaWiki's `sr-ec` variant maps to
BCP 47 `sr-EC`, which is "Serbian as used in Ecuador" (!).
Extend LanguageCode::bcp47() to map our nonstandard codes to valid
BCP 47 language codes. Export the mapping so that it can be used
in JavaScript's corresponding mw.language.bcp47() implementation
as well, and return the standard BCP 47 codes in the siteinfo
API.
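
As a rough illustration of the idea (not the actual LanguageCode::bcp47()
code), the lookup-then-normalize approach could be sketched as follows. The
two table entries and their targets are assumptions chosen for illustration;
the real mapping lives in MediaWiki. The casing rules are the usual BCP 47
recommendations (lowercase language, Titlecase script, uppercase region).

```python
# Hypothetical, partial mapping used only for this sketch.
NONSTANDARD_TO_BCP47 = {
    "sr-ec": "sr-Cyrl",  # assumed target: Serbian, Cyrillic script
    "sr-el": "sr-Latn",  # assumed target: Serbian, Latin script
}


def bcp47(code):
    """Return a BCP 47-style tag for a MediaWiki-style language code."""
    key = code.lower()
    # Nonstandard codes are looked up before any case normalization,
    # otherwise 'sr-ec' would turn into the region tag 'sr-EC'.
    if key in NONSTANDARD_TO_BCP47:
        return NONSTANDARD_TO_BCP47[key]

    subtags = key.split("-")
    result = [subtags[0]]          # primary language subtag stays lowercase
    after_singleton = False
    for subtag in subtags[1:]:
        if after_singleton:
            result.append(subtag)  # e.g. private-use subtags after 'x'
        elif len(subtag) == 1:
            after_singleton = True
            result.append(subtag)
        elif len(subtag) == 2:
            result.append(subtag.upper())       # region, e.g. 'BR'
        elif len(subtag) == 4 and subtag.isalpha():
            result.append(subtag.capitalize())  # script, e.g. 'Hans'
        else:
            result.append(subtag)
    return "-".join(result)
```

For example, bcp47("sr-ec") yields "sr-Cyrl" in this sketch, while a code
without a table entry such as "pt-br" is only case-normalized to "pt-BR".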
Thanks to TheDJ (I10b4473c7e53f027812bbccf26bb47aec15fddfd) and
Fomafix (I93efc190714ba76247d30ba49fc21ae872fc3555) for previous
attempts at this!
Also removed a fixme for the name of 'Twi', dating back to 2004
(f59c3be23b) -- checking tw.wikipedia.org, it certainly appears that
the autonym of 'Twi' is correctly 'Twi'.
Tracking bugs for invalid language codes are T125073 and T145535.
Discussion of zh-XX => zh-HanX-XX mapping is at T198419.
This is a replay of an earlier merged patch, 8380f0173e, which had to
be reverted because it caused regressions in the Babel extension
(T199941).
Bug: T34483
Bug: T106367
Bug: T120847
Depends-On: I27a5b8e45b34c6b57c1b612b11548001c88cd483
Change-Id: Iebbc604af21d7f2af9c1f1ab2574cb5f309bf6ed
The Armenian autonym should not have a capital
initial, as names of languages are not proper
nouns in that language.
Bug: T202611
Change-Id: I17cd8706f5fee2f39255c3407b758103e4cb5455
MediaWiki uses a number of nonstandard codes which do not validate
according to the IANA language subtag registry. Some of them have
the wrong semantics entirely: MediaWiki's `sr-ec` variant maps to
BCP 47 `sr-EC`, which is "Serbian as used in Ecuador" (!).
Extend LanguageCode::bcp47() to map our nonstandard codes to valid
BCP 47 language codes. Export the mapping so that it can be used
in JavaScript's corresponding mw.language.bcp47() implementation
as well.
Thanks to TheDJ (I10b4473c7e53f027812bbccf26bb47aec15fddfd) and
Fomafix (I93efc190714ba76247d30ba49fc21ae872fc3555) for previous
attempts at this!
Also removed a fixme for the name of 'Twi', dating back to 2004
(f59c3be23b) -- checking tw.wikipedia.org, it certainly appears that
the autonym of 'Twi' is correctly 'Twi'.
Tracking bugs for invalid language codes are T125073 and T145535.
Discussion of zh-XX => zh-HanX-XX mapping is at T198419.
Bug: T34483
Bug: T106367
Bug: T120847
Change-Id: I807dd55d49e9bd19443329231326a5b0d3e6c453
This code is useful for targeting Spanish spoken in Latin America and
the Caribbean region. There are no plans to make this available as
an interface language, hence I am not adding a language file with a
fallback to 'es'.
Bug: T112889
Change-Id: If7f0ed7a13f1cc86985ce5ce509dcf543cc1c0ff
This change adds the Standard Moroccan Amazigh language with ISO
639-3 code 'zgh' with 'kab' (Kabyle) fallback. The default script is the
Neo-Tifinagh script.
Bug: T137491
Change-Id: Idd13f92d7ae05cd47267558c8ff4fa368b701e24
In cases where we're operating on text data (and not binary data),
use e.g. "\u{00A0}" to refer directly to the Unicode character
'NO-BREAK SPACE' instead of "\xc2\xa0" to specify the bytes C2h A0h
(which correspond to the UTF-8 encoding of that character). This
makes it easier to look up those mysterious sequences, as not all
are as recognizable as the no-break space.
This is not enforced by PHP, but I think we should write those in
uppercase and zero-padded to at least four characters, like the
Unicode standard does.
Note that not all "\xNN" escapes can be automatically replaced:
* We can't use Unicode escapes for binary data that is not UTF-8
  (e.g. in code converting from legacy encodings or testing the
  handling of invalid UTF-8 byte sequences).
* '\xNN' escapes in regular expressions in single-quoted strings
  are actually handled by PCRE and have to be dealt with carefully
  (those regexps should probably be changed to use the /u modifier).
* "\xNN" referring to ASCII characters ("\x7F" and lower) should
  probably be left as-is.
The replacements in this commit were done semi-manually by piping
the existing "\xNN" escapes through the following terrible Ruby
script I devised:
    chars = eval('"' + ARGV[0] + '"').force_encoding('utf-8')
    puts chars.split('').map{|char|
      '\\u{' + char.ord.to_s(16).upcase.rjust(4, '0') + '}'
    }.join('')
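
For readers without Ruby at hand, the same conversion can be sketched in
Python. This is only an illustration of the transformation, not the script
that was used for this commit; the function name is made up, and unlike the
Ruby version it takes the raw bytes rather than the escaped source text.

```python
def bytes_to_unicode_escapes(raw):
    """Turn UTF-8 bytes into PHP-style "\\u{XXXX}" escapes.

    Code points are written in uppercase hex, zero-padded to at least
    four digits, as suggested above.
    """
    # Raises UnicodeDecodeError for byte sequences that are not valid
    # UTF-8; those are exactly the cases that must stay as "\xNN".
    text = raw.decode("utf-8")
    return "".join("\\u{%04X}" % ord(char) for char in text)
```

Here bytes_to_unicode_escapes(b"\xc2\xa0") gives the text \u{00A0}.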
Change-Id: Idc3dee3a7fb5ebfaef395754d8859b18f1f8769a
The LEFT-TO-RIGHT MARK (U+200E) after the RTL autonym of the language
'lki' was inserted in 04fcd20c.
The LRM causes parentheses to render incorrectly in mixed bidi
sequences in Google Chrome:
<span dir="rtl">({{#language:lki}}) Foo</span>
Change-Id: I9db84938e2b2142a3cb61955dfcbda790e6bbc5f
Currently some language autonyms with parentheses have misaligned
parentheses in RTL environments in some browsers. To reproduce, open
index.php?title=Special:Preferences&uselang=en-rtl
Google Chrome is affected. Mozilla Firefox is not affected.
This change fixes the problem in the same way as for the other
autonyms with parentheses.
Change-Id: Ie01116821b067017434681ea995e97ada8ff0a6d
See T190324 for more information. 조선말 is a more common name than
the current "한국어 (조선)".
Bug: T190324
Change-Id: Ie94e60887afe05a92d240ad91faaa9aa7b9b6ea5
Signed-off-by: Yongmin Hong <revi@pobox.com>
This version has been in MediaWiki longer than my email history,
since 2005 at least. This spelling is not present in
https://en.wikipedia.org/wiki/Luxembourgish
Change-Id: Ibda7e6428a2c79b9f7d88892ef1c16e9921ae934
Previously, even if $wgUsePigLatinVariant was false, the language
would show up on Special:Preferences (and some other places) as
'en-x-piglatin - Igpay Atinlay'.
Follow-up to d8375bee24.
Change-Id: I08faacabca87c04299c7b535be8df1770e0a37ac
Guarded by the $wgUsePigLatinVariant variable, off by default.
Pig Latin is a language game where words in English are altered
according to the following rules:
* Words starting with a vowel have a '-way' suffix appended.
* Words starting with a consonant have the initial consonants (or 'qu'
  group) moved to the end and an '-ay' suffix appended.
https://en.wikipedia.org/wiki/Pig_Latin
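
The two rules above can be sketched in a few lines of Python. This is a
simplified illustration, not the EnConverter implementation: the helper
names are hypothetical, only a, e, i, o, u count as vowels, and carrying an
initial capital over to the result is an assumption based on the
'Igpay Atinlay' language name.

```python
import re

VOWELS = "aeiou"
WORD = re.compile(r"[A-Za-z]+")


def pig_latin_word(word):
    """Convert one non-empty alphabetic word (simplified rules)."""
    lower = word.lower()
    if lower[0] in VOWELS:
        result = lower + "way"
    else:
        # Find the end of the leading consonant cluster, treating 'qu'
        # as a single unit that moves together.
        i = 0
        while i < len(lower) and lower[i] not in VOWELS:
            i += 2 if lower.startswith("qu", i) else 1
        result = lower[i:] + lower[:i] + "ay"
    # Inner capitals are lost; only an initial capital is carried over.
    return result.capitalize() if word[0].isupper() else result


def to_pig_latin(text):
    """Convert every alphabetic word in text, leaving the rest as-is."""
    return WORD.sub(lambda match: pig_latin_word(match.group(0)), text)
```

With this sketch, to_pig_latin("Pig Latin") returns "Igpay Atinlay".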
* Added 'en-x-piglatin' as a language name.
* Added 'en' to LanguageConverter::$languagesWithVariants.
* Added LanguageEn class and its corresponding EnConverter which
  provides one-way translation from English to Pig Latin.
* Some minor internal changes in code that assumed that English
  doesn't have a language class or converter.
Bug: T45547
Depends-On: I1d9691c784032669979f8109c9a5f65cbf4122c9
Change-Id: I7fa2d85d6364958c5138366e8b4504a2697a8731
* 'no' is the language code for the macro language Norwegian with the
  autonym 'norsk'.
* 'nb' is the language code for the language Norwegian Bokmål with the
  autonym 'norsk bokmål'.
* 'nn' is the language code for the language Norwegian Nynorsk with the
  autonym 'norsk nynorsk'.
'no' falls back to 'nb'.
Change-Id: Ieff4ff4ecdce20ce65a818612af90815121d70d3
It's unreasonable to expect newbies to know that "bug 12345" means "Task T14345"
except where it doesn't, so let's just standardise on the real numbers.
Change-Id: Id2f9d229d17b8eee66b2ca4e3927f3f66ac62988
The letter "k" was for some reason written in the Cyrillic alphabet.
Everywhere else in the Wp/olo incubator it is written with the Latin
letter "k", so this must be a mistake.
Change-Id: I51eb44b4cdb6014aafb7e6b4e5a725434b86e877