Add files for Norwegian Bokmål (nb) and Norwegian Nynorsk (nn) to enable
{{GRAMMAR:genitive}} for those two languages. Tested in Vagrant.
Change-Id: Ib8d84fa24daffde023203df4e8fa1148b52ea975
These are static methods that have to do with processing language names
and codes. I didn't include fallback behavior, because that would mean a
circular dependency with LocalisationCache.
In the new class, I renamed AS_AUTONYMS to AUTONYMS, and added a class
constant DEFINED for 'mw' to match the existing SUPPORTED and ALL. I
also renamed fetchLanguageName(s) to getLanguageName(s).
There is 100% test coverage for the code in the new class.
This was previously committed as 2e52f48c2e and reverted because it
depended on e4468a1d6b, which had to be reverted for performance
issues. There should be no changes other than rebasing.
Bug: T201405
Change-Id: Ifa346c8a92bf1eb57dc5e79458b32b7b26f1ee8a
This is done for consistency with how it is in language-data,
and with how most other names of languages of Indonesia are written.
The word "Basa" means "language" and is usually not necessary
in a list of languages.
Change-Id: I1b5195a0d00c21b51ddb1a10cd3ed7b787363fa3
These are static methods that have to do with processing language names
and codes. I didn't include fallback behavior, because that would mean a
circular dependency with LocalisationCache.
In the new class, I renamed AS_AUTONYMS to AUTONYMS, and added a class
constant DEFINED for 'mw' to match the existing SUPPORTED and ALL. I
also renamed fetchLanguageName(s) to getLanguageName(s).
There is 100% test coverage for the code in the new class.
Change-Id: I245ae94bfc1f62b6af75ea57525139adf2539fe6
This change adds the Eastern Pwo language with ISO 639-3 code 'kjp' with
'bo' (Tibetan) fallback. The default script is the Burmese script.
Bug: T203908
Change-Id: Ic69c5f1398bcd96674254b69f678f21b71feb475
MediaWiki uses a number of nonstandard codes which do not validate
according to the IANA language subtag registry. Some of them have
the wrong semantics entirely: MediaWiki's `sr-ec` variant maps to
BCP 47 `sr-EC` which is "Serbian as used in Ecuador" (!).
Extend LanguageCode::bcp47() to map our nonstandard codes to valid
BCP 47 language codes. Export the mapping so that it can be used
in JavaScript's corresponding mw.language.bcp47() implementation
as well, and return the standard BCP 47 codes in the siteinfo
API.
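
A rough illustration of the mapping idea, as a Python sketch rather than
the PHP implementation. The two-entry table is a hypothetical subset, and
normalize_case() and bcp47() are illustrative names. It shows why case
normalization alone yields `sr-EC`, and how a lookup applied first avoids
that.

```python
# Hypothetical subset of a nonstandard-code mapping; the entries are
# assumptions for illustration, not the exported MediaWiki table.
NONSTANDARD_TO_BCP47 = {
    "sr-ec": "sr-Cyrl",
    "sr-el": "sr-Latn",
}


def normalize_case(code: str) -> str:
    """Apply BCP 47 case conventions only: lowercase language, Titlecase
    4-letter script, UPPERCASE 2-letter region, private use left lowercase."""
    parts = code.lower().split("-")
    out = []
    private = False
    for i, seg in enumerate(parts):
        if i == 0 or private:
            out.append(seg)
        elif seg == "x":
            # Everything after the "x" singleton is private use.
            private = True
            out.append(seg)
        elif len(seg) == 2:
            out.append(seg.upper())
        elif len(seg) == 4:
            out.append(seg.capitalize())
        else:
            out.append(seg)
    return "-".join(out)


def bcp47(code: str) -> str:
    """Map known nonstandard codes first, then fall back to case rules."""
    mapped = NONSTANDARD_TO_BCP47.get(code.lower())
    return mapped if mapped is not None else normalize_case(code)


print(normalize_case("sr-ec"))  # case rules alone
print(bcp47("sr-ec"))           # mapping applied first
```

Doing the lookup before case normalization matters: once `sr-ec` has been
turned into `sr-EC` it is indistinguishable from a real region subtag.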
Thanks to TheDJ (I10b4473c7e53f027812bbccf26bb47aec15fddfd) and
Fomafix (I93efc190714ba76247d30ba49fc21ae872fc3555) for previous
attempts at this!
Also removed a fixme for the name of 'Twi', dating back to 2004
(f59c3be23b) -- checking tw.wikipedia.org it certainly appears that
the autonym of 'Twi' is correctly 'Twi'.
Tracking bugs for invalid language codes are T125073 and T145535.
Discussion of zh-XX => zh-HanX-XX mapping is at T198419.
This is a replay of an earlier merged patch, 8380f0173e, which had to be
reverted because it caused regressions in the Babel extension (T199941).
Bug: T34483
Bug: T106367
Bug: T120847
Depends-On: I27a5b8e45b34c6b57c1b612b11548001c88cd483
Change-Id: Iebbc604af21d7f2af9c1f1ab2574cb5f309bf6ed
Added spaces around the . operator
Removed empty return statements, which are not required
Removed the return after PHPUnit's markTestIncomplete(), which throws
to exit the test, so no return is needed
Change-Id: I2c80b965ee52ba09949e70ea9e7adfc58a1d89ce
The Armenian autonym should not have a capital
initial, as names of languages are not proper
nouns in that language.
Bug: T202611
Change-Id: I17cd8706f5fee2f39255c3407b758103e4cb5455
MediaWiki uses a number of nonstandard codes which do not validate
according to the IANA language subtag registry. Some of them have
the wrong semantics entirely: MediaWiki's `sr-ec` variant maps to
BCP 47 `sr-EC` which is "Serbian as used in Ecuador" (!).
Extend LanguageCode::bcp47() to map our nonstandard codes to valid
BCP 47 language codes. Export the mapping so that it can be used
in JavaScript's corresponding mw.language.bcp47() implementation
as well.
Thanks to TheDJ (I10b4473c7e53f027812bbccf26bb47aec15fddfd) and
Fomafix (I93efc190714ba76247d30ba49fc21ae872fc3555) for previous
attempts at this!
Also removed a fixme for the name of 'Twi', dating back to 2004
(f59c3be23b) -- checking tw.wikipedia.org it certainly appears that
the autonym of 'Twi' is correctly 'Twi'.
Tracking bugs for invalid language codes are T125073 and T145535.
Discussion of zh-XX => zh-HanX-XX mapping is at T198419.
Bug: T34483
Bug: T106367
Bug: T120847
Change-Id: I807dd55d49e9bd19443329231326a5b0d3e6c453
This code is useful for targeting Spanish as spoken in Latin America
and the Caribbean. There are no plans to make this available as
an interface language, hence I am not adding a language file with a
fallback to 'es'.
Bug: T112889
Change-Id: If7f0ed7a13f1cc86985ce5ce509dcf543cc1c0ff
This change adds the Standard Moroccan Amazigh language with ISO
639-3 code 'zgh' with 'kab' (Kabyle) fallback. The default script is the
Neo-Tifinagh script.
Bug: T137491
Change-Id: Idd13f92d7ae05cd47267558c8ff4fa368b701e24
In cases where we're operating on text data (and not binary data),
use e.g. "\u{00A0}" to refer directly to the Unicode character
'NO-BREAK SPACE' instead of "\xc2\xa0" to specify the bytes C2h A0h
(which correspond to the UTF-8 encoding of that character). This
makes it easier to look up those mysterious sequences, as not all
are as recognizable as the no-break space.
This is not enforced by PHP, but I think we should write those in
uppercase and zero-padded to at least four characters, like the
Unicode standard does.
Note that not all "\xNN" escapes can be automatically replaced:
* We can't use Unicode escapes for binary data that is not UTF-8
(e.g. in code converting from legacy encodings or testing the
handling of invalid UTF-8 byte sequences).
* '\xNN' escapes in regular expressions in single-quoted strings
are actually handled by PCRE and have to be dealt with carefully
(those regexps should probably be changed to use the /u modifier).
* "\xNN" referring to ASCII characters ("\x7F" and lower) should
probably be left as-is.
The replacements in this commit were done semi-manually by piping
the existing "\xNN" escapes through the following terrible Ruby
script I devised:
    chars = eval('"' + ARGV[0] + '"').force_encoding('utf-8')
    puts chars.split('').map{|char|
      '\\u{' + char.ord.to_s(16).upcase.rjust(4, '0') + '}'
    }.join('')
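
For readers without Ruby, roughly the same conversion can be sketched in
Python. The function name is hypothetical, and unlike the script above it
takes the raw UTF-8 bytes directly instead of evaluating an escaped
string.

```python
def to_php_unicode_escapes(raw: bytes) -> str:
    """Decode UTF-8 bytes and emit one PHP-style \\u{XXXX} escape per code
    point, uppercase and zero-padded to at least four hex digits."""
    text = raw.decode("utf-8")  # raises UnicodeDecodeError on non-UTF-8 input
    return "".join("\\u{%04X}" % ord(ch) for ch in text)


# The bytes C2h A0h are the UTF-8 encoding of NO-BREAK SPACE.
print(to_php_unicode_escapes(b"\xc2\xa0"))
```

Because the decode step rejects invalid UTF-8, binary data of the kind
listed in the caveats above fails loudly instead of being converted.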
Change-Id: Idc3dee3a7fb5ebfaef395754d8859b18f1f8769a
* Fix etsin/етсин/этсин as noted in If933fc67845ac994d9ddfdf8349aff445ec9b13a
** only convert tsin to тсин and let the other rules sort out the e
* Refactor most tests to be word-specific, which uncovered a couple of
bugs in corner cases
** rol/üst prefix matches should match whole words (the original [^ü]
regex assumed the word could not be at the end of the string)
* Fixed incidental bugs I noticed while looking into the items above
** куркчи => kürkçi was in the wrong section
** cönk => джонк was in the right section, but reversed
* Added additional test cases for all of the above.
Change-Id: Ia96be488a7b41c3ddba623b5c9262703b1c82687
* refactor '\b' into WB const to make it easy to update in the future
* add new ц-related exceptions
Bug: T193764
Change-Id: Ib707136f8f2598d1f8ec995bf129b436dfb53cd9
The LEFT-TO-RIGHT MARK (U+200E) after the RTL autonym of the language
'lki' was inserted in 04fcd20c.
The LRM causes misplaced parentheses in mixed bidi sequences in Google
Chrome:
<span dir="rtl">({{#language:lki}}) Foo</span>
Change-Id: I9db84938e2b2142a3cb61955dfcbda790e6bbc5f
* Move a many-to-one mapping from the L2C to the C2L table where it
belongs.
* Fix some regular expression patterns which ended up with misnumbered
replacement strings.
* All regular expressions should have the `u` (unicode) flag set.
* Typo/spelling fixes in comments
Change-Id: If933fc67845ac994d9ddfdf8349aff445ec9b13a
Currently some language autonyms with parentheses have misaligned
parentheses in RTL environments in some browsers. To reproduce, open
index.php?title=Special:Preferences&uselang=en-rtl
Google Chrome is affected. Mozilla Firefox is not affected.
This change fixes the problem in the same way as for the other autonyms
with parentheses.
Change-Id: Ie01116821b067017434681ea995e97ada8ff0a6d
Refactor to match exceptions as patterns, not words
- break exception list to C2L and L2C pattern sets
- change main loop to break only on Roman numerals and transliterate
everything else, rather than tokenizing on single-script words
(this fixes the km² problem, too)
- update word anchors from ^ and $ to \b
- only process Roman numerals for L2C translit
- add exception for single "Roman" character followed by a period
which looks like an initial
- consolidate multi-step transliteration into regsConverter()
- remove regex support from main exception list to support strtr()
- re-organize some prefix/suffix/whole word patterns to the right place
- add tests for recently fixed use cases
- add support for many-to-one mappings in both directions
- update character classes, exception lists, and regexes based on
speaker feedback and example texts
Misc other fixes:
- fix some character class errors
- remove unneeded character classes
- add tests for Roman numerals and quotes
- add tests for affixes and regexes
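
The "break only on Roman numerals" loop can be pictured with the Python
sketch below. It is not the converter code: the regex, the four-letter
table, and the names ROMAN and l2c() are assumptions made up for
illustration, and the real tables and exception patterns are much larger.

```python
import re

# Assumed pattern: a whole word made of Roman numeral letters, except a
# single such letter followed by a period, which is treated as an initial.
ROMAN = re.compile(r"(\b(?:[IVXLCDM]{2,}|[IVXLCDM](?!\.))\b)")

# Toy Latin-to-Cyrillic table (hypothetical subset, lowercase only).
L2C_TABLE = str.maketrans({
    "a": "\u0430",       # CYRILLIC SMALL LETTER A
    "s": "\u0441",       # CYRILLIC SMALL LETTER ES
    "\u0131": "\u044b",  # LATIN SMALL LETTER DOTLESS I -> CYRILLIC YERU
    "r": "\u0440",       # CYRILLIC SMALL LETTER ER
})


def l2c(text: str) -> str:
    """Split on Roman numerals, transliterate every other piece."""
    pieces = ROMAN.split(text)
    # With one capturing group, odd indices hold the matched numerals.
    return "".join(
        piece if i % 2 else piece.translate(L2C_TABLE)
        for i, piece in enumerate(pieces)
    )


print(l2c("XIX as\u0131r"))
```

Splitting only on numerals means mixed tokens such as km² never become
separate single-script words, so they pass through the tables as ordinary
text.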
Bug: T188321
Bug: T189512
Change-Id: I056d36ff2b8f63b3998a5d3a442d8d539c15488d
See T190324 for more information. 조선말 is a more common name than
the current "한국어 (조선)".
Bug: T190324
Change-Id: Ie94e60887afe05a92d240ad91faaa9aa7b9b6ea5
Signed-off-by: Yongmin Hong <revi@pobox.com>
This version has been in MediaWiki longer than my email history,
since 2005 at least. This spelling is not present in
https://en.wikipedia.org/wiki/Luxembourgish
Change-Id: Ibda7e6428a2c79b9f7d88892ef1c16e9921ae934
This is a first pass at Latin/Cyrillic transliteration for Crimean
Tatar (crh).
Includes transliteration tables, prefix/suffix mappings, regex
mappings, and exceptions lists for words and abbreviations.
Regularize CRH language name in messages/* files.
Fix "varient" typos in qqq.json.
Add unit tests for CRH transliteration.
Bug: T23582
Change-Id: I424703f99adf837f6217872b882d1ea26bfdd068
Previously, even if $wgUsePigLatinVariant was false, the language
would show up on Special:Preferences (and some other places) as
'en-x-piglatin - Igpay Atinlay'.
Follow-up to d8375bee24.
Change-Id: I08faacabca87c04299c7b535be8df1770e0a37ac
Guarded by the $wgUsePigLatinVariant variable, off by default.
Pig Latin is a language game where words in English are altered
according to the following rules:
* Words starting with a vowel have a '-way' suffix appended.
* Words starting with a consonant have the initial consonants (or 'qu'
group) moved to the end and an '-ay' suffix appended.
https://en.wikipedia.org/wiki/Pig_Latin
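
The two rules above can be sketched in a few lines of Python. This is only
an illustration of the rules, not the EnConverter code: the function
names are hypothetical, 'y' is treated as a consonant, and restoring an
initial capital is an assumption about the desired output.

```python
import re

# Leading consonants, with "qu" kept together as one group.
_HEAD = re.compile(r"((?:qu|[^aeiou])*)(.*)")


def pig_latin_word(word: str) -> str:
    """Apply the two Pig Latin rules to a single alphabetic word."""
    lower = word.lower()
    head, rest = _HEAD.match(lower).groups()
    # Vowel-initial words get "way"; otherwise move the head and add "ay".
    result = lower + "way" if not head else rest + head + "ay"
    # Assumption: carry an initial capital over to the converted word.
    return result.capitalize() if word[0].isupper() else result


def pig_latin(text: str) -> str:
    """Convert every run of ASCII letters, leaving other text untouched."""
    return re.sub(r"[A-Za-z]+", lambda m: pig_latin_word(m.group(0)), text)


print(pig_latin("Pig Latin"))
```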
* Added 'en-x-piglatin' as a language name.
* Added 'en' to LanguageConverter::$languagesWithVariants.
* Added LanguageEn class and its corresponding EnConverter which
provides one-way translation from English to Pig Latin.
* Some minor internal changes in code that assumed that English
doesn't have a language class or converter.
Bug: T45547
Depends-On: I1d9691c784032669979f8109c9a5f65cbf4122c9
Change-Id: I7fa2d85d6364958c5138366e8b4504a2697a8731
* 'no' is the language code for the macro language Norwegian with the
autonym 'norsk'.
* 'nb' is the language code for the language Norwegian Bokmål with the
autonym 'norsk bokmål'.
* 'nn' is the language code for the language Norwegian Nynorsk with the
autonym 'norsk nynorsk'.
'no' falls back to 'nb'.
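
As an illustration of what the fallback means for lookups, here is a
minimal Python sketch. fallback_chain() is a hypothetical helper, the
table holds only the entry stated above, and ending every chain at 'en'
is an assumption of the sketch.

```python
# Only the fallback stated in this commit message.
FALLBACKS = {"no": "nb"}


def fallback_chain(code, fallbacks=FALLBACKS, final="en"):
    """Return the codes to try in order, guarding against cycles."""
    chain = []
    current = code
    while current is not None and current not in chain:
        chain.append(current)
        current = fallbacks.get(current)
    # Assumption: every chain ends at a final catch-all language.
    if final not in chain:
        chain.append(final)
    return chain


print(fallback_chain("no"))
```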
Change-Id: Ieff4ff4ecdce20ce65a818612af90815121d70d3
It's unreasonable to expect newbies to know that "bug 12345" means "Task T14345"
except where it doesn't, so let's just standardise on the real numbers.
Change-Id: Id2f9d229d17b8eee66b2ca4e3927f3f66ac62988