These were introduced in MW 1.17 and are always true in production.
They were useful to allow folks to defer title conversion, but it's
been a long time now. We don't need to make this optional any more.
Change-Id: I65dcfe80dc3e1dfeb4d63924a8928655e012a20c
The browser Accept-Language header uses BCP 47 codes, which don't
precisely match our internal mediawiki variant names in a number of
places. Allow proper BCP 47 codes to alias our internal variants
for: Accept-Language parsing, URL parsing, user preferences, and
explicit enumeration of codes in LanguageConverter rules.
This is a replay of an earlier merged patch,
0818070c59, which had to be reverted
because it was based on 8380f0173e which
caused regressions in the Babel extension (T199941).
Change-Id: Ica89d9547c58967747ab0fa15d4e83be5378796d
MediaWiki uses a number of nonstandard codes which do not validate
according to the IANA language subtag registry. Some of them have
the wrong semantics entirely: MediaWiki's `sr-ec` variant maps to
BCP 47 `sr-EC` which is "Serbian as used in Ethiopia" (!).
Extend LanguageCode::bcp47() to map our nonstandard codes to valid
BCP 47 language codes. Export the mapping so that it can be used
in JavaScript's corresponding mw.language.bcp47() implementation
as well, and return the standard BCP 47 codes in the siteinfo
API.
Thanks to TheDJ (I10b4473c7e53f027812bbccf26bb47aec15fddfd) and
Fomafix (I93efc190714ba76247d30ba49fc21ae872fc3555) for previous
attempts at this!
Also removed a fixme for the name of 'Twi', dating back to 2004
(f59c3be23b) -- checking
tw.wikipedia.org it certainly appears that the autonym of 'Twi'
is correctly 'Twi'.
Tracking bugs for invalid language codes are T125073 and T145535.
Discussion of zh-XX => zh-HanX-XX mapping is at T198419.
This is a replay of an earlier merged patch,
8380f0173e, which had to be reverted
because it caused regressions in the Babel extension (T199941).
Bug: T34483
Bug: T106367
Bug: T120847
Depends-On: I27a5b8e45b34c6b57c1b612b11548001c88cd483
Change-Id: Iebbc604af21d7f2af9c1f1ab2574cb5f309bf6ed
Some tests need to change the value of an ini setting, and typically implement
cleanup handling themselves, usually imperfectly.
Provide a helper function, $this->setIniSetting(), which will take care of
teardown in the same way that $this->setMwGlobals() does.
Change-Id: I7be4198592f0aaf73a28d3c60acb307a918b1a1f
setCode changes the language code for the Language object but it also
replaces the whole language codes for all Language objects.
> $lang = Language::factory( 'fr' )
> $lang2 = Language::factory( 'fr' )
> $lang->setCode( 'it' )
> print $lang2->getCode()
it
> $lang3 = Language::factory( 'fr' )
> print $lang3->getCode()
it
Better assign a new Language object.
Also add more tests for Language::equals.
Depends-On: I61439bac82021344c3f9a6056cccd937b3450af2
Depends-On: I2d9e551d6eb33f28f42aeaf48160eba21b83881f
Change-Id: I201b479f58e63c9c40fb8a3ec9575a551fb35235
* Exclude the data files from PHPUnit coverage.
* Add tests covering the normalize() implementations.
* Fix a small todo about using data providers.
* Set explicit visibility.
Change-Id: Ib104cc3215a36901cff853ad5969d92a6e0cf6a0
This avoids error-prone code written separately in every test. In
addition to no existing tests resetting the TitleFormatter (more
services probably need to be reset as well), they mostly reset only the
namespace cache on $wgContLang, which wouldn't help for any other
language.
The parser test runner still doesn't do this, but maybe it should.
Change-Id: I44b7a1aec48f14b0950907fa14bd0df80f674296
This changes behavior in some tests by making them set $wgLanguageCode
as well as $wgContLang, but that seems like a good thing.
Bug: T200246
Change-Id: I936888f46ff9fefe2707efba837e2ce3a7ca5e3f
The browser Accept-Language header uses BCP 47 codes, which don't
precisely match our internal mediawiki variant names in a number of
places. Allow proper BCP 47 codes to alias our internal variants
for: Accept-Language parsing, URL parsing, user preferences, and
explicit enumeration of codes in LanguageConverter rules.
Change-Id: I8468a56d5b88f5786abd0a17b67bda2f1687fd0c
MediaWiki uses a number of nonstandard codes which do not validate
according to the IANA language subtag registry. Some of them have
the wrong semantics entirely: MediaWiki's `sr-ec` variant maps to
BCP 47 `sr-EC` which is "Serbian as used in Ethiopia" (!).
Extend LanguageCode::bcp47() to map our nonstandard codes to valid
BCP 47 language codes. Export the mapping so that it can be used
in JavaScript's corresponding mw.language.bcp47() implementation
as well.
Thanks to TheDJ (I10b4473c7e53f027812bbccf26bb47aec15fddfd) and
Fomafix (I93efc190714ba76247d30ba49fc21ae872fc3555) for previous
attempts at this!
Also removed a fixme for the name of 'Twi', dating back to 2004
(f59c3be23b) -- checking
tw.wikipedia.org it certainly appears that the autonym of 'Twi'
is correctly 'Twi'.
Tracking bugs for invalid language codes are T125073 and T145535.
Discussion of zh-XX => zh-HanX-XX mapping is at T198419.
Bug: T34483
Bug: T106367
Bug: T120847
Change-Id: I807dd55d49e9bd19443329231326a5b0d3e6c453
In cases where we're operating on text data (and not binary data),
use e.g. "\u{00A0}" to refer directly to the Unicode character
'NO-BREAK SPACE' instead of "\xc2\xa0" to specify the bytes C2h A0h
(which correspond to the UTF-8 encoding of that character). This
makes it easier to look up those mysterious sequences, as not all
are as recognizable as the no-break space.
This is not enforced by PHP, but I think we should write those in
uppercase and zero-padded to at least four characters, like the
Unicode standard does.
Note that not all "\xNN" escapes can be automatically replaced:
* We can't use Unicode escapes for binary data that is not UTF-8
(e.g. in code converting from legacy encodings or testing the
handling of invalid UTF-8 byte sequences).
* '\xNN' escapes in regular expressions in single-quoted strings
are actually handled by PCRE and have to be dealt with carefully
(those regexps should probably be changed to use the /u modifier).
* "\xNN" referring to ASCII characters ("\x7F" and lower) should
probably be left as-is.
The replacements in this commit were done semi-manually by piping
the existing "\xNN" escapes through the following terrible Ruby
script I devised:
chars = eval('"' + ARGV[0] + '"').force_encoding('utf-8')
puts chars.split('').map{|char|
'\\u{' + char.ord.to_s(16).upcase.rjust(4, '0') + '}'
}.join('')
Change-Id: Idc3dee3a7fb5ebfaef395754d8859b18f1f8769a
* Fix etsin/етсин/этсин as noted in If933fc67845ac994d9ddfdf8349aff445ec9b13a
** only convert tsin to тсин and let the other rules sort out the e
* Refactor most tests to be word-specific, which uncovered a couple of
bugs in corner cases
** rol/üst prefix matches should match whole words (original [^ü] regex
assumed word could not be end of string
* Fixed incidental bugs I noticed while looking into the items above
** куркчи => kürkçi was in the wrong section
** cönk => джонк was in the right section, but reversed
* Added additional tests cases for all of the above.
Change-Id: Ia96be488a7b41c3ddba623b5c9262703b1c82687
* refactor '\b' into WB const to make it easy to update in the future
* add new ц-related exceptions
Bug: T193764
Change-Id: Ib707136f8f2598d1f8ec995bf129b436dfb53cd9
* Move a many-to-one mapping from the L2C to the C2L table where it
belongs.
* Fix some regular expression patterns which ended up with misnumbered
replacement strings.
* All regular expressions should have the `u` (unicode) flag set.
* Typo/spelling fixes in comments
Change-Id: If933fc67845ac994d9ddfdf8349aff445ec9b13a
Added an if-else block to see if the parameters passed to the function
designate a year between 1912 and 1941 or not. Resulting month values
are also adjusted.
Added a unit test for the related formatting.
Bug: T68648
Change-Id: Ic676b5c140de8878971a786a1a1811770a848016
Refactor to match exceptions as patterns, not words
- break exception list to C2L and L2C pattern sets
- change main loop to break only on Roman numerals and transliterate
everything else, rather than tokenizing on single-script words
(this fixes the km² problem, too)
- update word anchors from ^ and $ to \b
- only process Roman numerals for L2C translit
- add exception for single "Roman" character followed by a period
which looks like an initial
- consolidate multi-step transliteration into regsConverter()
- remove regex support from main exception list to support strtr()
- re-organize some prefix/suffix/whole word patterns to the right place
- add tests for recently fixed use cases
- add support for many-to-one mappings in both directions
- update character classes, exception lists, and regexes based on
speaker feedback and example texts
Misc other fixes:
- fix some character classes errors
- remove unneeded character classes
- add tests for Roman numerals and quotes
- add tests for affixes and regexes
Bug: T188321
Bug: T189512
Change-Id: I056d36ff2b8f63b3998a5d3a442d8d539c15488d
In production, the regex and exception tables were not being loaded,
resulting in very poor transliteration. The loading has been moved to
the contructor, similar to the implementation of the Kazakh
transliteration.
Also, a bug in the mappings for Ö/ö -> Ё/ё and Ü/ü -> Ю/ю has been
fixed.
Test cases for specific additional examples have been added. (Though
it is worth noting that the regex and exception tables did load
properly during unit testing, so the problem wasn't caught there.)
Bug: T186727
Change-Id: I6bacee7d9de6f4a870a8a9ef1f04b819ad489c02
Language::fetchLanguageNames( 'mwfile' ) means all languages with the
default filter 'mw' and names in the language 'mwfile'.
Language::fetchLanguageNames( null, 'mwfile' ) means language all
languages with the filter 'mwfile' and names in the default language.
This change removes the test for the language codes:
* aa
* als
* bat-smg
* be-x-old
* cho
* fiu-vro
* ho
* hz
* kj
* kr
* mh
* mus
* ng
* no
* rn
* roa-rup
* shi-latn
* shi-tfng
* simple
* tum
* uz-cyrl
* uz-latn
* zh-classical
* zh-min-nan
* zh-yue
Change-Id: I7266a67e37862daf863d1565d84cfeebaf5cb680
Introduce truncateInternal() method in Language class, based on
existing truncate() method. New method abstracts string truncation,
allowing users to specify callable functions for text length measurement
and string truncation.
New method, truncateInternal(), is used to provide two options for
text truncation:
* For DB usage: truncateForDatabase() method is truncating text by
number of bytes.
* For UI usage: truncateForVisual() method is truncating text by number
of characters, using multibyte string PHP methods.
Old truncate() method is deprecated and just returns the results of
truncateForDatabase() method.
Newly introduced truncateForVisual() method is used for
truncation of long tag descriptions in RCFilters menu.
Bug: T179626
Change-Id: Ib01a8c303304064dde3ce983b817d93a88a5affd
Redundant given this is the project-wide license already,
especially in file headers that already include the GPL license
header.
This and other minor fixups based on feedback from Ie0cea0ef5027c7e5.
* Add @file where missing.
* Move @ingroup and @deprecated from file to class doc where needed.
Change-Id: I7067abb7abee1f0c238cb2536e16192e946d8daa
In some languages it's conventional not to insert a thousands
separator in numbers that are four digits long (1000-9999).
Rather than copy-paste the custom code to do this between 13 files,
introduce another option and have the base Language class handle it.
This also fixes an issue in several languages where this logic
previously would not work for negative or fractional numbers.
To implement this, a new option is added to MessagesXx.php files,
`$minimumGroupingDigits = 2;`, with the meaning as defined in
<http://unicode.org/reports/tr35/tr35-numbers.html>. It is a little
roundabout, but it could allow us to migrate the number formatting
(currently all custom code) to some generic library easily.
Bug: T177846
Change-Id: Iedd8de5648cf2de1c94044918626de2f96365d48
Clean up use of @codingStandardsIgnore
- @codingStandardsIgnoreFile -> phpcs:ignoreFile
- @codingStandardsIgnoreLine -> phpcs:ignore
- @codingStandardsIgnoreStart -> phpcs:disable
- @codingStandardsIgnoreEnd -> phpcs:enable
For phpcs:disable always the necessary sniffs are provided.
Some start/end pairs are changed to line ignore
Change-Id: I92ef235849bcc349c69e53504e664a155dd162c8
I removed comments that merely repeated the location of the class being
tested. There are other tests in this directory that don't have a
corresponding class and need further investigation.
Change-Id: Ic16f0887b5030ac53fab4382cfaedfb5426cdb08
This is a first pass at Latin/Cyrillic translitertion for Crimean
Tatar (crh).
Includes transliteration tables, prefix/suffix mappings, regex
mappings, and exceptions lists for words and abbreviations.
Regularize CRH language name in messages/* files.
Fix "varient" typos in qqq.json.
Add unit tests for CRH transliteration.
Bug: T23582
Change-Id: I424703f99adf837f6217872b882d1ea26bfdd068
Adjust regexes for what not to convert to avoid backtracking by
preferring possesive quantifiers
Add check that we really have matched to the end of the string, and
log error if the regex hits some sort of error preventing the
entire string from being matched. Should the regex not match to the
end, then language conversion is disabled for the string.
Bug: T124404
Change-Id: I4f0c171c7da804e9c1508ef1f59556665a318f6a
According to the typographical convention, a thousands separator
should not be inserted in numbers that are four digits long (between
1000 and 9999), unlike in English where it's usually acceptable.
This logic is currently implemented in LanguagePl::commafy().
Bug: T177846
Change-Id: I6dbd8febcf59000067cdd7d3c11111f2f77f4e66
Guarded by the $wgUsePigLatinVariant variable, off by default.
Pig Latin is a language game where words in English are altered
according to the following rules:
* Words starting with a vowel have a '-way' suffix appended.
* Words starting with a consonant have the initial consonants (or 'qu'
group) moved to the end and an '-ay' suffix appended.
https://en.wikipedia.org/wiki/Pig_Latin
* Added 'en-x-piglatin' as a language name.
* Added 'en' to LanguageConverter::$languagesWithVariants.
* Added LanguageEn class and its corresponding EnConverter which
provides one-way translation from English to Pig Latin.
* Some minor internal changes in code that assumed that English
doesn't have a language class or converter.
Bug: T45547
Depends-On: I1d9691c784032669979f8109c9a5f65cbf4122c9
Change-Id: I7fa2d85d6364958c5138366e8b4504a2697a8731
$wgDummyLanguageCodes is a set and mapping of different language codes:
* Renamed language codes: ['als' => 'gsw', 'bat-smg' => 'sgs',
'be-xold' => 'be-tarask', 'fiu-vro' => 'vro',
'roa-rup' => 'rup', 'zh-classical' => 'lzh',
'zh-min-nan' => 'nan', 'zh-yue' => 'yue'].
The old language codes are deprecated because they are invalid but
should be supported for compatibility reasons for a while.
* Language codes of macro languages, which get mapped to the main
language: ['bh' => 'bho', 'no' => 'nb'].
* Language variants which get mapped to main language:
['simple' => 'en'].
* Internal language codes of the private-use-area which get mapped to
itself: ['qqq' => 'qqq', 'qqx' => 'qqx']
This is a very strange conglomeration which should get differentiated,
and were split up in the following ways:
* Renamed language codes are available from
LanguageCode::getDeprecatedCodeMapping().
* Language codes of macro languages and the variants that are mapped to
the main language are available as $wgExtraLanguageCodes and are set
in DefaultSettings.php.
* Internal language codes are set in $wgDummyLanguageCodes in Setup.php.
Change-Id: If73c74ee87d8235381449cab7dcd9f46b0f23590
It's unreasonable to expect newbies to know that "bug 12345" means "Task T14345"
except where it doesn't, so let's just standardise on the real numbers.
Change-Id: I46261416f7603558dceb76ebe695a5cac274e417
For relative timestamps in $str, strtotime( $str, $now ) returns an
absolute Unix timestamp $str since $now, and this timestamp is given
to $time. However, Language::formatDuration expects a time duration,
not an absolute timestamp. We obtain this duration from the difference
between $time, the absolute timestamp of block expiry, and $now, the
absolute timestamp of the time in which the block action happened.
Tests have been added to test both this patch and 01936fa, the patch
that caused this regression.
Bug: T156453
Change-Id: I6fd8c02dc3c6456067fe25cb9f33f5b4c78332aa
This makes the code for processing JSON files with
grammar transformations reusable by different languages
and applies the same logic to Russian and Hebrew.
It will be done to other languages in further patches.
This patch is not supposed to change any functionality,
and the tests are intact (except a comment in the test
for Hebrew - the class doesn't exist any longer).
PHP:
* Move the JSON grammar transformation data processing logic
from LanguageRu.php to convertGrammar() in Language.php.
By default all these data files are supposed to be
processed identically, so the code should be common.
If there is no JSON data file, nothing new happens.
* LanguageRu's own convertGrammar() method is removed.
* The LanguageHe class is removed, now that all its functionality
is handled by generic JSON data processing in the Language class.
LanguageHe.php file is removed from the repo and from autoloading.
JavaScript:
* Move the JSON grammar transformation data processing logic
from ru.js to mediawiki.language.js.
* JavaScript grammar code files he.js and ru.js are removed
from the repo and from Resources.php, because all the data
is in JSON, and the default logic in mediawiki.language.js
works for both languages.
Bug: T115217
Change-Id: I5e75467121c3d791bb84f9e6fdfcf07c1840f81a
Available as of PHP 5.5 and more idomatic. Foo::class (explicit),
self::class (defined), and static::class (late bound).
Change-Id: I66937f32095a4e4ecde94ca20a935a3c3efc9cee
Some of them don't have many test cases, or have test cases that don't
represent the ideal transliteration and so are subject to change. But
this is better than nothing.
Change-Id: I4aae693bd77d9ff365f48113923ed7f9fed8d668
If we really need this we can do it in MediaWikiTestCase, next
to the setting of wgMainCacheType. But from what I can see the
code being tested here already doesn't use the old $wgMemc.
Change-Id: I9e4b2109b2f3c18d8d5551bbadae5711c1d4c0a6
To detect whether the truncation had chopped up a multibyte
character after the first byte, a regex was used. But in this
regex, the dot (.) didn't match newlines, so it failed to
detect chopped multibyte characters (after the first byte)
if there was a newline preceding the chopped character.
Bug: T116693
Change-Id: I66e4fd451acac0a1019da7060d5a37d70963a15a
CLDR provides translated language names. They are useful for showing
names by themselves in menus and lists, but it's often problematic to add them
to Russian sentences, because they need to be declined, so a message like
"This page is not available in the $1 language" is hard to localize.
This patch adds new cases for Russian -
"languagegen", "languageprep" and "languageadverb".
(The last one, as its name says, it's not actually a grammatical case,
but a transformation to an adverbial expression.)
This covers most of the needs for language names that MediaWiki supports.
Change-Id: Ib6a0afa5c3736f8b9b2e121cd752c53ee50fad75
* Fix the '-ти' rule to match the name of Wikiquote.
* Add tests for '-ти' and '-ник' rules.
* Remove the '-ь' and '-ка' rules, which were copied from Russian
and are not used in Ukrainian, and remove their tests as well.
* Remove non-implemented ("stub") cases.
* Cleanup the code of commafy().
Change-Id: I98647ceb8806d845f3c8150b92a5d9f7fe5866f2
The grammar rules for Ukrainian have several mistakes.
This is the first in a series of commits that fix this.
* Add grammar tests for PHP. There weren't any tests at all,
and now there are some. Not tests are added for rules that
are wrong and irrelevant and will be removed in subsequent commits.
* Add tests for JavaScript, and update a grammar rule that was
incorrectly copied from Russian.
Change-Id: I6de4581e2908eba39b33a13b07d048a34a3bd803
"B" and "P" are vanishingly rare abbreviations for "bytes" and "pixels"
respectively. Let's use the full English terms for these, combined with
a PLURAL magic word for good measure.
Change-Id: Id59c4b9dea2c13940ae790b6a236ac08abe0a768
Use arrays instead of strings, to avoid using
string functions with Unicode.
Handle thousands according to how years like 1000, 2000, etc.
are named in the Hebrew Wikipedia.
Bug: T97444
Change-Id: I5334e86793d28dfcf8939a249b03a5ea85fa4e69
Some failing tests are commented out and will be properly fixed
in subsequent commits.
Bug: T97444
Change-Id: I19721b5dc3dc6bbe923d9bf401fcf5d765fb7a7c
* Remove redundant @licence/@license from test suite files.
They already have full licence headers. And @licence raises a
warning in Doxygen.
* Fix weird messes of comments inside comments and other things.
Change-Id: I38da8ca76330f72b8dc22b0ecf1ea69d5ea55ede
* The updates include incompatible changes for plural forms in Russian,
Prussian, Tagalog, Manx and several languages that fall back to Russian.
In addition there are minor changes for other languages.
* Test cases were updated to reflect these changes.
Bug: 62861
Change-Id: I7cce477925330fe5bbf51a8470060dc1223981d0
MediaWiki default is "@return type Description", so set a type after
return and start the description with a capital letter. Also use the
more common spelling of boolean.
See http://phpdoc.org/docs/latest/references/phpdoc/tags/return.html for
more about @return
Change-Id: I4e5198822fe92836f9cef9918a9fc1a1a1e0a043
The results of this function are used to decide whether a code is
valid for loading an i18n file without any further normalization.
Partially reverts 93348f3 which made the regular expression case-
insensitive. Per IRC discussion, language codes should always be
lowercase and it's up to callers to deal with that.
Change-Id: I8975c3374a37935080d9f7eca6a602e32f67a87b
Add an out parameter to Language::sprintfDate that returns the amount of
time that its output is valid for (e.g., an output format of 'Y-m-d' at
11:50 PM would be valid for 600 seconds).
Change-Id: I3f5a80aa4d303f92c97d24ab780af920894d24ef
When parsing, filter any array values that are empty string
before using strtr php function so that strtr can handle
the array.
Bug: 64347
Change-Id: I94761caa70d44febfa0999c91048a01044fc1fbe
Swapped some "$var type" to "type $var" or added missing types
before the $var. Changed some other types to match the more common
spelling. Makes beginning of some text in captial.
Also added some missing @param.
Change-Id: Ic8aaf0a93796b97d0fa4617c1f86ff59f4b36131
LocalisationCache and Language have to take the JSON files into account
in deciding if a language is present or not.
Standardizing language validity checking with isSupportedLanguage
and isValidBuiltInCode.
Co-Authored-By: Niklas Laxström <niklas.laxstrom@gmail.com>
Co-Authored-By: Siebrand Mazeland <siebrand@kitano.nl>
Change-Id: I35bbb3a7a145fc48d14fff620407dff5ecfdd4fc
- Added spaces after if/foreach/catch
- Added new line before end of file
- Added or removed spaces before/after parenthesis, comma
- Added spaces around string concat
Change-Id: I0590070f1b3542108e242730e8d9a3ba9831e94f
Russian (ru) plural rules have a major change. The 'few' form is
merged with the 'other' form. The current forms are 'one', 'many', 'other'.
In MW ru plural rules were overridden using convertPlural methdod
in LanguagesRu.php with 3 forms.
Effectively forms[1] and forms[2] are swapped.
Followup: I9930b290d004667a3bb09e5c1663ec2c9c27d8a6
Bug: 56931
Change-Id: Ia5779e42315d3f41f52dce2bfffaee0a4297d23b
Updated plurals.xml with new data from CLDR 24.
This data is according to UTS #35 Rev 33.
Update the CLDRPluralRuleParser.js to version 1.1 from upstream
https://github.com/santhoshtr/CLDRPluralRuleParser
Changes to the plural rules:
* Hebrew override removed since CLDR 24 matches with MW plural rules.
* Updated the syntax of overridden rules to TR35 Rev 33 for
Lower Sorbian (dsb), Upper Sorbian (hsb), Belarusian in Taraskievica
orthography (be_tarask), Old Church Slavonic (cu), Bhojpuri (bho),
Samogitian (sgs).
* Removed Manx (gv) override. See I46ab3dadc7fe08c1e60bbd81a1ee841e166e9608.
* Removed the overriden convertPlural method for Serbian from LanguageSr.php,
since CLDR 24 matches with MW rules. Updated and added more tests.
Tests updated for Serbocroatian (sh), too. Old CLDR versions had 4 plural
rules and MW had only 3. In CLDR 24, the form 'many' was removed and it
became identical to the MW. Same for Bosnian (bs) and Croatian (hr).
Also for variants sr-ec and sr-el
* Macedonian (mk) used to count 11 as 'other' form.
CLDR 24 counts it as 'one'.
Not overriding, using CLDR 24 here.
Updated the tests. MW will not override this.
* Armenian (hy) used to count 0 as 'other'.
Now it is 'one' form.
Updated the tests. MW will not override this.
* Latvian (lv) used to count only 0 as 'zero' form, but CLDR 24, any number
satisifying the following formula is counted as zero:
n % 10 = 0 or n % 100 = 11..19 or v = 2 and f % 100 = 11..19
Examples: 0, 10~20, 30, 40, 50, 60, 100.
Updated the tests accordingly. Not overriding it in MW.
Users will see different plural form for the above numbers.
* Removed Ukranian custom plural rule since it match with MW
* Russian (ru) plural rules have a major change.
The 'few' form is merged with the 'other' form.
The current forms are 'one', 'many', 'other'.
In MW ru plural rules were overridden using convertPlural methdod
in LanguagesRu.php with 3 forms.
Effectively forms[1] and forms[2] are swapped.
This will affect the messages, and such messages
must be reviewed and updated. This change is not included in this patch and
wil be done separately.
Russian is the only remaining language class with convertPlural method overridden.
Notable impact on the exising messages:
* For languages ru, uk, be_tarask, sr, For the special case
of two plural forms and first mapped to 1 and rest to the other form, syntax like
{{plural:$1|1=one|other}} should be used.
For further information regarding each of the above language changes, see
1. http://unicode.org/cldr/trac/ticket/3727
2. http://goo.gl/H2HEz
CLDR 24 can handle fractions. Ideally it should start working
in MW without any code changes, but MW language test suite
does not have enough tests to confirm.
Followup: e571717e06
Bug: 56931
Change-Id: I9930b290d004667a3bb09e5c1663ec2c9c27d8a6
* New operands i, v, w, f, t
* New operators =, !=, %
* Ignore "samples", which are basically unit tests embedded in rule
specifications
* Ignore the new "other" rules, which have an empty condition. It
doesn't really makes sense to parse them, since the empty condition
means special handling should be done in the caller, it is not
equivalent to an unconditional true or false.
* Trailing zero support requires that the input number be a string.
Documented this.
* Fixed some comments
* Added test cases for new features
Bug: 56931
Change-Id: I96986c0c664f785e75b0a4ced2ec9e37b72681c1
Backported the plural rules from CLDR 24 as an override to CLDR 23 rules
exising in MediaWiki. The syntax for plural rules changed in CLDR 24, so
modified the syntax to fit the CLDR 23 syntax
Once we are ready with the updated parsers for CLDR 24(See bug 56931),
we should remove the override.
Since we remove the custom plural forms in MW in favor of CLDR,
the following changes comes into effect:
1. 'few' form used as 'zero' form in MW. Practially that make 'one' form used
as 'two' form and 'two' used as 'few' form.
This breaks existing gv {{PURAL}} usage as dicussed in the bug report
2. CLDR defines 'few' form as n % 100 = 0,20,40,60 but MW adds 80 also to that
list, ie n % 100 = 0,20,40,60, 80. So with this patch, 80 is no longer considered
as 'few' plural form.
Bug: 47099
Change-Id: I46ab3dadc7fe08c1e60bbd81a1ee841e166e9608
Russian has overridden convertPlural method, that was not
taking care of explicit plural forms.
Follow up: I2a9f93567087babb896999f1214d3c56afc67c96
Bug: 54514
Change-Id: Ia977fa544b1d0e40222c7296b7145dcd6f93ecc2
A new protected method looks for explicitly defined forms.
Every overriden language class is required to use this method.
Includes tests.
Redoing old patch I6dc759e3dfb05d6673209ba00da6592a384d5300
Bug: 46422
Change-Id: I2a9f93567087babb896999f1214d3c56afc67c96
Localization of numeric values should operate on the values as strings,
and should handle strings representign very large numbers gracefully.
Change-Id: I95394b96f9b70deb06ab818b54e08ac4ccb38c6c
When truncating a string at a point where it contains a space (ie "hello
world" to 9 chars), the resultant string will have a space before the
ellipsis ("hello ..."). This is both gramatically incorrect and just looks
wrong, and is fixed by trimming the string before appending the ellipsis.
Change-Id: Iec86b17bfc8c50e4c1a96fd373861841fc57848d
This change:
- Adds method scope
- adds @covers tags
- adds various @todos
- fixes some comments
Before the changes tests ran with:
1383 tests, 1412 assertions 10 skips
After changes the results remain the same
Change-Id: Iee57447bdb47026952ef5dcce6fed5dad0f80e52
I have no clue how this ever actually worked under Zend, but it
definitely didn't with HHVM. Now it works on both.
Change-Id: I521dfb74f30306736adda5662598fd036ad9849b
Currently if the site language is zh and a user is using variant zh-tw,
namespace names from zh-hant are displayed because of the language
converter, but they're not accepted by MediaWiki as valid namespace names
by default because zh falls back to zh-hans.
For core namespaces, all converted namespace names are manually added as
$namespaceAliases in MessagesZh.php but it's not always done in extensions.
With this patch converted namespace names are automatically added as
namespace aliases when namespace aliases are loaded.
In some followup commit it makes sense to remove existing core namespace
aliases which were created for this reason.
Change-Id: I01873d9c64a9943afbb655d6203cec9ebd39fb72
It is possible that only explicit plural forms are specified, and
therefore, it is possible that none match. However, handling of
explicit forms came after the count( $forms ) check, so input such
as {{PLURAL:|1=}} would trigger a "PHP Notice: Undefined offset: -1".
Change-Id: I8494de8ceb9e0cfff7203c69c21f02b3731275af
Follows-Up: I50eb0c6d1c02ca936848d310de625ed1fe43d91a
- Remove check for version, that version is already enforced in phpunit.php,
so there is not point showing a warning for it is useless
- Remove call to MessageCache::destroyInstance(), there is no need for it,
since $wgMessageCacheType is set in phpunit.php before running Setup.php
- Remove includes of bootstrap.php in LanguageSrTest.php and LanguageUzTest.php
Change-Id: I4b2db6b3e6f001175e1a407c5add2972aade5e60
Check if the timestamp has a length of 14 characters and if it is numeric.
Throw an exception otherwise. Includes tests.
Bug: 47629
Change-Id: I9a4fd0af88cf20c2a6bd72fd7048743466c1600f
Add an optional timezone parameter to Language::sprintfDate, add format
characters eIOPTZ, and correct crU.
While we're at it, remove backwards-compatability code for 'N' and then
merge the existing switch cases for cr and wNzWtLoU that are basically
identical, since all those cases need to be changed anyway.
Bug: 33454
Change-Id: Iea1f78428bc0d32d6395818311dbe4b94d776c42
Also update some previous inconsistencies pointed out by Krinkle in change IDs:
* Ide20743a2e84ff68549286120e6cff9d9f396f54
* I811ca957b6588085d67606ebc0cd4033a1e53839
Change-Id: Ife33b931870d0d7e04fcb40974997436d27f528f
A recent CLDR update changed the plural rules for Hebrew
and added a "many" form. That rule has a mistake, however -
it is not supposed to include the number 10. This is reported as
http://unicode.org/cldr/trac/ticket/5828
This commit updates the plural overrides for Hebrew and makes
them closer to the latest CLDR, but with a fix for that rule.
It also updates the tests to include support for the new rule
and to make sure that the right fallbacks are used when
less than four rules are supplied, because usually that is
the case. It is thus a partial revert of the changes introduced
in I3d72e4105f6244b0695116940e62a2ddef66eb66 .
Mingle-Task: 2715
Change-Id: I1315e6900ef7a89bf246e748b786f7fc31a629c6
For example, find out which rule type should be applied for 5 items
in Arabic. The result would be 'few'.
This implementation should be non-disruptive and completely backwards
compatible (which is the main reason it isn't a lot simpler).
Change-Id: I3d72e4105f6244b0695116940e62a2ddef66eb66
* CLDR does not define plural rules for sgs.
* Port the plural rules present in LanguageSgs.php to CLDR plural
definition syntax
* Remove LanguageSgs.php
* Update the tests, reorder/rename the plural form names
Change-Id: I44658402d69a6805cdfd189fe780eadee94056c7
* Replace == with ===
* Add support for the prepositional case
* Add support for Wikidata
* Add tests
Change-Id: Ic02bfb9ce88e93775036f3d15921cedca602237c
Fix almost all occurences of the following sniffs:
Generic.CodeAnalysis.UselessOverridingMethod.Found
Generic.Formatting.NoSpaceAfterCast.SpaceFound
Generic.Functions.FunctionCallArgumentSpacing.SpaceBeforeComma
Generic.Functions.OpeningFunctionBraceKernighanRitchie.BraceOnNewLine
Generic.PHP.LowerCaseConstant.Found
PSR2.Classes.PropertyDeclaration.ScopeMissing
PSR2.Files.EndFileNewline.TooMany
PSR2.Methods.MethodDeclaration.StaticBeforeVisibility
Change-Id: I96aacef5bafe5a2bca659744fba1380999cfc37d
By checking the code against $wgDummyLanguageCodes we can
get rid of checking it on a case-by-case basis.
It's only a suggestion (I don't know if it can break anything),
and Amir Aharoni said that big changes are coming (Bug 41103).
In private case, this change fixes Bug 27571 and maybe some other
language fallback related issues.
Change-Id: I5212beabd5cc212b50ee98b5b53ec01b20ffd0c3
Changing $_ to $number; changing $numberpart to $integerpart
Also adding a unit test for the commafy function
Change-Id: Iaf6dd027bd70722d316d1a9c10c9913fff8300ce
The language classes have been using the same setUp() tearDown() to
craft a new language object. I have abstracted that code in
LanguageClassesTestCase and made all the language test classes to extend
it. The language is interpolated directly from the class name and an
object for it can be retrieved with the getLang() method.
Change-Id: Ib931336ce219edabe2c72b7e9f04c976a500723e