* Move a many-to-one mapping from the L2C to the C2L table where it
belongs.
* Fix some regular expression patterns which ended up with misnumbered
replacement strings.
* All regular expressions should have the `u` (unicode) flag set.
* Typo/spelling fixes in comments
Change-Id: If933fc67845ac994d9ddfdf8349aff445ec9b13a
Refactor to match exceptions as patterns, not words
- break exception list to C2L and L2C pattern sets
- change main loop to break only on Roman numerals and transliterate
everything else, rather than tokenizing on single-script words
(this fixes the km² problem, too)
- update word anchors from ^ and $ to \b
- only process Roman numerals for L2C translit
- add exception for single "Roman" character followed by a period
which looks like an initial
- consolidate multi-step transliteration into regsConverter()
- remove regex support from main exception list to support strtr()
- re-organize some prefix/suffix/whole word patterns to the right place
- add tests for recently fixed use cases
- add support for many-to-one mappings in both directions
- update character classes, exception lists, and regexes based on
speaker feedback and example texts
Misc other fixes:
- fix some character classes errors
- remove unneeded character classes
- add tests for Roman numerals and quotes
- add tests for affixes and regexes
Bug: T188321
Bug: T189512
Change-Id: I056d36ff2b8f63b3998a5d3a442d8d539c15488d
Add apostrophe to the set of valid word characters for the en-x-piglatin
converter, so that "don't" and "can't" are properly converted to Pig
Latin (eg, to "on'tday" and "an'tcay").
Add an optional `s` before `qu` so that "squish" is properly converted
to "ishsquay".
Change-Id: Ibc5cf2c007a42d9447688b857aa75f9a3d8ae152
Add a look-ahead to ensure that the regex intended to match roman numerals
doesn't also match the empty string. Tweak the regular expressions slightly
to ensure that Sr/Ku/Crh all have identical regular expressions.
Change-Id: If43bf99a21c42c6c5050f814c0bc99edec353228
In production, the regex and exception tables were not being loaded,
resulting in very poor transliteration. The loading has been moved to
the contructor, similar to the implementation of the Kazakh
transliteration.
Also, a bug in the mappings for Ö/ö -> Ё/ё and Ü/ü -> Ю/ю has been
fixed.
Test cases for specific additional examples have been added. (Though
it is worth noting that the regex and exception tables did load
properly during unit testing, so the problem wasn't caught there.)
Bug: T186727
Change-Id: I6bacee7d9de6f4a870a8a9ef1f04b819ad489c02
In some languages it's conventional not to insert a thousands
separator in numbers that are four digits long (1000-9999).
Rather than copy-paste the custom code to do this between 13 files,
introduce another option and have the base Language class handle it.
This also fixes an issue in several languages where this logic
previously would not work for negative or fractional numbers.
To implement this, a new option is added to MessagesXx.php files,
`$minimumGroupingDigits = 2;`, with the meaning as defined in
<http://unicode.org/reports/tr35/tr35-numbers.html>. It is a little
roundabout, but it could allow us to migrate the number formatting
(currently all custom code) to some generic library easily.
Bug: T177846
Change-Id: Iedd8de5648cf2de1c94044918626de2f96365d48
Clean up use of @codingStandardsIgnore
- @codingStandardsIgnoreFile -> phpcs:ignoreFile
- @codingStandardsIgnoreLine -> phpcs:ignore
- @codingStandardsIgnoreStart -> phpcs:disable
- @codingStandardsIgnoreEnd -> phpcs:enable
For phpcs:disable always the necessary sniffs are provided.
Some start/end pairs are changed to line ignore
Change-Id: I92ef235849bcc349c69e53504e664a155dd162c8
* Remove some creation dates, they are not protected by GPL
* Remove duplicate @defgroup API
* Remove @ingroup from some @file doc comments on class files. It is not
useful to list class files alongside classes in the doxygen module menu.
Add @ingroup to some more class files that had @ingroup on their file,
that was probably the author's intent.
* In PackedOverlayImageGallery, use the file comment as a class comment
* Don't put @defgroup and @file in the same comment. @defgroup makes the
whole doc comment describe the group.
* Instead of putting AnsiTermColorer in two groups, use hierarchical
groups.
Change-Id: If54f6e0b2bc1ea6de42045885cf836ee67b8e961
This is a first pass at Latin/Cyrillic translitertion for Crimean
Tatar (crh).
Includes transliteration tables, prefix/suffix mappings, regex
mappings, and exceptions lists for words and abbreviations.
Regularize CRH language name in messages/* files.
Fix "varient" typos in qqq.json.
Add unit tests for CRH transliteration.
Bug: T23582
Change-Id: I424703f99adf837f6217872b882d1ea26bfdd068
- mostly auto fixes
- some too long lines fixed
- ignore amp space in one case passing by reference
Change-Id: I6472f83bc3cbf4bd629d83050cc3319b19ec465c
Guarded by the $wgUsePigLatinVariant variable, off by default.
Pig Latin is a language game where words in English are altered
according to the following rules:
* Words starting with a vowel have a '-way' suffix appended.
* Words starting with a consonant have the initial consonants (or 'qu'
group) moved to the end and an '-ay' suffix appended.
https://en.wikipedia.org/wiki/Pig_Latin
* Added 'en-x-piglatin' as a language name.
* Added 'en' to LanguageConverter::$languagesWithVariants.
* Added LanguageEn class and its corresponding EnConverter which
provides one-way translation from English to Pig Latin.
* Some minor internal changes in code that assumed that English
doesn't have a language class or converter.
Bug: T45547
Depends-On: I1d9691c784032669979f8109c9a5f65cbf4122c9
Change-Id: I7fa2d85d6364958c5138366e8b4504a2697a8731
HHVM 3.18 emits a notice when attempting to access the first offset of
an empty string. We had that fixed for ucfirst() in 3605066c96. This is
the same for lcfirst().
Bug: T161095
Change-Id: I1456611222c24290f259298e883ca89dd830c74b
It's unreasonable to expect newbies to know that "bug 12345" means "Task T14345"
except where it doesn't, so let's just standardise on the real numbers.
Change-Id: Id2f9d229d17b8eee66b2ca4e3927f3f66ac62988
I was bored. What? Don't look at me that way.
I mostly targetted mixed tabs and spaces, but others were not spared.
Note that some of the whitespace changes are inside HTML output,
extended regexps or SQL snippets.
Change-Id: Ie206cc946459f6befcfc2d520e35ad3ea3c0f1e0
It seems LanguageConverter::parseManualRule was removed by
69dbeb97f1 (2008),
and LanguageConverter::parserConvert by
c568220e61 (2010),
so it seems safe and reasonable to remove their implementations
from few remaining language-specific Converter classes.
Change-Id: I7092f5c8856723fabd2b1f99944451344feb5711
This makes the code for processing JSON files with
grammar transformations reusable by different languages
and applies the same logic to Russian and Hebrew.
It will be done to other languages in further patches.
This patch is not supposed to change any functionality,
and the tests are intact (except a comment in the test
for Hebrew - the class doesn't exist any longer).
PHP:
* Move the JSON grammar transformation data processing logic
from LanguageRu.php to convertGrammar() in Language.php.
By default all these data files are supposed to be
processed identically, so the code should be common.
If there is no JSON data file, nothing new happens.
* LanguageRu's own convertGrammar() method is removed.
* The LanguageHe class is removed, now that all its functionality
is handled by generic JSON data processing in the Language class.
LanguageHe.php file is removed from the repo and from autoloading.
JavaScript:
* Move the JSON grammar transformation data processing logic
from ru.js to mediawiki.language.js.
* JavaScript grammar code files he.js and ru.js are removed
from the repo and from Resources.php, because all the data
is in JSON, and the default logic in mediawiki.language.js
works for both languages.
Bug: T115217
Change-Id: I5e75467121c3d791bb84f9e6fdfcf07c1840f81a
It looks like there is something missing after the last statement
Also remove some other empty lines at begin of functions, ifs or loops
while at these files
Change-Id: Ib00b5cfd31ca4dcd0c32ce33754d3c80bae70641
* Load the data of this variable from a JSON file to the same
data structure that ResourceLoader uses for digitTransformTable,
pluralRules, etc.
* Change the JSON structure to ensure the order of the rules.
Otherwise JavaScript processes the keys in a random order.
* Delete the grammar code from JS and replace it with
the same logic that is used in PHP for processing the data.
For now this is done only for Russian.
The next step will be to make the PHP and JS
data processing logic reusable.
Bug: T115217
Change-Id: I6b9b29b7017f958d62611671be017f97cee73415
Deletes LanguageEo.php class which only had remains of the server-side
character conversion (sx <-> ŝ, etc). This is being obsoleted in favor
of client-side IMEs provided by UniversalLanguageSelector extension.
Removes deprecated $wgEditEncoding, which was only used for this.
Turns Language::recodeInput() and Language::recordForEdit() into no-ops
for any old or extension code that happened to still use them.
Bug: T62677
Change-Id: Ib647353538d258dee941f2f7c571191060bc9c7d
All of them are already being used outside the class:
* getMonthAbbreviation
* getMonthAbbreviationsArray
* getWeekdayName
* sprintfDate
* userAdjust
* date
* time
* timeanddate
* getMessage
* iconv
* ucfirst
* uc
Change-Id: I63ec93858cebc02cdf3b9b042eddf4ef620cc110
For this add an user parameter to Language::translateBlockExpiry.
This allows the function to display the absolute block expiry in the
user's time zone. Use this when formatting block log entries.
This also avoids the use of $wgUser
Bug: T131241
Change-Id: If0a1d3c88bb4242a016eb9b2df115413de786149
* Removed fallback code from Language, the associated data file
(Utf8Case.ser), and the code to generate that data file.
* Removed comment in LanguageFi that "mb_substr has a compatibility
function in GlobalFunctions.php".
* Removed check for mbstring in bench_utf8_title_check.php.
* In the tests for StringUtils::isUtf8():
* Removed separate test for the non-mbstring code path.
* Removed mentions of mbstring from function names and assertion
messages, since mb_check_encoding() is now always used.
* Also updated the comment in StringUtils::isUtf8() referring to
PHP 5.3, which is no longer supported in MediaWiki, to indicate
that the same issue also exists in old versions of HHVM. (If
we don't have to support 3.4 or older, then the function could
be deprecated and removed if desired.)
Follows-up 943563062f.
Change-Id: I55e5cd534b849c6ea06a7fadacbbf34a12d87ebe
Move ZhConversion.php and Names.php to languages/data and make them both
expose their data as static class variables instead of in the local
scope. This means that the autoloader can be used to load the data,
which is efficient and secure. This also makes additional request-local
caching of the arrays unnecessary.
Change-Id: Iafb96ac4165d0965fcb9a69f1d0a91139ea9790c
Previously various language objects would install a hook to update the
shared conversion table cache when the object was constructed. This is
not a good idea since language objects may be constructed even when they
are not the content language, but only the content language is
associated with variant conversion and the conversion cache.
Instead, have WikiPage call a method on $wgContLang directly. I put this
with message cache update since the logic is almost identical.
Change-Id: Ief9c0ef993e39645e74a6e158cb4e6e2139ce91d
Remove require_once for LanguageConverter and base classes. These
are in the autoloader now, so an explicit require is no longer
necessary.
Change-Id: Ie34ffc58fd9ec89fb57cf077dd5ac1746c35c48e
Two rules are ignored for now to allow us to upgrade:
* MediaWiki.ControlStructures.AssignmentInControlStructures.AssignmentInControlStructures
* Generic.ControlStructures.InlineControlStructure.NotAllowed
Also ignore the .git folder.
Change-Id: I1b149c72b27be54e22e369999ad0c41c2d1fc2b4
This method calls out to LanguageConverter which involve the User,
Request, and additional validation.
Change-Id: I3edae1244073767a8d8888708024bb5498c70dc9