Commit graph

244 commits

Author SHA1 Message Date
Aryeh Gregor
f23c4570e4 Make grammar transform cache an instance member
Normally there shouldn't be more than one Language object in existence
at a time because of $mLangObjCache, so there's no need for a
MapCacheLRU for grammar transformations. Just make it an instance member
of Language.

If someone directly called the Language constructor instead of
factory(), or meddled with $mLangObjCache, with one of the three
languages that have transforms defined, and called
getGrammarTransformations() on both distinct objects, this change could
result in duplication of an array of about 50 elements. I think the risk
is acceptable.

The change should be covered acceptably by existing tests for LanguageHe
and LanguageRu.  (There's not much to test.)

Bug: T201405
Change-Id: I483bafbbb7d109b670596f16381def9e3bd26d89
2019-10-08 22:24:22 +00:00
Aryeh Gregor
8c4f59db64 New LanguageFallback service
This replaces the static Language methods getFallbackFor(),
getFallbacksFor(), and getFallbacksIncludingSiteLanguage(). There is
100% unit and integration test coverage for the new class.

One deliberate functional change: I changed one place where we threw
MWException to InvalidArgumentException.

Bug: T201405
Depends-On: Ie7a89f6ed7d52a0bc01672019ff92e7ee105a1f3
Change-Id: I49222eb55f1feec5b1dcd40f364cffe0c8801855
2019-10-08 15:11:39 -07:00
Aryeh Gregor
6d80b6c082 Split some Language methods to LanguageNameUtils
These are static methods that have to do with processing language names
and codes. I didn't include fallback behavior, because that would mean a
circular dependency with LocalisationCache.

In the new class, I renamed AS_AUTONYMS to AUTONYMS, and added a class
constant DEFINED for 'mw' to match the existing SUPPORTED and ALL. I
also renamed fetchLanguageName(s) to getLanguageName(s).

There is 100% test coverage for the code in the new class.

This was previously committed as 2e52f48c2e and reverted because it
depended on e4468a1d6b, which had to be reverted for performance
issues. There should be no changes other than rebasing.

Bug: T201405
Change-Id: Ifa346c8a92bf1eb57dc5e79458b32b7b26f1ee8a
2019-10-07 15:20:52 -07:00
Aryeh Gregor
043d88f680 Make LocalisationCache a service
This removes Language::$dataCache without deprecation, because 1) I
don't know of a way to properly simulate it in the new paradigm, and 2)
I found no direct access to the member outside of the Language and
LanguageTest classes.

An earlier version of this patch (e4468a1d6b) had to be reverted
because of a massive slowdown on test runs. Based on some local testing,
this should fix the problem. Running all tests in languages is slowed
down by only around 20% instead of a factor of five, and memory usage is
actually reduced greatly (~350 MB -> ~200 MB). The slowdown is still not
great, but I assume it's par for the course for converting things to
services and is acceptable. If not, I can try to optimize further.

Bug: T231220
Bug: T231198
Bug: T231200
Bug: T201405
Change-Id: Ieadbd820379a006d8ad2d2e4a1e96241e172ec5a
2019-10-07 13:18:47 -07:00
Thiemo Kreuz
e06ce9f467 tests: Prefer PHPUnit's assertSame() when comparing empty strings
assertSame() is guaranteed to never do any magic type conversion.
This can be critical when accidentially comparing empty strings (a
value PHP considers to be "falsy") to false, 0, 0.0, null, and such.

Change-Id: I2e2685c5992cae252f629a68ffe1a049f2e5ed1b
2019-09-20 15:27:58 +00:00
Aryeh Gregor
97b9dbc6ce Integration tests for Language fallback methods
100% coverage.  Next: reimplementation as a service, and tests for the
new service.

Change-Id: Icdd5b84eb91fc83905209ce08d499011f7288e33
2019-08-29 18:25:12 +03:00
Amir Sarabadani
308e6427ae Revert "Make LocalisationCache a service"
This reverts commits:
 - 76a940350d
 - b78b8804d0
 - 2e52f48c2e
 - e4468a1d6b

Bug: T231200
Bug: T231198
Change-Id: I1a7e46a979ae5c9c8130dd3927f6663a216ba753
2019-08-26 18:28:26 +02:00
Aryeh Gregor
2e52f48c2e Split some Language methods to LanguageNameUtils
These are static methods that have to do with processing language names
and codes. I didn't include fallback behavior, because that would mean a
circular dependency with LocalisationCache.

In the new class, I renamed AS_AUTONYMS to AUTONYMS, and added a class
constant DEFINED for 'mw' to match the existing SUPPORTED and ALL. I
also renamed fetchLanguageName(s) to getLanguageName(s).

There is 100% test coverage for the code in the new class.

Change-Id: I245ae94bfc1f62b6af75ea57525139adf2539fe6
2019-08-23 12:52:35 +03:00
Aryeh Gregor
e4468a1d6b Make LocalisationCache a service
This removes Language::$dataCache without deprecation, because 1) I
don't know of a way to properly simulate it in the new paradigm, and 2)
I found no direct access to the member outside of the Language and
LanguageTest classes.

Change-Id: Iaa86c48e7434f081a53f5bae8723c37c5a34f503
2019-08-22 14:25:18 +03:00
Umherirrender
2664eeb632 Clean up spacing of doc comments
Align the doc stars and normalize start and end tokens

Change-Id: Ib0d92e128e7b882bb5b838bd00c74fc16ef14303
2019-08-05 22:29:50 +00:00
Amir Sarabadani
1e9ed7adab Move LanguageCodeTest from integration to unit
76 more unit tests

Change-Id: I68e9ffff30a744ef6b31598e0176e67892defb95
2019-07-11 20:58:09 +02:00
Amir Sarabadani
d23af35764 Unset all globals unneeded for unit tests, assert correct directory
* Unset globals to avoid tests that look like unit tests but actually rely on
  globals
* move some tests out of unit directory so that the test suite will pass.
* Assert that tests which extend MediaWikiUnitTestCase are in a directory with
  "/unit/" in its path name

Depends-On: I67b37b1bde94eaa3d4298d9bd98ac57995ce93b9
Depends-On: I90921679518ee95fe393f8b1bbd9134daf0ba032
Bug: T87781
Change-Id: I16691fc8ac063705ba0c2bc63b96c4534ca8660b
2019-07-09 14:09:29 -04:00
Máté Szabó
344481f60d Move trivially compatible tests to the unit tests suite
This changeset resumes work on T89432 and related tickets
by porting an initial set of tests to the new unit test suite
separated out in I69b92db3e70093570e05cc0a64c7780a278b321a.
The tests were only ported if they worked immediately without
requiring any changes other than changing the test case class
to MediaWikiUnitTestCase and moving the test to the new suite.
If a test failed for any reason (even trivial misconfiguration),
it was NOT ported.

With this change, the unit tests suite now consits of a total
of 455 tests. As before, you can run these tests via the following
command:
$ composer phpunit:unit

Bug: T84948
Bug: T89432
Bug: T87781
Change-Id: Ibb8175981092d7f41864e641cc3c118af70a5c76
2019-06-30 15:23:53 +02:00
Legoktm
4e35134f7a Revert "Separate MediaWiki unit and integration tests"
This reverts commit 0a2b996278.

Reason for revert: Broke postgres tests.

Change-Id: I27d8e0c807ad5f0748b9611a4f3df84cc213fbe1
2019-06-13 23:00:08 +00:00
Máté Szabó
0a2b996278 Separate MediaWiki unit and integration tests
This changeset implements T89432 and related tickets and is based on exploration
done at the Prague Hackathon. The goal is to identify tests in MediaWiki core
that can be run without having to install & configure MediaWiki and its dependencies,
and provide a way to execute these tests via the standard phpunit entry point,
allowing for faster development and integration with existing tooling like IDEs.

The initial set of tests that met these criteria were identified using the work Amir did in
I88822667693d9e00ac3d4639c87bc24e5083e5e8. These tests were then moved into a new subdirectory
under phpunit/ and organized into a separate test suite. The environment for this suite
is set up via a PHPUnit bootstrap file without a custom entry point.

You can execute these tests by running:
$ vendor/bin/phpunit -d memory_limit=512M -c tests/phpunit/unit-tests.xml

Bug: T89432
Bug: T87781
Bug: T84948
Change-Id: Iad01033a0548afd4d2a6f2c1ef6fcc9debf72c0d
2019-06-13 22:56:31 +02:00
Reedy
9f2ffdfbd4 Remove "Squiz.WhiteSpace.FunctionSpacing" from phpcs exclusions
Change-Id: I78b3315f26ab91b6b443f5b028a635552f82f5a3
2019-05-11 02:44:26 +01:00
Jack Phoenix
b704c9f4de Add 'avoidhours' option to Language#formatTimePeriod
Example use case: in some skins we want to show how many *days* ago a page was edited, but we don't really care about the precise _hours_.
Thus we'll set [ 'avoid' => 'avoidhours' ] when calling Language#formatTimePeriod to output something like "Page last edited 60 days ago" instead of "Page last edited 60 days 9 hours ago".

Change-Id: I0a737aab14ccb2b8d4eccdc41e1eb9232eedcb8a
2019-05-10 11:06:00 +03:00
rxy
b7d53539f0 Add support for new Japanese era name "Reiwa"
Bug: T219728
Change-Id: I28c26291c38e7e6c167011472236fb81a8adf032
2019-04-20 11:24:26 +00:00
Giuseppe Lavagetto
d46835ef4f Add ability to override mb_strtoupper in Language::ucfirst
Different PHP versions treat unicode differently, and specifically some
wiki resources become unreachable if mb_strtoupper's behavior has changed.
This patch allows to introduce an override table that allows to smooth
the transition period.

It also provides maintenance scripts to generate such an override table.

Bug: T219279
Change-Id: I0503ff4207fded4648c58c7b50e67c55422a4849
2019-04-17 15:18:44 +00:00
Fomafix
25ccc44018 Fix comments in language class tests
* Add `@covers LanguageGa`.
* Language code `bs` is for "Bosnian (bosanski)" and not for "Croatian
  (hrvatski)".

Change-Id: I605bdd254518dd708343e36a2dee65dd0aa17b63
2018-12-25 15:00:34 +01:00
C. Scott Ananian
fcbde8ae4e Make Language::hasVariant() more strict
In d59f27aeab we made
LanguageConverter::validateVariant() try harder to convert a variant
into an acceptable MediaWiki-internal form, looking at deprecated
codes and BCP 47 aliases.  However, this misled Language::hasVariant()
into thinking that bogus names (like all-uppercase strings) were
acceptable variant names, which then led exceptions when they were
passed to the various conversion methods.

This is a belt-and-suspenders patch for T207433 -- in that case we
shouldn't have created a Language object with code 'sr-cyrl' in the
first place, but once one was created we shouldn't have tried to
ask LanguageSr to convert texts to 'sr-cyrl'.  The latter problem
is fixed by this patch.

Bug: T207433
Change-Id: Id993bc7989144b5031a551662e8e492bd23f698a
2018-10-22 16:35:26 -04:00
C. Scott Ananian
103a4f76dc Deprecate $wgFixArabicUnicode / $wgFixMalayalamUnicode
These were introduced in MW 1.17 and are always true in production.

They were useful to allow folks to defer title conversion, but it's
been a long time now.  We don't need to make this optional any more.

Change-Id: I65dcfe80dc3e1dfeb4d63924a8928655e012a20c
2018-10-21 21:55:39 -04:00
jenkins-bot
690f563edc Merge "Accept BCP 47 codes as aliases for nonstandard variants" 2018-10-11 20:46:42 +00:00
jenkins-bot
64ef09d6a8 Merge "Ensure LanguageCode::bcp47() returns a valid BCP 47 language code" 2018-10-11 20:46:35 +00:00
C. Scott Ananian
d59f27aeab Accept BCP 47 codes as aliases for nonstandard variants
The browser Accept-Language header uses BCP 47 codes, which don't
precisely match our internal mediawiki variant names in a number of
places.  Allow proper BCP 47 codes to alias our internal variants
for: Accept-Language parsing, URL parsing, user preferences, and
explicit enumeration of codes in LanguageConverter rules.

This is a replay of an earlier merged patch,
0818070c59, which had to be reverted
because it was based on 8380f0173e which
caused regressions in the Babel extension (T199941).

Change-Id: Ica89d9547c58967747ab0fa15d4e83be5378796d
2018-10-11 02:23:20 -04:00
C. Scott Ananian
21ead7a98d Ensure LanguageCode::bcp47() returns a valid BCP 47 language code
MediaWiki uses a number of nonstandard codes which do not validate
according to the IANA language subtag registry.  Some of them have
the wrong semantics entirely: MediaWiki's `sr-ec` variant maps to
BCP 47 `sr-EC` which is "Serbian as used in Ethiopia" (!).

Extend LanguageCode::bcp47() to map our nonstandard codes to valid
BCP 47 language codes.  Export the mapping so that it can be used
in JavaScript's corresponding mw.language.bcp47() implementation
as well, and return the standard BCP 47 codes in the siteinfo
API.

Thanks to TheDJ (I10b4473c7e53f027812bbccf26bb47aec15fddfd) and
Fomafix (I93efc190714ba76247d30ba49fc21ae872fc3555) for previous
attempts at this!

Also removed a fixme for the name of 'Twi', dating back to 2004
(f59c3be23b) -- checking
tw.wikipedia.org it certainly appears that the autonym of 'Twi'
is correctly 'Twi'.

Tracking bugs for invalid language codes are T125073 and T145535.
Discussion of zh-XX => zh-HanX-XX mapping is at T198419.

This is a replay of an earlier merged patch,
8380f0173e, which had to be reverted
because it caused regressions in the Babel extension (T199941).

Bug: T34483
Bug: T106367
Bug: T120847
Depends-On: I27a5b8e45b34c6b57c1b612b11548001c88cd483
Change-Id: Iebbc604af21d7f2af9c1f1ab2574cb5f309bf6ed
2018-10-11 01:53:54 -04:00
Kunal Mehta
a4e8bea57d tests: Add helper function for ini_set with automatic cleanup
Some tests need to change the value of an ini setting, and typically implement
cleanup handling themselves, usually imperfectly.

Provide a helper function, $this->setIniSetting(), which will take care of
teardown in the same way that $this->setMwGlobals() does.

Change-Id: I7be4198592f0aaf73a28d3c60acb307a918b1a1f
2018-10-10 22:31:37 -07:00
Fomafix
5632815976 Write Latin and other scripts with captial letter
Change-Id: I16c660e54191b63cd6eb3407cb00504665930c4e
2018-10-05 18:49:08 +02:00
Fomafix
50944a1410 Deprecate Language::setCode as public method
setCode changes the language code for the Language object but it also
replaces the whole language codes for all Language objects.

> $lang = Language::factory( 'fr' )

> $lang2 = Language::factory( 'fr' )

> $lang->setCode( 'it' )

> print $lang2->getCode()
it
> $lang3 = Language::factory( 'fr' )

> print $lang3->getCode()
it

Better assign a new Language object.

Also add more tests for Language::equals.

Depends-On: I61439bac82021344c3f9a6056cccd937b3450af2
Depends-On: I2d9e551d6eb33f28f42aeaf48160eba21b83881f
Change-Id: I201b479f58e63c9c40fb8a3ec9575a551fb35235
2018-10-02 23:48:53 -07:00
Timo Tijhof
dbe89abb9e languages: Add coverage for 'ar' and 'ml' normalize()
* Exclude the data files from PHPUnit coverage.
* Add tests covering the normalize() implementations.
* Fix a small todo about using data providers.
* Set explicit visibility.

Change-Id: Ib104cc3215a36901cff853ad5969d92a6e0cf6a0
2018-08-14 23:19:35 +00:00
Aryeh Gregor
90d4f56fe4 Mass conversion of $wgContLang to service
Brought to you by vim macros.

Bug: T200246
Change-Id: I79e919f4553e3bd3eb714073fed7a43051b4fb2a
2018-08-11 22:44:29 -06:00
Aryeh Gregor
63d7f2ad13 Automatically reset namespace caches when needed
This avoids error-prone code written separately in every test.  In
addition to no existing tests resetting the TitleFormatter (more
services probably need to be reset as well), they mostly reset only the
namespace cache on $wgContLang, which wouldn't help for any other
language.

The parser test runner still doesn't do this, but maybe it should.

Change-Id: I44b7a1aec48f14b0950907fa14bd0df80f674296
2018-08-01 16:30:08 +03:00
Aryeh Gregor
355e21590a Use setContentLang() instead of setMwGlobals()
This changes behavior in some tests by making them set $wgLanguageCode
as well as $wgContLang, but that seems like a good thing.

Bug: T200246
Change-Id: I936888f46ff9fefe2707efba837e2ce3a7ca5e3f
2018-07-26 11:35:58 +00:00
Greg Grossmeier
b302b0cd1c Revert "Ensure LanguageCode::bcp47() returns a valid BCP 47 language code"
This reverts commit 8380f0173e.

Reason for revert: Caused T199941

Bug: T199941
Change-Id: I93af756a2d70d6bc91f828fe6ac19bf10ca8788f
2018-07-23 17:27:23 +00:00
Greg Grossmeier
dc282a46d7 Revert "Accept BCP 47 codes as aliases for nonstandard variants"
This reverts commit 0818070c59.

Reason for revert: Caused T199941

Bug: T199941
Change-Id: I24c178eb33890477de79cbb3122861c140578011
2018-07-23 16:44:55 +00:00
C. Scott Ananian
0818070c59 Accept BCP 47 codes as aliases for nonstandard variants
The browser Accept-Language header uses BCP 47 codes, which don't
precisely match our internal mediawiki variant names in a number of
places.  Allow proper BCP 47 codes to alias our internal variants
for: Accept-Language parsing, URL parsing, user preferences, and
explicit enumeration of codes in LanguageConverter rules.

Change-Id: I8468a56d5b88f5786abd0a17b67bda2f1687fd0c
2018-07-13 17:43:20 -04:00
C. Scott Ananian
8380f0173e Ensure LanguageCode::bcp47() returns a valid BCP 47 language code
MediaWiki uses a number of nonstandard codes which do not validate
according to the IANA language subtag registry.  Some of them have
the wrong semantics entirely: MediaWiki's `sr-ec` variant maps to
BCP 47 `sr-EC` which is "Serbian as used in Ethiopia" (!).

Extend LanguageCode::bcp47() to map our nonstandard codes to valid
BCP 47 language codes.  Export the mapping so that it can be used
in JavaScript's corresponding mw.language.bcp47() implementation
as well.

Thanks to TheDJ (I10b4473c7e53f027812bbccf26bb47aec15fddfd) and
Fomafix (I93efc190714ba76247d30ba49fc21ae872fc3555) for previous
attempts at this!

Also removed a fixme for the name of 'Twi', dating back to 2004
(f59c3be23b) -- checking
tw.wikipedia.org it certainly appears that the autonym of 'Twi'
is correctly 'Twi'.

Tracking bugs for invalid language codes are T125073 and T145535.
Discussion of zh-XX => zh-HanX-XX mapping is at T198419.

Bug: T34483
Bug: T106367
Bug: T120847
Change-Id: I807dd55d49e9bd19443329231326a5b0d3e6c453
2018-07-13 14:56:18 -04:00
jenkins-bot
8c96aec32c Merge "Fix the bug for dates between 1912 and 1941 in Thai language" 2018-07-10 08:55:56 +00:00
Kunal Mehta
4acb7ed51c Add @coversNothing to tests that don't cover specific PHP classes
Change-Id: Idbd364561bc28547e9fac20d7a80b9a44edf14a9
2018-06-12 13:27:40 -07:00
jenkins-bot
e602b197ab Merge "(y)etsin fixes, test refactoring, and misc fixes" 2018-06-08 20:46:12 +00:00
Bartosz Dziewoński
0313128b10 Use PHP 7 "\u{NNNN}" Unicode codepoint escapes in string literals
In cases where we're operating on text data (and not binary data),
use e.g. "\u{00A0}" to refer directly to the Unicode character
'NO-BREAK SPACE' instead of "\xc2\xa0" to specify the bytes C2h A0h
(which correspond to the UTF-8 encoding of that character). This
makes it easier to look up those mysterious sequences, as not all
are as recognizable as the no-break space.

This is not enforced by PHP, but I think we should write those in
uppercase and zero-padded to at least four characters, like the
Unicode standard does.

Note that not all "\xNN" escapes can be automatically replaced:
* We can't use Unicode escapes for binary data that is not UTF-8
  (e.g. in code converting from legacy encodings or testing the
  handling of invalid UTF-8 byte sequences).
* '\xNN' escapes in regular expressions in single-quoted strings
  are actually handled by PCRE and have to be dealt with carefully
  (those regexps should probably be changed to use the /u modifier).
* "\xNN" referring to ASCII characters ("\x7F" and lower) should
  probably be left as-is.

The replacements in this commit were done semi-manually by piping
the existing "\xNN" escapes through the following terrible Ruby
script I devised:

  chars = eval('"' + ARGV[0] + '"').force_encoding('utf-8')
  puts chars.split('').map{|char|
    '\\u{' + char.ord.to_s(16).upcase.rjust(4, '0') + '}'
  }.join('')

Change-Id: Idc3dee3a7fb5ebfaef395754d8859b18f1f8769a
2018-06-04 16:20:13 +00:00
Bartosz Dziewoński
4fd27f006f Use PHP 5.6 '**' operator instead of 'pow()' function
Change-Id: Ieb22e1dbfcffaa4e7b3dcfabbcc999e5dd59a4bf
2018-05-30 18:05:19 -07:00
tjones
669d1ed192 (y)etsin fixes, test refactoring, and misc fixes
* Fix etsin/етсин/этсин as noted in If933fc67845ac994d9ddfdf8349aff445ec9b13a
** only convert tsin to тсин and let the other rules sort out the e

* Refactor most tests to be word-specific, which uncovered a couple of
bugs in corner cases
** rol/üst prefix matches should match whole words (original [^ü] regex
assumed word could not be end of string

* Fixed incidental bugs I noticed while looking into the items above
** куркчи => kürkçi was in the wrong section
** cönk => джонк was in the right section, but reversed

* Added additional tests cases for all of the above.

Change-Id: Ia96be488a7b41c3ddba623b5c9262703b1c82687
2018-05-29 14:30:04 -04:00
tjones
cbb07cdc33 Crimean Tatar/crh transliteration odds and ends
* refactor '\b' into WB const to make it easy to update in the future
* add new ц-related exceptions

Bug: T193764
Change-Id: Ib707136f8f2598d1f8ec995bf129b436dfb53cd9
2018-05-22 14:59:55 -04:00
C. Scott Ananian
685eba4360 Minor fixes to CRH language conversion.
* Move a many-to-one mapping from the L2C to the C2L table where it
  belongs.

* Fix some regular expression patterns which ended up with misnumbered
  replacement strings.

* All regular expressions should have the `u` (unicode) flag set.

* Typo/spelling fixes in comments

Change-Id: If933fc67845ac994d9ddfdf8349aff445ec9b13a
2018-05-12 14:37:09 -04:00
superyetkin
3aaa2367b2 Fix the bug for dates between 1912 and 1941 in Thai language
Added an if-else block to see if the parameters passed to the function
designate a year between 1912 and 1941 or not. Resulting month values
are also adjusted.
Added a unit test for the related formatting.

Bug: T68648
Change-Id: Ic676b5c140de8878971a786a1a1811770a848016
2018-05-12 15:10:13 +00:00
tjones
14f8dc35db CRH Transliteration Pattern Matching Fixes
Refactor to match exceptions as patterns, not words
- break exception list to C2L and L2C pattern sets
- change main loop to break only on Roman numerals and transliterate
  everything else, rather than tokenizing on single-script words
  (this fixes the km² problem, too)
  - update word anchors from ^ and $ to \b
  - only process Roman numerals for L2C translit
  - add exception for single "Roman" character followed by a period
    which looks like an initial
- consolidate multi-step transliteration into regsConverter()
- remove regex support from main exception list to support strtr()
- re-organize some prefix/suffix/whole word patterns to the right place
- add tests for recently fixed use cases
- add support for many-to-one mappings in both directions
- update character classes, exception lists,  and regexes based on
  speaker feedback and example texts

Misc other fixes:
- fix some character classes errors
- remove unneeded character classes
- add tests for Roman numerals and quotes
- add tests for affixes and regexes

Bug: T188321
Bug: T189512
Change-Id: I056d36ff2b8f63b3998a5d3a442d8d539c15488d
2018-04-27 19:17:51 -04:00
jenkins-bot
a6abe2ad7a Merge "Add Russian grammar forms to support Wikiversity" 2018-03-14 08:37:27 +00:00
jenkins-bot
3c198b9dc8 Merge "Fix table loading bug for CRH transliteration" 2018-02-28 21:09:01 +00:00
tjones
70dede013c Fix table loading bug for CRH transliteration
In production, the regex and exception tables were not being loaded,
resulting in very poor transliteration. The loading has been moved to
the contructor, similar to the implementation of the Kazakh
transliteration.

Also, a bug in the mappings for Ö/ö -> Ё/ё and Ü/ü -> Ю/ю has been
fixed.

Test cases for specific additional examples have been added. (Though
it is worth noting that the regex and exception tables did load
properly during unit testing, so the problem wasn't caught there.)

Bug: T186727
Change-Id: I6bacee7d9de6f4a870a8a9ef1f04b819ad489c02
2018-02-26 13:22:04 -05:00