Thijs/wiki.techinc.nl

Author	SHA1	Message	Date
C. Scott Ananian	fcbde8ae4e	Make Language::hasVariant() more strict In `d59f27aeab` we made LanguageConverter::validateVariant() try harder to convert a variant into an acceptable MediaWiki-internal form, looking at deprecated codes and BCP 47 aliases. However, this misled Language::hasVariant() into thinking that bogus names (like all-uppercase strings) were acceptable variant names, which then led to exceptions when they were passed to the various conversion methods. This is a belt-and-suspenders patch for T207433 -- in that case we shouldn't have created a Language object with code 'sr-cyrl' in the first place, but once one was created we shouldn't have tried to ask LanguageSr to convert texts to 'sr-cyrl'. The latter problem is fixed by this patch. Bug: T207433 Change-Id: Id993bc7989144b5031a551662e8e492bd23f698a	2018-10-22 16:35:26 -04:00
C. Scott Ananian	103a4f76dc	Deprecate $wgFixArabicUnicode / $wgFixMalayalamUnicode These were introduced in MW 1.17 and are always true in production. They were useful to allow folks to defer title conversion, but it's been a long time now. We don't need to make this optional any more. Change-Id: I65dcfe80dc3e1dfeb4d63924a8928655e012a20c	2018-10-21 21:55:39 -04:00
Fomafix	5632815976	Write Latin and other scripts with capital letter Change-Id: I16c660e54191b63cd6eb3407cb00504665930c4e	2018-10-05 18:49:08 +02:00
Timo Tijhof	dbe89abb9e	languages: Add coverage for 'ar' and 'ml' normalize() * Exclude the data files from PHPUnit coverage. * Add tests covering the normalize() implementations. * Fix a small todo about using data providers. * Set explicit visibility. Change-Id: Ib104cc3215a36901cff853ad5969d92a6e0cf6a0	2018-08-14 23:19:35 +00:00
tjones	669d1ed192	(y)etsin fixes, test refactoring, and misc fixes * Fix etsin/етсин/этсин as noted in If933fc67845ac994d9ddfdf8349aff445ec9b13a ** only convert tsin to тсин and let the other rules sort out the e * Refactor most tests to be word-specific, which uncovered a couple of bugs in corner cases ** rol/üst prefix matches should match whole words (original [^ü] regex assumed word could not be end of string) * Fixed incidental bugs I noticed while looking into the items above куркчи => kürkçi was in the wrong section cönk => джонк was in the right section, but reversed * Added additional test cases for all of the above. Change-Id: Ia96be488a7b41c3ddba623b5c9262703b1c82687	2018-05-29 14:30:04 -04:00
tjones	cbb07cdc33	Crimean Tatar/crh transliteration odds and ends * refactor '\b' into WB const to make it easy to update in the future * add new ц-related exceptions Bug: T193764 Change-Id: Ib707136f8f2598d1f8ec995bf129b436dfb53cd9	2018-05-22 14:59:55 -04:00
C. Scott Ananian	685eba4360	Minor fixes to CRH language conversion. * Move a many-to-one mapping from the L2C to the C2L table where it belongs. * Fix some regular expression patterns which ended up with misnumbered replacement strings. * All regular expressions should have the `u` (unicode) flag set. * Typo/spelling fixes in comments Change-Id: If933fc67845ac994d9ddfdf8349aff445ec9b13a	2018-05-12 14:37:09 -04:00
tjones	14f8dc35db	CRH Transliteration Pattern Matching Fixes Refactor to match exceptions as patterns, not words - break exception list to C2L and L2C pattern sets - change main loop to break only on Roman numerals and transliterate everything else, rather than tokenizing on single-script words (this fixes the km² problem, too) - update word anchors from ^ and $ to \b - only process Roman numerals for L2C translit - add exception for single "Roman" character followed by a period which looks like an initial - consolidate multi-step transliteration into regsConverter() - remove regex support from main exception list to support strtr() - re-organize some prefix/suffix/whole word patterns to the right place - add tests for recently fixed use cases - add support for many-to-one mappings in both directions - update character classes, exception lists, and regexes based on speaker feedback and example texts Misc other fixes: - fix some character class errors - remove unneeded character classes - add tests for Roman numerals and quotes - add tests for affixes and regexes Bug: T188321 Bug: T189512 Change-Id: I056d36ff2b8f63b3998a5d3a442d8d539c15488d	2018-04-27 19:17:51 -04:00
jenkins-bot	a6abe2ad7a	Merge "Add Russian grammar forms to support Wikiversity"	2018-03-14 08:37:27 +00:00
tjones	70dede013c	Fix table loading bug for CRH transliteration In production, the regex and exception tables were not being loaded, resulting in very poor transliteration. The loading has been moved to the constructor, similar to the implementation of the Kazakh transliteration. Also, a bug in the mappings for Ö/ö -> Ё/ё and Ü/ü -> Ю/ю has been fixed. Test cases for specific additional examples have been added. (Though it is worth noting that the regex and exception tables did load properly during unit testing, so the problem wasn't caught there.) Bug: T186727 Change-Id: I6bacee7d9de6f4a870a8a9ef1f04b819ad489c02	2018-02-26 13:22:04 -05:00
Amire80	398e2a7c9d	Add Russian grammar forms to support Wikiversity Change-Id: I70fcb03db62307116ec96d4c242e6796534b57a1	2018-02-26 14:18:01 +02:00
Bartosz Dziewoński	eb6bb6b7b9	Generalize non-digit-grouping of four-digit numbers In some languages it's conventional not to insert a thousands separator in numbers that are four digits long (1000-9999). Rather than copy-paste the custom code to do this between 13 files, introduce another option and have the base Language class handle it. This also fixes an issue in several languages where this logic previously would not work for negative or fractional numbers. To implement this, a new option is added to MessagesXx.php files, `$minimumGroupingDigits = 2;`, with the meaning as defined in <http://unicode.org/reports/tr35/tr35-numbers.html>. It is a little roundabout, but it could allow us to migrate the number formatting (currently all custom code) to some generic library easily. Bug: T177846 Change-Id: Iedd8de5648cf2de1c94044918626de2f96365d48	2018-01-02 11:17:25 +01:00
Umherirrender	255d76f2a1	build: Updating mediawiki/mediawiki-codesniffer to 15.0.0 Clean up use of @codingStandardsIgnore - @codingStandardsIgnoreFile -> phpcs:ignoreFile - @codingStandardsIgnoreLine -> phpcs:ignore - @codingStandardsIgnoreStart -> phpcs:disable - @codingStandardsIgnoreEnd -> phpcs:enable For phpcs:disable always the necessary sniffs are provided. Some start/end pairs are changed to line ignore Change-Id: I92ef235849bcc349c69e53504e664a155dd162c8	2018-01-01 14:10:16 +01:00
Kunal Mehta	fc23633035	Add @covers tags to languages tests I removed comments that merely repeated the location of the class being tested. There are other tests in this directory that don't have a corresponding class and need further investigation. Change-Id: Ic16f0887b5030ac53fab4382cfaedfb5426cdb08	2017-12-28 08:52:56 +00:00
tjones	a0b511319c	Crimean Tatar Transliteration This is a first pass at Latin/Cyrillic transliteration for Crimean Tatar (crh). Includes transliteration tables, prefix/suffix mappings, regex mappings, and exceptions lists for words and abbreviations. Regularize CRH language name in messages/* files. Fix "varient" typos in qqq.json. Add unit tests for CRH transliteration. Bug: T23582 Change-Id: I424703f99adf837f6217872b882d1ea26bfdd068	2017-11-20 16:56:38 -05:00
Bartosz Dziewoński	3f62813c51	Add test cases for digit grouping (commafy) in Polish According to the typographical convention, a thousands separator should not be inserted in numbers that are four digits long (between 1000 and 9999), unlike in English where it's usually acceptable. This logic is currently implemented in LanguagePl::commafy(). Bug: T177846 Change-Id: I6dbd8febcf59000067cdd7d3c11111f2f77f4e66	2017-10-10 22:52:11 +02:00
James D. Forrester	1e9c361960	tests: Replace implicit Bugzilla bug numbers with Phab ones It's unreasonable to expect newbies to know that "bug 12345" means "Task T14345" except where it doesn't, so let's just standardise on the real numbers. Change-Id: I46261416f7603558dceb76ebe695a5cac274e417	2017-02-21 02:14:34 +00:00
Amir E. Aharoni	6b03e2e88e	Make the code for grammar data processing common This makes the code for processing JSON files with grammar transformations reusable by different languages and applies the same logic to Russian and Hebrew. It will be done to other languages in further patches. This patch is not supposed to change any functionality, and the tests are intact (except a comment in the test for Hebrew - the class doesn't exist any longer). PHP: * Move the JSON grammar transformation data processing logic from LanguageRu.php to convertGrammar() in Language.php. By default all these data files are supposed to be processed identically, so the code should be common. If there is no JSON data file, nothing new happens. * LanguageRu's own convertGrammar() method is removed. * The LanguageHe class is removed, now that all its functionality is handled by generic JSON data processing in the Language class. LanguageHe.php file is removed from the repo and from autoloading. JavaScript: * Move the JSON grammar transformation data processing logic from ru.js to mediawiki.language.js. * JavaScript grammar code files he.js and ru.js are removed from the repo and from Resources.php, because all the data is in JSON, and the default logic in mediawiki.language.js works for both languages. Bug: T115217 Change-Id: I5e75467121c3d791bb84f9e6fdfcf07c1840f81a	2016-12-16 15:52:14 +02:00
Fomafix	7de07e8991	Update weblinks in comments from HTTP to HTTPS Use HTTPS instead of HTTP where the HTTP link is a redirect to the HTTPS link. Change-Id: I06d9e043730accc4ae71b927e0f8229f0fc3b340	2016-10-11 17:25:10 +00:00
Kunal Mehta	6e9b4f0e9c	Convert all array() syntax to [] Per wikitech-l consensus: https://lists.wikimedia.org/pipermail/wikitech-l/2016-February/084821.html Notes: * Disabled CallTimePassByReference due to false positives (T127163) Change-Id: I2c8ce713ce6600a0bb7bf67537c87044c7a45c4b	2016-02-17 01:33:00 -08:00
Tim Starling	f0ba7a69a1	Add tests for LanguageConverter classes that didn't have them Some of them don't have many test cases, or have test cases that don't represent the ideal transliteration and so are subject to change. But this is better than nothing. Change-Id: I4aae693bd77d9ff365f48113923ed7f9fed8d668	2016-02-08 09:19:25 +11:00
jenkins-bot	88081365b3	Merge "Add new grammar forms for language names in Russian"	2015-09-28 13:41:33 +00:00
Amir E. Aharoni	8b0c0b49ce	Add new grammar forms for language names in Russian CLDR provides translated language names. They are useful for showing names by themselves in menus and lists, but it's often problematic to add them to Russian sentences, because they need to be declined, so a message like "This page is not available in the $1 language" is hard to localize. This patch adds new cases for Russian - "languagegen", "languageprep" and "languageadverb". (The last one, as its name says, is not actually a grammatical case, but a transformation to an adverbial expression.) This covers most of the needs for language names that MediaWiki supports. Change-Id: Ib6a0afa5c3736f8b9b2e121cd752c53ee50fad75	2015-09-28 15:51:24 +03:00
Amir E. Aharoni	b175f585db	Update Ukrainian grammar rules and tests * Fix the '-ти' rule to match the name of Wikiquote. * Add tests for '-ти' and '-ник' rules. * Remove the '-ь' and '-ка' rules, which were copied from Russian and are not used in Ukrainian, and remove their tests as well. * Remove non-implemented ("stub") cases. * Cleanup the code of commafy(). Change-Id: I98647ceb8806d845f3c8150b92a5d9f7fe5866f2	2015-09-27 15:21:49 +03:00
Amir E. Aharoni	5ccbaf2c48	Update grammar rules and test for Ukrainian The grammar rules for Ukrainian have several mistakes. This is the first in a series of commits that fix this. * Add grammar tests for PHP. There weren't any tests at all, and now there are some. No tests are added for rules that are wrong and irrelevant and will be removed in subsequent commits. * Add tests for JavaScript, and update a grammar rule that was incorrectly copied from Russian. Change-Id: I6de4581e2908eba39b33a13b07d048a34a3bd803	2015-09-27 11:49:07 +03:00
Vivek Ghaisas	c54766586a	Fix issues identified by SpaceBeforeSingleLineComment sniff Change-Id: I048ccb1fa260e4b7152ca5f09b053defdd72d8f9	2015-09-26 23:06:52 +00:00
Vivek Ghaisas	9f5b6f5aeb	Fix whitespace issues around parentheses Fix issues found by MediaWiki.WhiteSpace.SpaceyParenthesis sniff. Bug: T102617 Change-Id: Iec7f71e64081659fba373ec20d9d2006306a98f4	2015-06-16 22:14:02 +03:00
umherirrender	0d39b3bb0d	Move Test files under same folder structure where class is (/languages/) Change-Id: I25c99272a1c2e318e6c61b4a497bf04886430e9b	2015-01-10 19:53:59 +00:00

28 commits
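
Commit eb6bb6b7b9 describes a `$minimumGroupingDigits = 2;` option under which four-digit numbers (1000-9999) get no thousands separator, including negative and fractional values. The sketch below illustrates that rule in Python rather than in MediaWiki's PHP. The function name `commafy` is borrowed from the log, but its signature, the fixed group size of 3, and the string-based input are assumptions made for illustration, not the actual Language class code.

```python
def commafy(number: str, separator: str = ",", minimum_grouping_digits: int = 1) -> str:
    """Group the integer part of a decimal string into threes.

    Hypothetical helper: grouping is applied only when the integer part has
    at least 3 + minimum_grouping_digits digits, which is how this sketch
    reads the minimumGroupingDigits rule from the commit message.
    """
    # Split off an optional leading minus sign and the fractional part,
    # so that negative and fractional numbers follow the same rule.
    sign = "-" if number.startswith("-") else ""
    integer, dot, fraction = number[len(sign):].partition(".")

    # Too few integer digits: return the input unchanged (e.g. 1234 with a minimum of 2).
    if len(integer) < 3 + minimum_grouping_digits:
        return number

    # The first group holds the 1-3 leftover digits; the rest are full groups of three.
    head = len(integer) % 3 or 3
    groups = [integer[:head]] + [integer[i:i + 3] for i in range(head, len(integer), 3)]
    return sign + separator.join(groups) + dot + fraction
```

With the default of 1, `commafy("1234")` groups the number as usual. Passing `minimum_grouping_digits=2` leaves `"1234"` and `"-1234.5"` untouched while still grouping `"12345"`.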