405 commits

Author SHA1 Message Date
Reedy
9dfc1941bc Remove PageContentSaveComplete hook subscriber that won't work.
Change-Id: Ib68ea80e9db85e79589699cbbc6a731bc37f52a3
2018-06-21 11:51:20 +01:00
Reedy
bea5b8526c Reduce indenting, remove else conditions
Change-Id: If39ed94f12108dee231ff99dbe740418d192f349
2018-06-21 11:50:27 +01:00
jenkins-bot
d8a144d38e Merge "languages: Use static array files for normalizer data"
2018-05-25 23:03:18 +00:00
Kunal Mehta
e0193327bd Fix MediaWiki.Commenting.LicenseComment.InvalidLicenseTag errors
Change-Id: I936c3f5fca1a0061f215e80469f5d882cb32ee29
2018-05-23 16:23:42 -07:00
Timo Tijhof
4f22361759 languages: Use static array files for normalizer data
This reduces the number of '.ser' files to 1 (we still have
first-letters-root.ser).

Change-Id: Ib0ee0d826da34b1825fd5bb74563c6bbadeec75c
2018-05-22 21:38:43 +00:00
jenkins-bot
47db328765 Merge "Minor fixes to CRH language conversion."
2018-05-22 18:47:13 +00:00
Kunal Mehta
230958d97c Autofix MediaWiki.Commenting.FunctionComment.SpacingDoc* errors
Change-Id: I63761ebce04c03b9b13237919c27cc10180f198f
2018-05-19 14:07:03 -07:00
C. Scott Ananian
685eba4360 Minor fixes to CRH language conversion.
* Move a many-to-one mapping from the L2C to the C2L table where it
  belongs.

* Fix some regular expression patterns which ended up with misnumbered
  replacement strings.

* All regular expressions should have the `u` (unicode) flag set.

* Typo/spelling fixes in comments

Change-Id: If933fc67845ac994d9ddfdf8349aff445ec9b13a
2018-05-12 14:37:09 -04:00
tjones
14f8dc35db CRH Transliteration Pattern Matching Fixes
Refactor to match exceptions as patterns, not words
- break exception list to C2L and L2C pattern sets
- change main loop to break only on Roman numerals and transliterate
  everything else, rather than tokenizing on single-script words
  (this fixes the km² problem, too)
  - update word anchors from ^ and $ to \b
  - only process Roman numerals for L2C translit
  - add exception for single "Roman" character followed by a period
    which looks like an initial
- consolidate multi-step transliteration into regsConverter()
- remove regex support from main exception list to support strtr()
- re-organize some prefix/suffix/whole word patterns to the right place
- add tests for recently fixed use cases
- add support for many-to-one mappings in both directions
- update character classes, exception lists, and regexes based on
  speaker feedback and example texts

Misc other fixes:
- fix some character class errors
- remove unneeded character classes
- add tests for Roman numerals and quotes
- add tests for affixes and regexes

Bug: T188321
Bug: T189512
Change-Id: I056d36ff2b8f63b3998a5d3a442d8d539c15488d
2018-04-27 19:17:51 -04:00
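The two mechanical ideas in 14f8dc35db (a regex-free exception list that can go through strtr(), and word anchors changed from ^ and $ to \b) can be sketched roughly as follows. This is a Python illustration, not the MediaWiki PHP code: the `strtr` helper only approximates PHP's array form of strtr(), and the tables are hypothetical toy data, not the real CRH mappings.

```python
import re


def strtr(text, table):
    """Approximate PHP's strtr(): longest keys first, output never rescanned."""
    keys = sorted((k for k in table if k), key=len, reverse=True)
    if not keys:
        return text
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: table[m.group(0)], text)


def apply_patterns(text, patterns):
    """Apply (regex, replacement) pairs in order across the whole text."""
    for pattern, replacement in patterns:
        text = re.sub(pattern, replacement, text)
    return text


# Hypothetical toy tables, not the real CRH data.
EXCEPTIONS = {"ab": "X", "abc": "Y"}
ANCHORED_OLD = [(r"^km$", "KM")]    # matches only when the whole input is "km"
ANCHORED_NEW = [(r"\bkm\b", "KM")]  # matches "km" as a word inside running text
```

With ^ and $ a pattern can only fire when the input has already been tokenized into single words; with \b the same pattern can be applied to untokenized running text, which is what allows the main loop to stop splitting on single-script words.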
C. Scott Ananian
2e7018e22c LanguageConverter tweaks to Pig Latin converter
Add apostrophe to the set of valid word characters for the en-x-piglatin
converter, so that "don't" and "can't" are properly converted to Pig
Latin (eg, to "on'tday" and "an'tcay").

Add an optional `s` before `qu` so that "squish" is properly converted
to "ishsquay".

Change-Id: Ibc5cf2c007a42d9447688b857aa75f9a3d8ae152
2018-04-01 23:50:58 -04:00
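A minimal Python sketch of the two tweaks in 2e7018e22c, assuming a regex-based converter. The patterns and function names here are illustrative and are not copied from EnConverter.

```python
import re

# A word starts with a letter and may contain apostrophes, so contractions
# such as "don't" are converted as one word.
WORD = re.compile(r"[A-Za-z][A-Za-z']*")

# Leading cluster: an optional "s" before "qu", otherwise a run of consonants.
LEAD = re.compile(r"^(s?qu|[^aeiou']*)(.*)$")


def convert_word(word):
    """Convert one lowercase word to Pig Latin."""
    if word[0] in "aeiou":
        return word + "way"
    head, tail = LEAD.match(word).groups()
    return tail + head + "ay"


def to_pig_latin(text):
    """Convert every word in the text, leaving other characters untouched."""
    return WORD.sub(lambda m: convert_word(m.group(0)), text)
```

For simplicity the sketch handles lowercase input only.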
C. Scott Ananian
ee90bd4c5c Make LanguageConverter roman-numeral cases consistent
Add a look-ahead to ensure that the regex intended to match roman numerals
doesn't also match the empty string.  Tweak the regular expressions slightly
to ensure that Sr/Ku/Crh all have identical regular expressions.

Change-Id: If43bf99a21c42c6c5050f814c0bc99edec353228
2018-04-01 23:49:19 -04:00
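The empty-match problem addressed in ee90bd4c5c can be illustrated with a generic Roman numeral pattern. The regular expressions below are a common textbook form written in Python, assumed for illustration only; they are not the expressions used in the Sr/Ku/Crh converters.

```python
import re

# Every component is optional, so this pattern also matches the empty string.
UNGUARDED = re.compile(
    r"^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$"
)

# The look-ahead requires at least one numeral letter before anything else
# is tried, which rules out the empty match.
GUARDED = re.compile(
    r"^(?=[MDCLXVI])M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$"
)


def is_roman(text):
    """Return True if text is a well-formed, non-empty Roman numeral."""
    return GUARDED.match(text) is not None
```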
tjones
70dede013c Fix table loading bug for CRH transliteration
In production, the regex and exception tables were not being loaded,
resulting in very poor transliteration. The loading has been moved to
the constructor, similar to the implementation of the Kazakh
transliteration.

Also, a bug in the mappings for Ö/ö -> Ё/ё and Ü/ü -> Ю/ю has been
fixed.

Test cases for specific additional examples have been added. (Though
it is worth noting that the regex and exception tables did load
properly during unit testing, so the problem wasn't caught there.)

Bug: T186727
Change-Id: I6bacee7d9de6f4a870a8a9ef1f04b819ad489c02
2018-02-26 13:22:04 -05:00
Bartosz Dziewoński
eb6bb6b7b9 Generalize non-digit-grouping of four-digit numbers
In some languages it's conventional not to insert a thousands
separator in numbers that are four digits long (1000-9999).
Rather than copy-paste the custom code to do this between 13 files,
introduce another option and have the base Language class handle it.

This also fixes an issue in several languages where this logic
previously would not work for negative or fractional numbers.

To implement this, a new option is added to MessagesXx.php files,
`$minimumGroupingDigits = 2;`, with the meaning as defined in
<http://unicode.org/reports/tr35/tr35-numbers.html>. It is a little
roundabout, but it could allow us to migrate the number formatting
(currently all custom code) to some generic library easily.

Bug: T177846
Change-Id: Iedd8de5648cf2de1c94044918626de2f96365d48
2018-01-02 11:17:25 +01:00
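As a rough sketch of what `$minimumGroupingDigits = 2;` means in eb6bb6b7b9: a separator is inserted only when at least that many digits would sit before the first separator, so with groups of three, four-digit numbers stay ungrouped. The Python function below is an illustration under that reading of TR35; its name and parameters are hypothetical and it is not the Language class implementation.

```python
def format_number(value, minimum_grouping_digits=1, separator=","):
    """Group the integer part in threes, honouring a minimum leading group.

    Works on plain ints and simple floats; values whose str() uses
    exponent notation are out of scope for this sketch.
    """
    sign = "-" if value < 0 else ""
    integer, dot, fraction = str(abs(value)).partition(".")
    # Group only if the leading group would hold enough digits.
    if len(integer) >= 3 + minimum_grouping_digits:
        groups = []
        while integer:
            groups.insert(0, integer[-3:])
            integer = integer[:-3]
        integer = separator.join(groups)
    return sign + integer + dot + fraction
```

Because the sign and the fractional part are split off before counting digits, negative and fractional numbers follow the same rule as plain integers.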
Umherirrender
255d76f2a1 build: Updating mediawiki/mediawiki-codesniffer to 15.0.0
Clean up use of @codingStandardsIgnore
- @codingStandardsIgnoreFile -> phpcs:ignoreFile
- @codingStandardsIgnoreLine -> phpcs:ignore
- @codingStandardsIgnoreStart -> phpcs:disable
- @codingStandardsIgnoreEnd -> phpcs:enable

For phpcs:disable, the necessary sniffs are always provided.
Some start/end pairs are changed to line ignores.

Change-Id: I92ef235849bcc349c69e53504e664a155dd162c8
2018-01-01 14:10:16 +01:00
Huji Lee
e74bfe13f6 Require indentation of CASE statements in PHP code
Bug: T182546
Change-Id: I91a9555893a08e4ec58da97c6cc4d1e70000ff6b
2017-12-10 22:07:50 -05:00
Tim Starling
dc2948d76d A few doc comment fixups
* Remove some creation dates, they are not protected by GPL
* Remove duplicate @defgroup API
* Remove @ingroup from some @file doc comments on class files. It is not
  useful to list class files alongside classes in the doxygen module menu.
  Add @ingroup to some more class files that had @ingroup on their file;
  that was probably the author's intent.
* In PackedOverlayImageGallery, use the file comment as a class comment
* Don't put @defgroup and @file in the same comment. @defgroup makes the
  whole doc comment describe the group.
* Instead of putting AnsiTermColorer in two groups, use hierarchical
  groups.

Change-Id: If54f6e0b2bc1ea6de42045885cf836ee67b8e961
2017-12-04 11:11:52 +11:00
tjones
a0b511319c Crimean Tatar Transliteration
This is a first pass at Latin/Cyrillic transliteration for Crimean
Tatar (crh).

Includes transliteration tables, prefix/suffix mappings, regex
mappings, and exceptions lists for words and abbreviations.

Regularize CRH language name in messages/* files.

Fix "varient" typos in qqq.json.

Add unit tests for CRH transliteration.

Bug: T23582
Change-Id: I424703f99adf837f6217872b882d1ea26bfdd068
2017-11-20 16:56:38 -05:00
jenkins-bot
84b6d5c2e5 Merge "Add missing type to @param documentation"
2017-08-11 21:31:51 +00:00
WMDE-Fisch
6df9ed1ad6 update mediawiki-codesniffer to 0.11.0 and fix issues
- mostly auto fixes
- some too long lines fixed
- ignore amp space in one case of passing by reference

Change-Id: I6472f83bc3cbf4bd629d83050cc3319b19ec465c
2017-08-11 22:27:51 +02:00
Umherirrender
5544cef16b Add missing type to @param documentation
Change-Id: I6b2c9c7af9a281fe457099cc3a336a60a25e74aa
2017-08-11 20:37:35 +02:00
Umherirrender
b5cddfb27b Remove empty lines at begin of function, if, foreach, switch
Organize phpcs.xml a bit

Change-Id: Ifb767729b481b4b686e6d6444cf48b1f580cc478
2017-07-01 11:34:16 +00:00
Liangent
d8375bee24 New language variant 'en-x-piglatin' for easier variant testing
Guarded by the $wgUsePigLatinVariant variable, off by default.

Pig Latin is a language game where words in English are altered
according to the following rules:

* Words starting with a vowel have a '-way' suffix appended.
* Words starting with a consonant have the initial consonants (or 'qu'
  group) moved to the end and an '-ay' suffix appended.

https://en.wikipedia.org/wiki/Pig_Latin

* Added 'en-x-piglatin' as a language name.
* Added 'en' to LanguageConverter::$languagesWithVariants.
* Added LanguageEn class and its corresponding EnConverter which
  provides one-way translation from English to Pig Latin.
* Some minor internal changes in code that assumed that English
  doesn't have a language class or converter.

Bug: T45547
Depends-On: I1d9691c784032669979f8109c9a5f65cbf4122c9
Change-Id: I7fa2d85d6364958c5138366e8b4504a2697a8731
2017-06-12 16:59:57 -04:00
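The two rules listed in d8375bee24 can be sketched for a single word as follows. This is a Python illustration of the rules as stated in the commit message, not the EnConverter code; the function name is hypothetical.

```python
VOWELS = "aeiou"


def pig_latin_word(word):
    """Apply the two Pig Latin rules to one word."""
    if not word:
        return word
    lower = word.lower()
    # Rule 1: words starting with a vowel get a '-way' suffix.
    if lower[0] in VOWELS:
        return word + "way"
    # Rule 2: move the initial 'qu' group or consonant run to the end
    # and append '-ay'.
    if lower.startswith("qu"):
        split = 2
    else:
        split = 0
        while split < len(lower) and lower[split] not in VOWELS:
            split += 1
    return word[split:] + word[:split] + "ay"
```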
Antoine Musso
1819a85bed Check for string initialization in lcfirst() for HHVM 3.18
HHVM 3.18 emits a notice when attempting to access the first offset of
an empty string.  We had that fixed for ucfirst() in 3605066c96. This is
the same for lcfirst().

Bug: T161095
Change-Id: I1456611222c24290f259298e883ca89dd830c74b
2017-03-24 15:19:36 +01:00
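The guard described in 1819a85bed amounts to returning early for an empty string before touching its first offset. A Python analogue, where indexing an empty string raises IndexError rather than emitting a notice, might look like this (illustrative only, not the Language::lcfirst() code):

```python
def lcfirst(text):
    """Lower-case the first character, leaving the rest unchanged."""
    # Guard first: text[0] is not defined for an empty string.
    if not text:
        return text
    return text[0].lower() + text[1:]
```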
Andre Klapper
3605066c96 Check for string initialization in ucfirst() to make HHVM 3.18 happy
Bug: T161095
Change-Id: I45b5d9e819061f443d4342c004bad80bd87c2a17
2017-03-22 13:17:11 +01:00
jenkins-bot
aa3319c4c0 Merge "Miscellaneous indentation tweaks"
2017-02-28 18:38:36 +00:00
James D. Forrester
3526417586 languages: Replace implicit Bugzilla bug numbers with Phab ones
It's unreasonable to expect newbies to know that "bug 12345" means "Task T14345"
except where it doesn't, so let's just standardise on the real numbers.

Change-Id: Id2f9d229d17b8eee66b2ca4e3927f3f66ac62988
2017-02-28 00:33:38 +00:00
Bartosz Dziewoński
ecdef925bb Miscellaneous indentation tweaks
I was bored. What? Don't look at me that way.

I mostly targeted mixed tabs and spaces, but others were not spared.
Note that some of the whitespace changes are inside HTML output,
extended regexps or SQL snippets.

Change-Id: Ie206cc946459f6befcfc2d520e35ad3ea3c0f1e0
2017-02-27 19:23:54 +01:00
Bartosz Dziewoński
01936fa994 BlockLogFormatter: Durations are relative to block's timestamp, not Unix epoch
Also fixed legacy code in LogFormatter producing messages for IRC feed.

Bug: T55907
Change-Id: I0df19574f74210a91ce72c79188b6618f04ef9a2
2017-01-18 13:21:56 +00:00
Leszek Manicki
cb4bb23df6 Remove unused static methods in LanguageConverter subclasses
It seems LanguageConverter::parseManualRule was removed by
69dbeb97f1 (2008),
and LanguageConverter::parserConvert by
c568220e61 (2010),
so it seems safe and reasonable to remove their implementations
from the few remaining language-specific Converter classes.

Change-Id: I7092f5c8856723fabd2b1f99944451344feb5711
2017-01-03 15:13:51 +01:00
Amir E. Aharoni
58612ee9fa Move the Ukrainian grammar rules from PHP and JS to JSON
Bug: T115217
Change-Id: I15a06b07e381cc9074e64e746d22ec51e9e638c4
2016-12-18 11:10:16 +00:00
Amir E. Aharoni
6b03e2e88e Make the code for grammar data processing common
This makes the code for processing JSON files with
grammar transformations reusable by different languages
and applies the same logic to Russian and Hebrew.
It will be done to other languages in further patches.

This patch is not supposed to change any functionality,
and the tests are intact (except a comment in the test
for Hebrew - the class doesn't exist any longer).

PHP:
* Move the JSON grammar transformation data processing logic
  from LanguageRu.php to convertGrammar() in Language.php.
  By default all these data files are supposed to be
  processed identically, so the code should be common.
  If there is no JSON data file, nothing new happens.
* LanguageRu's own convertGrammar() method is removed.
* The LanguageHe class is removed, now that all its functionality
  is handled by generic JSON data processing in the Language class.
  LanguageHe.php file is removed from the repo and from autoloading.

JavaScript:
* Move the JSON grammar transformation data processing logic
  from ru.js to mediawiki.language.js.
* JavaScript grammar code files he.js and ru.js are removed
  from the repo and from Resources.php, because all the data
  is in JSON, and the default logic in mediawiki.language.js
  works for both languages.

Bug: T115217
Change-Id: I5e75467121c3d791bb84f9e6fdfcf07c1840f81a
2016-12-16 15:52:14 +02:00
umherirrender
34fe90ac52 Remove empty lines at end of functions
It looks like there is something missing after the last statement.
Also remove some other empty lines at the beginning of functions, ifs or loops
while touching these files.

Change-Id: Ib00b5cfd31ca4dcd0c32ce33754d3c80bae70641
2016-11-05 11:55:10 +01:00
Amir E. Aharoni
df5a848de8 Make grammar data loadable as an RL module and usable in JS
* Load the data of this variable from a JSON file to the same
  data structure that ResourceLoader uses for digitTransformTable,
  pluralRules, etc.
* Change the JSON structure to ensure the order of the rules.
  Otherwise JavaScript processes the keys in a random order.
* Delete the grammar code from JS and replace it with
  the same logic that is used in PHP for processing the data.

For now this is done only for Russian.

The next step will be to make the PHP and JS
data processing logic reusable.

Bug: T115217
Change-Id: I6b9b29b7017f958d62611671be017f97cee73415
2016-10-21 12:25:16 -07:00
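Why df5a848de8 changes the JSON structure to fix the rule order can be sketched as follows: if the rules are stored as an ordered list of [pattern, replacement] pairs and the first matching rule wins, a more specific rule must come before a more general one. The Python code and the rule data below are hypothetical illustrations, assuming first-match-wins processing; they are not the real Russian grammar file or the MediaWiki implementation.

```python
import json
import re

# Hypothetical rule data: a list of pairs keeps its order in JSON.
GRAMMAR_JSON = """
{
  "genitive": [
    ["ia$", "ii"],
    ["a$", "y"]
  ]
}
"""

RULES = json.loads(GRAMMAR_JSON)


def convert_grammar(word, case, transformations):
    """Apply the first rule for the given case whose pattern matches."""
    for pattern, replacement in transformations.get(case, []):
        if re.search(pattern, word):
            return re.sub(pattern, replacement, word, count=1)
    # No rule for this case, or no pattern matched.
    return word
```

With the rules reversed, the general `a$` rule would shadow `ia$` and "Wikipedia" would become "Wikipediy" instead of "Wikipedii".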
Fomafix
7de07e8991 Update weblinks in comments from HTTP to HTTPS
Use HTTPS instead of HTTP where the HTTP link is a redirect to the HTTPS link.

Change-Id: I06d9e043730accc4ae71b927e0f8229f0fc3b340
2016-10-11 17:25:10 +00:00
Amir Sarabadani
9850c542c6 Clean up array() syntax in docs, part VII
Last part

Change-Id: I38f015e2122ef4fd2d2141718bd889794c29f06c
2016-09-27 06:53:25 +03:30
Brion Vibber
3b5f60f2c8 Remove old Esperanto character conversion support
Deletes LanguageEo.php class which only had remains of the server-side
character conversion (sx <-> ŝ, etc). This is being obsoleted in favor
of client-side IMEs provided by UniversalLanguageSelector extension.

Removes deprecated $wgEditEncoding, which was only used for this.

Turns Language::recodeInput() and Language::recodeForEdit() into no-ops
for any old or extension code that happened to still use them.

Bug: T62677
Change-Id: Ib647353538d258dee941f2f7c571191060bc9c7d
2016-07-18 19:20:49 +00:00
Ricordisamoa
cd7da66fb5 Add missing 'public' keywords to some more Language methods
All of them are already being used outside the class:
* getMonthAbbreviation
* getMonthAbbreviationsArray
* getWeekdayName
* sprintfDate
* userAdjust
* date
* time
* timeanddate
* getMessage
* iconv
* ucfirst
* uc

Change-Id: I63ec93858cebc02cdf3b9b042eddf4ef620cc110
2016-05-04 16:02:39 +02:00
jenkins-bot
bb333d91a4 Merge "Show absolute block expiry in user timezone on block logs"
2016-04-09 19:37:29 +00:00
umherirrender
afd72226b8 Show absolute block expiry in user timezone on block logs
For this, add a user parameter to Language::translateBlockExpiry.
This allows the function to display the absolute block expiry in the
user's time zone. Use this when formatting block log entries.

This also avoids the use of $wgUser.

Bug: T131241
Change-Id: If0a1d3c88bb4242a016eb9b2df115413de786149
2016-04-09 20:57:33 +02:00
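The idea behind afd72226b8, passing the viewing user's time zone into the formatter instead of reading a global, can be sketched in Python as below. The function name, the minute-offset parameter, and the output format are assumptions made for illustration; they do not reflect the signature of Language::translateBlockExpiry.

```python
from datetime import datetime, timedelta, timezone


def format_block_expiry(expiry_utc, user_offset_minutes):
    """Render an absolute UTC expiry in the given user's time zone."""
    user_tz = timezone(timedelta(minutes=user_offset_minutes))
    return expiry_utc.astimezone(user_tz).strftime("%Y-%m-%d %H:%M")
```

Taking the offset as an explicit argument keeps the function free of global state, so the same log entry can be formatted for different viewers.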
Kevin Israel
5b48bf1b92 Clean up after "Kill mbstring fallbacks"
* Removed fallback code from Language, the associated data file
  (Utf8Case.ser), and the code to generate that data file.
* Removed comment in LanguageFi that "mb_substr has a compatibility
  function in GlobalFunctions.php".
* Removed check for mbstring in bench_utf8_title_check.php.
* In the tests for StringUtils::isUtf8():
  * Removed separate test for the non-mbstring code path.
  * Removed mentions of mbstring from function names and assertion
    messages, since mb_check_encoding() is now always used.
* Also updated the comment in StringUtils::isUtf8() referring to
  PHP 5.3, which is no longer supported in MediaWiki, to indicate
  that the same issue also exists in old versions of HHVM. (If
  we don't have to support 3.4 or older, then the function could
  be deprecated and removed if desired.)

Follows-up 943563062f.

Change-Id: I55e5cd534b849c6ea06a7fadacbbf34a12d87ebe
2016-04-07 09:02:37 -04:00
Siebrand Mazeland
5b119a0e44 Replace uses of join() by implode()
All of core uses implode() consistently now.

Change-Id: Iba50898c64c43f356d1caf8869f484e90d9ff651
2016-03-08 18:24:16 +00:00
Kunal Mehta
6e9b4f0e9c Convert all array() syntax to []
Per wikitech-l consensus:
 https://lists.wikimedia.org/pipermail/wikitech-l/2016-February/084821.html

Notes:
* Disabled CallTimePassByReference due to false positives (T127163)

Change-Id: I2c8ce713ce6600a0bb7bf67537c87044c7a45c4b
2016-02-17 01:33:00 -08:00
jenkins-bot
d4ecfc1a5c Merge "Don't modify $wgHooks on language object construction"
2016-02-11 04:05:56 +00:00
Tim Starling
f43e0d840f Use autoloader for PHP data files instead of include/require
Move ZhConversion.php and Names.php to languages/data and make them both
expose their data as static class variables instead of in the local
scope. This means that the autoloader can be used to load the data,
which is efficient and secure. This also makes additional request-local
caching of the arrays unnecessary.

Change-Id: Iafb96ac4165d0965fcb9a69f1d0a91139ea9790c
2016-01-30 13:08:46 +11:00
Tim Starling
059fd9a2ae Don't modify $wgHooks on language object construction
Previously various language objects would install a hook to update the
shared conversion table cache when the object was constructed. This is
not a good idea since language objects may be constructed even when they
are not the content language, but only the content language is
associated with variant conversion and the conversion cache.

Instead, have WikiPage call a method on $wgContLang directly. I put this
with message cache update since the logic is almost identical.

Change-Id: Ief9c0ef993e39645e74a6e158cb4e6e2139ce91d
2016-01-29 15:03:56 +11:00
Tim Starling
e0a74a848c Remove require_once for language classes
Remove require_once for LanguageConverter and base classes. These
are in the autoloader now, so an explicit require is no longer
necessary.

Change-Id: Ie34ffc58fd9ec89fb57cf077dd5ac1746c35c48e
2016-01-29 10:05:25 +11:00
jenkins-bot
f56a905990 Merge "Fix declension in grammar rules for Latin language"
2016-01-26 10:34:31 +00:00
Paladox
2b61957cfe build: Update mediawiki/mediawiki-codesniffer to 0.5.1
Two rules are ignored for now to allow us to upgrade:
* MediaWiki.ControlStructures.AssignmentInControlStructures.AssignmentInControlStructures
* Generic.ControlStructures.InlineControlStructure.NotAllowed

Also ignore the .git folder.

Change-Id: I1b149c72b27be54e22e369999ad0c41c2d1fc2b4
2016-01-02 09:50:09 +00:00
Étienne Beaulé
8ce92ee8d9 Fix declension in grammar rules for Latin language
On behalf of User:UV on la.wiki.

Bug: T122022
Change-Id: Icc24b29558947989dc35468ea0f6e1741824cb58
2015-12-21 15:46:49 -04:00
Timo Tijhof
86b4704d09 languages: Avoid getPreferredVariant() in ucfirst/lcfirst unless needed
This method calls out to LanguageConverter, which involves the User,
Request, and additional validation.

Change-Id: I3edae1244073767a8d8888708024bb5498c70dc9
2015-11-10 00:47:18 +00:00