Thijs/wiki.techinc.nl

Author	SHA1	Message	Date
Reedy	f44c1ff158	Add ICU mapping for versions 62 and 63 Change-Id: I5e1238e856d4149c30806e6b2cb3619c0c9c1dbf	2018-10-18 20:03:59 +01:00
Fomafix	5632815976	Write Latin and other scripts with capital letter Change-Id: I16c660e54191b63cd6eb3407cb00504665930c4e	2018-10-05 18:49:08 +02:00
Pikne	9c46871d60	Remove xx-uca-et collation workaround Remove workaround introduced in I3e8031b9. No longer needed. Bug: T202977 Change-Id: I39921ef83cddc33535b99bd9c0b75f8afb52ea9a	2018-09-11 13:33:24 +00:00
jenkins-bot	e66bef06b2	Merge "collation: Move first-letters-root to includes/collation/data"	2018-08-14 23:53:14 +00:00
Aryeh Gregor	90d4f56fe4	Mass conversion of $wgContLang to service Brought to you by vim macros. Bug: T200246 Change-Id: I79e919f4553e3bd3eb714073fed7a43051b4fb2a	2018-08-11 22:44:29 -06:00
jhsoby	3469713f4e	Remove special collation for Northern Sami This removes the special collation for Northern Sami that was added in 396007, when UCA support for Northern Sami was not yet in MediaWiki. Now it is, so this is no longer needed. Bug: T182431 Change-Id: I760eb7ae8bf92f0ac93b5fca5cb69148a28d8f6f	2018-08-07 01:21:21 +02:00
Timo Tijhof	3186d2d04b	collation: Move first-letters-root to includes/collation/data For consistency with other data files. Also, like the other data files: * For automated fetching of the Unicode files, move the steps from Makefile to a bash script. * Switch to a static array file format. Change-Id: If07487950a270283b8eaeda9a507e723ed2d89c4	2018-08-01 22:40:22 +00:00
Fomafix	0f1858321c	Use PHP 7 '??' operator instead of if-then-else Change-Id: I790b86e2e9e3e41386144637659516a4bfca1cfe	2018-06-12 23:14:18 +02:00
Bartosz Dziewoński	0313128b10	Use PHP 7 "\u{NNNN}" Unicode codepoint escapes in string literals In cases where we're operating on text data (and not binary data), use e.g. "\u{00A0}" to refer directly to the Unicode character 'NO-BREAK SPACE' instead of "\xc2\xa0" to specify the bytes C2h A0h (which correspond to the UTF-8 encoding of that character). This makes it easier to look up those mysterious sequences, as not all are as recognizable as the no-break space. This is not enforced by PHP, but I think we should write those in uppercase and zero-padded to at least four characters, like the Unicode standard does. Note that not all "\xNN" escapes can be automatically replaced: * We can't use Unicode escapes for binary data that is not UTF-8 (e.g. in code converting from legacy encodings or testing the handling of invalid UTF-8 byte sequences). * '\xNN' escapes in regular expressions in single-quoted strings are actually handled by PCRE and have to be dealt with carefully (those regexps should probably be changed to use the /u modifier). * "\xNN" referring to ASCII characters ("\x7F" and lower) should probably be left as-is. The replacements in this commit were done semi-manually by piping the existing "\xNN" escapes through the following terrible Ruby script I devised: chars = eval('"' + ARGV[0] + '"').force_encoding('utf-8') puts chars.split('').map{|char| '\\u{' + char.ord.to_s(16).upcase.rjust(4, '0') + '}' }.join('') Change-Id: Idc3dee3a7fb5ebfaef395754d8859b18f1f8769a	2018-06-04 16:20:13 +00:00
Bartosz Dziewoński	b191e5e860	Use PHP 7 '<=>' operator in 'sort()' callbacks `$a <=> $b` returns `-1` if `$a` is lesser, `1` if `$b` is lesser, and `0` if they are equal, which are exactly the values 'sort()' callbacks are supposed to return. It also enables the neat idiom `$a[x] <=> $b[x] ?: $a[y] <=> $b[y]` to sort arrays of objects first by 'x', and by 'y' if they are equal. * Replace a common pattern like `return $a < $b ? -1 : 1` with the new operator (and similar patterns with the variables, the numbers or the comparison inverted). Some of the uses were previously not correctly handling the variables being equal; this is now automatically fixed. * Also replace `return $a - $b`, which is equivalent to `return $a <=> $b` if both variables are integers but less intuitive. * (Do not replace `return strcmp( $a, $b )`. It is also equivalent when both variables are strings, but if any of the variables is not, 'strcmp()' converts it to a string before comparison, which could give different results than '<=>', so changing this would require careful review and isn't worth it.) * Also replace `return $a > $b`, which presumably sort of works most of the time (returns `1` if `$b` is lesser, and `0` if they are equal or `$a` is lesser) but is erroneous. Change-Id: I19a3d2fc8fcdb208c10330bd7a42c4e05d7f5cf3	2018-05-30 18:05:20 -07:00
James D. Forrester	70c711a6bc	Follow-up If8dfdaf1: Hard-deprecate, drop two uses, other pre-5.3 back-compat code Change-Id: I1c5eee3fe30d6687d88e07011a3d40b6770d0daf	2018-05-24 17:01:02 -07:00
jenkins-bot	60ee1e8110	Merge "Add unicode mapping for ICU 60 and 61"	2018-05-24 21:46:32 +00:00
Reedy	fdb8724e7f	Add unicode mapping for ICU 60 and 61 Change-Id: Ifbbc8d7ecc788bc2c6b07a8ebba46a9648545786	2018-05-24 22:28:19 +01:00
James D. Forrester	a6c4d473de	IcuCollation: Deprecate getICUVersion(), no need for PHP53 back-compat Change-Id: If8dfdaf187b32b7b9a2c09a240416b9f481593f1	2018-05-24 21:23:18 +00:00
Amir Sarabadani	5a21de8abb	Remove everything related to CollationFa This workaround was needed when ICU in production was broken but after T189295 this is not needed anymore and we switched off this collation from all Persian Wikis already Bug: T139110 Change-Id: Ifad89555b6ac96a3eb36ca24b55e1f8ee57a1f05	2018-05-18 18:33:25 +02:00
Bartosz Dziewoński	390ff7fca1	IcuCollation: Use codepoint as tiebreaker when getting first-letters This prevents unexpected cuneiform digits from acting as headings for 2 and 3 on category pages. Bug: T187645 Change-Id: I0424a24769899cb23b28704f97e1002fa44999fd	2018-05-11 06:36:24 +00:00
jenkins-bot	1a21a63d52	Merge "Add collation for Abkhaz (ab)"	2018-01-23 18:42:29 +00:00
Umherirrender	23ef520a1c	Improve some parameter docs Change-Id: I31e983d7ac287158101b18ad95779d83537302a2	2018-01-07 11:39:08 +01:00
Bartosz Dziewoński	e94587dfbb	Add collation for Abkhaz (ab) * Adding new class AbkhazUppercaseCollation, mapped to 'uppercase-ab'. * Extended CustomUppercaseCollation with support for sorting digraphs and for alphabets larger than 64 letters (up to 4096). Bug: T183430 Change-Id: I16d44568e44d7ef5b39c38b1a6257b9fe10a34d4	2017-12-25 14:37:14 +00:00
jhsoby	660caf9b88	Add custom collation for Northern Sami This commit adds a custom collation order for Northern Sami ('se'). Northern Sami exists in ICU, but the version of ICU that Wikimedia uses is a few years old, and does not include Northern Sami. It could be years before Wikimedia's production servers use the one that includes Northern Sami (see bug), so this is a temporary workaround to amend this issue. Bug: T181503 Change-Id: Ib8a48b8db99bef8ec4b05144aace5dbdcacfeded	2017-12-07 21:32:11 +00:00
Reedy	7b3add76b1	Add Unicode to ICU mappings for versions 58 and 59 Change-Id: I87a5e6ce3a44a2be1e6bf8adf2f98cd0a4745574	2017-10-25 23:42:28 +01:00
Umherirrender	14dfc3dbc5	Fix typo in 'language' Change-Id: I3c4d090640892ae07d3da33dcfe3ace397a40808	2017-10-07 18:53:04 +02:00
Umherirrender	f739a8f368	Improve some parameter docs Add missing @return and @param to function docs and fixed some @param Change-Id: I810727961057cfdcc274428b239af5975c57468d	2017-09-10 20:32:31 +02:00
jenkins-bot	1d7a1bf8bd	Merge "Move around "ا" to after "آ" and not before"	2017-09-06 13:12:13 +00:00
Amir Sarabadani	2ceba3b145	Move around "ا" to after "آ" and not before Bug: T173601 Change-Id: I0f6b3ecc2800180a2c6a8217803411862a299e04	2017-08-31 08:02:08 +00:00
Umherirrender	3f1a52805e	Use short type bool/int in param documentation Enable the phpcs sniffs for this and used phpcbf Change-Id: Iaa36687154ddd2bf663b9dd519f5c99409d37925	2017-08-20 13:20:59 +02:00
Umherirrender	5544cef16b	Add missing type to @param documentation Change-Id: I6b2c9c7af9a281fe457099cc3a336a60a25e74aa	2017-08-11 20:37:35 +02:00
Umherirrender	ace44e2064	Use correct variable name in @param documentation For some varargs a variable name is added with suffix ,... as seen for many other varargs Some @param are swapped, because they are in the wrong order Enable Sniff MediaWiki.Commenting.FunctionComment.ParamNameNoMatch Change-Id: I60fec6025bce824d5c67563ab7b65ad6cd628ad8	2017-08-11 19:27:19 +02:00
Kunal Mehta	d1cf48a397	build: Update mediawiki/mediawiki-codesniffer to 0.10.1 And auto-fix all errors. The `<exclude-pattern>` stanzas are now included in the default ruleset and don't need to be repeated. Change-Id: I928af549dc88ac2c6cb82058f64c7c7f3111598a	2017-07-22 18:24:09 -07:00
jenkins-bot	e72303c9f3	Merge "Remove auto-generated "Constructor" documentation on constructors"	2017-07-21 13:19:44 +00:00
Thiemo Mättig	91a920fd85	Remove auto-generated "Constructor" documentation on constructors Having such comments is worse than not having them. They add zero information. But you must read the text to understand there is nothing you don't already know from the class and the method name. This is similar to I994d11e. Even more trivial, because this here is about comments that don't say anything but "constructor". Change-Id: I474dcdb5997bea3aafd11c0760ee072dfaff124c	2017-07-21 12:19:30 +02:00
Bartosz Dziewoński	98627d4cab	IcuCollation: Fix diacritic characters for Aromanian (rup) and Moldovan (mo) headings They should be Ș, Ț (comma-below) and instead they were cedilla-below (Ş, Ţ). Same as for Romanian (ro) in `486f64f283`. Both of these languages are unsupported by libicu and so the collations are unlikely to have been used in practice. Bug: T171043 Bug: T171044 Change-Id: Idd0d593e73cd784fbef7b75e8985f988f5555e26	2017-07-19 21:49:27 +02:00
Brian Wolff	22cb66c175	Update FIRST_LETTER_VERSION for rowiki changes Can't just clear cache on production, as this now uses per-server apc instance. Follow-up `486f64f283` Change-Id: I88df6d5a91c86ef687543d1a6988e0ec050bbfce	2017-07-19 17:56:38 +00:00
Bartosz Dziewoński	486f64f283	IcuCollation: Fix diacritic characters for Romanian (ro) headings They should be Ș, Ț (comma-below) and instead they were cedilla-below (Ş, Ţ). Bug: T168711 Change-Id: I6dc873c3ce93bca3e425439f70d0fb30aecc9533	2017-07-19 16:28:02 +02:00
Bartosz Dziewoński	b3caa05a38	CollationFa: Avoid PHP 7 Unicode escape syntax We still support PHP 5.5. Change-Id: I587cb794cded95afe7ad493614a6090a108efe6c	2017-06-22 16:22:49 +02:00
Brian Wolff	0bfcbd7240	Hack around icu breakage for fa sorting Bug: T139110 Change-Id: I35bcdaf309f595258289f01bbe5713ce6d1ffad1	2017-05-19 22:14:43 +00:00
Brian Wolff	73f5937047	Add collation for Bashkir (ba) This is based on a numeric uppercase collation. Bashkir characters will be remapped to the private use area for the purpose of sorting. Bug: T162823 Change-Id: I65f1af0b57ff6ded7d464e39efd401f178a3519e	2017-05-10 04:17:46 +00:00
Timo Tijhof	3a2a707546	Clean up remaining get_class() uses * get_class() -> __CLASS__ (same as self::class) * get_called_class() -> static::class * get_class($this) -> static::class Change-Id: I1888a1897ecf4548a2e5a67a942e5c080dd7e3d3	2017-03-07 22:03:47 +00:00
jenkins-bot	17eda64357	Merge "includes: Replace implicit Bugzilla bug numbers with Phab ones"	2017-02-28 00:51:57 +00:00
Bartosz Dziewoński	267efadac7	Collation: Allow uppercase letters in UCA collations' names We have several such collations defined in IcuCollation: * bs-Cyrl * de-AT@collation=phonebook * fr-CA * sr-Latn They couldn't actually be used. Change-Id: I3a62073583c49d3e90910aa8240fe9fcc0682386	2017-02-22 21:17:54 +01:00
James D. Forrester	9635dda73a	includes: Replace implicit Bugzilla bug numbers with Phab ones It's unreasonable to expect newbies to know that "bug 12345" means "Task T14345" except where it doesn't, so let's just standardise on the real numbers. Change-Id: I6f59febaf8fc96e80f8cfc11f4356283f461142a	2017-02-21 18:13:24 +00:00
Bartosz Dziewoński	afc6e7cd15	CollationFa: Third time's the charm We have to use a tertiary sortkey for everything with the primary sortkey of 2627. Otherwise, the "Remove duplicate prefixes" logic in IcuCollation would remove them. The following characters will now be considered separate letters in the 'xx-uca-fa' collation for the purpose of displaying the headings on category pages: ء ئ ا و ٲ ٳ Bug: T139110 Change-Id: Ibbea5d76348e4cdc38b74cba44286910b2ed592f	2017-01-05 15:54:00 +01:00
Bartosz Dziewoński	611801a38d	IcuCollation: Add the current class name to 'first-letters' cache key Instances of subclasses of IcuCollation with customizations for specific languages probably shouldn't share this cache with instances of IcuCollation with the same language. Change-Id: I06d66d199c99448a3375381baef0366c4d99c8c4	2016-12-15 15:17:56 +01:00
jenkins-bot	ce079cf6ad	Merge "Add CollationFa"	2016-12-15 13:37:56 +00:00
Amir Sarabadani	708c02281e	Add CollationFa Bug: T139110 Change-Id: Ie15a2ee1c22ff4a1d2b721ed137227fe83dd12ea	2016-12-15 13:25:56 +00:00
jenkins-bot	ea42d90053	Merge "Make NumericUppercaseCollation use localized digit transforms"	2016-11-16 02:46:31 +00:00
Brian Wolff	779aa4ce5a	Add first letter data for bn collation (Standard and Traditional) This is based solely on looking at the bn.txt collation data file. It has not been tested by native speakers. Bug: T148885 Change-Id: Ide926bc5ee8752269ef6a1bfe972e19b7188d193	2016-11-15 16:09:45 -08:00
Bartosz Dziewoński	37b1fc9456	IcuCollation: Do not split $tailoringFirstLetters into verified/not verified At this point I think it's safe to assume that these mostly work well, and the split makes maintenance of the alphabetical list more difficult (some entries were already in wrong order). We've been enabling these collations for more and more Wikimedia wikis and not hearing about any problems. Mistakes, if any are present, should be treated like any other bug. Also made some comments consistent. Change-Id: I4b5fbcf4dbbdd4dc194ed821341296171fa64bb0	2016-10-31 16:48:13 +01:00
Brian Wolff	95c299e67f	Add firstLetter data for ~50 additional languages Based on CLDR 29 data files. This did the relatively easy languages in CLDR 29 (Which is most of them). I skipped languages with complicated tailoring files. Change-Id: I8367604f7d3a1cdef9cb4e15813893c8cbfff1ff	2016-10-29 12:10:52 +00:00
Brian Wolff	e7464f3481	Make NumericUppercaseCollation use localized digit transforms This will cause the numeric collation to sort localized digits for the current content language the same as how 0-9 are. This only deals with the localized digit numbers, commas and other number formatting are still not handled. Weird "numerical" unicode characters are also not handled. I was unsure if to make a "family" of numeric collations where you specify numeric-<lang code>, or if it should just use $wgContLang. Given that $wgContLang effectively never changes, and also affects all other digit handling, I opted to just use $wgContLang. Any wikis currently using the 'numeric' collation will have to have updateCollation.php --force run after this change is deployed. At the moment that includes: bnwiki, bnwikisource and hewiki Bug: T148873 Change-Id: I9eda52a8a9752a91134d1118546b0a80d3980ccf	2016-10-29 08:38:39 +00:00

65 commits