Thijs/wiki.techinc.nl

Author	SHA1	Message	Date
jenkins-bot	dca2c238b8	Merge "Allow uca-sv@collation=standard to be a collation name."	2013-06-27 18:48:21 +00:00
Brian Wolff	ac88b636b8	Allow uca-sv@collation=standard to be a collation name. The "standard" collation for Swedish sorts V and W only as secondary differences. Compare this to the "reformed" collation which sorts them as separate letters. Which collation is default for sv seems to vary on icu version, but for icu 4.8 (which wmf uses) reformed is default. svwikisource wants to use the "standard" collation. Change-Id: I051590cf687ddea2e2cd84203d6e8eed3a6efd99	2013-06-27 20:37:14 +02:00
Brian Wolff	a075f0de28	Add fa to collation list. Based on http://collation-charts.org/icu442/icu442-fa.html Should be verified by a native speaker. Bug: 30287 Change-Id: I3c30824f7d133cf615ec7c2c39d31f27c39f89fe	2013-05-16 22:59:46 -03:00
umherirrender	da39005596	Removed space after isset While at it, added/removed some other spaces in the same files Change-Id: Iabb23a448f6f53eb6020155f9c744f74f8b11786	2013-04-26 14:18:06 +02:00
umherirrender	ef2f507d23	Fixed spacing in files directly in includes folder Added spaces before if, foreach Added some braces for one line statements Change-Id: Ibb8dd102db045522d12ff939075ba7420d95ab6b	2013-04-21 06:38:49 +00:00
umherirrender	15abcf71ca	Added/Removed spaces around string concatenation And added/removed spaces around some other tokens, like +, -, *, /, <, >, =, ! Fixed windows newline style Change-Id: I0b9c8c408f3f6bfc0d685a074d7ec468fb848fc8	2013-04-13 13:36:24 +02:00
Brian Wolff	3d70637a42	Remove first letters that have an overlapping prefix. First letters are supposed to be primary collation elements. However, we do not want expansions to be considered as first letters. For example, thorn "þ" expands to "th", which isn't the same as any other first letter (since "t" !== "th"); however, if þ was a first letter, the word "the" and, even worse, the word "too" would be sorted under it, which is wrong. Looking for feedback on whether this all sounds sane. I have tested it: it got rid of the contractions while at the same time not removing any letter it wasn't supposed to. Once this is merged, we could get rid of all the -<langcode> entries. The other firstLetter array entries for tailorings could be merged into generateCollationData.php too, since incorrect things would get pruned automatically, which would probably make the logic in Collation.php simpler. Bug: 43740 Change-Id: I4bd3d39ec2938a53e2c6728adc48ee6cf9778d74	2013-04-08 22:52:40 +00:00
umherirrender	6c278b6d7e	fix some spacing * Removed spaces around array index * Removed double spaces or added spaces to begin or end of function calls, method signature, conditions or foreachs * Added braces to one-line ifs * Changed multi line conditions to one line conditions * Realigned some arrays Change-Id: Ia04d2a99d663b07101013c2d53b3b2e872fd9cc3	2013-03-25 22:22:46 +00:00
Brian Wolff	6662199c53	Merge "IcuCollation::$tailoringFirstLetters: letter removal rules for Finnish"	2013-03-23 20:53:00 +00:00
MatmaRex	3d7966d28c	IcuCollation::$tailoringFirstLetters: letter removal rules for Finnish Four non-ASCII letters - Ǥ, Ŋ, Ŧ, Ʒ - are sorted the same as their unaccented base ASCII versions - G, N, T, Z - causing unexpected output on category pages. Bug: 46330 Change-Id: I976dedfdc651fcc96a2291934924aa40b27f4c2f	2013-03-21 00:12:00 +01:00
MatmaRex	9c6655adb2	IcuCollation::$tailoringFirstLetters: 'sv', 'vi' verified * sv: per Lejonel in comments on bug 45446 * vi: per Minh Nguyễn in comments on bug 45979 Change-Id: I96bbcd73e75f9fc85a5c0b402eae87e5cda2259e	2013-03-18 13:24:25 +01:00
Tim Starling	029dcc9953	Allow first letter data to be invalidated Just a class constant for now, but that should suffice to deal with the current emergency. Proper dependency tracking via the CacheDependency hierarchy would be pretty cool in the long term. Change-Id: Ibbe7fa2814434d4869aba20f628bd43269e611fa	2013-03-13 14:53:20 +11:00
MatmaRex	ae38b340dc	IcuCollation::$tailoringFirstLetters: implement letter removal This is necessary for Swedish, where 'Þ' ("thorn") - considered a separate letter by default in the first-letters-root.ser file - is sorted as 'th', causing unexpected output on category pages - words starting with 'th'..'u' were placed under a heading with the thorn. There were three obvious ways to do this: * somehow include information that this letter is to be removed in the string itself, as in 'sv' => array( "Å", "Ä", "Ö", "-Þ" ) - could potentially clash with valid uses * create a separate array other than $tailoringFirstLetters to store this information - would cause the data to be fragmented all over the file * include information about letters to be removed in a separate key "linked" to the regular one, as in '-sv' => array( "Þ" ) - I see no obvious downsides, so this is what I ended up doing Bug: 45446 Change-Id: I57e07a2027c391c5baa767a68f4409b9de7b4618	2013-03-11 22:24:30 +01:00
Tyler Anthony Romeo	4dcc7961df	Fixed @param tags to conform with Doxygen format. Doxygen expects parameter types to come before the parameter name in @param tags. Used a quick regex to switch everything around where possible. This only fixes cases where a primitive variable (or a primitive followed by other types) is the variable type. Other cases will need to be fixed manually. Change-Id: Ic59fd20856eb0489d70f3469a56ebce0efb3db13	2013-03-11 13:15:01 -04:00
MatmaRex	453ed1818e	IcuCollation::$tailoringFirstLetters: 'en', 'it', 'hu', 'pt', 'uk' verified * en: obviously * it: per Nemo_bis in comments on change I97273c52 * hu: per Tisza Gergő in comments on bug 45596 * pt: 'uca-default' collation is deployed on pt.wiki, 'uca-pt' is the same thing * uk: per Dmytro Dziuma in comments on bug 45444 Change-Id: Ia7568a9ad40ef991b73059b5269e6236f52681f1	2013-03-11 05:23:18 +00:00
MatmaRex	c95cf323ff	lowercase second character in digraph letters in IcuCollation tailorings This is the valid way for Hungarian (per bug 45596 comment 10), and it's likely more appropriate for other languages as well. I should have done it this way in the first place; the original data source includes these forms along with the all-uppercase ones (I checked them all), so they're certainly at least not wrong. Just an oversight on my part. Change-Id: Ie0ca297a082ddba8d757beb85655f86b3ee70b02	2013-03-11 05:18:29 +00:00
umherirrender	d63121016d	fix some spacing Added/removed spaces around logical/arithmetic operator Reduced multiple empty lines to one empty line Removed wrong tabs before comments at end of line Removed too many spaces in assignments Change-Id: I2bba4e72f9b5f88c53324d7b70e6042f1aad8f6b	2013-03-07 17:53:21 +01:00
MatmaRex	d01cbb4148	adjusted comments for IcuCollation::$tailoringFirstLetters More information about what actually sits in that array. Summary of modifications to the Mimer data so far: * removed data for "traditional" variants of de (German) and es (Spanish) * used code 'tl' instead of 'fil' for Tagalog/Filipino * added be-tarask (Belarusian Taraškievica) Change-Id: I97273c52599a5eda3f63366d697b077d6b17ba81	2013-03-05 13:45:15 +01:00
Pavel Selitskas	afec7906ad	language-specific collations: be-tarask added; be, be-tarask, ru verified Change-Id: I560d766f9b9e9a4ff79e35aa4eec79be875c84c7	2013-02-27 23:55:48 +00:00
MatmaRex	0c28ca1422	Revert "(bug 29788) Swedish Collation (uppercase-sv). Swaps Ä and Æ" This workaround is unnecessary now that I838484b9 was merged. This reverts commit `13dc8ff88f`. Change-Id: I2cd22ad87eb7a56c5742b20c6089a4b8607e5614	2013-02-26 22:18:36 +00:00
MatmaRex	9143494912	(bug 43799) create language-specific collations for category sorting This allows one to finally get articles to be correctly sorted on category pages for 67 languages based on Latin, Greek and Cyrillic alphabets. Fixes bug 29788, bug 41040, and bug 42412 (implementing collations for Swedish, Polish, Ukrainian). Full list of language codes this adds support for: af, ast, az, be, bg, br, bs, ca, co, cs, cy, da, de, dsb, el, en, eo, es, et, eu, fi, fo, fr, fur, fy, ga, gd, gl, hr, hsb, hu, is, it, kk, kl, ku, ky, la, lb, lt, lv, mk, mo, mt, nl, no, oc, pl, pt, rm, ro, ru, rup, sco, sk, sl, smn, sq, sr, sv, tk, tl, tr, tt, uk, uz, vi. * Include data about first-letter characters for 67 language tailorings. This data was generated based on http://developer.mimer.com/charts/tailorings.htm by a Ruby script (https://www.mediawiki.org/wiki/User:Matma_Rex/generateCollationTailoringData.rb), then adjusted by hand (removed duplicate definitions for Spanish and German, changed code fil -> tl (Filipino -> Tagalog)). * Mark languages verified by native speakers (currently only pl (Polish), which I verified myself, and fi (Finnish), checked by Niklas). * Allow for collations named like 'uca-<langcode>', mapping them to IcuCollation with the appropriate parameter. The code doesn't check if we actually have data for the given language, as it's checked after the IcuCollation class instance is constructed. * Add the tailoring data to the default first-letter file (for root collation) before it's cached for the given locale. Change-Id: I838484b9aaf23945fe7880fef2e3da5f5c06877f	2013-02-26 20:58:55 +01:00
MatmaRex	e8c0c2ad46	(bug 43801) add a getter for ICU version to ICUCollation It will be necessary to be able to use correct version of Unicode data files. The constant INTL_ICU_VERSION this getter returns isn't really documented. It is available since PHP 5.3.7 (see PHP bug 54561), the getter will fail gracefully on older PHPs. It should be possible to determine the ICU version on these by grepping the output of phpinfo(), but I don't think such a minor improvement is worth such a huge hack. Change-Id: I85353559439bfddee7c5ba90894d30dd8ef0e0e8	2013-02-08 16:57:08 -04:00
jenkins-bot	f8daed077a	Merge "(bug 43801) add a getter for ICU version to ICUCollation"	2013-02-06 19:35:36 +00:00
Brian Wolff	13dc8ff88f	(bug 29788) Swedish Collation (uppercase-sv). Swaps Ä and Æ See I4542f57a. Meant as a temporary measure until such a time as generic tailoring code is implemented for uca. This patch is mostly Lejonel's code, with the class renamed. Change-Id: Id39406c37a5277d9e7a9216544de2140411c2b01	2013-02-05 22:21:50 +00:00
Antoine Musso	f6b92231fd	style: normalize end of files By PSR2 PHP Standard, the files should end with exactly one newline. Some of our files have 2 or more and some others were missing a newline. Fix almost all occurrences of CodeSniffer sniff: PSR2.Files.EndFileNewline.TooMany I have not fixed the selenium files, I believe we will drop them. Change-Id: I89fca8c1786fee94855b7b77bb0f364001ee84b6	2013-02-03 15:04:39 +01:00
MatmaRex	1bcba60f80	(bug 43801) add a getter for ICU version to ICUCollation It will be necessary to be able to use correct version of Unicode data files. The constant INTL_ICU_VERSION this getter returns isn't really documented. It is available since PHP 5.3.7 (see PHP bug 54561), the getter will fail gracefully on older PHPs. It should be possible to determine the ICU version on these by grepping the output of phpinfo(), but I don't think such a minor improvement is worth such a huge hack. Change-Id: Iee4b8380406ae71c980dfdd7b9fdd0b58ecb9cd0	2013-01-30 19:46:25 +01:00
Tim Starling	1eca50c383	Fix various boundary cases in IcuCollation::findLowerBound() Fix the following edge cases which were previously broken: * Zero-length input array * Target value before the start * Target value past the end They didn't really matter for my original application, but Liangent wants to use this function for something else. Change-Id: Ia5f5ed4ab3cb6c463177a4812fd3ce96c6d37b33	2012-10-17 14:30:49 +11:00
umherirrender	85d8ee1f87	Remove a bunch of trailing spaces and unneeded newlines Change-Id: I00f369641320acd7f087427ef031f3ee7efa0997	2012-10-10 20:14:40 +02:00
Alexandre Emsenhuber	1082c71e9b	Added missing GPLv2 headers in some places. Also made file/class documentation more consistent. Change-Id: Ibe7815124d6915792dcbb150d01df21d9b22b0b0	2012-05-21 21:56:39 +02:00
Sam Reed	7b25f8231f	Fixing some of the "@return true" or "@return false", need to be "@return bool" and then the metadata can say true if foo, false if bar Other documentation improvements	2012-02-09 19:30:01 +00:00
Brian Wolff	a658eee7fc	(bug 30722) Add an identity collation that sorts things based on what the unicode code point is (aka pre-1.17 behaviour). I'm tagging this 1.18 because the original bug was for iswiktionary wanting it, so it'd be nice to get it in 1.18.	2011-09-11 01:13:08 +00:00
Brian Wolff	f980458a9b	(Follow-up r90759 per CR) Use a hook to register new Collations instead of just taking the collation name as a class name	2011-07-05 05:30:04 +00:00
Brian Wolff	99ee7b7cf6	Let $wgCategoryCollation take a class name as a value so that extensions can define new Collation classes. (I plan to commit such an extension shortly) Wasn't sure if it would be better to make an array mapping collation names => class names instead. However, that seemed to be needlessly complicated so I went with letting that variable take class names.	2011-06-25 07:21:29 +00:00
Chad Horohoe	783d4e0862	Remove @static from all over the place. That's what the static keyword is for, this being PHP5 and all	2011-04-21 00:07:09 +00:00
Sam Reed	ca7ea0b1ad	More function documentation	2011-04-15 17:44:19 +00:00
Platonides	82eab17c16	Update comments to take into account r80443 and r80614 changes, per CR.	2011-01-28 22:27:52 +00:00
Tim Starling	eaeea84b44	* Introduced a non-dummy collation for $wgCategoryCollation, namely UCA with default tables. * Added a maintenance script which generates a list of first letters. Unified Han are omitted for performance, and because they shouldn't be used as headings anyway. A future collation specific to Chinese would provide the KangXi radicals as "first letters". * Provided a precomputed list of first letters. Used Unicode 6.0.0 data and ICU 4.2. * Moved collation functionality from Language to a Collation class hierarchy with factory function. Removed the recently-added methods from Language and updated all callers. * Changed Title::getCategorySortkey() to separate its parts with a line break instead of a null character. All collations supported by the intl extension ignore the null character, i.e. "ab" == "a\0b". It would have required a lot of hacking to make it work. * Fixed the uppercase collation to handle non-ASCII characters, redundantly with r80436. I don't think it's necessary to change the collation name as was done there, so I reverted that in the course of my conflict merge. A --force option to updateCollation.php might be nice though.	2011-01-17 14:02:22 +00:00

37 commits
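
The 2012-10-17 commit (1eca50c383) lists three edge cases fixed in IcuCollation::findLowerBound(): a zero-length input array, a target before the start, and a target past the end. The following is a minimal sketch of a lower-bound binary search that handles those three cases. It is written in Python for illustration only and is not the MediaWiki PHP implementation; the function name `find_lower_bound`, its signature, and the choice of returning `None` when no element qualifies are all assumptions.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def find_lower_bound(
    values: Sequence[T],
    target: T,
    compare: Callable[[T, T], int],
) -> Optional[int]:
    """Return the index of the last item sorting at or before target.

    `values` must already be sorted according to `compare`, which returns
    a negative, zero, or positive number like a classic comparator.
    Returns None when no such item exists (hypothetical convention).
    """
    # Edge case 1: zero-length input.
    if not values:
        return None
    # Edge case 2: target sorts before the first item.
    if compare(target, values[0]) < 0:
        return None

    # Invariant: values[lo] <= target, and hi is an exclusive upper bound.
    lo, hi = 0, len(values)
    while hi - lo > 1:
        mid = lo + (hi - lo) // 2
        if compare(target, values[mid]) >= 0:
            lo = mid
        else:
            hi = mid
    # Edge case 3: a target past the end simply yields the last index.
    return lo


def simple_compare(a: str, b: str) -> int:
    """Code-point comparison standing in for a real collator."""
    return (a > b) - (a < b)


if __name__ == "__main__":
    first_letters = ["A", "D", "M"]
    for word in ["0", "A", "E", "Z"]:
        print(word, find_lower_bound(first_letters, word, simple_compare))
```

In a category-page setting, `values` would stand for the sorted list of first-letter headings and `target` for a page's sort key, so the returned index picks the heading a page falls under. A real collation would compare ICU sort keys rather than raw code points.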