This is not a real fix for the cause of the bug (which is a
pcre.recursion_limit that is far too low), but I do wonder
about the efficiency of using a regexp to test for valid
UTF-8 encoding. After all, the regexp has to be compiled into
a state machine first.
Patch set 2: PHPUnit test for Language::checkTitleEncoding()
Patch set 3: add a benchmark
Patch set 4: add a benchmark for a non-capturing subgroup in the regexp;
since that is faster than a capturing subgroup, use it in the regexp
branch of checkTitleEncoding().
Patch set 5: use Tim's suggestion (a once-only pattern) in the regexp
branch. Also add it to the benchmark.
Change-Id: I551f096921d4c9c57cbcb091b80ab5970ca86a9b
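To make the regexp-versus-decoder comparison concrete, here is a minimal Python sketch, not the PHP code in this patch. It validates UTF-8 once with a byte-level regexp built from non-capturing groups (the RFC 3629 byte ranges: no overlong forms, no UTF-16 surrogates, nothing above U+10FFFF) and once with Python's strict built-in decoder. The names UTF8_RE, is_utf8_regex and is_utf8_decode are hypothetical, and the PHP once-only construct is not reproduced here. Timings depend on the machine, so none are quoted.

```python
import re
import timeit

# Well-formed UTF-8 as a byte-level regexp; every group is non-capturing.
UTF8_RE = re.compile(
    rb"(?:[\x00-\x7F]"                      # ASCII
    rb"|[\xC2-\xDF][\x80-\xBF]"             # 2-byte, no overlongs
    rb"|\xE0[\xA0-\xBF][\x80-\xBF]"         # 3-byte, no overlongs
    rb"|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}"  # 3-byte, general case
    rb"|\xED[\x80-\x9F][\x80-\xBF]"         # 3-byte, excluding surrogates
    rb"|\xF0[\x90-\xBF][\x80-\xBF]{2}"      # 4-byte, no overlongs
    rb"|[\xF1-\xF3][\x80-\xBF]{3}"          # 4-byte, general case
    rb"|\xF4[\x80-\x8F][\x80-\xBF]{2})*"    # 4-byte, up to U+10FFFF
)


def is_utf8_regex(data: bytes) -> bool:
    """Regexp approach: the whole input must match the pattern."""
    return UTF8_RE.fullmatch(data) is not None


def is_utf8_decode(data: bytes) -> bool:
    """Decoder approach: the strict codec rejects the same invalid sequences."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    sample = ("\u00dcn\u00efc\u00f6d\u00e9 " * 2000).encode("utf-8")
    for fn in (is_utf8_regex, is_utf8_decode):
        seconds = timeit.timeit(lambda: fn(sample), number=100)
        print(f"{fn.__name__}: {seconds:.4f}s")
```

Both functions should agree on every input. The decoder avoids compiling and interpreting a pattern at all, which is the efficiency concern raised above.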
The Language::formatDuration() method introduced by this patch lets us
render a number of seconds in a form that is easier for humans to read.
$ maintenance/eval.php
> var_dump( $wgLang->formatDuration( 1000 ) );
string(25) "16 minutes and 40 seconds"
Also ran rebuildLanguage.php at Siebrand's request.
Change-Id: If287fb10e897d3d2374cf6eeae3bc5be00cdfc01
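As a rough illustration of what formatDuration() computes, here is a hypothetical Python sketch. It assumes day/hour/minute/second units, English pluralization and a comma-then-"and" join. The PHP method builds its output from localized messages instead, and the names DURATION_UNITS and format_duration are made up for this example.

```python
# Largest unit first; sizes in seconds (hypothetical table).
DURATION_UNITS = [("day", 86400), ("hour", 3600), ("minute", 60), ("second", 1)]


def format_duration(seconds: int) -> str:
    """Render a non-negative number of seconds as English text."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    parts = []
    remaining = seconds
    for name, size in DURATION_UNITS:
        count, remaining = divmod(remaining, size)
        if count:  # skip units that contribute nothing
            parts.append(f"{count} {name}" + ("" if count == 1 else "s"))
    if not parts:
        return "0 seconds"
    if len(parts) == 1:
        return parts[0]
    # "a, b and c" style list join.
    return ", ".join(parts[:-1]) + " and " + parts[-1]


print(format_duration(1000))  # 16 minutes and 40 seconds
```

Greedy division from the largest unit down gives 1000 s = 16 min + 40 s, matching the eval.php output above.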
This reverts the SpecialCachedPage and formatDuration sagas, with some collateral damage here and there. All of these revisions are tagged with 'gerritmigration' and will be resubmitted into Gerrit after the Gerrit switchover. See also http://lists.wikimedia.org/pipermail/wikitech-l/2012-March/059124.html
Make Language::formatSize() handle TB through YB
Rewrote the code to be simpler and less deeply indented
Though something like formatBitrate might be better in the future... We'll see!
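One way to get the "simpler, less indented" shape is a single loop over the unit list instead of one nested branch per unit. The hypothetical Python sketch below does this, assuming a divisor of 1024 per step and up to two decimal places. Neither detail is stated above, and format_size and SIZE_UNITS are illustrative names.

```python
SIZE_UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def format_size(size: float) -> str:
    """Scale a byte count to the largest unit that keeps it >= 1 (capped at YB)."""
    index = 0
    # One loop replaces a chain of if/else branches, one per unit.
    while size >= 1024 and index < len(SIZE_UNITS) - 1:
        size /= 1024
        index += 1
    # Two decimals, then drop trailing zeros and a dangling decimal point.
    number = f"{size:.2f}".rstrip("0").rstrip(".")
    return f"{number} {SIZE_UNITS[index]}"


print(format_size(1536))  # 1.5 KB
```

Adding a unit then means extending the list, not adding another level of branching.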
* Add a $noAbbrevs parameter that causes the 'seconds', 'minutes', etc. messages to be used instead of the 'seconds-abbrev', 'minutes-abbrev', etc. messages
* Add the 'seconds', 'minutes', 'hours' and 'days' messages
* Change the -abbrev messages to take a parameter rather than having the number prepended to them. This is for compatibility with 'seconds' et al., which need the parameter for {{PLURAL:}}. It also generally makes more sense. This does BREAK the messages in non-English languages that override them; Niklas told me to leave this alone and ping the TranslateWiki folks.
* Introduce an 'ago' message for '$1 ago'. Not currently used in core, but I want to use it in an extension and it seemed stupid not to have such a thing in core.
* Refactor the function to use message objects and pass the number as a parameter
* Add tests! They exposed a subtle bug in my first iteration; all hail tests!
* Some fixes/changes to truncateHTML() based on tests
** Input like "<span>hello</span>" now ends up as "<span>...</span>" instead of just "..." in the relevant cases
** Malformed input like "<span></span" is now returned unchanged instead of as ""
* Renamed $dispLength -> $dispLen in truncateHTML()
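The reason for making the -abbrev messages take $1 is that a parameter can feed {{PLURAL:}}, while a prepended number cannot. The hypothetical Python sketch below shows this. The message texts, the two-form English PLURAL handling and the helper names msg and format_seconds are assumptions, not MediaWiki's message parser.

```python
import re

# Illustrative English message texts (assumed, not copied from MediaWiki).
MESSAGES = {
    "seconds-abbrev": "$1s",
    "seconds": "{{PLURAL:$1|$1 second|$1 seconds}}",
    "minutes": "{{PLURAL:$1|$1 minute|$1 minutes}}",
    "ago": "$1 ago",
}

# Only the two-form {{PLURAL:n|singular|plural}} syntax, enough for English.
PLURAL_RE = re.compile(r"\{\{PLURAL:([^|}]*)\|([^|}]*)\|([^|}]*)\}\}")


def msg(key: str, *params) -> str:
    text = MESSAGES[key]
    # Substitute $N from the highest index down so "$1" never clobbers "$10".
    for i in range(len(params), 0, -1):
        text = text.replace(f"${i}", str(params[i - 1]))
    # Resolve PLURAL after substitution, when its first argument is a number.
    return PLURAL_RE.sub(
        lambda m: m.group(2) if int(m.group(1)) == 1 else m.group(3), text
    )


def format_seconds(n: int, no_abbrevs: bool = False) -> str:
    """Mimics a $noAbbrevs switch: full message versus the -abbrev message."""
    return msg("seconds" if no_abbrevs else "seconds-abbrev", n)


print(msg("ago", msg("minutes", 2)))  # 2 minutes ago
```

Because every message receives the number as a parameter, the abbreviated and full forms can be swapped without changing the calling code.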
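The two truncateHTML() behaviours listed above (the ellipsis staying inside still-open tags, and malformed markup being passed through) can be sketched in Python as follows. The splitting regexp, the void-tag list and the rule that any stray "<" or ">" in text means "malformed" are simplifying assumptions, and truncate_html is a hypothetical name, not the PHP method.

```python
import re

TAG_RE = re.compile(r"(<[^<>]*>)")           # split into text and tag tokens
NAME_RE = re.compile(r"<\s*(/?)\s*([A-Za-z][\w-]*)")
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


def truncate_html(html: str, length: int, ellipsis: str = "...") -> str:
    """Cut visible text to `length` chars (ellipsis included), keeping tags balanced."""
    parts = TAG_RE.split(html)               # even indices: text, odd: tags
    texts = parts[0::2]
    if any("<" in t or ">" in t for t in texts):
        return html                          # malformed, e.g. "<span></span"
    if sum(len(t) for t in texts) <= length:
        return html                          # already short enough
    budget = max(length - len(ellipsis), 0)
    out, open_tags = [], []
    for i, part in enumerate(parts):
        if i % 2 == 1:                       # a tag: track nesting, keep it
            m = NAME_RE.match(part)
            if m:
                closing, name = m.group(1) == "/", m.group(2).lower()
                if closing:
                    if open_tags and open_tags[-1] == name:
                        open_tags.pop()
                elif name not in VOID_TAGS and not part.endswith("/>"):
                    open_tags.append(name)
            out.append(part)
        elif len(part) <= budget:            # text that still fits
            out.append(part)
            budget -= len(part)
        else:                                # text that overflows: cut and stop
            out.append(part[:budget])
            break
    out.append(ellipsis)
    out.extend(f"</{name}>" for name in reversed(open_tags))
    return "".join(out)


print(truncate_html("<span>hello</span>", 3))  # <span>...</span>
```

Closing the still-open tags after the ellipsis is what turns a bare "..." into "<span>...</span>".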
* Reject underscore in validation
* Still case-insensitive
* Corrected tests that used an underscore
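The validation rule in this commit (underscore rejected, case ignored) can be sketched in Python as below. The allowed character set of ASCII letters, digits and hyphens is an assumption, and BUILTIN_CODE_RE and is_valid_builtin_code are hypothetical stand-ins for Language::isValidBuiltInCode().

```python
import re

# Assumed rule: ASCII letters (either case), digits and hyphens; "_" rejected.
# Both cases are spelled out instead of using re.IGNORECASE, which would also
# let some non-ASCII letters (such as the Kelvin sign) match [a-z].
BUILTIN_CODE_RE = re.compile(r"[A-Za-z0-9-]+")


def is_valid_builtin_code(code: str) -> bool:
    """True if the whole code consists of the allowed characters."""
    return BUILTIN_CODE_RE.fullmatch(code) is not None


print(is_valid_builtin_code("FR"), is_valid_builtin_code("be_tarask"))  # True False
```

Using fullmatch() rather than search() makes sure a single bad character, such as a trailing newline or a path separator, rejects the whole code.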
Follow-up to r83160, which was itself a follow-up to r82927 (language code validation)
A language code may contain the underscore character (be_tarask)
and may also be upper case (FR).
Add tests for Language::isValidBuiltInCode() against some language codes