This is not a real fix for the cause of the bug (which is a
pcre.recursion_limit that is far too low), but I do wonder
about the efficiency of using a regexp to test for valid
UTF-8 encoding. After all, the regexp has to be compiled first
into a state machine.
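
To make the comparison concrete, here is a minimal Python sketch (not the PHP code in checkTitleEncoding()) that checks the same loose UTF-8 byte structure two ways: with a regexp using a non-capturing group, and with a plain byte scan. The pattern and the names LOOSE_UTF8, is_utf8_by_regexp and is_utf8_by_scan are assumptions made for illustration. Both checks only look at lead and continuation byte ranges, so they do not reject overlong forms or surrogates.

```python
import re

# Assumed pattern: one ASCII byte, or a lead byte followed by the
# matching number of continuation bytes. (?:...) is non-capturing.
LOOSE_UTF8 = re.compile(
    rb"(?:[\x00-\x7f]"
    rb"|[\xc0-\xdf][\x80-\xbf]"
    rb"|[\xe0-\xef][\x80-\xbf]{2}"
    rb"|[\xf0-\xf7][\x80-\xbf]{3})*"
)


def is_utf8_by_regexp(data: bytes) -> bool:
    """Structural check using the compiled regexp."""
    return LOOSE_UTF8.fullmatch(data) is not None


def is_utf8_by_scan(data: bytes) -> bool:
    """The same structural check as a plain loop, with no regexp engine."""
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead <= 0x7F:
            extra = 0
        elif 0xC0 <= lead <= 0xDF:
            extra = 1
        elif 0xE0 <= lead <= 0xEF:
            extra = 2
        elif 0xF0 <= lead <= 0xF7:
            extra = 3
        else:
            return False  # stray continuation byte or invalid lead byte
        tail = data[i + 1 : i + 1 + extra]
        if len(tail) != extra or any(not 0x80 <= b <= 0xBF for b in tail):
            return False  # truncated sequence or bad continuation byte
        i += 1 + extra
    return True
```

Which variant is faster depends on the runtime and the input, which is why the patch sets measure it with a benchmark rather than assume.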
Patch set 2: PHPUnit test for Language.checkTitleEncoding
Patch set 3: benchmark
Patch set 4: add benchmark for non-capturing subgroup in regexp, and
since that's faster than a capturing subgroup, use it in
checkTitleEncoding() in the regexp branch.
Patch set 5: use Tim's suggestion (once-only pattern) in the regexp
branch. Also add to benchmark.
Change-Id: I551f096921d4c9c57cbcb091b80ab5970ca86a9b
The mb_ functions are in an extension that is apparently not enabled or compiled in by default. It is safer to keep these around until that changes.
This reverts commit 1497bd1f36
Change the first letter of language names in Names.php to lowercase where this is usual in the spelling, for most languages based on what I know, Wikipedia and CLDR. This makes them more consistent with CLDR.
Change Language::fetchLanguageNames() so these MediaWiki names are always used, and so CLDR names do not override them when using $inLanguage.
Change-Id: I7b41b978a309c40e0210f2a295d3cba65cd5ec4e
* Fixed "Notice: Undefined variable: addmsg in /var/www/TrunkWiki/core/languages/Language.php on line 254"
Change-Id: Ib5fa6b7c1137f24bc998249af72eaf7301a2aaa8
This makes isValidBuiltInCode() throw an exception whenever it is passed
something which is not a string. The rationale is to make errors easy to
find when the method is used incorrectly.
An alternative would be to detect the object being passed is a Language
object and get its Language code.
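
The fail-fast behaviour can be sketched as follows. This is a Python illustration, not the PHP method; the function name and the character pattern are assumptions, and the real validity rule may differ.

```python
import re


def is_valid_built_in_code(code):
    """Return True if code looks like a built-in language code.

    Raises TypeError for non-strings so that misuse (for example passing
    a Language object) surfaces immediately instead of returning False.
    """
    if not isinstance(code, str):
        raise TypeError(
            "expected a language code string, got " + type(code).__name__
        )
    # Assumed rule for this sketch: lowercase letters, digits and hyphens.
    return re.fullmatch(r"[a-z0-9-]+", code) is not None
```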
Change-Id: I37cc419cc725df8d8022e619d8f5191f58a8fd5e
In Language::fetchLanguageNames, fall back to the default option (mw) instead of returning false if none of the three options (all/mw/mwfile) is recognized
Change-Id: I743540bb0d1e7572a5a7e2f4ed9b57e7552d99b2
The Language::formatDuration() method introduced by this patch lets us
easily render a number of seconds in human-readable form.
$ maintenance/eval.php
> var_dump( $wgLang->formatDuration( 1000 ) );
string(25) "16 minutes and 40 seconds"
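
The decomposition behind that output can be sketched in Python. This is an illustration only: the unit list, the English-only wording and the name format_duration are assumptions, and the PHP method builds its text from localised messages.

```python
# Assumed unit table for the sketch, largest unit first.
INTERVALS = [("day", 86400), ("hour", 3600), ("minute", 60), ("second", 1)]


def format_duration(seconds):
    """Split a number of seconds into units and join them in English."""
    remaining = int(round(seconds))
    parts = []
    for name, length in INTERVALS:
        count, remaining = divmod(remaining, length)
        if count:
            parts.append(f"{count} {name}" + ("" if count == 1 else "s"))
    if not parts:
        return "0 seconds"
    if len(parts) == 1:
        return parts[0]
    # "a, b and c" style list
    return ", ".join(parts[:-1]) + " and " + parts[-1]


print(format_duration(1000))  # 16 minutes and 40 seconds
```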
Also ran rebuildLanguage.php at Siebrand's request
Change-Id: If287fb10e897d3d2374cf6eeae3bc5be00cdfc01
This adds GRAMMAR support to the mediawiki.jqueryMsg module:
1. Make jqueryMsg understand GRAMMAR (case insensitive)
2. mw.language gets convertGrammar, which can be overridden per language
as in PHP
3. Introduce the ResourceLoader module ResourceLoaderLanguageDataModule
4. Language.php gets a method to filter wgGrammarForms for the current
contentLanguage.
5. QUnit tests
6. This code was originally written in the jsgrammar branch of svn and
was reviewed during the last slush time.
Change-Id: I90dd0b2f0cb30fd30539896c292829adc4fc7364
This function is similar to getDirMark(), but it adds HTML entities
instead of invisible Unicode characters.
It's based on MaxSem's suggestion in
https://gerrit.wikimedia.org/r/#change,3929
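
The difference between the two forms can be sketched in Python. The function names below are hypothetical stand-ins, not the PHP methods; the characters and entities themselves are standard (U+200E is &lrm;, U+200F is &rlm;).

```python
import html

LRM = "\u200e"  # LEFT-TO-RIGHT MARK, invisible in rendered text
RLM = "\u200f"  # RIGHT-TO-LEFT MARK, invisible in rendered text


def dir_mark(is_rtl):
    """Return the invisible Unicode direction mark."""
    return RLM if is_rtl else LRM


def dir_mark_entity(is_rtl):
    """Return the same mark as an HTML entity, visible in page source."""
    return "&rlm;" if is_rtl else "&lrm;"


# Both forms decode to the same character once the HTML is parsed.
assert html.unescape(dir_mark_entity(True)) == dir_mark(True)
assert html.unescape(dir_mark_entity(False)) == dir_mark(False)
```

The entity form is easier to spot and review in HTML source, since the raw characters cannot be seen in most editors.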
Change-Id: I5bd362d6e6a56478bf9f58b2b81fcad31be12d35
This reverts the SpecialCachedPage and formatDuration sagas, with some collateral damage here and there. All of these revisions are tagged with 'gerritmigration' and will be resubmitted into Gerrit after the Gerrit switchover. See also http://lists.wikimedia.org/pipermail/wikitech-l/2012-March/059124.html
* Add @since, fix indentation.
* Change default from 'all' to 'mw' as it's the most used (so default fetchLanguageNames() is equivalent to default getLanguageNames()).
* Add the include parameter also to fetchLanguageName() as it's needed in Parser: interlanguage links should only take into account mediawiki names. (Doesn't make a difference with how the functions are now, but could have been later.)
* Reduces the overly long code in r107002, and reduces code for {{#language:}}
* Fixes the language list in Special:Translate, which contained languages that gave "invalid code" when selected
Since we're here: nothing uses $namespaceNames, $mNamespaceIds or $namespaceAliases
outside of this class (core or extensions), so let's make them protected.
Problem was caused by inexact floating-point comparisons with values returned from
log10(); worked around by simply duplicating the very similar code in the function
immediately below, which does the same thing with 1024 instead of 1000 unit sizes,
uses only simple division, and passes the test cases.
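
The division-based approach can be sketched as follows. This is a Python illustration under stated assumptions, not the PHP code: the unit list, the rounding to two decimals and the name format_bitrate are chosen for the example.

```python
# Assumed unit suffixes for the sketch, in steps of 1000.
UNITS = ["bps", "kbps", "Mbps", "Gbps", "Tbps", "Pbps", "Ebps", "Zbps", "Ybps"]


def format_bitrate(bps):
    """Pick the unit by repeated division instead of floor(log10(x))."""
    if bps <= 0:
        return "0bps"
    value = float(bps)
    index = 0
    # Each step compares against 1000 directly, so the unit never
    # depends on a logarithm that is off by a tiny rounding error.
    while value >= 1000 and index < len(UNITS) - 1:
        value /= 1000
        index += 1
    text = f"{round(value, 2):,.2f}".rstrip("0").rstrip(".")
    return text + UNITS[index]


print(format_bitrate(10**15))  # 1Pbps
```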
Language::formatBitrate() uses log10() to make a long number human-readable.
There is a nasty rounding error on Mac OS X for log10():
log10(pow(10,15)) => gives 15
floor( log10(pow(10,15)) ) => gives 14 (should be 15)
The end result is that pow(10,15) is formatted as 1,000Tbps instead of 1Pbps
log( $foo, 10) does not suffer from this:
php -r 'print floor(log(pow(10,15),10)) ."\n";'
PHP Version used:
$ php -v
PHP 5.3.6 with Suhosin-Patch (cli) (built: Sep 8 2011 19:34:00)
Copyright (c) 1997-2011 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2011 Zend Technologies
with Xdebug v2.1.2, Copyright (c) 2002-2011, by Derick Rethans
$
TEST PLAN:
BEFORE
======
$ php phpunit.php ./languages/LanguageTest.php
PHPUnit 3.6.3 by Sebastian Bergmann.
............................................................... 63 / 170 ( 37%)
............................................................... 126 / 170 ( 74%)
.......................................F....
Time: 2 seconds, Memory: 32.25Mb
There was 1 failure:
1) LanguageTest::testFormatBitrate with data set #5 (1000000000000000, '1Pbps', '1 petabit per second')
formatBitrate('1000000000000000'): 1 petabit per second
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-'1Pbps'
+'1,000Tbps'
FAILURES!
Tests: 170, Assertions: 174, Failures: 1.
AFTER
=====
PHPUnit 3.6.3 by Sebastian Bergmann.
............................................................... 63 / 170 ( 37%)
............................................................... 126 / 170 ( 74%)
............................................
Time: 1 second, Memory: 32.25Mb
OK (170 tests, 174 assertions)