The Language class had a code snippet to verify whether a text is valid
UTF-8, but it could not be used from anywhere else. The snippet uses
mb_check_encoding() and falls back to a regex whenever mbstring is not
available.
* Introduce StringUtils::isUtf8(), which is mostly code moved out of the
Language class.
* Enhance regex readability by using an expanded regex (//x)
* Make the regex recognize longer sequences
* Add unit tests for both the mbstring and the native PHP implementations
* An optional second parameter can be passed to isUtf8() to force the
use of our PHP implementation. This is used for unit testing.
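
For illustration only, here is a minimal Python sketch of the same idea, not the PHP code from this change. The names is_utf8 and disable_native are hypothetical, and Python's built-in decoder stands in for mb_check_encoding(). The byte ranges follow the well-formed UTF-8 table from RFC 3629.

```python
import re

# Expanded (verbose) byte pattern for well-formed UTF-8.
# Each alternative starts with a distinct lead byte, so matching is linear.
_UTF8_RE = re.compile(
    rb"""
    \A(?:
        [\x00-\x7F]                          # ASCII
      | [\xC2-\xDF][\x80-\xBF]               # 2 bytes, overlongs excluded
      | \xE0[\xA0-\xBF][\x80-\xBF]           # 3 bytes, overlongs excluded
      | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}    # 3 bytes, general case
      | \xED[\x80-\x9F][\x80-\xBF]           # 3 bytes, surrogates excluded
      | \xF0[\x90-\xBF][\x80-\xBF]{2}        # 4 bytes, overlongs excluded
      | [\xF1-\xF3][\x80-\xBF]{3}            # 4 bytes, general case
      | \xF4[\x80-\x8F][\x80-\xBF]{2}        # 4 bytes, up to U+10FFFF
    )*\Z
    """,
    re.VERBOSE,
)


def is_utf8(data: bytes, disable_native: bool = False) -> bool:
    """Return True if data is well-formed UTF-8.

    disable_native forces the regex path (hypothetical flag, mirroring
    the optional second parameter described above).
    """
    if not disable_native:
        try:
            data.decode("utf-8")  # stand-in for the native check
            return True
        except UnicodeDecodeError:
            return False
    return _UTF8_RE.match(data) is not None
```

Running the same inputs through both paths, as the unit tests do, is a cheap way to catch disagreements between the two implementations.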
Change-Id: I4cf4dfe2eb02f046db1726f4654ba649e01419f2
Make Language::isValidBuiltInCode() return true or false, as
documented. Previously it returned 0 or 1, which was worked around in
I81ba3228.
Change-Id: Iaa7515095d687d745d878faaa957aae51737abf7
This removes the suicide of LangObjCache and replaces it with an
LRU cache of configurable length.
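
As a rough illustration of what an LRU of configurable length means, here is a small Python sketch. The class name LruCache and its methods are hypothetical and do not reflect the PHP code in this patch.

```python
from collections import OrderedDict


class LruCache:
    """Keep at most max_size entries, evicting the least recently used."""

    def __init__(self, max_size: int):
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._max_size = max_size
        self._items = OrderedDict()  # oldest entry first

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def set(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self._max_size:
            self._items.popitem(last=False)  # drop the least recently used

    def __len__(self):
        return len(self._items)
```

Unlike a cache that throws everything away once it is full, this only drops the single entry that has gone unused the longest.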
The patch appeared to work when I dumped some debug info.
Patchset 2: Updated commit message, reformatted a few lines.
Patchset 3: Removed the "do not merge"
Change-Id: Iee2c796a0c4dd491e31425e04121a1bf0554d7c9
* IRIs are increasingly widely used, so Chinese characters in the text
of external links also need to be protected from conversion.
* All markNoConversion() functions in languages with variants now
do the same thing. Merge them into a single function in the
Language class and drop the implementations in individual languages.
* Also rephrase the phpdoc of that function, and (bug 24798) fix
the link detection regex to use wfUrlProtocolsWithoutProtRel().
The protocol-relative pattern is excluded to avoid false positives.
* Add a parser test for it.
Change-Id: I2ec0ac2b9b11221584adb72555168498de209d57
This reverts commit b218064865
Appears to have been merged prematurely. More comments were made after the merge, there is an i18n technicality that I5840cc2f would address, and there appear to be some design issues that have been discussed on wikitech-l in the thread http://lists.wikimedia.org/pipermail/wikitech-l/2012-October/064036.html
Uses one of the following formats:
- Just now
- 1 minute ago
- 35 minutes ago
- 13:04
- Yesterday at 13:04
- Wednesday at 13:04
- July 16
- July 16, 2012
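
A Python sketch of how such a format could be chosen follows. The function name human_timestamp and every threshold (one minute, one hour, seven days, same year) are assumptions made for illustration; the list above does not state the actual cut-offs. Month and weekday names come from strftime, so they are English only under the default C locale.

```python
from datetime import datetime


def human_timestamp(ts: datetime, now: datetime) -> str:
    """Pick one of the formats listed above; thresholds are assumed."""
    if ts > now:
        raise ValueError("ts must not be later than now")
    seconds = int((now - ts).total_seconds())
    if seconds < 60:
        return "Just now"
    if seconds < 3600:
        minutes = seconds // 60
        return "1 minute ago" if minutes == 1 else f"{minutes} minutes ago"
    clock = f"{ts:%H:%M}"
    days = (now.date() - ts.date()).days  # calendar days, not 24h blocks
    if days == 0:
        return clock
    if days == 1:
        return f"Yesterday at {clock}"
    if days < 7:
        return f"{ts:%A} at {clock}"
    if ts.year == now.year:
        return f"{ts:%B} {ts.day}"
    return f"{ts:%B} {ts.day}, {ts.year}"
```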
Change-Id: I53dcf54763c68f15fc4f59b2668001b0cf84adf3
rails-i18n has the same; let's see whether this is flexible
enough or whether we need to allow more complex expressions.
Change-Id: I50eb0c6d1c02ca936848d310de625ed1fe43d91a
* Fixes bug 40251; this is an alternative to I403a29e2
* This brings back the old MediaWiki behavior for languages without
defined plural rules
* Add a test for hu, which had an issue as per bug 40251
Change-Id: I345c305134a62d43c9dfedc5243981d0e77e326d
Class variables should be declared using public $foo instead of
var $foo. As we require PHP 5.3, we no longer need to keep that
PHP 4 leftover and can get rid of it in favour of the clearer,
more readable public.
See also: http://php.net/manual/en/language.oop5.visibility.php
(Divided into several commits to keep them reviewable)
Change-Id: Ic723d0347ab2e3c78bc0097345c68bbee3dc035a
We can now do this, since we finally switched to PHP 5.3 for MW 1.20, and get rid of the silly dirname(__FILE__) stuff :)
Change-Id: Id9b2c9cd2e678197aa81c78adced5d1d31ff57b1
Wrote a CLDR plural rule parser to replace the eval()-based one from
I58a9cdfe. It converts the infix notation of the XML files to a
sanitized RPN notation, referred to in external interfaces as the
"compiled" form. The RPN notation is cached and then executed by a
fast non-validating evaluator.
Timings for the largest rule in the XML file are ~1.2ms for
compilation and ~200us for execution.
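
To illustrate the compile-then-evaluate split, here is a Python sketch over a deliberately reduced grammar. It handles only n, integer literals, and the operators mod, is, is not, and, or, and it omits the range operators that real CLDR rules use. The names compile_rule and evaluate are hypothetical and this is not the parser from this change.

```python
# Higher number binds tighter; all operators are binary and left-associative.
PRECEDENCE = {"or": 1, "and": 2, "is": 3, "is-not": 3, "mod": 4}


def tokenize(rule: str) -> list:
    """Split on whitespace and fuse 'is not' into one 'is-not' token."""
    words = rule.split()
    tokens, i = [], 0
    while i < len(words):
        if words[i] == "is" and i + 1 < len(words) and words[i + 1] == "not":
            tokens.append("is-not")
            i += 2
        else:
            tokens.append(words[i])
            i += 1
    return tokens


def compile_rule(rule: str) -> list:
    """Shunting-yard: infix tokens to RPN, rejecting unknown tokens."""
    output, ops = [], []
    for tok in tokenize(rule):
        if tok in PRECEDENCE:
            while ops and PRECEDENCE[ops[-1]] >= PRECEDENCE[tok]:
                output.append(ops.pop())
            ops.append(tok)
        elif tok == "n" or tok.isdigit():
            output.append(tok)
        else:
            raise ValueError(f"unexpected token: {tok!r}")
    while ops:
        output.append(ops.pop())
    return output


def evaluate(rpn: list, n: int) -> bool:
    """Non-validating stack evaluator; trusts the compiled form."""
    stack = []
    for tok in rpn:
        if tok == "n":
            stack.append(n)
        elif tok.isdigit():
            stack.append(int(tok))
        else:
            right, left = stack.pop(), stack.pop()
            if tok == "mod":
                stack.append(left % right)
            elif tok == "is":
                stack.append(left == right)
            elif tok == "is-not":
                stack.append(left != right)
            elif tok == "and":
                stack.append(bool(left) and bool(right))
            else:  # "or"
                stack.append(bool(left) or bool(right))
    return bool(stack.pop())
```

Because all token checking happens in compile_rule, the cached RPN list can be run by the evaluator without re-parsing or re-checking the rule text.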
Also:
* Lazy-load the plural rules when recache() requests them, instead of
loading them for every request.
* Language::convertPlural() needs integer keys, and CLDR only gives
string keys. The previous code was not mapping them so it didn't work
at all. I just mapped them in the order they appear in the XML file,
i.e. the first rule becomes MediaWiki's $pluralForm=0, the second
becomes $pluralForm=1, etc. Not sure if there is a more rigorous way
to do it.
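
The ordering-based mapping can be sketched in Python as follows. The function plural_index is hypothetical, the rules are stand-in predicates rather than real CLDR data, and returning the rule count when nothing matches (for the catch-all form) is an assumption.

```python
def plural_index(rules, n):
    """Return the position of the first matching rule.

    rules is an ordered list of (name, predicate) pairs, in file order.
    If no rule matches, fall through to index len(rules) (assumed
    behaviour for the remaining catch-all form).
    """
    for index, (_name, matches) in enumerate(rules):
        if matches(n):
            return index
    return len(rules)


# Stand-in rules for illustration only, not taken from the XML file.
example_rules = [
    ("one", lambda n: n == 1),
    ("few", lambda n: 2 <= n <= 4),
]
forms = ["form-0", "form-1", "form-2"]
chosen = forms[plural_index(example_rules, 3)]  # second rule, so "form-1"
```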
Change-Id: I65ee788c1a8e5ee2ede2091990d86eb722749dd3
* Use the plurals.xml of CLDR for the plural rules of languages
* Use plurals-mediawiki.xml to override or extend the rules inside MW
* Remove the convertPlural method in each LanguageXX.php
* Parse and load the XML files in LocalisationCache
* Use CLDRPluralRuleEvaluator.php for parsing the CLDR plural rules
(This is taken from the Translate extension and might require a
replacement parser that does not use eval)
* Add getPluralRules() to make the CLDR plural rules available to JS.
PS3: More method documentation, cleanup
Change-Id: I58a9cdfe60c7b9027bf031c91370472054f04ae2
Extend Language::romanNumeral() to work up to 10,000
Does anyone know how to do letters with an underscore on top of them? ;)
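
For illustration, a minimal Python sketch of a conversion covering 1 to 10,000. The function name roman_numeral is hypothetical, and writing thousands as repeated M is an assumption made here because overlined letters are not available in plain ASCII.

```python
# Value/symbol pairs in descending order, including subtractive forms.
_ROMAN = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def roman_numeral(num: int) -> str:
    """Convert 1..10000 to Roman numerals, repeating M for thousands."""
    if not 1 <= num <= 10000:
        raise ValueError("num must be between 1 and 10000")
    parts = []
    for value, symbol in _ROMAN:
        count, num = divmod(num, value)  # greedy: take as many as fit
        parts.append(symbol * count)
    return "".join(parts)
```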
Change-Id: Ib1b1415126af855ce5fb55f81b71534c26d84cc9
@fixme is simply not recognized by Doxygen, whereas @todo is used to
generate a nice ... todo list!!
Change-Id: If956c0a164373126ce48b791d45c56962034eecd