The Chinese conversion table is substantially updated to fix many bugs
reported in recent years, and the script that generates the conversion
table (LanguageZh.php) is also modified to ease maintenance.
Zh-sg and zh-my are set to fall back to zh-cn to improve the reading
experience, since the differences among them are trivial, just as
with zh-hk and zh-mo. Further optimization for zh-sg and zh-my will be
performed in the local conversion tables of the Chinese WikiProjects.
Bug: T91620
Change-Id: I1bb0315d6d7a2c9653905654d933942e362bcc42
Apply the conversion variants from specific zones before zh-hans
and zh-hant, to allow fitting specific linguistic habits before
falling back to the generic ones. The actual rules will be added
in a followup patch.
Previously, the zh-cn table was composed by:
(1) Load zh2Hans as zh-hans table
(2) Load zh2CN + zh2Hans as zh-cn table
(3) Load Conversiontable/zh-hans + zh-hans as zh-hans table
(4) Load Conversiontable/zh-cn + zh-cn as zh-cn table
(5) Load zh-hans + zh-cn as the final zh-cn table
The new loading order is:
(1) Load zh2Hans as zh-hans table
(2) Load zh2CN as zh-cn table
(3) Load Conversiontable/zh-hans + zh-hans as zh-hans table
(4) Load Conversiontable/zh-cn + zh-cn as zh-cn table
(5) Load zh-cn + zh-hans as the final zh-cn table
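The effect of swapping the operands in step (5) can be sketched as follows. This is a minimal Python illustration, not the PHP code in the repository. It assumes that "+" in the lists above means PHP array union, where the left operand wins on a key clash. The helper php_union() and the one-entry tables are hypothetical.

```python
def php_union(left, right):
    """Mimic PHP's array `+`: on a key clash the left operand wins."""
    merged = dict(right)
    merged.update(left)
    return merged


# Hypothetical one-entry tables, for illustration only.
zh2Hans = {"軟體": "软体"}   # generic Simplified mapping
zh2CN = {"軟體": "软件"}     # zone-specific mapping
wiki_hans = {}               # stands in for Conversiontable/zh-hans
wiki_cn = {}                 # stands in for Conversiontable/zh-cn


def old_order():
    hans = dict(zh2Hans)                   # (1)
    cn = php_union(zh2CN, zh2Hans)         # (2)
    hans = php_union(wiki_hans, hans)      # (3)
    cn = php_union(wiki_cn, cn)            # (4)
    return php_union(hans, cn)             # (5) zh-hans wins


def new_order():
    hans = dict(zh2Hans)                   # (1)
    cn = dict(zh2CN)                       # (2)
    hans = php_union(wiki_hans, hans)      # (3)
    cn = php_union(wiki_cn, cn)            # (4)
    return php_union(cn, hans)             # (5) zh-cn wins


print(old_order()["軟體"], new_order()["軟體"])
```

Under these assumptions the old order lets the generic zh-hans entry shadow the zh-cn one, while the new order applies the zone-specific entry first.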
Change-Id: Ie9d08b85d4911618946fa7efd23eb898412449e5
Xhprof generates this data now. Custom profiling of various
sub-function units is kept.
Calls to profiler represented about 3% of page execution
time on Special:BlankPage (1.5% in/out); after this change
it's down to about 0.98% of page execution time.
Change-Id: Id9a1dc9d8f80bbd52e42226b724a1e1213d07af7
Swapped some "$var type" to "type $var" or added missing types
before the $var. Changed some other types to match the more common
spelling. Capitalized the beginning of some text.
Change-Id: I7a4dec6a8de96ee21ef34e52bb755f723aa3b0e6
Follows-up I1343872de7, Ia533aedf63 and I2df2f80b81.
Also updated usage in text in documentation and the
installer LocalSettingsGenerator.
Most of them were handled by this regex:
- find: (require|include|require_once|include_once)\s*\(\s*(.+?)\s*\)\s*;$
- replace: $1 $2;
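For reference, the find/replace pair above can be tried out with a short Python sketch. The function name strip_include_parens() is hypothetical, and Python writes the replacement as \1 \2; rather than $1 $2;. MULTILINE is assumed so that $ matches at the end of each line.

```python
import re

# Same pattern as in the commit message, applied line by line.
INCLUDE_RE = re.compile(
    r"(require|include|require_once|include_once)\s*\(\s*(.+?)\s*\)\s*;$",
    re.MULTILINE,
)


def strip_include_parens(source):
    """Rewrite `require_once( 'x' );` as `require_once 'x';`."""
    return INCLUDE_RE.sub(r"\1 \2;", source)


sample = (
    "require_once( 'maintenance/commandLine.inc' );\n"
    "include( dirname( __FILE__ ) . '/foo.php' );\n"
)
print(strip_include_parens(sample))
```

Lines with trailing comments after the semicolon do not match the pattern, which is consistent with only "most" uses being handled by the regex.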
Change-Id: I6b38aad9a5149c9c43ce18bd8edbab14b8ce43fa
This is deprecated as of PHP 5.5, and the remaining uses are quite
silly. Tim said I should remove his easter egg from Special:Version,
as it already was broken, and a new one can be added in a separate
commit.
Change-Id: I0f09f4efc7afe5933c8317462026a475530a5324
* IRIs are increasingly widely used, so Chinese characters in the
text of external links also need to be protected from conversion.
* As a result, all markNoConversion() functions in languages with
variants now do the same thing. Merge them into a single function in
the Language class and drop implementations in individual languages.
* Also rephrase the phpdoc of that function, and (bug 24798) fix
the link detection regex to use wfUrlProtocolsWithoutProtRel().
The protocol-relative pattern is excluded to avoid false positives.
* Add parser test for it.
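The behaviour described in the list above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the merged PHP function. URL_PROTOCOLS is a hypothetical subset standing in for the output of wfUrlProtocolsWithoutProtRel(), and the "-{R|...}-" wrapper is assumed to be the no-conversion markup.

```python
import re

# Hypothetical subset of protocols; "//" (protocol-relative) is left out
# on purpose, as in the commit message.
URL_PROTOCOLS = ("http://", "https://", "ftp://", "mailto:")
URL_PREFIX_RE = re.compile(
    "(?:" + "|".join(re.escape(p) for p in URL_PROTOCOLS) + ")"
)


def mark_no_conversion(text, no_parse=False):
    """Wrap URL-like link text so the converter leaves it alone."""
    if not (no_parse or URL_PREFIX_RE.match(text)):
        return text
    # Do not mark text that already carries converter markup.
    if "-{" in text or "}-" in text:
        return text
    return "-{R|" + text + "}-"


print(mark_no_conversion("http://例子.測試/頁面"))
print(mark_no_conversion("//example.org/頁面"))
```

Plain link text and protocol-relative text pass through unchanged in this sketch, so only text that starts with a listed protocol is shielded.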
Change-Id: I2ec0ac2b9b11221584adb72555168498de209d57
* Rewrote convertArray() as an RD parser (with inline tokenizer) as suggested on CR r60986. Fixes unclosed rule issue (with parser test). Fixes O(N^2) timing.
* Removed $this->mMarkup abstraction. Life is complicated enough as it is.
* Replaced a couple of instances of explode() with StringUtils::explode(), limited element count in a couple more.
In ConverterRule:
* Removed mConvTable initialisation from the constructor, unnecessary
* Optimised the "-{xxx}-" tight loop by replacing function calls such as count() and in_array() with language constructs such as isset(). Reduced execution time from 356us to 275us.
* Cached $varsep_pattern for further reduction to 243us.
* A couple more parseFlags() hacks bring it down to 230us.
* Split out $this->mVariantFlags from $this->mFlags. Rearranged flag detection into a foreach/switch to avoid unnecessary isset() calls. 189us.
* Added a special-case optimisation to generateConvTable() for the case where there are no tables defined inline in the article. 116us.
* Fixed bug from r37499: "!R || !N" is always true since they are mutually exclusive, "!R && !N" was intended (with parser test).
* Fixed E_NOTICE from "-{N|foo}-"
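The r37499 fix mentioned above comes down to a small piece of boolean reasoning, which can be checked with this Python sketch. The function names are hypothetical, and R and N are modelled as plain booleans that are never both set.

```python
from itertools import product


def buggy_check(r, n):
    """The condition as written in r37499: `!R || !N`."""
    return (not r) or (not n)


def intended_check(r, n):
    """The condition that was intended: `!R && !N`."""
    return (not r) and (not n)


# R and N are mutually exclusive, so (True, True) never occurs.
states = [
    (r, n) for r, n in product([False, True], repeat=2) if not (r and n)
]

for r, n in states:
    print(r, n, buggy_check(r, n), intended_check(r, n))
```

For every reachable state the "||" form is true, so it filters nothing. The "&&" form is true only when neither flag is set.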
2. Patch for the situation where some wikis, such as zhwikisource, may have disabled some language variants. These disabled variants should be treated as unacceptable in LanguageConverter.
Big fixup for Chinese word breaks and variant conversions in the MySQL search backend...
- removed redundant variant terms for Chinese, which forces all search indexing to canonical zh-hans
- added parens to properly group variants for languages such as Serbian which do need them at search time
- added quotes to properly group multi-word terms coming out of stripForSearch, as for Chinese where we segment up the characters. This is based on Language::hasWordBreaks() check.
- also cleaned up LanguageZh_hans::stripForSearch() to just do segmentation and pass on the Unicode stripping to the base Language implementation, avoiding scary code duplication. Segmentation was already pulled up to LanguageZh, but was being run again at the second level. :P
- made a fix to Chinese word segmentation to handle the case where a Han character is followed by a Latin char or numeral; a space is now added after as well. Spaces are then normalized for prettiness.
PHP Notice: Undefined property: FakeConverter::$mMainLanguageCode in /var/www/w/languages/Language.php on line 2230
PHP Notice: Undefined property: FakeConverter::$mVariants in /var/www/w/languages/Language.php on line 2233
PHP Warning: in_array(): Wrong datatype for second argument in /var/www/w/languages/Language.php on line 2233
PHP Notice: Undefined property: FakeConverter::$mMainLanguageCode in /var/www/w/languages/Language.php on line 2234
Uses $dir in extension files, and assumes that it remains unchanged in require_once( 'maintenance/commandLine.inc' ).
In fact, it is likely that '$dir' will be set when setting up command-line, as some extensions will use the same var.
Recommended fix: Use $CentralAuth_dir, $EmailPage_dir, etc.
requiring customization of MySQL server settings
Short words are padded so they now get indexed. Yay!
Adapted part of Werdna's patch, with some additional cleanup:
* Using 'U00' to pad instead of 'SMALL' to reduce false positives (eg search for "small*" could match "Smallville" and "SMALLc")
* Checking server's ft_min_word_len variable to see if we need to do anything. This preserves index compatibility with existing installations which have customized their index length.
* Some further cleanup on redundant code -- just toss everything through lc() and be done with it :D
* Cleaned out some more evals in zh and yue classes :P
* Fixed yue class to call the parent adjustor properly
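The short-word padding described above can be sketched as follows. This is a Python illustration under several assumptions: the pad is appended once to each word shorter than the server's minimum length, the text is lowercased first, and min_word_len stands in for the value read from ft_min_word_len. The function name is hypothetical.

```python
import re


def pad_short_words(text, min_word_len, pad="U00"):
    """Append `pad` to words shorter than `min_word_len`."""
    text = text.lower()
    if min_word_len <= 1:
        # Nothing is too short to index, so leave the text alone.
        return text
    short_word = re.compile(r"\b(\w{1,%d})\b" % (min_word_len - 1))
    return short_word.sub(lambda m: m.group(1) + pad, text)


print(pad_short_words("An ox ran to Smallville", 4))
```

Because the pad is uppercase and the text has been lowercased, a prefix search for "small*" would not run into the padding in this sketch.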
Doxygen documentation update:
* Changed all @addtogroup to @ingroup. @addtogroup adds the comment to the group description, but doesn't add the file, class, function, ... to the group like @ingroup does. See for example http://svn.wikimedia.org/doc/group__SpecialPage.html where it's impossible to see related files, classes, ... that should belong to that group.
* Added @file to file descriptions; it seems that it must be explicitly declared for file descriptions, otherwise doxygen will think that the comment documents the first class, variable, function, ... that is in that file.
* Removed some empty comments
* Removed some ?>
Added following groups:
* ExternalStorage
* JobQueue
* MaintenanceLanguage
One more thing: there are still a lot of warnings when generating the doc.
* Fix some scripts that assumed include_path was set with various additional directories
Stuff now seems to mostly work when not overriding include_path.
Taking that out of LocalSettings is the next step... whee!