Commit graph

1069 commits

Author SHA1 Message Date
Alexandre Emsenhuber
98b7af8d12 * (bug 22876) Avoid possible PHP Notice if $wgDefaultUserOptions is not correctly set
Per Nikerabbit, use User::getDefaultOption()
2010-03-27 16:15:17 +00:00
Mark A. Hershberger
560b72c4ab * Implement normalization of fullwidth latin characters for all Languages, not just Japanese and Chinese.
* Tune Language::convertDoubleWidth() so that it is 8-10x faster.  (See http://xrl.us/bg2mon)
2010-03-23 19:50:59 +00:00
Max Semenik
02729fedab Tweaked LanguageConverter to use spaces in formatTimePeriod, things like "1h23m45s" look really ugly 2010-03-23 08:57:59 +00:00
Mark A. Hershberger
b0e1bc1b19 off-by-one error: the fullwidth z was not being converted. 2010-03-15 17:31:29 +00:00
Mark A. Hershberger
54badce2d8 Follow-up r61856
* Rename wordSegmentation() to segmentByWord().
* Consolidate search index locking and iteration to Maintenance.php
* Add maintenance/updateDoubleWidthSearch.php to take care of new
  format for normalized double-width roman characters.
* Add error checking to updateSearchIndex.php for creating $posFile.
* Add note to UPGRADE about running updateDoubleWidthSearch.php.
2010-03-10 21:54:23 +00:00
Aaron Schulz
8f39d2af96 Doc typo 2010-03-07 23:47:00 +00:00
Aaron Schulz
62a79446e3 * truncate() comment fix
* truncateHtml() tweaks:
** Fixed miscount of remaining length wrt. entities
** Improved performance of "ellipsis makes string longer" check
2010-03-07 12:32:21 +00:00
Aaron Schulz
864666153c r62907: removed static calls, renamed helper functions 2010-02-26 07:11:01 +00:00
Aaron Schulz
e834c0b9b2 * Moved truncateHtml() to language.php
* Renamed $maxLen -> $length
* Made $length=0 case match truncate()
2010-02-24 04:14:45 +00:00
Alexandre Emsenhuber
a73327494e Fixed some doxygen warnings 2010-02-22 21:34:19 +00:00
Raimond Spekking
e6ad4da6f3 Simplify a bit per Platonides CR http://www.mediawiki.org/wiki/Special:Code/MediaWiki/61661#c5509 2010-02-11 21:21:22 +00:00
Philip Tzou
d6b6766f3a Follow up r60742, r60743, r60764, r60766, r61214, r61390. Split stripForSearch into wordSegmentation and normalizeForSearch, so that wordSegmentation can be called by search engines separately. 2010-02-02 15:09:01 +00:00
Raimond Spekking
ae8e8bcdd0 * (bug 22181) Do not truncate if the ellipsis actually makes the string longer 2010-01-29 13:39:06 +00:00
Max Semenik
eff719b75d Fixed r61214: moved MySQL munging to SearchEngine, updated calls. Can we kill $doStrip now? 2010-01-22 20:36:26 +00:00
Max Semenik
f6dee6c1e7 Factored MySQL-specific munging out of Language::stripForSearch() to DatabaseMysql. This will also allow other backends to seamlessly provide their own munging algorithms in the future. 2010-01-18 20:54:43 +00:00
Mark A. Hershberger
2800ca2db7 follow up r60832 and follow up r60763
* Don't set Parser::$mTitle to random garbage.
* Remove ParserOutput::$displayTitle, make setDisplayTitle() and
  getDisplayTitle() wrappers for their *TitleText() equivalents.
* Remove Parser::$mDo*Convert member variables, move test for
  $mDoubleUnderScores[] directives closer to the action.
* Remove bogus "global $wgContLang".
* Use accessor to get at $mConvRuleTitle
* Fix up showtitle option in parserTests.inc
* TODO: refactor FakeConverter class away
2010-01-15 19:14:23 +00:00
Mark A. Hershberger
8148861c05 trailing whitespace fixup before I fix(??) a bug. 2010-01-11 04:23:41 +00:00
Mark A. Hershberger
934c2bcd50 follow up r60763
Recover the -{T| }- rule.  Add the ability to test for it to the parserTests and add a test for it.  Add a couple of disabled tests that I think demonstrate bugs in the LanguageTranslator
2010-01-08 08:22:19 +00:00
Philip Tzou
8bbfbf5628 follow-up r60743.
1. Changed the conditions, not only for LuceneSearch, but also more commonly to others.
2. Reduced code duplication.
2010-01-07 04:50:32 +00:00
Mark A. Hershberger
c568220e61 Refactor LanguageConversion so that title conversion isn't so flimsy. Pull MagicWord detection into Parser->doDoubleUnderscore() && remove ParserConvert. 2010-01-07 04:13:14 +00:00
Philip Tzou
38e71e6a2f Change or to ||. 2010-01-06 20:17:01 +00:00
Philip Tzou
339f0bb3d9 1. Add conditions to stripForSearch for LuceneSearch / MWSearch.
2. Add double-width roman characters conversion support to zh, gan, and yue.
2010-01-06 19:51:29 +00:00
Roan Kattouw
bfbbd2e8a9 Followup to r60271: pass the delimiter to preg_quote() per CR comment 2010-01-06 19:24:26 +00:00
Siebrand Mazeland
1021d7b7ce Follow-up to r60721. Broken maintenance/language/rebuildLanguage.php and probably more. Fix by Nikerabbit. 2010-01-06 18:59:32 +00:00
Roan Kattouw
f0dda2c5fa Per CR on r58358, refactor obtaining the language code from a filename into Language::getCodeFromFileName() and use it in Language::getLanguageNames() and LocalisationUpdate 2010-01-06 10:20:38 +00:00
Tim Starling
ad19c032b0 Fix for bug 9413 and the related Malayalam issue reported on wikitech-l.
* Added $wgFixArchaicUnicode, which, if enabled, converts some deprecated Unicode sequences in Arabic and Malayalam text to their Unicode 5.1 equivalents.
* Added generateNormalizerData.php to generate the relevant data files. Added the generated data files also. 
* Made most things call the new wrapper method $wgContLang->normalize() instead of UtfNormal::cleanUp(), so that Unicode normalization can be customised on a per-language basis.
* Added some generic support for conversion tables to Language so that subclasses can easily implement these kinds of transformations.
2010-01-04 08:28:50 +00:00
Philip Tzou
3a3ce3fd7f follow-up r59522 and r59541. Clarify the condition under which we use Accept-Language in Vary and XVO. 2009-12-04 15:47:25 +00:00
Brion Vibber
7d67e73e84 Merge live hack from wmf-deployment r53208: profiling for LanguageGetMagic hook call 2009-09-14 21:30:01 +00:00
Tim Starling
cb2f984e0d Reverted breakage of non-ASCII message keys, Domas says that's not allowed. Optimised Language::lcfirst and Language::ucfirst() instead.
Timings in microseconds for ASCII no-change, ASCII change, non-ASCII no-change, non-ASCII change:
lcfirst: 1.8, 3.6, 21.2, 22.1
ucfirst: 1.5, 2.3, 21.1, 21.7
2009-08-28 17:58:54 +00:00
Brion Vibber
71432fb487 Pet peeve time: reduce clutter from common $wgContLang->isRTL() ? 'x' : 'y' pattern. :)
Introduced helpers:
  $lang->getDir() returns 'ltr' or 'rtl' for HTML 'dir' attrib
  $lang->alignStart() returns 'left' or 'right' for HTML 'align' attrib or CSS 'text-align' property
  $lang->alignEnd() returns 'right' or 'left'

And cleaned up a couple arrays of icons to just reverse the order of items rather than repeating the items twice for each possibility.
2009-08-22 01:24:04 +00:00
Alexandre Emsenhuber
6c3adddb6b * (bug 20296) Fixed a PHP warning in Parser::preSaveTransform() in PHP 5.3: Parameter 1 was expected to be a reference but value given when unstubbing $wgContLang 2009-08-21 20:18:20 +00:00
Philip Tzou
be0cb93759 Follow up on r46020 and r46489. Improve the $wgContLang->convert() calling procedure on CategoryPage. 2009-07-26 15:54:11 +00:00
Niklas Laxström
1fadd47eb2 (bug 16885) Silence warnings about invalid characters in input string too with iconv 2009-07-07 09:56:53 +00:00
Tim Starling
04e972fd2a Tweak docs 2009-07-03 06:55:30 +00:00
Tim Starling
23cfebd3d2 * Introduced a new system for localisation caching. The system is based around fast fetches of individual messages, minimising memory overhead and startup time in the typical case. It handles both core messages (formerly in Language.php) and extension messages (formerly in MessageCache.php). Profiling indicates a significant win for average throughput.
* The serialized message cache, which would have been redundant, has been removed. Similar performance characteristics can be achieved with $wgLocalisationCacheConf['manualRecache'] = true;
* Added a maintenance script rebuildLocalisationCache.php for offline rebuilding of the localisation cache.
* Extension i18n files can now contain any of the variables which can be set in Messages*.php. It is possible, and recommended, to use this feature instead of the hooks for special page aliases and magic words. 
* $wgExtensionAliasesFiles, LanguageGetMagic and LanguageGetSpecialPageAliases are retained for backwards compatibility. $wgMessageCache->addMessages() and related functions have been removed. wfLoadExtensionMessages() is a no-op and can continue to be called for b/c. 
* Introduced $wgCacheDirectory as a default location for the various local caches that have accumulated. Suggested $IP/cache as a good place for it in the default LocalSettings.php and created this directory with a deny-all .htaccess.
* Patched Exception.php to avoid using the message cache when an exception is thrown from within LocalisationCache, since this tends to fail horribly.
* Removed Language::getLocalisationArray(), Language::loadLocalisation(), Language::load()
* Fixed FileDependency::__sleep()
* In Cdb.php, fixed newlines in debug messages

In MessageCache::get(): 
* Replaced calls to $wgContLang capitalisation functions with plain PHP functions, reducing the typical case from 99us to 93us. Message cache keys are already documented as being restricted to ASCII.
* Implemented a more efficient way to filter out bogus language codes, reducing the "foo/en" case from 430us to 101us
* Optimised wfRunHooks() in the typical do-nothing case, from ~30us to ~3us. This reduced MessageCache::get() typical case time from 93us to 38us.
* Removed hook MessageNotInMwNs to save an extra 3us per cache hit. Reimplemented the only user (LocalisationUpdate) using the new hook LocalisationCacheRecache.
2009-06-28 07:11:43 +00:00
Brion Vibber
ceedb37941 * (bug 8445) Multiple-character search terms are now handled properly for Chinese
Big fixup for Chinese word breaks and variant conversions in the MySQL search backend...
- removed redundant variant terms for Chinese, which forces all search indexing to canonical zh-hans
- added parens to properly group variants for languages such as Serbian which do need them at search time
- added quotes to properly group multi-word terms coming out of stripForSearch, as for Chinese where we segment up the characters. This is based on Language::hasWordBreaks() check.
- also cleaned up LanguageZh_hans::stripForSearch() to just do segmentation and pass on the Unicode stripping to the base Language implementation, avoiding scary code duplication. Segmentation was already pulled up to LanguageZh, but was being run again at the second level. :P
- made a fix to Chinese word segmentation to handle the case where a Han character is followed by a Latin char or numeral; a space is now added after as well. Spaces are then normalized for prettiness.
2009-06-24 02:27:51 +00:00
Remember the dot
034114ee69 Follow-up to r49331: Moved decapitalization code to "a Messages*.php property, a <body>
class and a descendant selector, like we do for RTL", as requested by Tim Starling in r51924
2009-06-17 04:26:06 +00:00
Tim Starling
1179421217 Reverted r48984. Fragile, doesn't work if memcached is enabled. See CodeReview. 2009-06-03 14:51:08 +00:00
Philip Tzou
5336b9ba6f 1. Follow up on r49157, r50902 and r50938. According to RFC 2616 section 14.4, language code names should always use '-', not '_'.
2. metadata 'keywords' should have all variant forms of keyword.
2009-05-30 05:07:46 +00:00
Tim Starling
323c74c734 Reverted r49855, r49656, r49401, r49399, r49397. The language converter cannot be used outside the parser at present without generating a large number of bugs, due to global lifetime state variables, inappropriate $wgParser references, etc. Some refactoring needs to be done before it can be used in this way. 2009-05-26 07:46:29 +00:00
Niklas Laxström
12de0ff83d Revert a bit of too much escaping 2009-05-22 14:35:23 +00:00
Shinjiman
2f5a335a00 follow up r50804, adding two further Japanese era names. 2009-05-20 15:49:30 +00:00
Shinjiman
46e9566793 follow up r50804, fix undefined variable error. 2009-05-20 15:08:18 +00:00
Shinjiman
3c5e65927f * (bug 18849) Added Japanese and North Korean calendars support 2009-05-20 01:53:39 +00:00
Shinjiman
2f2bdad912 added Minguo calendar support for the Taiwan Chinese language 2009-05-19 16:38:21 +00:00
Alexandre Emsenhuber
45486d8c2c PHP doesn't have a "none" constant, changed to "null" 2009-04-11 11:24:19 +00:00
Philip Tzou
0b5569d94e A new optional param for LanguageConverter::convert(), to enable a new function named 'convert()' which was added to AbuseFilter. 2009-04-11 10:56:09 +00:00
Alexandre Emsenhuber
e65749b02b Per Sunwell5's report on IRC and my talk page: fixed a PHP notice when $wgEnableSerializedMessages was set to false 2009-04-03 20:07:01 +00:00
Philip Tzou
3cf3ea5f0b Add group conversion support for LanguageConverter. New magic word "{{GROUPCONVERT:xxx}}" enabled for this new feature. You can set related conversion rules in [[MediaWiki:Groupconversiontable-xxx]]. 2009-03-29 08:55:45 +00:00
Raimond Spekking
8d7026713f * Add Language::semicolonList() function
** Todo: combine all three list functions (comma, semicolon, pipe) into one function with a parameter?
* Use pipe as backlink separator to be consistent with other navigation elements
* Show the colon for case 'afh_actions' only if parameters exist
** Remove the now useless message
* Localize the usages of comma and semicolon
2009-03-06 10:56:37 +00:00