
6 commits

Author SHA1 Message Date
Kevin Israel
5b48bf1b92 Clean up after "Kill mbstring fallbacks"
* Removed fallback code from Language, the associated data file
  (Utf8Case.ser), and the code to generate that data file.
* Removed comment in LanguageFi that "mb_substr has a compatibility
  function in GlobalFunctions.php".
* Removed check for mbstring in bench_utf8_title_check.php.
* In the tests for StringUtils::isUtf8():
  * Removed separate test for the non-mbstring code path.
  * Removed mentions of mbstring from function names and assertion
    messages, since mb_check_encoding() is now always used.
* Also updated the comment in StringUtils::isUtf8() referring to
  PHP 5.3, which is no longer supported in MediaWiki, to indicate
  that the same issue also exists in old versions of HHVM. (If
  we don't have to support HHVM 3.4 or older, then the function could
  be deprecated and removed if desired.)
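
For illustration only, here is a minimal Python sketch of the property StringUtils::isUtf8() relies on: a strict UTF-8 validity check. It rejects overlong encodings, UTF-16 surrogate code points, code points above U+10FFFF and truncated sequences. The sketch uses Python's built-in strict UTF-8 decoder, not mb_check_encoding(), and is_valid_utf8() is a hypothetical name. It does not reproduce the PHP 5.3 / old HHVM issue, because this message does not describe that issue.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: return True if `data` is well-formed UTF-8.

    Python's strict UTF-8 codec rejects overlong forms, surrogate code
    points (U+D800..U+DFFF), values above U+10FFFF and truncated sequences.
    """
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(b"caf\xc3\xa9"))       # True
print(is_valid_utf8(b"\xc0\xaf"))          # False (overlong "/")
print(is_valid_utf8(b"\xed\xa0\x80"))      # False (surrogate U+D800)
print(is_valid_utf8(b"\xf4\x90\x80\x80"))  # False (U+110000)
```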

Follows-up 943563062f.

Change-Id: I55e5cd534b849c6ea06a7fadacbbf34a12d87ebe
2016-04-07 09:02:37 -04:00
Kevin Israel
74557dedd0 Generate Utf8Case.ser directly from UnicodeData.txt
This allows getting rid of serialized/serialize.php. I also moved
includes/normal/Utf8CaseGenerate.php to
maintenance/language/generateUtf8Case.php and updated it to subclass
Maintenance, as it seems to be largely unrelated to normalization.

Using version 6.0.0 of UnicodeData.txt, the updated script generates
exactly the same serialized output as was previously checked in.

Also updated the Makefile to reflect the current set of .ser files
and added some .gitignore entries.

Change-Id: I05afece3dc4505a9f43993ac4d7726b37d9c6956
2014-01-06 18:22:24 -05:00
Tim Starling
23cfebd3d2 * Introduced a new system for localisation caching. The system is based around fast fetches of individual messages, minimising memory overhead and startup time in the typical case. It handles both core messages (formerly in Language.php) and extension messages (formerly in MessageCache.php). Profiling indicates a significant win for average throughput.
* The serialized message cache, which would have been redundant, has been removed. Similar performance characteristics can be achieved with $wgLocalisationCacheConf['manualRecache'] = true;
* Added a maintenance script rebuildLocalisationCache.php for offline rebuilding of the localisation cache.
* Extension i18n files can now contain any of the variables which can be set in Messages*.php. It is possible, and recommended, to use this feature instead of the hooks for special page aliases and magic words. 
* $wgExtensionAliasesFiles, LanguageGetMagic and LanguageGetSpecialPageAliases are retained for backwards compatibility. $wgMessageCache->addMessages() and related functions have been removed. wfLoadExtensionMessages() is a no-op and can continue to be called for backwards compatibility.
* Introduced $wgCacheDirectory as a default location for the various local caches that have accumulated. Suggested $IP/cache as a good place for it in the default LocalSettings.php and created this directory with a deny-all .htaccess.
* Patched Exception.php to avoid using the message cache when an exception is thrown from within LocalisationCache, since this tends to fail horribly.
* Removed Language::getLocalisationArray(), Language::loadLocalisation(), Language::load()
* Fixed FileDependency::__sleep()
* In Cdb.php, fixed newlines in debug messages

In MessageCache::get(): 
* Replaced calls to $wgContLang capitalisation functions with plain PHP functions, reducing the typical case from 99us to 93us. Message cache keys are already documented as being restricted to ASCII.
* Implemented a more efficient way to filter out bogus language codes, reducing the "foo/en" case from 430us to 101us
* Optimised wfRunHooks() in the typical do-nothing case, from ~30us to ~3us. This reduced MessageCache::get() typical case time from 93us to 38us.
* Removed hook MessageNotInMwNs to save an extra 3us per cache hit. Reimplemented the only user (LocalisationUpdate) using the new hook LocalisationCacheRecache.
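
The message does not say how wfRunHooks() was made faster. One common way to make the do-nothing case cheap is to return before any per-handler work when no handlers are registered. The Python sketch below shows that idea using a hypothetical HOOKS registry and a hypothetical run_hooks() function. The rule that a handler returning False stops later handlers is modelled on MediaWiki hooks but is an assumption here.

```python
from typing import Callable, Dict, List

# Hypothetical registry: hook name -> handlers registered for it.
HOOKS: Dict[str, List[Callable[..., object]]] = {}


def run_hooks(name: str, *args: object) -> bool:
    """Run the handlers for `name`; return False if one aborted the chain."""
    handlers = HOOKS.get(name)
    if not handlers:
        # Fast path: the typical do-nothing case costs one dict lookup.
        return True
    for handler in handlers:
        # Assumed convention: a handler returning False stops later handlers.
        if handler(*args) is False:
            return False
    return True
```

With this layout, the cost of calling an unused hook does not depend on how many other hooks are registered.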
2009-06-28 07:11:43 +00:00
Brion Vibber
c012a63d95 * (bug 13615) Update case mappings and normalization to Unicode 5.1.0
Note that case mappings will only be used if the mbstring extension is not present.

Normalization data files updated to Unicode 5.1.0; passes the automated tests.

Seem to have long since lost the script I originally used to generate the Utf8Case.php mapping file, which appears not to have been updated since 2002 or so. :)
Made a new one and moved it into the UtfNormal sub-library.

Note a couple of limitations:
* Case mapping (still) uses only the 1:1 simple mappings. Any full or locale-specific mappings are ignored.
* These case mappings are not used anyway when the PHP mbstring extension is available; mbstring's case conversion functions are used instead, with whatever version of Unicode support and whatever complex mapping support they may or may not have.
* The generated Utf8Case.php file is not used directly -- you must also regenerate the serialized version in the 'serialized' directory after updating it to a new Unicode version.
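
To make the "1:1 simple mappings" limitation concrete, the Python sketch below builds uppercase and lowercase maps from UnicodeData.txt lines. It uses only the simple mapping fields: field 12 for uppercase and field 13 for lowercase, counting from 0. This is not the generator script from this commit. parse_simple_case_maps() and the sample lines are illustrative. U+00DF (ß) gets no entry, because its uppercase form "SS" exists only as a full mapping in SpecialCasing.txt.

```python
def parse_simple_case_maps(lines):
    """Build 1:1 upper/lower maps from UnicodeData.txt-style lines.

    Only field 12 (simple uppercase) and field 13 (simple lowercase) are
    read; full and locale-specific mappings are ignored entirely.
    """
    upper, lower = {}, {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        char = chr(int(fields[0], 16))
        if fields[12]:
            upper[char] = chr(int(fields[12], 16))
        if fields[13]:
            lower[char] = chr(int(fields[13], 16))
    return upper, lower


SAMPLE_LINES = [
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041",
    "00DF;LATIN SMALL LETTER SHARP S;Ll;0;L;;;;;N;;;;;",
]
upper, lower = parse_simple_case_maps(SAMPLE_LINES)
print(upper, lower)  # {'a': 'A'} {'A': 'a'}
```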
2008-05-08 06:28:50 +00:00
Jimmy Collins
6e3f572dc1 fix for new directory structure
2006-10-04 18:50:39 +00:00
Tim Starling
43b2fb56b6 Merged localisation-work branch:
* Made lines from initialiseMessages() appear as list items during installation
* Moved the bulk of the localisation data from the Language*.php files to the Messages*.php files. Deleted most of the Language*.php files.
* Introduced "stub global" framework to provide deferred initialisation of core modules. 
* Removed placeholder values for $wgTitle and $wgArticle; these variables will now be null during the initialisation process, until they are set by index.php or another entry point.
* Added DBA cache type, for BDB-style caches. 
* Removed custom date format functions, replacing them with a format string in the style of PHP's date(). Used string identifiers instead of integer identifiers, in both the language files and user preferences. Migration should be transparent in most cases.
* Simplified the initialisation API for LoadBalancer objects.
* Removed the broken altencoding feature.
* Moved default user options and toggles from Language to User. Language objects are still able to define default preference overrides and extra user toggles, via a slightly different interface.
* Don't include the date option in the parser cache rendering hash unless $wgUseDynamicDates is enabled.
* Merged LanguageUtf8 with Language. Removed LanguageUtf8.php. 
* Removed inclusion of language files from the bottom of Language.php. This is now consistently done from Language::factory(). 
* Add the name of the executing maintenance script to the debug log. Start the profiler during maintenance scripts.
* Added "serialized" directory, for storing precompiled data in serialized form.
2006-07-26 07:15:39 +00:00