* Missing cases for StartupModule::getModuleRegistrations
(now 100% covered)
- Raw modules are omitted from the manifest.
E.g. The base modules ('jquery', 'mediawiki') are raw modules
that we don't register client side (they can't load themselves).
- Exceptions from getVersionHash() are caught.
- Oversized versions are re-hashed.
* Missing cases for ResourceLoader::makeLoaderRegisterScript.
(now 100% covered)
* Missing cases for ResourceLoader::getModule.
(now 100% covered)
Change-Id: If9717a48195fc6ae776da5d0e86f323d7f60426d
Follows-up 6fa1e56. This is already fixed for http caches by
shortening the Cache-Control max-age in case of a version mismatch.
However the client still cached it blindly in mw.loader.store.
Resolve this by communicating to the client what version of the module
was exported. The client can then compare this version to the version
it originally requested and decide not to cache it.
Adopt the module key format (name@version) from mw.loader.store
in mw.loader.implement() as well.
Bug: T117587
Change-Id: I1a7c44d0222893afefac20bef507bdd1a1a87ecd
* Fix signature of makeLoaderSourcesScript() to match
the change in behaviour since e103ba265.
* Consistently order providers before the test.
* Simplify testRegisterValid() and remove needless @depends.
* Remove unused private method stripNoflip().
Coverage:
* Expand test coverage for register().
* Add tests for getModuleNames().
* Add tests for getModule().
* Expand test coverage for addSource().
(case of invalid array)
* Expand test coverage for makeLoaderImplementScript().
(case of unwrapped user script, and case of invalid scripts)
* Add tests for makeLoaderSourcesScript().
Change-Id: Ibca3e486fcd3664f171f135327a0f340ee6da9ee
SHA-1 is not secure enough to be used as a cryptographic hash function, and its
implementation in JavaScript is too long and too slow for it to be a good
general-purpose hash function. And we currently throw away most of the work:
SHA-1 produces 160-bit hash values, of which we keep 48.
Although the JavaScript implementation is not exported, SHA-1 is a well-known
hash function, and I'm willing to bet that sooner or later someone will move to
make it accessible to other modules, at which point usage will start to spread.
For ResourceLoader, the qualities we're looking for in a hash function are:
* Already implemented in PHP
* Easy to implement in JavaScript
* Fast
* Collision-resistant
The requirement that hashes be cheap to compute in JavaScript narrows the field
to 32-bit hash functions, because in JavaScript bitwise operators treat their
operands as 32 bits, and arithmetic uses double-precision floats, which have a
total precision of 53 bits. It's possible to work around these limitations, but
it's a lot of extra work.
The best match I found is the 32-bit variant of FNV-1, which is available in
PHP as of version 5.4 (as 'fnv1a32'). The fnv132 JavaScript function is
around ten times faster and eight times shorter than sha1.
Change-Id: I1e4fb08d17948538d96f241b2464d594fdc14578
MessageBlobStore class:
* Make logger aware.
* Log an error if json encoding fails.
* Stop using the DB table. WANObjectCache supports everything we need:
- Batch retrieval.
- Invalidate keys with wildcard selects or cascading check keys.
* Update tests slightly since the actual update now happens on-demand as
part of get() instead of within updateMessage().
ResourceLoader class:
* Remove all interaction with the msg_resource table. Remove db table later.
* Refactor code to use a hash of the blob instead of a timestamp.
Timestamps are unreliable and roll over too frequently for message blob store
because there is no authoritative source. The timestamps were inferred based on
when a change is observed. Message overrides from the local wiki have an
explicit update event when the page is edited. All other messages, such as
from MediaWiki core and extensions using LocalisationCache, have a single
timestamp for all messages which rolls over every time the cache is rebuilt.
A hash is deterministic, and won't cause needless invalidation (T102578).
* Remove redundant pre-fetching in makeModuleResponse().
This is already done by preloadModuleInfo() in respond().
* Don't bother storing and retreiving empty "{}" objects.
Instead, detect whether a module's message list is empty at runtime.
ResourceLoaderModule class:
* Make logger aware.
* Log if a module's message blob was not preloaded.
cleanupRemovedModules:
* Now that blobs have a TTL, there's no need to prune old entries.
Bug: T113092
Bug: T92357
Change-Id: Id8c26f41a82597e34013f95294cdc3971a4f52ae
To avoid repeating this too much and make it easier to update when
underlying components incorporated into the version hash change.
Change-Id: I986988c07f5f888046b1d93c7e72f2e9585893cf
The code is easier to maintain in an actual JavaScript file.
Especially with how variables were declared and concatenated in
a different order.
Change-Id: I758acb78de1cdf2128e81c86f992807ef0dbf444
This greatly simplifies logic required to compute module versions.
It also makes it significantly less error-prone.
Since f37cee996e, we support hashes as versions (instead of timestamps).
This means we can build a hash of the content directly, instead of compiling a
large array with all values that may influence the module content somehow.
Benefits:
* Remove all methods and logic related to querying database and disk for
timestamps, revision numbers, definition summaries, cache epochs, and more.
* No longer needlessly invalidate cache as a result of no-op changes to
implementation datails. Due to inclusion of absolute file paths in the
definition summary, cache was always invalidated when moving wikis to newer
MediaWiki branches; even if the module observed no actual changes.
* When changes are reverted within a certain period of time, old caches can now
be re-used. The module would produce the same version hash as before.
Previously when a change was deployed and then reverted, all web clients (even
those that never saw the bad version) would have re-fetch modules because the
version increased.
Updated unit tests to account for the change in version. New default version of
empty test modules is: "mvgTPvXh". For the record, this comes from the base64
encoding of the SHA1 digest of the JSON serialised form of the module content:
> $str = '{"scripts":"","styles":{"css":[]},"messagesBlob":"{}"}';
> echo base64_encode(sha1($str, true));
> FEb3+VuiUm/fOMfod1bjw/te+AQ=
Enabled content versioning for the data modules in MediaWiki core:
* EditToolbarModule
* JqueryMsgModule
* LanguageDataModule
* LanguageNamesModule
* SpecialCharacterDataModule
* UserCSSPrefsModule
* UserDefaultsModule
* UserOptionsModule
The FileModule and base class explicitly disable it for now and keep their
current behaviour of using the definition summary. We may remove it later, but
that requires more performance testing first.
Explicitly disable it in the WikiModule class to avoid breakage when the
default changes.
Ref T98087.
Change-Id: I782df43c50dfcfb7d7592f744e13a3a0430b0dc6
Modules now track their version via getVersionHash() instead of getModifiedTime().
== Background ==
While some resources have observeable timestamps (e.g. files stored on disk),
many other resources do not. E.g. config variables, and module definitions.
For static file modules, one can e.g. revert one of more files in a module to a
previous version and not affect the max timestamp.
Wiki modules include pages only if they exist. The user module supports common.js
and skin.js. By default neither exists. If a user has both, and then the
less-recently modified one is deleted, the max-timestamp remains unchanged.
For client-side caching, batch requests use "Math.max" on the relevant timestamps.
Again, if a module changes but another module is more recent (e.g. out-of-order
deployment, or out-of-order discovery), the change would not result in a cache miss.
More scenarios can be found in the associated Phabricator tasks.
== Version hash ==
Previously we virtually mapped these variables to a timestamp by storing the current
time alongside a hash of the value in ObjectCache. Considering the number of
possible request contexts (wikis * modules * users * skins * languages) this doesn't
work well. It results in needless cache invalidation when the first time observation
is purged due to LRU algorithms. It also has other minor bugs leading to fewer
cache hits.
All modules automatically get the benefits of version hashing with this change.
The old getDefinitionMtime() and getHashMtime() have been replaced with dummies
that return 1. These functions are often called from getModifiedTime() in subclasses.
For backward-compatibility, their respective values (definition summary and hash)
are now included in getVersionHash directly.
As examples, the following modules have been updated to use getVersionHash directly.
Other modules still work fine and can be updated later.
* ResourceLoaderFileModule
* ResourceLoaderEditToolbarModule
* ResourceLoaderStartUpModule
* ResourceLoaderWikiModule
The presence of hashes in place of timestamps increases the startup module size on
a default MediaWiki install from 4.4k to 5.8k (after gzip and minification).
== ETag ==
Since timestamps are no longer tracked, we need a different way to implement caching
for cache proxies (e.g. Varnish) and web browsers. Previously we used the
Last-Modified header (in combination with Cache-Control and Expires).
Instead of Last-Modified (and If-Modified-Since), we use ETag (and If-None-Match).
Entity tags (new in HTTP/1.1) are much stricter than Last-Modified by default.
They instruct browsers to allow usage of partial Range requests. Since our responses
are dynamically generated, we need to use the Weak version of ETag.
While this sounds bad, it's no different than Last-Modified. As reassured by
RFC 2616 <http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.3.3> the
specified behaviour behind Last-Modified follows the same "Weak" caching logic as
Entity tags. It's just that entity tags are capable of a stricter mode (whereas
Last-Modified is inherently weak).
== File cache ==
If $wgUseFileCache is enabled, ResourceLoader uses ResourceFileCache to cache
load.php responses. While the blind TTL handling (during the allowed expiry period)
is still maxage/timestamp based, tryRespondNotModified() now requires the caller to
know the expected ETag.
For this to work, the FileCache handling had to be moved from the top of
ResoureLoader::respond() to after the expected ETag is computed.
This also allows us to remove the duplicate tryRespondNotModified() handling since
that's is already handled by ResourceLoader::respond() meanwhile.
== Misc ==
* Remove redundant modifiedTime cache in ResourceLoaderFileModule.
* Change bugzilla references to Phabricator.
* Centralised inclusion of wgCacheEpoch using getDefinitionSummary. Previously this
logic was duplicated in each place the modified timestamp was used.
* It's easy to forget calling the parent class in getDefinitionSummary().
Previously this method only tracked 'class' by default. As such, various
extensions hardcoded that one value instead of calling the parent and extending
the array. To better prevent this in the future, getVersionHash() now asserts
that the '_cacheEpoch' property made it through.
* tests: Don't use getDefinitionSummary() as an API.
Fix ResourceLoaderWikiModuleTest to call getPages properly.
* In tests, the default timestamp used to be 1388534400000 (which is the unix time
of 20140101000000; the unit tests' CacheEpoch). The new version hash of these
modules is "XyCC+PSK", which is the base64 encoded prefix of the SHA1 digest of:
'{"_class":"ResourceLoaderTestModule","_cacheEpoch":"20140101000000"}'
* Add sha1.js library for client-side hash generation.
Compared various different implementations for code size (after minfication/gzip),
and speed (when used for short hexidecimal strings).
https://jsperf.com/sha1-implementations
- CryptoJS <https://code.google.com/p/crypto-js/#SHA-1> (min+gzip: 2.5k)
http://crypto-js.googlecode.com/svn/tags/3.1.2/build/rollups/sha1.js
Chrome: 45k, Firefox: 89k, Safari: 92k
- jsSHA <https://github.com/Caligatio/jsSHA>
https://github.com/Caligatio/jsSHA/blob/3c1d4f2e/src/sha1.js (min+gzip: 1.8k)
Chrome: 65k, Firefox: 53k, Safari: 69k
- phpjs-sha1 <https://github.com/kvz/phpjs> (RL min+gzip: 0.8k)
https://github.com/kvz/phpjs/blob/1eaab15d/functions/strings/sha1.js
Chrome: 200k, Firefox: 280k, Safari: 78k
Modern browsers implement the HTML5 Crypto API. However, this API is asynchronous,
only enabled when on HTTPS in Chromium, and is quite low-level. It requires boilerplate
code to actually use with TextEncoder, ArrayBuffer and Uint32Array. Due this being
needed in the module loader, we'd have to load the fallback regardless. Considering
this is not used in a critical path for performance, it's not worth shipping two
implementations for this optimisation.
May also resolve:
* T44094
* T90411
* T94810
Bug: T94074
Change-Id: Ibb292d2416839327d1807a66c78fd96dac0637d0