Follows-up 6f8dc27ca2 and dbd11e04aa. You'd expect a bug in preloading
to fallback to fetching it on-demand but due to a title format mismatch
the preload method did use the same title format for the cache key, but
not the same title format for discovering the results. As such, it set
the right cache key to an empty array.
* Make relevant methods testable and mockable.
* Add regression test.
* Change call to array_interect_key to use the same format as
fetchTitleInfo(): Normalised title keys from Title::getPrefixedText.
Previously, the intersect used the declared titles from getPages() which
are not localised - causing an empty array to be returned from the
intersect on wikis where the namespace name is localised.
Bug: T145673
Change-Id: Ibe788157724d73c727b9e2127b6828db32ca9420
While rev_sha1 was preferred to rev_id (page_latest) to allow
client-cache to be re-used in case of a bad change and revert,
this is no longer possible since we recently added support for
explicit purging by considering page_touched.
As such, use of rev_sha1 is no longer useful.
Change-Id: Iddb65305ca4655098fdea9fcf736fd4046035dd7
No longer needed per doing Ic137cb494ba23 in a different way.
This may be useful to revisit, but for now preferring to keep
simplicity and removing this unused option.
This reverts commit 9e217bf42d.
Change-Id: I9c0c316a3b58a3d0a3d3282dd74c7fa4eef8e378
To be used for the 'site' module. This will make it easier to split up
'site' into 'site' and 'site.styles' (per T92459).
Change-Id: Iaac3e458d5107e4c10c2826bd64608d5c47e8b87
Wiki modules are special due to their isKnownEmpty implementation and support
for foreign databases. MediaWiki doesn't have convenient ways of making
Revision objects for remote wikis. As such, wiki modules will keep using meta
data to generate the hash.
However minimise needless cache invalidation by refining the implementation.
Impact:
* Remove use of getMsgBlobMtime(). This module doesn't support getMessages().
* In the title info, use the revision content sha1 and size for tracking.
The page_touched previously used updates too often. It's updated both on edits
for various types of purges. Using the rev_sha1 means old versions return
when the content is the same. Regardless of how the content changed via
revert or actual edits resulting in the same contnet.
* Change in-process cache to be keyed by page list instead of entire
ResourceLoaderContext.
Because of this, getTitleInfo() was previously performing its batch query
twice on the same page. Once for only=styles (top) and only=scripts (bottom).
Both operate on the full getPages() set but had different context keys.
Clean up:
* Better document the support for foreign databases.
* Move Title construction to getContent to reduce duplication.
* Remove use of getDefinitionMtime(). That method is a no-op since the switch
to version hashing.
* Remove remaining use of mtime in getModifiedTime(). This is now covered by
hashing the title info in getDefinitionSummary().
Also refactor the code to be more readable. No intended change in behaviour.
Bug: T98087
Change-Id: Id46740db04c0c42bc5ca87d1487230a32feb34df
Modules now track their version via getVersionHash() instead of getModifiedTime().
== Background ==
While some resources have observeable timestamps (e.g. files stored on disk),
many other resources do not. E.g. config variables, and module definitions.
For static file modules, one can e.g. revert one of more files in a module to a
previous version and not affect the max timestamp.
Wiki modules include pages only if they exist. The user module supports common.js
and skin.js. By default neither exists. If a user has both, and then the
less-recently modified one is deleted, the max-timestamp remains unchanged.
For client-side caching, batch requests use "Math.max" on the relevant timestamps.
Again, if a module changes but another module is more recent (e.g. out-of-order
deployment, or out-of-order discovery), the change would not result in a cache miss.
More scenarios can be found in the associated Phabricator tasks.
== Version hash ==
Previously we virtually mapped these variables to a timestamp by storing the current
time alongside a hash of the value in ObjectCache. Considering the number of
possible request contexts (wikis * modules * users * skins * languages) this doesn't
work well. It results in needless cache invalidation when the first time observation
is purged due to LRU algorithms. It also has other minor bugs leading to fewer
cache hits.
All modules automatically get the benefits of version hashing with this change.
The old getDefinitionMtime() and getHashMtime() have been replaced with dummies
that return 1. These functions are often called from getModifiedTime() in subclasses.
For backward-compatibility, their respective values (definition summary and hash)
are now included in getVersionHash directly.
As examples, the following modules have been updated to use getVersionHash directly.
Other modules still work fine and can be updated later.
* ResourceLoaderFileModule
* ResourceLoaderEditToolbarModule
* ResourceLoaderStartUpModule
* ResourceLoaderWikiModule
The presence of hashes in place of timestamps increases the startup module size on
a default MediaWiki install from 4.4k to 5.8k (after gzip and minification).
== ETag ==
Since timestamps are no longer tracked, we need a different way to implement caching
for cache proxies (e.g. Varnish) and web browsers. Previously we used the
Last-Modified header (in combination with Cache-Control and Expires).
Instead of Last-Modified (and If-Modified-Since), we use ETag (and If-None-Match).
Entity tags (new in HTTP/1.1) are much stricter than Last-Modified by default.
They instruct browsers to allow usage of partial Range requests. Since our responses
are dynamically generated, we need to use the Weak version of ETag.
While this sounds bad, it's no different than Last-Modified. As reassured by
RFC 2616 <http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.3.3> the
specified behaviour behind Last-Modified follows the same "Weak" caching logic as
Entity tags. It's just that entity tags are capable of a stricter mode (whereas
Last-Modified is inherently weak).
== File cache ==
If $wgUseFileCache is enabled, ResourceLoader uses ResourceFileCache to cache
load.php responses. While the blind TTL handling (during the allowed expiry period)
is still maxage/timestamp based, tryRespondNotModified() now requires the caller to
know the expected ETag.
For this to work, the FileCache handling had to be moved from the top of
ResoureLoader::respond() to after the expected ETag is computed.
This also allows us to remove the duplicate tryRespondNotModified() handling since
that's is already handled by ResourceLoader::respond() meanwhile.
== Misc ==
* Remove redundant modifiedTime cache in ResourceLoaderFileModule.
* Change bugzilla references to Phabricator.
* Centralised inclusion of wgCacheEpoch using getDefinitionSummary. Previously this
logic was duplicated in each place the modified timestamp was used.
* It's easy to forget calling the parent class in getDefinitionSummary().
Previously this method only tracked 'class' by default. As such, various
extensions hardcoded that one value instead of calling the parent and extending
the array. To better prevent this in the future, getVersionHash() now asserts
that the '_cacheEpoch' property made it through.
* tests: Don't use getDefinitionSummary() as an API.
Fix ResourceLoaderWikiModuleTest to call getPages properly.
* In tests, the default timestamp used to be 1388534400000 (which is the unix time
of 20140101000000; the unit tests' CacheEpoch). The new version hash of these
modules is "XyCC+PSK", which is the base64 encoded prefix of the SHA1 digest of:
'{"_class":"ResourceLoaderTestModule","_cacheEpoch":"20140101000000"}'
* Add sha1.js library for client-side hash generation.
Compared various different implementations for code size (after minfication/gzip),
and speed (when used for short hexidecimal strings).
https://jsperf.com/sha1-implementations
- CryptoJS <https://code.google.com/p/crypto-js/#SHA-1> (min+gzip: 2.5k)
http://crypto-js.googlecode.com/svn/tags/3.1.2/build/rollups/sha1.js
Chrome: 45k, Firefox: 89k, Safari: 92k
- jsSHA <https://github.com/Caligatio/jsSHA>
https://github.com/Caligatio/jsSHA/blob/3c1d4f2e/src/sha1.js (min+gzip: 1.8k)
Chrome: 65k, Firefox: 53k, Safari: 69k
- phpjs-sha1 <https://github.com/kvz/phpjs> (RL min+gzip: 0.8k)
https://github.com/kvz/phpjs/blob/1eaab15d/functions/strings/sha1.js
Chrome: 200k, Firefox: 280k, Safari: 78k
Modern browsers implement the HTML5 Crypto API. However, this API is asynchronous,
only enabled when on HTTPS in Chromium, and is quite low-level. It requires boilerplate
code to actually use with TextEncoder, ArrayBuffer and Uint32Array. Due this being
needed in the module loader, we'd have to load the fallback regardless. Considering
this is not used in a critical path for performance, it's not worth shipping two
implementations for this optimisation.
May also resolve:
* T44094
* T90411
* T94810
Bug: T94074
Change-Id: Ibb292d2416839327d1807a66c78fd96dac0637d0
This makes it easier for subclasses to use ResourceLoaderWikiModule. Currently
many subclasses of this simply need to override the getPages() method.
UserModule and SiteModule keep their getPages override due to the set of pages
being dependent on context.
Change-Id: I388531398671afacfec36c6c5746d72267b5bdac
Follows-up b36d883.
By far most data providers are static (and PHPUnit expects them
to be static and calls them that way).
Most of these classes already had their data providers static
but additional commits sloppily introduced non-static ones.
* ResourceLoaderWikiModuleTest, 8968d8787f.
* TitleTest, 545f1d3a73.
Odd unused method 'dataTestIsValidMoveOperation' was introduced
in 550b878e63.
* GlobalVarConfigTest, a3e18c3670.
Change-Id: I5da99f7cd3da68c550ae507ffe1f725d31e7666f
In most cases, we just check whether the pages exist before saying
the module is not empty to avoid generating cached HTML without
the appropriate <script> or <link> tags.
However, for modules in the 'user' group, normal users cannot
delete their personal JavaScript/CSS pages, causing needless
extra requests, even though we know the pages are empty.
ResourceLoader::isKnownEmpty() now checks the page_len field
for modules in the 'user' group to check that there is
some actual content.
Bug: 68488
Change-Id: I0570f62887fd4642fd60367ae0b51d7dc19488ca