wiki.techinc.nl/includes/resourceloader/ResourceLoaderModule.php
Timo Tijhof f37cee996e resourceloader: Replace timestamp system with version hashing
Modules now track their version via getVersionHash() instead of getModifiedTime().

== Background ==

While some resources have observable timestamps (e.g. files stored on disk),
many other resources do not, e.g. config variables and module definitions.

For static file modules, one can e.g. revert one or more files in a module to a
previous version without affecting the max timestamp.

Wiki modules include pages only if they exist. The user module supports common.js
and skin.js. By default neither exists. If a user has both, and then the
less-recently modified one is deleted, the max-timestamp remains unchanged.

For client-side caching, batch requests use "Math.max" on the relevant timestamps.
Again, if a module changes but another module is more recent (e.g. out-of-order
deployment, or out-of-order discovery), the change would not result in a cache miss.
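
The masking effect of a maximum can be illustrated with a small sketch. The
function names below are hypothetical and not part of ResourceLoader, and the
hash helper merely assumes a truncated, base64-encoded SHA-1 digest for the
sake of the example.

```php
<?php
// Hypothetical helpers for illustration; not ResourceLoader's actual API.

/** Combine module timestamps the old way: only the newest one counts. */
function combinedTimestamp( array $mtimes ) {
	return max( $mtimes );
}

/** Combine module versions by hashing all of them together. */
function combinedVersion( array $versions ) {
	// Assumption: an 8-character base64 prefix of the SHA-1 digest.
	return substr( base64_encode( sha1( implode( '|', $versions ), true ) ), 0, 8 );
}

// Module "a" changes (100 -> 200) while module "b" stays the most recent.
$before = combinedTimestamp( array( 'a' => 100, 'b' => 300 ) );
$after = combinedTimestamp( array( 'a' => 200, 'b' => 300 ) );
var_dump( $before === $after ); // bool(true): the change is invisible

// With hashes, a change to any one module alters the combined value.
$hashBefore = combinedVersion( array( 'a' => 'v1', 'b' => 'v9' ) );
$hashAfter = combinedVersion( array( 'a' => 'v2', 'b' => 'v9' ) );
var_dump( $hashBefore === $hashAfter ); // bool(false)
```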

More scenarios can be found in the associated Phabricator tasks.

== Version hash ==

Previously we virtually mapped these variables to a timestamp by storing the current
time alongside a hash of the value in ObjectCache. Considering the number of
possible request contexts (wikis * modules * users * skins * languages) this doesn't
work well. It results in needless cache invalidation when the first time observation
is purged due to LRU algorithms. It also has other minor bugs leading to fewer
cache hits.

All modules automatically get the benefits of version hashing with this change.
The old getDefinitionMtime() and getHashMtime() have been replaced with dummies
that return 1. These functions are often called from getModifiedTime() in subclasses.

For backward-compatibility, their respective values (definition summary and hash)
are now included in getVersionHash directly.

As examples, the following modules have been updated to use getVersionHash directly.
Other modules still work fine and can be updated later.

* ResourceLoaderFileModule
* ResourceLoaderEditToolbarModule
* ResourceLoaderStartUpModule
* ResourceLoaderWikiModule

The presence of hashes in place of timestamps increases the startup module size on
a default MediaWiki install from 4.4k to 5.8k (after gzip and minification).

== ETag ==

Since timestamps are no longer tracked, we need a different way to implement caching
for cache proxies (e.g. Varnish) and web browsers. Previously we used the
Last-Modified header (in combination with Cache-Control and Expires).

Instead of Last-Modified (and If-Modified-Since), we use ETag (and If-None-Match).

Entity tags (new in HTTP/1.1) are much stricter than Last-Modified by default.
They instruct browsers to allow usage of partial Range requests. Since our responses
are dynamically generated, we need to use the Weak version of ETag.

While this sounds bad, it's no different than Last-Modified. As reassured by
RFC 2616 <http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.3.3> the
specified behaviour behind Last-Modified follows the same "Weak" caching logic as
Entity tags. It's just that entity tags are capable of a stricter mode (whereas
Last-Modified is inherently weak).
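
As a rough sketch of the weak ETag round trip described above: the helper names
makeWeakETag() and matchesIfNoneMatch() are hypothetical and only illustrate the
idea. In ResourceLoader the real check lives in tryRespondNotModified(), which
this sketch does not attempt to reproduce.

```php
<?php
// Hypothetical helpers for illustration; not ResourceLoader's actual API.

/** Wrap a version hash in a weak entity tag, e.g. W/"XyCC+PSK". */
function makeWeakETag( $versionHash ) {
	return 'W/"' . $versionHash . '"';
}

/**
 * Weak comparison of an If-None-Match request header against our ETag.
 * Assumption: tags contain no commas (true for base64 hashes), so a plain
 * explode() is enough to split a list of tags.
 */
function matchesIfNoneMatch( $ifNoneMatch, $etag ) {
	if ( $ifNoneMatch === null || trim( $ifNoneMatch ) === '' ) {
		return false;
	}
	if ( trim( $ifNoneMatch ) === '*' ) {
		return true;
	}
	// Weak comparison ignores the W/ prefix on either side.
	$strip = function ( $tag ) {
		$tag = trim( $tag );
		return strncmp( $tag, 'W/', 2 ) === 0 ? substr( $tag, 2 ) : $tag;
	};
	$expected = $strip( $etag );
	foreach ( explode( ',', $ifNoneMatch ) as $candidate ) {
		if ( $strip( $candidate ) === $expected ) {
			return true;
		}
	}
	return false;
}

$etag = makeWeakETag( 'XyCC+PSK' );
// A client that cached this version sends the tag back; the server may answer 304.
var_dump( matchesIfNoneMatch( 'W/"XyCC+PSK"', $etag ) ); // bool(true)
// A client holding a different version gets a full response.
var_dump( matchesIfNoneMatch( 'W/"other123"', $etag ) ); // bool(false)
```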

== File cache ==

If $wgUseFileCache is enabled, ResourceLoader uses ResourceFileCache to cache
load.php responses. While the blind TTL handling (during the allowed expiry period)
is still maxage/timestamp based, tryRespondNotModified() now requires the caller to
know the expected ETag.

For this to work, the FileCache handling had to be moved from the top of
ResourceLoader::respond() to after the expected ETag is computed.

This also allows us to remove the duplicate tryRespondNotModified() handling since
that is already handled by ResourceLoader::respond().

== Misc ==

* Remove redundant modifiedTime cache in ResourceLoaderFileModule.

* Change bugzilla references to Phabricator.

* Centralised inclusion of wgCacheEpoch using getDefinitionSummary. Previously this
  logic was duplicated in each place the modified timestamp was used.

* It's easy to forget calling the parent class in getDefinitionSummary().
  Previously this method only tracked 'class' by default. As such, various
  extensions hardcoded that one value instead of calling the parent and extending
  the array. To better prevent this in the future, getVersionHash() now asserts
  that the '_cacheEpoch' property made it through.

* tests: Don't use getDefinitionSummary() as an API.
  Fix ResourceLoaderWikiModuleTest to call getPages properly.

* In tests, the default timestamp used to be 1388534400000 (which is the unix time
  of 20140101000000; the unit tests' CacheEpoch). The new version hash of these
  modules is "XyCC+PSK", which is the base64 encoded prefix of the SHA1 digest of:
  '{"_class":"ResourceLoaderTestModule","_cacheEpoch":"20140101000000"}'

* Add sha1.js library for client-side hash generation.
  Compared several implementations for code size (after minification/gzip),
  and speed (when used for short hexadecimal strings).
  https://jsperf.com/sha1-implementations
  - CryptoJS <https://code.google.com/p/crypto-js/#SHA-1> (min+gzip: 2.5k)
    http://crypto-js.googlecode.com/svn/tags/3.1.2/build/rollups/sha1.js
    Chrome: 45k, Firefox: 89k, Safari: 92k
  - jsSHA <https://github.com/Caligatio/jsSHA>
    https://github.com/Caligatio/jsSHA/blob/3c1d4f2e/src/sha1.js (min+gzip: 1.8k)
    Chrome: 65k, Firefox: 53k, Safari: 69k
  - phpjs-sha1 <https://github.com/kvz/phpjs> (RL min+gzip: 0.8k)
    https://github.com/kvz/phpjs/blob/1eaab15d/functions/strings/sha1.js
    Chrome: 200k, Firefox: 280k, Safari: 78k

  Modern browsers implement the HTML5 Crypto API. However, this API is asynchronous,
  only enabled when on HTTPS in Chromium, and is quite low-level. It requires boilerplate
  code to actually use with TextEncoder, ArrayBuffer and Uint32Array. Because this is
  needed in the module loader, we'd have to load the fallback regardless. Considering
  this is not used in a critical path for performance, it's not worth shipping two
  implementations for this optimisation.
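
The test version hash mentioned in the list above can be approximated as
follows. This is only a sketch: makeHashSketch() is a hypothetical stand-in, and
the truncation to 8 characters is an assumption inferred from the length of
"XyCC+PSK" rather than taken from ResourceLoader::makeHash itself.

```php
<?php
// Hypothetical stand-in for ResourceLoader::makeHash; see assumptions above.
function makeHashSketch( $value ) {
	// sha1( ..., true ) returns the raw 20-byte digest.
	// 8 base64 characters cover the first 6 bytes of it.
	return substr( base64_encode( sha1( $value, true ) ), 0, 8 );
}

// The base definition summary of a test module, as quoted in the list above.
$summary = array(
	'_class' => 'ResourceLoaderTestModule',
	'_cacheEpoch' => '20140101000000',
);
$json = json_encode( $summary );

echo $json, "\n";
// {"_class":"ResourceLoaderTestModule","_cacheEpoch":"20140101000000"}
echo makeHashSketch( $json ), "\n";
```

Base64 packs 6 bits into each character where hexadecimal packs 4, so a prefix
of the same length carries more of the digest.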

May also resolve:
* T44094
* T90411
* T94810

Bug: T94074
Change-Id: Ibb292d2416839327d1807a66c78fd96dac0637d0
2015-05-19 22:28:17 +00:00

<?php
/**
* Abstraction for resource loader modules.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License along
* with this program; if not, write to the Free Software Foundation, Inc.,
* 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
* http://www.gnu.org/copyleft/gpl.html
*
* @file
* @author Trevor Parscal
* @author Roan Kattouw
*/
/**
* Abstraction for resource loader modules, with name registration and maxage functionality.
*/
abstract class ResourceLoaderModule {
# Type of resource
const TYPE_SCRIPTS = 'scripts';
const TYPE_STYLES = 'styles';
const TYPE_COMBINED = 'combined';
# sitewide core module like a skin file or jQuery component
const ORIGIN_CORE_SITEWIDE = 1;
# per-user module generated by the software
const ORIGIN_CORE_INDIVIDUAL = 2;
# sitewide module generated from user-editable files, like MediaWiki:Common.js, or
# modules accessible to multiple users, such as those generated by the Gadgets extension.
const ORIGIN_USER_SITEWIDE = 3;
# per-user module generated from user-editable files, like User:Me/vector.js
const ORIGIN_USER_INDIVIDUAL = 4;
# an access constant; make sure this is kept as the largest number in this group
const ORIGIN_ALL = 10;
# script and style modules form a hierarchy of trustworthiness, with core modules like
# skins and jQuery as most trustworthy, and user scripts as least trustworthy. We can
# limit the types of scripts and styles we allow to load on, say, sensitive special
# pages like Special:UserLogin and Special:Preferences
protected $origin = self::ORIGIN_CORE_SITEWIDE;
/* Protected Members */
protected $name = null;
protected $targets = array( 'desktop' );
// In-object cache for file dependencies
protected $fileDeps = array();
// In-object cache for message blob mtime
protected $msgBlobMtime = array();
// In-object cache for version hash
protected $versionHash = array();
/**
* @var Config
*/
protected $config;
/* Methods */
/**
* Get this module's name. This is set when the module is registered
* with ResourceLoader::register()
*
* @return string|null Name (string) or null if no name was set
*/
public function getName() {
return $this->name;
}
/**
* Set this module's name. This is called by ResourceLoader::register()
* when registering the module. Other code should not call this.
*
* @param string $name Name
*/
public function setName( $name ) {
$this->name = $name;
}
/**
* Get this module's origin. This is set when the module is registered
* with ResourceLoader::register()
*
* @return int ResourceLoaderModule class constant, the subclass default
* if not set manually
*/
public function getOrigin() {
return $this->origin;
}
/**
* Set this module's origin. This is called by ResourceLoader::register()
* when registering the module. Other code should not call this.
*
* @param int $origin Origin
*/
public function setOrigin( $origin ) {
$this->origin = $origin;
}
/**
* @param ResourceLoaderContext $context
* @return bool
*/
public function getFlip( $context ) {
global $wgContLang;
return $wgContLang->getDir() !== $context->getDirection();
}
/**
* Get all JS for this module for a given language and skin.
* Includes all relevant JS except loader scripts.
*
* @param ResourceLoaderContext $context
* @return string JavaScript code
*/
public function getScript( ResourceLoaderContext $context ) {
// Stub, override expected
return '';
}
/**
* Takes named templates by the module and returns an array mapping.
*
* @return array of templates mapping template alias to content
*/
public function getTemplates() {
// Stub, override expected.
return array();
}
/**
* @return Config
* @since 1.24
*/
public function getConfig() {
if ( $this->config === null ) {
// Ugh, fall back to default
$this->config = ConfigFactory::getDefaultInstance()->makeConfig( 'main' );
}
return $this->config;
}
/**
* @param Config $config
* @since 1.24
*/
public function setConfig( Config $config ) {
$this->config = $config;
}
/**
* Get the URL or URLs to load for this module's JS in debug mode.
* The default behavior is to return a load.php?only=scripts URL for
* the module, but file-based modules will want to override this to
* load the files directly.
*
* This function is called only when 1) we're in debug mode, 2) there
* is no only= parameter and 3) supportsURLLoading() returns true.
* #2 is important to prevent an infinite loop, therefore this function
* MUST return either an only= URL or a non-load.php URL.
*
* @param ResourceLoaderContext $context
* @return array Array of URLs
*/
public function getScriptURLsForDebug( ResourceLoaderContext $context ) {
$resourceLoader = $context->getResourceLoader();
$derivative = new DerivativeResourceLoaderContext( $context );
$derivative->setModules( array( $this->getName() ) );
$derivative->setOnly( 'scripts' );
$derivative->setDebug( true );
$url = $resourceLoader->createLoaderURL(
$this->getSource(),
$derivative
);
return array( $url );
}
/**
* Whether this module supports URL loading. If this function returns false,
* getScript() will be used even in cases (debug mode, no only param) where
* getScriptURLsForDebug() would normally be used instead.
* @return bool
*/
public function supportsURLLoading() {
return true;
}
/**
* Get all CSS for this module for a given skin.
*
* @param ResourceLoaderContext $context
* @return array List of CSS strings or array of CSS strings keyed by media type.
* like array( 'screen' => '.foo { width: 0 }' );
* or array( 'screen' => array( '.foo { width: 0 }' ) );
*/
public function getStyles( ResourceLoaderContext $context ) {
// Stub, override expected
return array();
}
/**
* Get the URL or URLs to load for this module's CSS in debug mode.
* The default behavior is to return a load.php?only=styles URL for
* the module, but file-based modules will want to override this to
* load the files directly. See also getScriptURLsForDebug()
*
* @param ResourceLoaderContext $context
* @return array Array( mediaType => array( URL1, URL2, ... ), ... )
*/
public function getStyleURLsForDebug( ResourceLoaderContext $context ) {
$resourceLoader = $context->getResourceLoader();
$derivative = new DerivativeResourceLoaderContext( $context );
$derivative->setModules( array( $this->getName() ) );
$derivative->setOnly( 'styles' );
$derivative->setDebug( true );
$url = $resourceLoader->createLoaderURL(
$this->getSource(),
$derivative
);
return array( 'all' => array( $url ) );
}
/**
* Get the messages needed for this module.
*
* To get a JSON blob with messages, use MessageBlobStore::get()
*
* @return array List of message keys. Keys may occur more than once
*/
public function getMessages() {
// Stub, override expected
return array();
}
/**
* Get the group this module is in.
*
* @return string|null Group name, or null if the module is not in a group
*/
public function getGroup() {
// Stub, override expected
return null;
}
/**
* Get the source of this module. Should only be overridden for foreign modules.
*
* @return string Source name, 'local' for local modules
*/
public function getSource() {
// Stub, override expected
return 'local';
}
/**
* Where on the HTML page should this module's JS be loaded?
* - 'top': in the "<head>"
* - 'bottom': at the bottom of the "<body>"
*
* @return string
*/
public function getPosition() {
return 'bottom';
}
/**
* Whether this module's JS expects to work without the client-side ResourceLoader module.
* Returning true from this function will prevent mw.loader.state() call from being
* appended to the bottom of the script.
*
* @return bool
*/
public function isRaw() {
return false;
}
/**
* Get the loader JS for this module, if set.
*
* @return mixed JavaScript loader code as a string or boolean false if no custom loader set
*/
public function getLoaderScript() {
// Stub, override expected
return false;
}
/**
* Get a list of modules this module depends on.
*
* Dependency information is taken into account when loading a module
* on the client side.
*
* To add dependencies dynamically on the client side, use a custom
* loader script, see getLoaderScript()
* @return array List of module names as strings
*/
public function getDependencies() {
// Stub, override expected
return array();
}
/**
* Get target(s) for the module, eg ['desktop'] or ['desktop', 'mobile']
*
* @return array Array of strings
*/
public function getTargets() {
return $this->targets;
}
/**
* Get the skip function.
*
* Modules that provide fallback functionality can provide a "skip function". This
* function, if provided, will be passed along to the module registry on the client.
* When this module is loaded (either directly or as a dependency of another module),
* then this function is executed first. If the function returns true, the module will
* instantly be considered "ready" without requesting the associated module resources.
*
* The value returned here must be valid javascript for execution in a private function.
* It must not contain the "function () {" and "}" wrapper though.
*
* @return string|null A JavaScript function body returning a boolean value, or null
*/
public function getSkipFunction() {
return null;
}
/**
* Get the files this module depends on indirectly for a given skin.
* Currently these are only image files referenced by the module's CSS.
*
* @param string $skin Skin name
* @return array List of files
*/
public function getFileDependencies( $skin ) {
// Try in-object cache first
if ( isset( $this->fileDeps[$skin] ) ) {
return $this->fileDeps[$skin];
}
$dbr = wfGetDB( DB_SLAVE );
$deps = $dbr->selectField( 'module_deps', 'md_deps', array(
'md_module' => $this->getName(),
'md_skin' => $skin,
), __METHOD__
);
if ( !is_null( $deps ) ) {
$this->fileDeps[$skin] = (array)FormatJson::decode( $deps, true );
} else {
$this->fileDeps[$skin] = array();
}
return $this->fileDeps[$skin];
}
/**
* Set preloaded file dependency information. Used so we can load this
* information for all modules at once.
* @param string $skin Skin name
* @param array $deps Array of file names
*/
public function setFileDependencies( $skin, $deps ) {
$this->fileDeps[$skin] = $deps;
}
/**
* Get the last modification timestamp of the messages in this module for a given language.
* @param string $lang Language code
* @return int UNIX timestamp
*/
public function getMsgBlobMtime( $lang ) {
if ( !isset( $this->msgBlobMtime[$lang] ) ) {
if ( !count( $this->getMessages() ) ) {
return 1;
}
$dbr = wfGetDB( DB_SLAVE );
$msgBlobMtime = $dbr->selectField( 'msg_resource', 'mr_timestamp', array(
'mr_resource' => $this->getName(),
'mr_lang' => $lang
), __METHOD__
);
// If no blob was found, but the module does have messages, that means we need
// to regenerate it. Return NOW
if ( $msgBlobMtime === false ) {
$msgBlobMtime = wfTimestampNow();
}
$this->msgBlobMtime[$lang] = wfTimestamp( TS_UNIX, $msgBlobMtime );
}
return $this->msgBlobMtime[$lang];
}
/**
* Set a preloaded message blob last modification timestamp. Used so we
* can load this information for all modules at once.
* @param string $lang Language code
* @param int $mtime UNIX timestamp
*/
public function setMsgBlobMtime( $lang, $mtime ) {
$this->msgBlobMtime[$lang] = $mtime;
}
/**
* Get a string identifying the current version of this module in a given context.
*
* Whenever anything happens that changes the module's response (e.g. scripts, styles, and
* messages) this value must change. This value is used to store module responses in cache.
* (Both client-side and server-side.)
*
* It is not recommended to override this directly. Use getDefinitionSummary() instead.
* If overridden, one must call the parent getVersionHash(), append data and re-hash.
*
* This method should be quick because it is frequently run by ResourceLoaderStartUpModule to
* propagate changes to the client and effectively invalidate cache.
*
* For backward-compatibility, the following optional data providers are automatically included:
*
* - getModifiedTime()
* - getModifiedHash()
*
* @since 1.26
* @param ResourceLoaderContext $context
* @return string Hash (should use ResourceLoader::makeHash)
*/
public function getVersionHash( ResourceLoaderContext $context ) {
// Cache this somewhat expensive operation. Especially because some classes
// (e.g. startup module) iterate more than once over all modules to get versions.
$contextHash = $context->getHash();
if ( !array_key_exists( $contextHash, $this->versionHash ) ) {
$summary = $this->getDefinitionSummary( $context );
if ( !isset( $summary['_cacheEpoch'] ) ) {
throw new Exception( 'getDefinitionSummary must call parent method' );
}
$str = json_encode( $summary );
$mtime = $this->getModifiedTime( $context );
if ( $mtime !== null ) {
// Support: MediaWiki 1.25 and earlier
$str .= strval( $mtime );
}
$mhash = $this->getModifiedHash( $context );
if ( $mhash !== null ) {
// Support: MediaWiki 1.25 and earlier
$str .= strval( $mhash );
}
$this->versionHash[ $contextHash ] = ResourceLoader::makeHash( $str );
}
return $this->versionHash[ $contextHash ];
}
/**
* Get the definition summary for this module.
*
* This is the method subclasses are recommended to use to track values in their
* version hash. Call this in getVersionHash() and pass it to e.g. json_encode.
*
* Subclasses must call the parent getDefinitionSummary() and build on that.
* It is recommended that each subclass appends its own new array. This prevents
* clashes or accidental overwrites of existing keys and gives each subclass
* its own scope for simple array keys.
*
* @code
* $summary = parent::getDefinitionSummary( $context );
* $summary[] = array(
* 'foo' => 123,
* 'bar' => 'quux',
* );
* return $summary;
* @endcode
*
* Return an array containing values from all significant properties of this
* module's definition.
*
* Be careful not to normalise too much. Especially preserve the order of things
* that carry significance in getScript and getStyles (T39812).
*
* Avoid including things that are insignificant (e.g. order of message keys is
* insignificant and should be sorted to avoid unnecessary cache invalidation).
*
* This data structure must exclusively contain arrays and scalars as values (avoid
* object instances) to allow simple serialisation using json_encode.
*
* If modules have a hash or timestamp from another source, that may be included as-is.
*
* A number of utility methods are available to help you gather data. These are not
* called by default and must be included by the subclass' getDefinitionSummary().
*
* - getMsgBlobMtime()
*
* @since 1.23
* @param ResourceLoaderContext $context
* @return array|null
*/
public function getDefinitionSummary( ResourceLoaderContext $context ) {
return array(
'_class' => get_class( $this ),
'_cacheEpoch' => $this->getConfig()->get( 'CacheEpoch' ),
);
}
/**
* Get this module's last modification timestamp for a given context.
*
* @deprecated since 1.26 Use getDefinitionSummary() instead
* @param ResourceLoaderContext $context Context object
* @return int|null UNIX timestamp
*/
public function getModifiedTime( ResourceLoaderContext $context ) {
return null;
}
/**
* Helper method for providing a version hash to getVersionHash().
*
* @deprecated since 1.26 Use getDefinitionSummary() instead
* @param ResourceLoaderContext $context
* @return string|null Hash
*/
public function getModifiedHash( ResourceLoaderContext $context ) {
return null;
}
/**
* Back-compat dummy for old subclass implementations of getModifiedTime().
*
* This method used to use ObjectCache to track when a hash was first seen. That principle
* stems from a time that ResourceLoader could only identify module versions by timestamp.
* That is no longer the case. Use getDefinitionSummary() directly.
*
* @deprecated since 1.26 Superseded by getVersionHash()
* @param ResourceLoaderContext $context
* @return int UNIX timestamp
*/
public function getHashMtime( ResourceLoaderContext $context ) {
if ( !is_string( $this->getModifiedHash( $context ) ) ) {
return 1;
}
// Dummy that is > 1
return 2;
}
/**
* Back-compat dummy for old subclass implementations of getModifiedTime().
*
* @since 1.23
* @deprecated since 1.26 Superseded by getVersionHash()
* @param ResourceLoaderContext $context
* @return int UNIX timestamp
*/
public function getDefinitionMtime( ResourceLoaderContext $context ) {
if ( $this->getDefinitionSummary( $context ) === null ) {
return 1;
}
// Dummy that is > 1
return 2;
}
/**
* Check whether this module is known to be empty. If a child class
* has an easy and cheap way to determine that this module is
* definitely going to be empty, it should override this method to
* return true in that case. Callers may optimize the request for this
* module away if this function returns true.
* @param ResourceLoaderContext $context
* @return bool
*/
public function isKnownEmpty( ResourceLoaderContext $context ) {
return false;
}
/** @var JSParser Lazy-initialized; use self::javaScriptParser() */
private static $jsParser;
private static $parseCacheVersion = 1;
/**
* Validate a given script file; if valid returns the original source.
* If invalid, returns replacement JS source that throws an exception.
*
* @param string $fileName
* @param string $contents
* @return string JS with the original, or a replacement error
*/
protected function validateScriptFile( $fileName, $contents ) {
if ( $this->getConfig()->get( 'ResourceLoaderValidateJS' ) ) {
// Try for cache hit
// Use CACHE_ANYTHING since filtering is very slow compared to DB queries
$key = wfMemcKey( 'resourceloader', 'jsparse', self::$parseCacheVersion, md5( $contents ) );
$cache = wfGetCache( CACHE_ANYTHING );
$cacheEntry = $cache->get( $key );
if ( is_string( $cacheEntry ) ) {
return $cacheEntry;
}
$parser = self::javaScriptParser();
try {
$parser->parse( $contents, $fileName, 1 );
$result = $contents;
} catch ( Exception $e ) {
// We'll save this to cache to avoid having to validate broken JS over and over...
$err = $e->getMessage();
$result = "throw new Error(" . Xml::encodeJsVar( "JavaScript parse error: $err" ) . ");";
}
$cache->set( $key, $result );
return $result;
} else {
return $contents;
}
}
/**
* @return JSParser
*/
protected static function javaScriptParser() {
if ( !self::$jsParser ) {
self::$jsParser = new JSParser();
}
return self::$jsParser;
}
/**
* Safe version of filemtime(), which doesn't throw a PHP warning if the file doesn't exist
* but returns 1 instead.
* @param string $filename File name
* @return int UNIX timestamp
*/
protected static function safeFilemtime( $filename ) {
wfSuppressWarnings();
$mtime = filemtime( $filename ) ?: 1;
wfRestoreWarnings();
return $mtime;
}
}