wiki.techinc.nl/includes/parser/ParserCache.php

<?php
/**
* Cache for outputs of the PHP parser
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License along
* with this program; if not, write to the Free Software Foundation, Inc.,
* 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
* http://www.gnu.org/copyleft/gpl.html
*
* @file
* @ingroup Cache Parser
*/
use MediaWiki\HookContainer\HookContainer;
use MediaWiki\HookContainer\HookRunner;
use MediaWiki\Json\JsonCodec;
use MediaWiki\Page\PageRecord;
use MediaWiki\Page\WikiPageFactory;
use MediaWiki\Parser\ParserCacheFilter;
use MediaWiki\Parser\ParserCacheMetadata;
use MediaWiki\Parser\ParserOutput;
use MediaWiki\Title\TitleFactory;
use Psr\Log\LoggerInterface;
use Wikimedia\UUID\GlobalIdGenerator;
/**
* Cache for ParserOutput objects corresponding to the latest page revisions.
*
* The ParserCache is a two-tiered cache backed by BagOStuff which supports
* varying the stored content on the values of ParserOptions used during
* a page parse.
*
* First tier is keyed by the page ID and stores ParserCacheMetadata, which
* contains information about cache expiration and the list of ParserOptions
* used during the parse of the page. For example, if only 'dateformat' and
* 'userlang' options were accessed by the parser when producing output for the
* page, array [ 'dateformat', 'userlang' ] will be stored in the metadata cache.
* This means none of the other existing options had any effect on the output.
*
* The second tier of the cache contains ParserOutput objects. The key for the
* second tier is constructed from the page ID and values of those ParserOptions
* used during a page parse which affected the output. Upon cache lookup, the list
* of used option names is retrieved from tier 1 cache, and only the values of
* those options are hashed together with the page ID to produce a key, while
* the rest of the options are ignored. Following the example above where
* only [ 'dateformat', 'userlang' ] options changed the parser output for a
* page, the key will look like 'page_id!dateformat=default:userlang=ru'.
* Thus any cache lookup with dateformat=default and userlang=ru will hit the
* same cache entry regardless of the values of the rest of the options, since they
* were not accessed during a parse and thus did not change the output.
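*
* A minimal sketch of the two-tier lookup described above, in plain PHP.
* The function sketchOutputKey() and both arrays are hypothetical and exist
* only for illustration; the real key is built by makeParserOutputKey(),
* which hashes the option values via ParserOptions::optionsHash().
*
* @code{.php}
* // Tier 1 (assumed shape): page ID => names of the options the parse used.
* $usedOptionsByPage = [ 42 => [ 'dateformat', 'userlang' ] ];
*
* // Tier 2 key: page ID plus the values of the used options only.
* function sketchOutputKey( int $pageId, array $usedOptions, array $options ): string {
* 	$parts = [];
* 	foreach ( $usedOptions as $name ) {
* 		$parts[] = $name . '=' . $options[$name];
* 	}
* 	return $pageId . '!' . implode( ':', $parts );
* }
*
* // 'thumbsize' is ignored because it is not in the used-options list.
* $key = sketchOutputKey(
* 	42,
* 	$usedOptionsByPage[42],
* 	[ 'dateformat' => 'default', 'userlang' => 'ru', 'thumbsize' => 5 ]
* );
* // $key is '42!dateformat=default:userlang=ru'
* @endcode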
*
* @see ParserOutput::recordOption()
* @see ParserOutput::getUsedOptions()
* @see ParserOptions::allCacheVaryingOptions()
* @ingroup Cache Parser
*/
class ParserCache {
/**
* Constants for self::getKey()
* @since 1.30
* @since 1.36 the constants were made public
*/
/** Use only current data */
public const USE_CURRENT_ONLY = 0;
/** Use expired data if current data is unavailable */
public const USE_EXPIRED = 1;
/** Use expired data or data from different revisions if current data is unavailable */
public const USE_OUTDATED = 2;
/**
* Use expired data and data from different revisions, and if all else
* fails vary on all variable options
*/
private const USE_ANYTHING = 3;
/** @var string The name of this ParserCache. Used as a root of the cache key. */
private $name;
/** @var BagOStuff */
private $cache;
/**
* Anything cached prior to this is invalidated
*
* @var string
*/
private $cacheEpoch;
/** @var HookRunner */
private $hookRunner;
/** @var JsonCodec */
private $jsonCodec;
/** @var IBufferingStatsdDataFactory */
private $stats;
/** @var LoggerInterface */
private $logger;
/** @var TitleFactory */
private $titleFactory;
/** @var WikiPageFactory */
private $wikiPageFactory;
private ?ParserCacheFilter $filter = null;
private GlobalIdGenerator $globalIdGenerator;
/**
* @var BagOStuff small in-process cache to store metadata.
* It's needed multiple times during the request, for example
* to build a PoolWorkArticleView key, and then to fetch the
* actual ParserCache entry.
*/
private $metadataProcCache;
/**
* Set up a cache pathway with a given back-end storage mechanism.
*
* This class uses an invalidation strategy that is compatible with
* MultiWriteBagOStuff in async replication mode.
*
* @param string $name
* @param BagOStuff $cache
* @param string $cacheEpoch Anything before this timestamp is invalidated
* @param HookContainer $hookContainer
* @param JsonCodec $jsonCodec
* @param IBufferingStatsdDataFactory $stats
* @param LoggerInterface $logger
* @param TitleFactory $titleFactory
* @param WikiPageFactory $wikiPageFactory
* @param GlobalIdGenerator $globalIdGenerator
*/
public function __construct(
string $name,
BagOStuff $cache,
string $cacheEpoch,
HookContainer $hookContainer,
JsonCodec $jsonCodec,
IBufferingStatsdDataFactory $stats,
LoggerInterface $logger,
TitleFactory $titleFactory,
WikiPageFactory $wikiPageFactory,
GlobalIdGenerator $globalIdGenerator
) {
$this->name = $name;
$this->cache = $cache;
$this->cacheEpoch = $cacheEpoch;
$this->hookRunner = new HookRunner( $hookContainer );
$this->jsonCodec = $jsonCodec;
$this->stats = $stats;
$this->logger = $logger;
$this->titleFactory = $titleFactory;
$this->wikiPageFactory = $wikiPageFactory;
$this->globalIdGenerator = $globalIdGenerator;
$this->metadataProcCache = new HashBagOStuff( [ 'maxKeys' => 2 ] );
}
/**
* @since 1.41
* @param ParserCacheFilter $filter
*/
public function setFilter( ParserCacheFilter $filter ): void {
$this->filter = $filter;
}
/**
* @param PageRecord $page
* @since 1.28
*/
public function deleteOptionsKey( PageRecord $page ) {
$page->assertWiki( PageRecord::LOCAL );
$key = $this->makeMetadataKey( $page );
$this->metadataProcCache->delete( $key );
$this->cache->delete( $key );
}
/**
* Retrieve the ParserOutput from ParserCache, even if it's outdated.
* @param PageRecord $page
* @param ParserOptions $popts
* @return ParserOutput|false
*/
public function getDirty( PageRecord $page, $popts ) {
$page->assertWiki( PageRecord::LOCAL );
$value = $this->get( $page, $popts, true );
return is_object( $value ) ? $value : false;
}
/**
* @param PageRecord $page
* @param string $metricSuffix
*/
private function incrementStats( PageRecord $page, $metricSuffix ) {
$wikiPage = $this->wikiPageFactory->newFromTitle( $page );
$contentModel = str_replace( '.', '_', $wikiPage->getContentModel() );
$this->stats->increment( "{$this->name}.{$contentModel}.{$metricSuffix}" );
}
/**
* Returns the ParserCache metadata about the given page
* considering the given options.
*
* @note Which parser options influence the cache key
* is controlled via ParserOutput::recordOption() or
* ParserOptions::addExtraKey().
*
* @param PageRecord $page
* @param int $staleConstraint one of the self::USE_ constants
* @return ParserCacheMetadata|null
* @since 1.36
*/
public function getMetadata(
PageRecord $page,
int $staleConstraint = self::USE_ANYTHING
): ?ParserCacheMetadata {
$page->assertWiki( PageRecord::LOCAL );
$pageKey = $this->makeMetadataKey( $page );
$metadata = $this->metadataProcCache->get( $pageKey );
if ( !$metadata ) {
$metadata = $this->cache->get(
$pageKey,
BagOStuff::READ_VERIFIED
);
}
if ( $metadata === false ) {
$this->incrementStats( $page, "miss_absent_metadata" );
$this->logger->debug( 'ParserOutput metadata cache miss', [ 'name' => $this->name ] );
return null;
}
// NOTE: If the value wasn't serialized to JSON when being stored,
// we may already have a CacheTime object here. This used
// to be the default behavior before 1.36. We need to retain
// support so we can handle cached objects after an upgrade
// from an earlier version.
// NOTE: Support for reading string values from the cache must be
// deployed a while before starting to write JSON to the cache,
// in case we have to revert either change.
if ( is_string( $metadata ) ) {
$metadata = $this->restoreFromJson( $metadata, $pageKey, CacheTime::class );
}
if ( !$metadata instanceof CacheTime ) {
$this->incrementStats( $page, 'miss_unserialize' );
return null;
}
if ( $this->checkExpired( $metadata, $page, $staleConstraint, 'metadata' ) ) {
return null;
}
if ( $this->checkOutdated( $metadata, $page, $staleConstraint, 'metadata' ) ) {
return null;
}
$this->logger->debug( 'Parser cache options found', [ 'name' => $this->name ] );
return $metadata;
}
/**
* @param PageRecord $page
* @return string
*/
private function makeMetadataKey( PageRecord $page ): string {
return $this->cache->makeKey( $this->name, 'idoptions', $page->getId( PageRecord::LOCAL ) );
}
/**
* Get a key that will be used by the ParserCache to store the content
* for a given page considering the given options and the array of
* used options.
*
* @warning The exact format of the key is considered internal and is subject
* to change, thus should not be used as storage or long-term caching key.
* This is intended to be used for logging or keying something transient.
*
* @param PageRecord $page
* @param ParserOptions $options
* @param array|null $usedOptions Defaults to all cache varying options.
* @return string
* @internal
* @since 1.36
*/
public function makeParserOutputKey(
PageRecord $page,
ParserOptions $options,
?array $usedOptions = null
): string {
$usedOptions ??= ParserOptions::allCacheVaryingOptions();
// idhash seems to mean 'page id' + 'rendering hash' (r3710)
$pageid = $page->getId( PageRecord::LOCAL );
$title = $this->titleFactory->newFromPageIdentity( $page );
$hash = $options->optionsHash( $usedOptions, $title );
// Before T263581 ParserCache was split between normal page views
// and action=parse. -0 is left in the key to avoid invalidating the entire
// cache when removing the cache split.
return $this->cache->makeKey( $this->name, 'idhash', "{$pageid}-0!{$hash}" );
}
/**
* Retrieve the ParserOutput from ParserCache.
* Returns false if not found or outdated.
*
* @param PageRecord $page
* @param ParserOptions $popts
* @param bool $useOutdated (default false)
*
* @return ParserOutput|false
*/
public function get( PageRecord $page, $popts, $useOutdated = false ) {
$page->assertWiki( PageRecord::LOCAL );
if ( !$page->exists() ) {
$this->incrementStats( $page, 'miss_nonexistent' );
return false;
}
if ( $page->isRedirect() ) {
// It's a redirect now
$this->incrementStats( $page, 'miss_redirect' );
return false;
}
$staleConstraint = $useOutdated ? self::USE_OUTDATED : self::USE_CURRENT_ONLY;
$parserOutputMetadata = $this->getMetadata( $page, $staleConstraint );
if ( !$parserOutputMetadata ) {
return false;
}
if ( !$popts->isSafeToCache( $parserOutputMetadata->getUsedOptions() ) ) {
$this->incrementStats( $page, 'miss_unsafe' );
return false;
}
$parserOutputKey = $this->makeParserOutputKey(
$page,
$popts,
$parserOutputMetadata->getUsedOptions()
);
$value = $this->cache->get( $parserOutputKey, BagOStuff::READ_VERIFIED );
if ( $value === false ) {
$this->incrementStats( $page, "miss_absent" );
$this->logger->debug( 'ParserOutput cache miss', [ 'name' => $this->name ] );
return false;
}
// NOTE: If the value wasn't serialized to JSON when being stored,
// we may already have a ParserOutput object here. This used
// to be the default behavior before 1.36. We need to retain
// support so we can handle cached objects after an upgrade
// from an earlier version.
// NOTE: Support for reading string values from the cache must be
// deployed a while before starting to write JSON to the cache,
// in case we have to revert either change.
if ( is_string( $value ) ) {
$value = $this->restoreFromJson( $value, $parserOutputKey, ParserOutput::class );
}
if ( !$value instanceof ParserOutput ) {
$this->incrementStats( $page, 'miss_unserialize' );
return false;
}
if ( $this->checkExpired( $value, $page, $staleConstraint, 'output' ) ) {
return false;
}
if ( $this->checkOutdated( $value, $page, $staleConstraint, 'output' ) ) {
return false;
}
$wikiPage = $this->wikiPageFactory->newFromTitle( $page );
if ( $this->hookRunner->onRejectParserCacheValue( $value, $wikiPage, $popts ) === false ) {
$this->incrementStats( $page, 'miss_rejected' );
$this->logger->debug( 'key valid, but rejected by RejectParserCacheValue hook handler',
[ 'name' => $this->name ] );
return false;
}
$this->logger->debug( 'ParserOutput cache found', [ 'name' => $this->name ] );
$this->incrementStats( $page, 'hit' );
return $value;
}
/**
* @param ParserOutput $parserOutput
* @param PageRecord $page
* @param ParserOptions $popts
* @param string|null $cacheTime TS_MW timestamp when the cache was generated
* @param int|null $revId Revision ID that was parsed
*/
public function save(
ParserOutput $parserOutput,
PageRecord $page,
$popts,
$cacheTime = null,
$revId = null
) {
$page->assertWiki( PageRecord::LOCAL );
// T350538: Eventually we'll warn if the $cacheTime and $revId
// parameters are non-null here, since we *should* be getting
// them from the ParserOutput.
if ( $revId !== null && $revId !== $parserOutput->getCacheRevisionId() ) {
$this->logger->warning(
'Inconsistent revision ID',
[
'name' => $this->name,
'reason' => $popts->getRenderReason(),
'revid1' => $revId,
'revid2' => $parserOutput->getCacheRevisionId(),
]
);
}
if ( !$parserOutput->hasText() ) {
throw new InvalidArgumentException( 'Attempt to cache a ParserOutput with no text set!' );
}
$expire = $parserOutput->getCacheExpiry();
if ( !$popts->isSafeToCache( $parserOutput->getUsedOptions() ) ) {
$this->logger->debug(
'Parser options are not safe to cache; output has not been saved',
[ 'name' => $this->name ]
);
$this->incrementStats( $page, 'save_unsafe' );
return;
}
if ( $expire <= 0 ) {
$this->logger->debug(
'Parser output was marked as uncacheable and has not been saved',
[ 'name' => $this->name ]
);
$this->incrementStats( $page, 'save_uncacheable' );
return;
}
if ( $this->filter && !$this->filter->shouldCache( $parserOutput, $page, $popts ) ) {
$this->logger->debug(
'Parser output was filtered and has not been saved',
[ 'name' => $this->name ]
);
$this->incrementStats( $page, 'save_filtered' );
// TODO: In this case, we still want to cache in RevisionOutputCache (T350669).
return;
}
if ( $this->cache instanceof EmptyBagOStuff ) {
return;
}
// Ensure cache properties are set in the ParserOutput
// T350538: These should be turned into assertions that the
// properties are already present.
if ( $cacheTime ) {
$parserOutput->setCacheTime( $cacheTime );
} else {
if ( !$parserOutput->hasCacheTime() ) {
$this->logger->warning(
'No cache time set',
[
'name' => $this->name,
'reason' => $popts->getRenderReason(),
]
);
}
$cacheTime = $parserOutput->getCacheTime();
}
if ( $revId ) {
$parserOutput->setCacheRevisionId( $revId );
} elseif ( $parserOutput->getCacheRevisionId() ) {
$revId = $parserOutput->getCacheRevisionId();
} else {
$revId = $page->getLatest( PageRecord::LOCAL );
$parserOutput->setCacheRevisionId( $revId );
}
if ( !$revId ) {
$this->logger->warning(
'Parser output cannot be saved if the revision ID is not known',
[ 'name' => $this->name ]
);
$this->incrementStats( $page, 'save_norevid' );
return;
}
if ( !$parserOutput->getRenderId() ) {
$this->logger->warning(
'Parser output missing render ID',
[
'name' => $this->name,
'reason' => $popts->getRenderReason(),
]
);
$parserOutput->setRenderId( $this->globalIdGenerator->newUUIDv1() );
}
// Transfer cache properties to the cache metadata
$metadata = new CacheTime;
$metadata->recordOptions( $parserOutput->getUsedOptions() );
$metadata->updateCacheExpiry( $expire );
$metadata->setCacheTime( $cacheTime );
$metadata->setCacheRevisionId( $revId );
$parserOutputKey = $this->makeParserOutputKey(
$page,
$popts,
$metadata->getUsedOptions()
);
$msg = "Saved in parser cache with key $parserOutputKey" .
" and timestamp $cacheTime" .
" and revision id $revId.";
$reason = $popts->getRenderReason();
$msg .= " Rendering was triggered because: $reason";
$parserOutput->addCacheMessage( $msg );
$pageKey = $this->makeMetadataKey( $page );
$parserOutputData = $this->convertForCache( $parserOutput, $parserOutputKey );
$metadataData = $this->convertForCache( $metadata, $pageKey );
if ( !$parserOutputData || !$metadataData ) {
$this->logger->warning(
'Parser output failed to serialize and was not saved',
[ 'name' => $this->name ]
);
$this->incrementStats( $page, 'save_nonserializable' );
return;
}
// Save the parser output
$this->cache->set(
$parserOutputKey,
$parserOutputData,
$expire,
BagOStuff::WRITE_ALLOW_SEGMENTS
);
// ...and its pointer to the local cache.
$this->metadataProcCache->set( $pageKey, $metadataData, $expire );
// ...and to the global cache.
$this->cache->set( $pageKey, $metadataData, $expire );
$title = $this->titleFactory->newFromPageIdentity( $page );
$this->hookRunner->onParserCacheSaveComplete( $this, $parserOutput, $title, $popts, $revId );
$this->logger->debug( 'Saved in parser cache', [
'name' => $this->name,
'key' => $parserOutputKey,
'cache_time' => $cacheTime,
'rev_id' => $revId
] );
$this->incrementStats( $page, 'save_success' );
$reasonKey = preg_replace( '/\W+/', '_', $popts->getRenderReason() );
$this->incrementStats( $page, "reason.$reasonKey" );
}
/**
* Get the backend BagOStuff instance that
* powers the parser cache
*
* @since 1.30
* @internal
* @return BagOStuff
*/
public function getCacheStorage() {
return $this->cache;
}
/**
* Check if $entry expired for $page given the $staleConstraint
* when fetching from $cacheTier.
* @param CacheTime $entry
* @param PageRecord $page
* @param int $staleConstraint One of USE_* constants.
* @param string $cacheTier
* @return bool True if the entry is expired and $staleConstraint does not allow expired entries
*/
private function checkExpired(
CacheTime $entry,
PageRecord $page,
int $staleConstraint,
string $cacheTier
): bool {
if ( $staleConstraint < self::USE_EXPIRED && $entry->expired( $page->getTouched() ) ) {
$this->incrementStats( $page, 'miss_expired' );
$this->logger->debug( "{$cacheTier} key expired", [
'name' => $this->name,
'touched' => $page->getTouched(),
'epoch' => $this->cacheEpoch,
'cache_time' => $entry->getCacheTime()
] );
return true;
}
return false;
}
/**
* Check if $entry is outdated, i.e. does not belong to the latest
* revision of $page, given $staleConstraint when fetched from $cacheTier.
* @param CacheTime $entry
* @param PageRecord $page
* @param int $staleConstraint One of USE_* constants.
* @param string $cacheTier
* @return bool True if the entry is for an old revision and $staleConstraint does not allow outdated entries
*/
private function checkOutdated(
CacheTime $entry,
PageRecord $page,
int $staleConstraint,
string $cacheTier
): bool {
$latestRevId = $page->getLatest( PageRecord::LOCAL );
if ( $staleConstraint < self::USE_OUTDATED && $entry->isDifferentRevision( $latestRevId ) ) {
$this->incrementStats( $page, "miss_revid" );
$this->logger->debug( "{$cacheTier} key is for an old revision", [
'name' => $this->name,
'rev_id' => $latestRevId,
'cached_rev_id' => $entry->getCacheRevisionId()
] );
return true;
}
return false;
}
/**
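* Restore an object from its JSON representation in the cache.
* Logs an error and returns null if the data cannot be unserialized.
*
* @param string $jsonData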
* @param string $key
* @param string $expectedClass
* @return CacheTime|ParserOutput|null
*/
private function restoreFromJson( string $jsonData, string $key, string $expectedClass ) {
try {
/** @var CacheTime $obj */
$obj = $this->jsonCodec->unserialize( $jsonData, $expectedClass );
return $obj;
} catch ( JsonException $e ) {
$this->logger->error( "Unable to unserialize JSON", [
'name' => $this->name,
'cache_key' => $key,
'message' => $e->getMessage()
] );
return null;
} catch ( Exception $e ) {
$this->logger->error( "Unexpected failure during cache load", [
'name' => $this->name,
'cache_key' => $key,
'message' => $e->getMessage()
] );
return null;
}
}
/**
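* Serialize an object to JSON for storage in the cache.
* Logs an error and returns null if serialization throws a JsonException.
*
* @param CacheTime $obj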
* @param string $key
* @return string|null
*/
protected function convertForCache( CacheTime $obj, string $key ) {
try {
return $this->jsonCodec->serialize( $obj );
} catch ( JsonException $e ) {
$this->logger->error( "Unable to serialize JSON", [
'name' => $this->name,
'cache_key' => $key,
'message' => $e->getMessage(),
] );
return null;
}
}
}