<?php
/**
* Benchmark script for parse operations
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License along
* with this program; if not, write to the Free Software Foundation, Inc.,
* 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
* http://www.gnu.org/copyleft/gpl.html
*
* @file
* @author Tim Starling <tstarling@wikimedia.org>
* @ingroup Benchmark
*/
// @codeCoverageIgnoreStart
require_once __DIR__ . '/../Maintenance.php';
// @codeCoverageIgnoreEnd
use MediaWiki\Cache\LinkCache;
use MediaWiki\Linker\LinkTarget;
use MediaWiki\Maintenance\Maintenance;
use MediaWiki\Revision\RevisionRecord;
use MediaWiki\Revision\SlotRecord;
use MediaWiki\Title\Title;
use Wikimedia\Rdbms\SelectQueryBuilder;
/**
* Maintenance script to benchmark how long it takes to parse a given title at an optionally
* specified timestamp
*
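* Example invocation (a hedged sketch: the page title, dates and loop counts
* are illustrative values only). Both time options are passed through PHP's
* strtotime(), so any format it accepts should work:
*
* @code
* php maintenance/benchmarks/benchmarkParse.php --warmup 2 --loops 10 \
*     --page-time '2014-01-01' --tpl-time '2014-01-01' 'Main Page'
* @endcode
*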
* @since 1.23
*/
class BenchmarkParse extends Maintenance {
/** @var string|null MediaWiki concatenated string timestamp (YYYYMMDDHHMMSS), or null if --tpl-time is not set */
private $templateTimestamp = null;
/** @var bool */
private $clearLinkCache = false;
/**
* @var LinkCache
*/
private $linkCache;
/** @var array Cache that maps a Title DB key to revision ID for the requested timestamp */
private $idCache = [];
public function __construct() {
parent::__construct();
$this->addDescription( 'Benchmark parse operation' );
$this->addArg( 'title', 'The name of the page to parse' );
$this->addOption( 'warmup', 'Repeat the parse operation this number of times to warm the cache',
false, true );
$this->addOption( 'loops', 'Number of times to repeat parse operation post-warmup',
false, true );
$this->addOption( 'page-time',
'Use the version of the page which was current at the given time',
false, true );
$this->addOption( 'tpl-time',
'Use templates which were current at the given time (except that moves and ' .
'deletes are not handled properly)',
false, true );
$this->addOption( 'reset-linkcache', 'Reset the LinkCache after every parse.',
false, false );
}
public function execute() {
if ( $this->hasOption( 'tpl-time' ) ) {
$this->templateTimestamp = wfTimestamp( TS_MW, strtotime( $this->getOption( 'tpl-time' ) ) );
$hookContainer = $this->getHookContainer();
$hookContainer->register( 'BeforeParserFetchTemplateRevisionRecord', [ $this, 'onFetchTemplate' ] );
}
$this->clearLinkCache = $this->hasOption( 'reset-linkcache' );
// Set as a member variable to avoid function calls when we're timing the parse
$this->linkCache = $this->getServiceContainer()->getLinkCache();
$title = Title::newFromText( $this->getArg( 0 ) );
if ( !$title ) {
$this->fatalError( "Invalid title" );
}
$revLookup = $this->getServiceContainer()->getRevisionLookup();
if ( $this->hasOption( 'page-time' ) ) {
$pageTimestamp = wfTimestamp( TS_MW, strtotime( $this->getOption( 'page-time' ) ) );
$id = $this->getRevIdForTime( $title, $pageTimestamp );
if ( !$id ) {
$this->fatalError( "The page did not exist at that time" );
}
$revision = $revLookup->getRevisionById( (int)$id );
} else {
$revision = $revLookup->getRevisionByTitle( $title );
}
if ( !$revision ) {
$this->fatalError( "Unable to load revision, incorrect title?" );
}
$warmup = (int)$this->getOption( 'warmup', 1 );
for ( $i = 0; $i < $warmup; $i++ ) {
$this->runParser( $revision );
}
$loops = (int)$this->getOption( 'loops', 1 );
if ( $loops < 1 ) {
$this->fatalError( 'Invalid number of loops specified' );
}
$startUsage = getrusage();
$startTime = microtime( true );
for ( $i = 0; $i < $loops; $i++ ) {
$this->runParser( $revision );
}
$endUsage = getrusage();
$endTime = microtime( true );
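// Worked example of the per-loop CPU-time arithmetic below (illustrative
// numbers, not measurements): if ru_utime goes from 1 s + 250000 us to
// 3 s + 750000 us over 2 loops, the result is (3.75 - 1.25) / 2 = 1.25 s.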
printf( "CPU time = %.3f s, wall clock time = %.3f s\n",
// Average user CPU time per loop (ru_utime only; system time is not included)
( $endUsage['ru_utime.tv_sec'] + $endUsage['ru_utime.tv_usec'] * 1e-6
- $startUsage['ru_utime.tv_sec'] - $startUsage['ru_utime.tv_usec'] * 1e-6 ) / $loops,
// Average wall-clock time per loop
( $endTime - $startTime ) / $loops
);
}
/**
* Fetch the ID of the latest revision of a Title that occurred at or before the given timestamp
*
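* The lookup is roughly equivalent to the SQL below (a sketch only: the real
* query text depends on the database backend and table prefix, and :ns,
* :dbkey and :ts are placeholder names used here for illustration):
*
* @code
* SELECT rev_id FROM revision JOIN page ON rev_page = page_id
*   WHERE page_namespace = :ns AND page_title = :dbkey
*   AND rev_timestamp <= :ts
*   ORDER BY rev_timestamp DESC LIMIT 1
* @endcode
*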
* @param Title $title
* @param string $timestamp Timestamp in TS_MW format (YYYYMMDDHHMMSS)
* @return string|false Revision ID, or false if no such revision exists
*/
private function getRevIdForTime( Title $title, $timestamp ) {
$dbr = $this->getReplicaDB();
$id = $dbr->newSelectQueryBuilder()
->select( 'rev_id' )
->from( 'revision' )
->join( 'page', null, 'rev_page=page_id' )
->where( [ 'page_namespace' => $title->getNamespace(), 'page_title' => $title->getDBkey() ] )
->andWhere( $dbr->expr( 'rev_timestamp', '<=', $timestamp ) )
->orderBy( 'rev_timestamp', SelectQueryBuilder::SORT_DESC )
->caller( __METHOD__ )->fetchField();
return $id;
}
/**
* Parse the text from a given RevisionRecord
*
* @param RevisionRecord $revision
*/
private function runParser( RevisionRecord $revision ) {
$content = $revision->getContent( SlotRecord::MAIN );
$contentRenderer = $this->getServiceContainer()->getContentRenderer();
// @phan-suppress-next-line PhanTypeMismatchArgumentNullable getId does not return null here
$contentRenderer->getParserOutput( $content, $revision->getPage(), $revision->getId() );
if ( $this->clearLinkCache ) {
$this->linkCache->clear();
}
}
/**
* Hook into the parser's template revision fetcher. Where one exists, make the
* parser use the latest revision of each template at or before the --tpl-time
* timestamp; otherwise $revRecord is left unchanged.
*
* @param ?LinkTarget $contextTitle
* @param LinkTarget $titleTarget
* @param bool &$skip
* @param ?RevisionRecord &$revRecord
* @return bool
*/
public function onFetchTemplate(
?LinkTarget $contextTitle,
LinkTarget $titleTarget,
bool &$skip,
?RevisionRecord &$revRecord
): bool {
$title = Title::newFromLinkTarget( $titleTarget );
$pdbk = $title->getPrefixedDBkey();
if ( !isset( $this->idCache[$pdbk] ) ) {
$proposedId = $this->getRevIdForTime( $title, $this->templateTimestamp );
$this->idCache[$pdbk] = $proposedId;
}
if ( $this->idCache[$pdbk] !== false ) {
$revLookup = $this->getServiceContainer()->getRevisionLookup();
$revRecord = $revLookup->getRevisionById( (int)$this->idCache[$pdbk] );
}
return true;
}
}
// @codeCoverageIgnoreStart
$maintClass = BenchmarkParse::class;
require_once RUN_MAINTENANCE_IF_MAIN;
// @codeCoverageIgnoreEnd