wiki.techinc.nl/includes/content/WikitextContentHandler.php

<?php
/**
* Content handler for wiki text pages.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License along
* with this program; if not, write to the Free Software Foundation, Inc.,
* 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
* http://www.gnu.org/copyleft/gpl.html
*
* @since 1.21
*
* @file
* @ingroup Content
*/
namespace MediaWiki\Content;
use MediaWiki\Content\Renderer\ContentParseParams;
use MediaWiki\Content\Transform\PreloadTransformParams;
use MediaWiki\Content\Transform\PreSaveTransformParams;
use MediaWiki\Languages\LanguageNameUtils;
use MediaWiki\Linker\LinkRenderer;
use MediaWiki\Logger\LoggerFactory;
use MediaWiki\Parser\MagicWordFactory;
use MediaWiki\Parser\ParserFactory;
use MediaWiki\Parser\ParserOutput;
use MediaWiki\Parser\ParserOutputFlags;
use MediaWiki\Parser\Parsoid\ParsoidParserFactory;
use MediaWiki\Revision\RevisionRecord;
use MediaWiki\Title\Title;
use MediaWiki\Title\TitleFactory;
use SearchEngine;
use SearchIndexField;
use Wikimedia\UUID\GlobalIdGenerator;
use WikiPage;
/**
* Content handler for wiki text pages.
*
* @ingroup Content
*/
class WikitextContentHandler extends TextContentHandler {
private TitleFactory $titleFactory;
private ParserFactory $parserFactory;
private GlobalIdGenerator $globalIdGenerator;
private LanguageNameUtils $languageNameUtils;
private LinkRenderer $linkRenderer;
private MagicWordFactory $magicWordFactory;
private ParsoidParserFactory $parsoidParserFactory;
public function __construct(
string $modelId,
TitleFactory $titleFactory,
ParserFactory $parserFactory,
GlobalIdGenerator $globalIdGenerator,
LanguageNameUtils $languageNameUtils,
LinkRenderer $linkRenderer,
MagicWordFactory $magicWordFactory,
ParsoidParserFactory $parsoidParserFactory
) {
// $modelId should always be CONTENT_MODEL_WIKITEXT
parent::__construct( $modelId, [ CONTENT_FORMAT_WIKITEXT ] );
$this->titleFactory = $titleFactory;
$this->parserFactory = $parserFactory;
$this->globalIdGenerator = $globalIdGenerator;
$this->languageNameUtils = $languageNameUtils;
$this->linkRenderer = $linkRenderer;
$this->magicWordFactory = $magicWordFactory;
$this->parsoidParserFactory = $parsoidParserFactory;
}
/**
* @return class-string<WikitextContent>
*/
protected function getContentClass() {
return WikitextContent::class;
}
/**
* Returns a WikitextContent object representing a redirect to the given destination page.
*
* @param Title $destination The page to redirect to.
* @param string $text Text to include in the redirect, if possible.
*
* @return Content
*
* @see ContentHandler::makeRedirectContent
*/
public function makeRedirectContent( Title $destination, $text = '' ) {
$optionalColon = '';
if ( $destination->getNamespace() === NS_CATEGORY ) {
$optionalColon = ':';
} else {
$iw = $destination->getInterwiki();
if ( $iw && $this->languageNameUtils->getLanguageName( $iw,
LanguageNameUtils::AUTONYMS,
LanguageNameUtils::DEFINED
) ) {
$optionalColon = ':';
}
}
$mwRedir = $this->magicWordFactory->get( 'redirect' );
$redirectText = $mwRedir->getSynonym( 0 ) .
' [[' . $optionalColon . $destination->getFullText() . ']]';
if ( $text != '' ) {
$redirectText .= "\n" . $text;
}
$class = $this->getContentClass();
return new $class( $redirectText );
}
/**
* Returns true because wikitext supports redirects.
*
* @return bool Always true.
*
* @see ContentHandler::supportsRedirects
*/
public function supportsRedirects() {
return true;
}
/**
* Returns true because wikitext supports sections.
*
* @return bool Always true.
*
* @see ContentHandler::supportsSections
*/
public function supportsSections() {
return true;
}
/**
* Returns true, because wikitext supports caching using the
* ParserCache mechanism.
*
* @since 1.21
*
* @return bool Always true.
*
* @see ContentHandler::isParserCacheSupported
*/
public function isParserCacheSupported() {
return true;
}
/** @inheritDoc */
public function supportsPreloadContent(): bool {
return true;
}
/**
* @return FileContentHandler
*/
protected function getFileHandler() {
return new FileContentHandler(
$this->getModelID(),
$this->titleFactory,
$this->parserFactory,
$this->globalIdGenerator,
$this->languageNameUtils,
$this->linkRenderer,
$this->magicWordFactory,
$this->parsoidParserFactory
);
}
/** @inheritDoc */
public function getFieldsForSearchIndex( SearchEngine $engine ) {
$fields = parent::getFieldsForSearchIndex( $engine );
$fields['heading'] =
$engine->makeSearchFieldMapping( 'heading', SearchIndexField::INDEX_TYPE_TEXT );
$fields['heading']->setFlag( SearchIndexField::FLAG_SCORING );
$fields['auxiliary_text'] =
$engine->makeSearchFieldMapping( 'auxiliary_text', SearchIndexField::INDEX_TYPE_TEXT );
$fields['opening_text'] =
$engine->makeSearchFieldMapping( 'opening_text', SearchIndexField::INDEX_TYPE_TEXT );
$fields['opening_text']->setFlag(
SearchIndexField::FLAG_SCORING | SearchIndexField::FLAG_NO_HIGHLIGHT
);
// Until we have the full first-class content handler for files, we invoke it explicitly here
return array_merge( $fields, $this->getFileHandler()->getFieldsForSearchIndex( $engine ) );
}
/** @inheritDoc */
public function getDataForSearchIndex(
WikiPage $page,
ParserOutput $parserOutput,
SearchEngine $engine,
?RevisionRecord $revision = null
) {
$fields = parent::getDataForSearchIndex( $page, $parserOutput, $engine, $revision );
$structure = new WikiTextStructure( $parserOutput );
$fields['heading'] = $structure->headings();
// text fields
$fields['opening_text'] = $structure->getOpeningText();
$fields['text'] = $structure->getMainText(); // overwrites one from ContentHandler
$fields['auxiliary_text'] = $structure->getAuxiliaryText();
$fields['defaultsort'] = $structure->getDefaultSort();
$fields['file_text'] = null;
// Until we have the full first-class content handler for files, we invoke it explicitly here
if ( $page->getTitle()->getNamespace() === NS_FILE ) {
$fields = array_merge(
$fields,
$this->getFileHandler()->getDataForSearchIndex( $page, $parserOutput, $engine, $revision )
);
}
return $fields;
}
/**
* Returns the content's text as-is.
*
* @param Content $content
* @param string|null $format The serialization format to check
*
* @return mixed
*/
public function serializeContent( Content $content, $format = null ) {
$this->checkFormat( $format );
return parent::serializeContent( $content, $format );
}
/** @inheritDoc */
public function preSaveTransform(
Content $content,
PreSaveTransformParams $pstParams
): Content {
'@phan-var WikitextContent $content';
$text = $content->getText();
$parser = $this->parserFactory->getInstance();
$pst = $parser->preSaveTransform(
$text,
$pstParams->getPage(),
$pstParams->getUser(),
$pstParams->getParserOptions()
);
if ( $text === $pst ) {
return $content;
}
$contentClass = $this->getContentClass();
$ret = new $contentClass( $pst );
$ret->setPreSaveTransformFlags( $parser->getOutput()->getAllFlags() );
return $ret;
}
/**
* Returns a Content object with preload transformations applied (or this
* object if no transformations apply).
*
* @param Content $content
* @param PreloadTransformParams $pltParams
*
* @return Content
*/
public function preloadTransform(
Content $content,
PreloadTransformParams $pltParams
): Content {
'@phan-var WikitextContent $content';
$text = $content->getText();
$plt = $this->parserFactory->getInstance()->getPreloadText(
$text,
$pltParams->getPage(),
$pltParams->getParserOptions(),
$pltParams->getParams()
);
$contentClass = $this->getContentClass();
return new $contentClass( $plt );
}
/**
* Extract the redirect target and the remaining text on the page.
*
* @since 1.41 (used to be a method on WikitextContent since 1.23)
*
* @return array List of two elements: LinkTarget|null and WikitextContent object.
*/
public function extractRedirectTargetAndText( WikitextContent $content ): array {
$redir = $this->magicWordFactory->get( 'redirect' );
$text = ltrim( $content->getText() );
if ( !$redir->matchStartAndRemove( $text ) ) {
return [ null, $content ];
}
// Extract the first link and see if it's usable
// Ensure that it really does come directly after #REDIRECT
// Some older redirects included a colon, so don't freak about that!
$m = [];
if ( preg_match( '!^\s*:?\s*\[{2}(.*?)(?:\|.*?)?\]{2}\s*!', $text, $m ) ) {
// Strip preceding colon used to "escape" categories, etc.
// and URL-decode links
if ( strpos( $m[1], '%' ) !== false ) {
// Match behavior of inline link parsing here;
$m[1] = rawurldecode( ltrim( $m[1], ':' ) );
}
// TODO: Move isValidRedirectTarget() out of Title, so we can use a TitleValue here.
$title = $this->titleFactory->newFromText( $m[1] );
// If the title is a redirect to bad special pages or is invalid, return null
if ( !$title instanceof Title || !$title->isValidRedirectTarget() ) {
return [ null, $content ];
}
$remainingContent = new WikitextContent( substr( $text, strlen( $m[0] ) ) );
return [ $title, $remainingContent ];
}
return [ null, $content ];
}
/**
* Returns a ParserOutput object resulting from parsing the content's text
* using the global Parser service.
*
* @since 1.38
*
* @param Content $content
* @param ContentParseParams $cpoParams
* @param ParserOutput &$parserOutput The output object to fill (reference).
*/
protected function fillParserOutput(
Content $content,
ContentParseParams $cpoParams,
ParserOutput &$parserOutput
) {
'@phan-var WikitextContent $content';
$title = $this->titleFactory->newFromPageReference( $cpoParams->getPage() );
$parserOptions = $cpoParams->getParserOptions();
$revId = $cpoParams->getRevId();
[ $redir, $contentWithoutRedirect ] = $this->extractRedirectTargetAndText( $content );
if ( $parserOptions->getUseParsoid() ) {
$parser = $this->parsoidParserFactory->create();
// Parsoid renders the #REDIRECT magic word as an invisible
// <link> tag and doesn't require it to be stripped.
// T349087: ...and in fact, RESTBase relies on getting
// redirect information from this <link> tag, so it needs
// to be present.
// Further, Parsoid can accept a Content in place of a string.
$text = $content;
$extraArgs = [ $cpoParams->getPreviousOutput() ];
} else {
// The legacy parser requires the #REDIRECT magic word to
// be stripped from the content before parsing.
$parser = $this->parserFactory->getInstance();
$text = $contentWithoutRedirect->getText();
$extraArgs = [];
}
$time = -microtime( true );
$parserOutput = $parser
->parse( $text, $title, $parserOptions, true, true, $revId, ...$extraArgs );
$time += microtime( true );
// Timing hack
if ( $time > 3 ) {
// TODO: Use Parser's logger (once it has one)
$channel = $parserOptions->getUseParsoid() ? 'slow-parsoid' : 'slow-parse';
$logger = LoggerFactory::getInstance( $channel );
$logger->info( 'Parsing {title} was slow, took {time} seconds', [
'time' => number_format( $time, 2 ),
'title' => (string)$title,
'trigger' => $parserOptions->getRenderReason(),
] );
}
// T330667: Record the fact that we used the value of
// 'useParsoid' to influence this parse. Note that
// ::getUseParsoid() has a side-effect on $parserOutput here
// which didn't occur when we called ::getUseParsoid() earlier
// because $parserOutput didn't exist at that time.
$parserOptions->getUseParsoid();
// Add redirect indicator at the top
if ( $redir ) {
// Make sure to include the redirect link in pagelinks
$parserOutput->addLink( $redir );
if ( $cpoParams->getGenerateHtml() ) {
$parserOutput->setRedirectHeader(
$this->linkRenderer->makeRedirectHeader(
$title->getPageLanguage(), $redir, false
)
);
$parserOutput->addModuleStyles( [ 'mediawiki.action.view.redirectPage' ] );
} else {
$parserOutput->setRawText( null );
}
}
// Pass along user-signature flag
if ( in_array( 'user-signature', $content->getPreSaveTransformFlags() ) ) {
$parserOutput->setOutputFlag( ParserOutputFlags::USER_SIGNATURE );
}
}
}
/** @deprecated class alias since 1.43 */
class_alias( WikitextContentHandler::class, 'WikitextContentHandler' );