wiki.techinc.nl/includes/parser/Parsoid/PageBundleParserOutputConverter.php
Subramanya Sastry 68805e2f50 ParsoidParser: Record ParserOptions watcher on ParserOutput object
* ParsoidParser hadn't registered a watcher on ParserOptions so far.
  Because of this, you can see that the current parser cache key
  (in deployed production code) doesn't have 'useParsoid=1' in it.

  Ex: View source on enwiki:Hospet shows that the parser cache key
  there is "enwiki:parsoid-pcache:idhash:2360619-0!canonical".

  The only reason this doesn't conflict with legacy parser output
  is that we use "parsoid-pcache", a different cache instance than
  "pcache" used for legacy parser output. But if/when we decide to use
  the same parser cache instance, this could cause cache corruptions.

  With FlaggedRevisions, where a single "stable-pcache" parser cache
  instance is used, in local testing, this was causing Parsoid HTML to be
  saved without "useParsoid=1", and so Parsoid HTML was being returned
  for legacy parser cache requests.

* In addition, fix the code in PageBundleParserOutputConverter to copy
  over internal metadata (which includes used options). This ensures
  that any tracked parser options aren't lost and the right parser cache
  key is constructed later on.

* Added / updated a number of tests that verify that usedOptions
  is tracked correctly in the useParsoid code paths. The tests fail
  without the code changes in this patch.

Bug: T340703
Bug: T335157
Needed-By: I0e954949768044eea6ec275a36d0d6d7ed457e8e
Change-Id: I076d5d362bdfd9d4b2ca8886bf6b30c1a746aee7
2023-07-11 10:53:11 -05:00
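
The cache-key collision described in the commit message can be sketched roughly as follows. This is a minimal illustration, not MediaWiki's actual ParserCache code: the function `sketchCacheKey()`, the simplified key layout, and the "enwiki:stable-pcache" prefix are assumptions made up for the example. It only shows why an option that is never recorded as "used" cannot influence the key.

```php
<?php
// Hypothetical helper (NOT a MediaWiki API): builds a cache key from the
// page identity plus only those options the parse reported as used.
function sketchCacheKey(
	string $prefix, string $pageId, array $options, array $usedOptions
): string {
	$parts = [ 'canonical' ];
	sort( $usedOptions );
	foreach ( $usedOptions as $name ) {
		if ( isset( $options[$name] ) ) {
			$parts[] = $name . '=' . $options[$name];
		}
	}
	return $prefix . ':idhash:' . $pageId . '!' . implode( '!', $parts );
}

$prefix = 'enwiki:stable-pcache'; // assumed prefix, for illustration only
$pageId = '2360619-0';
$parsoidOptions = [ 'useParsoid' => 1 ];

// Legacy parse: no Parsoid option set, nothing recorded.
$legacyKey = sketchCacheKey( $prefix, $pageId, [], [] );

// Parsoid parse without a watcher: the option is set but never recorded
// as used, so the key is identical to the legacy one.
$unwatchedKey = sketchCacheKey( $prefix, $pageId, $parsoidOptions, [] );

// Parsoid parse with a watcher: 'useParsoid' is recorded, the key diverges.
$watchedKey = sketchCacheKey( $prefix, $pageId, $parsoidOptions, [ 'useParsoid' ] );

var_dump( $legacyKey === $unwatchedKey ); // bool(true): collision
var_dump( $legacyKey === $watchedKey );   // bool(false): distinct keys
```

In a shared cache instance, the colliding pair is what would let Parsoid HTML be served for a legacy request.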

<?php

namespace MediaWiki\Parser\Parsoid;

use ParserOutput;
use Wikimedia\Parsoid\Core\PageBundle;

/**
 * Provides methods for conversion between PageBundle and ParserOutput
 * TODO: Convert to a trait once we drop support for PHP < 8.2 since
 * support for constants in traits was added in PHP 8.2
 * @since 1.40
 * @internal
 */
final class PageBundleParserOutputConverter {
	/**
	 * @var string Key used to store parsoid page bundle data in ParserOutput
	 */
	public const PARSOID_PAGE_BUNDLE_KEY = 'parsoid-page-bundle';

	/**
	 * We do not want instances of this class to be created
	 * @return void
	 */
	private function __construct() {
	}

	/**
	 * Creates a ParserOutput object containing the relevant data from
	 * the given PageBundle object.
	 *
	 * We need to inject data-parsoid and other properties into the
	 * parser output object for caching, so we can use it for VE edits
	 * and transformations.
	 *
	 * @param PageBundle $pageBundle
	 * @param ?ParserOutput $originalParserOutput Any non-parsoid metadata
	 *   from $originalParserOutput will be copied into the new ParserOutput object.
	 *
	 * @return ParserOutput
	 */
	public static function parserOutputFromPageBundle(
		PageBundle $pageBundle, ?ParserOutput $originalParserOutput = null
	): ParserOutput {
		$parserOutput = new ParserOutput( $pageBundle->html );
		if ( $originalParserOutput ) {
			$parserOutput->mergeHtmlMetaDataFrom( $originalParserOutput );
			$parserOutput->mergeTrackingMetaDataFrom( $originalParserOutput );
			$parserOutput->mergeInternalMetaDataFrom( $originalParserOutput );
		}
		$parserOutput->setExtensionData(
			self::PARSOID_PAGE_BUNDLE_KEY,
			[
				'parsoid' => $pageBundle->parsoid,
				'mw' => $pageBundle->mw,
				'version' => $pageBundle->version,
				'headers' => $pageBundle->headers,
				'contentmodel' => $pageBundle->contentmodel,
			]
		);
		return $parserOutput;
	}

	/**
	 * Returns a Parsoid PageBundle equivalent to the given ParserOutput.
	 *
	 * @param ParserOutput $parserOutput
	 *
	 * @return PageBundle
	 */
	public static function pageBundleFromParserOutput( ParserOutput $parserOutput ): PageBundle {
		$pageBundleData = $parserOutput->getExtensionData( self::PARSOID_PAGE_BUNDLE_KEY );
		return new PageBundle(
			$parserOutput->getRawText(),
			$pageBundleData['parsoid'] ?? [],
			$pageBundleData['mw'] ?? [],
			// It would be nice to have this be "null", but PageBundle::responseData()
			// chokes on that: T325137.
			$pageBundleData['version'] ?? '0.0.0',
			$pageBundleData['headers'] ?? [],
			$pageBundleData['contentmodel'] ?? null
		);
	}
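
	/**
	 * Checks whether the given ParserOutput carries Parsoid page bundle data.
	 *
	 * @param ParserOutput $parserOutput
	 *
	 * @return bool
	 */
	public static function hasPageBundle( ParserOutput $parserOutput ): bool {
		return $parserOutput->getExtensionData( self::PARSOID_PAGE_BUNDLE_KEY ) !== null;
	}
}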
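
The fallback behaviour of `pageBundleFromParserOutput()` and `hasPageBundle()` can be illustrated with a small standalone sketch. It does not use the real MediaWiki or Parsoid classes: `SketchParserOutput`, `sketchBundleFields()`, `sketchHasBundle()` and `SKETCH_KEY` are hypothetical stand-ins that model only the raw text and extension data, and the version string '2.8.0' is an arbitrary example value.

```php
<?php
// Hypothetical, simplified stand-in for ParserOutput: only the two pieces
// of state the converter reads (raw text and extension data).
final class SketchParserOutput {
	private string $text;
	private array $extensionData = [];

	public function __construct( string $text ) {
		$this->text = $text;
	}

	public function getRawText(): string {
		return $this->text;
	}

	public function setExtensionData( string $key, $value ): void {
		$this->extensionData[$key] = $value;
	}

	public function getExtensionData( string $key ) {
		return $this->extensionData[$key] ?? null;
	}
}

const SKETCH_KEY = 'parsoid-page-bundle';

// Mirrors the "??" fallbacks of pageBundleFromParserOutput(), but returns
// a plain array instead of constructing a PageBundle.
function sketchBundleFields( SketchParserOutput $output ): array {
	$data = $output->getExtensionData( SKETCH_KEY );
	return [
		'html' => $output->getRawText(),
		'parsoid' => $data['parsoid'] ?? [],
		'mw' => $data['mw'] ?? [],
		'version' => $data['version'] ?? '0.0.0',
		'headers' => $data['headers'] ?? [],
		'contentmodel' => $data['contentmodel'] ?? null,
	];
}

// Mirrors hasPageBundle(): bundle data is present only if it was stored.
function sketchHasBundle( SketchParserOutput $output ): bool {
	return $output->getExtensionData( SKETCH_KEY ) !== null;
}

// Output with no bundle data: every field falls back to its default.
$legacy = new SketchParserOutput( '<p>legacy</p>' );
$legacyFields = sketchBundleFields( $legacy );

// Output with stored bundle data: the stored values are returned.
$parsoid = new SketchParserOutput( '<p>parsoid</p>' );
$parsoid->setExtensionData( SKETCH_KEY, [
	'parsoid' => [ 'ids' => [] ],
	'mw' => [ 'ids' => [] ],
	'version' => '2.8.0', // arbitrary example value
	'headers' => [],
	'contentmodel' => 'wikitext',
] );
$parsoidFields = sketchBundleFields( $parsoid );

var_dump( sketchHasBundle( $legacy ), sketchHasBundle( $parsoid ) );
```

Under these assumptions, an output that never had bundle data attached still converts without errors, which is why checking `hasPageBundle()` first is the way to tell a genuine bundle from one made entirely of defaults.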