When we are merging a ParserOutput into a ContentMetadataCollector,
convert categories to LinkTarget, which is the preferred parameter
type of CMC::addCategory().
This also reverts the temporary fix in
I0715f4fbc870e401e5759dd7c7a3c19077c40a6a.
Note that the category names *should* be in dbkey form for proper
deduplication, but both TitleValue::tryNew() and
CategoryLinksTable::setParserOutput() will renormalize if needed
(see I2b08edd90666e0fa4eafe91444a58806909b02d6 / T328477).
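As a language-neutral illustration of why dbkey form matters for
deduplication, here is a minimal Python sketch. `to_dbkey` and
`merge_categories` are hypothetical helpers, not MediaWiki APIs, and real
title normalization does considerably more than this (namespaces,
capitalization rules, and so on).

```python
from typing import Dict


def to_dbkey(name: str) -> str:
    """Hypothetical, simplified normalization: "text" form -> "dbkey" form."""
    # Treat underscores as spaces, collapse runs of whitespace, trim the
    # ends, then join the remaining words with single underscores.
    return "_".join(name.replace("_", " ").split())


def merge_categories(target: Dict[str, str], source: Dict[str, str]) -> Dict[str, str]:
    """Merge category -> sort key maps, keyed by the normalized dbkey."""
    merged = {to_dbkey(name): sort_key for name, sort_key in target.items()}
    for name, sort_key in source.items():
        # The same category in "text" and "dbkey" form lands on one key.
        merged[to_dbkey(name)] = sort_key
    return merged


merged = merge_categories(
    {"Living people": ""},
    {"Living_people": "Smith", "Physicists": ""},
)
print(merged)
```

Without the normalization step, "Living people" and "Living_people" would
be recorded as two separate categories.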
Depends-On: Iea894aa2cee90f4ca5c7688493b0654e4605ce23
Change-Id: I5a903396edb4da0900ecef37cb3bf4bd03b5ba68
Due to a botched signature change on the Parsoid side, in -a8 Parsoid
only accepts `string|int` for ContentMetadataCollector::addCategory()
and in -a9 Parsoid only accepts `LinkTarget`. The ParserOutput in
core, of course, accepts both. So move the code which merges
categories into the section of ContentMetadataCollector::collectMetadata()
where we know that the CMC we're merging with is really a ParserOutput.
Change-Id: I0715f4fbc870e401e5759dd7c7a3c19077c40a6a
OutputPage::getParserOutputText/addParserOutputContent expects
ParserOutput to be mutated (e.g. by
PostCacheTransformHookRunner). Hence, cloning it before running the
pipeline is breaking DiscussionTools, probably among others.
Suppress the clone for the case where the output pipeline is invoked
from ParserOutput::getText() (which is a deprecated method anyway) and
additionally suppress the side-effects to ParserOutput::$mText on that
code path.
Bug: T353257
Co-Authored-By: C. Scott Ananian <cananian@wikimedia.org>
Co-Authored-By: Isabelle Hurbain-Palatin <ihurbainpalatin@wikimedia.org>
Change-Id: I85c690fd37b781cb27c21970467639e852113b2a
Broadened the argument type to allow passing LinkTarget to:
* ParserOutput::addCategory()
* ParserOutput::addLanguageLink()
* ParserOutput::addLink()
* ParserOutput::addImage()
* ParserOutput::addTemplate()
This allows for a tighter interface with Parsoid's
ContentMetadataCollector class and avoids errors caused by passing the
wrong form of string title ("text" with spaces versus "dbkey" with
underscores).
There are a few performance problems remaining after this patch, which
only apply to use by Parsoid (not the legacy parser):
1. ::addLink() does inefficient db requests to fetch the page id for
each link if the optional $id parameter is not passed. These lookups
should be deferred and a LinkBatch used. (The legacy parser always
passes $id.)
2. ::addTemplate() similarly requires $page_id (and $rev_id) to be
passed, so is not currently usable by Parsoid.
3. ::addLanguageLink() uses Title::getFullText() which is not present
in LinkTarget and is currently implemented as a full Title lookup.
This is not an issue for the legacy parser, because it already has a
Title object so the lookup is a no-op, but could be improved for
Parsoid's use.
Bug: T296023
Change-Id: If21ec8563c8a619bdde7c0cb6534bb9009480a21
Pages that are fast to render can be omitted from the parser cache
to preserve disk space and cache write operations.
The threshold is configurable per namespace, so the tradeoff can
be evaluated based on different access patterns. For example, pages
that are accessed rarely, like file description pages on Commons,
may have a high threshold configured, while pages that are read
frequently, like Wikipedia articles, may be configured to be always
cached, using a 0 threshold.
Filtering is based on a time profile recorded in the ParserOutput.
A generic mechanism for capturing the timing profile is implemented
in the ContentHandler base class. Subclasses may implement a more
rigorous capture mechanism.
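The filtering rule described above can be sketched as follows. This is an
illustration only: the function name, the shape of the threshold map, and
the use of seconds as the unit are assumptions, not the actual
configuration format. The namespace ids 0 (main) and 6 (File) are the
standard MediaWiki values.

```python
from typing import Dict

NS_MAIN = 0
NS_FILE = 6


def should_cache(
    namespace: int,
    render_seconds: float,
    thresholds: Dict[int, float],
    default_threshold: float = 0.0,
) -> bool:
    """Return True if a render was slow enough to be worth caching.

    Hypothetical helper: a threshold of 0 means "always cache".
    """
    threshold = thresholds.get(namespace, default_threshold)
    return render_seconds >= threshold


# Assumed example config: always cache articles, skip fast file pages.
thresholds = {NS_MAIN: 0.0, NS_FILE: 1.0}

print(should_cache(NS_MAIN, 0.01, thresholds))  # fast article
print(should_cache(NS_FILE, 0.2, thresholds))   # fast file page
print(should_cache(NS_FILE, 2.5, thresholds))   # slow file page
```

With this example config the fast article and the slow file page are
cached, while the fast file page is omitted.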
Bug: T346765
Change-Id: I38a6f3ef064f98f3ad6a7c60856b0248a94fe9ac
Instead of waiting until ParserOutput::mergeList() is called to uniquify
the list of modules, use an array<string,true> to ensure that the
modules and module styles are a set.
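The "keys as a set" technique looks roughly like this in a
language-neutral Python sketch. `ModuleSet` is a hypothetical name used
for illustration, and a Python dict stands in for the PHP
array<string,true>.

```python
from typing import Dict, Iterable, List


class ModuleSet:
    """Hypothetical container that keeps module names unique on insert."""

    def __init__(self) -> None:
        # name -> True; dicts preserve insertion order, like PHP arrays.
        self._modules: Dict[str, bool] = {}

    def add(self, names: Iterable[str]) -> None:
        for name in names:
            # Re-adding an existing key is a no-op, so no later
            # uniquifying pass is needed.
            self._modules[name] = True

    def as_list(self) -> List[str]:
        return list(self._modules)


modules = ModuleSet()
modules.add(["ext.foo", "ext.bar"])
modules.add(["ext.bar", "ext.baz"])
print(modules.as_list())
```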
Change-Id: I49673bc369dec373bce23fe7b831e6be5a256c46
== Skin::wrapHTML ==
Skin::wrapHTML no longer has to perform any guessing of the
ParserOutput language. Nor does it have to special-case wiki pages vs
special pages in this regard. Yay, code removal.
== ImagePage ==
On URLs like /wiki/File:Example.jpg, the main output handler is
ImagePage::view. This calls the parent Article::view to handle most of
its output. Article::view obtains the ParserOptions, and then fetches
ParserOutput, and then adds `<div class=mw-parser-output>` and its
metadata to OutputPage.
Before this change, ImagePage::view was creating a wrapper based
on "predicting" what language the ParserOutput will contain. It
couldn't call the new OutputPage::getContentLanguage or some
equivalent as Article::view wouldn't have populated that yet.
This leaky abstraction is fixed by this change as now the `<div>`
from ParserOutput no longer comes with a "please wrap it properly"
contract that Article subclasses couldn't possibly implement correctly
(it couldn't wrap it after the fact because Article::view writes to
OutputPage directly).
RECENT (T310445):
A special case was recently added for file pages about translated SVGs.
For those, we decide which language to use for the "fullMedia" thumb
atop the page. This was recently changed as part of T310445 from a
hardcoded $wgLanguageCode (site content lang) to the new, problematic
Title::getPageViewLanguage, which tries to guesstimate the page
language of the rendered ParserOutput and then gets the preferred
variant for the current user. The motivation was to support language
variants, but it used Title::getPageViewLanguage as a kitchen
sink to achieve that minor side-effect. The only part of this
now-deprecated method that we actually need is
LanguageConverter::getPreferredVariant().
Test plan: Covered by ImagePageTest.
== Skin mainpage-title ==
RECENT (T331095, T298715):
A special case was added to Skin::getTemplateData that powers the
mainpage-title interface message feature. This is empty by default,
but when created via MediaWiki:mainpage-title allows interface admins
to replace the H1 with a custom and localised page heading.
A few months ago, in Ifc9f0a7174, Title::getPageViewLanguage was
applied here to support language variants. Replace with the same
fix as for ImagePage. Revert back to Message::inContentLanguage()
but refactor to inLanguage() via MediaWikiServices::getContentLanguage
so that LanguageConverter::getPreferredVariant can be applied.
== EditPage ==
This was doing similar "predicting" of the ParserOutput language to
create an empty preview placeholder for use by preview.js. Now that
ApiParse (via ParserOutput::getText) returns a usable element without
any secret "you magically know the right class, lang, and dir" contract,
this placeholder is no longer needed.
Test Plan:
* EditPage: Default preview
1. index.php?title=Main_Page&action=edit
2. Show preview
3. Assert <div class="mw-content-ltr mw-parser-output" lang=en dir=ltr>
* EditPage: JS preview
1. Preferences > Editing > Show preview without reload
2. index.php?title=Main_Page&action=edit
3. Show preview
4. Assert <div class="mw-content-ltr mw-parser-output" lang=en dir=ltr>
5. Type something and 'Show preview' again
6. Assert old element gone, new text is shown, and new element
attributes are the same as the above.
== McrUndoAction ==
Same as EditPage basically, but without the JS preview use case.
== DifferenceEngine ==
Test:
1. Open /w/index.php?title=Main_Page&diff=0
(this shows the latest diff, can do manually by viewing
/wiki/Main_Page, click "View history", click "Compare selected revisions")
2. Assert <div class="mw-content-ltr mw-parser-output" lang=en dir=ltr>
3. Open /w/index.php?title=Main_Page&diff=0&action=render
4. Assert <div class="mw-content-ltr mw-parser-output" lang=en dir=ltr>
== Special:ExpandTemplates ==
Test:
1. /wiki/Special:ExpandTemplates
2. Write "Hello".
3. "OK"
4. Assert <div class="mw-content-ltr mw-parser-output" lang=en dir=ltr>
Bug: T341244
Depends-On: Icd9c079f5896ee83d86b9c2699636dc81d25a14c
Depends-On: I4e7484b3b94f1cb6062e7cef9f20626b650bb4b1
Depends-On: I90b88f3b3a3bbeba4f48d118f92f54864997e105
Change-Id: Ib130a055e46764544af0f1a46d2bc2b3a7ee85b7
This also introduces the ephemeral field "$mTransformedText" to store
the result of transformation in ParserOutput.
This is a first step before the transformation uses HtmlHolder as input
and output.
Bug: T348253
Change-Id: I312f3748ebfb0373ee3542ba0abdeefe7db1d488
* Updated ParserOutput to set Parsoid render ids that REST API
functionality expects in ParserOutput objects.
* CacheThresholdTime functionality no longer exists since it was
implemented in ParsoidOutputAccess and ParserOutputAccess doesn't
support it. This is tracked in T346765.
* Enforce the constraint that uncacheable parses are only for fake or
mutable revisions. Updated tests that violated this constraint to
use 'getParseOutput' instead of calling the parse method directly.
* Had to make some changes in ParsoidParser around use of preferredVariant
passed to Parsoid. I also left some TODO comments for future fixes.
T267067 is also relevant here.
PARSOID-SPECIFIC OPTIONS:
* logLinterData: linter data is always logged by default -- removed
support to disable it. Linter extension handles stale lints properly
and it is better to let it handle it rather than add special cases
to the API.
* offsetType: Moved this support to ParsoidHandler as a post-processing
of byte-offset output. This eliminates the need to support this
Parsoid-specific option in the ContentHandler hierarchies.
* body_only / wrapSections: Handled this in HtmlOutputRendererHelper
as a post-processing of regular output by removing sections and
returning the body content only. This does result in some useless
section-wrapping work with Parsoid, but the simplification is probably
worth it. If in the future, we support Parsoid-specific options in
the ContentHandler hierarchy, we could re-introduce this. But, in any
case, this "fragment" flavor option is likely to get moved out of
core into the VisualEditor extension code.
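To illustrate the offsetType post-processing mentioned above, here is a
minimal sketch of converting UTF-8 byte offsets into other offset units.
The function names are hypothetical and this is not the ParsoidHandler
implementation; it only shows the kind of conversion involved, assuming
the byte offset falls on a character boundary.

```python
def byte_to_char_offset(text: str, byte_offset: int) -> int:
    """Convert a UTF-8 byte offset into a Unicode code point offset."""
    # Decoding raises UnicodeDecodeError if the offset splits a character.
    prefix = text.encode("utf-8")[:byte_offset].decode("utf-8")
    return len(prefix)


def byte_to_ucs2_offset(text: str, byte_offset: int) -> int:
    """Convert a UTF-8 byte offset into a UTF-16 code unit offset."""
    prefix = text.encode("utf-8")[:byte_offset].decode("utf-8")
    # Each UTF-16 code unit is two bytes; astral characters take two units.
    return len(prefix.encode("utf-16-le")) // 2


sample = "h\u00e9llo \U0001F600!"
# The "!" sits at byte offset 11: 1 + 2 + 4 bytes, plus 4 for the emoji.
print(byte_to_char_offset(sample, 11))
print(byte_to_ucs2_offset(sample, 11))
```

Doing this as a pass over byte-offset output keeps the parser itself
unaware of the alternative units.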
DEPLOYMENT:
* This patch changes the cache key by setting the useParsoid option
in ParserOptions. The parent patch handles this to ensure we don't
encounter a cold cache on deploy.
TESTS:
* Updated tests and mocks to reflect new reality.
* Do we need any new tests?
Bug: T332931
Change-Id: Ic9b7cc0fcf365e772b7d080d76a065e3fd585f80
We'll remove ::getTOCHTML() and ::setTOCHTML() shortly as well, but
we need to adjust our parser cache serialization tests first.
Bug: T348134
Bug: T305161
Change-Id: I19f1e3d0ecbbf1225a3cb41d48e668cad9867bc5
The ::setTOCHTML() and ::getTOCHTML() methods have been deprecated
since 1.40; there's no reason we should be updating ::$mTOCHTML
behind their backs.
Bug: T348134
Change-Id: I9396bc0a2caeb974a06c5b47075b3e2bb9f4278a
* Document the bodyContentOnly option.
Introduced in Ica09a4284c (cfd9c516e1) and renamed in
commit I04e56ff2c3 (abee9b61f0).
* Fix default for absoluteURLs option.
Introduced in Id660e10261 (d334de960a).
* Match order between docs and defaults for readability
and easier review for correctness.
* Move potentially duplicate brief and ingroup from file doc
to class doc and clean up file doc to be less novel and more like
99% of other class files in MediaWiki. See also
<https://gerrit.wikimedia.org/r/q/message:ingroup+owner:Krinkle>
* Document what the class does and how it relates to several other
prominent classes in MediaWiki core.
Bug: T341244
Change-Id: Id2e3124652315a74869f504056fa8a99ad794350
It is difficult to distinguish this method from OutputPage::addJsConfigVars()
in code search:
https://codesearch.wmcloud.org/deployed/?q=%5BOo%5Dut%28put%29%3F%28%5C%28%5C%29%29%3F-%3EgetCategories%5C%28&files=&excludeFiles=&repos=
We generally try to replace $output with $parserOutput or $pOutput
as we touch code to improve the ability of codesearch to dig up
deprecated ParserOutput methods.
Bug: T305161
Depends-On: I02dd4f61c43c225b0ef6dc51c3e4f9d967a0a272
Depends-On: I61d2d77591579d825ad9d37f902e40366be55dd6
Depends-On: I91155106b7a9e10d3334f95ba4936d02851bfb11
Depends-On: Iaca745c79d9587571af03b23b21d76a6cba0ebf1
Depends-On: Id10a171c44411b1233ee4d6cf8fbd3dc57744eef
Depends-On: I47a25c011d9bd4b1a15dda4e673e32c25eb64f2b
Depends-On: I683fc768aba50b801f46467fcfa1668fa8731ea6
Change-Id: I5a2ac1c99b8b199102e12f0d32dd6ec5cdc24054
ParserOutput::addOutputHook() has been deprecated since 1.38, and without
any calls to ::addOutputHook() the associated ::getOutputHooks() and
$wgParserOutputHooks configuration do nothing.
Bug: T292321
Bug: T305161
Change-Id: Ib770c680d5e0697980e7e36a323ec56ba1d806b8
These were deprecated in 1.38 and replaced with ::{get,set}PageProperty()
and ::getPageProperties(), avoiding a heavily-aliased use of the term
"property" and making the relationship between the ParserOutput and page
properties clearer.
Bug: T305161
Change-Id: Ib1a5d0a2c1387584b81c958fa32516034e7b3d05
Insert the redirect handler as part of the post-processing done in
ParserOutput::getText(). This ensures that it does not corrupt
edit-mode Parsoid output.
Depends-On: Ia6e390d849830993a6b97004f099773cfd4fa54b
Change-Id: I20db09619999919bfeda997d79561d21e3bf8718
This was replaced by a getter/setter pair with a more standard name:
::{set,get}PreventClickjacking()
in both ParserOutput and OutputPage.
In addition, OutputPage::allowClickjacking(), similarly deprecated,
was removed.
Bug: T305161
Change-Id: I141ec9e9cb4a285edc633c0f9b61516c33f9281c
empty() only makes sense when the expression it checks is possibly
undefined, otherwise it's equivalent to a truthiness check with the
additional downside of suppressing errors when it's not wanted.
Replace it with simple truthiness checks, using strict comparison when
that seems to help with polymorphic variables.
These were caught by a bespoke phan plugin.
Change-Id: I70b629dbf9e47cf3ba48ff439b18f19e839677f4
Use strong PHP type hint on argument to enforce that the first parameter
must be an array; formerly we allowed a string as well. Non-array
arguments have been deprecated since 1.38 but this allows us to actually
clean up the code a bit.
Bug: T305161
Change-Id: I1566609990524e48faf1fa36079e2f4a4642979d
Now that the latest Parsoid has been released to mediawiki-vendor,
the method_exists() calls aren't necessary.
Bug: T343155
Followup-To: I9da2566cc003e2f05cae16229444dcf3baf61fa4
Change-Id: I081225a268d608f763814245f9cab1c44bf49bad
The method_exists() checks are kept, as it is unclear whether old
objects remain in any cache.
Follow-Up: I9da2566cc003e2f05cae16229444dcf3baf61fa4
Bug: T343155
Change-Id: I0aaa3dce26df1619bedc39696a115145a61d4d14
In Parsoid 'body only' means the <body> tag and all of its contents.
In ParserOutput::getText() the option means "just the contents of the
<body> tag" so give it a slightly different name.
Change-Id: I04e56ff2c3e03eb56b919d9ac09b5820e4badb21
Tweaked the pluralization of the newly-added
ParserOutput::appendOutputString() method (now ::appendOutputStrings()
and ::getOutputStrings()), and the name of the ParserOutputStrings class
(now ParserOutputStringSets), in an effort to continue repainting
bikesheds until the color is juuuust right.
Also extended the new method to cover ::addModules() and ::addModuleStyles()
and added support for these string sets in ::collectMetadata().
(These methods and the enumeration class were originally added in
b2cfa31eb6173e9f5e8607eadd126c33f8ce440b.)
Depends-On: I8bdffa55498d90e990af5bfc3332e3028b0a3539
Change-Id: Ibd41485d5db7779f01642e2144c50ed49d409812
This allows any bad cached parses due to a train deploy to be selectively
rolled back in the RejectParserCacheValue hook, which provides some
operational insurance against corrupted caches. The version is also
added to the debug information in the HTML footer to aid diagnosis
of any issue in real time.
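The selective rollback idea can be sketched as below. Everything here is
hypothetical: the dict-based cache entry, the "version" key, and the
version strings are placeholders for illustration, and the real check
would live in a RejectParserCacheValue hook handler.

```python
from typing import Any, Dict, FrozenSet

# Hypothetical list of versions known to have produced bad output.
BAD_VERSIONS: FrozenSet[str] = frozenset({"example-1.2.3"})


def accept_cached_value(
    cached: Dict[str, Any],
    bad_versions: FrozenSet[str] = BAD_VERSIONS,
) -> bool:
    """Return False to reject a cached parse made by a known-bad version."""
    # Entries with no recorded version are accepted in this sketch.
    return cached.get("version") not in bad_versions


print(accept_cached_value({"version": "example-1.2.3", "html": "<p>x</p>"}))
print(accept_cached_value({"version": "example-1.2.4", "html": "<p>x</p>"}))
```

Rejected entries are simply re-parsed on demand, so only the output of
the affected version is discarded rather than the whole cache.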
Depends-On: I3d3caabd959c1ba16f4dc702c2eae38d5d4dcb14
Change-Id: Ibb37a82ec0ce764aefd8c9fab2868073a66301ec
This aims at providing an interface similar to setOutputFlag for string
sets, such as the ones used in CSP properties.
Change-Id: I6f103bd88802e66611e483403a2f8a540d54aae9