Versions are changed in 8e940c4f21,
but that makes the version wrong
Follow-Up: I7f85d931d3b79da23e87b4e5692b2e14be8fcaa0
Change-Id: Iae43725b8e0fffc4d44bf57f6227334b41290bd9
MessageValue and friends are pure value objects and newable, so
it makes sense for them to be (de)serializable too. There are some
places where we want to serialize messages, such as in ParserOutput.
The structure of the resulting JSON is inspired by the way we
represent Message objects as plain values elsewhere in MediaWiki,
e.g. StatusValue::getStatusArray().
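
As a rough illustration of the round trip described above, here is a minimal
sketch in Python (not MediaWiki's PHP). The `MessageValueSketch` class and the
`key`/`params` field names are hypothetical and are not claimed to match the
actual serialized layout.

```python
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MessageValueSketch:
    """Hypothetical stand-in for a pure value object such as MessageValue."""
    key: str
    params: tuple = field(default_factory=tuple)

    def to_json_array(self) -> dict:
        # Emit only plain values so the result survives json.dumps().
        return {"key": self.key, "params": list(self.params)}

    @classmethod
    def from_json_array(cls, data: dict) -> "MessageValueSketch":
        # Rebuild the value object from its plain-value form.
        return cls(key=data["key"], params=tuple(data["params"]))


original = MessageValueSketch("example-message", ("Foo", 42))
restored = MessageValueSketch.from_json_array(
    json.loads(json.dumps(original.to_json_array()))
)
```

Because the object is a plain value with no hidden state, the restored copy
compares equal to the original.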
Co-Authored-By: C. Scott Ananian <cscott@cscott.net>
Depends-On: Ia32f95a6bdf342262b4ef044140527f0676402b9
Depends-On: I7bafe80cd36c2558517f474871148286350a4e76
Change-Id: Id47d58b5e26707fa0e0dbdd37418c0d54c8dd503
This is to make it clearer that they're related to converting serialized
content back into JSON, rather than stating that things are not
representable in JSON.
Change-Id: Ic440ac2d05b5ac238a1c0e4821d3f2d858bc3d76
The serialization test cases look for files based on the name of the
class they are testing. After the namespacing of ParserOutput, they
were looking for files named like:
1.42-MediaWiki\Parser\ParserOutput-binaryPageProperties.json
The embedded backslashes in these filenames would wreak havoc on Windows
machines. What's more, none of the existing ParserOutput tests were
actually being checked anymore, because the filenames didn't match up
with what was expected after namespacing.
Fix this by stripping the namespace from the classname when forming
the test file names.
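
The fix can be pictured with a small Python sketch (the real change is in the
PHP test harness). The function name and its parameters are hypothetical; only
the example class name and file name come from the description above.

```python
def serialization_test_file_name(version: str, class_name: str, test_case: str) -> str:
    """Build a test file name from the unqualified class name."""
    # Keep only the part after the last namespace separator, if any.
    short_name = class_name.rsplit("\\", 1)[-1]
    return f"{version}-{short_name}-{test_case}.json"


name = serialization_test_file_name(
    "1.42", "MediaWiki\\Parser\\ParserOutput", "binaryPageProperties"
)
```

With the namespace stripped, the generated name contains no backslashes and
an un-namespaced class name passes through unchanged.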
When this is done, the test cases for GhostFieldAccess begin running
again, revealing that they were broken when GhostFieldTestClass was
re-namespaced. Add a class alias for the GhostFieldTestClass to fix
this.
Finally, PHP <= 8.1 does not deserialize private properties correctly
after a class is renamed and aliased, because the internal name of the
private property contains the "old" class name in the serialization.
Add a new ::restoreAliasedGhostField() method to the
GhostFieldAccessTrait to work around this issue and restore proper
deserialization of ParserOutput.
Bug: T365060
Followup-To: I9c64a631b0b4e8e4fef8a72ee0f749d35f918052
Followup-To: I4c2cbb0a808b3881a4d6ca489eee5d8c8ebf26cf
Change-Id: I7bafe80cd36c2558517f474871148286350a4e76
'string|int|float|bool' (in any order) can be replaced by 'scalar'.
'string|int|float|bool|null' (likewise) can be replaced by '?scalar'.
This is convenient for functions that can accept any primitive value,
which comes up sometimes when serializing things as SQL, JSON etc.
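
The equivalence can be sketched as follows, in Python for illustration only.
`simplify_union` is a hypothetical helper showing the order-independent
matching; it is not the code that implements the shorthand.

```python
SCALAR_MEMBERS = frozenset({"string", "int", "float", "bool"})


def simplify_union(type_spec: str) -> str:
    """Collapse a union covering all scalar members into 'scalar'."""
    members = {part.strip() for part in type_spec.split("|")}
    if members == SCALAR_MEMBERS:
        return "scalar"
    if members == SCALAR_MEMBERS | {"null"}:
        return "?scalar"
    # Any other union is left untouched.
    return type_spec
```

Comparing sets rather than strings is what makes the member order irrelevant.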
Change-Id: I4a711ee59611d76d6745f3640e4aa6bebec02918
This is a non-default option that will add a <div> wrapper around
section contents to allow client-side collapsing. This is intended
for use by MobileFrontend, but could eventually be enabled for
desktop read views as well.
Since this parser option is in the "cache-varying options" set, any
caller who sets this option will fork the cache for that page, which
is reasonable as the parser option sets a ParserOutput property.
In the future our caching strategy will get smarter and we'll add
code which avoids the cache split and just transfers the appropriate
values from ParserOptions to ParserOutput flags after the cached
output is retrieved.
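
To see why setting a cache-varying option forks the cache, consider this
minimal Python sketch. The option names, the key format, and `cache_key` are
all assumptions made for illustration; the actual parser cache key scheme is
not shown in this message.

```python
import hashlib

# Hypothetical set of options that vary the cache; names are illustrative only.
CACHE_VARYING_OPTIONS = ("collapsibleSections", "userLang")


def cache_key(page_id: int, options: dict) -> str:
    """Derive a cache key from the page and every cache-varying option value."""
    parts = [f"{name}={options.get(name)!r}" for name in CACHE_VARYING_OPTIONS]
    digest = hashlib.sha1("!".join(parts).encode("utf-8")).hexdigest()[:12]
    return f"pcache:{page_id}:{digest}"


default_key = cache_key(7, {"userLang": "en"})
forked_key = cache_key(7, {"userLang": "en", "collapsibleSections": True})
```

The two keys differ, so the same page is stored twice, once per option value.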
Bug: T359001
Change-Id: Ie93959a056ed15a728404eb293e4bb6eeaeb15c0
Even though this JSON property is unused on master, the previous
train release read it from the JSON (and threw the value away).
In order to provide error-free roll-forward and roll-back of the
train, temporarily write an empty string as the value of TOCHTML
so that the read from `$jsonData['TOCHTML']` won't cause a PHP
notice in the logs if we roll back.
This patch is only needed for one train release, and can then
be removed.
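
The compatibility shim amounts to something like the following Python sketch.
Both function names are hypothetical. Note that a missing key raises KeyError
in Python, whereas the PHP code would only log a notice.

```python
def write_json(raw_text: str) -> dict:
    """New-code writer: still emits the unused key so old readers stay quiet."""
    return {
        "Text": raw_text,
        # Unused by the new code; kept for one release so a rollback is clean.
        "TOCHTML": "",
    }


def old_reader(json_data: dict) -> str:
    """Previous-release reader: looks up TOCHTML and discards it."""
    _unused = json_data["TOCHTML"]  # fails here if the key is missing
    return json_data["Text"]
```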
Bug: T363107
Change-Id: I77add3bd7f00941cb81481f738bc59d6008c2406
Before these method names get baked forever into the 1.42 release, rename
the ParserOutput::setIndexedPageProperty() and ::setUnindexedPageProperty()
methods to ::setNumericPageProperty() and ::setUnsortedPageProperty() to
try to address some confusion about whether the *presence* of the page
property is still indexed (it is!), in contrast to whether there's an
additional "sort key" associated with the *value* assigned to the page
property.
This naming is compatible with the feature request in T357783 to have
the sort key and property value specified independently. The new
method signature in that case would be:
...setSortedPageProperty( string $name, string $value, int|float $sortKey )
Although PHP 8.0 will throw a TypeError if a non-numeric type is coerced
to numeric using `0 + ...`, use an explicit is_numeric check to obtain
the same behavior in PHP 7.x.
Change-Id: Ia94c192c429d0482c58467bed787fd2e0aca052f
Note that not *all* ParserOutputs represent parsed articles, and describe
the merging operations on ParserOutputs in more depth. The interaction
with Content and ContentHandlers is also described (thanks, Daniel!).
Followup-To: Id2e3124652315a74869f504056fa8a99ad794350
Change-Id: I5c1016532eba1b71dc4d3d5d5d0c46775713efb5
If a placeholder value is needed, it is recommended to use the empty string
to avoid wasting database space unnecessarily. Operationalize this
recommendation by providing a default value for the method argument.
Bug: T305158
Bug: T350224
Change-Id: I9ea8d93298d771c2d38fdfb451a2817220ca679a
Deprecate passing non-string values to ::setPageProperty(), which introduces
easy traps for programmers to fall into. If page properties are intended
to be indexed, use the new ::setIndexedPageProperty() instead. Also add
::setUnindexedPageProperty() for symmetry, with a tighter string type on
the value.
Bug: T305158
Bug: T350224
Change-Id: I8a39a7c90341dfee932aa819c9a0a637a8782f69
This ensures uniform treatment of all places that call `addCategory`
without duplicating the `defaultsort` code; it also ensures that the
effect of the {{DEFAULTSORT}} parser function is independent of page
position.
Bug: T40435
Bug: T353530
Change-Id: I4480a6d59e766fa4eddc9ec9117c58b66771bb47
Fix SkinModuleTest::provideGetFeatureFilePathsOrder, where the nesting of
arrays for parameters was wrong.
Change-Id: I9875008adf62d284c48662ebfbd245d72e5be064
ParserOutput::getText() is not a simple getter, but does
transformations on the "text" of the ParserOutput; the simple getter
is named ::getRawText().
To maintain consistency, rename ParserOutput::setText() to
::setRawText() and the property name ParserOutput::$mText to
::$mRawText so future readers are not confused.
The JSON property name as it appears in the serialized ParserCache
is left as 'Text' so that we don't have any forward- or backward-
rollback issues.
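
A minimal Python sketch of the idea, renaming the in-code property while
keeping the serialized key stable. `OutputSketch` is a hypothetical stand-in,
not the ParserOutput class.

```python
class OutputSketch:
    """Hypothetical stand-in showing a renamed property with a stable JSON key."""

    def __init__(self, raw_text: str = "") -> None:
        self.raw_text = raw_text  # renamed in code (was: text)

    def to_json_array(self) -> dict:
        # The serialized key stays 'Text' so older and newer code can
        # both read entries already stored in the cache.
        return {"Text": self.raw_text}

    @classmethod
    def from_json_array(cls, data: dict) -> "OutputSketch":
        return cls(raw_text=data["Text"])
```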
Change-Id: I3ef34814ab9473cc70d0a6806e8c5a4a02b73491
Non-scalar values passed to ParserOutput::setPageProperty() have never
"worked"; they've been stringified (and null has been stored as an empty
string). Emit a warning so we can fail harder in future releases.
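
The behaviour described above could look roughly like this, as a Python
sketch under stated assumptions. `set_page_property_sketch` is hypothetical,
and the warning class is only a stand-in for MediaWiki's own deprecation
machinery.

```python
import warnings


def set_page_property_sketch(props: dict, name: str, value) -> None:
    """Store a page property, warning when the value is not a scalar."""
    if not isinstance(value, (str, int, float, bool)):
        warnings.warn(
            f"Non-scalar value passed for page property {name!r}",
            DeprecationWarning,
            stacklevel=2,
        )
        # Mirror the described legacy behaviour: null becomes an empty
        # string, everything else is stringified.
        value = "" if value is None else str(value)
    props[name] = value
```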
Bug: T305158
Depends-On: Ib36787d04c0ca713587dc8b814ca1c5a827f6f72
Change-Id: I38234084fdc7427ca577bb33a7fce1541581188d
String and non-string values behave very differently when passed to
::setPageProperty(), resulting in some unexpected gotchas for the
unaware caller.
Bug: T350224
Bug: T305158
Change-Id: I23b35b250f27a117d1353ea8a26d2b3f77c568e7
We closed T296023 and opened a new task for the work remaining, so
update the comments in the code to match.
The task relating to `addLanguageLink` is actually T296019.
Change-Id: I28b942a57ed41751d44d8565a290d925f6d7f180
This was formerly used by the REST API, but that code now just
uses ParserOutput::getRawText() when it needs the full HTML document.
This option has been broken, with various passes like RenderDebugInfo
and AddWrapperDiv adding content in inappropriate places if
bodyContentOnly was false.
Change-Id: Ib45f95ded59c81c16d61803f977d1edbfe82b262
This will allow the Translate extension to set this parser option
in the ArticleParserOptions hook, instead of mutating $options passed
to ParserOutput::getText() in the ParserOutputPostCacheTransform hook.
It ought to also help to handle the many places which call:
... = $parserOutput->getText( [
'enableSectionEditLinks' => false,
] );
by allowing them to set the appropriate ParserOption instead
of passing arguments to ::getText().
Bug: T350626
Change-Id: I719c115194059060f7f888608417a194ac80cc92
This class belongs with the rest of the Parsoid output stash code.
This class has been marked @unstable since 1.39 and thus the move
does not need release notes.
Change-Id: I16061c0c28b1549fbe90ea082cc717fee4a09a6e
This avoids confusion with the "render timestamp" held by the cache,
and is consistent with ::get*RevisionId() etc.
The old ::getTimestamp() and ::setTimestamp() methods have been
deprecated.
Change-Id: Idb5e687709c98086c5d3075d31885c58a0723197
Set the render ID for each parse stored into cache so that we are able
to identify a specific parse when there are dependencies (for example
in an edit based on that parse). This is recorded as a property added
to the ParserOutput, not the parent CacheTime interface. Even though
the render ID is /related/ to the CacheTime interface, CacheTime is
also used directly as a parser cache key, and the UUID should not be
part of the lookup key.
In general we are trying to move the location where these cache
properties are set as early as possible, so we check at each location
to ensure we don't overwrite a previously-set value. Eventually we
can convert most of these checks into assertions that the cache
properties have already been set (T350538). The primary location for
setting cache properties is the ContentRenderer.
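
The "set only if not already set" check can be sketched in Python as below.
The class and function names are hypothetical, and `uuid.uuid4()` merely
stands in for whatever ID the GlobalIdGenerator service would produce.

```python
import uuid
from typing import Optional


class OutputWithRenderId:
    """Hypothetical holder for a render ID that is set at most once."""

    def __init__(self) -> None:
        self.render_id: Optional[str] = None


def ensure_render_id(output: OutputWithRenderId) -> str:
    """Assign a fresh UUID only if an earlier code path has not set one."""
    if output.render_id is None:
        output.render_id = str(uuid.uuid4())
    return output.render_id
```

Calling `ensure_render_id()` from several code paths is then harmless: only
the earliest call assigns a value, and later calls return it unchanged.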
Moved setting the revision timestamp into ContentRenderer as well, as
it was set along the same code paths. An extra parameter was added to
ContentRenderer::getParserOutput() to support this.
Added merge code to ParserOutput::mergeInternalMetaDataFrom() which
should ensure that cache time, revision, timestamp, and render id are
all set properly when multiple slots are combined together in MCR.
In order to ensure the render ID is set on all codepaths we needed to
plumb the GlobalIdGenerator service into ContentRenderer, ParserCache,
ParserCacheFactory, and RevisionOutputCache. Eventually (T350538) it
should only be necessary in the ContentRenderer.
Bug: T350538
Bug: T349868
Followup-To: Ic9b7cc0fcf365e772b7d080d76a065e3fd585f80
Change-Id: I72c5e6f86b7f081ab5ce7a56f5365d2f75067a78
When we are merging a ParserOutput into a ContentMetadataCollector,
convert categories to LinkTarget, which is the preferred parameter
type of CMC::addCategory().
This also reverts the temporary fix in
I0715f4fbc870e401e5759dd7c7a3c19077c40a6a.
Note that the category names *should* be in dbkey form for proper
deduplication, but both TitleValue::tryNew() and
CategoryLinksTable::setParserOutput() will renormalize if needed
(see I2b08edd90666e0fa4eafe91444a58806909b02d6 / T328477).
Depends-On: Iea894aa2cee90f4ca5c7688493b0654e4605ce23
Change-Id: I5a903396edb4da0900ecef37cb3bf4bd03b5ba68
Due to a botched signature change on the Parsoid side, in -a8 Parsoid
only accepts `string|int` for ContentMetadataCollector::addCategory()
and in -a9 Parsoid only accepts `LinkTarget`. The ParserOutput in
core, of course, accepts both. So move the code which merges
categories into the section of ContentMetadataCollector::collectMetadata()
where we know that the CMC we're merging with is really a ParserOutput.
Change-Id: I0715f4fbc870e401e5759dd7c7a3c19077c40a6a
OutputPage::getParserOutputText/addParserOutputContent expects
ParserOutput to be mutated (e.g. by
PostCacheTransformHookRunner). Hence, cloning it before running the
pipeline is breaking DiscussionTools, probably among others.
Suppress the clone for the case where the output pipeline is invoked
from ParserOutput::getText() (which is a deprecated method anyway) and
additionally suppress the side-effects to ParserOutput::$mText on that
code path.
Bug: T353257
Co-Authored-By: C. Scott Ananian <cananian@wikimedia.org>
Co-Authored-By: Isabelle Hurbain-Palatin <ihurbainpalatin@wikimedia.org>
Change-Id: I85c690fd37b781cb27c21970467639e852113b2a
Broadened the argument type to allow passing LinkTarget to:
* ParserOutput::addCategory()
* ParserOutput::addLanguageLink()
* ParserOutput::addLink()
* ParserOutput::addImage()
* ParserOutput::addTemplate()
This allows for a tighter interface with Parsoid's
ContentMetadataCollector class and avoids errors caused by passing the
wrong form of string title ("text" with spaces versus "dbkey" with
underscores).
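
The "text" versus "dbkey" pitfall can be illustrated with a short Python
sketch. `to_dbkey` and `add_category` are hypothetical helpers; real title
normalization in MediaWiki does considerably more than swap spaces for
underscores.

```python
def to_dbkey(title: str) -> str:
    """Convert a title to 'dbkey' form by using underscores for spaces."""
    return title.strip().replace(" ", "_")


def add_category(categories: dict, title: str, sort_key: str = "") -> None:
    # Normalizing before storing keeps both spellings under one entry.
    categories[to_dbkey(title)] = sort_key


cats: dict = {}
add_category(cats, "Living people")
add_category(cats, "Living_people")
```

Without the normalization step the two calls would create two distinct
entries for the same category.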
There are a few performance problems remaining after this patch, which
only apply to use by Parsoid (not the legacy parser):
1. ::addLink() does inefficient db requests to fetch the page id for
each link if the optional $id parameter is not passed. These lookups
should be deferred and a LinkBatch used. (The legacy parser always
passes $id.)
2. ::addTemplate() similarly requires $page_id (and $rev_id) to be
passed, so is not currently usable by Parsoid.
3. ::addLanguageLink() uses Title::getFullText() which is not present
in LinkTarget and is currently implemented as a full Title lookup.
This is not an issue for the legacy parser, because it already has a
Title object so the lookup is a no-op, but could be improved for
Parsoid's use.
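
For item 1, the deferred lookup could take roughly the following shape. This
is a Python sketch, not the LinkBatch API; `DeferredLinkLookup` and its
`fetch_many` callback are hypothetical names.

```python
from typing import Callable, Dict, List


class DeferredLinkLookup:
    """Collect link targets first, then resolve their page IDs in one call."""

    def __init__(self, fetch_many: Callable[[List[str]], Dict[str, int]]) -> None:
        self._fetch_many = fetch_many  # hypothetical batched lookup callback
        self._pending: List[str] = []

    def add(self, dbkey: str) -> None:
        # Queue each distinct target once; no database access happens here.
        if dbkey not in self._pending:
            self._pending.append(dbkey)

    def resolve(self) -> Dict[str, int]:
        # One batched request instead of one request per link.
        result = self._fetch_many(self._pending)
        self._pending = []
        return result
```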
Bug: T296023
Change-Id: If21ec8563c8a619bdde7c0cb6534bb9009480a21
Pages that are fast to render can be omitted from the parser cache
to preserve disk space and cache write operations.
The threshold is configurable per namespace, so the tradeoff can
be evaluated based on different access patterns. For example, pages
that are accessed rarely, like file description pages on Commons,
may have a high threshold configured, while pages that are read
frequently, like Wikipedia articles, may be configured to be always
cached, using a 0 threshold.
Filtering is based on a time profile recorded in the ParserOutput.
A generic mechanism for capturing the timing profile is implemented
in the ContentHandler base class. Subclasses may implement a more
rigorous capture mechanism.
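
The filtering rule might be sketched as follows, in Python for illustration.
The threshold values, the `THRESHOLDS` table, and `should_cache` are
assumptions; the actual configuration setting is not named in this message.

```python
# Hypothetical per-namespace thresholds in seconds; 0 means "always cache".
THRESHOLDS = {0: 0.0, 6: 1.0}
DEFAULT_THRESHOLD = 0.0


def should_cache(namespace: int, render_seconds: float) -> bool:
    """Skip caching when the page rendered faster than its namespace threshold."""
    threshold = THRESHOLDS.get(namespace, DEFAULT_THRESHOLD)
    return render_seconds >= threshold
```

Here namespace 0 (articles) is always cached, while namespace 6 (file
description pages) is cached only when rendering took at least one second.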
Bug: T346765
Change-Id: I38a6f3ef064f98f3ad6a7c60856b0248a94fe9ac