467 commits

Author SHA1 Message Date
Umherirrender
1951aea6b8 Fix various version mentions for class_alias
The versions were changed in 8e940c4f21,
but that change made them wrong.

Follow-Up: I7f85d931d3b79da23e87b4e5692b2e14be8fcaa0
Change-Id: Iae43725b8e0fffc4d44bf57f6227334b41290bd9
2024-07-05 18:39:49 +02:00
Bartosz Dziewoński
c7f52f0ddb Make MessageValue implement JsonDeserializable
MessageValue and friends are pure value objects and newable, so
it makes sense for them to be (de)serializable too. There are some
places where we want to serialize messages, such as in ParserOutput.

The structure of the resulting JSON is inspired by the way we
represent Message objects as plain values elsewhere in MediaWiki,
e.g. StatusValue::getStatusArray().

Co-Authored-By: C. Scott Ananian <cscott@cscott.net>
Depends-On: Ia32f95a6bdf342262b4ef044140527f0676402b9
Depends-On: I7bafe80cd36c2558517f474871148286350a4e76
Change-Id: Id47d58b5e26707fa0e0dbdd37418c0d54c8dd503
2024-06-12 15:47:37 -04:00
James D. Forrester
19f4e6945a Rename JsonUnserial… to JsonDeserial…
This is to make it clearer that they're related to converting serialized
content back into JSON, rather than stating that things are not
representable in JSON.

Change-Id: Ic440ac2d05b5ac238a1c0e4821d3f2d858bc3d76
2024-06-12 14:50:58 -04:00
C. Scott Ananian
47bdd8b1c8 [ParserOutput] Remove unused TOCHTML from ParserCache serialization
This reverts commit b4cf4aa6bd,
which is no longer needed for ParserCache compatibility across
trains.  REL1_42 contains b4cf4aa6bd,
so MW 1.43 will not need this.

This also adds new serialization test cases for 1.43 with this field
removed; see
https://www.mediawiki.org/wiki/Manual:Parser_cache/Serialization_compatibility

Change-Id: I716e2efe7a491002e6e6b2300016165fffe3c0d6
2024-05-17 21:46:00 +00:00
C. Scott Ananian
19ee8c4f91 Serialization test cases: fix filename after ParserOutput namespacing
The serialization test cases look for files based on the name of the
class they are testing.  After the namespacing of ParserOutput, they
were looking for files named like:
  1.42-MediaWiki\Parser\ParserOutput-binaryPageProperties.json

The embedded backslashes in these filenames would raise havoc on Windows
machines.  What's more, none of the existing ParserOutput tests will
actually be checked anymore because the filenames don't match up
with what is expected after namespacing.

Fix this by stripping the namespace from the classname when forming
the test file names.
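
A minimal sketch of the namespace stripping described above. The helper name and signature are hypothetical and are not taken from the actual test code; only the resulting file name pattern comes from this message.

```php
<?php
// Hypothetical helper: builds a serialization test file name from a
// (possibly namespaced) class name by keeping only the short class name.
function serializationTestFileName(
	string $version,
	string $class,
	string $testCase,
	string $ext = 'json'
): string {
	// Keep only the part after the last namespace separator, if any.
	$pos = strrpos( $class, '\\' );
	$shortName = $pos === false ? $class : substr( $class, $pos + 1 );
	return "$version-$shortName-$testCase.$ext";
}

echo serializationTestFileName(
	'1.42',
	'MediaWiki\\Parser\\ParserOutput',
	'binaryPageProperties'
), "\n";
// prints: 1.42-ParserOutput-binaryPageProperties.json
```

Because the result contains no backslashes, the same file name is valid on Windows and matches the names used before namespacing.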

When this is done, the test cases for GhostFieldAccess begin running
again, revealing that they were broken when GhostFieldTestClass was
re-namespaced.  Add a class alias for the GhostFieldTestClass to fix
this.

Finally, PHP <= 8.1 does not deserialize private properties correctly
after a class is renamed and aliased, because the internal name of the
private property contains the "old" class name in the serialization.
Add a new ::restoreAliasedGhostField() method to the
GhostFieldAccessTrait to work around this issue and restore proper
deserialization of ParserOutput.
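
The following sketch only illustrates why the class name matters here: PHP's native serialization stores a private property under a key that embeds the declaring class name. The class `OldName` and its property are made up for illustration; this is not the GhostFieldAccessTrait code.

```php
<?php
// Hypothetical class standing in for a class that is later renamed.
class OldName {
	private $ghost = 'value';
}

$serialized = serialize( new OldName() );

// Private properties are stored as "\0ClassName\0propertyName", so the
// serialized data keeps referring to "OldName" even after a rename.
var_dump( strpos( $serialized, "\0OldName\0ghost" ) !== false ); // bool(true)
```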

Bug: T365060
Followup-To: I9c64a631b0b4e8e4fef8a72ee0f749d35f918052
Followup-To: I4c2cbb0a808b3881a4d6ca489eee5d8c8ebf26cf
Change-Id: I7bafe80cd36c2558517f474871148286350a4e76
2024-05-17 17:07:47 -04:00
Bartosz Dziewoński
73de566949 Use 'scalar' type alias to shorten PHPDoc annotations
'string|int|float|bool' (in any order) can be replaced by 'scalar'.
'string|int|float|bool|null' (likewise) can be replaced by '?scalar'.
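
As an illustration only, a doc comment using the shortened alias might look like the sketch below. The function `encodeScalar()` is hypothetical; note that `scalar` is a PHPDoc type here, not a native PHP type declaration, so the parameter itself stays untyped.

```php
<?php
/**
 * Hypothetical helper, for illustration only.
 *
 * @param ?scalar $value Previously annotated as string|int|float|bool|null
 * @return string JSON representation of the value
 */
function encodeScalar( $value ): string {
	// The PHPDoc alias is not enforced at runtime, so check explicitly.
	if ( $value !== null && !is_scalar( $value ) ) {
		throw new InvalidArgumentException( 'Expected a scalar or null' );
	}
	return json_encode( $value, JSON_THROW_ON_ERROR );
}

echo encodeScalar( 42 ), "\n";   // prints: 42
echo encodeScalar( null ), "\n"; // prints: null
```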

This is convenient for functions that can accept any primitive value,
which comes up sometimes when serializing things as SQL, JSON etc.

Change-Id: I4a711ee59611d76d6745f3640e4aa6bebec02918
2024-05-11 23:21:22 +00:00
jenkins-bot
b671e574eb Merge "Add ParserOptions::setCollapsibleSections()" 2024-04-29 21:17:15 +00:00
C. Scott Ananian
8d031bcf87 Add ParserOptions::setCollapsibleSections()
This is a non-default option that will add a <div> wrapper around
section contents to allow client-side collapsing.  This is intended
for use by MobileFrontEnd, but could eventually be enabled for
desktop read views as well.

Since this parser option is in the "cache-varying options" set, any
caller who sets this option will fork the cache for that page, which
is reasonable as the parser option sets a ParserOutput property.
In the future our caching strategy will get smarter and we'll add
code which avoids the cache split and just transfers the appropriate
values from ParserOptions to ParserOutput flags after the cached
output is retrieved.
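
To illustrate what "forking the cache" means, here is a simplified sketch in which the cache key depends on the cache-varying options. The key format, the function `renderCacheKey()` and the option key `collapsibleSections` are assumptions made for this example; the real ParserCache keys are built differently.

```php
<?php
// Hypothetical cache-key builder: any option in the varying set becomes
// part of the key, so different option values yield different entries.
function renderCacheKey( string $pageKey, array $varyingOptions ): string {
	// Sort so that option order does not affect the key.
	ksort( $varyingOptions );
	return $pageKey . '!' . json_encode( $varyingOptions );
}

$desktop = renderCacheKey( 'page:42', [ 'collapsibleSections' => false ] );
$mobile = renderCacheKey( 'page:42', [ 'collapsibleSections' => true ] );

var_dump( $desktop === $mobile ); // bool(false): two separate cache entries
```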

Bug: T359001
Change-Id: Ie93959a056ed15a728404eb293e4bb6eeaeb15c0
2024-04-29 12:11:09 -04:00
C. Scott Ananian
b4cf4aa6bd ParserOutput: Temporarily write (unused) TOCHTML to ParserCache
Even though this JSON property is unused on master, the previous
train release read it from the JSON (and threw the value away).
In order to provide error-free roll-forward and roll-back of the
train, temporarily write an empty string as the value of TOCHTML
so that the read from `$jsonData['TOCHTML']` won't cause a PHP
notice in the logs if we roll back.
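
A rough sketch of the compatibility concern, with hypothetical function names standing in for the writer on master and the reader in the previous train release:

```php
<?php
// Hypothetical writer on master: TOCHTML is unused, but is still written
// as an empty string so that older readers find the key.
function writeCacheEntry( string $text ): array {
	return [
		'Text' => $text,
		'TOCHTML' => '',
	];
}

// Hypothetical reader from the previous release: it reads the key
// unconditionally, which would log a notice if the key were missing.
function readWithOldRelease( array $jsonData ): string {
	return $jsonData['TOCHTML'];
}

var_dump( readWithOldRelease( writeCacheEntry( '<p>Hello</p>' ) ) ); // string(0) ""
```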

This patch is only needed for one train release, and can then
be removed.

Bug: T363107
Change-Id: I77add3bd7f00941cb81481f738bc59d6008c2406
2024-04-22 11:26:10 -04:00
Umherirrender
8d97313f81 Fix some line indent
Change-Id: I8f82724197d20f9289d80e138d80310f1eab29f2
2024-04-20 00:25:15 +02:00
C. Scott Ananian
195ac55bfe [ParserOutput] Remove deprecated ::getTOCHTML() and ::setTOCHTML() methods
These were deprecated with warnings in 1.40.

Change-Id: I8027bc26c71ae94d3d5c7e5112545cd1b35749aa
2024-04-16 13:00:58 -04:00
C. Scott Ananian
db2f1ad606 [ParserOutput] Remove deprecated ::getCategories() method
This was deprecated with warnings in 1.40.

Change-Id: I7b8a86f6efbdd86c1f493db6741c37bfb325e9bb
2024-04-16 12:57:17 -04:00
jenkins-bot
1caf41bb73 Merge "ParserOutput: Rename ::setIndexedPageProperty() to ::setNumericPageProperty()" 2024-04-16 10:57:58 +00:00
C. Scott Ananian
2429785470 ParserOutput: Rename ::setIndexedPageProperty() to ::setNumericPageProperty()
Before this method name gets baked forever into the 1.42 release, rename
the ParserOutput::setIndexedPageProperty() and ::setUnindexedPageProperty()
methods to ::setNumericPageProperty() and ::setUnsortedPageProperty() to
try to address some confusion about whether the *presence* of the page
property is still indexed (it is!), in contrast to whether there's an
additional "sort key" associated with the *value* assigned to the page
property.

This naming is compatible with the feature request in T357783 to have
the sort key and property value specified independently.  The new
method signature in that case would be:

  ...setSortedPageProperty( string $name, string $value, int|float $sortKey )

Although PHP 8.0 will throw a TypeError if a non-numeric type is coerced
to numeric using `0 + ...`, use an explicit is_numeric check to obtain
the same behavior in PHP 7.x.
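
A minimal sketch of such an explicit check. The function name and the choice of InvalidArgumentException are assumptions for illustration; the exception actually thrown by the patch is not stated in this message.

```php
<?php
/**
 * Hypothetical stand-in for a setter that accepts only numeric values.
 *
 * @param int|float|string $value
 * @return int|float
 */
function toNumericValue( $value ) {
	// Reject non-numeric input up front, so that PHP 7.x does not
	// silently coerce it in the addition below.
	if ( !is_numeric( $value ) ) {
		throw new InvalidArgumentException( 'Expected a numeric value' );
	}
	return 0 + $value;
}

var_dump( toNumericValue( '42' ) );  // int(42)
var_dump( toNumericValue( '1.5' ) ); // float(1.5)
```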

Change-Id: Ia94c192c429d0482c58467bed787fd2e0aca052f
2024-04-15 15:13:56 -04:00
C. Scott Ananian
f1a45cf2b9 Expand documentation of ParserOutput class
Clarify that not *all* ParserOutputs represent parsed articles, and describe
the merging operations on ParserOutputs in more depth.  The interaction
with Content and ContentHandlers is also described (thanks, Daniel!).

Followup-To: Id2e3124652315a74869f504056fa8a99ad794350
Change-Id: I5c1016532eba1b71dc4d3d5d5d0c46775713efb5
2024-04-12 12:53:23 -04:00
Lucas Werkmeister
1adefb10e3 ParserOutput: clarify that “indexed” refers to value
Bug: T305158
Change-Id: Ic6ea22b5188e575b288d57c8f692f492cb69452d
2024-04-12 12:09:02 +02:00
C. Scott Ananian
b4721e24aa ParserOutput::setUnindexedPageProperty(): use empty string as default value
If a placeholder value is needed, it is recommended to use the empty string
to avoid wasting database space unnecessarily.  Operationalize this
recommendation by providing a default value for the method argument.

Bug: T305158
Bug: T350224
Change-Id: I9ea8d93298d771c2d38fdfb451a2817220ca679a
2024-04-11 11:58:13 -04:00
jenkins-bot
2472cd9247 Merge "Substitute category default sort key when filling links table, not at parse time" 2024-04-11 14:59:33 +00:00
jenkins-bot
e4981c9702 Merge "Add ParserOutput::setIndexedPageProperty(); deprecate numeric properties" 2024-04-10 17:44:00 +00:00
C. Scott Ananian
de57c4e7c2 Add ParserOutput::setIndexedPageProperty(); deprecate numeric properties
Deprecate passing non-string values to ::setPageProperty(), which introduces
easy traps for programmers to fall into.  If page properties are intended
to be indexed, use the new ::setIndexedPageProperty() instead.  Also add
::setUnindexedPageProperty() for symmetry, with a tighter string type on
the value.

Bug: T305158
Bug: T350224
Change-Id: I8a39a7c90341dfee932aa819c9a0a637a8782f69
2024-04-05 19:12:29 -04:00
C. Scott Ananian
01590b89bf ParserOutput: Emit deprecation warning if interwiki passed to addTemplate
Bug: T361330
Depends-On: Ia8fd49a6f9af18e32d47d1dcd052c5f33123f44b
Change-Id: Id4104dff4acaa60d94155d7915b9c1f2af4baaf0
2024-04-04 10:38:45 -04:00
C. Scott Ananian
c2df535b9c Substitute category default sort key when filling links table, not at parse time
This ensures uniform treatment of all places that call `addCategory`
without duplicating the `defaultsort` code; it also ensures that the
effect of the {{DEFAULTSORT}} parser function is independent of page
position.

Bug: T40435
Bug: T353530
Change-Id: I4480a6d59e766fa4eddc9ec9117c58b66771bb47
2024-03-29 18:30:02 -04:00
James D. Forrester
8e940c4f21 Standardise all our class alias deprecation comments for ease of grepping
Change-Id: I7f85d931d3b79da23e87b4e5692b2e14be8fcaa0
2024-03-19 20:11:29 +00:00
jenkins-bot
9232985bd8 Merge "ParserOutput::setPageProperty(): Emit deprecation warning for non-scalar values" 2024-03-11 17:08:20 +00:00
Umherirrender
f3524224f0 build: Fix line indents
Fixed SkinModuleTest::provideGetFeatureFilePathsOrder, as the nesting of
arrays for parameters was wrong

Change-Id: I9875008adf62d284c48662ebfbd245d72e5be064
2024-03-11 00:14:16 +01:00
jenkins-bot
a62f5c7911 Merge "[ParserOutput] Rename $mText to $mRawText and ::setText() to ::setRawText()" 2024-02-21 17:11:00 +00:00
C. Scott Ananian
72c4945a72 [ParserOutput] Rename $mText to $mRawText and ::setText() to ::setRawText()
ParserOutput::getText() is not a simple getter, but does
transformations on the "text" of the ParserOutput; the simple getter
is named ::getRawText().

To maintain consistency, rename ParserOutput::setText() to
::setRawText() and the property name ParserOutput::$mText to
::$mRawText so future readers are not confused.

The JSON property name as it appears in the serialized ParserCache
is left as 'Text' so that we don't have any forward- or backward-
rollback issues.
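
The idea of renaming the property while keeping the serialized key stable can be sketched as follows. The class `RenderedText` and its JSON helper methods are hypothetical and much simpler than ParserOutput's real serialization code.

```php
<?php
// Hypothetical value class: the internal property and accessors use the
// new "raw text" naming, while the JSON key stays 'Text'.
final class RenderedText {
	/** @var string */
	private $rawText;

	public function __construct( string $rawText = '' ) {
		$this->rawText = $rawText;
	}

	public function getRawText(): string {
		return $this->rawText;
	}

	public function setRawText( string $text ): void {
		$this->rawText = $text;
	}

	public function toJsonArray(): array {
		// Unchanged key, so entries written before and after the rename
		// can be read by either version of the code.
		return [ 'Text' => $this->rawText ];
	}

	public static function newFromJsonArray( array $json ): self {
		return new self( $json['Text'] ?? '' );
	}
}

$copy = RenderedText::newFromJsonArray( ( new RenderedText( '<p>Hi</p>' ) )->toJsonArray() );
echo $copy->getRawText(), "\n"; // prints: <p>Hi</p>
```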

Change-Id: I3ef34814ab9473cc70d0a6806e8c5a4a02b73491
2024-02-20 17:13:28 +00:00
C. Scott Ananian
6846f8aa10 ParserOutput::setPageProperty(): Emit deprecation warning for non-scalar values
Non-scalar values passed to ParserOutput::setPageProperty() have never
"worked"; they've been stringified (and null has been stored as an empty
string).  Emit a warning so we can fail harder in future releases.

Bug: T305158
Depends-On: Ib36787d04c0ca713587dc8b814ca1c5a827f6f72
Change-Id: I38234084fdc7427ca577bb33a7fce1541581188d
2024-02-20 11:29:49 -05:00
C. Scott Ananian
b5d44bf339 ParserOutput::setPageProperty(): Update documentation
String and non-string values behave very differently when passed to
::setPageProperty(), resulting in some unexpected gotchas for the
unaware caller.

Bug: T350224
Bug: T305158
Change-Id: I23b35b250f27a117d1353ea8a26d2b3f77c568e7
2024-02-20 11:26:38 -05:00
Subramanya Sastry
e55cc517da Move Parser to MediaWiki\Parser namespace
Bug: T166010
Co-Authored-By: Daimona Eaytoy <daimona.wiki@gmail.com>
Co-Authored-By: James Forrester <jforrester@wikimedia.org>
Co-Authored-By: Subramanya Sastry <ssastry@wikimedia.org>
Change-Id: I79b4e732c45095eedbaa80afa5eb7479b387ed8a
2024-02-16 09:18:38 -05:00
jenkins-bot
2ca5bb9a96 Merge "ParserOutput: update task id in documentation" 2024-02-15 23:36:35 +00:00
C. Scott Ananian
13873a35b9 ParserOutput: update task id in documentation
We closed T296023 and opened a new task for the work remaining, so
update the comments in the code to match.

The task relating to `addLanguageLink` is actually T296019.

Change-Id: I28b942a57ed41751d44d8565a290d925f6d7f180
2024-02-15 15:23:57 -05:00
C. Scott Ananian
28a3371382 [OutputTransform] Remove broken and unused 'bodyContentOnly' option
This was formerly used by the REST API, but that code now just
uses ParserOutput::getRawText() when it needs the full HTML document.
This option has been broken, with various passes like RenderDebugInfo
and AddWrapperDiv adding content in inappropriate places if
bodyContentOnly was false.

Change-Id: Ib45f95ded59c81c16d61803f977d1edbfe82b262
2024-02-15 13:05:53 -05:00
C. Scott Ananian
770d2bf040 [ParserOutput] Make 'enableSectionEditLinks' a ParserOption
This will allow the Translate extension to set this parser option
in the ArticleParserOptions hook, instead of mutating $options passed
to ParserOutput::getText() in the ParserOutputPostCacheTransform hook.

It ought to also help to handle the many places which call:

   ... = $parserOutput->getText( [
       'enableSectionEditLinks' => false,
   ] );

by allowing them to set the appropriate ParserOption instead
of passing arguments to ::getText().

Bug: T350626
Change-Id: I719c115194059060f7f888608417a194ac80cc92
2024-02-09 23:42:03 +00:00
C. Scott Ananian
242c6d2cf9 Introduce ParserOutput::setFromParserOptions() and use for preview flag
Bug: T341010
Co-Authored-by: cananian <cananian@wikimedia.org>
Co-Authored-by: ihurbain <ihurbainpalatin@wikimedia.org>
Change-Id: I03125fdaa7dd71ba57d593e85ecb98be6806f3f6
2024-02-07 21:22:06 -05:00
C. Scott Ananian
52320c0902 Move ParsoidRenderID to MediaWiki\Edit
This class belongs with the rest of the Parsoid output stash code.

This class has been marked @unstable since 1.39 and thus the move
does not need release notes.

Change-Id: I16061c0c28b1549fbe90ea082cc717fee4a09a6e
2024-02-07 21:22:06 -05:00
C. Scott Ananian
1858e1cdd7 Rename ParserOutput::{get,set}Timestamp() to ::{get,set}RevisionTimestamp()
This avoids confusion with the "render timestamp" held by the cache,
and is consistent with ::get*RevisionId() etc.

The old ::getTimestamp() and ::setTimestamp() methods have been
deprecated.

Change-Id: Idb5e687709c98086c5d3075d31885c58a0723197
2024-02-07 21:22:06 -05:00
C. Scott Ananian
0de13d7662 Add ParserOutput::{get,set}RenderId() and set render id in ContentRenderer
Set the render ID for each parse stored into cache so that we are able
to identify a specific parse when there are dependencies (for example
in an edit based on that parse).  This is recorded as a property added
to the ParserOutput, not the parent CacheTime interface.  Even though
the render ID is /related/ to the CacheTime interface, CacheTime is
also used directly as a parser cache key, and the UUID should not be
part of the lookup key.

In general we are trying to move the location where these cache
properties are set as early as possible, so we check at each location
to ensure we don't overwrite a previously-set value.  Eventually we
can convert most of these checks into assertions that the cache
properties have already been set (T350538).  The primary location for
setting cache properties is the ContentRenderer.
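
The "set only if not already set" check mentioned above can be sketched like this. `RenderMetadata`, `ensureRenderId()` and the random-hex ID generator are hypothetical stand-ins, not the ParserOutput or GlobalIdGenerator APIs.

```php
<?php
// Hypothetical holder for a render ID.
final class RenderMetadata {
	/** @var string|null */
	private $renderId = null;

	public function getRenderId(): ?string {
		return $this->renderId;
	}

	public function setRenderId( string $id ): void {
		$this->renderId = $id;
	}
}

// Assign an ID only when none was set earlier in the code path, so a
// value set as early as possible is never overwritten later.
function ensureRenderId( RenderMetadata $meta, callable $generateId ): void {
	if ( $meta->getRenderId() === null ) {
		$meta->setRenderId( $generateId() );
	}
}

$newId = static function (): string {
	return bin2hex( random_bytes( 16 ) );
};

$meta = new RenderMetadata();
ensureRenderId( $meta, $newId );
$first = $meta->getRenderId();
ensureRenderId( $meta, $newId );
var_dump( $meta->getRenderId() === $first ); // bool(true)
```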

Moved setting the revision timestamp into ContentRenderer as well, as
it was set along the same code paths.  An extra parameter was added to
ContentRenderer::getParserOutput() to support this.

Added merge code to ParserOutput::mergeInternalMetaDataFrom() which
should ensure that cache time, revision, timestamp, and render id are
all set properly when multiple slots are combined together in MCR.

In order to ensure the render ID is set on all codepaths we needed to
plumb the GlobalIdGenerator service into ContentRenderer, ParserCache,
ParserCacheFactory, and RevisionOutputCache.  Eventually (T350538) it
should only be necessary in the ContentRenderer.

Bug: T350538
Bug: T349868
Followup-To: Ic9b7cc0fcf365e772b7d080d76a065e3fd585f80
Change-Id: I72c5e6f86b7f081ab5ce7a56f5365d2f75067a78
2024-02-07 21:22:06 -05:00
Daimona Eaytoy
1d6776fdbc Replace deprecated MWException
Also remove some unchecked exceptions from doc comments.

Bug: T328220
Bug: T240672
Change-Id: I88b1e948ce5da77d9c4862a2b98793d6ba00cf8b
2024-01-19 21:58:42 +00:00
Brian Wolff
f1af33be38 Add taint annotations for ParserOutput
Change-Id: Id73b8f22f8877442f114bf7b41d0f9ea47fb4283
2024-01-12 14:17:21 +00:00
C. Scott Ananian
f2d910844f ParserOutput: Convert category name back to a LinkTarget when merging CMC
When we are merging a ParserOutput into a ContentMetadataCollector,
convert categories to LinkTarget, which is the preferred parameter
type of CMC::addCategory().

This also reverts the temporary fix in
I0715f4fbc870e401e5759dd7c7a3c19077c40a6a.

Note that the category names *should* be in dbkey form for proper
deduplication, but both TitleValue::tryNew() and
CategoryLinksTable::setParserOutput() will renormalize if needed
(see I2b08edd90666e0fa4eafe91444a58806909b02d6 / T328477).

Depends-On: Iea894aa2cee90f4ca5c7688493b0654e4605ce23
Change-Id: I5a903396edb4da0900ecef37cb3bf4bd03b5ba68
2023-12-18 21:01:51 +00:00
C. Scott Ananian
df1f18cc9d ParserOutput: Temporarily move "merge categories" in ::collectMetadata
Due to a botched signature change on the Parsoid side, in -a8 Parsoid
only accepts `string|int` for ContentMetadataCollector::addCategory()
and in -a9 Parsoid only accepts `LinkTarget`.  The ParserOutput in
core, of course, accepts both.  So move the code which merges
categories into the section of ContentMetadataCollector::collectMetadata()
where we know that the CMC we're merging with is really a ParserOutput.

Change-Id: I0715f4fbc870e401e5759dd7c7a3c19077c40a6a
2023-12-18 14:31:19 -05:00
jenkins-bot
0d45f127f5 Merge "ParserOutput: keep modules and module styles unique" 2023-12-16 04:25:53 +00:00
James D. Forrester
9bfb75ff90 Namespace ParserOutput
Most used non-namespaced class!

Bug: T353458
Change-Id: I4c2cbb0a808b3881a4d6ca489eee5d8c8ebf26cf
2023-12-14 14:57:34 -05:00
Isabelle Hurbain-Palatin
3935cd1b05 ParserOutput::getText(): do not clone ParserOutput when invoking pipeline
OutputPage::getParserOutputText/addParserOutputContent expects
ParserOutput to be mutated (e.g. by
PostCacheTransformHookRunner). Hence, cloning it before running the
pipeline is breaking DiscussionTools, probably among others.

Suppress the clone for the case where the output pipeline is invoked
from ParserOutput::getText() (which is a deprecated method anyway) and
additionally suppress the side-effects to ParserOutput::$mText on that
code path.

Bug: T353257
Co-Authored-By: C. Scott Ananian <cananian@wikimedia.org>
Co-Authored-By: Isabelle Hurbain-Palatin <ihurbainpalatin@wikimedia.org>
Change-Id: I85c690fd37b781cb27c21970467639e852113b2a
2023-12-12 11:39:32 -05:00
jenkins-bot
c57120300a Merge "ParserOutput: Allow passing LinkTarget to title-related methods" 2023-12-11 18:02:25 +00:00
Isabelle Hurbain-Palatin
a3f51c732d Refactor DefaultOutputTransform into a pipeline of transforms
Bug: T348253
Change-Id: I53551ec6d6471569709c71c1155729e550f64de8
2023-12-08 18:06:19 -05:00
C. Scott Ananian
4b83285954 ParserOutput: Allow passing LinkTarget to title-related methods
Broadened the argument type to allow passing LinkTarget to:
* ParserOutput::addCategory()
* ParserOutput::addLanguageLink()
* ParserOutput::addLink()
* ParserOutput::addImage()
* ParserOutput::addTemplate()

This allows for a tighter interface with Parsoid's
ContentMetadataCollector class and avoids errors caused by passing the
wrong form of string title ("text" with spaces versus "dbkey" with
underscores).
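
To make the "text" versus "dbkey" distinction concrete, here is a deliberately simplified sketch. `textToDbKey()` is a hypothetical helper that only replaces spaces; real title normalization in MediaWiki does considerably more.

```php
<?php
// Hypothetical, simplified conversion from "text" form to "dbkey" form.
function textToDbKey( string $title ): string {
	return str_replace( ' ', '_', $title );
}

// Without normalization these two strings would be tracked as two
// different categories; after conversion they collapse into one key.
$categories = [];
foreach ( [ 'Living people', 'Living_people' ] as $name ) {
	$categories[ textToDbKey( $name ) ] = true;
}

var_dump( array_keys( $categories ) ); // a single entry: "Living_people"
```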

There are a few performance problems remaining after this patch, which
only apply to use by Parsoid (not the legacy parser):

1. ::addLink() does inefficient db requests to fetch the page id for
each link if the optional $id parameter is not passed.  These lookups
should be deferred and a LinkBatch used.  (The legacy parser always
passes $id.)

2. ::addTemplate() similarly requires $page_id (and $rev_id) to be
passed, so is not currently usable by Parsoid.

3. ::addLanguageLink() uses Title::getFullText() which is not present
in LinkTarget and is currently implemented as a full Title lookup.
This is not an issue for the legacy parser, because it already has a
Title object so the lookup is a no-op, but could be improved for
Parsoid's use.

Bug: T296023
Change-Id: If21ec8563c8a619bdde7c0cb6534bb9009480a21
2023-12-08 17:50:29 -05:00
jenkins-bot
b7fc1b2f43 Merge "Only cache expensive renderings" 2023-11-30 21:24:34 +00:00
daniel
e3fb964439 Only cache expensive renderings
Pages that are fast to render can be omitted from the parser cache
to preserve disk space and cache write operations.

The threshold is configurable per namespace, so the tradeoff can
be evaluated based on different access patterns. For example, pages
that are accessed rarely, like file description pages on commons,
may have a high threshold configured, while pages that are read
frequently, like wikipedia articles, may be configured to be always
cached, using a 0 threshold.

Filtering is based on a time profile recorded in the ParserOutput.
A generic mechanism for capturing the timing profile is implemented
in the ContentHandler base class. Subclasses may implement a more
rigorous capture mechanism.
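
A minimal sketch of a per-namespace threshold check, assuming the recorded render time is available in seconds. The function name, parameter names and the example threshold values are hypothetical; the actual configuration setting is not named in this message.

```php
<?php
/**
 * Hypothetical filter: cache a rendering only if it took at least as long
 * as the threshold configured for its namespace.
 *
 * @param float $renderSeconds Recorded time spent rendering the page
 * @param int $namespace Namespace ID of the page
 * @param array<int,float> $thresholds Per-namespace thresholds in seconds
 * @param float $default Threshold for namespaces without an entry
 */
function shouldCacheRendering(
	float $renderSeconds,
	int $namespace,
	array $thresholds,
	float $default = 0.0
): bool {
	$threshold = $thresholds[$namespace] ?? $default;
	// A threshold of 0 means "always cache".
	return $renderSeconds >= $threshold;
}

// Example values only: always cache namespace 0 (main), but skip file
// description pages (namespace 6) that render in under one second.
$thresholds = [ 0 => 0.0, 6 => 1.0 ];

var_dump( shouldCacheRendering( 0.01, 0, $thresholds ) ); // bool(true)
var_dump( shouldCacheRendering( 0.2, 6, $thresholds ) );  // bool(false)
var_dump( shouldCacheRendering( 2.5, 6, $thresholds ) );  // bool(true)
```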

Bug: T346765
Change-Id: I38a6f3ef064f98f3ad6a7c60856b0248a94fe9ac
2023-11-30 20:56:12 +00:00