Commit graph

479 commits

Author SHA1 Message Date
jenkins-bot
20514decd8 Merge "ParserOutput::collectMetadata(): fix handling of links" 2024-09-16 21:45:58 +00:00
C. Scott Ananian
9b60718f92 ParserOutput::collectMetadata(): fix handling of links
For language links, when there are conflicts between namespaces and
interwiki prefixes, it is important to use TitleValue for language links
rather than to try to reparse the Title.  Language links also preserve
fragments, unlike other link types in ParserOutput; added tests to
document this.

Added handling for interwiki links and template links.

Bug: T363538
Change-Id: I6e8ff8ed7f8819000cc3f80e49c0739b568217a4
2024-09-13 14:42:27 -04:00
Umherirrender
8e8a56c880 parser: Add missing documentation to class properties
Add doc-typehints to class properties found by the PropertyDocumentation
sniff to improve the documentation.

Once the sniff is enabled, it prevents new code from being added without
type declarations. This is focused on documentation and does not change code.

Change-Id: I3afaba387663320187c49ff1cdb2ff3ae01681ad
2024-09-07 22:46:08 +02:00
jenkins-bot
15bfd2e9f8 Merge "Remove ParserOutput::getText() calls from core (runOutputPipeline)" 2024-09-06 19:39:11 +00:00
jenkins-bot
e9409ac31d Merge "Introduce runOutputPipeline and clone by default" 2024-09-06 19:37:11 +00:00
Isabelle Hurbain-Palatin
ce2bccc0a8 Remove ParserOutput::getText() calls from core (runOutputPipeline)
This is the fourth patch of a series of patches to remove
ParserOutput::getText() calls from core. This series of patches should
be functionally equivalent to I2b4bcddb234f10fd8592570cb0496adf3271328e.

Here we replace calls to getText where a ContentRenderer is available
nearby with a temporary ParserOutput::runOutputPipeline call that will
eventually be replaced by a call to (probably) ContentRenderer
(T371004). Doing this work in stages allows us to separate the work of
"bringing ParserOptions to the call site" from the work of "bringing
ContentRenderer(ish) to the call site", since both need to be done
to make ParserOutput a value object (T293512).

Change-Id: Ib4f9357293dc230df6e0ca2379a1e2a4cc1b91b7
Bug: T293512
2024-09-06 19:07:49 +00:00
Isabelle Hurbain-Palatin
cd3240f044 Introduce runOutputPipeline and clone by default
This is the third patch of a series of patches to remove
ParserOutput::getText() calls from core. This series of patches should
be functionally equivalent to I2b4bcddb234f10fd8592570cb0496adf3271328e.

Here we temporarily introduce runOutputPipeline in ParserOutput. It
creates and runs the pipeline with default options, and is called by
getText. (This is not entirely truthful because we go through a
runPipelineInternal transient method for null-argument-passing reasons,
but let's not over-complicate this commit message.)

getText is responsible for maintaining the current behaviour,
that is, "disallow the cloning of the ParserOutput and put the text back
as it was", to mitigate T353257. As we get rid of getText, this
behaviour should be moved, if necessary, to the call site.

The new method is currently added to ParserOutput so that further
refactorings are, for the moment, simpler. It will eventually be moved
to another place within the Content framework.

We also rename 'suppressClone' to 'allowClone' (which is actually its
negation) to avoid multiple levels of negations that make the code
confusing. Note that the default value of 'allowClone' is true, and is
currently overridden in two places: getText and
OutputPage::getParserOutputText (which calls the pipeline directly and
not through ParserOutput).

Bug: T293512
Bug: T371022
Change-Id: Ibf04af1079aaa1934dc78685b00e636ff4d38a9a
2024-09-06 19:06:38 +00:00
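
The clone-by-default behaviour described in commit cd3240f044 above can be illustrated with a small sketch. This is not MediaWiki's PHP code: `Output`, `run_pipeline`, and `get_text` are hypothetical stand-ins, and the upper-casing step merely represents "some transformation of the text". It only shows why a positively named `allow_clone` flag (default true) is easier to read than a negated one, and how a legacy getter can opt out of cloning and then restore the text.

```python
import copy


class Output:
    """Hypothetical stand-in for a parser output holding raw text."""

    def __init__(self, text: str) -> None:
        self.text = text


def run_pipeline(output: Output, allow_clone: bool = True) -> Output:
    """Transform the text; work on a copy unless cloning is disallowed."""
    target = copy.copy(output) if allow_clone else output
    target.text = target.text.upper()  # placeholder transformation
    return target


def get_text(output: Output) -> str:
    """Legacy-style getter: no clone, then put the text back as it was."""
    original = output.text
    result = run_pipeline(output, allow_clone=False).text
    output.text = original
    return result


page = Output("hello")
transformed = run_pipeline(page)   # default: the input is left untouched
legacy = get_text(page)            # in-place run, followed by a restore
```

In both calls `page.text` is still `"hello"` afterwards; the difference is whether that is guaranteed by copying or by an explicit restore.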
C. Scott Ananian
13ca36821a ParserOutput::collectMetadata: Properly handle non-scalar page properties
ParserOutput::setPageProperty() was deprecated for use with non-string
values in 1.42 but there are still callers out there; handle these cases
without the implicit cast to string which ::setUnsortedPageProperty()
would do via its argument type hint.

Bug: T374046
Bug: T373920
Followup-To: I68c28b0d5d23decc058a46c55e767a83c80452f8
Followup-To: I9a235ae828c2cadc9d2c619760f759e51ba73874
Change-Id: I52ecce78fcee8b18cf9d7ea848946f29e2d8b51b
2024-09-04 19:19:57 +00:00
Subramanya Sastry
44b6ba80d6 ParserOutput: Turn off noisy log - we have the info we need for now
Bug: T374046
Change-Id: I9a235ae828c2cadc9d2c619760f759e51ba73874
2024-09-04 13:42:26 -05:00
James D. Forrester
482222931f ParserOutput::collectMetadata: Log if given value is non-numeric and also non-string, for easier debugging, and don't fatal
Bug: T373920
Change-Id: I68c28b0d5d23decc058a46c55e767a83c80452f8
2024-09-03 15:21:13 -04:00
C. Scott Ananian
550f916fa9 Add missing cases to ParserOutput::collectMetadata()
The ParserOutput::collectMetadata() method is used to transfer parsing
metadata from the legacy parser (ParserOutput) to Parsoid
(ContentMetadataCollector).  Several new methods were added to Parsoid's
ContentMetadataCollector class but weren't being transferred from
the ParserOutput.

Change-Id: If2b933005c1ebd0f8b33884242a1c97b94f97a2b
2024-08-29 15:51:54 -04:00
C. Scott Ananian
8212f8c67d Hard-deprecate ParserOutput::addJsConfigVars(), deprecated in 1.38
It is difficult to distinguish this method from OutputPage::addJsConfigVars()
in code search:

  https://codesearch.wmcloud.org/search/?q=%5BOo%5Dut%28put%29%3F%28%5C%28%5C%29%29%3F-%3EaddJsConfigVars%5C%28

We generally try to replace $output with $parserOutput or $pOutput
as we touch code, to improve the ability of codesearch to dig up
deprecated ParserOutput methods.

A future project will unify those parts of OutputPage which duplicate
ParserOutput: T301020.

Bug: T300307
Bug: T305161
Depends-On: I39ae7d7a40190eedaa024097a6442cd02b6a02e7
Depends-On: I2c660972b289bbad730ceee1325d70d5ba75d27e
Change-Id: I53c28ee7c80b889c893c1d00f37678e716e55783
2024-08-09 14:04:38 +02:00
Umherirrender
1951aea6b8 Fix various version mention for class_alias
Versions were changed in 8e940c4f21,
but that change made some of the versions wrong

Follow-Up: I7f85d931d3b79da23e87b4e5692b2e14be8fcaa0
Change-Id: Iae43725b8e0fffc4d44bf57f6227334b41290bd9
2024-07-05 18:39:49 +02:00
Bartosz Dziewoński
c7f52f0ddb Make MessageValue implement JsonDeserializable
MessageValue and friends are pure value objects and newable, so
it makes sense for them to be (de)serializable too. There are some
places where we want to serialize messages, such as in ParserOutput.

The structure of the resulting JSON is inspired by the way we
represent Message objects as plain values elsewhere in MediaWiki,
e.g. StatusValue::getStatusArray().

Co-Authored-By: C. Scott Ananian <cscott@cscott.net>
Depends-On: Ia32f95a6bdf342262b4ef044140527f0676402b9
Depends-On: I7bafe80cd36c2558517f474871148286350a4e76
Change-Id: Id47d58b5e26707fa0e0dbdd37418c0d54c8dd503
2024-06-12 15:47:37 -04:00
James D. Forrester
19f4e6945a Rename JsonUnserial… to JsonDeserial…
This is to make it clearer that they're related to converting serialized
content back into JSON, rather than stating that things are not
representable in JSON.

Change-Id: Ic440ac2d05b5ac238a1c0e4821d3f2d858bc3d76
2024-06-12 14:50:58 -04:00
C. Scott Ananian
47bdd8b1c8 [ParserOutput] Remove unused TOCHTML from ParserCache serialization
This reverts commit b4cf4aa6bd,
which is no longer needed for ParserCache compatibility across
trains.  REL1_42 contains b4cf4aa6bd,
so MW 1.43 will not need this.

This also adds new serialization test cases for 1.43 with this field
removed; see
https://www.mediawiki.org/wiki/Manual:Parser_cache/Serialization_compatibility

Change-Id: I716e2efe7a491002e6e6b2300016165fffe3c0d6
2024-05-17 21:46:00 +00:00
C. Scott Ananian
19ee8c4f91 Serialization test cases: fix filename after ParserOutput namespacing
The serialization test cases look for files based on the name of the
class they are testing.  After the namespacing of ParserOutput, they
were looking for files named like:
  1.42-MediaWiki\Parser\ParserOutput-binaryPageProperties.json

The embedded backslashes in these filenames would wreak havoc on Windows
machines.  What's more, none of the existing ParserOutput tests will
actually be checked anymore because the filenames don't match up
with what is expected after namespacing.

Fix this by stripping the namespace from the classname when forming
the test file names.

When this is done, the test cases for GhostFieldAccess begin running
again, revealing that they were broken when GhostFieldTestClass was
re-namespaced.  Add a class alias for the GhostFieldTestClass to fix
this.

Finally, PHP <= 8.1 does not deserialize private properties correctly
after a class is renamed and aliased, because the internal name of the
private property contains the "old" class name in the serialization.
Add a new ::restoreAliasedGhostField() method to the
GhostFieldAccessTrait to work around this issue and restore proper
deserialization of ParserOutput.

Bug: T365060
Followup-To: I9c64a631b0b4e8e4fef8a72ee0f749d35f918052
Followup-To: I4c2cbb0a808b3881a4d6ca489eee5d8c8ebf26cf
Change-Id: I7bafe80cd36c2558517f474871148286350a4e76
2024-05-17 17:07:47 -04:00
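
The filename fix described in commit 19ee8c4f91 above amounts to dropping the namespace before building the fixture name. A minimal sketch follows, in Python rather than the PHP used by MediaWiki; the helper name `fixture_filename` and its parameters are assumptions made for illustration, and only the `1.42-...-binaryPageProperties.json` naming pattern comes from the commit message.

```python
def fixture_filename(version: str, class_name: str, case: str, ext: str = "json") -> str:
    """Build a fixture file name from the unqualified class name."""
    # Keep only the part after the last namespace separator so that no
    # backslash ends up in the file name.
    short_name = class_name.rsplit("\\", 1)[-1]
    return f"{version}-{short_name}-{case}.{ext}"


name = fixture_filename("1.42", "MediaWiki\\Parser\\ParserOutput", "binaryPageProperties")
print(name)  # 1.42-ParserOutput-binaryPageProperties.json
```

A class name without any namespace passes through unchanged, so fixtures created before the namespacing keep matching.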
Bartosz Dziewoński
73de566949 Use 'scalar' type alias to shorten PHPDoc annotations
'string|int|float|bool' (in any order) can be replaced by 'scalar'.
'string|int|float|bool|null' (likewise) can be replaced by '?scalar'.

This is convenient for functions that can accept any primitive value,
which comes up sometimes when serializing things as SQL, JSON etc.

Change-Id: I4a711ee59611d76d6745f3640e4aa6bebec02918
2024-05-11 23:21:22 +00:00
jenkins-bot
b671e574eb Merge "Add ParserOptions::setCollapsibleSections()" 2024-04-29 21:17:15 +00:00
C. Scott Ananian
8d031bcf87 Add ParserOptions::setCollapsibleSections()
This is a non-default option that will add a <div> wrapper around
section contents to allow client-side collapsing.  This is intended
for use by MobileFrontEnd, but could eventually be enabled for
desktop read views as well.

Since this parser option is in the "cache-varying options" set, any
caller who sets this option will fork the cache for that page, which
is reasonable as the parser option sets a ParserOutput property.
In the future our caching strategy will get smarter and we'll add
code which avoids the cache split and just transfers the appropriate
values from ParserOptions to ParserOutput flags after the cached
output is retrieved.

Bug: T359001
Change-Id: Ie93959a056ed15a728404eb293e4bb6eeaeb15c0
2024-04-29 12:11:09 -04:00
C. Scott Ananian
b4cf4aa6bd ParserOutput: Temporarily write (unused) TOCHTML to ParserCache
Even though this JSON property is unused on master, the previous
train release read it from the JSON (and threw the value away).
In order to provide error-free roll-forward and roll-back of the
train, temporarily write an empty string as the value of TOCHTML
so that the read from `$jsonData['TOCHTML']` won't cause a PHP
notice in the logs if we roll back.

This patch is only needed for one train release, and can then
be removed.

Bug: T363107
Change-Id: I77add3bd7f00941cb81481f738bc59d6008c2406
2024-04-22 11:26:10 -04:00
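
The roll-back concern in commit b4cf4aa6bd above can be sketched as follows. This is an illustration in Python, not the actual PHP: `old_reader` and `new_writer` are hypothetical names, and a Python `KeyError` stands in for the PHP notice that an unconditional read of a missing array key would log.

```python
import json


def old_reader(serialized: str) -> str:
    """Stand-in for the previous release: reads TOCHTML, then discards it."""
    data = json.loads(serialized)
    _unused = data["TOCHTML"]  # fails loudly if the key is absent
    return data["Text"]


def new_writer(text: str, keep_placeholder: bool = True) -> str:
    """Stand-in for the current release, which no longer uses TOCHTML."""
    data = {"Text": text}
    if keep_placeholder:
        # Written only so that an older reader does not trip over a missing key.
        data["TOCHTML"] = ""
    return json.dumps(data)


# Cache entries written by the new code stay readable after a roll-back.
print(old_reader(new_writer("<p>Hello</p>")))
```

Once no deployed release reads the key any more, the placeholder can be dropped, which is what the later revert (47bdd8b1c8) does.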
Umherirrender
8d97313f81 Fix some line indent
Change-Id: I8f82724197d20f9289d80e138d80310f1eab29f2
2024-04-20 00:25:15 +02:00
C. Scott Ananian
195ac55bfe [ParserOutput] Remove deprecated ::getTOCHTML() and ::setTOCHTML() methods
These were deprecated with warnings in 1.40.

Change-Id: I8027bc26c71ae94d3d5c7e5112545cd1b35749aa
2024-04-16 13:00:58 -04:00
C. Scott Ananian
db2f1ad606 [ParserOutput] Remove deprecated ::getCategories() method
This was deprecated with warnings in 1.40.

Change-Id: I7b8a86f6efbdd86c1f493db6741c37bfb325e9bb
2024-04-16 12:57:17 -04:00
jenkins-bot
1caf41bb73 Merge "ParserOutput: Rename ::setIndexedPageProperty() to ::setNumericPageProperty()" 2024-04-16 10:57:58 +00:00
C. Scott Ananian
2429785470 ParserOutput: Rename ::setIndexedPageProperty() to ::setNumericPageProperty()
Before this method name gets baked forever into the 1.42 release, rename
the ParserOutput::setIndexedPageProperty() and ::setUnindexedPageProperty()
methods to ::setNumericPageProperty() and ::setUnsortedPageProperty() to
try to address some confusion about whether the *presence* of the page
property is still indexed (it is!), in contrast to whether there's an
additional "sort key" associated with the *value* assigned to the page
property.

This naming is compatible with the feature request in T357783 to have
the sort key and property value specified independently.  The new
method signature in that case would be:

  ...setSortedPageProperty( string $name, string $value, int|float $sortKey )

Although PHP 8.0 will throw a TypeError if a non-numeric type is coerced
to numeric using `0 + ...`, use an explicit is_numeric check to obtain
the same behavior in PHP 7.x.

Change-Id: Ia94c192c429d0482c58467bed787fd2e0aca052f
2024-04-15 15:13:56 -04:00
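
The explicit numeric check mentioned in commit 2429785470 above can be illustrated with a rough Python analogue. The function name `to_numeric` is hypothetical, and Python's `int()` and `float()` parsing does not reproduce PHP's `is_numeric` rules exactly (for example, Python accepts `"nan"`); the sketch only shows the idea of rejecting non-numeric input up front instead of relying on implicit coercion.

```python
from typing import Union

Number = Union[int, float]


def to_numeric(value: object) -> Number:
    """Return value as a number, or raise TypeError if it is not numeric."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; treat it as non-numeric here.
        raise TypeError("boolean is not a numeric page property value")
    if isinstance(value, (int, float)):
        return value
    if isinstance(value, str):
        for parse in (int, float):
            try:
                return parse(value)
            except ValueError:
                continue
    raise TypeError(f"non-numeric page property value: {value!r}")


print(to_numeric("42"), to_numeric("1.5"), to_numeric(7))
```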
C. Scott Ananian
f1a45cf2b9 Expand documentation of ParserOutput class
Clarify that not *all* ParserOutputs represent parsed articles, and describe
the merging operations on ParserOutputs in more depth.  The interaction
with Content and ContentHandlers is also described (thanks, Daniel!).

Followup-To: Id2e3124652315a74869f504056fa8a99ad794350
Change-Id: I5c1016532eba1b71dc4d3d5d5d0c46775713efb5
2024-04-12 12:53:23 -04:00
Lucas Werkmeister
1adefb10e3 ParserOutput: clarify that “indexed” refers to value
Bug: T305158
Change-Id: Ic6ea22b5188e575b288d57c8f692f492cb69452d
2024-04-12 12:09:02 +02:00
C. Scott Ananian
b4721e24aa ParserOutput::setUnindexedPageProperty(): use empty string as default value
If a placeholder value is needed, it is recommended to use the empty string
to avoid wasting database space unnecessarily.  Operationalize this
recommendation by providing a default value for the method argument.

Bug: T305158
Bug: T350224
Change-Id: I9ea8d93298d771c2d38fdfb451a2817220ca679a
2024-04-11 11:58:13 -04:00
jenkins-bot
2472cd9247 Merge "Substitute category default sort key when filling links table, not at parse time" 2024-04-11 14:59:33 +00:00
jenkins-bot
e4981c9702 Merge "Add ParserOutput::setIndexedPageProperty(); deprecate numeric properties" 2024-04-10 17:44:00 +00:00
C. Scott Ananian
de57c4e7c2 Add ParserOutput::setIndexedPageProperty(); deprecate numeric properties
Deprecate non-string values to ::setPageProperty(), which introduce easy
traps for programmers to fall into.  If page properties are intended
to be indexed, use the new ::setIndexedPageProperty() instead.  Also add
::setUnindexedPageProperty() for symmetry, with a tighter string type on
the value.

Bug: T305158
Bug: T350224
Change-Id: I8a39a7c90341dfee932aa819c9a0a637a8782f69
2024-04-05 19:12:29 -04:00
C. Scott Ananian
01590b89bf ParserOutput: Emit deprecation warning if interwiki passed to addTemplate
Bug: T361330
Depends-On: Ia8fd49a6f9af18e32d47d1dcd052c5f33123f44b
Change-Id: Id4104dff4acaa60d94155d7915b9c1f2af4baaf0
2024-04-04 10:38:45 -04:00
C. Scott Ananian
c2df535b9c Substitute category default sort key when filling links table, not at parse time
This ensures uniform treatment of all places that call `addCategory`
without duplicating the `defaultsort` code; it also ensures that the
effect of the {{DEFAULTSORT}} parser function is independent of page
position.

Bug: T40435
Bug: T353530
Change-Id: I4480a6d59e766fa4eddc9ec9117c58b66771bb47
2024-03-29 18:30:02 -04:00
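
Why deferring the substitution in commit c2df535b9c above makes {{DEFAULTSORT}} independent of page position can be seen in a small sketch. It is written in Python with hypothetical names (`Output`, `add_category`, `links_table_rows`), and it assumes that an empty sort key means "use the default" and that the page title is the fallback when no default sort key was set; neither detail is confirmed by the commit message.

```python
from typing import Dict, List, Optional, Tuple


class Output:
    """Hypothetical stand-in collecting categories during a parse."""

    def __init__(self) -> None:
        self.categories: Dict[str, str] = {}
        self.default_sort: Optional[str] = None

    def add_category(self, name: str, sort_key: str = "") -> None:
        # Store the key exactly as given; nothing is substituted at parse time.
        self.categories[name] = sort_key


def links_table_rows(output: Output, page_title: str) -> List[Tuple[str, str]]:
    """Resolve empty sort keys only when the links table is filled."""
    fallback = output.default_sort if output.default_sort is not None else page_title
    return [(name, key or fallback) for name, key in output.categories.items()]


out = Output()
out.add_category("Physics")        # added before the default sort key is known
out.default_sort = "Einstein"      # set later in the same page
out.add_category("People", "E")    # an explicit key is kept as-is
print(links_table_rows(out, "Albert Einstein"))
```

Because the default is looked up only at the end, the category added before the default sort key was set still receives it.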
James D. Forrester
8e940c4f21 Standardise all our class alias deprecation comments for ease of grepping
Change-Id: I7f85d931d3b79da23e87b4e5692b2e14be8fcaa0
2024-03-19 20:11:29 +00:00
jenkins-bot
9232985bd8 Merge "ParserOutput::setPageProperty(): Emit deprecation warning for non-scalar values" 2024-03-11 17:08:20 +00:00
Umherirrender
f3524224f0 build: Fix line indents
Fixed SkinModuleTest::provideGetFeatureFilePathsOrder, as the nesting of
arrays for parameters was wrong

Change-Id: I9875008adf62d284c48662ebfbd245d72e5be064
2024-03-11 00:14:16 +01:00
jenkins-bot
a62f5c7911 Merge "[ParserOutput] Rename $mText to $mRawText and ::setText() to ::setRawText()" 2024-02-21 17:11:00 +00:00
C. Scott Ananian
72c4945a72 [ParserOutput] Rename $mText to $mRawText and ::setText() to ::setRawText()
ParserOutput::getText() is not a simple getter, but does
transformations on the "text" of the ParserOutput; the simple getter
is named ::getRawText().

To maintain consistency, rename ParserOutput::setText() to
::setRawText() and the property name ParserOutput::$mText to
::$mRawText so future readers are not confused.

The JSON property name as it appears in the serialized ParserCache
is left as 'Text' so that we don't have any forward- or backward-
rollback issues.

Change-Id: I3ef34814ab9473cc70d0a6806e8c5a4a02b73491
2024-02-20 17:13:28 +00:00
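
The approach in commit 72c4945a72 above, renaming the in-memory property while leaving the serialized key alone, can be sketched like this. The Python class and method names are hypothetical; only the JSON key `'Text'` comes from the commit message.

```python
from typing import Dict


class Output:
    """Hypothetical stand-in: the attribute was renamed, the JSON key was not."""

    def __init__(self, raw_text: str = "") -> None:
        self.raw_text = raw_text  # formerly called "text" in this sketch

    def to_json_dict(self) -> Dict[str, str]:
        # The serialized key stays 'Text' so older and newer code can
        # read each other's cache entries.
        return {"Text": self.raw_text}

    @classmethod
    def from_json_dict(cls, data: Dict[str, str]) -> "Output":
        return cls(raw_text=data["Text"])


entry = Output("<p>Hello</p>").to_json_dict()
print(entry)  # {'Text': '<p>Hello</p>'}
```

Decoupling the wire name from the property name is what allows the rename without any roll-forward or roll-back issues.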
C. Scott Ananian
6846f8aa10 ParserOutput::setPageProperty(): Emit deprecation warning for non-scalar values
Non-scalar values passed to ParserOutput::setPageProperty() have never
"worked"; they've been stringified (and null has been stored as an empty
string).  Emit a warning so we can fail harder in future releases.

Bug: T305158
Depends-On: Ib36787d04c0ca713587dc8b814ca1c5a827f6f72
Change-Id: I38234084fdc7427ca577bb33a7fce1541581188d
2024-02-20 11:29:49 -05:00
C. Scott Ananian
b5d44bf339 ParserOutput::setPageProperty(): Update documentation
String and non-string values behave very differently when passed to
::setPageProperty(), resulting in some unexpected gotchas for the
unaware caller.

Bug: T350224
Bug: T305158
Change-Id: I23b35b250f27a117d1353ea8a26d2b3f77c568e7
2024-02-20 11:26:38 -05:00
Subramanya Sastry
e55cc517da Move Parser to MediaWiki\Parser namespace
Bug: T166010
Co-Authored-By: Daimona Eaytoy <daimona.wiki@gmail.com>
Co-Authored-By: James Forrester <jforrester@wikimedia.org>
Co-Authored-By: Subramanya Sastry <ssastry@wikimedia.org>
Change-Id: I79b4e732c45095eedbaa80afa5eb7479b387ed8a
2024-02-16 09:18:38 -05:00
jenkins-bot
2ca5bb9a96 Merge "ParserOutput: update task id in documentation" 2024-02-15 23:36:35 +00:00
C. Scott Ananian
13873a35b9 ParserOutput: update task id in documentation
We closed T296023 and opened a new task for the work remaining, so
update the comments in the code to match.

The task relating to `addLanguageLink` is actually T296019.

Change-Id: I28b942a57ed41751d44d8565a290d925f6d7f180
2024-02-15 15:23:57 -05:00
C. Scott Ananian
28a3371382 [OutputTransform] Remove broken and unused 'bodyContentOnly' option
This was formerly used by the REST API, but that code now just
uses ParserOutput::getRawText() when it needs the full HTML document.
This option has been broken, with various passes like RenderDebugInfo
and AddWrapperDiv adding content in inappropriate places if
bodyContentOnly was false.

Change-Id: Ib45f95ded59c81c16d61803f977d1edbfe82b262
2024-02-15 13:05:53 -05:00
C. Scott Ananian
770d2bf040 [ParserOutput] Make 'enableSectionEditLinks' a ParserOption
This will allow the Translate extension to set this parser option
in the ArticleParserOptions hook, instead of mutating $options passed
to ParserOutput::getText() in the ParserOutputPostCacheTransform hook.

It ought to also help to handle the many places which call:

   ... = $parserOutput->getText( [
       'enableSectionEditLinks' => false,
   ] );

by allowing them to set the appropriate ParserOption instead
of passing arguments to ::getText().

Bug: T350626
Change-Id: I719c115194059060f7f888608417a194ac80cc92
2024-02-09 23:42:03 +00:00
C. Scott Ananian
242c6d2cf9 Introduce ParserOutput::setFromParserOptions() and use for preview flag
Bug: T341010
Co-Authored-by: cananian <cananian@wikimedia.org>
Co-Authored-by: ihurbain <ihurbainpalatin@wikimedia.org>
Change-Id: I03125fdaa7dd71ba57d593e85ecb98be6806f3f6
2024-02-07 21:22:06 -05:00
C. Scott Ananian
52320c0902 Move ParsoidRenderID to MediaWiki\Edit
This class belongs with the rest of the Parsoid output stash code.

This class has been marked @unstable since 1.39 and thus the move
does not need release notes.

Change-Id: I16061c0c28b1549fbe90ea082cc717fee4a09a6e
2024-02-07 21:22:06 -05:00
C. Scott Ananian
1858e1cdd7 Rename ParserOutput::{get,set}Timestamp() to ::{get,set}RevisionTimestamp()
This avoids confusion with the "render timestamp" held by the cache,
and is consistent with ::get*RevisionId() etc.

The old ::getTimestamp() and ::setTimestamp() methods have been
deprecated.

Change-Id: Idb5e687709c98086c5d3075d31885c58a0723197
2024-02-07 21:22:06 -05:00
C. Scott Ananian
0de13d7662 Add ParserOutput::{get,set}RenderId() and set render id in ContentRenderer
Set the render ID for each parse stored into cache so that we are able
to identify a specific parse when there are dependencies (for example
in an edit based on that parse).  This is recorded as a property added
to the ParserOutput, not the parent CacheTime interface.  Even though
the render ID is /related/ to the CacheTime interface, CacheTime is
also used directly as a parser cache key, and the UUID should not be
part of the lookup key.

In general we are trying to move the location where these cache
properties are set as early as possible, so we check at each location
to ensure we don't overwrite a previously-set value.  Eventually we
can convert most of these checks into assertions that the cache
properties have already been set (T350538).  The primary location for
setting cache properties is the ContentRenderer.

Moved setting the revision timestamp into ContentRenderer as well, as
it was set along the same code paths.  An extra parameter was added to
ContentRenderer::getParserOutput() to support this.

Added merge code to ParserOutput::mergeInternalMetaDataFrom() which
should ensure that cache time, revision, timestamp, and render id are
all set properly when multiple slots are combined together in MCR.

In order to ensure the render ID is set on all codepaths we needed to
plumb the GlobalIdGenerator service into ContentRenderer, ParserCache,
ParserCacheFactory, and RevisionOutputCache.  Eventually (T350538) it
should only be necessary in the ContentRenderer.

Bug: T350538
Bug: T349868
Followup-To: Ic9b7cc0fcf365e772b7d080d76a065e3fd585f80
Change-Id: I72c5e6f86b7f081ab5ce7a56f5365d2f75067a78
2024-02-07 21:22:06 -05:00
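
The "set as early as possible, never overwrite" rule described in commit 0de13d7662 above can be sketched as follows. This is a Python illustration with hypothetical names (`Output`, `ensure_render_id`); in MediaWiki the ID comes from the GlobalIdGenerator service, for which `uuid.uuid4()` is only a stand-in here.

```python
import uuid
from typing import Callable, Optional


class Output:
    """Hypothetical stand-in carrying an optional render ID."""

    def __init__(self) -> None:
        self.render_id: Optional[str] = None


def ensure_render_id(
    output: Output,
    id_factory: Callable[[], str] = lambda: str(uuid.uuid4()),
) -> str:
    """Assign a render ID only if an earlier stage has not already set one."""
    if output.render_id is None:
        output.render_id = id_factory()
    return output.render_id


out = Output()
first = ensure_render_id(out)    # set by the earliest stage (e.g. the renderer)
second = ensure_render_id(out)   # later stages leave the existing value alone
```

Here `first` and `second` are the same value, so every later stage refers to the same parse. Once every code path sets the ID in the earliest stage, the later checks could become assertions, as the commit message anticipates.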