379 commits

Author SHA1 Message Date
C. Scott Ananian
e22d93a6bb Hard-deprecate ParserOutput::{get,set}Flag()
These were deprecated in 1.38; users are expected to use
ParserOutput::{get,set}OutputFlag() instead, which helps eliminate a
confusing aliasing of many MW methods named "flag".

Original deprecation: 06ab90f163

Code search:
    https://codesearch.wmcloud.org/search/?q=%5BOo%5Dut%28put%29%3F%28%5C%28%5C%29%29%3F-%3E%28g%7Cs%29etFlag%5C%28&i=nope&files=&excludeFiles=&repos=

Patches for non-production extensions:
 PageProperties: I592d43e2c912df635cd9162180ed20a6136535f1
 CIForms: I238a6c557891bb6d271d2641261ef69542b7957e

Bug: T292868
Bug: T305161
Change-Id: I4525443ab0932241b0cf64ab606f7ab7d6d70b6e
2023-07-28 13:51:02 -04:00
Isabelle Hurbain-Palatin
b2cfa31eb6 Add append/getOutputString to ParserOutput
This aims at providing an interface similar to setOutputFlag for string
sets, such as the ones used in CSP properties.

Change-Id: I6f103bd88802e66611e483403a2f8a540d54aae9
2023-07-27 11:37:11 +02:00
thiemowmde
3c631a59f2 More specific array type hints in ParserOutput/OutputPage
Change-Id: I7dbecebb8b26e57afda13f46d3b895f085c4e95e
2023-07-03 15:52:18 +02:00
Subramanya Sastry
0e9656e6da Add return type to getIndicators() in ParserOutput & OutputPage
This is in preparation for changes on the Parsoid side to make
sure its signature is compatible with the ContentMetadataCollector
interface there.

Change-Id: Ife4ae81dbc304097da7dcba40b143f7030b959f3
2023-06-02 16:13:01 +05:30
jenkins-bot
e1c1632d9c Merge "ParserOutput: Ensure page title is updated after merging properties" 2023-05-11 18:23:20 +00:00
Umherirrender
e04d3a28f6 Replace internal Hooks::runner
The Hooks class contains deprecated functions and the whole class is
going to get removed, so remove the convenience function and inline the
code.

Bug: T335536
Change-Id: I8ef3468a64a0199996f26ef293543fcacdf2797f
2023-05-11 06:17:38 +00:00
Subramanya Sastry
632481c382 ParserOutput: Ensure page title is updated after merging properties
Eventually we should merge the "title text" and "display title" in
ParserOutput (T293514) but for now mirror the logic in
ParserOutput::mergeHtmlMetadataFrom() and update the title text
from the source if it hasn't already been set in the destination.
This patch ensures that after page properties are merged during
metadata collection, the title text is suitably updated if the
'displaytitle' property is set.

This will let Parsoid pass displaytitle (metadata) tests in integrated
mode since Parsoid relies on merging metadata from multiple ParserOutput
objects (in the DataAccess object that is used to expand templates, etc.)

Once this patch is merged, Parsoid patches may start failing CI till
we submit a patch there to fix up the integrated test failures list
since some previously failing tests may now pass.

Bug: T293514
Bug: T294621
Change-Id: Ia673f1261ccd03caf455122b71cfb9769b02f22e
2023-05-10 08:53:41 +00:00
jenkins-bot
c5152db020 Merge "Remove back-compat for <editsection>" 2023-04-28 15:59:12 +00:00
Subramanya Sastry
3e297c43ad Fix breakages generating TOC for API Help pages
* TOCData in Parsoid expects to process non-string-key indexed arrays.

* Don't use 'null' as the default for maxtoclevel to ensure that
  TOC is always displayed even when it isn't passed in as a param
  by callers.

* Follows up on 05535be6 which only partially fixed the breakage
  caused by 153a4157 and 439656e0

Bug: T334551
Change-Id: I8883b58574ea8ed0566de2c44dba3408a47d2d0c
2023-04-12 15:37:03 -05:00
jenkins-bot
90997943f9 Merge "Parser: Remove back-compatibility NO_TOC_CONVERSION code" 2023-03-27 20:43:53 +00:00
C. Scott Ananian
cfd9c516e1 Allow setting a ParserOption to generate Parsoid HTML
This is an initial quick-and-dirty implementation.  The
ParsoidParser class will eventually inherit from \Parser,
but this is an initial placeholder to unblock other Parsoid
read views work.

Currently Parsoid does not fully implement all the ParserOutput
metadata set by the legacy parser, but we're working on it.

This patch also addresses T300325 by ensuring that the Page HTML
APIs use ParserOutput::getRawText(), which will return the entire
Parsoid HTML document without post-processing.  This is what
the Parsoid team refers to as "edit mode" HTML. The
ParserOutput::getText() method returns only the <body> contents
of the HTML, and applies several transformations, including
inserting Table of Contents and style deduplication; this is
the "read views" flavor of the Parsoid HTML.

We need to be careful of the interaction of the `useParsoid` flag with
the ParserCacheMetadata.  Effectively `useParsoid` should *always* be
marked as "used" or else the ParserCache will assume its value doesn't
matter and will serve legacy content for parsoid requests and
vice-versa.  T330677 is a follow up to address this more thoroughly by
splitting the parser cache in ParserOutputAccess; the stop gap in this
patch is fragile and, because it doesn't fork the ParserCacheMetadata
cache, may corrupt the ParserCacheMetadata in the case when Parsoid
and the legacy parser consult different sets of options to render a
page.

Bug: T300191
Bug: T330677
Bug: T300325
Change-Id: Ica09a4284c00d7917f8b6249e946232b2fb38011
2023-03-26 21:46:05 -04:00
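
Illustrative sketch for cfd9c516e1 (not taken from the patch): a toy model of why an option that changes the output has to be recorded as "used". The cache key here is built only from the options a parse reported using; the names `cache_key`, `render`, `use_parsoid` and `mark_parsoid_used` are hypothetical and do not mirror the real ParserCache API.

```python
def cache_key(page, options, used_options):
    """Build a key from the page name plus only the options recorded as used."""
    parts = [f"{name}={options[name]}" for name in sorted(used_options)]
    return page + "!" + ":".join(parts)


def render(page, options, cache, used_by_page, mark_parsoid_used):
    """Return cached HTML when the key matches, otherwise 'parse' and store."""
    used = used_by_page.get(page)
    if used is not None:
        key = cache_key(page, options, used)
        if key in cache:
            return cache[key]
    # Cache miss: pretend to parse, recording which options influenced the output.
    used = {"lang"}
    if mark_parsoid_used:
        used.add("use_parsoid")
    html = ("parsoid" if options["use_parsoid"] else "legacy") + " html"
    used_by_page[page] = used
    cache[cache_key(page, options, used)] = html
    return html


def second_request(mark_parsoid_used):
    """Render a page with the legacy parser, then request it again via Parsoid."""
    cache, used_by_page = {}, {}
    render("Main", {"lang": "en", "use_parsoid": False}, cache, used_by_page, mark_parsoid_used)
    return render("Main", {"lang": "en", "use_parsoid": True}, cache, used_by_page, mark_parsoid_used)
```

In this model, `second_request(False)` hands the legacy entry to the Parsoid request because both requests share a key, while `second_request(True)` keeps the two flavors apart.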
C. Scott Ananian
8aae904254 Parser: Remove back-compatibility NO_TOC_CONVERSION code
The TOC used to be language-converted in ParserOutput::getText(), but
it wasn't possible to apply custom rules defined in the wikitext
article body at ::getText() time.  Remove the various hacks that we'd
added in an attempt to do so, which were made unnecessary by
I321cd31dae64bbf845d53282e5d28a55bc4ec319.

Bug: T306862
Change-Id: Ib12cd02e9ade91d5794462e8833f2aa3b45a51f2
2023-03-24 22:14:42 +00:00
C. Scott Ananian
99e9d4927f Remove back-compat for <editsection>
The tag has been <mw:editsection> since at least 2011 (f0fd318a4e), so
we no longer need to include the ancient <editsection> variant in our
regexp and test cases.

Change-Id: I5fd783556810ea13b07a69066ea6762d1a1863e1
2023-03-15 13:53:01 -04:00
jenkins-bot
6de76f1fad Merge "Add ParserOutput::getLanguage()" 2023-03-13 14:18:47 +00:00
C. Scott Ananian
29853113f7 Deprecate ParserOutput::{get,set}TOCHTML()
No uses in deployed code outside mediawiki-core:

 https://codesearch.wmcloud.org/deployed/?q=%5Bgs%5DetTOCHTML%5C%28&i=nope&files=&excludeFiles=&repos=

Bug: T293513
Change-Id: I3fd82150ac581afbeb94f401672702063586fff0
2023-03-10 20:34:33 -05:00
C. Scott Ananian
183a6da420 Add ParserOutput::getLanguage()
Provide a way for backend code to determine the primary language of a
ParserOutput, eg for setting the Content-Language header of an API
response.

This is read-only and backed by extension data at the moment for
transition purposes; if this API sticks we'll graduate it to a
"real" property in the future, with appropriate serialization
to/from JSON (T303329).

Similarly, this patch only includes the most basic code to handle
the various ParserOutput merge cases in
ParserOutput::merge{Internal,Html,Tracking}MetaDataFrom(),
ParserOutput::collectMetadata(), and
OutputPage::addParserOutput{Content,Metadata,Text,}(); mostly
inherited from the fact that the storage is backed by extension
data at the moment.

Generally only the "top-level" parser output gets to set the
primary language; we'll presumably need to ensure that the
language is consistent during merge.

Change-Id: I767daba22805a877d9b806fd77334e508902844b
2023-03-10 18:42:29 -05:00
C. Scott Ananian
d2446a77dd Deprecate ParserOutput::getCategories()
This undocumented method returns a reference to ParserOutput's private
storage array, yet very few callers actually require a reference or try
to use this to mutate the internal storage.  Further, the keys of the
array can be converted to `int` when the category names are numeric,
which can further confuse users.  Most users found through codesearch
can/should use ::getCategoryNames() instead.

Add a new ::getCategorySortKey() method to provide access to the sort
keys for those few callers who require them, in a manner which doesn't
expose that the internal `mCategories` array stores numeric category
names as 'int'.

Bug: T331727
Change-Id: I8dc85e76bfbb9ed49a603d990c14b7ee798bd821
2023-03-10 10:02:42 -05:00
C. Scott Ananian
e34b25a09f Ensure categories are returned as strings
Numeric category strings like '1' are converted to ints when they are
used as array keys.  Convert back to strings as needed to ensure this
doesn't surprise any clients.

Bug: T331084
Change-Id: Ib39707216d213e414c09226a6378047ffaf43892
2023-03-10 10:02:23 -05:00
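
Illustrative sketch for e34b25a09f (not taken from the patch): Python dictionaries do not coerce keys, so `php_array_key` below approximates PHP's rule that canonical decimal integer strings used as array keys are stored as ints (integer overflow is ignored here). `CategoryStore` and its methods are hypothetical stand-ins, not the ParserOutput API.

```python
import re

# Canonical decimal integers only: "1" and "-5" qualify, "01", "+1" and "1.5" do not.
_CANONICAL_INT = re.compile(r"(0|-?[1-9][0-9]*)")


def php_array_key(key):
    """Approximate PHP's array-key normalization for string keys."""
    if isinstance(key, str) and _CANONICAL_INT.fullmatch(key):
        return int(key)
    return key


class CategoryStore:
    def __init__(self):
        self._categories = {}  # normalized category name -> sort key

    def add_category(self, name, sort_key=""):
        self._categories[php_array_key(name)] = sort_key

    def raw_keys(self):
        """Keys as stored, which may include ints for numeric names."""
        return list(self._categories)

    def get_category_names(self):
        """Convert keys back to strings so callers never see ints."""
        return [str(name) for name in self._categories]


store = CategoryStore()
store.add_category("1")
store.add_category("Physics")
```

Here `store.raw_keys()` exposes the int `1`, while `store.get_category_names()` returns only strings.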
James D. Forrester
ad06527fb4 Reorg: Namespace the Title class
This is moderately messy.

Process was principally:

* xargs rg --files-with-matches '^use Title;' | grep 'php$' | \
  xargs -P 1 -n 1 sed -i -z 's/use Title;/use MediaWiki\\Title\\Title;/1'
* rg --files-without-match 'MediaWiki\\Title\\Title;' . | grep 'php$' | \
  xargs rg --files-with-matches 'Title\b' | \
  xargs -P 1 -n 1 sed -i -z 's/\nuse /\nuse MediaWiki\\Title\\Title;\nuse /1'
* composer fix

Then manual fix-ups for a few files that don't have any use statements.

Bug: T166010
Follows-Up: Ia5d8cb759dc3bc9e9bbe217d0fb109e2f8c4101a
Change-Id: If8fc9d0d95fc1a114021e282a706fc3e7da3524b
2023-03-02 08:46:53 -05:00
jenkins-bot
9a96857757 Merge "Reorg: Move HTML-related classes out of includes/ to Html/" 2023-02-21 15:37:53 +00:00
Kosta Harlan
b16d2b7fc9 ParserOutput: Don't assume that TOC extension data exists
When running PHPUnit integration tests locally for
Extension:GrowthExperiments, $toc['extensionData'] isn't
defined, leading to failures for various tests.

Follows-Up: I67397c49f2d0764e5c755101264631bea6603e16
Change-Id: I3ef45a86c236863dbeafbd121f1a5951947c5dc6
2023-02-17 09:44:23 +01:00
Amir Sarabadani
7d8768e931 Reorg: Move HTML-related classes out of includes/ to Html/
Bug: T321882
Change-Id: I5dc1f7e9c303cd3f5b9dd7010d6bb470d8400a18
2023-02-16 20:40:01 +01:00
jenkins-bot
855004747a Merge "Ensure CacheTime properties are reflected by ParserOutput::collectMetadata" 2023-02-13 22:04:40 +00:00
C. Scott Ananian
fc62d1325d Ensure CacheTime properties are reflected by ParserOutput::collectMetadata
In order to break a cyclic dependency, Parsoid doesn't know about
core's `ParserOutput` class; it defines its own
`ContentMetadataCollector` interface which exposes those portions
of the ParserOutput metadata which the parser needs to supply.

Other bits of the ParserOutput metadata are specific to MediaWiki
internals and Parsoid doesn't have to explicitly know about them:
extensions and core implementations of parser functions (eg) can
take the ContentMetadataCollector supplied by Parsoid and downcast
it back to a ParserOutput in order to propagate internal information
(like ParserCache lifetimes) "behind Parsoid's back" - aka, without
violating abstraction boundaries by exposing every implementation
detail of MediaWiki to Parsoid.

When Parsoid calls into core to expand magic words like
`currenttimestamp` they update the cache TTL in the ParserOutput using
this mechanism.  Using ParserOutput::collectMetadata() ensures these
values are propagated to the final ParserOutput, even though Parsoid
doesn't (shouldn't have to) explicitly know about them.

Bug: T329067
Change-Id: Ia92efff4293841330674df09e82897d0775ef4d6
2023-02-13 16:41:08 -05:00
jenkins-bot
91e9cccc04 Merge "Use a SectionMetadata object in Linker::generateTOC()" 2023-02-10 22:48:18 +00:00
jenkins-bot
eaa368f09d Merge "Remove back-compatibility code for ToC marker" 2023-02-10 20:50:13 +00:00
C. Scott Ananian
d5b39490ca Remove back-compatibility code for ToC marker
Before 1.39 we used <mw:toc> and in 1.39 we switched to <mw:tocplace/>
(commit 24949480eb).  This was changed to a <meta> tag in 1.40 (commits
0b10563895 and fa8646ca7b) and the old content has long since expired
from the ParserCache.  Clean up the old ParserCache transition code.

Change-Id: I3254d0acba31e107b50767797a2b0ad28aba59ee
2023-02-10 00:03:54 -05:00
C. Scott Ananian
153a415742 Use a SectionMetadata object in Linker::generateTOC()
This updates Linker::generateTOC() so it uses a TOCData object, not
a "legacy" associative array.

Change-Id: I8fa83afd17b769df69bdd61ebd1b2ef3fe8b540f
2023-02-09 23:20:52 -05:00
C. Scott Ananian
38767bcabf Temporarily preserve TOC top-level extension data
The TOCData should be serialized with the JsonCodec which will also
allow preserving the TOC top-level extension data.  But for now, use a
hack to ensure it is not lost when we use the "legacy" associative
array format to serialize/deserialize TOCData.

Change-Id: I67397c49f2d0764e5c755101264631bea6603e16
2023-02-10 04:16:14 +00:00
C. Scott Ananian
439656e019 Generate TOC HTML on demand in ParserOutput::getText()
* Rather than computing TOC HTML in Parser and setting it in
  ParserOutput, compute it on demand based on section metadata.

  This will let Parsoid set section metadata in ParserOutput
  and have the TOC generated automatically.

* This required fixing some "bugs" in Linker's generateTOC
  which didn't properly close tags and relied on Tidy to fix
  up unclosed li and ul tags.

* This patch relies on converting section metadata objects to
  array objects, but Linker::generateTOC could be converted to
  use TOC data instead.

* Since TOC generation is now moved to getText(), this is done
  post-PC load and this eliminates the parser cache split on
  user language for TOC heading localization.

Bug: T293513
Change-Id: Ief1bba326d3612b40930440c872a61abadffab10
2023-01-25 16:42:16 -05:00
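
Illustrative sketch for 439656e019 (not taken from the patch): one way to build a nested TOC list from section metadata while closing every `li` and `ul` explicitly, so nothing depends on a tidy pass. The `(toclevel, title)` tuples and the `generate_toc` name are assumptions for this sketch, and it assumes levels start at 1 and never rise by more than one step at a time.

```python
from html import escape


def generate_toc(sections):
    """Render (toclevel, title) pairs as nested <ul>/<li> markup with all tags closed."""
    out = []
    prev = 0
    for level, title in sections:
        if level > prev:
            # Going deeper: open one list per level gained.
            out.extend("<ul>" for _ in range(level - prev))
        else:
            # Same level or shallower: close the previous item,
            # then each nested list together with its parent item.
            out.append("</li>")
            out.extend("</ul></li>" for _ in range(prev - level))
        out.append(f"<li>{escape(title)}")
        prev = level
    if prev > 0:
        # Close the last item and unwind every list that is still open.
        out.append("</li>")
        out.extend("</ul></li>" for _ in range(prev - 1))
        out.append("</ul>")
    return "".join(out)


toc_html = generate_toc([(1, "A"), (2, "B"), (1, "C")])
# toc_html == "<ul><li>A<ul><li>B</li></ul></li><li>C</li></ul>"
```

Because the markup is derived from metadata at output time, the same stored sections could be rendered differently per request, which is the property the commit relies on.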
jenkins-bot
8220c7dce3 Merge "Generate/set/get TOCData/SectionMetadata objects instead of arrays" 2023-01-19 21:36:56 +00:00
Subramanya Sastry
d8d6ecd39f Generate/set/get TOCData/SectionMetadata objects instead of arrays
* ParserOutput::setSections()/::getSections() are expected
  to be deprecated. Uses in extensions and skins will need to be
  migrated in follow up patches once the new interface has stabilized.

* In the skins code, the metadata is converted back to an array.
  Downstream skin TOC consumers will need to be migrated as well
  before we can remove the toLegacy() conversion.

* Fixed SerializationTestTrait's validation method
  - Not sure if this is overkill but should handle all future
    complex objects we might stuff into the ParserCache.

* This patch emits a backward-compatible Sections property in order to
  avoid changing the parser cache serialization format. T327439 has
  been filed to eventually use the JsonCodec support for object
  serialization, but for this initial patch it makes sense to avoid
  the need for a concurrent ParserCache format migration by using a
  backward-compatible serialization.

* TOCData is nullable because the intent is that
  ParserOutput::setTOCData() is MW_MERGE_STRATEGY_WRITE_ONCE; that is,
  only the top-level fragment composing a page will set the TOCData.
  This will be enforced in the future via wfDeprecated() (T327429),
  but again our first patch is as backward-compatible as possible.

Bug: T296025
Depends-On: I1b267d23cf49d147c5379b914531303744481b68
Co-Authored-By: C. Scott Ananian <cananian@wikimedia.org>
Co-Authored-By: Subramanya Sastry <ssastry@wikimedia.org>
Change-Id: I8329864535f0b1dd5f9163868a08d6cb1ffcb78f
2023-01-19 16:18:13 -05:00
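
Illustrative sketch for d8d6ecd39f (not taken from the patch): a nullable, write-once style field, where `None` means no fragment has supplied a TOC yet and a second write is tolerated but flagged. `OutputSketch` and its methods are hypothetical, and the warning stands in for the future wfDeprecated() enforcement the message mentions.

```python
import warnings


class OutputSketch:
    def __init__(self):
        self._toc_data = None  # None: no fragment has set a TOC yet

    def set_toc_data(self, toc_data):
        """Store the TOC, warning if it had already been set once."""
        if self._toc_data is not None:
            warnings.warn(
                "TOC data set more than once",
                DeprecationWarning,
                stacklevel=2,
            )
        self._toc_data = toc_data

    def get_toc_data(self):
        return self._toc_data
```

Keeping the second write working, rather than raising, matches the message's stated goal of staying as backward-compatible as possible in the first patch.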
C. Scott Ananian
96e4f5d840 JsonCodec: fix en/decoding of nested objects and stdClass objects
Add a type annotation when encoding `stdClass` objects so that we can
be sure to decode them as objects instead of arrays.

This avoids issues such as that seen in the Graph extension (T312589)
where an extension data key is stored as a stdClass.  If ParserOutput
was computed fresh, a subsequent getExtensionData(..) call will return
a stdClass object, but if the ParserOutput was cached, getExtensionData()
would return an array.  After this change the return type is always
consistent.

Properly handle nested objects: encode all object values returned by
JsonSerializable::jsonSerialize() (so that client is not responsible
for implementing this correctly), and decode all object values *before*
calling JsonUnserializable::newFromJsonArray (again, so that the
client is not responsible for decoding its property values).  The new
behavior matches how serialize/unserialize is handled in the 'naive'
JsonUnserializable{Sub,Super}Class test cases; ParserOutput (the only
user of JsonCodec in core) was doing an extra manual decode for
the ExtensionData array in ParserOutput::initFromJson that is no longer
necessary.

The GrowthExperiments and SemanticMediaWiki extensions were working
around the non-recursive nature of JsonCodec; this patch depends on
patches to GrowthExperiments to make it agnostic about whether object
unserialization occurs before or after ::newFromJsonArray() is called,
which can then be further cleaned up once this is released.
A pull request for SemanticMediaWiki has also been submitted.

Bug: T312589
Depends-On: I3413609251f056893d3921df23698aeed40754ed
Change-Id: Id7d0695af40b9801b42a9b82f41e46118da288dc
2023-01-12 14:12:32 -05:00
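
Illustrative sketch for 96e4f5d840 (not taken from the patch): tagging plain objects during encoding so that a JSON round trip can tell them apart from arrays, with nested values decoded before their parent. `SimpleNamespace` stands in for PHP's stdClass and `dict` for a PHP array; the `_type_` marker name is an assumption made for this sketch.

```python
import json
from types import SimpleNamespace

TYPE_KEY = "_type_"  # hypothetical marker name, chosen for this sketch


def encode(value):
    """Recursively convert a value into JSON-safe data, tagging plain objects."""
    if isinstance(value, SimpleNamespace):
        tagged = {key: encode(item) for key, item in vars(value).items()}
        tagged[TYPE_KEY] = "stdClass"
        return tagged
    if isinstance(value, dict):
        return {key: encode(item) for key, item in value.items()}
    if isinstance(value, list):
        return [encode(item) for item in value]
    return value


def decode(value):
    """Recursively rebuild values; children are decoded before their parent."""
    if isinstance(value, list):
        return [decode(item) for item in value]
    if isinstance(value, dict):
        if value.get(TYPE_KEY) == "stdClass":
            fields = {k: decode(v) for k, v in value.items() if k != TYPE_KEY}
            return SimpleNamespace(**fields)
        return {k: decode(v) for k, v in value.items()}
    return value


fresh = {"graph": SimpleNamespace(width=400, axes=[SimpleNamespace(name="x")])}
cached = decode(json.loads(json.dumps(encode(fresh))))
```

Without the marker, `cached["graph"]` would come back as a plain dict, which is the fresh-versus-cached inconsistency the message describes.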
jenkins-bot
ece6ba5417 Merge "ParserOutput: point to documentation for serialization compatibility." 2023-01-03 18:27:59 +00:00
daniel
f2febebb30 ParserOutput: point to documentation for serialization compatibility.
Any changes to the way ParserOutput is serialized must follow the
instructions at
<https://www.mediawiki.org/wiki/Manual:Parser_cache/Serialization_compatibility>.

Change-Id: Ic16a6804ca0a65f8f9abbc3112359cc239febde3
2023-01-03 19:08:22 +01:00
Amir Sarabadani
523ab7cff8 Reorg: Move RawMessage to under language/
To follow Message. This is approved as part of RFC T166010.

Also namespace it but doing it properly with PSR-4 would require
namespacing every class under language/ and that will take some time.

Bug: T321882
Change-Id: I195cf4c67bd51410556c2dd1e33cc9c1033d5d18
2022-12-16 11:30:19 +01:00
Matěj Suchánek
a592d47e91 Clean up redundant array manipulation
PHP does this implicitly.

Change-Id: I009a7c93d44fb5e8c430c971cfc637fa04a8e68d
2022-12-11 12:42:29 +01:00
Amir Sarabadani
2d60ba0c63 Reorg: Move DummyLinker and Linker to linker/
This feels like a no-brainer unless I'm missing something obvious.

Bug: T321882
Change-Id: Id49c3d0dd6ea4593211048850856b5b8e05a8fb3
2022-12-08 06:38:17 +01:00
Umherirrender
1b342a8893 Various doc fixes about false and null on method arguments/return types
Doc-only changes

Change-Id: Ice974b3ba41708859dfe646e94b31c5ebbf26410
2022-11-03 18:55:47 +01:00
Tim Starling
0077c5da15 Use short array destructuring instead of list()
Introduced in PHP 7.1. Because it's shorter and looks nice.

I used regex replacement.

Change-Id: I0555e199d126cd44501f859cb4589f8bd49694da
2022-10-21 15:33:37 +11:00
thiemowmde
d81f01e417 Replace various array type hints with more specific string[]
There are many, many more. I touch only a few where I'm sure it's
never anything but an array of strings.

Change-Id: I8b798f2e9d48f07a241b95ce0ace8fa9d981695d
2022-09-27 09:24:22 +02:00
Umherirrender
5c5498a202 Remove unused key variable from foreach loops
Change-Id: Id2d91e30a6f7cc4eb93427b50efc1c5c77f14b75
2022-09-21 21:18:43 +02:00
C. Scott Ananian
6c242a8a11 OutputPage::addParserOutputText(): use default ParserOutput options from skin
This addresses the common case patched by
I530d71d0f9279b40a263cd62467d3ef8c76975c3,
If6267f3389b166043fc94d7f952bc54122b1a378 and probably
the code in Article.php from I44045b3b9e78e7ab793da3f37e3c0dbc91cd7d39
by ensuring that "injectTOC" in the options passed to
ParserOutput::getText() defaults to the correct value based on the skin
being used by OutputPage.

Bug: T317333
Change-Id: Ica30569efbb5730eff5b807e8fc34beb2e13e74f
2022-09-08 15:46:23 -04:00
jenkins-bot
6d840fa896 Merge "ParserOutput::mergeMapStrategy - use a more robust comparison for objects" 2022-07-21 02:51:31 +00:00
Umherirrender
e00a52e6f5 Clean up line indent with mixed tabs and whitespaces
Change-Id: Ifcd15ecc4212d4ebfc26b2e18d6f1da47abf2a86
2022-07-09 22:21:53 +02:00
C. Scott Ananian
541542e588 ParserOutput::mergeMapStrategy - use a more robust comparison for objects
Map values can include JsonUnserializable objects, and strict
(reference) equality comparison of these objects is not going to
reflect value equality.  Serialize the values and compare strings
instead; this case should be hit very infrequently given that
rewriting the same extension data key is discouraged.

Bug: T312588
Change-Id: I942e7fa662b2f1a5e32fd55ef65eaa10a22afcfb
2022-07-08 16:07:18 +00:00
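
Illustrative sketch for 541542e588 (not taken from the patch): comparing two values by their serialized form rather than by object identity. `Point`, `same_value` and `merge_map` are hypothetical names, and treating a mismatch as an error is an assumption of this sketch rather than a statement about the real merge code.

```python
import json


class Point:
    """Stand-in for a JSON-serializable value object."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def to_json(self):
        return {"x": self.x, "y": self.y}


def same_value(a, b):
    """Compare by serialized form, so distinct objects with equal state match."""
    def dump(value):
        return json.dumps(value, default=lambda obj: obj.to_json(), sort_keys=True)
    return dump(a) == dump(b)


def merge_map(dest, src):
    """Merge src into a copy of dest, rejecting keys whose values really differ."""
    merged = dict(dest)
    for key, value in src.items():
        if key in merged and not same_value(merged[key], value):
            raise ValueError(f"conflicting values for key {key!r}")
        merged[key] = value
    return merged
```

Two separately constructed `Point(1, 2)` objects are not equal under the default identity-based comparison, but `same_value` accepts them, so merging the same data twice does not raise.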
C. Scott Ananian
577879841c ParserOutput::mergeMapStrategy: don't crash if merging non-array values
The PHP `isset(...)` construct covers a multitude of possible "wrong
types" for the left hand side of an array access, but it still crashes
(with "Cannot use object of type stdClass as array") if the left hand
side is an object.

Bug: T312242
Change-Id: I35026c573fb941004764d46d5652ebcddc559c03
2022-07-07 15:02:57 +00:00
jenkins-bot
3ed9d3a6f9 Merge "Use the same tooltip for transcluded sections as normal ones" 2022-06-22 18:31:43 +00:00
daniel
697f28df32 ParserCache: always use JSON
When JSON support was introduced into ParserCache in 1.36, it was
controlled by a feature flag, $wgParserCacheUseJson. The feature flag
was "born deprecated" in 1.36. It can now be removed.

This means that ParserCache will always store entries as JSON.
Support for reading old non-JSON entries remains intact.
This is needed when updating wikis from a version older than 1.36
to the current version.

Change-Id: Id04e42bfb458d98414bac50e0d6c505e8878e5c0
2022-06-07 15:19:45 +02:00
Isabelle Hurbain-Palatin
1277d9f154 Still collect metadata on multiple writes
Follow-up to I9d1f0f6bab1305552a0350667d6142a24bc04049. That patch was
not collecting data at all (not even overwriting them over and over
again) - the assignment operation was, in practice, a NOP. This patch
fixes this.

Bug: T303014
Bug: T303015
Change-Id: I7d09b532f3270edf4327c16e032d665353d992f6
2022-05-17 11:14:51 -04:00