394 commits

Author SHA1 Message Date
C. Scott Ananian
bc9c20c733 Deprecate the use of nonserializable arguments to ParserOutput::addWarningMsg()
Bug: T343048
Change-Id: If026926405b96d76faec6ad40f6cd45c4ec5d4a0
2023-08-07 11:57:38 -04:00
C. Scott Ananian
7a8dd531b2 Remove ParserOutput::addWarning, deprecated since 1.38
Replaced with ParserOutput::addWarningMsg()

Bug: T305161
Change-Id: I137b35a2e8250ea7c10059d04071a98a4f968038
2023-08-07 11:57:07 -04:00
jenkins-bot
3774bcf477 Merge "Rename 'bodyOnly' option to ParserOutput::getText()" 2023-08-02 17:58:28 +00:00
jenkins-bot
549961495b Merge "Hard-deprecate ParserOutput::{get,set}Flag()" 2023-08-02 17:48:18 +00:00
jenkins-bot
422973120e Merge "ParserOutput::addModules,addModuleStyles(): first arg must be array" 2023-08-02 14:24:10 +00:00
C. Scott Ananian
2aad6af983 ParserOutput::addModules,addModuleStyles(): first arg must be array
Use strong PHP type hint on argument to enforce that the first parameter
must be an array; formerly we allowed a string as well.  Non-array
arguments have been deprecated since 1.38 but this allows us to actually
clean up the code a bit.

Bug: T305161
Change-Id: I1566609990524e48faf1fa36079e2f4a4642979d
2023-07-31 18:45:47 -04:00
C. Scott Ananian
7cb30eceb3 Remove Parsoid back-compat code
Now that the latest Parsoid has been released to mediawiki-vendor,
the method_exists() calls aren't necessary.

Bug: T343155
Followup-To: I9da2566cc003e2f05cae16229444dcf3baf61fa4
Change-Id: I081225a268d608f763814245f9cab1c44bf49bad
2023-07-31 18:07:51 -04:00
Umherirrender
511842f9f9 parser: Remove phan-suppression after parsoid 0.18.0-a20 update
The method_exists() checks are kept, as it is unclear whether old objects are still in any cache

Follow-Up: I9da2566cc003e2f05cae16229444dcf3baf61fa4
Bug: T343155
Change-Id: I0aaa3dce26df1619bedc39696a115145a61d4d14
2023-07-31 22:01:08 +02:00
jenkins-bot
21a9ff5430 Merge "Remove ParserOutput::hideNewSection, deprecated since 1.38" 2023-07-30 14:23:51 +00:00
C. Scott Ananian
abee9b61f0 Rename 'bodyOnly' option to ParserOutput::getText()
In Parsoid 'body only' means the <body> tag and all of its contents.

In ParserOutput::getText() the option means "just the contents of the
<body> tag" so give it a slightly different name.

Change-Id: I04e56ff2c3e03eb56b919d9ac09b5820e4badb21
2023-07-28 23:33:29 +00:00
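
The distinction drawn in abee9b61f0 can be illustrated with a small sketch. This is Python rather than MediaWiki's PHP; the helper names `body_element` and `body_contents` are hypothetical, and the patterns are simplified assumptions, not the regexps used in ParserOutput::getText().

```python
import re

# Hypothetical, simplified patterns: they assume one well-formed <body> element.
_BODY_ELEMENT = re.compile(r"<body\b[^>]*>.*?</body>", re.DOTALL | re.IGNORECASE)
_BODY_CONTENTS = re.compile(r"<body\b[^>]*>(.*?)</body>", re.DOTALL | re.IGNORECASE)


def body_element(html: str) -> str:
    """Parsoid-style 'body only': the <body> tag and all of its contents."""
    match = _BODY_ELEMENT.search(html)
    return match.group(0) if match else html


def body_contents(html: str) -> str:
    """getText()-style meaning: just what sits inside the <body> tag."""
    match = _BODY_CONTENTS.search(html)
    return match.group(1) if match else html


doc = '<html><head><title>T</title></head><body class="x"><p>Hi</p></body></html>'
print(body_element(doc))
print(body_contents(doc))
```

Both helpers fall back to returning the input unchanged when no <body> element is found; that fallback is a choice made for this sketch only.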
C. Scott Ananian
2ed1977c5c ParserOutput: use consistent delimiters in bodyOnly regexps
This is a minor style cleanup; parentheses as regexp delimiters
are confusing.

Change-Id: Ibc0d63e59e468705fd81ef5172c29edd46a7f3d5
2023-07-28 14:12:35 -04:00
C. Scott Ananian
e22d93a6bb Hard-deprecate ParserOutput::{get,set}Flag()
These were deprecated in 1.38; users are expected to use
ParserOutput::{get,set}OutputFlag() instead, which helps eliminate a
confusing aliasing of many MW methods named "flag".

Original deprecation: 06ab90f163

Code search:
    https://codesearch.wmcloud.org/search/?q=%5BOo%5Dut%28put%29%3F%28%5C%28%5C%29%29%3F-%3E%28g%7Cs%29etFlag%5C%28&i=nope&files=&excludeFiles=&repos=

Patches for non-production extensions:
 PageProperties: I592d43e2c912df635cd9162180ed20a6136535f1
 CIForms: I238a6c557891bb6d271d2641261ef69542b7957e

Bug: T292868
Bug: T305161
Change-Id: I4525443ab0932241b0cf64ab606f7ab7d6d70b6e
2023-07-28 13:51:02 -04:00
C. Scott Ananian
075f8661e6 Remove ParserOutput::hideNewSection, deprecated since 1.38
Replaced with ParserOutput::setHideNewSection()

Bug: T305161
Change-Id: Ib1fe11fc7c10b2e46569948545ef48d66804ca0d
2023-07-28 13:45:18 -04:00
jenkins-bot
5b6e6a2b56 Merge "Record Parsoid version in extension data to allow rollback if necessary" 2023-07-28 17:32:28 +00:00
C. Scott Ananian
ea51801f79 Rename newly-added ParserOutput::appendOutputString() method
Tweaked the pluralization of the newly-added
ParserOutput::appendOutputString() method (now ::appendOutputStrings()
and ::getOutputStrings()), and name of the ParserOutputStrings class
(now ParserOutputStringSets), in an effort to continue repainting
bikesheds until the color is juuuust right.

Also extended the new method to cover ::addModules() and ::addModuleStyles()
and added support for these string sets in ::collectMetadata().

(These methods and the enumeration class were originally added in
b2cfa31eb6173e9f5e8607eadd126c33f8ce440b.)

Depends-On: I8bdffa55498d90e990af5bfc3332e3028b0a3539
Change-Id: Ibd41485d5db7779f01642e2144c50ed49d409812
2023-07-28 12:10:56 -04:00
C. Scott Ananian
0b92c4bedb Record Parsoid version in extension data to allow rollback if necessary
This allows any bad cached parses due to a train deploy to be selectively
rolled back in the RejectParserCacheValue hook, which provides some
operational insurance against corrupted caches.  The version is also
added to the debug information in the HTML footer to aid diagnosis
of any issue in real time.

Depends-On: I3d3caabd959c1ba16f4dc702c2eae38d5d4dcb14
Change-Id: Ibb37a82ec0ce764aefd8c9fab2868073a66301ec
2023-07-27 19:02:24 -04:00
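
The rollback idea in 0b92c4bedb can be sketched roughly as follows. This is Python, not MediaWiki code: `CachedOutput`, `record_version`, `should_reject`, the extension-data key, and the version strings are all hypothetical stand-ins for the real ParserOutput extension data and the RejectParserCacheValue hook.

```python
from dataclasses import dataclass, field

VERSION_KEY = "parsoid-version"  # hypothetical extension-data key


@dataclass
class CachedOutput:
    """Stand-in for a cached parse with free-form extension data."""
    html: str
    extension_data: dict = field(default_factory=dict)


def record_version(output: CachedOutput, version: str) -> None:
    # Stamp the parse with the version of the parser that produced it.
    output.extension_data[VERSION_KEY] = version


def should_reject(output: CachedOutput, bad_versions: set) -> bool:
    # A cache-validation hook could call this to discard only the entries
    # written by a known-bad release; unstamped entries are kept.
    return output.extension_data.get(VERSION_KEY) in bad_versions


good = CachedOutput("<p>ok</p>")
record_version(good, "0.18.0-a19")   # made-up version strings
bad = CachedOutput("<p>broken</p>")
record_version(bad, "0.18.0-a20")
unstamped = CachedOutput("<p>old</p>")
```

Rejecting by version keeps the rest of the cache warm, which is the operational point of the commit message: only parses from the bad deploy are regenerated.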
Isabelle Hurbain-Palatin
b2cfa31eb6 Add append/getOutputString to ParserOutput
This aims at providing an interface similar to setOutputFlag for string
sets, such as the ones used in CSP properties.

Change-Id: I6f103bd88802e66611e483403a2f8a540d54aae9
2023-07-27 11:37:11 +02:00
thiemowmde
3c631a59f2 More specific array type hints in ParserOutput/OutputPage
Change-Id: I7dbecebb8b26e57afda13f46d3b895f085c4e95e
2023-07-03 15:52:18 +02:00
Subramanya Sastry
0e9656e6da Add return type to getIndicators() in ParserOutput & OutputPage
This is in preparation for changes on the Parsoid side to make
sure its signature is compatible with the ContentMetadataCollector
interface there.

Change-Id: Ife4ae81dbc304097da7dcba40b143f7030b959f3
2023-06-02 16:13:01 +05:30
jenkins-bot
e1c1632d9c Merge "ParserOutput: Ensure page title is updated after merging properties" 2023-05-11 18:23:20 +00:00
Umherirrender
e04d3a28f6 Replace internal Hooks::runner
The Hooks class contains deprecated functions and the whole class is
going to get removed, so remove the convenience function and inline the
code.

Bug: T335536
Change-Id: I8ef3468a64a0199996f26ef293543fcacdf2797f
2023-05-11 06:17:38 +00:00
Subramanya Sastry
632481c382 ParserOutput: Ensure page title is updated after merging properties
Eventually we should merge the "title text" and "display title" in
ParserOutput (T293514) but for now mirror the logic in
ParserOutput::mergeHtmlMetadataFrom() and update the title text
from the source if it hasn't already been set in the destination.
This patch ensures that after page properties are merged during
metadata collection, the title text is suitably updated if the
'displaytitle' property is set.

This will let Parsoid pass displaytitle (metadata) tests in integrated
mode since Parsoid relies on merging metadata from multiple ParserOutput
objects (in the DataAccess object that is used to expand templates, etc.)

Once this patch is merged, Parsoid patches may start failing CI till
we submit a patch there to fix up the integrated test failures list
since some previously failing tests may now pass.

Bug: T293514
Bug: T294621
Change-Id: Ia673f1261ccd03caf455122b71cfb9769b02f22e
2023-05-10 08:53:41 +00:00
jenkins-bot
c5152db020 Merge "Remove back-compat for <editsection>" 2023-04-28 15:59:12 +00:00
Subramanya Sastry
3e297c43ad Fix breakages generating TOC for API Help pages
* TOCData in Parsoid expects to process non-string-key indexed arrays.

* Don't use 'null' as the default for maxtoclevel to ensure that
  TOC is always displayed even when it isn't passed in as a param
  by callers.

* Follows up on 05535be6 which only partially fixed the breakage
  caused by 153a4157 and 439656e0

Bug: T334551
Change-Id: I8883b58574ea8ed0566de2c44dba3408a47d2d0c
2023-04-12 15:37:03 -05:00
jenkins-bot
90997943f9 Merge "Parser: Remove back-compatibility NO_TOC_CONVERSION code" 2023-03-27 20:43:53 +00:00
C. Scott Ananian
cfd9c516e1 Allow setting a ParserOption to generate Parsoid HTML
This is an initial quick-and-dirty implementation.  The
ParsoidParser class will eventually inherit from \Parser,
but this is an initial placeholder to unblock other Parsoid
read views work.

Currently Parsoid does not fully implement all the ParserOutput
metadata set by the legacy parser, but we're working on it.

This patch also addresses T300325 by ensuring the Page HTML
APIs use ParserOutput::getRawText(), which will return the entire
Parsoid HTML document without post-processing.  This is what
the Parsoid team refers to as "edit mode" HTML. The
ParserOutput::getText() method returns only the <body> contents
of the HTML, and applies several transformations, including
inserting Table of Contents and style deduplication; this is
the "read views" flavor of the Parsoid HTML.

We need to be careful of the interaction of the `useParsoid` flag with
the ParserCacheMetadata.  Effectively `useParsoid` should *always* be
marked as "used" or else the ParserCache will assume its value doesn't
matter and will serve legacy content for parsoid requests and
vice-versa.  T330677 is a follow up to address this more thoroughly by
splitting the parser cache in ParserOutputAccess; the stop gap in this
patch is fragile and, because it doesn't fork the ParserCacheMetadata
cache, may corrupt the ParserCacheMetadata in the case when Parsoid
and the legacy parser consult different sets of options to render a
page.

Bug: T300191
Bug: T330677
Bug: T300325
Change-Id: Ica09a4284c00d7917f8b6249e946232b2fb38011
2023-03-26 21:46:05 -04:00
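
The cache hazard described in cfd9c516e1 can be shown with a toy model. This is a Python sketch under stated assumptions, not the ParserCache implementation: it assumes the cache key is derived only from the options recorded as "used", and the `cache_key` helper and the `userlang` option name are hypothetical.

```python
def cache_key(page: str, options: dict, used: set) -> str:
    """Build a key from only those options that were recorded as used."""
    parts = [f"{name}={options[name]}" for name in sorted(used)]
    return "!".join([page] + parts)


legacy = {"useParsoid": False, "userlang": "en"}
parsoid = {"useParsoid": True, "userlang": "en"}

# If useParsoid is never marked as used, both requests share one cache slot,
# so legacy content can be served for a Parsoid request and vice versa.
used_without_flag = {"userlang"}

# Always marking useParsoid as used keeps the two renderings apart.
used_with_flag = {"userlang", "useParsoid"}

print(cache_key("Main_Page", legacy, used_without_flag))
print(cache_key("Main_Page", parsoid, used_with_flag))
```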
C. Scott Ananian
8aae904254 Parser: Remove back-compatibility NO_TOC_CONVERSION code
The TOC used to be language-converted in ParserOutput::getText(), but
it wasn't possible to apply custom rules defined in the wikitext
article body at ::getText() time.  Remove the various hacks that we'd
added in an attempt to do so, which were made unnecessary by
I321cd31dae64bbf845d53282e5d28a55bc4ec319.

Bug: T306862
Change-Id: Ib12cd02e9ade91d5794462e8833f2aa3b45a51f2
2023-03-24 22:14:42 +00:00
C. Scott Ananian
99e9d4927f Remove back-compat for <editsection>
The tag has been <mw:editsection> since at least 2011 (f0fd318a4e), so
we no longer need to include the ancient <editsection> variant in our
regexp and test cases.

Change-Id: I5fd783556810ea13b07a69066ea6762d1a1863e1
2023-03-15 13:53:01 -04:00
jenkins-bot
6de76f1fad Merge "Add ParserOutput::getLanguage()" 2023-03-13 14:18:47 +00:00
C. Scott Ananian
29853113f7 Deprecate ParserOutput::{get,set}TOCHTML()
No uses in deployed code outside mediawiki-core:

 https://codesearch.wmcloud.org/deployed/?q=%5Bgs%5DetTOCHTML%5C%28&i=nope&files=&excludeFiles=&repos=

Bug: T293513
Change-Id: I3fd82150ac581afbeb94f401672702063586fff0
2023-03-10 20:34:33 -05:00
C. Scott Ananian
183a6da420 Add ParserOutput::getLanguage()
Provide a way for backend code to determine the primary language of a
ParserOutput, eg for setting the Content-Language header of an API
response.

This is read-only and backed by extension data at the moment for
transition purposes; if this API sticks we'll graduate it to a
"real" property in the future, with appropriate serialization
to/from JSON (T303329).

Similarly, this patch only includes the most basic code to handle
the various ParserOutput merge cases in
ParserOutput::merge{Internal,Html,Tracking}MetaDataFrom(),
ParserOutput::collectMetadata(), and
OutputPage::addParserOutput{Content,Metadata,Text,}(); mostly
inherited from the fact that the storage is backed by extension
data at the moment.

Generally only the "top-level" parser output gets to set the
primary language; we'll presumably need to ensure that the
language is consistent during merge.

Change-Id: I767daba22805a877d9b806fd77334e508902844b
2023-03-10 18:42:29 -05:00
C. Scott Ananian
d2446a77dd Deprecate ParserOutput::getCategories()
This undocumented method returns a reference to ParserOutput's private
storage array, yet very few callers actually require a reference or try
to use this to mutate the internal storage.  Further, the keys of the
array can be converted to `int` when the category names are numeric,
which can further confuse users.  Most users found through codesearch
can/should use ::getCategoryNames() instead.

Add a new ::getCategorySortKey() method to provide access to the sort
keys for those few callers who require them, in a manner which doesn't
expose that the internal `mCategories` array stores numeric category
names as 'int'.

Bug: T331727
Change-Id: I8dc85e76bfbb9ed49a603d990c14b7ee798bd821
2023-03-10 10:02:42 -05:00
C. Scott Ananian
e34b25a09f Ensure categories are returned as strings
Numeric category strings like '1' are converted to ints when they are
used as array keys.  Convert back to strings as needed to ensure this
doesn't surprise any clients.

Bug: T331084
Change-Id: Ib39707216d213e414c09226a6378047ffaf43892
2023-03-10 10:02:23 -05:00
James D. Forrester
ad06527fb4 Reorg: Namespace the Title class
This is moderately messy.

Process was principally:

* xargs rg --files-with-matches '^use Title;' | grep 'php$' | \
  xargs -P 1 -n 1 sed -i -z 's/use Title;/use MediaWiki\\Title\\Title;/1'
* rg --files-without-match 'MediaWiki\\Title\\Title;' . | grep 'php$' | \
  xargs rg --files-with-matches 'Title\b' | \
  xargs -P 1 -n 1 sed -i -z 's/\nuse /\nuse MediaWiki\\Title\\Title;\nuse /1'
* composer fix

Then manual fix-ups for a few files that don't have any use statements.

Bug: T166010
Follows-Up: Ia5d8cb759dc3bc9e9bbe217d0fb109e2f8c4101a
Change-Id: If8fc9d0d95fc1a114021e282a706fc3e7da3524b
2023-03-02 08:46:53 -05:00
jenkins-bot
9a96857757 Merge "Reorg: Move HTML-related classes out of includes/ to Html/" 2023-02-21 15:37:53 +00:00
Kosta Harlan
b16d2b7fc9 ParserOutput: Don't assume that TOC extension data exists
When running PHPUnit integration tests locally for
Extension:GrowthExperiments, $toc['extensionData'] isn't
defined, leading to failures for various tests.

Follows-Up: I67397c49f2d0764e5c755101264631bea6603e16
Change-Id: I3ef45a86c236863dbeafbd121f1a5951947c5dc6
2023-02-17 09:44:23 +01:00
Amir Sarabadani
7d8768e931 Reorg: Move HTML-related classes out of includes/ to Html/
Bug: T321882
Change-Id: I5dc1f7e9c303cd3f5b9dd7010d6bb470d8400a18
2023-02-16 20:40:01 +01:00
jenkins-bot
855004747a Merge "Ensure CacheTime properties are reflected by ParserOutput::collectMetadata" 2023-02-13 22:04:40 +00:00
C. Scott Ananian
fc62d1325d Ensure CacheTime properties are reflected by ParserOutput::collectMetadata
In order to break a cyclic dependency, Parsoid doesn't know about
core's `ParserOutput` class; it defines its own
`ContentMetadataCollector` interface which exposes those portions
of the ParserOutput metadata which the parser needs to supply.

Other bits of the ParserOutput metadata are specific to MediaWiki
internals and Parsoid doesn't have to explicitly know about them:
extensions and core implementations of parser functions (eg) can
take the ContentMetadataCollector supplied by Parsoid and downcast
it back to a ParserOutput in order to propagate internal information
(like ParserCache lifetimes) "behind Parsoid's back" - aka, without
violating abstraction boundaries by exposing every implementation
detail of MediaWiki to Parsoid.

When Parsoid calls into core to expand magic words like
`currenttimestamp` they update the cache TTL in the ParserOutput using
this mechanism.  Using ParserOutput::collectMetadata() ensures these
values are propagated to the final ParserOutput, even though Parsoid
doesn't (shouldn't have to) explicitly know about them.

Bug: T329067
Change-Id: Ia92efff4293841330674df09e82897d0775ef4d6
2023-02-13 16:41:08 -05:00
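
The "behind Parsoid's back" mechanism from fc62d1325d can be sketched as below. This is Python, not the real classes: `MetadataCollector` and `HostOutput` are hypothetical stand-ins for ContentMetadataCollector and ParserOutput, the 3600-second TTL is made up, and keeping the smallest TTL is an assumption of this sketch.

```python
from abc import ABC, abstractmethod


class MetadataCollector(ABC):
    """Narrow interface: all the parser library is allowed to know about."""

    @abstractmethod
    def add_category(self, name: str) -> None: ...


class HostOutput(MetadataCollector):
    """Richer host-side object with internals the library never sees."""

    def __init__(self) -> None:
        self.categories = []
        self.cache_ttl = None

    def add_category(self, name: str) -> None:
        self.categories.append(name)

    def update_cache_ttl(self, seconds: int) -> None:
        # Assumption: the shortest requested lifetime wins.
        if self.cache_ttl is None or seconds < self.cache_ttl:
            self.cache_ttl = seconds


def expand_current_timestamp(collector: MetadataCollector) -> str:
    """Host-side magic word: downcast to reach the cache lifetime."""
    if isinstance(collector, HostOutput):
        collector.update_cache_ttl(3600)
    return "20230213000000"


def collect_metadata(source: HostOutput, target: HostOutput) -> None:
    """Copy both the public metadata and the host-only cache lifetime."""
    for name in source.categories:
        target.add_category(name)
    if source.cache_ttl is not None:
        target.update_cache_ttl(source.cache_ttl)


scratch = HostOutput()
expand_current_timestamp(scratch)
final = HostOutput()
final.update_cache_ttl(86400)
collect_metadata(scratch, final)
```

The library only ever handles the narrow interface, yet the shortened lifetime still reaches the final output because the merge step copies it.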
jenkins-bot
91e9cccc04 Merge "Use a SectionMetadata object in Linker::generateTOC()" 2023-02-10 22:48:18 +00:00
jenkins-bot
eaa368f09d Merge "Remove back-compatibility code for ToC marker" 2023-02-10 20:50:13 +00:00
C. Scott Ananian
d5b39490ca Remove back-compatibility code for ToC marker
Before 1.39 we used <mw:toc> and in 1.39 we switched to <mw:tocplace/>
(commit 24949480eb).  This was changed to a <meta> tag in 1.40 (commit
0b10563895 and fa8646ca7b) and the old content has long since expired
from the ParserCache.  Clean up the old ParserCache transition code.

Change-Id: I3254d0acba31e107b50767797a2b0ad28aba59ee
2023-02-10 00:03:54 -05:00
C. Scott Ananian
153a415742 Use a SectionMetadata object in Linker::generateTOC()
This updates Linker::generateTOC() so it uses a TOCData object, not
a "legacy" associative array.

Change-Id: I8fa83afd17b769df69bdd61ebd1b2ef3fe8b540f
2023-02-09 23:20:52 -05:00
C. Scott Ananian
38767bcabf Temporarily preserve TOC top-level extension data
The TOCData should be serialized with the JsonCodec which will also
allow preserving the TOC top-level extension data.  But for now, use a
hack to ensure it is not lost when we use the "legacy" associative
array format to serialize/deserialize TOCData.

Change-Id: I67397c49f2d0764e5c755101264631bea6603e16
2023-02-10 04:16:14 +00:00
C. Scott Ananian
439656e019 Generate TOC HTML on demand in ParserOutput::getText()
* Rather than computing TOC HTML in Parser and setting it in
  ParserOutput, compute it on demand based on section metadata.

  This will let Parsoid set section metadata in ParserOutput
  and have the TOC generated automatically.

* This required fixing some "bugs" in Linker's generateTOC
  which didn't properly close tags and relied on Tidy to fix
  up unclosed li and ul tags.

* This patch relies on converting section metadata objects to
  array objects, but Linker::generateTOC could be converted to
  use TOC data instead.

* Since TOC generation is now moved to getText(), this is done
  post-PC load and this eliminates the parser cache split on
  user language for TOC heading localization.

Bug: T293513
Change-Id: Ief1bba326d3612b40930440c872a61abadffab10
2023-01-25 16:42:16 -05:00
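
Generating a TOC from section metadata with every tag explicitly closed, as 439656e019 describes, might look like the following. This is a Python sketch, not Linker::generateTOC(): the `render_toc` name and the `(level, title)` input shape are hypothetical, and clamping level jumps to one step is an assumption made here to keep the nesting valid.

```python
from html import escape


def render_toc(sections):
    """Render (level, title) pairs as nested lists, closing every tag."""
    out = []
    depth = 0
    for level, title in sections:
        # Assumption: never descend more than one level at a time.
        level = max(1, min(level, depth + 1))
        if level > depth:
            out.append("<ul>")
            depth += 1
        else:
            out.append("</li>")            # close the previous entry
            while depth > level:
                out.append("</ul></li>")   # close a nested list and its parent
                depth -= 1
        out.append("<li>" + escape(title))
    while depth > 0:
        out.append("</li></ul>")           # close whatever is still open
        depth -= 1
    return "".join(out)


print(render_toc([(1, "Intro"), (2, "History"), (1, "Usage")]))
```

Because nothing is left for a tidy pass to repair, the output can be produced at any point after the cached parse is loaded.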
jenkins-bot
8220c7dce3 Merge "Generate/set/get TOCData/SectionMetadata objects instead of arrays" 2023-01-19 21:36:56 +00:00
Subramanya Sastry
d8d6ecd39f Generate/set/get TOCData/SectionMetadata objects instead of arrays
* ParserOutput::setSections()/::getSections() are expected
  to be deprecated. Uses in extensions and skins will need to be
  migrated in follow up patches once the new interface has stabilized.

* In the skins code, the metadata is converted back to an array.
  Downstream skin TOC consumers will need to be migrated as well
  before we can remove the toLegacy() conversion.

* Fixed SerializationTestTrait's validation method
  - Not sure if this is overkill but should handle all future
    complex objects we might stuff into the ParserCache.

* This patch emits a backward-compatible Sections property in order to
  avoid changing the parser cache serialization format. T327439 has
  been filed to eventually use the JsonCodec support for object
  serialization, but for this initial patch it makes sense to avoid
  the need for a concurrent ParserCache format migration by using a
  backward-compatible serialization.

* TOCData is nullable because the intent is that
  ParserOutput::setTOCData() is MW_MERGE_STRATEGY_WRITE_ONCE; that is,
  only the top-level fragment composing a page will set the TOCData.
  This will be enforced in the future via wfDeprecated() (T327429),
  but again our first patch is as backward-compatible as possible.

Bug: T296025
Depends-On: I1b267d23cf49d147c5379b914531303744481b68
Co-Authored-By: C. Scott Ananian <cananian@wikimedia.org>
Co-Authored-By: Subramanya Sastry <ssastry@wikimedia.org>
Change-Id: I8329864535f0b1dd5f9163868a08d6cb1ffcb78f
2023-01-19 16:18:13 -05:00
C. Scott Ananian
96e4f5d840 JsonCodec: fix en/decoding of nested objects and stdClass objects
Add a type annotation when encoding `stdClass` objects so that we can
be sure to decode them as objects instead of arrays.

This avoids issues such as that seen in the Graph extension (T312589)
where an extension data key is stored as a stdClass.  If ParserOutput
was computed fresh, a subsequent getExtensionData(..) call will return
a stdClass object, but if the ParserOutput was cached, getExtensionData()
would return an array.  After this change the return type is always
consistent.

Properly handle nested objects: encode all object values returned by
JsonSerializable::jsonSerialize() (so that client is not responsible
for implementing this correctly), and decode all object values *before*
calling JsonUnserializable::newFromJsonArray (again, so that the
client is not responsible for decoding its property values).  The new
behavior matches how serialize/unserialize is handled in the 'naive'
JsonUnserializable{Sub,Super}Class test cases; ParserOutput (the only
user of JsonCodec in core) was doing an extra manual decode for
the ExtensionData array in ParserOutput::initFromJson that is no longer
necessary.

The GrowthExperiments and SemanticMediaWiki extensions were working
around the non-recursive nature of JsonCodec; this patch depends on
patches to GrowthExperiments to make it agnostic about whether object
unserialization occurs before or after ::newFromJsonArray() is called,
which can then be further cleaned up once this is released.
A pull request for SemanticMediaWiki has also been submitted.

Bug: T312589
Depends-On: I3413609251f056893d3921df23698aeed40754ed
Change-Id: Id7d0695af40b9801b42a9b82f41e46118da288dc
2023-01-12 14:12:32 -05:00
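
The two ideas in 96e4f5d840, a type annotation for generic objects and recursive encoding and decoding, can be sketched as follows. This is Python, not JsonCodec: `SimpleNamespace` stands in for PHP's stdClass, and the `_type_` key and `object` marker are hypothetical names chosen for the sketch.

```python
import json
from types import SimpleNamespace

TYPE_KEY = "_type_"      # hypothetical annotation key
OBJECT_MARK = "object"   # hypothetical marker for generic objects


def encode(value):
    """Recursively convert to JSON-safe data, annotating generic objects."""
    if isinstance(value, SimpleNamespace):
        encoded = {k: encode(v) for k, v in vars(value).items()}
        encoded[TYPE_KEY] = OBJECT_MARK
        return encoded
    if isinstance(value, dict):
        return {k: encode(v) for k, v in value.items()}
    if isinstance(value, list):
        return [encode(v) for v in value]
    return value


def decode(value):
    """Decode nested values first, then rebuild annotated objects."""
    if isinstance(value, dict):
        decoded = {k: decode(v) for k, v in value.items()}
        if decoded.get(TYPE_KEY) == OBJECT_MARK:
            del decoded[TYPE_KEY]
            return SimpleNamespace(**decoded)
        return decoded
    if isinstance(value, list):
        return [decode(v) for v in value]
    return value


data = {"graph": SimpleNamespace(width=3, inner=SimpleNamespace(x=1)),
        "plain": {"a": 1}}
restored = decode(json.loads(json.dumps(encode(data))))
```

Without the annotation, the object and the plain mapping would be indistinguishable after a round trip through JSON, which is the fresh-versus-cached inconsistency the commit message describes.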
jenkins-bot
ece6ba5417 Merge "ParserOutput: point to documentation for serialization compatibility." 2023-01-03 18:27:59 +00:00
daniel
f2febebb30 ParserOutput: point to documentation for serialization compatibility.
Any changes to the way ParserOutput is serialized must follow the
instructions at
<https://www.mediawiki.org/wiki/Manual:Parser_cache/Serialization_compatibility>.

Change-Id: Ic16a6804ca0a65f8f9abbc3112359cc239febde3
2023-01-03 19:08:22 +01:00