Commit graph

409 commits

Author SHA1 Message Date
Subramanya Sastry
c8d0470f4b Make ParsoidOutputAccess a wrapper over ParserOutputAccess
* Updated ParserOutput to set Parsoid render ids that REST API
  functionality expects in ParserOutput objects.
* CacheThresholdTime functionality no longer exists since it was
  implemented in ParsoidOutputAccess and ParserOutputAccess doesn't
  support it. This is tracked in T346765.
* Enforce the constraint that uncacheable parses are only for fake or
  mutable revisions. Updated tests that violated this constraint to
  use 'getParseOutput' instead of calling the parse method directly.
* Had to make some changes in ParsoidParser around use of preferredVariant
  passed to Parsoid. I also left some TODO comments for future fixes.
  T267067 is also relevant here.

PARSOID-SPECIFIC OPTIONS:
* logLinterData: linter data is always logged by default -- removed
  support to disable it. Linter extension handles stale lints properly
  and it is better to let it handle it rather than add special cases
  to the API.
* offsetType: Moved this support to ParsoidHandler as a post-processing
  of byte-offset output. This eliminates the need to support this
  Parsoid-specific option in the ContentHandler hierarchies.
* body_only / wrapSections: Handled this in HtmlOutputRendererHelper
  as a post-processing of regular output by removing sections and
  returning the body content only. This does result in some useless
  section-wrapping work with Parsoid, but the simplification is probably
  worth it. If in the future, we support Parsoid-specific options in
  the ContentHandler hierarchy, we could re-introduce this. But, in any
  case, this "fragment" flavor option is likely to get moved out of
  core into the VisualEditor extension code.
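
The offsetType post-processing described above amounts to a pure conversion from UTF-8 byte offsets to another unit. Below is a minimal sketch in Python, not the PHP code from this patch; the function name `convert_byte_offset` and the target names "char" and "ucs2" are assumptions made for illustration.

```python
def convert_byte_offset(text: str, byte_offset: int, target: str) -> int:
    """Convert an offset counted in UTF-8 bytes into another unit.

    target: "byte" (unchanged), "char" (Unicode code points) or
    "ucs2" (UTF-16 code units). These names are illustrative assumptions.
    """
    encoded = text.encode("utf-8")
    if not 0 <= byte_offset <= len(encoded):
        raise ValueError("offset outside the text")
    # Decode only the prefix; this raises UnicodeDecodeError if the
    # offset falls in the middle of a multi-byte sequence.
    prefix = encoded[:byte_offset].decode("utf-8")
    if target == "byte":
        return byte_offset
    if target == "char":
        return len(prefix)
    if target == "ucs2":
        # Each UTF-16 code unit is two bytes in the "utf-16-le" encoding.
        return len(prefix.encode("utf-16-le")) // 2
    raise ValueError(f"unknown offset type: {target}")
```

Because the conversion only needs the source text and the byte offsets, it can run after parsing, which is why it can live outside the ContentHandler hierarchy.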

DEPLOYMENT:
* This patch changes the cache key by setting the useParsoid option
  in ParserOptions. The parent patch handles this to ensure we don't
  encounter a cold cache on deploy.

TESTS:
* Updated tests and mocks to reflect new reality.
* Do we need any new tests?

Bug: T332931
Change-Id: Ic9b7cc0fcf365e772b7d080d76a065e3fd585f80
2023-10-13 15:03:03 -05:00
C. Scott Ananian
835daa0681 ParserOutput::hasTOCHTML(): Remove old back-compat code
We'll remove ::getTOCHTML() and ::setTOCHTML() shortly as well, but
we need to adjust our parser cache serialization tests first.

Bug: T348134
Bug: T305161
Change-Id: I19f1e3d0ecbbf1225a3cb41d48e668cad9867bc5
2023-10-12 15:36:31 -04:00
C. Scott Ananian
02852b813d Remove implicit setter for ParserOutput::mTOCHTML
The ::setTOCHTML() and ::getTOCHTML() methods have been deprecated
since 1.40; there's no reason we should be updating ::$mTOCHTML
behind their backs.

Bug: T348134
Change-Id: I9396bc0a2caeb974a06c5b47075b3e2bb9f4278a
2023-10-04 15:10:58 -04:00
Timo Tijhof
54693fc907 parser: Improve ParserOutput docs and fix absoluteURLs default
* Document the bodyContentOnly option.
  Introduced in Ica09a4284c (cfd9c516e1) and renamed in
  commit I04e56ff2c3 (abee9b61f0).

* Fix default for absoluteURLs option.
  Introduced in Id660e10261 (d334de960a).

* Match order between docs and defaults for readability
  and easier review for correctness.

* Move potentially duplicate brief and ingroup from file doc
  to class doc and clean up file doc to be less novel and more like
  99% of other class files in MediaWiki. See also
  <https://gerrit.wikimedia.org/r/q/message:ingroup+owner:Krinkle>

* Document what the class does and how it relates to several other
  prominent classes in MediaWiki core.

Bug: T341244
Change-Id: Id2e3124652315a74869f504056fa8a99ad794350
2023-10-03 20:16:15 -07:00
C. Scott Ananian
d20663259f Hard-deprecate ParserOutput::getCategories(), deprecated in 1.40
It is difficult to distinguish this method from OutputPage::getCategories()
in code search:

   https://codesearch.wmcloud.org/deployed/?q=%5BOo%5Dut%28put%29%3F%28%5C%28%5C%29%29%3F-%3EgetCategories%5C%28&files=&excludeFiles=&repos=

We generally try to replace $output with $parserOutput or $pOutput
as we touch code to improve the ability of codesearch to dig up
deprecated ParserOutput methods.

Bug: T305161
Depends-On: I02dd4f61c43c225b0ef6dc51c3e4f9d967a0a272
Depends-On: I61d2d77591579d825ad9d37f902e40366be55dd6
Depends-On: I91155106b7a9e10d3334f95ba4936d02851bfb11
Depends-On: Iaca745c79d9587571af03b23b21d76a6cba0ebf1
Depends-On: Id10a171c44411b1233ee4d6cf8fbd3dc57744eef
Depends-On: I47a25c011d9bd4b1a15dda4e673e32c25eb64f2b
Depends-On: I683fc768aba50b801f46467fcfa1668fa8731ea6
Change-Id: I5a2ac1c99b8b199102e12f0d32dd6ec5cdc24054
2023-09-29 15:25:50 -04:00
James D. Forrester
468e69bccc Namespace Sanitizer under \MediaWiki\Parser
Bug: T166010
Change-Id: Id13dcbf7a0372017495958dbc4f601f40c122508
2023-09-21 05:39:23 +00:00
C. Scott Ananian
15d52473d0 Remove ParserOutput::{get,set}CategoryLinks(), deprecated since 1.38
Bug: T305161
Change-Id: Ie6210fd29db1b7488febc6e0a57d8a0cd5073cdd
2023-09-18 11:34:03 -04:00
C. Scott Ananian
d421ab57f8 Remove ParserOutput::addOutputHook() and related code
ParserOutput::addOutputHook() has been deprecated since 1.38, and without
any calls to ::addOutputHook() the associated ::getOutputHooks() and
$wgParserOutputHooks configuration do nothing.

Bug: T292321
Bug: T305161
Change-Id: Ib770c680d5e0697980e7e36a323ec56ba1d806b8
2023-09-18 11:34:02 -04:00
C. Scott Ananian
722f64f6df Remove ParserOutput::{get,unset,set}Property and ::getProperties()
These were deprecated in 1.38 and replaced with ::{get,set}PageProperty()
and ::getPageProperties(), avoiding a heavily-aliased use of the term
"property" and making the relationship between the ParserOutput and page
properties clearer.

Bug: T305161
Change-Id: Ib1a5d0a2c1387584b81c958fa32516034e7b3d05
2023-09-18 11:34:02 -04:00
C. Scott Ananian
83e197d817 Remove ParserOutput::addTrackingCategory(), deprecated since 1.38
Instead use either Parser::addTrackingCategory() or the TrackingCategories
service.

Bug: T305161
Change-Id: I19e0f67e377e6c68f54f6d5bb4f079110d1e61fc
2023-09-18 11:34:02 -04:00
C. Scott Ananian
3dd695a1ec WikitextContentHandler/ParserOutput: move redirect header to post processing
Insert the redirect header as part of the post-processing done in
ParserOutput::getText().  This ensures that it does not corrupt
edit-mode Parsoid output.

Depends-On: Ia6e390d849830993a6b97004f099773cfd4fa54b
Change-Id: I20db09619999919bfeda997d79561d21e3bf8718
2023-09-15 15:20:01 -04:00
C. Scott Ananian
92a5a28ed1 Remove ParserOutput::hasDynamicContent(), deprecated since 1.38
This was just an alias for ::hasReducedExpiry(), introduced in 1.37.

Bug: T305161
Change-Id: I3f6caeb9ae4b2164824f9bed274e76d6e61ad7cc
2023-09-11 12:33:48 -04:00
C. Scott Ananian
f33e71f1f4 Remove {ParserOutput,OutputPage}::preventClickjacking, deprecated since 1.38
This was replaced by a getter/setter pair with a more standard name:
   ::{set,get}PreventClickjacking()
in both ParserOutput and OutputPage.

In addition, OutputPage::allowClickjacking(), similarly deprecated,
was removed.

Bug: T305161
Change-Id: I141ec9e9cb4a285edc633c0f9b61516c33f9281c
2023-09-11 12:33:45 -04:00
Daimona Eaytoy
154f04299c Remove redundant empty() constructs (2)
empty() only makes sense when the expression it checks is possibly
undefined, otherwise it's equivalent to a truthiness check with the
additional downside of suppressing errors when it's not wanted.

Replace it with simple truthiness checks, using strict comparison when
that seems to help with polymorphic variables.

These were caught by a bespoke phan plugin.

Change-Id: I70b629dbf9e47cf3ba48ff439b18f19e839677f4
2023-09-08 23:28:11 +02:00
Amir Sarabadani
d8e542abf9 Reorg: Move three output related classes to includes/Output/
And namespace them:
 - StreamFile
 - OutputHandler
 - OutputPage

Bug: T321882
Change-Id: Iedf8d88c595e580f2d8f0734c92aa5c45618ba33
2023-09-05 19:36:42 +01:00
C. Scott Ananian
bc9c20c733 Deprecate the use of nonserializable arguments to ParserOutput::addWarningMsg()
Bug: T343048
Change-Id: If026926405b96d76faec6ad40f6cd45c4ec5d4a0
2023-08-07 11:57:38 -04:00
C. Scott Ananian
7a8dd531b2 Remove ParserOutput::addWarning, deprecated since 1.38
Replaced with ParserOutput::addWarningMsg()

Bug: T305161
Change-Id: I137b35a2e8250ea7c10059d04071a98a4f968038
2023-08-07 11:57:07 -04:00
jenkins-bot
3774bcf477 Merge "Rename 'bodyOnly' option to ParserOutput::getText()" 2023-08-02 17:58:28 +00:00
jenkins-bot
549961495b Merge "Hard-deprecate ParserOutput::{get,set}Flag()" 2023-08-02 17:48:18 +00:00
jenkins-bot
422973120e Merge "ParserOutput::addModules,addModuleStyles(): first arg must be array" 2023-08-02 14:24:10 +00:00
C. Scott Ananian
2aad6af983 ParserOutput::addModules,addModuleStyles(): first arg must be array
Use strong PHP type hint on argument to enforce that the first parameter
must be an array; formerly we allowed a string as well.  Non-array
arguments have been deprecated since 1.38 but this allows us to actually
clean up the code a bit.

Bug: T305161
Change-Id: I1566609990524e48faf1fa36079e2f4a4642979d
2023-07-31 18:45:47 -04:00
C. Scott Ananian
7cb30eceb3 Remove Parsoid back-compat code
Now that the latest Parsoid has been released to mediawiki-vendor,
the method_exists() calls aren't necessary.

Bug: T343155
Followup-To: I9da2566cc003e2f05cae16229444dcf3baf61fa4
Change-Id: I081225a268d608f763814245f9cab1c44bf49bad
2023-07-31 18:07:51 -04:00
Umherirrender
511842f9f9 parser: Remove phan-suppression after parsoid 0.18.0-a20 update
The method_exists() calls are kept, as it is unclear whether old objects
are still in any cache.

Follow-Up: I9da2566cc003e2f05cae16229444dcf3baf61fa4
Bug: T343155
Change-Id: I0aaa3dce26df1619bedc39696a115145a61d4d14
2023-07-31 22:01:08 +02:00
jenkins-bot
21a9ff5430 Merge "Remove ParserOutput::hideNewSection, deprecated since 1.38" 2023-07-30 14:23:51 +00:00
C. Scott Ananian
abee9b61f0 Rename 'bodyOnly' option to ParserOutput::getText()
In Parsoid 'body only' means the <body> tag and all of its contents.

In ParserOutput::getText() the option means "just the contents of the
<body> tag" so give it a slightly different name.
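
The difference between the two meanings can be sketched as follows. This is an illustrative Python example, not the regexps used in ParserOutput; the function names are hypothetical, and a regex is only adequate here because the sample input is simple.

```python
import re

_FLAGS = re.DOTALL | re.IGNORECASE

def body_with_tag(html: str) -> str:
    """'Body only' in the Parsoid sense: the <body> element and its contents."""
    match = re.search(r"<body\b[^>]*>.*?</body>", html, flags=_FLAGS)
    return match.group(0) if match else html

def body_content_only(html: str) -> str:
    """The getText() sense: just what sits between <body ...> and </body>."""
    match = re.search(r"<body\b[^>]*>(.*?)</body>", html, flags=_FLAGS)
    return match.group(1) if match else html

doc = '<html><head></head><body class="x"><p>Hi</p></body></html>'
print(body_with_tag(doc))
print(body_content_only(doc))
```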

Change-Id: I04e56ff2c3e03eb56b919d9ac09b5820e4badb21
2023-07-28 23:33:29 +00:00
C. Scott Ananian
2ed1977c5c ParserOutput: use consistent delimiters in bodyOnly regexps
This is a minor style cleanup; parentheses as regexp delimiters
are confusing.

Change-Id: Ibc0d63e59e468705fd81ef5172c29edd46a7f3d5
2023-07-28 14:12:35 -04:00
C. Scott Ananian
e22d93a6bb Hard-deprecate ParserOutput::{get,set}Flag()
These were deprecated in 1.38; users are expected to use
ParserOutput::{get,set}OutputFlag() instead, which helps eliminate a
confusing aliasing of many MW methods named "flag".

Original deprecation: 06ab90f163

Code search:
    https://codesearch.wmcloud.org/search/?q=%5BOo%5Dut%28put%29%3F%28%5C%28%5C%29%29%3F-%3E%28g%7Cs%29etFlag%5C%28&i=nope&files=&excludeFiles=&repos=

Patches for non-production extensions:
 PageProperties: I592d43e2c912df635cd9162180ed20a6136535f1
 CIForms: I238a6c557891bb6d271d2641261ef69542b7957e

Bug: T292868
Bug: T305161
Change-Id: I4525443ab0932241b0cf64ab606f7ab7d6d70b6e
2023-07-28 13:51:02 -04:00
C. Scott Ananian
075f8661e6 Remove ParserOutput::hideNewSection, deprecated since 1.38
Replaced with ParserOutput::setHideNewSection()

Bug: T305161
Change-Id: Ib1fe11fc7c10b2e46569948545ef48d66804ca0d
2023-07-28 13:45:18 -04:00
jenkins-bot
5b6e6a2b56 Merge "Record Parsoid version in extension data to allow rollback if necessary" 2023-07-28 17:32:28 +00:00
C. Scott Ananian
ea51801f79 Rename newly-added ParserOutput::appendOutputString() method
Tweaked the pluralization of the newly-added
ParserOutput::appendOutputString() method (now ::appendOutputStrings()
and ::getOutputStrings()), and name of the ParserOutputStrings class
(now ParserOutputStringSets), in an effort to continue repainting
bikesheds until the color is juuuust right.

Also extended the new method to cover ::addModules() and ::addModuleStyles()
and added support for these string sets in ::collectMetadata().

(These methods and the enumeration class were originally added in
b2cfa31eb6173e9f5e8607eadd126c33f8ce440b.)

Depends-On: I8bdffa55498d90e990af5bfc3332e3028b0a3539
Change-Id: Ibd41485d5db7779f01642e2144c50ed49d409812
2023-07-28 12:10:56 -04:00
C. Scott Ananian
0b92c4bedb Record Parsoid version in extension data to allow rollback if necessary
This allows any bad cached parses due to a train deploy to be selectively
rolled back in the RejectParserCacheValue hook, which provides some
operational insurance against corrupted caches.  The version is also
added to the debug information in the HTML footer to aid diagnosis
of any issue in real time.
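
A minimal sketch of the rollback idea, in Python rather than PHP. The extension-data key `parsoid-version`, the version strings, and the function name are all hypothetical; the real hook signature is not reproduced here.

```python
from typing import FrozenSet, Mapping

# Hypothetical key under which the renderer version is recorded.
VERSION_KEY = "parsoid-version"

def should_reject(extension_data: Mapping[str, str],
                  bad_versions: FrozenSet[str]) -> bool:
    """Return True if a cached entry was produced by a known-bad version.

    Entries without a recorded version are kept, since nothing is
    known about them.
    """
    version = extension_data.get(VERSION_KEY)
    return version is not None and version in bad_versions
```

A hook handler could call such a predicate with the set of versions deployed during a bad train, so that only those cached parses are discarded and re-rendered.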

Depends-On: I3d3caabd959c1ba16f4dc702c2eae38d5d4dcb14
Change-Id: Ibb37a82ec0ce764aefd8c9fab2868073a66301ec
2023-07-27 19:02:24 -04:00
Isabelle Hurbain-Palatin
b2cfa31eb6 Add append/getOutputString to ParserOutput
This aims at providing an interface similar to setOutputFlag for string
sets, such as the ones used in CSP properties.
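
As a rough illustration of what a "string set" interface looks like, here is a Python sketch. The class and method names are hypothetical and do not mirror the PHP signatures; the point is only that appended values accumulate per named set without duplicates.

```python
from typing import Dict, Iterable, List

class OutputStringSets:
    """Named sets of strings; appending the same value twice keeps one copy."""

    def __init__(self) -> None:
        # A dict per set name preserves insertion order and deduplicates.
        self._sets: Dict[str, Dict[str, bool]] = {}

    def append(self, name: str, values: Iterable[str]) -> None:
        bucket = self._sets.setdefault(name, {})
        for value in values:
            bucket[value] = True

    def get(self, name: str) -> List[str]:
        return list(self._sets.get(name, {}))
```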

Change-Id: I6f103bd88802e66611e483403a2f8a540d54aae9
2023-07-27 11:37:11 +02:00
thiemowmde
3c631a59f2 More specific array type hints in ParserOutput/OutputPage
Change-Id: I7dbecebb8b26e57afda13f46d3b895f085c4e95e
2023-07-03 15:52:18 +02:00
Subramanya Sastry
0e9656e6da Add return type to getIndicators() in ParserOutput & OutputPage
This is in preparation for changes on the Parsoid side to make
sure its signature is compatible with the ContentMetadataCollector
interface there.

Change-Id: Ife4ae81dbc304097da7dcba40b143f7030b959f3
2023-06-02 16:13:01 +05:30
jenkins-bot
e1c1632d9c Merge "ParserOutput: Ensure page title is updated after merging properties" 2023-05-11 18:23:20 +00:00
Umherirrender
e04d3a28f6 Replace internal Hooks::runner
The Hooks class contains deprecated functions and the whole class is
going to get removed, so remove the convenience function and inline the
code.

Bug: T335536
Change-Id: I8ef3468a64a0199996f26ef293543fcacdf2797f
2023-05-11 06:17:38 +00:00
Subramanya Sastry
632481c382 ParserOutput: Ensure page title is updated after merging properties
Eventually we should merge the "title text" and "display title" in
ParserOutput (T293514) but for now mirror the logic in
ParserOutput::mergeHtmlMetadataFrom() and update the title text
from the source if it hasn't already been set in the destination.
This patch ensures that after page properties are merged during
metadata collection, the title text is suitably updated if the
'displaytitle' property is set.

This will let Parsoid pass displaytitle (metadata) tests in integrated
mode since Parsoid relies on merging metadata from multiple ParserOutput
objects (in the DataAccess object that is used to expand templates, etc.)

Once this patch is merged, Parsoid patches may start failing CI till
we submit a patch there to fix up the integrated test failures list
since some previously failing tests may now pass.

Bug: T293514
Bug: T294621
Change-Id: Ia673f1261ccd03caf455122b71cfb9769b02f22e
2023-05-10 08:53:41 +00:00
jenkins-bot
c5152db020 Merge "Remove back-compat for <editsection>" 2023-04-28 15:59:12 +00:00
Subramanya Sastry
3e297c43ad Fix breakages generating TOC for API Help pages
* TOCData in Parsoid expects to process non-string-key indexed arrays.

* Don't use 'null' as the default for maxtoclevel to ensure that
  TOC is always displayed even when it isn't passed in as a param
  by callers.

* Follows up on 05535be6 which only partially fixed the breakage
  caused by 153a4157 and 439656e0

Bug: T334551
Change-Id: I8883b58574ea8ed0566de2c44dba3408a47d2d0c
2023-04-12 15:37:03 -05:00
jenkins-bot
90997943f9 Merge "Parser: Remove back-compatibility NO_TOC_CONVERSION code" 2023-03-27 20:43:53 +00:00
C. Scott Ananian
cfd9c516e1 Allow setting a ParserOption to generate Parsoid HTML
This is an initial quick-and-dirty implementation.  The
ParsoidParser class will eventually inherit from \Parser,
but this is an initial placeholder to unblock other Parsoid
read views work.

Currently Parsoid does not fully implement all the ParserOutput
metadata set by the legacy parser, but we're working on it.

This patch also addresses T300325 by ensuring that the Page HTML
APIs use ParserOutput::getRawText(), which will return the entire
Parsoid HTML document without post-processing.  This is what
the Parsoid team refers to as "edit mode" HTML. The
ParserOutput::getText() method returns only the <body> contents
of the HTML, and applies several transformations, including
inserting Table of Contents and style deduplication; this is
the "read views" flavor of the Parsoid HTML.

We need to be careful of the interaction of the `useParsoid` flag with
the ParserCacheMetadata.  Effectively `useParsoid` should *always* be
marked as "used" or else the ParserCache will assume its value doesn't
matter and will serve legacy content for parsoid requests and
vice-versa.  T330677 is a follow up to address this more thoroughly by
splitting the parser cache in ParserOutputAccess; the stop gap in this
patch is fragile and, because it doesn't fork the ParserCacheMetadata
cache, may corrupt the ParserCacheMetadata in the case when Parsoid
and the legacy parser consult different sets of options to render a
page.
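
The hazard can be shown with a toy cache key that includes only the options recorded as "used". This is a Python sketch under stated assumptions: the key format and the `cache_key` helper are hypothetical and far simpler than the real ParserCache.

```python
from typing import Iterable, Mapping

def cache_key(page: str, options: Mapping[str, object],
              used: Iterable[str]) -> str:
    """Build a key from the page name and only those options marked as used."""
    parts = [f"{name}={options[name]}" for name in sorted(used)]
    return "!".join([page, *parts])

legacy = {"useParsoid": False, "lang": "en"}
parsoid = {"useParsoid": True, "lang": "en"}

# If useParsoid is not marked as used, both requests share one key,
# so whichever render was cached first is served to both.
print(cache_key("Main", legacy, ["lang"]) == cache_key("Main", parsoid, ["lang"]))

# Marking it as used splits the key.
print(cache_key("Main", legacy, ["lang", "useParsoid"]))
print(cache_key("Main", parsoid, ["lang", "useParsoid"]))
```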

Bug: T300191
Bug: T330677
Bug: T300325
Change-Id: Ica09a4284c00d7917f8b6249e946232b2fb38011
2023-03-26 21:46:05 -04:00
C. Scott Ananian
8aae904254 Parser: Remove back-compatibility NO_TOC_CONVERSION code
The TOC used to be language-converted in ParserOutput::getText(), but
it wasn't possible to apply custom rules defined in the wikitext
article body at ::getText() time.  Remove the various hacks that we'd
added in an attempt to do so, which were made unnecessary by
I321cd31dae64bbf845d53282e5d28a55bc4ec319.

Bug: T306862
Change-Id: Ib12cd02e9ade91d5794462e8833f2aa3b45a51f2
2023-03-24 22:14:42 +00:00
C. Scott Ananian
99e9d4927f Remove back-compat for <editsection>
The tag has been <mw:editsection> since at least 2011
(f0fd318a4e), we no longer need to
include the ancient <editsection> variant in our regexp and
test cases.

Change-Id: I5fd783556810ea13b07a69066ea6762d1a1863e1
2023-03-15 13:53:01 -04:00
jenkins-bot
6de76f1fad Merge "Add ParserOutput::getLanguage()" 2023-03-13 14:18:47 +00:00
C. Scott Ananian
29853113f7 Deprecate ParserOutput::{get,set}TOCHTML()
No uses in deployed code outside mediawiki-core:

 https://codesearch.wmcloud.org/deployed/?q=%5Bgs%5DetTOCHTML%5C%28&i=nope&files=&excludeFiles=&repos=

Bug: T293513
Change-Id: I3fd82150ac581afbeb94f401672702063586fff0
2023-03-10 20:34:33 -05:00
C. Scott Ananian
183a6da420 Add ParserOutput::getLanguage()
Provide a way for backend code to determine the primary language of a
ParserOutput, eg for setting the Content-Language header of an API
response.

This is read-only and backed by extension data at the moment for
transition purposes; if this API sticks we'll graduate it to a
"real" property in the future, with appropriate serialization
to/from JSON (T303329).

Similarly, this patch only includes the most basic code to handle
the various ParserOutput merge cases in
ParserOutput::merge{Internal,Html,Tracking}MetaDataFrom(),
ParserOutput::collectMetadata(), and
OutputPage::addParserOutput{Content,Metadata,Text,}(); mostly
inherited from the fact that the storage is backed by extension
data at the moment.

Generally only the "top-level" parser output gets to set the
primary language; we'll presumably need to ensure that the
language is consistent during merge.

Change-Id: I767daba22805a877d9b806fd77334e508902844b
2023-03-10 18:42:29 -05:00
C. Scott Ananian
d2446a77dd Deprecate ParserOutput::getCategories()
This undocumented method returns a reference to ParserOutput's private
storage array, yet very few callers actually require a reference or try
to use this to mutate the internal storage.  Further, the keys of the
array can be converted to `int` when the category names are numeric,
which can further confuse users.  Most users found through codesearch
can/should use ::getCategoryNames() instead.

Add a new ::getCategorySortKey() method to provide access to the sort
keys for those few callers who require them, in a manner which doesn't
expose that the internal `mCategories` array stores numeric category
names as 'int'.

Bug: T331727
Change-Id: I8dc85e76bfbb9ed49a603d990c14b7ee798bd821
2023-03-10 10:02:42 -05:00
C. Scott Ananian
e34b25a09f Ensure categories are returned as strings
Numeric category strings like '1' are converted to ints when they are
used as array keys.  Convert back to strings as needed to ensure this
doesn't surprise any clients.
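
The coercion and the fix can be modelled in Python. This is a sketch, not MediaWiki code: `php_array_key` is a hypothetical helper that imitates how PHP treats canonical decimal integer strings used as array keys (assuming 64-bit integers), and `category_names` shows the "convert back to strings" step.

```python
import re
from typing import Dict, List, Union

# Canonical decimal integers only: "1" and "-5" qualify, "01" and "1.5" do not.
_INT_KEY = re.compile(r"0|-?[1-9][0-9]*")

def php_array_key(key: str) -> Union[int, str]:
    """Imitate PHP's array-key normalisation for string keys."""
    if _INT_KEY.fullmatch(key) and -2**63 <= int(key) <= 2**63 - 1:
        return int(key)
    return key

def category_names(categories: Dict[Union[int, str], str]) -> List[str]:
    """Return category names as strings, undoing the int coercion."""
    return [str(name) for name in categories]

stored = {php_array_key(name): "" for name in ["1", "Cats", "01"]}
print(list(stored))
print(category_names(stored))
```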

Bug: T331084
Change-Id: Ib39707216d213e414c09226a6378047ffaf43892
2023-03-10 10:02:23 -05:00
James D. Forrester
ad06527fb4 Reorg: Namespace the Title class
This is moderately messy.

Process was principally:

* xargs rg --files-with-matches '^use Title;' | grep 'php$' | \
  xargs -P 1 -n 1 sed -i -z 's/use Title;/use MediaWiki\\Title\\Title;/1'
* rg --files-without-match 'MediaWiki\\Title\\Title;' . | grep 'php$' | \
  xargs rg --files-with-matches 'Title\b' | \
  xargs -P 1 -n 1 sed -i -z 's/\nuse /\nuse MediaWiki\\Title\\Title;\nuse /1'
* composer fix

Then manual fix-ups for a few files that don't have any use statements.

Bug: T166010
Follows-Up: Ia5d8cb759dc3bc9e9bbe217d0fb109e2f8c4101a
Change-Id: If8fc9d0d95fc1a114021e282a706fc3e7da3524b
2023-03-02 08:46:53 -05:00
jenkins-bot
9a96857757 Merge "Reorg: Move HTML-related classes out of includes/ to Html/" 2023-02-21 15:37:53 +00:00