
62 commits

Author SHA1 Message Date
Umherirrender
e662614f95 Use explicit nullable type on parameter arguments
Implicitly marking parameter $... as nullable is deprecated in PHP 8.4;
the explicit nullable type must be used instead
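The deprecated form and the explicit form, side by side. This uses a hypothetical Greeter class, not MediaWiki code; the deprecated signature is left in a comment so the sketch runs cleanly on PHP 8.4:

```php
<?php
// Hypothetical class used only to contrast the two parameter declarations.
final class Greeter
{
    // Implicitly nullable (deprecated in PHP 8.4):
    //   public function greet(string $name = null): string
    // Explicitly nullable: `?string` makes null part of the declared type,
    // and the `= null` default is kept unchanged.
    public function greet(?string $name = null): string
    {
        return 'Hello, ' . ($name ?? 'world');
    }
}

$g = new Greeter();
echo $g->greet(), "\n";      // Hello, world
echo $g->greet('Ada'), "\n"; // Hello, Ada
```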

Created with autofix from Ide15839e98a6229c22584d1c1c88c690982e1d7a

Break one long line in SpecialPage.php

Bug: T376276
Change-Id: I807257b2ba1ab2744ab74d9572c9c3d3ac2a968e
2024-10-16 20:58:33 +02:00
James D. Forrester
a5387c7c20 Namespace all remaining classes in includes/parser
Bug: T353458
Change-Id: If02cc9b1ff78e26c1cf8c91ee4695845eb133829
2024-10-15 23:54:32 +01:00
Ebrahim Byagowi
825f7b5e13 Avoid use of deprecated wfExpandUrl in ExtractBody
Change-Id: Ic68ecf6654c8e73a643adce2ef5dccb53b7a632a
2024-09-09 19:38:04 +03:30
Isabelle Hurbain-Palatin
cd3240f044 Introduce runOutputPipeline and clone by default
This is the third patch of a series of patches to remove
ParserOutput::getText() calls from core. This series of patches should
be functionally equivalent to I2b4bcddb234f10fd8592570cb0496adf3271328e.

Here we temporarily introduce runOutputPipeline in ParserOutput. It
creates and runs the pipeline with default options, and is called by
getText. (This is not entirely truthful because we go through a
runPipelineInternal transient method for null-argument-passing reasons,
but let's not over-complicate this commit message.)

getText is responsible for maintaining the current behaviour,
that is, "disallow cloning of the ParserOutput and restore the text
to what it was", to mitigate T353257. As we get rid of getText, this
behaviour should be moved, if necessary, to the call site.

The new method is currently added to ParserOutput so that further
refactorings are, for the moment, simpler. It will eventually be moved
to another place within the Content framework.

We also rename 'suppressClone' to 'allowClone' (which is actually its
negation) to avoid multiple levels of negations that make the code
confusing. Note that the default value of 'allowClone' is true, and is
currently overridden in two places: getText and
OutputPage::getParserOutputText (which calls the pipeline directly and
not through ParserOutput).
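To make the negation concrete, here is a hypothetical helper that reads either flag and falls back to the default of allowing clones. It is not part of MediaWiki; the commit renames the option rather than adding a compatibility shim:

```php
<?php
// Hypothetical helper: 'allowClone' is the negation of the old 'suppressClone'.
function resolveAllowClone(array $options): bool
{
    if (array_key_exists('allowClone', $options)) {
        return (bool)$options['allowClone'];
    }
    if (array_key_exists('suppressClone', $options)) {
        // suppressClone = true is equivalent to allowClone = false.
        return !$options['suppressClone'];
    }
    // Default described above: cloning is allowed.
    return true;
}

var_dump(resolveAllowClone([]));                        // bool(true)
var_dump(resolveAllowClone(['suppressClone' => true])); // bool(false)
var_dump(resolveAllowClone(['allowClone' => false]));   // bool(false)
```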

Bug: T293512
Bug: T371022
Change-Id: Ibf04af1079aaa1934dc78685b00e636ff4d38a9a
2024-09-06 19:06:38 +00:00
C. Scott Ananian
1faf18d657 Ensure that isParsoidContent is initialized in OutputTransformPipeline
The refactorings in I45951a49e57a8031887ee6e4546335141d231c18 replaced
calls to ParserOutput::getText() with direct invocations of the pipeline,
including in OutputPage::getParserOutputText().  However, the direct
invocation skipped the implicit initialization of the options array
previously done in ParserOutput::getText().  Ensure that the options
array gets appropriate default values; in particular 'isParsoidContent'
is expected to always be set.
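A minimal sketch of filling in defaults when the pipeline is invoked directly instead of through ParserOutput::getText(). The helper name is hypothetical, and the default of false for 'isParsoidContent' is an assumption; only the key itself comes from the commit message:

```php
<?php
// Hypothetical helper: make sure required option keys are always present.
function withPipelineDefaults(array $options): array
{
    // The array union operator keeps every key the caller supplied and
    // only adds keys that are missing.
    return $options + [
        'isParsoidContent' => false, // assumed default
    ];
}

$opts = withPipelineDefaults(['enableSectionEditLinks' => false]);
var_dump($opts['isParsoidContent']); // bool(false)
```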

Bug: T293512
Bug: T373405
Change-Id: Ib8d540b4221f7c00f6047706c4e3bfd88a2cb8cc
2024-08-26 23:42:30 +00:00
jenkins-bot
d5bacc16f2 Merge "Make ContentDOMTransformStage not Parsoid specific" 2024-08-26 22:13:20 +00:00
Arlo Breault
03f9785ad4 Make ContentDOMTransformStage not Parsoid specific
Previously, it assumed Parsoid content and loaded/stored data attributes
unconditionally.  The result was that, if this stage was subclassed to
be used in a non-Parsoid pipeline, the DOM would undesirably be dirtied
with Parsoid ids or data-parsoid attributes.

Change-Id: I2f1af43d9c39140ce215e2145e51cc3b02f68923
2024-08-14 14:23:44 -04:00
James D. Forrester
bc662aec9b Move Language and friends into Language namespace
Bug: T353458
Change-Id: Id3202c0c4f4a2043bf97b7caee081acab684155c
2024-08-10 13:36:30 +02:00
C. Scott Ananian
b3ac045497 HandleParsoidSectionLinks: also run this pass if COLLAPSABLE_SECTIONS
Bug: T371336
Change-Id: Ieccddc229c39f65de6f2bba6364f933592686ade
2024-07-30 17:22:55 -04:00
Arlo Breault
44580945ed Add OutputPipelineStages from extensions
Adds an experimental configuration to allow extensions to define
OutputPipelineStages to include in the DefaultOutputPipeline.

There are a lot of open questions about this API, like ordering of
execution, but adding it as @experimental will help surface the
requirements.

Bug: T370541
Needed-By: I6dc92af0611c680b6e55605a7c9ff8a3fc1dfa26
Change-Id: I64baea40a1687c7a06fbcda9efe9f9a159b0ae8d
2024-07-25 11:44:17 -04:00
Isabelle Hurbain-Palatin
332f1c0702 Update defaults for AddWrapperDivClass
The rest of the pipeline tries to use the same defaults in the
pipeline built for (what is still) getText as the default options of
the pipeline stages. This is currently not the case for
AddWrapperDivClass; this patch fixes that.

Change-Id: I791d679a7b7309dfeb90c9736ef0e4848b038e08
2024-07-19 18:56:00 +02:00
Isabelle Hurbain-Palatin
91ab9807dd OutputTransform: Handle skipped tests in HydrateHeaderPlaceholders.php
A comment in I8744382dd24b28c623d0dc6569f800fb5489e6c1 mentions that two
tests are skipped. This patch fixes one of these skips, and makes the
other one more explicit.

Change-Id: Id5680fc163a9bfacfe797af619e40032cdee38b1
2024-07-11 21:59:22 +00:00
jenkins-bot
e6de0b7322 Merge "Fix bundle reinjection of ContentDOMTransformStage" 2024-06-24 10:30:29 +00:00
Umherirrender
c08b492d75 Use namespaced classes (3)
Changes to the use statements done automatically via script
Addition of missing use statement done manually

Change-Id: Ia35b2d3105880631dd26ec974068b000ac7f4b6b
2024-06-16 20:26:43 +02:00
Isabelle Hurbain-Palatin
f56cee51d9 Fix bundle reinjection of ContentDOMTransformStage
When re-injecting the page bundle into the newly created ParserOutput, we
were omitting the page bundle's version, headers and contentmodel data.
This patch fixes that.

Note that it will silence places where getText should typically not be
called, but that's a larger problem that needs to be addressed at the
call sites, and doesn't detract from the fact that we needed to fix
this loss of information on the bundle anyway.

Bug: T365433
Depends-On: I2a87a8233b9e42cbafdba63bdf513abe00d826ce
Change-Id: I7f57ddc76b9d3b24226f8b5da1b70bc83134856f
2024-06-11 14:48:28 +02:00
C. Scott Ananian
292709cc13 Use $stage::CONSTRUCTOR_OPTIONS in DefaultOutputPipelineFactory
Rather than have DefaultOutputPipelineFactory::CONSTRUCTOR_OPTIONS be a
union of all the options needed by all the stages, allow each stage to
define its own CONSTRUCTOR_OPTIONS and pass a Config object to the
DefaultOutputPipelineFactory service.

In the process, move the $options and $logger properties into the
abstract superclass, since they are passed to every stage.
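A sketch of the per-stage lookup, assuming hypothetical stage classes and a plain array standing in for MediaWiki's Config object. The config key names are examples only; the point is that `$stage::CONSTRUCTOR_OPTIONS` resolves the constant on whichever class the string names:

```php
<?php
// Hypothetical stage classes standing in for real OutputTransform stages.
abstract class SketchStage
{
    /** @var string[] configuration keys this stage needs */
    public const CONSTRUCTOR_OPTIONS = [];
}

final class SketchLinkStage extends SketchStage
{
    public const CONSTRUCTOR_OPTIONS = ['Server', 'ArticlePath'];
}

final class SketchHeadingStage extends SketchStage
{
    public const CONSTRUCTOR_OPTIONS = ['ParserEnableLegacyHeadingDOM'];
}

/**
 * Pick out only the keys the given stage declares.
 *
 * @param class-string<SketchStage> $stage
 */
function optionsForStage(string $stage, array $config): array
{
    $options = [];
    // Each subclass supplies its own list via its class constant.
    foreach ($stage::CONSTRUCTOR_OPTIONS as $key) {
        if (!array_key_exists($key, $config)) {
            throw new InvalidArgumentException("Missing config key: $key");
        }
        $options[$key] = $config[$key];
    }
    return $options;
}

$config = [
    'Server' => 'https://example.org',
    'ArticlePath' => '/wiki/$1',
    'ParserEnableLegacyHeadingDOM' => true,
];
print_r(optionsForStage(SketchLinkStage::class, $config));
```

Keeping the list on each stage means adding a stage never requires editing a central union of option names in the factory.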

Bug: T363764
Followup-To: I64aeb81b395ba84e1d839dfbd31decf16c337cd0
Change-Id: I7d386b22c7d8e99b6dfe4cf798069914ac9af373
2024-06-10 20:53:21 -04:00
Arlo Breault
6011792afa Refactor DI in OutputTransform stages
Bug: T363764
Change-Id: I64aeb81b395ba84e1d839dfbd31decf16c337cd0
2024-06-10 16:30:06 -04:00
Arlo Breault
276fc1608a Inject MobileContext in DefaultOutputPipelineFactory
Change-Id: I613893fa236be956a4850a52a03a40e620c7ce64
2024-06-10 15:11:01 -04:00
Arlo Breault
66020909a4 Get mobile url for Parsoid's baseHref
The legacy parser does not run ExpandToAbsoluteUrls unless it's doing
?action=render.  ExpandToAbsoluteUrls doesn't work for mobile urls,
which seems to be captured in T171398 / T195494.  Since relative urls
aren't resolved in legacy output though, the browser uses the mobile
url.

Parsoid, however, does ExtractBody which has its own expandRelativeAttrs
pass, which resolves relative urls against the baseHref in the document
head.  The baseHref is taken from MainConfigNames::Server, which
presumably suffers the same issue as the above task.  But also maybe MFE
is transforming cached html, where the non-mobile baseHref is desirable.

In any case, to produce the same urls as the legacy parser, transform
the baseHref to one that conforms to the mobile url template.
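A rough sketch of the host rewrite. It assumes a template in which %h0, %h1, ... stand for the dot-separated parts of the original host, in the style of MobileFrontend's mobile URL template; the helper name and the exact template syntax here are assumptions:

```php
<?php
// Hypothetical helper: rewrite the host of a base URL using a host template.
function applyMobileHostTemplate(string $baseHref, string $template): string
{
    $parts = parse_url($baseHref);
    if ($parts === false || !isset($parts['host'])) {
        return $baseHref; // nothing to rewrite
    }
    $hostParts = explode('.', $parts['host']);
    // Substitute higher indices first so %h1 cannot clobber part of %h10.
    $mobileHost = $template;
    for ($i = count($hostParts) - 1; $i >= 0; $i--) {
        $mobileHost = str_replace("%h$i", $hostParts[$i], $mobileHost);
    }
    // Replace only the first occurrence of the host, i.e. the authority part.
    $pos = strpos($baseHref, $parts['host']);
    return substr_replace($baseHref, $mobileHost, $pos, strlen($parts['host']));
}

echo applyMobileHostTemplate('https://en.wikipedia.org/wiki/', '%h0.m.%h1.%h2'), "\n";
// https://en.m.wikipedia.org/wiki/
```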

Bug: T365483
Change-Id: I32800f5ea848d70b6ef67ec9102c432b9626afcb
2024-06-10 15:11:01 -04:00
jenkins-bot
c2b336d10a Merge "Alias Parsoid DOM nodes to PHP DOM implementation" 2024-05-22 14:53:15 +00:00
C. Scott Ananian
f856992ad9 Alias Parsoid DOM nodes to PHP DOM implementation
Parsoid abstracts the specific DOM implementation it is using, in
practice (currently) using subclasses of the built-in \DOMDocument
classes using the \DOMDocument::registerNodeClass() mechanism.
Parsoid's own phan configuration uses stubs for its abstract DOM
classes to encourage the use of "standard" DOM methods -- but core
doesn't use Parsoid's phan configuration and doesn't really understand
the way that ::registerNodeClass() works, and so gets confused by code
such as:

   $el = $document->createElement('div');

In actual practice this is a Wikimedia\Parsoid\DOM\Document (a
subclass of \DOMDocument) which creates a
Wikimedia\Parsoid\DOM\Element (a subclass of \DOMElement) via the
::registerNodeClass() mechanism, but phan sees only the base
\DOMDocument::createElement() signature and assumes this creates a
\DOMElement *not* a Wikimedia\Parsoid\DOM\Element.  If you do
"element-y" things on this, phan has no complaints, but if you pass
this back to a Parsoid method which expects the abstract
Wikimedia\Parsoid\DOM\Element type then phan (spuriously) complains.
This type error can be hard to understand.
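A standalone illustration of the ::registerNodeClass() mechanism described above, using a hypothetical SketchElement class in place of Parsoid's own Element subclass:

```php
<?php
// Stand-in for Wikimedia\Parsoid\DOM\Element; the name is hypothetical.
class SketchElement extends DOMElement
{
}

$document = new DOMDocument();
// Ask this document to wrap every element node in SketchElement
// instead of the built-in DOMElement.
$document->registerNodeClass(DOMElement::class, SketchElement::class);

$el = $document->createElement('div');
// createElement() is declared to return DOMElement, which is all a static
// analyser sees; at runtime the registered subclass comes back.
echo get_class($el), "\n"; // SketchElement
```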

Work around this issue by simply aliasing Parsoid's abstract DOM types
to the built-in \DOMDocument etc types.  The alternative would be to
use Parsoid's stubs, but it seems cleaner (for now) to avoid reaching
into

  vendor/wikimedia/parsoid/.phan/stubs

to get them.

Change-Id: I90b33c5d65bde1582be9a452a144808b6d53d914
2024-05-22 10:35:02 -04:00
Isabelle Hurbain-Palatin
da6f716c41 Fix serialization errors in PageBundle extensiondata
When going through a ContentDOMTransformStage, we try to move the
PageBundle when transforming the document from and to DOM. In the
current version of this code, this adds DataParsoid, a non-serializable
class, to ExtensionData, which breaks on ParserCache storage in later
steps.
This patch is pretty hacky, but it transforms the PageBundle structure
back to a stdClass so that it can be re-serialized before cache
insertion. The added test fails without this patch.
Hopefully we'll get rid of these hacks when we use an HtmlHolder later.

Bug: T365036
Change-Id: Icc74edd43ea5098faebc21a084b6d483d6ab99d1
2024-05-17 09:47:18 -04:00
jenkins-bot
7934a84307 Merge "Add Parsoid HTML version to wrapper div" 2024-05-14 04:36:58 +00:00
Isabelle Hurbain-Palatin
2c0fe93193 Fix the loss of ParserOutput pointer in ContentDOMTransformStages
When running a ContentDOMTransformStage, we effectively clone the input
ParserOutput, which contradicts the current expectations of
the pipeline. This patch slightly modifies the logic by making it
possible to apply PageBundle data to an existing ParserOutput without
creating a new one.

Bug: T364597
Change-Id: I633fc33485f22cf645acd41650a6983df3b0a534
2024-05-10 17:28:23 +02:00
C. Scott Ananian
5735f94648 Add Parsoid HTML version to wrapper div
Followup-To: I941d31479eebb12ea1f4dcdb0a1737033ddc8ac1
Depends-On: I95be56e3662f9cffd1eb5c03bbc0379d4e0a9ee0
Change-Id: I4aaa4b9e800271c2bcfc2fd74f09853b31ee6859
2024-05-06 15:56:02 -04:00
jenkins-bot
fb6ad0a08c Merge "Localization output transform" 2024-05-06 19:47:43 +00:00
Isabelle Hurbain-Palatin
8de2e66ca7 Localization output transform
This is an output transform that resolves mw:I18n and mw:LocalizedAttrs
markup to their localized forms.

Bug: T358191
Change-Id: Id32bc05ff72eb2d9fba7f8c2f192c9f7812cbc70
2024-05-06 15:24:38 -04:00
Bartosz Dziewoński
f0c7fa9234 Move section edit links outside headings (new heading HTML)
The legacy parser can now output headings using more accessible markup,
which is also identical to the markup used by the Parsoid parser.

Changes to client-side JS and CSS necessary to support the new markup
have already been merged in earlier commits.

includes/skins/Skin.php
includes/ServiceWiring.php
* Define a new skin option, 'supportsMwHeading', which can be used
  to toggle the new markup per-skin.
* Update the built-in fallback skin to enable it. This affects the
  output in parser tests.

docs/config-schema.yaml
includes/config-schema.php
includes/config-vars.php
includes/MainConfigNames.php
includes/MainConfigSchema.php
* Add a new configuration setting, 'ParserEnableLegacyHeadingDOM',
  which can be used to toggle the new markup per-site.

includes/OutputTransform/Stages/HandleSectionLinks.php
* Output new heading HTML for skins that enabled the option.

tests/*
* Duplicate parser tests that cover heading generation to cover both
  new and old markup. Update other parser tests to use new markup.
* Add some unit and integration tests for the behavior of the skin
  option and some parser tests for edge cases of the new markup.
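A sketch of opting a whole wiki into the new markup from LocalSettings.php. It assumes that true keeps the legacy markup, as the setting's name suggests; skins still need the separate 'supportsMwHeading' option described above:

```php
// LocalSettings.php (sketch): stop using the legacy heading markup site-wide.
// Assumption: true keeps the old markup, false allows the new one.
$wgParserEnableLegacyHeadingDOM = false;
```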

Bug: T13555
Change-Id: I1180169a8e83af834c2984ba16089e6277f2a8dd
2024-05-06 12:25:33 -04:00
jenkins-bot
b671e574eb Merge "Add ParserOptions::setCollapsibleSections()" 2024-04-29 21:17:15 +00:00
C. Scott Ananian
7738554eee [OutputTransform] Add data-mw-parsoid-version to wrapper div
Adding a data-mw-parsoid-version attribute to the wrapper div helps to
unambiguously mark Parsoid-generated output in a way which is compatible
with CSS rules and client-side JavaScript.

By embedding the current version of Parsoid in the data attribute,
sophisticated CSS rules can match against a specific version of
Parsoid in order to facilitate proper behavior; for example:

    div[data-mw-parsoid-version^="0.20.0"]

This could be useful in deployment scenarios where the parser cache
might contain content generated by older or newer versions of Parsoid,
during roll-forward or roll-back deployments, respectively.
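The same prefix match as the CSS `^=` selector can be expressed server-side with XPath's starts-with(). A small sketch; the markup and version strings are made up:

```php
<?php
$html = '<div class="mw-parser-output" data-mw-parsoid-version="0.20.0-a1">A</div>'
      . '<div class="mw-parser-output" data-mw-parsoid-version="0.19.1">B</div>';

$doc = new DOMDocument();
$doc->loadHTML($html, LIBXML_NOERROR | LIBXML_NOWARNING);
$xpath = new DOMXPath($doc);

// Equivalent of div[data-mw-parsoid-version^="0.20.0"]
$matches = $xpath->query('//div[starts-with(@data-mw-parsoid-version, "0.20.0")]');
foreach ($matches as $div) {
    echo $div->textContent, "\n"; // A
}
```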

Bug: T363378
Change-Id: I941d31479eebb12ea1f4dcdb0a1737033ddc8ac1
2024-04-29 12:36:50 -04:00
C. Scott Ananian
8d031bcf87 Add ParserOptions::setCollapsibleSections()
This is a non-default option that will add a <div> wrapper around
section contents to allow client-side collapsing.  This is intended
for use by MobileFrontend, but could eventually be enabled for
desktop read views as well.

Since this parser option is in the "cache-varying options" set, any
caller who sets this option will fork the cache for that page, which
is reasonable as the parser option sets a ParserOutput property.
In the future our caching strategy will get smarter and we'll add
code which avoids the cache split and just transfers the appropriate
values from ParserOptions to ParserOutput flags after the cached
output is retrieved.

Bug: T359001
Change-Id: Ie93959a056ed15a728404eb293e4bb6eeaeb15c0
2024-04-29 12:11:09 -04:00
Subramanya Sastry
d47c70ddac ExtractBody: Use page title recorded in ParserOutput
* Followup to 9a466310
* I had previously added page title info to ParserOutput as part of
  6e5413b1, but while working on 9a466310, we didn't realize that.
* Removed urldecode(..) since output of Title::getPrefixedDBKey
  isn't urlencoded and urldecode converts "+" into " ". A new test
  ensures that edge case works properly.
* Simplify testing + add additional test to ensure title normalization
  doesn't trip up the transform.
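A quick check of the urldecode() edge case mentioned above, using the title C++ (whose DB key contains literal plus signs) as an example:

```php
<?php
// A DB key is not URL-encoded, so decoding it is wrong: urldecode()
// applies form-encoding rules and turns each "+" into a space.
$dbKey = 'C++';

var_dump(urldecode($dbKey)); // string(3) "C  "
var_dump($dbKey);            // string(3) "C++"  (use the key as-is)
```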

Bug: T358242
Change-Id: I9a0cb00bdf9d104a4b327d72b1ec94cf509883a2
2024-04-19 18:31:08 +05:30
Subramanya Sastry
9a46631029 ExtractBody: Convert page-internal link fragments to pure fragment urls
* This ensures that when you have query params like (?useparsoid=1),
  cite links resolve within the page instead of taking you to the
  non-Parsoid page.

* Additionally, this also unbreaks reference previews in local testing
  - not yet sure if this will fix all breakage in production.

* We don't have ready access to the title string and so this patch
  extracts it from a link tag in the <head> of Parsoid HTML. That is
  guaranteed to be correct and reliably present.

  But if this changes in the future (whether by adding the title to
  ParserOptions, ParserOutput, or the $opts array), we can use that
  directly.

* Added new unit tests that verify the new expectations.
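A simplified sketch of the conversion. The helper is hypothetical and is handed the page's prefixed DB key directly (the patch instead reads the title from the <head>); it only handles Parsoid's "./Title#fragment" form:

```php
<?php
// Hypothetical helper: reduce a same-page link to a pure fragment URL so the
// browser stays on the current URL, query string included.
function toPureFragment(string $href, string $pageDbKey): string
{
    $hashPos = strpos($href, '#');
    if ($hashPos === false) {
        return $href; // no fragment, nothing to do
    }
    $target = substr($href, 0, $hashPos);
    // Compare decoded forms; rawurldecode() leaves a literal "+" intact.
    if (rawurldecode($target) === './' . $pageDbKey) {
        return substr($href, $hashPos);
    }
    return $href;
}

echo toPureFragment('./Main_Page#cite_note-1', 'Main_Page'), "\n"; // #cite_note-1
echo toPureFragment('./Other_Page#Section', 'Main_Page'), "\n";   // ./Other_Page#Section
```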

Bug: T358242
Change-Id: Iaf482cc9803564b4cf4ae04f975573f61ff3b0e4
2024-04-12 15:36:01 +05:30
jenkins-bot
9517fdca82 Merge "HandleSectionLinks: Remove old debug logging for resolved bug" 2024-03-12 18:34:28 +00:00
Bartosz Dziewoński
0d99a1c445 HandleSectionLinks: Remove old debug logging for resolved bug
This is just a cleanup change. The exception should never happen,
but if it does, this can be reverted.

Change-Id: I26a7c4105d39d83015c09b779a2de3fd1ddacec1
2024-03-07 23:09:25 +01:00
Bartosz Dziewoński
94ac2ba845 Deprecate Linker::generateTOC() and related methods
Move the code to private methods in the only place that needs it.

Change-Id: I7aa038e055adc1aea9faafd17b86e304ee2ca758
2024-03-07 19:57:35 +00:00
Bartosz Dziewoński
87ac02d3a6 Deprecate Linker::makeHeadline()
Move the code to a private method in the only place that needs it.

Change-Id: Ie68a5324b2c789f44ffc495d05eb6957234cb9c8
2024-03-07 19:57:27 +00:00
Bartosz Dziewoński
c50d43ff65 HandleSectionLinks: Fix handling headings with raw > in attributes
Follow-up to Ibce512b3c4a52f74b2d2124f0159e306f2689ea5.

HEADING_REGEX will now correctly match opening tags when one of the
attributes contains an unencoded > character.

In a better world, this would not use regular expressions. However,
while implementing it as a DOM transformation is easy enough, doing so
causes never-ending test failures due to changes in HTML serialization,
so after discussion on the original patch we gave up on it for now.
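An illustrative pattern (not the actual HEADING_REGEX) showing how an opening-tag regex can step over quoted attribute values, compared with a naive `[^>]*` match:

```php
<?php
// Each attribute chunk is either a single non-quote, non-">" character or a
// whole double- or single-quoted string, so a ">" inside quotes never ends
// the tag.
const OPEN_HEADING = '/<h([1-6])\b((?:[^>"\']|"[^"]*"|\'[^\']*\')*)>/i';

$html = '<h2 id="x" data-note="a > b">Title</h2>';

// Naive version stops at the first ">", inside the attribute value.
preg_match('/<h[1-6][^>]*>/i', $html, $naive);
echo $naive[0], "\n"; // <h2 id="x" data-note="a >

if (preg_match(OPEN_HEADING, $html, $m) === 1) {
    echo $m[0], "\n"; // <h2 id="x" data-note="a > b">
    echo $m[1], "\n"; // 2
}
```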

Bug: T358810
Change-Id: Ibad4b29a988c2a4911ebe6512791042c46dd1a9b
2024-03-04 21:28:44 +01:00
C. Scott Ananian
55be7b1f09 Don't double-wrap headings when using DiscussionTools
DiscussionTools runs *before* this stage runs, and so we end up
wrapping headings which have already been wrapped by DiscussionTools.
Check for an existing wrapper to avoid this.

In the future, we will probably add a new post-cache transform hook
which is at the very *end* of the pipeline, instead of in the middle,
to avoid this sort of ordering dependency between extensions and core.

Bug: T357826
Change-Id: I8cd28a3b42e55844be1258d639e605862952806f
2024-02-17 15:31:13 -06:00
Subramanya Sastry
e55cc517da Move Parser to MediaWiki\Parser namespace
Bug: T166010
Co-Authored-By: Daimona Eaytoy <daimona.wiki@gmail.com>
Co-Authored-By: James Forrester <jforrester@wikimedia.org>
Co-Authored-By: Subramanya Sastry <ssastry@wikimedia.org>
Change-Id: I79b4e732c45095eedbaa80afa5eb7479b387ed8a
2024-02-16 09:18:38 -05:00
jenkins-bot
4d6589dc4a Merge "HandleSectionLinks: Remove warning when we don't find attributes" 2024-02-15 20:41:03 +00:00
C. Scott Ananian
b01eb624c4 [OutputTransform] Add section edit links to Parsoid output
Bug: T269630
Change-Id: I9d5fb6348609642ad94743cc5dae81ce608be99d
2024-02-15 13:09:02 -05:00
C. Scott Ananian
28a3371382 [OutputTransform] Remove broken and unused 'bodyContentOnly' option
This was formerly used by the REST API, but instead that code just
uses ParserOutput::getRawText() when it needs the full HTML document.
This option has been broken, with various passes like RenderDebugInfo
and AddWrapperDiv adding content in inappropriate places if
bodyContentOnly was false.

Change-Id: Ib45f95ded59c81c16d61803f977d1edbfe82b262
2024-02-15 13:05:53 -05:00
C. Scott Ananian
ff053ec155 [OutputTransform] Improve ContentDOMTransformStage
Make ContentDOMTransformStage handle Parsoid markup with PageBundle
information embedded in the ParserOutput.

Much of the complexity of this code should move to either Parsoid's
ContentUtils or else into the HtmlHolder abstraction (T347062).

Change-Id: Ib35ae38d84adc7df613d4c7de8930ed80e535634
2024-02-15 13:05:53 -05:00
jenkins-bot
fc685448e1 Merge "Namespace Message, move to appropriate directory" 2024-02-15 15:51:23 +00:00
Bartosz Dziewoński
87b06ee148 HandleSectionLinks: Remove warning when we don't find attributes
I realized that this code path is also triggered by a special page
transclusion that outputs headings, e.g. `{{Special:RecentChanges}}`.
It doesn't seem worth it to try to handle all these cases distinctly.

Follow-up to b26db1f866.

Change-Id: I389ea9210fcc184f41b6731409331dbd3d34d2ca
2024-02-15 12:40:56 +00:00
Arlo Breault
d2f4e8a456 Apply relative attr expansion to indicators
Bug: T357573
Change-Id: If80d0df6c927d0f6981ad56d01751f1aeaba83d3
2024-02-14 18:15:53 -05:00
Arlo Breault
269c93d3ac Resolve relative resource attributes as well
The resource attribute is used in read views for magnify links and
imagemap description links.  See Id46d1b2ab1af3baebff13e10f1485f3cfd9a4b37
and I20130fd39135dfd5074590ee9c2b6e01693384e4
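A simplified sketch of applying the same expansion to both href and resource attributes. The helper is hypothetical and only handles Parsoid's "./Target" form, not general relative URLs:

```php
<?php
// Hypothetical helper: expand "./Target" values against a base href.
function expandDotSlashAttrs(DOMDocument $doc, string $baseHref): void
{
    $xpath = new DOMXPath($doc);
    foreach (['href', 'resource'] as $attr) {
        foreach ($xpath->query("//*[@$attr]") as $el) {
            $value = $el->getAttribute($attr);
            if (str_starts_with($value, './')) {
                $el->setAttribute($attr, $baseHref . substr($value, 2));
            }
        }
    }
}

$doc = new DOMDocument();
$doc->loadHTML(
    '<a href="./File:Example.jpg" resource="./File:Example.jpg">x</a>',
    LIBXML_NOERROR | LIBXML_NOWARNING
);
expandDotSlashAttrs($doc, 'https://example.org/wiki/');
$a = $doc->getElementsByTagName('a')->item(0);
echo $a->getAttribute('resource'), "\n"; // https://example.org/wiki/File:Example.jpg
```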

Bug: T357573
Change-Id: I974701ba9eb77e8d0abc894d1091fcdd63b84684
2024-02-14 18:15:36 -05:00
James D. Forrester
eeb5a740b3 Namespace Message, move to appropriate directory
Bug: T353458
Change-Id: I088cbc53fbcdb974e5b05b45a62e91709dacc024
2024-02-14 15:10:36 -05:00
Bartosz Dziewoński
b26db1f866 Move section heading formatting to post-cache transform (forward-compat)
Split off from I4eae18d9d16f54391daba0de82ad05e50f07f9eb for
forward-compatibility, in case that patch needs to be reverted.
See that change for tests and explanation.

Bug: T13555
Change-Id: Ibce512b3c4a52f74b2d2124f0159e306f2689ea5
2024-02-09 23:45:42 +00:00