43 commits

Author SHA1 Message Date
jenkins-bot
d5bacc16f2 Merge "Make ContentDOMTransformStage not Parsoid specific" 2024-08-26 22:13:20 +00:00
Arlo Breault
03f9785ad4 Make ContentDOMTransformStage not Parsoid specific
Previously, it assumed Parsoid content and loaded/stored data attributes
unconditionally.  As a result, if this stage was subclassed for use in a
non-Parsoid pipeline, the DOM would be undesirably dirtied with Parsoid
ids or data-parsoid attributes.

Change-Id: I2f1af43d9c39140ce215e2145e51cc3b02f68923
2024-08-14 14:23:44 -04:00
James D. Forrester
bc662aec9b Move Language and friends into Language namespace
Bug: T353458
Change-Id: Id3202c0c4f4a2043bf97b7caee081acab684155c
2024-08-10 13:36:30 +02:00
C. Scott Ananian
b3ac045497 HandleParsoidSectionLinks: also run this pass if COLLAPSABLE_SECTIONS
Bug: T371336
Change-Id: Ieccddc229c39f65de6f2bba6364f933592686ade
2024-07-30 17:22:55 -04:00
Isabelle Hurbain-Palatin
91ab9807dd OutputTransform: Handle skipped tests in HydrateHeaderPlaceholders.php
A comment in I8744382dd24b28c623d0dc6569f800fb5489e6c1 mentions that two
tests are skipped. This patch fixes one of these skips, and makes the
other one more explicit.

Change-Id: Id5680fc163a9bfacfe797af619e40032cdee38b1
2024-07-11 21:59:22 +00:00
Derick Alangi
bf57de5aed [tests] OutputTransform: Make providers static methods
In the spirit of T332865, data providers should be made static methods
for the performance benefits.

Follow-up on: I7da0823d4686238003579a.

Change-Id: Ia08fcdd2c27f5cdc13ca8d12132a90c259ca2447
2024-07-06 14:01:50 +02:00
Timo Tijhof
58ce413a04 OutputTransform: Minor test clean up
* Use $this->assertEquals as per PHPUnit docs and other code in
  this repository.

* Remove redundant doc comments for data providers, as per
  current PHPCS presets and matching other code in the repo.
  Our latest PHPCS no longer requires native+doc blocks when there is
  no added information in the doc block. Since data providers aren't
  explicitly called by anyone, presumably there is no audience for
  them either.

* Widen `@covers` tags. Rationale in other commits
  at https://gerrit.wikimedia.org/r/q/owner:Krinkle+is:merged+message:Widen

Follows-up 8de2e66ca7.

Change-Id: I7da0823d4686238003579afe425a635541e9baf6
2024-07-05 19:23:57 +01:00
Isabelle Hurbain-Palatin
69b9733bce Use stripParsoidIds from Parsoid
Now that Ib263439ae221232ffe0902a0c58d155402fb7a17 is merged,
we can use it instead of keeping it in ParsoidLocalizationTest.
It solves the issue pointed out in
I7da0823d4686238003579afe425a635541e9baf6.

Change-Id: I8744382dd24b28c623d0dc6569f800fb5489e6c1
2024-07-05 13:56:24 +02:00
C. Scott Ananian
292709cc13 Use $stage::CONSTRUCTOR_OPTIONS in DefaultOutputPipelineFactory
Rather than have DefaultOutputPipelineFactory::CONSTRUCTOR_OPTIONS be a
union of all the options needed by all the stages, allow each stage to
define its own CONSTRUCTOR_OPTIONS and pass a Config object to the
DefaultOutputPipelineFactory service.

In the process, move the $options and $logger properties into the
abstract superclass, since they are passed to every stage.
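
The per-stage option pattern described above can be illustrated with a minimal Python sketch. This is not the MediaWiki implementation (which is PHP); the class names, option names, and the build_pipeline helper are hypothetical and only show the idea of each stage declaring the options it needs while a factory hands it just those values from a shared config.

```python
from typing import Any, Dict, List, Optional, Tuple, Type


class Stage:
    """Hypothetical abstract base: stores the options and logger that
    every stage receives, mirroring the move into the superclass."""

    CONSTRUCTOR_OPTIONS: Tuple[str, ...] = ()

    def __init__(self, options: Dict[str, Any], logger: Optional[Any]) -> None:
        self.options = options
        self.logger = logger


class HeadingStage(Stage):
    # Hypothetical option name, for illustration only.
    CONSTRUCTOR_OPTIONS = ("HeadingOption",)


class WrapperStage(Stage):
    # This stage needs no configuration at all.
    CONSTRUCTOR_OPTIONS = ()


def build_pipeline(
    stage_classes: List[Type[Stage]],
    config: Dict[str, Any],
    logger: Optional[Any] = None,
) -> List[Stage]:
    """Give each stage only the config keys it declared, instead of
    keeping one union of all options in the factory."""
    stages = []
    for cls in stage_classes:
        options = {name: config[name] for name in cls.CONSTRUCTOR_OPTIONS}
        stages.append(cls(options, logger))
    return stages
```

With this shape, adding an option to one stage does not require touching the factory's own option list.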

Bug: T363764
Followup-To: I64aeb81b395ba84e1d839dfbd31decf16c337cd0
Change-Id: I7d386b22c7d8e99b6dfe4cf798069914ac9af373
2024-06-10 20:53:21 -04:00
Arlo Breault
6011792afa Refactor DI in OutputTransform stages
Bug: T363764
Change-Id: I64aeb81b395ba84e1d839dfbd31decf16c337cd0
2024-06-10 16:30:06 -04:00
Arlo Breault
276fc1608a Inject MobileContext in DefaultOutputPipelineFactory
Change-Id: I613893fa236be956a4850a52a03a40e620c7ce64
2024-06-10 15:11:01 -04:00
Isabelle Hurbain-Palatin
da6f716c41 Fix serialization errors in PageBundle extensiondata
When going through a ContentDOMTransformStage, we try to move the
PageBundle when transforming the document from and to DOM. In the
current version of this code, this adds DataParsoid, a non-serializable
class, to ExtensionData, which breaks on ParserCache storage in later
steps.
This patch is pretty hacky, but it transforms the PageBundle structure
back to a stdClass so that it can be re-serialized before cache
insertion. The added test fails without this patch.
Hopefully we'll get rid of these hacks when using a HTMLHolder later.

Bug: T365036
Change-Id: Icc74edd43ea5098faebc21a084b6d483d6ab99d1
2024-05-17 09:47:18 -04:00
jenkins-bot
fb6ad0a08c Merge "Localization output transform" 2024-05-06 19:47:43 +00:00
Isabelle Hurbain-Palatin
8de2e66ca7 Localization output transform
This is an output transform to resolve the mw:I18n and mw:LocalizedAttrs
to their localized forms.

Bug: T358191
Change-Id: Id32bc05ff72eb2d9fba7f8c2f192c9f7812cbc70
2024-05-06 15:24:38 -04:00
Bartosz Dziewoński
f0c7fa9234 Move section edit links outside headings (new heading HTML)
Legacy parser can now output headings using a more accessible markup,
which is also identical to the markup used by the Parsoid parser.

Changes to client-side JS and CSS necessary to support the new markup
have already been merged in earlier commits.

includes/skins/Skin.php
includes/ServiceWiring.php
* Define a new skin option, 'supportsMwHeading', which can be used
  to toggle the new markup per-skin.
* Update the built-in fallback skin to enable it. This affects the
  output in parser tests.

docs/config-schema.yaml
includes/config-schema.php
includes/config-vars.php
includes/MainConfigNames.php
includes/MainConfigSchema.php
* Add a new configuration setting, 'ParserEnableLegacyHeadingDOM',
  which can be used to toggle the new markup per-site.

includes/OutputTransform/Stages/HandleSectionLinks.php
* Output new heading HTML for skins that enabled the option.

tests/*
* Duplicate parser tests that cover heading generation to cover both
  new and old markup. Update other parser tests to use new markup.
* Add some unit and integration tests for the behavior of the skin
  option and some parser tests for edge cases of the new markup.

Bug: T13555
Change-Id: I1180169a8e83af834c2984ba16089e6277f2a8dd
2024-05-06 12:25:33 -04:00
Timo Tijhof
c02513c97e phpunit: Fix tests relying on implicit wgScript/wgArticlePath
A number of tests have hardcoded expectations that pass only in WMF CI
where Quibble has LocalSettings.php with $wgScript and $wgArticlePath
set a certain way.

We could fix these by adding setMwGlobals() in their tests, as we
often do, but these are so often forgotten that I'd rather we just
add them to TestSetup.php so that it is simply impossible to write a
test that passes locally for you (if you have the same config)
but not for someone else.

There is a larger project in there somewhere about expanding this
slowly such that we basically only pluck DB-settings and extension
enablement from LocalSettings and otherwise run the tests with the
default settings in PHPUnit. Pretty much by definition, any (other)
setting you have in LocalSettings is irrelevant because it either:
1. has no effect on the test (majority, harmless either way),
2. has a custom default via TestSetup.php (which has precedence over
   LocalSettings.php),
3. is relevant to the code being tested and the test case correctly
   calls setMwGlobals() to ensure a consistent value during test, or
4. is relevant to the tested code but has no override, thus only
   passes if you happen to have the "right" value set for it
   (undesirable).

Case 4 is already categorically impossible for the most common config
settings that influence random code because we give them a value
in TestSetup.php. This patch expands that to include $wgScript
and $wgArticlePath. Perhaps in the future we can think about a way
to do this automatically by either re-applying MainConfigSchema
(sans db settings) or by only selectively applying LocalSettings.php
in the first place.

This patch follows-up I072ddf89562fe, which added a test case in
WikitextContentHandlerIntegrationTest.php that assumed "/index.php"
as the value of $wgScript. This passes in WMF CI since Quibble uses
that value, but the tests failed in most local development installs
since those tend to use "/w" instead.

Rather than one-off fixing that one test with overrideConfigValues(),
switch to a more general fixture, since the precise values don't
matter for this test.

Bug: T349087
Bug: T277470
Change-Id: If4304b7ca4a838bd892d4516a0b5c6dfbc30986e
2024-05-05 00:00:01 +00:00
Isabelle Hurbain-Palatin
03c4ffe137 Replace TOC markers only once, if any
The legacy parser only allows for a single insertion of TOC (it drops
later __TOC__ magic words). On Parsoid, we can end up with multiple TOC
markers (which we want to keep around for round-trip reasons), so we
need to discard them in the HandleTOCMarkers phase.
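
A minimal Python sketch of the "replace once, discard the rest" behavior described above. The marker syntax (`<toc-marker/>`) and the function name are hypothetical; the commit message does not show the real marker or the PHP code.

```python
import re

# Hypothetical marker syntax, assumed for illustration only.
TOC_MARKER = re.compile(r"<toc-marker/>")


def replace_toc_markers(html: str, toc_html: str) -> str:
    """Replace the first TOC marker with the rendered TOC and drop
    every later marker."""
    seen = False

    def substitute(match: "re.Match[str]") -> str:
        nonlocal seen
        if seen:
            return ""  # later markers are discarded
        seen = True
        return toc_html  # only the first marker receives the TOC

    return TOC_MARKER.sub(substitute, html)
```

A callable replacement is used so that backslashes in the TOC HTML are not interpreted as regex escapes.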

Bug: T359882
Change-Id: I60fdfc2c52680ed53e48d1931fd7f5c937b437a2
2024-05-02 14:54:43 +02:00
jenkins-bot
b671e574eb Merge "Add ParserOptions::setCollapsibleSections()" 2024-04-29 21:17:15 +00:00
C. Scott Ananian
8d031bcf87 Add ParserOptions::setCollapsibleSections()
This is a non-default option that will add a <div> wrapper around
section contents to allow client-side collapsing.  This is intended
for use by MobileFrontEnd, but could eventually be enabled for
desktop read views as well.

Since this parser option is in the "cache-varying options" set, any
caller who sets this option will fork the cache for that page, which
is reasonable as the parser option sets a ParserOutput property.
In the future our caching strategy will get smarter and we'll add
code which avoids the cache split and just transfers the appropriate
values from ParserOptions to ParserOutput flags after the cached
output is retrieved.

Bug: T359001
Change-Id: Ie93959a056ed15a728404eb293e4bb6eeaeb15c0
2024-04-29 12:11:09 -04:00
Subramanya Sastry
d47c70ddac ExtractBody: Use page title recorded in ParserOutput
* Followup to 9a466310
* I had previously added page title info to ParserOutput as part of
  6e5413b1, but while working on 9a466310, we didn't realize that.
* Removed urldecode(..) since output of Title::getPrefixedDBKey
  isn't urlencoded and urldecode converts "+" into " ". A new test
  ensures that edge case works properly.
* Simplify testing + add additional test to ensure title normalization
  doesn't trip up the transform.

Bug: T358242
Change-Id: I9a0cb00bdf9d104a4b327d72b1ec94cf509883a2
2024-04-19 18:31:08 +05:30
Subramanya Sastry
9a46631029 ExtractBody: Convert page-internal link fragments to pure fragment urls
* This ensures that when you have query params like (?useparsoid=1),
  all cite links no longer take you to the non-Parsoid page but
  resolve internally.

* Additionally, this also unbreaks reference previews in local testing
  - not yet sure if this will fix all breakage in production.

* We don't have ready access to the title string and so this patch
  extracts it from a link tag in the <head> of Parsoid HTML. That is
  guaranteed to be correct and reliably present.

  But, if in the future, this changes (whether by adding it to
  ParserOptions, ParserOutput, or the $opts array), we can use that
  directly.

* Added new unit tests that verify the new expectations.
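
A rough Python sketch of the fragment rewrite described in this commit. It assumes page-internal links have the shape "./<title>#<fragment>"; that shape and the function name are assumptions for illustration, not taken from the patch.

```python
def to_fragment_url(href: str, page_title: str) -> str:
    """Turn a link that points back at the current page into a pure
    fragment URL; leave links to other pages untouched."""
    prefix = "./" + page_title + "#"  # assumed shape of a page-internal link
    if href.startswith(prefix):
        return "#" + href[len(prefix):]
    return href
```

Because the result carries no path or query string, the browser resolves it against the current URL, so parameters such as ?useparsoid=1 are preserved.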

Bug: T358242
Change-Id: Iaf482cc9803564b4cf4ae04f975573f61ff3b0e4
2024-04-12 15:36:01 +05:30
Bartosz Dziewoński
0d99a1c445 HandleSectionLinks: Remove old debug logging for resolved bug
This is just a cleanup change. The exception should never happen,
but if it does, this can be reverted.

Change-Id: I26a7c4105d39d83015c09b779a2de3fd1ddacec1
2024-03-07 23:09:25 +01:00
jenkins-bot
1071fe9614 Merge "HandleSectionLinks: Fix handling headings with raw > in attributes" 2024-03-04 21:09:43 +00:00
Bartosz Dziewoński
c50d43ff65 HandleSectionLinks: Fix handling headings with raw > in attributes
Follow-up to Ibce512b3c4a52f74b2d2124f0159e306f2689ea5.

HEADING_REGEX will now correctly match opening tags when one of the
attributes contains an unencoded > character.

In a better world, this would not use regular expressions. However,
while implementing it as a DOM transformation is easy enough, doing so
causes never-ending test failures due to changes in HTML serialization,
so we gave up on it for now after discussion on the original patch.
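
To illustrate the kind of fix described above, here is a Python sketch contrasting a naive opening-tag pattern with one that skips over quoted attribute values. Neither pattern is the actual HEADING_REGEX from the patch; both are assumptions written for this example.

```python
import re

# Naive: stops at the first ">" even if it sits inside an attribute value.
NAIVE_HEADING = re.compile(r"<h([1-6])\b[^>]*>")

# Quote-aware: consumes whole quoted values, so a raw ">" inside them
# does not end the tag.
ROBUST_HEADING = re.compile(r"""<h([1-6])\b(?:[^>"']|"[^"]*"|'[^']*')*>""")


def opening_tag(pattern: "re.Pattern[str]", html: str) -> str:
    """Return the text the pattern treats as the heading's opening tag."""
    match = pattern.match(html)
    return match.group(0) if match else ""


tag = '<h2 title="a > b" id="x">Heading</h2>'
print(opening_tag(NAIVE_HEADING, tag))   # <h2 title="a >
print(opening_tag(ROBUST_HEADING, tag))  # <h2 title="a > b" id="x">
```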

Bug: T358810
Change-Id: Ibad4b29a988c2a4911ebe6512791042c46dd1a9b
2024-03-04 21:28:44 +01:00
James D. Forrester
fe1fbb3a5c build: Upgrade mediawiki/mediawiki-codesniffer to v43.0.0
Depends-On: I5349d3378b5acd04f0d7c60072a9b1e3dd8f2052
Change-Id: I3b7fd4c460418e72ed0c36febef75f41bad0afb1
2024-03-01 15:58:13 -05:00
jenkins-bot
a3fa07e4d4 Merge "Move section heading formatting to post-cache transform (take 2)" 2024-02-23 05:15:21 +00:00
Dreamy Jazz
2771117900 Disable hook in ExecutePostCacheTransformHooksTest::testTransform
Why:
* The ExecutePostCacheTransformHooksTest::testTransform test was
  failing due to needing to use the DB. This was addressed in
  7358ddd62f, but that change then caused the assertion in the test
  to fail because VisualEditor modified the output.
* Disabling the SkinEditSectionLinks hook for the test should fix
  the test and does not cause test failures on my local machine.

What:
* Call ::clearHook with the 'SkinEditSectionLinks' hook in the
  ExecutePostCacheTransformHooksTest::testTransform test.

Bug: T358103
Change-Id: Ia05cfd1eb572639c117fd264e3c05265adb38e32
2024-02-21 20:12:03 +00:00
Dreamy Jazz
7358ddd62f Make ExecutePostCacheTransformHooksTest a database test
Why:
* The ExecutePostCacheTransformHooksTest core test is not currently
  a database test but is an integration test case.
* However, ::testTransform calls a hook and VisualEditor provides
  a handler for SkinEditSectionLinks that reads from the DB which
  is called by this test.
* Adding the test class to the database group will fix this by
  allowing VisualEditor to use the database in the handler as part
  of the test.

What:
* Add `@group Database` to ExecutePostCacheTransformHooksTest.php

Bug: T358103
Change-Id: Ib3b361f07d5411e4951156059dee11dc5367dffb
2024-02-21 14:13:28 +00:00
C. Scott Ananian
55be7b1f09 Don't double-wrap headings when using DiscussionTools
Discussion Tools runs *before* this stage runs, and so we end up
wrapping headings which have already been wrapped by discussion tools.
Check for an existing wrapper to avoid this.

In the future, we will probably add a new post-cache transform hook
which is at the very *end* of the pipeline, instead of in the middle,
to avoid this sort of ordering dependency between extensions and core.

Bug: T357826
Change-Id: I8cd28a3b42e55844be1258d639e605862952806f
2024-02-17 15:31:13 -06:00
C. Scott Ananian
6fe103c0ed [tests] use @dataProvider to OutputTransformStageTestBase
Needed to create a mock Skin for one test case in order to avoid using
the ServiceContainer prematurely.

Change-Id: Iaa33dfd2b187ac3a1fc44ea46f3b88ef29a62098
2024-02-16 19:33:28 -05:00
Bartosz Dziewoński
834ff25dc1 Move section heading formatting to post-cache transform (take 2)
[Previously attempted in de0646843a,
reverted in e72e1cd16368346b66853f68e2d13f9b416d5a11.]

Previously, Parser.php used Linker::makeHeadline() in order to
generate the `<h2><span class="mw-headline" id="...">...</span></h2>`
markup for section headings, and this was saved in the parser cache.
Now it generates heading tags with placeholder attributes like
`<h2 data-mw-...="..." ...>...</h2>`, and they are replaced in a
post-cache transform to generate the final heading markup, similarly
to how section edit links already worked.

The purpose of these changes is to allow changing the final markup
depending on skin options without splitting the parser cache (T13555).

Deployment and undeployment safety:
* The new post-cache transform has been already added in commit
  Ibce512b3c4a52f74b2d2124f0159e306f2689ea5 for forward-compatibility
  (so that if this patch is reverted, new parser cache entries
  will still be shown correctly).

Implementation notes:
* There are many ways to keep the temporary information other than
  `data-mw-...` attributes, but this way is the easiest to handle
  in a post-cache transform (everything is on the DOM node we want
  to modify), is compatible with other heading-enhancing code in
  DiscussionTools and MobileFrontend, and remains human-readable
  if the post-cache transform doesn't run.
* Sadly this code can't be reused to add section heading markup and
  section edit links to Parsoid (T269630), because it lacks some of
  the necessary metadata, and exposes the rest in ways that are
  trickier to handle in a post-cache transform (on other DOM nodes
  or outside the document).

Depends-On: If85f89c40834618f23dc0ace2e599efb3b6d5ed4
Bug: T13555
Change-Id: If04d72f427ec3c3730e757cbb3ade8840c09f7d3
2024-02-16 20:28:56 +00:00
Reedy
e94e265a93 tests: Add Tests to PHP namespacing
Change-Id: I849268172751d50292e93aa75abe8094873f56bc
2024-02-16 19:10:11 +00:00
C. Scott Ananian
e72e1cd163 Revert "Move section heading formatting to post-cache transform"
This reverts commit de0646843a.

Reason for revert: caused T357723.

Change-Id: I4690c03a34e8796090563e19a214d8ede63fe5d1
2024-02-15 20:58:32 +00:00
Bartosz Dziewoński
de0646843a Move section heading formatting to post-cache transform
Previously, Parser.php used Linker::makeHeadline() in order to
generate the `<h2><span class="mw-headline" id="...">...</span></h2>`
markup for section headings, and this was saved in the parser cache.
Now it generates heading tags with placeholder attributes like
`<h2 data-mw-...="..." ...>...</h2>`, and they are replaced in a
post-cache transform to generate the final heading markup, similarly
to how section edit links already worked.

The purpose of these changes is to allow changing the final markup
depending on skin options without splitting the parser cache (T13555).

Deployment and undeployment safety:
* The new post-cache transform has been already added in commit
  Ibce512b3c4a52f74b2d2124f0159e306f2689ea5 for forward-compatibility
  (so that if this patch is reverted, new parser cache entries
  will still be shown correctly).

Implementation notes:
* There are many ways to keep the temporary information other than
  `data-mw-...` attributes, but this way is the easiest to handle
  in a post-cache transform (everything is on the DOM node we want
  to modify), is compatible with other heading-enhancing code in
  DiscussionTools and MobileFrontend, and remains human-readable
  if the post-cache transform doesn't run.
* Sadly this code can't be reused to add section heading markup and
  section edit links to Parsoid (T269630), because it lacks some of
  the necessary metadata, and exposes the rest in ways that are
  trickier to handle in a post-cache transform (on other DOM nodes
  or outside the document).

Bug: T13555
Change-Id: I4eae18d9d16f54391daba0de82ad05e50f07f9eb
2024-02-15 13:09:08 -05:00
C. Scott Ananian
28a3371382 [OutputTransform] Remove broken and unused 'bodyContentOnly' option
This was formerly used by the REST API, but that code now just
uses ParserOutput::getRawText() when it needs the full HTML document.
This option has been broken, with various passes like RenderDebugInfo
and AddWrapperDiv adding content in inappropriate places if
bodyContentOnly was false.

Change-Id: Ib45f95ded59c81c16d61803f977d1edbfe82b262
2024-02-15 13:05:53 -05:00
James D. Forrester
4bae64d1c7 Namespace includes/context
Bug: T353458
Change-Id: I4dbef138fd0110c14c70214282519189d70c94fb
2024-02-08 11:07:01 -05:00
Isabelle Hurbain-Palatin
ec9dc3d4c4 Rename PostCacheTransformHookRunner
Follow-up to I53551ec6d6471569709c71c1155729e550f64de8.

Bug: T348253
Change-Id: Ia08624a6770070313bf8bbaa11df29e4ed30b73b
2024-02-07 13:01:20 -05:00
Daimona Eaytoy
f2a9836df0 tests: Rename OutputTransformStageTest for PHPUnit 9.6
Abstract test classes are no longer allowed to end in "Test" as of
PHPUnit 9.6.

Follow-up: I53551ec6d6
Bug: T342110
Change-Id: I9638c2937f8b702851d080ab217fbc34620fabb6
2024-01-17 17:41:36 +01:00
Aaron Schulz
f4261e029f Clean up MediaWiki\OutputTransform namespace casing confusion
The case mismatch was causing confusing PHP errors about missing
classes during paratest runs.

Change-Id: Iaddddd2ff825e41609e915938bc27c0bc4bba245
2024-01-05 17:53:19 -08:00
Isabelle Hurbain-Palatin
7f63d5250e Revert "Use Remex for DeduplicateStyles transform"
This reverts commit 82da9cf14b.

Passing through Remex seems to have unexpected consequences to be
investigated but, for the sake of unbreaking the UBN, let's revert this
first.

Bug: T353920
Change-Id: Iaac7942aa77aee5ab525852ac5b41dd516ff13c9
2023-12-22 11:26:09 +01:00
C. Scott Ananian
82da9cf14b Use Remex for DeduplicateStyles transform
The previous implementation was using an ad-hoc regular expression which
was matching inside the data-mw attribute of Parsoid output, eg:

 <sup about="#mwt42" [...] typeof="mw:Extension/ref mw:Error" data-mw="{&quot;name&quot;:&quot;ref&quot;,&quot;attrs&quot;:{&quot;name&quot;:&quot;infobox_stats_ref_rail&quot;},&quot;body&quot;:{&quot;html&quot;:&quot;<style data-mw-deduplicate=\&quot;TemplateStyles:r1133582631\&quot; typeof=\&quot;...">

After substitution, the <link> element inserted contained " instead of
&quot; and so broke out of the attribute.

Instead use a proper HTML tokenizer (via wikimedia/remex-html) so that
we don't allow bogus matches inside attribute values.
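
The difference between the two approaches can be sketched in Python, with the standard library's html.parser standing in for a real HTML tokenizer (the actual change uses wikimedia/remex-html in PHP). The input string, the naive pattern, and the helper names are hypothetical.

```python
import re
from html.parser import HTMLParser

# Hypothetical input: one real <style> element, plus a <sup> whose data-mw
# attribute value contains the text of another <style> tag.
HTML = (
    '<style data-mw-deduplicate="TemplateStyles:r1">.a{}</style>'
    '<sup data-mw="{&quot;html&quot;:&quot;<style data-mw-deduplicate='
    '\\&quot;TemplateStyles:r1\\&quot;>&quot;}">x</sup>'
)

# Ad-hoc pattern (an assumption, not the original regex): it cannot tell
# whether it is looking at markup or at text inside an attribute value.
NAIVE = re.compile(r"<style\b[^>]*data-mw-deduplicate=")


class StyleCounter(HTMLParser):
    """Count real <style data-mw-deduplicate> start tags."""

    def __init__(self) -> None:
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "style" and dict(attrs).get("data-mw-deduplicate"):
            self.count += 1


def count_naive(html: str) -> int:
    return len(NAIVE.findall(html))


def count_tokenized(html: str) -> int:
    parser = StyleCounter()
    parser.feed(html)
    parser.close()
    return parser.count
```

On this input the regex reports two style tags while the tokenizer reports one, which is the bogus match the commit describes.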

To fix up tests:
* Don't deduplicate styles when parsing UX messages (also helps performance)
* Don't deduplicate styles in ContentHandler integration tests
* Don't deduplicate styles by default in parser tests
  (unless explicit option is set)

Depends-On: Id9801a9ff540bd818a32bc6fa35c48a9cff12d3a
Depends-On: I5111f1fdb7140948b82113adbc774af286174ab3
Followup-To: Ic0b17e361bf6eb0e71c498abc17f5f67f82318f8
Change-Id: I32d3d1772243c3819e1e1486351d16871b6e21c4
2023-12-15 17:49:21 +01:00
James D. Forrester
9bfb75ff90 Namespace ParserOutput
Most used non-namespaced class!

Bug: T353458
Change-Id: I4c2cbb0a808b3881a4d6ca489eee5d8c8ebf26cf
2023-12-14 14:57:34 -05:00
Isabelle Hurbain-Palatin
a3f51c732d Refactor DefaultOutputTransform into a pipeline of transforms
Bug: T348253
Change-Id: I53551ec6d6471569709c71c1155729e550f64de8
2023-12-08 18:06:19 -05:00