
24 commits

Author SHA1 Message Date
Subramanya Sastry
d47c70ddac ExtractBody: Use page title recorded in ParserOutput
* Followup to 9a466310
* I had previously added page title info to ParserOutput as part of
  6e5413b1, but we didn't realize that while working on 9a466310.
* Removed urldecode(..) since the output of Title::getPrefixedDBKey
  isn't urlencoded, and urldecode converts "+" into " ". A new test
  ensures that this edge case works properly.
* Simplified testing and added a test to ensure title normalization
  doesn't trip up the transform.
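A quick way to see the edge case: PHP's urldecode() uses form-style decoding and turns "+" into a space, whereas rawurldecode() only decodes %XX sequences. Python's urllib.parse has the same split (unquote_plus() vs. unquote()), so this minimal sketch, assuming the title arrives as a plain, not URL-encoded DB key such as "C++", shows what the removed call would have done:

```python
from urllib.parse import unquote, unquote_plus

# Assumption: the title is a plain DB key that was never URL-encoded,
# e.g. what Title::getPrefixedDBKey() would return for the page "C++".
db_key = "C++"

# Form-style decoding (like PHP's urldecode) turns each "+" into a space,
# corrupting a title that was never encoded in the first place.
corrupted = unquote_plus(db_key)

# Percent-only decoding (like PHP's rawurldecode) leaves "+" alone.
preserved = unquote(db_key)

print(repr(corrupted))  # 'C  '
print(repr(preserved))  # 'C++'
```

Since the key was never encoded, skipping decoding altogether is simpler than switching decoders.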

Bug: T358242
Change-Id: I9a0cb00bdf9d104a4b327d72b1ec94cf509883a2
2024-04-19 18:31:08 +05:30
Subramanya Sastry
9a46631029 ExtractBody: Convert page-internal link fragments to pure fragment urls
* This ensures that when the URL has query params (like ?useparsoid=1),
  cite links resolve within the current page instead of taking you to
  the non-Parsoid page.

* This also unbreaks reference previews in local testing; it is not
  yet clear whether it will fix all breakage in production.

* We don't have ready access to the title string and so this patch
  extracts it from a link tag in the <head> of Parsoid HTML. That is
  guaranteed to be correct and reliably present.

  If this changes in the future (whether the title is added to
  ParserOptions, ParserOutput, or the $opts array), we can use that
  directly.

* Added new unit tests that verify the new expectations.
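The rewrite described above can be sketched as a small string check. This is a hypothetical helper, not the ExtractBody implementation; it assumes Parsoid-style relative hrefs of the form `./Title#fragment` and a page title in the same DB-key form (underscores, no URL encoding).

```python
def to_pure_fragment(href: str, page_title: str) -> str:
    """Rewrite a link to the current page into a fragment-only URL.

    Hypothetical helper. Assumes hrefs such as "./Main_Page#cite_note-1"
    and a page title in the same (DB key) form, e.g. "Main_Page".
    """
    prefix = "./" + page_title + "#"
    if href.startswith(prefix):
        # Keep only "#fragment": slice from the "#" at the end of the prefix.
        return href[len(prefix) - 1:]
    # Links to other pages (or to this page without a fragment) are untouched.
    return href


print(to_pure_fragment("./Main_Page#cite_note-1", "Main_Page"))  # #cite_note-1
print(to_pure_fragment("./Other_Page#History", "Main_Page"))     # ./Other_Page#History
```

A fragment-only href resolves against the current document URL, so a query string such as ?useparsoid=1 is kept when the reader follows the link.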

Bug: T358242
Change-Id: Iaf482cc9803564b4cf4ae04f975573f61ff3b0e4
2024-04-12 15:36:01 +05:30
Bartosz Dziewoński
0d99a1c445 HandleSectionLinks: Remove old debug logging for resolved bug
This is just a cleanup change. The exception should never happen,
but if it does, this can be reverted.

Change-Id: I26a7c4105d39d83015c09b779a2de3fd1ddacec1
2024-03-07 23:09:25 +01:00
jenkins-bot
1071fe9614 Merge "HandleSectionLinks: Fix handling headings with raw > in attributes" 2024-03-04 21:09:43 +00:00
Bartosz Dziewoński
c50d43ff65 HandleSectionLinks: Fix handling headings with raw > in attributes
Follow-up to Ibce512b3c4a52f74b2d2124f0159e306f2689ea5.

HEADING_REGEX will now correctly match opening tags when one of the
attributes contains an unencoded > character.
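The failure mode is easy to reproduce with any pattern that stops at the first `>`. The sketch below is not MediaWiki's actual HEADING_REGEX; it is a hypothetical, simplified pair of patterns, and the attribute name `data-mw-example` is made up for illustration. The quote-aware version consumes quoted attribute values as units, so a `>` inside quotes no longer ends the tag.

```python
import re

# Hypothetical naive pattern: [^>]* stops at the first ">" it sees,
# even when that ">" sits inside a quoted attribute value.
NAIVE = re.compile(r'<h([1-6])\b[^>]*>')

# Hypothetical quote-aware pattern: each iteration consumes a whole
# double-quoted value, a whole single-quoted value, or one character that
# is neither a quote nor ">", so only an unquoted ">" can close the tag.
HEADING = re.compile(r"""<h([1-6])\b((?:"[^"]*"|'[^']*'|[^'">])*)>""")

SAMPLE = '<h2 data-mw-example="a>b" id="x">Title</h2>'

print(NAIVE.match(SAMPLE).group(0))    # <h2 data-mw-example="a>
print(HEADING.match(SAMPLE).group(0))  # <h2 data-mw-example="a>b" id="x">
```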

In a better world, this would not use regular expressions. However,
while implementing it as a DOM transformation is easy enough, doing so
causes never-ending test failures due to changes in HTML serialization,
so after discussion on the original patch we gave up on it for now.

Bug: T358810
Change-Id: Ibad4b29a988c2a4911ebe6512791042c46dd1a9b
2024-03-04 21:28:44 +01:00
James D. Forrester
fe1fbb3a5c build: Upgrade mediawiki/mediawiki-codesniffer to v43.0.0
Depends-On: I5349d3378b5acd04f0d7c60072a9b1e3dd8f2052
Change-Id: I3b7fd4c460418e72ed0c36febef75f41bad0afb1
2024-03-01 15:58:13 -05:00
jenkins-bot
a3fa07e4d4 Merge "Move section heading formatting to post-cache transform (take 2)" 2024-02-23 05:15:21 +00:00
Dreamy Jazz
2771117900 Disable hook in ExecutePostCacheTransformHooksTest::testTransform
Why:
* The ExecutePostCacheTransformHooksTest::testTransform test was
  failing because it needed to use the DB. This was addressed in
  7358ddd62f, but the assertion in the test then failed because
  VisualEditor modified the output.
* Disabling the SkinEditSectionLinks hook for the test should fix
  the test and does not cause test failures on my local machine.

What:
* Call ::clearHook with the 'SkinEditSectionLinks' hook in the
  ExecutePostCacheTransformHooksTest::testTransform test.

Bug: T358103
Change-Id: Ia05cfd1eb572639c117fd264e3c05265adb38e32
2024-02-21 20:12:03 +00:00
Dreamy Jazz
7358ddd62f Make ExecutePostCacheTransformHooksTest a database test
Why:
* The ExecutePostCacheTransformHooksTest core test is not currently
  a database test but is an integration test case.
* However, ::testTransform calls a hook, and VisualEditor provides
  a SkinEditSectionLinks handler, invoked by this test, that reads
  from the DB.
* Adding the test class to the database group will fix this by
  allowing VisualEditor to use the database in the handler as part
  of the test.

What:
* Add `@group Database` to ExecutePostCacheTransformHooksTest.php

Bug: T358103
Change-Id: Ib3b361f07d5411e4951156059dee11dc5367dffb
2024-02-21 14:13:28 +00:00
C. Scott Ananian
55be7b1f09 Don't double-wrap headings when using DiscussionTools
DiscussionTools runs *before* this stage, so we end up wrapping
headings that DiscussionTools has already wrapped. Check for an
existing wrapper to avoid this.
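A minimal sketch of the "check for an existing wrapper" idea, using Python's xml.etree.ElementTree on well-formed markup (real parser output is HTML, so this is a simplifying assumption). The wrapper element and its class name are hypothetical; the point is that the wrapping step becomes idempotent, so running it after something else has already wrapped a heading changes nothing.

```python
import xml.etree.ElementTree as ET

WRAPPER_CLASS = "example-heading-wrapper"  # hypothetical class name
HEADINGS = {f"h{n}" for n in range(1, 7)}


def is_wrapper(el):
    return el.tag == "div" and WRAPPER_CLASS in el.get("class", "").split()


def wrap_headings(root):
    # Snapshot the elements first so newly created wrappers are not revisited.
    for parent in list(root.iter()):
        if is_wrapper(parent):
            continue  # heading already wrapped earlier; leave it alone
        for i, child in enumerate(list(parent)):
            if child.tag in HEADINGS:
                wrapper = ET.Element("div", {"class": WRAPPER_CLASS})
                # Text following the heading now follows the wrapper instead.
                wrapper.tail, child.tail = child.tail, None
                parent.remove(child)
                wrapper.append(child)
                parent.insert(i, wrapper)
    return root


doc = ET.fromstring(
    '<body><h2>A</h2><div class="example-heading-wrapper"><h2>B</h2></div></body>'
)
first = ET.tostring(wrap_headings(doc), encoding="unicode")
second = ET.tostring(wrap_headings(doc), encoding="unicode")  # no-op
print(first == second)  # True
```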

In the future, we will probably add a new post-cache transform hook
which is at the very *end* of the pipeline, instead of in the middle,
to avoid this sort of ordering dependency between extensions and core.

Bug: T357826
Change-Id: I8cd28a3b42e55844be1258d639e605862952806f
2024-02-17 15:31:13 -06:00
C. Scott Ananian
6fe103c0ed [tests] use @dataProvider to OutputTransformStageTestBase
Needed to create a mock Skin for one test case in order to avoid using
the ServiceContainer prematurely.

Change-Id: Iaa33dfd2b187ac3a1fc44ea46f3b88ef29a62098
2024-02-16 19:33:28 -05:00
Bartosz Dziewoński
834ff25dc1 Move section heading formatting to post-cache transform (take 2)
[Previously attempted in de0646843a,
reverted in e72e1cd16368346b66853f68e2d13f9b416d5a11.]

Previously, Parser.php used Linker::makeHeadline() in order to
generate the `<h2><span class="mw-headline" id="...">...</span></h2>`
markup for section headings, and this was saved in the parser cache.
Now it generates heading tags with placeholder attributes like
`<h2 data-mw-...="..." ...>...</h2>`, and they are replaced in a
post-cache transform to generate the final heading markup, similarly
to how section edit links already worked.
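The two-phase approach can be sketched as a cacheable first pass that emits a placeholder attribute and a post-cache pass that expands it. This is a simplified illustration, not Parser.php or the real transform: the attribute name `data-mw-example-anchor` is hypothetical (the real `data-mw-...` names are not spelled out here), the regex assumes no `>` inside attribute values, and the `legacy_span=False` markup is an invented alternative standing in for "a different skin option".

```python
import html
import re

# Hypothetical placeholder attribute emitted by the cacheable pass.
PLACEHOLDER = re.compile(
    r'<h([1-6]) data-mw-example-anchor="([^"]*)">(.*?)</h\1>', re.DOTALL
)


def cacheable_heading(level, anchor, inner_html):
    """First pass: markup that is safe to store in the parser cache."""
    return f'<h{level} data-mw-example-anchor="{html.escape(anchor)}">{inner_html}</h{level}>'


def expand_headings(cached, legacy_span=True):
    """Post-cache pass: choose the final markup per request, not per cache entry."""
    def repl(m):
        level, anchor, inner = m.group(1), m.group(2), m.group(3)
        if legacy_span:
            return f'<h{level}><span class="mw-headline" id="{anchor}">{inner}</span></h{level}>'
        # Hypothetical alternative markup that another skin option might want.
        return f'<h{level} id="{anchor}">{inner}</h{level}>'
    return PLACEHOLDER.sub(repl, cached)


cached = cacheable_heading(2, "History", "History")
print(expand_headings(cached))
print(expand_headings(cached, legacy_span=False))
```

Both outputs come from the same cached string, which is why this design avoids splitting the parser cache by skin options.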

The purpose of these changes is to allow changing the final markup
depending on skin options without splitting the parser cache (T13555).

Deployment and undeployment safety:
* The new post-cache transform has already been added in commit
  Ibce512b3c4a52f74b2d2124f0159e306f2689ea5 for forward-compatibility
  (so that if this patch is reverted, new parser cache entries
  will still be shown correctly).

Implementation notes:
* There are many ways to keep the temporary information other than
  `data-mw-...` attributes, but this way is the easiest to handle
  in a post-cache transform (everything is on the DOM node we want
  to modify), is compatible with other heading-enhancing code in
  DiscussionTools and MobileFrontend, and remains human-readable
  if the post-cache transform doesn't run.
* Sadly this code can't be reused to add section heading markup and
  section edit links to Parsoid (T269630), because it lacks some of
  the necessary metadata, and exposes the rest in ways that are
  trickier to handle in a post-cache transform (on other DOM nodes
  or outside the document).

Depends-On: If85f89c40834618f23dc0ace2e599efb3b6d5ed4
Bug: T13555
Change-Id: If04d72f427ec3c3730e757cbb3ade8840c09f7d3
2024-02-16 20:28:56 +00:00
Reedy
e94e265a93 tests: Add Tests to PHP namespacing
Change-Id: I849268172751d50292e93aa75abe8094873f56bc
2024-02-16 19:10:11 +00:00
C. Scott Ananian
e72e1cd163 Revert "Move section heading formatting to post-cache transform"
This reverts commit de0646843a.

Reason for revert: caused T357723.

Change-Id: I4690c03a34e8796090563e19a214d8ede63fe5d1
2024-02-15 20:58:32 +00:00
Bartosz Dziewoński
de0646843a Move section heading formatting to post-cache transform
Previously, Parser.php used Linker::makeHeadline() in order to
generate the `<h2><span class="mw-headline" id="...">...</span></h2>`
markup for section headings, and this was saved in the parser cache.
Now it generates heading tags with placeholder attributes like
`<h2 data-mw-...="..." ...>...</h2>`, and they are replaced in a
post-cache transform to generate the final heading markup, similarly
to how section edit links already worked.

The purpose of these changes is to allow changing the final markup
depending on skin options without splitting the parser cache (T13555).

Deployment and undeployment safety:
* The new post-cache transform has already been added in commit
  Ibce512b3c4a52f74b2d2124f0159e306f2689ea5 for forward-compatibility
  (so that if this patch is reverted, new parser cache entries
  will still be shown correctly).

Implementation notes:
* There are many ways to keep the temporary information other than
  `data-mw-...` attributes, but this way is the easiest to handle
  in a post-cache transform (everything is on the DOM node we want
  to modify), is compatible with other heading-enhancing code in
  DiscussionTools and MobileFrontend, and remains human-readable
  if the post-cache transform doesn't run.
* Sadly this code can't be reused to add section heading markup and
  section edit links to Parsoid (T269630), because it lacks some of
  the necessary metadata, and exposes the rest in ways that are
  trickier to handle in a post-cache transform (on other DOM nodes
  or outside the document).

Bug: T13555
Change-Id: I4eae18d9d16f54391daba0de82ad05e50f07f9eb
2024-02-15 13:09:08 -05:00
C. Scott Ananian
28a3371382 [OutputTransform] Remove broken and unused 'bodyContentOnly' option
This was formerly used by the REST API, but that code now uses
ParserOutput::getRawText() instead when it needs the full HTML document.
The option was also broken: passes such as RenderDebugInfo and
AddWrapperDiv added content in inappropriate places when
bodyContentOnly was false.

Change-Id: Ib45f95ded59c81c16d61803f977d1edbfe82b262
2024-02-15 13:05:53 -05:00
James D. Forrester
4bae64d1c7 Namespace includes/context
Bug: T353458
Change-Id: I4dbef138fd0110c14c70214282519189d70c94fb
2024-02-08 11:07:01 -05:00
Isabelle Hurbain-Palatin
ec9dc3d4c4 Rename PostCacheTransformHookRunner
Follow-up to I53551ec6d6471569709c71c1155729e550f64de8.

Bug: T348253
Change-Id: Ia08624a6770070313bf8bbaa11df29e4ed30b73b
2024-02-07 13:01:20 -05:00
Daimona Eaytoy
f2a9836df0 tests: Rename OutputTransformStageTest for PHPUnit 9.6
Abstract test classes are no longer allowed to end in "Test" as of
PHPUnit 9.6.

Follow-up: I53551ec6d6
Bug: T342110
Change-Id: I9638c2937f8b702851d080ab217fbc34620fabb6
2024-01-17 17:41:36 +01:00
Aaron Schulz
f4261e029f Clean up MediaWiki\OutputTransform namespace casing confusion
The case mismatch was causing confusing PHP errors about missing
classes during paratest runs.

Change-Id: Iaddddd2ff825e41609e915938bc27c0bc4bba245
2024-01-05 17:53:19 -08:00
Isabelle Hurbain-Palatin
7f63d5250e Revert "Use Remex for DeduplicateStyles transform"
This reverts commit 82da9cf14b.

Passing the output through Remex seems to have unexpected consequences
that still need to be investigated, but to fix the UBN (Unbreak Now!)
breakage, let's revert this first.

Bug: T353920
Change-Id: Iaac7942aa77aee5ab525852ac5b41dd516ff13c9
2023-12-22 11:26:09 +01:00
C. Scott Ananian
82da9cf14b Use Remex for DeduplicateStyles transform
The previous implementation used an ad-hoc regular expression, which
matched inside the data-mw attribute of Parsoid output, e.g.:

 <sup about="#mwt42" [...] typeof="mw:Extension/ref mw:Error" data-mw="{&quot;name&quot;:&quot;ref&quot;,&quot;attrs&quot;:{&quot;name&quot;:&quot;infobox_stats_ref_rail&quot;},&quot;body&quot;:{&quot;html&quot;:&quot;<style data-mw-deduplicate=\&quot;TemplateStyles:r1133582631\&quot; typeof=\&quot;...">

After substitution, the inserted <link> element contained a raw "
instead of &quot; and so broke out of the attribute.

Instead use a proper HTML tokenizer (via wikimedia/remex-html) so that
we don't allow bogus matches inside attribute values.
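The difference between scanning raw text and tokenizing can be shown in a few lines. The fix itself used wikimedia/remex-html in PHP; this sketch substitutes Python's standard html.parser.HTMLParser as the tokenizer, and the regular expression is a hypothetical stand-in for the old ad-hoc pattern, not the original.

```python
import re
from html.parser import HTMLParser

# Hypothetical stand-in for an ad-hoc pattern: it scans raw text, so it cannot
# tell a real <style> tag from one embedded in an attribute value.
NAIVE_STYLE = re.compile(r'<style\b[^>]*?data-mw-deduplicate=')


class DedupKeyCollector(HTMLParser):
    """Collect data-mw-deduplicate keys from real <style> start tags only."""

    def __init__(self):
        super().__init__()
        self.keys = []

    def handle_starttag(self, tag, attrs):
        # Only called for actual tags; markup inside a quoted attribute
        # value is delivered as part of that attribute's value.
        if tag == "style":
            for name, value in attrs:
                if name == "data-mw-deduplicate":
                    self.keys.append(value)


def collect_dedup_keys(markup):
    parser = DedupKeyCollector()
    parser.feed(markup)
    parser.close()
    return parser.keys


# Trimmed-down version of the data-mw example above (values shortened).
SAMPLE_HTML = (
    '<sup data-mw="{&quot;body&quot;:{&quot;html&quot;:&quot;'
    '<style data-mw-deduplicate=\\&quot;TemplateStyles:r1\\&quot;>&quot;}}"></sup>'
    '<style data-mw-deduplicate="TemplateStyles:r2">.a{}</style>'
)

print(len(NAIVE_STYLE.findall(SAMPLE_HTML)))  # 2, one of them inside data-mw
print(collect_dedup_keys(SAMPLE_HTML))        # ['TemplateStyles:r2']
```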

To fix up tests:
* Don't deduplicate styles when parsing UX messages (also helps performance)
* Don't deduplicate styles in ContentHandler integration tests
* Don't deduplicate styles by default in parser tests
  (unless explicit option is set)

Depends-On: Id9801a9ff540bd818a32bc6fa35c48a9cff12d3a
Depends-On: I5111f1fdb7140948b82113adbc774af286174ab3
Followup-To: Ic0b17e361bf6eb0e71c498abc17f5f67f82318f8
Change-Id: I32d3d1772243c3819e1e1486351d16871b6e21c4
2023-12-15 17:49:21 +01:00
James D. Forrester
9bfb75ff90 Namespace ParserOutput
Most used non-namespaced class!

Bug: T353458
Change-Id: I4c2cbb0a808b3881a4d6ca489eee5d8c8ebf26cf
2023-12-14 14:57:34 -05:00
Isabelle Hurbain-Palatin
a3f51c732d Refactor DefaultOutputTransform into a pipeline of transforms
Bug: T348253
Change-Id: I53551ec6d6471569709c71c1155729e550f64de8
2023-12-08 18:06:19 -05:00
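The pipeline shape introduced by the refactor in a3f51c732d can be pictured roughly as an ordered list of stages, each with a guard that decides whether it runs for the given options and a transform applied to the HTML. The sketch below is a hypothetical Python rendering of that idea, not the PHP OutputTransform API; the stage names and bodies are invented. It also shows why stage order matters for issues like the DiscussionTools double-wrapping described in this log.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Options = Dict[str, bool]


@dataclass
class Stage:
    """One hypothetical output-transform stage: a guard plus a text transform."""
    name: str
    should_run: Callable[[Options], bool]
    transform: Callable[[str, Options], str]


def run_pipeline(stages: List[Stage], text: str, options: Options) -> str:
    # Stages run strictly in list order, so each one sees whatever
    # earlier stages have already done to the markup.
    for stage in stages:
        if stage.should_run(options):
            text = stage.transform(text, options)
    return text


# Invented toy stages for illustration only.
stages = [
    Stage("ExampleWrapper",
          lambda o: o.get("wrap", True),
          lambda t, o: f'<div class="example-wrapper">{t}</div>'),
    Stage("ExampleDebugComment",
          lambda o: o.get("debug", False),
          lambda t, o: t + "<!-- debug info -->"),
]

print(run_pipeline(stages, "<p>Hi</p>", {}))
print(run_pipeline(stages, "<p>Hi</p>", {"debug": True}))
```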