55 commits

Author SHA1 Message Date
Subramanya Sastry
c8d0470f4b Make ParsoidOutputAccess a wrapper over ParserOutputAccess
* Updated ParserOutput to set Parsoid render ids that REST API
  functionality expects in ParserOutput objects.
* CacheThresholdTime functionality no longer exists since it was
  implemented in ParsoidOutputAccess and ParserOutputAccess doesn't
  support it. This is tracked in T346765.
* Enforce the constraint that uncacheable parses are only for fake or
  mutable revisions. Updated tests that violated this constraint to
  use 'getParseOutput' instead of calling the parse method directly.
* Had to make some changes in ParsoidParser around use of preferredVariant
  passed to Parsoid. I also left some TODO comments for future fixes.
  T267067 is also relevant here.

PARSOID-SPECIFIC OPTIONS:
* logLinterData: linter data is always logged by default -- removed
  support to disable it. Linter extension handles stale lints properly
  and it is better to let it handle it rather than add special cases
  to the API.
* offsetType: Moved this support to ParsoidHandler as a post-processing
  of byte-offset output. This eliminates the need to support this
  Parsoid-specific option in the ContentHandler hierarchies.
* body_only / wrapSections: Handled this in HtmlOutputRendererHelper
  as a post-processing of regular output by removing sections and
  returning the body content only. This does result in some useless
  section-wrapping work with Parsoid, but the simplification is probably
  worth it. If in the future, we support Parsoid-specific options in
  the ContentHandler hierarchy, we could re-introduce this. But, in any
  case, this "fragment" flavor option is likely to get moved out of
  core into the VisualEditor extension code.

DEPLOYMENT:
* This patch changes the cache key by setting the useParsoid option
  in ParserOptions. The parent patch handles this to ensure we don't
  encounter a cold cache on deploy.

TESTS:
* Updated tests and mocks to reflect new reality.
* Do we need any new tests?

Bug: T332931
Change-Id: Ic9b7cc0fcf365e772b7d080d76a065e3fd585f80
2023-10-13 15:03:03 -05:00
Subramanya Sastry
311e1a9a43 Revert offsetType disabling from 1aa71cf5: Parsoid's rt-testing needs it
* Parsoid's rt-testing script is still a node.js script and hence needs
  ucs2 offsets for its syntactic / semantic diff classification.

* So, we cannot let 1aa71cf5 ride the train since it will break
  Parsoid's rt-testing. We'll figure out an alternative way of handling
  it, but for now, I am reverting that part of the patch.

* Document in the ParsoidHandlerTest test that ucs2 offsets are used and
  cannot be changed to 'byte'

Bug: T347426
Change-Id: Ifa833e01ef117d7bcd6da1c7eb542535192662eb
2023-09-28 19:56:43 -05:00
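
The difference between 'byte' and 'ucs2' offsets discussed in the commit above can be illustrated with a small Python sketch. It assumes that 'byte' offsets count UTF-8 bytes and that 'ucs2' offsets count UTF-16 code units (the way JavaScript strings are indexed). The helper name is hypothetical and this is not Parsoid's actual conversion code.

```python
def byte_to_ucs2_offset(text: str, byte_offset: int) -> int:
    """Convert a UTF-8 byte offset in `text` to a UTF-16 code-unit offset.

    Hypothetical helper for illustration only. Raises UnicodeDecodeError
    if byte_offset falls inside a multi-byte character.
    """
    prefix = text.encode("utf-8")[:byte_offset].decode("utf-8")
    # Every UTF-16 code unit occupies two bytes in "utf-16-le", and that
    # codec does not prepend a byte-order mark.
    return len(prefix.encode("utf-16-le")) // 2


# 'a' (1 byte), U+00E9 (2 bytes), U+1F600 (4 bytes), 'b' (1 byte) in UTF-8.
sample = "a\u00e9\U0001F600b"
# Byte offset 7 points just before 'b'. In UTF-16 the first two characters
# take one unit each and U+1F600 takes two (a surrogate pair).
assert byte_to_ucs2_offset(sample, 7) == 4
```

For pure ASCII text the two offset types coincide, which is why a mismatch only shows up on pages with non-ASCII content.
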
Subramanya Sastry
1aa71cf51b Disable Parsoid support for non-default output versions and offset types
* This is in service of a followup patch that merges ParsoidOutputAccess
  and ParserOutputAccess. We want to eliminate all Parsoid-specific options
  that aren't part of ParserOptions and aren't easily supportable via
  html2html transforms.

* offsetType conversion relies on Parsoid code that is a bit entangled
  with env, siteconfig (and extension configs), page source, etc. It
  could all be refactored but once the html2html output transformation
  framework lands, we could potentially use that to call Parsoid to do
  these transforms by exposing such transforms to the framework.

* In this patch, outputContentVersion that isn't the default major HTML
  version is no longer supported. It could potentially be supported via the
  downgrade functionality in Parsoid in the future, or we might decide
  to re-enable multiple outputContentVersion selection in the future
  if such a use case arises. But, there are no plans to bump the major
  HTML version in the near future while we work on read views.

* Rather than delete associated tests, I've marked them skipped so that
  they can be re-enabled when this support is added back.

Bug: T347426
Change-Id: Ibede4acd68e944512f6d00763d29c6b1605d67eb
2023-09-27 15:03:41 -05:00
James D. Forrester
94ece673b2 Namespace TitleValue under \MediaWiki\Title
One of the big ones, so doing this alone.

Bug: T166010
Change-Id: I4c901d5c32696d8334ec30cede7d9b6f3d8d645e
2023-09-18 18:24:39 +01:00
Subramanya Sastry
062fd08e51 Remove all Parsoid debugApi references and uses
* Was used during the Parsoid JS -> PHP port and is no longer used.
* This also eliminated the need to inject ParsoidSettings into some
  classes.
* Once this merges and lands in core, I'll remove this from the Parsoid
  repo as well.

Change-Id: I008d30ea81f5a3db26e512c87762b90e3ca3c4ff
2023-09-14 14:48:48 -05:00
Subramanya Sastry
f275549f56 ParsoidHandler: Remove unnecessary instanceof check against PageConfig
* All uses of getHtmlInputTransformHelper already pass a PageIdentity

Change-Id: Iccf8662400299d6efb76ddfc892b50a82b449df8
2023-09-13 16:35:33 -05:00
Daimona Eaytoy
f83e611efa Make MediaWikiIntegrationTestCase::addCoreDBData a noop
The method should never be called directly, so make it throw an exception.
Nonetheless, mark it as deprecated and detect overrides in the
constructor, so that anyone who tries to override this method will see a
warning.

Fix the few tests that were relying on the existence of the test page.

Bug: T342428
Depends-On: Ic64ded5e2c0b59e7c888ece9566076058a125be4
Change-Id: I308617427309815062d54c14f3438cab31b08a73
2023-09-05 00:36:36 +00:00
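
The "detect overrides in the constructor" idea from the commit above can be sketched in Python as follows. The class and method names are hypothetical stand-ins, and the sketch follows the commit body (the base method throws) rather than MediaWiki's actual PHP implementation.

```python
import warnings


class BaseTestCase:
    """Hypothetical stand-in for a test base class."""

    def __init__(self) -> None:
        # If a subclass defines its own add_core_db_data, the attribute
        # found on the subclass is a different function object.
        if type(self).add_core_db_data is not BaseTestCase.add_core_db_data:
            warnings.warn(
                "Overriding add_core_db_data is deprecated",
                DeprecationWarning,
                stacklevel=2,
            )

    def add_core_db_data(self) -> None:
        # Calling the method directly is treated as a programming error.
        raise RuntimeError("add_core_db_data must not be called directly")


class LegacyTest(BaseTestCase):
    def add_core_db_data(self) -> None:  # triggers the warning on construction
        pass


class ModernTest(BaseTestCase):
    pass
```

Checking in the constructor means the warning is raised once per test instance, without having to call the deprecated method.
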
Subramanya Sastry
1738e1eec3 ParsoidHandler: Look up page title from oldid, if available
* Without this, the TransformHandler creates a 'newMainPage' in
  TransformHandler::tryToCreatePageIdentity. This then causes the
  page and revision to not correspond to each other.

* This somehow didn't matter so far, but in my patch where I try
  to integrate ParsoidOutputAccess and ParserOutputAccess,
  ParserOutputAccess has a precondition check asserting that the
  page and revision object agree on the page id and causes some
  of these selser tests to fail.

* Got rid of the titleMissing derived property in request attributes
  since it felt extraneous.

Change-Id: I10ea62c2076f0ca9ba160b753bdca2ad0f8b40cd
2023-09-01 21:55:31 -05:00
Daimona Eaytoy
be7a86637f tests: Avoid relying on existence of a test page
Tests should create fixtures if and when they need them. Create test
pages explicitly in tests that were expecting them to exist.

Bug: T342428
Change-Id: I552420bb857388cb0873f7afc4e8b15b88388937
2023-08-07 22:57:59 +00:00
Arlo Breault
d8c64f41b7 Inline createPageConfig in tryToCreatePageConfig
Change-Id: If1d62fc75895fc5322a64a1398d2b68e6fde72ac
2023-06-22 21:10:24 -04:00
Tim Starling
580ec48e5b Fix more PHPStorm inspections (#2)
* Illegal string offset and invalid argument supplied to foreach, due to incorrect type information
* Array internal pointer reset is unnecessary
* $hookData unused since MW 1.35 due to incomplete revert
* array_push() with single element
* Unnecessary sprintf()
* for loop can be replaced with str_repeat()
* preg_replace() can be replaced with rtrim()
* array_values() call is redundant
* Unnecessary cast to string
* Unnecessary ternary. Often the result relies on short-circuit evaluation, but I find it more readable nonetheless.

Change-Id: I4c45bdb59b51b243fa96286bec8b58deb097d707
2023-03-25 00:19:58 +00:00
Tim Starling
5e30a927bc tests: Make some PHPUnit data providers static
Just methods where adding "static" to the declaration was enough, I
didn't do anything with providers that used $this.

Initially by search and replace. There were many mistakes which I
found mostly by running the PHPStorm inspection which searches for
$this usage in a static method. Later I used the PHPStorm "make static"
action which avoids the more obvious mistakes.

Bug: T332865
Change-Id: I47ed6692945607dfa5c139d42edbd934fa4f3a36
2023-03-24 02:53:57 +00:00
daniel
4f2f40f6a0 ParsoidHandlerTest: check no etag is emitted by default
The page/html endpoint should only return an ETag if stashing was
requested. Otherwise, the ETag is meaningless.

Bug: T331629
Change-Id: I55d7c5e33ef7275695ee93e2937da1a998e2eda3
2023-03-14 14:05:16 +00:00
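
The rule described in the commit above (emit an ETag only when stashing was requested) can be sketched as below. The function name, header set, and ETag format are assumptions made for illustration and do not reflect the handler's real code.

```python
from typing import Dict


def build_response_headers(render_id: str, stash: bool) -> Dict[str, str]:
    """Build response headers for a hypothetical page/html response."""
    headers = {"Content-Type": "text/html; charset=utf-8"}
    if stash:
        # The ETag only identifies something retrievable if the rendering
        # was stashed; otherwise it would be meaningless to the client.
        headers["ETag"] = f'"{render_id}"'
    return headers


assert "ETag" not in build_response_headers("r1", stash=False)
assert build_response_headers("r1", stash=True)["ETag"] == '"r1"'
```
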
jenkins-bot
5434c71393 Merge "Use Bcp47Code when interfacing with Parsoid" 2023-03-13 19:11:03 +00:00
daniel
74d6e57e6a TransformHandler: Load stashed page bundle based on ETag.
Allow clients to use an If-Match header with the
transform/html/to/wikitext endpoint.

This follows up on Ida81a314f015e205f2081c68a82d486145097c92
(reverted and reapplied)
It adds support for stashing in wt2html, enabling it for Parsoid's
page/html endpoint. It also ensures we are only emitting ETags if
stashing is enabled.

This also removes handling for use-stash from ParsoidHandler,
which did nothing.

Bug: T310464
Bug: T331629
Needed-By: I08f1388faaccef6c1d9a393f8011011d30a25ec7
Change-Id: I9d6eaf45d5b4978afc17493720777e77f0e645b2
2023-03-13 18:18:03 +00:00
C. Scott Ananian
5ad8dea80a Use Bcp47Code when interfacing with Parsoid
It is very easy for developers and maintainers to mix up "internal
MediaWiki language codes" and "BCP-47 language codes"; the latter are
standards-compliant and used in web protocols like HTTP, HTML, and
SVG; but much of WMF production is very dependent on historical codes
used by MediaWiki which in some cases predate the IANA standardized
name for the language in question.

Phan and other static checking tools aren't much help distinguishing
BCP-47 from internal codes when both are represented with the PHP
string type, so the wikimedia/bcp-47-code package introduced a very
lightweight wrapper type in order to uniquely identify BCP-47 codes.
Language implements Bcp47Code, and LanguageFactory::getLanguage() is
an easy way to convert (or downcast) between Bcp47Code and Language
objects.

This patch updates the Parsoid integration code and the associated
REST handlers to use Bcp47Code in APIs so that the standalone Parsoid
library does not need to know anything about MediaWiki-internal codes.
The principle has been, first, to try to convert a string to a
Bcp47Code as soon as possible and as close to the original input as
possible, so it is easy to see *why* a given string is a BCP-47 code
(usually, because it is coming from HTTP/HTML/etc) and we're not stuck
deep inside some method trying to figure out where a string we're
given is coming from and therefore what sort of string code it might
be.  Second, we've added explicit compatibility code to accept
MediaWiki internal codes and convert them to Bcp47Code for backward
compatibility with existing clients, using the @internal
LanguageCode::normalizeNonstandardCodeAndWarn() method.  The intention
is to gradually remove these backward compatibility thunks and replace
them with HTTP 400 errors or wfDeprecated messages in order to
identify and repair callers who are incorrectly using
non-standard-compliant language codes in web standards
(HTTP/HTML/SVG/etc).

Finally, maintaining a code as a Bcp47Code and not immediately
converting to Language helps us delay or even avoid full loading of a
Language object in some cases, which is another reason to occasionally
push Bcp47Code (instead of Language) down the call stack.

Bug: T327379
Depends-On: I830867d58f8962d6a57be16ce3735e8384f9ac1c
Change-Id: I982e0df706a633b05dcc02b5220b737c19adc401
2023-03-13 13:25:09 -04:00
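
The "lightweight wrapper type" approach described in the commit above can be sketched in Python. Everything here is hypothetical: the class and function names are invented for illustration, and the mapping table is a small illustrative subset, not the mapping MediaWiki actually ships.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bcp47Tag:
    """Hypothetical wrapper marking a string as a BCP-47 language code."""
    value: str


# Illustrative subset of internal-code-to-BCP-47 conversions (assumption).
_INTERNAL_TO_BCP47 = {
    "sr-ec": "sr-Cyrl",
    "sr-el": "sr-Latn",
    "zh-classical": "lzh",
}


def from_internal_code(code: str) -> Bcp47Tag:
    """Convert an internal code to a wrapped BCP-47 code at the boundary."""
    return Bcp47Tag(_INTERNAL_TO_BCP47.get(code, code))


def content_language_header(tag: Bcp47Tag) -> str:
    """Accept only wrapped codes, so a bare string cannot slip through."""
    if not isinstance(tag, Bcp47Tag):
        raise TypeError("expected a Bcp47Tag, got a bare value")
    return tag.value


assert content_language_header(from_internal_code("sr-ec")) == "sr-Cyrl"
```

Because the wrapper is a distinct type, a static checker (or the runtime check above) can tell the two kinds of code apart even though both are ultimately strings.
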
Daimona Eaytoy
19f8127ef0 Make it possible to override the session in REST API tests
The current signature of the various execute methods only takes a
boolean parameter to determine if the session should be safe against
CSRF, but that does not give callers fine-grained control over the
Session object, including setting a specific token.

Also, do not use createNoOpMock in getSession(), since it implies
strong assertions on what methods are called. This way, getSession
can also be used to get a simple mock session that tests may further
manipulate.

Make $csrfSafe parameter of SessionHelperTestTrait::getSession
mandatory. This way, callers are forced to think what makes sense in
each use case. The various methods in HandlerTestTrait now default to
a session that is safe against CSRF. This assumes that most REST
handlers don't care about the session, and that any handler that does
care about the session and where someone needs to test the behaviour
in case of bad/missing token will explicitly provide a Session that
is NOT safe against CSRF.

Typehint the return value of Session(Backend)::getUser so that PHPUnit
will automatically make it return a mock User object even if the method
is not explicitly mocked. Remove a useless PHPUnit assertion -- setting
the return value to be X and then verifying that it is equal to X is a
tautology, and can only fail if the test itself is flawed (as was the
case, since it was using stdClass as the return type for all
methods). Remove the getUser test case altogether, there's no way to
make it work given the DummySessionBackend, and the test isn't that
helpful anyway. More and more methods will have the same issue as soon
as their return value is typehinted.

Follow-up: I2a9215bf909b83564247ded95ecdb4ead0615150
Change-Id: Ic51dc3e7bf47c81f2ac4705308bb9ecd8275bbaf
2023-02-06 18:56:51 +01:00
Derick Alangi
1afd52e3e4 REST: Move Helper classes to their own namespace
Mixing Handlers with Helpers doesn't look nice for consistency
reasons. Helpers should be in their own place (grouped) in the
Handlers directory as they're really "helpers for the handlers".

Change-Id: Ieeb7a0a706a4cb38778f312bfbfe781a1f366d14
2023-01-16 21:16:09 +01:00
Derick Alangi
ce8e5f1549 Introduce HtmlMessageOutputHelper for system messages
This introduces an interface HtmlOutputHelper that is implemented
by both HtmlMessageOutputHelper or HtmlOutputRendererHelper based
on the page we're dealing with.

Bug: T323558
Change-Id: I1fb8dcc5cc05ce3f32f3c1862b88045f1c8e612b
2022-12-16 11:49:56 +01:00
Derick Alangi
fe091a7cad Parsoid: Default parsoid version to "0.0.0" for unsupported models
When parsoid was dealing with content for content models that it does
not support, the corresponding pagebundle used `null` as the version
which will break when converting it to JSON.

This patch fixes this by just using a version which is not specified
like 0.0.0

A better fix would be for page bundle to not explode when the version is
set to null. The other real fix is for Parsoid to not ask for rendering of
pages that are not wikitext.

Bug: T325137
Change-Id: Iff1ce432d1b2d30f3f74c53a0602c11034db5874
2022-12-14 13:07:18 +01:00
daniel
5f2026c31c ParsoidHandler: test wt2html with old revision
Update the test for wt2html to assert that it works properly with an old
revision.

Bug: T324801
Change-Id: Ia2a7e28cd999712b1bd890eed48d0a5de931700f
2022-12-09 19:33:18 +01:00
Derick Alangi
e1344b76cd Fix typo (hmtl) to html where necessary
Change-Id: If837968f402c71813121754b8fbdb5519c3dae34
2022-12-07 17:11:48 +01:00
Arlo Breault
dc02ef2500 Permit Parsoid minor version bumps
Loosening the version checks in the integration tests is akin to what
was done for api-testing in I6d7db6a05c48de8a57f83e4c8af38ab50271297a
and I317ce587e62f9e94bbafbdabac64156237c4f1e3.

The tightly coupled tests were added in Ieb4b41375d521893f95e2fcb5f4984e7b5a2364c

The change to the condition when original html needs downgrading was
exposed by the test added in Ic3cc3a598f32cad6122964cb8a7376a56be9129f
though not exercised because edited and re-parsed html had the same
versions.  Without it, in-flight edits on wikis that might request a
re-parse (think private wikis) when a new semantically equivalent
version is deployed could fail due to trying to find a downgrade path
and none being there.

Needed-By: Iabab0c093dcb21e28c643be6e85cf1a7b54cd999
Change-Id: I33e70df750c6d4b082281fdc8bacdea72662832a
2022-12-05 19:51:38 -05:00
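
A loosened version check of the kind described in the commit above might look like the following sketch. It assumes semver-style version strings and treats two versions with the same major number as semantically equivalent. Both the helper names and that equivalence rule are assumptions, not the condition Parsoid actually uses.

```python
def major_version(version: str) -> int:
    """Return the major component of a 'major.minor.patch' string."""
    return int(version.split(".")[0])


def semantically_equivalent(a: str, b: str) -> bool:
    # Assumption: minor and patch bumps do not change the meaning of the HTML.
    return major_version(a) == major_version(b)


def needs_downgrade(original_version: str, target_version: str) -> bool:
    """Only look for a downgrade path when the versions really differ."""
    return not semantically_equivalent(original_version, target_version)


# A minor bump between the stored and the freshly parsed HTML is tolerated.
assert not needs_downgrade("2.6.0", "2.7.0")
# A major difference still requires a downgrade path.
assert needs_downgrade("999.0.0", "2.7.0")
```
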
Daniel Kinzler
f36a28ff21 [Fix] ParsoidHandler: use HtmlOutputRendererHelper in wt2html
Fixes the reason for reverting Ie430acd0753880d88370bb9f22bb40a0f9ded917:

The issue was that with my patch, the transform/wikitext/to/html started
ignoring the offsetType field in the body. So the offsetType used in the
response (or stashed data) would always be 'byte'.
But the roundtrip-test.js script requests 'ucs2'.

This causes an error when sending the HTML and data-parsoid back to
transform/html/to/wikitext, again with offsetType:'ucs2': the offsetType
embedded in data-parsoid will be byte, and the mismatch causes a 400
to be returned. This broke the roundtrip-test.js script.

The fix is to not ignore the offsetType specified in the request body.

Change-Id: Ief721c23ed9a57d781cfdac625a62113f22f87a5
2022-12-05 18:49:30 +00:00
Daniel Kinzler
5cb388455b [Re-apply] ParsoidHandler: use HtmlOutputRendererHelper in wt2html
This restores change Ie430acd0753880d88370bb9f22bb40a0f9ded917.
This reverts commit ab6baad1a5.

NOTE: Also needs the patch that fixes the original reason for the
revert: Ief721c23ed9a57d781cfdac625a62113f22f87a5

Change-Id: Ic48db1b5fdff1dfd4f2d2643d64252e5fc721e79
2022-12-05 18:43:51 +00:00
Daniel Kinzler
ab6baad1a5 Revert "ParsoidHandler: use HtmlOutputRendererHelper in wt2html"
This reverts commit e82f11c246.

Reason for revert: Breaks parsoid CI

1) Parsoid round-trip e2e testing with MW REST endpoints
     rt-testing e2e:
     AssertionError: expected 1 to equal 0
     + expected - actual
     -1
     +0

     at Context.<anonymous> (tests/api-testing/RoundTrip.js:59:10)
     at processTicksAndRejections (internal/process/task_queues.js:95:5)

Change-Id: Ib94f964c2717885f777c1fe0c9c443cd6a5ed3ae
2022-12-01 21:17:34 +00:00
daniel
e82f11c246 ParsoidHandler: use HtmlOutputRendererHelper in wt2html
NOTE: This causes Parsoid output to be written to the parser cache.
This should be unconditional in the future, but for now it is
controlled by wgTemporaryParsoidHandlerParserCacheWriteRatio.

This change affects the following endpoints that use the wt2html method:
* /coredev/v0/transform/wikitext/to/html in core
* /{domain}/v3/transform/wikitext/to/html from parsoid
* /{domain}/v3/page/html/{title} from parsoid

The /v1/page/{title}/html endpoint is not affected, since it
doesn't use wt2html, but has always been using HtmlOutputRendererHelper
directly.

Bug: T322672
Depends-On: Ic37f606bb51504c8164d005af55ca9a65f595041
Change-Id: Ie430acd0753880d88370bb9f22bb40a0f9ded917
2022-12-01 10:14:49 +00:00
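
A ratio-controlled cache write, as mentioned in the commit above, could be sketched like this. How wgTemporaryParsoidHandlerParserCacheWriteRatio is actually interpreted is not stated in the commit message. The sketch assumes it is a fraction between 0 and 1 of requests that write to the cache, and the function name is hypothetical.

```python
import random
from typing import Callable


def should_write_to_parser_cache(
    write_ratio: float,
    rng: Callable[[], float] = random.random,
) -> bool:
    """Decide per request whether to write the output to the cache.

    `rng` returns a float in [0.0, 1.0), so a ratio of 0 never writes
    and a ratio of 1 always writes.
    """
    return rng() < write_ratio


assert should_write_to_parser_cache(0.0) is False
assert should_write_to_parser_cache(1.0) is True
```

Passing the random source as a parameter keeps the decision deterministic in tests.
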
thiemowmde
0b80e9ebcc Fix incomplete ITextFormatter mocks
Otherwise the mocked getLangCode() method returns null, which is not
allowed any more in PHP 8.1.

Bug: T289926
Required-For: I7e026cca216aba24ee5d5662b6fca322b3cec9ae
Change-Id: I178def7f03a44f6b49cdb461d9ab340e1c89517f
2022-11-21 10:00:57 +01:00
daniel
7c2ad4e058 Add tests for wt2html
Porting relevant tests from tests/api-testing/Parsoid.js in the
Parsoid extension.

Change-Id: Ieb4b41375d521893f95e2fcb5f4984e7b5a2364c
2022-11-17 20:44:48 +00:00
daniel
f7dc7e6045 ParsoidHandler: test that selser will re-parse
When no original HTML and no etag is provided,
selser should still be attempted based on a rendered
version of the old wikitext.

Change-Id: Ic3cc3a598f32cad6122964cb8a7376a56be9129f
2022-11-08 12:22:57 +01:00
daniel
f545d5efeb Rename HTMLTransform to HtmlToContentTransform
* We will have several kinds of HTML transformations.
Rename HTMLTransform to indicate that it's for converting HTML to Content
objects.

* Using Naming Convention 'Html' instead of 'HTML'

Change-Id: I506f3303ae8f9e4db17299211366bef1558f142c
2022-11-03 16:47:36 +01:00
Tim Starling
43a93d9782 Use the null coalescing assignment operator
Available since PHP 7.4.

Automated search, manual replacement.

Change-Id: Ibb163141526e799bff08cfeb4037b52144bb39fa
2022-10-21 13:26:49 +11:00
jenkins-bot
05d701a2a4 Merge "ParsoidHandler: use metrics from SiteConfig" 2022-10-04 17:05:27 +00:00
daniel
79cc21beaf ParsoidHandler: use metrics from SiteConfig
ParsoidHandler should pass the metrics object from the
SiteConfig to HtmlInputTransformHelper, instead of using the global
metrics instance. Otherwise, the metricsPrefix defined in the parsoid
settings is ignored.

Change-Id: Ie85f2306e8b0f123b9fdd737faffdd85117015b1
2022-10-04 16:49:36 +00:00
daniel
a02be0b3f8 HtmlInputTransformHelper: Fall back to ParserCache
If a render ID is given via the use-cache parameter, but the key is not
found in the parsoid stash, look at the most recent known rendering of
the revision, and use it if it matches the render ID.

This patch moves the responsibility for looking up RevisionRecords and
PageRecords into ParsoidOutputAccess. This way, callers only need to
have a PageIdentity, and optionally a revision ID.

Bug: T318395
Change-Id: I1aa5b0fd9fb1acaa2544d5a58125fa3810a0eb39
2022-09-30 15:56:23 +00:00
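
The lookup order described in the commit above (stash first, then the most recent known rendering if its render ID matches) can be sketched as follows. The data shapes and names are hypothetical. Plain dictionaries stand in for the Parsoid stash and the ParserCache entry.

```python
from typing import Dict, Optional

Rendering = Dict[str, str]


def find_original_rendering(
    render_id: str,
    stash: Dict[str, Rendering],
    latest_cached: Optional[Rendering],
) -> Optional[Rendering]:
    """Return the rendering identified by render_id, or None if unknown."""
    stashed = stash.get(render_id)
    if stashed is not None:
        return stashed
    # Fall back to the most recent known rendering of the revision, but
    # only if it is the very rendering the client was working from.
    if latest_cached is not None and latest_cached.get("render_id") == render_id:
        return latest_cached
    return None


cached = {"render_id": "r2", "html": "<p>cached</p>"}
assert find_original_rendering("r2", stash={}, latest_cached=cached) is cached
assert find_original_rendering("r3", stash={}, latest_cached=cached) is None
```
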
daniel
f31cd9f1d3 REST: HtmlInputTransformHelper: Load original data from stash
Parsoid needs the original rendering in order to apply
selective serialization (selser). The page/{title}/html endpoint
can stash the rendering, and now the transform endpoint can make use
of the stashed rendering.

Bug: T310464
Change-Id: Ia58043ed3aa1eb12731d82aa87606c82ec63f663
2022-09-29 19:52:27 +02:00
jenkins-bot
33706fd187 Merge "TransformHandler: add test for variant conversion" 2022-09-26 16:39:09 +00:00
daniel
4107333069 Introduce HtmlInputTransformHelper
The HtmlInputTransformHelper is intended to provide code sharing
between VisualEditor's DirectParsoidClient and the ParsoidHandler
base class used by TransformHandler.

Bug: T310376
Change-Id: I9c15f075cfc5f198e290758fc23d25990b47a185
2022-09-26 12:58:17 +00:00
daniel
654d1d0dd1 TransformHandler: add test for variant conversion
Change-Id: I91acc9b4306a8170be5e4f94377aab764e185807
2022-09-26 12:57:08 +02:00
daniel
d6140952ed HTMLTransform: do not presume wikitext
Parsoid supports other source formats besides wikitext.
This patch improves support for non-wikitext content by removing
assumptions about the source type.

Change-Id: I5480ff200a93026cea7f1542e12834b06ac6f730
2022-09-22 17:41:48 +01:00
daniel
24a26ec25b REST: make ParsoidHandler use HTMLTransformFactory
This also moves the creation of PageConfig from HTMLTransformFactory
into HTMLTransform, to ensure all relevant info, particularly the
page language, is known.

Change-Id: Id354862d6497816e0c007b9cb3b0d183c9d4b719
2022-09-16 18:46:17 +02:00
jenkins-bot
246b1f64e5 Merge "ParsoidHandlerTest: Add tests to cover tryToCreatePageConfig()" 2022-08-25 20:12:46 +00:00
Derick Alangi
0859331ba1 ParsoidHandlerTest: Add tests to cover tryToCreatePageConfig()
This patch adds tests to cover the aforementioned code path for
when it works correctly and when it throws.

Change-Id: I4b3d9f280a0977d3811afb768824da302673e659
2022-08-25 19:42:48 +01:00
daniel
df0744f402 Split setOriginalData( ... ) to more related setters for encapsulation
By splitting the setOriginalData methods into several setters, we remove
any knowledge about the structure of the request body from HTMLTransform.
It also allows us to be specific about which data to operate on.

This also removes the concept of page bundles from the public interface
of HTMLTransform. PageBundle objects are used only internally.

Change-Id: If97a74ce251f281b7d980928a01b764d6ec0d0a4
2022-08-25 18:40:26 +02:00
Derick Alangi
b078f598f9 Move transformHtmlToWikitext() and getSelserData() to HTMLTransform
This patch moves remaining transformation logic to a renamed (from
HTMLTransformInput -> HTMLTransform) class. Also, the HTMLTransform
class is moved to the correct directory, hence namespace (including
tests).

Some data files have been copied over to their own sub-directory in
the correct place since HTMLTransformTest needs them. ParsoidHandler
class is fine where it is because its operation is what happens in
the REST land.

NOTE: The 2 remaining methods moved into HTMLTransform are the last
ones we intended to move into this class to make the refactoring of
html2wt() method complete in this context.

Change-Id: I8929931e1b0acf247abe9d826eef57f3e0d4e132
2022-08-11 07:50:53 +01:00
jenkins-bot
e1ff495a45 Merge "ParsoidHandler: fix page bundle input with no orig HTML." 2022-08-02 19:55:25 +00:00
daniel
311f5450a1 ParsoidHandler: fix page bundle input with no orig HTML.
When refactoring, we were assuming that if the input is a page bundle,
it will always contain original HTML. That is not the case. We need to
make use of the supplied data-parsoid array even if there is no original
HTML given.

Change-Id: Ida8cbfcaac059af8902db9560a3ad6884e8b1790
2022-08-02 19:07:52 +02:00
daniel
6fd1561658 ParsoidHandler: add test for pagebundle input without original HTML
This is a regression test for an issue that broke parsoid's roundtrip
tests. The issue itself was already fixed on the master branch.

Change-Id: Ia7edb26a149231fbfc43f4edb8185304341b19ca
2022-08-02 14:46:39 +00:00
daniel
00c4f11ab6 Fix $validateXMLNames flag when parsing HTML
Change-Id: I6cbd2e8a7096b96814e9e0afe0193e1ca781af45
2022-08-01 17:23:03 +02:00
daniel
d50218c84f Move DOM transformations into HTMLTransformInput getters.
This moved the logic for applying page bundle data into the getters that
return DOM elements. It also makes the application of version
downgrades implicit.

NOTE: This patch changes the expected value of one of the phpunit tests
to a version that has no closing </div> tags. This appears to be the
original and expected behavior, per the corresponding test in the
parsoid extension's Parsoid.js test suite.

Change-Id: If2d7b06d8ba92fb63e6955ec7587ed4aea557251
2022-07-28 19:07:40 +02:00