Commit graph

25 commits

Author SHA1 Message Date
Abijeet
5c113a833a LanguageVariantConverter: Add fallback to core LanguageConverter
If variant conversion is not supported by Parsoid, fall back to using
the old LanguageConverter.

We still call Parsoid to perform variant conversion in order to add
metadata that is missing when the core language converter is used.

Bug: T318401
Change-Id: I0499c853b4e301f135339fc137054bd760ee237d
Depends-On: Ie94aaa11963ec1e9e99136af469a05fa4005710d
2022-12-11 12:12:33 +05:30
Abijeet
803092d4af tests: Remove unnecessary override to use pig-latin
Pig latin is enabled by default since
Ia80ad33cbf5e311fa8b84bd765a8df8d156f4c38

Change-Id: I0cd922bb0ee1fd7bce2ced2eacbdb6ed25ada7d8
2022-12-08 17:52:00 +05:30
daniel
b7ab24c218 Fix LanguageVariantConverter test
Accept sr-Latn as well as sr-el as the language code for Serbian in
Latin script.

This was broken when the Parsoid library started to use BCP-47 codes
rather than internal MediaWiki codes. For now, we accept both, so we are
compatible with the version of the Parsoid library currently in the vendor
repo as well as the version picked by composer update.

Bug: T323985
Change-Id: If0b02be4f391b31fb75e2ad51e199a83707b0e3c
2022-11-29 15:34:42 +01:00
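
The tolerant comparison described in the entry above can be sketched as follows. This is a minimal illustration in Python, not the PHP of the actual test; the `INTERNAL_TO_BCP47` table and the `to_bcp47` and `same_language` helpers are hypothetical names, and only the sr-el / sr-Latn pairing is taken from the commit message.

```python
# Hypothetical mapping from internal MediaWiki codes to BCP-47 codes.
# Only the sr-el entry comes from the commit message; the table is illustrative.
INTERNAL_TO_BCP47 = {"sr-el": "sr-Latn"}


def to_bcp47(code: str) -> str:
    """Return the BCP-47 form of a code, leaving unknown codes unchanged."""
    return INTERNAL_TO_BCP47.get(code.lower(), code)


def same_language(expected: str, actual: str) -> bool:
    """Compare two codes after normalization, ignoring letter case."""
    return to_bcp47(expected).lower() == to_bcp47(actual).lower()
```

A test written this way passes whether the library under test reports the internal code or the BCP-47 code, which is the compatibility the commit aims for.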
daniel
e61b9b6680 page/{title}/html: handle unknown variant gracefully
Language conversion shouldn't crash with a 500 when a variant is
requested for a language that does not support variants. This behavior
is especially annoying when manually calling REST endpoints with a
browser, since browsers routinely send Accept-Language headers.

Change-Id: I31a14cb184a7bf940b7d178c12b2e7829d2eca0f
2022-11-22 23:03:55 +01:00
daniel
6fd3a7b0b0 Stash original wikitext when rendering unsaved content.
When VisualEditor switches from source mode to visual mode, we need to
stash the wikitext. Otherwise, we later lack the proper context to
convert the modified HTML back to wikitext.

Bug: T321862
Change-Id: Id611e6e022bf8d9d774ca1a3a214220ada713285
2022-11-04 17:17:32 +01:00
daniel
f545d5efeb Rename HTMLTransform to HtmlToContentTransform
* We will have several kinds of HTML transformations.
  Rename HTMLTransform to indicate that it's for converting HTML to Content
  objects.

* Use the naming convention 'Html' instead of 'HTML'.

Change-Id: I506f3303ae8f9e4db17299211366bef1558f142c
2022-11-03 16:47:36 +01:00
daniel
4ad9c9b035 variant transform: allow input content-language to be a variant
When submitting HTML to transform/html/to/html, the language specified
by the input's content-language header should be allowed to be the
source variant.

It should also be possible to just specify the source variant, and
derive the base language from that rather than the content-language
header or the page language.

Change-Id: I703c112358a921a8b0c9e63b70fd820ae3ea16fc
2022-11-02 01:30:36 -04:00
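
One way to picture the check described in the entry above, sketched in Python under a loud assumption: the base language is taken to be the primary subtag of the code. That shortcut is mine, not the commit's; MediaWiki keeps its own variant tables, and `base_language` and `content_language_matches` are hypothetical helper names.

```python
def base_language(code: str) -> str:
    """Return the primary subtag of a language or variant code.

    Assumption: the part before the first hyphen identifies the base
    language. Real variant handling relies on per-language variant lists.
    """
    return code.split("-", 1)[0].lower()


def content_language_matches(content_language: str, page_language: str) -> bool:
    """Accept a content-language that is the page language or one of its variants."""
    return base_language(content_language) == base_language(page_language)
```

Under this sketch a content-language of `sr-el` is accepted for a page in `sr`, while `en` is not.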
Abijeet
715080cfd5 LanguageVariantConverter: Use content language code from HTTP header
Use the content language from the header, and give that the highest
priority when identifying the page language.

Bug: T317019
Change-Id: Ibb0671f1b873ef83a4d53824a9c4c17726e68635
2022-10-07 20:28:57 +05:30
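
The priority rule in the entry above can be sketched as a simple first-match lookup. This is an illustration in Python with hypothetical names (`pick_page_language` and its parameters). The header coming first is from the commit message; the relative order of the page bundle and the title's page language is an assumption.

```python
from typing import Optional


def pick_page_language(
    header_language: Optional[str],
    bundle_language: Optional[str],
    title_language: str,
) -> str:
    """Return the first language code that is present, in priority order."""
    # The Content-Language header wins whenever it is set.
    # Assumption: the page bundle is consulted before the title's language.
    for candidate in (header_language, bundle_language):
        if candidate:
            return candidate
    return title_language
```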
daniel
5b0d1cfd35 Re-apply: Introduce LanguageVariantConverter
This reverts Ib73841bcc6c101bbe8a76f76dc81553290726039 and re-applies
I55a58f9824329893575a532cd10b9422ededb9ba with some changes: The source
variant is passed in explicitly. More complete handling of the input
language will be added in a follow-up.

Original description:

This class is used in ParsoidHandler::languageConversion.

It uses Parsoid to perform the actual conversion of the content
to a language variant.

The source language is determined using the PageBundle or the page
language from the Title.

To encapsulate Parsoid-related concepts, the class has the ability
to create Parsoid\Config\PageConfig if not provided.

Bug: T317019
Change-Id: Ida1a040628c26ac2ef108b0c90a3d3285a493b0e
2022-10-04 20:29:54 +02:00
Daniel Kinzler
c5bc391b2b Revert "Introduce LanguageVariantConverter"
This reverts commit 5c49a09e89.

Reason for revert: See https://phabricator.wikimedia.org/T319282

Bug: T319282
Change-Id: Ib73841bcc6c101bbe8a76f76dc81553290726039
2022-10-04 11:52:09 +00:00
Abijeet
5c49a09e89 Introduce LanguageVariantConverter
This class is used in ParsoidHandler::languageConversion.

It uses Parsoid to perform the actual conversion of the content
to a language variant.

The source language is determined using the PageBundle or the page
language from the Title.

To encapsulate Parsoid-related concepts, the class has the ability
to create Parsoid\Config\PageConfig if not provided.

Bug: T317019
Change-Id: I55a58f9824329893575a532cd10b9422ededb9ba
2022-10-03 16:13:29 +00:00
daniel
4107333069 Introduce HtmlInputTransformHelper
The HtmlInputTransformHelper is intended to provide code sharing
between VisualEditor's DirectParsoidClient and the ParsoidHandler
base class used by TransformHandler.

Bug: T310376
Change-Id: I9c15f075cfc5f198e290758fc23d25990b47a185
2022-09-26 12:58:17 +00:00
daniel
d6140952ed HTMLTransform: do not presume wikitext
Parsoid supports other source formats besides wikitext.
This patch improves support for non-wikitext content by removing
assumptions about the source type.

Change-Id: I5480ff200a93026cea7f1542e12834b06ac6f730
2022-09-22 17:41:48 +01:00
jenkins-bot
05189b87f0 Merge "REST: make ParsoidHandler use HTMLTransformFactory" 2022-09-16 20:21:57 +00:00
daniel
24a26ec25b REST: make ParsoidHandler use HTMLTransformFactory
This also moves the creation of PageConfig from HTMLTransformFactory
into HTMLTransform, to ensure all relevant info, particularly the
page language, is known.

Change-Id: Id354862d6497816e0c007b9cb3b0d183c9d4b719
2022-09-16 18:46:17 +02:00
jenkins-bot
61cbd18ff3 Merge "parser: Use a <meta> tag for the internal TOC_PLACEHOLDER" 2022-09-09 21:12:34 +00:00
daniel
d228095c7e Introduce HTMLTransformFactory
This enables HTMLTransform to be used outside of ParsoidHandler.

Bug: T310376
Change-Id: I8576ca4c2bc346a62f524ac7c0ebddd3f6b97f4f
2022-08-26 15:31:39 +01:00
daniel
cee1d08550 HTMLTransform: add more tests
This improves test coverage by testing the interaction of getters and
setters.

Change-Id: I90506d73e5571fda6c26eb72359e41951b3950c1
2022-08-26 11:33:39 +02:00
daniel
df0744f402 Split setOriginalData( ... ) to more related setters for encapsulation
By splitting the setOriginalData methods into several setters, we remove
any knowledge about the structure of the request body from HTMLTransform.
It also allows us to be specific about which data to operate on.

This also removes the concept of page bundles from the public interface
of HTMLTransform. PageBundle objects are used only internally.

Change-Id: If97a74ce251f281b7d980928a01b764d6ec0d0a4
2022-08-25 18:40:26 +02:00
C. Scott Ananian
0b10563895 parser: Use a <meta> tag for the internal TOC_PLACEHOLDER
Split out from the I44045b3b9e78e change.

This is consistent with what Parsoid will use for the TOC marker.

Bug: T287767
Bug: T270199
Bug: T311502
Depends-On: I1f607cf1ef1b61fb4d2e1880de756fb94d5a6b22
Change-Id: Ie63eed07b9bca1bfa07d4c256aba3728cedd8f93
2022-08-16 06:05:17 +00:00
Subramanya Sastry
86f2b26589 Don't hardcode Parsoid HTML version number in tests
* Without this fix, the test fails vendor patches whenever Parsoid's
  default version number is bumped in the Parsoid repo.

Change-Id: Icce7b61dfbbbbd57b4f1ed76a32d160e92b48b15
2022-08-12 14:29:25 -05:00
Derick Alangi
b078f598f9 Move transformHtmlToWikitext() and getSelserData() to HTMLTransform
This patch moves the remaining transformation logic into the
HTMLTransform class (renamed from HTMLTransformInput). The class and
its tests are also moved to the correct directory, and hence
namespace.

Some data files have been copied into their own sub-directory in
the correct place since HTMLTransformTest needs them. The ParsoidHandler
class is fine where it is because what it does belongs in
REST land.

NOTE: The 2 remaining methods moved into HTMLTransform are the last
ones we intended to move into this class to make the refactoring of
html2wt() method complete in this context.

Change-Id: I8929931e1b0acf247abe9d826eef57f3e0d4e132
2022-08-11 07:50:53 +01:00
Roman Stolar
a68e641f9d Move Content::getParserOutput & AbstractContent::fillParserOutput to ContentHandler
Update/Create override classes of ContentHandler.
Soft-deprecate and remove method from Content and classes that override them.

Bug: T287158
Change-Id: Idfcfbfe1a196cd69a04ca357281d08bb3d097ce2
2021-09-29 13:10:51 +03:00
Cindy Cicalese
eed48e402b Detect and monitor against multiple Parser invocation during edit requests
Bug: T288707
Change-Id: I0cca8f9bcf1d6e964b8b06c0c4490e83f4fb1de5
2021-09-23 16:12:40 -05:00
C. Scott Ananian
1fd4a7af4e Introduce Tidy service
Refactor the old MWTidy singleton as a DI service.

Change-Id: I95605ea5fd22f53a7f90fe07a6a73fa6c959597a
2021-03-15 17:22:36 -04:00