Commit graph

64 commits

Author SHA1 Message Date
jenkins-bot
6a909bce4e Merge "Add namespace to WikitextContent" 2024-08-07 04:03:24 +00:00
Ebrahim Byagowi
4c270a72ac Add namespace to WikitextContent
It adds the MediaWiki\Content namespace to WikitextContent
and two related classes.

Change-Id: Ib74e4c5b3edac6aa0e35d3b2093ce1d0b794cb6d
2024-08-06 17:42:51 +03:30
Fomafix
594e9fb023 Use MainConfigNames in tests
Change-Id: I6f79b3a36ce6605b86604fa9a99d55e8b8e67f7b
2024-08-06 13:58:47 +00:00
thiemowmde
4bd95cd96b Use MainConfigNames constants in tests where possible
I believe this makes the code less brittle, and also makes it a bit
more obvious what these strings are meant to represent.

Change-Id: Ia39b5c80af4b495931d0a68fd091b783645dd709
2024-07-10 10:11:22 +00:00
Umherirrender
f27c2433bb tests: Use namespaced classes (2)
Changes to the use statements done automatically via script
Addition of missing use statement done manually

Change-Id: I4ff4d0c10820dc2a3b8419b4115fadf81a76f7a2
2024-06-13 23:21:02 +02:00
C. Scott Ananian
a565e388f9 Move ParsoidOutputAccess::supportsContentModel() into Parsoid SiteConfig
The `supportsContentModel` method is really querying Parsoid for the
set of content models it supports, so it makes sense to put it in the
Parsoid-specific SiteConfig service.

This is part of the work to deprecate and remove ParsoidOutputAccess.

Change-Id: I81eb2df8cef93ede95361a4e03185b3d58e5b84b
2024-05-22 10:57:37 -04:00
Ebrahim Byagowi
848a9f279f Add namespace and deprecation alias to JsonContent
This patch adds a MediaWiki\Content namespace declaration to
JsonContent and establishes a class alias marked as deprecated
since version 1.43.

Bug: T353458
Change-Id: I44abb1ab5bd1fabf9886dc1457e241d7cae068bc
2024-05-20 18:57:07 +03:30
C. Scott Ananian
8d031bcf87 Add ParserOptions::setCollapsibleSections()
This is a non-default option that will add a <div> wrapper around
section contents to allow client-side collapsing.  This is intended
for use by MobileFrontEnd, but could eventually be enabled for
desktop read views as well.

Since this parser option is in the "cache-varying options" set, any
caller who sets this option will fork the cache for that page, which
is reasonable as the parser option sets a ParserOutput property.
In the future our caching strategy will get smarter and we'll add
code which avoids the cache split and just transfers the appropriate
values from ParserOptions to ParserOutput flags after the cached
output is retrieved.

Bug: T359001
Change-Id: Ie93959a056ed15a728404eb293e4bb6eeaeb15c0
2024-04-29 12:11:09 -04:00
C. Scott Ananian
25f2bf0d34 tests: Refactor common code out of Parser used-options tests
Change-Id: I05e7da17d3ecaed111e545f09106a1f1ac9dd174
2024-04-25 14:13:08 +00:00
daniel
e7f21f6e64 HtmlOutputRendererHelper: fall back to page language
HtmlOutputRendererHelper should not crash hard if the ParserOutput has
no language set. ParserOutput may come from a variety of places, so we
should be lenient about it not having a language.

However, we should try harder to actually set a language on ParserOutput
if we have one available. So this also updates
PageBundleParserOutputConverter to keep the ParserOutput's language in
sync with the language header in the PageBundle.

Bug: T349868
Bug: T353689
Bug: T359426
Change-Id: I2edf20dc3b199e22cda2f32bc858c21ca7d8f4bd
2024-03-06 17:18:16 +00:00
Reedy
8771b338e4 tests: Namespace more parser classes
Change-Id: I35d6e3181ed885b8731ff1c4b5703459fb4223e4
2024-02-17 00:38:31 +00:00
Reedy
85396a9c99 tests: Fix @covers and @coversDefaultClass to have leading \
Change-Id: I5629f91387f2ac453ee4341bfe4bba310bd52f03
2024-02-16 22:43:56 +00:00
Reedy
e94e265a93 tests: Add Tests to PHP namespacing
Change-Id: I849268172751d50292e93aa75abe8094873f56bc
2024-02-16 19:10:11 +00:00
C. Scott Ananian
770d2bf040 [ParserOutput] Make 'enableSectionEditLinks' a ParserOption
This will allow the Translate extension to set this parser option
in the ArticleParserOptions hook, instead of mutating $options passed
to ParserOutput::getText() in the ParserOutputPostCacheTransform hook.

It ought to also help to handle the many places which call:

   ... = $parserOutput->getText( [
       'enableSectionEditLinks' => false,
   ] );

by allowing them to set the appropriate ParserOption instead
of passing arguments to ::getText().

Bug: T350626
Change-Id: I719c115194059060f7f888608417a194ac80cc92
2024-02-09 23:42:03 +00:00
C. Scott Ananian
242c6d2cf9 Introduce ParserOutput::setFromParserOptions() and use for preview flag
Bug: T341010
Co-Authored-by: cananian <cananian@wikimedia.org>
Co-Authored-by: ihurbain <ihurbainpalatin@wikimedia.org>
Change-Id: I03125fdaa7dd71ba57d593e85ecb98be6806f3f6
2024-02-07 21:22:06 -05:00
Umherirrender
a3a9cf99cb tests: Use namespaced class names in @covers annotations
Assist from 8c9cb701e56226cac43fee2fa24b0d0e586f1733

Change-Id: I47897c499028d9e24c00ad0bc6ba7fd8002d9bc1
2024-01-27 01:11:07 +01:00
James D. Forrester
9bfb75ff90 Namespace ParserOutput
Most used non-namespaced class!

Bug: T353458
Change-Id: I4c2cbb0a808b3881a4d6ca489eee5d8c8ebf26cf
2023-12-14 14:57:34 -05:00
Amir Sarabadani
beb3261b8d Remove language converter for Kazakh
This has been constantly mentioned as buggy and broken and there is no
official version of Latin or Arabic (see the ticket for more details).

This can be brought back as an extension if needed by third party users.

Bug: T350684
Bug: T268143
Depends-On: I6180dca2c49b3119751766268acc56087aaf8414
Change-Id: Ifbf3c8954d885daf891f8d9efc11743d898302f0
2023-11-20 10:31:16 -05:00
Subramanya Sastry
ce89bee18b Followup to cf3f68b6: Handle bogus target variant codes
* Discovered through a failing API test in the Parsoid repo.
* Added a new phpunit test to catch this in the future.

Change-Id: Ic6326b409c9420fec676060566879f9a37a80961
2023-11-01 10:51:48 -05:00
Subramanya Sastry
6e5413b1d8 ParsoidParser: Record page title in ParserCache entries
* This lets post-cache transforms have access to the title.
* Specifically, DiscussionTools uses this to post-process the HTML.

Bug: T341010
Change-Id: I328f533e6cdb11c0c3a873d23bab1a113dfa39be
2023-10-30 13:36:36 -05:00
Subramanya Sastry
225be51fa7 ParsoidParser: Register watcher after creating ParserOutput object
* Updated documentation around this point
* Adjust tests to reflect this change.
* While it initially appeared that this can cause ParserCache impacts,
  'disableContentConversion' isn't part of the cache key and thus
  has no deployment impacts.

Change-Id: I535cb21cc104a358aa70829b030ae3751b76ae00
2023-10-17 17:51:19 -05:00
James D. Forrester
468e69bccc Namespace Sanitizer under \MediaWiki\Parser
Bug: T166010
Change-Id: Id13dcbf7a0372017495958dbc4f601f40c122508
2023-09-21 05:39:23 +00:00
jenkins-bot
201b487881 Merge "tests: Move test cases from /includes/ into sub folder" 2023-09-15 01:22:10 +00:00
Subramanya Sastry
062fd08e51 Remove all Parsoid debugApi references and uses
* Was used during the Parsoid JS -> PHP port and is no longer used.
* This also eliminated the need to inject ParsoidSettings into some
  classes.
* Once this merges and lands in core, I'll remove this from the Parsoid
  repo as well.

Change-Id: I008d30ea81f5a3db26e512c87762b90e3ca3c4ff
2023-09-14 14:48:48 -05:00
Umherirrender
790ae736c1 tests: Move test cases from /includes/ into sub folder
Follow move of the tested class
Most moves are part of T321882

Change-Id: I74ab45d6a5331dcb2ff0b65dc2cc7c6315146646
2023-09-13 00:09:05 +02:00
Daimona Eaytoy
6b1a62e169 Fix more non-database tests accessing the database
Mock the needed services, or set fixed values to avoid DB lookups, when
possible. Add the test to the Database group otherwise, e.g. for things
like Skin and Parser that use global state all over the place.

Change-Id: I8d87013d89accaf04d0ac19cb4b7216290383eb5
2023-08-06 15:30:41 +00:00
Subramanya Sastry
68805e2f50 ParsoidParser: Record ParserOptions watcher on ParserOutput object
* ParsoidParser hadn't registered a watcher on ParserOptions so far.
  Because of this, you can see that the current parser cache key
  (in deployed production code) doesn't have 'useParsoid=1' in it.

  Ex: View source on enwiki:Hospet shows that the parser cache key
  there is "enwiki:parsoid-pcache:idhash:2360619-0!canonical".

  The only reason this doesn't conflict with legacy parser output
  is because we use "parsoid-pcache", a different cache instance than
  "pcache" used for legacy parser output. But if/when we decide to use
  the same parser cache instance, this could cause cache corruptions.

  With FlaggedRevisions, where a single "stable-pcache" parser cache
  instance is used, in local testing, this was causing Parsoid HTML to be
  saved without "useParsoid=1", and so Parsoid HTML was being returned
  for legacy parser cache requests.

* In addition, fix the code in PageBundleParserOutputConverter to copy
  over internal metadata (which includes used options). This ensures
  that any tracked parser options aren't lost and the right parser cache
  key is constructed later on.

* Added / updated a number of new tests that verify that usedOptions
  is tracked correctly in the useParsoid code paths. The tests fail
  without the code changes in this patch.
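
A minimal Python sketch of why the watcher matters, assuming a
simplified key format and hypothetical names (WatchedOptions, render
and cache_key are illustrations, not MediaWiki APIs): the cache key
varies only on options that were recorded as used, so an unrecorded
'useParsoid' read lets both renderings share one key.

```python
class WatchedOptions:
    """Hypothetical options bag that records every option name read."""

    def __init__(self, values):
        self.values = dict(values)
        self.used = set()

    def get(self, name):
        self.used.add(name)  # the "watcher" bookkeeping
        return self.values[name]


def render(options):
    # Stand-in for the two parsers; reading the option marks it as used.
    return "<parsoid html>" if options.get("useParsoid") else "<legacy html>"


def cache_key(page_id, values, used):
    # Simplified key: only options recorded as used contribute to it.
    parts = [f"{name}={values[name]}" for name in sorted(used)]
    return f"idhash:{page_id}-0!" + ("!".join(parts) if parts else "canonical")


parsoid = WatchedOptions({"useParsoid": 1})
legacy = WatchedOptions({"useParsoid": 0})
render(parsoid)
render(legacy)

# With tracking, the two renderings get distinct keys.
tracked = (cache_key(7, parsoid.values, parsoid.used),
           cache_key(7, legacy.values, legacy.used))
# Without tracking (empty used set), both collapse onto the same key.
untracked = (cache_key(7, parsoid.values, set()),
             cache_key(7, legacy.values, set()))
```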

Bug: T340703
Bug: T335157
Needed-By: I0e954949768044eea6ec275a36d0d6d7ed457e8e
Change-Id: I076d5d362bdfd9d4b2ca8886bf6b30c1a746aee7
2023-07-11 10:53:11 -05:00
Derick Alangi
e076836219 HtmlToContentTransformTest: Ensure individual defaults with options set
Follow-Up: I1fade591e73034e071417d31fbdfff1a83180360
Change-Id: Ie470dddf51407a4c1717ad32bf19f7ef870fc92d
2023-06-28 16:33:58 +03:00
daniel
9338889682 HtmlToContentTransform: define default values for options
It should not be necessary to call setOptions() to perform a
transformation. All options should have defaults defined.

Change-Id: I1fade591e73034e071417d31fbdfff1a83180360
2023-06-28 09:41:43 +00:00
Umherirrender
d36073cdcf tests: Make some PHPUnit data providers static
Initially used a new sniff with autofix (T333745),
but some providers are defined non-static in the TestBase class
and need more work to make them static in a compatible way

Bug: T332865
Change-Id: I889d33424f0c01fb26f2d86f8d4fc3de3e568843
2023-05-20 01:05:27 +02:00
jenkins-bot
c5152db020 Merge "Remove back-compat for <editsection>" 2023-04-28 15:59:12 +00:00
C. Scott Ananian
cfd9c516e1 Allow setting a ParserOption to generate Parsoid HTML
This is an initial quick-and-dirty implementation.  The
ParsoidParser class will eventually inherit from \Parser,
but this is an initial placeholder to unblock other Parsoid
read views work.

Currently Parsoid does not fully implement all the ParserOutput
metadata set by the legacy parser, but we're working on it.

This patch also addresses T300325 by ensuring that the Page HTML
APIs use ParserOutput::getRawText(), which will return the entire
Parsoid HTML document without post-processing.  This is what
the Parsoid team refers to as "edit mode" HTML. The
ParserOutput::getText() method returns only the <body> contents
of the HTML, and applies several transformations, including
inserting Table of Contents and style deduplication; this is
the "read views" flavor of the Parsoid HTML.

We need to be careful of the interaction of the `useParsoid` flag with
the ParserCacheMetadata.  Effectively `useParsoid` should *always* be
marked as "used" or else the ParserCache will assume its value doesn't
matter and will serve legacy content for Parsoid requests and
vice-versa.  T330677 is a follow up to address this more thoroughly by
splitting the parser cache in ParserOutputAccess; the stop gap in this
patch is fragile and, because it doesn't fork the ParserCacheMetadata
cache, may corrupt the ParserCacheMetadata in the case when Parsoid
and the legacy parser consult different sets of options to render a
page.

Bug: T300191
Bug: T330677
Bug: T300325
Change-Id: Ica09a4284c00d7917f8b6249e946232b2fb38011
2023-03-26 21:46:05 -04:00
Tim Starling
5e30a927bc tests: Make some PHPUnit data providers static
Just methods where adding "static" to the declaration was enough; I
didn't do anything with providers that used $this.

Initially by search and replace. There were many mistakes which I
found mostly by running the PHPStorm inspection which searches for
$this usage in a static method. Later I used the PHPStorm "make static"
action which avoids the more obvious mistakes.

Bug: T332865
Change-Id: I47ed6692945607dfa5c139d42edbd934fa4f3a36
2023-03-24 02:53:57 +00:00
Tim Starling
f600d07ec4 Fix tests that fail when $wgUsePigLatinVariant = false
* ParserTestRunner: LocalisationCache needs to be reset since it has a
  reference to LanguageNameUtils which has a copy of
  $wgUsePigLatinVariant. Also factor out some
  MediaWikiServices::getInstance() calls.
* In some other tests, set the variable.

Change-Id: I6c1e9bfad9790cf805809c28a3f8d45952cbb981
2023-03-17 19:56:32 +11:00
C. Scott Ananian
99e9d4927f Remove back-compat for <editsection>
The tag has been <mw:editsection> since at least 2011 (f0fd318a4e),
so we no longer need to include the ancient <editsection> variant in
our regexp and test cases.

Change-Id: I5fd783556810ea13b07a69066ea6762d1a1863e1
2023-03-15 13:53:01 -04:00
jenkins-bot
5434c71393 Merge "Use Bcp47Code when interfacing with Parsoid" 2023-03-13 19:11:03 +00:00
C. Scott Ananian
5ad8dea80a Use Bcp47Code when interfacing with Parsoid
It is very easy for developers and maintainers to mix up "internal
MediaWiki language codes" and "BCP-47 language codes"; the latter are
standards-compliant and used in web protocols like HTTP, HTML, and
SVG; but much of WMF production is very dependent on historical codes
used by MediaWiki which in some cases predate the IANA standardized
name for the language in question.

Phan and other static checking tools aren't much help distinguishing
BCP-47 from internal codes when both are represented with the PHP
string type, so the wikimedia/bcp-47-code package introduced a very
lightweight wrapper type in order to uniquely identify BCP-47 codes.
Language implements Bcp47Code, and LanguageFactory::getLanguage() is
an easy way to convert (or downcast) between Bcp47Code and Language
objects.

This patch updates the Parsoid integration code and the associated
REST handlers to use Bcp47Code in APIs so that the standalone Parsoid
library does not need to know anything about MediaWiki-internal codes.
The principle has been, first, to try to convert a string to a
Bcp47Code as soon as possible and as close to the original input as
possible, so it is easy to see *why* a given string is a BCP-47 code
(usually, because it is coming from HTTP/HTML/etc) and we're not stuck
deep inside some method trying to figure out where a string we're
given is coming from and therefore what sort of string code it might
be.  Second, we've added explicit compatibility code to accept
MediaWiki internal codes and convert them to Bcp47Code for backward
compatibility with existing clients, using the @internal
LanguageCode::normalizeNonstandardCodeAndWarn() method.  The intention
is to gradually remove these backward compatibility thunks and replace
them with HTTP 400 errors or wfDeprecated messages in order to
identify and repair callers who are incorrectly using
non-standard-compliant language codes in web standards
(HTTP/HTML/SVG/etc).

Finally, maintaining a code as a Bcp47Code and not immediately
converting to Language helps us delay or even avoid full loading of a
Language object in some cases, which is another reason to occasionally
push Bcp47Code (instead of Language) down the call stack.
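
A minimal Python sketch of the wrapper-type idea, under stated
assumptions: Bcp47, from_untrusted and content_language_header are
hypothetical names, not the wikimedia/bcp-47-code API, and the mapping
table holds a single illustrative entry (sr-el to sr-Latn). Strings
are converted once at the boundary, and downstream code accepts only
the wrapper type.

```python
from dataclasses import dataclass
from typing import Tuple

# Illustrative mapping from internal codes to BCP-47 codes (assumption:
# a real table would be far larger).
INTERNAL_TO_BCP47 = {"sr-el": "sr-Latn"}


@dataclass(frozen=True)
class Bcp47:
    """Lightweight wrapper marking a string as a BCP-47 code."""
    code: str


def from_untrusted(raw: str) -> Tuple[Bcp47, bool]:
    """Convert at the boundary; the flag reports a nonstandard input."""
    if raw in INTERNAL_TO_BCP47:
        return Bcp47(INTERNAL_TO_BCP47[raw]), True
    return Bcp47(raw), False


def content_language_header(lang: Bcp47) -> str:
    # Downstream code refuses bare strings, so the two kinds of code
    # cannot be mixed up silently.
    if not isinstance(lang, Bcp47):
        raise TypeError("expected a Bcp47 value, got a bare string")
    return f"Content-Language: {lang.code}"
```

The boolean flag is where a compatibility layer could log a warning
today and return an error later.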

Bug: T327379
Depends-On: I830867d58f8962d6a57be16ce3735e8384f9ac1c
Change-Id: I982e0df706a633b05dcc02b5220b737c19adc401
2023-03-13 13:25:09 -04:00
C. Scott Ananian
bce63d1912 Preserve non-PageBundle metadata set by Parsoid
The Parsoid entrypoints should always have a "real" ParserOutput
passed as the ContentMetadataCollector object, so that recursive
invocations of extensions, etc, can set appropriate metadata
properties in the ParserOutput.

This is part of a belt-and-suspenders fix for T331084, where a
StubMetadataCollector is being used in production -- production should
never use a stub, it should always use a real ParserOutput object.
The other fix for T331084 is
I30ea2bb24e6c9b0950a8f46dc8e5b9bf5ee3378b, which ensures that if you
*were* to use a StubMetadataCollector in production, it wouldn't throw
an error when a numeric category string was encountered.

Bug: T331084
Change-Id: I8711a51fc1bcac48eae92ab1ba15a33fe05937ed
2023-03-13 11:24:57 -04:00
C. Scott Ananian
d5b39490ca Remove back-compatibility code for ToC marker
Before 1.39 we used <mw:toc> and in 1.39 we switched to <mw:tocplace/>
(commit 24949480eb).  This was changed to a <meta> tag in 1.40 (commit
0b10563895 and fa8646ca7b) and the old content has long since expired
from the ParserCache.  Clean up the old ParserCache transition code.

Change-Id: I3254d0acba31e107b50767797a2b0ad28aba59ee
2023-02-10 00:03:54 -05:00
Abijeet
5c113a833a LanguageVariantConverter: Add fallback to core LanguageConverter
If variant conversion is not supported by Parsoid, fall back to using
the old LanguageConverter.

We still call Parsoid to perform variant conversion in order to add
metadata that is missing when the core language converter is used.

Bug: T318401
Change-Id: I0499c853b4e301f135339fc137054bd760ee237d
Depends-On: Ie94aaa11963ec1e9e99136af469a05fa4005710d
2022-12-11 12:12:33 +05:30
Abijeet
803092d4af tests: Remove unnecessary override to use pig-latin
Pig latin is enabled by default since
Ia80ad33cbf5e311fa8b84bd765a8df8d156f4c38

Change-Id: I0cd922bb0ee1fd7bce2ced2eacbdb6ed25ada7d8
2022-12-08 17:52:00 +05:30
daniel
b7ab24c218 Fix LanguageVariantConverter test
Accept sr-Latn as well as sr-el as the language code for Serbian with
Latin script.

This was broken when the Parsoid library started to use BCP-47 codes
rather than internal MediaWiki codes. For now, we accept both, so we are
compatible with the version of the Parsoid lib currently in the vendor
repo as well as the version picked by composer update.

Bug: T323985
Change-Id: If0b02be4f391b31fb75e2ad51e199a83707b0e3c
2022-11-29 15:34:42 +01:00
daniel
e61b9b6680 page/{title}/html: handle unknown variant gracefully
Language conversion shouldn't crash with a 500 when a variant is
requested for a language that does not support variants. This behavior
is especially annoying when manually calling REST endpoints with a
browser, since browsers routinely send Accept-Language headers.

Change-Id: I31a14cb184a7bf940b7d178c12b2e7829d2eca0f
2022-11-22 23:03:55 +01:00
daniel
6fd3a7b0b0 Stash original wikitext when rendering unsaved content.
When visual editor switches from source mode to visual mode, we need to
stash the wikitext. Otherwise, we later lack the proper context to
convert the modified HTML back to wikitext.

Bug: T321862
Change-Id: Id611e6e022bf8d9d774ca1a3a214220ada713285
2022-11-04 17:17:32 +01:00
daniel
f545d5efeb Rename HTMLTransform to HtmlToContentTransform
* We will have several kinds of HTML transformations.
  Rename HTMLTransform to indicate that it's for converting HTML to
  Content objects.

* Use the naming convention 'Html' instead of 'HTML'.

Change-Id: I506f3303ae8f9e4db17299211366bef1558f142c
2022-11-03 16:47:36 +01:00
daniel
4ad9c9b035 variant transform: allow input content-language to be a variant
When submitting HTML to transform/html/to/html, the language specified
by the input's content-language header should be allowed to be the
source variant.

It should also be possible to just specify the source variant, and
derive the base language from that rather than the content-language
header or the page language.

Change-Id: I703c112358a921a8b0c9e63b70fd820ae3ea16fc
2022-11-02 01:30:36 -04:00
Abijeet
715080cfd5 LanguageVariantConverter: Use content language code from HTTP header
Use the content language from the header, and give that the highest
priority when identifying the page language.
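
A minimal Python sketch of the priority order, assuming the header
value wins and that the PageBundle language is tried before the page
language (that second ordering is an assumption; the function and
parameter names are hypothetical):

```python
def pick_source_language(header_lang, bundle_lang, page_lang):
    """Return the first available language code, highest priority first."""
    for candidate in (header_lang, bundle_lang, page_lang):
        if candidate:
            return candidate
    raise ValueError("no language available")
```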

Bug: T317019
Change-Id: Ibb0671f1b873ef83a4d53824a9c4c17726e68635
2022-10-07 20:28:57 +05:30
daniel
5b0d1cfd35 Re-apply: Introduce LanguageVariantConverter
This reverts Ib73841bcc6c101bbe8a76f76dc81553290726039 and re-applies
I55a58f9824329893575a532cd10b9422ededb9ba with some changes: The source
variant is passed in explicitly. More complete handling of the input
language will be added in a follow-up.

Original description:

This class is used in ParsoidHandler::languageConversion

It uses Parsoid to perform the actual conversion of the content
to a language variant.

The source language is determined using the PageBundle or the page
language from the Title.

To encapsulate Parsoid-related concepts, the class has the ability
to create Parsoid\Config\PageConfig if not provided.

Bug: T317019
Change-Id: Ida1a040628c26ac2ef108b0c90a3d3285a493b0e
2022-10-04 20:29:54 +02:00
Daniel Kinzler
c5bc391b2b Revert "Introduce LanguageVariantConverter"
This reverts commit 5c49a09e89.

Reason for revert: See https://phabricator.wikimedia.org/T319282

Bug: T319282
Change-Id: Ib73841bcc6c101bbe8a76f76dc81553290726039
2022-10-04 11:52:09 +00:00
Abijeet
5c49a09e89 Introduce LanguageVariantConverter
This class is used in ParsoidHandler::languageConversion

It uses Parsoid to perform the actual conversion of the content
to a language variant.

The source language is determined using the PageBundle or the page
language from the Title.

To encapsulate Parsoid-related concepts, the class has the ability
to create Parsoid\Config\PageConfig if not provided.

Bug: T317019
Change-Id: I55a58f9824329893575a532cd10b9422ededb9ba
2022-10-03 16:13:29 +00:00