I believe this makes the code less brittle, and also makes it a bit
more obvious what these strings are meant to represent.
Change-Id: Ia39b5c80af4b495931d0a68fd091b783645dd709
Changes to the use statements done automatically via script
Addition of missing use statement done manually
Change-Id: I4ff4d0c10820dc2a3b8419b4115fadf81a76f7a2
The `supportsContentModel` method is really querying Parsoid for the
set of content models it supports, so it makes sense to put it in the
Parsoid-specific SiteConfig service.
This is part of the work to deprecate and remove ParsoidOutputAccess.
Change-Id: I81eb2df8cef93ede95361a4e03185b3d58e5b84b
This patch adds a MediaWiki\Content namespace declaration to
JsonContent and establishes a class alias for the old name,
marked as deprecated since version 1.43.
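
As a rough illustration of the namespace-plus-alias pattern described
above, the sketch below uses a stub class rather than the real
JsonContent. The getModel() body is a placeholder, not the actual
implementation; only the namespace, the alias name, and the 1.43
deprecation note come from this message.

```php
<?php
// Sketch only: a stub standing in for the real JsonContent class.
namespace MediaWiki\Content;

class JsonContent {
	/** Placeholder method, present only so the stub has some behavior. */
	public function getModel(): string {
		return 'json';
	}
}

/** @deprecated class alias since 1.43 */
class_alias( JsonContent::class, 'JsonContent' );

// Old, un-namespaced references keep resolving to the same class.
$viaAlias = new \JsonContent();
$sameClass = $viaAlias instanceof JsonContent;
```

Callers that still refer to the global name keep working during the
deprecation period, while new code can import the namespaced class.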
Bug: T353458
Change-Id: I44abb1ab5bd1fabf9886dc1457e241d7cae068bc
This is a non-default option that will add a <div> wrapper around
section contents to allow client-side collapsing. This is intended
for use by MobileFrontEnd, but could eventually be enabled for
desktop read views as well.
Since this parser option is in the "cache-varying options" set, any
caller who sets this option will fork the cache for that page, which
is reasonable, since the parser option sets a ParserOutput property.
In the future our caching strategy will get smarter and we'll add
code which avoids the cache split and just transfers the appropriate
values from ParserOptions to ParserOutput flags after the cached
output is retrieved.
Bug: T359001
Change-Id: Ie93959a056ed15a728404eb293e4bb6eeaeb15c0
HtmlOutputRendererHelper should not crash hard if the ParserOutput has
no language set. ParserOutput may come from a variety of places, so we
should be lenient about it not having a language.
However, we should try harder to actually set a language on ParserOutput
if we have one available. So this also updates
PageBundleParserOutputConverter to keep the ParserOutput's language in
sync with the language header in the PageBundle.
Bug: T349868
Bug: T353689
Bug: T359426
Change-Id: I2edf20dc3b199e22cda2f32bc858c21ca7d8f4bd
This will allow the Translate extension to set this parser option
in the ArticleParserOptions hook, instead of mutating $options passed
to ParserOutput::getText() in the ParserOutputPostCacheTransform hook.
It ought to also help to handle the many places which call:
    ... = $parserOutput->getText( [
        'enableSectionEditLinks' => false,
    ] );
by allowing them to set the appropriate ParserOption instead
of passing arguments to ::getText().
Bug: T350626
Change-Id: I719c115194059060f7f888608417a194ac80cc92
This has been constantly mentioned as buggy and broken and there is no
official version of Latin or Arabic (see the ticket for more details).
This can be reintroduced as an extension if needed by third-party users.
Bug: T350684
Bug: T268143
Depends-On: I6180dca2c49b3119751766268acc56087aaf8414
Change-Id: Ifbf3c8954d885daf891f8d9efc11743d898302f0
* Discovered through a failing API test in the Parsoid repo.
* Added a new phpunit test to catch this in the future.
Change-Id: Ic6326b409c9420fec676060566879f9a37a80961
* This lets post-cache transforms have access to the title.
* Specifically, DiscussionTools uses this to post-process the HTML.
Bug: T341010
Change-Id: I328f533e6cdb11c0c3a873d23bab1a113dfa39be
* Updated documentation around this point
* Adjust tests to reflect this change.
* While it initially appeared that this can cause ParserCache impacts,
'disableContentConversion' isn't part of the cache key and thus
has no deployment impacts.
Change-Id: I535cb21cc104a358aa70829b030ae3751b76ae00
* Was used during the Parsoid JS -> PHP port and is no longer used.
* This also eliminated the need to inject ParsoidSettings into some
classes.
* Once this merges and lands in core, I'll remove this from the Parsoid
repo as well.
Change-Id: I008d30ea81f5a3db26e512c87762b90e3ca3c4ff
Mock the needed services, or set fixed values to avoid DB lookups, when
possible. Add the test to the Database group otherwise, e.g. for things
like Skin and Parser that use global state all over the place.
Change-Id: I8d87013d89accaf04d0ac19cb4b7216290383eb5
* ParsoidParser hadn't registered a watcher on ParserOptions so far.
Because of this, you can see that the current parser cache key
(in deployed production code) doesn't have 'useParsoid=1' in it.
Ex: View source on enwiki:Hospet shows that the parser cache key
there is "enwiki:parsoid-pcache:idhash:2360619-0!canonical".
The only reason this doesn't conflict with legacy parser output
is because we use "parsoid-pcache", a different cache instance than
"pcache" used for legacy parser output. But if/when we decide to use
the same parser cache instance, this could cause cache corruptions.
With FlaggedRevisions, where a single "stable-pcache" parser cache
instance is used, in local testing, this was causing Parsoid HTML to be
saved without "useParsoid=1", and so Parsoid HTML was being returned
for legacy parser cache requests.
* In addition, fix the code in PageBundleParserOutputConverter to copy
over internal metadata (which includes used options). This ensures
that any tracked parser options aren't lost and the right parser cache
key is constructed later on.
* Added and updated a number of tests that verify that usedOptions
is tracked correctly in the useParsoid code paths. The tests fail
without the code changes in this patch.
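
The collision described above can be sketched with a deliberately
simplified key builder. sketchCacheKey() and the 'idhash:1-0' page
identifier are hypothetical and do not reflect the real ParserCache or
ParserOptions code. The only assumption is that the key varies on
options that were recorded as used.

```php
<?php
/**
 * Build a cache key from only those options that were recorded as used.
 * Hypothetical helper for illustration; not MediaWiki's implementation.
 */
function sketchCacheKey( string $page, array $options, array $usedOptions ): string {
	sort( $usedOptions );
	$parts = [];
	foreach ( $usedOptions as $name ) {
		$parts[] = $name . '=' . (int)$options[$name];
	}
	// With no used options recorded, fall back to a single shared suffix.
	return $page . '!' . ( $parts ? implode( '!', $parts ) : 'canonical' );
}

$legacy = [ 'useParsoid' => false ];
$parsoid = [ 'useParsoid' => true ];

// No watcher registered: 'useParsoid' is never recorded, so the keys collide.
$untrackedLegacy = sketchCacheKey( 'idhash:1-0', $legacy, [] );
$untrackedParsoid = sketchCacheKey( 'idhash:1-0', $parsoid, [] );

// 'useParsoid' recorded as used: the two renderings get distinct keys.
$trackedLegacy = sketchCacheKey( 'idhash:1-0', $legacy, [ 'useParsoid' ] );
$trackedParsoid = sketchCacheKey( 'idhash:1-0', $parsoid, [ 'useParsoid' ] );
```

In this model the untracked keys are identical, which is the situation
that lets one parser's HTML be served for the other's request when both
share a cache instance.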
Bug: T340703
Bug: T335157
Needed-By: I0e954949768044eea6ec275a36d0d6d7ed457e8e
Change-Id: I076d5d362bdfd9d4b2ca8886bf6b30c1a746aee7
It should not be necessary to call setOptions() to perform a
transformation. All options should have defaults defined.
Change-Id: I1fade591e73034e071417d31fbdfff1a83180360
Initially used a new sniff with autofix (T333745),
but some providers are defined as non-static in the TestBase class
and need more work to make them static in a compatible way.
Bug: T332865
Change-Id: I889d33424f0c01fb26f2d86f8d4fc3de3e568843
This is an initial quick-and-dirty implementation. The
ParsoidParser class will eventually inherit from \Parser,
but this is an initial placeholder to unblock other Parsoid
read views work.
Currently Parsoid does not fully implement all the ParserOutput
metadata set by the legacy parser, but we're working on it.
This patch also addresses T300325 by ensuring that the Page HTML
APIs use ParserOutput::getRawText(), which will return the entire
Parsoid HTML document without post-processing. This is what
the Parsoid team refers to as "edit mode" HTML. The
ParserOutput::getText() method returns only the <body> contents
of the HTML, and applies several transformations, including
inserting Table of Contents and style deduplication; this is
the "read views" flavor of the Parsoid HTML.
We need to be careful of the interaction of the `useParsoid` flag with
the ParserCacheMetadata. Effectively `useParsoid` should *always* be
marked as "used" or else the ParserCache will assume its value doesn't
matter and will serve legacy content for Parsoid requests and
vice-versa. T330677 is a follow up to address this more thoroughly by
splitting the parser cache in ParserOutputAccess; the stop gap in this
patch is fragile and, because it doesn't fork the ParserCacheMetadata
cache, may corrupt the ParserCacheMetadata in the case when Parsoid
and the legacy parser consult different sets of options to render a
page.
Bug: T300191
Bug: T330677
Bug: T300325
Change-Id: Ica09a4284c00d7917f8b6249e946232b2fb38011
This only covers methods where adding "static" to the declaration was
enough; I didn't do anything with providers that used $this.
Initially by search and replace. There were many mistakes which I
found mostly by running the PHPStorm inspection which searches for
$this usage in a static method. Later I used the PHPStorm "make static"
action which avoids the more obvious mistakes.
Bug: T332865
Change-Id: I47ed6692945607dfa5c139d42edbd934fa4f3a36
* ParserTestRunner: LocalisationCache needs to be reset since it has a
reference to LanguageNameUtils which has a copy of
$wgUsePigLatinVariant. Also factor out some
MediaWikiServices::getInstance() calls.
* In some other tests, set the variable.
Change-Id: I6c1e9bfad9790cf805809c28a3f8d45952cbb981
The tag has been <mw:editsection> since at least 2011
(f0fd318a4e), we no longer need to
include the ancient <editsection> variant in our regexp and
test cases.
Change-Id: I5fd783556810ea13b07a69066ea6762d1a1863e1
It is very easy for developers and maintainers to mix up "internal
MediaWiki language codes" and "BCP-47 language codes"; the latter are
standards-compliant and used in web protocols like HTTP, HTML, and
SVG; but much of WMF production is very dependent on historical codes
used by MediaWiki which in some cases predate the IANA standardized
name for the language in question.
Phan and other static checking tools aren't much help distinguishing
BCP-47 from internal codes when both are represented with the PHP
string type, so the wikimedia/bcp-47-code package introduced a very
lightweight wrapper type in order to uniquely identify BCP-47 codes.
Language implements Bcp47Code, and LanguageFactory::getLanguage() is
an easy way to convert (or downcast) between Bcp47Code and Language
objects.
This patch updates the Parsoid integration code and the associated
REST handlers to use Bcp47Code in APIs so that the standalone Parsoid
library does not need to know anything about MediaWiki-internal codes.
The principle has been, first, to try to convert a string to a
Bcp47Code as soon as possible and as close to the original input as
possible, so it is easy to see *why* a given string is a BCP-47 code
(usually, because it is coming from HTTP/HTML/etc) and we're not stuck
deep inside some method trying to figure out where a string we're
given is coming from and therefore what sort of string code it might
be. Second, we've added explicit compatibility code to accept
MediaWiki internal codes and convert them to Bcp47Code for backward
compatibility with existing clients, using the @internal
LanguageCode::normalizeNonstandardCodeAndWarn() method. The intention
is to gradually remove these backward compatibility thunks and replace
them with HTTP 400 errors or wfDeprecated messages in order to
identify and repair callers who are incorrectly using
non-standard-compliant language codes in web standards
(HTTP/HTML/SVG/etc).
Finally, maintaining a code as a Bcp47Code and not immediately
converting to Language helps us delay or even avoid full loading of a
Language object in some cases, which is another reason to occasionally
push Bcp47Code (instead of Language) down the call stack.
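
A minimal sketch of the "convert at the boundary" principle follows.
SketchBcp47Code, sketchNormalize() and sketchContentLanguageHeader() are
hypothetical names, not the wikimedia/bcp-47-code or LanguageCode APIs,
and the mapping table holds a single illustrative entry (sr-el to
sr-Latn).

```php
<?php
// Hypothetical, simplified stand-in for a lightweight wrapper type.
final class SketchBcp47Code {
	private string $code;

	public function __construct( string $code ) {
		$this->code = $code;
	}

	public function toBcp47Code(): string {
		return $this->code;
	}
}

/**
 * Convert once, close to the input: accept either kind of string and
 * hand a typed value to everything downstream.
 */
function sketchNormalize( string $input ): SketchBcp47Code {
	// Illustrative table of MediaWiki-internal codes; not exhaustive.
	$nonstandard = [ 'sr-el' => 'sr-Latn' ];
	$key = strtolower( $input );
	return new SketchBcp47Code( $nonstandard[$key] ?? $input );
}

// Downstream code states in its signature which kind of code it expects.
function sketchContentLanguageHeader( SketchBcp47Code $lang ): string {
	return 'Content-Language: ' . $lang->toBcp47Code();
}

$header = sketchContentLanguageHeader( sketchNormalize( 'sr-el' ) );
```

Because the downstream function accepts only the wrapper type, a plain
string of unknown origin cannot reach it without passing through the
conversion step first.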
Bug: T327379
Depends-On: I830867d58f8962d6a57be16ce3735e8384f9ac1c
Change-Id: I982e0df706a633b05dcc02b5220b737c19adc401
The Parsoid entrypoints should always have a "real" ParserOutput
passed as the ContentMetadataCollector object, so that recursive
invocations of extensions, etc, can set appropriate metadata
properties in the ParserOutput.
This is part of a belt-and-suspenders fix for T331084, where a
StubMetadataCollector is being used in production -- production should
never use a stub, it should always use a real ParserOutput object.
The other fix for T331084 is
I30ea2bb24e6c9b0950a8f46dc8e5b9bf5ee3378b, which ensures that if you
*were* to use a StubMetadataCollector in production, it wouldn't throw
an error when a numeric category string was encountered.
Bug: T331084
Change-Id: I8711a51fc1bcac48eae92ab1ba15a33fe05937ed
Before 1.39 we used <mw:toc> and in 1.39 we switched to <mw:tocplace/>
(commit 24949480eb). This was changed to a <meta> tag in 1.40 (commits
0b10563895 and fa8646ca7b) and the old content has long since expired
from the ParserCache. Clean up the old ParserCache transition code.
Change-Id: I3254d0acba31e107b50767797a2b0ad28aba59ee
If variant conversion is not supported by Parsoid, fall back to using
the old LanguageConverter.
We still call Parsoid to perform variant conversion in order to add
metadata that is missing when the core language converter is used.
Bug: T318401
Change-Id: I0499c853b4e301f135339fc137054bd760ee237d
Depends-On: Ie94aaa11963ec1e9e99136af469a05fa4005710d
Accept sr-Latn as well as sr-el as the language code for Serbian with
Latin script.
This was broken when the Parsoid library started to use BCP-47 codes
rather than internal MediaWiki codes. For now, we accept both, so we are
compatible with the version of the Parsoid library currently in the
vendor repo as well as the version picked by composer update.
Bug: T323985
Change-Id: If0b02be4f391b31fb75e2ad51e199a83707b0e3c
Language conversion shouldn't crash with a 500 when a variant is
requested for a language that does not support variants. This behavior
is especially annoying when manually calling REST endpoints with a
browser, since browsers routinely send Accept-Language headers.
Change-Id: I31a14cb184a7bf940b7d178c12b2e7829d2eca0f
When VisualEditor switches from source mode to visual mode, we need to
stash the wikitext. Otherwise, we later lack the proper context to
convert the modified HTML back to wikitext.
Bug: T321862
Change-Id: Id611e6e022bf8d9d774ca1a3a214220ada713285
* We will have several kinds of HTML transformations.
Rename HTMLTransform to indicate that it's for converting HTML to Content
objects.
* Use the naming convention 'Html' instead of 'HTML'.
Change-Id: I506f3303ae8f9e4db17299211366bef1558f142c
When submitting HTML to transform/html/to/html, the language specified
by the input's content-language header should be allowed to be the
source variant.
It should also be possible to just specify the source variant, and
derive the base language from that rather than the content-language
header or the page language.
Change-Id: I703c112358a921a8b0c9e63b70fd820ae3ea16fc
Use the content language from the header, and give that the highest
priority when identifying the page language.
Bug: T317019
Change-Id: Ibb0671f1b873ef83a4d53824a9c4c17726e68635
This reverts Ib73841bcc6c101bbe8a76f76dc81553290726039 and re-applies
I55a58f9824329893575a532cd10b9422ededb9ba with some changes: The source
variant is passed in explicitly. More complete handling of the input
language will be added in a follow-up.
Original description:
This class is used in ParsoidHandler::languageConversion.
It uses Parsoid to perform the actual conversion of the content
to a language variant.
The source language is determined using the PageBundle or the page
language from the Title.
To encapsulate Parsoid-related concepts, the class has the ability
to create a Parsoid\Config\PageConfig if one is not provided.
Bug: T317019
Change-Id: Ida1a040628c26ac2ef108b0c90a3d3285a493b0e
This class is used in ParsoidHandler::languageConversion.
It uses Parsoid to perform the actual conversion of the content
to a language variant.
The source language is determined using the PageBundle or the page
language from the Title.
To encapsulate Parsoid-related concepts, the class has the ability
to create a Parsoid\Config\PageConfig if one is not provided.
Bug: T317019
Change-Id: I55a58f9824329893575a532cd10b9422ededb9ba