I believe the more recent syntax is quite a bit more readable. The
most obvious benefit is that it allows for much less duplication.
Note this patch is intentionally only touching tests, so it can't
have any effect on production code.
Change-Id: Ibdde9e37edf80f0d3bb3cb9056bee5f7df8010ee
* It's not very clean to import Wikimedia\Stats in Parsoid
* MediaWiki depends on Parsoid
* As a workaround we can extract the 2 methods we need in SiteConfig
Bug: T354908
Change-Id: I696131cfba6ccc26ae1f705f216e221a7c3db175
Page bundle headers should not contain objects, as they are supposed
to represent plaintext HTTP headers.
Change-Id: I2a87a8233b9e42cbafdba63bdf513abe00d826ce
The `supportsContentModel` method is really querying Parsoid for the
set of content models it supports, so it makes sense to put it in the
Parsoid-specific SiteConfig service.
This is part of the work to deprecate and remove ParsoidOutputAccess.
Change-Id: I81eb2df8cef93ede95361a4e03185b3d58e5b84b
I don't think these do anything with the documentation generators
we currently use. Especially not in tests. How are tests part of a
"package" when the code is not?
Note how most of these are simply identical to the namespace. They
are most probably auto-generated by some IDEs but don't actually
mean anything.
Change-Id: I771b5f2041a8e3b077865c79cbebddbe028543d1
Covered:
- Constructor initialization with correct dependencies.
- Retrieve roles assigned to page content.
- Check if the specified role exists in the page content slots.
- Retrieve model name for specified role in page content.
- Handle exception for non-existent role when retrieving model.
- Retrieve content format for specified role in page content.
- Retrieve serialized content for specified role in page content.
- Handle exception for non-existent role when retrieving content.
Change-Id: Ia2129e37b15bb8c09c0b26e487a9e311e66b932f
HtmlOutputRendererHelper should not crash hard if the ParserOutput has
no language set. ParserOutput may come from a variety of places, we
should be lenient about it not having a language.
However, we should try harder to actually set a language on ParserOutput
if we have one available. So this also updates
PageBundleParserOutputConverter to keep the ParserOutput's language in
sync with the language header in the PageBundle.
Bug: T349868
Bug: T353689
Bug: T359426
Change-Id: I2edf20dc3b199e22cda2f32bc858c21ca7d8f4bd
ParserOutput::getText() is not a simple getter, but does
transformations on the "text" of the ParserOutput; the simple getter
is named ::getRawText().
To maintain consistency, rename ParserOutput::setText() to
::setRawText() and the property name ParserOutput::$mText to
::$mRawText so future readers are not confused.
The JSON property name as it appears in the serialized ParserCache
is left as 'Text' so that we don't have any forward- or backward-
rollback issues.
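
A minimal sketch of the idea, written in Python for illustration rather
than PHP. All names here (ParserOutputSketch, to_json, from_json) are
hypothetical and this is not the actual ParserOutput code. It only shows
that the in-memory name can change while the serialized key stays 'Text',
so data written before or after the rename stays readable either way.

```python
import json


class ParserOutputSketch:
    """Hypothetical stand-in for ParserOutput; not the real class."""

    def __init__(self, raw_text=""):
        # Renamed in-memory attribute (it was called "text" before).
        self.raw_text = raw_text

    def set_raw_text(self, text):
        self.raw_text = text

    def get_raw_text(self):
        return self.raw_text

    def to_json(self):
        # The serialized key is deliberately left as "Text".
        return json.dumps({"Text": self.raw_text})

    @classmethod
    def from_json(cls, blob):
        # Entries written before the rename use the same "Text" key.
        return cls(json.loads(blob)["Text"])


# A blob as it might have been written before the rename.
old_blob = '{"Text": "<p>hi</p>"}'
restored = ParserOutputSketch.from_json(old_blob)
round_tripped = json.loads(restored.to_json())
```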
Change-Id: I3ef34814ab9473cc70d0a6806e8c5a4a02b73491
This class belongs with the rest of the Parsoid output stash code.
This class has been marked @unstable since 1.39 and thus the move
does not need release notes.
Change-Id: I16061c0c28b1549fbe90ea082cc717fee4a09a6e
There are a couple of user options related classes already,
and the T321527 work on dynamic defaults is going to add
even more. Let's move them into a separate namespace
to make core a bit more organized.
Old name is kept as an alias for compatibility purposes.
Bug: T321527
Bug: T352284
Change-Id: I9822eb1553870b876d0b8a927e4e86c27d83bd52
This nominally takes a string-valued language code conforming to the
BCP-47 standard, but this is often generated from a Bcp47Code object.
Since the MediaWiki Language code implements Bcp47Code, we may have
the case where we have a Language object in hand (but typed as a
Bcp47Code not Language) and call Language::toBcp47Code() only to pass
it to LanguageCode::bcp47ToInternal to convert it back to a
mediawiki-internal code.
We can save steps and be more efficient if we allow the parameter to be a
Bcp47Code object, and write a fast path for the special case where
that Bcp47Code happens to be a Language object and we can simply call
Language::getCode() to obtain the internal code.
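
A rough sketch of the fast path, in Python for illustration rather than
PHP. The classes below are hypothetical stand-ins, and the one-entry
mapping table and the lowercasing in the slow path are assumptions made
for the example, not the real conversion logic.

```python
class Bcp47Code:
    """Hypothetical interface: anything that can yield a BCP-47 string."""

    def to_bcp47_code(self):
        raise NotImplementedError


class Language(Bcp47Code):
    """Hypothetical stand-in for a Language object."""

    def __init__(self, internal_code, bcp47_code):
        self._internal = internal_code
        self._bcp47 = bcp47_code

    def get_code(self):
        return self._internal

    def to_bcp47_code(self):
        return self._bcp47


# Illustrative entry only (an assumption); the real mapping lives in core.
_ILLUSTRATIVE_TABLE = {"en-simple": "simple"}


def bcp47_to_internal(code):
    """Accept a string or a Bcp47Code and return an internal code."""
    if isinstance(code, Language):
        # Fast path: the object already knows its internal code.
        return code.get_code()
    if isinstance(code, Bcp47Code):
        code = code.to_bcp47_code()
    lowered = code.lower()
    return _ILLUSTRATIVE_TABLE.get(lowered, lowered)


fast = bcp47_to_internal(Language("simple", "en-simple"))
slow = bcp47_to_internal("en-Simple")
plain = bcp47_to_internal("de")
```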
Change-Id: I24932449b8c40e3a5072748d87667184f4befa67
Remove parser creation from service creation
In ParsoidSiteConfig, inject the ParserFactory and call getMainInstance
later, because ParsoidSiteConfig is often created without any calls to
the parser.
For ParsoidDataAccess store the factory and call it when needed.
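
The pattern can be sketched as follows, in Python for illustration rather
than PHP. ParserFactorySketch and SiteConfigSketch are hypothetical names;
the point is only that constructing the service no longer constructs a
parser.

```python
class ParserFactorySketch:
    """Hypothetical factory that counts how many parsers it has built."""

    def __init__(self):
        self.created = 0
        self._main = None

    def get_main_instance(self):
        if self._main is None:
            self.created += 1
            self._main = object()  # placeholder for an expensive parser
        return self._main


class SiteConfigSketch:
    """Hypothetical service that keeps the factory, not the parser."""

    def __init__(self, parser_factory):
        # Store only the factory; building the service stays cheap.
        self._parser_factory = parser_factory

    def needs_parser(self):
        # The parser is fetched only when a method actually needs it.
        return self._parser_factory.get_main_instance() is not None


factory = ParserFactorySketch()
config = SiteConfigSketch(factory)
created_before_use = factory.created
config.needs_parser()
created_after_use = factory.created
```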
Bug: T343070
Change-Id: Ib3acadaf190383e4a8b3d266a9fd75c9b20c6649
* Was used during the Parsoid JS -> PHP port and is no longer used.
* This also eliminates the need to inject ParsoidSettings into some
classes.
* Once this merges and lands in core, I'll remove this from the Parsoid
repo as well.
Change-Id: I008d30ea81f5a3db26e512c87762b90e3ca3c4ff
Unit tests should not access the ExtensionRegistry singleton. This is
similar to how MediaWikiServices is disabled, but needs to be done
separately because ExtensionRegistry is not a service.
Make ExtensionRegistryTest use a mocked SettingsBuilder to avoid
triggering the exception when SettingsBuilder tries to access the global
instance of ExtensionRegistry.
Inject data from ExtensionRegistry into Parsoid's SiteConfig to keep
SiteConfigTest a working unit test.
Change-Id: I0a04c82250582fed7a66c1e10868d9b4f3823a28
wfParseUrl falls back to the global service locator as of I706ef8a5.
This will soon be disallowed in unit tests (see I5117eab9), and all the
classes updated in this patch are covered by a unit test that would then
fail.
SiteConfig already has a UrlUtils object available, so just use that.
In the other classes, there is no need to inject a UrlUtils service and
we can instead adopt parse_url, because these didn't depend on our
site-configurable or custom parsing logic. For precedent see also
change I6492f5142861513e4a7, I1e76d2f5aef, and lots of other examples
in Codesearch for parse_url().
The warnings about parse_url() in UrlUtils.php have been obsolete
since about PHP 5.4, when it started to support protocol-relative
URLs, non-slash protocols like "mailto", and deal with spaces/newlines
correctly (https://3v4l.org/YWUkl).
This patch was partly copied from PS 20 of I5117eab9.
Co-Authored-by: Timo Tijhof <krinkle@fastmail.com>
Change-Id: I98ea4670e842d11598664f058d8c90a900477be4
LanguageVariantConverterUnitTest: don't mock a method in the Parsoid
class that no longer exists.
ParsoidParser: pass a Bcp47Code (in the form of a Language object),
not a string, when selecting the preferred variant for the output
Followup-To: Ib8554f98b1c653df3864110e0e66796b8da67b5f
Change-Id: I32fd64a9495b8aed729b0b5b00535180006e0223
* SiteConfig::variants() was replaced by ::variantsFor()
* SiteConfig::langConverterEnabled() was replaced by ::langConverterEnabledBcp47()
Change-Id: I2dc510fcf0f03304f01c14cff92d5dd50736f062
* ParsoidParser hadn't registered a watcher on ParserOptions so far.
Because of this, you can see that the current parser cache key
(in deployed production code) doesn't have 'useParsoid=1' in it.
Ex: View source on enwiki:Hospet shows that the parser cache key
there is "enwiki:parsoid-pcache:idhash:2360619-0!canonical".
The only reason this doesn't conflict with legacy parser output
is that we use "parsoid-pcache", a different cache instance than
"pcache" used for legacy parser output. But if/when we decide to use
the same parser cache instance, this could cause cache corruptions.
With FlaggedRevisions, where a single "stable-pcache" parser cache
instance is used, in local testing, this was causing Parsoid HTML to be
saved without "useParsoid=1", and so Parsoid HTML was being returned
for legacy parser cache requests.
* In addition, fix the code in PageBundleParserOutputConverter to copy
over internal metadata (which includes used options). This ensures
that any tracked parser options aren't lost and the right parser cache
key is constructed later on.
* Added / updated a number of tests that verify that usedOptions
is tracked correctly in the useParsoid code paths. The tests fail
without the code changes in this patch.
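
The failure mode can be sketched as follows, in Python for illustration
rather than PHP. Everything here is hypothetical (class names, the
render() helper, and the key format); it is not the real ParserOptions
or ParserCache code. It only shows that an option read without a
registered watcher never reaches the cache key.

```python
class ParserOptionsSketch:
    """Hypothetical stand-in for ParserOptions; not the real class."""

    def __init__(self, use_parsoid):
        self._values = {"useParsoid": use_parsoid}
        self._watcher = None

    def register_watcher(self, callback):
        self._watcher = callback

    def get_option(self, name):
        # Reading an option notifies the watcher, if one is registered.
        if self._watcher is not None:
            self._watcher(name)
        return self._values[name]

    def value_for_key(self, name):
        # Read a value without notifying the watcher.
        return self._values[name]


def render(options, register_watcher):
    """Pretend to parse; return the set of options recorded as used."""
    used = set()
    if register_watcher:
        options.register_watcher(used.add)
    options.get_option("useParsoid")
    return used


def cache_key(base, options, used):
    # Assumed key format: only recorded options vary the key.
    parts = [base]
    for name in sorted(used):
        parts.append(f"{name}={int(options.value_for_key(name))}")
    return "!".join(parts)


unwatched = ParserOptionsSketch(True)
key_unwatched = cache_key("page-1", unwatched, render(unwatched, False))
watched = ParserOptionsSketch(True)
key_watched = cache_key("page-1", watched, render(watched, True))
```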
Bug: T340703
Bug: T335157
Needed-By: I0e954949768044eea6ec275a36d0d6d7ed457e8e
Change-Id: I076d5d362bdfd9d4b2ca8886bf6b30c1a746aee7
This is an issue introduced by I8711a51fc1bcac48, which
caused duplicate variant conversion to be applied in some cases.
The reason is that the $parserOutput and $processedParserOutput fields
in HtmlOutputRendererHelper ended up being the same object.
Change-Id: Ic1fbc8815ef74beba6dae927563a9945b6dab1a1
There is no way to express that Title::castFromPageIdentity(),
Title::castFromPageReference() and Title::castFromLinkTarget()
can only return null when the parameter is null. We need to add
Phan suppressions or explicit types almost everywhere that these
methods are used with parameters that are known to not be null.
Instead, introduce new methods Title::newFromPageIdentity() and
Title::newFromPageReference() (Title::newFromLinkTarget() already
exists), without the null-coalescing behavior, and use them when
the parameter is not null. This lets static analysis tools, and
humans, easily understand where nulls can't appear.
Do the same with the corresponding TitleFactory methods.
Change the obvious uses of castFrom*() to newFrom*() (if there is
a Phan suppression, a type check, or a method call on the result).
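
The difference between the two families of methods can be sketched as
follows, in Python type hints for illustration rather than PHP. The
classes are hypothetical stand-ins, not the real Title API. The cast
variant has to be typed as possibly returning None, while the new
variant never is, so callers need no suppression or extra check.

```python
from typing import Optional


class PageIdentity:
    """Hypothetical stand-in for a page identity value object."""

    def __init__(self, dbkey: str):
        self.dbkey = dbkey


class Title:
    """Hypothetical stand-in for Title; not the real class."""

    def __init__(self, dbkey: str):
        self.dbkey = dbkey

    @staticmethod
    def cast_from_page_identity(
        page: Optional[PageIdentity],
    ) -> Optional["Title"]:
        # Null in, null out: the return type must admit None.
        if page is None:
            return None
        return Title(page.dbkey)

    @staticmethod
    def new_from_page_identity(page: PageIdentity) -> "Title":
        # Non-null in, non-null out: callers can use the result directly.
        return Title(page.dbkey)


maybe_title = Title.cast_from_page_identity(None)
title = Title.new_from_page_identity(PageIdentity("Main_Page"))
```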
Change-Id: Ida4da75953cf3bca372a40dc88022443109ca0cb
This is now enabled in production (Ic5a4a9950d51f63b17f4c5e70516bec87b981aa5)
and not something we want to remain configurable.
It is removed from Parsoid in I52ddfd21ff2e72a34cb5eb68742e3dfb85c6ccf6
Change-Id: I6a4d7d33fb42270fc5da3a922aa0a959180fb33f
This covers only methods where adding "static" to the declaration was
enough; I didn't do anything with providers that used $this.
Initially by search and replace. There were many mistakes which I
found mostly by running the PHPStorm inspection which searches for
$this usage in a static method. Later I used the PHPStorm "make static"
action which avoids the more obvious mistakes.
Bug: T332865
Change-Id: I47ed6692945607dfa5c139d42edbd934fa4f3a36
This is a slightly stricter test than we'd previously used to check
the validity of the provided source language parameter.
Change-Id: I22e9c5cf6c30ce737884162970a1eb349549c86d
It is very easy for developers and maintainers to mix up "internal
MediaWiki language codes" and "BCP-47 language codes"; the latter are
standards-compliant and used in web protocols like HTTP, HTML, and
SVG; but much of WMF production is very dependent on historical codes
used by MediaWiki which in some cases predate the IANA standardized
name for the language in question.
Phan and other static checking tools aren't much help distinguishing
BCP-47 from internal codes when both are represented with the PHP
string type, so the wikimedia/bcp-47-code package introduced a very
lightweight wrapper type in order to uniquely identify BCP-47 codes.
Language implements Bcp47Code, and LanguageFactory::getLanguage() is
an easy way to convert (or downcast) between Bcp47Code and Language
objects.
This patch updates the Parsoid integration code and the associated
REST handlers to use Bcp47Code in APIs so that the standalone Parsoid
library does not need to know anything about MediaWiki-internal codes.
The principle has been, first, to try to convert a string to a
Bcp47Code as soon as possible and as close to the original input as
possible, so it is easy to see *why* a given string is a BCP-47 code
(usually, because it is coming from HTTP/HTML/etc) and we're not stuck
deep inside some method trying to figure out where a string we're
given is coming from and therefore what sort of string code it might
be. Second, we've added explicit compatibility code to accept
MediaWiki internal codes and convert them to Bcp47Code for backward
compatibility with existing clients, using the @internal
LanguageCode::normalizeNonstandardCodeAndWarn() method. The intention
is to gradually remove these backward compatibility thunks and replace
them with HTTP 400 errors or wfDeprecated messages in order to
identify and repair callers who are incorrectly using
non-standard-compliant language codes in web standards
(HTTP/HTML/SVG/etc).
Finally, maintaining a code as a Bcp47Code and not immediately
converting to Language helps us delay or even avoid full loading of a
Language object in some cases, which is another reason to occasionally
push Bcp47Code (instead of Language) down the call stack.
Bug: T327379
Depends-On: I830867d58f8962d6a57be16ce3735e8384f9ac1c
Change-Id: I982e0df706a633b05dcc02b5220b737c19adc401
The Parsoid entrypoints should always have a "real" ParserOutput
passed as the ContentMetadataCollector object, so that recursive
invocations of extensions, etc, can set appropriate metadata
properties in the ParserOutput.
This is part of a belt-and-suspenders fix for T331084, where a
StubMetadataCollector is being used in production -- production should
never use a stub, it should always use a real ParserOutput object.
The other fix for T331084 is
I30ea2bb24e6c9b0950a8f46dc8e5b9bf5ee3378b, which ensures that if you
*were* to use a StubMetadataCollector in production, it wouldn't throw
an error when a numeric category string was encountered.
Bug: T331084
Change-Id: I8711a51fc1bcac48eae92ab1ba15a33fe05937ed