This has repeatedly been reported as buggy and broken, and there is no
official version of Latin or Arabic (see the ticket for more details).
It can be brought back as an extension if third-party users need it.
Bug: T350684
Bug: T268143
Depends-On: I6180dca2c49b3119751766268acc56087aaf8414
Change-Id: Ifbf3c8954d885daf891f8d9efc11743d898302f0
* Discovered through a failing API test in the Parsoid repo.
* Added a new phpunit test to catch this in the future.
Change-Id: Ic6326b409c9420fec676060566879f9a37a80961
* Was used during the Parsoid JS -> PHP port and is no longer used.
* This also eliminated the need to inject ParsoidSettings into some
classes.
* Once this merges and lands in core, I'll remove this from the Parsoid
repo as well.
Change-Id: I008d30ea81f5a3db26e512c87762b90e3ca3c4ff
Only methods where adding "static" to the declaration was enough are
changed; I didn't touch providers that use $this.
Initially done by search and replace. That produced many mistakes, most
of which I found by running the PhpStorm inspection that searches for
$this usage in a static method. Later I used the PhpStorm "make static"
action, which avoids the more obvious mistakes.
Bug: T332865
Change-Id: I47ed6692945607dfa5c139d42edbd934fa4f3a36
* ParserTestRunner: LocalisationCache needs to be reset since it has a
reference to LanguageNameUtils which has a copy of
$wgUsePigLatinVariant. Also factor out some
MediaWikiServices::getInstance() calls.
* In some other tests, set the variable.
Change-Id: I6c1e9bfad9790cf805809c28a3f8d45952cbb981
It is very easy for developers and maintainers to mix up "internal
MediaWiki language codes" and "BCP-47 language codes". The latter are
standards-compliant and used in web protocols and formats like HTTP,
HTML, and SVG, but much of WMF production depends heavily on historical
codes used by MediaWiki, which in some cases predate the IANA
standardized name for the language in question.
Phan and other static checking tools aren't much help distinguishing
BCP-47 from internal codes when both are represented with the PHP
string type, so the wikimedia/bcp-47-code package introduced a very
lightweight wrapper type in order to uniquely identify BCP-47 codes.
Language implements Bcp47Code, and LanguageFactory::getLanguage() is
an easy way to convert (or downcast) between Bcp47Code and Language
objects.
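
As an illustration only, the wrapper-type idea can be sketched in
Python (not PHP, and not the actual wikimedia/bcp-47-code API). The
names Bcp47Tag and html_lang_attribute are hypothetical; the point is
that a function that requires the wrapper cannot silently be handed a
bare internal code without a static checker noticing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bcp47Tag:
    """Hypothetical lightweight wrapper marking a string as a BCP-47 code."""
    value: str


def html_lang_attribute(tag: Bcp47Tag) -> str:
    """Build an HTML lang attribute; only accepts the wrapper type."""
    return f'lang="{tag.value}"'


# A bare string carries no information about which kind of code it is.
internal_code: str = "sr-el"
# The wrapper records that this value is known to be a BCP-47 code.
standard_code = Bcp47Tag("sr-Latn")

attribute = html_lang_attribute(standard_code)
# html_lang_attribute(internal_code) would be reported by a type checker
# such as mypy, because a plain str is not a Bcp47Tag.
```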
This patch updates the Parsoid integration code and the associated
REST handlers to use Bcp47Code in APIs so that the standalone Parsoid
library does not need to know anything about MediaWiki-internal codes.
The principle has been, first, to convert a string to a Bcp47Code as
soon as possible and as close to the original input as possible. That
makes it easy to see *why* a given string is a BCP-47 code (usually
because it comes from HTTP/HTML/etc.), and we're not stuck deep inside
some method trying to figure out where a string came from and therefore
what sort of code it might be. Second, we've added explicit
compatibility code to accept MediaWiki internal codes and convert them
to Bcp47Code for backward compatibility with existing clients, using
the @internal LanguageCode::normalizeNonstandardCodeAndWarn() method.
The intention is to gradually remove these backward-compatibility
thunks and replace them with HTTP 400 errors or wfDeprecated messages,
in order to identify and repair callers that incorrectly use
non-standards-compliant language codes in web standards
(HTTP/HTML/SVG/etc.).
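
A rough Python sketch of the "accept but warn" compatibility pattern
described above. This is not the implementation of
LanguageCode::normalizeNonstandardCodeAndWarn(); the function name and
the lookup table are hypothetical, and the single sr-el / sr-Latn entry
is used purely as an example of an internal code and its BCP-47
equivalent.

```python
import warnings

# Hypothetical table of nonstandard (internal) codes and their BCP-47 forms.
NONSTANDARD_TO_BCP47 = {"sr-el": "sr-Latn"}


def normalize_and_warn(code: str) -> str:
    """Return a BCP-47 code, warning when a nonstandard code was supplied."""
    key = code.lower()
    if key in NONSTANDARD_TO_BCP47:
        standard = NONSTANDARD_TO_BCP47[key]
        # Keep old clients working, but make the misuse visible so the
        # caller can be found and fixed before the fallback is removed.
        warnings.warn(
            f"nonstandard language code {code!r}; use {standard!r}",
            DeprecationWarning,
            stacklevel=2,
        )
        return standard
    # Anything else is passed through unchanged in this sketch.
    return code
```

Once callers have been repaired, the warning branch could be swapped for
a hard failure, which mirrors the planned move to HTTP 400 errors.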
Finally, maintaining a code as a Bcp47Code and not immediately
converting to Language helps us delay or even avoid full loading of a
Language object in some cases, which is another reason to occasionally
push Bcp47Code (instead of Language) down the call stack.
Bug: T327379
Depends-On: I830867d58f8962d6a57be16ce3735e8384f9ac1c
Change-Id: I982e0df706a633b05dcc02b5220b737c19adc401
The Parsoid entrypoints should always have a "real" ParserOutput
passed as the ContentMetadataCollector object, so that recursive
invocations of extensions, etc, can set appropriate metadata
properties in the ParserOutput.
This is part of a belt-and-suspenders fix for T331084, where a
StubMetadataCollector is being used in production. Production should
never use a stub; it should always use a real ParserOutput object.
The other fix for T331084 is
I30ea2bb24e6c9b0950a8f46dc8e5b9bf5ee3378b, which ensures that if you
*were* to use a StubMetadataCollector in production, it wouldn't throw
an error when a numeric category string was encountered.
Bug: T331084
Change-Id: I8711a51fc1bcac48eae92ab1ba15a33fe05937ed
If variant conversion is not supported by Parsoid, fall back to using
the old LanguageConverter.
We still call Parsoid to perform variant conversion in order to add
metadata that is missing when the core language converter is used.
Bug: T318401
Change-Id: I0499c853b4e301f135339fc137054bd760ee237d
Depends-On: Ie94aaa11963ec1e9e99136af469a05fa4005710d
Accept sr-Latn as well as sr-el as the language code for Serbian in
Latin script.
This broke when the Parsoid library started to use BCP-47 codes rather
than internal MediaWiki codes. For now, we accept both, so we are
compatible with the version of the Parsoid library currently in the
vendor repo as well as the version picked by composer update.
Bug: T323985
Change-Id: If0b02be4f391b31fb75e2ad51e199a83707b0e3c
Language conversion shouldn't crash with a 500 when a variant is
requested for a language that does not support variants. This behavior
is especially annoying when manually calling REST endpoints with a
browser, since browsers routinely send Accept-Language headers.
Change-Id: I31a14cb184a7bf940b7d178c12b2e7829d2eca0f
When submitting HTML to transform/html/to/html, the language specified
by the input's content-language header should be allowed to be the
source variant.
It should also be possible to just specify the source variant, and
derive the base language from that rather than the content-language
header or the page language.
Change-Id: I703c112358a921a8b0c9e63b70fd820ae3ea16fc
Use the content language from the header, and give that the highest
priority when identifying the page language.
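
A minimal Python sketch of that priority order, assuming the page
language is the fallback when no header is present. The function name
and parameters are hypothetical and do not correspond to the handler
code.

```python
from typing import Optional


def pick_page_language(header_language: Optional[str], page_language: str) -> str:
    """Prefer the content-language header; otherwise use the page language."""
    # A missing or empty header value falls through to the page language.
    if header_language:
        return header_language
    return page_language
```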
Bug: T317019
Change-Id: Ibb0671f1b873ef83a4d53824a9c4c17726e68635
This reverts Ib73841bcc6c101bbe8a76f76dc81553290726039 and re-applies
I55a58f9824329893575a532cd10b9422ededb9ba with some changes: The source
variant is passed in explicitly. More complete handling of the input
language will be added in a follow-up.
Original description:
This class is used in ParsoidHandler::languageConversion.
It uses Parsoid to perform the actual conversion of the content
to a language variant.
The source language is determined using the PageBundle or the page
language from the Title.
To encapsulate Parsoid-related concepts, the class can create a
Parsoid\Config\PageConfig if one is not provided.
Bug: T317019
Change-Id: Ida1a040628c26ac2ef108b0c90a3d3285a493b0e
This class is used in ParsoidHandler::languageConversion.
It uses Parsoid to perform the actual conversion of the content
to a language variant.
The source language is determined using the PageBundle or the page
language from the Title.
To encapsulate Parsoid-related concepts, the class can create a
Parsoid\Config\PageConfig if one is not provided.
Bug: T317019
Change-Id: I55a58f9824329893575a532cd10b9422ededb9ba