The interaction between the title cache, the link cache and the parser
tests is very strange. With different parser tests and different
extensions enabled it can fail, and the failures do not seem to be
deterministic.
Follow-Up: Ie4b67106512fb1a3a1b595dc4f6036276db96378
Change-Id: I55501eea7de739cac044b22caec150089183620b
The instance in the title cache does not see the reset of the id (lazy
loaded Title::mArticleId). When a title instance from Title::newFromText
is served from the cache, the instance can assume the title does not
exist even though the page has been created. Remove the cached instance
to get a fresh instance which can do a fresh db lookup for the id when
needed.
This should remove the issues with wrong title states in the parser
tests.
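The stale-instance problem described above can be sketched in a few lines. This is a toy model in Python rather than MediaWiki's PHP; the `Title` and `TitleCache` classes, the `forget()` method and the dict standing in for the database are all hypothetical and only illustrate the caching pattern, not the actual implementation.

```python
class Title:
    """Toy stand-in for a title object with a lazily loaded article id."""

    def __init__(self, text, db):
        self.text = text
        self._db = db
        self._article_id = None  # None means "not looked up yet"

    def article_id(self):
        # Look the id up once, then keep the answer (0 means "page missing").
        if self._article_id is None:
            self._article_id = self._db.get(self.text, 0)
        return self._article_id


class TitleCache:
    """Toy instance cache keyed by title text (hypothetical API)."""

    def __init__(self, db):
        self._db = db
        self._instances = {}

    def new_from_text(self, text):
        if text not in self._instances:
            self._instances[text] = Title(text, self._db)
        return self._instances[text]

    def forget(self, text):
        # Drop the cached instance so the next caller gets a fresh object.
        self._instances.pop(text, None)


db = {}  # stand-in for the page table: title text -> article id
cache = TitleCache(db)

stale = cache.new_from_text("Template:Foo")
stale_id_before = stale.article_id()  # page does not exist yet

db["Template:Foo"] = 7  # the page is created afterwards
stale_id_after = cache.new_from_text("Template:Foo").article_id()

cache.forget("Template:Foo")  # remove the cached instance
fresh = cache.new_from_text("Template:Foo")
fresh_id = fresh.article_id()
```

In this model the cached object keeps reporting the id it memoised before the page existed, and only a fresh instance performs a new lookup.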
Follow-Up: Id056580c7b869ae4984de5e2c89fb4687eecf7bd
Follow-Up: Ibc8e0ddbe9e53c3334b9c26ec2d1eda976c2a62b
Change-Id: Ie4b67106512fb1a3a1b595dc4f6036276db96378
Avoid cloning the Parser object; create a new one instead.
Reset the UrlUtils service after changing the server setting:
it is a dependency of the ParserFactory and now gets initialized earlier.
Bug: T250448
Change-Id: Ie62250242965d3d90873909795ced2cbda506ddb
In the parser tests for LabelledSectionTransclusion a page transcludes
another page. When the first page is created, the second page does not
exist yet. The parser parses the first page, and afterwards the title
cache contains a title object for the second page with an article id of
0. After the second page is created, the article id in that title object
needs to be reset because the object gets reused in the test.
Before 880fc5da the title object was in the title cache on page
creation, as the addArticle function was using Title::newFromText, which
results in the correct reset of the id in the cached object.
Bug: T342875
Change-Id: Id056580c7b869ae4984de5e2c89fb4687eecf7bd
Parsoid CI broke with b42062e7. It looks like the title cache clear
is still needed.
For anyone interested in digging deeper into this, you can revert
this patch and run the command below to see a failing test.
* composer phpunit:entrypoint -- --testsuite parsertests --filter 'multiple templates'
Change-Id: Ibc8e0ddbe9e53c3334b9c26ec2d1eda976c2a62b
Promote the deprecation to an error in the context of PHPUnit tests. The
point of hard deprecations is to make tests fail and this will help with
that, and also with eventually promoting the deprecation to an error
outside of tests.
Adjust code in parser tests that was accessing MediaWikiServices via
Title too early.
Avoid the hack of resetting the error handler after loading Setup.php, and
conditionally install MW's handler instead. This is particularly
important in scenarios where an exception is thrown before the handler
is reset, because MW's exception handler may also access
MediaWikiServices.
Bug: T227900
Bug: T273261
Change-Id: I7c5234046379cf4abd25d65e78c0a99ac9f32600
This reapplies commit b4e797510c with
some fixes for the Parsoid tests, which can be reviewed by diffing
patchset 1 against the latest patchset.
- Factored out the resetting of services that relate only to language
and variant into a function, and sorted all services roughly by their
dependency relationship.
Also, only reset when the language or variant is configured by the
test case.
- Replaced the manual redefinition of the ContentLanguage service with
the reset method used for the others.
- Reset the necessary services after setting the default language code in
staticSetup(), so the override in addArticles() can be removed.
- Use the 'skin' param of getText() for setting the skin, so we don't need
to touch the context.
- Removed the override of wgUser, wgLang, and wgOut. They no longer live
in parser-related code.
- Setting a user option is not the correct approach here. The problem is
that UserOption(Lookup|Manager) didn't get reset, so the default user
language was wrong, and the context had a cached Language object.
When the default language option is updated, purging the cache with
setUser() should be sufficient.
Change-Id: I4ebeaef98ed9b7682701c0385c68145ee1e78951
This reverts commit b4e797510c.
That commit breaks Parsoid CI, as demonstrated by tests on empty
commits Iad5e05eda4b94ce9f5708c84526c59e25cafa7a0 (passing,
depending on this patch) and I8a85aa11a29bcc568dd8079bce01320b087e04ac
(failing, not depending on this patch).
Change-Id: Ifaf295d45c00783a37e056e01bee98567d4a7cf5
- Factored out the resetting of services that relate only to language
and variant into a function, and sorted all services roughly by their
dependency relationship.
Also, only reset when the language or variant is configured by the
test case.
- Replaced the manual redefinition of the ContentLanguage service with
the reset method used for the others.
- Reset the necessary services after setting the default language code in
staticSetup(), so the override in addArticles() can be removed.
- Use the 'skin' param of getText() for setting the skin, so we don't need
to touch the context.
- Removed the override of wgUser, wgLang, and wgOut. They no longer live
in parser-related code.
- Setting a user option is not the correct approach here. The problem is
that UserOption(Lookup|Manager) didn't get reset, so the default user
language was wrong, and the context had a cached Language object.
When the default language option is updated, purging the cache with
setUser() should be sufficient.
Change-Id: I94103b86a02d6b971f70a0bb7ece1f22cd16e715
The Hooks class contains deprecated functions and the whole class is
going to get removed, so remove the convenience function and inline the
code.
Bug: T335536
Change-Id: I8ef3468a64a0199996f26ef293543fcacdf2797f
Note that the metadata isn't even checked unless wt2html is passing,
so it may take several runs to get all the tests updated.
Follows-Up: Ieaca9152b9f0d0a853c0dfaff1bdca808110539e
Change-Id: I10f5b54a8ebffaf10111d57aa66e1220c5418ca7
Follow up to I854f89bd823aab297efe29cd4fdee675afd77752
Returns the behaviour to what it was before that patch.
Change-Id: I743fa1118c4c78863f3857f4dc70d82f6bf4f0ac
Clean up to bypass skipped tests early in both legacy and
parsoid test runs without duplicating the skipped test check.
Also got rid of two FIXMEs with this refactoring.
Change-Id: I854f89bd823aab297efe29cd4fdee675afd77752
This is now enabled in production (Ic5a4a9950d51f63b17f4c5e70516bec87b981aa5)
and not something we want to remain configurable.
It is removed from Parsoid in I52ddfd21ff2e72a34cb5eb68742e3dfb85c6ccf6
Change-Id: I6a4d7d33fb42270fc5da3a922aa0a959180fb33f
The TOC used to be language-converted in ParserOutput::getText(), but
it wasn't possible to apply custom rules defined in the wikitext
article body at ::getText() time. Remove the various hacks that we'd
added in an attempt to do so, which were made unnecessary by
I321cd31dae64bbf845d53282e5d28a55bc4ec319.
Bug: T306862
Change-Id: Ib12cd02e9ade91d5794462e8833f2aa3b45a51f2
* ParserTestRunner: LocalisationCache needs to be reset since it has a
reference to LanguageNameUtils which has a copy of
$wgUsePigLatinVariant. Also factor out some
MediaWikiServices::getInstance() calls.
* In some other tests, set the variable.
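The reason a dependent service has to be reset along with the one that copied the setting can be illustrated with a toy container. This is a Python sketch, not MediaWikiServices: the `Services` class, its `reset()` method and the `supports()` helper are hypothetical names chosen for the illustration.

```python
class Services:
    """Toy service container that caches instances until they are reset."""

    def __init__(self, config, factories):
        self.config = config
        self._factories = factories
        self._instances = {}

    def get(self, name):
        if name not in self._instances:
            self._instances[name] = self._factories[name](self)
        return self._instances[name]

    def reset(self, *names):
        for name in names:
            self._instances.pop(name, None)


class LanguageNameUtils:
    def __init__(self, use_pig_latin):
        self.use_pig_latin = use_pig_latin  # copied once, at construction


class LocalisationCache:
    def __init__(self, name_utils):
        self.name_utils = name_utils  # keeps the instance it was given

    def supports(self, code):
        return code != "en-x-piglatin" or self.name_utils.use_pig_latin


factories = {
    "LanguageNameUtils": lambda s: LanguageNameUtils(s.config["UsePigLatinVariant"]),
    "LocalisationCache": lambda s: LocalisationCache(s.get("LanguageNameUtils")),
}
services = Services({"UsePigLatinVariant": False}, factories)
before = services.get("LocalisationCache").supports("en-x-piglatin")

services.config["UsePigLatinVariant"] = True
services.reset("LanguageNameUtils")  # not enough: the cache still holds the old utils
partial = services.get("LocalisationCache").supports("en-x-piglatin")

services.reset("LanguageNameUtils", "LocalisationCache")
after = services.get("LocalisationCache").supports("en-x-piglatin")
```

Resetting only the service that read the setting leaves the dependent service pointing at the old instance, which is the situation the first bullet describes.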
Change-Id: I6c1e9bfad9790cf805809c28a3f8d45952cbb981
Two bugs here: first, we were silently skipping a needed file update in
::updateKnownFailures() if the file didn't previously exist. We're going
to still avoid writing the file if it didn't previously exist, out of an
abundance of caution, but at least we'll now fail noisily so the problem
can be fixed.
Second, the `--parsoid` flag was overriding the result of
::getFileSkipMessage() so we were processing files for
`--updateKnownFailures` that should have been skipped because they
are not marked parsoid-compatible. This override made sense when
we were still debugging the integrated-mode parsoid support in the
ParserTestRunner, but it is not needed anymore.
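The first fix (fail noisily rather than skip silently, while still never creating the file) might look roughly like the sketch below. It is written in Python for illustration; the function name, the JSON layout and the choice to tolerate a missing file when there is nothing to record are assumptions, not the behaviour of ::updateKnownFailures() itself.

```python
import json
import os
import tempfile


def update_known_failures(path, failures):
    """Rewrite an existing known-failures file; never create a new one.

    Returns True when the file was rewritten and False when nothing changed.
    """
    new_contents = json.dumps(failures, indent=4, sort_keys=True) + "\n"
    if not os.path.exists(path):
        if not failures:
            return False  # nothing to record, so the missing file is harmless
        # Fail noisily instead of silently dropping a needed update.
        raise FileNotFoundError(f"{path} is missing but has failures to record")
    with open(path, encoding="utf-8") as handle:
        if handle.read() == new_contents:
            return False
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(new_contents)
    return True


with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "missing-knownFailures.json")
    try:
        update_known_failures(missing, {"some test": "unexpected html"})
        raised = False
    except FileNotFoundError:
        raised = True
    created = os.path.exists(missing)

    present = os.path.join(tmp, "present-knownFailures.json")
    with open(present, "w", encoding="utf-8") as handle:
        handle.write("{}\n")
    first = update_known_failures(present, {"some test": "unexpected html"})
    second = update_known_failures(present, {"some test": "unexpected html"})
```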
Change-Id: Iba961ea327e54bb6bdc87399dbcba87cd57b6b20
This adds support for various options which add metadata
information to the parser test output, including 'showtocdata'.
This builds on I845694d4f2109a8b9125410e8533ca69bbea50fa in treating
the metadata output as a separate section.
Bug: T270312
Depends-On: I8023931d31e494df325b16d1b922539e20b58c51
Change-Id: I0c42ec2dc93c358f1cddab77324b229bcc163e83
This provides a bit of isolation from the actual layout and names
of properties in the object, as well as being a touch more readable
when debugging test failures.
Change-Id: I5ddca850f577b2ac24e237a2518f03983e79a51d
If a ParserTest mixes HTML output and metadata properties, it can
complicate HTML normalization and other test processes, especially
for Parsoid-mode bidirectional tests.
Support splitting metadata output into a separate section, named
`!! metadata`, with the standard options for legacy and parsoid
variants, like `!! metadata/php` and `!! metadata/parsoid` and
`!! metadata/parsoid+integrated` etc.
For compatibility, if the metadata flags are present on the test
and the new section is not present, we'll continue to handle the
metadata output as we have before, aka append or prepend the metadata
to the HTML.
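A minimal sketch of how a runner could pick the metadata section for a given mode, written in Python for illustration. The lookup order (most specific variant first, then the plain `!! metadata` section) is an assumption, as are the function name and the sample section contents; a result of `None` stands for the compatibility path in which metadata is still combined with the HTML.

```python
def metadata_section(sections, mode):
    """Return the text of the best-matching metadata section, or None.

    `sections` maps section names such as "metadata/parsoid" to their text.
    `mode` is one of "php", "parsoid" or "parsoid+integrated".
    """
    # Assumed lookup order: most specific variant first, plain section last.
    candidates = [f"metadata/{mode}"]
    if mode == "parsoid+integrated":
        candidates.append("metadata/parsoid")
    candidates.append("metadata")
    for name in candidates:
        if name in sections:
            return sections[name]
    return None


# Illustrative section contents only; not taken from a real test file.
test = {
    "html": "<p>x</p>",
    "metadata": "generic expectations",
    "metadata/parsoid": "parsoid expectations",
}
php_meta = metadata_section(test, "php")
integrated_meta = metadata_section(test, "parsoid+integrated")
legacy_meta = metadata_section({"html": "<p>x</p>"}, "php")
```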
Code search for uses of these options (uses in parsoid and core can
be ignored; uses of 'pst' are harmless when they are not combined
with another option):
https://codesearch.wmcloud.org/search/?q=%28%5E%7C%20%29%28%28showtitle%7Cshowindicators%7Cill%7Ccat%7Cpst%7Cshowflags%29%28%20%7C%24%29%7C%28extension%3D%7Cproperty%3D%29%29&i=nope&files=%5Etests%2Fparser%2F.*%5C.txt&excludeFiles=&repos=
Change-Id: I845694d4f2109a8b9125410e8533ca69bbea50fa
This is a clean up refactor to keep the metadata handling code in one place,
and to allow Parsoid to share it when running in integrated mode.
Change-Id: Ic4fda0397977413b9d742d47ab1fc5a7bc6f6b96
In 24949480eb (Oct 2021) injection of
the Table of Contents was moved from Parser to
ParserOutput::getText(); that is, from parse time to "postprocess text
possibly fetched from the cache" time. Unfortunately, this meant that
language conversion wasn't done on the table of contents (!), for
either traditional skins or the vector-2022 skin. This was fixed for
traditional skins by 059e62cde6 (Nov
2021), later amended by 0955046ca5 (Mar
2022), which added explicit language conversion to the TOC injection
process in ParserOutput::getText(). This fix was still not complete,
however, since editor-defined custom language-conversion rules defined
in the article body were no longer available to the language converter
when conversion was done in ParserOutput::getText(); the ToC title was
also being double-converted. Further, neither of these short-term
fixes addressed the output of ParserOutput::getSections() (now
ParserOutput::getTOCData()) which was used by vector-2022 to generate
the ToC in the sidebar and which remained entirely unconverted.
With 439656e019 (Jan 2023), we started
using the ::getSections()/::getTOCData() output for main article text
as well, but we kept the previous hack which post-converted the
generated HTML. This kept old skins at parity with the post-Oct-2021
status, but also didn't address the conversion issue for vector-2022.
The solution here is to perform language conversion on the ToC lines
at parse time along with the rest of the language conversion, and
store *converted* headings in TOCData. This has a number of side
effects:
1. The ToC information array available via the action API
is now language converted. This is *probably* what you wanted in the
first place, but could potentially be disruptive.
2. The ToC is consistently converted with the full set of
editor-defined custom conversion rules. Before Oct 2021, the ToC was
converted using the set of custom conversion rules *active at the
point at which the ToC was inserted* (which was usually near the
beginning of the article). When all conversion rules appear at the
very top of the article (best practice!):
-{en:Foo; en-x-piglatin:Bar;}-
Lead section text
== Introduction ==
== Foo ==
There should be no difference between pre-Oct 2021 behavior and the
behavior after this patch: in both cases the rule defined in the
article body will be applied both to the heading and to the TOC, and
they will be consistent. (After Oct 2021 and before this patch, Foo
would be converted in the heading but not in the table of contents.)
But in cases where conversion rules are defined after the
TOC insertion point, the section heading as it appears in the body
text could appear different from the section heading as it appears in
the ToC. For example, if you defined a conversion rule just before
using a term in a heading:
== Introduction ==
-{en:Foo; en-x-piglatin:Bar;}-
== Foo ==
Before Oct 2021, this rule would be applied to the heading, but not to
the TOC (because the TOC insertion point was before the rule
definition). This would also be the behavior before this patch (since
rules defined in the article body are currently not applied at all).
After this patch, the rule will be applied to both the heading and the
TOC (because the rule application location is effectively "at the very
end of the article"). In the rare cases when rules are not defined in
glossaries at the top of the article, this type of usage (definition
immediately preceding first use) is expected to be the most common
and the behavior after this patch is more correct.
But alternatively, if you defined a conversion rule *after* using
the term in a heading:
== Introduction ==
== Foo ==
-{en:Foo; en-x-piglatin:Bar;}-
Before Oct 2021, this rule wouldn't be applied to the heading *or* the
TOC. Before this patch, this would also be the case (because rules
defined in the article body are not applied at all). After this
patch, the rule will be applied to the ToC but not the heading, since
the application point for the TOC is effectively at the end of the
article. This inconsistency is probably not desirable, but this case
is expected to be rare, and (assuming the editor intended 'Foo' to be
unconverted) the editor can work around the inconsistency by
explicitly protecting 'Foo' from conversion:
== -{Foo}- ==
-{en:Foo; en-x-piglatin:Bar;}-
And if the editor /intended/ Foo to be converted, the rule definition
should be moved earlier in the article. Again, putting all rules at
the top of the article is the preferred style, and works better with
the glossary style used by the zhwiki community (see also
https://www.mediawiki.org/wiki/Requests_for_comment/Scoped_language_converter
).
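The cases above can be replayed with a toy model, written in Python for illustration. It is not the LanguageConverter: it only understands the single rule form and the `-{Foo}-` protection used in the examples, applies no built-in Pig Latin conversion, and the `render()` helper is hypothetical. Body headings are converted with the rules defined above them; ToC lines are converted with every rule, as if at the end of the article.

```python
import re

RULE = re.compile(r"^-\{en:(\w+); en-x-piglatin:(\w+);\}-$")
HEADING = re.compile(r"^== (.+) ==$")
PROTECTED = re.compile(r"^-\{(\w+)\}-$")


def convert(text, rules):
    """Replace whole words in `text` according to `rules`."""
    for source, target in rules.items():
        text = re.sub(rf"\b{re.escape(source)}\b", target, text)
    return text


def render(wikitext):
    """Return (body_headings, toc_lines) as seen in the en-x-piglatin variant."""
    rules = {}
    headings = []  # (text, protected) in document order
    body = []
    for line in wikitext.splitlines():
        rule = RULE.match(line)
        heading = HEADING.match(line)
        if rule:
            rules[rule.group(1)] = rule.group(2)
        elif heading:
            shielded = PROTECTED.match(heading.group(1))
            text = shielded.group(1) if shielded else heading.group(1)
            headings.append((text, bool(shielded)))
            # Body headings only see the rules defined above them.
            body.append(text if shielded else convert(text, rules))
    # ToC lines see every rule, as if converted at the end of the article.
    toc = [text if shielded else convert(text, rules) for text, shielded in headings]
    return body, toc


RULE_LINE = "-{en:Foo; en-x-piglatin:Bar;}-"
rule_first = render(f"{RULE_LINE}\nLead section text\n== Introduction ==\n== Foo ==")
rule_between = render(f"== Introduction ==\n{RULE_LINE}\n== Foo ==")
rule_last = render(f"== Introduction ==\n== Foo ==\n{RULE_LINE}")
protected = render(f"== -{{Foo}}- ==\n{RULE_LINE}")
```

Only `rule_last` yields a heading and a ToC line that disagree, and the `protected` variant restores consistency by leaving 'Foo' unconverted in both places.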
Bug: T306862
Depends-On: I0c9c9fec920f7cb028d935e552a8f11475a23ba7
Change-Id: I321cd31dae64bbf845d53282e5d28a55bc4ec319
This is just a clean up refactor to keep the option handling code in one
place, and to allow Parsoid to share it when running in integrated mode.
But in the process we tweaked a few other things for consistency:
* The 'pst' option now can add all of the parser output metadata flags
(not just 'showflags')
* The 'ill' and 'cat' options append to, instead of replacing, the
parser output.
* A new 'nohtml' option was added to allow suppressing the HTML in
cases where replacing, rather than appending to, the HTML is
desired.
* 'showflags' doesn't emit an 'extra' newline at the start of its output
any more.
The appropriate changes to parser tests were performed to accommodate
these tweaks.
Code search for uses of these options (uses in parsoid and core can be
ignored; uses of 'pst' are largely unchanged):
https://codesearch.wmcloud.org/search/?q=%28%5E%7C%20%29%28pst%7Cill%7Ccat%7Cshowflags%29%28%20%7C%24%29&i=nope&files=%5Etests%2Fparser%2F.*%5C.txt&excludeFiles=&repos=
Depends-On: I5c22f456a3ae5ea25b59c4246d68965099c465cc
Depends-On: If307de5d683829beb552663c72288d065827cfb6
Change-Id: I61a67a0e6928463e3872be9a42ff6992c6754662
In production the user language is set to the variant code, not the base
language code, and the parser test runner should match that. Similarly,
we don't need to explicitly ::setLanguage() in the RequestContext; let
it pick up the appropriate language (actually, variant) from ::setUser().
This exposes a bug in the way ToC conversion is done, which results in
double-conversion of the ToC title. This will be fixed in
I321cd31dae64bbf845d53282e5d28a55bc4ec319.
Some of the variant tests use Pig Latin as the target variant. But in
order to have good results from parser tests with Pig Latin as a
variant, it needs to be "supported" aka have at least one localized
message, so we can verify that the correct localization is being done.
Some tweaks were needed to LanguageNameUtils in order to ensure that
Pig Latin is supported if (and only if) the UsePigLatinVariant is
enabled in the site configuration.
This is a revision of Ia72c960b2c7914342eb4d5e3e63f2d6af14719ad, which
had to be reverted when it broke Parsoid CI: the ParserTestRunner
used $context->getLanguage() in some places (which is the *user*
language) where it ought to have used the *page* language.
Depends-On: I318c2fb8694c90efef07f1e2951c6d9aa6b3e82f
Change-Id: I0c9c9fec920f7cb028d935e552a8f11475a23ba7
Similar to If6e5d85c0937e12344678b36aa734fc0d97adc51
and I0b65d654ec04e8853184724edb8ebd67c8b1f65f,
which accounts for Icf9610ef0b3e9879024377078048066daeb30055
Change-Id: I03641049fa44ebabe2273caf079b4e0e4b11270b
In a91be7405fb92bac277b5beb64520fb5ef6c4472 and
I0b65d654ec04e8853184724edb8ebd67c8b1f65f Parsoid was changed to
represent an empty known-failures file as '{}' instead of '[]'.
FormatJson doesn't support the JSON_FORCE_OBJECT flag, so cast to
object before encoding to achieve behavior consistent with Parsoid.
This aligns with 244046c7cfd517ecf960258f604b00f3466ac3da in Parsoid.
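The same pitfall exists in any encoder that distinguishes lists from mappings. The sketch below is a Python analogue rather than the PHP change itself: normalising the value to a mapping before encoding plays the role of the cast to object, and the function name and sample data are hypothetical.

```python
import json


def encode_known_failures(failures):
    """Encode known failures so that "none" is written as {} rather than []."""
    # An empty list would serialise as []; normalise to a mapping first.
    as_mapping = dict(failures) if failures else {}
    return json.dumps(as_mapping, indent=4, sort_keys=True)


empty_list_default = json.dumps([])
empty_encoded = encode_known_failures([])
non_empty = json.loads(encode_known_failures({"some test": {"wt2html": "output"}}))
```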
Change-Id: Icf9610ef0b3e9879024377078048066daeb30055
The initialization of the $opts array seems to have gotten dropped in
some previous patch, causing a bunch of Parsoid tests to be run that
involve test options which we don't (yet) support. These failing
tests were presumably added to the known failures list, but they
should have just been skipped.
Change-Id: I3b833f537d8b23b3b881d45b863ad831bea7351f