These methods should be made private in the next release, but
hard-deprecate them for 1.34.
Tweak the return value of the attribute whitelist to be an
associative rather than a sequential array, which makes the
lookup of allowed attributes more efficient and avoids an
array_flip for every html element sanitized.
Bug: T221677
Change-Id: I17d734937accec6c2679dbe17328cf9554bd556a
The Preprocessor_DOM implementation doesn't interact well with PHP memory
profiling, and has some limitations not present in the Preprocessor_Hash
implementation (see T216664). There is no reason to keep around two
versions of the preprocessor: it just complicates on-going wikitext
feature development.
Hard deprecate use of Preprocessor_DOM, so we can remove the redundant
code in a future release.
Bug: T204945
Depends-On: Id38c9360e4d02b570996dbf7a660f964f02f1a2c
Change-Id: Ica5d1ad5b1e677542962fc36d582a793f941155e
This changeset implements T89432 and related tickets and is based on exploration
done at the Prague Hackathon. The goal is to identify tests in MediaWiki core
that can be run without having to install & configure MediaWiki and its dependencies,
and provide a way to execute these tests via the standard phpunit entry point,
allowing for faster development and integration with existing tooling like IDEs.
The initial set of tests that met these criteria were identified using the work Amir did in
I88822667693d9e00ac3d4639c87bc24e5083e5e8. These tests were then moved into a new subdirectory
under phpunit/ and organized into a separate test suite. The environment for this suite
is set up via a PHPUnit bootstrap file without a custom entry point.
You can execute these tests by running:
$ vendor/bin/phpunit -d memory_limit=512M -c tests/phpunit/unit-tests.xml
Bug: T89432
Bug: T87781
Bug: T84948
Change-Id: Iad01033a0548afd4d2a6f2c1ef6fcc9debf72c0d
If {{REVISIONID}} results in a re-parse, that re-parse will be post-send
unless the user has canonical parser options and will need the output for
page views anyway (e.g. the refresh after editing).
Also make getPreparedEdit() allow lazy-loading of the parser output via
a callback. A magic __get() method handles objects created the new way
but accessed by other code the old way.
Bug: T216306
Change-Id: I2012437c45dd605a6c0868dea47cf43dc67061d8
HTML, generated by some infoboxes and perhaps other places, gets
stripped in a way that merges words together that should not be
merged. Add tr, th, and td to the list of tags that should force
word separation.
Bug: T218001
Change-Id: Ib374339628b1f543ea4e07f24aa3e3b76f3117b5
The addModuleScripts() methods were deprecated in 1.31 and 1.32,
these are now removed.
The getModuleScripts() are now deprecated as well, always returning
an empty array. To be removed in 1.34.
Depends on commits for bundled/wmf-deployed extensions that
remove the last few remaining callers to the deprecated functions
in: 3D, Collection, Flow, GlobalUserPage, and Wikibase.
Bug: T188689
Depends-On: If9f0bc6aef85117587fa1929f34f8861c8d80314
Depends-On: Ia8d41b97fbf6822f5f8f7ac889408acce1ac9a3a
Depends-On: I503b919739ea474ff33726815b0da55e2f7e2724
Depends-On: I236ef637fd03b810a46eb361e25067a037e9d183
Depends-On: I62e17779753b977a452cc0c9694947941e999cc3
Change-Id: I5a19b8f164ccf666485d2971202194b747f882df
Remex is pure PHP so there is no reason to use an external tidy any
more. Configuration variables and implementation classes were
deprecated in 1.32 or earlier. We've kept only $wgTidyConfig
which can be used for experimental features or debugging Remex.
Bug: T198214
Change-Id: I99d48f858d97b6e1d1e6cd76a42c960cc2c61f9f
This adds a method to LinkFilter to build the query conditions necessary
to properly use it, and adjusts code to use it.
This also takes the opportunity to clean up the calculation of el_index:
IPs are handled more sensibly and IDNs are canonicalized.
Also weird edge cases for invalid hosts like "http://.example.com" and
corresponding searches like "http://*..example.com" are now handled more
regularly instead of being treated as if the extra dot were omitted,
while explicit specification of the DNS root like "http://example.com./"
is canonicalized to the usual implicit specification.
Note that this patch will break link searches for links where the host
is an IP or IDN until refreshExternallinksIndex.php is run.
Bug: T59176
Bug: T130482
Change-Id: I84d224ef23de22dfe179009ec3a11fd0e4b5f56d
Future parsers will not support the output generated with tidy disabled.
Parser tests using untidied output will also be deprecated (and
rewritten) in a follow-up patch.
No new release notes necessary since user-visible tidy configuration
was deprecated previously (in 1.32), and individual methods which had
disabled tidy during execution were individually release-noted as they
were updated.
Bug: T198214
Depends-On: I0f417f75a49dfea873e9a2f44d81796a48b9f428
Depends-On: If5c619cdd3e7f786687cfc2ca166074d9197ca11
Change-Id: I592e0e0dfef7d929f05c60ffe4d60e09725b39cc
This reverts commit 1bb5b58eb1.
A month has passed, the workaround for old parser cache entries
should not be needed anymore.
Bug: T203716
Change-Id: I446b47cc6b4c43aaae33675d62086d842b04ddcb
During development a lot of classes were placed in MediaWiki\Storage\.
The precedent set would mean that every class relating to something
stored in a database table, plus all related value classes and such,
would go into that namespace.
Let's put them into MediaWiki\Revision\ instead. Then future classes
related to the 'page' table can go into MediaWiki\Page\, future classes
related to the 'user' table can go into MediaWiki\User\, and so on.
Note I didn't move DerivedPageDataUpdater, PageUpdateException,
PageUpdater, or RevisionSlotsUpdate in this patch. If these are kept
long-term, they probably belong in MediaWiki\Page\ or MediaWiki\Edit\
instead.
Bug: T204158
Change-Id: I16bea8927566a3c73c07e4f4afb3537e05aa04a5
Previously, getCacheTime would default to the current time, potentially
causing the return value to change over subsequent calls. With this change,
the value is determined on the first call, and then remembered for subsequent
calls.
Bug: T205464
Change-Id: If240161c71d523ad5b0d33b9378950e0bebceb6e
Otherwise I get errors every time I try to run PHPUnit on includes/ or
includes/parser, because it tries to run ParserIntegrationTest.php and
fails. Apparently the <exclude> in suite.xml doesn't work if PHPUnit is
invoked on a directory.
Bug: T201278
Change-Id: I7d09576bee2705d8516152e8fa671da8dac40233
Certain html tags imply a word break, but our html stripping doesn't
understand that at all. Adjust the html stripping to inject whitespace
for all block level tags (per MDN) along with the <br> element.
Bug: T195389
Change-Id: I9fbfac765ea88628e4f9b2794fb54e1cd0060203
Partially and temporarily reverts I1641b7995 to deal with cached
HTML the same way the old code did.
Bug: T203716
Change-Id: I29846a6513f6b580b429c0bfe6c310ada50b28bb
This injects the new, unsaved RevisionRecord object into the Parser used
for Pre-Save Transform, and sets the user and timestamp on that revision,
to allow {{subst:REVISIONUSER}} and {{subst:REVISIONTIMESTAMP}} to function.
Bug: T203583
Change-Id: I31a97d0168ac22346b2dad6b88bf7f6f8a0dd9d0
This allows optimization for situations in which a caller
needs the meta-data of a ParserOutput, and the respective
ContentHandler can provide that meta-data without generating
HTML output.
Bug: T194048
Change-Id: I786d294d18a6a2e3cea61577313e21b578c44f1e
RevisionRenderer is the MCR replacement for Content::getParserOutput,
as outlined in <https://www.mediawiki.org/wiki/User:Daniel_Kinzler_(WMDE)/MCR-PageUpdater>.
Note: This change also introduces quite a bit of code for
merging ParserOutput objects.
Bug: T194048
Change-Id: I871978bf79f67c9e7954fb3fc8528d6e365f2cc1
Instead of applying wrapping the the parser and unwrapping in
ParserOutput::getText(), turn this around and apply wrapping in getText(),
and only if desired.
This avoids search&replace logic for unwrapping, and it also makes it a lot
easier to merge the output of multiple slots for MCR output.
This changes behavior in two hopefully irrelevant ways:
1) the limit report comments will be inside the wrapper div, instead of
following it.
2) if HTML with a wrapper div is explicitly injected into a ParserOutput
object, it will not be possible to unwrap the text.
Bug: T174035
Change-Id: I1641b7995af9bd297f1acd610d583fbf874f34e0
I wasn't sure how to convert the rest of the occurrences in core (there
are a significant number).
Bug: T200881
Change-Id: I114bba946cd3ea8a293121e275588c3c4d174f94
This changes behavior in some tests by making them set $wgLanguageCode
as well as $wgContLang, but that seems like a good thing.
Bug: T200246
Change-Id: I936888f46ff9fefe2707efba837e2ce3a7ca5e3f
Having a different ParserOptions for each content model isn't feasible
in an MCR world. And the only thing using this was Wikibase, which has
been fixed to do what it needs in a different way.
Bug: T194263
Change-Id: I01373b29ee25fa9346c6b0317155be4ccdc8c515
If a lazy option is passed to ParserOptions::optionsHash(), we should
resolve the option so the hash can incorporate the proper value instead
of omitting it.
Also, completely unrelatedly, refactor the hook overriding in the unit
test because people won't stop whining about it in code review.
Change-Id: I2df78ed90875c229090b503b65f20fbbbba7f237
This combines two changes – defaulting tidying to on, previously off, and
defaulting the tidying library to RemexHtml, previously the tidy binary.
Config options are going to be a bit of a mess until we drop support for
the old tidy binary config route.
Bug: T185753
Depends-On: I0a8973f508fbf65160177b003260831639828eeb
Change-Id: I6879a77a78d780c7c056d807dde20682c6097d1b
This reverts commit efcef34d3d.
This is causing failures in CI for extensions
Depends-On: If9789a61d52f60882fc2f0226757c9d93e1c6362
Change-Id: I17cf305a951b2bf1f03285b12c3e131abcffd31d
This combines two changes – defaulting tidying to on, previously off, and
defaulting the tidying library to RemexHtml, previously the tidy binary.
Config options are going to be a bit of a mess until we drop support for
the old tidy binary config route.
Bug: T185753
Depends-On: I0a8973f508fbf65160177b003260831639828eea
Change-Id: I6879a77a78d780c7c056d807dde20682c6097d1a
This also removes all the in-core calls that had been kept for the
benefit of extensions, and causes them to not have any effect since
anything that had been calling them was already either a no-op or will
probably be broken now that nothing in core is setting or checking the
flags.
Change-Id: Id22c1a5a6d6a249debb14063ae3f8838d105b634
This transformation will find <style> tag with a "data-mw-deduplicate"
attribute. For each value of the attribute, the first instance will be
kept as-is, while any subsequent tags with the same value will be
replaced by a <link rel="mw-deduplicated-inline-style"> with its href
referring to the "data-mw-deduplicate" value using a custom scheme.
This also adds an $attribs parameter to Html::inlineStyle() so the
data-mw-deduplicate attribute can be added.
Note this doesn't actually depend on Ib931e25c, but action=mobileview
will break if it starts being used without that patch.
Bug: T160563
Change-Id: I055abdf4d73ec65771eaa4fe0999ec907c831568
Depends-On: Ib931e25ce85072000e62c486bbe5907f03372494
This will result in an exception from WikiPage::getParserOutput() if
anything was missed.
This also hard-deprecates ParserOptions::setWrapOutputClass( false )
Bug: T181846
Change-Id: Ica541e1f6b52f5eec6d28cff60ba64bf525258c7
Depends-On: Ie5d6c5ce34c05b8fe2353d3bb36b2a3a4166ec4b
Depends-On: Ibfaefde2f3811151ec712554cbc9cf2415ed017f
Depends-On: I55048bbae5d4d2d0c79c241c1784448b82db3bb4
Depends-On: I23a26ba0dfbe83007cd40e97d71a2139a5ecddc7
Depends-On: Ibc013a41f4a463f4014fbbce7ce27f8690161728
Depends-On: Ie936dff918dc0869503a924298b4580402038b52
And deprecate passing false for ParserOptions::setWrapOutputClass().
There are three cases for the Parser wrapper: the default
mw-parser-output, a custom wrapper, or no wrapper. As things currently
stand, we have to fragment the parser cache on each of these options,
which uses a nontrival amount of storage space (T167784).
Ideally we'd do all the wrapping as a post-cache transform, but
TemplateStyles needs to know the wrapper in use in order to properly
prefix its CSS rules (that's why we added the wrapper in the first
place). So, second best option is to make *un*wrapping be a post-cache
transform and make "custom wrapper" be uncacheable.
This patch does the first bit (unwrapping as a post-cache transform),
and a followup will do the second part once the deprecation process is
satisfied.
Bug: T181846
Change-Id: Iba16e78c41be992467101e7d83e9c3134765b101
This ensures that if GENDER is fed wfEscapeWikitext()'d version
of a username, it will normalize it.
See discussion on T182800.
Note, we do not need to worry about the case of a user named
"Project:*foo" as such namespace prefixes are illegal in
usernames.
Change-Id: Ic5a8fc76c28dca43ce8e334ef1874c2673433f00
Clean up use of @codingStandardsIgnore
- @codingStandardsIgnoreFile -> phpcs:ignoreFile
- @codingStandardsIgnoreLine -> phpcs:ignore
- @codingStandardsIgnoreStart -> phpcs:disable
- @codingStandardsIgnoreEnd -> phpcs:enable
For phpcs:disable always the necessary sniffs are provided.
Some start/end pairs are changed to line ignore
Change-Id: I92ef235849bcc349c69e53504e664a155dd162c8
Using a real HTML tokenizer fixes bugs when < or > appear in attribute
values. The old implementation used delimiterReplace(), which didn't
handle this case:
> print Sanitizer::stripAllTags( '<p data-foo="a<b>c">Hello</p>' );
c">Hello
We also can't use PHP's built-in strip_tags() because it doesn't handle
<?php and <? correctly:
> print strip_tags('1<span class="<?php">2</span>3');
1
> print strip_tags('1<span class="<?">2</span>3');
1
Bug: T179978
Change-Id: I53b98e6c877c00c03ff110914168b398559c9c3e
ParserOptions::legacyOptions() has been sitting around since 1.17.
Originally it seems to have been intended as a way to avoid a mass cache
invalidation (similar to optionsHashPre30() from I7fb9ffca9). That code
was mostly removed in 1.23, but legacyOptions() was left behind because
it was also being used in a few places as "all cache-varying options"
(despite it not being documented for that purpose) where we'd rather
have any key than no key at all.
This patch creates an actual ParserOptions::allCacheVaryingOptions()
method for those use cases and deprecates the long-obsolete
legacyOptions().
It also makes more explicit the use of the "all cache-varying options"
fallback in ParserCache::getKey(), and doesn't bother trying to use that
fallback in ParserCache::get() where it no longer makes sense.
Change-Id: Ife1e54744155136a570210c03fe907f18f8e8ece
The pre-1.30 version of ParserOptions::optionsHash() was kept
temporarily as ParserOptions::optionsHashPre30() to prevent a cache
stampede on WMF sites when the hash format was changed in I7fb9ffca9.
Now that the cache has been rebuilt, it's no longer needed and we should
clean it up instead of leaving it forever to bitrot.
Change-Id: I037d8dfdefe72a295547bd331bc1454e69cb418d
The handling of the 'editsection' option prior to I7fb9ffca9 was
unusual: it was included in the cache key, but the getter didn't ever
flag it as "used". This was overlooked in I7fb9ffca9.
This fixes the handling to restore that behavior. It's no longer
considered to be a real parser option, so changing it won't make
isSafeToCache() fail while reading it won't flag it as 'used'.
But to keep Wikibase working (see T85252), if 'editsection' is supplied
in $forOptions optionsHash() will still include it in the hash so
whatever Wikibase is doing by forcing that doesn't break. The hash when
it is included is the same as was used in I7fb9ffca9 to reuse keys.
Once optionsHashPre30() is removed, Wikibase should be changed to use
some other method to fix T85252 so we can remove that hack from
optionsHash().
Change-Id: I77b5519c5a1122a1fafbfc523b77b2268c0efeb1
* ParserOptions is reorganized so it knows all the options and their
defaults, and can report whether the non-key options are at their
defaults.
* Definition of the "canonical" ParserOptions (which is unfortunately
different from the "default" ParserOptions) is moved from
ContentHandler to ParserOptions.
* WikiPage uses this to throw an exception if it's asked to cache
with options that aren't used in the cache key.
* ParserCache gets some temporary code to try to avoid a massive cache
stampede on upgrade.
Bug: T110269
Change-Id: I7fb9ffca96e6bd04db44d2d5f2509ec96ad9371f
Depends-On: I4070a8f51927121f690469716625db4a1064dea5
This will allow CSS to target just the parser output, without also
accidentally targeting the edit form, diff tables, and so on.
Bug: T37247
Change-Id: If4eb5bf71f94fa366ec4eddb6964e8f4df6b824a
Depends-On: I330c6aa4aaee045614b1801ed34bc9e03be69650
Depends-On: I52a518fa44e017841fe78474012cd69823e0a41d
You have to allow tests to cover private Parser methods that they
execute. Private methods will never have separate tests.
Change-Id: Ic842e2be4675f505dc26d1d3e1dd9000401df46c
It's unreasonable to expect newbies to know that "bug 12345" means "Task T14345"
except where it doesn't, so let's just standardise on the real numbers.
Change-Id: I46261416f7603558dceb76ebe695a5cac274e417
It was omitted due to the new way in which parser test TestCase objects
are constructed. Should fix Jenkins double-execution of parser tests.
Change-Id: I8131c3b13f2e08f784bce46fee16051c14761304
Merge the PHPUnit parser test runner with the old parserTests.inc,
taking the good bits of both. Reviewed, pared down and documented the
setup code. parserTests.php is now a frontend to a fully featured
parser test system, with lots of developer options, whereas PHPUnit
provides a simpler interface with increased isolation between test
cases.
Performance of both frontends is much improved, perhaps 2x faster for
parserTests.php and 10x faster for PHPUnit.
General:
* Split out the pre-Setup.php global variable configuration from
phpunit.php into a new class called TestSetup, also called it from
parserTests.php.
* Factored out the setup of TestsAutoLoader into a static method in
Maintenance.
* In Setup.php improved "caches" debug output.
PHPUnit frontend:
* Delete the entire contents of NewParserTest and replace it with a
small wrapper around ParserTestRunner. It doesn't inherit from
MediaWikiTestCase anymore since integrating the setup code was an
unnecessary complication.
* Rename MediaWikiParserTest to ParserTestTopLevelSuite and made it an
instantiable TestSuite class instead of just a static method. Got rid
of the eval(), just construct TestCase objects directly with a
specified name, it works just as well.
* Introduce ParserTestFileSuite for per-file setup.
* Remove parser-related options from phpunit.php, since we don't
support them anymore. Note that --filter now works just as well as
--regex used to.
* Add CoreParserTestSuite, equivalent to ExtensionsParserTestSuite,
for clarity.
* Make it possible to call MediaWikiTestCase::setupTestDB() more than
once, as is implied by the documentation.
parserTests.php frontend:
* Made parserTests.php into a Maintenance subclass, moved CLI-specific
code to it.
* Renamed ParserTest to ParserTestRunner, this is now the generic
backend.
* Add --upload-dir option which sets up an FSFileBackend, similar
to the old default behaviour
Test file reading and interpretation:
* Rename TestFileIterator to TestFileReader, and make it read and buffer
an entire file, instead of iterating.
* The previous code had an associative array representation of test
specifications. Used this form more widely to pass around test data.
* Remove the idea of !!hooks copying hooks from $wgParser, this is
unnecessary now that all extensions use ParserFirstCallInit. Resurrect
an old interpretation of the feature which was accidentally broken: if
a named hook does not exist, skip all tests in the file.
* Got rid of the "subtest" idea for tidy variants, instead use a
human-readable description that appears in the output.
* When all tests in a file are filtered or skipped, don't create the
articles in them. This greatly speeds up execution time when --regex
matches a small number of tests. It may possibly break extensions, but
they would have been randomly broken anyway since there is no
guarantee of test file execution order.
* Remove integrated testing of OutputPage::addCategoryLinks() category
link formatting, life is complicated enough already. It can go in
OutputPageTest if that's a thing we really need.
Result recording and display:
* Make TestRecorder into a generic plugin interface for progress output
etc., which needs to be abstracted for PHPUnit integration.
* Introduce MultiTestRecorder for recorder chaining, instead of using
a long inheritance chain. All test recorders now directly inherit from
TestRecorder.
* Move all console-related code to the new ParserTestPrinter.
* Introduce PhpunitTestRecorder, which is the recorder for the PHPUnit
frontend. Most events are ignored since they are never emitted in the
PHPUnit frontend, which does not call runTests().
* Put more information into ParserTestResult and use it more often.
Setup and teardown:
* Introduce a new API for setup/teardown where setup functions return a
ScopedCallback object which automatically performs the corresponding
teardown when it goes out of scope.
* Rename setUp() to staticSetup(), rewrite. There was a lot of cruft in
here which was simply copied from Setup.php without review, and had
nothing to do with parser tests.
* Rename setupGlobals() to perTestSetup(), mostly rewrite. For
performance, give staticSetup() precedence in cases where they were
both setting up the same thing.
* In support of merged setup code, allow Hooks::clear() to be called
from parserTests.php.
* Remove wgFileExtensions -- it is only used by UploadBase which we
don't call.
* Remove wgUseImageResize -- superseded by MockMediaHandlerFactory which
I imported from NewParserTest.
* Import MockFileBackend from NewParserTest. But instead of
customising the configuration globals, I injected services.
* Remove thumbnail deletion from upload teardown. This makes glob
handling as in the old parserTests.php unnecessary.
* Remove math file from upload teardown, math is actually an extension
now! Also, the relevant parser tests were removed from the Math
extension two years ago in favour of unit tests.
* Make addArticle() private, and introduce addArticles() instead, which
allows setup/teardown to be done once for each batch of articles
instead of every time.
* Remove $wgNamespaceAliases and $wgNamespaceProtection setup. These were
copied in from Setup.php in 2010, and are redundant since we do
actually run Setup.php.
* Use NullLockManager, don't set up a temporary directory just for
this alone.
Fuzz tests:
* Use the new TestSetup class.
* Updated for ParserTestRunner interface change.
* Remove some obsolete references to fuzz tests from the two frontends
where they used to reside.
Bug: T41473
Change-Id: Ia8e17008cb9d9b62ce5645e15a41a3b402f4026a
Since in several cases, with an all-in-one commit, git's file rename
detection failed, I split the renames out into their own commit to
make review easier. Some changes here won't make complete sense without
the following commit.
* Moved TestsAutoLoader to tests/common/. It will be joined by a friend.
* Renamed ParserTest to ParserTestRunner, since the former name was
overly generic.
* Renamed TestFileIterator to TestFileReader. Please see the subsequent
commit for rationale.
* Moved parserTests.php to tests/parser/. It was the only file left in
tests/, and it should have been moved to tests/parser years ago,
analogous to phpunit.php.
* Renamed NewParserTest to ParserIntegrationTest. This was a tricky one,
apparently the name has to end in "Test" or else the structure test
will fail. Analogous to ParserMethodsTest etc. Rationale: because it's
not new anymore.
* Renamed MediaWikiParserTest to ParserTestTopLevelSuite and moved it to
the suites directory. A more descriptive name. Being in suites/
shields it from StructureTests, and is correct anyway.
Change-Id: Iddc6eaf815fdd64b3addb8570b4b6303ab99d634
* Split up testHelpers.inc into one class per file, with the file named
after the class per the usual convention. Put them in tests/parser
since they are all parser-related, even though a couple are reused by
other unit tests.
* Also rename parserTest.inc and parserTestsParserHook.php to follow the
usual convention, and split off ParserTestResultNormalizer
* Move fuzz testing out to its own maintenance script. It's really not
helpful to have fuzz testing, which is designed to run forever,
exposed as a PHPUnit test.
* Increased fuzz test memory limit, and increased the memory headroom for
getMemoryBreakdown(), since HHVM's ReflectionClass has an internal
cache which uses quite a lot of memory.
* Temporarily switched a couple of ParserTest methods from private to
public to support fuzz testing from a separate class -- I plan on
replacing this interface in a subsequent commit.
Change-Id: Ib1a07e109ec1005bff2751b78eb4de35f2dfc472
This class is intended to be an integration test of both preprocessor
implementations and their helper classes.
Change-Id: Iefbd6d8828bbc3278503a0f85efd7d1230a9d66c
teardownGlobals() was called at the end of testParserTest(), but not
when returning early or throwing a "skipped" exception. This caused a
test failure when VisualEditor tests were run after parser tests, due to
$wgThumbLimits having the value set in parser tests.
Change-Id: I12d9365813fc51c15f6649084c373f5b7ccfac26
* Instead of only testing the configured preprocessor, test each in turn.
* Fix a test error when testing Preprocessor_Hash by removing <equals>
tags -- only Preprocessor_Hash emits them, but they have no effect on
the expansion.
Change-Id: I596f6b66fc636b767c447af3450556bfebe28241
* Have TidySupport provide $wgTidyConfig instead of the legacy globals
* Add --use-tidy-config option to parserTests.php. This tells
TidySupport to use the tidy configuration from LocalSettings.php
instead of the traditional safe defaults.
* Add a way for TidySupport to disable tidy via $wgTidyConfig, using
driver=>disabled
Change-Id: Ie76e68e2d5238d0a1aef49a1a815c0d1cd8bfdae
tl;dr: Having unnessary complexity in security critical code is bad.
* Extra options add extra complexity and maintenance burden
** Thus we should only have one html output mode. well formed = false
was already vetoed in T52040, so lets go with WellFormed=true.
* Options which are used by very few people tend to get tested less
* Escaping is an area of code where we should be very conservative
* Having escaping rules depend on making assumptions about which
characters various browsers consider "whitespace" is scary
* $wgWellFormedXml=false has had a negative security impact in the
past (Usually not directly its fault, but has made other bugs
more exploitable)
* Saving a couple bytes (even less bytes after gzip taken into
account) is really not worth it in this context (imho).
Change-Id: I5c922e0980d3f9eb39adb5bb5833e158afda42ed
It is expected that namespaces (except for NS_SPECIAL) will have a
paired subject and talk namespace. While not having the accompanying
talk namespace mostly works, it can cause unexpected issues when some
code paths (e.g. WikiPage::onArticleCreate()) expect it to exist.
Change-Id: I8f02fd886d0256679dfc10e1743204da4c6678b7
* Add a subtest index to the recorded test name, to avoid an SQL
duplicate key error.
* Introduce TestFileDataProvider. Previously, the order of named
parameters in TestFileIterator was relevant but undocumented, so
adding a subtest parameter broke phpunit tests.
* Don't implicitly commit (commitMasterChanges) an explicit transaction,
since that now causes a fatal error.
* Reset namespace cache as in NewParserTest.php, so that the MemoryAlpha
article insertion doesn't fail. This was only visible with --record
because the namespace cache is initialised by
SpecialVersion::getVersion() during recorder setup.
Change-Id: Ied4636b4acbf1d268e45901fed4d4e077b5ed666
(Previously done in f51d0d9a81 and
reverted in 543f46e9c08e0ff8c5e8b4e917fcc045730ef1bc.)
I think it's saner to treat this as invalid syntax, and output the
mismatched tag code verbatim. The current behavior is particularly
annoying for <ref> tags, which often swallow everything afterwards.
This does not affect HTML tags, though. Assuming Tidy is enabled, they
are still auto-closed at the end of the page content. (For tags that
"shadow" a HTML tag name, this results in the tag being treated as a
HTML tag. This currently only affects <pre> tags: if unclosed, they
are still displayed as preformatted text, but without suppressing
wikitext formatting.)
It also does not affect <includeonly>, <noinclude> and <onlyinclude>
tags. Changing this behavior now would be too disruptive to existing
content, and is the reason why previous attempt was reverted. (They
are already special-cased enough that this isn't too weird, for example
mismatched closing tags are hidden.)
Related to T17712 and T58306. I think this brings the PHP parser closer
to Parsoid's interpretation.
It reduces performance somewhat in the worst case, though. Testing with
https://phabricator.wikimedia.org/F3245989 (a 1 MB page starting with
3000 opening tags of 15 different types), parsing time rises from
~0.2 seconds to ~1.1 seconds on my setup. We go from O(N) to O(kN),
where N is bytes of input and k is the number of types of tags present
on the page. Maximum k shouldn't exceed 30 or so in reasonable setups
(depends on installed extensions, it's 20 on English Wikipedia).
Change-Id: Ide8b034e464eefb1b7c9e2a48ed06e21a7f8d434
The 'noxml' documentation in parserTests.txt was added in 2006 (7eea2398; r12504),
however it wasn't actually implemented.
It wasn't until 2009 (7aa4a8f9; r54767) that $wgWellFormedXml was created,
which defaults to true and has no relation to this option.
Remove the 'noxml' options from existing tests.
Change-Id: Ie3ae9f491b5747716080607b81b9763bf2bcc889
$wgAllowMicroDataAttributes and $wgAllowRdfaAttributes have been
introduced in MediaWiki 1.16 and required at this moment $wgHTML5
to be true. This last setting has been removed in MediaWiki 1.22.
To simplify the code maintenance and the configuration complexity,
those settings are removed and the features are always available.
RDFa users must now explicitly set $wgHtml5Version to a RDFa
version. Currently the correct values are:
- HTML+RDFa 1.0
- XHTML+RDFa 1.0
Bug: T130040
Change-Id: I17a7bff2cad170e381eabf0aec4e26e4fd0cddc3
This reduces the runtime of database-bound tests by about 40%
(on my system, from 4:55 to 2:47; results from Jenkins are
inconclusive).
The basic idea is to call addCoreDBData() only once, and have
a addDBDataOnce() that is called once per test class, not for
every test method lie addDBData() is. Most tests could be
trivially be changed to implement addDBDataOnce() instead of
addDBData(). The ones for which this did not work immediately
were left out for now. A closer look at the tests that still
implement addDBData() may reveal additional potential for
improvement.
TODO: Once this is merged, try to change addDBData() to
addDBDataOnce() where possible in extensions.
Change-Id: Iec4ed4c8419fb4ad87e6710de808863ede9998b7
Right now it forgets to reset $wgResourceBasePath, which means it
is inherited from the wikis's (or Jenkins') default settings which
is typically '/w'. That caused parser tests to behave as if pointers
to /extensions were outside /w.
Also update wgScriptPath to be '' instead of '/'. Otherwise this
can cause double-slash prefixed urls.
Change-Id: Ic455d62fca8fcac2c4ecc055cc0d7e311b70a94a
This reverts commit f51d0d9a81.
Breaks templates with non-closed </noinclude> tags, which
were previously acceptable.
Bug: T125754
Change-Id: I8bafb15eefac4e1d3e727c1c84782636d8b82c2b
I think it's saner to treat this as invalid syntax, and output the
mismatched tag code verbatim. The current behavior is particularly
annoying for <ref> tags, which often swallow everything afterwards.
This does not affect HTML tags, though. Assuming Tidy is enabled, they
are still auto-closed at the end of the page content.
Related to T17712 and T58306. I think this brings the PHP parser closer
to Parsoid's interpretation.
It reduces performance somewhat in the worst case, though. Testing with
https://phabricator.wikimedia.org/F3245989 (a 1 MB page starting with
3000 opening tags of 15 different types), parsing time rises from
~0.2 seconds to ~1.1 seconds on my setup. We go from O(N) to O(kN),
where N is bytes of input and k is the number of types of tags present
on the page. Maximum k shouldn't exceed 30 or so in reasonable setups
(depends on installed extensions, it's 20 on English Wikipedia).
To consider:
* Should we keep previous behavior for unclosed <includeonly> /
<noinclude>? This would be particularly disruptive for these if
someone relied on the old behavior, and they're already
special-cased in places.
* Unclosed <pre> tags are now treated as HTML tags, and are still
displayed as preformatted text, but without suppressing wikitext
formatting.
Change-Id: Ia2f24dbfb3567c4b0778761585e6c0303d11ddd0
Introduce an ogv video file to the parser file testsuite, so that we
can use it later in TimedMediaHandler parsertests.
Change-Id: I6a3b307ad9c82e9df0aeec025934d736eec8375f
The MediaWiki test suite is painfully slow and delays merging of
changes. More than half of the time is spent in
ParserTest_Parser⁄parserTests::testParserTest which is the PHPUnit
wrapping class for the parser tests.
This patch let us extract the parser tests so we can run them
independently. By running them parallely with the rest of the tests,
that will speed up the gate processing time.
Mark the MediaWikiParserTest and NewParserTest class as belonging to the
test group 'ParserTests'. Will let us filter them out via PHPUnit
option --exclude-group
Introduce a new PHPUnit test suite 'parsertests' which loads the
MediaWiki core parser tests wrapper 'MediaWikiParserTest' and the suite
which loads the extensions parser tests (ExtensionsParserTestSuite.php).
This way we can run solely the parser tests with:
cd tests/phpunit
php phpunit.php --testsuite ParserTests
Wikimedia CI can then be configure to run two jobs:
A) php phpunit.php --exclude-group ParserTests
B) php phpunit.php --testsuite ParserTests
Bug: T114314
Change-Id: Ie819bab43163995048c073691c4c5d258f797c02
Changed some old bugzilla links to new phabricator links in comments,
test data and error message. This reduces the need for redirects from
old bugzilla to new phabricator from our source code.
Change-Id: Id98278e26ce31656295a23f3cadb536859c4caa5
* Split tidy implementations into a class hierarchy
* Bring all tidy configuration into a single associative array and
deprecate the old configuration.
* Remove $wgAlwaysUseTidy
This is preparatory to replacement of Tidy (T89331). I used the name
"Raggett" for things relating to Dave Raggett's Tidy, since if we use
"tidy" to mean the new abstract system as well as Raggett's tidy, it
gets confusing.
Change-Id: I77af1a16cbbb47fc226d05fb9aad56c58e8910b5
MediaWikiParserTest.php generates fake test classes with eval(). It uses
synthetic class names with U+2044 "fraction slash" as a separator, but
this turns out to be an unfortunate choice since in certain terminal
modes, it causes readline to return to the start of the line as if the
"home" key was pressed, without adding a character. This makes it
difficult to paste class names.
Change-Id: I1c66b9caf256b8d0535fb7ed6e52ed842e193f46
These tests all involve database access in some way,
and thus need @group Database tags.
These failed when setting a bogus database password
and then running the tests.
Change-Id: I7f113a79ac44d09d88ec607f76b8ec22bc1ebcf1
* There's a branch path in the sanitizer that depends on $wgUseTidy,
which means the test output differs from on wiki.
* In general, we should set these variables to match the wiki behaviour
in tests.
* Exposes T92892, Sanitizer removes empty tags when tidy is disabled.
* Tweaked tests for T19663 to use an extension tag to show that
HTML5 tags with non-word characters make it through the parser
intact (before being ultimately sanitized).
Change-Id: I09c72fd739e11a8b757f37dc4c790758d782ad73
This is a hard deprecation, with getSecondaryDataUpdates returning an
empty array and addSecondaryDataUpdate throwing an exception. This seems
prudent since there are no known users of these methods, and they
interfere with the parser cache:
DataUpdates are basically jobs, they need access to services to
function. That makes them inherently non-serializable. This interferes
with the function of the parser cache, which serializes ParserOutput
objects in order to persist them.
This could be solved by splitting DataUpdates into DataUpdateDefinitions
and DataUpdateHandlers, similar to how JobSpecification works with
wgJobClasses. That however seems pointless and overkill, since
ParserOutput already has a mechanism for storing arbitrary data,
including any info needed by an UpdateJob: the setExtensionData method.
After this change, the preferred method to introduce custom data updates
is to store any relevant data using setExtensionData and
implement Content::getSecondaryDataUpdates() if possible. If not,
use the 'SecondaryDataUpdates' hook to construct the necessary update
objects from the info stored using setExtensionData.
Change-Id: I0f6f49e61fa3d8904e55f42c99f342a3dc357495
* Use MediaWikiTestCase::getNewTempFile and getNewTempDirectory
instead of wfTempDir().
The upload api tests wrote a tempnam() file directly (where
wfTempDir() is typically shared with other systems and concurrent
runs). Use MediaWikiTestCase::getNewTempFile and
getNewTempDirectory instead.
This also ensures its removal by the teardown handler without
needing manual unlink() calls. And it doesn't rely on the test
passing. (Many unlink calls where at the bottom of tests,
which wouldn't be reached in case of failure).
* For the upload test, the presistent storing of
'Oberaargletscher_from_Oberaar.jpg' (downloaded from Commons)
was removed. Note that this didn't work for Jenkins builds anyway
as Jenkins builds set $wgTmpDirectory to a unique directory
in tmpfs associated with an individual build.
* For filebackend tests, moved directory creation from the dataProvider
to the main test.
Implemented addTmpFiles() to allow subclasses to register
additional files (created by other means) to be cleaned up also.
Removed unused $tmpName and $toPath parameters in data
provider for FileBackendTest::testStore. And fixed weird double
$op2 variable name to be called $op3.
* Skipped parserTest.inc, MockFileBackend.php, and
UploadFromUrlTestSuite.php as those don't use MediaWikiTestCase.
Change-Id: Ic7feb06ef0c1006eb99485470a1a59419f972545
Update $wgSVGConverters['rsvg'] to something closer to WMF production
configuration (there is a more complicated setup involving two
variants of rsvg for some reason).
Documentation is scarce, but 'rsvg-convert' appears to be the "modern"
way to call rsvg, with 'rsvg' being deprecated or not recommended.
Bug: T76476
Change-Id: I5ed877f3a5a1f1e97ae881c1d03fc977276182b6
My previous patch broke this: ApiStashEdit would stash ParserOutput
with no custom DataUpdates, but calling getSecondaryDataUpdates still
failed after unserialization. This patch should fix that.
Bug: T86305
Change-Id: Ic114e521c5dfd0d3c028ea7d16e93eace758deef
Follows-up b36d883.
By far most data providers are static (and PHPUnit expects them
to be static and calls them that way).
Most of these classes already had their data providers static
but additional commits sloppily introduced non-static ones.
* ResourceLoaderWikiModuleTest, 8968d8787f.
* TitleTest, 545f1d3a73.
Odd unused method 'dataTestIsValidMoveOperation' was introduced
in 550b878e63.
* GlobalVarConfigTest, a3e18c3670.
Change-Id: I5da99f7cd3da68c550ae507ffe1f725d31e7666f
The previous implementation would unescape '&', '=', '+', and '%'. The
first three will break the URL when unescaped in the query string, and
the last will break when unescaped anywhere.
The code is now changed to treat the path, query, and fragment parts of
the URL separately when unescaping. We also escape any unsafe characters
and ensure all percent-encodings use uppercase hexits.
And since the old name is no longer accurate,
Parser::replaceUnusualEscapes is deprecated in favor of
Parser::normalizeLinkUrl.
Bug: 57909
Change-Id: I77dc308d0d016c395ad737c08cf10a7711e25bbd
This includes the extension name, and it also does much
more stringent validation. In the (now rather unlikely)
event of a duplicate name, it will append a number.
This is important, as it is very confusing when this bug strikes.
There exists extensions like CharRangeSpan which will trigger this bug.
Bug: 42174
Change-Id: Idf14b4cbdb8ec103340d48855e0361acf707b101
- Added space after reserved words: function, foreach, if
- Combined 'else if' into elseif
- Added braces to one-line statements
- Added spaces after comma, before parentheses
Change-Id: Ie5bbf680d6fbe0f0872dab2700c16b1394906a72
MediaWiki installations that use the setting
$wgUseTidy = true; are unable to output
MathML since the well defined MathML elements
are filtered out by Tidy. This was reported as
http://sourceforge.net/p/tidy/patches/84/ .
This change hides MathML blocks from
Tidy.
Bug: 66516
Change-Id: Ib48b91238c3eddd6a86b62f6ce57801d7058f0d8
Setting this to default avoids failing parser tests,
when it is set in LocalSettings.php
Bug: 54576
Change-Id: I531d5839e9abe571c6c29f290bb159dabca34798
- Swap "$variable type" to "type $variable"
- Fixed spacing inside docs
- Makes beginning of @param/@var/@throws in capital
- Changed some types to match the more common spelling
Change-Id: Ia041964250d8b7c0349d79dc9b131c5b8696e795
Note that the old parser tests helper function `tidy()` never actually did
anything, since $wgUseTidy was forced to `false` in the parser test setup.
Remove this unused code, and replace it with our new tidy support.
Allows new parser test sections: 'html+tidy' denotes "tidied" HTML (open
tags closed and other fixups to original wikitext markup) which should be
applicable to any parser. 'html/php+tidy' is output specific to the PHP
parser with tidy turned on. The Parsoid backend will use the 'html/parsoid'
section if present, but if it is not present it will fallback to first the
'html+tidy' section, and if that is missing the 'html' section.
Note that 'tidy' has a large number of open bugs (see
https://bugzilla.wikimedia.org/show_bug.cgi?id=2542 ) and so in some cases
we deliberately do *not* use 'html+tidy' or 'html/php+tidy' clauses, in
order to avoid documenting broken output. In these cases, there is no
broken HTML in the PHP parser output, and so (in theory) the 'html' and
'html+tidy' sections would be identical (that is, if tidy didn't have
bugs).
Change-Id: Iba45f38774b221522dc3b6ae2d1312fb79f8f41f
This was noticed on enwiki after w: was marked as a local interwiki prefix
there. Links like [[w🇩🇪Foo]] ought to act like [[🇩🇪Foo]], not
[[de:Foo]].
Also adding a number of additional parser tests related to interwiki links.
Bug: 68085
Change-Id: If39af06edb4af2da85c9bcf43df7088181809fcf
- Added/removed spaces around parenthesis
- Added space after switch/if/foreach
- changed else if to elseif
Change-Id: I99cda543e0e077320091addd75c188cb6e3a42c2
This support was added in https://gerrit.wikimedia.org/r/111390
but no parser tests were added at that time.
Bug: 32189
Change-Id: I299ce844919b3f20b3ce116adf64b37dd95325d0
This test was causing failures locally when wgServer != localhost
because {{SERVERNAME}} is derived from wgServerName, not wgServer
and the test is only mocking wgServer.
> MagicVariableTest::testServername.. with data set #2 ('//localhost/')
> Magic servername should be <localhost:string>
> Failed asserting that two strings are identical.
> --- Expected
> +++ Actual
> @@ @@
> -localhost
> +krinkle.dev
This value is no longer derived by the Parser, but is instead
set using wfParserUrl in Setup.php.
Remove this obsolete test and add any missing test cases for
wgParserUrl to its test suite.
Change-Id: I7d7d201cb46841e63dac8ab9fd81b45b252264a3
Allow transparent tag hooks to be loaded during parser tests the way that
regular and function tag hooks can be.
Change-Id: I28ac9cc239628c248f72898d247fa1f6e2c308bd