Using a real HTML tokenizer fixes bugs when < or > appear in attribute
values. The old implementation used delimiterReplace(), which didn't
handle this case:
> print Sanitizer::stripAllTags( '<p data-foo="a<b>c">Hello</p>' );
c">Hello
We also can't use PHP's built-in strip_tags() because it doesn't handle
<?php and <? correctly:
> print strip_tags('1<span class="<?php">2</span>3');
1
> print strip_tags('1<span class="<?">2</span>3');
1
Bug: T179978
Change-Id: I53b98e6c877c00c03ff110914168b398559c9c3e
ParserOptions::legacyOptions() has been sitting around since 1.17.
Originally it seems to have been intended as a way to avoid a mass cache
invalidation (similar to optionsHashPre30() from I7fb9ffca9). That code
was mostly removed in 1.23, but legacyOptions() was left behind because
it was also being used in a few places as "all cache-varying options"
(despite it not being documented for that purpose) where we'd rather
have any key than no key at all.
This patch creates an actual ParserOptions::allCacheVaryingOptions()
method for those use cases and deprecates the long-obsolete
legacyOptions().
It also makes more explicit the use of the "all cache-varying options"
fallback in ParserCache::getKey(), and doesn't bother trying to use that
fallback in ParserCache::get() where it no longer makes sense.
Change-Id: Ife1e54744155136a570210c03fe907f18f8e8ece
The pre-1.30 version of ParserOptions::optionsHash() was kept
temporarily as ParserOptions::optionsHashPre30() to prevent a cache
stampede on WMF sites when the hash format was changed in I7fb9ffca9.
Now that the cache has been rebuilt, it's no longer needed and we should
clean it up instead of leaving it forever to bitrot.
Change-Id: I037d8dfdefe72a295547bd331bc1454e69cb418d
The handling of the 'editsection' option prior to I7fb9ffca9 was
unusual: it was included in the cache key, but the getter didn't ever
flag it as "used". This was overlooked in I7fb9ffca9.
This fixes the handling to restore that behavior. It's no longer
considered to be a real parser option, so changing it won't make
isSafeToCache() fail while reading it won't flag it as 'used'.
But to keep Wikibase working (see T85252), if 'editsection' is supplied
in $forOptions optionsHash() will still include it in the hash so
whatever Wikibase is doing by forcing that doesn't break. The hash when
it is included is the same as was used in I7fb9ffca9 to reuse keys.
Once optionsHashPre30() is removed, Wikibase should be changed to use
some other method to fix T85252 so we can remove that hack from
optionsHash().
Change-Id: I77b5519c5a1122a1fafbfc523b77b2268c0efeb1
* ParserOptions is reorganized so it knows all the options and their
defaults, and can report whether the non-key options are at their
defaults.
* Definition of the "canonical" ParserOptions (which is unfortunately
different from the "default" ParserOptions) is moved from
ContentHandler to ParserOptions.
* WikiPage uses this to throw an exception if it's asked to cache
with options that aren't used in the cache key.
* ParserCache gets some temporary code to try to avoid a massive cache
stampede on upgrade.
Bug: T110269
Change-Id: I7fb9ffca96e6bd04db44d2d5f2509ec96ad9371f
Depends-On: I4070a8f51927121f690469716625db4a1064dea5
This will allow CSS to target just the parser output, without also
accidentally targeting the edit form, diff tables, and so on.
Bug: T37247
Change-Id: If4eb5bf71f94fa366ec4eddb6964e8f4df6b824a
Depends-On: I330c6aa4aaee045614b1801ed34bc9e03be69650
Depends-On: I52a518fa44e017841fe78474012cd69823e0a41d
You have to allow tests to cover private Parser methods that they
execute. Private methods will never have separate tests.
Change-Id: Ic842e2be4675f505dc26d1d3e1dd9000401df46c
It's unreasonable to expect newbies to know that "bug 12345" means "Task T14345"
except where it doesn't, so let's just standardise on the real numbers.
Change-Id: I46261416f7603558dceb76ebe695a5cac274e417
It was omitted due to the new way in which parser test TestCase objects
are constructed. Should fix Jenkins double-execution of parser tests.
Change-Id: I8131c3b13f2e08f784bce46fee16051c14761304
Merge the PHPUnit parser test runner with the old parserTests.inc,
taking the good bits of both. Reviewed, pared down and documented the
setup code. parserTests.php is now a frontend to a fully featured
parser test system, with lots of developer options, whereas PHPUnit
provides a simpler interface with increased isolation between test
cases.
Performance of both frontends is much improved, perhaps 2x faster for
parserTests.php and 10x faster for PHPUnit.
General:
* Split out the pre-Setup.php global variable configuration from
phpunit.php into a new class called TestSetup, also called it from
parserTests.php.
* Factored out the setup of TestsAutoLoader into a static method in
Maintenance.
* In Setup.php improved "caches" debug output.
PHPUnit frontend:
* Delete the entire contents of NewParserTest and replace it with a
small wrapper around ParserTestRunner. It doesn't inherit from
MediaWikiTestCase anymore since integrating the setup code was an
unnecessary complication.
* Rename MediaWikiParserTest to ParserTestTopLevelSuite and made it an
instantiable TestSuite class instead of just a static method. Got rid
of the eval(), just construct TestCase objects directly with a
specified name, it works just as well.
* Introduce ParserTestFileSuite for per-file setup.
* Remove parser-related options from phpunit.php, since we don't
support them anymore. Note that --filter now works just as well as
--regex used to.
* Add CoreParserTestSuite, equivalent to ExtensionsParserTestSuite,
for clarity.
* Make it possible to call MediaWikiTestCase::setupTestDB() more than
once, as is implied by the documentation.
parserTests.php frontend:
* Made parserTests.php into a Maintenance subclass, moved CLI-specific
code to it.
* Renamed ParserTest to ParserTestRunner, this is now the generic
backend.
* Add --upload-dir option which sets up an FSFileBackend, similar
to the old default behaviour
Test file reading and interpretation:
* Rename TestFileIterator to TestFileReader, and make it read and buffer
an entire file, instead of iterating.
* The previous code had an associative array representation of test
specifications. Used this form more widely to pass around test data.
* Remove the idea of !!hooks copying hooks from $wgParser, this is
unnecessary now that all extensions use ParserFirstCallInit. Resurrect
an old interpretation of the feature which was accidentally broken: if
a named hook does not exist, skip all tests in the file.
* Got rid of the "subtest" idea for tidy variants, instead use a
human-readable description that appears in the output.
* When all tests in a file are filtered or skipped, don't create the
articles in them. This greatly speeds up execution time when --regex
matches a small number of tests. It may possibly break extensions, but
they would have been randomly broken anyway since there is no
guarantee of test file execution order.
* Remove integrated testing of OutputPage::addCategoryLinks() category
link formatting, life is complicated enough already. It can go in
OutputPageTest if that's a thing we really need.
Result recording and display:
* Make TestRecorder into a generic plugin interface for progress output
etc., which needs to be abstracted for PHPUnit integration.
* Introduce MultiTestRecorder for recorder chaining, instead of using
a long inheritance chain. All test recorders now directly inherit from
TestRecorder.
* Move all console-related code to the new ParserTestPrinter.
* Introduce PhpunitTestRecorder, which is the recorder for the PHPUnit
frontend. Most events are ignored since they are never emitted in the
PHPUnit frontend, which does not call runTests().
* Put more information into ParserTestResult and use it more often.
Setup and teardown:
* Introduce a new API for setup/teardown where setup functions return a
ScopedCallback object which automatically performs the corresponding
teardown when it goes out of scope.
* Rename setUp() to staticSetup(), rewrite. There was a lot of cruft in
here which was simply copied from Setup.php without review, and had
nothing to do with parser tests.
* Rename setupGlobals() to perTestSetup(), mostly rewrite. For
performance, give staticSetup() precedence in cases where they were
both setting up the same thing.
* In support of merged setup code, allow Hooks::clear() to be called
from parserTests.php.
* Remove wgFileExtensions -- it is only used by UploadBase which we
don't call.
* Remove wgUseImageResize -- superseded by MockMediaHandlerFactory which
I imported from NewParserTest.
* Import MockFileBackend from NewParserTest. But instead of
customising the configuration globals, I injected services.
* Remove thumbnail deletion from upload teardown. This makes glob
handling as in the old parserTests.php unnecessary.
* Remove math file from upload teardown, math is actually an extension
now! Also, the relevant parser tests were removed from the Math
extension two years ago in favour of unit tests.
* Make addArticle() private, and introduce addArticles() instead, which
allows setup/teardown to be done once for each batch of articles
instead of every time.
* Remove $wgNamespaceAliases and $wgNamespaceProtection setup. These were
copied in from Setup.php in 2010, and are redundant since we do
actually run Setup.php.
* Use NullLockManager, don't set up a temporary directory just for
this alone.
Fuzz tests:
* Use the new TestSetup class.
* Updated for ParserTestRunner interface change.
* Remove some obsolete references to fuzz tests from the two frontends
where they used to reside.
Bug: T41473
Change-Id: Ia8e17008cb9d9b62ce5645e15a41a3b402f4026a
Since in several cases, with an all-in-one commit, git's file rename
detection failed, I split the renames out into their own commit to
make review easier. Some changes here won't make complete sense without
the following commit.
* Moved TestsAutoLoader to tests/common/. It will be joined by a friend.
* Renamed ParserTest to ParserTestRunner, since the former name was
overly generic.
* Renamed TestFileIterator to TestFileReader. Please see the subsequent
commit for rationale.
* Moved parserTests.php to tests/parser/. It was the only file left in
tests/, and it should have been moved to tests/parser years ago,
analogous to phpunit.php.
* Renamed NewParserTest to ParserIntegrationTest. This was a tricky one,
apparently the name has to end in "Test" or else the structure test
will fail. Analogous to ParserMethodsTest etc. Rationale: because it's
not new anymore.
* Renamed MediaWikiParserTest to ParserTestTopLevelSuite and moved it to
the suites directory. A more descriptive name. Being in suites/
shields it from StructureTests, and is correct anyway.
Change-Id: Iddc6eaf815fdd64b3addb8570b4b6303ab99d634
* Split up testHelpers.inc into one class per file, with the file named
after the class per the usual convention. Put them in tests/parser
since they are all parser-related, even though a couple are reused by
other unit tests.
* Also rename parserTest.inc and parserTestsParserHook.php to follow the
usual convention, and split off ParserTestResultNormalizer
* Move fuzz testing out to its own maintenance script. It's really not
helpful to have fuzz testing, which is designed to run forever,
exposed as a PHPUnit test.
* Increased fuzz test memory limit, and increased the memory headroom for
getMemoryBreakdown(), since HHVM's ReflectionClass has an internal
cache which uses quite a lot of memory.
* Temporarily switched a couple of ParserTest methods from private to
public to support fuzz testing from a separate class -- I plan on
replacing this interface in a subsequent commit.
Change-Id: Ib1a07e109ec1005bff2751b78eb4de35f2dfc472
This class is intended to be an integration test of both preprocessor
implementations and their helper classes.
Change-Id: Iefbd6d8828bbc3278503a0f85efd7d1230a9d66c
teardownGlobals() was called at the end of testParserTest(), but not
when returning early or throwing a "skipped" exception. This caused a
test failure when VisualEditor tests were run after parser tests, due to
$wgThumbLimits having the value set in parser tests.
Change-Id: I12d9365813fc51c15f6649084c373f5b7ccfac26
* Instead of only testing the configured preprocessor, test each in turn.
* Fix a test error when testing Preprocessor_Hash by removing <equals>
tags -- only Preprocessor_Hash emits them, but they have no effect on
the expansion.
Change-Id: I596f6b66fc636b767c447af3450556bfebe28241
* Have TidySupport provide $wgTidyConfig instead of the legacy globals
* Add --use-tidy-config option to parserTests.php. This tells
TidySupport to use the tidy configuration from LocalSettings.php
instead of the traditional safe defaults.
* Add a way for TidySupport to disable tidy via $wgTidyConfig, using
driver=>disabled
Change-Id: Ie76e68e2d5238d0a1aef49a1a815c0d1cd8bfdae
tl;dr: Having unnessary complexity in security critical code is bad.
* Extra options add extra complexity and maintenance burden
** Thus we should only have one html output mode. well formed = false
was already vetoed in T52040, so lets go with WellFormed=true.
* Options which are used by very few people tend to get tested less
* Escaping is an area of code where we should be very conservative
* Having escaping rules depend on making assumptions about which
characters various browsers consider "whitespace" is scary
* $wgWellFormedXml=false has had a negative security impact in the
past (Usually not directly its fault, but has made other bugs
more exploitable)
* Saving a couple bytes (even less bytes after gzip taken into
account) is really not worth it in this context (imho).
Change-Id: I5c922e0980d3f9eb39adb5bb5833e158afda42ed
It is expected that namespaces (except for NS_SPECIAL) will have a
paired subject and talk namespace. While not having the accompanying
talk namespace mostly works, it can cause unexpected issues when some
code paths (e.g. WikiPage::onArticleCreate()) expect it to exist.
Change-Id: I8f02fd886d0256679dfc10e1743204da4c6678b7
* Add a subtest index to the recorded test name, to avoid an SQL
duplicate key error.
* Introduce TestFileDataProvider. Previously, the order of named
parameters in TestFileIterator was relevant but undocumented, so
adding a subtest parameter broke phpunit tests.
* Don't implicitly commit (commitMasterChanges) an explicit transaction,
since that now causes a fatal error.
* Reset namespace cache as in NewParserTest.php, so that the MemoryAlpha
article insertion doesn't fail. This was only visible with --record
because the namespace cache is initialised by
SpecialVersion::getVersion() during recorder setup.
Change-Id: Ied4636b4acbf1d268e45901fed4d4e077b5ed666
(Previously done in f51d0d9a81 and
reverted in 543f46e9c08e0ff8c5e8b4e917fcc045730ef1bc.)
I think it's saner to treat this as invalid syntax, and output the
mismatched tag code verbatim. The current behavior is particularly
annoying for <ref> tags, which often swallow everything afterwards.
This does not affect HTML tags, though. Assuming Tidy is enabled, they
are still auto-closed at the end of the page content. (For tags that
"shadow" a HTML tag name, this results in the tag being treated as a
HTML tag. This currently only affects <pre> tags: if unclosed, they
are still displayed as preformatted text, but without suppressing
wikitext formatting.)
It also does not affect <includeonly>, <noinclude> and <onlyinclude>
tags. Changing this behavior now would be too disruptive to existing
content, and is the reason why previous attempt was reverted. (They
are already special-cased enough that this isn't too weird, for example
mismatched closing tags are hidden.)
Related to T17712 and T58306. I think this brings the PHP parser closer
to Parsoid's interpretation.
It reduces performance somewhat in the worst case, though. Testing with
https://phabricator.wikimedia.org/F3245989 (a 1 MB page starting with
3000 opening tags of 15 different types), parsing time rises from
~0.2 seconds to ~1.1 seconds on my setup. We go from O(N) to O(kN),
where N is bytes of input and k is the number of types of tags present
on the page. Maximum k shouldn't exceed 30 or so in reasonable setups
(depends on installed extensions, it's 20 on English Wikipedia).
Change-Id: Ide8b034e464eefb1b7c9e2a48ed06e21a7f8d434
The 'noxml' documentation in parserTests.txt was added in 2006 (7eea2398; r12504),
however it wasn't actually implemented.
It wasn't until 2009 (7aa4a8f9; r54767) that $wgWellFormedXml was created,
which defaults to true and has no relation to this option.
Remove the 'noxml' options from existing tests.
Change-Id: Ie3ae9f491b5747716080607b81b9763bf2bcc889
$wgAllowMicroDataAttributes and $wgAllowRdfaAttributes have been
introduced in MediaWiki 1.16 and required at this moment $wgHTML5
to be true. This last setting has been removed in MediaWiki 1.22.
To simplify the code maintenance and the configuration complexity,
those settings are removed and the features are always available.
RDFa users must now explicitly set $wgHtml5Version to a RDFa
version. Currently the correct values are:
- HTML+RDFa 1.0
- XHTML+RDFa 1.0
Bug: T130040
Change-Id: I17a7bff2cad170e381eabf0aec4e26e4fd0cddc3
This reduces the runtime of database-bound tests by about 40%
(on my system, from 4:55 to 2:47; results from Jenkins are
inconclusive).
The basic idea is to call addCoreDBData() only once, and have
a addDBDataOnce() that is called once per test class, not for
every test method lie addDBData() is. Most tests could be
trivially be changed to implement addDBDataOnce() instead of
addDBData(). The ones for which this did not work immediately
were left out for now. A closer look at the tests that still
implement addDBData() may reveal additional potential for
improvement.
TODO: Once this is merged, try to change addDBData() to
addDBDataOnce() where possible in extensions.
Change-Id: Iec4ed4c8419fb4ad87e6710de808863ede9998b7
Right now it forgets to reset $wgResourceBasePath, which means it
is inherited from the wikis's (or Jenkins') default settings which
is typically '/w'. That caused parser tests to behave as if pointers
to /extensions were outside /w.
Also update wgScriptPath to be '' instead of '/'. Otherwise this
can cause double-slash prefixed urls.
Change-Id: Ic455d62fca8fcac2c4ecc055cc0d7e311b70a94a
This reverts commit f51d0d9a81.
Breaks templates with non-closed </noinclude> tags, which
were previously acceptable.
Bug: T125754
Change-Id: I8bafb15eefac4e1d3e727c1c84782636d8b82c2b
I think it's saner to treat this as invalid syntax, and output the
mismatched tag code verbatim. The current behavior is particularly
annoying for <ref> tags, which often swallow everything afterwards.
This does not affect HTML tags, though. Assuming Tidy is enabled, they
are still auto-closed at the end of the page content.
Related to T17712 and T58306. I think this brings the PHP parser closer
to Parsoid's interpretation.
It reduces performance somewhat in the worst case, though. Testing with
https://phabricator.wikimedia.org/F3245989 (a 1 MB page starting with
3000 opening tags of 15 different types), parsing time rises from
~0.2 seconds to ~1.1 seconds on my setup. We go from O(N) to O(kN),
where N is bytes of input and k is the number of types of tags present
on the page. Maximum k shouldn't exceed 30 or so in reasonable setups
(depends on installed extensions, it's 20 on English Wikipedia).
To consider:
* Should we keep previous behavior for unclosed <includeonly> /
<noinclude>? This would be particularly disruptive for these if
someone relied on the old behavior, and they're already
special-cased in places.
* Unclosed <pre> tags are now treated as HTML tags, and are still
displayed as preformatted text, but without suppressing wikitext
formatting.
Change-Id: Ia2f24dbfb3567c4b0778761585e6c0303d11ddd0
Introduce an ogv video file to the parser file testsuite, so that we
can use it later in TimedMediaHandler parsertests.
Change-Id: I6a3b307ad9c82e9df0aeec025934d736eec8375f
The MediaWiki test suite is painfully slow and delays merging of
changes. More than half of the time is spent in
ParserTest_Parser⁄parserTests::testParserTest which is the PHPUnit
wrapping class for the parser tests.
This patch let us extract the parser tests so we can run them
independently. By running them parallely with the rest of the tests,
that will speed up the gate processing time.
Mark the MediaWikiParserTest and NewParserTest class as belonging to the
test group 'ParserTests'. Will let us filter them out via PHPUnit
option --exclude-group
Introduce a new PHPUnit test suite 'parsertests' which loads the
MediaWiki core parser tests wrapper 'MediaWikiParserTest' and the suite
which loads the extensions parser tests (ExtensionsParserTestSuite.php).
This way we can run solely the parser tests with:
cd tests/phpunit
php phpunit.php --testsuite ParserTests
Wikimedia CI can then be configure to run two jobs:
A) php phpunit.php --exclude-group ParserTests
B) php phpunit.php --testsuite ParserTests
Bug: T114314
Change-Id: Ie819bab43163995048c073691c4c5d258f797c02
Changed some old bugzilla links to new phabricator links in comments,
test data and error message. This reduces the need for redirects from
old bugzilla to new phabricator from our source code.
Change-Id: Id98278e26ce31656295a23f3cadb536859c4caa5