* Have TidySupport provide $wgTidyConfig instead of the legacy globals
* Add --use-tidy-config option to parserTests.php. This tells
TidySupport to use the tidy configuration from LocalSettings.php
instead of the traditional safe defaults.
* Add a way for TidySupport to disable tidy via $wgTidyConfig, using
driver=>disabled
Change-Id: Ie76e68e2d5238d0a1aef49a1a815c0d1cd8bfdae
tl;dr: Having unnessary complexity in security critical code is bad.
* Extra options add extra complexity and maintenance burden
** Thus we should only have one html output mode. well formed = false
was already vetoed in T52040, so lets go with WellFormed=true.
* Options which are used by very few people tend to get tested less
* Escaping is an area of code where we should be very conservative
* Having escaping rules depend on making assumptions about which
characters various browsers consider "whitespace" is scary
* $wgWellFormedXml=false has had a negative security impact in the
past (Usually not directly its fault, but has made other bugs
more exploitable)
* Saving a couple bytes (even less bytes after gzip taken into
account) is really not worth it in this context (imho).
Change-Id: I5c922e0980d3f9eb39adb5bb5833e158afda42ed
It is expected that namespaces (except for NS_SPECIAL) will have a
paired subject and talk namespace. While not having the accompanying
talk namespace mostly works, it can cause unexpected issues when some
code paths (e.g. WikiPage::onArticleCreate()) expect it to exist.
Change-Id: I8f02fd886d0256679dfc10e1743204da4c6678b7
* Add a subtest index to the recorded test name, to avoid an SQL
duplicate key error.
* Introduce TestFileDataProvider. Previously, the order of named
parameters in TestFileIterator was relevant but undocumented, so
adding a subtest parameter broke phpunit tests.
* Don't implicitly commit (commitMasterChanges) an explicit transaction,
since that now causes a fatal error.
* Reset namespace cache as in NewParserTest.php, so that the MemoryAlpha
article insertion doesn't fail. This was only visible with --record
because the namespace cache is initialised by
SpecialVersion::getVersion() during recorder setup.
Change-Id: Ied4636b4acbf1d268e45901fed4d4e077b5ed666
(Previously done in f51d0d9a81 and
reverted in 543f46e9c08e0ff8c5e8b4e917fcc045730ef1bc.)
I think it's saner to treat this as invalid syntax, and output the
mismatched tag code verbatim. The current behavior is particularly
annoying for <ref> tags, which often swallow everything afterwards.
This does not affect HTML tags, though. Assuming Tidy is enabled, they
are still auto-closed at the end of the page content. (For tags that
"shadow" a HTML tag name, this results in the tag being treated as a
HTML tag. This currently only affects <pre> tags: if unclosed, they
are still displayed as preformatted text, but without suppressing
wikitext formatting.)
It also does not affect <includeonly>, <noinclude> and <onlyinclude>
tags. Changing this behavior now would be too disruptive to existing
content, and is the reason why previous attempt was reverted. (They
are already special-cased enough that this isn't too weird, for example
mismatched closing tags are hidden.)
Related to T17712 and T58306. I think this brings the PHP parser closer
to Parsoid's interpretation.
It reduces performance somewhat in the worst case, though. Testing with
https://phabricator.wikimedia.org/F3245989 (a 1 MB page starting with
3000 opening tags of 15 different types), parsing time rises from
~0.2 seconds to ~1.1 seconds on my setup. We go from O(N) to O(kN),
where N is bytes of input and k is the number of types of tags present
on the page. Maximum k shouldn't exceed 30 or so in reasonable setups
(depends on installed extensions, it's 20 on English Wikipedia).
Change-Id: Ide8b034e464eefb1b7c9e2a48ed06e21a7f8d434
The 'noxml' documentation in parserTests.txt was added in 2006 (7eea2398; r12504),
however it wasn't actually implemented.
It wasn't until 2009 (7aa4a8f9; r54767) that $wgWellFormedXml was created,
which defaults to true and has no relation to this option.
Remove the 'noxml' options from existing tests.
Change-Id: Ie3ae9f491b5747716080607b81b9763bf2bcc889
$wgAllowMicroDataAttributes and $wgAllowRdfaAttributes have been
introduced in MediaWiki 1.16 and required at this moment $wgHTML5
to be true. This last setting has been removed in MediaWiki 1.22.
To simplify the code maintenance and the configuration complexity,
those settings are removed and the features are always available.
RDFa users must now explicitly set $wgHtml5Version to a RDFa
version. Currently the correct values are:
- HTML+RDFa 1.0
- XHTML+RDFa 1.0
Bug: T130040
Change-Id: I17a7bff2cad170e381eabf0aec4e26e4fd0cddc3
This reduces the runtime of database-bound tests by about 40%
(on my system, from 4:55 to 2:47; results from Jenkins are
inconclusive).
The basic idea is to call addCoreDBData() only once, and have
a addDBDataOnce() that is called once per test class, not for
every test method lie addDBData() is. Most tests could be
trivially be changed to implement addDBDataOnce() instead of
addDBData(). The ones for which this did not work immediately
were left out for now. A closer look at the tests that still
implement addDBData() may reveal additional potential for
improvement.
TODO: Once this is merged, try to change addDBData() to
addDBDataOnce() where possible in extensions.
Change-Id: Iec4ed4c8419fb4ad87e6710de808863ede9998b7
Right now it forgets to reset $wgResourceBasePath, which means it
is inherited from the wikis's (or Jenkins') default settings which
is typically '/w'. That caused parser tests to behave as if pointers
to /extensions were outside /w.
Also update wgScriptPath to be '' instead of '/'. Otherwise this
can cause double-slash prefixed urls.
Change-Id: Ic455d62fca8fcac2c4ecc055cc0d7e311b70a94a
This reverts commit f51d0d9a81.
Breaks templates with non-closed </noinclude> tags, which
were previously acceptable.
Bug: T125754
Change-Id: I8bafb15eefac4e1d3e727c1c84782636d8b82c2b
I think it's saner to treat this as invalid syntax, and output the
mismatched tag code verbatim. The current behavior is particularly
annoying for <ref> tags, which often swallow everything afterwards.
This does not affect HTML tags, though. Assuming Tidy is enabled, they
are still auto-closed at the end of the page content.
Related to T17712 and T58306. I think this brings the PHP parser closer
to Parsoid's interpretation.
It reduces performance somewhat in the worst case, though. Testing with
https://phabricator.wikimedia.org/F3245989 (a 1 MB page starting with
3000 opening tags of 15 different types), parsing time rises from
~0.2 seconds to ~1.1 seconds on my setup. We go from O(N) to O(kN),
where N is bytes of input and k is the number of types of tags present
on the page. Maximum k shouldn't exceed 30 or so in reasonable setups
(depends on installed extensions, it's 20 on English Wikipedia).
To consider:
* Should we keep previous behavior for unclosed <includeonly> /
<noinclude>? This would be particularly disruptive for these if
someone relied on the old behavior, and they're already
special-cased in places.
* Unclosed <pre> tags are now treated as HTML tags, and are still
displayed as preformatted text, but without suppressing wikitext
formatting.
Change-Id: Ia2f24dbfb3567c4b0778761585e6c0303d11ddd0
Introduce an ogv video file to the parser file testsuite, so that we
can use it later in TimedMediaHandler parsertests.
Change-Id: I6a3b307ad9c82e9df0aeec025934d736eec8375f
The MediaWiki test suite is painfully slow and delays merging of
changes. More than half of the time is spent in
ParserTest_Parser⁄parserTests::testParserTest which is the PHPUnit
wrapping class for the parser tests.
This patch let us extract the parser tests so we can run them
independently. By running them parallely with the rest of the tests,
that will speed up the gate processing time.
Mark the MediaWikiParserTest and NewParserTest class as belonging to the
test group 'ParserTests'. Will let us filter them out via PHPUnit
option --exclude-group
Introduce a new PHPUnit test suite 'parsertests' which loads the
MediaWiki core parser tests wrapper 'MediaWikiParserTest' and the suite
which loads the extensions parser tests (ExtensionsParserTestSuite.php).
This way we can run solely the parser tests with:
cd tests/phpunit
php phpunit.php --testsuite ParserTests
Wikimedia CI can then be configure to run two jobs:
A) php phpunit.php --exclude-group ParserTests
B) php phpunit.php --testsuite ParserTests
Bug: T114314
Change-Id: Ie819bab43163995048c073691c4c5d258f797c02
Changed some old bugzilla links to new phabricator links in comments,
test data and error message. This reduces the need for redirects from
old bugzilla to new phabricator from our source code.
Change-Id: Id98278e26ce31656295a23f3cadb536859c4caa5
* Split tidy implementations into a class hierarchy
* Bring all tidy configuration into a single associative array and
deprecate the old configuration.
* Remove $wgAlwaysUseTidy
This is preparatory to replacement of Tidy (T89331). I used the name
"Raggett" for things relating to Dave Raggett's Tidy, since if we use
"tidy" to mean the new abstract system as well as Raggett's tidy, it
gets confusing.
Change-Id: I77af1a16cbbb47fc226d05fb9aad56c58e8910b5
MediaWikiParserTest.php generates fake test classes with eval(). It uses
synthetic class names with U+2044 "fraction slash" as a separator, but
this turns out to be an unfortunate choice since in certain terminal
modes, it causes readline to return to the start of the line as if the
"home" key was pressed, without adding a character. This makes it
difficult to paste class names.
Change-Id: I1c66b9caf256b8d0535fb7ed6e52ed842e193f46
These tests all involve database access in some way,
and thus need @group Database tags.
These failed when setting a bogus database password
and then running the tests.
Change-Id: I7f113a79ac44d09d88ec607f76b8ec22bc1ebcf1
* There's a branch path in the sanitizer that depends on $wgUseTidy,
which means the test output differs from on wiki.
* In general, we should set these variables to match the wiki behaviour
in tests.
* Exposes T92892, Sanitizer removes empty tags when tidy is disabled.
* Tweaked tests for T19663 to use an extension tag to show that
HTML5 tags with non-word characters make it through the parser
intact (before being ultimately sanitized).
Change-Id: I09c72fd739e11a8b757f37dc4c790758d782ad73
This is a hard deprecation, with getSecondaryDataUpdates returning an
empty array and addSecondaryDataUpdate throwing an exception. This seems
prudent since there are no known users of these methods, and they
interfere with the parser cache:
DataUpdates are basically jobs, they need access to services to
function. That makes them inherently non-serializable. This interferes
with the function of the parser cache, which serializes ParserOutput
objects in order to persist them.
This could be solved by splitting DataUpdates into DataUpdateDefinitions
and DataUpdateHandlers, similar to how JobSpecification works with
wgJobClasses. That however seems pointless and overkill, since
ParserOutput already has a mechanism for storing arbitrary data,
including any info needed by an UpdateJob: the setExtensionData method.
After this change, the preferred method to introduce custom data updates
is to store any relevant data using setExtensionData and
implement Content::getSecondaryDataUpdates() if possible. If not,
use the 'SecondaryDataUpdates' hook to construct the necessary update
objects from the info stored using setExtensionData.
Change-Id: I0f6f49e61fa3d8904e55f42c99f342a3dc357495
* Use MediaWikiTestCase::getNewTempFile and getNewTempDirectory
instead of wfTempDir().
The upload api tests wrote a tempnam() file directly (where
wfTempDir() is typically shared with other systems and concurrent
runs). Use MediaWikiTestCase::getNewTempFile and
getNewTempDirectory instead.
This also ensures its removal by the teardown handler without
needing manual unlink() calls. And it doesn't rely on the test
passing. (Many unlink calls where at the bottom of tests,
which wouldn't be reached in case of failure).
* For the upload test, the presistent storing of
'Oberaargletscher_from_Oberaar.jpg' (downloaded from Commons)
was removed. Note that this didn't work for Jenkins builds anyway
as Jenkins builds set $wgTmpDirectory to a unique directory
in tmpfs associated with an individual build.
* For filebackend tests, moved directory creation from the dataProvider
to the main test.
Implemented addTmpFiles() to allow subclasses to register
additional files (created by other means) to be cleaned up also.
Removed unused $tmpName and $toPath parameters in data
provider for FileBackendTest::testStore. And fixed weird double
$op2 variable name to be called $op3.
* Skipped parserTest.inc, MockFileBackend.php, and
UploadFromUrlTestSuite.php as those don't use MediaWikiTestCase.
Change-Id: Ic7feb06ef0c1006eb99485470a1a59419f972545
Update $wgSVGConverters['rsvg'] to something closer to WMF production
configuration (there is a more complicated setup involving two
variants of rsvg for some reason).
Documentation is scarce, but 'rsvg-convert' appears to be the "modern"
way to call rsvg, with 'rsvg' being deprecated or not recommended.
Bug: T76476
Change-Id: I5ed877f3a5a1f1e97ae881c1d03fc977276182b6
My previous patch broke this: ApiStashEdit would stash ParserOutput
with no custom DataUpdates, but calling getSecondaryDataUpdates still
failed after unserialization. This patch should fix that.
Bug: T86305
Change-Id: Ic114e521c5dfd0d3c028ea7d16e93eace758deef
Follows-up b36d883.
By far most data providers are static (and PHPUnit expects them
to be static and calls them that way).
Most of these classes already had their data providers static
but additional commits sloppily introduced non-static ones.
* ResourceLoaderWikiModuleTest, 8968d8787f.
* TitleTest, 545f1d3a73.
Odd unused method 'dataTestIsValidMoveOperation' was introduced
in 550b878e63.
* GlobalVarConfigTest, a3e18c3670.
Change-Id: I5da99f7cd3da68c550ae507ffe1f725d31e7666f
The previous implementation would unescape '&', '=', '+', and '%'. The
first three will break the URL when unescaped in the query string, and
the last will break when unescaped anywhere.
The code is now changed to treat the path, query, and fragment parts of
the URL separately when unescaping. We also escape any unsafe characters
and ensure all percent-encodings use uppercase hexits.
And since the old name is no longer accurate,
Parser::replaceUnusualEscapes is deprecated in favor of
Parser::normalizeLinkUrl.
Bug: 57909
Change-Id: I77dc308d0d016c395ad737c08cf10a7711e25bbd
This includes the extension name, and it also does much
more stringent validation. In the (now rather unlikely)
event of a duplicate name, it will append a number.
This is important, as it is very confusing when this bug strikes.
There exists extensions like CharRangeSpan which will trigger this bug.
Bug: 42174
Change-Id: Idf14b4cbdb8ec103340d48855e0361acf707b101
- Added space after reserved words: function, foreach, if
- Combined 'else if' into elseif
- Added braces to one-line statements
- Added spaces after comma, before parentheses
Change-Id: Ie5bbf680d6fbe0f0872dab2700c16b1394906a72
MediaWiki installations that use the setting
$wgUseTidy = true; are unable to output
MathML since the well defined MathML elements
are filtered out by Tidy. This was reported as
http://sourceforge.net/p/tidy/patches/84/ .
This change hides MathML blocks from
Tidy.
Bug: 66516
Change-Id: Ib48b91238c3eddd6a86b62f6ce57801d7058f0d8
Setting this to default avoids failing parser tests,
when it is set in LocalSettings.php
Bug: 54576
Change-Id: I531d5839e9abe571c6c29f290bb159dabca34798
- Swap "$variable type" to "type $variable"
- Fixed spacing inside docs
- Makes beginning of @param/@var/@throws in capital
- Changed some types to match the more common spelling
Change-Id: Ia041964250d8b7c0349d79dc9b131c5b8696e795
Note that the old parser tests helper function `tidy()` never actually did
anything, since $wgUseTidy was forced to `false` in the parser test setup.
Remove this unused code, and replace it with our new tidy support.
Allows new parser test sections: 'html+tidy' denotes "tidied" HTML (open
tags closed and other fixups to original wikitext markup) which should be
applicable to any parser. 'html/php+tidy' is output specific to the PHP
parser with tidy turned on. The Parsoid backend will use the 'html/parsoid'
section if present, but if it is not present it will fallback to first the
'html+tidy' section, and if that is missing the 'html' section.
Note that 'tidy' has a large number of open bugs (see
https://bugzilla.wikimedia.org/show_bug.cgi?id=2542 ) and so in some cases
we deliberately do *not* use 'html+tidy' or 'html/php+tidy' clauses, in
order to avoid documenting broken output. In these cases, there is no
broken HTML in the PHP parser output, and so (in theory) the 'html' and
'html+tidy' sections would be identical (that is, if tidy didn't have
bugs).
Change-Id: Iba45f38774b221522dc3b6ae2d1312fb79f8f41f