Some malformed pages contain "character references" so long that
PHP's `hexdec` returns a `float` instead of an `int` when parsing
them. Parsoid then crashed on the type hint on the argument to
Sanitizer::validateCodepoint(). MediaWiki core has the same issue,
but doesn't have the type hint (yet), so it soft-fails instead of
crashing. Add sanity checks around each call to `hexdec` to protect
against arbitrarily long entity strings (while still allowing
arbitrary zero-padding), and add a comment at the `intval` call
explaining why it is not similarly affected. Also add new test cases
to SanitizerUnitTest.
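The idea behind the guard can be sketched roughly as follows. This is an
illustration in Python, not the PHP code in this patch: the helper name
`parse_hex_char_ref` and the six-digit limit (the width of 0x10FFFF) are
assumptions made for the example.

```python
import string

MAX_CODEPOINT = 0x10FFFF  # highest valid Unicode code point (six hex digits)


def parse_hex_char_ref(digits):
    """Parse the hex digits of a character reference (hypothetical helper).

    `digits` is the text between "&#x" and ";". Returns the code point as
    an int, or None if the reference cannot name a valid code point.
    """
    # Reject empty input and anything that is not a plain hex digit.
    if not digits or any(c not in string.hexdigits for c in digits):
        return None
    # Zero-padding is allowed, so only the significant digits are counted.
    significant = digits.lstrip("0")
    # More than six significant digits can never be a valid code point.
    # Checking the length first means the number is never converted at all,
    # which is the step where PHP's hexdec would switch to a float.
    if len(significant) > 6:
        return None
    value = int(significant, 16) if significant else 0
    return value if value <= MAX_CODEPOINT else None
```

With this sketch, `parse_hex_char_ref("0000000041")` yields 65, while a run of
forty `F` digits is rejected before any numeric conversion happens. Whether the
resulting code point is acceptable is still left to the caller.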
Corresponding patch on the Parsoid side:
Ic33196961bb2b86290148fbc3ce33bcd8b28ab56
(And see T247804 re: eventually removing this duplicate code.)
Bug: T322892
Change-Id: I5085c4edbb86e282b92536d05b01ed5f9d5c615e
T208881 ("CSS using var() to create exponential sized calc() on wiki
page will crash visitor's browser") was fixed by disabling var() in
inline CSS. Since then, the browser crash itself appears to have been
fixed in Firefox, Chrome, modern Edge, and Opera.
This change reverts the fix for T208881.
Bug: T288201
Change-Id: I387a0e9fdd02faa69616890c613462c83b91b789
The existing Sanitizer::removeHTMLtags() method, in addition to having
dodgy capitalization, uses regular expressions to parse the HTML.
That produces corner cases like T298401 and T67747 and is not guaranteed
to yield balanced or well-formed HTML.
Instead, introduce and use a new Sanitizer::removeSomeTags() method
which is guaranteed to always return balanced and well-formed HTML.
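To illustrate what "balanced" means here, the following is a rough Python
sketch, not the Remex-based implementation in this patch. It uses Python's
`html.parser`, which is only a tokenizer rather than a full HTML5 tree
builder. The function name `remove_some_tags` is borrowed for the example,
and escaping disallowed tags to text and dropping attributes are
simplifying assumptions.

```python
from html import escape
from html.parser import HTMLParser


class AllowlistSerializer(HTMLParser):
    """Re-serialize HTML, keeping only allowed tags and balancing them."""

    def __init__(self, allowed):
        super().__init__(convert_charrefs=True)
        self.allowed = set(allowed)
        self.open_tags = []  # stack of allowed tags that are currently open
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.allowed:
            self.out.append(f"<{tag}>")  # attributes dropped for brevity
            self.open_tags.append(tag)
        else:
            # Assumption: disallowed tags are shown as escaped text.
            self.out.append(escape(self.get_starttag_text()))

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Close everything opened after `tag`, then `tag` itself.
            while True:
                top = self.open_tags.pop()
                self.out.append(f"</{top}>")
                if top == tag:
                    break
        elif tag not in self.allowed:
            self.out.append(escape(f"</{tag}>"))
        # A stray end tag for an allowed but unopened element is dropped.

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def remove_some_tags(html, allowed):
    parser = AllowlistSerializer(allowed)
    parser.feed(html)
    parser.close()
    # Close whatever is still open so the output is always balanced.
    while parser.open_tags:
        parser.out.append(f"</{parser.open_tags.pop()}>")
    return "".join(parser.out)
```

For example, the misnested input `<i>Title <b>bold</i> text` with `i` and `b`
allowed comes out as `<i>Title <b>bold</b></i> text`. A regex-based approach
has no stack to consult, so it cannot make this kind of repair.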
Note that Sanitizer::removeHTMLtags()/::removeSomeTags() take a callback
argument which (as far as I can tell) is never used outside core. Mark
that argument as @internal, and clean up the version used by
::removeSomeTags().
Use the new ::removeSomeTags() method in the two places where
DISPLAYTITLE is handled (following up on T67747). The use by the
legacy parser is more difficult to replace (and would have a
performance cost), so leave the old ::removeHTMLtags() method in place
for that call site for now: when the legacy parser is replaced by
Parsoid the need for the old ::removeHTMLtags() will go away. In a
follow-up patch we'll rename ::removeHTMLtags() and mark it @internal
so that we can deprecate ::removeHTMLtags() for external use.
Also add some benchmarking code. On my machine, with PHP 7.4, the new
method tidies short 30-character title strings at a rate of about
6764/s, while the tidy-based method being replaced here managed 6384/s.
Sanitizer::removeHTMLtags() blazes through short strings roughly 18x
faster (120,915/s). Some of this difference is due to the setup cost of
creating the tag whitelist and the Remex pipeline, so further
optimizations could doubtless be made if Sanitizer::removeSomeTags()
becomes more widely used.
Bug: T299722
Bug: T67747
Change-Id: Ic864c01471c292f11799c4fbdac4d7d30b8bc50f
We use Sanitizer::stripAllTags primarily to remove formatting from
HTML so that we can use it in places like notifications, emails,
search result blurbs, etc.
It is very unlikely that we want the raw contents of CSS and/or JS
tags in any of those places, so let's suppress that content. This
makes the output more readable now that template styles are showing
up in more and more places.
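The behavior described above can be sketched like this. It is a Python
illustration built on the standard `html.parser` module, not the PHP change
itself, and the names `TextOnly` and `strip_all_tags` are made up for the
example.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect text content, skipping the contents of style/script tags."""

    SUPPRESSED = {"style", "script"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.suppress_depth = 0  # > 0 while inside a suppressed element

    def handle_starttag(self, tag, attrs):
        if tag in self.SUPPRESSED:
            self.suppress_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SUPPRESSED and self.suppress_depth:
            self.suppress_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside <style> or <script>.
        if not self.suppress_depth:
            self.parts.append(data)


def strip_all_tags(html):
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()
```

Given `<style>.a{color:red}</style><p>Hello <b>world</b></p>`, this sketch
returns `Hello world` and leaves the CSS rule out of the result.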
Bug: T228856
Change-Id: I7930361068ddcf3a6c2fdebd0177d142f025b64f
Let PHP do the UTF-8 encoding of Unicode characters in PHP strings.
Also use faster str_replace instead of preg_replace.
Change-Id: I4e99de694a607e2b5df52c6efcd3d863bb42f76e
This test compares strings. I find it critical that the test starts
failing if, for example, a method that is expected to return the
string "" starts returning null. assertEquals() will not report this,
nor quite a few other edge cases.
Change-Id: I9a3f19f91b95aa384ca612f9a58c7af685306d57
* Unset globals to avoid tests that look like unit tests but actually rely on
  globals.
* Move some tests out of the unit directory so that the test suite will pass.
* Assert that tests which extend MediaWikiUnitTestCase are in a directory with
  "/unit/" in its path name.
Depends-On: I67b37b1bde94eaa3d4298d9bd98ac57995ce93b9
Depends-On: I90921679518ee95fe393f8b1bbd9134daf0ba032
Bug: T87781
Change-Id: I16691fc8ac063705ba0c2bc63b96c4534ca8660b
Of the 150 tests in SanitizerTest.php, 100 are pure unit tests.
Move them to the new file in the new structure; the rest stay.
Change-Id: I366d37607abff4bcd624a56fb8b2299729fbc088