* Add a new limit to the parser which limits the size of the output
generated by StripState. The relevant bug shows exponential blowup in
output size.
* Remove the $prefix parameter from the StripState constructor. Used by
no Gerrit-hosted extensions, hard-deprecated since 1.26.
* Convert the existing unstrip recursion depth limit to a normal parser
limit with limit report row, warning and tracking category. Provide
the same features in the new limit.
* Add an optional $parser parameter to the StripState constructor so
that warnings and tracking categories can be added.
Bug: T187833
Change-Id: Ie5f6081177610dc7830de4a0a40705c0c8cb82f1
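The two limits described in the change above can be sketched roughly as follows. This is a minimal illustration in Python, not the MediaWiki implementation: the marker shape, the class name `LimitedStripState`, the default limit values, and the `warnings` set (standing in for the limit report row, warning and tracking category) are all assumptions made for the example.

```python
import re

# Simplified marker shape, assumed for illustration only.
MARKER_RE = re.compile(r"\x7fUNIQ-([a-z0-9-]+)-QINU\x7f")


class LimitedStripState:
    """Toy strip state with a recursion depth limit and an output size limit."""

    def __init__(self, depth_limit=20, size_limit=5_000_000):
        self.data = {}
        self.depth_limit = depth_limit  # arbitrary default for the sketch
        self.size_limit = size_limit    # arbitrary default for the sketch
        self.expand_size = 0            # total size of expanded content so far
        self.warnings = set()           # stand-in for warnings/tracking categories

    def add(self, name, text):
        marker = f"\x7fUNIQ-{name}-QINU\x7f"
        self.data[marker] = text
        return marker

    def unstrip(self, text, depth=0):
        def replace(match):
            marker = match.group(0)
            if marker not in self.data:
                return marker  # unknown marker: leave it untouched
            if depth >= self.depth_limit:
                self.warnings.add("unstrip-depth")
                return "[depth limit exceeded]"
            value = self.data[marker]
            self.expand_size += len(value)
            if self.expand_size > self.size_limit:
                self.warnings.add("unstrip-size")
                return "[size limit exceeded]"
            # Stored content may itself contain markers, so recurse.
            return self.unstrip(value, depth + 1)

        return MARKER_RE.sub(replace, text)
```

Counting the size of every expansion, rather than only the final string, is what stops a chain of markers that each reference the previous one twice from doubling the output at every level.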
These comments do not add anything. I argue they are worse than having
no comments, because I have to read them first to understand they
actually don't explain anything. Removing them makes room for actual
improvements in the future (if needed).
Change-Id: Iee70aad681b3385e9af282d5581c10addbb91ac4
This is a little far-fetched, but meant as a hardening step. No
valid strip marker name should contain any special characters.
If a malicious user somehow managed to control the strip marker name,
they could make a strip marker that "spanned" different HTML contexts.
Note: I've checked carefully - it's impossible for a user to control
the strip marker name. This is just a hardening step against any
future features.
For example, if someone could make a strip marker using the marker
name "a','b", then they could create an XSS by feeding
"\x7UNIQfa+QINU\x7f" to charinsert, which will split on the + sign,
and create output like
<a onclick="mw.toolbar.insertTags('\x7FUNIQa','bQIN\X7f...
and not recognize either part as a strip marker.
It just seems safer to not allow any of the special characters in
strip marker names - especially because there is no need to ever
use them, and to my knowledge there is no example of anyone ever
actually using such a special character in the marker name.
Change-Id: I798d31aff4e48b4c6da886530c15867226c953d2
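One way to picture the hardening described above is a check applied when a marker is created. The sketch below is in Python and is only an illustration: the allowed character set, the function name `make_marker`, and the marker layout are assumptions, not the rule the commit actually enforces.

```python
import re

# Assumption for this sketch: names may contain only letters, digits and hyphens.
SAFE_NAME_RE = re.compile(r"[A-Za-z0-9-]+")


def make_marker(name, index):
    """Build a marker, refusing names with quotes, commas, '+' and the like."""
    if not SAFE_NAME_RE.fullmatch(name):
        raise ValueError(f"unsafe strip marker name: {name!r}")
    return f"\x7fUNIQ-{name}-{index:08X}-QINU\x7f"
```

With a check like this, a name such as "a','b" is rejected before it can ever reach a context where the quote or comma would be interpreted.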
Generating one-time, unique strip markers hurts us in multiple ways:
* The strip marker regexes don't benefit from JIT compilation, so they are
slower to execute than they could be.
* Although the regexes don't benefit from JIT compilation, they are still
compiled, because HHVM bets on regexes getting reused. This extra work is
fairly costly (1-2% of CPU usage on the app servers) and doesn't pay off.
* The size of the PCRE JIT cache is finite, and the caching of one-off regexes
displaces from the cache regexes which are in fact reused.
Tim's preferred solution (per his review comment on
https://gerrit.wikimedia.org/r/167530/) is to use fixed strip markers.
So:
* Replace usage of $parser->mUniqPrefix with Parser::MARKER_PREFIX, which
complements the existing Parser::MARKER_SUFFIX.
* Deprecate Parser::mUniqPrefix and its accessor, Parser::uniqPrefix().
* Deprecate Parser::getRandomString(), since it is no longer useful.
* In Preprocessor_*::preprocessToObj() and Parser::fetchTemplateAndTitle,
replace any occurrences of \x7f with '?', to prevent strip marker forgery.
\x7f is not valid input anyway.
* Deprecate the $prefix parameter for StripState::__construct, since a custom
prefix may no longer be specified.
Change-Id: I31d4556bbb07acb72c33fda335fa5a230379a03f
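The combination of a fixed marker prefix and input sanitization described above can be sketched as follows. This is Python for illustration only: the prefix and suffix values are simplified stand-ins for Parser::MARKER_PREFIX and Parser::MARKER_SUFFIX, and `sanitize_input` is a hypothetical helper name.

```python
import re

# Simplified stand-ins for Parser::MARKER_PREFIX / Parser::MARKER_SUFFIX.
MARKER_PREFIX = "\x7fUNIQ-"
MARKER_SUFFIX = "-QINU\x7f"

# Because the prefix is fixed, this pattern is compiled once and reused,
# instead of building a new one-off regex for every parse.
MARKER_RE = re.compile(
    re.escape(MARKER_PREFIX) + r"([a-z0-9-]+)" + re.escape(MARKER_SUFFIX)
)


def sanitize_input(wikitext):
    """Replace \\x7f in user input so it cannot be mistaken for a marker."""
    return wikitext.replace("\x7f", "?")
```

Since the marker text is no longer secret, forgery is prevented by making sure the delimiter byte never survives from user input, rather than by relying on a random prefix.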
When adding strip markers, allow closures to be passed in place of text.
The closure is then called during unstrip. Also, add a hook that runs
after unstripGeneral. This is needed for Extension:Cite's I0e136f952.
Change-Id: If83b0623671fd67e5ccc9deaaaab456a6679af8f
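The deferred evaluation described above might look roughly like this. The sketch is in Python and is not the MediaWiki code: the class name `ClosureStripState` and the marker layout are assumptions, and the post-unstrip hook is left out.

```python
import re

# Simplified marker shape, assumed for illustration only.
MARKER_RE = re.compile(r"\x7fUNIQ-[a-z]+-[0-9]+-QINU\x7f")


class ClosureStripState:
    """Toy strip state whose stored values may be text or a zero-argument callable."""

    def __init__(self):
        self.data = {}

    def add(self, name, value):
        marker = f"\x7fUNIQ-{name}-{len(self.data)}-QINU\x7f"
        self.data[marker] = value
        return marker

    def unstrip(self, text):
        def replace(match):
            value = self.data.get(match.group(0))
            if value is None:
                return match.group(0)  # unknown marker: leave it untouched
            if callable(value):
                value = value()  # the closure runs only now, during unstrip
            return value

        return MARKER_RE.sub(replace, text)
```

Deferring the call lets the caller produce content that depends on state which is only complete at the end of the parse.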
Xhprof generates this data now. Custom profiling of various
sub-function units is kept.
Calls to profiler represented about 3% of page execution
time on Special:BlankPage (1.5% in/out); after this change
it's down to about 0.98% of page execution time.
Change-Id: Id9a1dc9d8f80bbd52e42226b724a1e1213d07af7
Breaks extensions, doesn't entirely fix the problem it was meant to fix.
This reverts commit 6da3f169ac.
Change-Id: Ic193abcff8c72b0c8b434fcac514f88603a45beb
The JIT compiler in newer versions of PCRE experiences lock contention
when multithreaded applications perform a high rate of concurrent
compilations. We are seeing some performance impact on HHVM under normal
production traffic.
The random part of the strip marker is just there to protect against
deliberate insertion of strip markers into the source text, which is
very rare. So use a generic regex to find strip markers, and check in
the callback whether the random state ID is correct.
StripState::killMarkers() will be slower when it has to remove many
strip markers, but most calls to it will not match any strip markers, so
overall performance should be improved due to reduced JIT compilation.
Bug: 72205
Change-Id: I8d37ae929a8c669c9e39adc8096b89e5732b68d0
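The approach described above, one generic pattern plus an ID check in the callback, can be sketched like this. It is a Python illustration under assumptions: the marker layout, the class name `GenericRegexStripState`, and the method names are invented for the example.

```python
import re
import secrets

# One generic pattern shared by every parse; the random ID is a capture group
# rather than part of the pattern, so nothing new has to be compiled per parse.
GENERIC_MARKER_RE = re.compile(r"\x7fUNIQ([0-9a-f]+)-([a-z]+)-([0-9]+)-QINU\x7f")


class GenericRegexStripState:
    def __init__(self, state_id=None):
        self.state_id = state_id or secrets.token_hex(8)
        self.data = {}

    def add(self, tag, text):
        marker = f"\x7fUNIQ{self.state_id}-{tag}-{len(self.data)}-QINU\x7f"
        self.data[marker] = text
        return marker

    def unstrip(self, text):
        def replace(match):
            if match.group(1) != self.state_id:
                return match.group(0)  # marker from another state: leave it
            return self.data.get(match.group(0), match.group(0))

        return GENERIC_MARKER_RE.sub(replace, text)

    def kill_markers(self, text):
        # Runs the callback for every candidate, which is the slower path the
        # commit message mentions when many markers are present.
        def replace(match):
            return "" if match.group(1) == self.state_id else match.group(0)

        return GENERIC_MARKER_RE.sub(replace, text)
```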
Swapped some "$var type" to "type $var" or added missing types
before the $var. Changed some other types to match the more common
spelling. Capitalized the beginning of some text.
Also added some missing @param.
Change-Id: I49f8f48b521878de7abd9cc40efdeff6cf9a37e0
Per the PSR-2 PHP standard, files should end with exactly one newline.
Some of our files had two or more, and others were missing a newline.
Fix almost all occurrences of the CodeSniffer sniff:
PSR2.Files.EndFileNewline.TooMany
I have not fixed the selenium files, since I believe we will drop them.
Change-Id: I89fca8c1786fee94855b7b77bb0f364001ee84b6
Explicitly detect circular references in strip tags and break the loop,
similar to how we deal with circular references in templates. This is
necessary to support Scribunto since we imagine we will provide an API
that allows strip markers to be forged.
The recursion depth limit is a consequence of changing the algorithm
from iterative to recursive; it is required to protect the stack against
deeply nested #tag invocations.
Change-Id: Icc8dc4aedbced55ad75b3b5a5429a376d06d9b31
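The loop detection described above can be pictured as a recursive unstrip that remembers which markers are currently being expanded. The following Python sketch is an illustration only: the class name `LoopSafeStripState`, the `marker` and `set` helpers, the marker shape, and the placeholder messages are all assumptions.

```python
import re

# Simplified marker shape, assumed for illustration only.
MARKER_RE = re.compile(r"\x7fUNIQ-([a-z0-9]+)-QINU\x7f")


class LoopSafeStripState:
    def __init__(self, depth_limit=20):
        self.data = {}
        self.depth_limit = depth_limit  # arbitrary default for the sketch

    @staticmethod
    def marker(name):
        return f"\x7fUNIQ-{name}-QINU\x7f"

    def set(self, name, text):
        self.data[self.marker(name)] = text

    def unstrip(self, text, stack=()):
        def replace(match):
            marker = match.group(0)
            if marker not in self.data:
                return marker
            if marker in stack:
                # This marker is already being expanded further up: break the loop.
                return "[strip marker loop detected]"
            if len(stack) >= self.depth_limit:
                # Protects the call stack against very deep nesting.
                return "[strip marker depth limit exceeded]"
            return self.unstrip(self.data[marker], stack + (marker,))

        return MARKER_RE.sub(replace, text)
```

Tracking the chain of markers on the way down mirrors how circular template references are caught, and the separate depth check covers nesting that is deep but not circular.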
* Introduced Parser::killMarkers() based on the concept from StringFunctions. Used it in cases where markerStripCallback() doesn't make sense semantically, namely grammar, padleft, padright and anchorencode. Used markerStripCallback() in other cases.
* Changed headline unstrip order as suggested by P.Copp on bug 18295
* In CPF::lc() and CPF::uc(), removed the is_callable(). This was a temporary testing hack committed by me in r30109, which allowed me to do differential testing against a copy of the parser from before that revision.
markers.
Not sure the preg_match() is actually needed. Or it may be
appropriate to use MARKER_SUFFIX for the match.
The error message may also need to be rewritten to be more
user-friendly, but I'm pretty sure *an* error message is friendlier
than UNIQ garbage. And making them visible error messages makes them
easier to find.
* It was not necessary to preserve the $stripState->general->setPair() interface since it wasn't used by any extensions.
* Moved StripState to its own file.
* Refactored serialiseHalfParsedText() and unserialiseHalfParsedText() so that the bulk of the functionality is in the relevant modules, instead of using scary direct access to object member variables. Made it support the new StripState. It seemed like a lot of work to support an "emergency optimisation" feature in Cite. Cite updates will be in a subsequent commit.
* Fixed spelling of serialiseHalfParsedText() and unserialiseHalfParsedText(); there is unavoidable interface breakage anyway, due to cache object versioning.
* Moved transparent tags to their own function, as requested in a fixme comment.
* Added documentation for markerSkipCallback().
* Removed OnlyIncludeReplacer, unused since MW 1.12.