Commit graph

27 commits

Author SHA1 Message Date
Max Semenik
369b3fa977 Normalize PHPDoc attributes
Change-Id: I83e686d099de0ff0aacda7e332972e1c7ee49f04
2018-03-16 22:59:15 -07:00
Tim Starling
f0247e05bd StripState testing and cleanup
* Added StripState unit tests
* Deprecated unmaintained "half-parsed" serialization experiment
* Renamed some variables for brevity and removed unused "prefix"

Change-Id: I838d7ac7f9a2189e13d39c6939dba5d70e74a6b7
2018-03-05 16:43:58 +11:00
Tim Starling
3dfda8c155 Limit total expansion size in StripState and improve limit handling
* Add a new limit to the parser which limits the size of the output
  generated by StripState. The relevant bug shows exponential blowup in
  output size.
* Remove the $prefix parameter from the StripState constructor. Used by
  no Gerrit-hosted extensions, hard-deprecated since 1.26.
* Convert the existing unstrip recursion depth limit to a normal parser
  limit with limit report row, warning and tracking category. Provide
  the same features in the new limit.
* Add an optional $parser parameter to the StripState constructor so
  that warnings and tracking categories can be added.

Bug: T187833
Change-Id: Ie5f6081177610dc7830de4a0a40705c0c8cb82f1
2018-03-05 05:16:04 +00:00
Tim Starling
ea52f36afc In StripState use closures instead of temporary member variables
The former convention was an awkward workaround for the lack of
closures.

Change-Id: I8722e168fb9b5e76cf6a937139be728bb3fc3e92
2018-02-28 11:22:43 +11:00
Thiemo Mättig
ef470ebf7f Remove @param comments that literally repeat what the code says
These comments do not add anything. I argue they are worse than having
no comments, because I have to read them first to understand they
actually don't explain anything. Removing them makes room for actual
improvements in the future (if needed).

Change-Id: Iee70aad681b3385e9af282d5581c10addbb91ac4
2018-01-10 14:14:26 +01:00
Brian Wolff
939faea318 Require strip marker names to not have & ' " < or > in them
This is a little far fetched, but meant as a hardening step. No
valid strip marker name should have any of those things in them.
If a malicious user managed to somehow control the strip marker name,
he could make a strip marker that "spanned" different html contexts.
Note: I've checked carefully - its impossible for a user to control
the strip marker name. This is just a hardening step against any
future features.

For example, if someone could make a strip marker using the marker
name "a&#039;,&#039;b", then they could create an xss by feeding
"\x7UNIQfa+QINU\x7f" to charinsert, which will split on + sign,
and create output like
<a onclick="mw.toolbar.insertTags(&#039\x7FUNIQa&#039;,&#039;bQIN\X7f...
It just seems safer to not allow any of the special characters in
strip marker names - especially because there is no need to ever
use them, and to my knowledge there is no example of anyone ever
actually using such a special character in the marker name.
and not recognize either part as a strip marker.

Change-Id: I798d31aff4e48b4c6da886530c15867226c953d2
2016-04-26 13:53:26 -04:00
Kunal Mehta
6e9b4f0e9c Convert all array() syntax to []
Per wikitech-l consensus:
 https://lists.wikimedia.org/pipermail/wikitech-l/2016-February/084821.html

Notes:
* Disabled CallTimePassByReference due to false positives (T127163)

Change-Id: I2c8ce713ce6600a0bb7bf67537c87044c7a45c4b
2016-02-17 01:33:00 -08:00
Ori Livneh
12571bde26 Use a fixed marker prefix string in the Parser and MWTidy
Generating one-time, unique strip markers hurts us in multiple ways:

* The strip marker regexes don't benefit from JIT compilation, so they are
  slower to execute than they could be.
* Although the regexes don't benefit from JIT compilation, they are still
  compiled, because HHVM bets on regexes getting reused. This extra work is
  fairly costly (1-2% of CPU usage on the app servers) and doesn't pay off.
* The size of the PCRE JIT cache is finite, and the caching of one-off regexes
  displaces from the cache regexes which are in fact reused.

Tim's preferred solution (per his review comment on
https://gerrit.wikimedia.org/r/167530/) is to use fixed strip markers.
So:

* Replace usage of $parser->mUniqPrefix with Parser::MARKER_PREFIX, which
  complements the existing Parser::MARKER_SUFFIX.
* Deprecate Parser::mUniqPrefix and its accessor, Parser::uniqPrefix().
* Deprecate Parser::getRandomString(), since it is no longer useful.
* In Preprocessor_*:preprocessToObj() and Parser::fetchTemplateAndTitle,
  replace any occurences of \x7f with '?', to prevent strip marker forgery.
  \x7f is not valid input anyway.
* Deprecate the $prefix parameter for StripState::__construct, since a custom
  prefix may no longer be specified.

Change-Id: I31d4556bbb07acb72c33fda335fa5a230379a03f
2015-05-31 19:33:36 -07:00
Jackmcbarn
62c3fe221f Allow running code during unstrip
When adding strip markers, allow closures to be passed in place of text.
The closure is then called during unstrip. Also, add a hook that runs
after unstripGeneral. This is needed for Extension:Cite's I0e136f952.

Change-Id: If83b0623671fd67e5ccc9deaaaab456a6679af8f
2015-05-13 02:44:20 +00:00
Chad Horohoe
aa21e125a3 Remove obvious function-level profiling
Xhprof generates this data now. Custom profiling of various
sub-function units are kept.

Calls to profiler represented about 3% of page execution
time on Special:BlankPage (1.5% in/out); after this change
it's down to about 0.98% of page execution time.

Change-Id: Id9a1dc9d8f80bbd52e42226b724a1e1213d07af7
2015-01-07 11:14:24 -08:00
Tim Starling
ce8e466e44 Revert "Use a fixed regex for StripState"
Breaks extensions, doesn't entirely fix the problem it was meant to fix.

This reverts commit 6da3f169ac.

Change-Id: Ic193abcff8c72b0c8b434fcac514f88603a45beb
2014-10-20 21:42:53 +00:00
Tim Starling
6da3f169ac Use a fixed regex for StripState
The JIT compiler in newer versions of PCRE experiences lock contention
when multithreaded applications perform a high rate of concurrent
compilations. We are seeing some performance impact on HHVM under normal
production traffic.

The random part of the strip marker is just there to protect against
deliberate insertion of strip markers into the source text, which is
very rare. So use a generic regex to find strip markers, and check in
the callback whether the random state ID is correct.

StripState::killMarkers() will be slower when it has to remove many
strip markers, but most calls to it will not match any strip markers, so
overall performance should be improved due to reduced JIT compilation.

Bug: 72205
Change-Id: I8d37ae929a8c669c9e39adc8096b89e5732b68d0
2014-10-19 14:38:09 -07:00
addshore
61c989cfc0 Fix phpcs issues in parser
This fixes all issues except for:
 - class names
 - line length

Change-Id: Ie91b010d5b3eec49d3b80b6e93b125a901ef43c6
2014-08-12 01:00:15 +00:00
umherirrender
7f9fd63901 Fixed some @params documentation (includes/parser)
Swapped some "$var type" to "type $var" or added missing types
before the $var. Changed some other types to match the more common
spelling. Makes beginning of some text in captial.
Also added some missing @param.

Change-Id: I49f8f48b521878de7abd9cc40efdeff6cf9a37e0
2014-04-22 01:38:39 +02:00
umherirrender
23fab68274 Fix spacing after @param and friends in comments
Searched for:
\@(param|return|throws|since|deprecated|access|todo|var)[ \t]{2,}

Change-Id: Icce22ba9fe0635455691ca58d9872d618151f346
2014-04-05 20:02:29 +00:00
Antoine Musso
f6b92231fd style: normalize end of files
By PSR2 PHP Standard, the files should ends with exactly one newline.
Some of our files have 2 or more and some other were missing a newline.

Fix almost all occurences of CodeSniffer sniff:
PSR2.Files.EndFileNewline.TooMany

I have not fixed the selenium files, I believe we will drop them.

Change-Id: I89fca8c1786fee94855b7b77bb0f364001ee84b6
2013-02-03 15:04:39 +01:00
umherirrender
85d8ee1f87 Remove a bunch of trailing spaces and unneeded newlines
Change-Id: I00f369641320acd7f087427ef031f3ee7efa0997
2012-10-10 20:14:40 +02:00
Siebrand Mazeland
4e1ccf0267 Replace deprecated wfMsg* calls with Message class calls.
Doing this in steps of roughly 100 changes per commit, so that it remains reviewable.

Change-Id: I4950fdf8be669b52446290768ece0b8df8399d5d
2012-08-20 22:52:17 +02:00
Tim Starling
3905be18fb (bug 35315) Detect circular references in strip tags
Explicitly detect circular references in strip tags and break the loop,
similar to how we deal with circular references in templates. This is
necessary to support Scribunto since we imagine we will provide an API
that allows strip markers to be forged.

The recursion depth limit is a consequence of changing the algorithm
from iterative to recursive, it's required to protect the stack against
deeply nested #tag invocations.

Change-Id: Icc8dc4aedbced55ad75b3b5a5429a376d06d9b31
2012-05-08 14:36:32 +10:00
Alexandre Emsenhuber
0fc8c8e14e Added missing GPLv2 headers in some places.
Also made file/class documentation more consistent.

Change-Id: I10c077f27a2077a266a64048fa137f7b1f8e226c
2012-05-01 09:05:48 +02:00
Tim Starling
13b514edae Fixed a few "strip tag exposed" bugs.
* Introduced Parser::killMarkers() based on the concept from StringFunctions. Used it in cases where markerStripCallback() doesn't make sense semantically, namely grammar, padleft, padright and anchorencode. Used markerStripCallback() in other cases.
* Changed headline unstrip order as suggested by P.Copp on bug 18295
* In CPF::lc() and CPF::uc(), removed the is_callable(). This was a temporary testing hack committed by me in r30109, which allowed me to do differential testing against a copy of the parser from before that revision.
2012-03-20 04:39:09 +00:00
Sam Reed
3f704bbe0b Fix whitespace
Fix/improve documentation
2012-01-10 18:42:59 +00:00
Mark A. Hershberger
4bd08afcb7 Revert r100262 — wasn't the right place for it and other problems,
like broken parser tests.

Please read [[Manual:Pre-commit checklist]] before committing.
2011-10-19 20:01:50 +00:00
Mark A. Hershberger
fdb4ccd5d2 Give a clear error message instead of un-intelligible UNIQ.*QINU
markers.

Not sure the preg_match() is actually needed.  Or it may be
appropriate to use MARKER_SUFFIX for the match.

The error message may also need to be rewritten to be more
user-friendly, but I'm pretty sure *an* error message is friendlier
than UNIQ garbage.  And making them visible error messages makes them
easier to be found.
2011-10-19 19:34:56 +00:00
Tim Starling
d6bae9f79c Fix for bug 31374: reintroduce recursive unstrip as in r27667, somehow omitted during the refactor in r82645. StripState::merge() is still wrong, but it's currently unused on Wikimedia, so this will do as a temporary patch. 2011-10-06 00:07:45 +00:00
Sam Reed
b15737fa83 And even more documentation, the last of this batch 2011-05-28 19:00:01 +00:00
Tim Starling
a20350dd31 * Rewrote StripState to not use ReplacementArray. The memory usage of FSS was excessive when there were many (>10k) strip items. I used preg_replace_callback(), which is slower than strtr() in the simplest case, but much faster than it when the markers have different lengths, which they usually do.
* It was not necessary to preserve the $stripState->general->setPair() interface since it wasn't used by any extensions.
* Moved StripState to its own file.
* Refactored serialiseHalfParsedText() and unserialiseHalfParsedText() so that the bulk of the functionality is in the relevant modules, instead of using scary direct access to object member variables. Made it support the new StripState. It seemed like a lot of work to go to to support an "emergency optimisation" feature in Cite. Cite updates will be in a subsequent commit.
* Fixed spelling of serialiseHalfParsedText() and unserialiseHalfParsedText(), there is unavoidable interface breakage anyway, due to cache object versioning. 
* Moved transparent tags to their own function, as requested in a fixme comment.
* Added documentation for markerSkipCallback().
* Removed OnlyIncludeReplacer, unused since MW 1.12.
2011-02-23 06:58:15 +00:00