Thijs/wiki.techinc.nl

Author	SHA1	Message	Date
Max Semenik	369b3fa977	Normalize PHPDoc attributes Change-Id: I83e686d099de0ff0aacda7e332972e1c7ee49f04	2018-03-16 22:59:15 -07:00
Tim Starling	f0247e05bd	StripState testing and cleanup * Added StripState unit tests * Deprecated unmaintained "half-parsed" serialization experiment * Renamed some variables for brevity and removed unused "prefix" Change-Id: I838d7ac7f9a2189e13d39c6939dba5d70e74a6b7	2018-03-05 16:43:58 +11:00
Tim Starling	3dfda8c155	Limit total expansion size in StripState and improve limit handling * Add a new limit to the parser which limits the size of the output generated by StripState. The relevant bug shows exponential blowup in output size. * Remove the $prefix parameter from the StripState constructor. Used by no Gerrit-hosted extensions, hard-deprecated since 1.26. * Convert the existing unstrip recursion depth limit to a normal parser limit with limit report row, warning and tracking category. Provide the same features in the new limit. * Add an optional $parser parameter to the StripState constructor so that warnings and tracking categories can be added. Bug: T187833 Change-Id: Ie5f6081177610dc7830de4a0a40705c0c8cb82f1	2018-03-05 05:16:04 +00:00
Tim Starling	ea52f36afc	In StripState use closures instead of temporary member variables The former convention was an awkward workaround for the lack of closures. Change-Id: I8722e168fb9b5e76cf6a937139be728bb3fc3e92	2018-02-28 11:22:43 +11:00
Thiemo Mättig	ef470ebf7f	Remove @param comments that literally repeat what the code says These comments do not add anything. I argue they are worse than having no comments, because I have to read them first to understand they actually don't explain anything. Removing them makes room for actual improvements in the future (if needed). Change-Id: Iee70aad681b3385e9af282d5581c10addbb91ac4	2018-01-10 14:14:26 +01:00
Brian Wolff	939faea318	Require strip marker names to not have & ' " < or > in them This is a little far fetched, but meant as a hardening step. No valid strip marker name should have any of those things in them. If a malicious user managed to somehow control the strip marker name, he could make a strip marker that "spanned" different html contexts. Note: I've checked carefully - it's impossible for a user to control the strip marker name. This is just a hardening step against any future features. For example, if someone could make a strip marker using the marker name "a','b", then they could create an xss by feeding "\x7UNIQfa+QINU\x7f" to charinsert, which will split on + sign, and create output like <a onclick="mw.toolbar.insertTags(&#039\x7FUNIQa','bQIN\X7f... It just seems safer to not allow any of the special characters in strip marker names - especially because there is no need to ever use them, and to my knowledge there is no example of anyone ever actually using such a special character in the marker name. and not recognize either part as a strip marker. Change-Id: I798d31aff4e48b4c6da886530c15867226c953d2	2016-04-26 13:53:26 -04:00
Kunal Mehta	6e9b4f0e9c	Convert all array() syntax to [] Per wikitech-l consensus: https://lists.wikimedia.org/pipermail/wikitech-l/2016-February/084821.html Notes: * Disabled CallTimePassByReference due to false positives (T127163) Change-Id: I2c8ce713ce6600a0bb7bf67537c87044c7a45c4b	2016-02-17 01:33:00 -08:00
Ori Livneh	12571bde26	Use a fixed marker prefix string in the Parser and MWTidy Generating one-time, unique strip markers hurts us in multiple ways: * The strip marker regexes don't benefit from JIT compilation, so they are slower to execute than they could be. * Although the regexes don't benefit from JIT compilation, they are still compiled, because HHVM bets on regexes getting reused. This extra work is fairly costly (1-2% of CPU usage on the app servers) and doesn't pay off. * The size of the PCRE JIT cache is finite, and the caching of one-off regexes displaces from the cache regexes which are in fact reused. Tim's preferred solution (per his review comment on https://gerrit.wikimedia.org/r/167530/) is to use fixed strip markers. So: * Replace usage of $parser->mUniqPrefix with Parser::MARKER_PREFIX, which complements the existing Parser::MARKER_SUFFIX. * Deprecate Parser::mUniqPrefix and its accessor, Parser::uniqPrefix(). * Deprecate Parser::getRandomString(), since it is no longer useful. * In Preprocessor_:preprocessToObj() and Parser::fetchTemplateAndTitle, replace any occurrences of \x7f with '?', to prevent strip marker forgery. \x7f is not valid input anyway. Deprecate the $prefix parameter for StripState::__construct, since a custom prefix may no longer be specified. Change-Id: I31d4556bbb07acb72c33fda335fa5a230379a03f	2015-05-31 19:33:36 -07:00
Jackmcbarn	62c3fe221f	Allow running code during unstrip When adding strip markers, allow closures to be passed in place of text. The closure is then called during unstrip. Also, add a hook that runs after unstripGeneral. This is needed for Extension:Cite's I0e136f952. Change-Id: If83b0623671fd67e5ccc9deaaaab456a6679af8f	2015-05-13 02:44:20 +00:00
Chad Horohoe	aa21e125a3	Remove obvious function-level profiling Xhprof generates this data now. Custom profiling of various sub-function units are kept. Calls to profiler represented about 3% of page execution time on Special:BlankPage (1.5% in/out); after this change it's down to about 0.98% of page execution time. Change-Id: Id9a1dc9d8f80bbd52e42226b724a1e1213d07af7	2015-01-07 11:14:24 -08:00
Tim Starling	ce8e466e44	Revert "Use a fixed regex for StripState" Breaks extensions, doesn't entirely fix the problem it was meant to fix. This reverts commit `6da3f169ac`. Change-Id: Ic193abcff8c72b0c8b434fcac514f88603a45beb	2014-10-20 21:42:53 +00:00
Tim Starling	6da3f169ac	Use a fixed regex for StripState The JIT compiler in newer versions of PCRE experiences lock contention when multithreaded applications perform a high rate of concurrent compilations. We are seeing some performance impact on HHVM under normal production traffic. The random part of the strip marker is just there to protect against deliberate insertion of strip markers into the source text, which is very rare. So use a generic regex to find strip markers, and check in the callback whether the random state ID is correct. StripState::killMarkers() will be slower when it has to remove many strip markers, but most calls to it will not match any strip markers, so overall performance should be improved due to reduced JIT compilation. Bug: 72205 Change-Id: I8d37ae929a8c669c9e39adc8096b89e5732b68d0	2014-10-19 14:38:09 -07:00
addshore	61c989cfc0	Fix phpcs issues in parser This fixes all issues except for: - class names - line length Change-Id: Ie91b010d5b3eec49d3b80b6e93b125a901ef43c6	2014-08-12 01:00:15 +00:00
umherirrender	7f9fd63901	Fixed some @params documentation (includes/parser) Swapped some "$var type" to "type $var" or added missing types before the $var. Changed some other types to match the more common spelling. Capitalized the beginning of some text. Also added some missing @param. Change-Id: I49f8f48b521878de7abd9cc40efdeff6cf9a37e0	2014-04-22 01:38:39 +02:00
umherirrender	23fab68274	Fix spacing after @param and friends in comments Searched for: \@(param\|return\|throws\|since\|deprecated\|access\|todo\|var)[ \t]{2,} Change-Id: Icce22ba9fe0635455691ca58d9872d618151f346	2014-04-05 20:02:29 +00:00
Antoine Musso	f6b92231fd	style: normalize end of files Per the PSR2 PHP standard, files should end with exactly one newline. Some of our files have 2 or more and some others were missing a newline. Fix almost all occurrences of the CodeSniffer sniff: PSR2.Files.EndFileNewline.TooMany I have not fixed the selenium files; I believe we will drop them. Change-Id: I89fca8c1786fee94855b7b77bb0f364001ee84b6	2013-02-03 15:04:39 +01:00
umherirrender	85d8ee1f87	Remove a bunch of trailing spaces and unneeded newlines Change-Id: I00f369641320acd7f087427ef031f3ee7efa0997	2012-10-10 20:14:40 +02:00
Siebrand Mazeland	4e1ccf0267	Replace deprecated wfMsg* calls with Message class calls. Doing this in steps of roughly 100 changes per commit, so that it remains reviewable. Change-Id: I4950fdf8be669b52446290768ece0b8df8399d5d	2012-08-20 22:52:17 +02:00
Tim Starling	3905be18fb	(bug 35315) Detect circular references in strip tags Explicitly detect circular references in strip tags and break the loop, similar to how we deal with circular references in templates. This is necessary to support Scribunto since we imagine we will provide an API that allows strip markers to be forged. The recursion depth limit is a consequence of changing the algorithm from iterative to recursive, it's required to protect the stack against deeply nested #tag invocations. Change-Id: Icc8dc4aedbced55ad75b3b5a5429a376d06d9b31	2012-05-08 14:36:32 +10:00
Alexandre Emsenhuber	0fc8c8e14e	Added missing GPLv2 headers in some places. Also made file/class documentation more consistent. Change-Id: I10c077f27a2077a266a64048fa137f7b1f8e226c	2012-05-01 09:05:48 +02:00
Tim Starling	13b514edae	Fixed a few "strip tag exposed" bugs. * Introduced Parser::killMarkers() based on the concept from StringFunctions. Used it in cases where markerStripCallback() doesn't make sense semantically, namely grammar, padleft, padright and anchorencode. Used markerStripCallback() in other cases. * Changed headline unstrip order as suggested by P.Copp on bug 18295 * In CPF::lc() and CPF::uc(), removed the is_callable(). This was a temporary testing hack committed by me in r30109, which allowed me to do differential testing against a copy of the parser from before that revision.	2012-03-20 04:39:09 +00:00
Sam Reed	3f704bbe0b	Fix whitespace Fix/improve documentation	2012-01-10 18:42:59 +00:00
Mark A. Hershberger	4bd08afcb7	Revert r100262 — wasn't the right place for it and other problems, like broken parser tests. Please read [[Manual:Pre-commit checklist]] before committing.	2011-10-19 20:01:50 +00:00
Mark A. Hershberger	fdb4ccd5d2	Give a clear error message instead of un-intelligible UNIQ.QINU markers. Not sure the preg_match() is actually needed. Or it may be appropriate to use MARKER_SUFFIX for the match. The error message may also need to be rewritten to be more user-friendly, but I'm pretty sure an error message is friendlier than UNIQ garbage. And making them visible error messages makes them easier to be found.	2011-10-19 19:34:56 +00:00
Tim Starling	d6bae9f79c	Fix for bug 31374: reintroduce recursive unstrip as in r27667, somehow omitted during the refactor in r82645. StripState::merge() is still wrong, but it's currently unused on Wikimedia, so this will do as a temporary patch.	2011-10-06 00:07:45 +00:00
Sam Reed	b15737fa83	And even more documentation, the last of this batch	2011-05-28 19:00:01 +00:00
Tim Starling	a20350dd31	* Rewrote StripState to not use ReplacementArray. The memory usage of FSS was excessive when there were many (>10k) strip items. I used preg_replace_callback(), which is slower than strtr() in the simplest case, but much faster than it when the markers have different lengths, which they usually do. * It was not necessary to preserve the $stripState->general->setPair() interface since it wasn't used by any extensions. * Moved StripState to its own file. * Refactored serialiseHalfParsedText() and unserialiseHalfParsedText() so that the bulk of the functionality is in the relevant modules, instead of using scary direct access to object member variables. Made it support the new StripState. It seemed like a lot of work to go to to support an "emergency optimisation" feature in Cite. Cite updates will be in a subsequent commit. * Fixed spelling of serialiseHalfParsedText() and unserialiseHalfParsedText(), there is unavoidable interface breakage anyway, due to cache object versioning. * Moved transparent tags to their own function, as requested in a fixme comment. * Added documentation for markerSkipCallback(). * Removed OnlyIncludeReplacer, unused since MW 1.12.	2011-02-23 06:58:15 +00:00

27 commits
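
Several of the commits above describe the same mechanism from different angles: a fixed marker prefix so one regex can be reused, restricted marker names, closures that run during unstrip, loop detection with a recursion depth limit, and a cap on total expansion size. The sketch below is a toy Python illustration of how those pieces might fit together. It is not MediaWiki's PHP implementation: the class `ToyStripState`, its method names, the simplified marker strings, the default limits, and the bracketed warning texts are all assumptions made for the example.

```python
import re

# Simplified, assumed stand-ins for a fixed marker prefix and suffix.
MARKER_PREFIX = "\x7fUNIQ-"
MARKER_SUFFIX = "-QINU\x7f"

# Characters rejected in marker names (hardening), plus the delimiter itself.
FORBIDDEN_NAME_CHARS = set("&'\"<>\x7f")

# Because the prefix is fixed, a single pattern is compiled once and reused.
MARKER_RE = re.compile(
    re.escape(MARKER_PREFIX) + r"([^\x7f]+)" + re.escape(MARKER_SUFFIX)
)


class ToyStripState:
    """Hypothetical miniature of a strip-marker store (illustration only)."""

    def __init__(self, depth_limit=20, size_limit=5_000_000):
        self.items = {}
        self.depth_limit = depth_limit
        self.size_limit = size_limit
        self.expanded_size = 0

    def add(self, name, value):
        """Store text, or a zero-argument callable, and return its marker."""
        if any(ch in FORBIDDEN_NAME_CHARS for ch in name):
            raise ValueError(f"invalid marker name: {name!r}")
        self.items[name] = value
        return MARKER_PREFIX + name + MARKER_SUFFIX

    def unstrip(self, text):
        """Replace markers in text with their stored content, recursively."""
        self.expanded_size = 0
        return self._unstrip(text, stack=(), depth=0)

    def _unstrip(self, text, stack, depth):
        def replace(match):
            name = match.group(1)
            if name not in self.items:
                return match.group(0)  # unknown marker: leave it untouched
            if name in stack:
                return "[strip marker loop detected]"
            if depth >= self.depth_limit:
                return "[unstrip depth limit exceeded]"
            value = self.items[name]
            if callable(value):
                value = value()  # deferred content is computed at unstrip time
            self.expanded_size += len(value)
            if self.expanded_size > self.size_limit:
                return "[unstrip size limit exceeded]"
            # Stored content may itself contain markers, so recurse.
            return self._unstrip(value, stack + (name,), depth + 1)

        return MARKER_RE.sub(replace, text)
```

A short usage example under the same assumptions:

```python
state = ToyStripState()
inner = state.add("nowiki-0", "<b>raw</b>")
outer = state.add("ref-1", lambda: "[" + inner + "]")
print(state.unstrip("before " + outer + " after"))
```

Tracking the chain of marker names currently being expanded is what lets the sketch break a self-referencing marker instead of recursing until the depth limit, which mirrors the distinction the 2012 commit draws between loop detection and stack protection.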