
126 commits

Author SHA1 Message Date
Tim Starling
6a04d86149 Remove all assert() calls with string parameters
These fail when HHVM is in RepoAuthoritative mode

Change-Id: Ifb1628f8269b2b651154b740b95cc14163a1b186
2016-08-15 23:11:18 +00:00
Bartosz Dziewoński
674e8388cb Preprocessor: Don't allow unclosed extension tags (matching until end of input)
(Previously done in f51d0d9a81 and
reverted in 543f46e9c08e0ff8c5e8b4e917fcc045730ef1bc.)

I think it's saner to treat this as invalid syntax, and output the
mismatched tag code verbatim. The current behavior is particularly
annoying for <ref> tags, which often swallow everything afterwards.

This does not affect HTML tags, though. Assuming Tidy is enabled, they
are still auto-closed at the end of the page content. (For tags that
"shadow" an HTML tag name, this results in the tag being treated as an
HTML tag. This currently only affects <pre> tags: if unclosed, they
are still displayed as preformatted text, but without suppressing
wikitext formatting.)

It also does not affect <includeonly>, <noinclude> and <onlyinclude>
tags. Changing this behavior now would be too disruptive to existing
content, and is the reason why the previous attempt was reverted. (They
are already special-cased enough that this isn't too weird, for example
mismatched closing tags are hidden.)

Related to T17712 and T58306. I think this brings the PHP parser closer
to Parsoid's interpretation.

It reduces performance somewhat in the worst case, though. Testing with
https://phabricator.wikimedia.org/F3245989 (a 1 MB page starting with
3000 opening tags of 15 different types), parsing time rises from
~0.2 seconds to ~1.1 seconds on my setup. We go from O(N) to O(kN),
where N is bytes of input and k is the number of types of tags present
on the page. Maximum k shouldn't exceed 30 or so in reasonable setups
(depends on installed extensions, it's 20 on English Wikipedia).

Change-Id: Ide8b034e464eefb1b7c9e2a48ed06e21a7f8d434
2016-04-05 12:28:10 -07:00
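The O(kN) worst case described in 674e8388cb can be illustrated with a minimal Python sketch. It assumes simplified tag syntax: no attributes, no self-closing tags, and case-sensitive names. Each opening tag triggers a forward search for its closing tag. A per-name memo (`no_more_closing`, a name chosen for this sketch) records names whose closing tag is known to be absent, so each name costs at most one failed O(N) scan. This is an illustration, not the preprocessor's actual code.

```python
import re

def split_extension_tags(text: str, tag_names: set[str]) -> list[tuple[str, str]]:
    """Split text into ("text", ...) and ("ext", ...) chunks.

    An opening tag with no matching closing tag stays literal text instead
    of swallowing everything up to the end of the input.
    """
    if not tag_names:
        return [("text", text)] if text else []
    opener = re.compile("<(" + "|".join(map(re.escape, sorted(tag_names))) + ")>")
    # Tag names whose closing tag is known to be absent from some position on.
    # Any later search starts even further along, so it would fail too.
    no_more_closing: set[str] = set()
    chunks: list[tuple[str, str]] = []
    pos = 0
    while (m := opener.search(text, pos)) is not None:
        name = m.group(1)
        close = f"</{name}>"
        end = -1
        if name not in no_more_closing:
            end = text.find(close, m.end())  # O(N) scan; fails at most once per name
            if end == -1:
                no_more_closing.add(name)
        if end == -1:
            # Unclosed: emit the opening tag verbatim and keep scanning after it.
            chunks.append(("text", text[pos:m.end()]))
            pos = m.end()
        else:
            chunks.append(("text", text[pos:m.start()]))
            chunks.append(("ext", text[m.start():end + len(close)]))
            pos = end + len(close)
    chunks.append(("text", text[pos:]))
    return [chunk for chunk in chunks if chunk[1]]
```

Successful searches cover disjoint stretches of the input. With k tag names, the total scanning therefore stays within O(kN). An unclosed `<ref>` comes back as literal text instead of swallowing the rest of the page.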
Thiemo Mättig
6906de45c1 Fix @param and @return types on all PPFrame::getArgument methods
This is about template parameters. They can be indexed by position (int) or
name (string). The returned value is always a string, or false (bool) on
failure.

Change-Id: I565210ad485505281246ef2bb3086a675b905976
2016-03-29 06:12:18 +00:00
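The contract documented in 6906de45c1 can be shown with a hypothetical Python frame class. Positional parameters are looked up by int and named ones by str, and a missing parameter yields False rather than an exception. The class name and its storage layout are assumptions. Only the int|string in, string|false out contract comes from the commit message.

```python
from typing import Union

class TemplateFrameSketch:
    """Hypothetical stand-in for a template frame's argument storage."""

    def __init__(self, numbered: dict[int, str], named: dict[str, str]) -> None:
        self.numbered = numbered  # positional parameters, keyed from 1
        self.named = named        # named parameters

    def get_argument(self, index: Union[int, str]) -> Union[str, bool]:
        # Positional parameters are indexed by int, named parameters by str.
        if isinstance(index, int):
            value = self.numbered.get(index)
        else:
            value = self.named.get(index)
        # Mirror the "string, or false on failure" return contract.
        return value if value is not None else False
```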
Kunal Mehta
6e9b4f0e9c Convert all array() syntax to []
Per wikitech-l consensus:
 https://lists.wikimedia.org/pipermail/wikitech-l/2016-February/084821.html

Notes:
* Disabled CallTimePassByReference due to false positives (T127163)

Change-Id: I2c8ce713ce6600a0bb7bf67537c87044c7a45c4b
2016-02-17 01:33:00 -08:00
Legoktm
543f46e9c0 Revert "Preprocessor: Don't allow unclosed extension tags (matching until end of input)"
This reverts commit f51d0d9a81.

Breaks templates with non-closed </noinclude> tags, which
were previously acceptable.

Bug: T125754
Change-Id: I8bafb15eefac4e1d3e727c1c84782636d8b82c2b
2016-02-04 00:38:35 +00:00
Bartosz Dziewoński
f51d0d9a81 Preprocessor: Don't allow unclosed extension tags (matching until end of input)
I think it's saner to treat this as invalid syntax, and output the
mismatched tag code verbatim. The current behavior is particularly
annoying for <ref> tags, which often swallow everything afterwards.

This does not affect HTML tags, though. Assuming Tidy is enabled, they
are still auto-closed at the end of the page content.

Related to T17712 and T58306. I think this brings the PHP parser closer
to Parsoid's interpretation.

It reduces performance somewhat in the worst case, though. Testing with
https://phabricator.wikimedia.org/F3245989 (a 1 MB page starting with
3000 opening tags of 15 different types), parsing time rises from
~0.2 seconds to ~1.1 seconds on my setup. We go from O(N) to O(kN),
where N is bytes of input and k is the number of types of tags present
on the page. Maximum k shouldn't exceed 30 or so in reasonable setups
(depends on installed extensions, it's 20 on English Wikipedia).

To consider:
* Should we keep previous behavior for unclosed <includeonly> /
  <noinclude>? This would be particularly disruptive for these if
  someone relied on the old behavior, and they're already
  special-cased in places.
* Unclosed <pre> tags are now treated as HTML tags, and are still
  displayed as preformatted text, but without suppressing wikitext
  formatting.

Change-Id: Ia2f24dbfb3567c4b0778761585e6c0303d11ddd0
2016-01-21 04:22:34 +00:00
Ori Livneh
447f40d2e9 Move brace matching rules to Preprocessor class
Instead of declaring the array of rules within both Preprocessor_DOM:: and
Preprocessor_Hash::preprocessToXml(), declare it as a protected property of the
parent Preprocessor class.

Change-Id: I6193de66566c164fe85cdd6a88c04fa9c565f1a9
2015-11-03 02:57:05 +00:00
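A rough Python analogue of 447f40d2e9 follows: the rules table lives on the parent class, and both preprocessor subclasses read the same object instead of each declaring a copy. Python has no protected properties, so a plain class attribute stands in. The entries (`end`, `names`, `min`, `max`) reflect my understanding of the rules for `{` and `[`, but treat the exact values as an assumption. All class names are hypothetical.

```python
from typing import Optional

class PreprocessorSketch:
    # Declared once on the parent class and inherited by both subclasses.
    # "names" maps a run length of opening characters to the node type it
    # produces (None: matched, but no node of its own).
    rules = {
        "{": {"end": "}", "names": {2: "template", 3: "tplarg"}, "min": 2, "max": 3},
        "[": {"end": "]", "names": {2: None}, "min": 2, "max": 2},
    }

    def node_name(self, opening: str, count: int) -> Optional[str]:
        # Run lengths with no entry (e.g. a single "{") produce no node.
        return self.rules[opening]["names"].get(count)

class PreprocessorDOMSketch(PreprocessorSketch):
    pass

class PreprocessorHashSketch(PreprocessorSketch):
    pass
```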
Ori Livneh
1559be9f7a Consolidate common Preprocessor caching code
* Consolidate nearly-identical caching code in Preprocessor_DOM and
  Preprocessor_Hash by making Preprocessor an abstract class rather than an
  interface and by implementing Preprocessor::cacheSetTree() and
  Preprocessor::cacheGetTree().
* Cache trees for wikitext blobs whose length is greater than or equal to
  PreprocessorCacheThreshold. Previously they needed to be greater than
  PreprocessorCacheThreshold, so this changes the requirement by one character.
  I did it because it seems more natural.
* Modernize the code to use singleton service objects rather than globals.

We spend a lot of time in the Preprocessor, so it would be nice for this code
to be well-factored and clear.

Change-Id: Ib71c29f14a28445a505e12c774a24ad964330b95
2015-10-25 23:06:48 +00:00
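The consolidated caching in 1559be9f7a can be sketched in Python with an abstract base class. This sketch makes two assumptions: an in-memory dict replaces the real cache service, and a SHA-1 of the flags plus the wikitext serves as the key. The method names loosely follow cacheGetTree()/cacheSetTree(), and `ToyPreprocessor` is hypothetical. The behaviour to note is the threshold: text is cached once its length reaches the threshold (>=), not only when it exceeds it (>).

```python
import hashlib
from abc import ABC, abstractmethod
from typing import Optional

class CachingPreprocessorSketch(ABC):
    """Caching helpers shared by all subclasses via an abstract base."""

    def __init__(self, cache_threshold: Optional[int]) -> None:
        self.cache_threshold = cache_threshold  # None disables caching
        self._cache: dict[str, str] = {}        # stand-in for the real cache service

    def _should_cache(self, text: str) -> bool:
        # Length equal to the threshold now qualifies (previously it had to exceed it).
        return self.cache_threshold is not None and len(text) >= self.cache_threshold

    def _key(self, text: str, flags: int) -> str:
        return hashlib.sha1(f"{flags}:{text}".encode("utf-8")).hexdigest()

    def cache_set_tree(self, text: str, flags: int, tree: str) -> None:
        if self._should_cache(text):
            self._cache[self._key(text, flags)] = tree

    def cache_get_tree(self, text: str, flags: int) -> Optional[str]:
        if not self._should_cache(text):
            return None
        return self._cache.get(self._key(text, flags))

    @abstractmethod
    def build_tree(self, text: str, flags: int) -> str: ...

    def preprocess_to_obj(self, text: str, flags: int = 0) -> str:
        cached = self.cache_get_tree(text, flags)
        if cached is not None:
            return cached
        tree = self.build_tree(text, flags)
        self.cache_set_tree(text, flags, tree)
        return tree

class ToyPreprocessor(CachingPreprocessorSketch):
    def build_tree(self, text: str, flags: int) -> str:
        return f"<root>{text}</root>"  # placeholder for real tree building
```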
umherirrender
c5ab19bf31 Use line comments for @codingStandardsIgnoreStart
In Preprocessor_DOM.php and Preprocessor_Hash.php, the
@codingStandardsIgnoreStart tag is inside a doc comment, but phpcs does not
see it there and does not ignore the error. Using line comments fixes this
problem.

See
https://integration.wikimedia.org/ci/job/mediawiki-core-phpcs/842/console

Change-Id: Id0edf6edb2902466748165c2e820d2cf4b7fcf75
2015-10-07 20:15:31 +02:00
umherirrender
be0d31ace0 Use variable documentation in Preprocessor_DOM.php
Instead of having comments after the variable declarations.

This also avoids mixed tabs and spaces at the beginning of lines.

Change-Id: Iba62430f4413fd52bac1d51f5c5df4cb6479284d
2015-09-28 19:48:33 +02:00
Vivek Ghaisas
c54766586a Fix issues identified by SpaceBeforeSingleLineComment sniff
Change-Id: I048ccb1fa260e4b7152ca5f09b053defdd72d8f9
2015-09-26 23:06:52 +00:00
Kevin Israel
20cbd0f226 Make PPFrame::RECOVER_COMMENTS actually work
Because of a missing condition, it generally only had an effect on
output type Parser::OT_WIKI, and thus {{msgnw:}} would strip comments
except when substituted during a pre-save transform.

Bug: T98841
Change-Id: I1e47696434fe87475f9902e6bfb8990566456e2f
2015-08-15 04:12:23 -04:00
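The bug fixed in 20cbd0f226 concerns the decision of whether an HTML comment survives expansion. Below is a deliberately simplified Python sketch of that decision. The flag values are assumptions, and the "pre" output type and the pre-save-transform path are ignored. The point is that RECOVER_COMMENTS has to override stripping for every output type, not only for OT_WIKI.

```python
STRIP_COMMENTS = 4     # assumed PPFrame flag values
RECOVER_COMMENTS = 16

def expand_comment(comment: str, output_type: str, flags: int) -> str:
    """Simplified: decide whether a comment node survives expansion."""
    wants_strip = output_type == "html" or bool(flags & STRIP_COMMENTS)
    # RECOVER_COMMENTS keeps the comment whatever the output type; without
    # this guard, {{msgnw:}} in HTML output would lose its comments.
    if wants_strip and not (flags & RECOVER_COMMENTS):
        return ""
    return comment
```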
umherirrender
70f3afd548 Remove unneeded empty lines at the beginning of if/else/foreach bodies
An if body must not begin with an empty line

Change-Id: I62b058be337fcc85a120fcd3dadce564db59a271
2015-06-19 20:05:45 +02:00
Kunal Mehta
f6e5079a69 Use mediawiki/at-ease library for suppressing warnings
wfSuppressWarnings() and wfRestoreWarnings() were split out into a
separate library. All usages in core were replaced with the new
functions, and the wf* global functions are marked as deprecated.

Additionally, some uses of @ were replaced due to composer's autoloader
being loaded even earlier.

Ie1234f8c12693408de9b94bf6f84480a90bd4f8e adds the library to
mediawiki/vendor.

Bug: T100923
Change-Id: I5c35079a0a656180852be0ae6b1262d40f6534c4
2015-06-11 18:49:29 +00:00
Ori Livneh
12571bde26 Use a fixed marker prefix string in the Parser and MWTidy
Generating one-time, unique strip markers hurts us in multiple ways:

* The strip marker regexes don't benefit from JIT compilation, so they are
  slower to execute than they could be.
* Although the regexes don't benefit from JIT compilation, they are still
  compiled, because HHVM bets on regexes getting reused. This extra work is
  fairly costly (1-2% of CPU usage on the app servers) and doesn't pay off.
* The size of the PCRE JIT cache is finite, and the caching of one-off regexes
  displaces from the cache regexes which are in fact reused.

Tim's preferred solution (per his review comment on
https://gerrit.wikimedia.org/r/167530/) is to use fixed strip markers.
So:

* Replace usage of $parser->mUniqPrefix with Parser::MARKER_PREFIX, which
  complements the existing Parser::MARKER_SUFFIX.
* Deprecate Parser::mUniqPrefix and its accessor, Parser::uniqPrefix().
* Deprecate Parser::getRandomString(), since it is no longer useful.
* In Preprocessor_*::preprocessToObj() and Parser::fetchTemplateAndTitle,
  replace any occurrences of \x7f with '?', to prevent strip marker forgery.
  \x7f is not valid input anyway.
* Deprecate the $prefix parameter for StripState::__construct, since a custom
  prefix may no longer be specified.

Change-Id: I31d4556bbb07acb72c33fda335fa5a230379a03f
2015-05-31 19:33:36 -07:00
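The idea in 12571bde26 can be sketched in Python. With a fixed prefix and suffix, one strip-marker regex is compiled once and reused for every parse. Replacing \x7f in input then ensures user text cannot contain a forged marker. The constant values below are assumptions modelled on Parser::MARKER_PREFIX and Parser::MARKER_SUFFIX. The `kind-counter` marker body and the `StripStateSketch` class are hypothetical.

```python
import re

# Assumed values, in the spirit of Parser::MARKER_PREFIX / MARKER_SUFFIX.
MARKER_PREFIX = "\x7f'\"`UNIQ-"
MARKER_SUFFIX = "-QINU`\"'\x7f"

# Compiled once and reused for every parse, since the prefix never changes.
MARKER_RE = re.compile(
    re.escape(MARKER_PREFIX) + r"([a-z]+-[0-9a-f]{8})" + re.escape(MARKER_SUFFIX)
)

def sanitize_input(text: str) -> str:
    # \x7f is not valid input, so replacing it means any marker found later
    # was inserted by the parser itself rather than forged in wikitext.
    return text.replace("\x7f", "?")

class StripStateSketch:
    def __init__(self) -> None:
        self.items: dict[str, str] = {}
        self.counter = 0

    def add(self, kind: str, html: str) -> str:
        key = f"{kind}-{self.counter:08x}"
        self.counter += 1
        self.items[key] = html
        return MARKER_PREFIX + key + MARKER_SUFFIX

    def unstrip(self, text: str) -> str:
        # Markers with unknown keys are left untouched.
        return MARKER_RE.sub(lambda m: self.items.get(m.group(1), m.group(0)), text)
```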
Jackmcbarn
9a805b816d Warn when duplicate arguments are found
Currently, duplicate arguments result in a categorization but not a
warning, and it's often difficult to find where in the template hierarchy
the problem lies. This causes a warning to be provided containing the
calling page's name, the called template's name, and the parameter's name.

Bug: T85352
Change-Id: I26b9a7ed5a2f246d00a49a5f6effe40b4443a9d0
2015-05-28 13:36:50 -04:00
Kunal Mehta
13975fe76a Use wikimedia/utfnormal library, add backwards-compatibility layer
This drops support for the custom utf8 normal PHP extension in favor
of the intl extension.

Bug: T90825
Change-Id: Ifbaeb2ef684217cf6187ccc4fb4d303f89608300
2015-03-24 12:59:26 -07:00
Ricordisamoa
2ae155da52 Fix phpcs errors in includes/
Mostly Squiz.WhiteSpace.SuperfluousWhitespace.EmptyLines

Change-Id: I678b2f0902f11cd1dfa1611b9da24e7237df9122
2015-01-08 20:15:07 +01:00
Aaron Schulz
4ff8136807 Removed remaining profile calls
Change-Id: I31c81c78715048004fc8fca0f27d09c1fa71c118
2015-01-08 02:49:33 -08:00
Chad Horohoe
aa21e125a3 Remove obvious function-level profiling
Xhprof generates this data now. Custom profiling of various
sub-function units are kept.

Calls to profiler represented about 3% of page execution
time on Special:BlankPage (1.5% in/out); after this change
it's down to about 0.98% of page execution time.

Change-Id: Id9a1dc9d8f80bbd52e42226b724a1e1213d07af7
2015-01-07 11:14:24 -08:00
Reedy
4d9143c7f5 Add lots of @throws
Change-Id: I09d0c13070f966fcf23d2638d8fc1328279a5995
2014-12-24 13:49:20 +00:00
Jackmcbarn
05b7a51966 Add a tracking category for duplicate arguments
If a page accidentally duplicates an argument, such as
{{foo|bar=1|bar=2}} or {{foo|bar|1=baz}}, add it to a tracking category.

Bug: 69964
Change-Id: I3b6eeff8b51859bc7af0ea985f6f7528c2e9d220
2014-10-16 04:38:14 +00:00
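Both examples from 05b7a51966 can be reproduced with a small Python sketch. It numbers unnamed arguments from 1 and reports any key seen twice. The sketch assumes a naive split on `|` and the first `=`, with no nested templates or links and no value trimming. It also assumes the later value wins. `parse_template_args` is a hypothetical name, and the same duplicate list could feed a warning like the one added in 9a805b816d.

```python
def parse_template_args(call: str) -> tuple[dict[str, str], list[str]]:
    """Return (arguments, duplicated keys) for a simple {{...}} call."""
    if not (call.startswith("{{") and call.endswith("}}")):
        raise ValueError("expected a {{...}} template call")
    parts = call[2:-2].split("|")
    args: dict[str, str] = {}
    duplicates: list[str] = []
    position = 0
    for part in parts[1:]:  # parts[0] is the template name
        if "=" in part:
            key, value = part.split("=", 1)
            key = key.strip()
        else:
            position += 1  # unnamed arguments are numbered from 1
            key, value = str(position), part
        if key in args:
            duplicates.append(key)  # candidate for the tracking category
        args[key] = value  # the later value replaces the earlier one
    return args, duplicates
```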
Brad Jorsch
6a7b192a0a Handle multiple ownerDocuments for args in Preprocessor_DOM
As long as Preprocessor_DOM::newPartNodeArray returns nodes with
different roots when called multiple times, PPFrame_DOM::newChild should
be prepared to receive such.

Bug: 70046
Change-Id: Ie048d8dbd3042f19d934ff0dd8d32b4c46f9f952
2014-08-26 15:39:04 +00:00
umherirrender
6b4c44c2db Add missing @param to function docs
Change-Id: Ib26407bc55dff7969d8a3b1e2ae51751b202d8fb
2014-08-18 16:24:59 +00:00
addshore
99d7ca6bfc Add @codingStandardsIgnore tags to parser classes
Change-Id: I16d19de3d2b2461a68030afd3a79aa59c9e948d4
2014-08-12 01:01:11 +00:00
addshore
61c989cfc0 Fix phpcs issues in parser
This fixes all issues except for:
 - class names
 - line length

Change-Id: Ie91b010d5b3eec49d3b80b6e93b125a901ef43c6
2014-08-12 01:00:15 +00:00
Jackmcbarn
368aa5dc67 Make RECOVER_ORIG preserve extension tags
Add PPFrame::NO_TAGS, set by PPFrame::RECOVER_ORIG, to preserve extension
tags rather than expanding them.

Bug: 22683
Change-Id: I427333a20d32eb711a7b5d5ac8b780ef89c752a1
2014-06-19 18:12:14 +00:00
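PPFrame expansion options are bit flags. "Set by RECOVER_ORIG" therefore means that NO_TAGS is one of the bits OR-ed into the RECOVER_ORIG composite. The numeric values below are what I believe PPFrame uses, but treat them as assumptions. `expand_tag` is a hypothetical helper showing what NO_TAGS changes.

```python
from enum import IntFlag

class PPFrameFlags(IntFlag):
    NO_ARGS = 1
    NO_TEMPLATES = 2
    STRIP_COMMENTS = 4
    NO_IGNORE = 8
    RECOVER_COMMENTS = 16
    NO_TAGS = 32
    # Composite used to reproduce the original wikitext unexpanded.
    RECOVER_ORIG = NO_ARGS | NO_TEMPLATES | NO_IGNORE | RECOVER_COMMENTS | NO_TAGS

def expand_tag(name: str, raw: str, flags: PPFrameFlags) -> str:
    # With NO_TAGS set, the extension tag is preserved verbatim, not expanded.
    if flags & PPFrameFlags.NO_TAGS:
        return raw
    return f"[output of <{name}> hook]"  # placeholder for running the tag hook
```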
Jackmcbarn
18d15fa138 Add PPFrame::getTTL() and setTTL()
Add functions to frames to control the TTL of their output, and expose
this via expandtemplates in the API.

Bug: 49803
Change-Id: I412febf3469503bf4839fb1ef4dca098a8c79457
2014-06-09 20:40:22 +00:00
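Frame TTLs can be sketched in Python as follows. A frame keeps the smallest TTL requested so far. As an assumption about how template frames compose, it also passes each request up to its parent. The outermost frame then reports the shortest TTL found anywhere in the expansion, which is the value a caller such as the expandtemplates API would expose. `FrameSketch` and its attributes are hypothetical names.

```python
from typing import Optional

class FrameSketch:
    def __init__(self, parent: Optional["FrameSketch"] = None) -> None:
        self.parent = parent
        self.ttl: Optional[int] = None  # None: no expiry requested

    def set_ttl(self, ttl: Optional[int]) -> None:
        # Keep the smallest TTL requested so far.
        if ttl is not None and (self.ttl is None or ttl < self.ttl):
            self.ttl = ttl
        if self.parent is not None:
            self.parent.set_ttl(ttl)  # assumed: propagate to enclosing frames

    def get_ttl(self) -> Optional[int]:
        return self.ttl
```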
Brad Jorsch
d18ba4e9df Add PPFrame::isVolatile and PPFrame::setVolatile
Most wikitext is safe to parse once and then cache for when that same
wikitext is used again, such as for multiple transclusions of the same
template within a page. There are occasions, though, where some piece of
wikitext has side effects and so should not be cached; a prominent
example of such wikitext is the <ref> and <references> tags in Cite.php.

This change adds PPFrame::setVolatile so parser hooks such as <ref> and
<references> can indicate that they have done something that should not
be cached, and PPFrame::isVolatile so that callers of PPFrame::expand
can know when to avoid caching.

Bug: 46815
Bug: 31834
Change-Id: I95b3cf8781cf047cdb63da221cef45f3e7d1632e
2014-05-30 14:07:06 -04:00
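The calling pattern from d18ba4e9df can be sketched in Python. A caller expands some wikitext, then checks the volatile flag before storing the result for reuse. The cache dict, `expand_with_cache` and the `expand` callback signature are hypothetical. Only the setVolatile/isVolatile idea comes from the commit.

```python
from typing import Callable

class VolatileFrameSketch:
    def __init__(self) -> None:
        self.volatile = False

    def set_volatile(self, flag: bool = True) -> None:
        self.volatile = flag

    def is_volatile(self) -> bool:
        return self.volatile

def expand_with_cache(cache: dict[str, str], key: str,
                      expand: Callable[[VolatileFrameSketch], str]) -> str:
    if key in cache:
        return cache[key]
    frame = VolatileFrameSketch()
    result = expand(frame)
    # A hook such as <ref> marks the frame volatile because its side effects
    # (e.g. numbering references) must happen on every expansion.
    if not frame.is_volatile():
        cache[key] = result
    return result
```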
Jackmcbarn
2094e578b4 Restrict empty-frame cache entries to their parent
Remove the parser's global $mTplExpandCache, and replace it with an
alternative that is separated by parent frame. This allows the integrity
of the empty-frame expansion cache to be maintained while also allowing
parent frame access.

A page with 3 copies of 
http://ja.wikipedia.org/wiki/%E4%B8%AD%E5%A4%AE%E7%B7%9A_(%E9%9F%93%E5%9B%BD) 
has the following statistics: Without this change, there are 4625 cache hits
on this page, and a sample of 3 parses took 16.6, 16.9, and 16.8 seconds.
With this change, there are 2588 cache hits, and a sample of 3 parses took
16.7, 16.7, and 17.0 seconds.

Change-Id: I621e9075e0f136ac188a4d2f53418b7cc957408d
2014-05-30 01:38:15 +00:00
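The change in 2094e578b4 replaces one global expansion cache with a cache per parent frame. A template expanded with no arguments of its own may still read its parent frame's arguments. A result computed under one parent therefore must not be reused under another. Storing the cache on the parent, as below, is an assumption that illustrates the idea. It is not the patch's actual data structure.

```python
from typing import Callable

class ParentFrameSketch:
    def __init__(self, args: dict[str, str]) -> None:
        self.args = args
        # Expansions of argument-less child frames, cached on this parent
        # instead of in one parser-wide dictionary.
        self.empty_child_cache: dict[str, str] = {}

def expand_empty_frame(parent: ParentFrameSketch, title: str,
                       expand: Callable[[ParentFrameSketch], str]) -> str:
    cache = parent.empty_child_cache
    if title not in cache:
        cache[title] = expand(parent)  # the expansion may read parent.args
    return cache[title]
```

With a single cache keyed only by title, the second call in the usage below would have returned the first parent's value. The trade-off matches the commit's numbers: fewer cache hits, with parse times roughly unchanged.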
jenkins-bot
49952a4050 Merge "Make phpcs-strict pass on includes/ (7/7)" 2014-05-19 19:38:53 +00:00
Ori.livneh
df983f6642 Revert "Declare visibility on class properties of includes/parser/"
See https://bugzilla.wikimedia.org/65375#c4

This reverts commit f359cdf614.

Bug: 65375
Change-Id: I12a60b5cc52a07a6deabcbf47c7c99cd2faac3c3
2014-05-16 00:52:24 +00:00
Siebrand Mazeland
a7fbdd6503 Make phpcs-strict pass on includes/ (7/7)
Change-Id: Ia9baaf0b3cdbe1a3c6b50ef8c4fe86fead88f909
2014-05-15 20:07:09 +02:00
Brad Jorsch
ff78abc1a1 Preprocessor_DOM::newPartNodeArray should check that loadXML succeeded
If something manages to get invalid UTF-8 into
Preprocessor_DOM::newPartNodeArray, or anything else that somehow is
invalid XML, it should handle it in the same way that
Preprocessor_DOM::preprocessToObj does rather than having something
further down the line blow up on a PPNode_DOM with a null node.

Bug: 65081
Change-Id: Ic24db455808106e17d49a11e41df33ec170f1206
2014-05-12 03:44:23 +00:00
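The fix in ff78abc1a1 checks that the XML loaded, instead of letting a null node fail later. A Python analogue using `xml.etree.ElementTree` follows. The recovery step (replace invalid UTF-8, retry once, then raise) reflects my understanding of what preprocessToObj does, and is an assumption. `load_part_xml` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

def load_part_xml(xml: bytes) -> ET.Element:
    """Parse preprocessor XML, retrying once after cleaning invalid UTF-8."""
    try:
        return ET.fromstring(xml)
    except ET.ParseError:
        pass
    # Assumed recovery step: replace invalid UTF-8 sequences with U+FFFD and retry.
    cleaned = xml.decode("utf-8", errors="replace").encode("utf-8")
    try:
        return ET.fromstring(cleaned)
    except ET.ParseError as e:
        # Fail here with a clear error instead of returning an unusable node.
        raise ValueError(f"preprocessor produced invalid XML: {e}") from e
```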
Siebrand Mazeland
dfc7416fbe Various documentation updates for includes/parser/
Change-Id: I16dd3a792cc83f8c80b3652d42c055730f6d177a
2014-05-11 18:18:26 +02:00
Siebrand Mazeland
2527cca6de Fix most CodeSniffer issues in includes/parser/
Remaining are the classes containing underscores and possibly a few other
issues that will be addressed soonish.

Change-Id: Icf56374c71afc134420ebbcfecf12dcb29dc9564
2014-05-11 08:44:52 +00:00
Siebrand Mazeland
f359cdf614 Declare visibility on class properties of includes/parser/
Change-Id: If03a9bd5eb83be4d15f54e73f49f42540fb7d5fc
2014-05-11 02:25:00 +02:00
umherirrender
7f9fd63901 Fixed some @params documentation (includes/parser)
Swapped some "$var type" to "type $var" or added missing types
before the $var. Changed some other types to match the more common
spelling. Capitalized the beginning of some text.
Also added some missing @param.

Change-Id: I49f8f48b521878de7abd9cc40efdeff6cf9a37e0
2014-04-22 01:38:39 +02:00
Alexandre Emsenhuber
c29d513deb Put the "else" (or "elseif") on the same line as the previous closing brace
Per https://www.mediawiki.org/wiki/Manual:Coding_conventions#Indenting_and_alignment

Change-Id: I208981db0a866524156bad18cb687f010afeac2c
2014-03-15 13:54:53 +01:00
umherirrender
0bc583af2c Move closing parenthesis from multi line if and function to own line
The "Line continuation" coding convention prefers the closing parenthesis
on the same line as the opening curly brace. This is done for ifs
and functions.
Also moved some boolean operators from the end of a line to the beginning
and changed some indentation to hopefully make the conditions more
readable.

Change-Id: Id0437b06bde86eb5a75bc59eefa19e7edb624426
2013-12-01 21:39:00 +01:00
umherirrender
5dbfd5bf80 Fixed spacing
- Removed trailing spaces in comments
- Removed multiple empty lines
- Removed space after object operator

Change-Id: I9fd3256ab490c7cd2034de3fd94e6be6e6d6d8f2
2013-11-21 18:52:25 +00:00
Platonides
c7ab09b0ff Cleanup Preprocessor_DOM::preprocessToObj wfProfileOut()s
Simplify the multiple if levels used for profiling out.

Change-Id: Id0530207f99daca49a6a76ce256476b677a4108f
2013-09-21 02:32:02 +00:00
umherirrender
24bfde2710 Fix spacing and break some lines
Change-Id: Ia57685d8858e02e399ad5c75ce64d12609d340ac
2013-08-24 17:06:25 +02:00
C. Scott Ananian
35422cadf2 Allow lines empty but for tabs and comments to be ignored.
We originally allowed only spaces around comments.  Now allow tabs as
well.  This ought to affect very few pages, but it improves predictability
and helps maintain consistency between the PHP preprocessor and Parsoid.

Change-Id: Icb3ff6eec08aaa83ae332d03c910c13995c9c9ee
2013-08-13 15:36:57 -04:00
C. Scott Ananian
f089e20bc0 Preprocessor: Don't treat a line containing multiple comments as a blank line.
After this patch, 'a', 'b', and 'c' are all treated as members of the
same list in the following wikitext:

*a
 <!--x-->
*b
 <!--x--> <!--y-->
*c

The old comment-removal rule was "trim a comment which is both
preceded and followed by a newline (ignoring spaces)".  This only works
if there is a single comment on the line, and was often surprising
to users.  The new rule allows any number of whitespace-separated
comments on the line.

Bug: 41756
Change-Id: I6030086226e1eeece59643c29dbb4361668b4bd6
2013-08-09 00:28:28 +00:00
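The rule described in 35422cadf2 and f089e20bc0 can be sketched in Python. A line consisting only of spaces, tabs and one or more comments is removed together with its newline, so the list items on either side stay in one list. The regex is an approximation written for this illustration. It ignores unterminated comments and comment-only lines at the very start or end of the input.

```python
import re

COMMENT = r"<!--(?:(?!-->).)*-->"  # a comment ends at the first "-->"
# A newline followed by a line holding only spaces/tabs and one or more comments.
COMMENT_ONLY_LINE = re.compile(
    r"\n[ \t]*" + COMMENT + r"(?:[ \t]*" + COMMENT + r")*[ \t]*(?=\n)", re.DOTALL
)

def drop_comment_only_lines(text: str) -> str:
    # Removing the preceding newline plus the line's contents joins the
    # neighbouring lines, so surrounding list items stay in one list.
    return COMMENT_ONLY_LINE.sub("", text)
```

Applied to the example above, `*a`, `*b` and `*c` end up on consecutive lines. A line holding a comment followed by other text is left alone.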
Platonides
f2b6f389da Simplify the nested ifs of Preprocessor_DOM::preprocessToObj()
Change-Id: Ibb91068678aca1729f00f1ba7844017771334e94
2013-04-23 13:05:09 +00:00
umherirrender
6c38a5eb72 Fixed spacing in logging/parser/profiler/rl/revdel/search folders
Added spaces before if, foreach
Added some braces for one line statements

Change-Id: I11bbcfa351e945b7bde10c2105d61a3cf5622205
2013-04-20 17:38:24 +02:00
umherirrender
15abcf71ca Added/Removed spaces around string concatenation
And added/removed spaces around some other tokens,
like +, -, *, /, <, >, =, !

Fixed Windows newline style

Change-Id: I0b9c8c408f3f6bfc0d685a074d7ec468fb848fc8
2013-04-13 13:36:24 +02:00
umherirrender
978bb31c5e Add missing wfProfileOut before throwing an exception
Change-Id: I1d830da0597f19efd0b2ae48642389975e736e23
2013-04-08 18:37:24 +00:00
umherirrender
6c278b6d7e fix some spacing
* Removed spaces around array index
* Removed double spaces, or added spaces at the beginning or end of function
  calls, method signatures, conditions, or foreach statements
* Added braces to one-line ifs
* Changed multi line conditions to one line conditions
* Realigned some arrays

Change-Id: Ia04d2a99d663b07101013c2d53b3b2e872fd9cc3
2013-03-25 22:22:46 +00:00