This reverts commit f51d0d9a81.
Breaks templates with non-closed </noinclude> tags, which
were previously acceptable.
Bug: T125754
Change-Id: I8bafb15eefac4e1d3e727c1c84782636d8b82c2b
I think it's saner to treat this as invalid syntax, and output the
mismatched tag code verbatim. The current behavior is particularly
annoying for <ref> tags, which often swallow everything afterwards.
This does not affect HTML tags, though. Assuming Tidy is enabled, they
are still auto-closed at the end of the page content.
Related to T17712 and T58306. I think this brings the PHP parser closer
to Parsoid's interpretation.
It reduces performance somewhat in the worst case, though. Testing with
https://phabricator.wikimedia.org/F3245989 (a 1 MB page starting with
3000 opening tags of 15 different types), parsing time rises from
~0.2 seconds to ~1.1 seconds on my setup. We go from O(N) to O(kN),
where N is bytes of input and k is the number of types of tags present
on the page. Maximum k shouldn't exceed 30 or so in reasonable setups
(depends on installed extensions, it's 20 on English Wikipedia).
To consider:
* Should we keep previous behavior for unclosed <includeonly> /
<noinclude>? This would be particularly disruptive for these if
someone relied on the old behavior, and they're already
special-cased in places.
* Unclosed <pre> tags are now treated as HTML tags, and are still
displayed as preformatted text, but without suppressing wikitext
formatting.
Change-Id: Ia2f24dbfb3567c4b0778761585e6c0303d11ddd0
Instead of declaring the array of rules within both Preprocessor_DOM:: and
Preprocessor_Hash::preprocessToXml(), declare it as a protected property of the
parent Preprocessor class.
Change-Id: I6193de66566c164fe85cdd6a88c04fa9c565f1a9
* Consolidate nearly-identical caching code in Preprocessor_DOM and
Preprocessor_Hash by making Preprocessor an abstract class rather than an
interface and by implementing Preprocessor::cacheSetTree() and
Preprocessor::cacheGetTree().
* Cache trees for wikitext blobs that have length equal or greater to
PreprocessorCacheThreshold. Previously they needed to be greater than
PreprocessorCacheThreshold, so this changes the requirement by one character.
I did it because it seems more natural.
* Modernize the code to use singleton service objects rather than globals.
We spend a lot of time in the Preprocessor, so it would be nice for this code
to be well-factored and clear.
Change-Id: Ib71c29f14a28445a505e12c774a24ad964330b95
In Preprocess_DOM.php and Preprocess_Hash.php the
@codingStandardsIgnoreStart is inside a doc comment, but phpcs does not
see this tag and does not ignore the error. Using line comments fix this
problems.
See
https://integration.wikimedia.org/ci/job/mediawiki-core-phpcs/842/console
Change-Id: Id0edf6edb2902466748165c2e820d2cf4b7fcf75
This is a temporarily workaround for T111289. The data ought not be so large,
but it frequently is, and the problem is almost exclusive to this code path.
For now, just avoid attempting to cache the value if its size exceeds a million
bytes.
Bug: T111289
Change-Id: Idd1acd903193f0753cc5548bd32800705716dd9f
Because of a missing condition, it generally only had an effect on
output type Parser::OT_WIKI, and thus {{msgnw:}} would strip comments
except when substituted during a pre-save transform.
Bug: T98841
Change-Id: I1e47696434fe87475f9902e6bfb8990566456e2f
Generating one-time, unique strip markers hurts us in multiple ways:
* The strip marker regexes don't benefit from JIT compilation, so they are
slower to execute than they could be.
* Although the regexes don't benefit from JIT compilation, they are still
compiled, because HHVM bets on regexes getting reused. This extra work is
fairly costly (1-2% of CPU usage on the app servers) and doesn't pay off.
* The size of the PCRE JIT cache is finite, and the caching of one-off regexes
displaces from the cache regexes which are in fact reused.
Tim's preferred solution (per his review comment on
https://gerrit.wikimedia.org/r/167530/) is to use fixed strip markers.
So:
* Replace usage of $parser->mUniqPrefix with Parser::MARKER_PREFIX, which
complements the existing Parser::MARKER_SUFFIX.
* Deprecate Parser::mUniqPrefix and its accessor, Parser::uniqPrefix().
* Deprecate Parser::getRandomString(), since it is no longer useful.
* In Preprocessor_*:preprocessToObj() and Parser::fetchTemplateAndTitle,
replace any occurences of \x7f with '?', to prevent strip marker forgery.
\x7f is not valid input anyway.
* Deprecate the $prefix parameter for StripState::__construct, since a custom
prefix may no longer be specified.
Change-Id: I31d4556bbb07acb72c33fda335fa5a230379a03f
Currently, duplicate arguments result in a categorization but not a
warning, and it's often difficult to find where in the template hierarchy
the problem lies. This causes a warning to be provided containing the
calling page's name, the called template's name, and the parameter's name.
Bug: T85352
Change-Id: I26b9a7ed5a2f246d00a49a5f6effe40b4443a9d0
Xhprof generates this data now. Custom profiling of various
sub-function units are kept.
Calls to profiler represented about 3% of page execution
time on Special:BlankPage (1.5% in/out); after this change
it's down to about 0.98% of page execution time.
Change-Id: Id9a1dc9d8f80bbd52e42226b724a1e1213d07af7
If a page accidentally duplicates an argument, such as
{{foo|bar=1|bar=2}} or {{foo|bar|1=baz}}, add it to a tracking category.
Bug: 69964
Change-Id: I3b6eeff8b51859bc7af0ea985f6f7528c2e9d220
There is no PPNode_Hash class, use interface PPNode
Preprocessor_Hash does not work with DOMDocument, so remove that
Follow-Up: I621e9075e0f136ac188a4d2f53418b7cc957408d
Change-Id: Ic234d37e15b7fdf274bd55101bb576d2ec49a781
- Swap "$variable type" to "type $variable"
- Added missing types
- Fixed spacing inside docs
- Makes beginning of @param/@return/@var/@throws in capital
- Changed some types to match the more common spelling
Change-Id: I8ebfbcea0e2ae2670553822acedde49c1aa7e98d
Add PPFrame::NO_TAGS, set by PPFrame::RECOVER_ORIG, to preserve extension
tags rather than expanding them.
Bug: 22683
Change-Id: I427333a20d32eb711a7b5d5ac8b780ef89c752a1
Add functions to frames to control the TTL of their output, and expose
this via expandtemplates in the API.
Bug: 49803
Change-Id: I412febf3469503bf4839fb1ef4dca098a8c79457
Most wikitext is safe to parse once and then cache for when that same
wikitext is used again, such as for multiple transclusions of the same
template within a page. There are occasions, though, where some piece of
wikitext has side effects and so should not be cached; a prominent
example of such wikitext is the <ref> and <references> tags in Cite.php.
This change adds PPFrame::setVolatile so parser hooks such as <ref> and
<references> can indicate that they have done something that should not
be cached, and PPFrame::isVolatile so that callers of PPFrame::expand
can know when to avoid caching.
Bug: 46815
Bug: 31834
Change-Id: I95b3cf8781cf047cdb63da221cef45f3e7d1632e
Remove the parser's global $mTplExpandCache, and replace it with an
alternative that is separated by parent frame. This allows the integrity
of the empty-frame expansion cache to be maintained while also allowing
parent frame access.
A page with 3 copies of
http://ja.wikipedia.org/wiki/%E4%B8%AD%E5%A4%AE%E7%B7%9A_(%E9%9F%93%E5%9B%BD)
has the following statistics: Without this change, there are 4625 cache hits
on this page, and a sample of 3 parses took 16.6, 16.9, and 16.8 seconds.
With this change, there are 2588 cache hits, and a sample of 3 parses took
16.7, 16.7, and 17.0 seconds.
Change-Id: I621e9075e0f136ac188a4d2f53418b7cc957408d
Remaining are the classes containing underscores and possibly a few other
issues that will be addressed soonish.
Change-Id: Icf56374c71afc134420ebbcfecf12dcb29dc9564
Swapped some "$var type" to "type $var" or added missing types
before the $var. Changed some other types to match the more common
spelling. Makes beginning of some text in captial.
Also added some missing @param.
Change-Id: I49f8f48b521878de7abd9cc40efdeff6cf9a37e0
The Line continuation Coding conventions prefers the closing parenthesis
on the same line than the beginning curly braces. This is done for ifs
and functions.
Also move some boolean operator from the end of a line to the beginning
and changed some indentation to make the condition hopefully better
readable.
Change-Id: Id0437b06bde86eb5a75bc59eefa19e7edb624426
We originally allowed only spaces around comments. Now allow tabs as
well. This ought to affect very few pages, but it helps predictability
and to maintain consistency between the PHP preprocessor and parsoid.
Change-Id: Icb3ff6eec08aaa83ae332d03c910c13995c9c9ee
After this patch, 'a', 'b', and 'c' are all treated as members of the
same list in the following wikitext:
*a
<!--x-->
*b
<!--x--> <!--y-->
*c
The old comment-removal rule was "trim a comment which is both
preceded and followed by a newline (ignoring spaces)". This only works
if there is a single comment on the line, and was often surprising
to users. The new rule allows any number of whitespace-separated
comments on the line.
Bug: 41756
Change-Id: I6030086226e1eeece59643c29dbb4361668b4bd6
And added/removed spaces around some other tokens,
like +, -, *, /, <, >, =, !
Fixed windows newline style
Change-Id: I0b9c8c408f3f6bfc0d685a074d7ec468fb848fc8
* Removed spaces around array index
* Removed double spaces or added spaces to begin or end of function
calls, method signature, conditions or foreachs
* Added braces to one-line ifs
* Changed multi line conditions to one line conditions
* Realigned some arrays
Change-Id: Ia04d2a99d663b07101013c2d53b3b2e872fd9cc3
Doxygen expects parameter types to come before the
parameter name in @param tags. Used a quick regex
to switch everything around where possible. This
only fixes cases where a primitve variable (or a
primitive followed by other types) is the variable
type. Other cases will need to be fixed manually.
Change-Id: Ic59fd20856eb0489d70f3469a56ebce0efb3db13
Added/removed spaces around logical/arithmetic operator
Reduced multiple empty lines to one empty line
Removed wrong tabs before comments at end of line
Removed too many spaces in assigments
Change-Id: I2bba4e72f9b5f88c53324d7b70e6042f1aad8f6b
Parser tests also included, test case and original patch supplied by
Bergi on bugzilla. Tested against the current version.
Change-Id: Id7ec4e694783dd0f682f65f39d8b9e59f82e58aa