Commit graph

134 commits

Author SHA1 Message Date
Umherirrender
dd25f3df6b Improve type hints in parser related classes
Change-Id: Ia07a2eb32894f96b195fa3189fb5f617e68f2581
2019-07-05 21:29:32 +00:00
Zoranzoki21
4226fada45 Split parser related files to have one class in one file
Change-Id: I36b26609ccb3f135a22961b32a46cdc06603b3e4
2019-04-27 00:41:47 +00:00
Derick Alangi
fb46f4fc3f parser: Fix return type for methods and match phpdoc comments
Change-Id: I867d7eb6fc56cc52eb8e129977b7a62607a11268
2019-04-12 23:33:47 +00:00
Aryeh Gregor
7b4489e019 Get rid of unnecessary func_get_args() and friends
HHVM does not support variadic arguments with type hints.  This is
mostly not a big problem, because we can just drop the type hint, but
for some reason PHPUnit adds a type hint of "array" when it creates
mocks, so a class with a variadic method can't be mocked (at least in
some cases).  As such, I left alone all the classes that seem like
someone might like to mock them, like Title and User.  If anyone wants
to mock them in the future, they'll have to switch back to
func_get_args().  Some of the changes are definitely safe, like
functions and test classes.

In most cases, func_get_args() (and/or func_get_arg(), func_num_args() )
were only present because the code was written before we required PHP
5.6, and writing them as variadic functions is strictly superior. In
some cases I left them alone, aside from HHVM compatibility:

* Forwarding all arguments to another function. It's useful to keep
  func_get_args() here where we want to keep the list of expected
  arguments and their meanings in the function signature line for
  documentation purposes, but don't want to copy-paste a long line of
  argument names.
* Handling deprecated calling conventions.
* One or two miscellaneous cases where we're basically using the
  arguments individually but want to use them as an array as well for
  some reason.

Change-Id: I066ec95a7beb7c0665146195a08e7cce1222c788
2019-04-12 20:17:01 +00:00
Fomafix
43244db9a2 Use PHP 7 '??' operator instead of if-then-else
Change-Id: If9d4be5d88c8927f63cbb84dfc8181baf62ea3eb
2018-10-21 21:46:46 +02:00
Umherirrender
a4caa4d0c6 build: Updating mediawiki/mediawiki-codesniffer to 22.0.0
Added spaces around .
Removed empty return statement which are not required
Removed return after phpunit markTestIncomplete,
which is throwing to exit the test, no need for a return

Change-Id: I2c80b965ee52ba09949e70ea9e7adfc58a1d89ce
2018-09-16 15:51:11 +00:00
Bartosz Dziewoński
485f66f174 Use PHP 7 '??' operator instead of '?:' with 'isset()' where convenient
Find: /isset\(\s*([^()]+?)\s*\)\s*\?\s*\1\s*:\s*/
Replace with: '\1 ?? '

(Everywhere except includes/PHPVersionCheck.php)
(Then, manually fix some line length and indentation issues)

Then manually reviewed the replacements for cases where confusing
operator precedence would result in incorrect results
(fixing those in I478db046a1cc162c6767003ce45c9b56270f3372).

Change-Id: I33b421c8cb11cdd4ce896488c9ff5313f03a38cf
2018-05-30 18:06:13 -07:00
Kunal Mehta
f093fb4192 Don't globally disable PHPCS's prohibition of assert()
Whitelist the remaining usages of assert(), and reinstate the PHPCS sniff
that forbids usage of it. Add FIXME comments as well, so any casual readers
of the code will not think that the disabling and usage is intentional.

Change-Id: I7cabe715c0e6aa6a9ef3ffe5657f3de7fd8e662b
2018-05-07 17:15:16 +00:00
Arlo Breault
f27c50b580 Clarify -{ => {{ transition
Ensure we have the correct rule on the stack.

Change-Id: Ie814df7b759a2381be0b815eeefdb5d1f7adcde0
2018-03-15 13:04:59 -04:00
Arlo Breault
7c450dff64 Remove "dash" case in preprocessToObj
This was introduced in 2877402 and removed in 186a182

Change-Id: Ibfa1ae1597bfc50ae6ea49402c7966ca042f12e5
2018-03-09 17:46:53 -05:00
Umherirrender
3124a990a2 Use ::class to resolve class names in includes files
This helps to find renamed or misspelled classes earlier.
Phan will check the class names

Change-Id: I07a925c2a9404b0865e8a8703864ded9d14aa769
2018-01-27 20:34:29 +01:00
Umherirrender
255d76f2a1 build: Updating mediawiki/mediawiki-codesniffer to 15.0.0
Clean up use of @codingStandardsIgnore
- @codingStandardsIgnoreFile -> phpcs:ignoreFile
- @codingStandardsIgnoreLine -> phpcs:ignore
- @codingStandardsIgnoreStart -> phpcs:disable
- @codingStandardsIgnoreEnd -> phpcs:enable

For phpcs:disable always the necessary sniffs are provided.
Some start/end pairs are changed to line ignore

Change-Id: I92ef235849bcc349c69e53504e664a155dd162c8
2018-01-01 14:10:16 +01:00
jenkins-bot
1a40e0cc86 Merge "Change php extract() to explicit code" 2017-12-28 09:44:59 +00:00
Huji Lee
e74bfe13f6 Require indentation of CASE statements in PHP code
Bug: T182546
Change-Id: I91a9555893a08e4ec58da97c6cc4d1e70000ff6b
2017-12-10 22:07:50 -05:00
Umherirrender
3d560be428 Change php extract() to explicit code
Avoid php magic and make var settings more visible

Change-Id: I223874fd871104b0ac6a80d7f39c6dd997d0551d
2017-12-08 14:46:33 +01:00
Umherirrender
f739a8f368 Improve some parameter docs
Add missing @return and @param to function docs and fixed some @param

Change-Id: I810727961057cfdcc274428b239af5975c57468d
2017-09-10 20:32:31 +02:00
Umherirrender
3f1a52805e Use short type bool/int in param documentation
Enable the phpcs sniffs for this and used phpcbf

Change-Id: Iaa36687154ddd2bf663b9dd519f5c99409d37925
2017-08-20 13:20:59 +02:00
WMDE-Fisch
6df9ed1ad6 update mediawiki-codesniffer to 0.11.0 and fix issues
- mostly auto fixes
- some too long lines fixed
- ignore amp space in one case  passing by reference

Change-Id: I6472f83bc3cbf4bd629d83050cc3319b19ec465c
2017-08-11 22:27:51 +02:00
Umherirrender
b5cddfb27b Remove empty lines at begin of function, if, foreach, switch
Organize phpcs.xml a bit

Change-Id: Ifb767729b481b4b686e6d6444cf48b1f580cc478
2017-07-01 11:34:16 +00:00
C. Scott Ananian
186a182a15 Protect language converter markup in the preprocessor (take 2).
This revises 2877402276, which was
reverted in master due to unexpected issues with `-{{...}} ` markup
on translatewiki and enwiki.  Test cases are added to ensure that this
is parsed as a template, not as language converter markup.

https://www.mediawiki.org/wiki/Preprocessor_ABNF is the canonical
documentation for the preprocessor; this will be updated after this
patch is merged.  The basic principles described in that page are
maintained in this patch:

* Rightmost opening structure has precedence: `-{{` is parsed as a
dash followed by template opening.

* `{{{` has precedence over `{{` and `-{`: `-{{{{` is parsed as
`-{` `{{{` since we first grab the rightmost `{{{`.

A bunch of test cases were added to verify the "ideal precedence"
order described on that wiki page.

This patch introduced some minor incompatibilities in existing
markup, in particular with chemical formulae in templates.
Fixes for these are being tracked at
https://www.mediawiki.org/wiki/Parsoid/Language_conversion/Preprocessor_fixups

Bug: T146304
Bug: T153761
Change-Id: I2f0c186c75e392c95e1a3d89266cae2586349150
2017-05-23 15:43:49 +01:00
James D. Forrester
9635dda73a includes: Replace implicit Bugzilla bug numbers with Phab ones
It's unreasonable to expect newbies to know that "bug 12345" means "Task T14345"
except where it doesn't, so let's just standardise on the real numbers.

Change-Id: I6f59febaf8fc96e80f8cfc11f4356283f461142a
2017-02-21 18:13:24 +00:00
C. Scott Ananian
046c463635 Revert "Protect language converter markup in the preprocessor."
This effectively reverts commit 2877402276 in
order to unblock the deploy train.  The underlying behavior might not be
incorrect, but it was unexpected.

Bug: T153761
Change-Id: Ifc9c7cf3482dd5d222ff4da24a6d4cc401e9d965
2017-01-03 17:23:28 -05:00
C. Scott Ananian
2877402276 Protect language converter markup in the preprocessor.
This ensures that `{{echo|-{R|foo}-}}` is parsed correctly as
a template invocation with a single argument, not as two separate
arguments split by the `|`.

Bug: T146304
Change-Id: I709d007c70a3fd19264790055042c615999b2f67
2016-12-15 23:50:44 +00:00
Tim Starling
6a04d86149 Remove all assert() calls with string parameters
These fail when HHVM is in RepoAuthoritative mode

Change-Id: Ifb1628f8269b2b651154b740b95cc14163a1b186
2016-08-15 23:11:18 +00:00
Tim Starling
b2f7bb4d76 Preprocessor_Hash: use child arrays instead of linked lists
The singly-linked list data structure of Preprocessor_Hash was causing
stack exhaustion due to the need for a recursion depth proportional to
the number of children of a given PPNode, in serialize() and on
object destruction. So, switch to array-based storage. PPNode_* becomes
a temporary proxy around the underlying storage, which avoids circular
references and keeps the storage very compact. Preprocessor_DOM uses
similar temporary PPNode objects, so the fact that

  $node->getFirstChild() !== $node->getFirstChild()

should not cause any new problems.

* Increment cache version
* Use JSON serialization of the store array instead of serialize(),
  since JSON is more compact, even after gzipping.
* For efficiency, make $accum a plain array, and use it as an array
  where possible, instead of using helper functions.

Performance and memory usage for typical input are slightly improved:
something like 4% faster for the whole parse, and 20% less memory for
the tree.

Bug: T73486
Change-Id: I0d6c162b790d6dc1ddb0352aba6e4753854f4c56
2016-07-22 05:25:11 +00:00
Bartosz Dziewoński
674e8388cb Preprocessor: Don't allow unclosed extension tags (matching until end of input)
(Previously done in f51d0d9a81 and
reverted in 543f46e9c08e0ff8c5e8b4e917fcc045730ef1bc.)

I think it's saner to treat this as invalid syntax, and output the
mismatched tag code verbatim. The current behavior is particularly
annoying for <ref> tags, which often swallow everything afterwards.

This does not affect HTML tags, though. Assuming Tidy is enabled, they
are still auto-closed at the end of the page content. (For tags that
"shadow" a HTML tag name, this results in the tag being treated as a
HTML tag. This currently only affects <pre> tags: if unclosed, they
are still displayed as preformatted text, but without suppressing
wikitext formatting.)

It also does not affect <includeonly>, <noinclude> and <onlyinclude>
tags. Changing this behavior now would be too disruptive to existing
content, and is the reason why previous attempt was reverted. (They
are already special-cased enough that this isn't too weird, for example
mismatched closing tags are hidden.)

Related to T17712 and T58306. I think this brings the PHP parser closer
to Parsoid's interpretation.

It reduces performance somewhat in the worst case, though. Testing with
https://phabricator.wikimedia.org/F3245989 (a 1 MB page starting with
3000 opening tags of 15 different types), parsing time rises from
~0.2 seconds to ~1.1 seconds on my setup. We go from O(N) to O(kN),
where N is bytes of input and k is the number of types of tags present
on the page. Maximum k shouldn't exceed 30 or so in reasonable setups
(depends on installed extensions, it's 20 on English Wikipedia).

Change-Id: Ide8b034e464eefb1b7c9e2a48ed06e21a7f8d434
2016-04-05 12:28:10 -07:00
Thiemo Mättig
6906de45c1 Fix @param and @return types on all PPFrame::getArgument methods
This is about template parameters. They can be indexed by position (int) or
name (string). The returned value is always a string, or false (bool) on
failure.

Change-Id: I565210ad485505281246ef2bb3086a675b905976
2016-03-29 06:12:18 +00:00
umherirrender
74534f9ba6 Fix unmatched @codingStandardsIgnore in parser folder
Fix outstanding phpcs errors

Change-Id: I7b857be88354f2ffa27d76406253ec9e9710b91d
2016-02-17 21:26:30 +01:00
Kunal Mehta
6e9b4f0e9c Convert all array() syntax to []
Per wikitech-l consensus:
 https://lists.wikimedia.org/pipermail/wikitech-l/2016-February/084821.html

Notes:
* Disabled CallTimePassByReference due to false positives (T127163)

Change-Id: I2c8ce713ce6600a0bb7bf67537c87044c7a45c4b
2016-02-17 01:33:00 -08:00
Legoktm
543f46e9c0 Revert "Preprocessor: Don't allow unclosed extension tags (matching until end of input)"
This reverts commit f51d0d9a81.

Breaks templates with non-closed </noinclude> tags, which
were previously acceptable.

Bug: T125754
Change-Id: I8bafb15eefac4e1d3e727c1c84782636d8b82c2b
2016-02-04 00:38:35 +00:00
Bartosz Dziewoński
f51d0d9a81 Preprocessor: Don't allow unclosed extension tags (matching until end of input)
I think it's saner to treat this as invalid syntax, and output the
mismatched tag code verbatim. The current behavior is particularly
annoying for <ref> tags, which often swallow everything afterwards.

This does not affect HTML tags, though. Assuming Tidy is enabled, they
are still auto-closed at the end of the page content.

Related to T17712 and T58306. I think this brings the PHP parser closer
to Parsoid's interpretation.

It reduces performance somewhat in the worst case, though. Testing with
https://phabricator.wikimedia.org/F3245989 (a 1 MB page starting with
3000 opening tags of 15 different types), parsing time rises from
~0.2 seconds to ~1.1 seconds on my setup. We go from O(N) to O(kN),
where N is bytes of input and k is the number of types of tags present
on the page. Maximum k shouldn't exceed 30 or so in reasonable setups
(depends on installed extensions, it's 20 on English Wikipedia).

To consider:
* Should we keep previous behavior for unclosed <includeonly> /
  <noinclude>? This would be particularly disruptive for these if
  someone relied on the old behavior, and they're already
  special-cased in places.
* Unclosed <pre> tags are now treated as HTML tags, and are still
  displayed as preformatted text, but without suppressing wikitext
  formatting.

Change-Id: Ia2f24dbfb3567c4b0778761585e6c0303d11ddd0
2016-01-21 04:22:34 +00:00
umherirrender
54c1e18eec Remove various double empty newlines
The double empty newline is not needed between functions, variable or at
end of file

Change-Id: Ib866a95084c4601ac150a2b402cfa184ebc18afa
2015-12-27 18:55:12 +00:00
Brad Jorsch
2f95b102f5 Fix PPNode_Hash_Tree::getChildrenOfType return value
PPNode defines it as returning an array-type PPNode, not an array.

Change-Id: I9a6c5cea408aae449bfbf808d067837c4337c672
2015-12-17 22:18:14 +00:00
Ori Livneh
447f40d2e9 Move brace matching rules to Preprocessor class
Instead of declaring the array of rules within both Preprocessor_DOM:: and
Preprocessor_Hash::preprocessToXml(), declare it as a protected property of the
parent Preprocessor class.

Change-Id: I6193de66566c164fe85cdd6a88c04fa9c565f1a9
2015-11-03 02:57:05 +00:00
Ori Livneh
1559be9f7a Consolidate common Preprocessor caching code
* Consolidate nearly-identical caching code in Preprocessor_DOM and
  Preprocessor_Hash by making Preprocessor an abstract class rather than an
  interface and by implementing Preprocessor::cacheSetTree() and
  Preprocessor::cacheGetTree().
* Cache trees for wikitext blobs that have length equal or greater to
  PreprocessorCacheThreshold. Previously they needed to be greater than
  PreprocessorCacheThreshold, so this changes the requirement by one character.
  I did it because it seems more natural.
* Modernize the code to use singleton service objects rather than globals.

We spend a lot of time in the Preprocessor, so it would be nice for this code
to be well-factored and clear.

Change-Id: Ib71c29f14a28445a505e12c774a24ad964330b95
2015-10-25 23:06:48 +00:00
umherirrender
c5ab19bf31 Use line comments for @codingStandardsIgnoreStart
In Preprocess_DOM.php and Preprocess_Hash.php the
@codingStandardsIgnoreStart is inside a doc comment, but phpcs does not
see this tag and does not ignore the error. Using line comments fix this
problems.

See
https://integration.wikimedia.org/ci/job/mediawiki-core-phpcs/842/console

Change-Id: Id0edf6edb2902466748165c2e820d2cf4b7fcf75
2015-10-07 20:15:31 +02:00
Vivek Ghaisas
c54766586a Fix issues identified by SpaceBeforeSingleLineComment sniff
Change-Id: I048ccb1fa260e4b7152ca5f09b053defdd72d8f9
2015-09-26 23:06:52 +00:00
Ori Livneh
4595e34ff1 Decline to cache preprocessor items larger than 1 Mb
This is a temporarily workaround for T111289. The data ought not be so large,
but it frequently is, and the problem is almost exclusive to this code path.
For now, just avoid attempting to cache the value if its size exceeds a million
bytes.

Bug: T111289
Change-Id: Idd1acd903193f0753cc5548bd32800705716dd9f
2015-09-02 17:27:28 -07:00
Kevin Israel
20cbd0f226 Make PPFrame::RECOVER_COMMENTS actually work
Because of a missing condition, it generally only had an effect on
output type Parser::OT_WIKI, and thus {{msgnw:}} would strip comments
except when substituted during a pre-save transform.

Bug: T98841
Change-Id: I1e47696434fe87475f9902e6bfb8990566456e2f
2015-08-15 04:12:23 -04:00
umherirrender
70f3afd548 Remove unneeded empty lines at begin of if/else/foreach body
An if body must not begin with an empty line

Change-Id: I62b058be337fcc85a120fcd3dadce564db59a271
2015-06-19 20:05:45 +02:00
Ori Livneh
12571bde26 Use a fixed marker prefix string in the Parser and MWTidy
Generating one-time, unique strip markers hurts us in multiple ways:

* The strip marker regexes don't benefit from JIT compilation, so they are
  slower to execute than they could be.
* Although the regexes don't benefit from JIT compilation, they are still
  compiled, because HHVM bets on regexes getting reused. This extra work is
  fairly costly (1-2% of CPU usage on the app servers) and doesn't pay off.
* The size of the PCRE JIT cache is finite, and the caching of one-off regexes
  displaces from the cache regexes which are in fact reused.

Tim's preferred solution (per his review comment on
https://gerrit.wikimedia.org/r/167530/) is to use fixed strip markers.
So:

* Replace usage of $parser->mUniqPrefix with Parser::MARKER_PREFIX, which
  complements the existing Parser::MARKER_SUFFIX.
* Deprecate Parser::mUniqPrefix and its accessor, Parser::uniqPrefix().
* Deprecate Parser::getRandomString(), since it is no longer useful.
* In Preprocessor_*:preprocessToObj() and Parser::fetchTemplateAndTitle,
  replace any occurences of \x7f with '?', to prevent strip marker forgery.
  \x7f is not valid input anyway.
* Deprecate the $prefix parameter for StripState::__construct, since a custom
  prefix may no longer be specified.

Change-Id: I31d4556bbb07acb72c33fda335fa5a230379a03f
2015-05-31 19:33:36 -07:00
Jackmcbarn
9a805b816d Warn when duplicate arguments are found
Currently, duplicate arguments result in a categorization but not a
warning, and it's often difficult to find where in the template hierarchy
the problem lies. This causes a warning to be provided containing the
calling page's name, the called template's name, and the parameter's name.

Bug: T85352
Change-Id: I26b9a7ed5a2f246d00a49a5f6effe40b4443a9d0
2015-05-28 13:36:50 -04:00
Aaron Schulz
4ff8136807 Removed remaining profile calls
Change-Id: I31c81c78715048004fc8fca0f27d09c1fa71c118
2015-01-08 02:49:33 -08:00
Chad Horohoe
aa21e125a3 Remove obvious function-level profiling
Xhprof generates this data now. Custom profiling of various
sub-function units are kept.

Calls to profiler represented about 3% of page execution
time on Special:BlankPage (1.5% in/out); after this change
it's down to about 0.98% of page execution time.

Change-Id: Id9a1dc9d8f80bbd52e42226b724a1e1213d07af7
2015-01-07 11:14:24 -08:00
Jackmcbarn
05b7a51966 Add a tracking category for duplicate arguments
If a page accidentally duplicates an argument, such as
{{foo|bar=1|bar=2}} or {{foo|bar|1=baz}}, add it to a tracking category.

Bug: 69964
Change-Id: I3b6eeff8b51859bc7af0ea985f6f7528c2e9d220
2014-10-16 04:38:14 +00:00
jenkins-bot
c8aa42bbfb Merge "Fix doc of PPFrame_Hash::cachedExpand" 2014-08-20 06:55:21 +00:00
umherirrender
6b4c44c2db Add missing @param to function docs
Change-Id: Ib26407bc55dff7969d8a3b1e2ae51751b202d8fb
2014-08-18 16:24:59 +00:00
addshore
99d7ca6bfc Add @codingStandardsIgnore tags to parser classes
Change-Id: I16d19de3d2b2461a68030afd3a79aa59c9e948d4
2014-08-12 01:01:11 +00:00
addshore
61c989cfc0 Fix phpcs issues in parser
This fixes all issues except for:
 - class names
 - line length

Change-Id: Ie91b010d5b3eec49d3b80b6e93b125a901ef43c6
2014-08-12 01:00:15 +00:00
umherirrender
b63b9f1e92 Fix doc of PPFrame_Hash::cachedExpand
There is no PPNode_Hash class, use interface PPNode
Preprocessor_Hash does not work with DOMDocument, so remove that

Follow-Up: I621e9075e0f136ac188a4d2f53418b7cc957408d
Change-Id: Ic234d37e15b7fdf274bd55101bb576d2ec49a781
2014-08-11 19:15:33 +00:00