Commit graph

753 commits

Author SHA1 Message Date
Timo Tijhof
d06855ecbe Parser: Say tildes instead of ~~~ in comment to fix Doxygen fatal
Doxygen was unable to parse the file past validateSig().

> Parser.php:6397: warning: reached end of file while inside a ~~~ block!
> The command that should end the block seems to be missing!

Change-Id: I3d1b547968302611d2bd78a7c11dd0738b40d23a
2015-04-06 12:32:25 +00:00
Max Semenik
08762b02de Minor cleanups
* Declare undeclared variables
* Kill unused variables
* Fix comments including PHPDoc

Change-Id: I60015f6b6740aa9088bda3745f4dc4e65e29fcb1
2015-04-02 16:22:42 +00:00
Aaron Schulz
ac0de3c430 Fixed {{REVISION(TIMESTAMP|USER|SIZE)}} on new revisions
* This makes use of the injected new revision object used elsewhere
  in Parser to solve this problem.

Bug: T94407
Change-Id: I7881583cf7cb2bc799c89ffaa2a344a2d4ca3a4e
2015-03-30 21:10:09 -07:00
Chad Horohoe
c33f4de066 Profile all external HTTP requests from MW
Change-Id: Ie980b080da2ef21ec7d9fc32f1accc55710de140
2015-03-03 20:54:30 -08:00
Jackmcbarn
cab99af90e Fix TOC anchor name collisions in edge cases
Currently, the parser adds a "_2" to the second of two identical headlines to
avoid collisions, but there's still a collision if another headline actually
ends in "_2". This change causes the new headline to also be checked for a
collision, and advances to "_3" or beyond if there is one.
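The collision check described above can be sketched in Python for illustration. This is not the Parser's PHP code: the real implementation tracks headline counts differently, and the function name here is hypothetical. The key point is that each suffixed candidate is checked again before it is used.

```python
def assign_anchors(headlines):
    """Return a collision-free anchor for each headline, in order.

    Hypothetical helper: every suffixed candidate is itself checked
    against the anchors already handed out, so a literal "Foo_2"
    headline pushes a second "Foo" on to "Foo_3".
    """
    used = set()
    anchors = []
    for base in headlines:
        candidate = base
        suffix = 2
        # Advance the suffix until the candidate is genuinely unused.
        while candidate in used:
            candidate = f"{base}_{suffix}"
            suffix += 1
        used.add(candidate)
        anchors.append(candidate)
    return anchors


print(assign_anchors(["Foo", "Foo_2", "Foo"]))  # ['Foo', 'Foo_2', 'Foo_3']
```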

Bug: T26787
Change-Id: Id0a55aa4c1917bac2f8f0d4863fcb85bd3dff1ca
2015-02-17 20:59:33 +00:00
Timo Tijhof
d62a2b76b1 Replace dev.w3.org with more permanent or stable urls
* Sanitizer: dev.w3.org/html5/spec-preview
  Follows-up 8e8b15afc6.
  Use a stable reference to www.w3.org/TR/html5 (currently
  from October 2014) instead of an old preview branch from 2012.

* parserTests: dev.w3.org/html5
  Follows-up 959aa336a1.
  URL is now a dead end. Replaced with a link to a draft from around
  that time. The relevant section no longer exists in the current
  spec as it got split off into a separate spec. Maybe this one:
  https://url.spec.whatwg.org/#percent-encoded-bytes

* Parser, HTMLIntField: dev.w3.org/html5
  Use a stable reference to www.w3.org/TR/html5 instead.

* HTMLFloatField.php: dev.w3.org/html5
  URL is now a dead end. Draft from around that time:
  http://www.w3.org/TR/2011/WD-html5-20110525/common-microsyntaxes.html#real-numbers
  The section "Real numbers" no longer exists in the current spec,
  but the Infrastructure chapter has a section on floating point
  numbers that describes the same sequence now.

Change-Id: I7dcd49b6cd39785fb1b294e4eeaf39bda52337b2
2015-02-14 14:21:33 +00:00
Sam Reed
f41e2ddb6a Don't split regex string unnecessarily
Change-Id: Id5912e64916ce5c7be2991478c32531596917540
2015-01-28 16:17:41 +00:00
Aaron Schulz
6921770414 Updated some try-catch statements: MWException -> Exception
Change-Id: I76601a86e30f4984e3b1a8c8ec5ef5a0f652433a
2015-01-09 17:20:22 -08:00
Ricordisamoa
2ae155da52 Fix phpcs errors in includes/
Mostly Squiz.WhiteSpace.SuperfluousWhitespace.EmptyLines

Change-Id: I678b2f0902f11cd1dfa1611b9da24e7237df9122
2015-01-08 20:15:07 +01:00
Aaron Schulz
4ff8136807 Removed remaining profile calls
Change-Id: I31c81c78715048004fc8fca0f27d09c1fa71c118
2015-01-08 02:49:33 -08:00
Chad Horohoe
aa21e125a3 Remove obvious function-level profiling
Xhprof generates this data now. Custom profiling of various
sub-function units is kept.

Calls to the profiler represented about 3% of page execution
time on Special:BlankPage (1.5% in/out); after this change
it's down to about 0.98% of page execution time.

Change-Id: Id9a1dc9d8f80bbd52e42226b724a1e1213d07af7
2015-01-07 11:14:24 -08:00
Amir E. Aharoni
144d741196 Shorten lines to pass phpcs test
Change-Id: I5588e1f16f1a23d77160cd180058bd2000a93ab6
2014-12-29 17:14:08 +02:00
Derk-Jan Hartman
e20e64eb6b Parser: Add <bdi> to the whitelist for TOC links
Bug: 72884
Change-Id: Id5aa9a4eb32fb185881141e55de700ae36f806c5
2014-12-27 21:24:42 +01:00
C. Scott Ananian
54a8199f87 Don't allow embedded newlines in magic links, but do allow &nbsp;
This continues the work started in T67278 to make magic link parsing
more consistent with wiki text parsing in general, and closes two
long-standing bugs.

Bug: T30950
Bug: T31025
Change-Id: I71f8b337543163569c64bbfdec154eb9b69d7264
2014-12-22 04:14:55 +00:00
C. Scott Ananian
b975a0bfe0 Don't break autolinks by stripping the final semicolon from an entity.
Autolinking free external links is clever about making sure that trailing
punctuation isn't included in the link.  But if an HTML entity happens to
terminate the URL, the semicolon from the entity is stripped from the URL,
breaking it.

Fix this corner case.  This also unifies autolink parsing with Parsoid.
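A minimal Python sketch of the corner case follows. Trailing punctuation is trimmed, but a final semicolon that closes an HTML entity is kept. The punctuation set and the entity pattern are assumptions for illustration, not the Parser's actual regexes.

```python
import re

# Assumed set of trailing punctuation to drop from a free external link.
TRAILING_PUNCT = ",;.:!?"
# An HTML entity (named, decimal or hex) sitting at the very end of the string.
ENTITY_AT_END = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);\Z")


def trim_trailing_punct(url):
    """Strip trailing punctuation, but never the ';' that closes an entity."""
    while url and url[-1] in TRAILING_PUNCT:
        if url[-1] == ";" and ENTITY_AT_END.search(url):
            break  # the semicolon belongs to an entity such as &amp;
        url = url[:-1]
    return url
```

The sketch does not check that an entity name is valid, so something like `&b;` at the end of a URL would also be preserved.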

See: I5ae8435322c78dd1df170d7a3543fff3642759b1
Change-Id: I5482782c25e12283030b0fd2150ac55092f7979b
2014-12-18 17:27:55 -05:00
Brad Jorsch
5c1eeb2464 Normalize "\r" newlines in preSaveTransform
The behavior of the different preprocessors differs when given \r or
\r\n newlines. We already normalize the latter here, so may as well do
the former here too.
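When both forms are normalized, the order of the replacements matters. A minimal Python sketch, assuming a plain string-replace approach (not necessarily how preSaveTransform does it):

```python
def normalize_newlines(text):
    """Convert Windows (\\r\\n) and old Mac (\\r) newlines to \\n."""
    # Replace "\r\n" first: converting lone "\r" first would turn
    # every "\r\n" into "\n\n" and double the line breaks.
    return text.replace("\r\n", "\n").replace("\r", "\n")
```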

Bug: T78488
Change-Id: Id6390f64a73ea01088729f25d79103388c1fe7e8
2014-12-15 16:59:42 -05:00
C. Scott Ananian
25d35fc65c Enforce spaces around magic links (RFC, PMID, and ISBN).
Ensure that there is a \b boundary before and after RFC, PMID, and ISBN
links.  (Previously we enforced \b boundaries only before free external
links and after ISBN links.)  Consistency is a good thing!

In addition:
* \b is not a PHP escape sequence, so you don't need to write \\b inside
  a string.
* \b before the numeric part of an ISBN is pointless: by the structure
  of the regexp there will always be a space on the left and a word
  character (a digit) on the right.
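The boundary rule for RFC and PMID links can be illustrated with a simplified Python regex. The pattern is an assumption for illustration, not the Parser's own, and ISBN handling is omitted. Python differs from PHP on the escape point: in a non-raw Python string literal, "\b" is a backspace, so a raw string is needed.

```python
import re

# Raw string: in a non-raw Python literal "\b" would be a backspace,
# whereas in PHP "\b" has no special meaning inside a string.
MAGIC_LINK = re.compile(r"\b(RFC|PMID) +([0-9]+)\b")


def find_magic_links(text):
    """Return (kind, number) pairs for links with a word boundary on both sides."""
    return [(m.group(1), m.group(2)) for m in MAGIC_LINK.finditer(text)]
```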

Bug: 65278
Change-Id: Ic315b988091a5c7530a8285b9249804db72e55db
2014-12-11 03:41:23 +00:00
Aaron Schulz
e369f66d00 Replace wfRunHooks calls with direct Hooks::run calls
* This avoids the overhead of an extra function call

Change-Id: I8ee996f237fd111873ab51965bded3d91e61e4dd
2014-12-10 12:26:59 -08:00
umherirrender
489d793882 Fixed spacing
- Added/removed spaces around parentheses
- Added newline in empty blocks
- Added space after switch/foreach/function
- Use tabs at beginning of line
- Add newline at end of file

Change-Id: I244cdb2c333489e1020931bf4ac5266a87439f0d
2014-12-05 22:28:07 +01:00
jenkins-bot
0b8f48c535 Merge "Use Parser::SFH_NO_HASH/SFH_OBJECT_ARGS class const" 2014-11-27 05:55:07 +00:00
Aaron Schulz
88ad1bd9a7 Cleaned up template profile report tabbing
Change-Id: I46abfc856d718d4db73d0510bde3e2b589341b10
2014-11-18 14:58:02 -08:00
umherirrender
91f26d50ee Use Parser::SFH_NO_HASH/SFH_OBJECT_ARGS class const
Use these instead of the global constants.
Add a hint to Defines that the global constants should not be used.

Change-Id: I3e1dcf46fe18a97a05e3406c209815adb7e0e083
2014-11-18 21:19:22 +01:00
Chad Horohoe
b8d93fb4fd Refactor profiling output from profiling
* Added a standard getFunctionStats() method for Profilers to return
  per-function data as maps. Unlike getRawData(), this is not toolbar-specific.
* Cleaned up the interface of SectionProfiler::getFunctionStats() a bit.
* Removed unused cpu_sq, real_sq fields from profiler UDP output.
* Moved getTime/getInitialTime to ProfilerStandard.

Co-Authored-By: Aaron Schulz <aschulz@wikimedia.org>
Change-Id: I266ed82031a434465f64896eb327f3872fdf1db1
2014-11-17 19:26:04 -07:00
Aaron Schulz
0bfa6b6264 Move request-only template profiling to an always-on parser report
Change-Id: I0660c8d6cac0dadab648eac9736504b7939320f3
2014-11-12 18:06:00 -08:00
Reedy
aa5c2493cb Remove documentation hinting LinkHolderArray::replace() should return value
Return value not used in any code in our repo

Removes FIXME too

Change-Id: Ia2ec35099f0b54ea39c2f6b9371e94c3034bddb0
2014-11-10 18:32:57 +00:00
MZMcBride
627ccbcd7b Minor code comment tweaks for spelling and consistency
Change-Id: I51391f45d0f81e4245ccc0e435a71ccd5b0e3ca3
2014-11-08 14:07:19 -05:00
Bartosz Dziewoński
565e9fa077 Correctly parse <indicator/> contents, Parser rejiggering
includes/parser/Parser.php
  * Pull out a chunk of code we need to reuse from parse() to
    internalParseHalfParsed(). This is a fully backwards-compatible
    change.

    Code changes:
    * Add a guard for running ParserBeforeTidy and ParserAfterTidy
      hooks, as extensions might not expect them to be called for
      snippets, only full page content.
    * Change $options to $this->mOptions.

    The bulk of parsing work is now done in internalParse() and
    internalParseHalfParsed(); parse() only handles four things:
    * Resetting parser state when a parse starts/finishes
    * Page title language conversion
    * Outputting limit report and limitation warnings
    * Running ParserAfterParse hook (dunno why, but it's documented)

  * Expand documentation for recursiveTagParse(), with some uppercase
    warnings so that no one does the stupid thing I did ever again.

  * Add new public method recursiveTagParseFully(), which is a
    recursive parser entry point that produces fully parsed HTML ready
    for inclusion in HTML output. Compared to Parser::parse(), it
    doesn't produce limit reports and doesn't run the ParserAfterParse
    hook.

includes/parser/CoreTagHooks.php
  * Use the new recursiveTagParseFully() method.
  * Use Parser::stripOuterParagraph() to remove silly tags.

Bug: 72887
Change-Id: I89ae9a50b82245f9a9e4a903563aeb1c51b6103e
2014-11-04 10:25:58 +00:00
Reedy
8e6fa108b8 or -> ||
Change-Id: Ic591f06f70c68bb2912b7f028f7f988eb658375d
2014-10-24 11:26:14 +01:00
Chad Horohoe
6c30fff0ba Swap and for &&
Change-Id: I7821a62586cc2d2f929fb3d7d5046958a70efbd0
2014-10-23 13:03:14 -07:00
Tim Starling
ce8e466e44 Revert "Use a fixed regex for StripState"
Breaks extensions, doesn't entirely fix the problem it was meant to fix.

This reverts commit 6da3f169ac.

Change-Id: Ic193abcff8c72b0c8b434fcac514f88603a45beb
2014-10-20 21:42:53 +00:00
jenkins-bot
cf93d76c03 Merge "Remove hitcounters and associated code" 2014-10-20 21:12:54 +00:00
Chad Horohoe
90d90dad6e Remove hitcounters and associated code
The hitcounter implementation in MediaWiki is flawed
and needs removal. For proper metrics, it is suggested to use
something like Piwik or Google Analytics.

RFC: https://www.mediawiki.org/wiki/Requests_for_comment/Removing_hit_counters_from_MediaWiki_core
Change-Id: I0e5006a7e8a09c800f8fa4effa9399e8afdd7a57
2014-10-20 13:01:55 -07:00
Tim Starling
6da3f169ac Use a fixed regex for StripState
The JIT compiler in newer versions of PCRE experiences lock contention
when multithreaded applications perform a high rate of concurrent
compilations. We are seeing some performance impact on HHVM under normal
production traffic.

The random part of the strip marker is just there to protect against
deliberate insertion of strip markers into the source text, which is
very rare. So use a generic regex to find strip markers, and check in
the callback whether the random state ID is correct.

StripState::killMarkers() will be slower when it has to remove many
strip markers, but most calls to it will not match any strip markers, so
overall performance should be improved due to reduced JIT compilation.
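A Python sketch of the approach: one fixed pattern is compiled once, and the random state ID is checked in the replacement callback. The marker layout and class name are hypothetical and do not reproduce MediaWiki's actual StripState format. In Python the benefit is simply avoiding a per-ID compile, since there is no PCRE JIT lock contention to avoid. This change was later reverted in ce8e466e44.

```python
import re

# Hypothetical marker format for illustration: "\x7fUNIQ<state id>-<n>-QINU\x7f".
# One generic pattern, compiled once, whatever the per-state random ID is.
GENERIC_MARKER = re.compile(r"\x7fUNIQ([0-9a-f]+)-([0-9]+)-QINU\x7f")


class StripStateSketch:
    def __init__(self, state_id):
        self.state_id = state_id  # random hex ID, unique to this state
        self.items = {}

    def add(self, n, html):
        """Store html under key n and return the marker to put in the text."""
        self.items[str(n)] = html
        return f"\x7fUNIQ{self.state_id}-{n}-QINU\x7f"

    def unstrip(self, text):
        def replace(m):
            # The random ID is checked here rather than baked into the
            # regex; markers with a foreign ID are left untouched.
            if m.group(1) != self.state_id or m.group(2) not in self.items:
                return m.group(0)
            return self.items[m.group(2)]

        return GENERIC_MARKER.sub(replace, text)
```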

Bug: 72205
Change-Id: I8d37ae929a8c669c9e39adc8096b89e5732b68d0
2014-10-19 14:38:09 -07:00
Gergő Tisza
382d4df858 Move addTrackingCategory from Parser to ParserOutput
addTrackingCategory is more in line with ParserOutput's functionality
(addLink, addCategory etc), and tracking categories are useful even for
content types which do not use the parser at all. There is no reason to
require the caller to obtain a Parser object just to be able to add
tracking categories.

Change-Id: I89d9ea1db3a4e6486e77eee940bd438f7753b776
2014-09-28 23:35:52 +00:00
jenkins-bot
0755177e64 Merge "Add parser callback to get a page's current revision" 2014-09-25 22:52:10 +00:00
Brad Jorsch
8eeb906f93 Break accidental references in Parser::__clone
If you have a reference *to* an object field (anywhere in the call
stack) when you clone the object, the field will be cloned as a
reference rather than as a value.

So we have to break those unexpected references in the cloned object
manually, which is easy enough by making a non-reference copy and then
rebinding the cloned object's reference to this copy.

Bug: 56226
Change-Id: I9c600e9c0845b4fde0366126ce3809d74e2240b4
2014-09-22 13:44:49 -04:00
Jackmcbarn
edc9f2acd9 Add parser callback to get a page's current revision
Add Parser::fetchCurrentRevisionOfTitle(). By default, this just calls
Revision::newFromTitle, but a callback can be set in ParserOptions that
will override it. Anything that runs as part of a parse should use this
wherever possible.
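The default-plus-override pattern can be sketched in Python. Every name here is hypothetical, and the real PHP method deals in Revision objects rather than strings. Code running during a parse calls a single entry point, so a caller can substitute its own lookup through the options without touching that code.

```python
from typing import Callable, Optional


class ParserOptionsSketch:
    """Hypothetical stand-in for ParserOptions holding an optional override."""

    def __init__(self, current_revision_callback: Optional[Callable[[str], str]] = None):
        self.current_revision_callback = current_revision_callback


def default_fetch(title: str) -> str:
    # Stand-in for Revision::newFromTitle(); returns a placeholder string here.
    return f"latest revision of {title}"


def fetch_current_revision_of_title(options: ParserOptionsSketch, title: str) -> str:
    """Use the callback from the options if one is set, else the default lookup."""
    callback = options.current_revision_callback or default_fetch
    return callback(title)
```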

Bug: 70495
Change-Id: I521f1f68ad819cf0f37e63240806f10c1cceef9c
2014-09-19 11:59:58 -04:00
Brad Jorsch
e2c9d4dfa9 Improve/rename Parser::replaceUnusualEscapes
The previous implementation would unescape '&', '=', '+', and '%'. The
first three will break the URL when unescaped in the query string, and
the last will break when unescaped anywhere.

The code is now changed to treat the path, query, and fragment parts of
the URL separately when unescaping. We also escape any unsafe characters
and ensure all percent-encodings use uppercase hexits.

And since the old name is no longer accurate,
Parser::replaceUnusualEscapes is deprecated in favor of
Parser::normalizeLinkUrl.
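Below is a rough Python sketch of part-by-part unescaping. The character lists and function names are assumptions for illustration, not the Parser's own. Escaping raw unsafe characters, which the commit also does, is not shown. '%' always stays escaped; '&', '=' and '+' additionally stay escaped in the query; every escape that is kept gets uppercase hex digits.

```python
import re

# Assumed character lists for illustration; the real Parser's lists may differ.
ALWAYS_KEEP = set('%#"<>[\\]^`{|}')  # '%', '#' and characters unsafe in a URL
PART_KEEP = {
    "path": set("/?"),      # decoding would change the URL's structure
    "query": set("&=+"),    # decoding would change how the query is split
    "fragment": set(),
}


def _normalize_part(part, kind):
    keep = ALWAYS_KEEP | PART_KEEP[kind]

    def replace(m):
        ch = chr(int(m.group(1), 16))
        if "!" <= ch <= "~" and ch not in keep:
            return ch  # printable ASCII that is harmless here: unescape it
        return "%" + m.group(1).upper()  # keep escaped, with uppercase hex digits

    return re.sub(r"%([0-9A-Fa-f]{2})", replace, part)


def normalize_link_url_sketch(url):
    """Unescape %XX sequences part by part (path, query, fragment)."""
    rest, has_fragment, fragment = url.partition("#")
    path, has_query, query = rest.partition("?")
    out = _normalize_part(path, "path")
    if has_query:
        out += "?" + _normalize_part(query, "query")
    if has_fragment:
        out += "#" + _normalize_part(fragment, "fragment")
    return out
```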

Bug: 57909
Change-Id: I77dc308d0d016c395ad737c08cf10a7711e25bbd
2014-09-16 23:00:16 +00:00
umherirrender
896f835ea9 Refactor: Use local variables for editsections in Parser
In Parser.php, an array was built and then its elements were used;
replaced this with local variables.

In ParserOutput.php also use local vars to make the code more readable.
Also inlined a private callback by using an anonymous function.

Change-Id: I1c31c9e4855f93a8fb65e1c21faba46fcdcb1f4b
2014-09-05 13:33:05 +00:00
This, that and the other
fb7e8b876a Fix URL protocol detection regex for file link= parameter
This regex looked something like /^(?i)bitcoin:|ftp://|ftps://|.../, which
meant the ^ anchor applied only to the first alternative. As a result, any
link= value that happened to contain a URL protocol anywhere within it
(e.g. wikinews:Foo, which contains "news:") was incorrectly matched by this
regex.
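The precedence problem can be reproduced with Python's re module. The protocol list is a small sample rather than the full set of configured URL protocols, and re.IGNORECASE stands in for the inline (?i). Grouping the alternation is what makes ^ apply to every protocol.

```python
import re

protocols = ["bitcoin:", "ftp://", "news:", "http://"]  # small sample
alternation = "|".join(re.escape(p) for p in protocols)

# Buggy: "^" binds only to the first alternative ("bitcoin:").
BROKEN = re.compile("^" + alternation, re.IGNORECASE)
# Fixed: group the alternation so "^" anchors every protocol.
FIXED = re.compile("^(?:" + alternation + ")", re.IGNORECASE)
```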

Bug: 69317
Change-Id: Ide1c4f64137666db99f8e3b6816df01ef5099c8e
2014-08-16 22:09:42 +10:00
addshore
61c989cfc0 Fix phpcs issues in parser
This fixes all issues except for:
 - class names
 - line length

Change-Id: Ie91b010d5b3eec49d3b80b6e93b125a901ef43c6
2014-08-12 01:00:15 +00:00
jenkins-bot
bfc3710111 Merge "Don't include images/categories when behind a local interwiki prefix" 2014-08-09 11:51:07 +00:00
umherirrender
c332e33c2b Doc: Parser::getTargetLanguage cannot return null
Change-Id: I979d3d5010dc3d0ada3d82ca6d9546c5e800aaec
2014-08-08 21:03:46 +02:00
This, that and the other
9883b2471c Don't include images/categories when behind a local interwiki prefix
This solution is somewhat imperfect, as the logic being added here to
MediaWikiTitleCodec really belongs in the parser. However, given the
current state of this code, this is the cleanest possible solution at
the moment.

Modified the existing release note for this.

Bug: 68802
Change-Id: I38309186bdcad23f49e23beb26daaf3ef5bceea1
2014-08-01 18:20:51 +10:00
umherirrender
dd8921c9d9 Cleanup some docs (includes/[m-r])
- Swap "$variable type" to "type $variable"
- Added missing types
- Fixed spacing inside docs
- Capitalized the beginning of @param/@return/@var/@throws descriptions
- Changed some types to match the more common spelling

Change-Id: I8ebfbcea0e2ae2670553822acedde49c1aa7e98d
2014-07-24 19:43:25 +02:00
This, that and the other
e349358a5d No interlanguage links after local interwiki prefixes
This was noticed on enwiki after w: was marked as a local interwiki prefix
there. Links like [[w:de:Foo]] ought to act like [[:de:Foo]], not
[[de:Foo]].

Also adding a number of additional parser tests related to interwiki links.

Bug: 68085
Change-Id: If39af06edb4af2da85c9bcf43df7088181809fcf
2014-07-22 15:01:07 +02:00
umherirrender
de39f3e019 Use some callable hints on @param docs
Callbacks can be given as a string or array, so the hint 'callable' is
used.

Change-Id: I3842606f74c8c3705dffc70bf13e31f44a37fa65
2014-07-03 21:20:35 +02:00
Max Semenik
467f4affd1 New hook, AfterParserFetchFileAndTitle
It is needed for PageImages to collect information about galleries, improving results
for Commons mainspace.

Bug: 66510
Change-Id: I3136d648ef2c1841767db0ab33855cd168e3de3e
2014-07-01 17:40:11 -07:00
Jackmcbarn
c313a75c80 Support {{!}} as a magic word
Add {{!}} as a magic word that expands to a pipe. Parsoid already does
this, so we know it isn't going to cause major breakage.

Change-Id: I1f857417d224d6443504074a5add852df3975b89
2014-06-26 14:56:04 -07:00
jenkins-bot
ddeadfc49b Merge "Prevent OutputPage::addWikiText and friends from causing UNIQ fails" 2014-06-26 09:25:19 +00:00