Commit graph

783 commits

Author SHA1 Message Date
Amir E. Aharoni
bd30ccd795 Make lines shorter to pass phpcs in some files under includes/parser
This doesn't fix all the files under includes/parser -
some of them deserve their own patches.

Bug: T102614
Change-Id: I2fcbc19ee337e1b7db4635b5e5f324c651b4d144
2015-09-26 18:19:11 +00:00
jenkins-bot
f244389571 Merge "Load module mediawiki.page.gallery.styles for all ImageGalleries" 2015-09-24 18:22:29 +00:00
umherirrender
27700d276f Load module mediawiki.page.gallery.styles for all ImageGalleries
Move the added module from Parser.php to TraditionalImageGallery,
because there the gallerybox class is added to the html and at the
moment all core image galleries are extending the traditional one.

That brings the styles back for special pages like Special:NewFiles,
Special:MostImages and also on category pages with media files.

Follows Ib1aef04dc4fece78e6615386ecaef6a9f368f49e

Bug: T113511
Change-Id: I32697c2c65824d7622c1840330d6074ebb68b488
2015-09-24 16:46:47 +02:00
jenkins-bot
889283b16e Merge "Add MWTimestamp::getTimezoneString(), use it in file revert message" 2015-09-23 20:53:50 +00:00
gladoscc
90e1b22166 Add MWTimestamp::getTimezoneString(), use it in file revert message
MWTimestamp::getTimezoneString() returns the timezone name as a message,
that supports wiki localization. The code is moved from Parser::pstPass2.

The default file revert message is currently always in UTC.

This patch sets the default timestamp to be in the wiki timezone (similar
to ~~~~). The timezone is passed as a new parameter to the message, with
the date / time parameters being merged and handled by
$wgContentLang->timeanddate

Bug: T36948
Change-Id: I48772f5f3b1635d33b6185776cedfc4ee1882494
2015-09-23 13:38:16 -07:00
C. Scott Ananian
a05971dfc7 Terminate free external link on &nbsp; (and numeric versions of <>)
Bug: T84937
Change-Id: Ic74d8d069e08c0597c7b26755e0d942bf3a510cc
2015-09-23 16:00:52 -04:00
Tim Starling
2c6c954e23 Abstract and refactor Tidy support
* Split tidy implementations into a class hierarchy
* Bring all tidy configuration into a single associative array and
  deprecate the old configuration.
* Remove $wgAlwaysUseTidy

This is preparatory to replacement of Tidy (T89331). I used the name
"Raggett" for things relating to Dave Raggett's Tidy, since if we use
"tidy" to mean the new abstract system as well as Raggett's tidy, it
gets confusing.

Change-Id: I77af1a16cbbb47fc226d05fb9aad56c58e8910b5
2015-09-10 20:18:52 -07:00
Ori Livneh
a7056e3c69 Measure string length once in Parser::replaceVariables
Change-Id: I5b1e3f3fa06cb4e2982f3c0d24222ba2ee59ea47
2015-09-09 23:52:32 +00:00
Kunal Mehta
7a1b87e543 Really actually fix the typo in Parser.php
Change-Id: I9d9c6f13095087ac2c2c6693c6bd1613219bf658
2015-08-28 21:39:15 -07:00
Aaron Schulz
868d11684b Fixed parser report typo
Change-Id: Ia549f4e1932bc1196e840e154b8d6fb0b608d10d
2015-08-28 17:06:45 -07:00
Ori Livneh
26ff3e2946 Add ParserOutput cache and expiry times to NewPP report
The labels are not localized, because I think this ought to be outputted as a
JSON blob, with uniform field names. But not doing that in this patch.

Change-Id: I235839b276632308ddeac7afe763d355b73c2a25
2015-08-27 19:07:35 -07:00
jenkins-bot
d5564b17b5 Merge "Only load gallery styling rules when galleries are on the page" 2015-08-26 22:06:18 +00:00
jdlrobson
c845586dc7 Only load gallery styling rules when galleries are on the page
* Double load styling rules in legacy modules so we have time for
cached pages to catch up
** Double loading styles is acceptable for 30 days. There is no better way.
* Load gallery css when gallery tag invoked.

To test:
* Visit a page with a gallery tag and purge it, note styles are present.
* Visit a page without a gallery tag and purge it, note styles are not present

Bug: T98878
Change-Id: Ib1aef04dc4fece78e6615386ecaef6a9f368f49e
2015-08-26 13:20:15 -07:00
jenkins-bot
8baba70fb4 Merge "Tiny clean up of Parser::doQuotes()" 2015-08-21 21:31:01 +00:00
Pavel Astakhov
78c66e6467 Tiny clean up of Parser::doQuotes()
$firstsingleletterword always is -1 here
because we leave the loop when it's set

Change-Id: I73a430b7ac650bc5919ab95867eec09f723395f2
2015-08-19 21:27:20 +00:00
C. Scott Ananian
87eebf8dd5 Support IPv6 URLs in bracketed and auto links.
The corresponding patch for Parsoid is
Ibb33188cdfe2004e469c3f6ee6f30d34d1923283.

Task: T23261
Change-Id: Iff077bf31168b431febb243e2e62f2c6502616bc
2015-08-18 22:50:58 +00:00
jenkins-bot
9331443546 Merge "Allow to enable OOUI via a parser tag extension" 2015-08-06 08:39:41 +00:00
Bartosz Dziewoński
bd7e02f39f Parser: Don't generate an external link on "http://)" and similar
Bug: T105697
Change-Id: I6cd14b9c4a541af8d0bb50b925aa0b015e97c3fe
2015-08-04 12:23:07 -04:00
Florian
2d50e28975 Allow to enable OOUI via a parser tag extension
This change adds the possibility to enable OOUI from the parser,
which lets parser tag functions easily enable OOUI, if they
need it, for every page view, from within the function that handles
the parser tag.

Bug: T106949
Change-Id: If1e139d4f07be98e418e11470794ea42e8a9b2eb
2015-07-25 17:36:33 +02:00
Arlo Breault
0b4208e645 Allow whitespace between indent and table start tag
* \s matches the trim on the line.

 * Since leading space is ok for table start tags, and you can use them
   in ":" context, you should be able to compose the two together.

Bug: T105238
Change-Id: Id08e24e5dd2bb8ca09453adec87b21225df4a840
2015-07-18 20:41:33 +00:00
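
The rule described in this commit can be sketched as a single line-matching pattern. This is only an illustration in Python: the pattern and the `match_table_start` helper are hypothetical and are not copied from the parser's doTableStuff code.

```python
import re

# Hypothetical pattern: any number of ":" indents, optional whitespace,
# then the table start tag "{|" followed by its attributes.
TABLE_START = re.compile(r"^(:*)\s*\{\|(.*)$")

def match_table_start(line):
    """Return (indent_level, attributes) if the line opens a table, else None."""
    m = TABLE_START.match(line.strip())  # the line is trimmed before matching
    if m is None:
        return None
    return len(m.group(1)), m.group(2).strip()

print(match_table_start(': {| class="wikitable"'))
```

With the `\s*` in place, an indented line such as `: {|` is treated the same way as `:{|`.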
Chad Horohoe
b8ced862bb Protect against non-text output from StripState going into Title::newFromText()
Non-string input shouldn't be fed into newFromText(). We currently handle this
indirectly by relying on Title to do it. Instead, just return earlier and don't
try to construct a title from bad input.

Bug: T102321
Change-Id: I9bc96111378d9d4ed5981bffc6f150cbd0c1e331
2015-07-10 20:05:06 +00:00
Arlo Breault
ba00a957fb Cleanup in doTableStuff
Change-Id: I75c0a943b24f96a30c6ee1efc3f0b11388f892b7
2015-07-09 04:57:52 +00:00
Brad Jorsch
359e77d7c9 Parser: Avoid producing <span></span> in the TOC
If someone renames a section but wants old targeted links to still work,
<span id="old-anchor"></span> is the usual solution. And sometimes
people put it inside the section header markup, like

 == <span id="old-anchor"></span>New name ==

since putting it before makes it be considered part of the previous section
while putting it after causes the browser to scroll the section header
off the screen.

But this has the unfortunate side effect that the TOC text for that
section will be "<span></span>New name". We should strip that useless
empty span.

Bug: T96153
Change-Id: I47a33ceb79d48f6d0c38fa3b3814a378feb5e31e
2015-07-08 17:11:21 +00:00
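
A minimal sketch of the idea, assuming the TOC text is available as an HTML string. The `EMPTY_SPAN` pattern and the `toc_text` helper are hypothetical names chosen for illustration, and treating whitespace-only spans as empty is an assumption; the actual change may be implemented differently.

```python
import re

# A <span> (with or without attributes) that contains nothing but whitespace.
EMPTY_SPAN = re.compile(r"<span\b[^>]*>\s*</span>", re.IGNORECASE)

def toc_text(heading_html):
    """Drop empty spans from the text shown in the table of contents."""
    return EMPTY_SPAN.sub("", heading_html).strip()

print(toc_text("<span></span>New name"))
```

Spans that carry visible content are left alone, so only the anchor-only spans disappear from the TOC entry.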
Bartosz Dziewoński
e688bea6a5 Parser: Correct setHook() documentation
Change-Id: Iaeaac9ea79b696dfa39adb6608ed68edd3754516
2015-06-30 19:02:42 +00:00
umherirrender
70f3afd548 Remove unneeded empty lines at begin of if/else/foreach body
An if body must not begin with an empty line

Change-Id: I62b058be337fcc85a120fcd3dadce564db59a271
2015-06-19 20:05:45 +02:00
Vivek Ghaisas
9f5b6f5aeb Fix whitespace issues around parentheses
Fix issues found by MediaWiki.WhiteSpace.SpaceyParenthesis sniff.

Bug: T102617
Change-Id: Iec7f71e64081659fba373ec20d9d2006306a98f4
2015-06-16 22:14:02 +03:00
jenkins-bot
0e1c80e6e1 Merge "Check result of preg_match_all in Parser.php" 2015-06-02 22:08:42 +00:00
Ori Livneh
12571bde26 Use a fixed marker prefix string in the Parser and MWTidy
Generating one-time, unique strip markers hurts us in multiple ways:

* The strip marker regexes don't benefit from JIT compilation, so they are
  slower to execute than they could be.
* Although the regexes don't benefit from JIT compilation, they are still
  compiled, because HHVM bets on regexes getting reused. This extra work is
  fairly costly (1-2% of CPU usage on the app servers) and doesn't pay off.
* The size of the PCRE JIT cache is finite, and the caching of one-off regexes
  displaces from the cache regexes which are in fact reused.

Tim's preferred solution (per his review comment on
https://gerrit.wikimedia.org/r/167530/) is to use fixed strip markers.
So:

* Replace usage of $parser->mUniqPrefix with Parser::MARKER_PREFIX, which
  complements the existing Parser::MARKER_SUFFIX.
* Deprecate Parser::mUniqPrefix and its accessor, Parser::uniqPrefix().
* Deprecate Parser::getRandomString(), since it is no longer useful.
* In Preprocessor_*:preprocessToObj() and Parser::fetchTemplateAndTitle,
  replace any occurrences of \x7f with '?', to prevent strip marker forgery.
  \x7f is not valid input anyway.
* Deprecate the $prefix parameter for StripState::__construct, since a custom
  prefix may no longer be specified.

Change-Id: I31d4556bbb07acb72c33fda335fa5a230379a03f
2015-05-31 19:33:36 -07:00
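
The combination of fixed markers and \x7f replacement can be sketched as follows. This is an illustration in Python, not the PHP implementation: the marker values, `sanitize`, `add_marker`, and `unstrip` are hypothetical, and the real constants may differ.

```python
import re

# Illustrative fixed markers; both contain DEL (0x7f), which is not valid input.
MARKER_PREFIX = "\x7fUNIQ-"
MARKER_SUFFIX = "-QINU\x7f"

# Because the markers never change, this pattern is built once and reused.
MARKER_RE = re.compile(
    re.escape(MARKER_PREFIX) + r"(\d+)" + re.escape(MARKER_SUFFIX)
)

def sanitize(text):
    """Replace DEL in user input so it can never contain a valid marker."""
    return text.replace("\x7f", "?")

def add_marker(store, content):
    """Stash content and return the marker that stands in for it."""
    store.append(content)
    return f"{MARKER_PREFIX}{len(store) - 1}{MARKER_SUFFIX}"

def unstrip(store, text):
    """Put stashed content back in place of its markers."""
    return MARKER_RE.sub(lambda m: store[int(m.group(1))], text)
```

Since the prefix is no longer random, sanitizing the input is what keeps a user from typing a marker by hand and having it expanded.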
umherirrender
c430850154 Check result of preg_match_all in Parser.php
preg_match_all can return false on failure, which then results in
undefined index access.
Check the result and treat a failure as nothing found by processing an
empty array

Change-Id: I1f11894240dc6869506d68d3513715abdc3abb5d
2015-05-29 05:16:08 +00:00
Jackmcbarn
62c3fe221f Allow running code during unstrip
When adding strip markers, allow closures to be passed in place of text.
The closure is then called during unstrip. Also, add a hook that runs
after unstripGeneral. This is needed for Extension:Cite's I0e136f952.

Change-Id: If83b0623671fd67e5ccc9deaaaab456a6679af8f
2015-05-13 02:44:20 +00:00
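
A rough Python sketch of deferring work until unstrip, under the assumption that a marker may map either to text or to a zero-argument callable. The `StripState` class here is a simplified stand-in with hypothetical method names and marker format, not MediaWiki's class.

```python
import re

MARKER_RE = re.compile(r"\x7fUNIQ-(\d+)-QINU\x7f")  # illustrative marker format

class StripState:
    def __init__(self):
        self._items = []

    def add(self, value):
        """Store text or a zero-argument callable and return its marker."""
        self._items.append(value)
        return f"\x7fUNIQ-{len(self._items) - 1}-QINU\x7f"

    def unstrip(self, text):
        """Replace markers; callables run only now, at unstrip time."""
        def expand(match):
            value = self._items[int(match.group(1))]
            return value() if callable(value) else value
        return MARKER_RE.sub(expand, text)

state = StripState()
refs = []
marker = state.add(lambda: f"{len(refs)} refs")
refs.extend(["a", "b"])       # collected after the marker was created
print(state.unstrip(marker))  # the closure sees the final list
```

The point of the closure is that its output can depend on state gathered after the marker was inserted, which plain text cannot do.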
Timo Tijhof
d06855ecbe Parser: Say tildes instead of ~~~ in comment to fix Doxygen fatal
Doxygen was unable to parse the file past validateSig().

> Parser.php:6397: warning: reached end of file while inside a ~~~ block!
> The command that should end the block seems to be missing!

Change-Id: I3d1b547968302611d2bd78a7c11dd0738b40d23a
2015-04-06 12:32:25 +00:00
Max Semenik
08762b02de Minor cleanups
* Declare undeclared variables
* Kill unused variables
* Fix comments including PHPDoc

Change-Id: I60015f6b6740aa9088bda3745f4dc4e65e29fcb1
2015-04-02 16:22:42 +00:00
Aaron Schulz
ac0de3c430 Fixed {{REVISION(TIMESTAMP|USER|SIZE)}} on new revisions
* This makes use of the injected new revision object used elsewhere
  in Parser to solve this problem.

Bug: T94407
Change-Id: I7881583cf7cb2bc799c89ffaa2a344a2d4ca3a4e
2015-03-30 21:10:09 -07:00
Chad Horohoe
c33f4de066 Profile all external HTTP requests from MW
Change-Id: Ie980b080da2ef21ec7d9fc32f1accc55710de140
2015-03-03 20:54:30 -08:00
Jackmcbarn
cab99af90e Fix TOC anchor name collisions in edge cases
Currently, the parser adds a "_2" to the second of two identical headlines to
avoid collisions, but there's still a collision if another headline actually
ends in "_2". This change causes the new headline to also be checked for a
collision, and advances to "_3" or beyond if there is one.

Bug: T26787
Change-Id: Id0a55aa4c1917bac2f8f0d4863fcb85bd3dff1ca
2015-02-17 20:59:33 +00:00
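
The collision check described above can be sketched like this. It is a simplified Python illustration; `unique_anchor` and the `used` set are hypothetical names, and the real parser code handles more details.

```python
def unique_anchor(anchor, used):
    """Return an anchor that is not in `used` yet, and record it there."""
    candidate = anchor
    n = 2
    # Keep checking the *new* name too, so "Foo_2" cannot collide with a
    # headline that is literally called "Foo_2".
    while candidate in used:
        candidate = f"{anchor}_{n}"
        n += 1
    used.add(candidate)
    return candidate

used = set()
print([unique_anchor(a, used) for a in ["Foo", "Foo_2", "Foo"]])
```

Here the second "Foo" skips "_2", which is already taken by a real headline, and moves on to "_3".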
Timo Tijhof
d62a2b76b1 Replace dev.w3.org with more permanent or stable urls
* Sanitizer: dev.w3.org/html5/spec-preview
  Follows-up 8e8b15afc6.
  Use stable reference to www.w3.org/TR/html5 (currently
  from October 2014) instead of an old preview branch from 2012.

* parserTests: dev.w3.org/html5
  Follows-up 959aa336a1.
  Url is now a dead end. Replaced with link to a draft from around
  that time. The relevant section no longer exists in the current
  spec as it got split off into a separate spec. Maybe this one:
  https://url.spec.whatwg.org/#percent-encoded-bytes

* Parser, HTMLIntField: dev.w3.org/html5
  Use stable reference to www.w3.org/TR/html5 instead.

* HTMLFloatField.php: dev.w3.org/html5
  Url is now a dead end. Draft from around that time:
  http://www.w3.org/TR/2011/WD-html5-20110525/common-microsyntaxes.html#real-numbers
  The section "Real numbers" no longer exists in the current spec,
  but the Infrastructure chapter has a section on floating point
  numbers that describes the same sequence now.

Change-Id: I7dcd49b6cd39785fb1b294e4eeaf39bda52337b2
2015-02-14 14:21:33 +00:00
Sam Reed
f41e2ddb6a Don't split regex string unnecessarily
Change-Id: Id5912e64916ce5c7be2991478c32531596917540
2015-01-28 16:17:41 +00:00
Aaron Schulz
6921770414 Updated some try-catch statements: MWException -> Exception
Change-Id: I76601a86e30f4984e3b1a8c8ec5ef5a0f652433a
2015-01-09 17:20:22 -08:00
Ricordisamoa
2ae155da52 Fix phpcs errors in includes/
Mostly Squiz.WhiteSpace.SuperfluousWhitespace.EmptyLines

Change-Id: I678b2f0902f11cd1dfa1611b9da24e7237df9122
2015-01-08 20:15:07 +01:00
Aaron Schulz
4ff8136807 Removed remaining profile calls
Change-Id: I31c81c78715048004fc8fca0f27d09c1fa71c118
2015-01-08 02:49:33 -08:00
Chad Horohoe
aa21e125a3 Remove obvious function-level profiling
Xhprof generates this data now. Custom profiling of various
sub-function units are kept.

Calls to profiler represented about 3% of page execution
time on Special:BlankPage (1.5% in/out); after this change
it's down to about 0.98% of page execution time.

Change-Id: Id9a1dc9d8f80bbd52e42226b724a1e1213d07af7
2015-01-07 11:14:24 -08:00
Amir E. Aharoni
144d741196 Shorten lines to pass phpcs test
Change-Id: I5588e1f16f1a23d77160cd180058bd2000a93ab6
2014-12-29 17:14:08 +02:00
Derk-Jan Hartman
e20e64eb6b Parser: Add <bdi> to the whitelist for TOC links
Bug: 72884
Change-Id: Id5aa9a4eb32fb185881141e55de700ae36f806c5
2014-12-27 21:24:42 +01:00
C. Scott Ananian
54a8199f87 Don't allow embedded newlines in magic links, but do allow &nbsp;
This continues the work started in T67278 to make magic link parsing
more consistent with wiki text parsing in general, and closes two
long-standing bugs.

Bug: T30950
Bug: T31025
Change-Id: I71f8b337543163569c64bbfdec154eb9b69d7264
2014-12-22 04:14:55 +00:00
C. Scott Ananian
b975a0bfe0 Don't break autolinks by stripping the final semicolon from an entity.
Autolinking free external links is clever about making sure that trailing
punctuation isn't included in the link.  But if an HTML entity happens to
terminate the URL, the semicolon from the entity is stripped from the url,
breaking it.

Fix this corner case.  This also unifies autolink parsing with Parsoid.

See: I5ae8435322c78dd1df170d7a3543fff3642759b1
Change-Id: I5482782c25e12283030b0fd2150ac55092f7979b
2014-12-18 17:27:55 -05:00
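
The corner case can be illustrated with a small Python sketch. The punctuation set, the entity pattern, and `split_trailing_punctuation` are assumptions made for this example; the real parser has further rules (for instance around parentheses) that are left out here.

```python
import re

TRAILING_PUNCT = ",;.:!?"  # simplified set of characters treated as trailing
ENTITY_AT_END = re.compile(
    r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);$"
)

def split_trailing_punctuation(url):
    """Split a free link into (link, trail) without cutting into an entity."""
    end = len(url)
    while end > 0 and url[end - 1] in TRAILING_PUNCT:
        # A ";" that closes an HTML entity belongs to the URL, so stop here.
        if url[end - 1] == ";" and ENTITY_AT_END.search(url[:end]):
            break
        end -= 1
    return url[:end], url[end:]

print(split_trailing_punctuation("http://example.com/a&amp;."))
```

The final period is still moved out of the link, while the semicolon of `&amp;` stays attached to it.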
Brad Jorsch
5c1eeb2464 Normalize "\r" newlines in preSaveTransform
The behavior of the different preprocessors differs when given \r or
\r\n newlines. We already normalize the latter here, so may as well do
the former here too.

Bug: T78488
Change-Id: Id6390f64a73ea01088729f25d79103388c1fe7e8
2014-12-15 16:59:42 -05:00
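
As a minimal illustration of the normalization (in Python rather than the PHP of the actual change, with a hypothetical helper name):

```python
def normalize_newlines(text):
    """Convert \\r\\n and bare \\r line endings to \\n."""
    # Order matters: handle "\r\n" first so it does not become two newlines.
    return text.replace("\r\n", "\n").replace("\r", "\n")

print(repr(normalize_newlines("a\r\nb\rc\nd")))
```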
C. Scott Ananian
25d35fc65c Enforce spaces around magic links (RFC, PMID, and ISBN).
Ensure that there is a \b boundary before and after RFC, PMID, and ISBN
links.  (Previously we enforced \b boundaries only before free external
links and after ISBN links.)  Consistency is a good thing!

In addition:
* \b is not a PHP escape sequence, so you don't need to write \\b inside
  a string.
* \b before the numeric part of an ISBN is pointless: by the structure
  of the regexp there will always be a space on the left and a word
  character (a digit) on the right.

Bug: 65278
Change-Id: Ic315b988091a5c7530a8285b9249804db72e55db
2014-12-11 03:41:23 +00:00
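
The effect of a \b boundary on both sides can be seen in a short Python sketch. The pattern below is a simplified assumption covering only RFC and PMID, and `find_magic_links` is a hypothetical helper, not the parser's actual regular expression.

```python
import re

# Note: unlike in PHP, "\b" in a plain Python string literal is a backspace,
# so the pattern is written as a raw string here.
MAGIC_LINK = re.compile(r"\b(RFC|PMID)\s+([0-9]+)\b")

def find_magic_links(text):
    """Return (keyword, number) pairs for each magic link found in text."""
    return [(m.group(1), m.group(2)) for m in MAGIC_LINK.finditer(text)]

print(find_magic_links("See RFC 2616 and PMID 12345."))
```

With both boundaries enforced, neither `xRFC 2616` nor `RFC 2616x` is recognized as a magic link.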
Aaron Schulz
e369f66d00 Replace wfRunHooks calls with direct Hooks::run calls
* This avoids the overhead of an extra function call

Change-Id: I8ee996f237fd111873ab51965bded3d91e61e4dd
2014-12-10 12:26:59 -08:00
umherirrender
489d793882 Fixed spacing
- Added/removed spaces around parenthesis
- Added newline in empty blocks
- Added space after switch/foreach/function
- Use tabs at begin of line
- Add newline at end of file

Change-Id: I244cdb2c333489e1020931bf4ac5266a87439f0d
2014-12-05 22:28:07 +01:00
jenkins-bot
0b8f48c535 Merge "Use Parser::SFH_NO_HASH/SFH_OBJECT_ARGS class const" 2014-11-27 05:55:07 +00:00