1280 commits

Author SHA1 Message Date
Timo Tijhof
d06855ecbe Parser: Say tildes instead of ~~~ in comment to fix Doxygen fatal
Doxygen was unable to parse the file past validateSig().

> Parser.php:6397: warning: reached end of file while inside a ~~~ block!
> The command that should end the block seems to be missing!

Change-Id: I3d1b547968302611d2bd78a7c11dd0738b40d23a
2015-04-06 12:32:25 +00:00
Max Semenik
08762b02de Minor cleanups
* Declare undeclared variables
* Kill unused variables
* Fix comments including PHPDoc

Change-Id: I60015f6b6740aa9088bda3745f4dc4e65e29fcb1
2015-04-02 16:22:42 +00:00
jenkins-bot
d1adda0be9 Merge "Fixed {{REVISION(TIMESTAMP|USER|SIZE)}} on new revisions" 2015-03-31 18:00:30 +00:00
Aaron Schulz
ac0de3c430 Fixed {{REVISION(TIMESTAMP|USER|SIZE)}} on new revisions
* This makes use of the injected new revision object used elsewhere
  in Parser to solve this problem.

Bug: T94407
Change-Id: I7881583cf7cb2bc799c89ffaa2a344a2d4ca3a4e
2015-03-30 21:10:09 -07:00
Kunal Mehta
13975fe76a Use wikimedia/utfnormal library, add backwards-compatibility layer
This drops support for the custom utf8 normal PHP extension in favor
of the intl extension.

Bug: T90825
Change-Id: Ifbaeb2ef684217cf6187ccc4fb4d303f89608300
2015-03-24 12:59:26 -07:00
Arlo Breault
78c3f2f4b1 Tidy up tidy usage
* There's a branch path in the sanitizer that depends on $wgUseTidy,
  which means the test output differs from on wiki.

* In general, we should set these variables to match the wiki behaviour
  in tests.

* Exposes T92892, Sanitizer removes empty tags when tidy is disabled.

* Tweaked tests for T19663 to use an extension tag to show that
  HTML5 tags with non-word characters make it through the parser
  intact (before being ultimately sanitized).

Change-Id: I09c72fd739e11a8b757f37dc4c790758d782ad73
2015-03-16 16:33:50 -04:00
jenkins-bot
4e004124cd Merge "Removed obsolete "containsOldMagic" code" 2015-03-04 06:02:04 +00:00
Chad Horohoe
c33f4de066 Profile all external HTTP requests from MW
Change-Id: Ie980b080da2ef21ec7d9fc32f1accc55710de140
2015-03-03 20:54:30 -08:00
daniel
95c85f71b1 Remove getSecondaryDataUpdates and friends from ParserOutput.
This is a hard deprecation, with getSecondaryDataUpdates returning an
empty array and addSecondaryDataUpdate throwing an exception. This seems
prudent since there are no known users of these methods, and they
interfere with the parser cache:

DataUpdates are basically jobs, they need access to services to
function. That makes them inherently non-serializable. This interferes
with the function of the parser cache, which serializes ParserOutput
objects in order to persist them.

This could be solved by splitting DataUpdates into DataUpdateDefinitions
and DataUpdateHandlers, similar to how JobSpecification works with
wgJobClasses. That however seems pointless and overkill, since
ParserOutput already has a mechanism for storing arbitrary data,
including any info needed by an UpdateJob: the setExtensionData method.

After this change, the preferred method to introduce custom data updates
is to store any relevant data using setExtensionData and 
implement Content::getSecondaryDataUpdates() if possible. If not,
use the 'SecondaryDataUpdates' hook to construct the necessary update
objects from the info stored using setExtensionData.

Change-Id: I0f6f49e61fa3d8904e55f42c99f342a3dc357495
2015-02-24 11:01:16 +01:00
jenkins-bot
cb4f6e9341 Merge "Removed doCascadeProtectionUpdates method to avoid DB writes on page views" 2015-02-23 01:18:05 +00:00
Aaron Schulz
df5ef8b5d7 Removed doCascadeProtectionUpdates method to avoid DB writes on page views
* Use special prioritized refreshLinksJobs instead, which triggers when
  transcluded pages are changed
* Also added a triggerOpportunisticLinksUpdate() method to handle
  dynamic transcludes

bug: T89389
Change-Id: Iea952d4d2e660b7957eafb5f73fc87fab347dbe7
2015-02-22 13:36:13 -08:00
Erik Bernhardson
e73f17527e Correct misleading documentation
Change-Id: Ib020467488616eeaa9b53672e5cc45c72f240a54
2015-02-20 19:55:11 +00:00
Aude
2664ccdc43 Revert "Removed doCascadeProtectionUpdates method to avoid DB writes on page views"
due to breakage at least in phpunit tests for mysql:

https://travis-ci.org/wikimedia/mediawiki-extensions-Wikibase/jobs/51490784

This reverts commit 132f7bb89f.

Change-Id: I85d19ab5ad30e8d13a956d7b7467a94c9e73219d
2015-02-20 13:17:41 +00:00
Aaron Schulz
132f7bb89f Removed doCascadeProtectionUpdates method to avoid DB writes on page views
* Use special prioritized refreshLinksJobs instead, which triggers when
  transcluded pages are changed
* Also added a triggerOpportunisticLinksUpdate() method to handle
  dynamic transcludes

bug: T89389
Change-Id: I8e5a6ddb643c12e0fb5c1c68bc83f912944e6e8d
2015-02-20 03:16:18 +00:00
Jackmcbarn
cab99af90e Fix TOC anchor name collisions in edge cases
Currently, the parser adds a "_2" to the second of two identical headlines to
avoid collisions, but there's still a collision if another headline actually
ends in "_2". This change causes the new headline to also be checked for a
collision, and advances to "_3" or beyond if there is one.

Bug: T26787
Change-Id: Id0a55aa4c1917bac2f8f0d4863fcb85bd3dff1ca
2015-02-17 20:59:33 +00:00
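
The collision check described in the commit above can be sketched as follows. This is a minimal illustration in Python, not the parser's actual code; the names `unique_anchor` and `assign_anchors` are hypothetical, and details such as case handling are not modelled.

```python
def unique_anchor(base, used):
    """Return an anchor that is not in `used` yet, and record it there."""
    if base not in used:
        used.add(base)
        return base
    # Start at "_2" and keep advancing while the candidate itself collides
    # with an anchor that is already taken (for example a headline that
    # literally ends in "_2").
    n = 2
    candidate = f"{base}_{n}"
    while candidate in used:
        n += 1
        candidate = f"{base}_{n}"
    used.add(candidate)
    return candidate


def assign_anchors(headlines):
    """Assign a unique anchor to each headline, in document order."""
    used = set()
    return [unique_anchor(h, used) for h in headlines]
```

With the headlines "Foo", "Foo_2", "Foo" in that order, the third headline skips the taken "Foo_2" and becomes "Foo_3".
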
Aaron Schulz
4111ff0dc3 Removed obsolete "containsOldMagic" code
Change-Id: Id225347e0599a6f79b30b0793cce7d97daed46f2
2015-02-15 14:41:49 -08:00
Timo Tijhof
d62a2b76b1 Replace dev.w3.org with more permanent or stable urls
* Sanitizer: dev.w3.org/html5/spec-preview
  Follows-up 8e8b15afc6.
  Use stable reference to www.w3.org/TR/html5 (currently
  from October 2014) instead of an old preview branch from 2012.

* parserTests: dev.w3.org/html5
  Follows-up 959aa336a1.
  Url is now a dead end. Replaced with link to a draft from around
  that time. The relevant section no longer exists in the current
  spec as it got split off into a separate spec. Maybe this one:
  https://url.spec.whatwg.org/#percent-encoded-bytes

* Parser, HTMLIntField: dev.w3.org/html5
  Use stable reference to www.w3.org/TR/html5 instead.

* HTMLFloatField.php: dev.w3.org/html5
  Url is now a dead end. Draft from around that time:
  http://www.w3.org/TR/2011/WD-html5-20110525/common-microsyntaxes.html#real-numbers
  The section "Real numbers" no longer exists in the current spec,
  but the Infrastructure chapter has a section on floating point
  numbers that describes the same sequence now.

Change-Id: I7dcd49b6cd39785fb1b294e4eeaf39bda52337b2
2015-02-14 14:21:33 +00:00
Sam Reed
f41e2ddb6a Don't split regex string unnecessarily
Change-Id: Id5912e64916ce5c7be2991478c32531596917540
2015-01-28 16:17:41 +00:00
m4tx
aa72c4e0d2 Add missing documentation in DateFormatter.php
Change-Id: Ic5c04bdb88bc57a7c44159d7858ef791c24354c4
2015-01-26 17:58:50 +00:00
Kunal Mehta
247ecab445 SpecialTrackingCategories: Read from the extension registry
This demonstrates how we can transition from extensions putting
things into the global scope ($wgTrackingCategories) to instead
storing them in the extension registry. This will increase the
overall performance of the extension registry since it no
longer needs to do an array_merge with $wgTrackingCategories.

For extensions already converted to using the registry
no change is needed as the schema is still the same.

Change-Id: Ie0df4c20b123dac784a1c02eb991edc609a911b6
2015-01-23 10:33:45 -08:00
Aaron Schulz
6921770414 Updated some try-catch statements: MWException -> Exception
Change-Id: I76601a86e30f4984e3b1a8c8ec5ef5a0f652433a
2015-01-09 17:20:22 -08:00
daniel
f10b8df598 Fix ApiStashEdit wrt custom DataUpdates.
My previous patch broke this: ApiStashEdit would stash ParserOutput
with no custom DataUpdates, but calling getSecondaryDataUpdates still
failed after unserialization. This patch should fix that.

Bug: T86305
Change-Id: Ic114e521c5dfd0d3c028ea7d16e93eace758deef
2015-01-09 19:19:13 +00:00
jenkins-bot
dfc2775848 Merge "Skip ApiStashEdit if custom DataUpdates are present." 2015-01-09 16:22:25 +00:00
daniel
d509361e67 Skip ApiStashEdit if custom DataUpdates are present.
Bug: T86305
Change-Id: I423ba39a46a08edf2862b8439169ff91338fb6eb
2015-01-09 15:51:15 +00:00
Ricordisamoa
2ae155da52 Fix phpcs errors in includes/
Mostly Squiz.WhiteSpace.SuperfluousWhitespace.EmptyLines

Change-Id: I678b2f0902f11cd1dfa1611b9da24e7237df9122
2015-01-08 20:15:07 +01:00
Aaron Schulz
4ff8136807 Removed remaining profile calls
Change-Id: I31c81c78715048004fc8fca0f27d09c1fa71c118
2015-01-08 02:49:33 -08:00
Chad Horohoe
aa21e125a3 Remove obvious function-level profiling
Xhprof generates this data now. Custom profiling of various
sub-function units is kept.

Calls to profiler represented about 3% of page execution
time on Special:BlankPage (1.5% in/out); after this change
it's down to about 0.98% of page execution time.

Change-Id: Id9a1dc9d8f80bbd52e42226b724a1e1213d07af7
2015-01-07 11:14:24 -08:00
jenkins-bot
7746e1458b Merge "Use preview content when it transcludes itself" 2014-12-31 16:19:24 +00:00
Jackmcbarn
779f1024c1 Use preview content when it transcludes itself
When a page transcludes itself, such as <noinclude>foo
{{:{{FULLPAGENAME}}}}</noinclude><includeonly>bar</includeonly>, use the
preview content in its own transclusions. This code was basically ripped
straight from Extension:TemplateSandbox.

Bug: T85408
Bug: T7278
Change-Id: I1aa091a395a4f7b7b744e09e0bed59bc2e1176d0
2014-12-30 12:59:16 -05:00
Amir E. Aharoni
144d741196 Shorten lines to pass phpcs test
Change-Id: I5588e1f16f1a23d77160cd180058bd2000a93ab6
2014-12-29 17:14:08 +02:00
Derk-Jan Hartman
e20e64eb6b Parser: Add <bdi> to the whitelist for TOC links
Bug: 72884
Change-Id: Id5aa9a4eb32fb185881141e55de700ae36f806c5
2014-12-27 21:24:42 +01:00
Reedy
4d9143c7f5 Add lots of @throws
Change-Id: I09d0c13070f966fcf23d2638d8fc1328279a5995
2014-12-24 13:49:20 +00:00
C. Scott Ananian
54a8199f87 Don't allow embedded newlines in magic links, but do allow &nbsp;
This continues the work started in T67278 to make magic link parsing
more consistent with wiki text parsing in general, and closes two
long-standing bugs.

Bug: T30950
Bug: T31025
Change-Id: I71f8b337543163569c64bbfdec154eb9b69d7264
2014-12-22 04:14:55 +00:00
Jackmcbarn
c05b4c9bc4 Re-emit unknown tags from #tag
When #tag is given a tag that it doesn't recognize, re-emit it as a
regular tag instead of giving an error. This allows for it to be used with
transparent tags and HTML tags.

Change-Id: I0ceee8a4fdaf2d3142054a108f445ff06597c31a
2014-12-18 23:06:22 -05:00
jenkins-bot
389e30373c Merge "Don't break autolinks by stripping the final semicolon from an entity." 2014-12-19 00:41:56 +00:00
Ori Livneh
6138e86945 Revert "Simplify MWTidy"
This is broken, for reasons indicated in
<https://gerrit.wikimedia.org/r/#/c/180384/>. It was broken before, but I made
it more broken. So revert for now, and I'll give this another stab.

Change-Id: I7e67a61f7d6370f90487be6470bebe1449432a4c
2014-12-18 14:58:18 -08:00
C. Scott Ananian
b975a0bfe0 Don't break autolinks by stripping the final semicolon from an entity.
Autolinking free external links is clever about making sure that trailing
punctuation isn't included in the link.  But if an HTML entity happens to
terminate the URL, the semicolon from the entity is stripped from the url,
breaking it.

Fix this corner case.  This also unifies autolink parsing with Parsoid.

See: I5ae8435322c78dd1df170d7a3543fff3642759b1
Change-Id: I5482782c25e12283030b0fd2150ac55092f7979b
2014-12-18 17:27:55 -05:00
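
The corner case fixed in the commit above can be sketched as follows: trailing punctuation is trimmed from an autolinked URL, except that a semicolon that closes an HTML entity is kept. This is an illustrative Python sketch under assumptions; the punctuation set, the entity pattern, and the function name `split_trailing_punctuation` are hypothetical and are not taken from Parser.php.

```python
import re

# Assumed set of characters treated as trailing punctuation.
TRAILING_PUNCT = ",;.:!?"

# A named, decimal, or hexadecimal entity that ends exactly at the end
# of the string being examined.
ENTITY_AT_END = re.compile(
    r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);$"
)


def split_trailing_punctuation(url):
    """Split `url` into (link, trailing text that stays outside the link)."""
    end = len(url)
    while end > 0 and url[end - 1] in TRAILING_PUNCT:
        # A semicolon that terminates an entity belongs to the URL,
        # so stop trimming here.
        if url[end - 1] == ";" and ENTITY_AT_END.search(url[:end]):
            break
        end -= 1
    return url[:end], url[end:]
```

For an input such as `http://example.com/a&amp;.` the final period is moved out of the link, while the `&amp;` entity keeps its semicolon.
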
jenkins-bot
d34a6ca677 Merge "Fix some stuttering in comments and documentation" 2014-12-17 22:28:27 +00:00
Ricordisamoa
12dec5d85d Fix some stuttering in comments and documentation
Change-Id: I9c0088b9aab37335203cad45a1d6fa8ac3f43321
2014-12-17 19:44:10 +00:00
Brad Jorsch
5c1eeb2464 Normalize "\r" newlines in preSaveTransform
The behavior of the different preprocessors differs when given \r or
\r\n newlines. We already normalize the latter here, so may as well do
the former here too.

Bug: T78488
Change-Id: Id6390f64a73ea01088729f25d79103388c1fe7e8
2014-12-15 16:59:42 -05:00
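
The normalization described in the commit above amounts to the following, shown here as a small Python sketch. The function name `normalize_newlines` is hypothetical; the only point illustrated is the order of the two replacements.

```python
def normalize_newlines(text):
    """Convert "\\r\\n" and lone "\\r" line endings to "\\n"."""
    # Replace the two-character sequence first; otherwise each "\r\n"
    # would become two newlines once the lone "\r" is replaced.
    return text.replace("\r\n", "\n").replace("\r", "\n")
```
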
Yuri Astrakhan
573b1194b8 Minor spelling comment fix
Change-Id: Ic56f4e73e56e6dca4825c93b0a95f4d9de835fd4
2014-12-14 09:41:46 +00:00
C. Scott Ananian
25d35fc65c Enforce spaces around magic links (RFC, PMID, and ISBN).
Ensure that there is a \b boundary before and after RFC, PMID, and ISBN
links.  (Previously we enforced \b boundaries only before free external
links and after ISBN links.)  Consistency is a good thing!

In addition:
* \b is not a PHP escape sequence, so you don't need to write \\b inside
  a string.
* \b before the numeric part of an ISBN is pointless: by the structure
  of the regexp there will always be a space on the left and a word
  character (a digit) on the right.

Bug: 65278
Change-Id: Ic315b988091a5c7530a8285b9249804db72e55db
2014-12-11 03:41:23 +00:00
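
The effect of a \b boundary on both sides of a magic link can be sketched with a simplified pattern. This is an assumption-laden Python illustration, not the regexp from Parser.php: it covers only RFC and PMID, and the names `MAGIC_LINK` and `find_magic_links` are hypothetical. Unlike in PHP, "\b" inside an ordinary Python string is a backspace character, so the pattern is written as a raw string.

```python
import re

# Simplified pattern: keyword, whitespace, digits, with a word boundary
# required before the keyword and after the number.
MAGIC_LINK = re.compile(r"\b(RFC|PMID)\s+([0-9]+)\b")


def find_magic_links(text):
    """Return (keyword, number) pairs for each magic link found in `text`."""
    return [(m.group(1), m.group(2)) for m in MAGIC_LINK.finditer(text)]
```

Here `see RFC 2616.` yields a match, while `xRFC 2616` and `RFC 2616x` do not, because a word character directly adjoins the link on one side.
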
Aaron Schulz
e369f66d00 Replace wfRunHooks calls with direct Hooks::run calls
* This avoids the overhead of an extra function call

Change-Id: I8ee996f237fd111873ab51965bded3d91e61e4dd
2014-12-10 12:26:59 -08:00
Aaron Schulz
6a1d9c8ddc Fixed internalClean class/method existence check for HHVM
* Follows up 4f281083fd

Change-Id: I5fa406ed1c4f2eefd1c22e9ab90e72655f31d162
2014-12-10 19:04:58 +00:00
Bryan Davis
4f281083fd hhvm: Check for tidy function instead of class
Bug: T78166
Change-Id: Ie60e23ffbafd698a3458eed1efce92d54c8d0c2a
2014-12-10 11:08:18 -07:00
Ori Livneh
98c2703f81 Simplify MWTidy
* Make the internal MWTidy::*clean() functions always return an array of two
  elements: the output buffer and the error buffer.
* Make MWTidy::externalTidy() always read both stdout and stderr. We can read
  stderr after stdout because tidy.c produces output in the same order.
* Remove the $stderr parameter from the private MWTidy::*clean() methods, since
  error output is always returned.
* Merge MWTidy::phpClean and MWTidy::hhvmClean, since the difference between
  them is now small enough that splitting them up is not warranted.
* On HHVM, MWTidy::internalTidy() always returns an empty string for the error
  buffer.

Change-Id: I178b42d6ebdd1a5b9bd5921eb093a6c5014ffa49
2014-12-09 16:43:08 -08:00
Aaron Schulz
17431af154 Reuse page preview parses by using the edit stash system
* This also changes previews to render section edit tokens but
  remove them on output, avoiding cache fragmentation.
* Also shortened the resulting getStashKey() value.

Change-Id: Ic8fa87669106b960c76912b864788b781f6ee2e6
2014-12-09 22:43:01 +00:00
umherirrender
489d793882 Fixed spacing
- Added/removed spaces around parentheses
- Added newline in empty blocks
- Added space after switch/foreach/function
- Use tabs at beginning of line
- Add newline at end of file

Change-Id: I244cdb2c333489e1020931bf4ac5266a87439f0d
2014-12-05 22:28:07 +01:00
jenkins-bot
b950468ceb Merge "Use HHVM+EZC internal tidy" 2014-11-28 08:25:30 +00:00
Tim Starling
e6fdbfec47 Use HHVM+EZC internal tidy
EZC doesn't currently support direct access to object properties via the
obj->std.properties hashtable, but tidy uses this extensively. But it
turns out that for production use cases, tidy_repair_string() should be
sufficient. $wgDebugTidy and $wgValidateAllHtml are not used, and
no deployed extension calls MWTidy::checkErrors().

The only difference I know of is that errors from tidy (status==2) lead
to the tidy output being used, rather than discarded. But
TY_(ReportFatal) has very few callers in tidylib -- probably none that
are reachable from stripped parser output.

So, throw an exception if MWTidy::checkErrors() is requested on an HHVM
instance with the tidy extension. For MWTidy::tidy(), use
tidy_repair_string(). Refactor some relevant code.

Bug: T758
Change-Id: I8d5b1c2c9f9ddce46d8ad099a671a2e297d256e0
2014-11-28 09:47:25 +11:00