Commit graph

985 commits

Author SHA1 Message Date
jenkins-bot
bd78869618 Merge "No yoda conditions" 2018-12-09 01:34:23 +00:00
Alangi Derick
19adaa6a4b parser: Fix PHPDoc annotations in parser module
Change-Id: I09680d72516f943051e86655b5fddf9ff2988e4e
2018-12-08 13:07:10 +00:00
jenkins-bot
4077b57759 Merge "Parse wikitext in gallery caption" 2018-11-27 15:47:50 +00:00
C. Scott Ananian
f87898b488 Protect legacy URL parameter syntax in link and alt options
HTML doesn't allow certain semicolon-less HTML entities in attribute
values to avoid breaking legacy markup like:
  <a href="http://example.com?foo&param=bar">...</a>
(Note that the & in that URL is not properly entity-escaped as `&amp;`.)

Unlike wikitext, HTML generally allows semicolon-less legacy entities
in text.

Our alt and link option processing shove text through
Sanitizer::stripAllTags, which does entity decoding including these
legacy semicolon-less entities.  Wikitext doesn't allow semicolon-less
entities, so escape & characters where appropriate to protect alt/link
options and avoid breaking URLs.

This was a "regression" in how alt options were handled starting in
ddb4913f53 when we switched to using
Remex for Sanitizer::stripAllTags -- semicolon-less entities (previously
invalid in wikitext) were now being decoded when stripAllTags was
called on alt text.  This change became a problem when
ad80f0bca2 sent link option text through
Sanitizer::stripAllTags (with the new semicolon-less entity decode)
instead of PHP's strip_tags (which, in addition to its other faults,
doesn't do entity decode at all).  This suddenly started decoding
"non-wikitext" entities like `&para` inside URLs, breaking links.
Filed T210437 as a follow-up to consider changing the behavior
of Sanitizer::stripAllTags() globally to prevent it from decoding
semicolon-less entities for all callers.

Bug: T209236
Change-Id: I5925e110e335d83eafa9de935c4e06806322f4a9
2018-11-27 10:12:05 -05:00
Fomafix
3ee1560232 No yoda conditions
Replace
  if ( 42 === $foo )
by
  if ( $foo === 42 )

Change-Id: Ice320ef1ae64a59ed035c20134326b35d454f943
2018-11-21 17:54:39 +01:00
jenkins-bot
7a3eb1f3a6 Merge "Hard deprecate codepaths where tidy is disabled" 2018-11-13 23:54:24 +00:00
Brad Jorsch
d65e96b763 Use new externallinks.el_index_60 field
This adds a method to LinkFilter to build the query conditions necessary
to properly use it, and adjusts code to use it.

This also takes the opportunity to clean up the calculation of el_index:
IPs are handled more sensibly and IDNs are canonicalized.

Also weird edge cases for invalid hosts like "http://.example.com" and
corresponding searches like "http://*..example.com" are now handled more
regularly instead of being treated as if the extra dot were omitted,
while explicit specification of the DNS root like "http://example.com./"
is canonicalized to the usual implicit specification.

Note that this patch will break link searches for links where the host
is an IP or IDN until refreshExternallinksIndex.php is run.

Bug: T59176
Bug: T130482
Change-Id: I84d224ef23de22dfe179009ec3a11fd0e4b5f56d
2018-11-12 22:33:18 +00:00
C. Scott Ananian
54ac31f94d Hard deprecate codepaths where tidy is disabled
Future parsers will not support the output generated with tidy disabled.

Parser tests using untidied output will also be deprecated (and
rewritten) in a follow-up patch.

No new release notes necessary since user-visible tidy configuration
was deprecated previously (in 1.32), and individual methods which had
disabled tidy during execution were individually release-noted as they
were updated.

Bug: T198214
Depends-On: I0f417f75a49dfea873e9a2f44d81796a48b9f428
Depends-On: If5c619cdd3e7f786687cfc2ca166074d9197ca11
Change-Id: I592e0e0dfef7d929f05c60ffe4d60e09725b39cc
2018-11-05 18:49:16 +00:00
jenkins-bot
9aedec343e Merge "Handle <nowiki> and other markup consistently in image link/alt options" 2018-11-02 01:59:01 +00:00
Max Semenik
c16704c33a Display SVGs in target language
Previously, they were always displayed in defult language unless
forced explicitly in wikitext, e.g. [[File:Foo.svg|lang=ru]].
This change adds a feature flag that would enable always trying to
display in page language.

* If enabled, Parser will pass a new parameter - 'pagelang' - to
  the media handler.
* SvgHandler uses page language when determining what language to
  render the image in.
* 'pagelang' can always be overridden by 'lang'.
* If no translation in page language is available, the default
  language (English) will be used for thumbnail URLs, to prevent
  cluttering media storage and HTTP caches with useless copies.

Performance: this requires accessing image's metadata during parsing.
My testing indicates there were no code path where this wasn't the
case already, so no performance hit is expected, however we should
still keep an eye on page save performance.

Bug: T205040
Change-Id: I348840ef405e1370cc0c17d69051bce30153c9c0
2018-10-30 16:12:11 -07:00
Tim Starling
a6a017cea4 Fix use of non-existent variable Parser::$config
Fix bug from Ib4394f370cb561ccf195338a1c2e9e465dcb3dc3

Add test.

Bug: T208000
Change-Id: Ia81cca1b64afef2af3cb8dff19719a7f0de9d306
2018-10-25 16:27:55 -07:00
C. Scott Ananian
ad80f0bca2 Handle <nowiki> and other markup consistently in image link/alt options
Use Parser::stripAltText() consistently to handle link and alt options
in both Parser::makeImage() and Parser::renderImageGallery().  This
ensures that link option text can use <nowiki> to escape problematic
text so that (for example) the following works:

```
[[File:Foobar.jpg|link=<nowiki>a''b''c</nowiki>|alt=<nowiki>a''b''c</nowiki>]]
<gallery>
File:Foobar.jpg|link=<nowiki>a''b''c</nowiki>|alt=<nowiki>a''b''c</nowiki>
</gallery>
```

Previously the handling of the link option in
Parser::renderImageGallery() used a bespoke `strip_tags` invocation
which didn't replace <nowiki>s, and the handling of the link option in
Parser::makeImage() didn't strip tags at all, nor did it replace
<nowiki>s.  For example, in Parser::makeImage() double quotes in
titles would be converted to embedded `<i>` tags before being passed
to Parser::parseLinkParameter(), with predictably poor results.

Tests added to confirm behavior of alt/link with HTML-escaped
entities and <nowiki>s exposed a bug in Remex: T207088.  Tests
will fail on PHP 7.0 until that is fixed.

Bug: T206940
Depends-On: Ide67bba20f771868c0e119cb2874464dcf1d758a
Change-Id: Ife4c0edaa85e0cb294c5d4c1e31d5d7d828d9df4
2018-10-22 15:26:36 +00:00
jenkins-bot
b3f03fd75e Merge "Inject Config into Parser instead of using globals" 2018-10-17 14:34:31 +00:00
James D. Forrester
976c50c21a Drop ParserLimitReport, deprecated in 1.22
Change-Id: I4898d92569bd823f09c12f68fa186e2e139790a7
2018-10-10 16:20:18 -07:00
Aryeh Gregor
5173d5ee60 Inject Config into Parser instead of using globals
Change-Id: Ib4394f370cb561ccf195338a1c2e9e465dcb3dc3
2018-10-02 21:26:01 +03:00
jenkins-bot
50f6b24ee6 Merge "Parser: Refactor parsing of [[File:...|link=...]] syntax for reusability" 2018-09-26 19:18:34 +00:00
Bartosz Dziewoński
1c9664d18a Parser: Refactor parsing of [[File:...|link=...]] syntax for reusability
Change-Id: I91467297de4b7c532448a4c20b9a0dc8216c7200
2018-09-26 13:36:32 +02:00
C. Scott Ananian
327f0f92fa Use wfIsHHVM() instead of a HipHop-specific environment variable
Change-Id: I5bbf3e4f65d9b6a0d7419f67e3931e77e92b7e6c
2018-09-20 09:23:54 -04:00
daniel
465954aa23 Provide new, unsaved revision to PST to fix magic words.
This injects the new, unsaved RevisionRecord object into the Parser used
for Pre-Save Transform, and sets the user and timestamp on that revision,
to allow {{subst:REVISIONUSER}} and {{subst:REVISIONTIMESTAMP}} to function.

Bug: T203583
Change-Id: I31a97d0168ac22346b2dad6b88bf7f6f8a0dd9d0
2018-09-06 18:33:44 +02:00
Brian Wolff
13e5700b23 Use annotations for taint in Parser & ParserOutput.
This replaces the builtin taints that are removed in
Ic1e1983a51c. Additionally, parse will no longer warn about
double escaping - there's many situations where such warnings
are wrong (e.g. Using Html::rawElement()). However this also
means that Parser::parse( wfMessage( 'foo' )->parse() ); will
no longer give a double escaping warning, which is unfortunate.

Bug: T202380
Change-Id: Ia52d37411beb62b112c6ff102438063c3d750769
2018-08-31 15:55:44 +00:00
jenkins-bot
a3357744c3 Merge "[MCR] Introduce RevisionRenderer" 2018-08-31 11:25:15 +00:00
Kunal Mehta
eb7150b029 Set @param-taint for Parser::internalParse()
This is not strictly accurate, because Parser::internalParse() actually
returns half-parsed HTML, which is not safe for output. But it is safe for
output from a parser tag.

Maybe phan-taint-check plugin needs to learn about half-parsed HTML as an
extra taint type, and make that an acceptable thing for parser tags to return,
but not other things.

But this fixes the failures for the Listings extension, so I think it's
worthwhile in the meantime.

Change-Id: Idf87f5c3dcf81dd210de73a4ff15e3b1aabd9f89
2018-08-30 21:46:10 -07:00
daniel
e9f71517f7 [MCR] Introduce RevisionRenderer
RevisionRenderer is the MCR replacement for Content::getParserOutput,
as outlined in <https://www.mediawiki.org/wiki/User:Daniel_Kinzler_(WMDE)/MCR-PageUpdater>.

Note: This change also introduces quite a bit of code for
merging ParserOutput objects.

Bug: T194048
Change-Id: I871978bf79f67c9e7954fb3fc8528d6e365f2cc1
2018-08-30 19:15:12 +02:00
jenkins-bot
ba6c827485 Merge "Apply content wrapping in ParserOutput::getText()" 2018-08-29 16:25:22 +00:00
daniel
0dc7ba02b4 Apply content wrapping in ParserOutput::getText()
Instead of applying wrapping the the parser and unwrapping in
ParserOutput::getText(), turn this around and apply wrapping in getText(),
and only if desired.

This avoids search&replace logic for unwrapping, and it also makes it a lot
easier to merge the output of multiple slots for MCR output.

This changes behavior in two hopefully irrelevant ways:
1) the limit report comments will be inside the wrapper div, instead of
following it.
2) if HTML with a wrapper div is explicitly injected into a ParserOutput
object, it will not be possible to unwrap the text.

Bug: T174035
Change-Id: I1641b7995af9bd297f1acd610d583fbf874f34e0
2018-08-29 16:46:25 +02:00
jenkins-bot
e548e0f35c Merge "Make interwiki transclusion use the WAN cache" 2018-08-27 21:17:04 +00:00
Aaron Schulz
6504e23074 Make interwiki transclusion use the WAN cache
This means that now:
* Entries actually get deleted when expired
* The transclusion cache is shared across wikis
* Large blobs that do not fit in cache no longer cause DB errors
* DB writes are not triggered on GET requests
* Keys are hashed and no longer need to be so restrictive

Also, add a "check key" based purge system and process cache the
text/html values similar to how regular revision text is cached.

Bug: T189702
Change-Id: I8ac12b53c02bb26857175dd5a4af29d49e03dc33
2018-08-27 19:32:04 +00:00
Kunal Mehta
2852255186 Inject SpecialPageFactory into Parser
Change-Id: I6a6a94cbdafdc724ce02408cd9e744e7b3eda92b
2018-08-17 12:03:13 -07:00
Kunal Mehta
b7b8f214bb Parser: Call firstCallInit() in getTags/getFunctionHooks
So callers don't need to do this manually. Pointed out by Tim in T201799.

Depends-On: Ia6c36d5a650095e35093bf47e275e081e89b3daf
Change-Id: Ida62767f3ca53f99609cae01d3a20051bb92ccab
2018-08-14 14:16:42 -07:00
Kunal Mehta
e8370d6977 Parser: Add accessors needed by CodeMirror
Change-Id: Ia2d98baf6caed2cd779cb00aceba5785cf13d633
2018-08-13 22:44:48 -07:00
Aryeh Gregor
90d4f56fe4 Mass conversion of $wgContLang to service
Brought to you by vim macros.

Bug: T200246
Change-Id: I79e919f4553e3bd3eb714073fed7a43051b4fb2a
2018-08-11 22:44:29 -06:00
Aryeh Gregor
bca6085920 Use ParserFactory in a bunch of places
I wasn't sure how to convert the rest of the occurrences in core (there
are a significant number).

Bug: T200881
Change-Id: I114bba946cd3ea8a293121e275588c3c4d174f94
2018-08-11 00:16:17 -06:00
Aryeh Gregor
62515f7b15 Introduce ParserFactory service
Bug: T200881
Change-Id: I257e78200983cb10afb76de1f07dd1b9d531c52a
2018-08-11 00:15:52 -06:00
jenkins-bot
8e7f352610 Merge "Update Parser to use ContentLanguage" 2018-08-03 04:52:25 +00:00
Aryeh Gregor
c2f29a2efc Update Parser to use ContentLanguage
Bug: T200246
Change-Id: Ie54677706ec175189c3ff52342a9d8ac2f5d90d8
2018-08-03 04:29:39 +00:00
Kunal Mehta
612f6e5690 Document Parser::$mFirstCall
And don't bother checking its value in clearState(), since firstCallInit()
will do that anyways.

Change-Id: Ibc5e809daa614e99be91d65a363de4f697e6afa5
2018-08-02 01:57:53 -07:00
Aryeh Gregor
5a16d92e04 Update MagicWordArray to use MagicWordFactory
Bug: T200247
Change-Id: Ie5a60b81382d7299beadc691fe4d27e931ebe0ed
2018-07-31 21:40:21 +03:00
Aryeh Gregor
72ab013be0 Update CoreParserFunctions to use MagicWordFactory
Bug: T200247
Change-Id: I122d8acf601581b18756a5b8d65e50953b28c21d
2018-07-31 15:36:57 +03:00
Aryeh Gregor
c3ee073d62 Update Parser to use MagicWordFactory
Bug: T200247
Depends-On: Ie061fe90f9b9eca0cbf7e8199d9ca325c464867a
Change-Id: Iab6a4cb32c491caf2685cdd68f9465ef1dfa3c4c
2018-07-30 21:20:43 +03:00
Brad Jorsch
d469f53847 Parser: Remove style and script tags' content from TOC
We don't want to display the stylesheet as part of the TOC entry if
someone uses TemplateStyles in a heading.

Bug: T198618
Change-Id: I2f7316daaba0cce662b6a4702ab87322e6783655
2018-07-16 22:52:51 -04:00
Umherirrender
130ec2523d Fix PhanTypeMismatchDeclaredParam
Auto fix MediaWiki.Commenting.FunctionComment.DefaultNullTypeParam sniff

Change-Id: I865323fd0295aabd06f3e3c75e0e5043fb31069e
2018-07-07 00:34:30 +00:00
Fomafix
a60dcdc2e3 Armor against French spaces detection in HTML attributes
This change also solves T13874 in a generic way.

Bug: T5158
Change-Id: Id8cdb887182f346acab2d108836ce201626848af
2018-06-21 19:24:07 +02:00
jenkins-bot
84fa176c9c Merge "Avoid deprecated LinkCache::singleton()" 2018-06-14 23:48:54 +00:00
C. Scott Ananian
7de2c566dd Deprecate Language::markNoConversion, which confuses readers
Language::markNoConversion is used only within Parser.php and differs
from LanguageConverter::markNoConversion in that, contrary to its name
and its namesake, it only protects *things which look like URLs* from
language conversion.

This wasted several days of my time before I realized what was going on.
It's needless; just hoist the "looks like a URL" special casing inline
to the single place where that functionality is used.  (And I wonder
if the "looks like a URL" case is actually needed at all any more,
since most of those cases are probably free external links, which
go through a different code path, not bracketed external links.)

This is a clean-up to the clean-up that liangent performed in 2012
with e01adbfc0b.

Change-Id: I80479600f34170651732b032e8881855aa1204d8
2018-06-13 13:26:58 -04:00
C. Scott Ananian
dbda7cdfb0 Remove unnecessary Parser::getConverterLanguage() indirection
The getConverterLanguage() method was added in March 2012 in commit
561424c266 as a workaround for a regression
in mediawiki 1.19.  It was an indirection which checked the global variable
$wgBug34832TransitionalRollback to return a different converter language
for Chinese wikis.

When this temporary bugfix was reverted in January 2013 in commit
a3fbdaaa2c, the temporary global variable
was removed, but not the getConverterLanguage() indirection.  Since then,
new code in the parser seems to have faithfully used getConverterLanguage()
instead of getTargetLanguage(), even though they are identical and the
need for getConverterLanguage() has long since passed.

Strike a small blow for elegant minimalism by removing the completely
unnecessary Parser::getConverterLanguage() indirection.  Well, sort
of: since this blight has been slowly growing inside Parser.php for
so long, we need to deprecate getConverterLanguage() first just in
case any external dependency has been infected.  Next release we
can finally excise the unnecessary method.

Change-Id: I567c29c9c7699020955699b76cbe8578d02e2fe6
2018-06-12 23:33:03 +00:00
Fomafix
e1630b6a53 PHP: Use short ternary operator (?:) where possible
Change-Id: Idcc7e4fcdd4d8302ceda44bf6d294fa8c2219381
2018-06-11 11:26:35 +02:00
Kunal Mehta
c4e5a9dd97 Avoid deprecated LinkCache::singleton()
Change-Id: Ie0e5c4ef0fe6ec896378bb2433af0898655dd907
2018-06-10 23:55:11 -07:00
Max Semenik
6e956d55aa Replace call_user_func_array(), part 2
Uses new PHP 5.6 syntax like ...parameter unpacking and
calling anything looking like a callback to make the code more readable.
There are much more occurrences but this commit is intentionally limited
to an easily reviewable size.

In one occurrence, a simple conditional instead of trickery was much more readable.

This patch finishes all the easy stuf in the core, the remainder is either unobvious
or would result in smaller readability gains. It will be carefully dealt with in
further commits.

Change-Id: I79a16c48bfb98b75e5b99f2f6f4fa07b3ae02c5b
2018-06-07 20:19:26 -07:00
Bartosz Dziewoński
485f66f174 Use PHP 7 '??' operator instead of '?:' with 'isset()' where convenient
Find: /isset\(\s*([^()]+?)\s*\)\s*\?\s*\1\s*:\s*/
Replace with: '\1 ?? '

(Everywhere except includes/PHPVersionCheck.php)
(Then, manually fix some line length and indentation issues)

Then manually reviewed the replacements for cases where confusing
operator precedence would result in incorrect results
(fixing those in I478db046a1cc162c6767003ce45c9b56270f3372).

Change-Id: I33b421c8cb11cdd4ce896488c9ff5313f03a38cf
2018-05-30 18:06:13 -07:00
Bartosz Dziewoński
b191e5e860 Use PHP 7 '<=>' operator in 'sort()' callbacks
`$a <=> $b` returns `-1` if `$a` is lesser, `1` if `$b` is lesser,
and `0` if they are equal, which are exactly the values 'sort()'
callbacks are supposed to return.

It also enables the neat idiom `$a[x] <=> $b[x] ?: $a[y] <=> $b[y]`
to sort arrays of objects first by 'x', and by 'y' if they are equal.

* Replace a common pattern like `return $a < $b ? -1 : 1` with the
  new operator (and similar patterns with the variables, the numbers
  or the comparison inverted). Some of the uses were previously not
  correctly handling the variables being equal; this is now
  automatically fixed.
* Also replace `return $a - $b`, which is equivalent to `return
  $a <=> $b` if both variables are integers but less intuitive.
* (Do not replace `return strcmp( $a, $b )`. It is also equivalent
  when both variables are strings, but if any of the variables is not,
  'strcmp()' converts it to a string before comparison, which could
  give different results than '<=>', so changing this would require
  careful review and isn't worth it.)
* Also replace `return $a > $b`, which presumably sort of works most
  of the time (returns `1` if `$b` is lesser, and `0` if they are
  equal or `$a` is lesser) but is erroneous.

Change-Id: I19a3d2fc8fcdb208c10330bd7a42c4e05d7f5cf3
2018-05-30 18:05:20 -07:00