Commit graph

1005 commits

Author SHA1 Message Date
Aaron Schulz
19ab538705 parser: list the vary-* flags in the NewPP report HTML comment
Change-Id: I5a4afba2bfdb5b5b56ba0a01ed8ff444a67fbb1a
2019-05-29 10:58:56 -07:00
Aryeh Gregor
7149153754 Don't pass Config to Parser(Factory)
Change-Id: I5996b7cad8c8a61518d2997e955a4547c64a73a5
2019-05-20 11:19:21 -05:00
Derick Alangi
b76886d4f8 parser: Remove deprecated Parser class attribute $mUniqPrefix
This variable was deprecated in 1.26 and per a quick search using
Code Search: https://codesearch.wmflabs.org/search/?q=mUniqPrefix&i=nope&files=&repos=
it's no longer used. Hence, removed.

Change-Id: Ic8f939dde3ea511e8e46faf0f1b212d3db2d80cd
2019-04-22 12:38:51 +01:00
jenkins-bot
d02723401c Merge "Simplify and unify the {{REVISIONID}} handling code in Parser"
2019-04-17 22:17:10 +00:00
jenkins-bot
63591baccc Merge "Use LinkTarget in Linker instead of Title"
2019-04-17 18:16:28 +00:00
Aryeh Gregor
e6df285854 Remove all $wgParser use from core
Bug: T160811

Change-Id: I0556c04d33386d0339e02e2bf7a1ee74d97c2abd
2019-04-17 15:16:50 +03:00
Aaron Schulz
9f267e7d69 Simplify and unify the {{REVISIONID}} handling code in Parser
Improve documentation for Parser::getRevisionId().

Change-Id: I3cb8721e3bc2e3a06c3158cd60742bc10a458f20
2019-04-16 03:40:08 -07:00
Aryeh Gregor
b6e1e99bec Use LinkTarget in Linker instead of Title
Bug: T214318
Change-Id: I60b6208fa5b45a568e81f908a19cd0f244ef79be
2019-04-15 17:15:05 +03:00
Aaron Schulz
d256b472f7 parser: use "-" for revision ID for non-preview edit filter parse during save
This avoids a double parse when the edit stash is not used,
which can be confirmed via the SaveParse log for a page
using {{REVISIONID}} when edit stashing is disabled. This
now matches the reuse for the edit stash hit case.

Change-Id: I405c39d4d7ac04e39fbdfe400f73238b734c7833
2019-04-13 16:43:06 -07:00
Aaron Schulz
5e6d9340cb Add vary-revision-exist flag to handle {{REVISIONID}} and parser cache
Follow-up to c537eb1868

Bug: T220854
Change-Id: Idc19cc29764a38e3671ca1dea158bd5fb46eaf4d
2019-04-12 17:20:50 -07:00
Aryeh Gregor
80590a15dc Update Parser to use NamespaceInfo
Change-Id: I668a51487786e4ab05a153ca3995388e79c13b42
2019-04-11 15:40:49 +03:00
Aryeh Gregor
21f7ab7e22 Inject LinkRendererFactory into Parser
Change-Id: Idaf5a0f897dc3bd2aa9bf03be280281836bfc645
2019-04-07 15:02:30 +03:00
Reedy
c13fee87d4 Collapse some nested if statements
Change-Id: I9a97325d738d09370d29d35d5254bc0dadc57ff4
2019-04-04 19:02:22 +00:00
jenkins-bot
58d70885d8 Merge "Disable expensive {{REVISIONID}} magic word in miser mode"
2019-04-02 20:24:25 +00:00
Thiemo Kreuz
9314453c93 Make use of the list() feature where it makes sense
This code is functionally identical, but less error prone (not so easy
to forget or mix these numerical indexes).

This patch happens to touch the Parser, which might be a bit scary. We
can remove this file from this patch if you prefer.

Change-Id: I8cbe3a9a6725d1c42b86e67678c1af15fbc5961a
2019-03-24 20:12:23 +00:00
Aaron Schulz
c537eb1868 Disable expensive {{REVISIONID}} magic word in miser mode
This only applies to content namespaces for now since
the cost of vary-revision-id is much less of a concern.

The potential to harm page save time is far worse than what
use they have, which is almost entirely just hacks to check
for preview mode. These have nothing to do with the actual
revision ID nor timestamp itself. They simply check whether
the value is the empty string. Since this magic word still
only returns an empty string in preview mode, such checks
will keep working.

Bug: T137900
Depends-on: I1809354055513a5b9d9589e2d6acda7579af76e2
Change-Id: Ieff8423ae3804b42d264f630e1a029199abf5976
2019-03-09 10:50:49 -08:00
Max Semenik
1e9db557d7 Remove $wgMediaInTargetLanguage
It's a temporary feature flag not included in any release, so just
remove it outright. The functionality will now always be enabled.

Bug: T205040
Change-Id: Ia9da82e6f6b2d270f1790a99fc8c35ad5e6aee5e
2019-03-08 15:24:39 -08:00
Fomafix
f17c297624 Use short assignment operator in PHP
Use
  $var .= $foo
instead of
  $var = $var . $foo

Change-Id: I5dcdd7278e618c14968e5ac1fb8ea43ac2200deb
2019-03-07 09:55:49 +01:00
Fomafix
204126e7c7 Simplify strings in PHP code
Change-Id: I481810ade68b0c5a5be21d22e2a107646d5813e6
2019-03-01 22:16:26 +01:00
Kunal Mehta
cc5d9a92a2 build: Updating mediawiki/mediawiki-codesniffer to 24.0.0
Change-Id: I66b1775b7c1d36076d9ca78cbeb42787a743f2aa
2019-02-07 18:39:42 +00:00
jenkins-bot
bd78869618 Merge "No yoda conditions"
2018-12-09 01:34:23 +00:00
Alangi Derick
19adaa6a4b parser: Fix PHPDoc annotations in parser module
Change-Id: I09680d72516f943051e86655b5fddf9ff2988e4e
2018-12-08 13:07:10 +00:00
jenkins-bot
4077b57759 Merge "Parse wikitext in gallery caption"
2018-11-27 15:47:50 +00:00
C. Scott Ananian
f87898b488 Protect legacy URL parameter syntax in link and alt options
HTML doesn't allow certain semicolon-less HTML entities in attribute
values to avoid breaking legacy markup like:
  <a href="http://example.com?foo&param=bar">...</a>
(Note that the & in that URL is not properly entity-escaped as `&amp;`.)

Unlike wikitext, HTML generally allows semicolon-less legacy entities
in text.

Our alt and link option processing shoves text through
Sanitizer::stripAllTags, which does entity decoding including these
legacy semicolon-less entities.  Wikitext doesn't allow semicolon-less
entities, so escape & characters where appropriate to protect alt/link
options and avoid breaking URLs.

This was a "regression" in how alt options were handled starting in
ddb4913f53 when we switched to using
Remex for Sanitizer::stripAllTags -- semicolon-less entities (previously
invalid in wikitext) were now being decoded when stripAllTags was
called on alt text.  This change became a problem when
ad80f0bca2 sent link option text through
Sanitizer::stripAllTags (with the new semicolon-less entity decode)
instead of PHP's strip_tags (which, in addition to its other faults,
doesn't do entity decode at all).  This suddenly started decoding
"non-wikitext" entities like `&para` inside URLs, breaking links.
Filed T210437 as a follow-up to consider changing the behavior
of Sanitizer::stripAllTags() globally to prevent it from decoding
semicolon-less entities for all callers.

Bug: T209236
Change-Id: I5925e110e335d83eafa9de935c4e06806322f4a9
2018-11-27 10:12:05 -05:00
Fomafix
3ee1560232 No yoda conditions
Replace
  if ( 42 === $foo )
by
  if ( $foo === 42 )

Change-Id: Ice320ef1ae64a59ed035c20134326b35d454f943
2018-11-21 17:54:39 +01:00
jenkins-bot
7a3eb1f3a6 Merge "Hard deprecate codepaths where tidy is disabled"
2018-11-13 23:54:24 +00:00
Brad Jorsch
d65e96b763 Use new externallinks.el_index_60 field
This adds a method to LinkFilter to build the query conditions necessary
to properly use it, and adjusts code to use it.

This also takes the opportunity to clean up the calculation of el_index:
IPs are handled more sensibly and IDNs are canonicalized.

Also, weird edge cases for invalid hosts like "http://.example.com" and
corresponding searches like "http://*..example.com" are now handled more
regularly instead of being treated as if the extra dot were omitted,
while explicit specification of the DNS root like "http://example.com./"
is canonicalized to the usual implicit specification.

Note that this patch will break link searches for links where the host
is an IP or IDN until refreshExternallinksIndex.php is run.

Bug: T59176
Bug: T130482
Change-Id: I84d224ef23de22dfe179009ec3a11fd0e4b5f56d
2018-11-12 22:33:18 +00:00
C. Scott Ananian
54ac31f94d Hard deprecate codepaths where tidy is disabled
Future parsers will not support the output generated with tidy disabled.

Parser tests using untidied output will also be deprecated (and
rewritten) in a follow-up patch.

No new release notes necessary since user-visible tidy configuration
was deprecated previously (in 1.32), and individual methods which had
disabled tidy during execution were individually release-noted as they
were updated.

Bug: T198214
Depends-On: I0f417f75a49dfea873e9a2f44d81796a48b9f428
Depends-On: If5c619cdd3e7f786687cfc2ca166074d9197ca11
Change-Id: I592e0e0dfef7d929f05c60ffe4d60e09725b39cc
2018-11-05 18:49:16 +00:00
jenkins-bot
9aedec343e Merge "Handle <nowiki> and other markup consistently in image link/alt options"
2018-11-02 01:59:01 +00:00
Max Semenik
c16704c33a Display SVGs in target language
Previously, they were always displayed in default language unless
forced explicitly in wikitext, e.g. [[File:Foo.svg|lang=ru]].
This change adds a feature flag that would enable always trying to
display in page language.

* If enabled, Parser will pass a new parameter - 'pagelang' - to
  the media handler.
* SvgHandler uses page language when determining what language to
  render the image in.
* 'pagelang' can always be overridden by 'lang'.
* If no translation in page language is available, the default
  language (English) will be used for thumbnail URLs, to prevent
  cluttering media storage and HTTP caches with useless copies.

Performance: this requires accessing image's metadata during parsing.
My testing indicates there was no code path where this wasn't the
case already, so no performance hit is expected, however we should
still keep an eye on page save performance.

Bug: T205040
Change-Id: I348840ef405e1370cc0c17d69051bce30153c9c0
2018-10-30 16:12:11 -07:00
Tim Starling
a6a017cea4 Fix use of non-existent variable Parser::$config
Fix bug from Ib4394f370cb561ccf195338a1c2e9e465dcb3dc3

Add test.

Bug: T208000
Change-Id: Ia81cca1b64afef2af3cb8dff19719a7f0de9d306
2018-10-25 16:27:55 -07:00
C. Scott Ananian
ad80f0bca2 Handle <nowiki> and other markup consistently in image link/alt options
Use Parser::stripAltText() consistently to handle link and alt options
in both Parser::makeImage() and Parser::renderImageGallery().  This
ensures that link option text can use <nowiki> to escape problematic
text so that (for example) the following works:

```
[[File:Foobar.jpg|link=<nowiki>a''b''c</nowiki>|alt=<nowiki>a''b''c</nowiki>]]
<gallery>
File:Foobar.jpg|link=<nowiki>a''b''c</nowiki>|alt=<nowiki>a''b''c</nowiki>
</gallery>
```

Previously the handling of the link option in
Parser::renderImageGallery() used a bespoke `strip_tags` invocation
which didn't replace <nowiki>s, and the handling of the link option in
Parser::makeImage() didn't strip tags at all, nor did it replace
<nowiki>s.  For example, in Parser::makeImage() double quotes in
titles would be converted to embedded `<i>` tags before being passed
to Parser::parseLinkParameter(), with predictably poor results.

Tests added to confirm behavior of alt/link with HTML-escaped
entities and <nowiki>s exposed a bug in Remex: T207088.  Tests
will fail on PHP 7.0 until that is fixed.

Bug: T206940
Depends-On: Ide67bba20f771868c0e119cb2874464dcf1d758a
Change-Id: Ife4c0edaa85e0cb294c5d4c1e31d5d7d828d9df4
2018-10-22 15:26:36 +00:00
jenkins-bot
b3f03fd75e Merge "Inject Config into Parser instead of using globals"
2018-10-17 14:34:31 +00:00
James D. Forrester
976c50c21a Drop ParserLimitReport, deprecated in 1.22
Change-Id: I4898d92569bd823f09c12f68fa186e2e139790a7
2018-10-10 16:20:18 -07:00
Aryeh Gregor
5173d5ee60 Inject Config into Parser instead of using globals
Change-Id: Ib4394f370cb561ccf195338a1c2e9e465dcb3dc3
2018-10-02 21:26:01 +03:00
jenkins-bot
50f6b24ee6 Merge "Parser: Refactor parsing of [[File:...|link=...]] syntax for reusability"
2018-09-26 19:18:34 +00:00
Bartosz Dziewoński
1c9664d18a Parser: Refactor parsing of [[File:...|link=...]] syntax for reusability
Change-Id: I91467297de4b7c532448a4c20b9a0dc8216c7200
2018-09-26 13:36:32 +02:00
C. Scott Ananian
327f0f92fa Use wfIsHHVM() instead of a HipHop-specific environment variable
Change-Id: I5bbf3e4f65d9b6a0d7419f67e3931e77e92b7e6c
2018-09-20 09:23:54 -04:00
daniel
465954aa23 Provide new, unsaved revision to PST to fix magic words.
This injects the new, unsaved RevisionRecord object into the Parser used
for Pre-Save Transform, and sets the user and timestamp on that revision,
to allow {{subst:REVISIONUSER}} and {{subst:REVISIONTIMESTAMP}} to function.

Bug: T203583
Change-Id: I31a97d0168ac22346b2dad6b88bf7f6f8a0dd9d0
2018-09-06 18:33:44 +02:00
Brian Wolff
13e5700b23 Use annotations for taint in Parser & ParserOutput.
This replaces the builtin taints that are removed in
Ic1e1983a51c. Additionally, parse will no longer warn about
double escaping - there are many situations where such warnings
are wrong (e.g. when using Html::rawElement()). However, this also
means that Parser::parse( wfMessage( 'foo' )->parse() ); will
no longer give a double escaping warning, which is unfortunate.

Bug: T202380
Change-Id: Ia52d37411beb62b112c6ff102438063c3d750769
2018-08-31 15:55:44 +00:00
jenkins-bot
a3357744c3 Merge "[MCR] Introduce RevisionRenderer"
2018-08-31 11:25:15 +00:00
Kunal Mehta
eb7150b029 Set @param-taint for Parser::internalParse()
This is not strictly accurate, because Parser::internalParse() actually
returns half-parsed HTML, which is not safe for output. But it is safe for
output from a parser tag.

Maybe phan-taint-check plugin needs to learn about half-parsed HTML as an
extra taint type, and make that an acceptable thing for parser tags to return,
but not other things.

But this fixes the failures for the Listings extension, so I think it's
worthwhile in the meantime.

Change-Id: Idf87f5c3dcf81dd210de73a4ff15e3b1aabd9f89
2018-08-30 21:46:10 -07:00
daniel
e9f71517f7 [MCR] Introduce RevisionRenderer
RevisionRenderer is the MCR replacement for Content::getParserOutput,
as outlined in <https://www.mediawiki.org/wiki/User:Daniel_Kinzler_(WMDE)/MCR-PageUpdater>.

Note: This change also introduces quite a bit of code for
merging ParserOutput objects.

Bug: T194048
Change-Id: I871978bf79f67c9e7954fb3fc8528d6e365f2cc1
2018-08-30 19:15:12 +02:00
jenkins-bot
ba6c827485 Merge "Apply content wrapping in ParserOutput::getText()"
2018-08-29 16:25:22 +00:00
daniel
0dc7ba02b4 Apply content wrapping in ParserOutput::getText()
Instead of applying wrapping in the parser and unwrapping in
ParserOutput::getText(), turn this around and apply wrapping in getText(),
and only if desired.

This avoids search&replace logic for unwrapping, and it also makes it a lot
easier to merge the output of multiple slots for MCR output.

This changes behavior in two hopefully irrelevant ways:
1) the limit report comments will be inside the wrapper div, instead of
following it.
2) if HTML with a wrapper div is explicitly injected into a ParserOutput
object, it will not be possible to unwrap the text.

Bug: T174035
Change-Id: I1641b7995af9bd297f1acd610d583fbf874f34e0
2018-08-29 16:46:25 +02:00
jenkins-bot
e548e0f35c Merge "Make interwiki transclusion use the WAN cache"
2018-08-27 21:17:04 +00:00
Aaron Schulz
6504e23074 Make interwiki transclusion use the WAN cache
This means that now:
* Entries actually get deleted when expired
* The transclusion cache is shared across wikis
* Large blobs that do not fit in cache no longer cause DB errors
* DB writes are not triggered on GET requests
* Keys are hashed and no longer need to be so restrictive

Also, add a "check key" based purge system and process cache the
text/html values similar to how regular revision text is cached.

Bug: T189702
Change-Id: I8ac12b53c02bb26857175dd5a4af29d49e03dc33
2018-08-27 19:32:04 +00:00
Kunal Mehta
2852255186 Inject SpecialPageFactory into Parser
Change-Id: I6a6a94cbdafdc724ce02408cd9e744e7b3eda92b
2018-08-17 12:03:13 -07:00
Kunal Mehta
b7b8f214bb Parser: Call firstCallInit() in getTags/getFunctionHooks
So callers don't need to do this manually. Pointed out by Tim in T201799.

Depends-On: Ia6c36d5a650095e35093bf47e275e081e89b3daf
Change-Id: Ida62767f3ca53f99609cae01d3a20051bb92ccab
2018-08-14 14:16:42 -07:00
Kunal Mehta
e8370d6977 Parser: Add accessors needed by CodeMirror
Change-Id: Ia2d98baf6caed2cd779cb00aceba5785cf13d633
2018-08-13 22:44:48 -07:00