This avoids a double parse when the edit stash is not used,
which can be confirmed via the SaveParse log for a page
using {{REVISIONID}} when edit stashing is disabled. This
now matches the reuse for the edit stash hit case.
Change-Id: I405c39d4d7ac04e39fbdfe400f73238b734c7833
This code is functionally identical, but less error prone (not so easy
to forget or mix these numerical indexes).
This patch happens to touch the Parser, which might be a bit scary. We
can remove this file from this patch if you prefer.
Change-Id: I8cbe3a9a6725d1c42b86e67678c1af15fbc5961a
This only applies to content namespaces for now since
the cost of vary-revision-id is much less of a concern.
The potential to harm page save time is far worse than what
use they have, which is almost entirely just hacks to check
for preview mode. These have nothing to do with the actual
revision ID nor timestamp itself. They simply check whether
the value is the empty string. Since this magic word still
only returns an empty string in preview mode, such checks
will keep working.
Bug: T137900
Depends-on: I1809354055513a5b9d9589e2d6acda7579af76e2
Change-Id: Ieff8423ae3804b42d264f630e1a029199abf5976
It's a temporary feature flag not included in any release, just
removing it outright. The functonality will now be always enabled.
Bug: T205040
Change-Id: Ia9da82e6f6b2d270f1790a99fc8c35ad5e6aee5e
HTML doesn't allow certain semicolon-less HTML entities in attribute
values to avoid breaking legacy markup like:
<a href="http://example.com?foo¶m=bar">...</a>
(Note that the & in that URL is not properly entity-escaped as `&`.)
Unlike wikitext, HTML generally allows semicolon-less legacy entities
in text.
Our alt and link option processing shove text through
Sanitizer::stripAllTags, which does entity decoding including these
legacy semicolon-less entities. Wikitext doesn't allow semicolon-less
entities, so escape & characters where appropriate to protect alt/link
options and avoid breaking URLs.
This was a "regression" in how alt options were handled starting in
ddb4913f53 when we switched to using
Remex for Sanitizer::stripAllTags -- semicolon-less entities (previously
invalid in wikitext) were now being decoded when stripAllTags was
called on alt text. This change became a problem when
ad80f0bca2 sent link option text through
Sanitizer::stripAllTags (with the new semicolon-less entity decode)
instead of PHP's strip_tags (which, in addition to its other faults,
doesn't do entity decode at all). This suddenly started decoding
"non-wikitext" entities like `¶` inside URLs, breaking links.
Filed T210437 as a follow-up to consider changing the behavior
of Sanitizer::stripAllTags() globally to prevent it from decoding
semicolon-less entities for all callers.
Bug: T209236
Change-Id: I5925e110e335d83eafa9de935c4e06806322f4a9
This adds a method to LinkFilter to build the query conditions necessary
to properly use it, and adjusts code to use it.
This also takes the opportunity to clean up the calculation of el_index:
IPs are handled more sensibly and IDNs are canonicalized.
Also weird edge cases for invalid hosts like "http://.example.com" and
corresponding searches like "http://*..example.com" are now handled more
regularly instead of being treated as if the extra dot were omitted,
while explicit specification of the DNS root like "http://example.com./"
is canonicalized to the usual implicit specification.
Note that this patch will break link searches for links where the host
is an IP or IDN until refreshExternallinksIndex.php is run.
Bug: T59176
Bug: T130482
Change-Id: I84d224ef23de22dfe179009ec3a11fd0e4b5f56d
Future parsers will not support the output generated with tidy disabled.
Parser tests using untidied output will also be deprecated (and
rewritten) in a follow-up patch.
No new release notes necessary since user-visible tidy configuration
was deprecated previously (in 1.32), and individual methods which had
disabled tidy during execution were individually release-noted as they
were updated.
Bug: T198214
Depends-On: I0f417f75a49dfea873e9a2f44d81796a48b9f428
Depends-On: If5c619cdd3e7f786687cfc2ca166074d9197ca11
Change-Id: I592e0e0dfef7d929f05c60ffe4d60e09725b39cc
Previously, they were always displayed in defult language unless
forced explicitly in wikitext, e.g. [[File:Foo.svg|lang=ru]].
This change adds a feature flag that would enable always trying to
display in page language.
* If enabled, Parser will pass a new parameter - 'pagelang' - to
the media handler.
* SvgHandler uses page language when determining what language to
render the image in.
* 'pagelang' can always be overridden by 'lang'.
* If no translation in page language is available, the default
language (English) will be used for thumbnail URLs, to prevent
cluttering media storage and HTTP caches with useless copies.
Performance: this requires accessing image's metadata during parsing.
My testing indicates there were no code path where this wasn't the
case already, so no performance hit is expected, however we should
still keep an eye on page save performance.
Bug: T205040
Change-Id: I348840ef405e1370cc0c17d69051bce30153c9c0
Use Parser::stripAltText() consistently to handle link and alt options
in both Parser::makeImage() and Parser::renderImageGallery(). This
ensures that link option text can use <nowiki> to escape problematic
text so that (for example) the following works:
```
[[File:Foobar.jpg|link=<nowiki>a''b''c</nowiki>|alt=<nowiki>a''b''c</nowiki>]]
<gallery>
File:Foobar.jpg|link=<nowiki>a''b''c</nowiki>|alt=<nowiki>a''b''c</nowiki>
</gallery>
```
Previously the handling of the link option in
Parser::renderImageGallery() used a bespoke `strip_tags` invocation
which didn't replace <nowiki>s, and the handling of the link option in
Parser::makeImage() didn't strip tags at all, nor did it replace
<nowiki>s. For example, in Parser::makeImage() double quotes in
titles would be converted to embedded `<i>` tags before being passed
to Parser::parseLinkParameter(), with predictably poor results.
Tests added to confirm behavior of alt/link with HTML-escaped
entities and <nowiki>s exposed a bug in Remex: T207088. Tests
will fail on PHP 7.0 until that is fixed.
Bug: T206940
Depends-On: Ide67bba20f771868c0e119cb2874464dcf1d758a
Change-Id: Ife4c0edaa85e0cb294c5d4c1e31d5d7d828d9df4
This injects the new, unsaved RevisionRecord object into the Parser used
for Pre-Save Transform, and sets the user and timestamp on that revision,
to allow {{subst:REVISIONUSER}} and {{subst:REVISIONTIMESTAMP}} to function.
Bug: T203583
Change-Id: I31a97d0168ac22346b2dad6b88bf7f6f8a0dd9d0
This replaces the builtin taints that are removed in
Ic1e1983a51c. Additionally, parse will no longer warn about
double escaping - there's many situations where such warnings
are wrong (e.g. Using Html::rawElement()). However this also
means that Parser::parse( wfMessage( 'foo' )->parse() ); will
no longer give a double escaping warning, which is unfortunate.
Bug: T202380
Change-Id: Ia52d37411beb62b112c6ff102438063c3d750769
This is not strictly accurate, because Parser::internalParse() actually
returns half-parsed HTML, which is not safe for output. But it is safe for
output from a parser tag.
Maybe phan-taint-check plugin needs to learn about half-parsed HTML as an
extra taint type, and make that an acceptable thing for parser tags to return,
but not other things.
But this fixes the failures for the Listings extension, so I think it's
worthwhile in the meantime.
Change-Id: Idf87f5c3dcf81dd210de73a4ff15e3b1aabd9f89
RevisionRenderer is the MCR replacement for Content::getParserOutput,
as outlined in <https://www.mediawiki.org/wiki/User:Daniel_Kinzler_(WMDE)/MCR-PageUpdater>.
Note: This change also introduces quite a bit of code for
merging ParserOutput objects.
Bug: T194048
Change-Id: I871978bf79f67c9e7954fb3fc8528d6e365f2cc1
Instead of applying wrapping the the parser and unwrapping in
ParserOutput::getText(), turn this around and apply wrapping in getText(),
and only if desired.
This avoids search&replace logic for unwrapping, and it also makes it a lot
easier to merge the output of multiple slots for MCR output.
This changes behavior in two hopefully irrelevant ways:
1) the limit report comments will be inside the wrapper div, instead of
following it.
2) if HTML with a wrapper div is explicitly injected into a ParserOutput
object, it will not be possible to unwrap the text.
Bug: T174035
Change-Id: I1641b7995af9bd297f1acd610d583fbf874f34e0
This means that now:
* Entries actually get deleted when expired
* The transclusion cache is shared across wikis
* Large blobs that do not fit in cache no longer cause DB errors
* DB writes are not triggered on GET requests
* Keys are hashed and no longer need to be so restrictive
Also, add a "check key" based purge system and process cache the
text/html values similar to how regular revision text is cached.
Bug: T189702
Change-Id: I8ac12b53c02bb26857175dd5a4af29d49e03dc33
So callers don't need to do this manually. Pointed out by Tim in T201799.
Depends-On: Ia6c36d5a650095e35093bf47e275e081e89b3daf
Change-Id: Ida62767f3ca53f99609cae01d3a20051bb92ccab