The code that normalizes line endings ("\r\n" and "\r" to "\n") and
trims trailing whitespace is buried in Parser::preSaveTransform(), and
was duplicated to TextContent in 96b6afb31d, as non-wikitext content
models should still be normalizing line endings.
This splits the duplicated code into
TextContent::normalizeLineEndings(), and utilize it in the Parser.
Additionally, expand the documentation of
TextContent::preSaveTransform() to document that subclasses should make
sure they normalize line endings during the PST stage.
And remove a useless rtrim() call from WikitextContent that did nothing.
Change-Id: I9094c671d4bbd23d75436f8f1d682d6dd6e6d2fc
rawurldecode was being run on unclosed internal links
which could allow an attacker to insert arbitrary
html into the page.
See also related: r13302
Bug: T137264
Change-Id: I4e112a9e918df9fe78b62c311939239b483a21f5
This does the same normalization of newlines that
Parser::preSaveTransform() does. This should be appropriate for any text
content type, especially considering that EditPage uses
WebRequest::getText() which does a less-strict version of this same
transformation.
This also cleans up the code for doing that newline replacement
to be a bit less verbose.
Bug: T142805
Change-Id: I462afcda502f031a8b0360d982ce2398a0383a96
Doxygen requires the full qualified name of the class in a comment
or in the @aram/@return annotation, otherwise the class isn't linked
in the resulting output[1]. This commit changes the LinkRenderer
annotations in SpecialPage and Parser to \MediaWiki\Linker\LinkRenderer.
[1] https://doc.wikimedia.org/mediawiki-core/master/php/classSpecialPage.html#a3560214f63fc2f20c63b4025db5cd81d
Change-Id: I74cedcd764a6053cc5a0c6d2eedbedb72651f57c
We have two hacks which are used when Tidy is not available: one in
Sanitizer::removeHTMLtags(), and the second here as a late Parser pass
equivalent to Tidy itself. But the Sanitizer one was enabled only if
MWTidy::isEnabled() returned false, whereas the Parser one was enabled
also when tidy was disabled in ParserOptions. This patch makes them both
consistent, it enables the bug 2702 hack only when MWTidy::isEnabled()
returns false, and when Tidy is disabled in parser options, the output
is simply passed through.
This allows tidying to be done separately on the ParserOutput, as is
required by the proposed ParserMigration extension (I24d0776a933fa3f).
Eventually the bug 2702 hack will be removed in favour of a pure-PHP
HTML 5 parser, but it looks like it is too early for that.
Change-Id: I94be6c9dec531c23ef80cb36732243bd6858bf22
* Instead of having messy code to create a hidden HTML
comment of English strings at the bottom of the page,
expose the structured data of the parse information
to JS so tools can use it.
* Make makeConfigSetScript() use pretty output so these
variables are also easy to read in "view source".
* Remove ParserLimitReportFormat hook, since the data
is not formatted to HTML anymore.
Bug: T110763
Change-Id: I2783c46c6d80f828f9ecf5e71fc8f35910454582
We originally imagined rolling out the display of empty elements
simultaneously with the Html5Depurate, but now we have added support for
marking empty elements to Html5Depurate and plan on having some sort of
longer migration period. So, move the relevant CSS to content.css, and
remove the concept of CSS dependant on tidy driver.
Add a body class which will allow the effect to be toggled in a gadget or
extension. Actual toggling in the CSS will be in the stage 2 patch, to be
deployed after the varnish cache and parser cache have expired.
I originally imagined that there would be a gadget that overrides the
rule with an !important selector, but that method does not allow you to
recover the original display property, which is often overridden by the
style attribute or site CSS to be "inline".
Also, in RaggettWrapper, switch to the new class mw-empty-elt, following
Html5Depurate, instead of mw-empty-li. The old class will be removed in
the stage 2 patch.
Change-Id: Ic0f432c43a006629ca5a1a7c2dda3552ceb4dc4f
Some pages use constructs like `<b/>` or `<span/>` to protect spaces or
special characters at the beginning/end of templates. This syntax is
incompatible with HTML5 parsing rules, which dictate that these should
be treated as open tags, and instead rely on an unusual quirk of the
`tidy` program that removes invalid constructs.
This syntax is deprecated as part of the process of reconciling `tidy`
with modern HTML5 parsing semantics. Authors can use ` ` or `<nowiki/>`
as valid replacements.
In order to provide time to transition existing content, pages using
self-closing tags in violation of the HTML5 parsing specification
will have their templates/pages added to a new tracking category.
After these uses are fixed, we will change the sanitizer to treat these
as normal open tags, to be consistent with the HTML5 parsing spec.
Note that this construct is already disallowed if tidy is disabled; it
is rendered as `<b/>`. We add a tracking category in the no-tidy
case as well, in preparation for eventually making the no-tidy and
with-tidy behaviors consistent.
Bug: T134423
Change-Id: Ie1cf3aa40d5483bf395ece539f0240b694ff04ab
During both the edit stash and first parse in on page save,
guess what the rev_id will be and use that instead of null.
Only reparse if it turns out to be wrong. This avoids extra
parsing on wikis that have low-medium traffic, and does not
cost much. The parsing that can be avoided is:
a) in doEditContent() by using the stash
b) in doEditUpdates() by using the doEditContent() result,
whether that was able to use the stash or not itself
Also improved the parse operation logging in save paths.
Bug: T137900
Change-Id: Ic6faae70a78b4e223e4d3585cefd482c0fa00677
And SpecialPage::setLinkRenderer(), so the Parser can pass on its
LinkRenderer instance for when special pages are being included in a
page.
Change-Id: If9a9c648ab670b824ce534e7cf0d20d41e1bfd12
In galleries, bad images are rendered as links. This causes the same behavior
to occur in wikitext, rather than the current behavior of not rendering
anything.
Change-Id: I1a074bff7cb661b5b4e6db9503eb6a5de702ee2f
Few maintained extensions still rely on this and it is
bad practice to use this for handling cache correctness.
Change-Id: I2de481198bbff5c4f3dd81fc6d1b137e4c37b93f
Previously {{Special:Foo}} would cause parser cache to be disabled,
now have a method in SpecialPage to control this behaviour and set
arbitrary caching times.
Note: This does not affect caching of direct views to the special page
The new default is now disabling cache if not in miser mode,
otherwise setting to 1 hour, except for Special:Recentchanges
and Special:Newpages which set to 5 minutes. These values are
possibly really low, but for now I think best to be close to the
old behaviour. We had 0 caching for these things for years, and
afaik it hasn't caused any big issues. Part of me wonders if
Special:Recentchanges should stay at 0, but that sounds crazy.
This change also causes transcluded special pages to not be
"per-user" if they are being cached (Specificly $wgUser et al
become 127.0.0.1).
Bug: 60561
Change-Id: Id9ce987adeaa69d886eb1c5cd74c01072583e84d
Previously, no TTL at all was used, which is quite harsh on
performance and had downstream effects like disabling edit
stashing for affected pages.
Bug: T136678
Change-Id: I2462057aa189cfb05fe65d0b3c081a9fd10066a2
* Do not change the result to a null editing user anymore.
* Use a new vary-user flag instead of vary-revision. This
will only cause a reparse on null edits. Normal edits
can still use the prepared output now.
* Edit stashing now applies for pages with this magic word.
* Fixed bug where the second prepareContentForEdit() call
(due to vary-X flags) would still check the edit stash.
Bug: T135261
Bug: T136678
Change-Id: Id1733443ac3bf053ca61e5ae25db3fbf4499e9f9
Just always use the input size for new revisions. If they are
saved, then that should be the revision size. If they are just
null edits, then the size must have matched the current revision.
This also enables edit stashing for this case.
Change-Id: I428c0cc87750eeddd1d7dcebd1a2b03817cec441
Extensions shouldn't be calling this, just the Parser, so make it
protected. And since the only caller passes an empty array for $query,
we can just remove it entirely.
Change-Id: I3adbcaabbb40870eb3df1495c3c2743ff21f0c64
noreferrer is used as support for noopener is very limited.
This is to prevent the attack detailed at
https://mathiasbynens.github.io/rel-noopener/ where you can
navigate the parent window, even if the new window is a cross-origin.
Bug: T133507
Change-Id: I6e4ab938861e246ff44048077b94847e303f1859
Signed-off-by: Chad Horohoe <chadh@wikimedia.org>
Strip markers get substituted for general html, which means the
substitution text general does not escape quote characters. If
someone can convince MW to put a strip marker in an attribute,
you can get around escaping requirements that way. This patch
adds the characters `"' to the strip marker text. At least one
of these characters should be escaped inside attributes (regardless
of what quote character you use for attributes), thus normal html
escaping will deactivate the strip markers, preventing the
vulnrability.
This will break any extension that escapes input with htmlspecialchars,
to add to html/half parsed html output, but assumes that strip markers
are unmangled. I don't think its very common to do this. The primary
example I found was some core usages of Xml::escapeTagsOnly(). (And
even in that case, it only affected the corner case of being called
via {{#tag:..}})
Based on MatmaRex's suggestion.
Change-Id: If887065e12026530f36e5f35dd7ab0831d313561
Signed-off-by: Chad Horohoe <chadh@wikimedia.org>
It's independent of the rest of the Parser, but quite intrusive, with
its own instance variables and several private functions. It's also
pretty big (500 lines).
I removed a few functions from Parser here which were always marked
@private in the doc comment, but were inappropriately marked
"public" in the function declaration after migration to PHP 5. I grepped
core and deployed extensions and found no callers.
The helper functions are now all private, and the constructor is
private, with just a single public static entry point, reflecting
the self-contained nature of the module and its lack of hooks.
Change-Id: I1693ed48a9194719611b4afd9d989d44f0610f8d
* 55313f4e almost got it right, but missed the str_replacing table
headings.
* Thankfully, this was doubly broken before that patch since the
StringUtils::explodeMarkup would have skipped the || which would
go on to be explode by table cell attribute parsing. The test case
provided would look like,
<table>
<tr>
<th class="">|">ha</div> ho
</th></tr></table>
Suffice it to say, noone is using this in production.
* Note that we can't just entity encode the ! since that would break
style attributes with !important.
* Also note, Parsoid already gets this right.
* Adds a StringUtils::replaceMarkup
Change-Id: Iab3ae4518fcb307b795d57eece420ba48af0a3bf
I searched for /\$(\S+) = (.+?\(.*?\);)\n.*?\$\1\[/, ignored
everything involving isset(), unset() or array assigments, then
skimmed through the remaining results and changed things where they
made sense. These changes were not automated, so please review them.
Change-Id: Ib37b4c66fc57648470f151ad412210b3629c2538
* At that point, element attributes are already escaped so it serves no
purpose. Before `doTableStuff` is called, `Sanitizer::removeHTMLtags`
has been invoked which calls `Sanitizer::fixTagAttributes` which
calls `Sanitizer::safeEncodeTagAttributes` and finally gets down to
`Sanitizer::safeEncodeAttribute`, with the goal of "extra armoring
against further wiki processing."
Change-Id: Ieeb9b21148c2909eb839d13195d7d10012b48e3b
* Currently, for images:
[[File:Foobar.jpg|hi|alt=100|ho]]
caption: ho
but for galleries:
<gallery>
File:Foobar.jpg|hi|alt=100|ho
</gallery>
caption: hi|ho
* This patch brings some consistency to them.
Change-Id: I3b73189b27cc35fade4809477cf18779b953aa3b
It is desirable in terms of user-friendly syntax to display an empty
list item if the user adds one to the source. However, we suspect that
this change will break the rendering of existing templates. So, preserve
the empty <li> element, but style it with display:none so that there is
no user-visible change. Changes can then be observed with a user script,
then eventually the CSS can be removed so that the desired behaviour will
be user visible.
This is imagined as a staged deployment of T89331, i.e. it is better to
resolve differences with Html5Depurate one at a time instead of
deploying it all at once.
The CSS module is specified in parser/MWTidy.php since the tidy driver
hierarchy is not meant to be so closely tied to the MW environment.
Bug: T49673
Change-Id: Ifb44b782c617240e3de73dcdf76c8737c7307d94
* Setting mCacheTime to -1 is for old callers that
only check getCacheTime() instead of getCacheExpiry().
Most of them are already broken (WikiLog/SemanticForms) as
they check for -1 which is in fact never returned
due to the TS_MW conversion in Parser::getCacheTime.
* By using -1, the value of page_links_updated can end up
as 1969, which is confusing and broken.
Change-Id: I8809a4258eacff05992a2c27ade7f6a0c1731c51
The warnings are only shown during preview. It seems silly to
split the parser cache for this. There should be no parser cache
pollution to just using the user language without registering it
for use.
See also: 889e988cce
Change-Id: Ib42e8885e23a3c8bef8cf72948359d71254064c3
This doesn't fix all the files under includes/parser -
some of them deserve their own patches.
Bug: T102614
Change-Id: I2fcbc19ee337e1b7db4635b5e5f324c651b4d144
Move the added module from Parser.php to TraditionalImageGallery,
because there the gallerybox class is added to the html and at the
moment all core image galleries are extending the traditional one.
That brings the styles back for special pages like Special:NewFiles,
Special:MostImages and also on category pages with media files.
Follows Ib1aef04dc4fece78e6615386ecaef6a9f368f49e
Bug: T113511
Change-Id: I32697c2c65824d7622c1840330d6074ebb68b488
MWTimestamp::getTimezoneString() returns the timezone name as a message,
that supports wiki localization. The code is moved from Parser::pstPass2.
The default file revert message is currently always in UTC.
This patch sets the default timestamp to be in the wiki timezone (similar
to ~~~~). The timezone is passed as a new parameter to the message, with
the date / time parameters being merged and handled by
$wgContentLang->timeanddate
Bug: T36948
Change-Id: I48772f5f3b1635d33b6185776cedfc4ee1882494