Some versions of html-tidy (e.g. the one currently in use on WMF wikis)
will try to move all <style> tags in the body into the head, effectively
removing them for our purposes. We need to avoid that for TemplateStyles.
Bug: T167349
Change-Id: I133776d16f366cad73ed30af0e5a665fdf9f5ed9
In 1bf5a652 the id selector was changed to a class selector for toctitle.
The cached HTML has since expired, so the id selector is no longer
necessary.
Also remove the id selector #toc.tochidden from the print styles. It is not
necessary because the tochidden class is only added to .toc, not to #toc.
Change-Id: I43cfffdb0807e8ed8f6b7b8732ba857b709bee80
This revises 2877402276, which was
reverted in master due to unexpected issues with `-{{...}}` markup
on translatewiki and enwiki. Test cases are added to ensure that this
is parsed as a template, not as language converter markup.
https://www.mediawiki.org/wiki/Preprocessor_ABNF is the canonical
documentation for the preprocessor; this will be updated after this
patch is merged. The basic principles described in that page are
maintained in this patch:
* Rightmost opening structure has precedence: `-{{` is parsed as a
dash followed by template opening.
* `{{{` has precedence over `{{` and `-{`: `-{{{{` is parsed as
`-{` `{{{` since we first grab the rightmost `{{{`.
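As a rough illustration of the two precedence rules listed above, the
following Python sketch splits an optional dash plus a run of opening
braces, working from the right. It is not the preprocessor's code: the
function name `split_opening_run` is hypothetical, and the sketch ignores
closing braces entirely, which the real preprocessor also takes into
account.

```python
def split_opening_run(run):
    """Split an optional '-' followed by '{' characters into opening tokens.

    Illustrative only: braces are consumed from the right, preferring
    '{{{' over '{{', and a single leftover '{' pairs with a leading '-'
    to form '-{'.
    """
    has_dash = run.startswith("-")
    braces = run[1:] if has_dash else run
    if not braces or set(braces) != {"{"}:
        raise ValueError("expected an optional '-' followed by one or more '{'")

    count = len(braces)
    tokens = []  # collected right to left, reversed before returning
    while count >= 2:
        size = 3 if count >= 3 else 2
        tokens.append("{" * size)
        count -= size

    if count == 1:
        # One brace left over: it joins the dash if there is one.
        tokens.append("-{" if has_dash else "{")
    elif has_dash:
        # No brace left for the dash, so it stays a literal dash.
        tokens.append("-")

    tokens.reverse()
    return tokens


print(split_opening_run("-{{"))    # dash, then template opening
print(split_opening_run("-{{{{"))  # '-{', then '{{{'
```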
A bunch of test cases were added to verify the "ideal precedence"
order described on that wiki page.
This patch introduced some minor incompatibilities in existing
markup, in particular with chemical formulae in templates.
Fixes for these are being tracked at
https://www.mediawiki.org/wiki/Parsoid/Language_conversion/Preprocessor_fixups
Bug: T146304
Bug: T153761
Change-Id: I2f0c186c75e392c95e1a3d89266cae2586349150
With TimedMediaHandler in video.js mode, videos can be inline,
without a wrapper div.
Previously, in this mode, two paragraphs would end up merged into one
if one of them contained a video, because BlockLevelPass matched
"<track .../>" against "<tr" in its regexes.
Added \b to a couple of the regexes to protect against such errors,
and corrected a parser test case that had bad output listed, where
"<link .../>" matched against "<li".
Bug: T165817
Change-Id: I06e82b881f5ebddae5e7df7fb940adfa54f6b659
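The effect of the \b change for BlockLevelPass described above can be
reproduced with a small Python sketch. The two patterns below are
simplified, hypothetical stand-ins, not the regexes used in the PHP code.

```python
import re

# Hypothetical, simplified patterns; the real ones cover many more tags.
LOOSE = re.compile(r"<(tr|li)", re.IGNORECASE)
STRICT = re.compile(r"<(tr|li)\b", re.IGNORECASE)

for tag in ("<tr>", "<track src='x.vtt'/>", "<li>", "<link rel='x'/>"):
    # Without \b, "<track" and "<link" are mistaken for "<tr" and "<li".
    print(tag, bool(LOOSE.match(tag)), bool(STRICT.match(tag)))
```

The word boundary only succeeds when the tag name ends right after the
matched letters, so longer tag names that share a prefix no longer match.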
This will allow CSS to target just the parser output, without also
accidentally targeting the edit form, diff tables, and so on.
Bug: T37247
Change-Id: If4eb5bf71f94fa366ec4eddb6964e8f4df6b824a
Depends-On: I330c6aa4aaee045614b1801ed34bc9e03be69650
Depends-On: I52a518fa44e017841fe78474012cd69823e0a41d
* This was introduced in 4d3446a8e3, when galleries were tables.
05579cf0e6 later switched them to ul's, but missed updating the
sanitization.
* As an example, the test shows that summary is currently wrongly
permitted.
Change-Id: I8c52477dc65499d0c8a1ee5cc661a5f9ae78cc07
Self-links are still semantically links, and representing them as <strong>s
is inelegant and, more importantly, a real pain to work with, especially in
contexts where they may change state (like inside an editor).
Instead, render them as <a>, with no href to avoid user agent style
overrides, and with a class to style them as before, named 'mw-selflink' to
go with 'mw-redirect'. This allows much easier adjustment later. The old CSS
class 'selflink' is retained for backwards compatibility, but deprecated.
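A minimal Python sketch of the markup described above. The function name
`render_self_link` and the order of the two classes in the attribute are
assumptions for illustration; the actual output comes from the PHP linker.

```python
from html import escape


def render_self_link(text):
    """Render a self-link as an <a> without href (illustrative sketch)."""
    # 'mw-selflink' is the new class; 'selflink' is kept for old styles.
    return '<a class="mw-selflink selflink">' + escape(text) + "</a>"


print(render_self_link("Main Page"))
```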
Bug: T160480
Change-Id: If058843924c3b30c116df2520aef93a004d98a5d
This file seems to be a stress-test for the MediaWiki preprocessor.
It doesn't really matter whether the messages referenced here exist.
As messages are occasionally renamed or deleted, and since this file
was generated in 2011, people keep getting confused when they grep
for a message name and run into this list (and sometimes needlessly
spend their time updating this file, as seen in its Git history).
This commit replaces all of the message names with their SHA1 hash
truncated to 8 hex characters.
Regexps used for matching:
(?<=\?title=MediaWiki\:)([^&{}<>|\[\]]+)
(?<=int:)([^&{}<>|\[\]]+)
(?<=\[\[MediaWiki_talk:)([^&{}<>|\[\]]+)
(?<=action=edit )([^&{}<>|\[\]]+)
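The replacement can be sketched in Python using the four patterns above.
This is an assumption about how such a rewrite could be scripted, not the
tool that was actually used; the helper names `short_hash` and `anonymize`
and the UTF-8 encoding of the name before hashing are hypothetical.

```python
import hashlib
import re

# The four lookbehind patterns quoted in the commit message.
PATTERNS = [
    r"(?<=\?title=MediaWiki\:)([^&{}<>|\[\]]+)",
    r"(?<=int:)([^&{}<>|\[\]]+)",
    r"(?<=\[\[MediaWiki_talk:)([^&{}<>|\[\]]+)",
    r"(?<=action=edit )([^&{}<>|\[\]]+)",
]


def short_hash(name):
    """Return the SHA1 hex digest of name, truncated to 8 characters."""
    return hashlib.sha1(name.encode("utf-8")).hexdigest()[:8]


def anonymize(text):
    """Replace every matched message name with its truncated hash."""
    for pattern in PATTERNS:
        text = re.sub(pattern, lambda m: short_hash(m.group(1)), text)
    return text


print(anonymize("{{int:mainpage}}"))
```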
Change-Id: I52a71c0cc0e6fa21a61420d52df755066c6e9a08
U+0000 is not allowed in HTML5, so there's no reason to allow it in wikitext.
It simplifies our code if we can just strip these characters at the start.
Strip in PST as well so they don't sneak into our database either.
Tweaked the EXT_LINK URLs to account for the fact that invalid characters
get transformed into U+FFFD when using Preprocessor_DOM. See 73649741ed
(r65967) for context on that change.
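The stripping step itself is small. A Python sketch, where `strip_nul` is
a hypothetical name and the real change is made in the PHP parser:

```python
def strip_nul(text):
    """Remove every U+0000 character from the input text."""
    return text.replace("\x00", "")


print(repr(strip_nul("foo\x00bar")))
```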
Bug: T159174
Change-Id: I3f67e92b61aacc87a40c3662085c84d1dac08bfb
The option says "enable subpages (disabled by default)", but it
currently just enables subpages for namespaces 0 and 2. This tripped me
up when writing some parser tests for TemplateStyles where I need
subpages enabled for namespace 10.
There's probably no reason not to have it enable subpages for all
namespaces.
Change-Id: Icf864dafc4208a76af7b3e71f5f9c97576c065b7
It's unreasonable to expect newbies to know that "bug 12345" means "Task T14345"
except where it doesn't, so let's just standardise on the real numbers.
Change-Id: I46261416f7603558dceb76ebe695a5cac274e417
It's unreasonable to expect newbies to know that "bug 12345" means "Task T14345"
except where it doesn't, so let's just standardise on the real numbers.
Change-Id: I3eeffe40e0a752e1e3c79e65fa2fb556950d9a24
When parsing a single line definition list, we track nested tags so that:
; <b>foo:bar</b>: baz
breaks before `baz`, not between `foo` and `bar`. But we currently bail
out of this algorithm entirely if we see a mismatched close tag. We should
just ignore the unmatched tag, like Parsoid does.
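The intended behaviour can be sketched in Python. This is a simplified
model under stated assumptions, not the code in the parser: the function
name `find_definition_colon` and the tag regex are hypothetical, and the
input is the term portion of the line without the leading `;`.

```python
import re

# Hypothetical, simplified tag matcher for the sketch.
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def find_definition_colon(line):
    """Return the index of the first ':' outside any open tag, or -1."""
    stack = []  # names of currently open tags
    pos = 0
    while pos < len(line):
        match = TAG.match(line, pos)
        if match:
            is_close = match.group(1) == "/"
            name = match.group(2).lower()
            if not is_close:
                if not match.group(0).endswith("/>"):
                    stack.append(name)
            elif stack and stack[-1] == name:
                stack.pop()
            # Otherwise: an unmatched close tag, which is simply ignored
            # instead of aborting the whole search.
            pos = match.end()
            continue
        if line[pos] == ":" and not stack:
            return pos
        pos += 1
    return -1


term = "<b>foo:bar</b>: baz"
print(term[:find_definition_colon(term)])
```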
Change-Id: I6306dcad6347abeb6ab001d35562f1ab9f374bd1
Given the wikitext:
;-{zh-cn:AAA;zh-tw:BBB}-
Prevent `doBlockLevels` from trying to split the definition list at the
embedded colon and using `AAA;zh-tw:BBB}-` as the `<dd>` portion.
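A sketch of the idea in Python: colons inside `-{...}-` are skipped when
looking for the split point. The function name `find_colon_outside_lc` is
hypothetical, and the sketch ignores everything else `doBlockLevels`
handles, such as nested tags.

```python
def find_colon_outside_lc(line):
    """Return the index of the first ':' outside -{...}- markup, or -1."""
    depth = 0  # nesting depth of language converter markup
    i = 0
    while i < len(line):
        if line.startswith("-{", i):
            depth += 1
            i += 2
        elif depth > 0 and line.startswith("}-", i):
            depth -= 1
            i += 2
        elif line[i] == ":" and depth == 0:
            return i
        else:
            i += 1
    return -1


print(find_colon_outside_lc("-{zh-cn:AAA;zh-tw:BBB}-"))
```

With the wikitext from the example, no split point is found, so the whole
line stays in the term.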
Bug: T153135
Change-Id: I3a4d02f1fbd0d0fe8278d6b7c66005f0dd3dd36b
The first newline was missing so a block like:
!! hooks
source
!! endhooks
would turn into:
!! hookssource
!! endhooks
Change-Id: I2a4c5e52050d55fb0c9b4f5d0494eb00e34b233c
Previously for input -{<span title="-{X}-">X</span>}-, the converter
sees -{<span title="-{X}-">A</span>}-, so <span title="-{X
becomes the content in the first block, and a stray }- is left to output.
Now, the converter sees -{<span title="-{X}-">A</span>}- with
this change. In further processing, the span tag may be parsed and have
its title attribute converted. For cases where the content is not processed
further (e.g. "R" = raw flag), "-{X}-" is left as is in the attribute, which
is not ideal, but at least it's better than the original stray }-
outside the whole tag.
Change-Id: Idbaaf53f914f362e5b8cc9fad02a524f8d591bb7
The leading spaces on the link only cause us problems, such as for the
$noforce check 20 lines later.
Bug: T129218
Change-Id: I93a8da1f73b38fa3da362f8f27479b3039ed3f13
Make sure it returns the default content language on pages where the
language is not explicitly set.
Bug: T59603
Change-Id: I7b1437bf1650166c8be77e5bd84181c577961f27
This effectively reverts commit 2877402276 in
order to unblock the deploy train. The underlying behavior might not be
incorrect, but it was unexpected.
Bug: T153761
Change-Id: Ifc9c7cf3482dd5d222ff4da24a6d4cc401e9d965
A "remove HTML tags to avoid disrupting the layout" block is removed
(previously added in f16d1e4ed7).
This is a follow-up to I9b099273203482ffb570a5654d8ba50c833e526d.
Bug: T54192
Change-Id: I565fac58b3b0da7bfaedf64f5001c364f52e2244
This also protects naked external links, which are internally surrounded by
`-{R|...}-` by LanguageConverter::markNoConversion.
Originally found in failed tests in I7fa2d85d6.
Bug: T54190
Change-Id: I9b099273203482ffb570a5654d8ba50c833e526d
A protected version of explode is factored out as
`StringUtils::delimiterExplode`, since it will be used in follow-up
patches in this series. The `delimiterExplode` implementation creates
an intermediate array of the exploded results, which is reasonable as
the number of image options is small; but since an Iterator is
returned the implementation can be upgraded in the future (at the cost
of additional complexity) to avoid this. The additional code in that
case would be similar to ExplodeIterator.
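To illustrate the approach, here is a Python sketch of a delimiter-aware
explode that builds an intermediate list and returns an iterator over it.
It is modelled on the description above only; the parameter names and the
`nested` flag are assumptions and may not match the PHP signature.

```python
def delimiter_explode(start, end, sep, subject, nested=False):
    """Split subject on sep, except inside start...end delimiters."""
    if not sep:
        raise ValueError("separator must not be empty")
    pieces = []  # intermediate list, fine for a handful of options
    depth = 0
    piece_start = 0
    i = 0
    while i < len(subject):
        if subject.startswith(start, i) and (nested or depth == 0):
            depth += 1
            i += len(start)
        elif depth > 0 and subject.startswith(end, i):
            depth -= 1
            i += len(end)
        elif depth == 0 and subject.startswith(sep, i):
            pieces.append(subject[piece_start:i])
            i += len(sep)
            piece_start = i
        else:
            i += 1
    pieces.append(subject[piece_start:])
    # Returning an iterator leaves room for a lazy implementation later.
    return iter(pieces)


print(list(delimiter_explode("-{", "}-", "|", "thumb|-{R|foo}-|left")))
```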
Bug: T146305
Change-Id: I1327685e9e8c07ef476dceaa6f6dae4ba40989ef
This ensures that `{{echo|-{R|foo}-}}` is parsed correctly as
a template invocation with a single argument, not as two separate
arguments split by the `|`.
Bug: T146304
Change-Id: I709d007c70a3fd19264790055042c615999b2f67