It was noticed that disk usage on the parser cache machines has been
increasing since shortly after wmf.4 was redeployed everywhere on the
9th. One theory is that I7fb9ffca9 causes this by making reparses for an
existing old-style cache entry start writing the new-style key where
they would previously have overwritten the old-style key. On that
theory, let's delete that old-style key (which should now be useless) on
save.
I'm assuming here that firing a blind delete for keys that probably
don't exist in the cache (i.e. every new edit) isn't going to hurt
anything. If that's not the case, we'd need to check existence before
deleting.
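A minimal sketch of the save-time cleanup described above. It assumes a dict-backed stand-in for the object cache; `InMemoryCache`, `save_parser_output()` and the `old:`/`new:` key layouts are hypothetical and are not the real ParserCache API or key format. It shows why a blind delete is acceptable: removing a key that was never stored is simply a no-op.

```python
class InMemoryCache:
    """Hypothetical dict-backed stand-in for the parser cache's object cache."""

    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value

    def delete(self, key):
        # A blind delete: removing a key that was never stored is a no-op.
        self.data.pop(key, None)


def save_parser_output(cache, page_id, options_hash, html):
    # Hypothetical key layouts; the real ParserCache keys look different.
    new_key = f"new:{page_id}:{options_hash}"
    old_key = f"old:{page_id}:{options_hash}"
    cache.set(new_key, html)
    # Drop the now-useless old-style entry so it stops occupying space.
    # For most saves (e.g. every new edit) this key never existed.
    cache.delete(old_key)
```

Whether such a delete is equally cheap on the production backend is exactly the assumption the commit message calls out.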
Bug: T167784
Change-Id: Ie5efb05722cb7da2a90da195a1f244468177175d
The handling of the 'editsection' option prior to I7fb9ffca9 was
unusual: it was included in the cache key, but the getter didn't ever
flag it as "used". This was overlooked in I7fb9ffca9.
This fixes the handling to restore that behavior. It's no longer
considered to be a real parser option, so changing it won't make
isSafeToCache() fail, and reading it won't flag it as 'used'.
But to keep Wikibase working (see T85252), if 'editsection' is supplied
in $forOptions, optionsHash() will still include it in the hash so
that whatever Wikibase is doing by forcing it doesn't break. When it is
included, the hash is the same as the one used in I7fb9ffca9, so existing
keys are reused.
Once optionsHashPre30() is removed, Wikibase should be changed to use
some other method to fix T85252 so we can remove that hack from
optionsHash().
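A rough sketch of the special case in optionsHash(), assuming options are a plain dict and the hash is a `!`-joined string of `name=value` parts. The function name `options_hash` and the exact encoding are hypothetical, not the real ParserOptions output. 'editsection' never takes part in the normal loop, but a caller that explicitly lists it in `for_options` still gets it appended.

```python
def options_hash(options: dict, for_options: list[str]) -> str:
    """Build a cache-key fragment from the options named in for_options.

    Hypothetical sketch: 'editsection' is not treated as a real option,
    so it is skipped in the normal loop and only appended when a caller
    (like the Wikibase workaround) asks for it explicitly.
    """
    parts = []
    for name in sorted(for_options):
        if name == "editsection":
            continue  # not a real parser option any more
        parts.append(f"{name}={options[name]}")
    # Compatibility hack: include 'editsection' only on explicit request.
    if "editsection" in for_options:
        parts.append(f"editsection={options.get('editsection', True)}")
    return "!".join(parts)
```

Keeping the hack confined to this one branch is what makes it easy to delete once Wikibase no longer relies on it.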
Change-Id: I77b5519c5a1122a1fafbfc523b77b2268c0efeb1
* ParserOptions is reorganized so it knows all the options and their
defaults, and can report whether the non-key options are at their
defaults.
* Definition of the "canonical" ParserOptions (which is unfortunately
different from the "default" ParserOptions) is moved from
ContentHandler to ParserOptions.
* WikiPage uses this to throw an exception if it's asked to cache
with options that aren't used in the cache key.
* ParserCache gets some temporary code to try to avoid a massive cache
stampede on upgrade.
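A minimal sketch of the first and third bullets: an options object that knows every option and its default, and a save path that refuses to cache when a non-key option has been changed. The class, the option set and `save_to_cache()` are hypothetical illustrations, not the real ParserOptions or WikiPage code.

```python
class ParserOptionsSketch:
    # Hypothetical option set and defaults, for illustration only.
    DEFAULTS = {"thumbsize": 300, "userlang": "en", "wrapclass": "mw-parser-output"}
    # Options that are part of the cache key; all others must stay at defaults.
    KEY_OPTIONS = {"thumbsize", "userlang"}

    def __init__(self, **overrides):
        unknown = set(overrides) - set(self.DEFAULTS)
        if unknown:
            raise KeyError(f"unknown options: {sorted(unknown)}")
        self.options = {**self.DEFAULTS, **overrides}

    def is_safe_to_cache(self) -> bool:
        # True only if every option outside the cache key is at its default.
        return all(
            self.options[name] == default
            for name, default in self.DEFAULTS.items()
            if name not in self.KEY_OPTIONS
        )


def save_to_cache(cache: dict, key: str, opts: ParserOptionsSketch, html: str) -> None:
    # Mirrors the WikiPage check: caching output whose options are not
    # reflected in the key would serve it to users with different settings.
    if not opts.is_safe_to_cache():
        raise RuntimeError("options not reflected in the cache key")
    cache[key] = html
```

Because the object knows all options up front, an unknown or misspelled option fails immediately instead of silently leaking into cached output.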
Bug: T110269
Change-Id: I7fb9ffca96e6bd04db44d2d5f2509ec96ad9371f
Depends-On: I4070a8f51927121f690469716625db4a1064dea5
Tested that parser cache keys stay the same, before and after this
change.
Also use the more obvious ObjectCache::getLocalClusterInstance() instead
of looking up the main cache type in config and using
ObjectCache::getInstance().
Change-Id: Icef646b3c05e732ef4079d6900e6bce111debf2b
This revises 2877402276, which was
reverted in master due to unexpected issues with `-{{...}}` markup
on translatewiki and enwiki. Test cases are added to ensure that this
is parsed as a template, not as language converter markup.
https://www.mediawiki.org/wiki/Preprocessor_ABNF is the canonical
documentation for the preprocessor; that page will be updated after this
patch is merged. The basic principles described on that page are
maintained in this patch:
* Rightmost opening structure has precedence: `-{{` is parsed as a
dash followed by template opening.
* `{{{` has precedence over `{{` and `-{`: `-{{{{` is parsed as
`-{` `{{{` since we first grab the rightmost `{{{`.
A bunch of test cases were added to verify the "ideal precedence"
order described on that wiki page.
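The two precedence rules above can be modelled for a single run of opening characters. This is a deliberately simplified sketch: `split_openers()` is a hypothetical helper that looks only at the opening run and ignores the closing braces, which the real preprocessor also uses to decide what matched. It claims `{{{` from the right first, then `{{`, and only then lets a leftover `{` pair with a leading dash.

```python
def split_openers(run: str) -> list[str]:
    """Split a run such as '-{{{{' into opening tokens, left to right.

    Simplified model of the two precedence rules only; closing braces
    are ignored entirely.
    """
    dash = run.startswith("-")
    braces = run[1:] if dash else run
    if not braces or set(braces) != {"{"}:
        raise ValueError("expected an optional '-' followed by '{' characters")
    n = len(braces)
    right = []  # tokens claimed from the right-hand end, right to left
    while n >= 3:
        right.append("{{{")  # '{{{' beats '{{' and '-{'
        n -= 3
    if n == 2:
        right.append("{{")  # rightmost opening wins over '-{'
        n = 0
    left = []
    if n == 1:
        left.append("-{" if dash else "{")  # a lone '{' stays literal
        dash = False
    if dash:
        left.append("-")  # unused dash is plain text
    return left + right[::-1]
```

For example, `split_openers("-{{")` yields `["-", "{{"]` and `split_openers("-{{{{")` yields `["-{", "{{{"]`, matching the two bullets.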
This patch introduces some minor incompatibilities in existing
markup, in particular with chemical formulae in templates.
Fixes for these are being tracked at
https://www.mediawiki.org/wiki/Parsoid/Language_conversion/Preprocessor_fixups
Bug: T146304
Bug: T153761
Change-Id: I2f0c186c75e392c95e1a3d89266cae2586349150
Save the backtrace when locking, so that if some code tries locking again,
we can print the lock owner's backtrace for easier debugging.
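A minimal sketch of the idea, assuming a single-threaded re-entrancy check like the one used to detect recursive parsing. `ReentrancyGuard` is a hypothetical name, not the Parser implementation; it records the formatted stack at lock time and includes it in the error raised by a second lock attempt.

```python
import traceback


class ReentrancyGuard:
    """Hypothetical lock that remembers where its current owner locked it."""

    def __init__(self):
        self._owner_trace = None  # formatted stack of the current holder

    def lock(self):
        if self._owner_trace is not None:
            # Report where the existing lock was taken, not just that it was.
            raise RuntimeError(
                "Already locked; lock owner's backtrace:\n"
                + "".join(self._owner_trace)
            )
        self._owner_trace = traceback.format_stack()

    def unlock(self):
        self._owner_trace = None
```

Capturing the stack on every lock costs a little time, but the error a second caller sees then points directly at the code that forgot to unlock or re-entered.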
Change-Id: I6e352b4aa5e7cb35825a66592f6c066d9e8b95c9
This was the only addModules() call ever to be inside Parser.
Introduced in a54ef1a203. Prior to that, mediawiki.toc had always been loaded
by OutputPage (via mediawiki.util; and before that, via wikibits).
This patch restores that, and also fixes T130632 by making OutputPage get
it from the Skin, instead of hardcoding this somewhere in addParserOutput().
* Remove deprecated method OutputPage::enableTOC().
* Move mEnableTOC to addParserOutputText().
Bug: T130632
Change-Id: Iaad84d241a4c4348c712ac1087a664b8c9c46da4
With TimedMediaHandler in video.js mode, videos can be inline,
without a wrapper div.
Previously, in this mode, two paragraphs, one of which contained a
video, would be merged into one paragraph, because BlockLevelPass
matched "<track .../>" against "<tr" in its regexes.
Added \b to a couple of the regexes to protect against such errors,
and corrected a parser test case whose expected output was wrong
because "<link .../>" matched against "<li".
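The effect of the added `\b` can be shown with Python's `re`. The two patterns below are simplified stand-ins for the BlockLevelPass regexes, not the actual ones: without a word boundary, `<tr` and `<li` also match the start of `<track` and `<link`.

```python
import re

# Simplified stand-ins (not the real BlockLevelPass patterns) that check
# whether markup opens a table row or a list item.
OLD = re.compile(r"<(?:tr|li)", re.IGNORECASE)
NEW = re.compile(r"<(?:tr|li)\b", re.IGNORECASE)

for html in ['<track src="a.vtt"/>', '<link rel="x"/>', "<tr>", "<li class=x>"]:
    print(html, "old:", bool(OLD.search(html)), "new:", bool(NEW.search(html)))
```

`\b` requires a boundary between a word character and a non-word character, so `<tr>` and `<li class=x>` still match while `<track` and `<link` do not.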
Bug: T165817
Change-Id: I06e82b881f5ebddae5e7df7fb940adfa54f6b659
This will allow CSS to target just the parser output, without also
accidentally targeting the edit form, diff tables, and so on.
Bug: T37247
Change-Id: If4eb5bf71f94fa366ec4eddb6964e8f4df6b824a
Depends-On: I330c6aa4aaee045614b1801ed34bc9e03be69650
Depends-On: I52a518fa44e017841fe78474012cd69823e0a41d
Move link normalization directly into addExternalLink() method,
since you always need to do it - having it separate is just
inviting people to forget to normalize a link.
Additionally, links weren't properly registered for <gallery>.
This went largely unnoticed because the call to recursiveTagParse()
registered free links, but that did not cover, for example,
protocol-relative links.
Issue originally reported by MZMcBride.
Bug: T48143
Change-Id: I557fb3b433ef9d618097b6ba4eacc6bada250ca2
* This was introduced in 4d3446a8e3 when galleries were tables.
However, 05579cf0e6 switched galleries to <ul> lists but missed updating the
sanitization.
* As an example, the test shows that the table attribute summary is
currently wrongly permitted.
Change-Id: I8c52477dc65499d0c8a1ee5cc661a5f9ae78cc07
System messages may take parameters from untrusted sources. This
may include parameters taken from URLs supplied by unauthenticated
users, even on a read-only wiki. Allowing <html> tags
in such a context seems like an accident waiting to happen.
Bug: T156184
Change-Id: I661f482986d319cf41da1d3e7b20a0f028a42e90
U+0000 is not allowed in HTML5, so there's no reason to allow it in wikitext.
It simplifies our code if we can just strip them at the start. Strip in
PST as well so they don't sneak into our database either.
Tweaked the EXT_LINK URLs to account for the fact that invalid characters
get transformed into U+FFFD when using Preprocessor_DOM. See 73649741ed
(r65967) for context on that change.
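A minimal sketch of stripping U+0000 once at the entry points. `strip_nul()` is a hypothetical helper; per the message above, it would be applied both before parsing and during pre-save transform, so later stages never have to special-case the character.

```python
NUL = "\x00"


def strip_nul(text: str) -> str:
    # HTML5 disallows U+0000, so drop it before any other processing.
    return text.replace(NUL, "")
```

Doing it in PST as well means the character never reaches stored revisions, so the parser-side strip only has to handle older content.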
Bug: T159174
Change-Id: I3f67e92b61aacc87a40c3662085c84d1dac08bfb
I was bored. What? Don't look at me that way.
I mostly targeted mixed tabs and spaces, but others were not spared.
Note that some of the whitespace changes are inside HTML output,
extended regexps or SQL snippets.
Change-Id: Ie206cc946459f6befcfc2d520e35ad3ea3c0f1e0
It's unreasonable to expect newbies to know that "bug 12345" means "Task T14345"
except where it doesn't, so let's just standardise on the real numbers.
Change-Id: I6f59febaf8fc96e80f8cfc11f4356283f461142a
$index is definitely not an int here, see the big switch( $index )-case
statement below. It switches for strings, not numbers. Also, note that
this is lowercase, one might expect it to be uppercase as this is how
magic words are written in wikitext.
Bug: T96633
Change-Id: Iea93c3796fdee4ed7abbb7608e89b627ca95aead
When parsing a single line definition list, we track nested tags so that:
; <b>foo:bar</b>: baz
breaks before `baz`, not between `foo` and `bar`. But we currently bail
out of this algorithm entirely if we see a mismatched close tag. We should
just ignore the unmatched tag, like Parsoid does.
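A simplified sketch of the nested-tag tracking, assuming a one-line `; term : definition` input. `split_definition()` and its tag regex are hypothetical, not the doBlockLevels code. A colon splits the line only when no tag is open, and an unmatched close tag is skipped rather than aborting the search.

```python
import re

# Group 1 is "/" for a close tag, group 2 the tag name, group 3 "/" for a
# self-closing tag.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)\b[^>]*?(/?)>")


def split_definition(line: str):
    """Return (term, definition) for '; term : definition', or None."""
    if not line.startswith(";"):
        return None
    stack = []  # names of currently open tags
    i = 1
    while i < len(line):
        m = TAG.match(line, i) if line[i] == "<" else None
        if m:
            closing, name, self_closing = m.groups()
            name = name.lower()
            if closing:
                if name in stack:
                    # Pop back to (and including) the matching open tag.
                    while stack.pop() != name:
                        pass
                # Otherwise it is an unmatched close tag: just skip it.
            elif not self_closing:
                stack.append(name)
            i = m.end()
        elif line[i] == ":" and not stack:
            return line[1:i].strip(), line[i + 1:].strip()
        else:
            i += 1
    return None
```

With this, `split_definition("; <b>foo:bar</b>: baz")` gives `("<b>foo:bar</b>", "baz")`, and a stray `</i>` before the colon no longer prevents the split.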
Change-Id: I6306dcad6347abeb6ab001d35562f1ab9f374bd1
Given the wikitext:
;-{zh-cn:AAA;zh-tw:BBB}-
Prevent `doBlockLevels` from trying to split the definition list at the
embedded colon and using `AAA;zh-tw:BBB}-` as the `<dd>` portion.
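A simplified sketch of the fix, assuming the only thing to skip is `-{ ... }-` language converter markup. `find_term_colon()` is a hypothetical helper; it tracks converter nesting depth and accepts a colon as the `<dt>`/`<dd>` split point only at depth zero.

```python
def find_term_colon(line: str) -> int:
    """Index of the colon splitting ';term:def', or -1 if there is none.

    Colons inside -{ ... }- markup are skipped; nesting is tracked with a
    simple depth counter.
    """
    depth = 0
    i = 0
    while i < len(line):
        if line.startswith("-{", i):
            depth += 1
            i += 2
        elif depth and line.startswith("}-", i):
            depth -= 1
            i += 2
        elif line[i] == ":" and depth == 0:
            return i
        else:
            i += 1
    return -1
```

For the wikitext above, `find_term_colon(";-{zh-cn:AAA;zh-tw:BBB}-")` returns -1, so the whole line stays a term, while `";-{zh-cn:AAA}-: x"` still splits at the colon after `}-`.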
Bug: T153135
Change-Id: I3a4d02f1fbd0d0fe8278d6b7c66005f0dd3dd36b
Use of &$this doesn't work in PHP 7.1. For callbacks passed to functions like
array_map(), it's completely unnecessary, while for hooks we still need
to pass a reference and so we need to copy $this into a local variable.
Bug: T153505
Change-Id: I8bbb26e248cd6f213fd0e7460d6d6935a3f9e468
Needed for selective updates of pages using a particular feature.
Intended to be run in production, so needs to scale.
Bug: T149723
Change-Id: If20fb1f91de8d4227def5b07d6d52b91161ed3fd
The leading spaces on the link only cause us problems, such as for the
$noforce check 20 lines later.
Bug: T129218
Change-Id: I93a8da1f73b38fa3da362f8f27479b3039ed3f13
As discussed in https://gerrit.wikimedia.org/r/#/c/332702/ , these
methods and fields shouldn't have been marked public in the first place.
No outside users. Also, declare a couple of fields and remove unused ones.
Change-Id: I7775978c87d983784a484ee2ad901d25c42499b3
Undeclared variables are a very common error type that we want to catch
as often as possible. To avoid having to refactor a variety of global-scope
code (mostly in old-style maintenance scripts), this ignores
undeclared variables in global scope. This is still a good improvement
over what was happening previously.
Change-Id: I50b41d571724244552074b9408abbdf6160aca59
This effectively reverts commit 2877402276 in
order to unblock the deploy train. The underlying behavior might not be
incorrect, but it was unexpected.
Bug: T153761
Change-Id: Ifc9c7cf3482dd5d222ff4da24a6d4cc401e9d965
This also protects naked external links, which
LanguageConverter::markNoConversion internally wraps in `-{R|...}-`.
Originally found in failed tests in I7fa2d85d6.
Bug: T54190
Change-Id: I9b099273203482ffb570a5654d8ba50c833e526d