Commit graph

338 commits

Author SHA1 Message Date
Gergő Tisza
1bb5b58eb1 Unwrap HTML loaded from parser cache
Partially and temporarily reverts I1641b7995 to deal with cached
HTML the same way the old code did.

Bug: T203716
Change-Id: I29846a6513f6b580b429c0bfe6c310ada50b28bb
2018-09-07 18:49:34 +02:00
daniel
465954aa23 Provide new, unsaved revision to PST to fix magic words.
This injects the new, unsaved RevisionRecord object into the Parser used
for Pre-Save Transform, and sets the user and timestamp on that revision,
to allow {{subst:REVISIONUSER}} and {{subst:REVISIONTIMESTAMP}} to function.

Bug: T203583
Change-Id: I31a97d0168ac22346b2dad6b88bf7f6f8a0dd9d0
2018-09-06 18:33:44 +02:00
daniel
d8c409dd16 Make HTML generation in RenderedRevision optional
This allows optimization for situations in which a caller
needs the meta-data of a ParserOutput, and the respective
ContentHandler can provide that meta-data without generating
HTML output.

Bug: T194048
Change-Id: I786d294d18a6a2e3cea61577313e21b578c44f1e
2018-08-31 10:48:41 +00:00
daniel
e9f71517f7 [MCR] Introduce RevisionRenderer
RevisionRenderer is the MCR replacement for Content::getParserOutput,
as outlined in <https://www.mediawiki.org/wiki/User:Daniel_Kinzler_(WMDE)/MCR-PageUpdater>.

Note: This change also introduces quite a bit of code for
merging ParserOutput objects.

Bug: T194048
Change-Id: I871978bf79f67c9e7954fb3fc8528d6e365f2cc1
2018-08-30 19:15:12 +02:00
daniel
0dc7ba02b4 Apply content wrapping in ParserOutput::getText()
Instead of applying wrapping in the parser and unwrapping in
ParserOutput::getText(), turn this around and apply wrapping in getText(),
and only if desired.

This avoids search&replace logic for unwrapping, and it also makes it a lot
easier to merge the output of multiple slots for MCR output.

This changes behavior in two hopefully irrelevant ways:
1) the limit report comments will be inside the wrapper div, instead of
following it.
2) if HTML with a wrapper div is explicitly injected into a ParserOutput
object, it will not be possible to unwrap the text.
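
The reversal described in this commit can be sketched as follows. This is a minimal Python illustration, not MediaWiki's PHP code: the class name `ParserOutputSketch`, the `wrapper_class` attribute, the `unwrap` parameter and the `merge_slots` helper are all hypothetical names chosen for the example.

```python
from typing import Iterable, Optional


class ParserOutputSketch:
    """Hypothetical stand-in for ParserOutput that stores unwrapped HTML."""

    def __init__(self, raw_html: str, wrapper_class: Optional[str] = None):
        self.raw_html = raw_html            # text as produced by the parser, no wrapper
        self.wrapper_class = wrapper_class  # e.g. "mw-parser-output", or None

    def get_text(self, unwrap: bool = False) -> str:
        # Wrapping is applied here, on the way out, and only if desired.
        # No search-and-replace is needed to get the unwrapped text back.
        if unwrap or not self.wrapper_class:
            return self.raw_html
        return f'<div class="{self.wrapper_class}">{self.raw_html}</div>'


def merge_slots(outputs: Iterable[ParserOutputSketch],
                wrapper_class: Optional[str]) -> ParserOutputSketch:
    # Hypothetical merge: because nothing is wrapped yet, the slot outputs
    # can simply be concatenated and wrapped once at the end.
    combined = "".join(o.raw_html for o in outputs)
    return ParserOutputSketch(combined, wrapper_class)
```

Under these assumptions, merging two slot outputs yields a single wrapper div around the combined text rather than one wrapper per slot.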

Bug: T174035
Change-Id: I1641b7995af9bd297f1acd610d583fbf874f34e0
2018-08-29 16:46:25 +02:00
Aryeh Gregor
90d4f56fe4 Mass conversion of $wgContLang to service
Brought to you by vim macros.

Bug: T200246
Change-Id: I79e919f4553e3bd3eb714073fed7a43051b4fb2a
2018-08-11 22:44:29 -06:00
Aryeh Gregor
bca6085920 Use ParserFactory in a bunch of places
I wasn't sure how to convert the rest of the occurrences in core (there
are a significant number).

Bug: T200881
Change-Id: I114bba946cd3ea8a293121e275588c3c4d174f94
2018-08-11 00:16:17 -06:00
Aryeh Gregor
62515f7b15 Introduce ParserFactory service
Bug: T200881
Change-Id: I257e78200983cb10afb76de1f07dd1b9d531c52a
2018-08-11 00:15:52 -06:00
Aryeh Gregor
355e21590a Use setContentLang() instead of setMwGlobals()
This changes behavior in some tests by making them set $wgLanguageCode
as well as $wgContLang, but that seems like a good thing.

Bug: T200246
Change-Id: I936888f46ff9fefe2707efba837e2ce3a7ca5e3f
2018-07-26 11:35:58 +00:00
Brad Jorsch
c6810d74d1 Deprecate ContentHandler::makeParserOptions()
Having a different ParserOptions for each content model isn't feasible
in an MCR world. And the only thing using this was Wikibase, which has
been fixed to do what it needs in a different way.

Bug: T194263
Change-Id: I01373b29ee25fa9346c6b0317155be4ccdc8c515
2018-07-13 14:32:59 -04:00
Brad Jorsch
c3900e06be Resolve used lazy options in ParserOptions::optionsHash()
If a lazy option is passed to ParserOptions::optionsHash(), we should
resolve the option so the hash can incorporate the proper value instead
of omitting it.
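
A rough sketch of the idea, in Python rather than MediaWiki's PHP: a lazy option is modeled here as a zero-argument callable, and the hash function resolves it before hashing. The function name `options_hash`, the key format and the option name `dateformat` in the test data are assumptions made for illustration.

```python
import hashlib
from typing import Any, Dict, Iterable


def options_hash(options: Dict[str, Any], used: Iterable[str]) -> str:
    """Hash the used options, resolving lazy (callable) values first."""
    parts = []
    for name in sorted(used):
        value = options[name]
        if callable(value):
            # Resolve the lazy option so the hash reflects its real value
            # instead of leaving it out, and remember the resolved value.
            value = value()
            options[name] = value
        parts.append(f"{name}={value}")
    return hashlib.md5("!".join(parts).encode("utf-8")).hexdigest()
```

With this approach a lazily supplied value and the same value supplied eagerly produce identical hashes.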

Also, completely unrelatedly, refactor the hook overriding in the unit
test because people won't stop whining about it in code review.

Change-Id: I2df78ed90875c229090b503b65f20fbbbba7f237
2018-05-15 06:54:55 +00:00
James D. Forrester
846f4f58f5 Remove $wgExperimentalHtmlIds and related code, deprecated in 1.30
Bug: T139744
Change-Id: Ia15d5ab6e7637fd40d5c3399822a3dbeb7b383b5
2018-05-01 14:34:02 -07:00
James D. Forrester
78e5c8d7af Default installations to using RemexHtml for tidying
This combines two changes – defaulting tidying to on, previously off, and
defaulting the tidying library to RemexHtml, previously the tidy binary.
Config options are going to be a bit of a mess until we drop support for
the old tidy binary config route.

Bug: T185753
Depends-On: I0a8973f508fbf65160177b003260831639828eeb
Change-Id: I6879a77a78d780c7c056d807dde20682c6097d1b
2018-04-10 10:51:34 -07:00
Addshore
aada90ac68 Revert "Default installations to using RemexHtml for tidying"
This reverts commit efcef34d3d.

This is causing failures in CI for extensions

Depends-On: If9789a61d52f60882fc2f0226757c9d93e1c6362
Change-Id: I17cf305a951b2bf1f03285b12c3e131abcffd31d
2018-04-06 11:16:43 +00:00
James D. Forrester
efcef34d3d Default installations to using RemexHtml for tidying
This combines two changes – defaulting tidying to on, previously off, and
defaulting the tidying library to RemexHtml, previously the tidy binary.
Config options are going to be a bit of a mess until we drop support for
the old tidy binary config route.

Bug: T185753
Depends-On: I0a8973f508fbf65160177b003260831639828eea
Change-Id: I6879a77a78d780c7c056d807dde20682c6097d1a
2018-04-05 10:30:07 -07:00
Umherirrender
69dbaf3f88 build: Updating mediawiki/mediawiki-codesniffer to 17.0.0
Change-Id: Ib494b47c54fe6354d166055b1e1b31d3583bb992
2018-03-29 21:53:10 +02:00
Timo Tijhof
14644a2fb0 phpunit: Add some @covers and @large/@medium to integration tests
- @small: single class, no I/O (unit test).
- @medium: multi-class (partial or no mocks), no I/O (unit/integration test).
- @large: multi-class, I/O allowed (integration test).

Change-Id: I09317e6dd9b0ee34b7467fbffdd07957ef55dc04
2018-03-20 09:14:34 -07:00
Umherirrender
d9fb8bab5e Move phpunit @group from file comment to class comment
Remove @group from non tests

Change-Id: Iae9ee3bc5f539a9b4ded8374006ab2993234450e
2018-03-10 11:48:28 +01:00
Tim Starling
f0247e05bd StripState testing and cleanup
* Added StripState unit tests
* Deprecated unmaintained "half-parsed" serialization experiment
* Renamed some variables for brevity and removed unused "prefix"

Change-Id: I838d7ac7f9a2189e13d39c6939dba5d70e74a6b7
2018-03-05 16:43:58 +11:00
Umherirrender
63d96c15fd build: Updating mediawiki/mediawiki-codesniffer to 16.0.0
Change-Id: I59b59f79bbf3ce4feff3b3a20c1c31bc16370531
2018-02-17 13:29:13 +01:00
Brad Jorsch
2791fb0861 Hard-deprecate ParserOutput stateful transform methods
This also removes all the in-core calls that had been kept for the
benefit of extensions, and causes them to not have any effect since
anything that had been calling them was already either a no-op or will
probably be broken now that nothing in core is setting or checking the
flags.

Change-Id: Id22c1a5a6d6a249debb14063ae3f8838d105b634
2018-02-13 12:28:36 -05:00
Brad Jorsch
9b2b375fce ParserOutput: Add 'deduplicateStyles' post-cache transformation
This transformation will find <style> tag with a "data-mw-deduplicate"
attribute. For each value of the attribute, the first instance will be
kept as-is, while any subsequent tags with the same value will be
replaced by a <link rel="mw-deduplicated-inline-style"> with its href
referring to the "data-mw-deduplicate" value using a custom scheme.
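
The transformation described above could look roughly like this. It is a simplified Python sketch, not the PHP implementation: it uses a regular expression instead of a real HTML parser, and the scheme name `x-dedup` is a placeholder, since the commit message does not name the custom scheme.

```python
import re

# Matches a <style> element carrying a data-mw-deduplicate attribute and
# captures the attribute value. Regex-based matching is a simplification.
_STYLE_RE = re.compile(
    r'<style\b[^>]*\bdata-mw-deduplicate="([^"]*)"[^>]*>.*?</style>',
    re.DOTALL,
)


def deduplicate_styles(html: str, scheme: str = "x-dedup") -> str:
    """Keep the first <style> per key; replace later ones with a <link>."""
    seen = set()

    def replace(match: "re.Match[str]") -> str:
        key = match.group(1)
        if key not in seen:
            seen.add(key)
            return match.group(0)  # first instance is kept as-is
        # The key is copied verbatim, so it stays attribute-escaped.
        return f'<link rel="mw-deduplicated-inline-style" href="{scheme}:{key}"/>'

    return _STYLE_RE.sub(replace, html)
```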

This also adds an $attribs parameter to Html::inlineStyle() so the
data-mw-deduplicate attribute can be added.

Note this doesn't actually depend on Ib931e25c, but action=mobileview
will break if it starts being used without that patch.

Bug: T160563
Change-Id: I055abdf4d73ec65771eaa4fe0999ec907c831568
Depends-On: Ib931e25ce85072000e62c486bbe5907f03372494
2018-02-11 05:55:56 +00:00
jenkins-bot
4a56e5eb44 Merge "Add a few more @covers to ParserIntegrationTest"
2018-02-10 00:30:06 +00:00
jenkins-bot
8fd5a99f4e Merge "Fix ParserOutput::getText 'unwrap' flag for end-of-doc comment"
2018-02-09 22:34:46 +00:00
Gergő Tisza
5859eff1b0 Fix ParserOutput::getText 'unwrap' flag for end-of-doc comment
The closing div might be followed by debug information in HTML comments.

Bug: T186927
Change-Id: I72d4079dfe9ca9b3a14476ace1364eb5efdee846
2018-02-09 14:17:22 -08:00
Kunal Mehta
6c4d32e394 Add a few more @covers to ParserIntegrationTest
Change-Id: I638077683d9d3cb40f81b4f4d594f16c349f1519
2018-02-09 11:19:03 -08:00
Kunal Mehta
3e16cc20e8 Add @covers tags for parser tests
Change-Id: I24d3550a07be7a5699475047eb78806f36caec2e
2018-02-07 22:18:43 -08:00
Kunal Mehta
48cfdf7163 Add @covers for BlockLevelPass
This class was split out of Parser, so it now needs separate covers tags.

Change-Id: I06c4a6a4fac3d6ff13924e3ca45ee134f7eeab20
2018-02-05 21:21:37 -08:00
Kunal Mehta
2ab7ae9d24 Add @covers for RemexStripTagHandler
This internal class is only used by Sanitizer::stripAllTags().

Change-Id: Ib913ee14524539216305da7e3183c07ab7d72cb5
2018-02-05 21:15:52 -08:00
jenkins-bot
78a9d810ab Merge "Remove wrapclass from parser cache key"
2018-02-02 18:41:02 +00:00
jenkins-bot
74426f3cf7 Merge "Warn if stateful ParserOutput transforms are used"
2018-02-02 02:45:27 +00:00
Brad Jorsch
8b0e7298ac Remove wrapclass from parser cache key
This will result in an exception from WikiPage::getParserOutput() if
anything was missed.

This also hard-deprecates ParserOptions::setWrapOutputClass( false )

Bug: T181846
Change-Id: Ica541e1f6b52f5eec6d28cff60ba64bf525258c7
Depends-On: Ie5d6c5ce34c05b8fe2353d3bb36b2a3a4166ec4b
Depends-On: Ibfaefde2f3811151ec712554cbc9cf2415ed017f
Depends-On: I55048bbae5d4d2d0c79c241c1784448b82db3bb4
Depends-On: I23a26ba0dfbe83007cd40e97d71a2139a5ecddc7
Depends-On: Ibc013a41f4a463f4014fbbce7ce27f8690161728
Depends-On: Ie936dff918dc0869503a924298b4580402038b52
2018-02-01 14:26:03 -08:00
Brad Jorsch
d511626236 Add 'unwrap' ParserOutput post-cache transform
And deprecate passing false for ParserOptions::setWrapOutputClass().

There are three cases for the Parser wrapper: the default
mw-parser-output, a custom wrapper, or no wrapper. As things currently
stand, we have to fragment the parser cache on each of these options,
which uses a nontrivial amount of storage space (T167784).

Ideally we'd do all the wrapping as a post-cache transform, but
TemplateStyles needs to know the wrapper in use in order to properly
prefix its CSS rules (that's why we added the wrapper in the first
place). So, second best option is to make *un*wrapping be a post-cache
transform and make "custom wrapper" be uncacheable.

This patch does the first bit (unwrapping as a post-cache transform),
and a followup will do the second part once the deprecation process is
satisfied.

Bug: T181846
Change-Id: Iba16e78c41be992467101e7d83e9c3134765b101
2018-02-01 14:24:27 -08:00
Umherirrender
45da581551 Use ::class to resolve class names in tests
This helps to find renamed or misspelled classes earlier.
Phan will check the class names

Change-Id: Ie541a7baae10ab6f5c13f95ac2ff6598b8f8950c
2018-01-26 22:49:13 +01:00
Brian Wolff
5fd1e1abe0 Make Gender normalize usernames
This ensures that if GENDER is fed a wfEscapeWikitext()'d version
of a username, it will normalize it.

See discussion on T182800.

Note, we do not need to worry about the case of a user named
"Project:*foo" as such namespace prefixes are illegal in
usernames.

Change-Id: Ic5a8fc76c28dca43ce8e334ef1874c2673433f00
2018-01-22 23:12:01 +00:00
Umherirrender
255d76f2a1 build: Updating mediawiki/mediawiki-codesniffer to 15.0.0
Clean up use of @codingStandardsIgnore
- @codingStandardsIgnoreFile -> phpcs:ignoreFile
- @codingStandardsIgnoreLine -> phpcs:ignore
- @codingStandardsIgnoreStart -> phpcs:disable
- @codingStandardsIgnoreEnd -> phpcs:enable
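
The mapping above lends itself to a mechanical first pass. The following Python sketch is an assumption about how such a pass might look, not the tool actually used for this commit; it only renames the annotations and does not add the sniff names that `phpcs:disable` needs.

```python
import re

# Old annotation -> new annotation, taken from the list above.
REPLACEMENTS = {
    "@codingStandardsIgnoreFile": "phpcs:ignoreFile",
    "@codingStandardsIgnoreLine": "phpcs:ignore",
    "@codingStandardsIgnoreStart": "phpcs:disable",
    "@codingStandardsIgnoreEnd": "phpcs:enable",
}

_PATTERN = re.compile(r"@codingStandardsIgnore(?:File|Line|Start|End)\b")


def convert_annotations(source: str) -> str:
    """Rename old-style ignore annotations to their phpcs: equivalents."""
    return _PATTERN.sub(lambda m: REPLACEMENTS[m.group(0)], source)
```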

For phpcs:disable, the necessary sniffs are always provided.
Some start/end pairs are changed to line ignores.

Change-Id: I92ef235849bcc349c69e53504e664a155dd162c8
2018-01-01 14:10:16 +01:00
Kunal Mehta
75160bdd3b Use MediaWikiCoversValidator for tests that don't use MediaWikiTestCase
Change-Id: I8c4de7e9c72c9969088666007b54c6fd23f6cc13
2018-01-01 08:28:02 +00:00
Kunal Mehta
61e2c04e4e Add @covers tags to miscellaneous tests (#2)
Change-Id: I9116598bee4f4917e02290d273644c13475ff721
2017-12-28 08:52:48 +00:00
Kunal Mehta
546980e537 Add @covers tags to parser tests
Change-Id: I7bce04bef5e981fd203ad819882482e72ca3f61b
2017-12-24 23:29:00 -08:00
Umherirrender
29323f5622 Fix test class names to match convention
The test class name should end in Test
and otherwise match the name of the class under test.

Change-Id: Id0c90994d257fb325834e123b462f7f0849ac556
2017-12-10 11:41:59 +01:00
Brad Jorsch
c8e4823714 Warn if stateful ParserOutput transforms are used
This should help clean up any missed uses.

Change-Id: I371f3b27d245d6927c74ea52f1df9fd5c675b94a
Depends-On: I30f162aa43c7f513df1153e0884a4339e4279aeb
Depends-On: Iff28b00638c15de7307a130196bbb91cda91c3d1
Depends-On: I432da8c0686c279b3c2e770f7f9e20248589d6db
Depends-On: I404f064b93573e80b61a228e3cf2b5d2add65c39
Depends-On: Ia54a9e3d11c9ab28975947148d0841819f3a8e3c
Depends-On: I2cd7519186f2319f32cf6288655ddb873337a638
Depends-On: I28b46cf4da66cc6e1f04045939a243faa30bc9bf
Depends-On: I3565868af824a08235ab5ce4a34145895ed0e74d
Depends-On: I0d05ce2f565778a4bf39d3d25d26acd0b8043788
Depends-On: I100fae755ae7e729d11163377fbddaebeaa020a6
Depends-On: I38e56d04f7ffbe8796dbda6500106a028a459980
Depends-On: Id6ad08a0b1f8575e7ee98916217a84c09e72dd3b
Depends-On: I2794430f6bc076f073e79e662701403f7e063c35
Depends-On: I39b599246759baad2164a29244150c99f0920684
Depends-On: Ic7aa606b7d697e06c74c1e9207efc77442f5b0c3
Depends-On: I140ff32373430b61b92226689ef9b58cca317450
Depends-On: I9b082e37f19c8baa182b0583c7d70d692fafc16e
2017-11-30 14:27:49 -05:00
Brad Jorsch
92cf49df5c ParserOutput: Add stateless transforms to getText()
The stateful transforms are deprecated.

Inspired by Krinkle's If2fb32fc.

Bug: T171797
Change-Id: Ied5fe1a6159c2d4fa48170042b44d735ce7b6f9b
2017-11-30 14:27:46 -05:00
Roan Kattouw
ddb4913f53 Use Remex in Sanitizer::stripAllTags()
Using a real HTML tokenizer fixes bugs when < or > appear in attribute
values. The old implementation used delimiterReplace(), which didn't
handle this case:

    > print Sanitizer::stripAllTags( '<p data-foo="a&lt;b>c">Hello</p>' );
    c">Hello

We also can't use PHP's built-in strip_tags() because it doesn't handle
<?php and <? correctly:

    > print strip_tags('1<span class="<?php">2</span>3');
    1
    > print strip_tags('1<span class="<?">2</span>3');
    1
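
The same failure mode can be reproduced outside PHP. The Python sketch below is only an analogy: `naive_strip` stands in for delimiter-based stripping, and `strip_all_tags` uses the standard library's `html.parser` tokenizer in place of Remex. Both function names are chosen for this example.

```python
import re
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects only text nodes; tags and attributes are discarded."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_all_tags(html: str) -> str:
    # A real tokenizer knows that ">" inside a quoted attribute value
    # does not end the tag.
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return "".join(collector.parts)


def naive_strip(html: str) -> str:
    # Delimiter-style stripping: everything from "<" to the next ">".
    return re.sub(r"<[^>]*>", "", html)
```

For the first input shown above, the naive version leaves part of the attribute value behind, while the tokenizer-based version returns only the element's text.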

Bug: T179978
Change-Id: I53b98e6c877c00c03ff110914168b398559c9c3e
2017-11-15 17:31:31 -08:00
Roan Kattouw
7980e38a84 Move Sanitizer.php to includes/parser/
Change-Id: Id08d91c747ec77d715459b89b03eee247ccd4e1b
2017-11-15 15:16:41 -08:00
Brad Jorsch
84694a9d59 Remove ParserOptions::legacyOptions() and cleanup related code
ParserOptions::legacyOptions() has been sitting around since 1.17.
Originally it seems to have been intended as a way to avoid a mass cache
invalidation (similar to optionsHashPre30() from I7fb9ffca9). That code
was mostly removed in 1.23, but legacyOptions() was left behind because
it was also being used in a few places as "all cache-varying options"
(despite it not being documented for that purpose) where we'd rather
have any key than no key at all.

This patch creates an actual ParserOptions::allCacheVaryingOptions()
method for those use cases and deprecates the long-obsolete
legacyOptions().

It also makes more explicit the use of the "all cache-varying options"
fallback in ParserCache::getKey(), and doesn't bother trying to use that
fallback in ParserCache::get() where it no longer makes sense.

Change-Id: Ife1e54744155136a570210c03fe907f18f8e8ece
2017-07-04 01:28:57 +00:00
Brad Jorsch
27fd0920a1 Remove ParserOptions::optionsHashPre30()
The pre-1.30 version of ParserOptions::optionsHash() was kept
temporarily as ParserOptions::optionsHashPre30() to prevent a cache
stampede on WMF sites when the hash format was changed in I7fb9ffca9.

Now that the cache has been rebuilt, it's no longer needed and we should
clean it up instead of leaving it forever to bitrot.

Change-Id: I037d8dfdefe72a295547bd331bc1454e69cb418d
2017-06-28 00:18:59 +00:00
Brad Jorsch
46c0c39514 ParserOptions: Fix handling of 'editsection'
The handling of the 'editsection' option prior to I7fb9ffca9 was
unusual: it was included in the cache key, but the getter didn't ever
flag it as "used". This was overlooked in I7fb9ffca9.

This fixes the handling to restore that behavior. It's no longer
considered to be a real parser option, so changing it won't make
isSafeToCache() fail while reading it won't flag it as 'used'.

But to keep Wikibase working (see T85252), if 'editsection' is supplied
in $forOptions optionsHash() will still include it in the hash so
whatever Wikibase is doing by forcing that doesn't break. The hash when
it is included is the same as was used in I7fb9ffca9 to reuse keys.

Once optionsHashPre30() is removed, Wikibase should be changed to use
some other method to fix T85252 so we can remove that hack from
optionsHash().

Change-Id: I77b5519c5a1122a1fafbfc523b77b2268c0efeb1
2017-06-14 04:52:36 +00:00
Brad Jorsch
0facbe3e3d Try harder to avoid parser cache pollution
* ParserOptions is reorganized so it knows all the options and their
  defaults, and can report whether the non-key options are at their
  defaults.
* Definition of the "canonical" ParserOptions (which is unfortunately
  different from the "default" ParserOptions) is moved from
  ContentHandler to ParserOptions.
* WikiPage uses this to throw an exception if it's asked to cache
  with options that aren't used in the cache key.
* ParserCache gets some temporary code to try to avoid a massive cache
  stampede on upgrade.
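
The guard described in the list above can be sketched as follows. This is a Python illustration with made-up option names (`language`, `thumb_size`, `preview`) and helper names; it shows only the shape of the check, not the actual ParserOptions or WikiPage code.

```python
from typing import Any, Dict

# Hypothetical options and defaults, for illustration only.
DEFAULTS: Dict[str, Any] = {"language": "en", "thumb_size": 2, "preview": False}
CACHE_KEY_OPTIONS = {"language", "thumb_size"}  # options that vary the cache key


def is_safe_to_cache(options: Dict[str, Any]) -> bool:
    """True if every option outside the cache key is still at its default."""
    return all(
        options.get(name, default) == default
        for name, default in DEFAULTS.items()
        if name not in CACHE_KEY_OPTIONS
    )


def cache_key(page: str, options: Dict[str, Any]) -> str:
    parts = [
        f"{name}={options.get(name, DEFAULTS[name])}"
        for name in sorted(CACHE_KEY_OPTIONS)
    ]
    return page + "!" + "!".join(parts)


def save_to_cache(cache: Dict[str, str], page: str,
                  options: Dict[str, Any], html: str) -> None:
    # Refuse to store output whose options the key cannot distinguish;
    # otherwise a later reader with default options would get polluted HTML.
    if not is_safe_to_cache(options):
        raise ValueError("options not reflected in the cache key are non-default")
    cache[cache_key(page, options)] = html
```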

Bug: T110269
Change-Id: I7fb9ffca96e6bd04db44d2d5f2509ec96ad9371f
Depends-On: I4070a8f51927121f690469716625db4a1064dea5
2017-06-05 14:17:28 +00:00
Brad Jorsch
537d74d3a1 Add tests for ParserOptions
Change-Id: I3e2d945d109bbb0ebc31d65d9f6faaa7482deefe
2017-06-02 23:10:21 +00:00
Brad Jorsch
1aac0a2992 Wrap parser output in <div class="mw-parser-output">
This will allow CSS to target just the parser output, without also
accidentally targeting the edit form, diff tables, and so on.

Bug: T37247
Change-Id: If4eb5bf71f94fa366ec4eddb6964e8f4df6b824a
Depends-On: I330c6aa4aaee045614b1801ed34bc9e03be69650
Depends-On: I52a518fa44e017841fe78474012cd69823e0a41d
2017-05-08 05:32:03 +00:00