When setting the following on PHP 7, the resulting error message
did not make sense (it referred to HHVM).
> $wgValidateAllHtml = true
> $wgTidyConfig = ['driver' => 'RemexHtml'];
Change-Id: I5f14505639a79aca66f570a9a00c38cdea0cc1ba
A convenient factory function to eliminate code duplication in
ParserMigration's MigrationEditPage::tidyParserOutput().
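
As a rough illustration of the idea (a sketch only; the interface, class
and function names below are hypothetical stand-ins, not the actual
MediaWiki code), a single factory can map the 'driver' key of a
$wgTidyConfig-style array to a driver object, so callers do not repeat
that logic:

```php
<?php
// Hypothetical sketch: names are illustrative, not MediaWiki's real classes.

/** Minimal driver interface: take HTML text, return tidied text. */
interface SketchTidyDriver {
	public function tidy( string $text ): string;
}

/** Stand-in driver that only trims whitespace, so the sketch is runnable. */
class SketchTrimDriver implements SketchTidyDriver {
	public function tidy( string $text ): string {
		return trim( $text );
	}
}

/**
 * Build a driver from a config array shaped like $wgTidyConfig.
 * Returns null when tidying is disabled (an assumption of this sketch).
 */
function sketchTidyFactory( array $config ): ?SketchTidyDriver {
	$driver = $config['driver'] ?? 'disabled';
	switch ( $driver ) {
		case 'disabled':
			return null;
		case 'SketchTrim':
			return new SketchTrimDriver();
		default:
			throw new InvalidArgumentException( "Invalid tidy driver: \"$driver\"" );
	}
}
```

A caller would then build the driver once from its own config array and
call tidy() on the result, instead of duplicating the driver selection.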
Change-Id: I058912885025e7a9402912236c65c44e32ef036e
We originally imagined rolling out the display of empty elements
simultaneously with Html5Depurate, but we have now added support for
marking empty elements to Html5Depurate and plan on having some sort of
longer migration period. So, move the relevant CSS to content.css, and
remove the concept of CSS dependent on the tidy driver.
Add a body class which will allow the effect to be toggled in a gadget or
extension. Actual toggling in the CSS will be in the stage 2 patch, to be
deployed after the varnish cache and parser cache have expired.
I originally imagined that there would be a gadget that overrides the
rule with an !important declaration, but that method does not allow you
to recover the original display property, which is often overridden by
the style attribute or site CSS to be "inline".
Also, in RaggettWrapper, switch to the new class mw-empty-elt, following
Html5Depurate, instead of mw-empty-li. The old class will be removed in
the stage 2 patch.
Change-Id: Ic0f432c43a006629ca5a1a7c2dda3552ceb4dc4f
* Have TidySupport provide $wgTidyConfig instead of the legacy globals
* Add --use-tidy-config option to parserTests.php. This tells
TidySupport to use the tidy configuration from LocalSettings.php
instead of the traditional safe defaults.
* Add a way for TidySupport to disable tidy via $wgTidyConfig, using
driver=>disabled
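
For illustration, a LocalSettings.php fragment that turns tidy off through
the new setting might look like the following. This is a sketch based only
on the driver=>disabled form described above; running parserTests.php with
--use-tidy-config is then expected to pick this configuration up.

```php
// LocalSettings.php (fragment): disable tidy via the unified config array.
$wgTidyConfig = [ 'driver' => 'disabled' ];
```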
Change-Id: Ie76e68e2d5238d0a1aef49a1a815c0d1cd8bfdae
This is an HTML5-compliant parse/serialize tidy implementation, with
well-delineated hacks to support the <p>-wrapping done by legacy tidy.
Change-Id: I4fd433fd6f1847061b0bf4b3e249c918720d4fae
It is desirable in terms of user-friendly syntax to display an empty
list item if the user adds one to the source. However, we suspect that
this change will break the rendering of existing templates. So, preserve
the empty <li> element, but style it with display:none so that there is
no user-visible change. Changes can then be observed with a user script,
then eventually the CSS can be removed so that the desired behaviour will
be user visible.
This is imagined as a staged deployment of T89331, i.e. it is better to
resolve differences with Html5Depurate one at a time instead of
deploying it all at once.
The CSS module is specified in parser/MWTidy.php since the tidy driver
hierarchy is not meant to be so closely tied to the MW environment.
Bug: T49673
Change-Id: Ifb44b782c617240e3de73dcdf76c8737c7307d94
- Removed space after cast
- Removed spaces in array index
- Removed double spaces
- Added spaces around string concat
- Fixed mixed tabs and spaces at beginning of line
Change-Id: I38e849723f055d2d4c05cba72f5c245a28e8d5da
* Split tidy implementations into a class hierarchy
* Bring all tidy configuration into a single associative array and
deprecate the old configuration.
* Remove $wgAlwaysUseTidy
This is preparatory to replacement of Tidy (T89331). I used the name
"Raggett" for things relating to Dave Raggett's Tidy, since if we use
"tidy" to mean the new abstract system as well as Raggett's tidy, it
gets confusing.
Change-Id: I77af1a16cbbb47fc226d05fb9aad56c58e8910b5
Generating one-time, unique strip markers hurts us in multiple ways:
* The strip marker regexes don't benefit from JIT compilation, so they are
slower to execute than they could be.
* Although the regexes don't benefit from JIT compilation, they are still
compiled, because HHVM bets on regexes getting reused. This extra work is
fairly costly (1-2% of CPU usage on the app servers) and doesn't pay off.
* The size of the PCRE JIT cache is finite, and the caching of one-off regexes
displaces from the cache regexes which are in fact reused.
Tim's preferred solution (per his review comment on
https://gerrit.wikimedia.org/r/167530/) is to use fixed strip markers.
So:
* Replace usage of $parser->mUniqPrefix with Parser::MARKER_PREFIX, which
complements the existing Parser::MARKER_SUFFIX.
* Deprecate Parser::mUniqPrefix and its accessor, Parser::uniqPrefix().
* Deprecate Parser::getRandomString(), since it is no longer useful.
* In Preprocessor_*::preprocessToObj() and Parser::fetchTemplateAndTitle,
replace any occurrences of \x7f with '?', to prevent strip marker forgery.
\x7f is not valid input anyway.
* Deprecate the $prefix parameter for StripState::__construct, since a custom
prefix may no longer be specified.
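
The combination of fixed markers and \x7f replacement can be sketched
roughly as follows. This is not the code from this change: the marker
strings, the marker layout and all function names are assumptions made for
the example, and the real Parser::MARKER_PREFIX / Parser::MARKER_SUFFIX
values may differ.

```php
<?php
// Sketch only: these marker strings are assumed for illustration.
const SKETCH_MARKER_PREFIX = "\x7fUNIQ-";
const SKETCH_MARKER_SUFFIX = "-QINU\x7f";

/** Replace DEL (\x7f) in untrusted input so it cannot forge a marker. */
function sketchSanitizeInput( string $text ): string {
	return str_replace( "\x7f", '?', $text );
}

/** Build a marker for a tag name and counter using the fixed prefix. */
function sketchMakeMarker( string $tag, int $index ): string {
	return SKETCH_MARKER_PREFIX . $tag . '-' . sprintf( '%08X', $index )
		. SKETCH_MARKER_SUFFIX;
}

/**
 * Because prefix and suffix are constants, this pattern is the same string
 * on every request, so a regex cache can reuse its compiled form.
 */
function sketchMarkerRegex(): string {
	return '/' . preg_quote( SKETCH_MARKER_PREFIX, '/' )
		. '([^\x7f<>&\'"]+)'
		. preg_quote( SKETCH_MARKER_SUFFIX, '/' ) . '/';
}
```

Since every marker must begin and end with \x7f, and sanitized input can no
longer contain that byte, user-supplied text cannot match the marker regex.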
Change-Id: I31d4556bbb07acb72c33fda335fa5a230379a03f
Xhprof generates this data now. Custom profiling of various
sub-function units is kept.
Calls to profiler represented about 3% of page execution
time on Special:BlankPage (1.5% in/out); after this change
it's down to about 0.98% of page execution time.
Change-Id: Id9a1dc9d8f80bbd52e42226b724a1e1213d07af7
This is broken, for reasons indicated in
<https://gerrit.wikimedia.org/r/#/c/180384/>. It was broken before, but I made
it more broken. So revert for now, and I'll give this another stab.
Change-Id: I7e67a61f7d6370f90487be6470bebe1449432a4c
* Make the internal MWTidy::*clean() functions always return an array of two
elements: the output buffer and the error buffer.
* Make MWTidy::externalTidy() always read both stdout and stderr. We can read
stderr after stdout because tidy.c produces output in the same order.
* Remove the $stderr parameter from the private MWTidy::*clean() methods, since
error output is always returned.
* Merge MWTidy::phpClean and MWTidy::hhvmClean, since the difference between
them is now small enough that splitting them up is not warranted.
* On HHVM, MWTidy::internalTidy() always returns an empty string for the error
buffer.
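
The "always return both buffers" convention for the external case can be
sketched like this. It is a minimal illustration, not the MWTidy code: the
function name is hypothetical and proc_open() is simply one way to get at
both pipes.

```php
<?php
/**
 * Sketch: run an external command, feed it $input on stdin, and always
 * return a two-element array [ stdout, stderr ].
 * Hypothetical helper; not the actual MWTidy::externalTidy().
 */
function sketchRunFilter( string $command, string $input ): array {
	$descriptors = [
		0 => [ 'pipe', 'r' ], // child's stdin
		1 => [ 'pipe', 'w' ], // child's stdout
		2 => [ 'pipe', 'w' ], // child's stderr
	];
	$process = proc_open( $command, $descriptors, $pipes );
	if ( !is_resource( $process ) ) {
		return [ '', 'could not start process' ];
	}
	// A very large $input could block here if the child starts writing
	// before it has read everything; acceptable for a sketch.
	fwrite( $pipes[0], $input );
	fclose( $pipes[0] );
	// Read stdout to EOF first, then stderr. This is only safe when the
	// child does not fill the stderr pipe before it finishes with stdout.
	$out = stream_get_contents( $pipes[1] );
	fclose( $pipes[1] );
	$err = stream_get_contents( $pipes[2] );
	fclose( $pipes[2] );
	proc_close( $process );
	return [ $out, $err ];
}
```

Callers can then destructure the result unconditionally, e.g.
list( $out, $err ) = sketchRunFilter( $cmd, $text ), without needing a
separate $stderr flag.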
Change-Id: I178b42d6ebdd1a5b9bd5921eb093a6c5014ffa49
- Added/removed spaces around parentheses
- Added newline in empty blocks
- Added space after switch/foreach/function
- Use tabs at beginning of line
- Add newline at end of file
Change-Id: I244cdb2c333489e1020931bf4ac5266a87439f0d
EZC doesn't currently support direct access to object properties via the
obj->std.properties hashtable, which tidy uses extensively. However, it
turns out that for production use cases, tidy_repair_string() should be
sufficient. $wgDebugTidy and $wgValidateAllHtml are not used, and
no deployed extension calls MWTidy::checkErrors().
The only difference I know of is that errors from tidy (status==2) lead
to the tidy output being used, rather than discarded. But
TY_(ReportFatal) has very few callers in tidylib -- probably none that
are reachable from stripped parser output.
So, throw an exception if MWTidy::checkErrors() is requested on an HHVM
instance with the tidy extension. For MWTidy::tidy(), use
tidy_repair_string(). Refactor some relevant code.
Bug: T758
Change-Id: I8d5b1c2c9f9ddce46d8ad099a671a2e297d256e0
MediaWiki installations that use the setting $wgUseTidy = true; are
unable to output MathML, since the well-defined MathML elements are
filtered out by Tidy. This was reported as
http://sourceforge.net/p/tidy/patches/84/ .
This change hides MathML blocks from
Tidy.
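
One way to picture "hiding" a block from Tidy is the placeholder approach
sketched below. This is an assumption-laden illustration rather than the
code in this change: the function name, the placeholder format and the
$tidy callable are all invented for the example.

```php
<?php
/**
 * Sketch: swap each <math>...</math> block for an opaque placeholder before
 * tidying, then put the original markup back afterwards.
 * $tidy is any callable that takes and returns an HTML string; the
 * placeholder format is made up for this example.
 */
function sketchTidyPreservingMath( string $html, callable $tidy ): string {
	$hidden = [];
	$masked = preg_replace_callback(
		'#<math\b[^>]*>.*?</math>#is',
		function ( array $m ) use ( &$hidden ) {
			$key = '<!--math-placeholder-' . count( $hidden ) . '-->';
			$hidden[$key] = $m[0];
			return $key;
		},
		$html
	);
	$tidied = $tidy( $masked );
	// strtr() with an array replaces every placeholder in a single pass.
	return strtr( $tidied, $hidden );
}
```

The sketch relies on the tidy step leaving the placeholder text untouched;
a real implementation would need a placeholder that the chosen tidy driver
is known to pass through unchanged.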
Bug: 66516
Change-Id: Ib48b91238c3eddd6a86b62f6ce57801d7058f0d8