26 commits

Author SHA1 Message Date
James D. Forrester
0da97e7a03 Immediately drop wgValidateAllHtml and related code
Bug: T191670
Change-Id: If13d02ee1b30fec1c701226af9d363c6e08b3737
2018-04-10 10:51:28 -07:00
Timo Tijhof
b65683a088 parser: Update MWTidy::checkErrors() error message
When setting the following on PHP 7, the produced error message
did not make sense (references something about HHVM).

 > $wgValidateAllHtml = true
 > $wgTidyConfig = ['driver' => 'RemexHtml'];

Change-Id: I5f14505639a79aca66f570a9a00c38cdea0cc1ba
2018-03-21 03:56:05 +00:00
Tim Starling
448be2ed3e Add benchmarkTidy.php, to benchmark tidy drivers
Plus representative input file

Change-Id: I254793fc55c57a98c07ae1e4c27e6005965c9a20
2017-04-21 01:02:22 +00:00
Tim Starling
50fe941457 Add RemexHtml to the list of available Tidy drivers
Change-Id: I5a87a6ed24ca3ef7c5fdb21e74f9eb410bf74b4c
2017-03-09 10:19:23 +11:00
Reedy
c6fc119c0a Add/update doc blocks for MWTidy
Change-Id: I0b87e119048fd993f8bfda25a6c6b744d59804d1
2016-07-29 01:24:34 +01:00
Tim Starling
7a5fbec82d Add MWTidy::factory()
A convenient factory function to eliminate code duplication in
ParserMigration's MigrationEditPage::tidyParserOutput().

Change-Id: I058912885025e7a9402912236c65c44e32ef036e
2016-07-26 15:12:55 +10:00
Tim Starling
d3d682fb45 Hide marked empty elements by default (stage 1)
We originally imagined rolling out the display of empty elements
simultaneously with the Html5Depurate, but now we have added support for
marking empty elements to Html5Depurate and plan on having some sort of
longer migration period. So, move the relevant CSS to content.css, and
remove the concept of CSS dependent on tidy driver.

Add a body class which will allow the effect to be toggled in a gadget or
extension. Actual toggling in the CSS will be in the stage 2 patch, to be
deployed after the varnish cache and parser cache have expired.

I originally imagined that there would be a gadget that overrides the
rule with an !important selector, but that method does not allow you to
recover the original display property, which is often overridden by the
style attribute or site CSS to be "inline".

Also, in RaggettWrapper, switch to the new class mw-empty-elt, following
Html5Depurate, instead of mw-empty-li. The old class will be removed in
the stage 2 patch.

Change-Id: Ic0f432c43a006629ca5a1a7c2dda3552ceb4dc4f
2016-07-14 14:24:27 -07:00
Tim Starling
8a57d86ea7 Rewrite TidySupport and add option --use-tidy-config
* Have TidySupport provide $wgTidyConfig instead of the legacy globals
* Add --use-tidy-config option to parserTests.php. This tells
  TidySupport to use the tidy configuration from LocalSettings.php
  instead of the traditional safe defaults.
* Add a way for TidySupport to disable tidy via $wgTidyConfig, using
  driver=>disabled

Change-Id: Ie76e68e2d5238d0a1aef49a1a815c0d1cd8bfdae
2016-07-12 14:25:03 -07:00
C. Scott Ananian
ce081a3d7b Hook up Balancer as a Tidy implementation.
This is an HTML5-compliant parse/serialize tidy implementation, with
well-delineated hacks to support the <p>-wrapping done by legacy tidy.

Change-Id: I4fd433fd6f1847061b0bf4b3e249c918720d4fae
2016-07-12 14:18:04 +10:00
Kunal Mehta
6e9b4f0e9c Convert all array() syntax to []
Per wikitech-l consensus:
 https://lists.wikimedia.org/pipermail/wikitech-l/2016-February/084821.html

Notes:
* Disabled CallTimePassByReference due to false positives (T127163)

Change-Id: I2c8ce713ce6600a0bb7bf67537c87044c7a45c4b
2016-02-17 01:33:00 -08:00
Tim Starling
eb40eb0f18 Client-side migration for empty li preservation
It is desirable in terms of user-friendly syntax to display an empty
list item if the user adds one to the source. However, we suspect that
this change will break the rendering of existing templates. So, preserve
the empty <li> element, but style it with display:none so that there is
no user-visible change. Changes can then be observed with a user script,
then eventually the CSS can be removed so that the desired behaviour will
be user visible.

This is imagined as a staged deployment of T89331, i.e. it is better to
resolve differences with Html5Depurate one at a time instead of
deploying it all at once.

The CSS module is specified in parser/MWTidy.php since the tidy driver
hierarchy is not meant to be so closely tied to the MW environment.

Bug: T49673
Change-Id: Ifb44b782c617240e3de73dcdf76c8737c7307d94
2015-10-28 23:35:18 +00:00
umherirrender
c572d18661 Fixed spacing
- Removed space after cast
- Removed spaces in array index
- Removed double spaces
- Added spaces around string concat
- Fixed mixed tabs and spaces at begin of line

Change-Id: I38e849723f055d2d4c05cba72f5c245a28e8d5da
2015-09-26 20:44:54 +00:00
Tim Starling
e9d523b9bd Add Html5Depurate tidy driver
Also document input format for MWTidy::tidy().

Change-Id: I77071d3db0524695c2baf9a4670ca2455438c83d
2015-09-11 03:32:32 +00:00
Tim Starling
2c6c954e23 Abstract and refactor Tidy support
* Split tidy implementations into a class hierarchy
* Bring all tidy configuration into a single associative array and
  deprecate the old configuration.
* Remove $wgAlwaysUseTidy

This is preparatory to replacement of Tidy (T89331). I used the name
"Raggett" for things relating to Dave Raggett's Tidy, since if we use
"tidy" to mean the new abstract system as well as Raggett's tidy, it
gets confusing.

Change-Id: I77af1a16cbbb47fc226d05fb9aad56c58e8910b5
2015-09-10 20:18:52 -07:00
Ori Livneh
12571bde26 Use a fixed marker prefix string in the Parser and MWTidy
Generating one-time, unique strip markers hurts us in multiple ways:

* The strip marker regexes don't benefit from JIT compilation, so they are
  slower to execute than they could be.
* Although the regexes don't benefit from JIT compilation, they are still
  compiled, because HHVM bets on regexes getting reused. This extra work is
  fairly costly (1-2% of CPU usage on the app servers) and doesn't pay off.
* The size of the PCRE JIT cache is finite, and the caching of one-off regexes
  displaces from the cache regexes which are in fact reused.

Tim's preferred solution (per his review comment on
https://gerrit.wikimedia.org/r/167530/) is to use fixed strip markers.
So:

* Replace usage of $parser->mUniqPrefix with Parser::MARKER_PREFIX, which
  complements the existing Parser::MARKER_SUFFIX.
* Deprecate Parser::mUniqPrefix and its accessor, Parser::uniqPrefix().
* Deprecate Parser::getRandomString(), since it is no longer useful.
* In Preprocessor_*::preprocessToObj() and Parser::fetchTemplateAndTitle,
  replace any occurrences of \x7f with '?', to prevent strip marker forgery.
  \x7f is not valid input anyway.
* Deprecate the $prefix parameter for StripState::__construct, since a custom
  prefix may no longer be specified.

Change-Id: I31d4556bbb07acb72c33fda335fa5a230379a03f
2015-05-31 19:33:36 -07:00
Chad Horohoe
aa21e125a3 Remove obvious function-level profiling
Xhprof generates this data now. Custom profiling of various
sub-function units is kept.

Calls to profiler represented about 3% of page execution
time on Special:BlankPage (1.5% in/out); after this change
it's down to about 0.98% of page execution time.

Change-Id: Id9a1dc9d8f80bbd52e42226b724a1e1213d07af7
2015-01-07 11:14:24 -08:00
Reedy
4d9143c7f5 Add lots of @throws
Change-Id: I09d0c13070f966fcf23d2638d8fc1328279a5995
2014-12-24 13:49:20 +00:00
Ori Livneh
6138e86945 Revert "Simplify MWTidy"
This is broken, for reasons indicated in
<https://gerrit.wikimedia.org/r/#/c/180384/>. It was broken before, but I made
it more broken. So revert for now, and I'll give this another stab.

Change-Id: I7e67a61f7d6370f90487be6470bebe1449432a4c
2014-12-18 14:58:18 -08:00
Aaron Schulz
6a1d9c8ddc Fixed internalClean class/method existence check for HHVM
* Follows up 4f281083fd

Change-Id: I5fa406ed1c4f2eefd1c22e9ab90e72655f31d162
2014-12-10 19:04:58 +00:00
Bryan Davis
4f281083fd hhvm: Check for tidy function instead of class
Bug: T78166
Change-Id: Ie60e23ffbafd698a3458eed1efce92d54c8d0c2a
2014-12-10 11:08:18 -07:00
Ori Livneh
98c2703f81 Simplify MWTidy
* Make the internal MWTidy::*clean() functions always return an array of two
  elements: the output buffer and the error buffer.
* Make MWTidy::externalTidy() always read both stdout and stderr. We can read
  stderr after stdout because tidy.c produces output in the same order.
* Remove the $stderr parameter from the private MWTidy::*clean() methods, since
  error output is always returned.
* Merge MWTidy::phpClean and MWTidy::hhvmClean, since the difference between
  them is now small enough that splitting them up is not warranted.
* On HHVM, MWTidy::internalTidy() always returns an empty string for the error
  buffer.

Change-Id: I178b42d6ebdd1a5b9bd5921eb093a6c5014ffa49
2014-12-09 16:43:08 -08:00
umherirrender
489d793882 Fixed spacing
- Added/removed spaces around parentheses
- Added newline in empty blocks
- Added space after switch/foreach/function
- Use tabs at begin of line
- Add newline at end of file

Change-Id: I244cdb2c333489e1020931bf4ac5266a87439f0d
2014-12-05 22:28:07 +01:00
Tim Starling
e6fdbfec47 Use HHVM+EZC internal tidy
EZC doesn't currently support direct access to object properties via the
obj->std.properties hashtable, but tidy uses this extensively. But it
turns out that for production use cases, tidy_repair_string() should be
sufficient. $wgDebugTidy and $wgValidateAllHtml are not used, and
no deployed extension calls MWTidy::checkErrors().

The only difference I know of is that errors from tidy (status==2) lead
to the tidy output being used, rather than discarded. But
TY_(ReportFatal) has very few callers in tidylib -- probably none that
are reachable from stripped parser output.

So, throw an exception if MWTidy::checkErrors() is requested on an HHVM
instance with the tidy extension. For MWTidy::tidy(), use
tidy_repair_string(). Refactor some relevant code.

Bug: T758
Change-Id: I8d5b1c2c9f9ddce46d8ad099a671a2e297d256e0
2014-11-28 09:47:25 +11:00
physikerwelt
6a68fad159 Protect MathML from Tidy
MediaWiki installations that use the setting
$wgUseTidy = true; are unable to output
MathML since the well defined MathML elements
are filtered out by Tidy. This was reported as
  http://sourceforge.net/p/tidy/patches/84/ .

This change hides MathML blocks from
Tidy.

Bug: 66516
Change-Id: Ib48b91238c3eddd6a86b62f6ce57801d7058f0d8
2014-08-22 12:21:06 -04:00
addshore
61c989cfc0 Fix phpcs issues in parser
This fixes all issues except for:
 - class names
 - line length

Change-Id: Ie91b010d5b3eec49d3b80b6e93b125a901ef43c6
2014-08-12 01:00:15 +00:00
Timo Tijhof
8ad5719e2c Rename MWNamespace, MWDebug and MWTidy files to match their class
Change-Id: I3e6d13ce366861c865401dde272bc2834a1de670
2014-07-15 20:59:39 +00:00
Renamed from includes/parser/Tidy.php