70 commits

Author SHA1 Message Date
C. Scott Ananian
6db35b3c98 Remove most support for configuring Tidy, including Raggett
Remex is pure PHP so there is no reason to use an external tidy any
more. Configuration variables and implementation classes were
deprecated in 1.32 or earlier.  We've kept only $wgTidyConfig
which can be used for experimental features or debugging Remex.

Bug: T198214
Change-Id: I99d48f858d97b6e1d1e6cd76a42c960cc2c61f9f
2018-11-15 12:22:06 -05:00
C. Scott Ananian
a11a6f619f Hard deprecate non-Remex tidy modes
Let's rip the band-aid off.  Remex is pure PHP so there's no reason to
be running any of the other tidy implementations any more, and we won't
be able to support them in the future.

Follow-up to 7b23382823.

Bug: T198214
Change-Id: Id3d07d44f8434231826e86e623554cac3decfa96
2018-09-21 09:48:38 -04:00
C. Scott Ananian
7b23382823 Soft deprecate non-Remex tidy configurations
Future parsers will not be able to emit output compatible with these
configurations.

Bug: T198214
Change-Id: Id7921a166a62457f289e6c0c4bba6c8563be4760
2018-09-20 15:10:44 +00:00
Tim Starling
690bc4cb6a RemexDriver: improved tracing
Use the new RemexHtml trace features. Add two more tracing modes.

Fix missing member variable declarations and remove unused local
variables.

Change-Id: I512462e1019f9a466684abfa4aab7697b324d5b1
2018-08-14 13:40:11 -07:00
Tim Starling
10c8cfea30 RemexCompatMunger: Don't call endTag() in case B/b
This was naïve: the linked bug documents a case where endTag() was
called despite children of the p-wrap still being in TreeBuilder's
stack. Instead, wait for the parent of the p-wrap to have endTag()
called on it. I've submitted a patch which will clean up the node in
that case.

Bug: T200827
Change-Id: I34694813eace9cadabf2db8f9ccca83d1368cfad
2018-08-07 14:07:31 +10:00
Arlo Breault
5a7f860b78 <ins>/<del> elements can be phrasing or flow
The changes to parserTests.txt highlight the differing opinions that
doBlockLevels and Remex had on whether these should be paragraph wrapped.

Since the only time they wouldn't have been was when found on a line
with other flow tags, this likely isn't a behaviour that was depended on
in practice.  And, indeed, the task describes this as a bug.

A sampling of pages from an insource:/\<(ins|del)\>/ search on wiki bears
this out.

Bug: T17491
Change-Id: I311da777a63aa3c45013f2cfc090be35a022497e
2018-07-13 11:28:10 -04:00
Umherirrender
130ec2523d Fix PhanTypeMismatchDeclaredParam
Auto fix MediaWiki.Commenting.FunctionComment.DefaultNullTypeParam sniff

Change-Id: I865323fd0295aabd06f3e3c75e0e5043fb31069e
2018-07-07 00:34:30 +00:00
Bartosz Dziewoński
0313128b10 Use PHP 7 "\u{NNNN}" Unicode codepoint escapes in string literals
In cases where we're operating on text data (and not binary data),
use e.g. "\u{00A0}" to refer directly to the Unicode character
'NO-BREAK SPACE' instead of "\xc2\xa0" to specify the bytes C2h A0h
(which correspond to the UTF-8 encoding of that character). This
makes it easier to look up those mysterious sequences, as not all
are as recognizable as the no-break space.

This is not enforced by PHP, but I think we should write those in
uppercase and zero-padded to at least four characters, like the
Unicode standard does.

Note that not all "\xNN" escapes can be automatically replaced:
* We can't use Unicode escapes for binary data that is not UTF-8
  (e.g. in code converting from legacy encodings or testing the
  handling of invalid UTF-8 byte sequences).
* '\xNN' escapes in regular expressions in single-quoted strings
  are actually handled by PCRE and have to be dealt with carefully
  (those regexps should probably be changed to use the /u modifier).
* "\xNN" referring to ASCII characters ("\x7F" and lower) should
  probably be left as-is.

The replacements in this commit were done semi-manually by piping
the existing "\xNN" escapes through the following terrible Ruby
script I devised:

  chars = eval('"' + ARGV[0] + '"').force_encoding('utf-8')
  puts chars.split('').map{|char|
    '\\u{' + char.ord.to_s(16).upcase.rjust(4, '0') + '}'
  }.join('')
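
As a hedged illustration of the conversion described above, the same
logic can be wrapped in a method and checked against known characters.
The method name `to_php_unicode_escapes` and the validity check are
assumptions added for this sketch; they are not part of the commit.

```ruby
# Hypothetical helper (not from the commit): takes a string holding raw
# UTF-8 bytes and returns the text of the matching PHP 7 "\u{NNNN}" escapes.
def to_php_unicode_escapes(bytes)
  chars = bytes.dup.force_encoding('utf-8')
  # Binary data that is not UTF-8 cannot be rewritten this way.
  raise ArgumentError, 'not valid UTF-8' unless chars.valid_encoding?

  chars.each_char.map { |char|
    # Uppercase hex, zero-padded to at least four digits.
    '\\u{' + char.ord.to_s(16).upcase.rjust(4, '0') + '}'
  }.join
end

puts to_php_unicode_escapes("\xc2\xa0")     # prints \u{00A0} (NO-BREAK SPACE)
puts to_php_unicode_escapes("\xe2\x80\x8e") # prints \u{200E} (LEFT-TO-RIGHT MARK)
```

Rejecting invalid input up front mirrors the first caveat in the list
above: byte sequences that are not UTF-8 have to stay as "\xNN" escapes.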

Change-Id: Idc3dee3a7fb5ebfaef395754d8859b18f1f8769a
2018-06-04 16:20:13 +00:00
Kunal Mehta
853b8fe34c tidy: Remove obsolete Depurate and Balancer drivers
The Html5Depurate driver was intended to be used with an external Java
service, but it never gained traction due to deployment concerns.

The Html5Internal (Balancer) driver was originally intended for use with
the balanced templates proposal and could also handle tidying. But it was
tightly coupled to MediaWiki, so part of it was used as the basis of the
RemexHtml library. Remex most likely can also implement the balanced
templates proposal, so there isn't any reason to keep the Balancer code
around anymore.

Change-Id: I8542d69e9cdbf0e2fb7ebbb919933a64c1b8c293
2018-05-08 15:32:49 +00:00
Umherirrender
95ebece410 Add missing use statement
Change-Id: Id14d97b5b74edf6c6bafb29b643ac9b9357bb681
2018-04-27 23:13:43 +02:00
jenkins-bot
4e7673c5b0 Merge "Immediately drop wgValidateAllHtml and related code"
2018-04-12 05:29:53 +00:00
James D. Forrester
0da97e7a03 Immediately drop wgValidateAllHtml and related code
Bug: T191670
Change-Id: If13d02ee1b30fec1c701226af9d363c6e08b3737
2018-04-10 10:51:28 -07:00
Arlo Breault
25a08cc5f9 Munge inline elements found in tidy.conf as well
Bug: T184900
Bug: T184228
Change-Id: I421c4c7cf1eeeb6c44bb64081b49ae05937d1a8b
2018-04-04 20:20:38 -04:00
Fomafix
d59af4c341 Use PHP's implode() with the suggested order of arguments
https://secure.php.net/manual/en/function.implode.php defines the order
of arguments as

 string implode ( string $glue , array $pieces )
 string implode ( array $pieces )

Note:
  implode() can, for historical reasons, accept its parameters in
  either order. For consistency with explode(), however, it may be less
  confusing to use the documented order of arguments.

Change-Id: I03bf5712204e283f52d3ede54af9b9ec117d4280
2018-02-22 20:24:00 +01:00
Thiemo Mättig
409da2d8b3 Remove leading backslashes from "use \…" tags
Change-Id: I494b029de089a07e3b946ee78293a12d5036f63e
2017-12-28 16:30:05 +01:00
Huji Lee
e74bfe13f6 Require indentation of CASE statements in PHP code
Bug: T182546
Change-Id: I91a9555893a08e4ec58da97c6cc4d1e70000ff6b
2017-12-10 22:07:50 -05:00
Tim Starling
324e4bca4f Fix RemexCompatMunger infinite recursion
When TreeBuilder requests reparenting of all child nodes of a given
element, we do this by removing the existing child nodes, and then
inserting the proposed new parent under the old parent. However, when a
p-wrap diversion is in place, the insertion of the new parent is
diverted into the p-wrap, and the p-wrap then becomes a child of the new
parent, causing a reference loop, and ultimately infinite recursion in
Serializer.

Instead, divert the entire reparent request to the p-wrap, so that the
new parent is a child of the p-wrap. This makes sense since the new
parent is always a formatting element. The only caller of
reparentChildren(), apart from proxies, is AAA step 17, which reparents
children under the formatting element cloned from the AFE list.
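
The reference loop described above can be reproduced with a toy tree.
This is a rough sketch only: the `Node` class and `cycle?` helper are
hypothetical stand-ins, not the RemexHtml or MediaWiki classes, and the
operations are reduced to plain array manipulation.

```ruby
# Toy node type for illustration only; not a RemexHtml or MediaWiki class.
class Node
  attr_reader :name, :children

  def initialize(name)
    @name = name
    @children = []
  end
end

# True if following child links from `node` ever leads back to a node
# already on the current path (a reference loop).
def cycle?(node, path = [])
  return true if path.any? { |seen| seen.equal?(node) }

  node.children.any? { |child| cycle?(child, path + [node]) }
end

# Problem case: the request is "reparent the children of <div> under <b>",
# and one of those children is the p-wrap.
div = Node.new('div')
p_wrap = Node.new('mw:p-wrap')
b = Node.new('b')
div.children << p_wrap

moved = div.children.slice!(0..-1) # remove the existing children
p_wrap.children << b               # inserting <b> is diverted into the p-wrap
b.children.concat(moved)           # the p-wrap becomes a child of <b>
loop_before_fix = cycle?(p_wrap)

# Fixed case: the whole request is diverted to the p-wrap, so its own
# children move under <b> and <b> becomes a child of the p-wrap.
div2 = Node.new('div')
p_wrap2 = Node.new('mw:p-wrap')
text = Node.new('#text')
b2 = Node.new('b')
div2.children << p_wrap2
p_wrap2.children << text

moved2 = p_wrap2.children.slice!(0..-1)
p_wrap2.children << b2
b2.children.concat(moved2)
loop_after_fix = cycle?(div2)

puts "loop before fix: #{loop_before_fix}" # prints "loop before fix: true"
puts "loop after fix:  #{loop_after_fix}"  # prints "loop after fix:  false"
```

A serializer that walks child links recursively would never terminate
on the first tree, which matches the infinite recursion reported here.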

Left in some debug code for next time.

Bug: T178632
Change-Id: Id77d21d99748e94c064ef24c43ee0033de627b8e
2017-11-17 23:27:14 +11:00
Umherirrender
9aa56950c2 Remove @codingStandardsIgnore from long lines
Break some lines where the ignore is not needed.

The sniff was changed upstream to be okay
with long unbreakable lines in comments.

Change-Id: I2bbe2be7cedd4d3c0ce8dc3e62d0e268bc171876
2017-10-22 16:44:04 +02:00
Umherirrender
f739a8f368 Improve some parameter docs
Add missing @return and @param to function docs and fix some @param tags.

Change-Id: I810727961057cfdcc274428b239af5975c57468d
2017-09-10 20:32:31 +02:00
Umherirrender
3f1a52805e Use short type bool/int in param documentation
Enable the phpcs sniffs for this and apply the fixes with phpcbf.

Change-Id: Iaa36687154ddd2bf663b9dd519f5c99409d37925
2017-08-20 13:20:59 +02:00
WMDE-Fisch
6df9ed1ad6 update mediawiki-codesniffer to 0.11.0 and fix issues
- mostly auto fixes
- some too long lines fixed
- ignore amp space in one case of passing by reference

Change-Id: I6472f83bc3cbf4bd629d83050cc3319b19ec465c
2017-08-11 22:27:51 +02:00
Kunal Mehta
d1cf48a397 build: Update mediawiki/mediawiki-codesniffer to 0.10.1
And auto-fix all errors.

The `<exclude-pattern>` stanzas are now included in the default ruleset
and don't need to be repeated.

Change-Id: I928af549dc88ac2c6cb82058f64c7c7f3111598a
2017-07-22 18:24:09 -07:00
Thiemo Mättig
91a920fd85 Remove auto-generated "Constructor" documentation on constructors
Having such comments is worse than not having them. They add zero
information. But you must read the text to understand there is
nothing you don't already know from the class and the method name.

This is similar to I994d11e. Even more trivial, because this here is
about comments that don't say anything but "constructor".

Change-Id: I474dcdb5997bea3aafd11c0760ee072dfaff124c
2017-07-21 12:19:30 +02:00
addshore
fdb2279e54 TidyDriverBase::validate throws an exception
Change-Id: I05e31c757ed92323ff905d993ac4d030b8aba1da
2017-06-30 14:10:54 +01:00
Umherirrender
be42e09aa8 build: Prepare for mediawiki/mediawiki-codesniffer to 0.9.0
The phpcs version in use has a bug, so version 0.9.0 cannot be enforced at the moment.
This will be fixed in the next version; see T167168.

Changed:
- Remove duplicate newline at end of file
- Add space between function and ( for closures
- and -> &&, or -> ||

Change-Id: I4172fb08861729bccd55aecbd07e029e2638d311
2017-06-26 17:14:31 +00:00
Brad Jorsch
83b798bbab Hide <style> tags from Tidy
Some versions of html-tidy (e.g. the one currently in use on WMF wikis)
will try to move all <style> tags in the body into the head, effectively
removing them for our purposes. We need to avoid that for TemplateStyles.

Bug: T167349
Change-Id: I133776d16f366cad73ed30af0e5a665fdf9f5ed9
2017-06-13 13:02:57 -04:00
Thiemo Mättig
e16191caa3 Remove unused and unnecessary imports
Change-Id: I26e623a4e4ba965c07670369a90c8a95185ea1e4
2017-06-12 15:50:43 +00:00
Tim Starling
251b25d700 RemexCompatMunger: fix a couple of memory leaks
Change-Id: I47578b3f73320e84a157417c288de97b5d26e18f
2017-03-23 02:32:52 +00:00
jenkins-bot
82524dc4da Merge "RemexHtml tidy driver with p-wrapping"
2017-03-08 15:24:36 +00:00
Tim Starling
9341a00ed1 RemexHtml tidy driver with p-wrapping
Pull in the RemexHtml library, which is an HTML 5 library I recently
created.

RemexCompatMunger mutates the event stream, inserting <mw:p-wrap>
elements where necessary, and occasionally taking even more invasive
action such as reparenting and removing nodes maintained in Serializer's
tree.

RemexCompatFormatter produces a MediaWiki-style serialization which is
relatively compatible with existing parser tests. It also does final
empty element handling, including translating <mw:p-wrap> to <p>

Tests are imported from both Html5Depurate and Subbu's pwrap.js.

Depends-On: I864f31d9afdffdde49bfd39f07a0fb7f4df5c5d9
Change-Id: I900155b7dd199b0ae2a3b9cdb6db5136fc4f35a8
2017-03-08 16:54:13 +11:00
Timo Tijhof
3a2a707546 Clean up remaining get_class() uses
* get_class()        -> __CLASS__ (same as self::class)
* get_called_class() -> static::class
* get_class($this)   -> static::class

Change-Id: I1888a1897ecf4548a2e5a67a942e5c080dd7e3d3
2017-03-07 22:03:47 +00:00
Bartosz Dziewoński
ecdef925bb Miscellaneous indentation tweaks
I was bored. What? Don't look at me that way.

I mostly targeted mixed tabs and spaces, but others were not spared.
Note that some of the whitespace changes are inside HTML output,
extended regexps or SQL snippets.

Change-Id: Ie206cc946459f6befcfc2d520e35ad3ea3c0f1e0
2017-02-27 19:23:54 +01:00
C. Scott Ananian
265f2b40dd Update Balancer to latest HTML5 spec
This corresponds to the 1.0.27 release of domino, and matches the
latest HTML5 spec as of 2016-10-18.

Changes include:
* <menuitem> is no longer an empty element.
* <isindex> has been removed.
* Updated html5lib-tests (copied from domino 1.0.27).
* Round-trip-safe serialization of <pre>/<listing>/<textarea> is only
  used when "tidy compatibility" mode is enabled; the behavior in
  the HTML5 spec no longer cleanly round trips.

Change-Id: I656944b0d7bb6c3c0e4fe44fc6ebd1a4c36412ad
2017-01-24 05:44:05 +00:00
Erik Bernhardson
e5b8bf4942 Un-blacklist PhanUndeclaredVariable
Undeclared variables are a very common error type that we want to catch
as often as possible. To avoid needing to refactor a variety of global
level code (mostly in old-style maintenance scripts) this ignores
undeclared variables in global scope. This is still a good improvement
over what was happening previously.

Change-Id: I50b41d571724244552074b9408abbdf6160aca59
2017-01-18 13:07:39 -08:00
Kevin Israel
33f4112a4b RaggettWrapper: Don't use ReplacementArray
Instead, build the array and call strtr() directly.

Also did some other minor cleanup, such as making replaceCallback()
private now that we require at least PHP 5.5, and changing &$this
to $this.

Change-Id: If885df06710c76fdb35d3c7de78df7436ccb7abf
2016-12-27 06:13:38 -05:00
Fomafix
202f695f67 Update weblinks in comments from HTTP to HTTPS
Use HTTPS instead of HTTP where the HTTP link is a redirect to the HTTPS link.

Also update some defect links.

Change-Id: Ic3a5eac910d098ed5c2a21e9f47c9b6ee06b2643
2016-11-07 15:24:46 +01:00
C. Scott Ananian
4ae30c9516 Balancer: remove unnecessary extra argument
The full HTML5 spec clones element attributes when they are added to
the ActiveFormattingElements list, so that when an element on that
list is later cloned and reinserted the attributes are the *original*
attributes, not reflecting any changes which embedded JavaScript
in an inline <script> block may have made to them since the element
was pushed.

However, the PHP implementation doesn't run any JavaScript so there's
no way the attributes could change during balancing and there is
thus no reason to keep extra copies of the attributes around.

Change-Id: I89647aeb90c64701d77e862ea9e3d22b19bbdedc
2016-10-12 16:56:51 -07:00
Kunal Mehta
b92bab01f0 Balancer: Add a bunch of phpdoc and 2 fixmes
Change-Id: I0596c73cc87ec609d75aa4d8b241c2377bc4f9b1
2016-10-11 11:27:06 -07:00
Max Semenik
2b51bd1847 Fix function name case
Change-Id: Ibd4f682d2ed8500a50d85aae38f17281646f7c2d
2016-09-26 15:32:54 -07:00
C. Scott Ananian
806701eadf Balancer: pass configuration array to flatten instead of individual booleans
This refactoring makes it easier to add additional options later without
having to pass them manually through the call chain.

Change-Id: I46814f17d1b338b971ab57f63c2ec75d4a6b45d5
2016-08-06 12:08:19 -04:00
Tim Starling
2bec3ad45c Balancer style tweaks
* Use for loops where appropriate, instead of while
* De-indent a large block which was unnecessarily indented
* Use camel case for variable names, per the style guide

Change-Id: I0b2c37fdcab7f7238db0393085c43297e7a03ab2
2016-08-04 22:06:09 +00:00
Tim Starling
3879f633ce Balancer: remove redundant assignment
Change-Id: I6c22d6227e43a2c5be454955eff6b053a94a1657
2016-08-04 22:05:50 +00:00
Tim Starling
77eceb1a9b Balancer: consistent single-line comment style
Also break a line that was over 100 bytes

Change-Id: I875d572d4147f2438526a49ca6cb5b73907bdc9b
2016-08-04 15:40:27 -04:00
C. Scott Ananian
cd64c644b0 Support <textarea> tags in Balancer.
Change-Id: I63c2fd1c343362e49cf3b5a258fc98489744ad68
2016-07-21 03:37:10 +00:00
C. Scott Ananian
bd25891682 Support tokenizing simple HTML comments in the Balancer.
Change-Id: Ib780595b13b7145e99867d16e3c225e6b2b91884
2016-07-21 03:27:53 +00:00
C. Scott Ananian
2f70501364 Support <form> tags in Balancer.
Change-Id: I893fc231fea71f58449ed426d64ac99fdcb31d9e
2016-07-21 03:11:27 +00:00
C. Scott Ananian
b279c586a6 Support <select> tags in Balancer.
Change-Id: Ibc346624a9d035c98a29132a541e7ed6d82b364e
2016-07-21 02:48:26 +00:00
C. Scott Ananian
5f0505b889 Minor bug fixes to Balancer.
This is a follow-up to the refactor done in 5726c9ceb0. It prevents a
crash when the first entry in the stack happens to be a BalanceMarker
(and thus doesn't have a `$localName` property). It also fixes an
unrelated issue where unpaired close-heading tags (like `</h3>`) get
entity-escaped instead of ignored.

Test cases exposing these bugs are added in
Ie854cf99f7e72bcca1bb8565ace558a43dcb6379.

Change-Id: Ia9a1d435be1be10512071f5ff626b68742863483
2016-07-18 17:42:04 -04:00
Tim Starling
d3d682fb45 Hide marked empty elements by default (stage 1)
We originally imagined rolling out the display of empty elements
simultaneously with Html5Depurate, but now we have added support for
marking empty elements to Html5Depurate and plan on having some sort of
longer migration period. So, move the relevant CSS to content.css, and
remove the concept of CSS dependent on the tidy driver.

Add a body class which will allow the effect to be toggled in a gadget or
extension. Actual toggling in the CSS will be in the stage 2 patch, to be
deployed after the varnish cache and parser cache have expired.

I originally imagined that there would be a gadget that overrides the
rule with an !important selector, but that method does not allow you to
recover the original display property, which is often overridden by the
style attribute or site CSS to be "inline".

Also, in RaggettWrapper, switch to the new class mw-empty-elt, following
Html5Depurate, instead of mw-empty-li. The old class will be removed in
the stage 2 patch.

Change-Id: Ic0f432c43a006629ca5a1a7c2dda3552ceb4dc4f
2016-07-14 14:24:27 -07:00
Tim Starling
7618d3be68 Balancer: Inline BalancerStack::length()
Provides a 1% reduction in benchmark time.

Change-Id: Ie8ff66a836cd137234828effcce9547e2cb3cd58
2016-07-12 14:18:05 +10:00