Remex is pure PHP so there is no reason to use an external tidy any
more. Configuration variables and implementation classes were
deprecated in 1.32 or earlier. We've kept only $wgTidyConfig
which can be used for experimental features or debugging Remex.
Bug: T198214
Change-Id: I99d48f858d97b6e1d1e6cd76a42c960cc2c61f9f
Let's rip the band-aid off. Remex is pure PHP so there's no reason to
be running any of the other tidy implementations any more, and we won't
be able to support them in the future.
Follow-up to 7b23382823.
Bug: T198214
Change-Id: Id3d07d44f8434231826e86e623554cac3decfa96
Use the new RemexHtml trace features. Add two more tracing modes.
Fix missing member variable declarations and remove unused local
variables.
Change-Id: I512462e1019f9a466684abfa4aab7697b324d5b1
This was naïve: the linked bug documents a case where endTag() was
called despite children of the p-wrap still being in TreeBuilder's
stack. Instead, wait for the parent of the p-wrap to have endTag()
called on it. I've submitted a patch which will clean up the node in
that case.
Bug: T200827
Change-Id: I34694813eace9cadabf2db8f9ccca83d1368cfad
The changes to the parserTests.txt highlight the differing opinions that
doBlockLevels and Remex had on whether these should be paragraph wrapped.
Since the only time they wouldn't have been was when found on a line
with other flow tags, this likely isn't a behaviour that was depended on
in practice. And, indeed, the task describes this as a bug.
A sampling of pages from an insource:/\<(ins|del)\>/ search on wiki bears
this out.
Bug: T17491
Change-Id: I311da777a63aa3c45013f2cfc090be35a022497e
In cases where we're operating on text data (and not binary data),
use e.g. "\u{00A0}" to refer directly to the Unicode character
'NO-BREAK SPACE' instead of "\xc2\xa0" to specify the bytes C2h A0h
(which correspond to the UTF-8 encoding of that character). This
makes it easier to look up those mysterious sequences, as not all
are as recognizable as the no-break space.
This is not enforced by PHP, but I think we should write those in
uppercase and zero-padded to at least four characters, like the
Unicode standard does.
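As a quick check of the equivalence described above, here is a minimal
sketch in Ruby, whose double-quoted strings accept the same "\u{...}"
and "\xNN" escapes. The variable names are illustrative only, and the
explicit force_encoding call is an assumption that guards against a
non-UTF-8 source encoding.

```ruby
# Both literals denote NO-BREAK SPACE; only the spelling in the source differs.
by_codepoint = "\u{00A0}"
# Mark the byte sequence as UTF-8 explicitly, in case the source
# encoding of this file is not UTF-8.
by_bytes = "\xC2\xA0".dup.force_encoding(Encoding::UTF_8)

# The two strings compare equal because they hold the same UTF-8 bytes.
same = by_codepoint == by_bytes
# Hex dump of the bytes behind the "\u{00A0}" spelling.
bytes_hex = by_codepoint.bytes.map { |b| format('%02X', b) }.join(' ')
# Code point recovered from the "\xC2\xA0" spelling.
codepoint = format('U+%04X', by_bytes.ord)

puts same, bytes_hex, codepoint
```

The code-point spelling can be looked up directly in the Unicode
charts, while the byte spelling has to be decoded first.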
Note that not all "\xNN" escapes can be automatically replaced:
* We can't use Unicode escapes for binary data that is not UTF-8
  (e.g. in code converting from legacy encodings or testing the
  handling of invalid UTF-8 byte sequences).
* '\xNN' escapes in regular expressions in single-quoted strings
  are actually handled by PCRE and have to be dealt with carefully
  (those regexps should probably be changed to use the /u modifier).
* "\xNN" referring to ASCII characters ("\x7F" and lower) should
  probably be left as-is.
The replacements in this commit were done semi-manually by piping
the existing "\xNN" escapes through the following terrible Ruby
script I devised:
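    # Evaluate the "\xNN" escapes in ARGV[0] and treat the bytes as UTF-8.
    chars = eval('"' + ARGV[0] + '"').force_encoding('utf-8')
    # Print each character as an uppercase, zero-padded "\u{XXXX}" escape.
    puts chars.split('').map{|char|
      '\\u{' + char.ord.to_s(16).upcase.rjust(4, '0') + '}'
    }.join('')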
Change-Id: Idc3dee3a7fb5ebfaef395754d8859b18f1f8769a
The Html5Depurate driver was intended to be used with an external Java
service, but it never gained traction due to deployment concerns.
The Html5Internal (Balancer) driver was originally intended for use with
the balanced templates proposal and could also handle tidying. But it was
tightly coupled to MediaWiki, so part of it was used as the basis of the
RemexHtml library. Remex most likely can also implement the balanced
templates proposal, so there isn't any reason to keep the Balancer code
around anymore.
Change-Id: I8542d69e9cdbf0e2fb7ebbb919933a64c1b8c293
https://secure.php.net/manual/en/function.implode.php defines the order
of arguments as
    string implode ( string $glue , array $pieces )
    string implode ( array $pieces )
Note:
    implode() can, for historical reasons, accept its parameters in
    either order. For consistency with explode(), however, it may be less
    confusing to use the documented order of arguments.
Change-Id: I03bf5712204e283f52d3ede54af9b9ec117d4280
When TreeBuilder requests reparenting of all child nodes of a given
element, we do this by removing the existing child nodes, and then
inserting the proposed new parent under the old parent. However, when a
p-wrap diversion is in place, the insertion of the new parent is
diverted into the p-wrap, and the p-wrap then becomes a child of the new
parent, causing a reference loop, and ultimately infinite recursion in
Serializer.
Instead, divert the entire reparent request to the p-wrap, so that the
new parent is a child of the p-wrap. This makes sense since the new
parent is always a formatting element. The only caller of
reparentChildren(), apart from proxies, is AAA step 17, which reparents
children under the formatting element cloned from the AFE list.
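The loop and the fix can be sketched with a toy tree in Ruby. Node,
cycle?, build_tree and the two reparent_* helpers are hypothetical
names invented for this illustration; they are not RemexHtml's classes
or methods, and the model ignores everything except child links.

```ruby
# Toy tree node. These are illustrative stand-ins, not RemexHtml classes.
class Node
  attr_reader :name, :children

  def initialize(name)
    @name = name
    @children = []
  end
end

# True if following child links from +node+ ever leads back to a node
# already on the current path.
def cycle?(node, path = [])
  return true if path.any? { |seen| seen.equal?(node) }
  node.children.any? { |child| cycle?(child, path + [node]) }
end

# Old parent holding a p-wrap, which in turn holds some text.
def build_tree
  old_parent = Node.new('div')
  p_wrap = Node.new('mw:p-wrap')
  p_wrap.children << Node.new('#text')
  old_parent.children << p_wrap
  [old_parent, p_wrap]
end

# Problem case: the old parent's children (the p-wrap among them) move
# under the new parent, but inserting the new parent is diverted into
# the p-wrap, so each ends up a child of the other.
def reparent_with_diverted_insert(old_parent, p_wrap, new_parent)
  new_parent.children.concat(old_parent.children)
  old_parent.children.clear
  p_wrap.children << new_parent
end

# Fix: divert the whole request, reparenting the p-wrap's own children.
def reparent_diverted_to_p_wrap(p_wrap, new_parent)
  new_parent.children.concat(p_wrap.children)
  p_wrap.children.clear
  p_wrap.children << new_parent
end

old_parent, p_wrap = build_tree
reparent_with_diverted_insert(old_parent, p_wrap, Node.new('b'))
loop_in_problem_case = cycle?(p_wrap)

old_parent, p_wrap = build_tree
reparent_diverted_to_p_wrap(p_wrap, Node.new('b'))
loop_after_fix = cycle?(old_parent)

puts loop_in_problem_case, loop_after_fix
```

In the toy model, a recursive walk without the path check would never
terminate in the first case, which mirrors the infinite recursion seen
in Serializer.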
Left in some debug code for next time.
Bug: T178632
Change-Id: Id77d21d99748e94c064ef24c43ee0033de627b8e
Breaks some lines where the ignore is not needed.
The sniff was changed upstream to accept
long unbreakable lines in comments.
Change-Id: I2bbe2be7cedd4d3c0ce8dc3e62d0e268bc171876
- mostly auto fixes
- some too long lines fixed
- ignore amp space in one case passing by reference
Change-Id: I6472f83bc3cbf4bd629d83050cc3319b19ec465c
And auto-fix all errors.
The `<exclude-pattern>` stanzas are now included in the default ruleset
and don't need to be repeated.
Change-Id: I928af549dc88ac2c6cb82058f64c7c7f3111598a
Having such comments is worse than not having them. They add zero
information. But you must read the text to understand there is
nothing you don't already know from the class and the method name.
This is similar to I994d11e, but even more trivial, because this one is
about comments that don't say anything but "constructor".
Change-Id: I474dcdb5997bea3aafd11c0760ee072dfaff124c
The phpcs version in use has a bug, so version 0.9.0 cannot be enforced
at the moment. This will be fixed in the next version; see T167168.
Changed:
- Remove duplicate newline at end of file
- Add space between function and ( for closures
- and -> &&, or -> ||
Change-Id: I4172fb08861729bccd55aecbd07e029e2638d311
Some versions of html-tidy (e.g. the one currently in use on WMF wikis)
will try to move all <style> tags in the body into the head, effectively
removing them for our purposes. We need to avoid that for TemplateStyles.
Bug: T167349
Change-Id: I133776d16f366cad73ed30af0e5a665fdf9f5ed9
Pull in the RemexHtml library, which is an HTML 5 library I recently
created.
RemexCompatMunger mutates the event stream, inserting <mw:p-wrap>
elements where necessary, and occasionally taking even more invasive
action such as reparenting and removing nodes maintained in Serializer's
tree.
RemexCompatFormatter produces a MediaWiki-style serialization which is
relatively compatible with existing parser tests. It also does final
empty element handling, including translating <mw:p-wrap> to <p>.
Tests are imported from both Html5Depurate and Subbu's pwrap.js.
Depends-On: I864f31d9afdffdde49bfd39f07a0fb7f4df5c5d9
Change-Id: I900155b7dd199b0ae2a3b9cdb6db5136fc4f35a8
I was bored. What? Don't look at me that way.
I mostly targeted mixed tabs and spaces, but others were not spared.
Note that some of the whitespace changes are inside HTML output,
extended regexps or SQL snippets.
Change-Id: Ie206cc946459f6befcfc2d520e35ad3ea3c0f1e0
This corresponds to the 1.0.27 release of domino, and matches the
latest HTML5 spec as of 2016-10-18.
Changes include:
* <menuitem> is no longer an empty element.
* <isindex> has been removed.
* Updated html5lib-tests (copied from domino 1.0.27).
* Round-trip-safe serialization of <pre>/<listing>/<textarea> is only
  used when "tidy compatibility" mode is enabled; the behavior in
  the HTML5 spec no longer cleanly round trips.
Change-Id: I656944b0d7bb6c3c0e4fe44fc6ebd1a4c36412ad
Undeclared variables are a very common error type that we want to catch
as often as possible. To avoid needing to refactor a variety of
global-level code (mostly in old-style maintenance scripts), this ignores
undeclared variables in global scope. This is still a good improvement
over what was happening previously.
Change-Id: I50b41d571724244552074b9408abbdf6160aca59
Instead, build the array and call strtr() directly.
Also did some other minor cleanup, such as making replaceCallback()
private now that we require at least PHP 5.5, and changing &$this
to $this.
Change-Id: If885df06710c76fdb35d3c7de78df7436ccb7abf
Use HTTPS instead of HTTP where the HTTP link is a redirect to the HTTPS link.
Also update some defect links.
Change-Id: Ic3a5eac910d098ed5c2a21e9f47c9b6ee06b2643
The full HTML5 spec clones element attributes when they are added to
the ActiveFormattingElements list, so that when an element on that
list is later cloned and reinserted the attributes are the *original*
attributes, not reflecting any changes which embedded JavaScript
in an inline <script> block may have made to them since the element
was pushed.
However, the PHP implementation doesn't run any JavaScript so there's
no way the attributes could change during balancing and there is
thus no reason to keep extra copies of the attributes around.
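A toy Ruby sketch of that reasoning follows. Element,
afe_entry_with_copy, afe_entry_without_copy and clone_from are
hypothetical names for this illustration only, not the Balancer's API.
The mutation step stands in for what an inline script could do in a
browser.

```ruby
# Toy element: a tag name plus a mutable attribute hash.
Element = Struct.new(:name, :attrs)

# Spec-style entry: snapshot the attributes when the element is pushed.
def afe_entry_with_copy(element)
  { element: element, attrs: element.attrs.dup }
end

# Simplified entry: keep a reference and read the attributes later.
def afe_entry_without_copy(element)
  { element: element, attrs: element.attrs }
end

# Build the clone that would be reinserted from a list entry.
def clone_from(entry)
  Element.new(entry[:element].name, entry[:attrs].dup)
end

bold = Element.new('b', { 'class' => 'original' })
copied = afe_entry_with_copy(bold)
shared = afe_entry_without_copy(bold)

# With no script running, nothing mutates bold.attrs, so both agree.
agree_before = clone_from(copied).attrs == clone_from(shared).attrs

# Only a mutation between push and clone makes the strategies diverge.
bold.attrs['class'] = 'changed'
after_copy = clone_from(copied).attrs['class']
after_shared = clone_from(shared).attrs['class']

puts agree_before, after_copy, after_shared
```

Since the mutation step cannot happen without a script engine, the
snapshot only costs memory in this setting.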
Change-Id: I89647aeb90c64701d77e862ea9e3d22b19bbdedc
This refactoring makes it easier to add additional options later without
having to pass them manually through the call chain.
Change-Id: I46814f17d1b338b971ab57f63c2ec75d4a6b45d5
* Use for loops where appropriate, instead of while
* De-indent a large block which was unnecessarily indented
* Use camel case for variable names, per the style guide
Change-Id: I0b2c37fdcab7f7238db0393085c43297e7a03ab2
This is a follow-up to the refactor done in 5726c9ceb0. It prevents a
crash when the first entry in the stack happens to be a BalanceMarker
(and thus doesn't have a `$localName` property). It also fixes an
unrelated issue where unpaired close-heading tags (like `</h3>`) get
entity-escaped instead of ignored.
Test cases exposing these bugs are added in
Ie854cf99f7e72bcca1bb8565ace558a43dcb6379.
Change-Id: Ia9a1d435be1be10512071f5ff626b68742863483
We originally imagined rolling out the display of empty elements
simultaneously with Html5Depurate, but now we have added support for
marking empty elements to Html5Depurate and plan on having some sort of
longer migration period. So, move the relevant CSS to content.css, and
remove the concept of CSS dependent on the tidy driver.
Add a body class which will allow the effect to be toggled in a gadget or
extension. Actual toggling in the CSS will be in the stage 2 patch, to be
deployed after the varnish cache and parser cache have expired.
I originally imagined that there would be a gadget that overrides the
rule with an !important selector, but that method does not allow you to
recover the original display property, which is often overridden by the
style attribute or site CSS to be "inline".
Also, in RaggettWrapper, switch to the new class mw-empty-elt, following
Html5Depurate, instead of mw-empty-li. The old class will be removed in
the stage 2 patch.
Change-Id: Ic0f432c43a006629ca5a1a7c2dda3552ceb4dc4f