This was naïve, the linked bug documents a case where endTag() was
called despite children of the p-wrap still being in TreeBuilder's
stack. Instead, wait for the parent of the p-wrap to have endTag()
called on it, I've submitted a patch which will clean up the node in
that case.
Bug: T200827
Change-Id: I34694813eace9cadabf2db8f9ccca83d1368cfad
The Html5Depurate driver was intended to be used with an external Java
service, but it never gained traction due to deployment concerns.
The Html5Internal (Balancer) driver was originally intended for use with
the balanced templates proposal and could also handle tidying. But it was
tightly coupled to MediaWiki, so part of it was used as the basis of the
RemexHtml library. Remex most likely can also implement the balanced
templates proposal, so there isn't any reason to keep the Balancer code
around anymore,
Change-Id: I8542d69e9cdbf0e2fb7ebbb919933a64c1b8c293
Clean up use of @codingStandardsIgnore
- @codingStandardsIgnoreFile -> phpcs:ignoreFile
- @codingStandardsIgnoreLine -> phpcs:ignore
- @codingStandardsIgnoreStart -> phpcs:disable
- @codingStandardsIgnoreEnd -> phpcs:enable
For phpcs:disable always the necessary sniffs are provided.
Some start/end pairs are changed to line ignore
Change-Id: I92ef235849bcc349c69e53504e664a155dd162c8
When TreeBuilder requests reparenting of all child nodes of a given
element, we do this by removing the existing child nodes, and then
inserting the proposed new parent under the old parent. However, when a
p-wrap diversion is in place, the insertion of the new parent is
diverted into the p-wrap, and the p-wrap then becomes a child of the new
parent, causing a reference loop, and ultimately infinite recursion in
Serializer.
Instead, divert the entire reparent request to the p-wrap, so that the
new parent is a child of the p-wrap. This makes sense since the new
parent is always a formatting element. The only caller of
reparentChildren(), apart from proxies, is AAA step 17, which reparents
children under the formatting element cloned from the AFE list.
Left in some debug code for next time.
Bug: T178632
Change-Id: Id77d21d99748e94c064ef24c43ee0033de627b8e
Pull in the RemexHtml library, which is an HTML 5 library I recently
created.
RemexCompatMunger mutates the event stream, inserting <mw:p-wrap>
elements where necessary, and occasionally taking even more invasive
action such as reparenting and removing nodes maintained in Serializer's
tree.
RemexCompatFormatter produces a MediaWiki-style serialization which is
relatively compatible with existing parser tests. It also does final
empty element handling, including translating <mw:p-wrap> to <p>
Tests are imported from both Html5Depurate and Subbu's pwrap.js.
Depends-On: I864f31d9afdffdde49bfd39f07a0fb7f4df5c5d9
Change-Id: I900155b7dd199b0ae2a3b9cdb6db5136fc4f35a8
This corresponds to the 1.0.27 release of domino, and matches the
latest HTML5 spec as of 2016-10-18.
Changes include:
* <menuitem> is no longer an empty element.
* <isindex> has been removed.
* Updated html5lib-tests (copied from domino 1.0.27).
* Round-trip-safe serialization of <pre>/<listing>/<textarea> is only
used when "tidy compatibility" mode is enabled; the behavior in
the HTML5 spec no longer cleanly round trips.
Change-Id: I656944b0d7bb6c3c0e4fe44fc6ebd1a4c36412ad
* Use a doubly-linked list for the AFE list, instead of an array,
allowing efficient insertion and removal from the middle, and trivial
O(1) lookup of existing elements.
* Use a hashtable of singly-linked lists for storing Noah's Ark buckets,
instead of iterating through the entire AFE list on every push.
* Store attributes in an array instead of serializing them in the
tokenizer. This allows us to avoid sorting them in the output. For the
Noah's Ark clause, the array is copied and then sorted on demand.
* XHTML-style serialization with self-closing tags.
* Clear the AFE list in stopParsing(), otherwise all the BalanceElement
objects are kept alive until after serialization, thus using O(N^2)
memory (in stack depth N) since the full serialization is stored at
each stack level.
Change-Id: I517129c0658f03eb2ddee61fdf33ffe6fbd48509
This is an HTML5-compliant parse/serialize tidy implementation, with
well-delineated hacks to support the <p>-wrapping done by legacy tidy.
Change-Id: I4fd433fd6f1847061b0bf4b3e249c918720d4fae
This adds an implementation of the HTML5 Tree Builder algorithm to PHP,
along with test cases from the tree builder derived from the
html5lib-tests package on github. The test cases were preprocessed
into JSON for the `domino` HTML5 parser, and we're using the JSON
form of the tests.
The implementation follows both the language of the HTML5 specification
and the implementation in `domino` very closely, easing updates if the
specification changes.
This code is used in follow-on commits to support an HTML5-based
"tidy" for mediawiki and the `{{#balance}}` parser function, which
ensures that a template expands to properly-balanced HTML, with all
tags closed and nothing left on the HTML active formatting elements
list.
See: https://github.com/fgnass/domino
Change-Id: I6f4d20a43510dd819776bb333b639315b19d150d