Commit graph

19 commits

Author SHA1 Message Date
Tim Starling
10c8cfea30 RemexCompatMunger: Don't call endTag() in case B/b
This was naïve: the linked bug documents a case where endTag() was
called despite children of the p-wrap still being in TreeBuilder's
stack. Instead, wait for the parent of the p-wrap to have endTag()
called on it. I've submitted a patch which will clean up the node in
that case.

Bug: T200827
Change-Id: I34694813eace9cadabf2db8f9ccca83d1368cfad
2018-08-07 14:07:31 +10:00
Kunal Mehta
853b8fe34c tidy: Remove obsolete Depurate and Balancer drivers
The Html5Depurate driver was intended to be used with an external Java
service, but it never gained traction due to deployment concerns.

The Html5Internal (Balancer) driver was originally intended for use with
the balanced templates proposal and could also handle tidying. But it was
tightly coupled to MediaWiki, so part of it was used as the basis of the
RemexHtml library. Remex most likely can also implement the balanced
templates proposal, so there isn't any reason to keep the Balancer code
around anymore.

Change-Id: I8542d69e9cdbf0e2fb7ebbb919933a64c1b8c293
2018-05-08 15:32:49 +00:00
Umherirrender
255d76f2a1 build: Updating mediawiki/mediawiki-codesniffer to 15.0.0
Clean up use of @codingStandardsIgnore
- @codingStandardsIgnoreFile -> phpcs:ignoreFile
- @codingStandardsIgnoreLine -> phpcs:ignore
- @codingStandardsIgnoreStart -> phpcs:disable
- @codingStandardsIgnoreEnd -> phpcs:enable

For phpcs:disable, the necessary sniffs are always provided.
Some start/end pairs are changed to a single line ignore.
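
The annotation mapping listed above can be sketched as a plain text rewrite. This is only an illustration: the function name `migrate_annotations` and the regex-based approach are assumptions, not the tooling used for this commit, and the sketch does not append the sniff names that the commit adds after phpcs:disable.

```python
import re

# Old annotation -> new annotation, copied from the list in the commit message.
REPLACEMENTS = {
    "@codingStandardsIgnoreFile": "phpcs:ignoreFile",
    "@codingStandardsIgnoreLine": "phpcs:ignore",
    "@codingStandardsIgnoreStart": "phpcs:disable",
    "@codingStandardsIgnoreEnd": "phpcs:enable",
}

# One alternation over all old annotations, each escaped literally.
_PATTERN = re.compile("|".join(re.escape(old) for old in REPLACEMENTS))


def migrate_annotations(source: str) -> str:
    """Return source with every old-style annotation replaced (hypothetical helper)."""
    return _PATTERN.sub(lambda match: REPLACEMENTS[match.group(0)], source)
```

For example, `migrate_annotations("// @codingStandardsIgnoreStart")` yields `"// phpcs:disable"`, which would still need the relevant sniff names added by hand.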

Change-Id: I92ef235849bcc349c69e53504e664a155dd162c8
2018-01-01 14:10:16 +01:00
Tim Starling
324e4bca4f Fix RemexCompatMunger infinite recursion
When TreeBuilder requests reparenting of all child nodes of a given
element, we do this by removing the existing child nodes, and then
inserting the proposed new parent under the old parent. However, when a
p-wrap diversion is in place, the insertion of the new parent is
diverted into the p-wrap, and the p-wrap then becomes a child of the new
parent, causing a reference loop, and ultimately infinite recursion in
Serializer.

Instead, divert the entire reparent request to the p-wrap, so that the
new parent is a child of the p-wrap. This makes sense since the new
parent is always a formatting element. The only caller of
reparentChildren(), apart from proxies, is AAA step 17, which reparents
children under the formatting element cloned from the AFE list.
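
A toy model of the loop and of the fix, as a rough sketch only. The `Node` class and both functions are hypothetical stand-ins; they do not reproduce the RemexHtml or RemexCompatMunger API, and the model assumes the p-wrap is the only child of the old parent.

```python
class Node:
    """Minimal tree node; stands in for an element in Serializer's tree."""

    def __init__(self, name):
        self.name = name
        self.children = []


def has_cycle(node, path=()):
    """True if following child links from node ever revisits an ancestor."""
    if any(node is ancestor for ancestor in path):
        return True
    return any(has_cycle(child, path + (node,)) for child in node.children)


def reparent_naive(old_parent, new_parent, p_wrap):
    """Old behaviour: move the children, then insert new_parent under old_parent.

    The insertion is diverted into p_wrap, which is itself one of the
    moved children, so p_wrap and new_parent end up containing each other.
    """
    moved = old_parent.children
    old_parent.children = []
    new_parent.children = moved          # includes p_wrap
    p_wrap.children.append(new_parent)   # diverted insertion


def reparent_diverted(old_parent, new_parent, p_wrap):
    """Fixed behaviour: divert the whole request to p_wrap.

    p_wrap's children move under new_parent, and new_parent becomes the
    only child of p_wrap; old_parent keeps p_wrap as its child.
    """
    new_parent.children = p_wrap.children
    p_wrap.children = [new_parent]
```

In this model, a recursive serializer started at the p-wrap never terminates after `reparent_naive()`, while after `reparent_diverted()` the tree is old parent, then p-wrap, then the formatting element, then the original content.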

Left in some debug code for next time.

Bug: T178632
Change-Id: Id77d21d99748e94c064ef24c43ee0033de627b8e
2017-11-17 23:27:14 +11:00
jenkins-bot
82524dc4da Merge "RemexHtml tidy driver with p-wrapping" 2017-03-08 15:24:36 +00:00
Tim Starling
9341a00ed1 RemexHtml tidy driver with p-wrapping
Pull in the RemexHtml library, which is an HTML 5 library I recently
created.

RemexCompatMunger mutates the event stream, inserting <mw:p-wrap>
elements where necessary, and occasionally taking even more invasive
action such as reparenting and removing nodes maintained in Serializer's
tree.

RemexCompatFormatter produces a MediaWiki-style serialization which is
relatively compatible with existing parser tests. It also does final
empty element handling, including translating <mw:p-wrap> to <p>.

Tests are imported from both Html5Depurate and Subbu's pwrap.js.

Depends-On: I864f31d9afdffdde49bfd39f07a0fb7f4df5c5d9
Change-Id: I900155b7dd199b0ae2a3b9cdb6db5136fc4f35a8
2017-03-08 16:54:13 +11:00
Tim Starling
abe8af08e2 Fix @covers for BalancerTest
This test is intended to cover the whole file, not just one method.

Change-Id: Ice800ce467e030e8264db96e19feadf9b68afb9a
2017-02-27 15:15:23 +11:00
C. Scott Ananian
265f2b40dd Update Balancer to latest HTML5 spec
This corresponds to the 1.0.27 release of domino, and matches the
latest HTML5 spec as of 2016-10-18.

Changes include:
* <menuitem> is no longer an empty element.
* <isindex> has been removed.
* Updated html5lib-tests (copied from domino 1.0.27).
* Round-trip-safe serialization of <pre>/<listing>/<textarea> is only
  used when "tidy compatibility" mode is enabled; the behavior in
  the HTML5 spec no longer cleanly round trips.

Change-Id: I656944b0d7bb6c3c0e4fe44fc6ebd1a4c36412ad
2017-01-24 05:44:05 +00:00
jenkins-bot
29d3feb1d0 Merge "Enable additional balancer tests (those starting with <!DOCTYPE html>)" 2016-07-21 17:27:59 +00:00
jenkins-bot
dbc6417a74 Merge "Support <textarea> tags in Balancer." 2016-07-21 17:26:59 +00:00
Kunal Mehta
1d38ce21d9 Fix @covers tag in BalancerTest
Causes failures like
<https://integration.wikimedia.org/ci/job/mediawiki-core-code-coverage/2149/console>

Change-Id: I4a3498b88f203b97639fdd248316fef1058f9ddc
2016-07-20 23:49:37 -07:00
C. Scott Ananian
a400f913e0 Enable additional balancer tests (those starting with <!DOCTYPE html>)
Change-Id: Ie854cf99f7e72bcca1bb8565ace558a43dcb6379
2016-07-21 03:44:17 +00:00
C. Scott Ananian
cd64c644b0 Support <textarea> tags in Balancer.
Change-Id: I63c2fd1c343362e49cf3b5a258fc98489744ad68
2016-07-21 03:37:10 +00:00
C. Scott Ananian
bd25891682 Support tokenizing simple HTML comments in the Balancer.
Change-Id: Ib780595b13b7145e99867d16e3c225e6b2b91884
2016-07-21 03:27:53 +00:00
C. Scott Ananian
2f70501364 Support <form> tags in Balancer.
Change-Id: I893fc231fea71f58449ed426d64ac99fdcb31d9e
2016-07-21 03:11:27 +00:00
C. Scott Ananian
b279c586a6 Support <select> tags in Balancer.
Change-Id: Ibc346624a9d035c98a29132a541e7ed6d82b364e
2016-07-21 02:48:26 +00:00
Tim Starling
5726c9ceb0 Some Balancer improvements for performance and compatibility
* Use a doubly-linked list for the AFE list, instead of an array,
  allowing efficient insertion and removal from the middle, and trivial
  O(1) lookup of existing elements.
* Use a hashtable of singly-linked lists for storing Noah's Ark buckets,
  instead of iterating through the entire AFE list on every push.
* Store attributes in an array instead of serializing them in the
  tokenizer. This allows us to avoid sorting them in the output. For the
  Noah's Ark clause, the array is copied and then sorted on demand.
* XHTML-style serialization with self-closing tags.
* Clear the AFE list in stopParsing(), otherwise all the BalanceElement
  objects are kept alive until after serialization, thus using O(N^2)
  memory (in stack depth N) since the full serialization is stored at
  each stack level.
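
The first two points can be sketched as follows. This is a simplified illustration, not the Balancer code: the class and method names are hypothetical, markers in the AFE list are omitted, and each bucket is a plain Python list rather than the singly-linked list the commit describes. The Noah's Ark limit of three matching entries is taken from the HTML5 specification.

```python
from collections import defaultdict


class AFEEntry:
    """One entry in the active formatting elements (AFE) list."""

    def __init__(self, tag, attrs):
        self.tag = tag
        self.attrs = attrs
        self.prev = None
        self.next = None


class ActiveFormattingElements:
    """Doubly-linked AFE list with per-key buckets for the Noah's Ark clause."""

    def __init__(self):
        self.head = None
        self.tail = None
        # (tag, sorted attributes) -> matching entries, oldest first.
        self.buckets = defaultdict(list)

    @staticmethod
    def _key(tag, attrs):
        # Attributes are sorted only here, when a bucket key is needed.
        return (tag, tuple(sorted(attrs.items())))

    def push(self, tag, attrs):
        bucket = self.buckets[self._key(tag, attrs)]
        if len(bucket) >= 3:
            # Noah's Ark: drop the earliest of three matching entries.
            self.remove(bucket[0])
        entry = AFEEntry(tag, dict(attrs))
        entry.prev = self.tail
        if self.tail is not None:
            self.tail.next = entry
        else:
            self.head = entry
        self.tail = entry
        bucket.append(entry)
        return entry

    def remove(self, entry):
        # Unlinking needs only the entry's own neighbours, so no scan of the list.
        if entry.prev is not None:
            entry.prev.next = entry.next
        else:
            self.head = entry.next
        if entry.next is not None:
            entry.next.prev = entry.prev
        else:
            self.tail = entry.prev
        entry.prev = entry.next = None
        self.buckets[self._key(entry.tag, entry.attrs)].remove(entry)

    def tags(self):
        """Tag names from oldest to newest, for inspection."""
        out, node = [], self.head
        while node is not None:
            out.append(node.tag)
            node = node.next
        return out
```

Pushing a fourth identical `<b>` looks only at that element's bucket, which holds at most three entries, instead of walking the whole AFE list.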

Change-Id: I517129c0658f03eb2ddee61fdf33ffe6fbd48509
2016-07-12 14:18:04 +10:00
C. Scott Ananian
ce081a3d7b Hook up Balancer as a Tidy implementation.
This is an HTML5-compliant parse/serialize tidy implementation, with
well-delineated hacks to support the <p>-wrapping done by legacy tidy.

Change-Id: I4fd433fd6f1847061b0bf4b3e249c918720d4fae
2016-07-12 14:18:04 +10:00
C. Scott Ananian
a7e2b5b284 HTML5 Balancer
This adds an implementation of the HTML5 Tree Builder algorithm to PHP,
along with test cases from the tree builder derived from the
html5lib-tests package on GitHub. The test cases were preprocessed
into JSON for the `domino` HTML5 parser, and we're using the JSON
form of the tests.

The implementation follows both the language of the HTML5 specification
and the implementation in `domino` very closely, easing updates if the
specification changes.

This code is used in follow-on commits to support an HTML5-based
"tidy" for MediaWiki and the `{{#balance}}` parser function, which
ensures that a template expands to properly-balanced HTML, with all
tags closed and nothing left on the HTML active formatting elements
list.

See: https://github.com/fgnass/domino
Change-Id: I6f4d20a43510dd819776bb333b639315b19d150d
2016-07-12 14:18:04 +10:00