Commit graph

5 commits

Author SHA1 Message Date
C. Scott Ananian
2f70501364 Support <form> tags in Balancer.
Change-Id: I893fc231fea71f58449ed426d64ac99fdcb31d9e
2016-07-21 03:11:27 +00:00
C. Scott Ananian
b279c586a6 Support <select> tags in Balancer.
Change-Id: Ibc346624a9d035c98a29132a541e7ed6d82b364e
2016-07-21 02:48:26 +00:00
Tim Starling
5726c9ceb0 Some Balancer improvements for performance and compatibility
* Use a doubly-linked list for the AFE list, instead of an array,
  allowing efficient insertion and removal from the middle, and trivial
  O(1) lookup of existing elements.
* Use a hashtable of singly-linked lists for storing Noah's Ark buckets,
  instead of iterating through the entire AFE list on every push.
* Store attributes in an array instead of serializing them in the
  tokenizer. This allows us to avoid sorting them in the output. For the
  Noah's Ark clause, the array is copied and then sorted on demand.
* XHTML-style serialization with self-closing tags.
* Clear the AFE list in stopParsing(), otherwise all the BalanceElement
  objects are kept alive until after serialization, thus using O(N^2)
  memory (in stack depth N) since the full serialization is stored at
  each stack level.
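
The first three points above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the PHP code from this commit; the names `AFENode` and `AFEList` are hypothetical. It assumes a simplified active formatting elements (AFE) list with no markers, and it uses plain Python lists for the buckets where the commit describes singly-linked lists.

```python
class AFENode:
    """One entry in the AFE list (hypothetical name, for illustration)."""

    def __init__(self, tag, attrs):
        self.tag = tag
        self.attrs = dict(attrs)  # attributes kept as a mapping, unsorted
        self.prev = None
        self.next = None

    def key(self):
        # Copy and sort the attributes only when a Noah's Ark key is needed.
        return (self.tag, tuple(sorted(self.attrs.items())))


class AFEList:
    """Doubly-linked AFE list plus a hashtable of Noah's Ark buckets."""

    def __init__(self):
        self.head = None
        self.tail = None
        self.buckets = {}  # key -> matching nodes, oldest first

    def remove(self, node):
        # Unlink in O(1): no scan of the whole list is needed.
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next is not None:
            node.next.prev = node.prev
        else:
            self.tail = node.prev
        node.prev = node.next = None
        bucket = self.buckets[node.key()]
        bucket.remove(node)  # a bucket never holds more than three nodes
        if not bucket:
            del self.buckets[node.key()]

    def push(self, node):
        bucket = self.buckets.setdefault(node.key(), [])
        # Noah's Ark clause: keep at most three matching entries.
        if len(bucket) >= 3:
            self.remove(bucket[0])
        bucket.append(node)
        node.prev = self.tail
        if self.tail is not None:
            self.tail.next = node
        else:
            self.head = node
        self.tail = node

    def to_list(self):
        out, cur = [], self.head
        while cur is not None:
            out.append(cur)
            cur = cur.next
        return out
```

Pushing a fourth `<b class="x">` entry finds its bucket in one hash lookup and evicts the oldest match, instead of walking the entire list on every push.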

Change-Id: I517129c0658f03eb2ddee61fdf33ffe6fbd48509
2016-07-12 14:18:04 +10:00
C. Scott Ananian
ce081a3d7b Hook up Balancer as a Tidy implementation.
This is an HTML5-compliant parse/serialize tidy implementation, with
well-delineated hacks to support the <p>-wrapping done by legacy tidy.

Change-Id: I4fd433fd6f1847061b0bf4b3e249c918720d4fae
2016-07-12 14:18:04 +10:00
C. Scott Ananian
a7e2b5b284 HTML5 Balancer
This adds an implementation of the HTML5 Tree Builder algorithm to PHP,
along with tree builder test cases derived from the html5lib-tests
package on GitHub.  The test cases were preprocessed into JSON for the
`domino` HTML5 parser, and we're using the JSON form of the tests.

The implementation follows both the language of the HTML5 specification
and the implementation in `domino` very closely, easing updates if the
specification changes.

This code is used in follow-on commits to support an HTML5-based
"tidy" for MediaWiki and the `{{#balance}}` parser function, which
ensures that a template expands to properly balanced HTML, with all
tags closed and nothing left on the HTML active formatting elements
list.
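
As a rough illustration of what "properly balanced" means here, the following Python sketch closes unclosed tags and drops stray end tags using a single stack of open elements. It is far simpler than the HTML5 Tree Builder added by this commit: it has no insertion modes and no active formatting elements list, and the `balance` function and its token format are hypothetical.

```python
from typing import List, Tuple

# (kind, value): kind is "start", "end" or "text" (assumed token format)
Token = Tuple[str, str]


def balance(tokens: List[Token]) -> str:
    """Serialize tokens so that every start tag has a matching end tag."""
    out: List[str] = []
    open_tags: List[str] = []  # stack of open elements
    for kind, value in tokens:
        if kind == "start":
            open_tags.append(value)
            out.append(f"<{value}>")
        elif kind == "end":
            if value not in open_tags:
                continue  # stray end tag: drop it
            # Close everything opened after the matching start tag.
            while open_tags:
                name = open_tags.pop()
                out.append(f"</{name}>")
                if name == value:
                    break
        else:
            out.append(value)
    # Close whatever is still open at the end of the input.
    while open_tags:
        out.append(f"</{open_tags.pop()}>")
    return "".join(out)
```

For example, `balance([("start", "div"), ("text", "a")])` closes the dangling `<div>`. The real algorithm also reopens misnested formatting elements, which this sketch does not attempt.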

See: https://github.com/fgnass/domino
Change-Id: I6f4d20a43510dd819776bb333b639315b19d150d
2016-07-12 14:18:04 +10:00