wiki.techinc.nl/includes/tidy/RemexDriver.php
Tim Starling 9341a00ed1 RemexHtml tidy driver with p-wrapping
Pull in the RemexHtml library, which is an HTML 5 library I recently
created.

RemexCompatMunger mutates the event stream, inserting <mw:p-wrap>
elements where necessary, and occasionally taking even more invasive
action such as reparenting and removing nodes maintained in Serializer's
tree.

RemexCompatFormatter produces a MediaWiki-style serialization which is
relatively compatible with existing parser tests. It also does final
empty element handling, including translating <mw:p-wrap> to <p>.
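
The two stages described above can be pictured with a toy sketch. This is not the RemexCompatMunger or RemexCompatFormatter code: the function names, the flat node-list input, and the rule for whitespace-only wrappers are assumptions made for illustration, and the real classes operate on a tree event stream rather than on strings.

```php
<?php
// Toy illustration only: these functions are hypothetical and are not part of
// MediaWiki or RemexHtml.

/**
 * Stage 1 (stand-in for the munger): wrap each run of top-level inline
 * content in a placeholder <mw:p-wrap> element.
 *
 * @param array $nodes List of [ 'block' => bool, 'html' => string ]
 * @return string
 */
function sketchInsertPWrap( array $nodes ) {
	$out = '';
	$run = '';
	foreach ( $nodes as $node ) {
		if ( $node['block'] ) {
			// A block element ends the current inline run.
			if ( $run !== '' ) {
				$out .= "<mw:p-wrap>$run</mw:p-wrap>";
				$run = '';
			}
			$out .= $node['html'];
		} else {
			$run .= $node['html'];
		}
	}
	if ( $run !== '' ) {
		$out .= "<mw:p-wrap>$run</mw:p-wrap>";
	}
	return $out;
}

/**
 * Stage 2 (stand-in for the formatter): turn placeholders into <p>, and
 * emit whitespace-only placeholders without any wrapper (assumed rule).
 *
 * @param string $html
 * @return string
 */
function sketchFormatPWrap( $html ) {
	return preg_replace_callback(
		'!<mw:p-wrap>(.*?)</mw:p-wrap>!s',
		function ( array $m ) {
			return trim( $m[1] ) === '' ? $m[1] : "<p>{$m[1]}</p>";
		},
		$html
	);
}

$nodes = [
	[ 'block' => false, 'html' => 'Hello <b>world</b>' ],
	[ 'block' => true, 'html' => '<div>box</div>' ],
	[ 'block' => false, 'html' => "\n" ],
];
echo sketchFormatPWrap( sketchInsertPWrap( $nodes ) );
```

Keeping the placeholder element separate from the final <p> lets the last stage decide whether a wrapper is worth emitting at all, which is the role the commit message assigns to the formatter.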

Tests are imported from both Html5Depurate and Subbu's pwrap.js.

Depends-On: I864f31d9afdffdde49bfd39f07a0fb7f4df5c5d9
Change-Id: I900155b7dd199b0ae2a3b9cdb6db5136fc4f35a8
2017-03-08 16:54:13 +11:00


<?php

namespace MediaWiki\Tidy;

use RemexHtml\Serializer\Serializer;
use RemexHtml\Tokenizer\Tokenizer;
use RemexHtml\TreeBuilder\Dispatcher;
use RemexHtml\TreeBuilder\TreeBuilder;
use RemexHtml\TreeBuilder\TreeMutationTracer;

class RemexDriver extends TidyDriverBase {
	private $trace;
	private $pwrap;

	public function __construct( array $config ) {
		// Fill in defaults; keys supplied by the caller take precedence.
		$config += [
			'treeMutationTrace' => false,
			'pwrap' => true
		];
		$this->trace = $config['treeMutationTrace'];
		$this->pwrap = $config['pwrap'];
		parent::__construct( $config );
	}

	public function tidy( $text ) {
		// Build the pipeline from the output end backwards:
		// Tokenizer -> Dispatcher -> TreeBuilder -> [tracer] -> [munger] -> Serializer
		$formatter = new RemexCompatFormatter;
		$serializer = new Serializer( $formatter );
		if ( $this->pwrap ) {
			$munger = new RemexCompatMunger( $serializer );
		} else {
			$munger = $serializer;
		}
		if ( $this->trace ) {
			// Send tracer messages to the debug log.
			$tracer = new TreeMutationTracer( $munger, function ( $msg ) {
				wfDebug( "RemexHtml: $msg" );
			} );
		} else {
			$tracer = $munger;
		}
		$treeBuilder = new TreeBuilder( $tracer, [
			'ignoreErrors' => true,
			'ignoreNulls' => true,
		] );
		$dispatcher = new Dispatcher( $treeBuilder );
		$tokenizer = new Tokenizer( $dispatcher, $text, [
			'ignoreErrors' => true,
			'ignoreCharRefs' => true,
			'ignoreNulls' => true,
			'skipPreprocess' => true,
		] );
		// Parse the input as a fragment in the context of an HTML <body> element.
		$tokenizer->execute( [
			'fragmentNamespace' => \RemexHtml\HTMLData::NS_HTML,
			'fragmentName' => 'body'
		] );
		return $serializer->getResult();
	}
}
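
The way the constructor applies defaults and tidy() conditionally wraps the serializer can be tried in isolation with the sketch below. It runs without MediaWiki or RemexHtml. Every name in it (SketchHandler, SketchSink, SketchTracer, sketchBuildChain, the 'trace' key) is hypothetical and stands in for the real classes only loosely.

```php
<?php
// Hypothetical stand-ins; none of these names exist in MediaWiki or RemexHtml.

interface SketchHandler {
	public function handle( $event );
}

/** Innermost stage: collects events, standing in for the serializer. */
class SketchSink implements SketchHandler {
	public $events = [];

	public function handle( $event ) {
		$this->events[] = $event;
	}
}

/** Optional stage: reports each event to a callback, then forwards it unchanged. */
class SketchTracer implements SketchHandler {
	private $next;
	private $log;

	public function __construct( SketchHandler $next, callable $log ) {
		$this->next = $next;
		$this->log = $log;
	}

	public function handle( $event ) {
		call_user_func( $this->log, "trace: $event" );
		$this->next->handle( $event );
	}
}

/**
 * Return the head of the chain: the sink itself, or a tracer wrapped around it.
 */
function sketchBuildChain( array $config, SketchSink $sink, array &$messages ) {
	// Array union keeps keys the caller supplied and adds only the missing ones.
	$config += [ 'trace' => false ];
	$head = $sink;
	if ( $config['trace'] ) {
		$head = new SketchTracer( $head, function ( $msg ) use ( &$messages ) {
			$messages[] = $msg;
		} );
	}
	return $head;
}

$messages = [];
$sink = new SketchSink;
sketchBuildChain( [ 'trace' => true ], $sink, $messages )->handle( 'startTag p' );
print_r( $sink->events );
print_r( $messages );
```

Because each optional stage exposes the same interface as the stage it wraps, the upstream caller does not need to know whether tracing is enabled, which is why tidy() can assign either object to the same variable.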