wiki.techinc.nl/includes/parser/RemexStripTagHandler.php
Roan Kattouw ddb4913f53 Use Remex in Sanitizer::stripAllTags()
Using a real HTML tokenizer fixes bugs when < or > appear in attribute
values. The old implementation used delimiterReplace(), which didn't
handle this case:

    > print Sanitizer::stripAllTags( '<p data-foo="a&lt;b>c">Hello</p>' );
    c">Hello

We also can't use PHP's built-in strip_tags() because it doesn't handle
<?php and <? correctly:

    > print strip_tags('1<span class="<?php">2</span>3');
    1
    > print strip_tags('1<span class="<?">2</span>3');
    1

Bug: T179978
Change-Id: I53b98e6c877c00c03ff110914168b398559c9c3e
2017-11-15 17:31:31 -08:00
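The attribute-value failure described in the commit message can be made concrete with a minimal standalone PHP sketch. Both functions are hypothetical illustrations, not MediaWiki code: naiveStrip() ends every tag at the first '>', which reproduces the c">Hello output quoted above (it is not the actual delimiterReplace() implementation), while quoteAwareStrip() skips a '>' that sits inside a quoted attribute value, one of the things a real HTML tokenizer such as RemexHtml gets right. The sketch is far simpler than HTML5 tokenization: it does not handle comments, <! or <? constructs in text, raw-text elements like <script>, or character references.

```php
<?php
// Sketch only: naiveStrip() and quoteAwareStrip() are hypothetical helpers,
// not MediaWiki or RemexHtml code.

// Ends every tag at the first '>', the behaviour behind the c">Hello output.
function naiveStrip(string $html): string
{
    return preg_replace('/<[^>]*>/', '', $html);
}

// Skips '>' inside quoted attribute values. A quote opens a value only
// after '=' (optionally followed by whitespace), loosely following HTML5.
function quoteAwareStrip(string $html): string
{
    $out = '';
    $inTag = false;
    $quote = '';          // '"' or "'" while inside a quoted attribute value
    $afterEquals = false; // true between '=' and the start of the value
    $len = strlen($html);
    for ($i = 0; $i < $len; $i++) {
        $ch = $html[$i];
        if (!$inTag) {
            // Only '<' followed by a letter or '/' opens a tag; any other '<' is text.
            $next = $i + 1 < $len ? $html[$i + 1] : '';
            if ($ch === '<' && preg_match('/^[A-Za-z\/]$/', $next)) {
                $inTag = true;
                $afterEquals = false;
            } else {
                $out .= $ch;
            }
        } elseif ($quote !== '') {
            if ($ch === $quote) {
                $quote = ''; // closing quote: back to ordinary tag content
            }
        } elseif ($ch === '>') {
            $inTag = false;
        } elseif ($ch === '=') {
            $afterEquals = true;
        } elseif ($afterEquals && ($ch === '"' || $ch === "'")) {
            $quote = $ch;
            $afterEquals = false;
        } elseif (strpos(" \t\n\r\f", $ch) === false) {
            $afterEquals = false; // unquoted value or next attribute name
        }
    }
    return $out;
}

echo naiveStrip('<p data-foo="a&lt;b>c">Hello</p>'), "\n";      // c">Hello
echo quoteAwareStrip('<p data-foo="a&lt;b>c">Hello</p>'), "\n"; // Hello
echo quoteAwareStrip('1<span class="<?php">2</span>3'), "\n";   // 123
echo quoteAwareStrip('1<span class="<?">2</span>3'), "\n";      // 123
```

For the two <?php inputs, the naive regex also happens to return 123, because the '>' that closes the tag comes after the quoted value; the strip_tags() failure there stems from its handling of <? inside the attribute value, which this sketch does not reproduce.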

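<?php

use RemexHtml\Tokenizer\Attributes;
use RemexHtml\Tokenizer\TokenHandler;
use RemexHtml\Tokenizer\Tokenizer;

/**
 * Token handler for Sanitizer::stripAllTags(): keeps character data and
 * discards every other token (tags, comments, doctypes).
 *
 * @internal
 */
class RemexStripTagHandler implements TokenHandler {
	/** @var string Character data collected so far */
	private $text = '';

	/**
	 * @return string The input with all markup removed
	 */
	public function getResult() {
		return $this->text;
	}

	public function startDocument( Tokenizer $t, $fns, $fn ) {
		// Do nothing.
	}

	public function endDocument( $pos ) {
		// Do nothing.
	}

	public function error( $text, $pos ) {
		// Do nothing: tokenizer errors on malformed markup are ignored.
	}

	public function characters( $text, $start, $length, $sourceStart, $sourceLength ) {
		// $text may be a larger buffer; this text run is the $length bytes at $start.
		$this->text .= substr( $text, $start, $length );
	}

	public function startTag( $name, Attributes $attrs, $selfClose, $sourceStart, $sourceLength ) {
		// Do nothing: the tag and its attributes are dropped.
	}

	public function endTag( $name, $sourceStart, $sourceLength ) {
		// Do nothing.
	}

	public function doctype( $name, $public, $system, $quirks, $sourceStart, $sourceLength ) {
		// Do nothing.
	}

	public function comment( $text, $sourceStart, $sourceLength ) {
		// Do nothing: comment text is not part of the result.
	}
}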
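RemexStripTagHandler::characters() receives a text buffer plus a start offset and a length, and copies out only that slice with substr(), so repeated calls accumulate just the text runs while every other callback discards its token. The standalone sketch below illustrates that contract. TextCollector is a hypothetical class that mirrors the handler's characters() and getResult() without the RemexHtml TokenHandler interface, and the offsets passed to it are worked out by hand rather than produced by a tokenizer.

```php
<?php
// Hypothetical stand-in for RemexStripTagHandler, minus the RemexHtml
// TokenHandler interface, so it runs without the library installed.
final class TextCollector
{
    private $text = '';

    // Same body as RemexStripTagHandler::characters(): append only the
    // $length bytes starting at $start, not the whole $text buffer.
    public function characters($text, $start, $length)
    {
        $this->text .= substr($text, $start, $length);
    }

    public function getResult()
    {
        return $this->text;
    }
}

$buffer = '<b>Hi</b> there';
$collector = new TextCollector();
// Text runs a tokenizer would report for $buffer (offsets computed by hand):
$collector->characters($buffer, 3, 2); // "Hi"
$collector->characters($buffer, 9, 6); // " there"
echo $collector->getResult(), "\n";    // Hi there
```

Passing offsets into a shared buffer lets the caller avoid building a new string for every text run; the handler pays for a copy only when it actually keeps the text.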