Using a real HTML tokenizer fixes bugs when < or > appears inside an
attribute value. The old implementation used delimiterReplace(), which
ended the tag at the first > (even one inside a quoted value) and leaked
the rest of the attribute into the output:
> print Sanitizer::stripAllTags( '<p data-foo="a<b>c">Hello</p>' );
c">Hello
We also can't use PHP's built-in strip_tags(), because it mishandles
<?php and <? inside a quoted attribute value and drops all the text
that follows:
> print strip_tags('1<span class="<?php">2</span>3');
1
> print strip_tags('1<span class="<?">2</span>3');
1
Bug: T179978
Change-Id: I53b98e6c877c00c03ff110914168b398559c9c3e
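To show why attribute quoting is the crux of both bugs, here is a minimal PHP sketch. It is not RemexHtml or MediaWiki code: stripTagsQuoteAware() is a hypothetical helper that approximates only the small part of HTML tokenization relevant here. A < opens markup only when followed by a letter, /, ! or ?, and inside a tag a > ends it only when it is outside a quoted attribute value. For comparison, a naive pattern that deletes everything from < to the next > reproduces the old c">Hello output.

```php
<?php
/**
 * Hypothetical helper (not MediaWiki or RemexHtml code): remove markup while
 * tracking quoted attribute values, so '<', '>' and '<?' inside quotes stay
 * plain attribute text. Simplified: '<!' and '<?' constructs just end at the
 * next '>' (no real '<!-- -->' parsing), and quotes are honoured anywhere in a
 * tag rather than only in attribute-value positions.
 */
function stripTagsQuoteAware( string $html ): string {
	$out = '';
	$len = strlen( $html );
	$i = 0;
	while ( $i < $len ) {
		$ch = $html[$i];
		$next = $i + 1 < $len ? $html[$i + 1] : '';
		// Outside markup, '<' only opens a tag before a letter, '/', '!' or '?'.
		$opensTag = $ch === '<' && $next !== ''
			&& ( ctype_alpha( $next ) || strpos( '/!?', $next ) !== false );
		if ( !$opensTag ) {
			$out .= $ch;
			$i++;
			continue;
		}
		// Inside markup: quotes only matter for real tags, not '<!' or '<?'.
		$trackQuotes = $next !== '!' && $next !== '?';
		$quote = null;
		for ( $i++; $i < $len; $i++ ) {
			$c = $html[$i];
			if ( $quote !== null ) {
				if ( $c === $quote ) {
					$quote = null; // closing quote of the attribute value
				}
			} elseif ( $trackQuotes && ( $c === '"' || $c === "'" ) ) {
				$quote = $c; // opening quote of an attribute value
			} elseif ( $c === '>' ) {
				break; // unquoted '>' ends the tag
			}
		}
		$i++; // step past the '>' (or past the end of unterminated input)
	}
	return $out;
}

// Naive delimiter-style stripping: '<' up to the very next '>' is a "tag".
echo preg_replace( '/<[^>]*>/', '', '<p data-foo="a<b>c">Hello</p>' ), "\n";
echo stripTagsQuoteAware( '<p data-foo="a<b>c">Hello</p>' ), "\n";
echo stripTagsQuoteAware( '1<span class="<?php">2</span>3' ), "\n";
```

The quote-aware version keeps Hello, and keeps 2 and 3 in the <?php example, because a < or <? inside a quoted value never starts anything. A hand-written scanner like this covers only these cases; a full HTML5 tokenizer also handles many other edge cases (for example real comment syntax), which is the point of delegating to one.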
```php
<?php

use RemexHtml\Tokenizer\Attributes;
use RemexHtml\Tokenizer\TokenHandler;
use RemexHtml\Tokenizer\Tokenizer;

/**
 * Token handler that keeps only the character data it is given and
 * ignores tags, comments, doctypes and parse errors, so the result is
 * the input with all markup stripped.
 *
 * @internal
 */
class RemexStripTagHandler implements TokenHandler {
	/** @var string Text accumulated from character tokens */
	private $text = '';

	/**
	 * @return string The text collected so far
	 */
	public function getResult() {
		return $this->text;
	}

	public function startDocument( Tokenizer $t, $fns, $fn ) {
		// Do nothing.
	}

	public function endDocument( $pos ) {
		// Do nothing.
	}

	public function error( $text, $pos ) {
		// Do nothing.
	}

	public function characters( $text, $start, $length, $sourceStart, $sourceLength ) {
		// The emitted characters are the range [$start, $start + $length) of $text.
		$this->text .= substr( $text, $start, $length );
	}

	public function startTag( $name, Attributes $attrs, $selfClose, $sourceStart, $sourceLength ) {
		// Do nothing.
	}

	public function endTag( $name, $sourceStart, $sourceLength ) {
		// Do nothing.
	}

	public function doctype( $name, $public, $system, $quirks, $sourceStart, $sourceLength ) {
		// Do nothing.
	}

	public function comment( $text, $sourceStart, $sourceLength ) {
		// Do nothing.
	}
}
```
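The handler above is push-based: the tokenizer calls one method per token, and stripping tags amounts to keeping only what arrives through characters(). The sketch below illustrates that design in isolation. ToyTokenHandler and ToyStripTagHandler are hypothetical, trimmed-down stand-ins (they are not part of RemexHtml), and the events are replayed by hand, assuming a tokenizer would report the attribute as an already-parsed value and "Hello" as a range of the source string.

```php
<?php
// Hypothetical, trimmed-down mirror of the callback style used by the class
// above; these names are illustrative and are not part of RemexHtml.
interface ToyTokenHandler {
	public function startTag( string $name, array $attrs ): void;
	public function endTag( string $name ): void;
	// Characters are reported as a range of $text, not as a copied string.
	public function characters( string $text, int $start, int $length ): void;
}

// Same idea as RemexStripTagHandler: ignore everything except text.
class ToyStripTagHandler implements ToyTokenHandler {
	private $text = '';

	public function getResult(): string {
		return $this->text;
	}

	public function startTag( string $name, array $attrs ): void {
		// Do nothing: attribute values arrive already parsed, so a '<' or '>'
		// inside them can never leak into the collected text.
	}

	public function endTag( string $name ): void {
		// Do nothing.
	}

	public function characters( string $text, int $start, int $length ): void {
		$this->text .= substr( $text, $start, $length );
	}
}

// Replay, by hand, the events a tokenizer would emit for this input.
$source = '<p data-foo="a<b>c">Hello</p>';
$handler = new ToyStripTagHandler();
$handler->startTag( 'p', [ 'data-foo' => 'a<b>c' ] );
$handler->characters( $source, strpos( $source, 'Hello' ), strlen( 'Hello' ) );
$handler->endTag( 'p' );
echo $handler->getResult(), "\n";
```

Passing a range instead of a substring lets the tokenizer avoid copying text for handlers that do not need it; only a handler that keeps the text pays for the substr() call.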