Commit graph

8 commits

Author SHA1 Message Date
C. Scott Ananian
6db35b3c98 Remove most support for configuring Tidy, including Raggett
Remex is pure PHP so there is no reason to use an external tidy any
more. Configuration variables and implementation classes were
deprecated in 1.32 or earlier.  We've kept only $wgTidyConfig
which can be used for experimental features or debugging Remex.

Bug: T198214
Change-Id: I99d48f858d97b6e1d1e6cd76a42c960cc2c61f9f
2018-11-15 12:22:06 -05:00
C. Scott Ananian
54ac31f94d Hard deprecate codepaths where tidy is disabled
Future parsers will not support the output generated with tidy disabled.

Parser tests using untidied output will also be deprecated (and
rewritten) in a follow-up patch.

No new release notes necessary since user-visible tidy configuration
was deprecated previously (in 1.32), and individual methods which had
disabled tidy during execution were individually release-noted as they
were updated.

Bug: T198214
Depends-On: I0f417f75a49dfea873e9a2f44d81796a48b9f428
Depends-On: If5c619cdd3e7f786687cfc2ca166074d9197ca11
Change-Id: I592e0e0dfef7d929f05c60ffe4d60e09725b39cc
2018-11-05 18:49:16 +00:00
Erik Bernhardson
0d779c1ac6 Preserve whitespace in search index text content
Certain html tags imply a word break, but our html stripping doesn't
understand that at all. Adjust the html stripping to inject whitespace
for all block level tags (per MDN) along with the <br> element.

Bug: T195389
Change-Id: I9fbfac765ea88628e4f9b2794fb54e1cd0060203
2018-09-14 11:10:35 -07:00
James D. Forrester
846f4f58f5 Remove $wgExperimentalHtmlIds and related code, deprecated in 1.30
Bug: T139744
Change-Id: Ia15d5ab6e7637fd40d5c3399822a3dbeb7b383b5
2018-05-01 14:34:02 -07:00
Kunal Mehta
2ab7ae9d24 Add @covers for RemexStripTagHandler
This internal class is only used by Sanitizer::stripAllTags().

Change-Id: Ib913ee14524539216305da7e3183c07ab7d72cb5
2018-02-05 21:15:52 -08:00
Kunal Mehta
546980e537 Add @covers tags to parser tests
Change-Id: I7bce04bef5e981fd203ad819882482e72ca3f61b
2017-12-24 23:29:00 -08:00
Roan Kattouw
ddb4913f53 Use Remex in Sanitizer::stripAllTags()
Using a real HTML tokenizer fixes bugs when < or > appear in attribute
values. The old implementation used delimiterReplace(), which didn't
handle this case:

    > print Sanitizer::stripAllTags( '<p data-foo="a&lt;b>c">Hello</p>' );
    c">Hello

We also can't use PHP's built-in strip_tags() because it doesn't handle
<?php and <? correctly:

    > print strip_tags('1<span class="<?php">2</span>3');
    1
    > print strip_tags('1<span class="<?">2</span>3');
    1

Bug: T179978
Change-Id: I53b98e6c877c00c03ff110914168b398559c9c3e
2017-11-15 17:31:31 -08:00
Roan Kattouw
7980e38a84 Move Sanitizer.php to includes/parser/
Change-Id: Id08d91c747ec77d715459b89b03eee247ccd4e1b
2017-11-15 15:16:41 -08:00
Renamed from tests/phpunit/includes/SanitizerTest.php (Browse further)