Commit graph

9 commits

Author SHA1 Message Date
jenkins-bot
0ebeb72733 Merge "Simplify RemexStripTagHandler by extending NullTokenHandler" 2019-11-04 14:42:30 +00:00
Max Semenik
8a98dd9d59 Convert some private static arrays to constants
Remove @since for some private ones as we don't guarantee anything
about private class members.

Change-Id: Ifb898353c02082e9ef69d67f69339345c6cd154d
2019-10-16 01:30:54 +00:00
Tim Starling
ee80c3f3f4 Simplify RemexStripTagHandler by extending NullTokenHandler
Reduce code size and improve forwards compatibility.

Change-Id: I844d06923cbe965581e911afe8c9d91e8e61079c
2019-08-19 11:21:56 +10:00
Reedy
9f2ffdfbd4 Remove "Squiz.WhiteSpace.FunctionSpacing" from phpcs exclusions
Change-Id: I78b3315f26ab91b6b443f5b028a635552f82f5a3
2019-05-11 02:44:26 +01:00
Erik Bernhardson
aef02d516d Improve RemexStripTagHandler working with tables
HTML, generated by some infoboxes and perhaps other places, gets
stripped in a way that merges words together that should not be
merged. Add tr, th, and td to the list of tags that should force
word separation.

Bug: T218001
Change-Id: Ib374339628b1f543ea4e07f24aa3e3b76f3117b5
2019-03-14 13:11:59 -07:00
Kunal Mehta
cc5d9a92a2 build: Updating mediawiki/mediawiki-codesniffer to 24.0.0
Change-Id: I66b1775b7c1d36076d9ca78cbeb42787a743f2aa
2019-02-07 18:39:42 +00:00
Jakub Vrana
9f14c02e20 Remove duplicate keys from arrays
Found by PHPStan.

Change-Id: Ie0e0cfa33b3caa4a13f4dfb04c772c8a0284435a
2018-11-26 19:22:08 +01:00
Erik Bernhardson
0d779c1ac6 Preserve whitespace in search index text content
Certain html tags imply a word break, but our html stripping doesn't
understand that at all. Adjust the html stripping to inject whitespace
for all block level tags (per MDN) along with the <br> element.

Bug: T195389
Change-Id: I9fbfac765ea88628e4f9b2794fb54e1cd0060203
2018-09-14 11:10:35 -07:00
Roan Kattouw
ddb4913f53 Use Remex in Sanitizer::stripAllTags()
Using a real HTML tokenizer fixes bugs when < or > appear in attribute
values. The old implementation used delimiterReplace(), which didn't
handle this case:

    > print Sanitizer::stripAllTags( '<p data-foo="a&lt;b>c">Hello</p>' );
    c">Hello

We also can't use PHP's built-in strip_tags() because it doesn't handle
<?php and <? correctly:

    > print strip_tags('1<span class="<?php">2</span>3');
    1
    > print strip_tags('1<span class="<?">2</span>3');
    1

Bug: T179978
Change-Id: I53b98e6c877c00c03ff110914168b398559c9c3e
2017-11-15 17:31:31 -08:00