wiki.techinc.nl/includes/tidy
Bartosz Dziewoński 0313128b10 Use PHP 7 "\u{NNNN}" Unicode codepoint escapes in string literals
In cases where we're operating on text data (and not binary data),
use e.g. "\u{00A0}" to refer directly to the Unicode character
'NO-BREAK SPACE' instead of "\xc2\xa0" to specify the bytes C2h A0h
(which correspond to the UTF-8 encoding of that character). This
makes it easier to look up those mysterious sequences, as not all
are as recognizable as the no-break space.
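
The equivalence described above can be checked with a small sketch. It is written in Ruby rather than PHP, to match the conversion script further down, on the assumption that Ruby's double-quoted strings accept the same "\xNN" and "\u{NNNN}" escapes for this purpose. The constant and helper names are illustrative only and are not part of this commit.

```ruby
# NO-BREAK SPACE written two ways: by code point and by its UTF-8 bytes.
NBSP_AS_CODEPOINT = "\u{00A0}"
NBSP_AS_BYTES     = "\xC2\xA0"

# Interpret a byte string as UTF-8 text without modifying the original.
def utf8(raw)
  raw.dup.force_encoding(Encoding::UTF_8)
end

# Format the first character's code point the way the Unicode standard
# writes it: uppercase hex, zero-padded to at least four digits.
def codepoint_label(char)
  format('U+%04X', char.ord)
end

puts utf8(NBSP_AS_BYTES) == NBSP_AS_CODEPOINT  # true
puts codepoint_label(NBSP_AS_CODEPOINT)        # U+00A0
```

Both spellings produce the same two bytes, but only the code point form can be looked up directly in a Unicode chart.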

PHP does not enforce this, but I think we should write the code
points in uppercase and zero-padded to at least four hex digits,
as the Unicode standard does.

Note that not all "\xNN" escapes can be automatically replaced:
* We can't use Unicode escapes for binary data that is not UTF-8
  (e.g. in code converting from legacy encodings or testing the
  handling of invalid UTF-8 byte sequences).
* '\xNN' escapes in regular expressions in single-quoted strings
  are actually handled by PCRE and have to be dealt with carefully
  (those regexps should probably be changed to use the /u modifier).
* "\xNN" referring to ASCII characters ("\x7F" and lower) should
  probably be left as-is.
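
Two of the caveats above (non-UTF-8 binary data and plain ASCII) can be tested mechanically. The following is a minimal Ruby sketch of such a check, not the procedure actually used for this commit. The function name `unicode_escapes_for` is hypothetical. It cannot tell whether a sequence sits inside a PCRE pattern, so that case still needs manual review.

```ruby
# Returns the "\u{NNNN}" spelling for a run of bytes, or nil when the
# bytes should stay as "\xNN" escapes.
def unicode_escapes_for(raw)
  text = raw.dup.force_encoding(Encoding::UTF_8)
  return nil unless text.valid_encoding? # binary or legacy-encoded data
  return nil if text.ascii_only?         # "\x7F" and lower: leave as-is
  # Uppercase hex, zero-padded to at least four digits.
  text.each_char.map { |ch| format('\\u{%04X}', ch.ord) }.join
end

puts unicode_escapes_for("\xC2\xA0".b)      # \u{00A0}
puts unicode_escapes_for("\xE2\x80\x8E".b)  # \u{200E}
p unicode_escapes_for("\xFF\xFE".b)         # nil (not valid UTF-8)
p unicode_escapes_for("\x7F".b)             # nil (ASCII only)
```

Like the original script, this converts every character of a mixed string, including any ASCII characters in it, so the output may still need trimming by hand.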

The replacements in this commit were done semi-manually by piping
the existing "\xNN" escapes through the following terrible Ruby
script I devised:

  chars = eval('"' + ARGV[0] + '"').force_encoding('utf-8')
  puts chars.split('').map{|char|
    '\\u{' + char.ord.to_s(16).upcase.rjust(4, '0') + '}'
  }.join('')

Change-Id: Idc3dee3a7fb5ebfaef395754d8859b18f1f8769a
2018-06-04 16:20:13 +00:00
File | Last commit | Date
RaggettBase.php | Add missing use statement | 2018-04-27 23:13:43 +02:00
RaggettExternal.php
RaggettInternalHHVM.php | Fix undefined classes | 2016-06-30 15:08:35 -07:00
RaggettInternalPHP.php | Update weblinks in comments from HTTP to HTTPS | 2016-11-07 15:24:46 +01:00
RaggettWrapper.php | Hide <style> tags from Tidy | 2017-06-13 13:02:57 -04:00
RemexCompatFormatter.php | Use PHP 7 "\u{NNNN}" Unicode codepoint escapes in string literals | 2018-06-04 16:20:13 +00:00
RemexCompatMunger.php | Munge inline elements found in tidy.conf as well | 2018-04-04 20:20:38 -04:00
RemexDriver.php | RemexHtml tidy driver with p-wrapping | 2017-03-08 16:54:13 +11:00
RemexMungerData.php | Fix RemexCompatMunger infinite recursion | 2017-11-17 23:27:14 +11:00
tidy.conf | Hide <style> tags from Tidy | 2017-06-13 13:02:57 -04:00
TidyDriverBase.php | Immediately drop wgValidateAllHtml and related code | 2018-04-10 10:51:28 -07:00