wiki.techinc.nl/maintenance/language
Bartosz Dziewoński 0313128b10 Use PHP 7 "\u{NNNN}" Unicode codepoint escapes in string literals
In cases where we're operating on text data (and not binary data),
use e.g. "\u{00A0}" to refer directly to the Unicode character
'NO-BREAK SPACE' instead of "\xc2\xa0" to specify the bytes C2h A0h
(which correspond to the UTF-8 encoding of that character). This
makes it easier to look up those mysterious sequences, as not all
are as recognizable as the no-break space.
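
The equivalence described above can be checked with a minimal sketch.
It is written in Ruby rather than PHP, on the assumption that Ruby's
double-quoted literals are a fair stand-in here, since they accept the
same two escape forms. The helper name hex_bytes is hypothetical and
exists only for this illustration.

```ruby
# Two spellings of U+00A0 NO-BREAK SPACE in a UTF-8 string.
by_bytes     = "\xC2\xA0"  # the UTF-8 bytes C2h A0h, spelled out by hand
by_codepoint = "\u{00A0}"  # the code point, named directly

# Hypothetical helper: render a string's bytes as uppercase hex pairs.
def hex_bytes(str)
  str.bytes.map { |b| format('%02X', b) }.join(' ')
end

puts hex_bytes(by_codepoint)    # the bytes produced by the code point escape
puts by_bytes == by_codepoint   # whether both literals are the same string
```

Only the second spelling can be looked up directly in a Unicode
character table.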

PHP does not enforce this, but I think we should write the code
points in uppercase, zero-padded to at least four digits, as the
Unicode standard does.

Note that not all "\xNN" escapes can be automatically replaced:
* We can't use Unicode escapes for binary data that is not UTF-8
  (e.g. in code converting from legacy encodings or testing the
  handling of invalid UTF-8 byte sequences).
* '\xNN' escapes in regular expressions in single-quoted strings
  are actually handled by PCRE and have to be dealt with carefully
  (those regexps should probably be changed to use the /u modifier).
* "\xNN" referring to ASCII characters ("\x7F" and lower) should
  probably be left as-is.
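
As a rough sketch of the first and third rules above, the following
Ruby snippet classifies a byte string before converting it. The names
classify_escape and to_unicode_escapes are hypothetical and are not
part of this commit. The sketch looks only at the bytes themselves, so
it cannot detect the regular expression case, which still needs
manual review.

```ruby
# Hypothetical helper: decide what to do with a string that was written
# using "\xNN" escapes, based only on the bytes it contains.
def classify_escape(bytes)
  text = bytes.dup.force_encoding(Encoding::UTF_8)
  # Not valid UTF-8: legacy encoding or deliberately invalid test data.
  return :binary_keep unless text.valid_encoding?
  # Only "\x7F" and lower: plain ASCII, leave as-is.
  return :ascii_keep if text.ascii_only?
  :convert
end

# Hypothetical helper: rewrite valid UTF-8 bytes as "\u{NNNN}" escapes,
# uppercase and zero-padded to at least four digits.
def to_unicode_escapes(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8).each_char.map { |char|
    format('\\u{%04X}', char.ord)
  }.join
end

sample = "\xE2\x80\x8F"
puts to_unicode_escapes(sample) if classify_escape(sample) == :convert
```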

The replacements in this commit were done semi-manually by piping
the existing "\xNN" escapes through the following terrible Ruby
script I devised:

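  # Evaluate the "\xNN" escapes in ARGV[0] and read the bytes as UTF-8.
  chars = eval('"' + ARGV[0] + '"').force_encoding('utf-8')
  # Print each character as an uppercase, zero-padded "\u{NNNN}" escape.
  puts chars.each_char.map { |char|
    '\\u{' + char.ord.to_s(16).upcase.rjust(4, '0') + '}'
  }.join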

Change-Id: Idc3dee3a7fb5ebfaef395754d8859b18f1f8769a
2018-06-04 16:20:13 +00:00
zhtable Replace HTTP by HTTPS 2018-05-22 12:14:14 +02:00
alltrans.php Use ::class to resolve class names in maintenance scripts 2018-01-23 17:40:16 +00:00
checkDupeMessages.php Remove empty lines at begin of function, if, foreach, switch 2017-07-01 11:34:16 +00:00
checkExtensions.php
checkLanguage.inc build: Updating mediawiki/mediawiki-codesniffer to 15.0.0 2018-01-01 14:10:16 +01:00
checkLanguage.php Try to fix some other broken-looking legacy maintenance script options 2016-03-12 00:47:36 +00:00
date-formats.php Use ::class to resolve class names in maintenance scripts 2018-01-23 17:40:16 +00:00
digit2html.php Hard deprecate UtfNormalUtil 2018-02-06 14:44:37 -08:00
dumpMessages.php Use ::class to resolve class names in maintenance scripts 2018-01-23 17:40:16 +00:00
generateCollationData.php Follow-up If8dfdaf1: Hard-deprecate, drop two uses, other pre-5.3 back-compat code 2018-05-24 17:01:02 -07:00
generateNormalizerDataAr.php languages: Use static array files for normalizer data 2018-05-22 21:38:43 +00:00
generateNormalizerDataMl.php languages: Use static array files for normalizer data 2018-05-22 21:38:43 +00:00
langmemusage.php Use ::class to resolve class names in maintenance scripts 2018-01-23 17:40:16 +00:00
languages.inc Use PHP 7 "\u{NNNN}" Unicode codepoint escapes in string literals 2018-06-04 16:20:13 +00:00
listVariants.php Use ::class to resolve class names in maintenance scripts 2018-01-23 17:40:16 +00:00
StatOutputs.php Update suppressWarning()/restoreWarning() calls 2018-02-10 08:50:12 +00:00
transstat.php build: Updating mediawiki/mediawiki-codesniffer to 14.1.0 2017-10-21 03:12:55 +00:00