Commit graph

10 commits

Author SHA1 Message Date
Bartosz Dziewoński
0313128b10 Use PHP 7 "\u{NNNN}" Unicode codepoint escapes in string literals
In cases where we're operating on text data (and not binary data),
use e.g. "\u{00A0}" to refer directly to the Unicode character
'NO-BREAK SPACE' instead of "\xc2\xa0" to specify the bytes C2h A0h
(which correspond to the UTF-8 encoding of that character). This
makes it easier to look up those mysterious sequences, as not all
are as recognizable as the no-break space.

This is not enforced by PHP, but I think we should write those in
uppercase and zero-padded to at least four characters, like the
Unicode standard does.

Note that not all "\xNN" escapes can be automatically replaced:
* We can't use Unicode escapes for binary data that is not UTF-8
  (e.g. in code converting from legacy encodings or testing the
  handling of invalid UTF-8 byte sequences).
* '\xNN' escapes in regular expressions in single-quoted strings
  are actually handled by PCRE and have to be dealt with carefully
  (those regexps should probably be changed to use the /u modifier).
* "\xNN" referring to ASCII characters ("\x7F" and lower) should
  probably be left as-is.

The replacements in this commit were done semi-manually by piping
the existing "\xNN" escapes through the following terrible Ruby
script I devised:

  chars = eval('"' + ARGV[0] + '"').force_encoding('utf-8')
  puts chars.split('').map{|char|
    '\\u{' + char.ord.to_s(16).upcase.rjust(4, '0') + '}'
  }.join('')

Change-Id: Idc3dee3a7fb5ebfaef395754d8859b18f1f8769a
2018-06-04 16:20:13 +00:00
Amir Sarabadani
5a21de8abb Remove everything related to CollationFa
This workaround was needed when ICU in production was broken
but after T189295 this is not needed anymore and we switched off
this collation from all Persian Wikis already

Bug: T139110
Change-Id: Ifad89555b6ac96a3eb36ca24b55e1f8ee57a1f05
2018-05-18 18:33:25 +02:00
jenkins-bot
1a21a63d52 Merge "Add collation for Abkhaz (ab)" 2018-01-23 18:42:29 +00:00
Thiemo Mättig
ef470ebf7f Remove @param comments that literally repeat what the code says
These comments do not add anything. I argue they are worse than having
no comments, because I have to read them first to understand they
actually don't explain anything. Removing them makes room for actual
improvements in the future (if needed).

Change-Id: Iee70aad681b3385e9af282d5581c10addbb91ac4
2018-01-10 14:14:26 +01:00
Kunal Mehta
a9960b53be tests: Use checkPHPExtension() instead of re-implementing it
Change-Id: I7f5e8684d556befc0aefa302187c573e7a3cff62
2017-12-25 22:06:37 +00:00
Kunal Mehta
222afabc80 Add @covers tags for Collation tests
Change-Id: I8b0623a6b716acdc9d369349fd4e306dbdc91d18
2017-12-25 21:59:01 +00:00
Bartosz Dziewoński
e94587dfbb Add collation for Abkhaz (ab)
* Adding new class AbkhazUppercaseCollation, mapped to 'uppercase-ab'.
* Extended CustomUppercaseCollation with support for sorting digraphs
  and for alphabets larger than 64 letters (up to 4096).

Bug: T183430
Change-Id: I16d44568e44d7ef5b39c38b1a6257b9fe10a34d4
2017-12-25 14:37:14 +00:00
Huji Lee
f05821e87c Do not run CollationFaTest if 'intl' is not loaded
Bug: T176040
Change-Id: I6b19bf1123d4dca5a1c8e002c0de65bab2138180
2017-09-16 16:30:18 -04:00
Brian Wolff
d16c26fd2c Unit tests for CollationFa (0bfcbd724)
Change-Id: I8286244cc1a61f34a3599c4f2e6201ba91c5e79a
2017-05-30 19:05:16 +00:00
Brian Wolff
73f5937047 Add collation for Bashkir (ba)
This is based on a numeric uppercase collation. Bashkir characters
will be remapped to the private use area for the purpose of sorting.

Bug: T162823
Change-Id: I65f1af0b57ff6ded7d464e39efd401f178a3519e
2017-05-10 04:17:46 +00:00