Add the same no-arg options for language code that
{{#dir}} and {{#bcp47}} have, for consistency:
* `{{#language}}` will return the name of the *target language*
(for articles, the content language; for messages, the user language).
The default value for the "in language" argument should be the autonym.
This was working previously but only via a baroque code flow path for
invalid language codes. Make this a bit clearer and add tests.
Since non-autonym language code translations are added via the
[[Extension:CLDR]] in production, hook LanguageGetTranslatedLanguageNames
in the ParserTestRunner to ensure that we can test this.
Followup-To: Ice1c671c5b3cc077d2bb80ea5dc25c5eabbfeb36
Followup-To: I19c3e91a924e080f37dc95a0d4e61493583b533e
Change-Id: Ibf6e7f194cc056eadb48a5ad8e6d01a761d9351c
Copy the Renameuser extension into core, with minimal code changes. The
hook handlers are inlined into Article, SpecialLog and
SpecialContributions.
Bug: T27482
Change-Id: I314021f4138773df6aaf2753b33ab8283cd16974
Follows-up I301f471f86ba2.
For ease of navigation, move Converter subclasses to a group called
"Languages", which for documentation purposes is a subgroup of
"Language". The next commit does the same for Messages* files,
and Language subclasses (done separately for ease of review).
Change-Id: If1cef9aa15f536ebaedd4477ad7453426e7f3b85
Use @phpcs-require-sorted-array from new codesniffer release 32.0.0
Similar to special page alias in
I827d1f5010d000609324ec398beeb142d9bac299
Bug: T255826
Change-Id: I7b7cbf0c03714001609437af68fe16e06930cc33
In cases where we're operating on text data (and not binary data),
use e.g. "\u{00A0}" to refer directly to the Unicode character
'NO-BREAK SPACE' instead of "\xc2\xa0" to specify the bytes C2h A0h
(which correspond to the UTF-8 encoding of that character). This
makes it easier to look up those mysterious sequences, as not all
are as recognizable as the no-break space.
This is not enforced by PHP, but I think we should write the code
points in uppercase and zero-padded to at least four digits, like the
Unicode standard does.
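To illustrate the equivalence, here is a minimal sketch in Ruby (used
only for illustration; the helper names utf8_hex and code_points are
made up for this example). It shows that the code point escape and the
byte escape denote the same character:

```ruby
# Hypothetical helper: UTF-8 bytes of a string as uppercase hex pairs.
def utf8_hex(text)
  text.encode("UTF-8").bytes.map { |byte| format("%02X", byte) }.join(" ")
end

# Hypothetical helper: code points in the U+XXXX notation, uppercase and
# zero-padded to at least four digits.
def code_points(text)
  text.each_char.map { |char| format("U+%04X", char.ord) }.join(" ")
end

puts utf8_hex("\u{00A0}")     # prints "C2 A0"
puts code_points("\xC2\xA0")  # prints "U+00A0"
```

The second call only works because the bytes form valid UTF-8 text;
for binary data there is no code point to recover.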
Note that not all "\xNN" escapes can be automatically replaced:
* We can't use Unicode escapes for binary data that is not UTF-8
(e.g. in code converting from legacy encodings or testing the
handling of invalid UTF-8 byte sequences).
* '\xNN' escapes in regular expressions in single-quoted strings
are actually handled by PCRE and have to be dealt with carefully
(those regexps should probably be changed to use the /u modifier).
* "\xNN" referring to ASCII characters ("\x7F" and lower) should
probably be left as-is.
The replacements in this commit were done semi-manually by piping
the existing "\xNN" escapes through the following terrible Ruby
script I devised:
    chars = eval('"' + ARGV[0] + '"').force_encoding('utf-8')
    puts chars.split('').map{|char|
        '\\u{' + char.ord.to_s(16).upcase.rjust(4, '0') + '}'
    }.join('')
Change-Id: Idc3dee3a7fb5ebfaef395754d8859b18f1f8769a
In some languages it's conventional not to insert a thousands
separator in numbers that are four digits long (1000-9999).
Rather than copy-paste the custom code to do this between 13 files,
introduce another option and have the base Language class handle it.
This also fixes an issue in several languages where this logic
previously would not work for negative or fractional numbers.
To implement this, a new option is added to MessagesXx.php files,
`$minimumGroupingDigits = 2;`, with the meaning as defined in
<http://unicode.org/reports/tr35/tr35-numbers.html>. It is a little
roundabout, but it could allow us to migrate the number formatting
(currently all custom code) to some generic library easily.
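A minimal sketch of how such an option could behave, assuming a
grouping size of three and reading tr35 as "apply grouping only when
at least minimumGroupingDigits digits precede the first separator".
format_grouped and its parameters are hypothetical names used for
illustration; this is not the code in the Language class:

```ruby
# Hypothetical helper, not MediaWiki's API. Assumes groups of three digits
# and a number whose to_s form has no exponent (e.g. not 1.0e+20).
def format_grouped(number, minimum_grouping_digits: 1, separator: ",", decimal: ".")
  sign = number.negative? ? "-" : ""
  integer_part, fraction_part = number.abs.to_s.split(".")

  # Group only if enough digits would precede the first separator.
  if integer_part.length >= 3 + minimum_grouping_digits
    groups = integer_part.reverse.scan(/\d{1,3}/).map(&:reverse).reverse
    integer_part = groups.join(separator)
  end

  result = sign + integer_part
  result += decimal + fraction_part if fraction_part
  result
end

puts format_grouped(1234, minimum_grouping_digits: 2)       # prints "1234"
puts format_grouped(12345, minimum_grouping_digits: 2)      # prints "12,345"
puts format_grouped(-1234.5, minimum_grouping_digits: 2)    # prints "-1234.5"
```

Splitting off the sign and the fraction before counting digits is what
keeps the rule working for negative and fractional numbers.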
Bug: T177846
Change-Id: Iedd8de5648cf2de1c94044918626de2f96365d48
When there are multiple aliases, the first alias MUST be the
preferred alias in that language, so that wikitext code
generators can generate the desired syntax.
The other aliases SHOULD be sorted by the following convention:
- Local first, English last
- Most common first, least common last
Bug: T116020
Change-Id: Ia670512e0cb375335873e7f9a08b638bbe039e45
* All $namespaceNames and similar messages
that reference NS_* constants seem to use '_' for spaces,
except in a few cases. I suspect those are mistakes, so replace them.
Regexes used:
([^'"\n]*['"][^'"\n]*(['"][^'"\n]*['"])*[^'"\n]*[^A-Za-z0-9]NS_)
contains a space in front, replaced with _\1
[^A-Za-z0-9]NS_([^'"\n]*['"][^'"\n]*['"])*[^'"\n]*['"][^'"\n]*
contains a space at the very end, replaced with \1_
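As a rough illustration of the intended effect (not the regexes above,
and not how the change was actually produced), this Ruby sketch
replaces spaces with '_' inside quoted strings on lines that mention
an NS_* constant. underscore_namespace_names is a hypothetical helper
and the sample line is made-up input:

```ruby
# Hypothetical helper: on lines that reference an NS_* constant, replace
# spaces inside quoted strings with underscores. Other lines are returned
# unchanged. Escaped quotes inside strings are not handled.
def underscore_namespace_names(line)
  return line unless line.match?(/\bNS_[A-Z_]+\b/)

  line.gsub(/'[^']*'|"[^"]*"/) { |quoted| quoted.tr(" ", "_") }
end

puts underscore_namespace_names("NS_USER_TALK => 'Benutzer Diskussion',")
# prints "NS_USER_TALK => 'Benutzer_Diskussion',"
```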
Change-Id: Ibbc201678ee91db2b5bf3de597c1598b86558d77
* Interface strings are now elsewhere
* MessagesQQQ no longer exists
* Prefer https for translatewiki.net
Change-Id: I76652ea94cca80441cd5d978029e4707ee41c4fd