Language::markNoConversion is used only within Parser.php and differs
from LanguageConverter::markNoConversion in that, contrary to its name
and its namesake, it only protects *things which look like URLs* from
language conversion.
This wasted several days of my time before I realized what was going on.
It's needless; just hoist the "looks like a URL" special casing inline
to the single place where that functionality is used. (And I wonder
if the "looks like a URL" case is actually needed at all any more,
since most of those cases are probably free external links, which
go through a different code path, not bracketed external links.)
This is a clean-up to the clean-up that liangent performed in 2012
with e01adbfc0b.
Change-Id: I80479600f34170651732b032e8881855aa1204d8
In cases where we're operating on text data (and not binary data),
use e.g. "\u{00A0}" to refer directly to the Unicode character
'NO-BREAK SPACE' instead of "\xc2\xa0" to specify the bytes C2h A0h
(which correspond to the UTF-8 encoding of that character). This
makes it easier to look up those mysterious sequences, as not all
are as recognizable as the no-break space.
This is not enforced by PHP, but I think we should write those in
uppercase and zero-padded to at least four characters, like the
Unicode standard does.
Note that not all "\xNN" escapes can be automatically replaced:
* We can't use Unicode escapes for binary data that is not UTF-8
(e.g. in code converting from legacy encodings or testing the
handling of invalid UTF-8 byte sequences).
* '\xNN' escapes in regular expressions in single-quoted strings
are actually handled by PCRE and have to be dealt with carefully
(those regexps should probably be changed to use the /u modifier).
* "\xNN" referring to ASCII characters ("\x7F" and lower) should
probably be left as-is.
The replacements in this commit were done semi-manually by piping
the existing "\xNN" escapes through the following terrible Ruby
script I devised:
chars = eval('"' + ARGV[0] + '"').force_encoding('utf-8')
puts chars.split('').map{|char|
'\\u{' + char.ord.to_s(16).upcase.rjust(4, '0') + '}'
}.join('')
Change-Id: Idc3dee3a7fb5ebfaef395754d8859b18f1f8769a
Find: /isset\(\s*([^()]+?)\s*\)\s*\?\s*\1\s*:\s*/
Replace with: '\1 ?? '
(Everywhere except includes/PHPVersionCheck.php)
(Then, manually fix some line length and indentation issues)
Then manually reviewed the replacements for cases where confusing
operator precedence would result in incorrect results
(fixing those in I478db046a1cc162c6767003ce45c9b56270f3372).
Change-Id: I33b421c8cb11cdd4ce896488c9ff5313f03a38cf
Language::fetchLanguageNames returns already a sorted array. An
additional ksort is only needed when inserting a new value.
Change-Id: If8c7b16fa6e7dfe1545f72ac9c742a2f43eaee57
Returns the displaytitle (if present) or title->getPrefixedText for a
page, converted for each language variant configured on the wiki.
Bug: T178446
Change-Id: I35100af3824ca65c4fe5c106d4a6fbe4e5f75046
Introduce truncateInternal() method in Language class, based on
existing truncate() method. New method abstracts string truncation,
allowing users to specify callable functions for text length measurement
and string truncation.
New method, truncateInternal(), is used to provide two options for
text truncation:
* For DB usage: truncateForDatabase() method is truncating text by
number of bytes.
* For UI usage: truncateForVisual() method is truncating text by number
of characters, using multibyte string PHP methods.
Old truncate() method is deprecated and just returns the results of
truncateForDatabase() method.
Newly introduced truncateForVisual() method is used for
truncation of long tag descriptions in RCFilters menu.
Bug: T179626
Change-Id: Ib01a8c303304064dde3ce983b817d93a88a5affd
Return the unmodified string if there's no need to truncate it without
doing a not-so-trivial round of getting a message from the message
cache.
Change-Id: I11ac88672aeb9d1c4f5709b79ad2d17223bd64d8
In some languages it's conventional not to insert a thousands
separator in numbers that are four digits long (1000-9999).
Rather than copy-paste the custom code to do this between 13 files,
introduce another option and have the base Language class handle it.
This also fixes an issue in several languages where this logic
previously would not work for negative or fractional numbers.
To implement this, a new option is added to MessagesXx.php files,
`$minimumGroupingDigits = 2;`, with the meaning as defined in
<http://unicode.org/reports/tr35/tr35-numbers.html>. It is a little
roundabout, but it could allow us to migrate the number formatting
(currently all custom code) to some generic library easily.
Bug: T177846
Change-Id: Iedd8de5648cf2de1c94044918626de2f96365d48
Clean up use of @codingStandardsIgnore
- @codingStandardsIgnoreFile -> phpcs:ignoreFile
- @codingStandardsIgnoreLine -> phpcs:ignore
- @codingStandardsIgnoreStart -> phpcs:disable
- @codingStandardsIgnoreEnd -> phpcs:enable
For phpcs:disable always the necessary sniffs are provided.
Some start/end pairs are changed to line ignore
Change-Id: I92ef235849bcc349c69e53504e664a155dd162c8
- mostly auto fixes
- some too long lines fixed
- ignore amp space in one case passing by reference
Change-Id: I6472f83bc3cbf4bd629d83050cc3319b19ec465c
Change 950cf6016c took care of the most,
but a few remain, either outside of includes/ and maintenance/
directories (which that change was limited to), or in code introduced
afterwards.
Change-Id: I9c363d0219ea7e71cde520faba39406949a36d27
And auto-fix all errors.
The `<exclude-pattern>` stanzas are now included in the default ruleset
and don't need to be repeated.
Change-Id: I928af549dc88ac2c6cb82058f64c7c7f3111598a
Previously, even if $wgUsePigLatinVariant was false, the language
would show up on Special:Preferences (and some other places) as
'en-x-piglatin - Igpay Atinlay'.
Follow-up to d8375bee24.
Change-Id: I08faacabca87c04299c7b535be8df1770e0a37ac
Guarded by the $wgUsePigLatinVariant variable, off by default.
Pig Latin is a language game where words in English are altered
according to the following rules:
* Words starting with a vowel have a '-way' suffix appended.
* Words starting with a consonant have the initial consonants (or 'qu'
group) moved to the end and an '-ay' suffix appended.
https://en.wikipedia.org/wiki/Pig_Latin
* Added 'en-x-piglatin' as a language name.
* Added 'en' to LanguageConverter::$languagesWithVariants.
* Added LanguageEn class and its corresponding EnConverter which
provides one-way translation from English to Pig Latin.
* Some minor internal changes in code that assumed that English
doesn't have a language class or converter.
Bug: T45547
Depends-On: I1d9691c784032669979f8109c9a5f65cbf4122c9
Change-Id: I7fa2d85d6364958c5138366e8b4504a2697a8731
Look at languages/messages/MessagesEo.php for one of about a dozen real
world examples where this is set to false. All code calling
getDatePreferences checks if it got a truthy value first before using
it.
Change-Id: I4ef5c8be618d41039297325c9dd4cf554ea14559
It's unreasonable to expect newbies to know that "bug 12345" means "Task T14345"
except where it doesn't, so let's just standardise on the real numbers.
Change-Id: Id2f9d229d17b8eee66b2ca4e3927f3f66ac62988
I was bored. What? Don't look at me that way.
I mostly targetted mixed tabs and spaces, but others were not spared.
Note that some of the whitespace changes are inside HTML output,
extended regexps or SQL snippets.
Change-Id: Ie206cc946459f6befcfc2d520e35ad3ea3c0f1e0
Use of &$this doesn't work in PHP 7.1. For callbacks to methods like
array_map() it's completely unnecessary, while for hooks we still need
to pass a reference and so we need to copy $this into a local variable.
Bug: T153505
Change-Id: I8bbb26e248cd6f213fd0e7460d6d6935a3f9e468
For relative timestamps in $str, strtotime( $str, $now ) returns an
absolute Unix timestamp $str since $now, and this timestamp is given
to $time. However, Language::formatDuration expects a time duration,
not an absolute timestamp. We obtain this duration from the difference
between $time, the absolute timestamp of block expiry, and $now, the
absolute timestamp of the time in which the block action happened.
Tests have been added to test both this patch and 01936fa, the patch
that caused this regression.
Bug: T156453
Change-Id: I6fd8c02dc3c6456067fe25cb9f33f5b4c78332aa
Currently, a user who has an invalid time zone stored in the database is
effectively locked out of their account on HHVM sites. This patch addresses
this by (1) preventing users from setting invalid time zones, and (2) not
throwing an unhandled exception if a user's TZ is unknown.
When the user saves their preferences, the code silently rewrites invalid
time zones to UTC. I think this is OK, since to cause this to happen you
have to manually muck around with the Preferences page DOM or submit the
form from a script.
Bug: T137182
Change-Id: I28c5e2ac9f2e681718c6080fb49b3b01e4af46dd
This makes the code for processing JSON files with
grammar transformations reusable by different languages
and applies the same logic to Russian and Hebrew.
It will be done to other languages in further patches.
This patch is not supposed to change any functionality,
and the tests are intact (except a comment in the test
for Hebrew - the class doesn't exist any longer).
PHP:
* Move the JSON grammar transformation data processing logic
from LanguageRu.php to convertGrammar() in Language.php.
By default all these data files are supposed to be
processed identically, so the code should be common.
If there is no JSON data file, nothing new happens.
* LanguageRu's own convertGrammar() method is removed.
* The LanguageHe class is removed, now that all its functionality
is handled by generic JSON data processing in the Language class.
LanguageHe.php file is removed from the repo and from autoloading.
JavaScript:
* Move the JSON grammar transformation data processing logic
from ru.js to mediawiki.language.js.
* JavaScript grammar code files he.js and ru.js are removed
from the repo and from Resources.php, because all the data
is in JSON, and the default logic in mediawiki.language.js
works for both languages.
Bug: T115217
Change-Id: I5e75467121c3d791bb84f9e6fdfcf07c1840f81a
Not entirely sure what's going on here. Best guess is phan isn't able
to figure out that array + mixed will result in an array, and then
adding validNamespaces (another array) is ok. Could make things a little
more explicit with array_merge, but this seems to work to remove the
issue without changing the meaning of the code.
Change-Id: I7031ae4e68878ec3198e47c55ab5de4d52a6d922