Commit graph

305 commits

Author SHA1 Message Date
jenkins-bot
7ddab17aac Merge "Accept BCP 47 codes in LanguageConverter rules" 2018-11-27 18:49:25 +00:00
Fomafix
512aa4e551 Use PHP 7 '??' operator instead of if-then-else
Change-Id: Ia86f8433f30a166d38ee63d0d1745b26740767b9
2018-10-27 23:46:13 +02:00
C. Scott Ananian
fcbde8ae4e Make Language::hasVariant() more strict
In d59f27aeab we made
LanguageConverter::validateVariant() try harder to convert a variant
into an acceptable MediaWiki-internal form, looking at deprecated
codes and BCP 47 aliases.  However, this misled Language::hasVariant()
into thinking that bogus names (like all-uppercase strings) were
acceptable variant names, which then led exceptions when they were
passed to the various conversion methods.

This is a belt-and-suspenders patch for T207433 -- in that case we
shouldn't have created a Language object with code 'sr-cyrl' in the
first place, but once one was created we shouldn't have tried to
ask LanguageSr to convert texts to 'sr-cyrl'.  The latter problem
is fixed by this patch.

Bug: T207433
Change-Id: Id993bc7989144b5031a551662e8e492bd23f698a
2018-10-22 16:35:26 -04:00
C. Scott Ananian
f7bb180fef Accept BCP 47 codes in LanguageConverter rules
Facilitate a gradual migration away from non-standard MediaWiki language
codes.  This will ensure that (a) rules can be written with standard
BCP 47 codes, and (b) rules written with existing nonstandard codes will
continue to work once these are added to
LanguageCode::$deprecatedLanguageCodeMapping.

Change-Id: I3ba96faafaf40bd47fb5919621f7035f0431a698
2018-10-16 23:58:11 -04:00
C. Scott Ananian
d59f27aeab Accept BCP 47 codes as aliases for nonstandard variants
The browser Accept-Language header uses BCP 47 codes, which don't
precisely match our internal mediawiki variant names in a number of
places.  Allow proper BCP 47 codes to alias our internal variants
for: Accept-Language parsing, URL parsing, user preferences, and
explicit enumeration of codes in LanguageConverter rules.

This is a replay of an earlier merged patch,
0818070c59, which had to be reverted
because it was based on 8380f0173e which
caused regressions in the Babel extension (T199941).

Change-Id: Ica89d9547c58967747ab0fa15d4e83be5378796d
2018-10-11 02:23:20 -04:00
Brian Wolff
8dbf6a7b31 Add taint annotation and warnings to Language::convert() et al
If you feed this method unescaped data, it can cause later calls
to be an XSS, which is something I think deserves a warning.

Bug: T202571
Change-Id: I34cb3da9232a22defffb80466263c2f2233822ef
2018-09-01 18:10:55 +00:00
RazeSoldier
24ffbd9bd1 Use "break" instead of "continue"
"continue" statements are equivalent to "break". In PHP 7.3, will generate a warning.

Bug: T200595
Change-Id: I244ecb2e1ce5a76295f014fb1becd8d263196846
2018-08-24 00:18:07 +08:00
Fomafix
73f94fd8cd Add type hint Language where possible
Also use ?? instead of ?: to check for null.

Change-Id: I058b61d7e06cdefecdafa82f60109cc386e2a809
2018-08-12 10:20:11 +02:00
Aryeh Gregor
90d4f56fe4 Mass conversion of $wgContLang to service
Brought to you by vim macros.

Bug: T200246
Change-Id: I79e919f4553e3bd3eb714073fed7a43051b4fb2a
2018-08-11 22:44:29 -06:00
Greg Grossmeier
dc282a46d7 Revert "Accept BCP 47 codes as aliases for nonstandard variants"
This reverts commit 0818070c59.

Reason for revert: Caused T199941

Bug: T199941
Change-Id: I24c178eb33890477de79cbb3122861c140578011
2018-07-23 16:44:55 +00:00
C. Scott Ananian
0818070c59 Accept BCP 47 codes as aliases for nonstandard variants
The browser Accept-Language header uses BCP 47 codes, which don't
precisely match our internal mediawiki variant names in a number of
places.  Allow proper BCP 47 codes to alias our internal variants
for: Accept-Language parsing, URL parsing, user preferences, and
explicit enumeration of codes in LanguageConverter rules.

Change-Id: I8468a56d5b88f5786abd0a17b67bda2f1687fd0c
2018-07-13 17:43:20 -04:00
Umherirrender
130ec2523d Fix PhanTypeMismatchDeclaredParam
Auto fix MediaWiki.Commenting.FunctionComment.DefaultNullTypeParam sniff

Change-Id: I865323fd0295aabd06f3e3c75e0e5043fb31069e
2018-07-07 00:34:30 +00:00
Fomafix
2feb1fccd4 LanguageConverter: Fix @return description
validateVariant returns null, not false.

Change-Id: I5241205da9f4d6266f09b361df856e50ddd96a7d
2018-06-20 18:36:49 +02:00
Kunal Mehta
230958d97c Autofix MediaWiki.Commenting.FunctionComment.SpacingDoc* errors
Change-Id: I63761ebce04c03b9b13237919c27cc10180f198f
2018-05-19 14:07:03 -07:00
jenkins-bot
236488d398 Merge "Add a hook into LanguageConverter#getPreferredVariant() to allow extensions to pull the desired variant from cookies (or other such source)" 2018-01-23 23:01:34 +00:00
Umherirrender
255d76f2a1 build: Updating mediawiki/mediawiki-codesniffer to 15.0.0
Clean up use of @codingStandardsIgnore
- @codingStandardsIgnoreFile -> phpcs:ignoreFile
- @codingStandardsIgnoreLine -> phpcs:ignore
- @codingStandardsIgnoreStart -> phpcs:disable
- @codingStandardsIgnoreEnd -> phpcs:enable

For phpcs:disable always the necessary sniffs are provided.
Some start/end pairs are changed to line ignore

Change-Id: I92ef235849bcc349c69e53504e664a155dd162c8
2018-01-01 14:10:16 +01:00
tjones
a0b511319c Crimean Tatar Transliteration
This is a first pass at Latin/Cyrillic translitertion for Crimean
Tatar (crh).

Includes transliteration tables, prefix/suffix mappings, regex
mappings, and exceptions lists for words and abbreviations.

Regularize CRH language name in messages/* files.

Fix "varient" typos in qqq.json.

Add unit tests for CRH transliteration.

Bug: T23582
Change-Id: I424703f99adf837f6217872b882d1ea26bfdd068
2017-11-20 16:56:38 -05:00
Brian Wolff
4acbbf0972 Follow-up I077d30c50 fix phpcs error
Change-Id: I28cb7060d6149d96ceb0dcad7e2bff2ed3434411
2017-11-15 06:56:38 +00:00
Brian Wolff
f0555bab3d Fix langauge converter parser test with self-close tags
This fixes an issue in f21f3942 where if there was an html
element with an alt or title attribute containing an <
entity, an ascii EOT control character (0x04) may become
inserted into the text if language converter was enabled.

Due to a really old bug in language converter, self-closed tags
got turned into non-self closed tags. However due a different
bug which was fixed in f21f3942 this code path was rarely taken
so nobody noticed until now.

Follow-up Idbc45cac12

Bug: T180552
Change-Id: I077d30c50fcb419837fef937d27caca307153d2d
2017-11-15 06:03:22 +00:00
Reedy
f600b4ede9 Fix phpcs issues from LanguageConverter patches
Change-Id: I34e57c90ffd40fbd9f8afe3c57dd73fa7f655841
2017-11-15 03:37:27 +00:00
Brian Wolff
f21f3942eb SECURITY: Handle -{}- syntax in attributes safely
Previously, if one had an attribute with the contents
"-{}-foo-{}-", foo would get replaced by language converter as if
it wasn't in an attribute. This lead to an XSS attack.

This breaks doing manual conversions in url href's (or any
other attribute that goes through an escaping method
other than Sanitizer's). e.g. http://{sr-el:foo';sr-ec:bar}.com
won't work anymore. See also T87332

Bug: T119158
Change-Id: Idbc45cac12c309b0ccb4adeff6474fa527b48edb
2017-11-15 03:33:03 +00:00
Brian Wolff
fbe78cfa09 SECURITY: XSS in langconverter when regex hits pcre.backtrack_limit
Adjust regexes for what not to convert to avoid backtracking by
preferring possesive quantifiers

Add check that we really have matched to the end of the string, and
log error if the regex hits some sort of error preventing the
entire string from being matched. Should the regex not match to the
end, then language conversion is disabled for the string.

Bug: T124404
Change-Id: I4f0c171c7da804e9c1508ef1f59556665a318f6a
2017-11-15 03:33:03 +00:00
Umherirrender
f739a8f368 Improve some parameter docs
Add missing @return and @param to function docs and fixed some @param

Change-Id: I810727961057cfdcc274428b239af5975c57468d
2017-09-10 20:32:31 +02:00
Jack Phoenix
43da7fb884 Add a hook into LanguageConverter#getPreferredVariant() to allow extensions to pull the desired variant from cookies (or other such source)
Example implementation using this hook: wikiHow's ChineseVariantSelector
extension, installed on zh.wikihow.com, which uses cookies to store the
preferred language variant, allowing anonymous users to change the
language variant without registering/logging in.

Change-Id: I5295a26578b45a8d51f2b7550938088fec18404f
2017-07-23 16:35:09 +03:00
Umherirrender
b5cddfb27b Remove empty lines at begin of function, if, foreach, switch
Organize phpcs.xml a bit

Change-Id: Ifb767729b481b4b686e6d6444cf48b1f580cc478
2017-07-01 11:34:16 +00:00
C. Scott Ananian
5e76bb2657 tests: Use TestingAccessWrapper to reload LanguageConverter tables
Make the LanguageConverter::reloadTables method actually private,
and use the TestingAccessWrapper to call it when running parser tests.

Follow-up to I65736520cd04bfe8949b29ade07338a6e1b88a4d.

Change-Id: I43b81b8fef6441ad50b858ff7757732ecb5eef91
2017-06-27 17:11:09 -04:00
C. Scott Ananian
b80b7020ce tests: Reset LanguageConverter conversion tables between test cases
Conversion rules defined in a previous test case were leaking into
subsequent test cases.  Existing tests had worked around this by defining
non-overlapping rules, but it's better to just fix the problem at the
source.

Change-Id: I65736520cd04bfe8949b29ade07338a6e1b88a4d
2017-06-26 13:56:30 -04:00
Liangent
d8375bee24 New language variant 'en-x-piglatin' for easier variant testing
Guarded by the $wgUsePigLatinVariant variable, off by default.

Pig Latin is a language game where words in English are altered
according to the following rules:

* Words starting with a vowel have a '-way' suffix appended.
* Words starting with a consonant have the initial consonants (or 'qu'
  group) moved to the end and an '-ay' suffix appended.

https://en.wikipedia.org/wiki/Pig_Latin

* Added 'en-x-piglatin' as a language name.
* Added 'en' to LanguageConverter::$languagesWithVariants.
* Added LanguageEn class and its corresponding EnConverter which
  provides one-way translation from English to Pig Latin.
* Some minor internal changes in code that assumed that English
  doesn't have a language class or converter.

Bug: T45547
Depends-On: I1d9691c784032669979f8109c9a5f65cbf4122c9
Change-Id: I7fa2d85d6364958c5138366e8b4504a2697a8731
2017-06-12 16:59:57 -04:00
Kunal Mehta
642ffff845 LanguageConverter: Avoid deprecated wfMemcKey()
Change-Id: I7fe8e3ad6de2eb0a156b046805fa0eca928d0892
2017-05-25 11:41:56 -07:00
Thiemo Mättig
8bbf6cb2eb Use more specific string[] type hint for language variants
This patch only touches PHPDoc documentation, nothing else.

Change-Id: Ia79d06425a3b8629c171cd68ae435c64dac86f46
2017-04-17 22:31:22 +02:00
WMDE-Fisch
caae756f72 Remove deprecated noop functions
Change-Id: Ia821d43e243b1ee146d3bc4ed35f6aff0bf17466
2017-03-17 11:27:04 +01:00
Timo Tijhof
3a2a707546 Clean up remaining get_class() uses
* get_class()        -> __CLASS__ (same as self::class)
* get_called_class() -> static::class
* get_class($this)   -> static::class

Change-Id: I1888a1897ecf4548a2e5a67a942e5c080dd7e3d3
2017-03-07 22:03:47 +00:00
C. Scott Ananian
3e32d21210 Strip U+0000 in wikitext
U+0000 is not allowed in HTML5, there's no reason to allow it in wikitext.

It simplifies our code if we can just strip them at the start.  Strip in
PST as well so they don't sneak into our database either.

Tweaked the EXT_LINK URLs to account for the fact that invalid characters
get transformed into U+FFFD when using Preprocessor_DOM.  See 73649741ed
(r65967) for context on that change.

Bug: T159174
Change-Id: I3f67e92b61aacc87a40c3662085c84d1dac08bfb
2017-03-06 22:23:38 +00:00
jenkins-bot
aa3319c4c0 Merge "Miscellaneous indentation tweaks" 2017-02-28 18:38:36 +00:00
James D. Forrester
3526417586 languages: Replace implicit Bugzilla bug numbers with Phab ones
It's unreasonable to expect newbies to know that "bug 12345" means "Task T14345"
except where it doesn't, so let's just standardise on the real numbers.

Change-Id: Id2f9d229d17b8eee66b2ca4e3927f3f66ac62988
2017-02-28 00:33:38 +00:00
Bartosz Dziewoński
ecdef925bb Miscellaneous indentation tweaks
I was bored. What? Don't look at me that way.

I mostly targetted mixed tabs and spaces, but others were not spared.
Note that some of the whitespace changes are inside HTML output,
extended regexps or SQL snippets.

Change-Id: Ie206cc946459f6befcfc2d520e35ad3ea3c0f1e0
2017-02-27 19:23:54 +01:00
C. Scott Ananian
5b050be643 Allow HTML tags in LanguageConverter output.
A "remove HTML tags to avoid disrupting the layout" block is removed
(previously added in f16d1e4ed7).

This is a follow-up to I9b099273203482ffb570a5654d8ba50c833e526d.

Bug: T54192
Change-Id: I565fac58b3b0da7bfaedf64f5001c364f52e2244
2016-12-22 01:32:24 +00:00
Aaron Schulz
aac4b448cf Make MessageCache::load() require a language code
Also make it protected; no outside callers exist.

Change-Id: I9f35d05a5e031d1c536a44b19b108803db068677
2016-10-18 17:50:12 -07:00
Aaron Schulz
0809631edd Convert LanguageConverter to using getLocalServerObjectCache()
Change-Id: I7bfcc389ef0266299d887a3520ab9581ef9aa9be
2016-10-11 20:24:42 +00:00
Amir Sarabadani
9850c542c6 Clean up array() syntax in docs, part VII
Last part

Change-Id: I38f015e2122ef4fd2d2141718bd889794c29f06c
2016-09-27 06:53:25 +03:30
Brad Jorsch
9b94bd502f Check User::isSafeToLoad() in LanguageConverter
Ideally LanguageConverter shouldn't be relying on global state at all.
But as a first step let's make it not try to use the global state when
that global state isn't even there.

Bug: T127233
Change-Id: I391cef3ec211d648b078fc509e0139daa58eb875
2016-03-09 21:59:04 +00:00
Bartosz Dziewoński
c161c46d26 Improve code suffering from PHP 5.3's lack of support for foo()[]
I searched for /\$(\S+) = (.+?\(.*?\);)\n.*?\$\1\[/, ignored
everything involving isset(), unset() or array assigments, then
skimmed through the remaining results and changed things where they
made sense. These changes were not automated, so please review them.

Change-Id: Ib37b4c66fc57648470f151ad412210b3629c2538
2016-02-28 22:49:20 +01:00
Kunal Mehta
6e9b4f0e9c Convert all array() syntax to []
Per wikitech-l consensus:
 https://lists.wikimedia.org/pipermail/wikitech-l/2016-February/084821.html

Notes:
* Disabled CallTimePassByReference due to false positives (T127163)

Change-Id: I2c8ce713ce6600a0bb7bf67537c87044c7a45c4b
2016-02-17 01:33:00 -08:00
Tim Starling
059fd9a2ae Don't modify $wgHooks on language object construction
Previously various language objects would install a hook to update the
shared conversion table cache when the object was constructed. This is
not a good idea since language objects may be constructed even when they
are not the content language, but only the content language is
associated with variant conversion and the conversion cache.

Instead, have WikiPage call a method on $wgContLang directly. I put this
with message cache update since the logic is almost identical.

Change-Id: Ief9c0ef993e39645e74a6e158cb4e6e2139ce91d
2016-01-29 15:03:56 +11:00
Florian
e0ad37d49a Remove Language::armourMath() and friends
Change-Id: I0ce18bce2d9b5787221e2dabff143de9792abb3a
2016-01-07 09:21:53 -08:00
jenkins-bot
c14fcf8015 Merge "Made convertNamespace() use APC" 2015-09-28 20:44:38 +00:00
Vivek Ghaisas
c54766586a Fix issues identified by SpaceBeforeSingleLineComment sniff
Change-Id: I048ccb1fa260e4b7152ca5f09b053defdd72d8f9
2015-09-26 23:06:52 +00:00
Aaron Schulz
eb5a2fd8ea Made convertNamespace() use APC
* This can avoid MessageCache::load() calls on another
  language due to variants. The convertNamespace() method
  takes up a significant amount of time for 404 pages.

Change-Id: I4551d5b8e5b5a0bc01d02702b80f93591fc19440
2015-09-25 22:57:58 -07:00
Liangent
ca38682dda LanguageConverter fix of empty and numeric strings
Bug: T51072
Bug: T48634
Bug: T53551
Change-Id: I2c88f1cf7c0014bebf5c798916b660b334a0b78b
2015-06-08 14:23:42 +00:00
Ori Livneh
12571bde26 Use a fixed marker prefix string in the Parser and MWTidy
Generating one-time, unique strip markers hurts us in multiple ways:

* The strip marker regexes don't benefit from JIT compilation, so they are
  slower to execute than they could be.
* Although the regexes don't benefit from JIT compilation, they are still
  compiled, because HHVM bets on regexes getting reused. This extra work is
  fairly costly (1-2% of CPU usage on the app servers) and doesn't pay off.
* The size of the PCRE JIT cache is finite, and the caching of one-off regexes
  displaces from the cache regexes which are in fact reused.

Tim's preferred solution (per his review comment on
https://gerrit.wikimedia.org/r/167530/) is to use fixed strip markers.
So:

* Replace usage of $parser->mUniqPrefix with Parser::MARKER_PREFIX, which
  complements the existing Parser::MARKER_SUFFIX.
* Deprecate Parser::mUniqPrefix and its accessor, Parser::uniqPrefix().
* Deprecate Parser::getRandomString(), since it is no longer useful.
* In Preprocessor_*:preprocessToObj() and Parser::fetchTemplateAndTitle,
  replace any occurences of \x7f with '?', to prevent strip marker forgery.
  \x7f is not valid input anyway.
* Deprecate the $prefix parameter for StripState::__construct, since a custom
  prefix may no longer be specified.

Change-Id: I31d4556bbb07acb72c33fda335fa5a230379a03f
2015-05-31 19:33:36 -07:00