331 commits

Author SHA1 Message Date
daniel
fe9344d1d1 languages: Don't assume $wgUser in LanguageConverter
LanguageConverter::getPreferredVariant() may be called during session
initialization, triggered by a call to User::isUsableName(). If that
happens, the $wgUser variable cannot be used since it hasn't been
initialized at this point.

The code previously called $wgUser->isSafeToLoad() to make sure the
User object can be initialized. It however made the assumption that
the User object itself is already present, which is not necessarily the
case.

Bug: T235360
Change-Id: I62bd09d7d09b4d82d251fc6010f0dcc11f21c047
2020-03-16 20:28:29 +00:00
Derick A
df4c9f7663 languages: Avoid usage of deprecated MessageCache::singleton()
Change-Id: Idd67639f685f913b823dad3f5eb82d2e86f6e842
2020-03-11 12:12:24 +01:00
jenkins-bot
685db36e91 Merge "Add a user parameter to LanguageConverter::getUserVariant" 2020-02-19 20:56:14 +00:00
DannyS712
3666f43986 Add a user parameter to LanguageConverter::getUserVariant
While passing $wgUser just moves the problem, it reduces the number
of references to the global state by eliminating ::getUserVariant's
reference

Bug: T243708
Change-Id: I29f264d5dd69284decec4b83da64274262b28eab
2020-02-19 05:36:45 +00:00
Petr Pchelko
204fa7e509 Remove usages of deprecated Language methods
Change-Id: Iad3375b141b1d87c890baec6ecd16ed92f93e699
2020-02-16 00:45:48 +00:00
jenkins-bot
0f357294ca Merge "Remove LanguageConverter dependencies on Title and use LinkTarget" 2020-02-12 22:44:12 +00:00
ArtBaltai
3bf4b42490 Remove LanguageConverter dependencies on Title and use LinkTarget
Replace usage of Title in LanguageConverter with LinkTarget, which
is more lightweight and provides just the properties needed in Language.

Bug: T226834
Change-Id: I02a386bd9898e83c773cbd3d738d347d08f52c11
2020-02-12 18:37:11 +03:00
Timo Tijhof
f5895c2c82 language: Clean up file headers and class-level docs
This follows-up d83fcce5cb, which did something similar for
includes/profiler/.

* Ensure presence of license header.

* Merge any file-level descriptions with the class block,
  where it gets seen in generated docs about that class.

* Add any missing `@ingroup` tags to class blocks.

* Remove remaining `@ingroup` from file blocks.
  These clutter the Doxygen pages with duplicate entries.

* Fix some misspelled words from 61e0908fa2 and f136c2953c.

Change-Id: I5d21ec159766b799ba519da951d4f0716bae5f9f
2020-02-12 02:15:44 +00:00
Peter Ovchyn
ed18dba8f4 language: Remove Language type hints, as they break the use of StubUserLang
Bug: T244300
Change-Id: Iec1b5629617f1c171e8af507dc1dcebfef0666eb
2020-02-05 16:11:31 +02:00
Peter Ovchyn
baa0e2a425 languages: Decrease visibility of public variables in LanguageConverter class
Bug: T243461
Change-Id: I461ce7daba3a6a85464a69f4de76b1740472702d
2020-02-04 16:53:15 +02:00
Peter Ovchyn
61e0908fa2 languages: Introduce LanguageConverterFactory
Done:
* Replace LanguageConverter::newConverter by LanguageConverterFactory::getLanguageConverter
* Remove LanguageConverter::newConverter from all subclasses
* Add LanguageConverterFactory integration tests which cover all languages by their code.
* Caching of LanguageConverters in factory
* Make all tests run (hopefully that is enough)
* Uncomment the deprecated functions.
* Rename FakeConverter to TrivialLanguageConverter
* Create ILanguageConverter to have shared ancestor
* Make the LanguageConverter class abstract.
* Create table with mapping between lang code and converter instead of using a naming convention
* ILanguageConverter @internal
* Clean up code

Change-Id: I0e4d77de0f44e18c19956a1ffd69d30e63cf51bf
Bug: T226833, T243332
2020-02-03 11:38:03 +02:00
James D. Forrester
0958a0bce4 Coding style: Auto-fix MediaWiki.Usage.IsNull.IsNull
Change-Id: I90cfe8366c0245c9c67e598d17800684897a4e27
2020-01-10 14:17:13 -08:00
James D. Forrester
4f2d1efdda Coding style: Auto-fix MediaWiki.Classes.UnsortedUseStatements.UnsortedUse
Change-Id: I94a0ae83c65e8ee419bbd1ae1e86ab21ed4d8210
2020-01-10 09:32:25 -08:00
Umherirrender
8847aff551 Set method visibility on languages classes
Change-Id: I9dd10bbf81d277865301eccde73da17418df1238
2019-11-17 00:22:43 +01:00
Daimona Eaytoy
b5f0d61ee4 Fix new phan errors, part 8
Bug: T231636
Change-Id: I61852ba55362ab9ae8cc8c1ab6b27565ce1d08e7
2019-10-22 10:09:13 +02:00
Daimona Eaytoy
b1a5367ec8 Fix new phan errors, part 7
Bug: T231636
Change-Id: Ia5e0abee7163c5a1abd0bb53b89603cc2e7a9b5c
2019-10-21 22:10:20 +00:00
Umherirrender
f74400487f phan: Disable enable_class_alias_support
It is enabled for b/c in extensions, but not needed in core

Change-Id: I51dca12be9c77049f77563d9bf0edd07928c2300
2019-09-15 08:26:52 +00:00
Daimona Eaytoy
c659bc6308 Unsuppress another phan issue (part 7)
Bug: T231636
Depends-On: I2cd24e73726394e3200a570c45d5e86b6849bfa9
Depends-On: I4fa3e6aad872434ca397325ed7a83f94973661d0
Change-Id: Ie6233561de78457cae5e4e44e220feec2d1272d8
2019-09-03 17:19:21 +00:00
Daimona Eaytoy
e70b5b3309 Unsuppress other phan issues (part 4)
Bug: T231636
Depends-On: I58e67c2b38389df874438deada4239510d21654f
Change-Id: I6e5fba7bd273219b1206559420b5bdb78734aa84
2019-08-31 17:13:39 +00:00
Daimona Eaytoy
fb3428eb8f Unsuppress other phan issues with low count
And also update approximated counts, which for the most part are lower
than reported (hooray!)

Bug: T231636
Depends-On: Ica50297ec7c71a81ba2204f9763499da925067bd
Change-Id: I78354bf5f0c831108c8f606e50c87cf6bc00d8bd
2019-08-30 09:42:15 +00:00
Derick Alangi
339211a1ea Avoid usage of deprecated Revision::* constants, use RevisionRecord
Change-Id: I872fc89e5c02dd6a3ae9cd7e76640b95dc33f514
2019-07-21 15:03:03 +01:00
C. Scott Ananian
25512652d2 LanguageConverter performance: Reuse the same string object for regexp
The regular expression used by LanguageConverter::autoConvert() is a
constant, but it is being created on-the-fly by every invocation.
This causes an expensive full-string comparison when the compiled
regular expression is fetched from the cache -- since the regex is 332
bytes long, the time taken for this comparison can add up quickly: on a
page with a lot of tags, the regexp cache may spend more time looking
up the regexp than it takes to execute it.

Bug: T223969
Change-Id: I53c3e631e47a791cf3f0844dd79d4357605c59e3
2019-07-02 14:32:01 -04:00
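
As a rough illustration of the idea in 25512652d2 (not the actual PHP change), the sketch below keeps one pattern object alive for the whole process instead of rebuilding it per call. The pattern and the names `_SKIP_RE` and `split_convertible` are hypothetical stand-ins; the real regex is the 332-byte constant mentioned above.

```python
import re

# Hypothetical, much shorter stand-in for the constant regex.
# Compiled once at import time, so every call reuses the same object
# instead of rebuilding the pattern string and looking it up again.
_SKIP_RE = re.compile(r"<[^>]*>|&[a-zA-Z#][a-z0-9]*;")


def split_convertible(text: str) -> list[str]:
    """Return the runs of text that lie outside tags and entities."""
    return [piece for piece in _SKIP_RE.split(text) if piece]
```

Calling `split_convertible("a<b>c&amp;d")` yields the three text runs between the tag and the entity.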
C. Scott Ananian
930efa63e1 Improve LanguageConverter performance on pages with many HTML tags
We were concatenating a single character to the end of the wikitext
source (which copies the entire string) every time through an inner
loop; when the page was large and the loop count was large this took
an excessive amount of time.

Bug: T223969
Change-Id: Ib80306b0bc6c73b750d492764f0e2dfd3a7a5450
2019-07-02 14:24:45 -04:00
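
The pattern described in 930efa63e1 can be sketched generically as follows. This is an assumption-laden illustration in Python, not the MediaWiki code: `SENTINEL`, `tag_offsets_slow`, and `tag_offsets_fast` are hypothetical names, and the loop body is simplified to a search for `<`.

```python
SENTINEL = "\x00"  # hypothetical end marker, not taken from the MediaWiki source


def tag_offsets_slow(text: str) -> list[int]:
    """Find every '<', re-appending the sentinel on each pass through the loop."""
    offsets = []
    pos = 0
    while True:
        padded = text + SENTINEL  # copies the whole string every iteration
        pos = padded.find("<", pos)
        if pos == -1:
            break
        offsets.append(pos)
        pos += 1
    return offsets


def tag_offsets_fast(text: str) -> list[int]:
    """Same result, but the sentinel is appended once, outside the loop."""
    padded = text + SENTINEL  # single copy
    offsets = []
    pos = padded.find("<")
    while pos != -1:
        offsets.append(pos)
        pos = padded.find("<", pos + 1)
    return offsets
```

Both functions return the same offsets; the slow variant does one full-string copy per tag, so its cost grows with page size times tag count.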
Kunal Mehta
7bd9073c4b Fix/suppress misc phan errors (#2)
* Title: phan false positive
* McrUndoAction: fixed improper use of @param
* UploadSourceAdapter: fixed wrong type
* XmlTypeCheck: Use null so phan doesn't think we're trying to call the
function ''
* Database: phan false positive
* SpecialBlock: Use phan's advanced type documentation so phan knows
specifically what's being returned
* ChangesListSpecialPage: phan false positive
* BatchRowUpdate: Have default callback take a parameter so phan doesn't
think too many arguments are being passed
* MimeAnalyzer: left FIXME for relying on PHP 7.1 unpack() signature
* LanguageConverter: Specify types for $mTables since phan couldn't
determine it automatically
* preprocessorFuzzTest: Implement User::load() method signature

Change-Id: I08080ab636c5fe67ea6a4e14b2212d7523606e21
2019-04-05 16:12:18 -07:00
Bill Pirkle
d993f499ee Refactor calls to deprecated function Content::getNativeData()
Function Content::getNativeData() was deprecated.  Replace with
calls to new function TextContent::getText() in most places.

Bug: T155582
Change-Id: I2bd508c72aac4faf474ba45ab1f92e2e8d2eb9be
2019-02-15 17:48:01 +00:00
Kunal Mehta
cc5d9a92a2 build: Updating mediawiki/mediawiki-codesniffer to 24.0.0
Change-Id: I66b1775b7c1d36076d9ca78cbeb42787a743f2aa
2019-02-07 18:39:42 +00:00
jenkins-bot
7ddab17aac Merge "Accept BCP 47 codes in LanguageConverter rules" 2018-11-27 18:49:25 +00:00
Fomafix
512aa4e551 Use PHP 7 '??' operator instead of if-then-else
Change-Id: Ia86f8433f30a166d38ee63d0d1745b26740767b9
2018-10-27 23:46:13 +02:00
C. Scott Ananian
fcbde8ae4e Make Language::hasVariant() more strict
In d59f27aeab we made
LanguageConverter::validateVariant() try harder to convert a variant
into an acceptable MediaWiki-internal form, looking at deprecated
codes and BCP 47 aliases.  However, this misled Language::hasVariant()
into thinking that bogus names (like all-uppercase strings) were
acceptable variant names, which then led to exceptions when they were
passed to the various conversion methods.

This is a belt-and-suspenders patch for T207433 -- in that case we
shouldn't have created a Language object with code 'sr-cyrl' in the
first place, but once one was created we shouldn't have tried to
ask LanguageSr to convert texts to 'sr-cyrl'.  The latter problem
is fixed by this patch.

Bug: T207433
Change-Id: Id993bc7989144b5031a551662e8e492bd23f698a
2018-10-22 16:35:26 -04:00
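
The distinction drawn in fcbde8ae4e between lenient validation and a strict membership check might look roughly like this. It is a sketch under stated assumptions: the variant list, the alias table, and both function names are hypothetical Python stand-ins, and treating "strict" as "the code is already in canonical form" is an assumption about the intent rather than a copy of the PHP implementation.

```python
from typing import Optional

# Hypothetical data for a Serbian converter; the internal codes sr-ec and
# sr-el appear in this log, the alias table is an illustrative assumption.
VARIANTS = ("sr", "sr-ec", "sr-el")
BCP47_ALIASES = {"sr-cyrl": "sr-ec", "sr-latn": "sr-el"}


def validate_variant(code: Optional[str]) -> Optional[str]:
    """Leniently map a code to its internal form, or return None."""
    if code is None:
        return None
    lowered = code.lower()
    if lowered in VARIANTS:
        return lowered
    return BCP47_ALIASES.get(lowered)


def has_variant(code: Optional[str]) -> bool:
    """Strict check: true only if the code is already the internal form."""
    return code is not None and validate_variant(code) == code
```

With this split, `validate_variant("sr-Cyrl")` still resolves to `sr-ec`, while `has_variant("sr-Cyrl")` and `has_variant("SR-EC")` are both false, so only canonical codes reach the conversion methods.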
C. Scott Ananian
f7bb180fef Accept BCP 47 codes in LanguageConverter rules
Facilitate a gradual migration away from non-standard MediaWiki language
codes.  This will ensure that (a) rules can be written with standard
BCP 47 codes, and (b) rules written with existing nonstandard codes will
continue to work once these are added to
LanguageCode::$deprecatedLanguageCodeMapping.

Change-Id: I3ba96faafaf40bd47fb5919621f7035f0431a698
2018-10-16 23:58:11 -04:00
C. Scott Ananian
d59f27aeab Accept BCP 47 codes as aliases for nonstandard variants
The browser Accept-Language header uses BCP 47 codes, which don't
precisely match our internal mediawiki variant names in a number of
places.  Allow proper BCP 47 codes to alias our internal variants
for: Accept-Language parsing, URL parsing, user preferences, and
explicit enumeration of codes in LanguageConverter rules.

This is a replay of an earlier merged patch,
0818070c59, which had to be reverted
because it was based on 8380f0173e which
caused regressions in the Babel extension (T199941).

Change-Id: Ica89d9547c58967747ab0fa15d4e83be5378796d
2018-10-11 02:23:20 -04:00
Brian Wolff
8dbf6a7b31 Add taint annotation and warnings to Language::convert() et al
If you feed this method unescaped data, it can cause later calls
to be an XSS, which is something I think deserves a warning.

Bug: T202571
Change-Id: I34cb3da9232a22defffb80466263c2f2233822ef
2018-09-01 18:10:55 +00:00
RazeSoldier
24ffbd9bd1 Use "break" instead of "continue"
Inside a switch, "continue" statements behave like "break". In PHP 7.3, this usage generates a warning.

Bug: T200595
Change-Id: I244ecb2e1ce5a76295f014fb1becd8d263196846
2018-08-24 00:18:07 +08:00
Fomafix
73f94fd8cd Add type hint Language where possible
Also use ?? instead of ?: to check for null.

Change-Id: I058b61d7e06cdefecdafa82f60109cc386e2a809
2018-08-12 10:20:11 +02:00
Aryeh Gregor
90d4f56fe4 Mass conversion of $wgContLang to service
Brought to you by vim macros.

Bug: T200246
Change-Id: I79e919f4553e3bd3eb714073fed7a43051b4fb2a
2018-08-11 22:44:29 -06:00
Greg Grossmeier
dc282a46d7 Revert "Accept BCP 47 codes as aliases for nonstandard variants"
This reverts commit 0818070c59.

Reason for revert: Caused T199941

Bug: T199941
Change-Id: I24c178eb33890477de79cbb3122861c140578011
2018-07-23 16:44:55 +00:00
C. Scott Ananian
0818070c59 Accept BCP 47 codes as aliases for nonstandard variants
The browser Accept-Language header uses BCP 47 codes, which don't
precisely match our internal mediawiki variant names in a number of
places.  Allow proper BCP 47 codes to alias our internal variants
for: Accept-Language parsing, URL parsing, user preferences, and
explicit enumeration of codes in LanguageConverter rules.

Change-Id: I8468a56d5b88f5786abd0a17b67bda2f1687fd0c
2018-07-13 17:43:20 -04:00
Umherirrender
130ec2523d Fix PhanTypeMismatchDeclaredParam
Auto fix MediaWiki.Commenting.FunctionComment.DefaultNullTypeParam sniff

Change-Id: I865323fd0295aabd06f3e3c75e0e5043fb31069e
2018-07-07 00:34:30 +00:00
Fomafix
2feb1fccd4 LanguageConverter: Fix @return description
validateVariant returns null, not false.

Change-Id: I5241205da9f4d6266f09b361df856e50ddd96a7d
2018-06-20 18:36:49 +02:00
Kunal Mehta
230958d97c Autofix MediaWiki.Commenting.FunctionComment.SpacingDoc* errors
Change-Id: I63761ebce04c03b9b13237919c27cc10180f198f
2018-05-19 14:07:03 -07:00
jenkins-bot
236488d398 Merge "Add a hook into LanguageConverter#getPreferredVariant() to allow extensions to pull the desired variant from cookies (or other such source)" 2018-01-23 23:01:34 +00:00
Umherirrender
255d76f2a1 build: Updating mediawiki/mediawiki-codesniffer to 15.0.0
Clean up use of @codingStandardsIgnore
- @codingStandardsIgnoreFile -> phpcs:ignoreFile
- @codingStandardsIgnoreLine -> phpcs:ignore
- @codingStandardsIgnoreStart -> phpcs:disable
- @codingStandardsIgnoreEnd -> phpcs:enable

For phpcs:disable, the necessary sniffs are always provided.
Some start/end pairs are changed to line ignores.

Change-Id: I92ef235849bcc349c69e53504e664a155dd162c8
2018-01-01 14:10:16 +01:00
tjones
a0b511319c Crimean Tatar Transliteration
This is a first pass at Latin/Cyrillic transliteration for Crimean
Tatar (crh).

Includes transliteration tables, prefix/suffix mappings, regex
mappings, and exceptions lists for words and abbreviations.

Regularize CRH language name in messages/* files.

Fix "varient" typos in qqq.json.

Add unit tests for CRH transliteration.

Bug: T23582
Change-Id: I424703f99adf837f6217872b882d1ea26bfdd068
2017-11-20 16:56:38 -05:00
Brian Wolff
4acbbf0972 Follow-up I077d30c50 fix phpcs error
Change-Id: I28cb7060d6149d96ceb0dcad7e2bff2ed3434411
2017-11-15 06:56:38 +00:00
Brian Wolff
f0555bab3d Fix language converter parser test with self-close tags
This fixes an issue in f21f3942 where if there was an html
element with an alt or title attribute containing an <
entity, an ascii EOT control character (0x04) may become
inserted into the text if language converter was enabled.

Due to a really old bug in language converter, self-closed tags
got turned into non-self closed tags. However, due to a different
bug which was fixed in f21f3942, this code path was rarely taken,
so nobody noticed until now.

Follow-up Idbc45cac12

Bug: T180552
Change-Id: I077d30c50fcb419837fef937d27caca307153d2d
2017-11-15 06:03:22 +00:00
Reedy
f600b4ede9 Fix phpcs issues from LanguageConverter patches
Change-Id: I34e57c90ffd40fbd9f8afe3c57dd73fa7f655841
2017-11-15 03:37:27 +00:00
Brian Wolff
f21f3942eb SECURITY: Handle -{}- syntax in attributes safely
Previously, if one had an attribute with the contents
"-{}-foo-{}-", foo would get replaced by language converter as if
it wasn't in an attribute. This led to an XSS attack.

This breaks doing manual conversions in url href's (or any
other attribute that goes through an escaping method
other than Sanitizer's). e.g. http://{sr-el:foo';sr-ec:bar}.com
won't work anymore. See also T87332

Bug: T119158
Change-Id: Idbc45cac12c309b0ccb4adeff6474fa527b48edb
2017-11-15 03:33:03 +00:00
Brian Wolff
fbe78cfa09 SECURITY: XSS in langconverter when regex hits pcre.backtrack_limit
Adjust regexes for what not to convert to avoid backtracking by
preferring possessive quantifiers

Add check that we really have matched to the end of the string, and
log error if the regex hits some sort of error preventing the
entire string from being matched. Should the regex not match to the
end, then language conversion is disabled for the string.

Bug: T124404
Change-Id: I4f0c171c7da804e9c1508ef1f59556665a318f6a
2017-11-15 03:33:03 +00:00
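
The "matched to the end of the string" safeguard from fbe78cfa09 can be sketched as below. This is a simplified Python illustration, not the PHP patch: the tokenizer pattern and the names `_TOKEN_RE` and `convert_outside_tags` are hypothetical, and possessive quantifiers are left out to keep the sketch short.

```python
import logging
import re

logger = logging.getLogger(__name__)

# Hypothetical tokenizer: either a complete tag or a run of non-tag text.
_TOKEN_RE = re.compile(r"<[^<>]*>|[^<]+")


def convert_outside_tags(text, convert):
    """Apply convert() to text outside tags, or return text untouched
    if the tokenizer fails to cover the whole string."""
    pieces = []
    pos = 0
    for match in _TOKEN_RE.finditer(text):
        if match.start() != pos:
            break  # the regex skipped something: stop trusting it
        token = match.group()
        pieces.append(token if token.startswith("<") else convert(token))
        pos = match.end()
    if pos != len(text):
        # Fail closed: log and disable conversion for this string.
        logger.warning("tokenizer stopped at offset %d of %d", pos, len(text))
        return text
    return "".join(pieces)
```

For well-formed input such as `"a<b>c"` the text runs are converted and the tag is preserved; for input the tokenizer cannot fully consume, such as `"a<b"`, the original string is returned unchanged.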
Umherirrender
f739a8f368 Improve some parameter docs
Add missing @return and @param to function docs and fixed some @param

Change-Id: I810727961057cfdcc274428b239af5975c57468d
2017-09-10 20:32:31 +02:00
Jack Phoenix
43da7fb884 Add a hook into LanguageConverter#getPreferredVariant() to allow extensions to pull the desired variant from cookies (or other such source)
Example implementation using this hook: wikiHow's ChineseVariantSelector
extension, installed on zh.wikihow.com, which uses cookies to store the
preferred language variant, allowing anonymous users to change the
language variant without registering/logging in.

Change-Id: I5295a26578b45a8d51f2b7550938088fec18404f
2017-07-23 16:35:09 +03:00