LanguageConverter::getPreferredVariant() may be called during session
initialization, triggered by a call to User::isUsableName(). If that
happens, the $wgUser variable cannot be used since it hasn't been
initialized at this point.
The code previously called $wgUser->isSafeToLoad() to make sure the
User object can be initialized. However, it assumed that the User
object itself is already present, which is not necessarily the
case.
Bug: T235360
Change-Id: I62bd09d7d09b4d82d251fc6010f0dcc11f21c047
While passing $wgUser just moves the problem, it reduces the number
of references to the global state by eliminating ::getUserVariant's
reference.
Bug: T243708
Change-Id: I29f264d5dd69284decec4b83da64274262b28eab
Replace usage of Title in LanguageConverter with LinkTarget, which
is more lightweight and provides just the properties needed in Language.
Bug: T226834
Change-Id: I02a386bd9898e83c773cbd3d738d347d08f52c11
This follows up d83fcce5cb, which did something similar for
includes/profiler/.
* Ensure presence of license header.
* Merge any file-level descriptions with the class block,
where it gets seen in generated docs about that class.
* Add any missing `@ingroup` tags to class blocks.
* Remove remaining `@ingroup` from file blocks.
These clutter the Doxygen pages with duplicate entries.
* Fix some misspelled words from 61e0908fa2 and f136c2953c.
Change-Id: I5d21ec159766b799ba519da951d4f0716bae5f9f
Done:
* Replace LanguageConverter::newConverter by LanguageConverterFactory::getLanguageConverter
* Remove LanguageConverter::newConverter from all subclasses
* Add LanguageConverterFactory integration tests that cover all languages by their code.
* Caching of LanguageConverters in factory
* Make all tests pass (hopefully that is enough)
* Uncomment the deprecated functions.
* Rename FakeConverter to TrivialLanguageConverter
* Create ILanguageConverter to have shared ancestor
* Make the LanguageConverter class abstract.
* Create a table mapping each language code to its converter instead of relying on a naming convention
* Mark ILanguageConverter as @internal
* Clean up code
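
For illustration only, the factory shape described above (an explicit
code-to-class table, per-code caching, and a trivial fallback) can be
sketched as follows. This is Python rather than the PHP in the patch, and
every name in it (ConverterFactory, UpperCaseConverter, the "xx-demo" code)
is hypothetical, not the MediaWiki API.

```python
class TrivialConverter:
    """Stand-in for a converter that performs no conversion."""

    def __init__(self, code):
        self.code = code

    def convert(self, text):
        return text


class UpperCaseConverter(TrivialConverter):
    """Toy converter, present only to give the table a non-trivial entry."""

    def convert(self, text):
        return text.upper()


class ConverterFactory:
    # Explicit code -> class table, instead of deriving a class name
    # from the language code. The single entry is made up.
    CONVERTERS = {"xx-demo": UpperCaseConverter}

    def __init__(self):
        self._cache = {}

    def get_converter(self, code):
        # Build each converter at most once per language code.
        if code not in self._cache:
            cls = self.CONVERTERS.get(code, TrivialConverter)
            self._cache[code] = cls(code)
        return self._cache[code]
```

Codes without a table entry fall back to the trivial converter, so callers
always get an object with the same interface.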
Change-Id: I0e4d77de0f44e18c19956a1ffd69d30e63cf51bf
Bug: T226833, T243332
And also update approximated counts, which for the most part are lower
than reported (hooray!)
Bug: T231636
Depends-On: Ica50297ec7c71a81ba2204f9763499da925067bd
Change-Id: I78354bf5f0c831108c8f606e50c87cf6bc00d8bd
The regular expression used by LanguageConverter::autoConvert() is a
constant, but it is being rebuilt on the fly on every invocation.
This causes an expensive full-string comparison when the compiled
regular expression is fetched from the cache. Since the regex is 332
bytes long, the time taken for this comparison can add up quickly: on
a page with a lot of tags, the regexp cache may spend more time looking
up the regexp than it takes to execute it.
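
The general remedy is to build the pattern once and reuse it. A minimal
Python sketch of that idea, not the actual patch: SKIP_PATTERN is a short
stand-in for the real 332-byte expression, and split_convertible is a
hypothetical helper.

```python
import re

# Compiled once at import time and reused by every call. The pattern text
# is a stand-in for illustration, not the expression used by autoConvert().
SKIP_PATTERN = re.compile(r"<[^>]*>|&[a-zA-Z#][a-z0-9]*;")


def split_convertible(text):
    """Return the runs of text that lie outside tags and entities."""
    return [part for part in SKIP_PATTERN.split(text) if part]
```

Because the compiled object is held in a module-level constant, calls no
longer rebuild the pattern string or look it up by value.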
Bug: T223969
Change-Id: I53c3e631e47a791cf3f0844dd79d4357605c59e3
We were concatenating a single character to the end of the wikitext
source (which copies the entire string) every time through an inner
loop; when the page was large and the loop count was large this took
an excessive amount of time.
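
One common way to avoid this kind of cost is to leave the source string
untouched and handle the end of input explicitly. The Python sketch below
shows that pattern only; it is not the code from this patch, and
split_on_marker and its default marker are assumptions made for the example.

```python
def split_on_marker(text, marker="-{"):
    """Split text at each marker without modifying or copying `text` itself."""
    parts = []
    pos = 0
    while True:
        hit = text.find(marker, pos)
        if hit == -1:
            # No more markers: take the tail and stop, rather than
            # appending a sentinel character to the source.
            parts.append(text[pos:])
            break
        parts.append(text[pos:hit])
        pos = hit + len(marker)
    return parts
```

Each iteration only advances an offset, so the work per iteration does not
grow with the size of the page.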
Bug: T223969
Change-Id: Ib80306b0bc6c73b750d492764f0e2dfd3a7a5450
* Title: phan false positive
* McrUndoAction: fixed improper use of @param
* UploadSourceAdapter: fixed wrong type
* XmlTypeCheck: Use null so phan doesn't think we're trying to call the
function ''
* Database: phan false positive
* SpecialBlock: Use phan's advanced type documentation so phan knows
specifically what's being returned
* ChangesListSpecialPage: phan false positive
* BatchRowUpdate: Have default callback take a parameter so phan doesn't
think too many arguments are being passed
* MimeAnalyzer: left FIXME for relying on PHP 7.1 unpack() signature
* LanguageConverter: Specify types for $mTables since phan couldn't
determine them automatically
* preprocessorFuzzTest: Implement User::load() method signature
Change-Id: I08080ab636c5fe67ea6a4e14b2212d7523606e21
Function Content::getNativeData() was deprecated. Replace with
calls to new function TextContent::getText() in most places.
Bug: T155582
Change-Id: I2bd508c72aac4faf474ba45ab1f92e2e8d2eb9be
In d59f27aeab we made
LanguageConverter::validateVariant() try harder to convert a variant
into an acceptable MediaWiki-internal form, looking at deprecated
codes and BCP 47 aliases. However, this misled Language::hasVariant()
into thinking that bogus names (like all-uppercase strings) were
acceptable variant names, which then led to exceptions when they were
passed to the various conversion methods.
This is a belt-and-suspenders patch for T207433 -- in that case we
shouldn't have created a Language object with code 'sr-cyrl' in the
first place, but once one was created we shouldn't have tried to
ask LanguageSr to convert texts to 'sr-cyrl'. The latter problem
is fixed by this patch.
Bug: T207433
Change-Id: Id993bc7989144b5031a551662e8e492bd23f698a
Facilitate a gradual migration away from non-standard MediaWiki language
codes. This will ensure that (a) rules can be written with standard
BCP 47 codes, and (b) rules written with existing nonstandard codes will
continue to work once these are added to
LanguageCode::$deprecatedLanguageCodeMapping.
Change-Id: I3ba96faafaf40bd47fb5919621f7035f0431a698
The browser Accept-Language header uses BCP 47 codes, which don't
precisely match our internal MediaWiki variant names in a number of
places. Allow proper BCP 47 codes to alias our internal variants
for: Accept-Language parsing, URL parsing, user preferences, and
explicit enumeration of codes in LanguageConverter rules.
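
As a rough illustration of the aliasing, here is a Python sketch of a
case-insensitive lookup that accepts either an internal variant name or a
BCP 47 tag. The two tables and resolve_variant are hypothetical and only
cover a Serbian example; they are not the mapping shipped with this patch.

```python
# Hypothetical alias table: lowercase BCP 47 tag -> internal variant name.
BCP47_ALIASES = {"sr-cyrl": "sr-ec", "sr-latn": "sr-el"}

# Hypothetical set of internal variant names for one language.
INTERNAL_VARIANTS = {"sr", "sr-ec", "sr-el"}


def resolve_variant(tag):
    """Map a user-supplied tag to an internal variant name, or None."""
    key = tag.lower()
    if key in INTERNAL_VARIANTS:
        return key
    # Unknown tags yield None instead of being passed through.
    return BCP47_ALIASES.get(key)
```

Returning None for unknown tags keeps bogus names from reaching the
conversion methods.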
This is a replay of an earlier merged patch,
0818070c59, which had to be reverted
because it was based on 8380f0173e which
caused regressions in the Babel extension (T199941).
Change-Id: Ica89d9547c58967747ab0fa15d4e83be5378796d
If you feed this method unescaped data, later calls can become an
XSS vector, which I think deserves a warning.
Bug: T202571
Change-Id: I34cb3da9232a22defffb80466263c2f2233822ef
Inside a switch, "continue" statements are equivalent to "break". In PHP 7.3, this generates a warning.
Bug: T200595
Change-Id: I244ecb2e1ce5a76295f014fb1becd8d263196846
The browser Accept-Language header uses BCP 47 codes, which don't
precisely match our internal MediaWiki variant names in a number of
places. Allow proper BCP 47 codes to alias our internal variants
for: Accept-Language parsing, URL parsing, user preferences, and
explicit enumeration of codes in LanguageConverter rules.
Change-Id: I8468a56d5b88f5786abd0a17b67bda2f1687fd0c
Clean up use of @codingStandardsIgnore
- @codingStandardsIgnoreFile -> phpcs:ignoreFile
- @codingStandardsIgnoreLine -> phpcs:ignore
- @codingStandardsIgnoreStart -> phpcs:disable
- @codingStandardsIgnoreEnd -> phpcs:enable
For phpcs:disable, the necessary sniffs are always specified.
Some start/end pairs are changed to a line ignore.
Change-Id: I92ef235849bcc349c69e53504e664a155dd162c8
This is a first pass at Latin/Cyrillic transliteration for Crimean
Tatar (crh).
Includes transliteration tables, prefix/suffix mappings, regex
mappings, and exceptions lists for words and abbreviations.
Regularize CRH language name in messages/* files.
Fix "varient" typos in qqq.json.
Add unit tests for CRH transliteration.
Bug: T23582
Change-Id: I424703f99adf837f6217872b882d1ea26bfdd068
This fixes an issue in f21f3942 where, if there was an HTML
element with an alt or title attribute containing an <
entity, an ASCII EOT control character (0x04) could be
inserted into the text if language converter was enabled.
Due to a really old bug in language converter, self-closed tags
got turned into non-self-closed tags. However, due to a different
bug which was fixed in f21f3942, this code path was rarely taken,
so nobody noticed until now.
Follow-up Idbc45cac12
Bug: T180552
Change-Id: I077d30c50fcb419837fef937d27caca307153d2d
Previously, if one had an attribute with the contents
"-{}-foo-{}-", foo would get replaced by language converter as if
it wasn't in an attribute. This led to an XSS attack.
This breaks doing manual conversions in URL hrefs (or any
other attribute that goes through an escaping method
other than Sanitizer's), e.g. http://{sr-el:foo';sr-ec:bar}.com
won't work anymore. See also T87332.
Bug: T119158
Change-Id: Idbc45cac12c309b0ccb4adeff6474fa527b48edb
Adjust regexes for what not to convert to avoid backtracking by
preferring possessive quantifiers.
Add a check that we really have matched to the end of the string, and
log an error if the regex hits some sort of error preventing the
entire string from being matched. Should the regex not match to the
end, then language conversion is disabled for the string.
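
The end-of-string check can be pictured with the Python sketch below. It
covers only the second half of this change (not the possessive
quantifiers), and TOKEN and convert_outside_tags are stand-ins invented for
the example, not the expressions or methods touched by the patch.

```python
import logging
import re

log = logging.getLogger("converter-sketch")

# Stand-in tokenizer: either a complete tag or a run of non-tag text.
TOKEN = re.compile(r"<[^<>]*>|[^<>]+")


def convert_outside_tags(text, convert):
    """Apply `convert` outside tags; return `text` unchanged on a partial match."""
    out = []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            # The tokenizer could not reach the end of the string:
            # log it and disable conversion for this string.
            log.error("tokenizer stopped at offset %d of %d", pos, len(text))
            return text
        token = m.group(0)
        out.append(token if token.startswith("<") else convert(token))
        pos = m.end()
    return "".join(out)
```

On a partial match the caller gets the input back untouched, which is the
safe fallback described above.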
Bug: T124404
Change-Id: I4f0c171c7da804e9c1508ef1f59556665a318f6a
Example implementation using this hook: wikiHow's ChineseVariantSelector
extension, installed on zh.wikihow.com, which uses cookies to store the
preferred language variant, allowing anonymous users to change the
language variant without registering/logging in.
Change-Id: I5295a26578b45a8d51f2b7550938088fec18404f