Clean up use of @codingStandardsIgnore
- @codingStandardsIgnoreFile -> phpcs:ignoreFile
- @codingStandardsIgnoreLine -> phpcs:ignore
- @codingStandardsIgnoreStart -> phpcs:disable
- @codingStandardsIgnoreEnd -> phpcs:enable
For phpcs:disable always the necessary sniffs are provided.
Some start/end pairs are changed to line ignore
Change-Id: I92ef235849bcc349c69e53504e664a155dd162c8
This is a first pass at Latin/Cyrillic translitertion for Crimean
Tatar (crh).
Includes transliteration tables, prefix/suffix mappings, regex
mappings, and exceptions lists for words and abbreviations.
Regularize CRH language name in messages/* files.
Fix "varient" typos in qqq.json.
Add unit tests for CRH transliteration.
Bug: T23582
Change-Id: I424703f99adf837f6217872b882d1ea26bfdd068
This fixes an issue in f21f3942 where if there was an html
element with an alt or title attribute containing an <
entity, an ascii EOT control character (0x04) may become
inserted into the text if language converter was enabled.
Due to a really old bug in language converter, self-closed tags
got turned into non-self closed tags. However due a different
bug which was fixed in f21f3942 this code path was rarely taken
so nobody noticed until now.
Follow-up Idbc45cac12
Bug: T180552
Change-Id: I077d30c50fcb419837fef937d27caca307153d2d
Previously, if one had an attribute with the contents
"-{}-foo-{}-", foo would get replaced by language converter as if
it wasn't in an attribute. This lead to an XSS attack.
This breaks doing manual conversions in url href's (or any
other attribute that goes through an escaping method
other than Sanitizer's). e.g. http://{sr-el:foo';sr-ec:bar}.com
won't work anymore. See also T87332
Bug: T119158
Change-Id: Idbc45cac12c309b0ccb4adeff6474fa527b48edb
Adjust regexes for what not to convert to avoid backtracking by
preferring possesive quantifiers
Add check that we really have matched to the end of the string, and
log error if the regex hits some sort of error preventing the
entire string from being matched. Should the regex not match to the
end, then language conversion is disabled for the string.
Bug: T124404
Change-Id: I4f0c171c7da804e9c1508ef1f59556665a318f6a
Example implementation using this hook: wikiHow's ChineseVariantSelector
extension, installed on zh.wikihow.com, which uses cookies to store the
preferred language variant, allowing anonymous users to change the
language variant without registering/logging in.
Change-Id: I5295a26578b45a8d51f2b7550938088fec18404f
Make the LanguageConverter::reloadTables method actually private,
and use the TestingAccessWrapper to call it when running parser tests.
Follow-up to I65736520cd04bfe8949b29ade07338a6e1b88a4d.
Change-Id: I43b81b8fef6441ad50b858ff7757732ecb5eef91
Conversion rules defined in a previous test case were leaking into
subsequent test cases. Existing tests had worked around this by defining
non-overlapping rules, but it's better to just fix the problem at the
source.
Change-Id: I65736520cd04bfe8949b29ade07338a6e1b88a4d
Guarded by the $wgUsePigLatinVariant variable, off by default.
Pig Latin is a language game where words in English are altered
according to the following rules:
* Words starting with a vowel have a '-way' suffix appended.
* Words starting with a consonant have the initial consonants (or 'qu'
group) moved to the end and an '-ay' suffix appended.
https://en.wikipedia.org/wiki/Pig_Latin
* Added 'en-x-piglatin' as a language name.
* Added 'en' to LanguageConverter::$languagesWithVariants.
* Added LanguageEn class and its corresponding EnConverter which
provides one-way translation from English to Pig Latin.
* Some minor internal changes in code that assumed that English
doesn't have a language class or converter.
Bug: T45547
Depends-On: I1d9691c784032669979f8109c9a5f65cbf4122c9
Change-Id: I7fa2d85d6364958c5138366e8b4504a2697a8731
U+0000 is not allowed in HTML5, there's no reason to allow it in wikitext.
It simplifies our code if we can just strip them at the start. Strip in
PST as well so they don't sneak into our database either.
Tweaked the EXT_LINK URLs to account for the fact that invalid characters
get transformed into U+FFFD when using Preprocessor_DOM. See 73649741ed
(r65967) for context on that change.
Bug: T159174
Change-Id: I3f67e92b61aacc87a40c3662085c84d1dac08bfb
It's unreasonable to expect newbies to know that "bug 12345" means "Task T14345"
except where it doesn't, so let's just standardise on the real numbers.
Change-Id: Id2f9d229d17b8eee66b2ca4e3927f3f66ac62988
I was bored. What? Don't look at me that way.
I mostly targetted mixed tabs and spaces, but others were not spared.
Note that some of the whitespace changes are inside HTML output,
extended regexps or SQL snippets.
Change-Id: Ie206cc946459f6befcfc2d520e35ad3ea3c0f1e0
A "remove HTML tags to avoid disrupting the layout" block is removed
(previously added in f16d1e4ed7).
This is a follow-up to I9b099273203482ffb570a5654d8ba50c833e526d.
Bug: T54192
Change-Id: I565fac58b3b0da7bfaedf64f5001c364f52e2244
Ideally LanguageConverter shouldn't be relying on global state at all.
But as a first step let's make it not try to use the global state when
that global state isn't even there.
Bug: T127233
Change-Id: I391cef3ec211d648b078fc509e0139daa58eb875
I searched for /\$(\S+) = (.+?\(.*?\);)\n.*?\$\1\[/, ignored
everything involving isset(), unset() or array assigments, then
skimmed through the remaining results and changed things where they
made sense. These changes were not automated, so please review them.
Change-Id: Ib37b4c66fc57648470f151ad412210b3629c2538
Previously various language objects would install a hook to update the
shared conversion table cache when the object was constructed. This is
not a good idea since language objects may be constructed even when they
are not the content language, but only the content language is
associated with variant conversion and the conversion cache.
Instead, have WikiPage call a method on $wgContLang directly. I put this
with message cache update since the logic is almost identical.
Change-Id: Ief9c0ef993e39645e74a6e158cb4e6e2139ce91d
* This can avoid MessageCache::load() calls on another
language due to variants. The convertNamespace() method
takes up a significant amount of time for 404 pages.
Change-Id: I4551d5b8e5b5a0bc01d02702b80f93591fc19440
Generating one-time, unique strip markers hurts us in multiple ways:
* The strip marker regexes don't benefit from JIT compilation, so they are
slower to execute than they could be.
* Although the regexes don't benefit from JIT compilation, they are still
compiled, because HHVM bets on regexes getting reused. This extra work is
fairly costly (1-2% of CPU usage on the app servers) and doesn't pay off.
* The size of the PCRE JIT cache is finite, and the caching of one-off regexes
displaces from the cache regexes which are in fact reused.
Tim's preferred solution (per his review comment on
https://gerrit.wikimedia.org/r/167530/) is to use fixed strip markers.
So:
* Replace usage of $parser->mUniqPrefix with Parser::MARKER_PREFIX, which
complements the existing Parser::MARKER_SUFFIX.
* Deprecate Parser::mUniqPrefix and its accessor, Parser::uniqPrefix().
* Deprecate Parser::getRandomString(), since it is no longer useful.
* In Preprocessor_*:preprocessToObj() and Parser::fetchTemplateAndTitle,
replace any occurences of \x7f with '?', to prevent strip marker forgery.
\x7f is not valid input anyway.
* Deprecate the $prefix parameter for StripState::__construct, since a custom
prefix may no longer be specified.
Change-Id: I31d4556bbb07acb72c33fda335fa5a230379a03f
Instead of instantiating this on every single request. Removes
wfGetLangConverterCacheStorage() and $wgLangConvMemc which were
otherwise unused.
Change-Id: Ic500944a92c2a94bc649e1b492c33714d81dca00
Xhprof generates this data now. Custom profiling of various
sub-function units are kept.
Calls to profiler represented about 3% of page execution
time on Special:BlankPage (1.5% in/out); after this change
it's down to about 0.98% of page execution time.
Change-Id: Id9a1dc9d8f80bbd52e42226b724a1e1213d07af7
There are so many slightly different understandings of what a
"section" is or can be. I'm aware the documentation was improved
just a few weeks ago. I still find it incomplete and confusing.
1. I renamed it to $sectionId to make it more clear what it
really is.
2. Sections are usually numbers. 0, 1 and so on. There is no
reason to disallow the use of ints or even floats (this works
because the string representation of 0.0 is "0"). The code never
disallowed numbers.
3. 'T1' never was supported, as far as I can tell. 'T-1' is
supported. See Parser::extractSections().
4. null and false and '' all mean "the whole page" in
WikiPage::replaceSectionAtRev() but for some reason this meaning got
lost in WikitextContent::replaceSection(). I made it the same again.
Change-Id: Icc3997722d2ed742bf7703cd7c06d09199225720
Swapped some "$var type" to "type $var" or added missing types
before the $var. Changed some other types to match the more common
spelling. Makes beginning of some text in captial.
Change-Id: I7a4dec6a8de96ee21ef34e52bb755f723aa3b0e6
Also swapped some "$var type" to "type $var" or added missing types
before the $var. Changed some other types to match the more common
spelling. Makes beginning of some text in captial.
Change-Id: Ic36c8c7820a6c2d603f1138130670c6bf6a1ca59
This toggle was introduced in 8d06ad6e, but the most useful feature for
human users there (disabling <h1> conversion on a per-user basis) has
been dropped due to cache fragmentation. The only remaining part is not
quite useful and can be covered by the URL parameter &linkconvert=no.
Change-Id: I12f2cdc9b0d44d6e47487b14fa8ef010de5c94a7