U+0000 is not allowed in HTML5, there's no reason to allow it in wikitext.
It simplifies our code if we can just strip them at the start. Strip in
PST as well so they don't sneak into our database either.
Tweaked the EXT_LINK URLs to account for the fact that invalid characters
get transformed into U+FFFD when using Preprocessor_DOM. See 73649741ed
(r65967) for context on that change.
Bug: T159174
Change-Id: I3f67e92b61aacc87a40c3662085c84d1dac08bfb
It's unreasonable to expect newbies to know that "bug 12345" means "Task T14345"
except where it doesn't, so let's just standardise on the real numbers.
Change-Id: Id2f9d229d17b8eee66b2ca4e3927f3f66ac62988
I was bored. What? Don't look at me that way.
I mostly targetted mixed tabs and spaces, but others were not spared.
Note that some of the whitespace changes are inside HTML output,
extended regexps or SQL snippets.
Change-Id: Ie206cc946459f6befcfc2d520e35ad3ea3c0f1e0
A "remove HTML tags to avoid disrupting the layout" block is removed
(previously added in f16d1e4ed7).
This is a follow-up to I9b099273203482ffb570a5654d8ba50c833e526d.
Bug: T54192
Change-Id: I565fac58b3b0da7bfaedf64f5001c364f52e2244
Ideally LanguageConverter shouldn't be relying on global state at all.
But as a first step let's make it not try to use the global state when
that global state isn't even there.
Bug: T127233
Change-Id: I391cef3ec211d648b078fc509e0139daa58eb875
I searched for /\$(\S+) = (.+?\(.*?\);)\n.*?\$\1\[/, ignored
everything involving isset(), unset() or array assigments, then
skimmed through the remaining results and changed things where they
made sense. These changes were not automated, so please review them.
Change-Id: Ib37b4c66fc57648470f151ad412210b3629c2538
Previously various language objects would install a hook to update the
shared conversion table cache when the object was constructed. This is
not a good idea since language objects may be constructed even when they
are not the content language, but only the content language is
associated with variant conversion and the conversion cache.
Instead, have WikiPage call a method on $wgContLang directly. I put this
with message cache update since the logic is almost identical.
Change-Id: Ief9c0ef993e39645e74a6e158cb4e6e2139ce91d
* This can avoid MessageCache::load() calls on another
language due to variants. The convertNamespace() method
takes up a significant amount of time for 404 pages.
Change-Id: I4551d5b8e5b5a0bc01d02702b80f93591fc19440
Generating one-time, unique strip markers hurts us in multiple ways:
* The strip marker regexes don't benefit from JIT compilation, so they are
slower to execute than they could be.
* Although the regexes don't benefit from JIT compilation, they are still
compiled, because HHVM bets on regexes getting reused. This extra work is
fairly costly (1-2% of CPU usage on the app servers) and doesn't pay off.
* The size of the PCRE JIT cache is finite, and the caching of one-off regexes
displaces from the cache regexes which are in fact reused.
Tim's preferred solution (per his review comment on
https://gerrit.wikimedia.org/r/167530/) is to use fixed strip markers.
So:
* Replace usage of $parser->mUniqPrefix with Parser::MARKER_PREFIX, which
complements the existing Parser::MARKER_SUFFIX.
* Deprecate Parser::mUniqPrefix and its accessor, Parser::uniqPrefix().
* Deprecate Parser::getRandomString(), since it is no longer useful.
* In Preprocessor_*:preprocessToObj() and Parser::fetchTemplateAndTitle,
replace any occurences of \x7f with '?', to prevent strip marker forgery.
\x7f is not valid input anyway.
* Deprecate the $prefix parameter for StripState::__construct, since a custom
prefix may no longer be specified.
Change-Id: I31d4556bbb07acb72c33fda335fa5a230379a03f
Instead of instantiating this on every single request. Removes
wfGetLangConverterCacheStorage() and $wgLangConvMemc which were
otherwise unused.
Change-Id: Ic500944a92c2a94bc649e1b492c33714d81dca00
Xhprof generates this data now. Custom profiling of various
sub-function units are kept.
Calls to profiler represented about 3% of page execution
time on Special:BlankPage (1.5% in/out); after this change
it's down to about 0.98% of page execution time.
Change-Id: Id9a1dc9d8f80bbd52e42226b724a1e1213d07af7
There are so many slightly different understandings of what a
"section" is or can be. I'm aware the documentation was improved
just a few weeks ago. I still find it incomplete and confusing.
1. I renamed it to $sectionId to make it more clear what it
really is.
2. Sections are usually numbers. 0, 1 and so on. There is no
reason to disallow the use of ints or even floats (this works
because the string representation of 0.0 is "0"). The code never
disallowed numbers.
3. 'T1' never was supported, as far as I can tell. 'T-1' is
supported. See Parser::extractSections().
4. null and false and '' all mean "the whole page" in
WikiPage::replaceSectionAtRev() but for some reason this meaning got
lost in WikitextContent::replaceSection(). I made it the same again.
Change-Id: Icc3997722d2ed742bf7703cd7c06d09199225720
Swapped some "$var type" to "type $var" or added missing types
before the $var. Changed some other types to match the more common
spelling. Makes beginning of some text in captial.
Change-Id: I7a4dec6a8de96ee21ef34e52bb755f723aa3b0e6
Also swapped some "$var type" to "type $var" or added missing types
before the $var. Changed some other types to match the more common
spelling. Makes beginning of some text in captial.
Change-Id: Ic36c8c7820a6c2d603f1138130670c6bf6a1ca59
This toggle was introduced in 8d06ad6e, but the most useful feature for
human users there (disabling <h1> conversion on a per-user basis) has
been dropped due to cache fragmentation. The only remaining part is not
quite useful and can be covered by the URL parameter &linkconvert=no.
Change-Id: I12f2cdc9b0d44d6e47487b14fa8ef010de5c94a7
Split the variable assignment and the return statement in two lines for
better readability.
When there was two return statements in one method the logic was swapped
to have only one return statement.
Change-Id: Id7a01b4a2df96036435f9e1a9be5678dd124b0af
The Line continuation Coding conventions prefers the closing parenthesis
on the same line than the beginning curly braces. This is done for ifs
and functions.
Also move some boolean operator from the end of a line to the beginning
and changed some indentation to make the condition hopefully better
readable.
Change-Id: Id0437b06bde86eb5a75bc59eefa19e7edb624426
- Place commas correct
- Moved comments
- Add space after if/foreach/catch
- Reformat some conditions
- Removed trailing spaces/tabs
Change-Id: I40ccda72c418c4a33fcd675773cb08d971510cdb
The math specific functions in core are not needed
anymore and should be removed in future versions.
Math can access these settings in the same way as
all other extensions do.
Since Math 2.0 the rendered element has the property
"markerType" => 'nowiki'
Change-Id: I20d3714bed9da864146f133a08cf4ca90eda42ab
Now with the introduction of page language, a site can have pages
in all languages, and different languages have different variants.
This patch allows users to set preferred variants for every page
they may see on the site.
Change-Id: Ie7e82bee0b1f8f902b38bb4a464cf0ebc4df4d89
Currently if the site language is zh and a user is using variant zh-tw,
namespace names from zh-hant are displayed because of the language
converter, but they're not accepted by MediaWiki as valid namespace names
by default because zh falls back to zh-hans.
For core namespaces, all converted namespace names are manually added as
$namespaceAliases in MessagesZh.php but it's not always done in extensions.
With this patch converted namespace names are automatically added as
namespace aliases when namespace aliases are loaded.
In some followup commit it makes sense to remove existing core namespace
aliases which were created for this reason.
Change-Id: I01873d9c64a9943afbb655d6203cec9ebd39fb72
Using the following command line, I have found doc comments mentioning
"1.21" when they should mention "1.22" instead, which I have fixed
manually:
git diff REL1_21 | grep --color=always -C 10 -iE \
'^\+.*(since|deprecated).*1\.21(\D|$)' | aha > oldver.html
I also moved the release notes for I1987190f ("Combine JavaScript and
JSON encoding logic") from RELEASE-NOTES-1.21 to RELEASE-NOTES-1.22
because I had reverted the commit on REL1_21 only (see Id3b88102 and
bug 47431 for the rationale).
Change-Id: I11b917a371e07267dfa98b8449776d0c1cb29b15
Follows-Up: I25cf5a94f6e47f85a9d0b80cc1c9c9f957288478
Follows-Up: I3d72e4105f6244b0695116940e62a2ddef66eb66
Follows-Up: I3faa9c3e8107c6e46cdf21f8c18adda1f42890d7
Follows-Up: I6aab19c8d68bf47beddad42632b0360a7b12f251
Follows-Up: I86368821fc2cd0729df5342b8572eb470c0f77a0
Follows-Up: Id3b88102e768318e3605a19e9952121091a40915
Follows-Up: Ie667088010e24eb6cb569f9e8e8e2553005223eb
Follows-up 97caae596d which makes HTML5 the default
and removes support for XHTML 1.0 and HTML < 5.
* <script type>
* <style type>
* <html xmlns>
* Quick-closing slash in non-XML HTML5 documents
Change-Id: I71855fa8d4095a5a448ebdc3dc36506ddab6f70c
Currently the main language code means "no conversion",
but may technically be used as a standalone variant (and
usually falls back to another variant). However for the
page title, since there can be a nice unconverted version
from real page name, it seems better to provide people who
choose "no conversion" with this unconverted version, rather
than a fallback from a variant which is always converted.
If a string targeting exactly the main language code exists
(ie. no fallbacks needed), it's still used. Because the
magic word DISPLAYTITLE cannot be used together with
language conversion syntax and people must use -{T| }- to
show a modified display title for "no conversion" version.
Change-Id: I25cf5a94f6e47f85a9d0b80cc1c9c9f957288478
After Iaaf6cceb, MessageCache::get() goes through the fallback chain,
which is unwanted for conversion tables (for example, we don't want
zh-hans table for zh, where zh means "no conversion"). Since the only
needed feature is to fetch text from MediaWiki namespace (conversion
tables in PHP are stored somewhere else), it is now changed to use
MessageCache::getMsgFromNamespace() instead to avoid fallbacks.
Change-Id: I46e0be31c9c0fe0a6e4923fc1aff0fbbadbf1d67