To follow Message. This is approved as part of RFC T166010.
Also namespace it but doing it properly with PSR-4 would require
namespacing every class under language/ and that will take some time.
Bug: T321882
Change-Id: I195cf4c67bd51410556c2dd1e33cc9c1033d5d18
None are used in WMF-deployed extensions and have been hard deprecated
for multiple releases as well.
Change-Id: I62cfa22291f81295b4908192de8657a750c6716d
Part one, none of these hooks are used in extensions deployed in
production. I skipped any hook that has silenced its deprecation
warnings.
Change-Id: Idf1fd12cc61ca30867dc9f8aeb1701fe035fc5ff
PHP 8.0 changed the behavior of numeric comparisons such that
non-numeric strings no longer weakly equal 0.[1] This breaks the logic
in Parser::extractSections(), which was relying on the old comparison
behavior for section indexes and in turn causes the revisions API to
return a bogus 'nosuchsection' error when called with rvsection=new.
Fix the logic by explicitly casting the section index to a number, which
will yield the appropriate numeric section index for a numbered section
index and 0 for a non-numeric section index (like 'new'). Also add test
cases for the relevant API module.
--
[1] https://wiki.php.net/rfc/string_to_number_comparison
Change-Id: If32aa4d575cff66bd4eee56f9e3b0b0d9ba04fde
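The logic of the fix can be sketched as follows (a Python illustration of the PHP cast semantics, not the actual Parser::extractSections() code; the helper name is hypothetical):

```python
def section_index(section: str) -> int:
    """Roughly mimic PHP's (int) cast on a section parameter:
    numeric strings yield their value, non-numeric strings
    (like 'new') yield 0, as (int)'new' === 0 does in PHP."""
    try:
        return int(section)
    except ValueError:
        return 0

# Numbered section indexes keep their numeric value:
assert section_index('4') == 4
# Non-numeric indexes map to 0 explicitly, instead of relying on
# PHP 7's ('new' == 0) weak comparison, which PHP 8.0 removed:
assert section_index('new') == 0
```

The explicit cast makes the intended "non-numeric means section 0" behavior independent of PHP's comparison-semantics version.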
Bug: T323373
This is identical to Parser::getTargetLanguage() in modern MediaWiki,
since 7df3473cfe in MW 1.19 (2012).
Bug: T318860
Depends-On: If5fa696e27e84a3aa1343551d7482c933da0a9b6
Depends-On: I87a7ceedce173f6de4bb6722ffe594273c7b0359
Change-Id: Ieed03003095656e69b8e64ed307c6bd67c45c1e7
Many things can cause the failure here; it's most likely not the
lack of support for Unicode properties (which we also check for
in Installer::envCheckPCRE()).
Bug: T321467
Change-Id: I65b7f302c4c6708e0a0c056cfd722f0ef24bd09e
In external link syntax like `[http://example.org Example]`,
the space between link target and label is technically optional
when the label starts with characters not allowed in the URL,
such as `[http://example.org<b>Example</b>]`.
This is done with a regexp that matches a required opening bracket,
a required URL, optional spaces, optional label, and a required
closing bracket. The real regexp is messy to handle various characters
allowed in each part, but for illustration purposes, it's basically
the same as `\[([^\]\s\<]+) *([^\]\s]*?)\]`.
When given input that looks like a link, but doesn't have the closing
bracket, the regexp engine (PCRE) would therefore attempt matching
every possible combination of target and label lengths before failing:
Input: [http://example.org
| Target | Label |
| ------------------ | ----- |
| http://example.or | g |
| http://example.o | r |
| http://example.o | rg |
| http://example. | o |
| http://example. | or |
| http://example. | org |
| http://example | . |
| http://example | .o |
| http://example | .or |
| http://example | .org |
| http://exampl | e |
| http://exampl | e. |
| http://exampl | e.o |
| http://exampl | e.or |
| http://exampl | e.org |
…and so on. This would take (1 + 2 + 3 + … + 18) = 171 steps to fail
in this example, or `N * (N+1) / 2` steps in general. For sufficiently
large inputs this hits a limit designed to protect against exactly
this situation, and the whole wikitext parser crashes.
(To hit the pathological case, it's also required for a `]` to appear
somewhere later in the input, otherwise PCRE would detect that a match
is never possible and exit before doing any of the above.)
Live example: https://regex101.com/debugger/?regex=%5C%5B%28%5B%5E%5C%5D%5Cs%5C%3C%5D%2B%29%20%2A%28%5B%5E%5C%5D%5Cs%5D%2A%3F%29%5C%5D&testString=%5Bhttp%3A%2F%2Fexample.org%0A%5D
We can fix it by changing the lazy quantifier `*?` to the greedy `*`.
This is correct for this regexp only because the label isn't allowed
to contain ']' (otherwise, the first external link on the page would
consume all of the content until the last external link as its label).
This allows PCRE to only consider the cases where the label has the
maximum possible length:
| Target | Label |
| ------------------ | ----- |
| http://example.or | g |
| http://example.o | rg |
| http://example. | org |
| http://example | .org |
| http://exampl | e.org |
…and so on. Only 18 steps, or `N` steps in general.
Live example: https://regex101.com/debugger/?regex=%5C%5B%28%5B%5E%5C%5D%5Cs%5C%3C%5D%2B%29%20%2A%28%5B%5E%5C%5D%5Cs%5D%2A%29%5C%5D&testString=%5Bhttp%3A%2F%2Fexample.org%0A%5D
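The lazy-versus-greedy difference can also be demonstrated with the simplified pattern from above (shown in Python, whose backtracking engine behaves like PCRE here; this is the illustrative regexp, not MediaWiki's real one):

```python
import re

# Simplified external-link regexps: lazy label vs greedy label.
lazy = re.compile(r'\[([^\]\s<]+) *([^\]\s]*?)\]')
greedy = re.compile(r'\[([^\]\s<]+) *([^\]\s]*)\]')

text = '[http://example.org Example]'

# On well-formed input both variants capture the same target and
# label, because the label cannot contain ']' in either case:
assert lazy.search(text).groups() == ('http://example.org', 'Example')
assert greedy.search(text).groups() == ('http://example.org', 'Example')

# On input missing the closing bracket (with a ']' later), neither
# matches -- but the lazy variant tries every target/label split
# (~N^2/2 steps) before giving up, while the greedy one fails in
# roughly N steps:
broken = '[http://example.org\n]'
assert lazy.search(broken) is None
assert greedy.search(broken) is None
```

Since both variants accept and reject exactly the same inputs here, the change affects only how quickly the engine reaches its answer.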
I think this bug has been present since 2004, when external link
parsing was rewritten in badf11ffe6 (SVN r4579).
Bug: T321467
Change-Id: I993a10d9a90ab28cce61eba6beabee8a06a2d562
* Use @phan-var to suppress issues with version independence.
* In Parser strictly compare to false. I still think this is a Phan bug.
Bug: T322278
Change-Id: I654b73e5ed843474ed35c3780d95b04dce388bea
Introduced in PHP 7.3. I used it to replace reset()/end() followed by
key() where the return value of reset() is not captured and internal
pointer iteration is not locally occurring.
I also used it in a couple of places when reset() is absent but
array_key_first() is semantically desired.
Change-Id: I750d3fa71420cbdca5fb00d82ac5ca40821769d4
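For comparison, the idiom and its replacement can be sketched in Python (PHP arrays preserve insertion order, like Python dicts, so array_key_first()/array_key_last() correspond to taking the first or last key):

```python
# PHP arrays preserve insertion order, like Python dicts, so
# array_key_first()/array_key_last() amount to reading the first or
# last key without touching any iteration state:
d = {'b': 2, 'a': 1, 'c': 3}

first_key = next(iter(d))      # like array_key_first($d)
last_key = next(reversed(d))   # like array_key_last($d)

# The replaced idiom, reset($d) followed by key($d), additionally
# moved PHP's internal array pointer as a side effect -- which is
# why it was only swapped out where that pointer wasn't in use.
assert (first_key, last_key) == ('b', 'c')
```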
* Remove some use of `count()` in favour of boolean checks.
* Where trivial to do, add native return types to remove doubt
of the value possibly not being an array.
* Follow-up I7d97a4cdc9 (d4df7daa): mark more methods as private
that were wrongly marked public en masse when we added visibility
attributes, but which have no use outside core on a SpecialPage class
that generally has no use case for being extended or instantiated
outside core.
Change-Id: Iaf28b6132097fe34872c2a2da374ff00593ca6a9
* Use str_starts_with where appropriate
* Avoid confusing reuse of variable names
* Add comments explaining why we handle tag attributes in a funny way
* Improve documentation comments
Change-Id: I09f32b6922f319425c33697779a2d14f79678ce1
This patch only adds and removes suppressions, which must be done in the
same patch as the version bump.
Bug: T298571
Change-Id: I4044d4d9ce82b3dae7ba0af85bf04f22cb1dd347
Previously we just logged this. When it happens, it replaces the
entire page with null, which I think is exception-worthy and
very confusing.
The most likely cause of this error would be if the input
text had invalid UTF-8 in it.
I checked logstash, and this warning did not appear in the logs
in the last 90 days.
See also bug T319218
Change-Id: Ic0c9083bd8c524ddce9c7d9d10629daa8f6b8999
This is a regression from 4880a82555. Previously this function
returned the resulting Title after resolving all redirects.
The difference is subtle, mostly around what title attribute to
use for Media: links, and what alt text to use for images with
no specified alt text.
Adds parser tests for the image redirect case.
Change-Id: If5de59968d17054c9b8860513a08fdce6a4bb6c6
This prepares for a split of the parser class. These properties were
deprecated for public use in 1.35.
Some adjustments to phan annotations were necessary, as phan seems to
have a stronger analysis of the Parser class after this patch.
Bug: T236810
Bug: T236812
Change-Id: I66ad07d004a081096edec641141e787fc2cc0958
Extensions can modify the HTML in various ways which could break a
too-specific search-and-replace for the TOC_PLACEHOLDER. In particular,
DiscussionTools was postprocessing parser HTML in a way which was
breaking the TOC replacement in Parser::replaceTableOfContentsMarker()
(T317857), but in the future Parsoid might also emit slightly different
TOC markers, eg with additional attributes.
Make the search-and-replace more robust, at perhaps a small performance
cost.
Bug: T317857
Change-Id: Id0065d81bbfbe1bf6bea6af1de39ea7e9d6598d9
The anchor property comes from Sanitizer::escapeIdForAttribute() and
should be used if you want to (eg) look up an element by ID using
document.getElementById(). The linkAnchor property comes from
Sanitizer::escapeIdForLink() and contains additional escaping
appropriate for use in a URL fragment, and should be used (eg) if you
are creating the href attribute of an <a> tag.
Bug: T315222
Change-Id: Icecf9640a62117c2729dca04af343fb1ddaaf8f8
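The distinction can be illustrated with a rough sketch (Python, with deliberately simplified escaping; the real rules live in Sanitizer::escapeIdForAttribute() and Sanitizer::escapeIdForLink()):

```python
from urllib.parse import quote

def escape_id_for_attribute(heading: str) -> str:
    # Illustrative only: underscores for spaces, suitable for use
    # as an element id looked up via document.getElementById().
    return heading.strip().replace(' ', '_')

def escape_id_for_link(heading: str) -> str:
    # Illustrative only: additionally percent-encode characters that
    # are unsafe inside a URL fragment, for use in an <a href="#...">.
    return quote(escape_id_for_attribute(heading), safe='_')

h = 'Foo "bar" baz'
assert escape_id_for_attribute(h) == 'Foo_"bar"_baz'
assert escape_id_for_link(h) == 'Foo_%22bar%22_baz'
```

The point is that both forms are valid IDs, but only the second is safe to embed verbatim in a URL fragment.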
* Lua modules have been written to inspect nowiki strip state markers
and extract nowiki content to further process them. Callers might have
used nowikis in arguments for any number of reasons including needing
to have the argument be treated as raw text instead of wikitext.
While we might add first-class typing features to wikitext, templates,
extensions, and the like in the future which would let Parsoid process
template arguments based on type info (rather than as wikitext always),
we need a solution now to enable modules to work properly with Parsoid.
* The core issue is the decoupled model used by Parsoid where
transclusions are preprocessed before further processing. Since
nowikis cannot be processed and stripped during preprocessing,
Lua modules don't have access to nowiki strip markers in this model.
* In this patch, we change extension tag processing for nowikis.
When generating HTML, nowikis are replaced with a 'nowiki' strip
marker holding the nowiki's "innerXML" (only the tag contents).
During preprocessing, instead of adding a 'general' strip marker
with the "outerXML" (tag contents and the tag wrapper), we now
add a 'nowiki' strip marker with the "outerXML".
* Since Parsoid (and any clients using the preprocessed output) will
unstrip all strip markers, the shift from a general to nowiki
strip marker won't make a difference.
* To support Scribunto and Lua modules' unstrip usage, this patch adds
new functionality to StripState to replace the (preprocessing-)nowiki
strip markers with whatever its users want. So, Scribunto could
pass in a callback that replaces these with the "innerXML" by
stripping out the tag wrapper.
* Hat tip to Tim Starling for recommending this strategy.
* Updated strip state tests.
Bug: T272507
Bug: T299103
Depends-On: Id6ea611549e98893f53094116a3851e9c42b8dc8
Change-Id: Ied0295feab06027a8df885b3215435e596f0353b
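A rough sketch of the mechanism described above (Python, with hypothetical names; the real implementation is MediaWiki's StripState):

```python
import re

# Nowiki content is replaced by unique markers during preprocessing;
# a caller-supplied callback later decides how to expand each marker.
MARKER = "\x7fUNIQ-nowiki-%08d\x7f"

class StripState:
    def __init__(self):
        self.items = {}

    def add_nowiki(self, outer_xml):
        """Store the outerXML (tag wrapper included) under a marker."""
        marker = MARKER % len(self.items)
        self.items[marker] = outer_xml
        return marker

    def replace_nowikis(self, text, callback):
        """Replace each nowiki marker with callback(outerXML)."""
        for marker, outer in self.items.items():
            text = text.replace(marker, callback(outer))
        return text

ss = StripState()
m = ss.add_nowiki("<nowiki>[[not a link]]</nowiki>")
text = "before " + m + " after"

# A Scribunto-style callback that unwraps the tag to get the innerXML:
inner = ss.replace_nowikis(
    text, lambda s: re.sub(r"^<nowiki>|</nowiki>$", "", s))
assert inner == "before [[not a link]] after"
```

The callback indirection is what lets Lua modules choose innerXML (or any other expansion) without the preprocessor committing to one.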
* This patch relies on extensions setting a flag in their Parsoid ext.
config indicating that a specific tag handler needs nowikis stripped
from #tag arguments.
In the #tag parser function implementation, Parsoid's SiteConfig is
looked up to see if nowiki needs to be stripped.
* This need not be limited to nowikis, but to support extension use in
{{#tag:ext|...}} more generally, we would need to either
(a) implement the #tag parser function in Parsoid natively; OR
(b) find a way to call Parsoid from extensionSubstitution
Solution (a) needs Parsoid to support parser functions natively.
If this general support becomes necessary, a later patch can
generalize this appropriately.
Bug: T272939
Bug: T299103
Depends-On: I6a653889afd42fefb61daefd8ac842107dce8759
Depends-On: I56043e0cb7d355a3f0d08e429bb1dbba6acb4fba
Change-Id: I614153af67b5a14f33b7dfc04bd00dd9e03557d0
This reduces ambiguity between a parser function invocation (where the
colon separates the first argument) and a magic variable invocation
(where the colon is considered part of the magic variable name).
There shouldn't actually be any of these out in the wild, but it is
safer to deprecate than to assume.
Bug: T236813
Change-Id: I69e4f3b794f22a69efb98f5815df61199d077048
Pages outside of the main namespace now have the following markup in
their <h1> page titles, using 'Talk:Hello' as an example:
<h1>
<span class="mw-page-title-namespace">Talk</span>
<span class="mw-page-title-separator">:</span>
<span class="mw-page-title-main">Hello</span>
</h1>
(line breaks and spaces added for readability)
Pages in the main namespace only have the last part, e.g. for 'Hello':
<h1>
<span class="mw-page-title-main">Hello</span>
</h1>
The change is motivated by a desire to style the titles differently on
talk pages in the DiscussionTools extension (T313636), but it could
also be used for other things:
* Language-specific tweaks (e.g. adding typographically-correct spaces
around the colon separator: T249149, or replacing it with a
different character: T36295)
* Site-specific tweaks (e.g. de-emphasize or emphasize specific
namespaces like 'Draft': T62973 / T236215)
The markup is also added to automatically language-converted titles.
It is not added when the title is overridden using the wikitext
`{{DISPLAYTITLE:…}}` or `-{T|…}-` forms. I think this is a small
limitation, as those forms are mostly used in the main namespace, where
the extra markup isn't very helpful anyway. This may be improved in
the future. As a workaround, users could also just add the same HTML
markup to their wikitext (as those forms accept it).
It is also not added when the title is overridden by an extension
like Translate. Maybe we'll have a better API before anyone wants
to do that. If not, one could un-mark Parser::formatPageTitle()
as @internal, and use that method to add the markup themselves.
Bug: T306440
Change-Id: I62b17ef22de3606d736e6c261e542a34b58b5a05
Don't expose the parser's internal caching mechanism; shenanigans
with this parameter were deprecated in 1.35.
Bug: T236813
Change-Id: Iea74946c806d536ce321cba9675a7fabc117e4f1
Split out from the I44045b3b9e78e change.
This is consistent with what Parsoid will use for the TOC marker.
Bug: T287767
Bug: T270199
Bug: T311502
Depends-On: I1f607cf1ef1b61fb4d2e1880de756fb94d5a6b22
Change-Id: Ie63eed07b9bca1bfa07d4c256aba3728cedd8f93
Split out from the I44045b3b9e78e and Ie63eed07b9bca changes. We
first add code to handle the new tag as well as the old tag in
ParserCache contents. This will allow us to safely rollback if needed
when deploying the follow-on patch which actually changes the tag
used.
Bug: T287767
Bug: T270199
Bug: T311502
Change-Id: Ib3e5e010b9f5ca2c4ea7c4fe28080170b6a88812
This exposed internal cache mechanisms of the Parser, and appears
to have been originally added in c08da372bc
but is unused in any code indexed by codesearch.
Bug: T236813
Change-Id: Iaa5da572d76b1d396ecc7e3d3eb29c8d7d4bcddd
The "${var}" and "${expr}" style string interpolations are deprecated in
PHP 8.2. Migrate usages in core to "{$var}" as appropriate.
Bug: T314096
Change-Id: I269bad3d4a68c2b251b3e71a066289d4ad9fd496
When I implemented the ParserOutput merge logic in OutputPage
(I0909ac85c6c785d9089b077a16923c61d6a09996) I realized that
consistent "combine with OR" merge logic for the TOC flag
is obtained only if we invert the flag; that is, the existing
code showed a TOC *if any ParserOutput contained a shown TOC*
otherwise the TOC was hidden.
I'd originally implemented this in
I35e199cca40c0e4359ac493e5806dcf4ae49321c with the opposite sense in
order to avoid having to wait for ParserCache contents to expire:
since the default on most pages was to have the TOC shown anyway, if
"out of date" parser cache entries were missing a HIDE_TOC flag, it
wouldn't be a big deal, whereas if a SHOW_TOC flag were required then
upon deploy all cached pages would lose their TOC rendering.
BUT a better solution is just to let a "parser cache expiration time"
elapse between the time we start generating this flag and the time we
start using it. The existing patch to export this
(I6cf76c870124c162dc1bcbc2f7e9ca0c5fdcd10e) uses
ParserOutput::getTOCHTML() anyway, so we can just wait to switch this
over to use the SHOW_TOC flag
(I10c3d06fb162103c06395bf9d1d27ac3c199d7b6) until the parser cache has
expired.
Anyway, this is a bit of a hassle to switch now, but I think having
consistent merge semantics for ParserOutput flags is worth the
short-term pain.
Bug: T310083
Change-Id: I3b76010f1e2283add629b84bf3824f215f932903
RevisionRecord::getTimestamp is documented as potentially returning
null. Parser::getTimestamp() is documented as using the revision
timestamp if available, otherwise using the current timestamp, but
never returning null.
Change behaviour to match documentation.
This fixes a PHP warning on PHP 8.1 during tests.
Bug: T313663
Change-Id: I2b3e4d79ff5b179f013104eb2158c8e537a57545
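The documented behaviour amounts to a simple fallback, sketched here (Python, hypothetical helper; the real change is in Parser::getTimestamp()):

```python
from typing import Optional
from datetime import datetime, timezone

def get_timestamp(revision_timestamp: Optional[str]) -> str:
    # Use the revision timestamp if available, otherwise the current
    # time -- never return None (a null leaking through triggered a
    # PHP 8.1 warning downstream during tests).
    if revision_timestamp is not None:
        return revision_timestamp
    return datetime.now(timezone.utc).strftime('%Y%m%d%H%M%S')

assert get_timestamp('20220701123456') == '20220701123456'
assert len(get_timestamp(None)) == 14  # MW-style 14-digit timestamp
```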
There are two related issues here: first, when parsing non-wikitext
pages for side effects (categories, etc) we want to ensure that any
spurious `===` or `<h2>` on the page don't create nonsense "sections".
We introduce a ParserOption to suppress the ToC in this case; a
follow-up patch will set this parser option from the correct path in
CodeContentHandler and its subclasses. [T307691]
Second, modern skins can generate the ToC on-the-fly outside the
content area, and need to be able to regenerate the ToC from API
output when the page is edited. A ParserOutput flag is added to
mirror the $enoughToc variable from the parser to indicate whether
or not the ToC should be generated and/or updated after edit.
(See I6cf76c870124c162dc1bcbc2f7e9ca0c5fdcd10e for parallel code
to echo this value in ApiParse.)
Bug: T294950
Bug: T307691
Change-Id: I35e199cca40c0e4359ac493e5806dcf4ae49321c
extensionSubstitution returns a string, not null,
so if $params['attr'] is not set, default to using
an empty string (''), not null.
Bug: T312519
Bug: T312520
Change-Id: I566d95a32cffe1ef20f18ae9d9af96d57e0823a9
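The fix is an ordinary default-instead-of-null pattern, sketched here (Python, hypothetical names; the real change is in the #tag handler's call into extensionSubstitution):

```python
def tag_attr(params: dict) -> str:
    # extensionSubstitution() expects a string for 'attr'; default to
    # the empty string rather than None when the key is absent.
    attr = params.get('attr', '')
    assert isinstance(attr, str)
    return attr

assert tag_attr({}) == ''
assert tag_attr({'attr': 'class="x"'}) == 'class="x"'
```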