CVE-2025-32699
Ensure that Unicode NFC normalization can be applied safely to our HTML
output. Even though the W3C officially recommends against normalizing
HTML (see
https://www.w3.org/International/questions/qa-html-css-normalization#converting),
this is still easy to do inadvertently, especially when using the
MediaWiki action API, which normalizes parameters and results by
default.
See also I671648603c4635a35585c860b4857f5ea085e47f in Parsoid, and
T266140 / I2e78e660ba1867744e34eda7d00ea527ec016b71 for another similar
issue.
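The problem character is U+0338 COMBINING LONG SOLIDUS OVERLAY: NFC
composes a preceding "<" or ">" with it into a single character (U+226E
or U+226F), which silently destroys the markup delimiter.
The following changes are made: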
* The various HTML serializers (Remex/Tidy-derived, as well as the
Html::* helpers) are tweaked to entity-escape U+0338 wherever it
appears.
* Similarly, Message::escaped() is tweaked to entity-escape U+0338.
* Finally, a post-processing pass is added to the OutputTransform
pipeline to catch any remaining U+0338 characters and entity-escape
them. This catches U+0338 added during any of the previous
OutputTransform stages (like TOC insertion, section edit links, etc.).
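A minimal Python sketch of why escaping helps: under NFC, ">" followed by U+0338 becomes the single character U+226F, so a tag loses its closing bracket. Replacing the combining character with an ASCII entity leaves nothing for NFC to compose. The decimal entity form and the helper name `escape_combining_solidus` are assumptions of this sketch, not MediaWiki's actual code.

```python
import unicodedata

# U+0338 COMBINING LONG SOLIDUS OVERLAY. Under NFC it composes with a
# preceding "<", "=" or ">" into U+226E, U+2260 or U+226F respectively.
COMBINING_LONG_SOLIDUS = "\u0338"

# Assumption: a decimal numeric character reference is used as the escape;
# any pure-ASCII entity form would protect the markup equally well.
ESCAPED_SOLIDUS = "&#824;"


def escape_combining_solidus(html: str) -> str:
    """Hypothetical helper: entity-escape every literal U+0338."""
    return html.replace(COMBINING_LONG_SOLIDUS, ESCAPED_SOLIDUS)


raw = "<b>\u0338</b>"
# ascii() keeps the output printable on any terminal.
print(ascii(unicodedata.normalize("NFC", raw)))
# '<b\u226f</b>': the ">" closing <b> has been absorbed into U+226F
print(ascii(unicodedata.normalize("NFC", escape_combining_solidus(raw))))
# '<b>&#824;</b>': the markup survives normalization unchanged
```

Because the escaped output is pure ASCII around the entity, NFC is a no-op on it, which is what makes a final catch-all pass safe regardless of which earlier stage introduced the character.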
*When backporting*, this code will likely need to be moved to
ParserOutput::getText(), as the OutputTransform pipeline wasn't added
until MW 1.42.
Bug: T387130
Change-Id: I66564e14e730f5393f4fa5780b80f24de6075af5
The extraneous sequence \r\n is not required.
Avoid using hex codes, to prevent future confusion.
Bug: T388733
Change-Id: I1092ff76ed5e8221e43ea7b70cf0c9d9d3abb1f3
(cherry picked from commit 6753123a0629de81ce4899958180272736e7ba61)
Log a warning with the preg_replace error instead of passing null to trim().
Bug: T385519
Change-Id: If4ad78168d7899685f4fa1f1d89245c85f0beb0b
(cherry picked from commit 270499b6e1f96f402c852843d446a7946589986b)
In a future release this will return an array, not a reference to the
internal array, to maintain abstraction and allow for representation
changes internal to ParserOutput.
This patch only adds deprecation notices to the class and to the
release notes.
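The difference between handing out a reference to internal storage and returning a copy can be sketched in Python. In PHP, arrays have value semantics, so internal state only leaks when a method returns by reference; Python lists are always shared, so this sketch models the same distinction with an explicit copy. The class and method names are hypothetical.

```python
from typing import List


class LinkHolder:
    """Hypothetical container standing in for a class like ParserOutput."""

    def __init__(self) -> None:
        self._items: List[str] = []

    def add(self, item: str) -> None:
        self._items.append(item)

    def get_items_by_reference(self) -> List[str]:
        # Old style: returns the internal list itself, so callers can
        # mutate internal state and come to depend on its representation.
        return self._items

    def get_items(self) -> List[str]:
        # New style: returns a shallow copy, so the internal
        # representation can change later without breaking callers.
        return list(self._items)


holder = LinkHolder()
holder.add("Main_Page")
holder.get_items().append("Sandbox")               # only the copy changes
print(holder.get_items())                          # ['Main_Page']
holder.get_items_by_reference().append("Sandbox")  # internal list changes
print(holder.get_items())                          # ['Main_Page', 'Sandbox']
```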
Change-Id: Ie3a3f98402c5a5a3a92326d7736c0df874829a6b
HtmlInputTransformHelper::setMetrics() and
HtmlToContentTransform::setMetrics() take a StatsFactory now; deprecate
passing a StatsdDataFactoryInterface.
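A sketch of the accept-both-but-warn pattern for a deprecated parameter type, in Python. The classes below are hypothetical stand-ins for StatsFactory, StatsdDataFactoryInterface and the transform helper; what MediaWiki actually does with a legacy factory after the warning is not stated here, and this sketch simply stores it.

```python
import warnings
from typing import Optional, Union


class StatsFactory:
    """Hypothetical stand-in for the new StatsFactory type."""


class StatsdDataFactory:
    """Hypothetical stand-in for the legacy StatsdDataFactoryInterface."""


class TransformHelper:
    """Hypothetical stand-in for HtmlInputTransformHelper."""

    def __init__(self) -> None:
        self.metrics: Optional[Union[StatsFactory, StatsdDataFactory]] = None

    def set_metrics(self, metrics: Union[StatsFactory, StatsdDataFactory]) -> None:
        if isinstance(metrics, StatsdDataFactory):
            # Legacy type: still accepted during the deprecation period,
            # but the caller is warned to migrate.
            warnings.warn(
                "Passing a StatsdDataFactory to set_metrics() is deprecated; "
                "pass a StatsFactory instead",
                DeprecationWarning,
                stacklevel=2,
            )
        elif not isinstance(metrics, StatsFactory):
            raise TypeError(f"unsupported metrics object: {type(metrics).__name__}")
        self.metrics = metrics
```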
Depends-On: I0d8eb6cacd761fa4959419b10d59046e61c714ff
Change-Id: I2374731f6d37a191fc4a865d2665f2ca18182db1
This deprecates a number of methods which returned arrays by reference and
exposed internal representation details of the ParserOutput. It also
regularizes the return values so that they are consistently LinkTarget
objects, working around the wide variety of different internal storage
formats used for links.
In the future, once these methods which expose the internal representation
are removed, we can simplify our internal storage as well. But for the
moment we add the new getter without changing the internal representation.
Note that by returning TitleValue objects this new interface also provides
a means to fix the issue identified in T204792 where interwiki and namespace
prefixes were getting confused. A TitleValue properly distinguishes between
these -- although the callers will still have to be careful to use it as
a TitleValue and not attempt to reparse it.
These methods also correctly handle fragments, which are present for the
language link type but stripped for the other link types.
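A Python sketch of normalizing several storage formats into one value type. The storage shapes below (namespace to dbkey map, interwiki prefix to title map, "prefix:Title#fragment" strings for language links) are assumptions made for illustration, and `TitleValue` here is a simplified hypothetical dataclass, not MediaWiki's class. The point is that interwiki prefix, namespace and title stay in separate fields, so nothing has to be reparsed.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TitleValue:
    """Simplified, hypothetical LinkTarget: each part is kept separate."""
    namespace: int
    dbkey: str
    fragment: str = ""
    interwiki: str = ""


def local_links(by_ns: Dict[int, Dict[str, int]]) -> List[TitleValue]:
    # Assumed storage: namespace -> dbkey -> page ID.
    return [TitleValue(ns, key) for ns, keys in by_ns.items() for key in keys]


def interwiki_links(by_prefix: Dict[str, Dict[str, int]]) -> List[TitleValue]:
    # Assumed storage: interwiki prefix -> remote title -> 1.
    # The remote title is never reparsed, so "Help:Foo" stays one dbkey.
    return [TitleValue(0, title, interwiki=prefix)
            for prefix, titles in by_prefix.items() for title in titles]


def language_links(raw: List[str]) -> List[TitleValue]:
    # Assumed storage: strings like "de:Beispiel#Abschnitt"; fragments kept.
    result = []
    for link in raw:
        prefix, _, rest = link.partition(":")
        title, _, fragment = rest.partition("#")
        result.append(TitleValue(0, title, fragment=fragment, interwiki=prefix))
    return result
```

With this shape, a local link to Help:Foo is `TitleValue(12, "Foo")` (12 is the Help namespace in a default install), while the interwiki link wikt:Help:Foo is `TitleValue(0, "Help:Foo", interwiki="wikt")`; the two can never be confused as long as callers do not flatten them back into a string and reparse it.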
Bug: T204792
Change-Id: I48a2077b9645124f83082afd953d6bf7a861270b
Transcluding a special page can result in extra database queries, so
track it as an expensive operation to avoid too many such transclusions
on one page. When the expensive function count is exceeded, the parser
falls back to displaying the transclusion as a normal wikilink.
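A minimal Python sketch of the counting-and-fallback behaviour described above. `ExpensiveCounter` and `transclude_special_page` are hypothetical names, and the rendered output string is a placeholder; only the control flow (count first, degrade to a plain link once over the limit) mirrors the commit message.

```python
class ExpensiveCounter:
    """Hypothetical per-page counter of expensive operations."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.count = 0

    def increment(self) -> bool:
        """Count one expensive operation; return False once over the limit."""
        self.count += 1
        return self.count <= self.limit


def transclude_special_page(name: str, counter: ExpensiveCounter) -> str:
    if not counter.increment():
        # Over the limit: render an ordinary wikilink, skipping the queries.
        return f"[[Special:{name}]]"
    # Placeholder for actually executing the special page.
    return f"<output of Special:{name}>"


counter = ExpensiveCounter(limit=1)
print(transclude_special_page("RecentChanges", counter))  # <output of Special:RecentChanges>
print(transclude_special_page("RecentChanges", counter))  # [[Special:RecentChanges]]
```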
Change-Id: I86c9cf1fdd0833012ddbf51184080e3135eb83ec
Document that ::observeTiming takes an argument *in milliseconds*
(not seconds). Use the new Metric::setLabels() method to simplify
the implementation a bit as well.
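The unit mismatch is easy to introduce because common clocks report seconds. A Python sketch, where `TimingMetric` is a hypothetical stand-in for a metric whose `observe_timing` expects milliseconds, as documented above:

```python
import time
from typing import List


class TimingMetric:
    """Hypothetical stand-in: records values exactly as given, in ms."""

    def __init__(self) -> None:
        self.samples_ms: List[float] = []

    def observe_timing(self, value_ms: float) -> None:
        self.samples_ms.append(value_ms)


def seconds_to_ms(seconds: float) -> float:
    return seconds * 1000.0


metric = TimingMetric()
start = time.perf_counter()             # perf_counter() returns seconds
total = sum(range(100_000))             # the work being measured
# Convert before observing; passing the raw difference would record
# a value 1000 times too small.
metric.observe_timing(seconds_to_ms(time.perf_counter() - start))
```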
Change-Id: I374ff380466cfc5c12abb24793e8a4ed195db382
Implicitly marking parameter $... as nullable is deprecated in PHP 8.4;
the explicit nullable type must be used instead.
Created with autofix from Ide15839e98a6229c22584d1c1c88c690982e1d7a
Break one long line in SpecialPage.php
Bug: T376276
Change-Id: I807257b2ba1ab2744ab74d9572c9c3d3ac2a968e
This patch adds tests for the caching fix in
Ie76020dc4fa3545f827e1674051530b479f01f31, but these tests also revealed
that the recursive invocation of the legacy parser to expand magic
variables like {{PAGELANGUAGE}} wasn't using the pageLanguageOverride,
aka ParserOptions::getTargetLanguage().
The page language override is used when parsing new content which
doesn't currently exist in the database and therefore doesn't have a
page language set by its title (which doesn't yet exist).
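A Python sketch of the language-resolution step, showing why the nested expansion must receive the same options object. `Options`, `page_language` and `expand_magic_words` are hypothetical; the precedence (override first, then the title's stored language, then a site default) is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Options:
    """Hypothetical stand-in for ParserOptions."""
    target_language: Optional[str] = None  # the page language override


def page_language(opts: Options, title_language: Optional[str], site_default: str) -> str:
    # Assumption: a set override wins; otherwise fall back to the title's
    # stored language, then to the site default.
    if opts.target_language is not None:
        return opts.target_language
    if title_language is not None:
        return title_language
    return site_default


def expand_magic_words(text: str, opts: Options, title_language: Optional[str]) -> str:
    # The nested expansion must receive the caller's options; passing a
    # fresh Options() here would silently drop the override.
    lang = page_language(opts, title_language, site_default="en")
    return text.replace("{{PAGELANGUAGE}}", lang)


# A page that does not exist yet has no stored language (None):
print(expand_magic_words("{{PAGELANGUAGE}}", Options("de"), None))  # de
print(expand_magic_words("{{PAGELANGUAGE}}", Options(), None))      # en
```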
Bug: T376783
Follows-Up: Ie76020dc4fa3545f827e1674051530b479f01f31
Change-Id: If6fe7cf00be6e78ef46181b17f01138383e95e46
A constant is not a variable. The type is hard-coded via the value
and can never change. While the extra @var probably doesn't hurt much,
it's redundant and error-prone and can't provide any additional
information.
Change-Id: Iee1f36a1905d9b9c6b26d0684b7848571f0c1733
PHPUnit tests that mock the ParserOutput object are unable to
correctly infer that the mock should return an empty array rather
than null for `getExternalLinks`. This is currently causing test
failures in SpamBlacklist in CI.
Add a return type declaration to the function definition
so that PHPUnit has a better chance at doing the right thing.
Note that `getExternalLinks` returns `$this->mExternalLinks` by
reference; if there’s some existing code which reassigns a non-array
value to that reference (and, consequently, to `$this->mExternalLinks`),
such code will start to throw TypeErrors during the assignment.
Bug: T376633
Change-Id: I246d5541200c9d0c405f30ea9de091ff9c0e759c
*Most* implementations of ContentHandler::fillParserOutput() ensure
that the returned ParserOutput has had
ParserOutput::resetParseStartTime() called on it at an appropriate
time -- but not *all*. This is a belt-and-suspenders fix that ensures
that every code path which creates a ParserOutput has *some* "start
time" defined. This could be misleading if the parsing is done first
and the parser output is created at the very end of the parse, but in
all the code that I've looked at, the ParserOutput is the first thing
created, so this default should be reasonable.
While we're at it, remove the parseStartTime from the serialized form
of the ParserOutput, because it is useless after the object is
unserialized.
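A Python sketch of both halves of this change: a default start time assigned at construction, and that field excluded from the serialized form. `ParserOutputSketch` is hypothetical, and pickle stands in for whatever serialization is actually used. A `perf_counter()` reading is only meaningful within the process that took it, which is why it is dropped rather than persisted.

```python
import pickle
import time
from typing import Any, Dict


class ParserOutputSketch:
    """Hypothetical stand-in for ParserOutput."""

    def __init__(self, text: str = "") -> None:
        self.text = text
        # Default start time, so every code path that creates an
        # instance has *some* start time defined.
        self.parse_start_time = time.perf_counter()

    def reset_parse_start_time(self) -> None:
        self.parse_start_time = time.perf_counter()

    def __getstate__(self) -> Dict[str, Any]:
        state = self.__dict__.copy()
        # Useless after unserialization, so leave it out entirely.
        state.pop("parse_start_time", None)
        return state

    def __setstate__(self, state: Dict[str, Any]) -> None:
        self.__dict__.update(state)
        self.parse_start_time = None


restored = pickle.loads(pickle.dumps(ParserOutputSketch("Hello")))
print(restored.text, restored.parse_start_time)  # Hello None
```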
Bug: T376433
Change-Id: I3bdf3996401a7d5ac4d8e1e5e6afb7ca410cbe6c
This was deprecated in 1.42 but did not previously emit deprecation
warnings.
Depends-On: I072b111b047cfe13e32a822678d68165d1c76f84
Depends-On: I2734383207b92f71bffc66ba2392a592a1df0954
Depends-On: I79bb5030c13e83f664da1635254f4bc171ed4f3e
Depends-On: If64a5239a40953f244657e60f95b2e938abfe447
Change-Id: Ifefd3dab43247d988b7c7ff7874c05c90fc8ce1f
Every other option passed to Parsoid (except `body_only`) is in
camelCase, so make 'sampleStats' camelCase as well.
Pass the render reason to Parsoid so that parsoid-specific parse stats
can be correlated with stats coming from the ParserOutputAccess.
Used in I88ba26fefd9d69ad3e2354d1e235b1e42d1914a0 but does not depend
on that patch.
Change-Id: I2e5c897c55e41224567ed94bbf903c8fff96e841