Commit graph

3490 commits

Author SHA1 Message Date
C. Scott Ananian
94f193a894 SECURITY: Ensure emitted HTML is safe against Unicode NFC normalization
CVE-2025-32699

Ensure that Unicode NFC normalization can be applied to our HTML
output safely.  Even though the W3C officially recommends against
normalizing HTML

https://www.w3.org/International/questions/qa-html-css-normalization#converting

this is still easily done inadvertently, especially when using the
MediaWiki action API which normalizes parameters and results by
default.

See also I671648603c4635a35585c860b4857f5ea085e47f in Parsoid, and
T266140 / I2e78e660ba1867744e34eda7d00ea527ec016b71 for another similar
issue.

The following changes are made:

* The various HTML serializers (Remex/Tidy-derived, as well as the
  Html::* helpers) are tweaked to entity-escape U+0338 wherever it
  appears.

* Similarly, Message::escaped() is tweaked to entity-escape U+0338.

* Finally, a post-processing pass is added to the OutputTransform
  pipeline to catch any remaining U+0338 and entity-escape them.
  This catches U+0338 added during any of the previous OutputTransform
  stages (like TOC insertion, section edit links, etc).
  *When backporting* this code will likely need to be moved to
  ParserOutput::getText(), as the OutputTransform pipeline wasn't added
  until MW 1.42.

Bug: T387130
Change-Id: I66564e14e730f5393f4fa5780b80f24de6075af5
2025-04-10 15:56:06 +01:00
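Note on 94f193a894: the risk can be reproduced outside MediaWiki. The sketch below is not the patch itself; it only illustrates, in Python, why a raw U+0338 directly after a tag delimiter is unsafe under NFC and why entity-escaping it helps. The helper name `escape_overlay` is made up for this illustration.

```python
import unicodedata

# U+0338 COMBINING LONG SOLIDUS OVERLAY
OVERLAY = "\u0338"


def escape_overlay(html: str) -> str:
    """Replace every raw U+0338 with a numeric character reference.

    Hypothetical helper for illustration; not a MediaWiki function.
    """
    return html.replace(OVERLAY, "&#x338;")


# A tag immediately followed by U+0338: harmless until something normalizes it.
raw = "<b>" + OVERLAY + "text</b>"

# NFC composes '>' + U+0338 into U+226F (NOT GREATER-THAN), so the '>' that
# closed the opening tag disappears from the markup.
broken = unicodedata.normalize("NFC", raw)

# With the overlay entity-escaped, NFC has nothing left to compose.
safe = escape_overlay(raw)
stable = unicodedata.normalize("NFC", safe)
```

Here `broken` no longer contains a well-formed `<b>` tag, while `stable` is identical to `safe`. The same composition applies to `<` (U+226E) and `=` (U+2260).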
David Causse
0f921b7878 Sanitizer::normalizeWhitespace: simplify redundant preg_replace
The extraneous sequence \r\n is not required.
Avoid hex codes to prevent future confusion.

Bug: T388733
Change-Id: I1092ff76ed5e8221e43ea7b70cf0c9d9d3abb1f3
(cherry picked from commit 6753123a0629de81ce4899958180272736e7ba61)
2025-03-18 13:26:43 +00:00
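Note on 0f921b7878: the commit message does not quote the regular expression, so both patterns below are assumptions made up for this Python sketch. It only shows the general point that an explicit `\r\n` alternative adds nothing once `\r` and `\n` are already in the character class.

```python
import re

# Hypothetical "before" pattern: an explicit \r\n alternative next to a
# character class that already contains both \r and \n.
REDUNDANT = re.compile(r"(?:\r\n|[ \t\r\n])+")

# Hypothetical "after" pattern: the character class alone, written with
# named escapes rather than hex codes.
SIMPLE = re.compile(r"[ \t\r\n]+")


def normalize_whitespace(text: str, pattern: "re.Pattern[str]" = SIMPLE) -> str:
    """Collapse each run of whitespace to one space and trim the ends."""
    return pattern.sub(" ", text).strip()


samples = ["a\r\nb", " a \t b\r\n", "a\n\n\rb", "\r\n"]

# Both patterns give the same result on every sample.
same = all(
    normalize_whitespace(s, REDUNDANT) == normalize_whitespace(s, SIMPLE)
    for s in samples
)
```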
jenkins-bot
6beb3946d5 Merge "MagicWord::replace*: Make sure we don't pass null into preg_match/preg_replace" into REL1_43 2025-03-17 16:44:20 +00:00
James D. Forrester
6b0ed71937 Sanitizer::normalizeSectionNameWhitespace: Apply same anti-null fix as 270499b
Follow-up to 270499b6e1f96f402c852843d446a7946589986b.

Bug: T388728
Bug: T385519
Change-Id: Idae7128c09bcf32a6c2d40e02158902c289898b9
(cherry picked from commit e130d34c15e418004a5ae42c0238206d70b2be0f)
2025-03-17 15:57:07 +00:00
James D. Forrester
a2bc03b8d8 MagicWord::replace*: Make sure we don't pass null into preg_match/preg_replace
Bug: T388924
Change-Id: I02a3e724dc614f0a2306548f58f71d16a8a1dc5b
(cherry picked from commit 2e4e9428580d4829911313644913c3c74cf43244)
2025-03-17 14:19:01 +00:00
David Causse
332d1dfd83 Sanitizer::normalizeWhitespace warn on preg_replace error
Log a warning with preg_replace error instead of passing null to trim.

Bug: T385519
Change-Id: If4ad78168d7899685f4fa1f1d89245c85f0beb0b
(cherry picked from commit 270499b6e1f96f402c852843d446a7946589986b)
2025-03-06 18:30:40 +00:00
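Note on 332d1dfd83, 6b0ed71937 and a2bc03b8d8: all three guard against a regex call that returns null being fed into the next string function. A minimal Python sketch of that guard follows. The `replace` callable stands in for PHP's preg_replace, and falling back to the unmodified input is an assumption of this sketch, not something the commit messages state.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("sanitizer-sketch")


def normalize_or_warn(text: str, replace: Callable[[str], Optional[str]]) -> str:
    """Apply `replace`, but never pass None on to strip().

    `replace` is a stand-in for a regex call that may fail and return None.
    """
    result = replace(text)
    if result is None:
        # Assumed fallback: warn and continue with the original input.
        logger.warning("whitespace normalization failed; using input unchanged")
        result = text
    return result.strip()


# A replacement that works, and one that simulates a regex engine error.
def working(s: str) -> Optional[str]:
    return " ".join(s.split())


def failing(s: str) -> Optional[str]:
    return None


ok = normalize_or_warn("  a   b ", working)
fallback = normalize_or_warn("  a   b ", failing)
```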
C. Scott Ananian
d0e0baf6a9 ParserOutput::getExternalLinks(): Deprecate use of the internal array reference
In a future release this will return an array, not a reference to the
internal array, to maintain abstraction and allow for representation
changes internal to ParserOutput.

This patch just adds deprecation notices to the class and to the
release notes.

Change-Id: Ie3a3f98402c5a5a3a92326d7736c0df874829a6b
2024-10-22 16:33:27 -04:00
jenkins-bot
39313b13bc Merge "ParserOutput: Introduce ParserOutput::getLinkList()" 2024-10-21 21:04:38 +00:00
jenkins-bot
5188bd1976 Merge "ParserOutput::runPipelineInternal: pass ParserOptions if provided" 2024-10-21 20:29:41 +00:00
jenkins-bot
f057afcc4d Merge "Deprecate ::setMetrics() calls with StatsdDataFactoryInterface" 2024-10-21 17:12:19 +00:00
C. Scott Ananian
d1813e09a2 ParserOutput::runPipelineInternal: pass ParserOptions if provided
Pipeline passes don't yet depend on ParserOptions, but they will.

Change-Id: Ib15134a598a7e783c69c8e19bb29b53da6c4be55
2024-10-21 12:34:09 -04:00
jenkins-bot
cd58285157 Merge "Parsoid: SiteConfig::prefixedStatsFactory() can never return null" 2024-10-21 10:16:12 +00:00
jenkins-bot
61347b17a8 Merge "parser: Increment expensive function count for special page transclusion" 2024-10-18 23:14:21 +00:00
C. Scott Ananian
4d4715326a Deprecate ::setMetrics() calls with StatsdDataFactoryInterface
HtmlInputTransformHelper::setMetrics() and
HtmlToContentTransform::setMetrics() take a StatsFactory now; deprecate
passing a StatsdDataFactoryInterface.

Depends-On: I0d8eb6cacd761fa4959419b10d59046e61c714ff
Change-Id: I2374731f6d37a191fc4a865d2665f2ca18182db1
2024-10-18 18:45:00 -04:00
C. Scott Ananian
c49d9199a5 Parsoid: SiteConfig::prefixedStatsFactory() can never return null
SiteConfig::$statsFactory is non-nullable, and
StatsFactory::withComponent() never returns null.

Change-Id: Ib14a1ee44b81476447717bc6aa00b54de1dca995
2024-10-18 18:45:00 -04:00
jenkins-bot
9d11b79291 Merge "ParsoidParser: add wiki as a label to parse metrics" 2024-10-18 17:52:55 +00:00
C. Scott Ananian
004cb43c56 ParserOutput: Introduce ParserOutput::getLinkList()
This deprecates a number of methods which returned arrays by reference and
exposed internal representation details of the ParserOutput.  It also
regularizes the return values to return consistent LinkTarget values,
working around the wide variety of different internal storage formats
used for links.

In the future, once these methods which expose the internal representation
are removed, we can simplify our internal storage as well.  But for the
moment we add the new getter without changing the internal representation.

Note that by returning TitleValue objects this new interface also provides
a means to fix the issue identified in T204792 where interwiki and namespace
prefixes were getting confused.  A TitleValue properly distinguishes between
these -- although the callers will still have to be careful to use it as
a TitleValue and not attempt to reparse it.

These methods also correctly handle fragments, which are present for the
language link type but stripped for the other link types.

Bug: T204792
Change-Id: I48a2077b9645124f83082afd953d6bf7a861270b
2024-10-18 13:24:10 -04:00
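Note on 004cb43c56: the point about interwiki and namespace prefixes can be shown with a small Python sketch. `LinkTargetSketch`, the namespace table, and the interwiki prefix "Help" are all hypothetical; they only mimic the idea of a structured link target and are not the MediaWiki classes.

```python
from dataclasses import dataclass

# Hypothetical namespace table for this sketch.
NAMESPACE_NAMES = {0: "", 12: "Help"}


@dataclass(frozen=True)
class LinkTargetSketch:
    """Structured link target: each component is stored separately."""
    namespace: int
    dbkey: str
    fragment: str = ""
    interwiki: str = ""


def flatten(target: LinkTargetSketch) -> str:
    """Render a target as prefixed text, which loses the prefix's meaning."""
    parts = []
    if target.interwiki:
        parts.append(target.interwiki)
    ns_name = NAMESPACE_NAMES[target.namespace]
    if ns_name:
        parts.append(ns_name)
    parts.append(target.dbkey)
    return ":".join(parts)


# A local page in the Help namespace, and a main-namespace page on a
# (hypothetical) wiki whose interwiki prefix happens to be "Help".
namespaced_page = LinkTargetSketch(namespace=12, dbkey="Contents")
interwiki_link = LinkTargetSketch(namespace=0, dbkey="Contents", interwiki="Help")
```

Both targets flatten to the same string, so a caller that reparses the text cannot tell them apart, while the structured values stay distinct.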
jenkins-bot
2f096fefae Merge "Slightly simplify SiteConfig metrics implementation & improve doc" 2024-10-18 17:18:24 +00:00
Lucas Werkmeister
ace0e6fc81 Add comment to ParserOutput::setIndexPolicy()
Change-Id: I01d03aa6204a13a92bb8bc00364c822c27aa60b9
2024-10-18 15:20:52 +00:00
Umherirrender
cef0bdc230 parser: Increment expensive function count for special page transclusion
Transcluding a special page can result in extra database queries, so
track it as an expensive operation to avoid too many uses on one page.
When the expensive function count is exceeded, the parser falls back to
displaying the transclusion as a normal wikilink.

Change-Id: I86c9cf1fdd0833012ddbf51184080e3135eb83ec
2024-10-18 14:36:48 +00:00
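Note on cef0bdc230: the behaviour described above amounts to a per-page budget with a cheap fallback. The Python sketch below illustrates that shape only; the class and function names and the limit of 2 are hypothetical and do not come from the patch.

```python
from typing import Callable


class ExpensiveBudget:
    """Counts expensive operations on one page against a fixed limit."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.count = 0

    def increment(self) -> bool:
        """Record one expensive call; return True while within the limit."""
        self.count += 1
        return self.count <= self.limit


def transclude_special_page(
    name: str, budget: ExpensiveBudget, render: Callable[[str], str]
) -> str:
    """Render the special page, or fall back to a plain wikilink."""
    if not budget.increment():
        return f"[[Special:{name}]]"
    return render(name)


def fake_render(name: str) -> str:
    # Stand-in for the real, query-heavy rendering.
    return f"<rendered {name}>"


budget = ExpensiveBudget(limit=2)
results = [
    transclude_special_page("RecentChanges", budget, fake_render)
    for _ in range(3)
]
```

The first two calls are rendered; the third exceeds the budget and comes back as a link.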
C. Scott Ananian
2ee15b8ec0 ParsoidParser: add wiki as a label to parse metrics
Refactored slightly to use the new MetricTrait::setLabels() method
as well.

Change-Id: I4203c68d221630bc945a616544f80b05e40a1dad
2024-10-17 23:55:17 -04:00
C. Scott Ananian
674e7b1c4a ParserOutput::addLanguageLink: Avoid a full Title parse
Bug: T296019
Change-Id: I8a8d499a6a6646bc86a4be7e843430eecd08d0a4
2024-10-17 23:51:41 -04:00
C. Scott Ananian
fda71c4391 Use OutputPage::$metadata to store the 'prevent clickjacking' flag
Bug: T301020
Depends-On: I885f778eef92fa7d2b7d6a2c2997db6a8b0142e5
Change-Id: I3bfd47b078a5b84a88fffc04b48abe4c0023370f
2024-10-17 23:46:21 -04:00
jenkins-bot
53bd99f5dc Merge "Use statslib for metrics emitted by HtmlInputTransformHelper, HtmlToContentTransform" 2024-10-17 23:46:56 +00:00
C. Scott Ananian
f3c5d81939 Slightly simplify SiteConfig metrics implementation & improve doc
Document that ::observeTiming takes an argument *in milliseconds*
(not seconds).  Use the new Metric::setLabels() method to simplify
the implementation a bit as well.

Change-Id: I374ff380466cfc5c12abb24793e8a4ed195db382
2024-10-17 18:54:26 -04:00
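Note on f3c5d81939: the unit matters because most clocks report seconds. A minimal Python sketch of the conversion a caller would need before handing a duration to a millisecond-based timing method; `elapsed_ms` is a hypothetical helper, not part of any stats library.

```python
import time


def elapsed_ms(start_seconds: float, end_seconds: float) -> float:
    """Convert a pair of second-based timestamps to elapsed milliseconds."""
    return (end_seconds - start_seconds) * 1000.0


start = time.monotonic()
# ... the work being timed would run here ...
duration_ms = elapsed_ms(start, time.monotonic())
```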
Yiannis Giannelos
331c181598 Use statslib for metrics emitted by HtmlInputTransformHelper, HtmlToContentTransform
Bug: T359475
Change-Id: I7d4ca748c106dfd560dae31294decfb2b181e2db
2024-10-17 21:28:04 +02:00
Umherirrender
e662614f95 Use explicit nullable type on parameter arguments
Implicitly marking parameter $... as nullable is deprecated in PHP 8.4;
the explicit nullable type must be used instead.

Created with autofix from Ide15839e98a6229c22584d1c1c88c690982e1d7a

Break one long line in SpecialPage.php

Bug: T376276
Change-Id: I807257b2ba1ab2744ab74d9572c9c3d3ac2a968e
2024-10-16 20:58:33 +02:00
James D. Forrester
a5387c7c20 Namespace all remaining classes in includes/parser
Bug: T353458
Change-Id: If02cc9b1ff78e26c1cf8c91ee4695845eb133829
2024-10-15 23:54:32 +01:00
jenkins-bot
433c535ecc Merge "ParsoidParser: pass render reason to Parsoid; fix case of 'sampleStats'" 2024-10-12 00:11:57 +00:00
jenkins-bot
d253cb33a7 Merge "Add static return type for ParserOutput::getExternalLinks" 2024-10-10 10:42:06 +00:00
jenkins-bot
ae3afe014b Merge "Remove meaningless @var documentation from constants" 2024-10-09 22:03:23 +00:00
C. Scott Ananian
8a650d5d48 ParsoidParser: ensure magic variable expansion uses pageLanguageOverride
This patch adds tests for the caching fix in
Ie76020dc4fa3545f827e1674051530b479f01f31, but these tests also revealed
that the recursive invocation of the legacy parser to expand magic
variables like {{PAGELANGUAGE}} wasn't using the pageLanguageOverride,
aka ParserOptions::getTargetLanguage().

The page language override is used when parsing new content which
doesn't currently exist in the database and therefore doesn't have a
page language set by its title (which doesn't yet exist).

Bug: T376783
Follows-Up: Ie76020dc4fa3545f827e1674051530b479f01f31
Change-Id: If6fe7cf00be6e78ef46181b17f01138383e95e46
2024-10-09 12:28:23 -04:00
thiemowmde
b1c9ec74fa Remove meaningless @var documentation from constants
A constant is not a variable. The type is hard-coded via the value
and can never change. While the extra @var probably doesn't hurt much,
it's redundant and error-prone and can't provide any additional
information.

Change-Id: Iee1f36a1905d9b9c6b26d0684b7848571f0c1733
2024-10-09 09:33:12 +02:00
jenkins-bot
121559810b Merge "ParserOutput::setPageProperty(): emit deprecation warnings for non-strings" 2024-10-08 21:17:48 +00:00
Arthur Taylor
9ae964fde7 Add static return type for ParserOutput::getExternalLinks
PHPUnit tests that mock the ParserOutput object are unable to
correctly infer that the mock should return an empty array rather
than null for `getExternalLinks`. This is currently causing test
failures in SpamBlacklist in CI.

Add the return type definition to the function field definition
so that PHPUnit has a better chance at doing the right thing.

Note that `getExternalLinks` returns `$this->mExternalLinks` by
reference; if there’s some existing code which reassigns a non-array
value to that reference (and, consequently, to `$this->mExternalLinks`),
such code will start to throw TypeErrors during the assignment.

Bug: T376633
Change-Id: I246d5541200c9d0c405f30ea9de091ff9c0e759c
2024-10-08 09:51:23 +02:00
jenkins-bot
a37de059aa Merge "ParserOutput: ensure all created ParserOutputs have a "start of parse" time set" 2024-10-07 23:13:54 +00:00
C. Scott Ananian
1e2af489ae ParserOutput: ensure all created ParserOutputs have a "start of parse" time set
*Most* implementations of ContentHandler::fillParserOutput() ensure
that the returned ParserOutput has had
ParserOutput::resetParseStartTime() called on it at an appropriate
time -- but not *all*.  This is a belt-and-suspenders fix that ensures
that every code path which creates a ParserOutput has *some* "start
time" defined.  This could be misleading if the parsing is done first
and the parser output is created at the very end of the parse, but in
all the code that I've looked at the ParserOutput is the first thing
created and so this default should be reasonable.

While we're at it, remove the parseStartTime from the serialized form
of the ParserOutput, because it is useless after the object is
unserialized.

Bug: T376433
Change-Id: I3bdf3996401a7d5ac4d8e1e5e6afb7ca410cbe6c
2024-10-04 19:14:37 -04:00
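Note on 1e2af489ae: the two ideas in this commit, a default start time set at construction and a serialized form that omits it, can be sketched in Python as follows. `OutputSketch` and its JSON layout are hypothetical, and leaving the start time unset after loading is an assumption of the sketch.

```python
import json
import time
from typing import Optional


class OutputSketch:
    """Stand-in for a parser output object with a parse start time."""

    def __init__(self, text: str = "") -> None:
        self.text = text
        # Default start time: set on every construction path, so it is
        # never missing even if the caller forgets to reset it.
        self.parse_start_time: Optional[float] = time.monotonic()

    def to_json(self) -> str:
        # The start time is deliberately left out: it means nothing once
        # the object is loaded again later, possibly in another process.
        return json.dumps({"text": self.text})

    @classmethod
    def from_json(cls, data: str) -> "OutputSketch":
        obj = cls(json.loads(data)["text"])
        # Assumption: a loaded object carries no meaningful start time.
        obj.parse_start_time = None
        return obj


fresh = OutputSketch("<p>hello</p>")
serialized = fresh.to_json()
loaded = OutputSketch.from_json(serialized)
```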
Yiannis Giannelos
473e8c32bf Provide a prefixed StatsFactory in parsoid config
Change-Id: Ic3fc353b030a292952091813c9847cd697b25444
2024-10-04 18:54:33 +00:00
C. Scott Ananian
e205a24456 ParserOutput::setPageProperty(): emit deprecation warnings for non-strings
This was deprecated in 1.42 but did not previously emit deprecation
warnings.

Depends-On: I072b111b047cfe13e32a822678d68165d1c76f84
Depends-On: I2734383207b92f71bffc66ba2392a592a1df0954
Depends-On: I79bb5030c13e83f664da1635254f4bc171ed4f3e
Depends-On: If64a5239a40953f244657e60f95b2e938abfe447
Change-Id: Ifefd3dab43247d988b7c7ff7874c05c90fc8ce1f
2024-10-04 09:56:51 -04:00
James D. Forrester
91a37f53b4 Switch over a bunch of class_alias uses to actuals
Change-Id: Id175a83e71cc910eaee5d5890a9106872a3ca3b8
2024-10-03 17:09:36 +00:00
jenkins-bot
4627fe60af Merge "Add namespace to remaining parts of Wikimedia\Mime and Wikimedia\Stats" 2024-10-03 14:16:24 +00:00
jenkins-bot
db7ee3db99 Merge "Add namespace to remaining parts of Wikimedia\ObjectCache" 2024-10-03 14:02:47 +00:00
jenkins-bot
d527b5a4e4 Merge "Deprecate ParserOutput::setLanguageLinks(null)" 2024-10-02 23:11:49 +00:00
jenkins-bot
f4dc788b5c Merge "Allow localized gallery widths; avoid spurious "double px" tracking category" 2024-10-02 21:39:40 +00:00
C. Scott Ananian
22cdf9cdf6 Deprecate ParserOutput::setLanguageLinks(null)
Bug: T376323
Follows-Up: I82a05a51d94782ebb9fa87ff889ca0f633b3e15c
Change-Id: I0952659ab245326e9e8352170fb0a629ec109e72
2024-10-02 16:10:39 -04:00
C. Scott Ananian
af49d2f323 ParsoidParser: pass render reason to Parsoid; fix case of 'sampleStats'
Every other option passed to parsoid (except `body_only`) is in
camelCase, so make 'sampleStats' into a camel as well.

Pass the render reason to Parsoid so that parsoid-specific parse stats
can be correlated with stats coming from the ParserOutputAccess.

Used in I88ba26fefd9d69ad3e2354d1e235b1e42d1914a0 but does not depend
on that patch.

Change-Id: I2e5c897c55e41224567ed94bbf903c8fff96e841
2024-09-28 09:19:15 -04:00
jenkins-bot
315de0e434 Merge "Deduplicate language links in ParserOutput and OutputPage" 2024-09-27 22:43:43 +00:00
James D. Forrester
cc28acc455 Add namespace to remaining parts of Wikimedia\Mime and Wikimedia\Stats
Bug: T353458
Change-Id: If0137003ab625017d322d57870448a02569668c3
2024-09-27 16:19:10 -04:00
James D. Forrester
53b67ae0a6 Add namespace to remaining parts of Wikimedia\ObjectCache
Bug: T353458
Change-Id: I3b736346550953e3b2977c14dc3eb10edc07cf97
2024-09-27 16:19:10 -04:00
James D. Forrester
9e5c1e8ac7 Add namespace to IDBAccessObject and DBAccessObjectUtils
Bug: T353458
Change-Id: I23cf7991f8792d4d000d1780463d8ce76dc0aee0
2024-09-27 16:19:10 -04:00