wiki.techinc.nl/includes/OutputTransform
C. Scott Ananian 94f193a894 SECURITY: Ensure emitted HTML is safe against Unicode NFC normalization
CVE-2025-32699

Ensure that Unicode NFC normalization can be applied to our HTML
output safely.  Even though the W3C officially recommends against
normalizing HTML

https://www.w3.org/International/questions/qa-html-css-normalization#converting

this is still easily done inadvertently, especially when using the
MediaWiki action API which normalizes parameters and results by
default.

See also I671648603c4635a35585c860b4857f5ea085e47f in Parsoid, and
T266140 / I2e78e660ba1867744e34eda7d00ea527ec016b71 for another similar
issue.

The following changes are made:

* The various HTML serializers (Remex/Tidy-derived, as well as the
  Html::* helpers) are tweaked to entity-escape U+0338 wherever it
  appears.

* Similarly, Message::escaped() is tweaked to entity-escape U+0338.

* Finally, a post-processing pass is added to the OutputTransform
  pipeline to catch any remaining U+0338 and entity-escape them.
  This catches U+0338 added during any of the previous OutputTransform
  stages (like TOC insertion, section edit links, etc).
  *When backporting* this code will likely need to be moved to
  ParserOutput::getText(), as the OutputTransform pipeline wasn't added
  until MW 1.42.
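
To illustrate why U+0338 in particular is escaped, here is a minimal sketch in Python (not MediaWiki's PHP code). It shows NFC normalization composing `>` followed by U+0338 into the single character U+226F, which removes the `>` that closes a tag, and shows that entity-escaping U+0338 first makes the string stable under NFC. The helper name `escape_u0338` and the exact entity spelling `&#x338;` are assumptions made for this example.

```python
import unicodedata

# U+0338 COMBINING LONG SOLIDUS OVERLAY
COMBINING_LONG_SOLIDUS_OVERLAY = "\u0338"


def escape_u0338(html: str) -> str:
    """Hypothetical helper: replace every U+0338 with a numeric character reference."""
    return html.replace(COMBINING_LONG_SOLIDUS_OVERLAY, "&#x338;")


# Text content that begins with U+0338, placed directly after a tag's closing '>'.
raw = "<b>" + COMBINING_LONG_SOLIDUS_OVERLAY + "x</b>"

# NFC composes '>' + U+0338 into U+226F (NOT GREATER-THAN),
# so the opening tag loses its closing '>'.
normalized_raw = unicodedata.normalize("NFC", raw)

# With U+0338 escaped there is nothing left to compose; NFC is a no-op.
safe = escape_u0338(raw)
normalized_safe = unicodedata.normalize("NFC", safe)
```

The same composition applies to `<` (U+226E) and `=` (U+2260), which is why leaving the raw character anywhere next to markup is risky.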

Bug: T387130
Change-Id: I66564e14e730f5393f4fa5780b80f24de6075af5
2025-04-10 15:56:06 +01:00
Stages SECURITY: Ensure emitted HTML is safe against Unicode NFC normalization 2025-04-10 15:56:06 +01:00
ContentDOMTransformStage.php Namespace all remaining classes in includes/parser 2024-10-15 23:54:32 +01:00
ContentTextTransformStage.php Namespace all remaining classes in includes/parser 2024-10-15 23:54:32 +01:00
DefaultOutputPipelineFactory.php SECURITY: Ensure emitted HTML is safe against Unicode NFC normalization 2025-04-10 15:56:06 +01:00
OutputTransformPipeline.php Namespace all remaining classes in includes/parser 2024-10-15 23:54:32 +01:00
OutputTransformStage.php Namespace all remaining classes in includes/parser 2024-10-15 23:54:32 +01:00
README.md

Output transformation pipelines for wikitext

The classes in the Stages/ subdirectory contain HTML and DOM transforms for use in output processing pipelines, i.e. postprocessors for ParserOutput objects that either result directly from a parse or are fetched from ParserCache.

The default pipeline is created by DefaultOutputPipelineFactory; it corresponds to what was previously contained in ParserOutput::getText. The shouldRun method in each stage uses defaults that indicate whether the stage runs in the default OutputTransformPipeline.
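
The stage-and-pipeline idea can be sketched as follows. This is an illustrative Python model, not the PHP classes in this directory: the class names, the `options` dictionary, and its keys are all hypothetical, and only the general shape (each stage decides via a should-run check whether it applies, and the pipeline runs the applicable stages in order) is taken from the description above.

```python
from typing import Dict, List, Optional


class Stage:
    """Hypothetical stage interface: a should-run check plus a transform."""

    def should_run(self, options: Dict[str, bool]) -> bool:
        raise NotImplementedError

    def transform(self, html: str, options: Dict[str, bool]) -> str:
        raise NotImplementedError


class WrapStage(Stage):
    """Wraps the output in a container element; off by default (assumed option name)."""

    def should_run(self, options: Dict[str, bool]) -> bool:
        return options.get("wrap", False)

    def transform(self, html: str, options: Dict[str, bool]) -> str:
        return "<div>" + html + "</div>"


class EscapeU0338Stage(Stage):
    """Entity-escapes any remaining U+0338; on by default (assumed option name)."""

    def should_run(self, options: Dict[str, bool]) -> bool:
        return options.get("escape_u0338", True)

    def transform(self, html: str, options: Dict[str, bool]) -> str:
        return html.replace("\u0338", "&#x338;")


class Pipeline:
    """Runs each stage in order, skipping stages whose should_run() is false."""

    def __init__(self, stages: List[Stage]) -> None:
        self.stages = list(stages)

    def run(self, html: str, options: Optional[Dict[str, bool]] = None) -> str:
        options = options or {}
        for stage in self.stages:
            if stage.should_run(options):
                html = stage.transform(html, options)
        return html


# The escaping stage is placed last so it also sees output of earlier stages.
pipeline = Pipeline([WrapStage(), EscapeU0338Stage()])
default_result = pipeline.run("a\u0338")
wrapped_result = pipeline.run("a\u0338", {"wrap": True})
unescaped_result = pipeline.run("a\u0338", {"escape_u0338": False})
```

Putting the escaping stage at the end mirrors the role of a final post-processing pass: anything introduced by earlier stages still gets escaped.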