Parser::extractBody: Use possessive matcher and once-only subpattern

We were getting PREG_BACKTRACK_LIMIT_ERROR in production for certain
inputs to Parser::extractBody().  Use possessive quantifiers and a
once-only (atomic) subpattern so that the regex does not backtrack
unnecessarily once a <body> tag is found.
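
For readers unfamiliar with the two PCRE features named above, here is
a minimal sketch of their semantics.  The toy patterns and variable
names are illustrative assumptions and are not part of this change.

```php
<?php
// Greedy: a* first takes "aaa", then gives one "a" back so the
// trailing literal "a" can match.
$greedy = preg_match( '/^a*a$/', 'aaa' );

// Possessive: a*+ takes "aaa" and never gives anything back, so the
// trailing literal "a" has nothing left to match.
$possessive = preg_match( '/^a*+a$/', 'aaa' );

// Once-only (atomic) subpattern: once (?>...) has matched, the engine
// will not re-enter it to try a shorter match.
$atomic = preg_match( '/^(?>a*)a$/', 'aaa' );

var_dump( $greedy, $possessive, $atomic ); // int(1), int(0), int(0)
```

Both constructs trade away alternative matches in exchange for a bound
on how much work the engine does after a partial match fails.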

Bug: T399064
Follows-Up: I59abad3a58ccd6edc6517b13a56d8253ba0e0928
Change-Id: If6860ca268236cf428d574f6bb21c2070f5aa6a3
(cherry picked from commit 2c56237235a5603a1757982f02d3e542bdafaf06)
C. Scott Ananian 2025-07-10 12:33:23 -04:00 committed by Reedy
parent e7fa1c246c
commit 330ef61cbe

@@ -6483,12 +6483,12 @@ class Parser {
 	 * @unstable
 	 */
 	public static function extractBody( string $text ): string {
-		$text = preg_replace( '!^.*?<body[^>]*>!s', '', $text, 1 );
+		$text = preg_replace( '!^(?>.*?<body)[^>]*+>!s', '', $text, 1 );
 		if ( $text === null ) {
-			// T388729: this should never happen
+			// T399064: this should never happen
 			throw new RuntimeException( 'Regex failed: ' . preg_last_error() );
 		}
-		$text = preg_replace( '!</body>\s*</html>\s*$!', '', $text, 1 );
+		$text = preg_replace( '!</body>\s*+</html>\s*+$!', '', $text, 1 );
 		return $text;
 	}
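
The patched logic can be exercised outside MediaWiki with a standalone
sketch like the one below.  The function name extractBodySketch and the
sample inputs are hypothetical; only the two patterns are taken from
the change above.

```php
<?php
// Hypothetical standalone copy of the patched logic, for illustration only.
function extractBodySketch( string $text ): string {
	// Strip everything up to and including the opening <body ...> tag.
	// The atomic group commits to the first "<body" found, and [^>]*+
	// never gives back attribute characters once consumed.
	$text = preg_replace( '!^(?>.*?<body)[^>]*+>!s', '', $text, 1 );
	if ( $text === null ) {
		throw new RuntimeException( 'Regex failed: ' . preg_last_error() );
	}
	// Strip the trailing </body></html> and surrounding whitespace.
	$text = preg_replace( '!</body>\s*+</html>\s*+$!', '', $text, 1 );
	if ( $text === null ) {
		throw new RuntimeException( 'Regex failed: ' . preg_last_error() );
	}
	return $text;
}

$html = "<!DOCTYPE html>\n<html><head><title>t</title></head>"
	. "<body class=\"x\"><p>Hi</p></body>\n</html>\n";
$body = extractBodySketch( $html );
echo $body, "\n"; // prints "<p>Hi</p>"

// An unterminated "<body" never matches, so the input comes back unchanged.
$broken = '<body class="x" and no closing bracket';
$unchanged = extractBodySketch( $broken );
```

In the unterminated case the first pattern fails after a single pass
over the input, because neither the atomic group nor the possessive
quantifier can be re-entered to try alternatives.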