Parser::getExternalLinkRel() is defined to return `null` if there's
no attribute to add, but then ParserOptions::getExternalLinkTarget()
may try to append to it and external users might try to actually pass
the $attribs to (e.g.) Xml::element() and become unhappy if the value
is `null`.
Bug: T357668
Followup-To: Ifec733a923f193b72eaba9a1e604ad4e56c0aef2
Change-Id: I907c22ef070616d81b9a50b0e807a7b8f78b59b5
Previously, Parser.php used Linker::makeHeadline() in order to
generate the `<h2><span class="mw-headline" id="...">...</span></h2>`
markup for section headings, and this was saved in the parser cache.
Now it generates heading tags with placeholder attributes like
`<h2 data-mw-...="..." ...>...</h2>`, and they are replaced in a
post-cache transform to generate the final heading markup, similarly
to how section edit links already worked.
The purpose of these changes is to allow changing the final markup
depending on skin options without splitting the parser cache (T13555).
Deployment and undeployment safety:
* The new post-cache transform was already added in commit
Ibce512b3c4a52f74b2d2124f0159e306f2689ea5 for forward-compatibility
(so that if this patch is reverted, new parser cache entries
will still be shown correctly).
Implementation notes:
* There are many ways to keep the temporary information other than
`data-mw-...` attributes, but this way is the easiest to handle
in a post-cache transform (everything is on the DOM node we want
to modify), is compatible with other heading-enhancing code in
DiscussionTools and MobileFrontend, and remains human-readable
if the post-cache transform doesn't run.
* Sadly this code can't be reused to add section heading markup and
section edit links to Parsoid (T269630), because it lacks some of
the necessary metadata, and exposes the rest in ways that are
trickier to handle in a post-cache transform (on other DOM nodes
or outside the document).
Bug: T13555
Change-Id: I4eae18d9d16f54391daba0de82ad05e50f07f9eb
* Switch out raw Exceptions, mostly for InvalidArgumentExceptions.
* Fake exceptions triggered only to give Monolog a backtrace are, for
some reason, "traditionally" RuntimeExceptions, so we continue to
use that pattern in the remaining locations.
* Give up entirely on PostgresResultWrapper's resource vs. object mess.
* Drop now-unneeded false positive hits.
Change-Id: Id183ab60994cd9c6dc80401d4ce4de0ddf2b3da0
Parser::braceSubstitution is only called from PPFrame_Hash::expand with
the result of PPNode_Hash_Tree::splitRawTemplate, which always sets
'parts' to a PPNode_Hash_Array.
Parser::argSubstitution is similarly called without the unnecessary null
check.
The comment was introduced in e002df9 and, although true, even then
the ternary may have been made redundant by a previous refactor.
Change-Id: Ia1c5b8570c65c8e174c723dbd292e11c3a72f54d
Broadened the argument type to allow passing LinkTarget to:
* ParserOutput::addCategory()
* ParserOutput::addLanguageLink()
* ParserOutput::addLink()
* ParserOutput::addImage()
* ParserOutput::addTemplate()
This allows for a tighter interface with Parsoid's
ContentMetadataCollector class and avoids errors caused by passing the
wrong form of string title ("text" with spaces versus "dbkey" with
underscores).
There are a few performance problems remaining after this patch, which
only apply to use by Parsoid (not the legacy parser):
1. ::addLink() does inefficient db requests to fetch the page id for
each link if the optional $id parameter is not passed. These lookups
should be deferred and a LinkBatch used. (The legacy parser always
passes $id.)
2. ::addTemplate() similarly requires $page_id (and $rev_id) to be
passed, so is not currently usable by Parsoid.
3. ::addLanguageLink() uses Title::getFullText() which is not present
in LinkTarget and is currently implemented as a full Title lookup.
This is not an issue for the legacy parser, because it already has a
Title object so the lookup is a no-op, but could be improved for
Parsoid's use.
Bug: T296023
Change-Id: If21ec8563c8a619bdde7c0cb6534bb9009480a21
Pages that are fast to render can be omitted from the parser cache
to preserve disk space and cache write operations.
The threshold is configurable per namespace, so the tradeoff can
be evaluated based on different access patterns. For example, pages
that are accessed rarely, like file description pages on Commons,
may have a high threshold configured, while pages that are read
frequently, like Wikipedia articles, may be configured to be always
cached, using a threshold of 0.
Filtering is based on a time profile recorded in the ParserOutput.
A generic mechanism for capturing the timing profile is implemented
in the ContentHandler base class. Subclasses may implement a more
rigorous capture mechanism.
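As a rough illustration of the filtering rule described above (a Python
sketch, not the MediaWiki implementation; the names, the namespace
numbers, and the millisecond unit are assumptions made for the example):

```python
# Hypothetical per-namespace thresholds in milliseconds.
# A threshold of 0 means "always cache".
THRESHOLDS_MS = {
    0: 0,      # assumed: frequently read articles, always cached
    6: 1000,   # assumed: rarely read file description pages
}
DEFAULT_THRESHOLD_MS = 0


def should_cache(namespace: int, render_time_ms: float) -> bool:
    """Return True if a page that took render_time_ms to render
    is slow enough to be worth writing to the parser cache."""
    threshold = THRESHOLDS_MS.get(namespace, DEFAULT_THRESHOLD_MS)
    return render_time_ms >= threshold
```

With these assumed values, a file description page that renders in
200 ms would be skipped, while any article would still be cached.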
Bug: T346765
Change-Id: I38a6f3ef064f98f3ad6a7c60856b0248a94fe9ac
There are a couple of user-options-related classes already,
and the T321527 work on dynamic defaults is going to add
even more. Let's move them into a separate namespace
to make core a bit more organized.
The old name is kept as an alias for compatibility purposes.
Bug: T321527
Bug: T352284
Change-Id: I9822eb1553870b876d0b8a927e4e86c27d83bd52
This reverts commit 0791724ead.
Reason for revert: Breaks math rendering in Parsoid (and hence for all clients)
Change-Id: I9abe07060e5d11a9a1a2c953344eb50d4536e8c4
* See T351461 and T303015 for examples where calling top-level doc
parser hooks during extension processing causes problems further
downstream.
The hooks are: ParserAfterTidy and ParserAfterParse
* Since any extension that relies on those two hooks will need a
Parsoid-equivalent implementation to work properly with Parsoid,
we don't need to preemptively run those hooks on a sublevel doc.
We can instead let the Parsoid-compatible implementation process
the full doc.
* Accordingly, this patch removes the parseExtensionTagAsTopLevelDoc
method from Parser.php and has DataAccess::parseWikitext simply
call Parser::recursiveTagParseFully instead.
Change-Id: I58e693499e1a53e0814911dc2ea424aa822b8320
* This broke in 0e1b889a.
* HtmlHolder (via Remex) serializes self-closing meta tags without a
trailing / char.
* Separately, it is worth exploring whether HtmlHolder should use
Parsoid's XML serializer.
Co-Authored-By: C. Scott Ananian <cscott@cscott.net>
Co-Authored-By: Subramanya Sastry <ssastry@wikimedia.org>
Change-Id: I9fba68a8cfe63540fec83eb9c886e2956ba75660
Parse the heading contents as HTML. This makes it easier to strip out
some HTML tags using DOM operations, and ensures that we generate
balanced HTML at the end (T218330).
There are a few minor changes in behavior:
* [improvement] Fixed inconsistency with Parsoid in whitespace
handling around stripped tags (see changed test case 1)
* [bug fix] Allows `<span dir>` even when `dir` is not the first
attribute (see changed test case 2)
* [improvement] Unnecessary entities are no longer preserved in
the TOC (see changed test case 3a)
* [bug fix] Underscores in headings are preserved in section edit
link title (see changed test case 3b)
* [bug fix] Attributes on `<q>` tags are now correctly removed
(this behavior wasn't covered by a test case)
Bug: T218330
Change-Id: Ibad7480088b82a1fd515831a9813ce18c2b1f3ea
The main motivation is to further reduce the complexity of the class:
* There is no code that ever writes to $this->mSubstIDs. It's
effectively a constant.
* According to CodeSearch the getSubstIDs() method is not used
anywhere. It's @internal to the parser.
* I find it weird that the parser needs to call 2 factory methods to
do 1 thing.
* I still find it a good idea to keep the knowledge encapsulated in
the factory and not have the [ 'subst', 'safesubst' ] array in the
parser. That's why I propose the new method.
Change-Id: I5c147c75200c3c34a410d93a0328b56ea00a050f
This patch is intentionally "incomplete". It's limited to places
where we can be 100% sure about the type just from looking at the
code. More to be done in later patches.
Change-Id: Ideea49ea9603127038ef08c6a9805f40a0b86b6d
Intentionally split across multiple patches. This only touches
documentation and cannot break anything (other than Phan).
MagicWordArray::matchAndRemove is particularly confusing because the
documentation and structure of the returned array make it look like
it would support parameters. But it never (!) did.
The method was added like this in 2008 via commit 269a9103 (r31113).
There was always only a single caller in the Parser class. The
parser never used the array values, only the keys (via isset). That
makes sense because that code in the parser is about "double
underscore" magic words (e.g. __NOTOC__), which don't support
parameters anyway.
Change-Id: Ife92fc3d6d5b03606ba2b209a886cadef3451fea
Setting mTitle to null has been deprecated since 1.34.
Enforce this with a type declaration, now that this is possible in PHP 7.4.
To keep existing behavior, have getPage return null if mTitle is set
to Special:Badtitle/Missing. getTitle never returned null to begin with.
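The intended accessor behavior can be sketched as follows (Python, for
illustration only; the class and method names are hypothetical stand-ins,
and using the placeholder title as the initial value is an assumption):

```python
from typing import Optional

MISSING_TITLE = "Special:Badtitle/Missing"


class ParserSketch:
    """Hypothetical stand-in for the title handling described above."""

    def __init__(self) -> None:
        # Assumption: the placeholder is the initial value.
        self._title: str = MISSING_TITLE

    def set_title(self, title: str) -> None:
        # Mirrors the new type declaration: None is rejected outright.
        if not isinstance(title, str):
            raise TypeError("title must not be None")
        self._title = title

    def get_title(self) -> str:
        # Never returns None.
        return self._title

    def get_page(self) -> Optional[str]:
        # Keeps the old behavior: the placeholder title reads as "no page".
        return None if self._title == MISSING_TITLE else self._title
```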
Change-Id: I2e0f87265f88ed6db97957af4faee8733e27df79
The complexity is really not needed in these cases. strtr() does have
the behavior we want: It does all replacements at the same time instead
of sequentially.
We are also adding test cases for the previously uncovered
StringUtils::escapeRegexReplacement() we rely on in this patch.
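To illustrate the difference between simultaneous and sequential
replacement, here is a Python sketch that mimics the one-pass behavior
of strtr() with an array argument (the function names are made up for
this example and are not part of the patch):

```python
import re


def replace_simultaneously(text: str, pairs: dict) -> str:
    """Replace every key in a single pass; replaced text is never re-scanned."""
    # Ignore empty keys and try longer keys first, so a longer match
    # wins over one of its own prefixes.
    keys = sorted((k for k in pairs if k), key=len, reverse=True)
    if not keys:
        return text
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: pairs[m.group(0)], text)


def replace_sequentially(text: str, pairs: dict) -> str:
    """Apply one replacement after another; later ones see earlier output."""
    for search, replacement in pairs.items():
        text = text.replace(search, replacement)
    return text


swap = {"a": "b", "b": "a"}
print(replace_simultaneously("ab", swap))  # ba
print(replace_sequentially("ab", swap))    # aa
```

The sequential variant clobbers its own output, which is the problem
that a one-pass replacement avoids.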
Bug: T308395
Change-Id: I6741303775d6d54f3ad0d50635a986ff992ae8f4
Instead of replacing 1 character at a time, the functions used here
can replace sequences of any length. This can dramatically reduce the
function call overhead.
Also make use of the `fn ()` syntax because we can.
Change-Id: I2dbc2271aa7847d9b687703f837cb0d850596ef0
The subst: magic word gets removed from $part1, but the whitespace is
not removed. Trim $part1 after the removal so that the next step can
detect the variable: it uses a regex that allows no leading whitespace
and assumes the input has already been trimmed.
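A minimal Python sketch of the failure mode (both regexes are
simplified, hypothetical stand-ins for the real magic word matching):

```python
import re

# Hypothetical stand-in for stripping the subst:/safesubst: prefix.
SUBST_PREFIX = re.compile(r"^\s*(?:safesubst|subst):", re.IGNORECASE)
# Hypothetical stand-in for the variable check: anchored at the start,
# so leading whitespace makes it fail.
VARIABLE = re.compile(r"^[A-Z]+$")


def remove_subst(part1: str) -> str:
    """Strip the prefix only; whitespace after it is left in place."""
    return SUBST_PREFIX.sub("", part1, count=1)


raw = remove_subst("subst: CURRENTYEAR")      # " CURRENTYEAR"
print(bool(VARIABLE.match(raw)))              # False: leading space
print(bool(VARIABLE.match(raw.strip())))      # True once trimmed
```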
Bug: T340806
Change-Id: I8eea173bdf992511989b8a433c11032d3864abc1