Thijs/wiki.techinc.nl

Author	SHA1	Message	Date
Arlo Breault	5ed94aba15	Drop comments in cleanUpTocLine Needed-By: Ie6760dd25f937d4f6acbab1c0e1475b54878d4ed Change-Id: I10f96435f892b188cffe64b92cdf2701a3e2058b	2024-02-22 19:06:15 -05:00
Arlo Breault	909043c539	Remove empty spans while traversing in cleanUpTocLine Change-Id: I2d75bc6aa03c112c6e1dccd9a3b4f608cafde6cb	2024-02-20 19:09:00 -05:00
Arlo Breault	b05e4b98ce	Walk the dom instead of using a queryselector in cleanUpTocLine Change-Id: Ic59a4883f5b830c0c513e1836ad0de7c29a4b96d	2024-02-20 18:54:40 -05:00
Arlo Breault	89ddae6805	Remove metadata content while traversing all nodes in cleanUpTocLine Change-Id: I900cff697b1d644140d0a8755ba601d8f94abb3e	2024-02-20 18:52:28 -05:00
Subramanya Sastry	e55cc517da	Move Parser to MediaWiki\Parser namespace Bug: T166010 Co-Authored-By: Daimona Eaytoy <daimona.wiki@gmail.com> Co-Authored-By: James Forrester <jforrester@wikimedia.org> Co-Authored-By: Subramanya Sastry <ssastry@wikimedia.org> Change-Id: I79b4e732c45095eedbaa80afa5eb7479b387ed8a	2024-02-16 09:18:38 -05:00
C. Scott Ananian	f7ba84855a	Parser::getExternalLinkAttribs: Don't set rel attribute to null Parser::getExternalLinkRel() is defined to return `null` if there's no attribute to add, but then ParserOptions::getExternalLinkTarget() may try to append to it and external users might try to actually pass the $attribs to (eg) Xml::element() and become unhappy if the value is `null`. Bug: T357668 Followup-To: Ifec733a923f193b72eaba9a1e604ad4e56c0aef2 Change-Id: I907c22ef070616d81b9a50b0e807a7b8f78b59b5	2024-02-15 17:32:28 -05:00
C. Scott Ananian	e72e1cd163	Revert "Move section heading formatting to post-cache transform" This reverts commit `de0646843a`. Reason for revert: caused T357723. Change-Id: I4690c03a34e8796090563e19a214d8ede63fe5d1	2024-02-15 20:58:32 +00:00
Bartosz Dziewoński	de0646843a	Move section heading formatting to post-cache transform Previously, Parser.php used Linker::makeHeadline() in order to generate the `<h2><span class="mw-headline" id="...">...</span></h2>` markup for section headings, and this was saved in the parser cache. Now it generates heading tags with placeholder attributes like `<h2 data-mw-...="..." ...>...</h2>`, and they are replaced in a post-cache transform to generate the final heading markup, similarly to how section edit links already worked. The purpose of these changes is to allow changing the final markup depending on skin options without splitting the parser cache (T13555). Deployment and undeployment safety: * The new post-cache transform has been already added in commit Ibce512b3c4a52f74b2d2124f0159e306f2689ea5 for forward-compatibility (so that if this patch is reverted, new parser cache entries will still be shown correctly). Implementation notes: * There are many ways to keep the temporary information other than `data-mw-...` attributes, but this way is the easiest to handle in a post-cache transform (everything is on the DOM node we want to modify), is compatible with other heading-enhancing code in DiscussionTools and MobileFrontend, and remains human-readable if the post-cache transform doesn't run. * Sadly this code can't be reused to add section heading markup and section edit links to Parsoid (T269630), because it lacks some of the necessary metadata, and exposes the rest in ways that are trickier to handle in a post-cache transform (on other DOM nodes or outside the document). Bug: T13555 Change-Id: I4eae18d9d16f54391daba0de82ad05e50f07f9eb	2024-02-15 13:09:08 -05:00
jenkins-bot	cb6d6e8bae	Merge "Parser: Convert wikitext entities to HTML entities in TOC"	2024-02-12 19:52:55 +00:00
James D. Forrester	102a4f8a35	build: Upgrade mediawiki/mediawiki-phan-config from 0.13.0 to 0.14.0 manually * Switch out raw Exceptions, mostly for InvalidArgumentExceptions. * Fake exceptions triggered to give Monolog a backtrace are for some reason "traditionally" RuntimeExceptions, instead, so we continue to use that pattern in remaining locations. * Just entirely give up on PostgresResultWrapper's resource vs. object mess. * Drop now-unneeded false positive hits. Change-Id: Id183ab60994cd9c6dc80401d4ce4de0ddf2b3da0	2024-02-10 02:22:41 +00:00
Bartosz Dziewoński	fb1be73a07	Parser: Convert wikitext entities to HTML entities in TOC Bug: T355386 Bug: T324763 Change-Id: Ic0a805f29c928d0c2edf266ea045b0d29bb45a28	2024-02-09 02:00:38 +00:00
jenkins-bot	e831aa9c8b	Merge "Namespace includes/context"	2024-02-08 18:04:34 +00:00
James D. Forrester	4bae64d1c7	Namespace includes/context Bug: T353458 Change-Id: I4dbef138fd0110c14c70214282519189d70c94fb	2024-02-08 11:07:01 -05:00
C. Scott Ananian	242c6d2cf9	Introduce ParserOutput::setFromParserOptions() and use for preview flag Bug: T341010 Co-Authored-by: cananian <cananian@wikimedia.org> Co-Authored-by: ihurbain <ihurbainpalatin@wikimedia.org> Change-Id: I03125fdaa7dd71ba57d593e85ecb98be6806f3f6	2024-02-07 21:22:06 -05:00
Daimona Eaytoy	7acfa6a0a5	Replace more instances of unchecked MWException Most (all?) of the remaining usages are caught somewhere and will be migrated later. Bug: T328220 Change-Id: I5c36693a5361dd75b4f1e7a0bab5ad48626ed75c	2024-01-23 16:20:53 +00:00
Arlo Breault	4318039a23	Remove redundant internal tag Change-Id: I09b282324ae8d6307ae963bede4848dbdfb2a150	2024-01-17 17:55:52 -05:00
Arlo Breault	4b987168d0	Remove unnecessary null check from Parser::braceSubstitution Parser::braceSubstitution is only called from PPFrame_Hash::expand with the result of PPNode_Hash_Tree::splitRawTemplate, which always sets 'parts' to a PPNode_Hash_Array. Parser::argSubstitution is similarly called without the unnecessary null check. The comment was introduced in `e002df9` and, although true, even then the ternary may have been made redundant by a previous refactor. Change-Id: Ia1c5b8570c65c8e174c723dbd292e11c3a72f54d	2024-01-17 17:42:10 -05:00
Bartosz Dziewoński	c2c4645fa2	Parser: Normalize dot segments in URL paths Bug: T352827 Change-Id: Id90a26b656067481039fa77080417f34347f9c22	2024-01-04 01:46:33 +01:00
Fomafix	45c450aacb	Parser: Remove hard-deprecated getCustomDefaultSort and setDefaultSort getCustomDefaultSort and setDefaultSort are unused: * https://codesearch.wmcloud.org/search/?q=getCustomDefaultSort * https://codesearch.wmcloud.org/search/?q=setDefaultSort and are hard-deprecated since `dc3d489156` included in MediaWiki 1.38. Change-Id: Ib9a9622d50a5807f55be91885e473b90f98c2cb9	2023-12-29 11:19:28 +00:00
James D. Forrester	9bfb75ff90	Namespace ParserOutput Most used non-namespaced class! Bug: T353458 Change-Id: I4c2cbb0a808b3881a4d6ca489eee5d8c8ebf26cf	2023-12-14 14:57:34 -05:00
jenkins-bot	c57120300a	Merge "ParserOutput: Allow passing LinkTarget to title-related methods"	2023-12-11 18:02:25 +00:00
Isabelle Hurbain-Palatin	a3f51c732d	Refactor DefaultOutputTransform into a pipeline of transforms Bug: T348253 Change-Id: I53551ec6d6471569709c71c1155729e550f64de8	2023-12-08 18:06:19 -05:00
C. Scott Ananian	4b83285954	ParserOutput: Allow passing LinkTarget to title-related methods Broadened the argument type to allow passing LinkTarget to: * ParserOutput::addCategory() * ParserOutput::addLanguageLink() * ParserOutput::addLink() * ParserOutput::addImage() * ParserOutput::addTemplate() This allows for a tighter interface with Parsoid's ContentMetadataCollector class and avoids errors caused by passing the wrong form of string title ("text" with spaces versus "dbkey" with underscores). There are a few performance problems remaining after this patch, which only apply to use by Parsoid (not the legacy parser): 1. ::addLink() does inefficient db requests to fetch the page id for each link if the optional $id parameter is not passed. These lookups should be deferred and a LinkBatch used. (The legacy parser always passes $id.) 2. ::addTemplate() similarly requires $page_id (and $rev_id) to be passed, so is not currently usable by Parsoid. 3. ::addLanguageLink() uses Title::getFullText() which is not present in LinkTarget and is currently implemented as a full Title lookup. This is not an issue for the legacy parser, because it already has a Title object so the lookup is a no-op, but could be improved for Parsoid's use. Bug: T296023 Change-Id: If21ec8563c8a619bdde7c0cb6534bb9009480a21	2023-12-08 17:50:29 -05:00
jenkins-bot	b7fc1b2f43	Merge "Only cache expensive renderings"	2023-11-30 21:24:34 +00:00
daniel	e3fb964439	Only cache expensive renderings Pages that are fast to render can be omitted from the parser cache to preserve disk space and cache write operations. The threshold is configurable per namespace, so the tradeoff can be evaluated based on different access patterns. For example, pages that are accessed rarely, like file description pages on commons, may have a high threshold configured, while pages that are read frequently, like wikipedia articles, may be configured to be always cached, using a 0 threshold. Filtering is based on a time profile recorded in the ParserOutput. A generic mechanism for capturing the timing profile is implemented in the ContentHandler base class. Subclasses may implement a more rigorous capture mechanism. Bug: T346765 Change-Id: I38a6f3ef064f98f3ad6a7c60856b0248a94fe9ac	2023-11-30 20:56:12 +00:00
Martin Urbanec	29af4dd074	Move user options related classes into its own namespace There are a couple of user options related classes already, and the T321527 work on dynamic defaults is going to add even more. Let's move them into a separate namespace to make core a bit more organized. Old name is kept as an alias for compatibility purposes. Bug: T321527 Bug: T352284 Change-Id: I9822eb1553870b876d0b8a927e4e86c27d83bd52	2023-11-29 13:27:13 +01:00
Subramanya Sastry	00d64e4156	Revert "Parsoid DataAccess: Stop processing extensions as top-level docs" This reverts commit `0791724ead`. Reason for revert: Breaks math rendering in Parsoid (and hence for all clients) Change-Id: I9abe07060e5d11a9a1a2c953344eb50d4536e8c4	2023-11-28 03:59:19 +00:00
Subramanya Sastry	0791724ead	Parsoid DataAccess: Stop processing extensions as top-level docs * See T351461 and T303015 for examples where calling top-level doc parser hooks during extension processing causes problems further downstream. The hooks are: ParserAfterTidy and ParserAfterParse * Since any extension that relies on those two hooks will need a Parsoid-equivalent implementation to work properly with Parsoid, we don't need to preemptively run those hooks on a sublevel doc. We can instead let the Parsoid-compatible implementation process the full doc. * Accordingly, this patch removes the parseExtensionTagAsTopLevelDoc method from Parser.php and has DataAccess::parseWikitext simply call Parser::recursiveTagParseFully instead. Change-Id: I58e693499e1a53e0814911dc2ea424aa822b8320	2023-11-26 22:23:35 -06:00
C. Scott Ananian	3f23b09748	[parser] Broaden TOC placeholder regular expression * This broke in `0e1b889a`. * HtmlHolder (via Remex) serializes self-closing meta tags without a trailing / char. * Separately, worth exploring if HtmlHolder should use Parsoid's XML serializer. Co-Authored-By: C. Scott Ananian <cscott@cscott.net> Co-Authored-By: Subramanya Sastry <ssastry@wikimedia.org> Change-Id: I9fba68a8cfe63540fec83eb9c886e2956ba75660	2023-11-21 17:26:54 +00:00
Bartosz Dziewoński	68ccfa46ad	Use DOM to clean up headings for the table of contents (TOC) Parse the heading contents as HTML. This makes it easier to strip out some HTML tags using DOM operations, and ensures that we generate balanced HTML at the end (T218330). There are a few minor changes in behavior: * [improvement] Fixed inconsistency with Parsoid in whitespace handling around stripped tags (see changed test case 1) * [bug fix] Allows `<span dir>` even when `dir` is not the first attribute (see changed test case 2) * [improvement] Unnecessary entities are no longer preserved in the TOC (see changed test case 3a) * [bug fix] Underscores in headings are preserved in section edit link title (see changed test case 3b) * [bug fix] Attributes on `<q>` tags are now correctly removed (this behavior wasn't covered by a test case) Bug: T218330 Change-Id: Ibad7480088b82a1fd515831a9813ce18c2b1f3ea	2023-11-17 18:27:46 +01:00
thiemowmde	10a828ba72	Deprecate MagicWordFactory::getSubstIDs The main motivation is to further reduce the complexity of the class: * There is no code that ever writes to $this->mSubstIDs. It's effectively a constant. * According to CodeSearch the getSubstIDs() method is not used anywhere. It's @internal to the parser. * I find it weird that the parser needs to call 2 factory methods to do 1 thing. * I still find it a good idea to keep the knowledge encapsulated in the factory and not have the [ 'subst', 'safesubst' ] array in the parser. That's why I propose the new method. Change-Id: I5c147c75200c3c34a410d93a0328b56ea00a050f	2023-11-13 11:10:24 +01:00
jenkins-bot	c544883e84	Merge "Strip state from attributes before inserting them"	2023-10-23 14:35:42 +00:00
jenkins-bot	3285c8d5d3	Merge "parser: Add strict type constraints to MagicWord… classes"	2023-10-18 15:32:31 +00:00
jenkins-bot	70ef48b846	Merge "Improve performance of trivial encoding/decoding regexes"	2023-10-17 20:54:11 +00:00
thiemowmde	2e0301e634	parser: Add strict type constraints to MagicWord… classes This patch is intentionally "incomplete". It's limited to places where we can be 100% sure about the type just from looking at the code. More to be done in later patches. Change-Id: Ideea49ea9603127038ef08c6a9805f40a0b86b6d	2023-10-16 10:36:36 +02:00
jenkins-bot	f98ae5faa9	Merge "parser: Improve PHPDoc type hints in MagicWord… classes"	2023-10-13 00:34:27 +00:00
thiemowmde	bef3da3210	parser: Improve PHPDoc type hints in MagicWord… classes Intentionally split across multiple patches. This is only about documentation and impossible to break anything (other than Phan). MagicWordArray::matchAndRemove is particularly confusing because the documentation and structure of the returned array make it look like it would support parameters. But it never (!) did. The method was added like this in 2008 via commit `269a9103` (r31113). There was always only a single caller in the Parser class. The parser never used the array values, only the keys (via isset). Which makes sense because that code in the parser is about "double underscore" magic words (e.g. __NOTOC__). These don't support parameters anyway. Change-Id: Ife92fc3d6d5b03606ba2b209a886cadef3451fea	2023-10-11 00:07:19 +00:00
mainframe98	8451cbfa87	Parser: remove usages of $wgTitle Change-Id: Iaff236f096c2b8a966da01479b80e98b76e80425	2023-10-10 01:34:18 +00:00
mainframe98	fdfc99e01f	Parser: Remove ability to initialize mTitle to null Setting mTitle to null has been deprecated since 1.34. Enforce this with a type declaration, now that this is possible in PHP 7.4. To keep existing behavior, have getPage return null if mTitle is set to Special:Badtitle/Missing. getTitle never returned null to begin with. Change-Id: I2e0f87265f88ed6db97957af4faee8733e27df79	2023-10-09 19:32:37 +02:00
Isabelle Hurbain-Palatin	6c109970a8	Strip state from attributes before inserting them This patch fixes the referenced bug by resolving strip markers before they get stashed in an attribute. There is some concern about breaking out of the attribute (see https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/core/+/refs/heads/master/includes/parser/Parser.php#175 around that topic), but these seem to be taken care of by the wrapping in htmlspecialchars. Bug: T347552 Change-Id: I6ce45e56c00ce8eff7e178746502afa946aba768	2023-10-09 18:19:34 +02:00
jenkins-bot	1005e7c9b9	Merge "parser: Fix detection of variable with whitespace after subst:"	2023-10-07 19:43:48 +00:00
thiemowmde	06051e1256	Replace complex preg_replace_callback with strtr/preg_replace The complexity is really not needed in these cases. strtr() does have the behavior we want: It does all replacements at the same time instead of sequentially. We are also adding test cases for the previously uncovered StringUtils::escapeRegexReplacement() we rely on in this patch. Bug: T308395 Change-Id: I6741303775d6d54f3ad0d50635a986ff992ae8f4	2023-10-05 10:47:46 +02:00
thiemowmde	f5cd1ba7ca	Improve performance of trivial encoding/decoding regexes Instead of replacing 1 character at a time the functions used here can replace sequences of any length. This can dramatically reduce the function call overhead. Also make use of the `fn ()` syntax because we can. Change-Id: I2dbc2271aa7847d9b687703f837cb0d850596ef0	2023-10-04 11:09:44 +02:00
Umherirrender	b718462479	parser: Hard-deprecate Parser::getFreshParser Bug: T325959 Depends-On: I301cfecd95db04585e0f65b7919ea1c2e2bbff2a Change-Id: I97938348407e3096187cfb41adb433a09ac77866	2023-10-03 17:01:22 +02:00
Umherirrender	87fadf2484	parser: Fix detection of variable with whitespace after subst: The subst: magic word gets removed from $part1, but the whitespace is not removed, so trim $part1 after the removal to ensure the next step can detect the variable. That step uses a regex that does not allow leading whitespace, on the assumption that the code has already trimmed. Bug: T340806 Change-Id: I8eea173bdf992511989b8a433c11032d3864abc1	2023-10-01 18:30:15 +00:00
James D. Forrester	468e69bccc	Namespace Sanitizer under \MediaWiki\Parser Bug: T166010 Change-Id: Id13dcbf7a0372017495958dbc4f601f40c122508	2023-09-21 05:39:23 +00:00
James D. Forrester	1d0b7ae1e2	Namespace User under \MediaWiki\User Bug: T166010 Change-Id: I7257302b485588af31384d4f7fc8e30551f161f1	2023-09-19 19:18:16 +00:00
jenkins-bot	14e52d187d	Merge "Parser: use PHPDoc comments on properties, typed private properties"	2023-09-19 05:53:21 +00:00
James D. Forrester	5bc2a04b08	Namespace remaining Title-related classes under \MediaWiki\Title Bug: T166010 Change-Id: Ia2e5a7367cc8cdbd8a7b845ae2fd5d776ff22891	2023-09-19 05:21:23 +00:00
James D. Forrester	b16be7a36c	Namespace TitleFormatter under \MediaWiki\Title One of the big ones, so doing this alone. Bug: T166010 Change-Id: Ic2d59eb6764b1a273ed7162ecabf641f638b8f66	2023-09-19 05:17:18 +00:00
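
Commit 06051e1256 above relies on strtr() doing all replacements at the same time instead of sequentially. The difference can be illustrated with a minimal Python sketch; it is not code from the repository, and the helper names `replace_sequentially` and `replace_simultaneously` are hypothetical. The single-pass version assumes PHP's documented array form of strtr(), which tries the longest keys first and never rescans text it has already replaced.

```python
import re


def replace_sequentially(text: str, pairs: dict[str, str]) -> str:
    """Apply each replacement in turn, so later pairs see earlier output."""
    for old, new in pairs.items():
        text = text.replace(old, new)
    return text


def replace_simultaneously(text: str, pairs: dict[str, str]) -> str:
    """Replace every key in one left-to-right pass (assumed strtr-like behavior)."""
    # Skip empty keys: they would match everywhere in a regex alternation.
    keys = [k for k in pairs if k]
    if not keys:
        return text
    # Longest keys first, so "ab" wins over "a" when both could match.
    keys.sort(key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # Replaced output is never rescanned, so swaps cannot cascade.
    return pattern.sub(lambda m: pairs[m.group(0)], text)


if __name__ == "__main__":
    swap = {"a": "b", "b": "a"}
    print(replace_sequentially("ab", swap))    # aa (the second step undoes the first)
    print(replace_simultaneously("ab", swap))  # ba
```

The sequential version turns "ab" into "bb" and then into "aa", while the single-pass version swaps the two characters as intended. This cascading is why chaining one replacement after another is not a safe substitute when keys and values overlap.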


1480 commits