Thijs/wiki.techinc.nl

Author	SHA1	Message	Date
C. Scott Ananian	242c6d2cf9	Introduce ParserOutput::setFromParserOptions() and use for preview flag Bug: T341010 Co-Authored-by: cananian <cananian@wikimedia.org> Co-Authored-by: ihurbain <ihurbainpalatin@wikimedia.org> Change-Id: I03125fdaa7dd71ba57d593e85ecb98be6806f3f6	2024-02-07 21:22:06 -05:00
Subramanya Sastry	6e5413b1d8	ParsoidParser: Record page title in ParserCache entries * This lets post-cache transforms have access to the title. * Specifically, DiscussionTools uses this to post-process the HTML. Bug: T341010 Change-Id: I328f533e6cdb11c0c3a873d23bab1a113dfa39be	2023-10-30 13:36:36 -05:00
Subramanya Sastry	225be51fa7	ParsoidParser: Register watcher after creating ParserOutput object * Updated documentation around this point * Adjust tests to reflect this change. * While it initially appeared that this can cause ParserCache impacts, 'disableContentConversion' isn't part of the cache key and thus has no deployment impacts. Change-Id: I535cb21cc104a358aa70829b030ae3751b76ae00	2023-10-17 17:51:19 -05:00
Daimona Eaytoy	6b1a62e169	Fix more non-database tests accessing the database Mock the needed services, or set fixed values to avoid DB lookups, when possible. Add the test to the Database group otherwise, e.g. for things like Skin and Parser that use global state all over the place. Change-Id: I8d87013d89accaf04d0ac19cb4b7216290383eb5	2023-08-06 15:30:41 +00:00
Subramanya Sastry	68805e2f50	ParsoidParser: Record ParserOptions watcher on ParserOutput object * ParsoidParser hadn't registered a watcher on ParserOptions so far. Because of this, you can see that the current parser cache key (in deployed production code) doesn't have 'useParsoid=1' in it. Ex: View source on enwiki:Hospet shows that the parser cache key there is "enwiki:parsoid-pcache:idhash:2360619-0!canonical". The only reason this doesn't conflict with legacy parser output is because we use "parsoid-pcache", a different cache instance than "pcache" used for legacy parser output. But if/when we decide to use the same parser cache instance, this could cause cache corruptions. With FlaggedRevisions, where a single "stable-pcache" parser cache instance is used, in local testing, this was causing Parsoid HTML to be saved without "useParsoid=1", and so Parsoid HTML was being returned for legacy parser cache requests. * In addition, fix the code in PageBundleParserOutputConverter to copy over internal metadata (which includes used options). This ensures that any tracked parser options aren't lost and the right parser cache key is constructed later on. * Added / updated a number of new tests that verify that usedOptions is tracked correctly in the useParsoid code paths. The tests fail without the code changes in this patch. Bug: T340703 Bug: T335157 Needed-By: I0e954949768044eea6ec275a36d0d6d7ed457e8e Change-Id: I076d5d362bdfd9d4b2ca8886bf6b30c1a746aee7	2023-07-11 10:53:11 -05:00
Umherirrender	d36073cdcf	tests: Make some PHPUnit data providers static Initially used a new sniff with autofix (T333745), but some providers are defined non-static in TestBase class and need more work to make them static in a compatible way Bug: T332865 Change-Id: I889d33424f0c01fb26f2d86f8d4fc3de3e568843	2023-05-20 01:05:27 +02:00
C. Scott Ananian	cfd9c516e1	Allow setting a ParserOption to generate Parsoid HTML This is an initial quick-and-dirty implementation. The ParsoidParser class will eventually inherit from \Parser, but this is an initial placeholder to unblock other Parsoid read views work. Currently Parsoid does not fully implement all the ParserOutput metadata set by the legacy parser, but we're working on it. This patch also addresses T300325 by ensuring that the Page HTML APIs use ParserOutput::getRawText(), which will return the entire Parsoid HTML document without post-processing. This is what the Parsoid team refers to as "edit mode" HTML. The ParserOutput::getText() method returns only the <body> contents of the HTML, and applies several transformations, including inserting Table of Contents and style deduplication; this is the "read views" flavor of the Parsoid HTML. We need to be careful of the interaction of the `useParsoid` flag with the ParserCacheMetadata. Effectively `useParsoid` should always be marked as "used" or else the ParserCache will assume its value doesn't matter and will serve legacy content for parsoid requests and vice-versa. T330677 is a follow up to address this more thoroughly by splitting the parser cache in ParserOutputAccess; the stop gap in this patch is fragile and, because it doesn't fork the ParserCacheMetadata cache, may corrupt the ParserCacheMetadata in the case when Parsoid and the legacy parser consult different sets of options to render a page. Bug: T300191 Bug: T330677 Bug: T300325 Change-Id: Ica09a4284c00d7917f8b6249e946232b2fb38011	2023-03-26 21:46:05 -04:00

7 commits