Commit graph

7 commits

Author SHA1 Message Date
C. Scott Ananian
242c6d2cf9 Introduce ParserOutput::setFromParserOptions() and use for preview flag
Bug: T341010
Co-Authored-by: cananian <cananian@wikimedia.org>
Co-Authored-by: ihurbain <ihurbainpalatin@wikimedia.org>
Change-Id: I03125fdaa7dd71ba57d593e85ecb98be6806f3f6
2024-02-07 21:22:06 -05:00
Subramanya Sastry
6e5413b1d8 ParsoidParser: Record page title in ParserCache entries
* This lets post-cache transforms have access to the title.
* Specifically, DiscussionTools uses this to post-process the HTML.

Bug: T341010
Change-Id: I328f533e6cdb11c0c3a873d23bab1a113dfa39be
2023-10-30 13:36:36 -05:00
Subramanya Sastry
225be51fa7 ParsoidParser: Register watcher after creating ParserOutput object
* Updated documentation around this point
* Adjust tests to reflect this change.
* While it initially appeared that this can cause ParserCache impacts,
  'disableContentConversion' isn't part of the cache key and thus
  has no deployment impacts.

Change-Id: I535cb21cc104a358aa70829b030ae3751b76ae00
2023-10-17 17:51:19 -05:00
Daimona Eaytoy
6b1a62e169 Fix more non-database tests accessing the database
Mock the needed services, or set fixed values to avoid DB lookups, when
possible. Add the test to the Database group otherwise, e.g. for things
like Skin and Parser that use global state all over the place.

Change-Id: I8d87013d89accaf04d0ac19cb4b7216290383eb5
2023-08-06 15:30:41 +00:00
Subramanya Sastry
68805e2f50 ParsoidParser: Record ParserOptions watcher on ParserOutput object
* ParsoidParser hadn't registered a watcher on ParserOptions so far.
  Because of this, you can see that the current parser cache key
  (in deployed production code) doesn't have 'useParsoid=1' in it.

  Ex: View source on enwiki:Hospet shows that the parser cache key
  there is "enwiki:parsoid-pcache:idhash:2360619-0!canonical".

  The only reason this doesn't conflict with legacy parser output
  is because we use "parsoid-pcache", a different cache instance than
  "pcache" used for legacy parser output. But if/when we decide to use
  the same parser cache instance, this could cause cache corruptions.

  With FlaggedRevisions, where a single "stable-pcache" parser cache
  instance is used, in local testing, this was causing Parsoid HTML to be
  saved without "useParsoid=1", and so Parsoid HTML was being returned
  for legacy parser cache requests.

* In addition, fix the code in PageBundleParserOutputConverter to copy
  over internal metadata (which includes used options). This ensures
  that any tracked parser options aren't lost and the right parser cache
  key is constructed later on.

* Added / updated a number of tests that verify that usedOptions
  is tracked correctly in the useParsoid code paths. The tests fail
  without the code changes in this patch.
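
The watcher mechanism described above can be illustrated with a small Python model. This is only a sketch under stated assumptions, not MediaWiki code: `Options`, `Output`, `render` and `cache_key` are hypothetical names, and the key format is loosely modeled on the "idhash:...!canonical" example quoted above. How MediaWiki actually appends used options to the key is assumed here.

```python
class Options:
    """Toy stand-in for ParserOptions: every read is reported to a watcher."""

    def __init__(self, **values):
        self._values = values
        self._watcher = None

    def register_watcher(self, callback):
        self._watcher = callback

    def get(self, name):
        # Reading an option notifies the watcher, if one is registered.
        if self._watcher is not None:
            self._watcher(name)
        return self._values.get(name)

    def peek(self, name):
        # Read without notifying the watcher (used only to build keys).
        return self._values.get(name)


class Output:
    """Toy stand-in for ParserOutput: remembers which options were used."""

    def __init__(self):
        self.html = ""
        self.used_options = set()

    def record_option(self, name):
        self.used_options.add(name)


def render(options, register_watcher):
    output = Output()
    if register_watcher:
        options.register_watcher(output.record_option)
    flavor = "parsoid" if options.get("useParsoid") else "legacy"
    output.html = f"<p>{flavor}</p>"
    return output


def cache_key(page_id, options, used_options):
    # Hypothetical key format: only options recorded as used vary the key.
    parts = [f"{name}={options.peek(name)}" for name in sorted(used_options)]
    return "!".join([f"idhash:{page_id}-0", "canonical", *parts])
```

In this model, rendering with `register_watcher=False` leaves `used_options` empty, so the key for a `useParsoid=1` render is identical to the key a legacy render would get. With the watcher registered, the key gains a `useParsoid=1` component and the two outputs no longer collide in a shared cache instance.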

Bug: T340703
Bug: T335157
Needed-By: I0e954949768044eea6ec275a36d0d6d7ed457e8e
Change-Id: I076d5d362bdfd9d4b2ca8886bf6b30c1a746aee7
2023-07-11 10:53:11 -05:00
Umherirrender
d36073cdcf tests: Make some PHPUnit data providers static
Initially used a new sniff with autofix (T333745),
but some providers are defined non-static in the TestBase class
and need more work to make them static in a compatible way

Bug: T332865
Change-Id: I889d33424f0c01fb26f2d86f8d4fc3de3e568843
2023-05-20 01:05:27 +02:00
C. Scott Ananian
cfd9c516e1 Allow setting a ParserOption to generate Parsoid HTML
This is an initial quick-and-dirty implementation.  The
ParsoidParser class will eventually inherit from \Parser,
but this is an initial placeholder to unblock other Parsoid
read views work.

Currently Parsoid does not fully implement all the ParserOutput
metadata set by the legacy parser, but we're working on it.

This patch also addresses T300325 by ensuring the Page HTML
APIs use ParserOutput::getRawText(), which will return the entire
Parsoid HTML document without post-processing.  This is what
the Parsoid team refers to as "edit mode" HTML. The
ParserOutput::getText() method returns only the <body> contents
of the HTML, and applies several transformations, including
inserting Table of Contents and style deduplication; this is
the "read views" flavor of the Parsoid HTML.

We need to be careful of the interaction of the `useParsoid` flag with
the ParserCacheMetadata.  Effectively `useParsoid` should *always* be
marked as "used" or else the ParserCache will assume its value doesn't
matter and will serve legacy content for Parsoid requests and
vice versa.  T330677 is a follow-up to address this more thoroughly by
splitting the parser cache in ParserOutputAccess; the stopgap in this
patch is fragile and, because it doesn't fork the ParserCacheMetadata
cache, may corrupt the ParserCacheMetadata in the case when Parsoid
and the legacy parser consult different sets of options to render a
page.
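
The interaction described above can be modeled with a toy two-level cache in Python. This is a minimal sketch and not the MediaWiki implementation: `ToyParserCache` and its methods are hypothetical, and it assumes a cache that stores, per page, the set of option names marked as "used" and then derives the content key only from those options.

```python
class ToyParserCache:
    """Hypothetical model: per-page metadata decides which options vary the key."""

    def __init__(self):
        self.metadata = {}  # page_id -> set of option names marked as used
        self.entries = {}   # content key -> cached HTML

    def _key(self, page_id, options, used):
        varying = "!".join(f"{name}={options.get(name)}" for name in sorted(used))
        return f"{page_id}!{varying}"

    def save(self, page_id, options, used, html):
        # The metadata is shared by all renders of the page (it is not forked).
        self.metadata[page_id] = set(used)
        self.entries[self._key(page_id, options, used)] = html

    def get(self, page_id, options):
        used = self.metadata.get(page_id)
        if used is None:
            return None
        return self.entries.get(self._key(page_id, options, used))


# Flag not marked as used: a legacy request is served Parsoid HTML.
cache = ToyParserCache()
cache.save(1, {"useParsoid": True}, set(), "parsoid html")
wrong_hit = cache.get(1, {"useParsoid": False})

# Flag always marked as used: the legacy request misses instead.
cache = ToyParserCache()
cache.save(1, {"useParsoid": True}, {"useParsoid"}, "parsoid html")
miss = cache.get(1, {"useParsoid": False})
```

Because `save` overwrites the single per-page metadata entry, a later render that consults a different set of options would change how keys are derived for every earlier entry in this model. That is the kind of metadata corruption the paragraph above warns about.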

Bug: T300191
Bug: T330677
Bug: T300325
Change-Id: Ica09a4284c00d7917f8b6249e946232b2fb38011
2023-03-26 21:46:05 -04:00