This avoids confusion with the "render timestamp" held by the cache,
and is consistent with ::get*RevisionId() etc.
The old ::getTimestamp() and ::setTimestamp() methods have been
deprecated.
Change-Id: Idb5e687709c98086c5d3075d31885c58a0723197
Set the render ID for each parse stored into cache so that we are able
to identify a specific parse when there are dependencies (for example
in an edit based on that parse). This is recorded as a property added
to the ParserOutput, not the parent CacheTime interface. Even though
the render ID is /related/ to the CacheTime interface, CacheTime is
also used directly as a parser cache key, and the UUID should not be
part of the lookup key.
In general we are trying to move the location where these cache
properties are set as early as possible, so we check at each location
to ensure we don't overwrite a previously-set value. Eventually we
can convert most of these checks into assertions that the cache
properties have already been set (T350538). The primary location for
setting cache properties is the ContentRenderer.
Moved setting the revision timestamp into ContentRenderer as well, as
it was set along the same code paths. An extra parameter was added to
ContentRenderer::getParserOutput() to support this.
Added merge code to ParserOutput::mergeInternalMetaDataFrom() which
should ensure that cache time, revision, timestamp, and render ID are
all set properly when multiple slots are combined together in MCR.
In order to ensure the render ID is set on all codepaths we needed to
plumb the GlobalIdGenerator service into ContentRenderer, ParserCache,
ParserCacheFactory, and RevisionOutputCache. Eventually (T350538) it
should only be necessary in the ContentRenderer.
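
As a rough illustration of the two ideas above (the render ID lives on
the output rather than in the lookup key, and a previously-set ID is
never overwritten), here is a minimal Python sketch. It is not the
MediaWiki code: CacheKey, RenderedOutput, ensure_render_id() and save()
are hypothetical names, and uuid4 merely stands in for whatever the
GlobalIdGenerator service produces.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class CacheKey:
    """Hypothetical lookup key: deliberately has no render ID field."""
    page_id: int
    options_hash: str


@dataclass
class RenderedOutput:
    """Hypothetical stand-in for a parse result carrying its render ID."""
    html: str
    render_id: Optional[str] = None


def ensure_render_id(output: RenderedOutput) -> None:
    # Only set the ID if an earlier step has not already done so.
    if output.render_id is None:
        output.render_id = str(uuid.uuid4())  # assumption: any UUID will do


def save(cache: Dict[CacheKey, RenderedOutput], key: CacheKey,
         output: RenderedOutput) -> None:
    # The render ID identifies this specific parse, but lookups use only the key.
    ensure_render_id(output)
    cache[key] = output
```

Because the key omits the render ID, a reader can find the entry without
knowing which parse produced it, and can then read the ID from the
stored output.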
Bug: T350538
Bug: T349868
Followup-To: Ic9b7cc0fcf365e772b7d080d76a065e3fd585f80
Change-Id: I72c5e6f86b7f081ab5ce7a56f5365d2f75067a78
This has been replaced by ::getLinkTarget(), which returns a Parsoid
LinkTarget. This is identical to the core LinkTarget interface, but
we can't quite alias them for technical reasons (sigh). In actual
practice, LinkTargets generated by core are usually Title objects, so
Title::newFromLinkTarget() is a no-op that just returns the argument
after a type check.
It appears that newer code uses a TitleFormatter rather than calling
methods on Title, but TitleFormatter currently takes LinkTarget not a
ParsoidLinkTarget. That would force us to go via
TitleValue::newFromLinkTarget() which isn't a simple type check.
Change-Id: I490bb38108d0202b43ea2a9b391b2e664e7d2d48
This relocates the code added in 95d3c025b0.
Also: this is just a small bit of extra CSS, so it can be a ModuleStyle
not a full Module.
Bug: T335157
Depends-On: I9320e3083d2e71db42fb1348dcd3bea01d22cc5c
Change-Id: Iadedba5b41190ea4665f28db61f9565d914774b3
Pages that are fast to render can be omitted from the parser cache
to preserve disk space and cache write operations.
The threshold is configurable per namespace, so the tradeoff can
be evaluated based on different access patterns. For example, pages
that are accessed rarely, like file description pages on Commons,
may have a high threshold configured, while pages that are read
frequently, like Wikipedia articles, may be configured to be always
cached, using a 0 threshold.
Filtering is based on a time profile recorded in the ParserOutput.
A generic mechanism for capturing the timing profile is implemented
in the ContentHandler base class. Subclasses may implement a more
rigorous capture mechanism.
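
The filtering decision described above can be sketched as follows. This
is a minimal Python illustration under stated assumptions, not the
actual implementation: should_cache(), the millisecond unit, the
namespace numbers, and the ">=" comparison are all hypothetical.

```python
from typing import Mapping


def should_cache(render_ms: float, namespace: int,
                 thresholds_ms: Mapping[int, float],
                 default_threshold_ms: float = 0.0) -> bool:
    """Return True if the parse took long enough to be worth caching."""
    threshold = thresholds_ms.get(namespace, default_threshold_ms)
    # A threshold of 0 means every parse in that namespace is cached.
    return render_ms >= threshold


# Hypothetical configuration: namespace 0 is always cached, while
# namespace 6 is cached only when rendering took at least 500 ms.
THRESHOLDS_MS = {0: 0.0, 6: 500.0}
```

With this configuration, a 3 ms render in namespace 0 is still cached,
while a 120 ms render in namespace 6 is skipped.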
Bug: T346765
Change-Id: I38a6f3ef064f98f3ad6a7c60856b0248a94fe9ac
* This lets post-cache transforms have access to the title.
* Specifically, DiscussionTools uses this to post-process the HTML.
Bug: T341010
Change-Id: I328f533e6cdb11c0c3a873d23bab1a113dfa39be
* I had already used this on one property of one file here. I then
noticed that Isabelle used it on a newly created class in output
transform, which prompted me to switch over all these files.
* I am about to start adding new files here for new hooks for
DiscussionTools, so I updated everything in this namespace
to keep usage consistent.
* This exposed initialization and bad typing issues in
SiteConfig.php and LanguageVariantConverter.php
Change-Id: I35f131a8f584ccc82a915dbfb1b50b3ef1ec6b06
* Updated documentation around this point
* Adjust tests to reflect this change.
* While it initially appeared that this could affect the ParserCache,
'disableContentConversion' isn't part of the cache key, so there is
no deployment impact.
Change-Id: I535cb21cc104a358aa70829b030ae3751b76ae00
* Updated ParserOutput to set Parsoid render ids that REST API
functionality expects in ParserOutput objects.
* CacheThresholdTime functionality no longer exists since it was
implemented in ParsoidOutputAccess and ParserOutputAccess doesn't
support it. This is tracked in T346765.
* Enforce the constraint that uncacheable parses are only for fake or
mutable revisions. Updated tests that violated this constraint to
use 'getParseOutput' instead of calling the parse method directly.
* Had to make some changes in ParsoidParser around use of preferredVariant
passed to Parsoid. I also left some TODO comments for future fixes.
T267067 is also relevant here.
PARSOID-SPECIFIC OPTIONS:
* logLinterData: linter data is always logged by default -- removed
support to disable it. Linter extension handles stale lints properly
and it is better to let it handle it rather than add special cases
to the API.
* offsetType: Moved this support to ParsoidHandler as a post-processing
of byte-offset output. This eliminates the need to support these
Parsoid-specific options in the ContentHandler hierarchies.
* body_only / wrapSections: Handled this in HtmlOutputRendererHelper
as a post-processing of regular output by removing sections and
returning the body content only. This does result in some useless
section-wrapping work with Parsoid, but the simplification is probably
worth it. If we support Parsoid-specific options in the ContentHandler
hierarchy in the future, we could re-introduce this. In any case,
this "fragment" flavor option is likely to get moved out of
core into the VisualEditor extension code.
DEPLOYMENT:
* This patch changes the cache key by setting the useParsoid option
in ParserOptions. The parent patch handles this to ensure we don't
encounter a cold cache on deploy.
TESTS:
* Updated tests and mocks to reflect new reality.
* Do we need any new tests?
Bug: T332931
Change-Id: Ic9b7cc0fcf365e772b7d080d76a065e3fd585f80
* Explicitly set wrapSections to true. This has no significant
impact since it defaults to true within Parsoid.
* 'pageName' and 'prefix' removed from ParsoidOutputAccess since
they are not needed / used in Parsoid.
* 'logLinterData' needs to be set in the ParserOutputAccess paths.
* A bunch of documentation FIXMEs as I was digging through the code.
* Record a FIXME that ParsoidOutputAccess and ParsoidParser (which
is used in the ParserOutputAccess code path) differ in how they
handle the language value (whether the default value of the title /
page or the pageLanguageOverride from the REST API). ParsoidParser
computes a preferred variant whereas ParsoidOutputAccess right now
does NOT do that. So, as part of the switchover to ParserOutputAccess,
we will need to set disableContentConversion in ParserOptions.
That will happen in a later patch.
Bug: T332931
Change-Id: I7326ae3452a7d496a57f5c4ff2ddeaf0daa7ab70
LanguageVariantConverterUnitTest: don't mock a method in the Parsoid
class that no longer exists.
ParsoidParser: pass a Bcp47Code (in the form of a Language object),
not a string, when selecting the preferred variant for the output
Followup-To: Ib8554f98b1c653df3864110e0e66796b8da67b5f
Change-Id: I32fd64a9495b8aed729b0b5b00535180006e0223
Now that the latest Parsoid has been released to mediawiki-vendor,
the method_exists() calls aren't necessary.
Bug: T343155
Followup-To: I9da2566cc003e2f05cae16229444dcf3baf61fa4
Change-Id: I081225a268d608f763814245f9cab1c44bf49bad
The method_exists() checks are kept, since it is unclear whether old
objects are still in any cache.
Follow-Up: I9da2566cc003e2f05cae16229444dcf3baf61fa4
Bug: T343155
Change-Id: I0aaa3dce26df1619bedc39696a115145a61d4d14
This allows any bad cached parses due to a train deploy to be selectively
rolled back in the RejectParserCacheValue hook, which provides some
operational insurance against corrupted caches. The version is also
added to the debug information in the HTML footer to aid diagnosis
of any issue in real time.
Depends-On: I3d3caabd959c1ba16f4dc702c2eae38d5d4dcb14
Change-Id: Ibb37a82ec0ce764aefd8c9fab2868073a66301ec
* ParsoidParser hadn't registered a watcher on ParserOptions so far.
Because of this, you can see that the current parser cache key
(in deployed production code) doesn't have 'useParsoid=1' in it.
Ex: View source on enwiki:Hospet shows that the parser cache key
there is "enwiki:parsoid-pcache:idhash:2360619-0!canonical".
The only reason this doesn't conflict with legacy parser output
is because we use "parsoid-pcache", a different cache instance than
"pcache" used for legacy parser output. But if/when we decide to use
the same parser cache instance, this could cause cache corruptions.
With FlaggedRevisions, where a single "stable-pcache" parser cache
instance is used, in local testing, this was causing Parsoid HTML to be
saved without "useParsoid=1", and so Parsoid HTML was being returned
for legacy parser cache requests.
* In addition, fix the code in PageBundleParserOutputConverter to copy
over internal metadata (which includes used options). This ensures
that any tracked parser options aren't lost and the right parser cache
key is constructed later on.
* Added / updated a number of tests that verify that usedOptions
is tracked correctly in the useParsoid code paths. The tests fail
without the code changes in this patch.
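
To illustrate why an untracked option leads to colliding cache keys,
here is a minimal Python sketch. The cache_key() helper and its key
format are hypothetical and only loosely modelled on the
"idhash:...!canonical" key quoted above; the real key construction in
ParserCache differs.

```python
from typing import Iterable, Mapping


def cache_key(page_id: int, options: Mapping[str, object],
              used_options: Iterable[str]) -> str:
    """Build a key from only those options that were recorded as used."""
    parts = [f"{name}={options[name]}" for name in sorted(used_options)]
    # With no used options recorded, fall back to a generic suffix.
    return f"idhash:{page_id}!" + "!".join(parts or ["canonical"])


legacy_options = {"useParsoid": 0}
parsoid_options = {"useParsoid": 1}

# Without a watcher, 'useParsoid' is never recorded as used: both keys collide.
collision = cache_key(1, legacy_options, []) == cache_key(1, parsoid_options, [])

# Once the option is tracked, the two parsers get distinct keys.
legacy_key = cache_key(1, legacy_options, ["useParsoid"])
parsoid_key = cache_key(1, parsoid_options, ["useParsoid"])
```

In this sketch the collision only goes away when 'useParsoid' appears in
the used-options set, which is the property the new tests check.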
Bug: T340703
Bug: T335157
Needed-By: I0e954949768044eea6ec275a36d0d6d7ed457e8e
Change-Id: I076d5d362bdfd9d4b2ca8886bf6b30c1a746aee7
There is no way to express that Title::castFromPageIdentity(),
Title::castFromPageReference() and Title::castFromLinkTarget()
can only return null when the parameter is null. We need to add
Phan suppressions or explicit types almost everywhere that these
methods are used with parameters that are known to not be null.
Instead, introduce new methods Title::newFromPageIdentity() and
Title::newFromPageReference() (Title::newFromLinkTarget() already
exists), without the null-coalescing behavior, and use them when
the parameter is not null. This lets static analysis tools, and
humans, easily understand where nulls can't appear.
Do the same with the corresponding TitleFactory methods.
Change the obvious uses of castFrom*() to newFrom*() (if there is
a Phan suppression, a type check, or a method call on the result).
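
The difference between the two method families can be sketched with
type annotations. This is a Python analogy under stated assumptions, not
the PHP signatures: the Title class and both functions below are
hypothetical stand-ins, and a plain string plays the role of a
LinkTarget.

```python
from typing import Optional


class Title:
    """Hypothetical stand-in for a title object."""

    def __init__(self, text: str) -> None:
        self.text = text


def cast_from_link_target(target: Optional[str]) -> Optional[Title]:
    # castFrom*() style: null in, null out, so every caller sees a
    # nullable return type even when its argument is known to be set.
    return None if target is None else Title(target)


def new_from_link_target(target: str) -> Title:
    # newFrom*() style: the argument may not be null, so the return
    # type is non-nullable and callers need no extra check.
    return Title(target)
```

A type checker accepts new_from_link_target("Main Page").text directly,
whereas the result of cast_from_link_target() has to be narrowed before
its attributes can be used.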
Change-Id: Ida4da75953cf3bca372a40dc88022443109ca0cb
This is an initial quick-and-dirty implementation. The
ParsoidParser class will eventually inherit from \Parser,
but this is an initial placeholder to unblock other Parsoid
read views work.
Currently Parsoid does not fully implement all the ParserOutput
metadata set by the legacy parser, but we're working on it.
This patch also addresses T300325 by ensuring that the Page HTML
APIs use ParserOutput::getRawText(), which will return the entire
Parsoid HTML document without post-processing. This is what
the Parsoid team refers to as "edit mode" HTML. The
ParserOutput::getText() method returns only the <body> contents
of the HTML, and applies several transformations, including
inserting Table of Contents and style deduplication; this is
the "read views" flavor of the Parsoid HTML.
We need to be careful of the interaction of the `useParsoid` flag with
the ParserCacheMetadata. Effectively `useParsoid` should *always* be
marked as "used" or else the ParserCache will assume its value doesn't
matter and will serve legacy content for Parsoid requests and
vice-versa. T330677 is a follow-up to address this more thoroughly by
splitting the parser cache in ParserOutputAccess; the stopgap in this
patch is fragile and, because it doesn't fork the ParserCacheMetadata
cache, may corrupt the ParserCacheMetadata in the case when Parsoid
and the legacy parser consult different sets of options to render a
page.
Bug: T300191
Bug: T330677
Bug: T300325
Change-Id: Ica09a4284c00d7917f8b6249e946232b2fb38011