This allows HtmlOutputRendererHelper to function for all kinds of
content.
Bug: T311728
Bug: T311648
Bug: T359426
Change-Id: Ib32af7cf2a7ad989eb0b13ecca37c857fc9199ec
* Switch out raw Exceptions, mostly for InvalidArgumentExceptions.
* Fake exceptions triggered to give Monolog a backtrace are for
some reason "traditionally" RuntimeExceptions, instead, so we
continue to use that pattern in remaining locations.
* Just entirely give up on PostgresResultWrapper's resource vs. object mess.
* Drop now-unneeded false positive hits.
Change-Id: Id183ab60994cd9c6dc80401d4ce4de0ddf2b3da0
This avoids confusion with the "render timestamp" held by the cache,
and is consistent with ::get*RevisionId() etc.
The old ::getTimestamp() and ::setTimestamp() methods have been
deprecated.
Change-Id: Idb5e687709c98086c5d3075d31885c58a0723197
Set the render ID for each parse stored into cache so that we are able
to identify a specific parse when there are dependencies (for example
in an edit based on that parse). This is recorded as a property added
to the ParserOutput, not the parent CacheTime interface. Even though
the render ID is /related/ to the CacheTime interface, CacheTime is
also used directly as a parser cache key, and the UUID should not be
part of the lookup key.
In general we are trying to move the location where these cache
properties are set as early as possible, so we check at each location
to ensure we don't overwrite a previously-set value. Eventually we
can convert most of these checks into assertions that the cache
properties have already been set (T350538). The primary location for
setting cache properties is the ContentRenderer.
Moved setting the revision timestamp into ContentRenderer as well, as
it was set along the same code paths. An extra parameter was added to
ContentRenderer::getParserOutput() to support this.
Added merge code to ParserOutput::mergeInternalMetaDataFrom() which
should ensure that cache time, revision, timestamp, and render id are
all set properly when multiple slots are combined together in MCR.
In order to ensure the render ID is set on all codepaths we needed to
plumb the GlobalIdGenerator service into ContentRenderer, ParserCache,
ParserCacheFactory, and RevisionOutputCache. Eventually (T350538) it
should only be necessary in the ContentRenderer.
Bug: T350538
Bug: T349868
Followup-To: Ic9b7cc0fcf365e772b7d080d76a065e3fd585f80
Change-Id: I72c5e6f86b7f081ab5ce7a56f5365d2f75067a78
* I had already used this on one property of one file here
and noticed that Isabelle used this on a newly created
class in output transform and that prompted me to switch
over all these files.
* I am about to start adding new files here for new hooks for
DiscussionTools and updated everything in this namesspace
to keep usage consistent.
* This exposed initialization and bad typing issues in
SiteConfig.php and LanguageVariantConverter.php
Change-Id: I35f131a8f584ccc82a915dbfb1b50b3ef1ec6b06
* Parsoid REST API which considers both title and revid will soon
be made internal. The core REST API only has endpoints where either
the title OR the revid is provided and is not subject to this issue.
* This patch ignores page id mismatches and simply uses the revision
page id where the mismatch is detected. This is only supported for
ParsoidHandler which sets the lenient revision handling while fetching
the HtmlOutputRendererHelper.
For these API requests, the output is not cached.
* Local testing shows that this fixes the issue.
* Added new phpunit tests to ParsoidHandlerTest to verify expectations.
Also verified that disabling this fix fails that test.
Bug: T349235
Change-Id: I2f4a4a644710ee1e3894e6dc6a066eb37846bdfd
* Updated ParserOutput to set Parsoid render ids that REST API
functionality expects in ParserOutput objects.
* CacheThresholdTime functionality no longer exists since it was
implemented in ParsoidOutputAccess and ParserOutputAccess doesn't
support it. This is tracked in T346765.
* Enforce the constraint that uncacheable parses are only for fake or
mutable revisions. Updated tests that violated this constraint to
use 'getParseOutput' instead of calling the parse method directly.
* Had to make some changes in ParsoidParser around use of preferredVariant
passed to Parsoid. I also left some TODO comments for future fixes.
T267067 is also relevant here.
PARSOID-SPECIFIC OPTIONS:
* logLinterData: linter data is always logged by default -- removed
support to disable it. Linter extension handles stale lints properly
and it is better to let it handle it rather than add special cases
to the API.
* offsetType: Moved this support to ParsoidHandler as a post-processing
of byte-offset output. This eliminates the need to support this
Parsoid-specific options in the ContentHandler hierarchies.
* body_only / wrapSections: Handled this in HtmlOutputRendererHelper
as a post-processing of regular output by removing sections and
returning the body content only. This does result in some useless
section-wrapping work with Parsoid, but the simplification is probably
worth it. If in the future, we support Parsoid-specific options in
the ContentHandler hierarchy, we could re-introduce this. But, in any
case, this "fragment" flavor options is likely to get moved out of
core into the VisualEditor extension code.
DEPLOYMENT:
* This patch changes the cache key by setting the useParsoid option
in ParserOptions. The parent patch handles this to ensure we don't
encounter a cold cache on deploy.
TESTS:
* Updated tests and mocks to reflect new reality.
* Do we need any new tests?
Bug: T332931
Change-Id: Ic9b7cc0fcf365e772b7d080d76a065e3fd585f80
* Was used during the Parsoid JS -> PHP port and is no longer used.
* This also eliminated the need to inject ParsoidSettings into some
classes.
* Once this merges and lands in core, I'll remove this from the Parsoid
repo as well.
Change-Id: I008d30ea81f5a3db26e512c87762b90e3ca3c4ff
This class is used heavily basically everywhere, moving it to Utils
wouldn't make much sense. Also with this change, we can move
StatusValue to MediaWiki\Status as well.
Bug: T321882
Depends-On: I5f89ecf27ce1471a74f31c6018806461781213c3
Change-Id: I04c1dcf5129df437589149f0f3e284974d7c98fa
* Explicitly set wrapSections to true. This has have no significant
impact since it defaults to true within Parsoid.
* 'pageName' and 'prefix' removed from ParsoidOutputAccess since
they are not needed / used in Parsoid.
* 'logLinterData' need to be set in the ParserOutputAccess paths.
* A bunch of documentation FIXMEs as I was digging through the code.
* Record a FIXME that ParsoidOutputAccess and ParsoidParser (which
is used in the ParserOutputAccess use page) differ in how they
handle the language value (whether the default value of the title /
page or the pageLanguageOverride from the REST API). ParsoidParser
computes a preferred variant whereas ParsoidOutputAccess right now
does NOT do that. So, as part of the switchover to ParserOutputAccess,
we will need to set disableContentConversion in ParserOptions.
That will happen in a later patch.
Bug: T332931
Change-Id: I7326ae3452a7d496a57f5c4ff2ddeaf0daa7ab70
A shared get content assertion is added to PageConfigFactory::create
Bug: T338925
Bug: T336501
Follows-Up: I647ed253691970bbf39321a3cd652ea14bc11278
Change-Id: Iaf3898e5c53f1673ade639f7990911e4595801a8
It is very easy for developers and maintainers to mix up "internal
MediaWiki language codes" and "BCP-47 language codes"; the latter are
standards-compliant and used in web protocols like HTTP, HTML, and
SVG; but much of WMF production is very dependent on historical codes
used by MediaWiki which in some cases predate the IANA standardized
name for the language in question.
Phan and other static checking tools aren't much help distinguishing
BCP-47 from internal codes when both are represented with the PHP
string type, so the wikimedia/bcp-47-code package introduced a very
lightweight wrapper type in order to uniquely identify BCP-47 codes.
Language implements Bcp47Code, and LanguageFactory::getLanguage() is
an easy way to convert (or downcast) between Bcp47Code and Language
objects.
This patch updates the Parsoid integration code and the associated
REST handlers to use Bcp47Code in APIs so that the standalone Parsoid
library does not need to know anything about MediaWiki-internal codes.
The principle has been, first, to try to convert a string to a
Bcp47Code as soon as possible and as close to the original input as
possible, so it is easy to see *why* a given string is a BCP-47 code
(usually, because it is coming from HTTP/HTML/etc) and we're not stuck
deep inside some method trying to figure out where a string we're
given is coming from and therefore what sort of string code it might
be. Second, we've added explicit compatibility code to accept
MediaWiki internal codes and convert them to Bcp47Code for backward
compatibility with existing clients, using the @internal
LanguageCode::normalizeNonstandardCodeAndWarn() method. The intention
is to gradually remove these backward compatibility thunks and replace
them with HTTP 400 errors or wfDeprecated messages in order to
identify and repair callers who are incorrectly using
non-standard-compliant language codes in web standards
(HTTP/HTML/SVG/etc).
Finally, maintaining a code as a Bcp47Code and not immediately
converting to Language helps us delay or even avoid full loading of a
Language object in some cases, which is another reason to occasionally
push Bcp47Code (instead of Language) down the call stack.
Bug: T327379
Depends-On: I830867d58f8962d6a57be16ce3735e8384f9ac1c
Change-Id: I982e0df706a633b05dcc02b5220b737c19adc401
The Parsoid entrypoints should always have a "real" ParserOutput
passed as the ContentMetadataCollector object, so that recursive
invocations of extensions, etc, can set appropriate metadata
properties in the ParserOutput.
This is part of a belt-and-suspenders fix for T331084, where a
StubMetadataCollector is being used in production -- production should
never use a stub, it should always use a real ParserOutput object.
The other fix for T331084 is
I30ea2bb24e6c9b0950a8f46dc8e5b9bf5ee3378b, which ensures that if you
*were* to use a StubMetadataCollector in production, it wouldn't throw
an error when a numeric category string was encountered.
Bug: T331084
Change-Id: I8711a51fc1bcac48eae92ab1ba15a33fe05937ed
Pages with content models not supported by Parsoid before already
had dummy parser output written to PC but we still got an internal
server error because the output didn't have a render key.
This patch fixes the issue and when we try to render page with
unsupported content models, we get the dummy parser outputs.
Bug: T311728
Change-Id: I49ebbfc0475fb296f2a906ce7dce237641fb375b
The motivation is to restore parsoid support for the content models
defined in the Proofread extension.
Bug: T246403
Change-Id: I33d269e42fede28139f7c923504326a77d11ee13
The previous attempt at restoring call to the ParserLogLinterData hook
had the undesirable effect of bypassing parser cache. This change
optionally enables the call to the lind data hook without disabling
parser cache.
This patch us working under the assumption that we only need to log lint
data for canonical parses.
Follow-Up to I1f69498ef75
Change-ID: I39ab54ca6e9f9a6a58b59cdd6feea07638fc908f
While RESTbase insists on getting Parsoid renderings for any content
model, don't waste CPU cycles trying to render garbage. Just output
dummy content. Nobody should ever see it.
Bug: T324711
Change-Id: I407171a5f515b594603b287a7a9e49f54e837161
ChangeProp is currently requesting a parsoid parse for all page updates,
regardless of content model. Parsoid renderings of non-wikitext content are
unusable, so we shouldn't bother the parser cache with them. This is
especially true for wikidata items.
Bug: T324711
Change-Id: I6f6325f2b8581dfcc9a8bcd97281ccf4caa7e8f1
This allows parsoid render anything, even if the output is garbage.
This is a quick fix pending the real solution (T311728).
Bug: T324711
Change-Id: If4e4eb8582ab8a62f592394820b30c1b28fb1216
This restores change Ie430acd0753880d88370bb9f22bb40a0f9ded917.
This reverts commit ab6baad1a5.
NOTE: Also needs the patch the fixes the original reason for the
revert: Ief721c23ed9a57d781cfdac625a62113f22f87a5
Change-Id: Ic48db1b5fdff1dfd4f2d2643d64252e5fc721e79
NOTE: This causes Parsoid output to be written to the parser cache.
This should be unconditional in the future, but for now it is
controled by wgTemporaryParsoidHandlerParserCacheWriteRatio.
This change affects the following endpoints that use the wt2html method:
* /coredev/v0/transform/wikitext/to/html in core
* /{domain}/v3/transform/wikitext/to/html from parsoid
* /{domain}/v3/page/html/{title} from parsoid
The /v1/page/{title}/html endpoint is not affected, since it
doesn't use wt2html, but has always been using HtmlOutputRendererHelper
directly.
Bug: T322672
Depends-On: Ic37f606bb51504c8164d005af55ca9a65f595041
Change-Id: Ie430acd0753880d88370bb9f22bb40a0f9ded917
This adds setters to HtmlOutputRendererHelper which allow it to be used
more conveniently in different contexts. This is aimed specifically at
making it easier for DirectParsoidClient in the VisualEditor extension
to re-use this code.
NOTE: HtmlOutputRendererHelper is declared @unstable, but the changes in
this patch need to be backwards compatible at least temporarily, to
allow the VisualEditor extension to be updated in a follow-up.
Change-Id: I18c8bc6f5aa7c204f0faa56919bfe64026761bd4
This is needed by VE when performing Wikitext -> HTML transformation
during editing.
Also, this patch introduces the new flavor: fragment, that is passed in
via $envOptions to activate VisualEditor's body only mode functionality.
NOTE: This patch also fixes a PHPUnit test that broke by correctly
injecting the appropriate parsoid instance for checking error handling.
Bug: T308743
Change-Id: I838a3b05d7d8523a469236cf112158349063283c
If a render ID is given via the use-cache parameter, but the key is not
found in the parsoid stash, look at the most recent known rendering of
the revision, and use it if it matches the render ID.
This patch moves the responsibility for looking up RevisionRecords and
PageRecords into ParsoidOutputAccess. This way, callers only need to
have a PageIdentity, and optionally a revision ID.
Bug: T318395
Change-Id: I1aa5b0fd9fb1acaa2544d5a58125fa3810a0eb39
In Ie87f823e721ed5ae9d49cf7ead8e77cbef254cd7, we changed the signature
of `parse()` to accept a PageIdentity instead of PageRecord and it broke
some tests in other places, specifically: HtmlOutputRendererHelperTest,
so this patch fixes the interfaces.
Change-Id: I35685412c52f7d4ae9e63960695e686fb2bb9b21
Move code to create ParserOutput from PageBundle and vice versa to a
separate final class. An final class was used instead of a trait
because traits do not support constants for PHP version < 8.2.
The plan is to use this final class in various interfaces in order
to avoid exposing them to Parsoid concepts.
Bug: T317019
Change-Id: I33076c359ee45719c1c4ef63f77c1f1285951d0c
When parsing content for page creation, we need to be able to pass the
page language explicitly. This patch allows the language to be looped
through from ParsoidHTMLHelper via ParsoidOutputAccess into Parsoid
itself.
Change-Id: I1bbf9c2180de2d91679edbc9d73adfe44075dde3
The ParsoidOutputAccess service shouldn't know about caching
parsoid outputs into the parsoid parser cache.
This patch refactors the getParserOutput() method on ParsoidOutputAccess
so that it handles the caching and let parse() to focus on parsing
only.
Change-Id: Ia75271213b8f0a0d2a3ad6e47141f3d912b8713c
* Introduce a method in ParsoidOutputAccess that parses and returns
a parse output directly without caring about cache.
* Parse a non-existent page with the new method when the page object
is not a PageRecord, but a PageIdentity
Change-Id: Ie87f823e721ed5ae9d49cf7ead8e77cbef254cd7
Parsoid currently only supports wikitext (and JSON), so don't give it anything else.
NOTE: ParsoidOutputAccess will fail on content that is unsupported by parsoid.
This will however not affect the /transform and /page endpoints in the
parsoid extension, since they use the ParsoidHandler base class, which doesn't
rely on ParsoidOutputAccess.
Bug: T301371
Change-Id: I6bc9b978947b31455a4bce6385b7bdf64ed4043c
This removes a cyclic dependency:
ParsoidHTML helper in the REST component uses ParsoidOutputAccess in the
parser component. So ParsoidOutputAccess cannot use LocalizedHttpException
from the REST component.
This also improves separation of concerns: the parsing component should
not be concerned with HTTP status codes.
Bug: T301371
Change-Id: I2e661fe3ce0824dbfd7579650972f9019c92ed59
This isolates ParsoidHTMLHelper from the internal of
ParsoidOutputAccess. The corresponding test cases were changed to use a
mock ParsoidOutputAccess, and to not test the behavior of
ParsoidOutputAccess.
Bug: T301371
Change-Id: Id693fae2264f15e5d35f28acc5adc4239b2ae24f
This patch introduces a ParsoidOutputAccess service for
getting parsoid outputs and warms the cache with pregenerated
outputs.
It also introduces a config variable in ParsoidCacheConfig that
is turned off by default for controlling the cache warming.
Bug: T301371
Change-Id: I6152c42ea765d94093d8d62598b1b4278314adec