* Apply Legacy Temporary redirects (302) if page is a redirect in order
to have feature parity with RESTBase
* Check for normalization redirects and execute permanent redirects (301)
* Add/Update mocha tests for the redirects functionality
* Add query parameter 'redirect=no' check to bypass redirect logic
* Unit tests to check status code and location headers
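The redirect rules listed above can be sketched roughly as follows. This is a minimal illustration in Python, not the handler's actual code: the function name `resolve_redirect`, its parameters, and the order of the checks are assumptions made for the example.

```python
from typing import Optional, Tuple


def resolve_redirect(
    requested_title: str,
    normalized_title: str,
    redirect_target: Optional[str],
    query: dict,
) -> Optional[Tuple[int, str]]:
    """Return (status, location) if a redirect should be sent, else None."""
    # 'redirect=no' bypasses the redirect logic entirely.
    if query.get("redirect") == "no":
        return None
    # Normalization redirects are permanent (301).
    if requested_title != normalized_title:
        return 301, normalized_title
    # Pages that are wiki redirects get a legacy temporary redirect (302).
    if redirect_target is not None:
        return 302, redirect_target
    return None
```

A test would then check both the status code and the location, for example that `resolve_redirect("Foo", "Foo", "Bar", {})` yields `(302, "Bar")`.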
Bug: T301372
Change-Id: I841c21d54a58e118617aaf5e2c604ea22914adaa
When no original HTML and no etag are provided,
selser should still be attempted based on a rendered
version of the old wikitext.
Change-Id: Ic3cc3a598f32cad6122964cb8a7376a56be9129f
When visual editor switches from source mode to visual mode, we need to
stash the wikitext. Otherwise, we later lack the proper context to
convert the modified HTML back to wikitext.
Bug: T321862
Change-Id: Id611e6e022bf8d9d774ca1a3a214220ada713285
This patch is intended to allow HtmlInputTransformHelper to be used
without constructing a fake request body. The idea is to make it easier
to use it in action API modules, such as ApiVisualEditorEdit.
Change-Id: I4002342820b19060da30b6fb8622c85c49eec6a0
* We will have several kinds of HTML transformations.
Rename HTMLTransform to indicate that it's for converting HTML to Content
objects.
* Use the naming convention 'Html' instead of 'HTML'
Change-Id: I506f3303ae8f9e4db17299211366bef1558f142c
Variant conversion is based on the Accept-Language header. Updated
the HtmlOutputRendererHelper to set the HTTP headers related to
variant conversion.
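As an illustration of what setting variant-related headers could look like, here is a minimal Python sketch. The choice of `Content-Language` and `Vary: Accept-Language` is an assumption based on standard HTTP practice, and the function name `put_variant_headers` is hypothetical. It is not the helper's actual code.

```python
from typing import Dict


def put_variant_headers(headers: Dict[str, str], output_language: str) -> Dict[str, str]:
    """Return a copy of headers with variant-related headers set."""
    result = dict(headers)
    # Tell the client which language variant the body is actually in.
    result["Content-Language"] = output_language
    # The response depends on Accept-Language, so caches must vary on it.
    vary = [v.strip() for v in result.get("Vary", "").split(",") if v.strip()]
    if "accept-language" not in [v.lower() for v in vary]:
        vary.append("Accept-Language")
    result["Vary"] = ", ".join(vary)
    return result
```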
Bug: T317019
Change-Id: I5e11452f1c531a757e8d860f9c727b5810406bce
This adds setters to HtmlOutputRendererHelper which allow it to be used
more conveniently in different contexts. This is aimed specifically at
making it easier for DirectParsoidClient in the VisualEditor extension
to re-use this code.
NOTE: HtmlOutputRendererHelper is declared @unstable, but the changes in
this patch need to be backwards compatible at least temporarily, to
allow the VisualEditor extension to be updated in a follow-up.
Change-Id: I18c8bc6f5aa7c204f0faa56919bfe64026761bd4
This is needed by VE when performing Wikitext -> HTML transformation
during editing.
Also, this patch introduces a new flavor, "fragment", which is passed in
via $envOptions to activate VisualEditor's body-only mode.
NOTE: This patch also fixes a broken PHPUnit test by correctly
injecting the appropriate parsoid instance for checking error handling.
Bug: T308743
Change-Id: I838a3b05d7d8523a469236cf112158349063283c
If we don't have a render id given, but we do have a revision id, we can
fall back to the current rendering of that revision to provide a baseline for
selser. This is better than no selser. On wikis that do not heavily rely
on templates, or where templates rarely change while an edit is in
progress, this will produce a clean diff.
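The fallback described above can be sketched as follows. This is a simplified Python illustration; `pick_selser_baseline` and the two callbacks are hypothetical names, not part of the actual code.

```python
from typing import Callable, Optional


def pick_selser_baseline(
    render_id: Optional[str],
    revision_id: Optional[int],
    load_stashed: Callable[[str], Optional[str]],
    render_current: Callable[[int], str],
) -> Optional[str]:
    """Choose the original HTML to use as a baseline for selser."""
    # A render id identifies the exact rendering the edit was based on.
    if render_id is not None:
        return load_stashed(render_id)
    # No render id: fall back to the current rendering of the revision.
    if revision_id is not None:
        return render_current(revision_id)
    # Nothing to go on: no selser baseline.
    return None
```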
Bug: T318398
Change-Id: If7612cc6e64d1f1243289b7d6ba96c71f09fe15d
ParsoidHandler should pass the metrics object from the
SiteConfig to HtmlInputTransformHelper, instead of using the global
metrics instance. Otherwise, the metricsPrefix defined in the parsoid
settings is ignored.
Change-Id: Ie85f2306e8b0f123b9fdd737faffdd85117015b1
What was previously a REST API-only feature (the thumbnails
hook allowing for thumbnails for non-file pages via the
PageImages extension) is now also being adopted in the main
search page.
That hook will now be called with NS_FILE result thumbnails
pre-filled, which was not the case previously. PageImages
essentially duplicated NS_FILE thumbnail logic that was
already present in Special:Search, so that can (and will
in a follow-up patch) then be removed there. Special:Search
will then simply take whatever is produced from the provider
(which will include both NS_FILE thumbs - which it handled
already - as well as whatever else it receives from the hook),
as will the REST API (which already received both)
Since thumbnails can now come in for multiple namespaces &
having some of those results with & others without a thumbnail
can be quite jarring, it was decided that we'd display
placeholder images (for certain namespaces). This is now
controlled by $wgThumbnailNamespaces.
I also split up a few things in FullSearchResultWidget::
generateFileHtml for more clarity.
Meanwhile also updated mediawiki.special.search.styles.less
to use variables for known colors.
Also implemented a 'transform' (required for testing this
change properly) and 'getDisplayWidthHeight' (it became
needed after implementing transform) callback function for
mock Files, and updated some existing tests in response to
these changes.
And some more Rest test files have been updated to allow
passing around a HookContainer instead of only an array of
hooks (from which a new HookContainer would then be created)
to allow the same container to be used across all relevant
objects, who may have it injected as dependency.
Bug: T306883
Change-Id: I2a679b51758020d3e822da01a1bde1ae632b0b0a
If a render ID is given via the use-cache parameter, but the key is not
found in the parsoid stash, look at the most recent known rendering of
the revision, and use it if it matches the render ID.
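A minimal Python sketch of this lookup order, under the assumption that the stash is a simple key-value store and that each rendering carries its own render ID. The names `Rendering` and `find_original_rendering` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Rendering:
    render_id: str
    html: str


def find_original_rendering(
    render_id: str,
    stash: Dict[str, Rendering],
    latest: Optional[Rendering],
) -> Optional[Rendering]:
    """Look in the stash first, then at the most recent known rendering."""
    stashed = stash.get(render_id)
    if stashed is not None:
        return stashed
    # Stash miss: the latest rendering is usable only if its ID matches.
    if latest is not None and latest.render_id == render_id:
        return latest
    return None
```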
This patch moves the responsibility for looking up RevisionRecords and
PageRecords into ParsoidOutputAccess. This way, callers only need to
have a PageIdentity, and optionally a revision ID.
Bug: T318395
Change-Id: I1aa5b0fd9fb1acaa2544d5a58125fa3810a0eb39
Parsoid needs the original rendering in order to apply
selective serialization (selser). The page/{title}/html endpoint
can stash the rendering, and now the transform endpoint can make use
of the stashed rendering.
Bug: T310464
Change-Id: Ia58043ed3aa1eb12731d82aa87606c82ec63f663
We need the output content language when fetching HTML in VE,
and it needs to match whatever parsoid gives us. For this to
happen, we need to attach more data to the parser output after
parsing. This patch adds that data and exposes it via a public
method: `getHtmlOutputContentLanguage()`
In addition, this patch fixes a bug that was introduced in the
PageBundleParserOutputConverterTest when setting extension data
on parser output (L#64).
Follow-up: I33076c359ee45719c1c4ef63f77c1f1285951d0c (test fix)
Change-Id: I06bf9f575ed5a2521cf4b2c42fc6e0e7faab6bc0
The HtmlInputTransformHelper is intended to provide code sharing
between VisualEditor's DirectParsoidClient and the ParsoidHandler
base class used by TransformHandler.
Bug: T310376
Change-Id: I9c15f075cfc5f198e290758fc23d25990b47a185
In Ie87f823e721ed5ae9d49cf7ead8e77cbef254cd7, we changed the signature
of `parse()` to accept a PageIdentity instead of PageRecord and it broke
some tests in other places, specifically: HtmlOutputRendererHelperTest,
so this patch fixes the interfaces.
Change-Id: I35685412c52f7d4ae9e63960695e686fb2bb9b21
Move code to create ParserOutput from PageBundle and vice versa to a
separate final class. A final class was used instead of a trait
because traits do not support constants for PHP version < 8.2.
The plan is to use this final class in various interfaces in order
to avoid exposing them to Parsoid concepts.
Bug: T317019
Change-Id: I33076c359ee45719c1c4ef63f77c1f1285951d0c
Parsoid supports other source formats besides wikitext.
This patch improves support for non-wikitext content by removing
assumptions about the source type.
Change-Id: I5480ff200a93026cea7f1542e12834b06ac6f730
This also moves the creation of PageConfig from HTMLTransformFactory
into HTMLTransform, to ensure all relevant info, particularly the
page language, is known.
Change-Id: Id354862d6497816e0c007b9cb3b0d183c9d4b719
NOTE: stats key has been updated to reflect this change so we'll
no longer get data on the "parsoidhtmlhelper..." key after this
is deployed.
Change-Id: I599b1fd22c2d962b57e80beb84fe6f3a335f488c
When parsing content for page creation, we need to be able to pass the
page language explicitly. This patch allows the language to be looped
through from ParsoidHTMLHelper via ParsoidOutputAccess into Parsoid
itself.
Change-Id: I1bbf9c2180de2d91679edbc9d73adfe44075dde3
* Introduce a method in ParsoidOutputAccess that parses and returns
a parse output directly without caring about cache.
* Parse a non-existent page with the new method when the page object
is not a PageRecord, but a PageIdentity
Change-Id: Ie87f823e721ed5ae9d49cf7ead8e77cbef254cd7
This patch adds a test to cover the aforementioned code path, both for
when it works correctly and when it throws.
Change-Id: I4b3d9f280a0977d3811afb768824da302673e659
By splitting the setOriginalData methods into several setters, we remove
any knowledge about the structure of the request body from HTMLTransform.
It also allows us to be specific about which data to operate on.
This also removes the concept of page bundles from the public interface
of HTMLTransform. PageBundle objects are used only internally.
Change-Id: If97a74ce251f281b7d980928a01b764d6ec0d0a4
* Without this fix, the test fails on vendor patches whenever Parsoid's
default version number is bumped in the Parsoid repo.
Change-Id: Icce7b61dfbbbbd57b4f1ed76a32d160e92b48b15
This patch moves remaining transformation logic to a renamed (from
HTMLTransformInput -> HTMLTransform) class. Also, the HTMLTransform
class is moved to the correct directory, hence namespace (including
tests).
Some data files have been copied over to their own sub-directory in
the correct place since HTMLTransformTest needs them. ParsoidHandler
class is fine where it is because its operation is what happens in
the REST land.
NOTE: The 2 remaining methods moved into HTMLTransform are the last
ones we intended to move into this class to make the refactoring of
html2wt() method complete in this context.
Change-Id: I8929931e1b0acf247abe9d826eef57f3e0d4e132
When refactoring, we were assuming that if the input is a page bundle,
it will always contain original HTML. That is not the case. We need to
make use of the supplied data-parsoid array even if there is no original
HTML given.
Change-Id: Ida8cbfcaac059af8902db9560a3ad6884e8b1790
This is a regression test for an issue that broke parsoid's roundtrip
tests. The issue itself was already fixed on the master branch.
Change-Id: Ia7edb26a149231fbfc43f4edb8185304341b19ca
In If09afc4b933 we made ParsoidHandler measure input size in bytes
consistently, rather than using sometimes bytes, and sometimes
characters.
However, that was going to cause input limits to trigger early for
languages that use a lot of multibyte characters. So now we are switching
everything to measuring in characters.
NOTE: this may cause the html2wt.timePerInputKB to report worse values.
It also makes the name slightly misleading, since it's no longer in KB,
it's in kilo-chars.
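To illustrate why the two measurements differ, here is a small Python sketch (not the handler's code, which is PHP) comparing the size of ASCII and Cyrillic text in UTF-8 bytes and in characters. The helper names are made up for the example.

```python
def size_in_bytes(text: str) -> int:
    """Size of the text when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def size_in_chars(text: str) -> int:
    """Number of Unicode code points in the text."""
    return len(text)


ascii_text = "wiki" * 3     # every character is one byte in UTF-8
cyrillic_text = "вики" * 3  # every character is two bytes in UTF-8
```

Both strings have the same length in characters, but the Cyrillic one is twice as large in bytes, so a byte-based limit would trigger at half the text length.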
Change-Id: I41872db6d1f5d96776fef54624428cc3ee5f21b3
This moved the logic for applying page bundle data into the getters that
return DOM elements. It also makes the application of version
downgrades implicit.
NOTE: This patch changes the expected value of one of the phpunit tests
to a version that has no closing </div> tags. This appears to be the
original and expected behavior, per the corresponding test in the
parsoid extension's Parsoid.js test suite.
Change-Id: If2d7b06d8ba92fb63e6955ec7587ed4aea557251
* Several unit tests had lost information about the input format
being a pagebundle format.
* Fix the input format for a number of tests and a test expectation
in one case.
I removed a number of spurious comments which were misinterpreting
why the pagebundle format was being used. If the input payload is
in pagebundle format, the specified format should match that.
In integration tests, the API endpoint implicitly provides this
format information. But, in unit tests, this should be done explicitly
which would also be the case for internal service object callers.
Change-Id: If3d3c0ad2ee1fe19f4c1590caa43bea4340cc08c
Moving lazy initialization logic into HTMLTransformInput to clarify flow
of information.
DEPLOY: Mention that we've fixed $htmlSize and it will affect metrics
graphs when this rolls out.
Change-Id: If09afc4b933ec99561ac9e4d53383bd42a856eaa
We were only testing the offset type mismatch case with a non-standard
offset type in the 'original' data structure. This caused us to miss a
bug. This adds a test that has a non-standard offset type coming from
the request attributes.
Change-Id: Icac28b3f8c85bc594e1cae42a4ee407f0d54a19b
This patch fixes an issue caused by the assumption that offsetType
is in the opts sub-array. It is not part of it: it sits directly in
attributes, copied from envOptions. Looking in the wrong place
caused the offset type to always be byte.
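The nature of the bug can be shown with a simplified Python sketch. The structure of the attributes array here is reduced to the two relevant keys, and both function names are hypothetical.

```python
def get_offset_type_buggy(attribs: dict) -> str:
    # Looks in the 'opts' sub-array, where offsetType never lives,
    # so the default is always returned.
    return attribs.get("opts", {}).get("offsetType", "byte")


def get_offset_type(attribs: dict) -> str:
    # offsetType sits directly in the attributes.
    return attribs.get("offsetType", "byte")


attribs = {"opts": {"format": "pagebundle"}, "offsetType": "ucs2"}
```

With these attributes, the buggy lookup silently falls back to the default, while the fixed lookup returns the requested offset type.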
Change-Id: I5d1a8a7cdfe13c22a2afa70128203050a4b5f98a
This renames TransformContext to HTMLTransformInput. It is becoming a
wrapper around the input HTML, with a bunch of optional context data
attached.
This introduces a factory method for HTMLTransformInput, so we can
extract knowledge about the structure of the $attribs array from
HTMLTransformInput.
This also allows us to inject Document objects and perhaps PageBundle
objects, instead of just arrays.
Change-Id: I66f9c5dbb50c6bf1f582adad7766422216482402
This continues the work in the child patch to replace callers
of setMwGlobals() with the appropriate method. The directory this
patch covers is `tests/phpunit/integration/`.
Change-Id: I0a9abf0d2a43587f2ffa029b68024a1ba5165fc7
The main object cache is disabled during testing. Some integration tests
need it though. This provides a clean way to enable it, to replace the hacks
that were used so far.
Note that we may want to enable the main cache during testing soon. When
that happens, this method is still useful to disable the cache in certain
tests, and to set a specific cache instance.
Change-Id: I04ae1bf1b6b2c8f6310acd2edf89459d01a9c870
Parsoid currently only supports wikitext (and JSON), so don't give it anything else.
NOTE: ParsoidOutputAccess will fail on content that is unsupported by parsoid.
This will however not affect the /transform and /page endpoints in the
parsoid extension, since they use the ParsoidHandler base class, which doesn't
rely on ParsoidOutputAccess.
Bug: T301371
Change-Id: I6bc9b978947b31455a4bce6385b7bdf64ed4043c
This removes a cyclic dependency:
ParsoidHTMLHelper in the REST component uses ParsoidOutputAccess in the
parser component. So ParsoidOutputAccess cannot use LocalizedHttpException
from the REST component.
This also improves separation of concerns: the parsing component should
not be concerned with HTTP status codes.
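The layering can be sketched in Python as follows. All names are hypothetical, and the 400 status code is chosen only for illustration. The point is that the parsing side raises a plain domain exception, and only the REST side maps it to an HTTP status.

```python
from typing import Tuple


class ParseFailure(Exception):
    """Domain error raised by the parsing component (no HTTP knowledge)."""


def parse(content_model: str) -> str:
    # Parsing component: knows about content, not about HTTP.
    if content_model not in ("wikitext", "json"):
        raise ParseFailure(f"unsupported content model: {content_model}")
    return "<p>rendered</p>"


def handle_request(content_model: str) -> Tuple[int, str]:
    # REST component: translates domain errors into HTTP responses.
    try:
        return 200, parse(content_model)
    except ParseFailure as e:
        return 400, str(e)
```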
Bug: T301371
Change-Id: I2e661fe3ce0824dbfd7579650972f9019c92ed59
This isolates ParsoidHTMLHelper from the internals of
ParsoidOutputAccess. The corresponding test cases were changed to use a
mock ParsoidOutputAccess, and to not test the behavior of
ParsoidOutputAccess.
Bug: T301371
Change-Id: Id693fae2264f15e5d35f28acc5adc4239b2ae24f
This patch introduces a ParsoidOutputAccess service for
getting parsoid outputs and warms the cache with pregenerated
outputs.
It also introduces a config variable in ParsoidCacheConfig that
is turned off by default for controlling the cache warming.
Bug: T301371
Change-Id: I6152c42ea765d94093d8d62598b1b4278314adec
Cache the parsoid output only if parsing took longer than a
configured time limit. A parse that completes within this limit is
considered inexpensive for that wiki, and its output is not cached
at all.
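A minimal Python sketch of time-thresholded caching, assuming a dict-like cache and an injectable clock so the behavior can be tested. The function and parameter names are hypothetical.

```python
import time
from typing import Callable, Dict


def parse_with_cache(
    key: str,
    parse: Callable[[], str],
    cache: Dict[str, str],
    threshold_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Parse, and cache the output only if the parse was expensive."""
    if key in cache:
        return cache[key]
    start = clock()
    output = parse()
    elapsed = clock() - start
    # Cheap parses are not worth the cache space; just redo them.
    if elapsed > threshold_seconds:
        cache[key] = output
    return output
```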
Bug: T308588
Change-Id: I7793b77feab13400ccd04343e7878ad701f5e6a7
This introduces the ParsoidOutputStash config setting, which defines the
storage backend and cache duration. The storage backend name refers to
an entry in the ObjectCache setting, and defaults to the main stash.
Bug: T267990
Bug: T309016
Change-Id: Ic67dc43ed9843810e4b180127f9a3bb7608f7608
When JSON support was introduced into ParserCache in 1.36, it was
controlled by a feature flag, $wgParserCacheUseJson. The feature flag
was "born deprecated" in 1.36. It can now be removed.
This means that ParserCache will always store entries as JSON.
Support for reading old non-JSON entries remains intact.
This is needed when updating wikis from a version older than 1.36
to the current version.
Change-Id: Id04e42bfb458d98414bac50e0d6c505e8878e5c0
As a means of understanding the usage of the stash feature for
/page/html & /revision/html endpoints used by VE extension,
this patch introduces the collection of stats using the
StatsDataFactory.
Bug: T309017
Change-Id: I4e17d50e79da263637bdd55ab62e993df441fe38
This patch enables the response from PageHTMLHandler and
RevisionHTMLHandler to have different eTags for different
output modes and varying flavors.
Before, the only difference we got was when the stashing
option is set or not, but we need more flavors.
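One way to picture this is an ETag that encodes both the output mode and the flavor. The Python sketch below is an illustration only: the ETag layout, the function name `make_etag`, and the example mode and flavor values are assumptions, not the handlers' actual format.

```python
def make_etag(render_key: str, output_mode: str, flavor: str) -> str:
    """Build a quoted ETag that differs per output mode and flavor."""
    return f'"{render_key}/{output_mode}/{flavor}"'
```

Two responses for the same rendering then get distinct ETags whenever their mode or flavor differs, so a cached response for one is never mistaken for the other.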
Bug: T308744
Change-Id: I2e9679e46a31955a2106a52af4eb612b32799c8c
Add stash option to /page/html & /revision/html endpoints.
When this option is set, the PageBundle returned by Parsoid is
stashed and an etag is returned that can later be used to
make use of the stashed PageBundle.
The stash is for now backed by the BagOStuff returned by
ObjectCache::getLocalClusterInstance().
This patch adds additional data to the ParserOutput stored in ParserCache.
Old entries lacking that data will be ignored.
Bug: T267990
Co-Authored-by: Nikki <nnikkhoui@wikimedia.org>
Change-Id: Id35f1423a69e3ff63e4f9883b3f7e3f9521d81d5
NOTE: This changes the HTML returned by the endpoint!
It will now include the id="mwXYZ" attributes needed to
later map to data-parsoid entries.
Bug: T268205
Change-Id: I0a29434b996cc289eb67083e62bd6f1ad750cb4d
All revision related classes are namespaced MediaWiki\Revision
instead of MediaWiki\Storage since 1.32. The old namespaced
class names are deprecated and only kept for backwards-compatibility.
Bug: T305784
Change-Id: I34e492d84d9fc4bc78481667202716d93b3c43cb
These three endpoints have been experimental for many months:
revision/{id}
revision/{id}/html
revision/{id}/with_html
Promote them to officially released. This completes the
basic "revision" endpoint support, and helps clear out
the coreDevelopmentRoutes.json file for unrelated
experiments.
This also modifies the existing revision/{id}/bare endpoint
response, which previously pointed callers to the experimental
endpoint for html. It now points callers to the official one.
Bug: T305506
Change-Id: Iee8d1723e98dd3e3e389a0514dde28799914b2fd
All revision related classes are namespaced MediaWiki\Revision
instead of MediaWiki\Storage since 1.32. The old namespaced
class names are deprecated and only kept for backwards-compatibility.
Bug: T305784
Change-Id: Ia0030814ce2176d06e2898acffe533d31633fccb
Since MediaWiki 1.36, this method is provisioned to replace creating
new instances of the services object. If one is already created and
seen by the service locator, just use it.
Change-Id: I9509497a8380194aa93310343b1896521070fc31
Previously, MovePageTest was skipped if the move was valid,
claiming we can't test actual moves. Now we can.
Additionally, use MediaTestTrait for file and repo
mocking.
Change-Id: Ie8a1edbdb2f22432919f03a60c2dacc5d4528615
Move MockTitleTrait::makeMockTitleCodec to DummyServicesTrait, and
replace the two existing uses, which are in core. Add some new
uses instead of mocking each time.
Unfortunately, we cannot use an actual MediaWikiTitleCodec
for the tests in BadFileLookup, because those tests are unit tests
and a MalformedTitleException cannot be created in the context
of a unit test. BadFileLookupTest gets around this by using
a mock that throws a mock exception - add a comment inline
explaining why we cannot use a real MediaWikiTitleCodec.
This is paired with adding NamespaceInfo, to make mocking the language
methods related to namespaces easier by matching the real
logic in the Language class to the extent possible. Update a few
tests to use the DummyServicesTrait for their NamespaceInfo services.
Change-Id: Ibd691ccf0e632e1bf0bc1f7e9ddc0c660d5cad32
ParserOptions were not updated because they depend on Title::getLanguage
implementation.
Tests converted to not require a DB anymore. Can't be proper unit
tests yet due to globals in ParserOptions and fake time hacks,
but exec time does go down from 70 seconds to 9 seconds.
Page content model is still emitted in the metrics since
it was considered useful. Should be removed when we get
something like a page type concept.
Change-Id: Ib16fd0b5b87ffc3cb4d21f4aa43d1203cb7206d2
The response from a null-edit should contain the current revision's
revision ID and timestamp, not the info from the edit's base revision.
Bug: T277601
Change-Id: I9d353cdc4cb9e3c1435c93ffe63ef4fef173ec4d
This is micro-optimization of closure code to avoid binding the closure
to $this where it is not needed.
Created by I25a17fb22b6b669e817317a0f45051ae9c608208
Change-Id: I0ffc6200f6c6693d78a3151cb8cea7dce7c21653
As we convert the RevisionRecord to using Authority,
we no longer need Title instances, so we can convert
that to PageIdentity.
Ideally, we'd move away from using Title at all, but:
1. For foreign wikis PageIdentity has stronger validation,
so calling PageIdentity getId() on Title will break things.
There's still a lot of code depending on lax Title guarantees,
so we keep it.
2. A lot of code still depends on Title, so we try to pass it
through even if we don't necessarily need to, to save cost
on recreating it later on.
Bug: T271458
Depends-On: I287400b967b467ea18bebbb579e881a785a19158
Change-Id: I63d9807264d7e2295afef51fc9d982447f92fcbd
These are not only 100% identical to the actual code, but also:
* It's error-prone. Some are already wrong.
* These test…() functions are not meant to be called from
anywhere. What is the target audience for this documentation?
* There is a @dataProvider. What such @param tags actually do is
document the provider, but in an odd place. Just looking at
the provider should give the same information.
* The MediaWiki CodeSniffer allows skipping @param when there is
a @dataProvider, for the reasons listed.
Change-Id: I0f6f42f9a15776df944a0da48a50f9d5a2fb6349
The functionality of creating title mocks is generally useful,
and this will also allow making HandlerTestTrait narrower.
Bug: T264058
Change-Id: I76eca48dfcff65a6203fccde5366912a2d66c495
Mutating the interwiki table invalidates the Title codec and in
general leads to a bunch of complications. Easier to just use the
`wgInterwikiCache` mechanism, as a lot of other phpunit tests do.
Bug: T271287
Change-Id: Id1899a89ae6b55e7032befe73990d215370828d8