Commit graph

31 commits

Author SHA1 Message Date
daniel
f545d5efeb Rename HTMLTransform to HtmlToContentTransform
* We will have several kinds of HTML transformations.
Rename HTMLTransform to indicate that it's for converting HTML to Content
objects.

* Use the naming convention 'Html' instead of 'HTML'.

Change-Id: I506f3303ae8f9e4db17299211366bef1558f142c
2022-11-03 16:47:36 +01:00
Abijeet
1b53f15e7f page/html endpoint: Support variant conversion
Variant conversion is based on the Accept-Language header. Updated
the HtmlOutputRendererHelper to set the HTTP headers related to
variant conversion.

Bug: T317019
Change-Id: I5e11452f1c531a757e8d860f9c727b5810406bce
2022-11-01 19:21:42 +05:30
Derick Alangi
ebc3f41399 Rest: Rename ParsoidHTMLHelper -> HtmlOutputRendererHelper
NOTE: stats key has been updated to reflect this change so we'll
  no longer get data on the "parsoidhtmlhelper..." key after this
  is deployed.
Change-Id: I599b1fd22c2d962b57e80beb84fe6f3a335f488c
2022-09-06 10:30:55 +01:00
Derick Alangi
1854fb02d9 Storage: Warm parsoid parser cache with parsoid outputs
This patch introduces a ParsoidOutputAccess service for
getting parsoid outputs and warms the cache with pregenerated
outputs.

It also introduces a config variable in ParsoidCacheConfig that
is turned off by default for controlling the cache warming.

Bug: T301371
Change-Id: I6152c42ea765d94093d8d62598b1b4278314adec
2022-06-28 09:05:41 +00:00
Derick Alangi
270699ec34 Configure caching parsoid output per wiki based on threshold
Cache the parsoid output only if parsing takes longer than a configured
per-wiki time threshold. Parses that finish within the threshold are
considered inexpensive for that wiki, and their output is not cached at all.

Bug: T308588
Change-Id: I7793b77feab13400ccd04343e7878ad701f5e6a7
2022-06-16 11:42:06 +01:00
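
The threshold rule described in the commit above can be illustrated with a minimal sketch. This is not the MediaWiki implementation (which is PHP). The function name, the dict-based cache, and the injectable `clock` parameter are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict


def parse_with_threshold_cache(
    key: str,
    parse: Callable[[], str],
    cache: Dict[str, str],
    threshold_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Return parsed output, caching it only when the parse was slow.

    Hypothetical helper: `cache` stands in for a real parser cache and
    `threshold_seconds` for a per-wiki configuration value.
    """
    if key in cache:
        return cache[key]

    start = clock()
    output = parse()
    elapsed = clock() - start

    # Only parses that exceed the threshold are considered expensive
    # enough to be worth storing.
    if elapsed > threshold_seconds:
        cache[key] = output
    return output
```

Passing the clock in as a parameter keeps the sketch deterministic under test. A real service would more likely read the threshold from configuration and measure time internally.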
daniel
6955380fbe Add rate limiting to ParsoidHTMLHelper
Bug: T267991
Change-Id: I52a83e7d3bdb0bcde59160e2d193f06908fda3d4
2022-06-15 13:40:56 +02:00
Derick Alangi
141b42c7ca Rest: Collect stats on Cache & Stash usage
As a means of understanding the usage of the stash feature for
/page/html & /revision/html endpoints used by VE extension,
this patch introduces the collection of stats using the
StatsDataFactory.

Bug: T309017
Change-Id: I4e17d50e79da263637bdd55ab62e993df441fe38
2022-05-30 09:51:55 +01:00
Derick Alangi
d62f97d5e0 Rest: Return different eTags for different output modes
This patch enables the response from PageHTMLHandler and
RevisionHTMLHandler to have different eTags for different
output modes and varying flavors.

Before, the only difference we got was when the stashing
option is set or not, but we need more flavors.

Bug: T308744
Change-Id: I2e9679e46a31955a2106a52af4eb612b32799c8c
2022-05-25 11:15:47 +00:00
Derick Alangi
13f6ec9e1b Rest: Migrate parsoid stashing logic from RESTbase
Add stash option to /page/html & /revision/html endpoints.
When this option is set, the PageBundle returned by Parsoid is
stashed and an etag is returned that can later be used to
make use of the stashed PageBundle.

The stash is for now backed by the BagOStuff returned by
ObjectCache::getLocalClusterInstance().

This patch adds additional data to the ParserOutput stored in ParserCache.
Old entries lacking that data will be ignored.

Bug: T267990
Co-Authored-by: Nikki <nnikkhoui@wikimedia.org>
Change-Id: Id35f1423a69e3ff63e4f9883b3f7e3f9521d81d5
2022-05-23 17:28:29 +01:00
Bartosz Dziewoński
1cdd6d6cbd PageHTMLHandler: Do not de-duplicate styles in Parsoid HTML
Parsoid already does it in a slightly different way. Doing it again
differently could break assumptions in consumers of Parsoid HTML.

Bug: T300325
Change-Id: I9570e0db7313d22f04e35ad0fdc903d871c89875
2022-01-28 23:38:34 +01:00
Petr Pchelko
4ca16e8d08 Eliminate use of Title object in REST infrastructure
Change-Id: I585f0f23cac5f6dc2a4879f69f7b83828fda3dd3
2021-05-05 18:54:58 -07:00
Petr Pchelko
3a2e8883b4 Rest: use Authority in all core handlers
Bug: T239753
Change-Id: Idf2229255f49514dd8b68bf63573c5b619b4f2f1
2021-01-21 18:22:33 -06:00
daniel
637f630fe9 Implement caching for old revision HTML endpoint
Bug: T269663
Change-Id: I2d17ec37d25f3a6e1c4836c05576bf0fabb7d429
2020-12-15 23:40:08 +01:00
daniel
a4b51d2774 Implement /revision/{id}/html endpoint
This doesn't have caching yet.

Bug: T267981
Change-Id: I32a35bb7bc6c6832ce7c79fb942922abc1ddb0e0
2020-12-14 16:54:35 +00:00
Petr Pchelko
1162411d7f Make /page/{title}/html emit etags in RESTBase format
RESTBase used to emit ETag in the `"<rev_id>/<render_id>"` format.
For the benefit of the clients, preserve the format.

Render ID is a UUIDv1 uniquely identifying the ParserOutput.
In future it would be used as a stashing key for stash deduplication.
At this time I decided to just attach the render ID as extension data
to our fake ParserOutput. Once we integrate Parsoid more into core,
we will likely move it into a ParserOutput property, or even
replace CacheTime::mCacheTime with a UUIDv1, but it's too early for that.

Bug: T268234
Change-Id: Ie604e9c98021d59eb1a17ca65f227e8f234a45be
2020-12-09 16:36:07 -06:00
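
As a rough illustration of the ETag shape described in the commit above, the following sketch builds and parses a value of the form `"<rev_id>/<render_id>"` with a UUIDv1 render ID. The helper names `make_etag` and `parse_etag` are hypothetical, and the actual handler code is PHP and may differ in detail.

```python
import uuid
from typing import Tuple


def make_etag(rev_id: int, render_id: uuid.UUID) -> str:
    """Build a quoted ETag value of the form "<rev_id>/<render_id>"."""
    return f'"{rev_id}/{render_id}"'


def parse_etag(etag: str) -> Tuple[int, uuid.UUID]:
    """Split an ETag produced by make_etag back into its two parts."""
    value = etag.strip().strip('"')
    rev, sep, render = value.partition("/")
    if not sep:
        raise ValueError(f"not a <rev_id>/<render_id> ETag: {etag!r}")
    return int(rev), uuid.UUID(render)


# Example: a fresh UUIDv1 identifies one particular rendering of a revision.
render_id = uuid.uuid1()
etag = make_etag(12345, render_id)
```

A client that receives such an ETag could pass it back unchanged. The server side would then recover the revision and render ID with something like `parse_etag`.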
Cindy Cicalese
808d841447 Moved page/{title}/bare to PageSourceHandler
Bug: T267981
Change-Id: Ie1a5ee9da5d8231bbf7ea2cbb419ab4bcec33c43
2020-12-09 22:02:11 +01:00
Daniel Kinzler
3bc61324b9 Re-Apply "Extract helper classes from PageHTMLHandler"
This reverts commit d51a697e13.

Reason for revert: Let's try this again...

Change-Id: Ie0218adff95576c972ff4c1d51cadd02f41eba3e
2020-12-07 16:59:29 +00:00
Petr Pchelko
2b45136ae8 PageHtmlHandler: use canonical options for ParserCache interaction.
Using newFromAnon was a mistake, it's documented to not be viable
for interactions with ParserCache.

Change-Id: Ifca149c78577cbf77420c81a0d240fe1d98db833
2020-12-03 09:27:39 -06:00
Subramanya Sastry
d51a697e13 Revert "Extract helper classes from PageHTMLHandler"
This reverts commit b98f7a6fc1.

Reason for revert: Breaks Parsoid CI but doesn't seem to run on core patches?

Change-Id: I1eaf1495dce6f6ba78093aacb9475a023a2aabfa
2020-12-02 23:32:27 +00:00
daniel
b98f7a6fc1 Extract helper classes from PageHTMLHandler
This extracts two helper classes from PageHTMLHandler:
* PageContentHelper for accessing page content. This replaces the
  LatestRevisionContentHandler base class.
* ParsoidHtmlHelper for generating HTML from wikitext using parsoid.

The idea is to decouple the functionality from the REST handlers, so we
can easily mix and match functionality to create a handler for the
new per-revision HTML endpoint.

Bug: T267981
Bug: T267982
Change-Id: I3226833d12e51c959712d642b0195de1fe1ef979
2020-12-02 18:08:12 +00:00
Ppchelko
d2565533c4 Re-Re-apply "Use parsoid directly in /page/html handler"
This reverts commit d4789dc29a.

Reason for revert: it's still good, resolving dependencies.

Change-Id: Ib5b75cf71b3d9ba2be21b1a369bf20db368c6968
2020-11-19 14:16:50 -07:00
Ppchelko
d4789dc29a Revert "Re-apply "Use parsoid directly in /page/html handler""
This reverts commit 38ca1b261e.

Reason for revert: Even though API appserver is ready, the REST API traffic is not routed to the correct MW cluster.

Change-Id: I00582e32c87e803c305930dd8de60c38b771b219
2020-11-17 17:05:19 +00:00
Ppchelko
38ca1b261e Re-apply "Use parsoid directly in /page/html handler"
This reverts commit 1157007658.

Reason for revert: can be reapplied after dependencies are resolved.

Change-Id: I1270853766fd5bf59ed191065b9e52b76e3d9fc9
2020-11-16 14:23:18 +00:00
Ppchelko
1157007658 Revert "Use parsoid directly in /page/html handler"
This reverts commit 4191c9fe31.

Reason for revert: This can not be released yet. It has slipped my mind that Parsoid extension is not enabled on the API MW cluster, thus releasing this will break the html endpoint. This code is good and can be re-reverted once https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/635096 is resolved.

Change-Id: I808be187ae582995e6c1899044b2a7019bf02d32
2020-10-19 22:39:01 +00:00
Petr Pchelko
4191c9fe31 Use parsoid directly in /page/html handler
Bug: T265295
Change-Id: I6d9999b315def616e973daca0b7d544e502c7212
2020-10-16 15:21:39 -07:00
daniel
9b0a4da72f REST API: inject TitleFactory
Bug: T265295
Change-Id: I7e9140200fe210b6142ddb0da88a055e2b803d24
2020-10-14 16:00:40 +02:00
Nikki Nikkhoui
1111c338f4 Hard code html type in REST /page/{html_type} route
When trying to move PageHTMLHandler route to v1, there was
ambiguity with the route formats between /page/history
and /page/html_types endpoints.
(https://phabricator.wikimedia.org/T255043#6212358).

Hard coding the html types in the route names and making
appropriate changes in the handler.

Bug: T255043
Change-Id: I156ded37033690abc413723ca9e30ec206d934c1
2020-06-11 07:44:35 -07:00
daniel
c4382301cc REST Handlers: use max-age not maxage for cache-control
Small typo, big consequence.

Change-Id: I1d8f43dd3b11e4854b08d41fb5f0c7ede3dba90e
2020-03-25 22:38:03 +01:00
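
For context on the fix above: the standard Cache-Control directive is spelled `max-age`, and caches ignore directives they do not recognize, so a header carrying `maxage` would not set a freshness lifetime. A minimal sketch of building the header value follows. The function name and the `public` default are assumptions, not taken from the handler code.

```python
def cache_control(max_age_seconds: int, public: bool = True) -> str:
    """Build a Cache-Control header value using the standard max-age directive."""
    if max_age_seconds < 0:
        raise ValueError("max-age must be non-negative")
    scope = "public" if public else "private"
    # Note the hyphen: "max-age", not "maxage".
    return f"{scope}, max-age={max_age_seconds}"
```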
Thiemo Kreuz
7a4df9b019 Remove auto-generated and empty lines in comments
… and add the missing newline after the initial <?php.

Change-Id: I83bbbb1504e4b2bd97eec63c7626d34c655c3197
2020-03-17 09:55:24 +01:00
Umherirrender
e32739973e Remove multi-empty lines
Prepare updating mediawiki/mediawiki-codesniffer to 30.0.0
Autofix result

Change-Id: If796413edb3720dd6a9aae82e8e4ab53a5806ad3
2020-02-22 01:33:34 +01:00
Petr Pchelko
a136005a35 REST: /page/{title}/{bare,html,with_html} endpoints backed by RESTBase.
Bug: T234377
Bug: T234375
Change-Id: I77709c17e951e3efb542028e5c0d53eedda8c7bf
2020-01-23 11:55:20 -08:00