Commit graph

22 commits

Author SHA1 Message Date
daniel
8ce08c0cbc Move knowledge about HTTP status out of ParsoidOutputAccess
This removes a cyclic dependency:
ParsoidHTML helper in the REST component uses ParsoidOutputAccess in the
parser component. So ParsoidOutputAccess cannot use LocalizedHttpException
from the REST component.

This also improves separation of concerns: the parsing component should
not be concerned with HTTP status codes.

Bug: T301371
Change-Id: I2e661fe3ce0824dbfd7579650972f9019c92ed59
2022-06-28 12:30:44 +02:00
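
The separation described in 8ce08c0cbc can be illustrated with a minimal sketch. All names here (ParseFailure, get_parser_output, handle_request, STATUS_FOR_REASON) are hypothetical stand-ins, not the actual MediaWiki classes; the point is only that the parsing layer reports failures in domain terms and the REST layer alone maps them to HTTP status codes.

```python
from __future__ import annotations


class ParseFailure(Exception):
    """Hypothetical domain error from the parsing layer; carries no HTTP status."""

    def __init__(self, reason: str) -> None:
        super().__init__(reason)
        self.reason = reason


def get_parser_output(page_exists: bool) -> str:
    """Stand-in for the parser component: fails in domain terms only."""
    if not page_exists:
        raise ParseFailure("missing-revision")
    return "<p>rendered</p>"


# Only the REST layer knows how domain failures map to HTTP status codes.
STATUS_FOR_REASON = {"missing-revision": 404}


def handle_request(page_exists: bool) -> tuple[int, str]:
    """Stand-in for the REST helper: translates domain errors to HTTP responses."""
    try:
        return 200, get_parser_output(page_exists)
    except ParseFailure as err:
        # Unknown reasons fall back to a generic server error.
        return STATUS_FOR_REASON.get(err.reason, 500), err.reason
```

With this layout the parser side never imports anything from the REST side, so the dependency points in one direction only.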
daniel
1271faa381 Move access to the page bundle into ParsoidOutputAccess
This isolates ParsoidHTMLHelper from the internals of
ParsoidOutputAccess. The corresponding test cases were changed to use a
mock ParsoidOutputAccess, and to not test the behavior of
ParsoidOutputAccess.

Bug: T301371
Change-Id: Id693fae2264f15e5d35f28acc5adc4239b2ae24f
2022-06-28 11:49:36 +02:00
Derick Alangi
1854fb02d9 Storage: Warm parsoid parser cache with parsoid outputs
This patch introduces a ParsoidOutputAccess service for
getting parsoid outputs and warms the cache with pregenerated
outputs.

It also introduces a config variable in ParsoidCacheConfig that
is turned off by default for controlling the cache warming.

Bug: T301371
Change-Id: I6152c42ea765d94093d8d62598b1b4278314adec
2022-06-28 09:05:41 +00:00
Derick Alangi
1e42da2762 Rest: Fix stats logging for parsoid stash & caching
Change-Id: I57fea2b27222ee4d0744f18db575cf8c487c016c
2022-06-21 10:47:57 +01:00
jenkins-bot
8aea793db3 Merge "Have Parsoid\Config\PageConfigFactory take a rev instead of wikitext" 2022-06-17 16:43:31 +00:00
jenkins-bot
439f89721d Merge "Configure caching parsoid output per wiki based on threshold" 2022-06-16 19:16:31 +00:00
Derick Alangi
d01e3ed739 Replace deprecated calls ParserOptions::newCanonical( 'canonical' )
This is a quick find & replace of calls to the deprecated method
ParserOptions::newCanonical() when the context is the string literal
'canonical'. This can be safely replaced by calling newFromAnon().

Change-Id: If7bb68459b11e0c5f5de188f10fdae85ad1a78bf
2022-06-16 14:22:24 +01:00
Derick Alangi
270699ec34 Configure caching parsoid output per wiki based on threshold
Cache parsoid output only if the parse takes longer than a time
threshold configured per wiki. A parse that finishes within the
threshold is considered inexpensive, and its output is not cached.

Bug: T308588
Change-Id: I7793b77feab13400ccd04343e7878ad701f5e6a7
2022-06-16 11:42:06 +01:00
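
The threshold rule in 270699ec34 can be sketched roughly as follows. This is an illustration under assumptions, not the MediaWiki implementation: parse_with_threshold, its parameters, and the dict used as a cache are all hypothetical.

```python
import time
from typing import Callable, Dict


def parse_with_threshold(
    key: str,
    parse: Callable[[], str],
    cache: Dict[str, str],
    threshold_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Return parsed output, caching it only when the parse was slow."""
    if key in cache:
        return cache[key]

    start = clock()
    output = parse()
    elapsed = clock() - start

    # Parses at or under the threshold are treated as cheap and not stored.
    if elapsed > threshold_seconds:
        cache[key] = output
    return output
```

The clock is injectable so the rule can be exercised without real delays; a per-wiki setup would pass a different threshold_seconds for each wiki.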
Subramanya Sastry
5f5b4cbbb4 Have Parsoid\Config\PageConfigFactory take a rev instead of wikitext
* This lets us pass mocked revisions in the parser test runner while
  running in Parsoid mode.

* This leads to improvement in wt2html test results where a revision
  id is queried. I've verified this in the Cite extension repo and
  also in the main parserTests.txt file, but I cannot enable Parsoid
  integrated testing on the main parser tests file without doing a
  sweep over all parser tests and adding appropriate test sections.

* Currently, PageConfigFactory doesn't have unit tests. Will look
  into adding them separately in a followup.

* Moved the setupParsoidTransform function to a more suitable place
  in the ParserTestRunner.php file.

Bug: T270310
Change-Id: I94d68c8528bb2f7b367c68d80d14ebc1ab904a7f
2022-06-15 22:55:28 -05:00
daniel
6955380fbe Add rate limiting to ParsoidHTMLHelper
Bug: T267991
Change-Id: I52a83e7d3bdb0bcde59160e2d193f06908fda3d4
2022-06-15 13:40:56 +02:00
Derick Alangi
141b42c7ca Rest: Collect stats on Cache & Stash usage
As a means of understanding the usage of the stash feature for
/page/html & /revision/html endpoints used by VE extension,
this patch introduces the collection of stats using the
StatsDataFactory.

Bug: T309017
Change-Id: I4e17d50e79da263637bdd55ab62e993df441fe38
2022-05-30 09:51:55 +01:00
Derick Alangi
d62f97d5e0 Rest: Return different eTags for different output modes
This patch enables the response from PageHTMLHandler and
RevisionHTMLHandler to have different eTags for different
output modes and varying flavors.

Previously, the eTag differed only depending on whether the
stashing option was set, but we need more flavors.

Bug: T308744
Change-Id: I2e9679e46a31955a2106a52af4eb612b32799c8c
2022-05-25 11:15:47 +00:00
Derick Alangi
13f6ec9e1b Rest: Migrate parsoid stashing logic from RESTbase
Add stash option to /page/html & /revision/html endpoints.
When this option is set, the PageBundle returned by Parsoid is
stashed and an etag is returned that can later be used to
make use of the stashed PageBundle.

The stash is for now backed by the BagOStuff returned by
ObjectCache::getLocalClusterInstance().

This patch adds additional data to the ParserOutput stored in ParserCache.
Old entries lacking that data will be ignored.

Bug: T267990
Co-Authored-by: Nikki <nnikkhoui@wikimedia.org>
Change-Id: Id35f1423a69e3ff63e4f9883b3f7e3f9521d81d5
2022-05-23 17:28:29 +01:00
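
The stash-then-reuse flow described in 13f6ec9e1b can be sketched as below. Everything here is an assumption made for illustration: PageBundleStash, its put/get methods, the dict-shaped page bundle, and the key format are hypothetical, and the in-memory dict merely stands in for the BagOStuff backend mentioned in the commit.

```python
import uuid
from typing import Dict, Optional


class PageBundleStash:
    """In-memory stand-in for the stash backend."""

    def __init__(self) -> None:
        self._entries: Dict[str, dict] = {}

    def put(self, rev_id: int, page_bundle: dict) -> str:
        """Stash a page bundle and return the etag that identifies it."""
        etag = f'"{rev_id}/{uuid.uuid1()}"'
        self._entries[etag] = page_bundle
        return etag

    def get(self, etag: str) -> Optional[dict]:
        """Return the stashed bundle for an etag, or None if it is unknown."""
        return self._entries.get(etag)
```

A client would keep the etag from the HTML response and send it back later, at which point the server looks up the stashed bundle instead of re-parsing.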
Derick Alangi
1618bbd671 Add data-parsoid data to ParserOutput for caching
NOTE: This changes the HTML returned by the endpoint!
It will now include the id="mwXYZ" attributes needed to
later map to data-parsoid entries.

Bug: T268205
Change-Id: I0a29434b996cc289eb67083e62bd6f1ad750cb4d
2022-05-16 15:06:15 +00:00
Tim Starling
4f41e2addd Add slow-parsoid log channel
By analogy with slow-parse.log. Also, I fixed the log message so that it
has the full title in it.

Change-Id: Icaeb6f002c5c2a676467d4c760f99cb2676ad73b
2021-09-15 15:48:11 +10:00
Petr Pchelko
4ca16e8d08 Eliminate use of Title object in REST infrastructure
Change-Id: I585f0f23cac5f6dc2a4879f69f7b83828fda3dd3
2021-05-05 18:54:58 -07:00
daniel
637f630fe9 Implement caching for old revision HTML endpoint
Bug: T269663
Change-Id: I2d17ec37d25f3a6e1c4836c05576bf0fabb7d429
2020-12-15 23:40:08 +01:00
daniel
a4b51d2774 Implement /revision/{id}/html endpoint
This doesn't have caching yet.

Bug: T267981
Change-Id: I32a35bb7bc6c6832ce7c79fb942922abc1ddb0e0
2020-12-14 16:54:35 +00:00
Petr Pchelko
1162411d7f Make /page/{title}/html emit etags in RESTBase format
RESTBase used to emit ETag in the `"<rev_id>/<render_id>"` format.
For the benefit of the clients, preserve the format.

Render ID is a UUIDv1 uniquely identifying the ParserOutput.
In future it would be used as a stashing key for stash deduplication.
At this time I decided to just attach the render ID as extension data
to our fake ParserOutput. Once we integrate Parsoid more into core,
we will likely move it into a ParserOutput property, or even
replace CacheTime::mCacheTime with a UUIDv1, but it's too early for that.

Bug: T268234
Change-Id: Ie604e9c98021d59eb1a17ca65f227e8f234a45be
2020-12-09 16:36:07 -06:00
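
The ETag format named in 1162411d7f, `"<rev_id>/<render_id>"` with a UUIDv1 render ID, can be illustrated with a short sketch. The helper names make_etag and parse_etag are hypothetical and not part of MediaWiki; only the format itself comes from the commit message.

```python
from __future__ import annotations

import uuid


def make_etag(rev_id: int, render_id: uuid.UUID) -> str:
    """Build a quoted ETag of the form "<rev_id>/<render_id>"."""
    return f'"{rev_id}/{render_id}"'


def parse_etag(etag: str) -> tuple[int, uuid.UUID]:
    """Split a quoted ETag back into its revision ID and render ID."""
    rev_part, render_part = etag.strip('"').split("/", 1)
    return int(rev_part), uuid.UUID(render_part)


# uuid1() yields a time-based UUIDv1, matching the render ID described above.
render_id = uuid.uuid1()
etag = make_etag(42, render_id)
```

Because the render ID identifies one specific rendering, two renderings of the same revision produce different ETags under this scheme.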
Daniel Kinzler
3bc61324b9 Re-Apply "Extract helper classes from PageHTMLHandler"
This reverts commit d51a697e13.

Reason for revert: Let's try this again...

Change-Id: Ie0218adff95576c972ff4c1d51cadd02f41eba3e
2020-12-07 16:59:29 +00:00
Subramanya Sastry
d51a697e13 Revert "Extract helper classes from PageHTMLHandler"
This reverts commit b98f7a6fc1.

Reason for revert: Breaks Parsoid CI but doesn't seem to run on core patches?

Change-Id: I1eaf1495dce6f6ba78093aacb9475a023a2aabfa
2020-12-02 23:32:27 +00:00
daniel
b98f7a6fc1 Extract helper classes from PageHTMLHandler
This extracts two helper classes from PageHTMLHandler:
* PageContentHelper for accessing page content. This replaces the
  LatestRevisionContentHandler base class.
* ParsoidHtmlHelper for generating HTML from wikitext using parsoid.

The idea is to decouple the functionality from the REST handlers, so we
can easily mix and match functionality to create a handler for the
new per-revision HTML endpoint.

Bug: T267981
Bug: T267982
Change-Id: I3226833d12e51c959712d642b0195de1fe1ef979
2020-12-02 18:08:12 +00:00