Commit graph

173 commits

Author SHA1 Message Date
Cole White
2a21e8220d ParserCache: Inject StatsFactory for StatsLib metric capture
Enables ParserCache to accept either IBufferingStatsdDataFactory or
StatsFactory and emit based on instance type.

Bug: T356815
Change-Id: I40a8372a76f33c5f62ea73bb1180dd7c47412c89
2024-03-15 06:47:22 +00:00
Cole White
d951ff97f6 ParserCache: split metricSuffix into discrete components
Breaks up current usage of $metricSuffix into discrete components for
later conversion to StatsLib.

Bug: T356815
Change-Id: I09c8fe97da699af33fd2c53b4cd66f84a3f85775
2024-03-07 14:43:25 +00:00
C. Scott Ananian
5c0e9216e4 [ParserCache] Add logging for T350538
In I72c5e6f86b7f081ab5ce7a56f5365d2f75067a78 we moved setting a bunch
of cache-related properties into the ContentRenderer so that they
would be set consistently and as early as possible.  Add some warnings
to ParserCache to catch any cases where we are putting content into
the ParserCache which has not gone through ContentRenderer or
otherwise had its cache-related properties set.

Bug: T350538
Change-Id: I9296f0be3866623be08a9e7942e80f3b7746024e
2024-02-07 21:22:06 -05:00
C. Scott Ananian
0de13d7662 Add ParserOutput::{get,set}RenderId() and set render id in ContentRenderer
Set the render ID for each parse stored into cache so that we are able
to identify a specific parse when there are dependencies (for example
in an edit based on that parse).  This is recorded as a property added
to the ParserOutput, not the parent CacheTime interface.  Even though
the render ID is /related/ to the CacheTime interface, CacheTime is
also used directly as a parser cache key, and the UUID should not be
part of the lookup key.

In general we are trying to move the location where these cache
properties are set as early as possible, so we check at each location
to ensure we don't overwrite a previously-set value.  Eventually we
can convert most of these checks into assertions that the cache
properties have already been set (T350538).  The primary location for
setting cache properties is the ContentRenderer.

Moved setting the revision timestamp into ContentRenderer as well, as
it was set along the same code paths.  An extra parameter was added to
ContentRenderer::getParserOutput() to support this.

Added merge code to ParserOutput::mergeInternalMetaDataFrom() which
should ensure that cache time, revision, timestamp, and render id are
all set properly when multiple slots are combined together in MCR.

In order to ensure the render ID is set on all codepaths we needed to
plumb the GlobalIdGenerator service into ContentRenderer, ParserCache,
ParserCacheFactory, and RevisionOutputCache.  Eventually (T350538) it
should only be necessary in the ContentRenderer.

Bug: T350538
Bug: T349868
Followup-To: Ic9b7cc0fcf365e772b7d080d76a065e3fd585f80
Change-Id: I72c5e6f86b7f081ab5ce7a56f5365d2f75067a78
2024-02-07 21:22:06 -05:00
C. Scott Ananian
dbc75831fe [JsonCodec] throw JsonException now that we require PHP >= 7.4
Also fixes JsonCodecTest::testInvalidJsonData(), which was misusing the
data provided by ::provideSimpleTypes().

Change-Id: Ia654359e0fdec3ad546e8bea2e9133c142f0f144
2024-01-08 20:03:12 +00:00
C. Scott Ananian
9935a4f4f7 Be more aggressive in protecting against unserialization issues
Bug: T353835
Change-Id: Ife1c03c9fd24244ad7435b8917762ff8c5af2a17
2023-12-20 15:31:50 -05:00
James D. Forrester
9bfb75ff90 Namespace ParserOutput
Most used non-namespaced class!

Bug: T353458
Change-Id: I4c2cbb0a808b3881a4d6ca489eee5d8c8ebf26cf
2023-12-14 14:57:34 -05:00
daniel
e3fb964439 Only cache expensive renderings
Pages that are fast to render can be omitted from the parser cache
to preserve disk space and cache write operations.

The threshold is configurable per namespace, so the tradeoff can
be evaluated based on different access patterns. For example, pages
that are accessed rarely, like file description pages on commons,
may have a high threshold configured, while pages that are read
frequently, like wikipedia articles, may be configured to be always
cached, using a 0 threshold.

Filtering is based on a time profile recorded in the ParserOutput.
A generic mechanism for capturing the timing profile is implemented
in the ContentHandler base class. Subclasses may implement a more
rigorous capture mechanism.

Bug: T346765
Change-Id: I38a6f3ef064f98f3ad6a7c60856b0248a94fe9ac
2023-11-30 20:56:12 +00:00
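The namespace-based filtering described in e3fb964439 can be sketched as follows. This is a minimal Python illustration, not MediaWiki's PHP implementation. The `thresholds` mapping, the `"*"` fallback key and the use of seconds as the unit are all assumptions made for the example.

```python
from typing import Mapping, Union

# Hypothetical fallback key for namespaces that have no threshold of their own.
DEFAULT_KEY = "*"


def should_cache(namespace: int, render_seconds: float,
                 thresholds: Mapping[Union[int, str], float]) -> bool:
    """Return True if a rendering was slow enough to be worth caching.

    Assumption: thresholds are render times in seconds, looked up per
    namespace, falling back to DEFAULT_KEY and then to 0 (always cache).
    """
    threshold = thresholds.get(namespace, thresholds.get(DEFAULT_KEY, 0.0))
    return render_seconds >= threshold


# Always cache articles (namespace 0), but only cache file description
# pages (namespace 6) that took at least 2 seconds to render.
THRESHOLDS = {0: 0.0, 6: 2.0}
print(should_cache(0, 0.05, THRESHOLDS))  # True
print(should_cache(6, 0.50, THRESHOLDS))  # False
```

A threshold of 0 makes the check always pass, because render times are never negative. This matches the "always cached, using a 0 threshold" case in the commit message.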
Subramanya Sastry
7514bf3921 Revert "Hacks to avoid cold cache misses after ParsoidOutputAccess changes"
* This reverts commit c1b82097.
* This reverts commit 56025174.
* This updates a test change from commit c8d0470f.

* Now that ParsoidOutputAccess has become a thin wrapper over
  ParserOutputAccess and the code has landed in production without
  needing to be reverted, we can revert the above hacks as soon as the
  hits from the 'parsoid' instance drop to a small number.
  As of the time this patch was created, of the combined hits to the
  'parsoid' and 'parsoid_pcache' instances, over 90% are now from the
  'parsoid_pcache' instance. We can wait a couple more days to
  watch how this number changes.

* Note that once we deploy this patch, the accesses which would have
  hit in the 'parsoid' instance (with this hack) will instead result
  in a cache miss thus adding the full parse latency to REST API
  requests (whether by VisualEditor or by other clients). So, we need
  to figure out what the cutoff point is. While 3 weeks is a guaranteed
  switchover timeframe (because all entries in 'parsoid' cache will
  expire at that time and we'll get no more hits from there after that),
  note that we are at < 10% hits in this cache just 4 days after the
  train rollout. So, there is a good chance we could get beyond 95%
  by the end of this week.

Bug: T347632
Change-Id: Ibd741b92b860b4d4b03ca220863debaf53fab44a
2023-10-24 20:08:23 +00:00
Subramanya Sastry
56025174a2 Hacks to avoid cold cache misses after ParsoidOutputAccess changes
* ParsoidOutputAccess used a 'parsoid' ParserCache instance and did not
  set the 'useParsoid' parser option for tier 2 ParserOutput cache key
  computations.

* ParserOutputAccess uses 'pcache' for legacy parser output and
  'parsoid-pcache' for Parsoid parser output objects, depending on
  whether the 'useParsoid' parser option is false or true.

* 'parsoid-pcache' is right now very sparsely populated since useParsoid
  is only used for testing.

* In Ic9b7cc0fcf36, where we make ParsoidOutputAccess a thin wrapper
  over ParserOutputAccess, all Parsoid parser output requests will go
  to ParserOutputAccess's 'parsoid-pcache' instance which is sparsely
  populated and hence will result in a lot of cold cache misses.

* To eliminate this scenario, this patch adds hardcoded hacks to both
  ParserOutputAccess and ParserCache to query the 'parsoid' PC instance
  on cache misses to the 'parsoid-pcache' instance. Over a 3-week
  period, as 'parsoid-pcache' fills up, there will be fewer and fewer
  accesses to the 'parsoid' PC instance, whose entries will also expire.
  At the end of that period, we can remove this hack.

  T347632 tracks removal of these hacks.

* Added a new PHPUnit test verifying that the hacks work as intended.

Bug: T332931
Change-Id: I7f933fd61bf358c6ea0e0c1202231cac618f9e8d
2023-09-30 07:20:52 +00:00
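The transitional lookup described in 56025174a2 amounts to a read-through fallback from the new cache to the old one. The Python sketch below is hypothetical: `InMemoryCache` and `get_parsoid_output` are illustrative names, not MediaWiki classes. The old 'parsoid' instance was keyed without the 'useParsoid' option, so the sketch takes the two keys as separate arguments rather than assuming they match.

```python
from typing import Dict, Optional


class InMemoryCache:
    """Hypothetical stand-in for one parser cache instance."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


def get_parsoid_output(new_key: str, legacy_key: str,
                       new_cache: InMemoryCache,
                       legacy_cache: InMemoryCache) -> Optional[str]:
    """Read from 'parsoid-pcache', falling back to the old 'parsoid' cache."""
    value = new_cache.get(new_key)
    if value is not None:
        return value
    # Transitional fallback: hits here shrink as the new cache fills up
    # and the old entries expire.
    return legacy_cache.get(legacy_key)
```

When the old entries have all expired, the fallback can never hit. At that point it can be deleted without any change in behavior, which is what the later revert (7514bf3921) relies on.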
Bartosz Dziewoński
6ba47296d9 Fix Phan suppressions related to Title::castFrom*() and friends
There is no way to express that Title::castFromPageIdentity(),
Title::castFromPageReference() and Title::castFromLinkTarget()
can only return null when the parameter is null. We need to add
Phan suppressions or explicit types almost everywhere that these
methods are used with parameters that are known to not be null.

Instead, introduce new methods Title::newFromPageIdentity() and
Title::newFromPageReference() (Title::newFromLinkTarget() already
exists), without the null-coalescing behavior, and use them when
the parameter is not null. This lets static analysis tools, and
humans, easily understand where nulls can't appear.

Do the same with the corresponding TitleFactory methods.

Change the obvious uses of castFrom*() to newFrom*() (if there is
a Phan suppression, a type check, or a method call on the result).

Change-Id: Ida4da75953cf3bca372a40dc88022443109ca0cb
2023-04-22 16:45:09 +02:00
daniel
025cc3d2be Parsoid: cache warming job: add render reason
We want to be able to track what activity causes renders and cache
writes. To achieve this, we need to plumb causeAgent and causeAction
from DerivedPageDataUpdater through ParsoidCachePrewarmJob to
ParserOptions.

Change-Id: I0274ec3976a8ef48ccb99156fb4fbeec85048189
2023-01-24 14:08:01 +01:00
daniel
beace26339 ParserCache: fix metrics keys
When using the render reason in a metrics key, first sanitize the
string. This is particularly important when the render reason is a method
name, since "::" is turned into "." in the key. That breaks our
dashboards.

Change-Id: Ie5ebb75798d312626ac38a171da3fe2bbd1997b1
2022-12-13 16:18:05 +00:00
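The problem fixed in beace26339 is that dots act as hierarchy separators in metric keys. An unsanitized method name such as `Foo::bar` therefore splits one reason into several key segments. The Python sketch below shows one way to keep the reason as a single segment. The allowed character class and the `save.reason` key layout are assumptions for illustration, not the actual key format used by ParserCache.

```python
import re


def sanitize_reason(reason: str) -> str:
    """Collapse every run of characters other than [A-Za-z0-9_] into "_".

    Assumption: this character class is illustrative; the real sanitizer
    may allow a different set of characters.
    """
    return re.sub(r"[^A-Za-z0-9_]+", "_", reason)


def reason_metric_key(cache_name: str, reason: str) -> str:
    # Hypothetical key layout: the sanitized reason is always one segment.
    return f"{cache_name}.save.reason.{sanitize_reason(reason)}"


print(reason_metric_key("pcache", "SomeClass::someMethod"))
# prints: pcache.save.reason.SomeClass_someMethod
```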
daniel
a43f67a4c6 ParserCache: fix metrics keys
Keep metrics keys indicating the reason for saving to the cache
separate from keys that indicate the outcome of trying to save.
This is needed to fix Grafana dashboards that evaluate the save_* key
prefix. Such dashboards have started to count most operations twice, once
for save_success and once for save_reason_*.

NOTE: We were indiscriminately replacing "." with "_" in all keys.
Per this patch, all established keys will keep using their old format
with a "_", while the new keys that indicate the reasons will use a ".".
It would be nicer to use "." in the old keys as well, but that would
break existing dashboards.

See https://grafana.wikimedia.org/d/000000106/parser-cache?orgId=1&from=1669815989493&to=1669988789493&viewPanel=14

Bug: T324216
Change-Id: I7aa7ba98bf26a17969939eb1366d7c474c469431
2022-12-02 13:46:49 +00:00
Amir Sarabadani
09b18a8f4c Reorg: Move Title-related classes to title/
Moves these three classes:
 - TitleArray
 - TitleArrayFromResult
 - TitleFactory

We need to move these and the rest of the files under title/ to Title/ (and
namespace them), but the patch would become far too big given that the Title
class is also one of them.

Bug: T321882
Change-Id: Iac1688172ee457348a08a470c86e047571feb8e0
2022-11-26 09:30:32 +00:00
daniel
118d4980b2 Track the reason for rendering.
Allow the causeAction that triggers page rendering to be passed through
to ParserCache, so we can count what causes writes to the cache.

Change-Id: I6ad8e105a3ce457e3ab4f85cd154f47a32085e0d
2022-11-09 09:38:57 +00:00
Tim Starling
43a93d9782 Use the null coalescing assignment operator
Available since PHP 7.4.

Automated search, manual replacement.

Change-Id: Ibb163141526e799bff08cfeb4037b52144bb39fa
2022-10-21 13:26:49 +11:00
daniel
65dee01426 ParserCache: ensure we know a revision ID
ParserCache::checkOutdated relies on ParserOutput::getCacheRevisionId() to determine
whether a revision is still current after loading it from the cache. If
the revision ID is 0 or null, this will result in false negatives, and
the revision will always be considered outdated.

It is better to detect and report this before writing the ParserOutput to the cache.

This also adds an assertion in DerivedPageDataUpdater that will trigger
an exception if we try to write to the parser cache before the revision
has been saved and the ID is known.

Change-Id: I242b769afbc7e1ae1e3f218d451f04945dfa8be4
2022-06-27 13:29:25 +00:00
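The false-negative problem described in 65dee01426 can be seen in a simplified freshness check. The Python sketch below compares only revision IDs and ignores the other criteria the real ParserCache::checkOutdated applies. `is_outdated` and `check_before_save` are hypothetical names. An entry saved with a revision ID of 0 or null can never match the current revision, which is why a guard at write time catches the problem earlier than a miss at read time would.

```python
from typing import Optional


def is_outdated(cached_rev_id: Optional[int], current_rev_id: int) -> bool:
    """Simplified freshness check: only the revision ID is compared.

    A missing ID (None or 0) can never match the current revision, so an
    entry stored without one is always treated as outdated.
    """
    return not cached_rev_id or cached_rev_id != current_rev_id


def check_before_save(rev_id: Optional[int]) -> None:
    """Hypothetical write-time guard: refuse to cache without a revision ID."""
    if not rev_id:
        raise ValueError("refusing to cache output without a revision ID")
```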
daniel
697f28df32 ParserCache: always use JSON
When JSON support was introduced into ParserCache in 1.36, it was
controlled by a feature flag, $wgParserCacheUseJson. The feature flag
was "born deprecated" in 1.36. It can now be removed.

This means that ParserCache will always store entries as JSON.
Support for reading old non-JSON entries remains intact.
This is needed when updating wikis from a version older than 1.36
to the current version.

Change-Id: Id04e42bfb458d98414bac50e0d6c505e8878e5c0
2022-06-07 15:19:45 +02:00
Umherirrender
1f71eccf63 phan: Disable null_casts_as_any_type setting
Make phan stricter about null types by setting null_casts_as_any_type to
false (the default in mediawiki-phan-config)
Remaining false positive issues are suppressed.
The suppression and the setting change can only be done together.

Bug: T242536
Bug: T301991
Change-Id: I0f295382b96fb3be8037a01c10487d9d591e7e01
2022-03-21 18:25:07 +00:00
Siddharth VP
38295f9226 Fix typos in comments (N-R)
Change-Id: I2d1bdb7531ff5126114a391550c2615ea6e244b3
2022-01-09 23:14:44 +05:30
Petr Pchelko
d334de960a Expand local URLs to absolute URLs in ParserOutput
New option 'absoluteURLs' was added to getText method
of the ParserOutput object that replaces all links
in the page HTML with absolute URLs.

Removing the action=render special case from Title
seems safe because we will end up replacing the result
with an absolute URL if we're in a render action, no matter
where Title::getLocalUrl was called from.

This change is safely revertable from the perspective
of ParserCache.

Bug: T263581
Change-Id: Id660e1026192f40181587199d3418568f0fdb6d3
2021-09-23 11:48:51 -07:00
Timo Tijhof
e387cd9c35 Change trivial use of getVal('action') to getRawVal
Per docs added in I18767cd809f67b, these don't need normalization
as they are only compared against predefined strings, and besides
are generally entered manually in a form, and even then would not
require the kinds of Unicode chars that have multiple/non-normalized
forms.

In nearby areas to also fix some trivial cases:

* getVal('title') obviously needs normalization.
  Use getText() to make this more obvious.

* getVal() compared against simple string literals within the code
  obviously don't need normalization (e.g. printable === 'no').

* Change hot code in MediaWiki checking for whether 'diff' or 'oldid'
  are set to getCheck (which uses getRawVal) instead of getVal.
  As a bonus this means it now handles values like "0" correctly,
  which could theoretically have caused bad behaviour before.

Change-Id: Ied721cfdf59c7ba11d1afa6f4cc59ede1381238e
2021-08-26 22:11:58 +01:00
Petr Pchelko
1aa68d183d Remove deprecated ParserCache::getKey and ::getEtag
Change-Id: Idea037eaab851110d0c58f537dafcb2153cd2613
2021-07-27 14:47:49 -07:00
Thiemo Kreuz
51777ee8c1 Add and fix various type hints in PHPDocs
Random fixes I collected the past weeks in my local dev
environment.

Change-Id: Ic8a6262fd28e05cb57335f2faf390a47ff97dbaa
2021-06-18 08:19:23 +00:00
daniel
489e2826e0 ParserCache: fix stats for metadata cache misses
Cache misses in metadata were miscounted as miss.unserialize.
Count them as miss.absent.metadata instead.

Change-Id: Idff062325a34445478a4543709a9f2b3cc365f60
2021-04-08 17:54:01 +02:00
Petr Pchelko
d1f481f242 ParserCache: only use in-process caching for metadata
CachedBagOStuff caches negatives, so it breaks PoolCounter.
We only need to cache metadata in-process, since it's commonly
used twice within the request.

Bug: T277829
Change-Id: I11a147c24b6cdb275b521b48802d6f3d0e1a4387
2021-04-06 17:53:38 -06:00
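The policy in d1f481f242 is to memoize metadata hits within the request but never remember misses. One plausible reading of why remembered misses break PoolCounter: a request that missed, then waited for another worker to render the page, must see the freshly written entry on its next lookup, and a remembered miss would hide it. The Python class below is a hypothetical illustration of that policy. It is not CachedBagOStuff or MediaWiki's actual code.

```python
from typing import Callable, Dict, Optional


class MetadataCache:
    """Hypothetical per-request cache that remembers hits but never misses."""

    def __init__(self, fetch: Callable[[str], Optional[dict]]) -> None:
        self._fetch = fetch              # lookup in the shared backend
        self._hits: Dict[str, dict] = {}

    def get(self, key: str) -> Optional[dict]:
        if key in self._hits:
            return self._hits[key]
        value = self._fetch(key)
        if value is not None:            # negatives are deliberately not stored
            self._hits[key] = value
        return value
```

Because a miss is never stored, a second lookup after another worker has populated the backend goes back to the backend and finds the new entry. Later hits for the same key within the request are still served in-process.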
Petr Pchelko
f642215aed Convert ParserCache to PageRecord
ParserOptions were not updated because they depend on the
Title::getLanguage implementation.

Tests converted to not require a DB anymore. Can't be proper unit
tests yet due to globals in ParserOptions and fake time hacks,
but exec time does go down from 70 seconds to 9 seconds.

Page content model is still emitted in the metrics since
it was considered useful. Should be removed when we get
something like a page type concept.

Change-Id: Ib16fd0b5b87ffc3cb4d21f4aa43d1203cb7206d2
2021-04-02 21:14:54 -06:00
Petr Pchelko
37030c04f0 RevisionRenderer should set revision ID/Timestamp in ParserOutput
ParserOutput object wraps revision ID and revision timestamp
of the parsed revision. Currently ParserCache sets these properties,
but that is not its job at all: whatever generates the ParserOutput
knows much better which revision it parsed. This also allows us to
simplify ParserCache and makes it easier to switch it to PageRecord.

I've only removed setting the timestamp inside ParserCache
because it's a blocker for PageRecord; I will do follow-ups
to remove the $revId parameter from ParserCache as well.

cacheRevisionId should also be renamed, but later.

Bug: T278284
Change-Id: I9a82e9fd154b29a81d1f7a3c4abb073c9a27314e
2021-03-24 10:25:56 -06:00
Timo Tijhof
eb7b9c8e7d ParserCache: Instrument CachedBagOStuff to understand dupe fetches
Follows-up 66cc685b45.

Bug: T269593
Change-Id: Iff5267689a17281330307575d618cfd531051e57
2021-03-13 01:43:10 +00:00
jenkins-bot
d491f23b90 Merge "Respect used options for ParserOptions::isSafeToCache" 2021-01-25 19:13:53 +00:00
Petr Pchelko
7e8d1a11c8 Return back accidentally removed ParserCache 'hit' metric
Change-Id: Ibd69e532a2f373f9d0129ac2a2c6ac70039c9bec
2021-01-05 14:44:19 -06:00
Petr Pchelko
46b66f093a Respect used options for ParserOptions::isSafeToCache
Bug: T269293
Change-Id: Ic3cf908265ad470815f0ac81442d33bde04a5665
2021-01-04 10:32:34 -06:00
Petr Pchelko
71bb51ed55 ParserCache: general code cleanup, abstracted expiration checks.
Change-Id: I7374f30d582064236b8f782e6a2528eb692e3010
2020-12-16 12:09:55 +00:00
Petr Pchelko
66cc685b45 Make ParserCache use CachedBagOStuff
Bug: T269593
Change-Id: I21e6e39eccad22b781252b142c1e5b079c1ee0b4
2020-12-07 10:28:30 -06:00
Petr Pchelko
4417b13d58 Make ParserCache respect ParserOptions::isSafeToCache
Bug: T269154
Change-Id: I8e9ecd2787aa8d172e708ba64ea936e63fbc6b36
2020-12-02 14:02:36 -06:00
Petr Pchelko
b956c77d27 Merge CacheTime and ParserOutput accessedOptions properties
Change-Id: I5785596d68e8923f8bcbd182ace0b1991bd75c9a
2020-11-19 10:12:39 -07:00
Petr Pchelko
dbdc2a3cd3 Introduce JsonCodec to help with serialization/deserialization
Change-Id: I5433090ae8e2b3f2a4590cc404baf838025546ce
2020-11-19 08:32:21 -07:00
Petr Pchelko
7c68ae9296 Safe ParserOutput extension data and JsonUnserializable helper.
One major difference with what we've had before is that now we
actually write class names into the serialization - given that
this new mechanism is extensible, we can't establish any kind
of mapping of allowed classes. I do not think it's a problem
though.

Bug: T264394
Change-Id: Ia152f3b76b967aabde2d8a182e3aec7d3002e5ea
2020-11-10 11:21:09 -07:00
Petr Pchelko
8cc6b7f99a ParserCache JSON - do not \u encode unicode and special characters.
Without passing the ALL_OK constant, JSON encoding will \u-escape
all Unicode characters, which blows the size of the serialized data
out of proportion, especially on the Russian wiki.

Bug: T263579
Change-Id: Ifaaf1cdfaeeb17c3a99ed742b64ae5cc3157500c
2020-10-22 18:26:59 -07:00
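The size effect described in 8cc6b7f99a is easy to reproduce with any JSON encoder. The snippet below uses Python's `json` module, where `ensure_ascii=False` plays roughly the role of not \u-escaping Unicode in PHP's encoder. It illustrates the effect only and is not MediaWiki's FormatJson code.

```python
import json

text = "Привет, мир"  # "Hello, world" in Russian

escaped = json.dumps(text)                        # non-ASCII becomes \uXXXX
unescaped = json.dumps(text, ensure_ascii=False)  # characters kept as UTF-8

escaped_size = len(escaped.encode("utf-8"))
unescaped_size = len(unescaped.encode("utf-8"))
print(escaped_size, unescaped_size)  # prints: 58 22
```

Each Cyrillic letter costs 6 bytes as a \uXXXX escape but only 2 bytes in UTF-8. Mostly-Cyrillic content therefore grows to roughly three times its size when escaped, while both encodings decode to the same string.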
DannyS712
e2731a76ad Normalize error messages for non-serializable properties
Change-Id: If599082bd4acdc9df5b32aaabf2ba8d24e830914
2020-10-21 22:49:57 +00:00
Petr Pchelko
2bbf1dc97e ParserCache: add serialization format to HTML debug message.
Bug: T263579
Change-Id: I80f316ce78285cb245e05d01c7e1a8e314a2e732
2020-10-20 12:48:44 -07:00
Petr Pchelko
e269dd028b Hard-deprecate ParserCache::getETag.
It is not ParserCache's business to build ETags for output.

See https://github.com/SemanticMediaWiki/SemanticMediaWiki/pull/4862
for removal of the only use.
Change-Id: Iceb6bd761acc7511ea7d9d14b9df2e9e1fa51648
2020-10-16 20:17:26 +00:00
jenkins-bot
ed57d5295f Merge "Move serializability validation from ParserOutput to ParserCache" 2020-10-16 13:19:59 +00:00
Petr Pchelko
0f16608e6d Add basic docs for ParserCache
Change-Id: I6290c2f064d6ddc4693a27f1d8bf933bcdb4293f
2020-10-15 13:51:25 -07:00
Petr Pchelko
09c14b9dd0 Move serializability validation from ParserOutput to ParserCache
Bug: T263579
Change-Id: Iac2dbc817c2e7af4a6d112f01bd380a04354db22
2020-10-15 13:15:30 -07:00
daniel
0c059b7381 ParserCache: introduce feature flag for enabling JSON encoding.
This introduces $wgParserCacheUseJson for selectively enabling
JSON encoding in the parser cache. This is intended for testing only.

It should be removed before the release of 1.36.

Bug: T263579
Change-Id: I0d9cab3fafb984a3159e24f9e80f792429ff3c71
2020-10-13 23:46:57 +00:00
daniel
600f64029f Use JSON for parser cache
This adds JSON serialization and deserialization capabilities
to CacheTime and ParserOutput.

NOTE: JSON serialization is disabled for now. Merging this patch
should not change behavior in production.

Bug: T263579
Change-Id: I18187e8bce573d21f6f1bd29106e07c63a6d2f4d
2020-10-13 16:28:52 -07:00
Petr Pchelko
bb39896603 Hard-deprecate ParserCache::getKey.
Bug: T263689
Depends-On: I20b5a3eece79afaac6a4fef733d7a60ea23c6ffe
Depends-On: I3ed1188e267f4eaab0ae46f2bc6f9a379dea58ce
Change-Id: I30d05ee5b217fce0521d14867309979e76f34760
2020-10-13 08:31:23 -07:00
Petr Pchelko
13574e8404 Deprecate ParserCache::getKey and replace it with getMetadata
Bug: T263689
Change-Id: I4a71e5a7eb1c25cd53b857c115883cd00160736b
2020-10-13 08:31:22 -07:00