Enables ParserCache to accept either IBufferingStatsdDataFactory or
StatsFactory and emit based on instance type.
Bug: T356815
Change-Id: I40a8372a76f33c5f62ea73bb1180dd7c47412c89
Breaks up current usage of $metricSuffix into discrete components for
later conversion to StatsLib.
Bug: T356815
Change-Id: I09c8fe97da699af33fd2c53b4cd66f84a3f85775
In I72c5e6f86b7f081ab5ce7a56f5365d2f75067a78 we moved setting a bunch
of cache-related properties into the ContentRenderer so that they
would be set consistently and as early as possible. Add some warnings
to ParserCache to catch any cases where we are putting content into
the ParserCache which has not gone through ContentRenderer or
otherwise had its cache-related properties set.
Bug: T350538
Change-Id: I9296f0be3866623be08a9e7942e80f3b7746024e
Set the render ID for each parse stored into cache so that we are able
to identify a specific parse when there are dependencies (for example
in an edit based on that parse). This is recorded as a property added
to the ParserOutput, not the parent CacheTime interface. Even though
the render ID is /related/ to the CacheTime interface, CacheTime is
also used directly as a parser cache key, and the UUID should not be
part of the lookup key.
In general we are trying to move the location where these cache
properties are set as early as possible, so we check at each location
to ensure we don't overwrite a previously-set value. Eventually we
can convert most of these checks into assertions that the cache
properties have already been set (T350538). The primary location for
setting cache properties is the ContentRenderer.
Moved setting the revision timestamp into ContentRenderer as well, as
it was set along the same code paths. An extra parameter was added to
ContentRenderer::getParserOutput() to support this.
Added merge code to ParserOutput::mergeInternalMetaDataFrom() which
should ensure that cache time, revision, timestamp, and render id are
all set properly when multiple slots are combined together in MCR.
In order to ensure the render ID is set on all codepaths we needed to
plumb the GlobalIdGenerator service into ContentRenderer, ParserCache,
ParserCacheFactory, and RevisionOutputCache. Eventually (T350538) it
should only be necessary in the ContentRenderer.
Bug: T350538
Bug: T349868
Followup-To: Ic9b7cc0fcf365e772b7d080d76a065e3fd585f80
Change-Id: I72c5e6f86b7f081ab5ce7a56f5365d2f75067a78
Also fixes JsonCodeTest::testInvalidJsonData() which was misusing the
data provided by ::provideSimpleTypes().
Change-Id: Ia654359e0fdec3ad546e8bea2e9133c142f0f144
Pages that are fast to render can be omitted from the parser cache
to preserve disk space and cache write operations.
The threshold is configurable per namespace, so the tradeoff can
be evaluated based on different access patterns. For example, pages
that are accessed rarely, like file description pages on commons,
may have a high threshold configured, while pages that are read
frequently, like wikipedia articles, may be configured to be always
cached, using a 0 threshold.
Filtering is based on a time profile recorded in the ParserOutput.
A generic mechanism for capturing the timing profile is implemented
in the ContentHandler base class. Subclasses may implement a more
rigorous capture mechanism.
Bug: T346765
Change-Id: I38a6f3ef064f98f3ad6a7c60856b0248a94fe9ac
* This reverts commit c1b82097.
* This reverts commit 56025174.
* This updates a test change from commit c8d0470f.
* Now that ParsoidOutputAccess has become a thin wrapper over
ParserOutputAccess and the code has landed in production without
needing to be reverted, we can revert the above hacks as soon as the
hits from the 'parsoid' instance start to go down to a small number.
As of the time of creating of this patch, of the combined hits to the
'parsoid' and 'parsoid_pcache' instance, over 90% are now from the
'parsoid_pcache' instance. We can wait for a couple more days to
watch how this number changes.
* Note that once we deploy this patch, the accesses which would have
hit in the 'parsoid' instance (with this hack) will instead result
in a cache miss thus adding the full parse latency to REST API
requests (whether by VisualEditor or by other clients). So, we need
to figure out what the cutoff point is. While 3 weeks is a guaranteed
switchover timeframe (because all entries in 'parsoid' cache will
expire at that time and we'll get no more hits from there after that),
note that we are at < 10% hits in this cache just 4 days after the
train rollout. So, there is a good chance we could get beyond 95%
by the end of this week.
Bug: T347632
Change-Id: Ibd741b92b860b4d4b03ca220863debaf53fab44a
* ParsoidOutputAccess used a 'parsoid' ParserCache instance and did not
set the 'useParsoid' parser option for tier 2 ParserOutput cache key
computations.
* ParserOutputAccess uses 'pcache' for legacy parser output and
'parsoid-pcache' for Parsoid parser output objects based on whether
'useParsoid' parser option is true or false.
* 'parsoid-pcache' is right now very sparsely populated since useParsoid
is only used for testing.
* In Ic9b7cc0fcf36, where we make ParsoidOutputAccess a thin wrapper
over ParserOutputAccess, all Parsoid parser output requests will go
to ParserOutputAccess's 'parsoid-pcache' instance which is sparsely
populated and hence will result in a lot of cold cache misses.
* To eliminate this scenario, this patch adds hardcoded hacks to both
ParserOutputAccess and ParserCache to query the 'parsoid' PC instance
on cache misses to the 'parsoid-pcache' instance. Over a 3-week
period, as 'parsoid-pcache' fills up, there will be fewer and fewer
access to the 'parsoid' PC instance which will also expire. At the
end of that period, we can remove this hack.
T347632 tracks removal of these hacks.
* Added new PHP unit test verifying that the hack work as intended.
Bug: T332931
Change-Id: I7f933fd61bf358c6ea0e0c1202231cac618f9e8d
There is no way to express that Title::castFromPageIdentity(),
Title::castFromPageReference() and Title::castFromLinkTarget()
can only return null when the parameter is null. We need to add
Phan suppressions or explicit types almost everywhere that these
methods are used with parameters that are known to not be null.
Instead, introduce new methods Title::newFromPageIdentity() and
Title::newFromPageReference() (Title::newFromLinkTarget() already
exists), without the null-coalescing behavior, and use them when
the parameter is not null. This lets static analysis tools, and
humans, easily understand where nulls can't appear.
Do the same with the corresponding TitleFactory methods.
Change the obvious uses of castFrom*() to newFrom*() (if there is
a Phan suppression, a type check, or a method call on the result).
Change-Id: Ida4da75953cf3bca372a40dc88022443109ca0cb
We want to be able to track what activity causes renders and cache
writes. To achieve this, we need to plumb causeAgent and causeAction
from DerivedPageDataUpdater through ParsoidCachePrewarmJob to
ParserOptions.
Change-Id: I0274ec3976a8ef48ccb99156fb4fbeec85048189
When using the render reason in a metrics key, first sanitize the
string. This is particularly important when the render reason is a method
name, since "::" is turned into "." in the key. That breaks our
dashboards.
Change-Id: Ie5ebb75798d312626ac38a171da3fe2bbd1997b1
Keep metrics keys indicating the reason for saving to the cache
separate from keys that indicate the outcome of trying to save.
This is needed to fix Grafana dashboards that evaluate the save_* key
prefix. Such dashboards have started to count most operations twice, once
for save_success and once for save_reason_*
NOTE: We were indiscriminately replacing "." with "_" in all keys.
Per this patch, all established keys will keep using their old format
with a "_", while the new keys that indicate the reaons will use a ".".
It would be nicer to use "." in the old keys as well, but that would
break existing dashboards.
See https://grafana.wikimedia.org/d/000000106/parser-cache?orgId=1&from=1669815989493&to=1669988789493&viewPanel=14
Bug: T324216
Change-Id: I7aa7ba98bf26a17969939eb1366d7c474c469431
These three classes:
- TitleArray
- TitleArrayFromResult
- TitleFactory
We need to move these and the rest of files under title/ to Title/ (and
namespace them) but the patch will become way too big given that Title class is
also one of them.
Bug: T321882
Change-Id: Iac1688172ee457348a08a470c86e047571feb8e0
Allow the causeAction that triggers page rendering to be looped through
to ParserCache, so we can count what causes writes to the cache.
Change-Id: I6ad8e105a3ce457e3ab4f85cd154f47a32085e0d
ParserCache::checkOutdated relies on ParserOutput::getCacheRevisionId() to determine
whether a revision is still current after loading it from the cache. If
the revision ID is 0 or null, this will result in false negatives, and
the revision will always be considered outdated.
It is better to detect and report this before writing the ParserOutput to the cache.
This also adds an assertion in DerivedPageDataUpdater that will trigger
an exception if we try to write to the parser cache before the revision
has been saved and the ID is known.
Change-Id: I242b769afbc7e1ae1e3f218d451f04945dfa8be4
When JSON support was introduced into ParserCache in 1.36, it was
controlled by a feature flag, $wgParserCacheUseJson. The feature flag
was "born deprecated" in 1.36. It can now be removed.
This means that ParserCache will always store entries as JSON.
Support for reading old non-JSON entries remains intact.
This is needed when updating wikis from a version older than 1.36
to the current version.
Change-Id: Id04e42bfb458d98414bac50e0d6c505e8878e5c0
Make phan stricter about null types by setting null_casts_as_any_type to
false (the default in mediawiki-phan-config)
Remaining false positive issues are suppressed.
The suppression and the setting change can only be done together
Bug: T242536
Bug: T301991
Change-Id: I0f295382b96fb3be8037a01c10487d9d591e7e01
New option 'absoluteURLs' was added to getText method
of the ParserOutput object that replaces all links
in the page HTML with absolute URLs.
Removing the action=render special case from Title
seems safe cause we will end up replacing the result
with absolute URL if we're in a render action no matter
where Title::getLocalUrl was called from.
This change is safely revertable from the perspective
of ParserCache.
Bug: T263581
Change-Id: Id660e1026192f40181587199d3418568f0fdb6d3
Per docs added in I18767cd809f67b, these don't need normalization
as they are only compared against predefined strings, and besides
are generally entered manually in a form, and even then would not
require the kinds of Unicode chars that have multiple/non-normalized
forms.
In nearby areas to also fix some trivial cases:
* getVal('title') obviously needs normalization.
Use getText() to make this more obvious.
* getVal() compared against simple string literals within the code
obviously don't need normalization (e.g. printable === 'no').
* Change hot code in MediaWiki checking for whether 'diff' or 'oldid'
are set to getCheck (which uses getRawVal) instead of getVal.
As a bonus this means it now handles values like "0" correctly,
which could theoretically have caused bad behaviour before.
Change-Id: Ied721cfdf59c7ba11d1afa6f4cc59ede1381238e
Cache misses in metadata were miscounted as miss.unserialize.
Count them as miss.absent.metadata instead.
Change-Id: Idff062325a34445478a4543709a9f2b3cc365f60
CachedBagOStuff caches negatives, so it breaks PoolCounter.
We only need to cache metadata in-process, since it's commonly
used twice within the request.
Bug: T277829
Change-Id: I11a147c24b6cdb275b521b48802d6f3d0e1a4387
ParserOptions not updated cause they depend on Title::getLanguage
implementation.
Tests converted to not require a DB anymore. Can't be proper unit
tests yet due to globals in ParserOptions and fake time hacks,
but exec time does go down from 70 seconds to 9 seconds.
Page content model is still emitted in the metrics since
it was considered useful. Should be removed when we get
something like a page type concept.
Change-Id: Ib16fd0b5b87ffc3cb4d21f4aa43d1203cb7206d2
ParserOutput object wraps revision ID and revision timestamp
of the parsed revision. Currently ParserCache sets these properties,
but it's not at all it's job - whatever generates the ParserOutput
knows much better what revision it parsed. This also allows us to
simplify ParserCache and easier switch it to PageRecord.
I've only removed setting the timestamp inside ParserCache
cause it's a blocker for page record, I will do followupus
to remove the $revId parameter from ParserCache as well.
cacheRevisionId should also be renamed, but later.
Bug: T278284
Change-Id: I9a82e9fd154b29a81d1f7a3c4abb073c9a27314e
One major difference with what we've had before is that now we
actually write class names into the serialization - given that
this new mechanism is extencible, we can't establish any kind
of mapping of allowed classes. I do not think it's a problem
though.
Bug: T264394
Change-Id: Ia152f3b76b967aabde2d8a182e3aec7d3002e5ea
Without passing ALL_OK constant, json-encoding will \u-escape
all the unicode, which will blow the size of serialized data,
especially on Russian wiki out of proportion.
Bug: T263579
Change-Id: Ifaaf1cdfaeeb17c3a99ed742b64ae5cc3157500c
This introduces $wgParserCacheUseJson for selectively enabling
JSON encoding in the parser cache. This is intended for testing only.
It should be removed before the release of 1.36.
Bug: T263579
Change-Id: I0d9cab3fafb984a3159e24f9e80f792429ff3c71
This adds JSON serialization and deserialization capabilities
to CacheTime and ParserOutput.
NOTE: JSON serialization is disabled for now. Merging this patch
should not change behavior in production.
Bug: T263579
Change-Id: I18187e8bce573d21f6f1bd29106e07c63a6d2f4d