Why:
- When comparing the newly generated HTML to the cached HTML, there
might be cases when the new ParserOutput doesn't contain HTML.
What:
- If hasText() returns false, don't compare HTML and use the "unknown"
value for the html_changed stats label.
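
A minimal sketch of the intended decision, written in Python for
illustration only (the real change is in MediaWiki's PHP code). The
function name, the "yes"/"no" values, and the use of None to model a
ParserOutput whose hasText() returns false are assumptions; only the
"unknown" value and the html_changed label come from this change.

```python
from typing import Optional


def html_changed_label(cached_html: Optional[str], new_html: Optional[str]) -> str:
    """Pick a value for the html_changed stats label.

    None stands in for an output that has no HTML (hasText() is false).
    Treating a missing cached output the same way is an assumption.
    """
    if cached_html is None or new_html is None:
        # Nothing to compare against, so do not guess.
        return "unknown"
    # "yes"/"no" are hypothetical label values for this sketch.
    return "no" if cached_html == new_html else "yes"
```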
Bug: T388406
Change-Id: Ibc3e79e79a6421d4780739104a949bac50a5b01f
(cherry picked from commit a275e02771bc2ed4243804d5294188f54e47f9fc)
There's no point in retrying a job that fails with "Revision x is not current".
Retrying only causes log spam and, because the failure is logged in the error
channel, makes people think there is a problem when there isn't.
Bug: T379656
Change-Id: Iaa5bd006bf3f26277e81ad5bea1387ef4b925f68
Changes to the use statements done automatically via script
Addition of missing use statement done manually
Change-Id: I73fb416573f5af600e529d224b5beb5d2e3d27d3
Why:
- The JobQueue docs currently do not describe the
semantics of handling job execution failures.
- Some other features, such as job deduplication, enqueue semantics and
the overall execution flow, are also missing from the architecture
document.
What:
- Expand the JobQueue architecture document to describe how jobs that
return false from Job::run() or complete exceptionally are treated.
- Rework some preexisting sections of the same document for brevity.
- Describe JobSpecification and JobQueueGroup in the same document.
- Describe the difference between push() and lazyPush() semantics.
- Describe the execution flow of a Job.
- Briefly summarize how job deduplication is expected to function.
Change-Id: Ib0c1f165feabefe862710b28977b598faf637ec5
Implicitly marking parameter $... as nullable is deprecated in PHP 8.4;
the explicit nullable type must be used instead.
Created with autofix from Ide15839e98a6229c22584d1c1c88c690982e1d7a
Break one long line in SpecialPage.php
Bug: T376276
Change-Id: I807257b2ba1ab2744ab74d9572c9c3d3ac2a968e
Automatically perform the empty-transaction commit logic in methods
like beginPrimaryChanges() and commitPrimaryChanges(). This avoids
the proliferation of callers having to call the same set of methods
one after another for non-obvious reasons. It also discourages code
from making brittle assumptions that might fail for setups where
there is only a primary or the primary has non-zero read load.
Deprecate flushReplicaSnapshots() and remove callers.
Clarify the related method documentation.
Bug: T315664
Change-Id: I255afd22ffcaeac0fad2d4e4a2a0c55c99be7905
While investigating T376433 I realized we were also generating metrics
for wikidatawiki for the various content models implemented by
Wikibase. Add the wiki ID and main content model to the metrics
labels so we can distinguish or filter these out.
While we're at it, fix the case of the `parsercache_` metrics prefix to match
the other `ParserCache_` metrics already being recorded.
Bug: T371713
Depends-On: I11386e307caaa9fce34870b08bd4dce4c5e6eb25
Change-Id: Iaf9d8cac1fe008f1441c46e5bc70e7d060358b27
Why:
- JobSpecification is the preferred way to enqueue jobs without
instantiating a full Job subclass instance, and the only way to
enqueue a job in the context of a foreign wiki.
- As such, it's used by several extensions. However, it's not explicitly
marked as newable.
What:
- Mark JobSpecification as newable.
Change-Id: I3dca96857c875da1ee6f0f6054a12aa6ec276697
Another instance of the null/false confusion in the return value of
ParserCache::getDirty() snuck in.
Follows-Up: I497f6956cd4f5b22f13a97c01029f3201e56e7c0
Follows-Up: I208aeac1b315a96bdb9669427cd03de461b914b4
Change-Id: I42bbd370c4eba46de40261511cf49d7c462f5bfe
Why:
- When temporary users are enabled, creating IP actors is disallowed
apart from specific cases, such as importing revisions authored by
anonymous users.
- If such a revision includes a category link and
wgRCWatchCategoryMembership is true, MediaWiki will fire a job to
create a corresponding RC entry, which will attempt to attribute the RC
to the anonymous IP that authored the imported revision and fail in
doing so.
What:
- Track whether a category membership change job was triggered by an
import, and allow RecentChange objects created by such jobs to create
anonymous actors.
Test Plan:
1. On a wiki with temporary accounts enabled and wgRCWatchCategoryMembership = true,
import a revision via Special:Import that was authored by an anonymous user
and contains a category link.
2. Verify that the import succeeds and that the corresponding RC entry
shows up.
Bug: T373318
Change-Id: I89abdca9c4ab8796a211df8b37c1bd7173a496e5
Collect parse time statistics as a counter in order to determine both
the number of opportunities for selective update and the
proportion of CPU time spent on parses where selective update is
feasible.
Bug: T371713
Change-Id: I5b8c7ab48d5a1d6c1e311149fcac6abdc523aa13
Callers should not catch an unchecked exception, so it doesn't belong
in a function signature. Unchecked exceptions indicate a coding error,
which by definition the code will not be able to handle correctly.
If any of these exceptions were supposed to be in response to an edge
case, user input, or initial conditions, then they should be changed
to a runtime error. If the exception class cannot be changed, then
the annotation should include a comment explaining its purpose and
prognosis.
Bug: T240672
Change-Id: I2e640b9737cb68090a8e1cb70067d1b74037d647
Controlled by $wgParsoidSelectiveUpdateSampleRate (which defaults to off),
randomly sample 1 in N parses to collect statistics to inform the design
of Parsoid selective update:
* For both legacy parses and Parsoid, count how many times a previous
parse is in the cache when a new parse is requested. This needs to
sample the legacy parser as well as Parsoid because Parsoid is not
yet invoked from the RefreshLinksJob. We also count the relative
number of parses from the different
RevisionRenderer::getRenderedRevision() call sites to determine
which pathways might account for the most opportunities for
optimized selective update.
* For sampled parses using the Parsoid parser where a previous parse
result is available, also fetch the previous wikitext source from the
database.
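
The "1 in N" sampling described above can be sketched as follows. This
is a Python illustration under stated assumptions, not the MediaWiki
implementation: the function name is hypothetical, and treating a rate
of 0 (or less) as "off" is an assumption about how the default is
represented.

```python
import random


def should_sample(sample_rate: int, rng: random.Random) -> bool:
    """Return True for roughly 1 in sample_rate calls.

    A rate of 0 or less is treated as "sampling disabled" (assumption).
    """
    if sample_rate <= 0:
        return False
    # randrange(n) is uniform over 0..n-1, so this hits once in n on average.
    return rng.randrange(sample_rate) == 0
```

With a rate of 1 every parse is sampled; larger rates trade statistical
precision for lower overhead on the parse path.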
Bug: T371713
Change-Id: I208aeac1b315a96bdb9669427cd03de461b914b4
wfGetUrlUtils() is also deprecated, but less so, so we can do this first
and then properly replace the individual uses with dependency injection
in local pieces of work.
Also:
* Switching Parser::getExternalLinkRel to UrlUtils::matchesDomainList
exposed a type error in media.txt where $wgNoFollowDomainExceptions
was set to a string (which is invalid) instead of an array.
Bug: T319340
Change-Id: Icb512d7241954ee155b64c57f3782b86acfd9a4c
Follows-up I18e4236b673396e, to ensure that:
1. The metric labels remain together for easier understanding
and discovery, which reduces the chance of calling it too early,
incompletely, too often, or not at all in some cases.
2. The metric cannot become invalid because a label only exists
sometimes. Prometheus generally requires
consistent labelling over time. By setting it once we gain
the benefit of Phan static analysis noticing if the variable
is unset, instead of just not setting one of the labels in some
branches.
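
To illustrate the point about consistent labelling, here is a toy
sketch in Python. The Counter class is a stand-in written for this
example, not a real stats library, and the "status" label and
record_operation() helper are hypothetical. Only html_changed and the
refreshlinks_parsercache_operations_total name come from the related
changes.

```python
from typing import Dict, Iterable, Tuple


class Counter:
    """Toy counter that insists on one fixed set of label names."""

    def __init__(self, name: str, label_names: Iterable[str]) -> None:
        self.name = name
        self.label_names = frozenset(label_names)
        self.values: Dict[Tuple[Tuple[str, str], ...], int] = {}

    def inc(self, labels: Dict[str, str]) -> None:
        if frozenset(labels) != self.label_names:
            raise ValueError(f"inconsistent labels for {self.name}: {sorted(labels)}")
        key = tuple(sorted(labels.items()))
        self.values[key] = self.values.get(key, 0) + 1


def record_operation(counter: Counter, status: str, html_changed: str = "unknown") -> None:
    # All labels are assembled in one place and emitted once, so no
    # branch can forget to set one of them.
    counter.inc({"status": status, "html_changed": html_changed})
```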
Change-Id: I9d27e5a7220e565fcc23f4f19a5824be57676552
This touches various production classes and maintenance scripts.
The code should do the exact same as before. The main benefit is that
the syntax avoids any repetition.
Change-Id: I5c552125469f4d7fb5b0fe494d198951b05eb35f
This patch lays the groundwork for incremental/selective parsing in
Parsoid by ensuring that we can pass previous cached parses through
the parse pipeline to Parsoid. We do this by adding a new render
hint type, `previous-output`, and ensuring it is passed along.
Because revisions can contain a ParserOutput which is the combination
of separate ParserOutput objects for each of their slots, RenderedRevision
also contains a method to unsplit the combined ParserOutput to reconstruct
an original ParserOutput for use in incremental parsing. Currently this
is mostly a stub, but illustrates how slot combination and splitting can
work, assuming those transformations are reversible.
Extra calls to ParserCache::getDirty() are added to some code paths
in order to ensure that any previously-cached ParserOutput is available
for selective update. In order to mitigate any performance concerns,
these are only done for the Parsoid parser at the moment. Future
patches will add additional metrics to quantify the cost/benefit ratio
of the additional cache lookups on these paths.
Bug: T363421
Bug: T371713
Change-Id: I440884f1d7e09c1ff9806f848b7b53a636367690
We already use CONN_TRX_AUTOCOMMIT for mysql/postgres and
a separate DB file (e.g. "server" config) for sqlite (via
the installer). For any legacy case where the main DB is
still used with sqlite, the removed method was unlikely to
be very effective.
Ignore DBO_TRX/DBO_DEFAULT in JobQueueDB "server" config
for good measure, similar to SqlBagOStuff.
This removes more direct uses of IDatabase::setFlags(),
a method which does not play well with DBConnRef.
Bug: T311090
Change-Id: Ia8457dea2ed30539e23345f89cb6b382be442975
Set the html_changed label on all code paths that emit the
refreshlinks_parsercache_operations_total metric.
Change-Id: I219547d0947a81f2a78c93a46c376eeaf82d6fa3
get_debug_type() does the same thing but better (spelling type names
in the same way as in type declarations, and including names of
object classes and resource types). It was added in PHP 8, but the
symfony/polyfill-php80 package provides it while we still support 7.4.
Also remove uses of get_class() and get_resource_type() where the new
method already provides the same information.
For reference:
https://www.php.net/manual/en/function.get-debug-type.php
https://www.php.net/manual/en/function.gettype.php
In this commit I'm only changing code where it looks like the result
is used only for some kind of debug, log, or test output. This
probably won't break anything important, but I'm not sure whether
anything might depend on the exact values.
Change-Id: I7c1f0a8f669228643e86f8e511c0e26a2edb2948
get_debug_type() does the same thing but better (spelling type names
in the same way as in type declarations, and including names of
object classes and resource types). It was added in PHP 8, but the
symfony/polyfill-php80 package provides it while we still support 7.4.
Also remove uses of get_class() and get_resource_type() where the new
method already provides the same information.
For reference:
https://www.php.net/manual/en/function.get-debug-type.php
https://www.php.net/manual/en/function.gettype.php
To keep this safe and simple to review, I'm only changing cases where
the type is immediately used in an exception message.
Change-Id: I325efcddcb58be63b1592b9c20ac0845393c15e2
The two exceptions caught here should never be thrown in actual practice,
and indicate programming errors if they are thrown: the first would only
occur for an invalid language variant code or invalid DOM name characters,
and the second would only be thrown if the wikitext size limit is exceeded,
which should never happen with articles retrieved from the database.
(Parsoid's handling of overlarge articles also differs from legacy,
and it should probably match legacy and just truncate to the size
limit rather than throw an exception, but that's a separate issue.)
In any case, there's not much gained here by catching and rethrowing
rather than just letting the original Parsoid exception, were it to
ever occur, bubble up and be logged directly as an uncaught exception.
Followup-To: I96161a64952e1809c0aec773d5a3dd4c71105657
Change-Id: Ibac2d01306d0637097c0cc3363e63f238bb649fe
Collect statistics on how often we re-parse pages in refreshLinksJob but
end up generating the same HTML we already had. This typically happens
when pages get re-rendered because a template was changed, but that
change doesn't actually affect the page in question.
Bug: T369898
Change-Id: I18e4236b673396e28d638372ce0f1e24a07dca16
Also add deprecated aliases for the non-namespaced classes.
ReplicatedBagOStuff, which is already deprecated, is not moved.
Bug: T353458
Change-Id: Ie01962517e5b53e59b9721e9996d4f1ea95abb51
BacklinkCache::partition() is persistently cached with an expiry time
of one hour, with no invalidation. If a page starts using a template
after the partition cache is saved, and its page_id happens to fall in
a gap between batch ranges, a subsequent update to the template would
miss the page. Its page_touched would be permanently incorrect.
So, when loading the page IDs in a batch, use the start of the next
batch as the endpoint, rather than the highest page ID at the time of
the cache update.
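
The gap problem and the fix can be sketched as below. This is a Python
illustration, not the BacklinkCache code: all function names are
hypothetical, and modelling the new ranges as half-open with an
open-ended last batch is an assumption made for the example.

```python
from typing import List, Optional, Tuple


def _batches(ids: List[int], size: int) -> List[List[int]]:
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def gapped_ranges(ids: List[int], size: int) -> List[Tuple[int, int]]:
    """Old behaviour: each batch ends at its highest page ID at cache time."""
    return [(b[0], b[-1]) for b in _batches(ids, size)]


def contiguous_ranges(ids: List[int], size: int) -> List[Tuple[int, Optional[int]]]:
    """New behaviour: each batch ends where the next one starts."""
    starts = [b[0] for b in _batches(ids, size)]
    ends: List[Optional[int]] = list(starts[1:]) + [None]  # last batch is open-ended
    return list(zip(starts, ends))


def in_gapped(page_id: int, ranges: List[Tuple[int, int]]) -> bool:
    return any(lo <= page_id <= hi for lo, hi in ranges)


def in_contiguous(page_id: int, ranges: List[Tuple[int, Optional[int]]]) -> bool:
    return any(lo <= page_id and (hi is None or page_id < hi) for lo, hi in ranges)
```

For cached page IDs 1, 2, 3, 10, 11, 12 in batches of three, a page
with ID 7 that starts using the template later falls between the old
ranges (1..3 and 10..12) but inside the first of the new ranges.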
Change-Id: I4459fc1c4242cd59505f0e05bf5c20a0b96cab33
One more step in gradually replacing uses of ParsoidOutputAccess. This
one was pretty easy, as ParsoidOutputAccess was pretty much directly
calling ParserOutputAccess when provided with an ExistingPageRecord
and RevisionRecord.
Bug: T367074
Change-Id: I96161a64952e1809c0aec773d5a3dd4c71105657
Changes to the use statements done automatically via script
Addition of missing use statement done manually
Change-Id: Id9f3e775e143d1a17b6b96812a8230cfba14d9d3