Consolidate more logic into JobRunner::execute() and make
it public. Add a "caught" field to the resulting map. The
intended caller of this method is JobExecutor; calling it
from there could cut down on code duplication.
Also:
* Use try/finally to restore state instead of ScopedCallback
(see the sketch after this list).
* Catch the more generic Throwable instead of Exception.
* Reorganize JobRunner::run() slightly for readability.
* Set class constant visibility and improve code comments.
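A minimal sketch of the combined pattern (saveState()/restoreState() are
hypothetical stand-ins for the actual state handling):

    $priorState = $this->saveState();
    $caught = null;
    try {
        $job->run();
    } catch ( Throwable $e ) {
        // Throwable also covers PHP 7 Error, not just Exception
        $caught = $e;
    } finally {
        // Runs even if the job throws, replacing ScopedCallback
        $this->restoreState( $priorState );
    }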
Bug: T243492
Change-Id: I90566a49c603aa78f45b35c0d3fc1925d2cfe2f8
Repeating the variable name doesn't do anything. Documentation
generators don't need it. It's more stuff to read that doesn't add new
information. And it can become outdated.
Note that there are two types of @var docs: when used inline (and not
on a class property), the variable name is still needed.
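For illustration (Title and the property name are arbitrary):

    /** @var Title $title */  // before: name repeats the line below
    private $title;

    /** @var Title */         // after: type alone is enough here
    private $title;

    /** @var Title $title */  // inline use still needs the name
    $title = $page->getTitle();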
Change-Id: If5a520405efacd8cefd90b878c999b842b91ac61
Use this in JobRunner to avoid overly sensitive lag timeouts and
log spam. The 3-second timeout sits between the regular web default
and the CLI default.
Follow-up to e8df0fbab1.
Bug: T235244
Change-Id: I92f657a638031d913b0575d74bf48c3e3a63cd17
PHP doesn't care much, but I think we humans do, because we should
call methods by the names we give them. Methods fixed are:
- isOk() -> isOK()
- setOk() -> setOK()
- teardown() -> tearDown()
Change-Id: I6b3f0cf3902887058efa426968da380803869e0b
If JobRunner is called while replica transactions exist, the first job
would previously use the stale REPEATABLE-READ snapshot data.
Also clear any master connection snapshots via commitMasterChanges().
This makes the code more similar to DeferredUpdates::attemptUpdate().
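Roughly, before the first job runs (method names as in the LBFactory of
that era):

    // Discard stale REPEATABLE-READ snapshots on replica handles
    $lbFactory->flushReplicaSnapshots( __METHOD__ );
    // Clear any master connection snapshots as well
    $lbFactory->commitMasterChanges( __METHOD__ );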
Change-Id: I2157a91fb01ea8c233f964b1f3164e8c3b1a07ca
JobQueueGroup is giving RunnableJob on pop(), so it should take the same
type for ack() and deduplicateRootJob().
JobQueue::ack() already accepts the interface.
Also use RunnableJob in JobRunner to match the type returned by the
job queue.
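The aligned signatures look roughly like (bodies elided):

    // In JobQueueGroup
    public function ack( RunnableJob $job ) { /* ... */ }
    public function deduplicateRootJob( RunnableJob $job ) { /* ... */ }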
Change-Id: I7b09586cff8affabe807ee16e80d04f5137dce45
This is slightly more robust and makes the intent much clearer
than random calling code checking getServerCount() all over the
place. In addition, this yields better separation of concerns.
Also, clean up the LoadBalancer constructor a bit and make the
validation a bit stricter.
Make some server index comparisons strict while at it.
Change-Id: Icc1a35bd65c6862ff81faa3ab9b2aa7cafe29443
This ensures that MergeableUpdate tasks that lazy-push jobs will actually
have those jobs run, instead of having them added after the lone callback
update that calls JobQueueGroup::pushLazyJobs() has already run.
This also makes it more obvious that the push will happen, since a mergeable
update is added each time lazyPush() is called and a job is buffered,
rather than relying on some magic callback enqueued into DeferredUpdates at
just the right point in multiple entry points.
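Conceptually, the mergeable update looks like this (the class name and
fields are illustrative, not the actual ones):

    class LazyJobPushUpdate implements MergeableUpdate {
        /** @var IJobSpecification[] Jobs buffered by lazyPush() */
        private $jobs = [];

        public function merge( MergeableUpdate $update ) {
            // DeferredUpdates collapses repeated adds into one instance
            $this->jobs = array_merge( $this->jobs, $update->jobs );
        }

        public function doUpdate() {
            JobQueueGroup::singleton()->push( $this->jobs );
        }
    }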
Bug: T207809
Change-Id: I13382ef4a17a9ba0fd3f9964b8c62f564e47e42d
Use the given $fname in all places.
__METHOD__ inside the unlock closure would be shown as {closure} in
logs.
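For example (simplified sketch; the lock name is illustrative):

    $fname = __METHOD__; // e.g. "JobRunner::run"
    $unlocker = function () use ( $dbw, $fname ) {
        // __METHOD__ here would be logged as "{closure}";
        // the captured $fname keeps the real method name
        $dbw->unlock( 'jobqueue-work', $fname );
    };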
Change-Id: I87ef26e893af858f58d1a77dcb2d8ee192456f5c
For maintenance scripts it is usually harmful to throw an exception.
For jobs the exception was already caught and handled appropriately,
so this can continue as before. For DeferredUpdates it was extremely
harmful to throw an exception. So in the web case, reduce the timeout to
1s and continue as normal if the 1s timeout is reached. This allows the
DeferredUpdate to be throttled without being killed.
In the updater, increase the replication wait timeout to 5 minutes.
ALTER TABLE could indeed cause replication lag, but exiting the update
script with an exception will probably ruin your day. Update actions are
not necessarily efficiently restartable.
Do not call JobQueue::waitForBackups() when jobs are popped. Maybe it
makes sense to call a queue-specific replication wait function for
bulk inserts, like copyJobQueue.php, but doing it when jobs are popped
just makes no sense. Surely the worst that could happen is that the
queue would become locally empty? Removing this waitForBackups() call
avoids waiting for replication twice when JobQueueDB is used.
Bug: T201482
Change-Id: Ia820196caccf9c95007aea12175faf809800f084
Find: /isset\(\s*([^()]+?)\s*\)\s*\?\s*\1\s*:\s*/
Replace with: '\1 ?? '
(Everywhere except includes/PHPVersionCheck.php)
(Then, manually fix some line length and indentation issues)
Then manually reviewed the replacements for cases where confusing
operator precedence would result in incorrect results
(fixing those in I478db046a1cc162c6767003ce45c9b56270f3372).
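A typical replacement (note that ?? binds more loosely than concatenation
and comparison, hence the manual review):

    // Before:
    $limit = isset( $params['limit'] ) ? $params['limit'] : 50;
    // After:
    $limit = $params['limit'] ?? 50;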
Change-Id: I33b421c8cb11cdd4ce896488c9ff5313f03a38cf
This replaces the hacky use of onTransactionIdle(), which no longer runs
immediately in explicit transaction rounds since d4c31cf841.
Also clarified TransactionRoundDefiningUpdate comment about rounds.
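Usage is roughly:

    DeferredUpdates::addUpdate( new TransactionRoundDefiningUpdate( function () {
        // Runs in its own transaction round rather than inside an outer one
        /* ... */
    } ) );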
Change-Id: Ie17eacdcaea4e47019cc94e1c7beed9d7fec5cf2
These were meant as sanity checks, but would fail in those
unusual cases anyway with exceptions. Instead, have an
early check to make sure no explicit transaction rounds
are active when JobRunner::run() is called.
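The early check is along these lines (a sketch):

    if ( $lbFactory->hasTransactionRound() ) {
        throw new LogicException(
            __METHOD__ . ' called with an explicit transaction round active.'
        );
    }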
Change-Id: I723c77c8d3ef7ec4dcf09ce6d549b4fd57bdf1c2
The mediawiki.runJobs errors are collected in Logstash, but the whole job
description and errors end up in a single message field. That makes it
challenging to split logs per job type, find the longest-running jobs,
and so on.
That can be worked around on the log-receiving side by parsing
MediaWiki messages, e.g. https://gerrit.wikimedia.org/r/#/c/312504/
Bryan Davis suggested that a better long-term solution is to use the PSR-3
logger with structured log messages.
Culprit: 'type' is a reserved word. Hence, prefix all context variables
with 'job_'.
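With the prefix, a structured log call looks roughly like ($elapsed is a
placeholder):

    $this->logger->info( 'Finished job {job_type} in {job_duration}s', [
        'job_type' => $job->getType(),
        'job_duration' => $elapsed,
    ] );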
Bug: T146469
Change-Id: Ib6a771c7d3f83bd75b2994bfab9bbebfd1f5aa6c
Previously, tryOpportunisticExecute() tried to nest transaction rounds,
which would fail. Added LBFactory::hasTransactionRound() as needed.
Also cleaned up some unqualified class names in callbacks and set the
PRESEND flag for the JobQueueDB AutoCommitUpdate callback. Use the
proper getMasterDB() method while at it. These follow up 24842cfac.
Bug: T154425
Change-Id: Ib1d38f68bd217903d1a7d46fb15b7d7d9620daa6
This is needed for the deferred updates LinksDeletionUpdate and LinksUpdate;
otherwise, callbacks registered with onTransactionIdle() prevent other
transactions from being executed, at least in this case.
Bug: T154425
Bug: T154438
Bug: T157679
Change-Id: Iecd396d584a62ac936cd963915339159467b44cd
During post-send processing, some jobs can be executed, but given that the
deferred updates were already "closed", any new DeferredUpdate was directly
called (as explained by Krinkle on T165714), and the transactions opened by
classical jobs got badly mixed with transactions (directly) executed by
DeferredUpdates, issuing a DBError and skipping the job, which then stayed
in 'claimed' status even though it failed.
Quite similarly, some DeferredUpdates callables use JobQueueGroup::lazyPush(),
so the generated jobs need to really be pushed.
This change removes the run-immediately-deferred-updates behaviour even
in the post-connection shutdown; given that there is already a call to
DeferredUpdates::doUpdates() in JobRunner::execute(), it is not necessary to
add another one, and the execution of web jobs becomes more similar to that
of CLI jobs. In the same spirit of reconciling web jobs and CLI jobs, the
call to JobQueueGroup::pushLazyJobs() is done in JobRunner::execute().
Bug: T165714
Bug: T100085
Change-Id: I721e7167eca5b0b6227234fe516005243ab22388
This is similar to $wgMaxUserDBWriteDuration except for jobs.
Also use the Config class in JobRunner instead of globals.
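Config access replaces the global, along these lines (the job setting
name is assumed here):

    // Instead of reading global $wgMaxJobDBWriteDuration directly:
    $maxWrite = $this->config->get( 'MaxJobDBWriteDuration' );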
Bug: T95501
Change-Id: I4949bb99c26451429c7acf82ecc4444bf9fb835f
Such errors are likely to persist longer than other random
exceptions. In that case, it is better to avoid burning
through the job retry count.
Change-Id: I6785bd608856f98d21e0b0b05d3899a7081c38e2
Use HTTPS instead of HTTP where the HTTP link is a redirect to the HTTPS link.
Also update some broken links.
Change-Id: Ic3a5eac910d098ed5c2a21e9f47c9b6ee06b2643
This lets the runJobs.php $wgCommandLineMode hack be removed.
Some fixes based on unit tests:
* Only call applyTransactionRoundFlags() for master connections
for transaction rounds from beginMasterChanges().
* Also cleaned up the commitAndWaitForReplication() reset logic.
* Removed deprecated DataUpdate::doUpdate() calls from jobs
since they cannot nest in a transaction round.
Change-Id: Ia9b91f539dc11a5c05bdac4bcd99d6615c4dc48d
This is better than having to use the less safe commitAll(),
which also checks and commits masters with writes.
Change-Id: I01c95f1ebae6927ed5acf0c23dd19b5c2413f661
* Use this to exclude some common cases of harmless queries that
happen to block on row-level locks for a long time. This does
not apply to UPDATE/DELETE however, due to the ambiguity of
time spent scanning vs locking.
* Update commitMasterChanges() and JobRunner to use the new
mode to avoid pointless rollback or lag checks.
Change-Id: Ifc2743f2d8cd109840c45cda5028fbb4df55d231
These are not safe for the common case where the local DB
handle is used for the queue (and other table writes).
Change-Id: Ic24a05c18bf31e49bf7e9a3c058deb5d35271511
* Refactor out some code duplication in query() into a
separate private method.
* Remove the total master/slave query profiling, which is
unnecessary and redundant.
* Provide a default implementation for reconnect().
* Make reconnect() catch errors so it can match the docs that say
it returns true/false to indicate failure. Likewise for ping().
* Optimize ping() to no-op if there was obvious recent activity
(see the sketch after this list).
* Move the ping() round in JobRunner to approveMasterChanges.
This way, all commit rounds benefit from this logic.
* Add more doc comments for DatabaseBase fields.
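The ping() short-circuit is roughly (field and constant names assumed):

    public function ping() {
        if ( ( microtime( true ) - $this->lastActiveTime ) < self::PING_TTL ) {
            return true; // obvious recent activity; skip the round trip
        }
        // ... otherwise actually probe the connection ...
    }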
Change-Id: Ic90ce2be4187244a0e8d44854c39d4b78be8e642
Using waitForOne() barely goes beyond semi-sync replication
already in place on serious DB clusters.
Change-Id: Idb719deaa5993bc2f818cd110d49d09567e0afb3
* Disallow $ignoreErrors in query() on deadlocks, since that would otherwise
silently rollback all changes from any other callers.
* Move recoverability checks for disconnects to canRecoverFromDisconnect().
* The first write of a DBO_TRX transaction is now considered recoverable.
* Run onTransactionResolution() callbacks on disconnect/deadlock rollback.
Some DeferrableUpdate need this to know to abort.
* Disallow $ignoreErrors on disconnects considered unrecoverable. This
makes it so that query() callers cannot cause writes from other callers
to be silently lost, which is hard to reason about.
* Moved ping() logic to a simple reconnect() method; ping() now simply
does a dummy SELECT, which triggers reconnection if safe. Previously,
ping() might cause subtle partial transaction loss.
* Remove ping() from strencode(), which would cause partial transaction
loss where it actually reached.
* Remove mysqlPing() per https://bugs.php.net/bug.php?id=52561.
Bug: T142079
Change-Id: Ifb7f772ae849d67c0d92240a115c3f392e252937
teardown() callbacks are primarily used to reset session state after a
job is done. It seems important to do this even if an exception is
thrown by the job.
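A minimal sketch of the intended flow:

    try {
        $status = $job->run();
    } finally {
        // Reset session state even if run() threw
        $job->teardown();
    }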
Change-Id: I0bd449414527321b0ed9063cea268dea5b0766c4
We currently push a request id into structured logging (monolog/
logstash) to allow seeing all logs that were triggered by the same
request. This extends that to pass the id through jobs so jobs triggered
by a web request also share the same id and can be tracked together.
This web request id will follow jobs both directly created by a request,
and jobs created by those jobs.
This should give us some more visibility when debugging into what
started a particular job, and into whether a large number of jobs
blowing up the job queue are somehow related.
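Conceptually, when enqueueing (WebRequest::getRequestId() exists in
MediaWiki; the parameter key is illustrative):

    // Stash the originating request's id in the job parameters ...
    $params['requestId'] = WebRequest::getRequestId();
    // ... and have the runner put it back into the logging context
    // when the job (or any job it spawns) executes.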
Change-Id: Iedbd031e6e9bb18fd6f7b923c8c305102255ab4b