108 commits

Author SHA1 Message Date
Petr Pchelko
0f87f5885c Convert JobRunner into a service and use DI
Bug: T246156
Change-Id: If4f67a6fa0e26ade3fc0420e62fa836c2a3e4b2e
2020-02-27 08:04:48 -08:00
Petr Pchelko
c8136454cd Add test for JobRunner
Bug: T220127
Change-Id: I35ff5f97d3e8e677c5c236723df6f74b5e21214d
2020-02-07 11:32:17 -05:00
Aaron Schulz
6aa41cbd37 jobqueue: cleanup JobRunner for readability and code reuse
Consolidate more logic into JobRunner::execute() and make
it public. Add "caught" field to the resulting map. The
intended use case for this method is JobExecutor. Calling
this method from there could cut down on code duplication.

Also:
* Use try/finally to restore state instead of ScopedCallback.
* Use more generic Throwable instead of Exception.
* Reorganize JobRunner::run() slightly for readability.
* Set class constant visibility and improve code comments.
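
The try/finally pattern from the list above can be sketched roughly as follows. This is a Python illustration, not the PHP from the commit: the function name, the `state` dictionary, and the "status" key are hypothetical, and only the "caught" field comes from the message.

```python
from typing import Any, Callable, Dict, List


def execute(job: Callable[[], bool], state: Dict[str, Any]) -> Dict[str, Any]:
    """Run one job and report the outcome, restoring `state` afterwards."""
    saved = dict(state)      # snapshot of the state to restore later
    caught: List[str] = []   # class names of any errors that were caught
    status = False
    try:
        state["in_job"] = True   # hypothetical temporary state change
        status = bool(job())
    except Exception as exc:     # broad catch, loosely analogous to Throwable
        caught.append(type(exc).__name__)
    finally:
        # Runs on success and on failure, so no scoped callback object is needed.
        state.clear()
        state.update(saved)
    return {"status": status, "caught": caught}
```

A caller can then inspect the "caught" list to tell a job that returned false apart from one that raised.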

Bug: T243492
Change-Id: I90566a49c603aa78f45b35c0d3fc1925d2cfe2f8
2020-01-29 21:00:39 +00:00
James D. Forrester
4f2d1efdda Coding style: Auto-fix MediaWiki.Classes.UnsortedUseStatements.UnsortedUse
Change-Id: I94a0ae83c65e8ee419bbd1ae1e86ab21ed4d8210
2020-01-10 09:32:25 -08:00
Thiemo Kreuz
78ca9eff4a Remove duplicate variable name from class property PHPDocs
Repeating the variable name doesn't do anything. Documentation
generators don't need it. It's more stuff to read that doesn't add new
information. And it can become outdated.

Note there are two types of @var docs. When used inline (and not on a
class property) the variable name is needed.

Change-Id: If5a520405efacd8cefd90b878c999b842b91ac61
2019-12-02 12:58:29 +00:00
Umherirrender
c7ad21c25f Improve param docs
Change-Id: I746a69f6ed01c3ff000da125457df62b02d13b34
2019-11-28 19:08:59 +01:00
jenkins-bot
7badf41667 Merge "rdbms: add ILBFactory::setDefaultReplicationWaitTimeout() method" 2019-10-27 00:44:31 +00:00
James D. Forrester
2bc660c95a Collapse uses of now-deprecated wfGetRusage()
Change-Id: I9a2b5d1234ebb458b6cd29797de3f387d1399e6f
2019-10-22 11:32:06 +01:00
Aaron Schulz
d5f5dd2a52 rdbms: add ILBFactory::setDefaultReplicationWaitTimeout() method
Use this in JobRunner to avoid overly sensitive lag timeouts and
log spam. The 3 second timeout is between the regular web default
and the CLI default.

Follow-up to e8df0fbab1.

Bug: T235244
Change-Id: I92f657a638031d913b0575d74bf48c3e3a63cd17
2019-10-21 20:05:30 -07:00
Derick Alangi
52a21ace03 Fix method/function names case mismatch in core files
PHP doesn't care much but I think we humans do because we should
call methods by the name we give them. Methods fixed are:

- isOk() -> isOK()
- setOk() -> setOK()
- teardown() -> tearDown()

Change-Id: I6b3f0cf3902887058efa426968da380803869e0b
2019-08-31 23:17:51 +00:00
Aaron Schulz
0844db0b6d Make the JobRunner flushReplicaSnapshots() call cover the first job
If JobRunner is called when replica transactions exist, the first job
would previously use the stale REPEATABLE-READ snapshot data.

Also clear any master connection snapshots via commitMasterChanges().
This makes the code more similar to DeferredUpdates::attemptUpdate().

Change-Id: I2157a91fb01ea8c233f964b1f3164e8c3b1a07ca
2019-08-25 14:07:24 +00:00
Umherirrender
cb82a52adf Fix type hints in jobqueue related classes
JobQueueGroup is giving RunnableJob on pop(), so it should take the same
type for ack() and deduplicateRootJob().
JobQueue::ack() already accepts the interface.

Also change to RunnableJob in JobRunner to work with the type from the
job queue

Change-Id: I7b09586cff8affabe807ee16e80d04f5137dce45
2019-07-05 22:20:56 +02:00
Aaron Schulz
69c503148f rdbms: add replica server counting methods to ILoadBalancer
This is slightly more robust and makes the intent much clearer
than random calling code checking getServerCount() all over the
place. In addition, this yields better separation of concerns.

Also, clean up the LoadBalancer constructor a bit and make the
validation a bit stricter.

Make some server index comparisons strict while at it.

Change-Id: Icc1a35bd65c6862ff81faa3ab9b2aa7cafe29443
2019-06-20 12:47:23 +01:00
Aaron Schulz
6030e9cf2c Create JobQueueEnqueueUpdate class to call JobQueueGroup::pushLazyJobs()
This assures that MergeableUpdate tasks that lazy-push jobs will actually
have those jobs run, instead of the jobs being added after the lone callback
update that calls JobQueueGroup::pushLazyJobs() has already run.

This also makes it more obvious that push will happen, since a mergeable
update is added each time lazyPush() is called and a job is buffered,
rather than rely on some magic callback enqueued into DeferredUpdates at
just the right point in multiple entry points.

Bug: T207809
Change-Id: I13382ef4a17a9ba0fd3f9964b8c62f564e47e42d
2018-10-28 22:19:06 +00:00
Aaron Schulz
ebbccf1845 Migrate some wfWikiId() callers to getLocalDomainID()
Change-Id: I33fe222b7ca66babd61610febaebcf52d3806a7d
2018-10-15 23:58:49 -07:00
Umherirrender
ff95c7a4ba Fix caller name in JobRunner::commitMasterChanges
Use the given fname for all places.
The __METHOD__ inside the unlock closure would be shown as {closure} in
logs

Change-Id: I87ef26e893af858f58d1a77dcb2d8ee192456f5c
2018-10-01 18:48:36 +00:00
Tim Starling
e8df0fbab1 Don't throw an exception when waiting for replication times out
For maintenance scripts it is usually harmful to throw an exception.
For jobs the exception was already caught and handled appropriately,
so this can continue as before. For DeferredUpdates it was extremely
harmful to throw an exception. So in the web case, reduce the timeout to
1s and continue as normal if the 1s timeout is reached. This allows the
DeferredUpdate to be throttled without being killed.
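
The "report the timeout instead of throwing" behaviour described above could look roughly like this. It is a minimal Python sketch under stated assumptions: `wait_for_replication`, `caught_up`, and `poll_interval` are hypothetical names, not MediaWiki APIs.

```python
import time
from typing import Callable


def wait_for_replication(caught_up: Callable[[], bool], timeout: float,
                         poll_interval: float = 0.01) -> bool:
    """Poll until replicas have caught up or `timeout` seconds have passed.

    Returns False on timeout instead of raising, so the caller is merely
    throttled and can continue as normal.
    """
    deadline = time.monotonic() + timeout
    while True:
        if caught_up():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

A web-style caller would pass a short timeout (the message mentions 1s) and carry on regardless of the return value.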

In the updater, increase the replication wait timeout to 5 minutes.
ALTER TABLE could indeed cause replication lag, but exiting the update
script with an exception will probably ruin your day. Update actions are
not necessarily efficiently restartable.

Do not call JobQueue::waitForBackups() when jobs are popped. Maybe it
makes sense to call a queue-specific replication wait function for
bulk inserts, like copyJobQueue.php, but doing it when jobs are popped
just makes no sense. Surely the worst that could happen is that the
queue would become locally empty? Removing this waitForBackups() call
avoids waiting for replication twice when JobQueueDB is used.

Bug: T201482
Change-Id: Ia820196caccf9c95007aea12175faf809800f084
2018-09-03 12:29:35 +10:00
Umherirrender
130ec2523d Fix PhanTypeMismatchDeclaredParam
Auto fix MediaWiki.Commenting.FunctionComment.DefaultNullTypeParam sniff

Change-Id: I865323fd0295aabd06f3e3c75e0e5043fb31069e
2018-07-07 00:34:30 +00:00
Bartosz Dziewoński
485f66f174 Use PHP 7 '??' operator instead of '?:' with 'isset()' where convenient
Find: /isset\(\s*([^()]+?)\s*\)\s*\?\s*\1\s*:\s*/
Replace with: '\1 ?? '

(Everywhere except includes/PHPVersionCheck.php)
(Then, manually fix some line length and indentation issues)

Then manually reviewed the replacements for cases where confusing
operator precedence would result in incorrect results
(fixing those in I478db046a1cc162c6767003ce45c9b56270f3372).
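
The find/replace pair above can be tried out on a sample line. The sketch below uses Python's `re` module purely for illustration; the pattern and replacement are copied from the message, while the function name and the sample PHP line are made up.

```python
import re

# Pattern and replacement as given in the commit message.
ISSET_TERNARY = re.compile(r"isset\(\s*([^()]+?)\s*\)\s*\?\s*\1\s*:\s*")


def to_null_coalesce(php_source: str) -> str:
    """Rewrite `isset( X ) ? X : ` into `X ?? ` where both X are identical."""
    return ISSET_TERNARY.sub(r"\1 ?? ", php_source)


before = "$v = isset( $opts['a'] ) ? $opts['a'] : 'none';"
after = to_null_coalesce(before)
```

The backreference `\1` only matches when the tested expression and the "true" branch are textually identical. The rewrite is purely textual, so the manual review for operator precedence mentioned above is still needed.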

Change-Id: I33b421c8cb11cdd4ce896488c9ff5313f03a38cf
2018-05-30 18:06:13 -07:00
Aaron Schulz
c6b668c2ec Do not start explicit transaction rounds for RecentChangesUpdateJob
This replaces the hacky use of onTransactionIdle(), which no longer runs
immediately in explicit transaction rounds since d4c31cf841.

Also clarified TransactionRoundDefiningUpdate comment about rounds.

Change-Id: Ie17eacdcaea4e47019cc94e1c7beed9d7fec5cf2
2018-04-17 12:39:05 +00:00
Aaron Schulz
7f571f9bca Remove useless commit calls in JobRunner
These were meant as sanity checks, but would fail in those
unusual cases anyway with exceptions. Instead, have an
early check to make sure no explicit transaction rounds
are active when JobRunner::run() is called.

Change-Id: I723c77c8d3ef7ec4dcf09ce6d549b4fd57bdf1c2
2017-10-12 12:00:07 -07:00
Antoine Musso
40a9ad6ea1 jobqueue: Add job_type to PSR logging context
The mediawiki.runJobs errors are collected in Logstash but the whole job
description and errors are a single field message. That makes it challenging
to split logs per job type, find the longest running jobs, and so on.

That can be worked around on the log receiving side by parsing
MediaWiki messages, e.g. https://gerrit.wikimedia.org/r/#/c/312504/

Bryan Davis suggested a better long term solution is to use the PSR3
logger with structured log messages.

Culprit: 'type' is a reserved word. Hence prefix all context variables
with 'job_'.
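
The prefixing idea can be sketched as follows, in Python rather than PHP. The helper name and the sample field values are hypothetical; only the 'job_' prefix comes from the message.

```python
from typing import Any, Dict


def job_log_context(fields: Dict[str, Any]) -> Dict[str, Any]:
    """Prefix every context key with 'job_' so none collides with reserved names."""
    return {f"job_{key}": value for key, value in fields.items()}


# Sample values for illustration only.
context = job_log_context({"type": "refreshLinks", "duration": 1.5})
```

With every key namespaced this way, the receiving side can filter or aggregate on `job_type` directly instead of parsing the message string.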

Bug: T146469
Change-Id: Ib6a771c7d3f83bd75b2994bfab9bbebfd1f5aa6c
2017-08-08 05:00:52 +00:00
jenkins-bot
38a2a5661e Merge "Add $wgMaxJobDBWriteDuration setting for avoiding replication lag" 2017-06-12 18:15:57 +00:00
Aaron Schulz
95fdff36c2 Make DeferredUpdates detect LBFactory transaction rounds
Previously, tryOpportunisticExecute() tried to nest transaction rounds,
which would fail. Added LBFactory::hasTransactionRound() as needed.

Also cleaned up some unqualified class names in callbacks and set the
PRESEND flag for the JobQueueDB AutoCommitUpdate callback. Use the
proper getMasterDB() method while at it. These follow up 24842cfac.

Bug: T154425
Change-Id: Ib1d38f68bd217903d1a7d46fb15b7d7d9620daa6
2017-06-10 15:22:32 +00:00
Seb35
24842cfac0 Use AutoCommitUpdate instead of Database->onTransactionIdle
This is needed for deferred updates LinksDeletionUpdate and LinksUpdate, else
callbacks registered with onTransactionIdle prevent other transactions from
being executed, at least in this case.

Bug: T154425
Bug: T154438
Bug: T157679
Change-Id: Iecd396d584a62ac936cd963915339159467b44cd
2017-06-06 14:23:37 +02:00
Seb35
d80fca05e1 Better handling of jobs execution in post-connection shutdown
During postprocessing, some jobs can be executed, but because the deferred
updates were already "closed", any new DeferredUpdate was called directly
(as explained by Krinkle on T165714). The transactions opened by
classical jobs then get badly mixed with transactions (directly) executed by
DeferredUpdates jobs, issuing a DBError and aborting the job, which stays
in a 'claimed' status even though it failed.

Quite similarly, some DeferredUpdates callables use JobQueueGroup::lazyPush,
so the generated jobs need to actually be pushed.

This change removes the run-immediately-deferred-updates behaviour even
in the post-connection shutdown, and given there is a call to
DeferredUpdates::doUpdates in JobRunner::execute it is not necessary to
add another one and hence execution of Web jobs is more similar to execution
of CLI jobs. In the same spirit to reconcile Web jobs and CLI jobs, the
call to JobQueueGroup::pushLazyJobs is done in JobRunner::execute.

Bug: T165714
Bug: T100085
Change-Id: I721e7167eca5b0b6227234fe516005243ab22388
2017-06-01 13:16:08 +02:00
Aaron Schulz
ac202927d4 Add $wgMaxJobDBWriteDuration setting for avoiding replication lag
This is similar to $wgMaxUserDBWriteDuration except for jobs.

Also use the Config class in JobRunner instead of globals.

Bug: T95501
Change-Id: I4949bb99c26451429c7acf82ecc4444bf9fb835f
2017-05-25 19:43:27 +00:00
Aaron Schulz
dd359741cc Move DB errors to Rdbms namespace
Change-Id: I463bd86123501abc68fdb78b4cda6110f7af2549
2017-04-15 10:47:41 -07:00
Aaron Schulz
4a177b34ef Move LBFactory to Rdbms namespace
Change-Id: I5ae10783228d0252284807c9562bc8e328d4becb
2017-02-03 17:24:03 -08:00
Aaron Schulz
6477026675 Back off from job types longer for DB read-only errors
Such errors are likely to persist longer than other random
exceptions. In that case, it is better to avoid burning
through the job retry count.
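
A rough Python sketch of the idea follows. The class name and both durations are hypothetical, since the message does not state the actual values.

```python
class ReadOnlyError(Exception):
    """Stand-in for a database read-only error (hypothetical name)."""


# Hypothetical durations in seconds; the commit does not give the real ones.
DEFAULT_BACKOFF = 10
READ_ONLY_BACKOFF = 60


def backoff_seconds(error: Exception) -> int:
    """Choose how long to avoid a job type after one of its jobs failed."""
    if isinstance(error, ReadOnlyError):
        # Read-only conditions tend to persist, so wait longer before retrying.
        return READ_ONLY_BACKOFF
    return DEFAULT_BACKOFF
```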

Change-Id: I6785bd608856f98d21e0b0b05d3899a7081c38e2
2016-12-09 23:26:34 -08:00
jenkins-bot
69ae945e8d Merge "Update weblinks in comments from HTTP to HTTPS" 2016-11-08 21:32:00 +00:00
Fomafix
202f695f67 Update weblinks in comments from HTTP to HTTPS
Use HTTPS instead of HTTP where the HTTP link is a redirect to the HTTPS link.

Also update some defect links.

Change-Id: Ic3a5eac910d098ed5c2a21e9f47c9b6ee06b2643
2016-11-07 15:24:46 +01:00
Kunal Mehta
61adc1e146 Use namespaced ScopedCallback
The un-namespaced \ScopedCallback is deprecated.

Change-Id: Ie014d5a775ead66335a24acac9d339915884d1a4
2016-10-17 15:46:05 -07:00
Gergő Tisza
d304f5e394 Pass Job success status to teardown callbacks
Change-Id: Icf2e03efcfd9232fe4ead776096b61cef1c06141
2016-10-05 02:55:45 +00:00
Aaron Schulz
1cb13cff08 Remove pointless double exception logging from JobRunner
Change-Id: I12a2e6db326af25a3a276a477fbff505feac87b6
2016-09-13 04:38:36 +00:00
Aaron Schulz
703b0691ca Use ESTIMATE_DB_APPLY for total transaction time estimate
Individual write queries already do this, but the COMMIT step
still used the old accounting.

Change-Id: I416a524d6652f933cbc49033b49745db732c8b92
2016-09-11 16:04:21 -07:00
Aaron Schulz
c14ddc5c30 Make sure the lock in JobRunner::commitMasterChanges() releases
Used a ScopedCallback in case of exception to avoid queue backup

Change-Id: I58a5f152a54ed9a0d5544014788792bd62afbf4a
2016-09-08 02:19:32 -07:00
Aaron Schulz
6c73b32fd5 Convert JobRunner to using beginMasterChanges()
This lets the runJobs.php $wgCommandLineMode hack be removed.

Some fixes based on unit tests:
* Only call applyTransactionRoundFlags() for master connections
  for transaction rounds from beginMasterChanges().
* Also cleaned up the commitAndWaitForReplication() reset logic.
* Removed deprecated DataUpdate::doUpdate() calls from jobs
  since they cannot nest in a transaction round.

Change-Id: Ia9b91f539dc11a5c05bdac4bcd99d6615c4dc48d
2016-09-07 03:56:37 +00:00
Aaron Schulz
d1f09fb4c3 Fix IDEA errors in JobRunner
Change-Id: I15939326afa80139a4d1000e43057b61cd374f18
2016-09-06 15:17:14 -07:00
Aaron Schulz
57e19b610d Renamed some variables from "slave" to "replica"
Change-Id: I455278294cd7ea344d14a76ac5957ece2e07fbf3
2016-09-05 23:03:01 -07:00
Aaron Schulz
16266edff3 Change "slave" => "replica DB" in /includes
Change-Id: Icb716219c9335ff8fa447b1733d04b71d9712bf9
2016-09-05 21:01:01 +00:00
Aaron Schulz
de0b371aac Add flushReplicaSnapshots() method for just clearing snapshots
This is better than having to use the less safe commitAll(),
which also checks and commits masters with writes.

Change-Id: I01c95f1ebae6927ed5acf0c23dd19b5c2413f661
2016-09-02 11:06:56 -07:00
Aaron Schulz
dac1a29b43 Add more estimation modes to pendingWriteQueryDuration()
* Use this to exclude some common cases of harmless queries that
  happen to block on row-level locks for a long time. This does
  not apply to UPDATE/DELETE however, due to the ambiguity of
  time spent scanning vs locking.
* Update commitMasterChanges() and JobRunner to use the new
  mode to avoid pointless rollback or lag checks.

Change-Id: Ifc2743f2d8cd109840c45cda5028fbb4df55d231
2016-08-29 18:36:17 -07:00
Aaron Schulz
f3cfdf0baa Remove commit() calls from JobQueueDB
These are not safe for the common case where the local DB
handle is used for the queue (and other table writes).

Change-Id: Ic24a05c18bf31e49bf7e9a3c058deb5d35271511
2016-08-23 17:24:58 +00:00
Aaron Schulz
8359993708 Various database class cleanups
* Refactor out some code duplication in query() into a
  separate private method.
* Remove the total master/slave query profiling, which is not
  necessary and redundant.
* Provide a default implementation for reconnect().
* Make reconnect() catch errors so it can match the docs that say
  it returns true/false to indicate failure. Likewise for ping().
* Optimize ping() to no-op if there was obvious recent activity.
* Move the ping() round in JobRunner to approveMasterChanges.
  This way, all commit rounds benefit from this logic.
* Add more doc comments for DatabaseBase fields.

Change-Id: Ic90ce2be4187244a0e8d44854c39d4b78be8e642
2016-08-22 20:15:41 -07:00
Aaron Schulz
4209b81ad0 Use waitForAll() for slow JobRunner commits
Using waitForOne() barely goes beyond semi-sync replication
already in place on serious DB clusters.

Change-Id: Idb719deaa5993bc2f818cd110d49d09567e0afb3
2016-08-12 20:45:00 -07:00
Aaron Schulz
3675f1d447 Make Database disconnect and error suppression more robust
* Disallow $ignoreErrors in query() on deadlocks, since that would otherwise
  silently rollback all changes from any other callers.
* Move recoverability checks for disconnects to canRecoverFromDisconnect().
* The first write of a DBO_TRX transaction is now considered recoverable.
* Run onTransactionResolution() callbacks on disconnect/deadlock rollback.
  Some DeferrableUpdate need this to know to abort.
* Disallow $ignoreErrors on disconnects considered unrecoverable. This
  makes it so that query() callers cannot cause writes from other callers
  to be silently lost, which is hard to reason about.
* Moved ping() logic to a simple reconnect() method; ping() now simply does
  a dummy SELECT, which triggers reconnection if safe. Previously,
  ping() might cause subtle partial transaction loss.
* Remove ping() from strencode(), which would cause partial transaction
  loss where it actually reached.
* Remove mysqlPing() per https://bugs.php.net/bug.php?id=52561.

Bug: T142079
Change-Id: Ifb7f772ae849d67c0d92240a115c3f392e252937
2016-08-11 07:26:33 +00:00
Aaron Schulz
f879dd8079 Fix increment() statsd call in JobRunner
Change-Id: I17e04db59a44a491aae99c4542216316361010a0
2016-08-06 00:54:42 +00:00
Brian Wolff
fb7b637660 Call $job->teardown() even if Job throws an exception.
teardown() callbacks are primarily used to reset session after
job is done. It seems important to do this, even if an exception is
thrown by the job.

Change-Id: I0bd449414527321b0ed9063cea268dea5b0766c4
2016-05-16 01:44:48 -04:00
Erik Bernhardson
afc3b5a120 Track which web request created a job
We currently push a request id into structured logging (monolog/
logstash) to allow seeing all logs that were triggered by the same
request. This extends that to pass the id through jobs so jobs triggered
by a web request also share the same id and can be tracked together.
This web request id will follow jobs both directly created by a request,
and jobs created by those jobs.
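
The propagation described above might be sketched like this in Python. The `Job` class, the `make_job` helper, and the "request_id" parameter key are all hypothetical names chosen for illustration.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Job:
    type: str
    params: Dict[str, str] = field(default_factory=dict)


def make_job(job_type: str, parent: Optional[Job] = None,
             request_id: Optional[str] = None) -> Job:
    """Create a job that carries the ID of the request that started the chain."""
    if parent is not None:
        rid = parent.params["request_id"]      # inherit from the creating job
    else:
        rid = request_id or uuid.uuid4().hex   # fresh ID if none was supplied
    return Job(job_type, {"request_id": rid})
```

Because each child copies the ID from its parent, a job several levels removed from the original request still logs under the same ID.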

This should give us some more visibility when debugging into what
started a particular job, and if a large number of jobs blowing up the
job queue are somehow related.

Change-Id: Iedbd031e6e9bb18fd6f7b923c8c305102255ab4b
2016-04-13 10:41:13 -07:00