Providing the exactly correct number of pages
that have actually been added to or removed from a
category is extremely hard.
Rather than providing data that is most likely wrong,
and thus of little use, only provide a link to Special:WhatLinksHere.
This touches on the following bugs and may mean that
some of them no longer need further consideration.
Bug: T126855
Bug: T126407
Bug: T126139
Change-Id: Ida06d822d1955091595c17c9c6c2968a40a93bcd
We currently push a request id into structured logging (monolog/
logstash) to allow seeing all logs that were triggered by the same
request. This extends that to pass the id through jobs so jobs triggered
by a web request also share the same id and can be tracked together.
This web request id will follow jobs both directly created by a request,
and jobs created by those jobs.
This should give us more visibility, when debugging, into what
started a particular job, and into whether a large number of jobs
blowing up the job queue are somehow related.
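A minimal sketch of the push-time half of this idea (illustrative only,
not the exact implementation; it assumes WebRequest::getRequestId() and
JobSpecification are available, and the 'requestId' parameter name is
made up):

    <?php
    // Copy the current web request's id into the job parameters at push
    // time. A job that spawns further jobs passes the same value along,
    // so the id follows the whole chain.
    function pushJobWithRequestId( Title $title, array $params ) {
        $params['requestId'] = WebRequest::getRequestId();
        JobQueueGroup::singleton()->push(
            new JobSpecification( 'refreshLinks', $params, [], $title )
        );
    }
    // When the job runs, the runner can copy $job->params['requestId']
    // back into the monolog/logstash context so the job's log entries
    // carry the originating request's id.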
Change-Id: Iedbd031e6e9bb18fd6f7b923c8c305102255ab4b
This reduces the runtime of database-bound tests by about 40%
(on my system, from 4:55 to 2:47; results from Jenkins are
inconclusive).
The basic idea is to call addCoreDBData() only once, and have
an addDBDataOnce() that is called once per test class, not for
every test method like addDBData() is. Most tests could be
trivially changed to implement addDBDataOnce() instead of
addDBData(). The ones for which this did not work immediately
were left out for now. A closer look at the tests that still
implement addDBData() may reveal additional potential for
improvement.
TODO: Once this is merged, try to change addDBData() to
addDBDataOnce() where possible in extensions.
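For illustration, a sketch of the intended pattern (class name and
fixture data are made up):

    <?php
    class ExampleDatabaseTest extends MediaWikiTestCase {
        protected $tablesUsed = [ 'page_props' ];

        public function addDBDataOnce() {
            // Runs once per test class; the rows stay in place for every
            // test method, unlike addDBData(), which runs per method.
            $this->db->insert(
                'page_props',
                [ 'pp_page' => 1, 'pp_propname' => 'demo', 'pp_value' => 'x' ],
                __METHOD__
            );
        }

        public function testPropIsReadable() {
            $value = $this->db->selectField(
                'page_props', 'pp_value', [ 'pp_propname' => 'demo' ], __METHOD__
            );
            $this->assertSame( 'x', $value );
        }
    }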
Change-Id: Iec4ed4c8419fb4ad87e6710de808863ede9998b7
I created a basic test yesterday to cover two bugs. Now the test covers
all public methods. I was also able to get rid of the test double.
Change-Id: I53110280e3ef7b7a72d175b11b7fc4ccf1d648b3
* Track queues with non-abandoned jobs per partition server.
The s-queuesWithJobs key can easily be queried to see which
queues need to have periodic tasks run (or for debugging).
* This is a requirement for the redis jobchron service to be able to
avoid hitting N = (no. of types X no. of wikis) queues for periodic tasks
when only a tiny fraction of those actually have any jobs. For WMF,
there are over 30K queues, most of them empty, so doing that can help
lower redis-server CPU (or at least make jobchron more responsive);
see the sketch after this list.
* This also allows for jobchron to manage the aggregator by taking the
per-server aggregator sets and merging them. This scales much better
as there are only a modest number of these daemons (18 for WMF) but
vastly more web threads pushing jobs. This cuts down on the connections
to the active aggregator server (the one with the hash table).
* Use Lua unpack() more for stylistic consistency.
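A rough sketch of how a chron process could use the new set (the key
name and member encoding below are illustrative, not the queue's real
key scheme):

    <?php
    $redis = new Redis();
    $redis->connect( '127.0.0.1', 6379 );
    // Only the queues listed in the per-server set need periodic tasks,
    // instead of probing all (job type x wiki) combinations.
    foreach ( $redis->sMembers( 'jobqueue:s-queuesWithJobs' ) as $member ) {
        list( $type, $wiki ) = explode( '/', $member, 2 );
        // ... run recycling/undelaying for just this one queue ...
    }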
Change-Id: I1549f0edc78cc4004dd887b475dec4c0ebd306c6
If we really need this we can do it in MediaWikiTestCase, next
to the setting of wgMainCacheType. But from what I can see the
code being tested here already doesn't use the old $wgMemc.
Change-Id: I9e4b2109b2f3c18d8d5551bbadae5711c1d4c0a6
* Remove some getAcquiredCount() assertions when claimTTL=0
as this is not well defined enough (queues may take a few
minutes to garbage collect the failed jobs).
* Add some tests to make sure push() only de-duplicates
among unclaimed jobs (see the sketch below).
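The behaviour being pinned down, roughly (this assumes $queue is a
JobQueue instance configured for the 'null' job type):

    <?php
    $job = new NullJob( Title::newMainPage(), [ 'removeDuplicates' => 1 ] );
    $queue->push( $job );
    $queue->push( $job );     // duplicate of an unclaimed job: should no-op
    // $queue->getSize() is still 1 here

    $claimed = $queue->pop(); // the job is now claimed, not unclaimed
    $queue->push( $job );     // no longer a duplicate: must be enqueued
    // $queue->getSize() is 1 again (one unclaimed job, one claimed)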
Change-Id: Ie0a5e539095c245dfcc8c160417e12824eb7ab83
I noticed JobQueueTest::testRootDeduplication takes ~6.5 seconds, which
is due to the test method using sleep(1) and being fed by the data
provider provider_queueLists, which yields six items.
The sleep is there so that the array returned by Job::newRootJobParams()
gets an increased value for 'rootJobTimestamp'. Instead, just copy the
previous array of parameters, increment the UNIX timestamp, and convert
it back to TS_MW format.
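Roughly, the replacement looks like this (the root job key is just an
example):

    <?php
    $params = Job::newRootJobParams( "nulljobspam:example" );
    // Copy the previous parameters and bump the timestamp by one second,
    // instead of sleep(1)ing so that the clock moves on its own.
    $newParams = $params;
    $newParams['rootJobTimestamp'] = wfTimestamp(
        TS_MW,
        wfTimestamp( TS_UNIX, $params['rootJobTimestamp'] ) + 1
    );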
Change-Id: I75066df73f9f92e56b89eb6d928c41e949a2d6a9
- Place commas correctly
- Move comments
- Add space after if/foreach/catch
- Reformat some conditions
- Remove trailing spaces/tabs
Change-Id: I40ccda72c418c4a33fcd675773cb08d971510cdb
* This changes refreshLinks to handle both per-title (leaf) and backlink jobs.
The base job now splits into some leaf jobs and a remaining partition job.
The partition job does the same until there are only a small number
of backlinks in the remaining range (so only leaf jobs are added);
see the sketch after this list.
Since the leaf jobs are pushed first, this works well for FIFO queues
to avoid bloating the queue. This also improves per-title job
de-duplication, which isQueueDeprioritized() pretty much killed.
* The refreshLinks2 class is no longer used for new jobs.
* Fix process cache bug with JobQueueGroup::push with empty arrays.
* This adds a BacklinksJobUtils class with helper functions for partitioning.
* RefreshLinksJob jobs now have a simple version parameter.
* Also moved refreshLinks2Job to its own file.
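An illustrative-only sketch of the leaf/partition split (not the actual
BacklinksJobUtils API; the function, parameter names, and job parameters
are made up):

    <?php
    function partitionBacklinkJobs( array $backlinkPageIds, $leafBatch, Title $title ) {
        $jobs = [];
        // Leaf (per-title) jobs go first so FIFO queues drain them before
        // the next split happens.
        foreach ( array_slice( $backlinkPageIds, 0, $leafBatch ) as $pageId ) {
            $jobs[] = new JobSpecification(
                'refreshLinks', [ 'pages' => [ $pageId ] ], [], $title
            );
        }
        $remaining = array_slice( $backlinkPageIds, $leafBatch );
        if ( $remaining ) {
            // One remainder job covers the rest of the range; when it runs
            // it does the same split, until only leaf jobs are left.
            $jobs[] = new JobSpecification(
                'refreshLinks',
                [ 'range' => [ reset( $remaining ), end( $remaining ) ] ],
                [],
                $title
            );
        }
        return $jobs;
    }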
Change-Id: Id378d47df17248ae02938d5a54ef7ecd29efadbd
Also clean up some previous inconsistencies pointed out by Krinkle in change IDs:
* Ide20743a2e84ff68549286120e6cff9d9f396f54
* I811ca957b6588085d67606ebc0cd4033a1e53839
Change-Id: Ife33b931870d0d7e04fcb40974997436d27f528f
Change some tests to use setMwGlobals so that globals are restored after
the test.
This also removes some save/restore code, which is not needed due to
the automatic restoring on tearDown that setMwGlobals provides.
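The pattern, sketched (the global picked here is just an example):

    <?php
    class ExampleGlobalDependentTest extends MediaWikiTestCase {
        public function testUsesOverriddenGlobal() {
            // setMwGlobals() records the old value and MediaWikiTestCase
            // restores it automatically in tearDown(), so no manual
            // save/restore code is needed.
            $this->setMwGlobals( 'wgCapitalLinks', false );
            // ... exercise code that reads $wgCapitalLinks ...
            $this->assertFalse( $GLOBALS['wgCapitalLinks'] );
        }
    }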
Change-Id: I8d2ac9f6cc14f0bd4ee8eb851c09f2e71babc6e0
* Cleaned up some data structures into hashes, which get better
compression and play well with the KEYS parameter in Lua scripts.
The claim list is now a sorted set with O(logN) removal in ack()
and O(log(N)+M) searching in recycleAndDeleteStaleJobs().
* Made the class itself control object serialization, so that Lua
scripts have an easy time. Only the job data itself needs to be
serialized, whereas other things would just get bloated.
* Used Lua scripts to get push(), pop() and ack() down to 1 RTT
(see the sketch after this list).
* Likewise rewrote recycleAndDeleteStaleJobs() to use a script.
* Fixed bug where claimed duplicate jobs removed the data on ack(),
which meant that claimed duplicated jobs could no-op newer ones.
De-duplication should only apply to unclaimed jobs like for the
JobQueueDB class, so that unfinished jobs don't no-op new ones.
* Removed locking in recycleAndDeleteStaleJobs(), which would not do
much since the exclusive set request would serialize on the Lua
script anyway. The Lua script will finish quickly on subsequent runs
if invoked more than once in a row, due to the sorted set usage.
Also made recycleAndDeleteStaleJobs() run randomly to reduce the
chance of a single caller tying up the server.
* Removed useless hDel() call in getJobFromUidInternal().
* Changed unit tests to handle the different supported orders better.
Added tests for the 'timestamp' ordering.
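For illustration, the general shape of the "1 RTT" idea for pop(),
sketched with phpredis (the key names are illustrative, not the class's
real key layout):

    <?php
    // One EVAL: move a job id from the unclaimed list to the claimed
    // sorted set (scored by claim time) and return its serialized data,
    // all in a single round trip to the server.
    $popScript =
        "local id = redis.call('rpop', KEYS[1]) " .
        "if not id then return false end " .
        "redis.call('zadd', KEYS[2], ARGV[1], id) " .
        "return redis.call('hget', KEYS[3], id)";

    $redis = new Redis();
    $redis->connect( '127.0.0.1', 6379 );
    $blob = $redis->eval(
        $popScript,
        [ 'queue:l-unclaimed', 'queue:z-claimed', 'queue:h-data', time() ],
        3 // the first three array entries are KEYS, the rest are ARGV
    );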
Change-Id: Ib2d7aff18753195248ab856afd4a46e18b301db9
* Cleaned up 'server' option to not fragment the pool.
Also made it actually match the documentation.
* Made it use doGetPeriodicTasks() for job recycling.
* Made it so that other job queue classes can be tested.
* Renamed "redisConf" => "redisConfig".
* Tweaked comments about the "random" order option.
Change-Id: I7823d90010e6bc9d581435c3be92830c5ba68480