Commit graph

48 commits

Aaron Schulz
8f829de5f0 Add action/user tracking to link refresh jobs
Change-Id: Ie7261eacddb869988b005ba2f17968df88c7003e
2017-10-23 11:06:16 -07:00
Aaron Schulz
2cb965c5a5 Set getDeduplicationInfo() for HTMLCacheUpdateJob
This allows de-duplication of single page jobs for the
same page due to edits to different templates. This is
the same logic that RefreshLinksJob already has.

Also fix a bug in that method in RefreshLinksJob.

Change-Id: I2f79031c945eb3d195f9dbda949077bbc3e67918
2017-10-19 22:38:58 -07:00
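
The RefreshLinksJob logic being copied here looks roughly like the following
sketch (parameter handling is approximate, not quoted from the patch):

```php
public function getDeduplicationInfo() {
	$info = parent::getDeduplicationInfo();
	if ( is_array( $info['params'] ) && isset( $info['params']['pages'] ) ) {
		// For single-page jobs, the job "title" is the template that changed,
		// not the page being refreshed. Drop it so jobs for the same page
		// triggered by edits to different templates count as duplicates.
		unset( $info['namespace'] );
		unset( $info['title'] );
	}
	return $info;
}
```
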
Kunal Mehta
1fd095ec1c Avoid using the deprecated ParserCache::singleton()
Change-Id: I0da6d9cbfad26c89bf5dab564071ef97acaf44f9
2017-09-09 14:20:10 -07:00
Aaron Schulz
70d1bc0091 Make workItemCount() smarter for htmlCacheUpdate/refreshLinks
Do not count jobs that merely subdivide into smaller jobs as having
any "work items". This makes $wgJobBackoffThrottling less
overzealous when used to limit these types of jobs.

The main reason to limit htmlCacheUpdate would be for
CDN purge rate limiting. For refreshLinks, it would
mostly be lag, though that is already handled for
leaf jobs and JobRunner itself.

Bug: T173710
Change-Id: Ide831b555e51e3111410929a598efb6c0afc0989
2017-08-23 10:35:34 -07:00
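
A sketch of the kind of workItemCount() override described above, assuming the
usual 'recursive' and 'pages' job parameters:

```php
public function workItemCount() {
	if ( !empty( $this->params['recursive'] ) ) {
		// This job only partitions the backlink range into smaller jobs;
		// it refreshes nothing itself, so backoff throttling should ignore it.
		return 0;
	} elseif ( isset( $this->params['pages'] ) ) {
		return count( $this->params['pages'] );
	}
	return 1; // a plain single-title job
}
```
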
Umherirrender
3f1a52805e Use short type bool/int in param documentation
Enable the phpcs sniffs for this and run phpcbf

Change-Id: Iaa36687154ddd2bf663b9dd519f5c99409d37925
2017-08-20 13:20:59 +02:00
Aaron Schulz
a549313076 Avoid lock acquisition errors for multi-title refreshlinks jobs
Bug: T173462
Change-Id: I9dab9b4e5c4cae7306dc29bad9e62287d54b2281
2017-08-17 09:36:56 -07:00
Aaron Schulz
fcc2895cad Fix bogus variable use in RefreshLinksJob::run()
Also removed two unused loop variables.

Change-Id: I9a9e0a83bdaa13c031857bc20f977161cf85baff
2017-04-20 11:29:25 -07:00
Aaron Schulz
dd359741cc Move DB errors to Rdbms namespace
Change-Id: I463bd86123501abc68fdb78b4cda6110f7af2549
2017-04-15 10:47:41 -07:00
Bartosz Dziewoński
ecdef925bb Miscellaneous indentation tweaks
I was bored. What? Don't look at me that way.

I mostly targeted mixed tabs and spaces, but others were not spared.
Note that some of the whitespace changes are inside HTML output,
extended regexps or SQL snippets.

Change-Id: Ie206cc946459f6befcfc2d520e35ad3ea3c0f1e0
2017-02-27 19:23:54 +01:00
Aaron Schulz
2d4ed16bd8 Make RefreshLinksJob handle LinksUpdateConstructed hooks doing DB writes
Bug: T153618
Change-Id: Iae52e9225fe132f2aa99e161611bf8258736d38d
2017-01-07 17:40:11 +00:00
Aaron Schulz
f8a9490f88 Reorganize RefreshLinksJob code slightly and avoid deprecated functions
Change-Id: I6ff4bec61b37bfbffc1e96eac61d692dd7feb31a
2016-09-12 21:11:11 -07:00
Aaron Schulz
6c73b32fd5 Convert JobRunner to using beginMasterChanges()
This lets the runJobs.php $wgCommandLineMode hack be removed.

Some fixes based on unit tests:
* Only call applyTransactionRoundFlags() for master connections
  for transaction rounds from beginMasterChanges().
* Also cleaned up the commitAndWaitForReplication() reset logic.
* Removed deprecated DataUpdate::doUpdate() calls from jobs
  since they cannot nest in a transaction round.

Change-Id: Ia9b91f539dc11a5c05bdac4bcd99d6615c4dc48d
2016-09-07 03:56:37 +00:00
Aaron Schulz
16266edff3 Change "slave" => "replica DB" in /includes
Change-Id: Icb716219c9335ff8fa447b1733d04b71d9712bf9
2016-09-05 21:01:01 +00:00
Aaron Schulz
21ddcf1592 Add convenience commitAndWaitForReplication() method
This also does sanity checks to avoid breaking transactions

Change-Id: I7453c245eee25a26243e606970ef5f79b21a8141
2016-08-16 22:09:17 +00:00
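
Roughly how a batch loop uses such a helper; the per-batch worker function is
hypothetical and $lbFactory is assumed to be the LBFactory service:

```php
$ticket = $lbFactory->getEmptyTransactionTicket( __METHOD__ );
foreach ( array_chunk( $pageIds, 100 ) as $batch ) {
	refreshLinksForPages( $batch ); // hypothetical per-batch worker
	// Commit the transaction round and wait for replica DBs to catch up;
	// the ticket sanity-checks that no explicit transaction is still open.
	$lbFactory->commitAndWaitForReplication( __METHOD__, $ticket );
}
```
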
Aaron Schulz
63a3911a67 Improvements to RefreshLinksJob/DeleteLinksJob locking
* Removed the lockAndGetLatest() call which caused contention problems.
  Previously, job #2 could block on job #1 in that method, then job #1
  yields the row lock to job #2 in LinksUpdate::acquirePageLock() by
  committing, then job #1 blocks on job #2 in updateLinksTimestamp().
  This caused timeout errors. It also has not been fully safe ever
  since batching and acquirePageLock() were added.
* Add an outer getScopedLockAndFlush() call to runForTitle() which
  avoids this contention (as well as contention with page edits)
  but still prevents older jobs from clobbering newer jobs. Edits
  can happen concurrently, since they will enqueue a job post-commit
  that will block on the lock.
* Use the same lock in DeleteLinksJob to avoid edit/deletion races.

Change-Id: I9e2d1eefd7cbb3d2f333c595361d070527d6f0c5
2016-07-19 13:04:21 -07:00
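
A sketch of the outer lock described above, using the named-lock helper on the
primary DB connection (the key format and timeout are assumptions):

```php
// Serialize link refreshes for this page against other jobs and edits.
// The lock is released (and pending writes flushed) when $scopedLock
// goes out of scope at the end of runForTitle().
$scopedLock = $dbw->getScopedLockAndFlush(
	"LinksUpdate:pageid:{$page->getId()}", // assumed key format
	__METHOD__,
	15 // seconds to wait for an older job or edit to finish
);
if ( !$scopedLock ) {
	// Another update holds the lock; a newer job will cover this page.
	$this->setLastError( 'LinksUpdate already in progress for this page' );
	return false;
}
```
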
Aaron Schulz
dc4cc32100 Use READ_LATEST for the WikiPage in RefreshLinksJob
Also sanity check that the revision belongs to that page.

Change-Id: I4e6897b52212d9787d74fb017861ec62f2927f0e
2016-07-10 04:14:12 -07:00
Aaron Schulz
d022475854 Remove "masterPos" stuff from RefreshLinksJob
Just do a single slave lag wait check when branching the base job.
Any remnant/leaf jobs after that do not have to do anything special.

This should also improve de-duplication and reduce commonswiki
errors like "Could not acquire lock on page #42482792" due to
insane pages.

Change-Id: I40f9c6e0e905bd8149bb364c33a0642628cb1423
2016-06-09 04:57:25 -07:00
Roan Kattouw
01b2516175 Add LinksUpdate::getRevision()
Similar to getTriggeringUser(). Also propagate it
to subjobs similarly.

Bug: T135959
Change-Id: I3d894acaf3d85b790e5034c7d9f76bf94672f445
2016-05-26 15:43:22 -07:00
Aaron Schulz
a6f75ac03c Tweak RefreshLinksJob cache logic
* Make this actually use the cache beyond edge cases
  by making the page_touched check less strict. The
  final check on the cache timestamp is good enough.
* Log metrics to statsd to give visibility.

Change-Id: I14c14846a7b68d079e1a29c6d50e354a3c1926d6
2016-05-02 22:17:45 -07:00
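
The statsd visibility mentioned here amounts to counters along these lines
(metric names are illustrative; $stats is assumed to be the buffering statsd
factory):

```php
if ( $usedParserCache ) {
	$stats->increment( 'refreshlinks.parser_cached' );   // reused a cached parse
} else {
	$stats->increment( 'refreshlinks.parser_uncached' ); // had to re-parse
}
```
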
Brad Jorsch
e96c81bab5 Quick-fail refreshLinksJob if the triggering revision isn't the latest
If we already know that the triggeringRevisionId is outdated, fail early
instead of doing all the work of re-parsing that old revision and
preparing all the updates only to fail later at the lockAndGetLatest()
call.

Change-Id: Ic70c659899d5d47e74fa17c88ed26b436732ca8a
2016-05-02 16:36:10 -04:00
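
A condensed sketch of that early-out check (parameter handling is approximate):

```php
if ( !empty( $this->params['triggeringRevisionId'] ) ) {
	$latest = $title->getLatestRevID( Title::GAID_FOR_UPDATE );
	if ( $this->params['triggeringRevisionId'] != $latest ) {
		// A newer edit has already queued its own refresh; skip the
		// expensive parse and update preparation for this stale revision.
		$this->setLastError( "Revision {$this->params['triggeringRevisionId']} is not current" );
		return false;
	}
}
```
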
Aaron Schulz
cbc9745eb6 Make refreshLinksJob explicitly check the cache rev ID
This is needed if the $useOutdated behavior of ParserCache
is modified per Ibd111bed203dd.

Bug: T133659
Change-Id: I70806dffba8af255d7cdad7663132b58479f63e3
2016-05-02 12:10:38 -07:00
Kunal Mehta
6e9b4f0e9c Convert all array() syntax to []
Per wikitech-l consensus:
 https://lists.wikimedia.org/pipermail/wikitech-l/2016-February/084821.html

Notes:
* Disabled CallTimePassByReference due to false positives (T127163)

Change-Id: I2c8ce713ce6600a0bb7bf67537c87044c7a45c4b
2016-02-17 01:33:00 -08:00
Kunal Mehta
829c4a8503 RefreshLinksJob: Restore LinksUpdate::setTriggeringUser() call
This partially reverts 22476baa85, as the setTriggeringUser()
call that was removed was being used by Echo to be able to determine
which user caused a LinksUpdate to be triggered.

Bug: T121780
Change-Id: I62732032a6b74f17b5ae6a2497fa519f9ff38d4f
2015-12-17 10:50:53 -08:00
Krinkle
a2c30ecb02 Merge "Remove obsolete category links code"
2015-12-06 18:42:39 +00:00
Aaron Schulz
22476baa85 Remove obsolete category links code
* These calls and methods should no longer be needed
* Follow-up to 6dedffc2d7

Change-Id: Iff121263610117112c84edb5e575f039456d1ac8
2015-12-05 16:07:59 -08:00
Aaron Schulz
25a39d255c Make RefreshLinksJob de-duplication more robust
* Do not de-duplicate jobs with "masterPos". It either does not
  catch anything or is not correct. Previously, it was the latter,
  because getDeduplicationInfo() ignored the position. That made the
  oldest DB position win among "duplicate" jobs, which is unsafe.
* From graphite, deduplication only applies .5-2% of the time for
  "refreshLinks", so there should not be much more duplicated
  effort. Dynamic and Prioritized refreshLinks jobs remain
  de-duplicated on push() and root job de-duplication still applies
  as it did before. Also, getLinksTimestamp() is still checked to
  avoid excess work.
* Document the class constants.

Change-Id: Ie9a10aa58f14fa76917501065dfe65083afb985c
2015-12-04 12:40:10 -08:00
Aaron Schulz
9b386d2436 Race condition fixes for refreshLinks jobs
* Use READ_LATEST when needed to distinguish slave lag
  affecting new pages from page deletions that happened
  after the job was pushed. Run-of-the-mill mass backlink
  updates still typically use "masterPos" and READ_NORMAL.
* Search for the expected revision (via READ_LATEST)
  for jobs triggered by direct page edits. This avoids lag
  problems for edits to existing pages.
* Added a CAS-style check to avoid letting jobs clobber
  the work of other jobs that saw a newer page version.
* Rename and expose WikiPage::lock() method.
* Split out position wait logic to a separate protected
  method and made sure it only got called once instead of
  per-title (which didn't do anything). Note that there is
  normally 1 title per job in any case.
* Add a FIXME about a related race condition.

Bug: T117332
Change-Id: Ib3fa0fc77040646b9a4e5e4b3dc9ae3c51ac29b3
2015-11-16 13:21:05 -08:00
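
The CAS-style guard boils down to a conditional update on page_links_updated,
roughly (an illustration of the idea, not the literal patch):

```php
// Only bump page_links_updated if no concurrent job has already recorded
// a newer parse; otherwise leave the newer job's work in place.
$dbw->update(
	'page',
	[ 'page_links_updated' => $dbw->timestamp( $parseStartTime ) ],
	[
		'page_id' => $pageId,
		'page_links_updated < ' . $dbw->addQuotes( $dbw->timestamp( $parseStartTime ) ),
	],
	__METHOD__
);
```
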
Aaron Schulz
d570d5102f Generalize the LinkCache clear() call to JobRunner
The use for this logic is not specific to RefreshLinksJob

Change-Id: I4bb911ab5882d1795e12163df8ae6b227c58bc8a
2015-11-14 05:22:47 -08:00
Aaron Schulz
180ce81139 Break long lines and cleanup some RefreshLinksJob checks
Change-Id: I02c007a2c2032610551d71ce1b21e03db5c011db
2015-11-06 20:18:06 -08:00
addshore
6a65ce223f Add triggeringRevisionId to LinksUpdate JobSpec
Bug: T117860
Change-Id: I8c730a434b8bdda7664fd1e3bb3fbc8840804950
2015-11-05 12:07:01 +00:00
Kunal Mehta
c52e5a21f6 LinksUpdate: Keep track of the triggering User
So extensions like Echo are able to attribute post-edit link updates to
the specific users who triggered them.

Bug: T116485
Change-Id: I083736a174b6bc15e3ce60b2b107c697d0ac13da
2015-10-27 17:10:19 -07:00
jenkins-bot
cc167acbc6 Merge "Fixes related to WikiPage::triggerOpportunisticLinksUpdate()"
2015-10-27 10:07:48 +00:00
Aaron Schulz
e6aabda9b6 Remove paranoid title check from RefreshLinksJob::runForTitle
Change-Id: Ie2b875dcb394e9cf20818a26d245684933765baf
2015-10-23 21:14:11 -07:00
Aaron Schulz
d705ae970a Fixes related to WikiPage::triggerOpportunisticLinksUpdate()
* Focus on updating links that would *not* already be updated
  by jobs, not those that already *will* be updated.
* Place the jobs into a dedicated queue so they don't wait
  behind jobs that actually have to parse every time. This
  helps avoid queue buildup.
* Make Job::factory() set the command field to match the value
  it had when enqueued. This makes it easier to have the same
  job class used for multiple queues.
* Given the above, remove the RefreshLinksJob 'prioritize' flag.
  This worked by overriding getType() so that the job went to a
  different queue. This required both the special type *and* the
  flag to be set when using JobSpecification; otherwise either ack()
  would route to the wrong queue and fail, or the job would go into
  the regular queue. This was too messy and error-prone. Cirrus jobs
  using the same pattern also had ack() failures, for example.

Change-Id: I5941cb62cdafde203fdee7e106894322ba87b48a
2015-10-24 00:10:12 +00:00
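
Opportunistic updates end up being enqueued along these lines (the queue name
and the extra parameter are illustrative):

```php
JobQueueGroup::singleton()->lazyPush(
	new JobSpecification(
		'refreshLinksDynamic',          // dedicated queue, not plain 'refreshLinks'
		[ 'isOpportunistic' => true ],  // illustrative parameter
		[ 'removeDuplicates' => true ],
		$title
	)
);
```
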
Brian Wolff
4d3fb38bff Properly make LinksUpdate be recursive when done from job queue
New enqueue method of DeferredUpdates was turning LinksUpdate
updates into Jobs. However, RefreshLinksJob was not properly
reconstructing the secondary updates as recursive (when they
were recursive). As a result, pages using a changed template
were not being updated.

See also related T116001.

Change-Id: Ia06246efb2034fdfe07232fd8c2334160edbcf02
2015-10-22 12:56:03 -06:00
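
The fix comes down to preserving the recursion flag when the job rebuilds the
update, roughly:

```php
// Re-create the LinksUpdate from the job parameters, keeping it recursive
// when the original deferred update was recursive, so pages transcluding
// the changed template are re-queued as well.
$update = new LinksUpdate( $title, $parserOutput, !empty( $this->params['recursive'] ) );
```
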
addshore
d40cd42b9f Enable users to watch category membership changes #2
This is part of a chain that reverts:
e412ff5ecc.

NOTE:
- The feature is disabled by default
- User settings default to hiding changes
- T109707 Touching a file on wikisource adds and
      removes it from a category... Even when page
      has no changes.... WTF? See linked issue,
      marked as stalled with a possible way forward
      for this patch.
      @see https://gerrit.wikimedia.org/r/#/c/235467/

Changes since version 1:
- T109604 - Page names in comment are no longer
      url encoded / have _'s
- T109638 & T110338 - Reserved username now used
      when we can't determine a username for the change
      (we could perhaps set the user and id to be blank
      in the RC table, but who knows what this might do)
- T109688 - History links are now disabled in RC....
      (could be fine for the introduction and worked
      on more in the future)
- Categorization changes are now always patrolled
- Touching on T109672: with this change, emails will never
      be sent regarding categorization changes (this
      can of course be changed in a follow-up)
- Added $wgRCWatchCategoryMembership defaulting to true
      for enabling / disabling the feature
- T109700 - for cases when no revision was retrieved
      for a category change, set the bot flag to true.
      This means all changes caused by parser functions
      & Lua will be marked as bot, as will changes that
      can't find their revision due to slave lag.

Bug: T9148
Bug: T109604
Bug: T109638
Bug: T109688
Bug: T109700
Bug: T110338
Bug: T110340
Change-Id: I51c2c1254de862f24a26ef9dbbf027c6c83e9063
2015-10-20 14:23:48 -07:00
Vivek Ghaisas
c54766586a Fix issues identified by SpaceBeforeSingleLineComment sniff
Change-Id: I048ccb1fa260e4b7152ca5f09b053defdd72d8f9
2015-09-26 23:06:52 +00:00
Aaron Schulz
c3d9666051 jobqueue: A few small code cleanups to RefreshLinksJob
Change-Id: Ia331e9dbf9d2be137c34a8c93ef2d6da8aad6c56
2015-09-22 01:27:36 +00:00
CSteipp
e412ff5ecc Revert "Enable users to watch category membership changes"
This reverts commit f6879ea16e.

Bug: T109638
Change-Id: I770d8d33a4cff3829bdea9a4df24de209cbe691b
2015-08-20 10:35:56 -07:00
Kai_WMDE
f6879ea16e Enable users to watch category membership changes
Bug: T9148
Change-Id: I5a89d8f19804b1120f4c755d834e2da6ca12ceae
2015-08-13 17:58:06 +02:00
Ori Livneh
b0a79e9245 Rename WikiPage::isParserCacheUsed to WikiPage::shouldCheckParserCache
'isParserCacheUsed' implies that the parser cache usage has already occurred,
and obscures the true purpose of this method, which is to determine whether or
not the requested page *should* be looked up in the parser cache.

Only usage in extensions is in TextExtracts, which I changed to be both
backward- and forward-compatible in If5d5da8eab13.

Change-Id: I7de67937f0e57b1dffb466319192e4d400b867de
2015-06-22 20:55:34 -07:00
Ori Livneh
c099155a17 ellapsed => elapsed
Also fix some files that don't end with a newline.

Change-Id: Id0672d685b929a5832b42f733dad49683536180a
2015-06-23 03:32:33 +00:00
Aaron Schulz
9632223e4c Fixed Job constructor IDE notices about variable types
Change-Id: I4b4e4e38e8d416c3445c52ced311f5fbfcde868a
2015-05-30 08:09:30 +00:00
Aaron Schulz
187fd64723 Made triggerOpportunisticLinksUpdate() jobs make use of parser cache
* On Wikipedia, for example, these jobs are a good percentage of
  all refreshLinks jobs; skipping the parse step should avoid
  runner CPU overhead
* Also fixed bad TS_MW/TS_UNIX comparison
* Moved the fudge factor to a constant and raised it a bit

Bug: T98621
Change-Id: Id6d64972739df4b26847e4374f30ddcc7f93b54a
2015-05-11 22:26:44 +02:00
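
The timestamp bug referenced above is the classic mixed-format comparison;
normalizing both sides fixes it (the fudge-factor constant name is assumed):

```php
// A TS_MW string like "20150511222644" must not be compared with a UNIX
// integer; convert both to TS_UNIX before doing arithmetic on them.
$cacheTime = wfTimestamp( TS_UNIX, $parserOutput->getCacheTime() );
$touched = wfTimestamp( TS_UNIX, $page->getTouched() );
// Reuse the cached output only if it clearly postdates the last page touch.
$canUseCache = ( $cacheTime >= $touched + self::CLOCK_FUDGE );
```
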
Aaron Schulz
df5ef8b5d7 Removed doCascadeProtectionUpdates method to avoid DB writes on page views
* Use special prioritized refreshLinksJobs instead, which triggers when
  transcluded pages are changed
* Also added a triggerOpportunisticLinksUpdate() method to handle
  dynamic transcludes

Bug: T89389
Change-Id: Iea952d4d2e660b7957eafb5f73fc87fab347dbe7
2015-02-22 13:36:13 -08:00
Ricordisamoa
12dec5d85d Fix some stuttering in comments and documentation
Change-Id: I9c0088b9aab37335203cad45a1d6fa8ac3f43321
2014-12-17 19:44:10 +00:00
Brad Jorsch
78aad9802d Include parsed revision ID in parser cache
One theory for what's behind bug 46014 is that the vandal submits the
edit, then someone (maybe the vandal) gets into the branch of
Article::view that uses PoolWorkArticleView, then ClueBot comes along
and reverts before the PoolWorkArticleView actually executes. Once that
PoolWorkArticleView actually does execute, it overwrites the parser
cache entry from ClueBot's revert with the one from the old edit.

To detect this sort of thing, let's include the revision id in the
parser cache entry and consider it expired if that doesn't match. Which
makes sense to do anyway.

And for good measure, let's have PoolWorkArticleView not save to the
parser cache if !$isCurrent.

Bug: 46014
Change-Id: Ifcc4d2f67f3b77f990eb2fa45417a25bd6c7b790
2014-04-01 12:15:34 -04:00
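
Conceptually the staleness check looks like this; the accessor names follow the
revision-ID metadata this change describes, but treat the snippet as a sketch:

```php
$parserOutput = ParserCache::singleton()->getDirty( $page, $parserOptions );
if ( $parserOutput
	&& $parserOutput->getCacheRevisionId() !== null
	&& $parserOutput->getCacheRevisionId() != $page->getLatest()
) {
	// The entry was parsed from a different revision (e.g. a revert landed
	// after this parse started); treat it as expired and re-parse.
	$parserOutput = false;
}
```
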
Aaron Schulz
9ffd4f085d Renamed /job to /jobqueue
Change-Id: I4c8a2b42140630838867c77a70d45ba14b5d95e2
2014-03-14 13:42:04 -07:00
Renamed from includes/job/jobs/RefreshLinksJob.php