This allows de-duplication of single-page jobs queued for the
same page by edits to different templates. This is
the same logic that RefreshLinksJob already has.
Also fix a bug in that method in RefreshLinksJob.
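
For illustration only, a minimal sketch of the de-duplication idea
(class and field names are hypothetical, not the actual
HTMLCacheUpdateJob/RefreshLinksJob code): the signature must exclude
trigger-specific parameters so that two single-page jobs for the same
title collapse into one.

    <?php
    class SinglePageJobSketch {
        private $title;
        private $params;
        public function __construct( $title, array $params ) {
            $this->title = $title;
            $this->params = $params;
        }
        public function getDeduplicationInfo() {
            $info = [ 'type' => 'htmlCacheUpdate', 'title' => $this->title, 'params' => $this->params ];
            // Trigger-specific fields (e.g. which template edit queued the job,
            // or when) would make otherwise-identical jobs look distinct.
            unset( $info['params']['rootJobTimestamp'], $info['params']['rootJobSignature'] );
            return $info;
        }
    }
    $a = new SinglePageJobSketch( 'Main_Page', [ 'rootJobTimestamp' => '20170101000000' ] );
    $b = new SinglePageJobSketch( 'Main_Page', [ 'rootJobTimestamp' => '20170202000000' ] );
    var_dump( $a->getDeduplicationInfo() === $b->getDeduplicationInfo() ); // bool(true)
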
Change-Id: I2f79031c945eb3d195f9dbda949077bbc3e67918
Do not count jobs that merely subdivide into smaller jobs as
having any "work items". This makes $wgJobBackoffThrottling less
overzealous when used to limit this type of job.
The main reason to throttle htmlCacheUpdate would be CDN purge
rate limiting. For refreshLinks, it would mostly be replication
lag, though that is already handled for leaf jobs and by
JobRunner itself.
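
A rough sketch of the intent, using hypothetical names rather than
the real Job API: a job that only subdivides reports zero work items,
so a backoff throttle that counts work items ignores it.

    <?php
    class JobSketch {
        private $params;
        public function __construct( array $params ) {
            $this->params = $params;
        }
        public function workItemCount() {
            if ( !empty( $this->params['recursive'] ) ) {
                return 0; // this job only enqueues leaf jobs; no real "work items"
            }
            return isset( $this->params['pages'] ) ? count( $this->params['pages'] ) : 1;
        }
    }
    $division = new JobSketch( [ 'recursive' => true ] ); // just subdivides
    $leaf = new JobSketch( [ 'pages' => [ 1, 2, 3 ] ] );  // does the actual purges/parses
    var_dump( $division->workItemCount(), $leaf->workItemCount() ); // int(0), int(3)
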
Bug: T173710
Change-Id: Ide831b555e51e3111410929a598efb6c0afc0989
I was bored. What? Don't look at me that way.
I mostly targeted mixed tabs and spaces, but others were not spared.
Note that some of the whitespace changes are inside HTML output,
extended regexps or SQL snippets.
Change-Id: Ie206cc946459f6befcfc2d520e35ad3ea3c0f1e0
This lets the runJobs.php $wgCommandLineMode hack be removed.
Some fixes based on unit tests:
* Only call applyTransactionRoundFlags() on master connections
for transaction rounds started from beginMasterChanges()
(see the sketch below).
* Also cleaned up the commitAndWaitForReplication() reset logic.
* Removed deprecated DataUpdate::doUpdate() calls from jobs
since they cannot nest in a transaction round.
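
Purely as an illustration of the first point above (all names are
hypothetical, not the actual LBFactory/Database code): round flags are
applied only to master connections when the round begins, while
replica connections stay in autocommit mode.

    <?php
    class ConnectionSketch {
        public $isMaster;
        public $inTransactionRound = false;
        public function __construct( $isMaster ) {
            $this->isMaster = $isMaster;
        }
    }
    function beginMasterChangesSketch( array $connections ) {
        foreach ( $connections as $conn ) {
            if ( $conn->isMaster ) {
                $conn->inTransactionRound = true; // wrap writes in the round
            }
            // Replica connections are left alone.
        }
    }
    $conns = [ new ConnectionSketch( true ), new ConnectionSketch( false ) ];
    beginMasterChangesSketch( $conns );
    var_dump( $conns[0]->inTransactionRound, $conns[1]->inTransactionRound ); // true, false
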
Change-Id: Ia9b91f539dc11a5c05bdac4bcd99d6615c4dc48d
* Removed the lockAndGetLatest() call, which caused contention
problems. Previously, job #2 could block on job #1 in that method,
then job #1 could yield the row lock to job #2 in
LinksUpdate::acquirePageLock() by committing, and then job #1 could
block on job #2 in updateLinksTimestamp(). This caused timeout
errors. It also has not been fully safe ever since batching and
acquirePageLock() were added.
* Add an outer getScopedLockAndFlush() call to runForTitle() which
avoids this contention (as well as contention with page edits)
but still prevents older jobs from clobbering newer jobs. Edits
can happen concurrently, since they will enqueue a job post-commit
that will block on the lock (see the sketch below).
* Use the same lock in DeleteLinksJob to avoid edit/deletion races.
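
A minimal sketch of the locking order described above, with invented
helper names (the real code uses getScopedLockAndFlush() and a
DB-backed lock, not an in-process array): one outer lock per page is
taken for the whole job run, so two jobs can no longer block on each
other's row locks mid-update.

    <?php
    function runForTitleSketch( array &$locks, $pageKey, callable $doLinksUpdate ) {
        $key = "LinksUpdate:$pageKey";
        if ( isset( $locks[$key] ) ) {
            return false; // another job (or a just-queued edit job) owns the page; retry later
        }
        $locks[$key] = true;
        try {
            $doLinksUpdate(); // parse + write links while holding the outer lock
        } finally {
            unset( $locks[$key] ); // released when the scope ends
        }
        return true;
    }
    $locks = [];
    var_dump( runForTitleSketch( $locks, 'Page_42', function () {
        // links tables updated here
    } ) ); // bool(true)
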
Change-Id: I9e2d1eefd7cbb3d2f333c595361d070527d6f0c5
Just do a single slave lag wait check when branching the base job.
Any remnant/leaf jobs after that do not have to do anything special.
This should also improve de-duplication and reduce commonswiki
errors like "Could not acquire lock on page #42482792" caused by
pages with huge numbers of backlinks.
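
A small sketch of the branching, with invented names: the base job
waits for replication once and then fans out into leaf jobs, which
need no further waiting of their own.

    <?php
    function branchBaseJobSketch( array $backlinkPageIds, $batchSize, callable $waitForReplicas ) {
        $waitForReplicas(); // the single slave lag wait, done once up front
        $leafJobs = [];
        foreach ( array_chunk( $backlinkPageIds, $batchSize ) as $chunk ) {
            $leafJobs[] = [ 'type' => 'refreshLinks', 'pages' => $chunk ];
        }
        return $leafJobs; // leaf jobs just run; nothing special left to do
    }
    $jobs = branchBaseJobSketch( range( 1, 10 ), 3, function () {
        // e.g. wait until replica lag drops below the configured threshold
    } );
    var_dump( count( $jobs ) ); // int(4)
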
Change-Id: I40f9c6e0e905bd8149bb364c33a0642628cb1423
* Make this actually use the cache beyond edge cases
by making the page_touched check less strict. The
final check on the cache timestamp is good enough (sketch below).
* Log metrics to statsd to give visibility.
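
A sketch of the relaxed check, using hypothetical names (the real
logic lives in RefreshLinksJob and differs in detail): the cached
output is reused whenever its cache time already covers the event
that triggered the job, rather than requiring an exact page_touched
match.

    <?php
    function canReuseCachedOutputSketch( $cacheTimeUnix, $triggerTimeUnix, $skewFudge = 5 ) {
        if ( $cacheTimeUnix === null ) {
            return false; // nothing cached; a parse is unavoidable
        }
        // Good enough: the cached rendering is at least as new as the change
        // that caused this job (with a little allowance for clock skew).
        return ( $cacheTimeUnix + $skewFudge ) >= $triggerTimeUnix;
    }
    var_dump( canReuseCachedOutputSketch( time(), time() - 60 ) );   // bool(true): reuse
    var_dump( canReuseCachedOutputSketch( time() - 3600, time() ) ); // bool(false): reparse
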
Change-Id: I14c14846a7b68d079e1a29c6d50e354a3c1926d6
If we already know that the triggeringRevisionId is outdated, fail early
instead of doing all the work of re-parsing that old revision and
preparing all the updates only to fail later at the lockAndGetLatest()
call.
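
A minimal sketch of the early exit, with invented names (not the
actual RefreshLinksJob logic): if the triggering revision is known
not to be the latest one, skip the parse entirely.

    <?php
    function shouldRunForRevisionSketch( $triggeringRevId, $latestRevId ) {
        if ( $triggeringRevId !== null && $triggeringRevId !== $latestRevId ) {
            return false; // a newer edit exists; its own job will refresh the links
        }
        return true;
    }
    var_dump( shouldRunForRevisionSketch( 100, 101 ) ); // bool(false): fail early, no re-parse
    var_dump( shouldRunForRevisionSketch( 101, 101 ) ); // bool(true): proceed
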
Change-Id: Ic70c659899d5d47e74fa17c88ed26b436732ca8a
This is needed if the $useOutdated behavior of ParserCache
is modified per Ibd111bed203dd.
Bug: T133659
Change-Id: I70806dffba8af255d7cdad7663132b58479f63e3
This partially reverts 22476baa85, as the setTriggeringUser()
call that was removed was being used by Echo to determine
which user caused a LinksUpdate to be triggered.
Bug: T121780
Change-Id: I62732032a6b74f17b5ae6a2497fa519f9ff38d4f
* Do not de-duplicate jobs with "masterPos" (sketch below). It
either does not catch anything or is not correct. Previously, it
was the latter, because getDuplicationInfo() ignored the position.
That made the oldest DB position win among "duplicate" jobs,
which is unsafe.
* From graphite, deduplication only applies 0.5-2% of the time for
"refreshLinks", so there should not be much more duplicated
effort. Dynamic and Prioritized refreshLinks jobs remain
de-duplicated on push() and root job de-duplication still applies
as it did before. Also, getLinksTimestamp() is still checked to
avoid excess work.
* Document the class constants.
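
A sketch of the first point, with hypothetical names (the real
Job/getDuplicationInfo() code differs): jobs carrying a master
position simply opt out of de-duplication instead of hiding the
position from the signature.

    <?php
    function getDeduplicationInfoSketch( array $params ) {
        if ( isset( $params['masterPos'] ) ) {
            return null; // not de-duplicable: dropping the position would be unsafe
        }
        ksort( $params ); // stable signature regardless of parameter order
        return [ 'type' => 'refreshLinks', 'params' => $params ];
    }
    var_dump( getDeduplicationInfoSketch( [ 'masterPos' => 'db1-bin.000123/456' ] ) ); // NULL
    var_dump( getDeduplicationInfoSketch( [ 'pages' => [ 42 ] ] ) ); // array with a stable signature
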
Change-Id: Ie9a10aa58f14fa76917501065dfe65083afb985c
* Use READ_LATEST when needed to distinguish slave lag
affecting new pages from page deletions that happened
after the job was pushed. Run-of-the-mill mass backlink
updates still typically use "masterPos" and READ_NORMAL.
* Search for the expected revision (via READ_LATEST)
for jobs triggered by direct page edits. This avoids lag
problems for edits to existing pages.
* Added a CAS-style check to avoid letting jobs clobber
the work of other jobs that saw a newer page version
(see the sketch below).
* Rename and expose WikiPage::lock() method.
* Split out position wait logic to a separate protected
method and made sure it only got called once instead of
per-title (which didn't do anything). Note that there is
normally 1 title per job in any case.
* Add a FIXME about a related race condition.
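
A sketch of the CAS-style check mentioned above, using invented names
and TS_MW-style strings (the real check compares against the page's
links-updated timestamp, cf. getLinksTimestamp()): a job only records
its results if no other job has already recorded results based on a
newer version of the page.

    <?php
    function tryFinishUpdateSketch( $parsedAtTS, $currentLinksTS, callable $writeLinks ) {
        // Compare-and-set: if someone already stored results for a newer
        // version, this (now stale) job must not clobber them.
        if ( $currentLinksTS !== null && $currentLinksTS >= $parsedAtTS ) {
            return false;
        }
        $writeLinks( $parsedAtTS );
        return true;
    }
    $linksTS = '20151101000000'; // when links were last updated (TS_MW format)
    var_dump( tryFinishUpdateSketch( '20151031120000', $linksTS, function () {} ) ); // bool(false): stale
    var_dump( tryFinishUpdateSketch( '20151102120000', $linksTS, function () {} ) ); // bool(true): newer wins
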
Bug: T117332
Change-Id: Ib3fa0fc77040646b9a4e5e4b3dc9ae3c51ac29b3
So extensions like Echo are able to attribute post-edit link updates
to the specific users who triggered them.
Bug: T116485
Change-Id: I083736a174b6bc15e3ce60b2b107c697d0ac13da
* Focus on updating links that would *not* otherwise be updated
by jobs, rather than those that already *will* be updated.
* Place the jobs into a dedicated queue so they don't wait
behind jobs that actually have to parse every time. This
helps avoid queue buildup.
* Make Job::factory() set the command field to match the value
it had when enqueued (sketch below). This makes it easier to have
the same job class used for multiple queues.
* Given the above, remove the RefreshLinksJob 'prioritize' flag.
This worked by overriding getType() so that the job went to a
different queue. It required both the special type *and* the
flag to be set when using JobSpecification; otherwise either
ack() would route to the wrong queue and fail, or the job would
go into the regular queue. This was too messy and error prone.
Cirrus jobs using the same pattern also had ack() failures, for
example.
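
A sketch of the Job::factory() change, with hypothetical classes (not
the real implementation): the factory stores the queue type it was
called with back into the job's command field, so one class can back
multiple queues and still ack() to the right one.

    <?php
    class QueueJobSketch {
        public $command;
        public $params;
        public function __construct( $command, array $params ) {
            $this->command = $command;
            $this->params = $params;
        }
    }
    function jobFactorySketch( $type, array $params, array $classMap ) {
        $class = $classMap[$type];
        $job = new $class( $type, $params );
        $job->command = $type; // keep the enqueued type, not a hard-coded class default
        return $job;
    }
    $map = [
        'refreshLinks' => 'QueueJobSketch',
        'refreshLinksPrioritized' => 'QueueJobSketch', // same class, different queue
    ];
    $job = jobFactorySketch( 'refreshLinksPrioritized', [ 'pages' => [ 42 ] ], $map );
    var_dump( $job->command ); // string(23) "refreshLinksPrioritized"
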
Change-Id: I5941cb62cdafde203fdee7e106894322ba87b48a
The new enqueue method of DeferredUpdates was turning LinksUpdate
updates into jobs. However, RefreshLinksJob was not properly
reconstructing the secondary updates as recursive (when they
were recursive). This meant that when a template changed, the
pages using it were not being updated.
See also related T116001.
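
A minimal sketch of the fix's intent, with invented names (not the
actual DeferredUpdates/RefreshLinksJob code): the recursive flag from
the deferred LinksUpdate has to be carried into the job parameters,
otherwise the backlink fan-out is lost.

    <?php
    function linksUpdateToJobSpecSketch( $pageKey, $recursive ) {
        return [
            'type' => 'refreshLinks',
            'title' => $pageKey,
            'params' => [
                // Without this flag the job refreshes only the page itself and
                // never the pages that transclude it.
                'recursive' => (bool)$recursive,
            ],
        ];
    }
    $spec = linksUpdateToJobSpecSketch( 'Template:Infobox', true );
    var_dump( $spec['params']['recursive'] ); // bool(true)
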
Change-Id: Ia06246efb2034fdfe07232fd8c2334160edbcf02
This is part of a chain that reverts:
e412ff5ecc.
NOTE:
- The feature is disabled by default
- User settings default to hiding changes
- T109707 Touching a file on wikisource adds and
removes it from a category, even when the page
has no changes. See the linked issue, which is
marked as stalled with a possible way forward
for this patch.
@see https://gerrit.wikimedia.org/r/#/c/235467/
Changes since version 1:
- T109604 - Page names in comments are no longer
URL-encoded and no longer contain underscores
- T109638 & T110338 - A reserved username is now used
when we can't determine a username for the change
(we could perhaps set the user and id to be blank
in the RC table, but who knows what this might do)
- T109688 - History links are now disabled in RC
(this could be fine for the introduction and worked
on more in the future)
- Categorization changes are now always patrolled
- Touching on T109672: with this change, emails will never
be sent regarding categorization changes (this
can of course be changed in a follow-up)
- Added $wgRCWatchCategoryMembership, defaulting to true,
for enabling / disabling the feature
- T109700 - for cases where no revision was retrieved
for a category change, the bot flag is set to true.
This means all changes caused by parser functions
& Lua will be marked as bot, as will changes that
can't find their revision due to slave lag.
Bug: T9148
Bug: T109604
Bug: T109638
Bug: T109688
Bug: T109700
Bug: T110338
Bug: T110340
Change-Id: I51c2c1254de862f24a26ef9dbbf027c6c83e9063
'isParserCachedUsed' implies that the parser cache usage has already occurred,
and obscures the true purpose of this method, which is to determine whether or
not the requested page *should* be looked up in the parser cache.
The only usage in extensions is in TextExtracts, which I changed to be both
backward- and forward-compatible in If5d5da8eab13.
Change-Id: I7de67937f0e57b1dffb466319192e4d400b867de
* On Wikipedia, for example, these jobs are a good percentage of
all refreshLinks jobs; skipping the parse step should avoid
runner CPU overhead
* Also fixed a bad TS_MW/TS_UNIX comparison (sketch below)
* Moved the fudge factor to a constant and raised it a bit
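
For reference, the kind of pitfall behind the TS_MW/TS_UNIX fix (the
values below are invented): the two formats are not directly
comparable, so both sides must be normalized first.

    <?php
    // Comparing a TS_MW string against Unix seconds numerically is meaningless:
    // the TS_MW value is always the larger number.
    $tsMw = '20150510123000';    // TS_MW: YYYYMMDDHHMMSS
    $tsUnix = 1431260000;        // TS_UNIX: seconds since the epoch
    var_dump( $tsMw > $tsUnix ); // bool(true) regardless of which time is actually later

    // Normalize to one format (here, Unix seconds) before comparing.
    $mwAsUnix = DateTime::createFromFormat( 'YmdHis', $tsMw, new DateTimeZone( 'UTC' ) )->getTimestamp();
    var_dump( $mwAsUnix > $tsUnix ); // now a real chronological comparison
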
Bug: T98621
Change-Id: Id6d64972739df4b26847e4374f30ddcc7f93b54a
* Use special prioritized refreshLinks jobs instead, which are
triggered when transcluded pages are changed
* Also added a triggerOpportunisticLinksUpdate() method to handle
dynamic transclusions
Bug: T89389
Change-Id: Iea952d4d2e660b7957eafb5f73fc87fab347dbe7
One theory for what's behind bug 46014 is that the vandal submits the
edit, then someone (maybe the vandal) gets into the branch of
Article::view that uses PoolWorkArticleView, then ClueBot comes along
and reverts before the PoolWorkArticleView actually executes. Once that
PoolWorkArticleView actually does execute, it overwrites the parser
cache entry from ClueBot's revert with the one from the old edit.
To detect this sort of thing, let's include the revision id in the
parser cache entry and consider it expired if that doesn't match,
which makes sense to do anyway.
And for good measure, let's have PoolWorkArticleView not save to the
parser cache if !$isCurrent.
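
A sketch of the two guards, with invented names (the real code lives
in the parser cache and PoolWorkArticleView): the cached entry
remembers the revision it was parsed from, a mismatch counts as
expired, and non-current parses are never saved.

    <?php
    class CachedRenderingSketch {
        public $revId;
        public $html;
        public function __construct( $revId, $html ) {
            $this->revId = $revId;
            $this->html = $html;
        }
    }
    function getCachedSketch( $entry, $latestRevId ) {
        if ( $entry === null || $entry->revId !== $latestRevId ) {
            return null; // treat as expired: parsed from an outdated revision
        }
        return $entry->html;
    }
    function maybeSaveSketch( CachedRenderingSketch $entry, $isCurrent, array &$cache ) {
        if ( $isCurrent ) {
            $cache['entry'] = $entry; // never overwrite the cache with a stale parse
        }
    }
    $cache = [];
    maybeSaveSketch( new CachedRenderingSketch( 100, '<p>old edit</p>' ), false, $cache );
    var_dump( isset( $cache['entry'] ) ); // bool(false): the revert's cache entry survives
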
Bug: 46014
Change-Id: Ifcc4d2f67f3b77f990eb2fa45417a25bd6c7b790