and check for this in WikiPage::doEditUpdates before
inserting a new CategoryMembershipChangeJob.
Some content models like the Wikibase ones do not
have categories and it's wasteful to add these jobs
for all Wikibase edits.
Bug: T126977
Change-Id: I2c54a4ba1546445dc41101e15cb83a2c6cc2b1c9
- Add 'tags' parameters to appropriate API modules
- Add tag-adding logic to appropriate functions that carry out
relevant functions
- ManualLogEntry::{set,get}Tags to handle adding tags to log
entries in a cleaner fashion
- Use ManualLogEntry::setTags in LocalFile::recordUpload2
Bug: T97720
Change-Id: I98c52da7985623bfdafda2dc2dae937b39b72419
Change tags to apply to an edit can now be passed directly to the
WikiPage::doEditContent function. They are then passed to the
RecentChange::notifyNew/Edit functions where tagging is made
after the recent change is saved. This ensures that other callers
of doEditContent will not run into the same issue as T100248.
ApiRollback is fixed in this way.
In addition, we'll have to pass tags in this way for core tagging
of edits (I2e48bd458fc8d7c289f04dc276f9287516e0b987), and this makes
it possible to merge the arrays of tags and call ChangeTags::addTags
only once.
Change-Id: I829960c7a33b70464065839d7504d7529dfd0b72
A few semantic changes result from this:
* If multiple pages are edited in a request, the updates happen
in the same order relative to each other, but all in one second
step instead of after each page edit.
* If the same page is edited twice in a request, the WikiPage hook
argument will reflect the last request edit, not always the edit
that fired the hook.
Bug: T120718
Change-Id: I9429f29e5a90f24e4d7af5797a80e63a9cc34146
Previously various language objects would install a hook to update the
shared conversion table cache when the object was constructed. This is
not a good idea since language objects may be constructed even when they
are not the content language, but only the content language is
associated with variant conversion and the conversion cache.
Instead, have WikiPage call a method on $wgContLang directly. I put this
with message cache update since the logic is almost identical.
Change-Id: Ief9c0ef993e39645e74a6e158cb4e6e2139ce91d
A few semantic changes result from this:
* If multiple pages are created in a request, the updates happen
in the same order relative to each other, but all in one second
step instead of after each page edit.
* If an extension set some extra Status info or errors via the
PageContentInsertComplete hook, they will not be seen by the
caller (unless it was a CLI script possibly). Few callers use
$status at all, and I did not see any that mutated it. Since
the page is already committed when this hooks run (as has always
been) they cannot veto edits and callers do not care or know what
to do with random hook-set status errors; there was never much use
in changing the Status anyway.
Bug: T120718
Change-Id: Ieba35056be31b2f648c57f59d19d3cbbe58f1b05
This makes it easier to migrate the methods to using atomic
sections without having the PageContentInsertComplete hook
change to ending up in separate transaction than
ArticleSaveComplete.
Bug: T120718
Change-Id: I492514413ec9c37c2f9343bb207798fc8e24a5a9
* Make the method sizes a bit more manageable.
This will be useful for replacing the begin/commit
calls later (with startAtomic/endAtomic).
* Cleaned up a few inconsistencies in code style.
Change-Id: I8d66503a5575ca369cd5feb56058af7d24001629
* Using addUpdate() makes sure purges are coalesced and
de-duplicated.
* Also removed incosistent $wgUseSquid checks. If CDN caching
is not used, then $wgSquidServers will just be empty anyway.
Bug: T119016
Change-Id: I8b448366f037f668385d252f9d68289b71d1a707
* Recursive link updates no longer mention an category changes.
It's hard to avoid either duplicate mentioning of changes or
confusing explicit and automatic category changes.
* LinksUpdate no longer handles this logic, but rather WikiPage
decides to spawn this update when needed in doEditUpdates().
* Fix race conditions with calculating category deltas. Do not
rely on the link tables for the read used to determine these
writes, as they may be out-of-date due to slave lag. Using the
master would still not be good enough since that would assume
FIFO and serialized job execution, which is not garaunteed.
Use the parser output of the relevant revisions to determine
the RC rows. If 3 users quickly edit a page's categories, the
old way could misattribute who actually changed what.
* Make sure RC rows are inserted in an order that matches that
of the corresponding revisions.
* Better avoid mentioning time-based (parser functions) category
changes so they don't get attributed to the next editor.
* Also wait for slaves between RC row insertions if there where
many category changes (it theory it could well over 10K rows).
* Using a separate job better separates concerns as LinksUpdate
should not have to care about recent changes updates.
* Added more docs to $wgRCWatchCategoryMembership.
Bug: T95501
Change-Id: I5863e7d7483a4fd1fa633597af66a0088ace4c68
This avoids contention slams and synchronous master DB writes
on HTTP GET requests.
Bug: T119742
Bug: T92357
Change-Id: I7b3ebac0d6a11542c47ddf3219911be54380c537
* Use READ_LATEST when needed to distinguish slave lag
affecting new pages from page deletions that happened
after the job was pushed. Run-of-the-mill mass backlink
updates still typically use "masterPos" and READ_NORMAL.
* Search for the expected revision (via READ_LATEST)
for jobs triggered by direct page edits. This avoids lag
problems for edits to existing pages.
* Added a CAS-style check to avoid letting jobs clobber
the work of other jobs that saw a newer page version.
* Rename and expose WikiPage::lock() method.
* Split out position wait logic to a separate protected
method and made sure it only got called once instead of
per-title (which didn't do anything). Note that there is
normally 1 title per job in any case.
* Add FIXME about a related race-conditions.
Bug: T117332
Change-Id: Ib3fa0fc77040646b9a4e5e4b3dc9ae3c51ac29b3
* Use the CAS style page row checks instead
* This a first step towards switching to startAtomic/endAtomic
* Only a sanity check uses rollback() now, which should never
be hit unless there is a serious DB layer bug
Change-Id: Ideb189f918dee5d3e3c7b91cb92179df514ef35a
Also consistently use self:: instead of BagOStuff:: for constants
referenced within the BagOStuff class.
Change-Id: I20fde9fa5cddcc9e92fa6a02b05dc7effa846742
So extensions like Echo are able to attribute post-edit link updates to
specific the users who triggered them.
Bug: T116485
Change-Id: I083736a174b6bc15e3ce60b2b107c697d0ac13da
* They no longer commit the update, but just make sure
it is part of a transaction. The BEGIN/COMMIT will
happen at request start/end given DBO_TRX or in this
method otherwise (like when in CLI mode). This avoids
premature committing of other transactions.
* FileDeleteForm was the only caller using $commit=false
for WikiPage::doDeleteArticleReal() and its proxies
WikiPage::doDeleteArticle() and Article::doDeleteArticle().
The ugly $commit flag is now removed.
* No caller was using $id for WikiPage::doDeleteArticleReal()
and its proxies WikiPage::doDeleteArticle() and
Article::doDeleteArticle(). It is now removed and we can
be sure the lock() and CAS logic always hit in the method.
The rollback() calls are not needed given the page row lock
and having them there could break outer transactions.
* Updated FileDeleteForm to use startAtomic()/endAtomic() so
the article and file delete are still wrapped in a
transaction. Note that LocalFile::delete() does reference
counting and trxLevel() checks so it will not try to begin()
and break FileDeleteForm's transaction via startAtomic().
* Moved less important 'page-recent-delete' key update down
for sanity in case it blows up.
Change-Id: Idb98510506c0edd02236c30badaec97d86e07696
* Focus on updating links that would *not* already be updated
by jobs, not those that already *will* be updated.
* Place the jobs into a dedicated queue so they don't wait
behind jobs that actually have to parse every time. This
helps avoid queue buildup.
* Make Job::factory() set the command field to match the value
it had when enqueued. This makes it easier to have the same
job class used for multiple queues.
* Given the above, remove the RefreshLinksJob 'prioritize' flag.
This worked by overriding getType() so that the job went to a
different queue. This required both the special type *and* the
flag to be set if using JobSpecification or either ack() would
route to the wrong queue and fail or the job would go in the
regular queue. This was too messy and error prone. Cirrus jobs
using the same pattern also had ack() failures for example.
Change-Id: I5941cb62cdafde203fdee7e106894322ba87b48a
To better indicate that these are only triggered by page views.
We don't currently have any slow-parse logging for the parser
invocation that happens during save (which means we're potentially
missing lots of them).
Once we add that, this will help distinguish them.
Bug: T110760
Change-Id: I22be5684ef93efd410d683637e223f770d6c768c
This is part of a chain that reverts:
e412ff5ecc.
NOTE:
- The feature is disabled by default
- User settings default to hiding changes
- T109707 Touching a file on wikisource adds and
removes it from a category... Even when page
has no changes.... WTF? See linked issue,
marked as stalled with a possible way forward
for this patch.
@see https://gerrit.wikimedia.org/r/#/c/235467/
Changes since version 1:
- T109604 - Page names in comment are no longer
url encoded / have _'s
- T109638 & T110338 - Reserved username now used
when we can't determine a username for the change
(we could perhaps set the user and id to be blank
in the RC table, but who knows what this might do)
- T109688 - History links are now disabled in RC....
(could be fine for the introduction and worked
on more in the future)
- Categorization changes are now always patrolled
- Touching on T109672 in this change emails will never
be sent regarding categorization changes. (this
can of course be changed in a followup)
- Added $wgRCWatchCategoryMembership defaulting to true
for enabling / disabling the feature
- T109700 - for cases when no revision was retrieved
for a category change set the bot flag to true.
This means all changes caused by parser functions
& Lua will be marked as bot, as will changes that
cant find their revision due to slave lag..
Bug: T9148
Bug: T109604
Bug: T109638
Bug: T109688
Bug: T109700
Bug: T110338
Bug: T110340
Change-Id: I51c2c1254de862f24a26ef9dbbf027c6c83e9063
This makes the jobs *actually* end up in the
'refreshLinksPrioritized' queue rather than 'refreshLinks'.
These jobs can run fast in a higher priority queue since the
output is already cached (much of the point of 'prioritize').
They won't if they get stuck behind regular 'refreshLinks'.
This also avoids the indirection of an intermediate job.
The use of lazyPush() is already enough to prevent the user
from experiencing cross-DC RTT overhead.
Change-Id: I5d0440588db09c299cd70191e5624ffc7ebf04c0
Currently, INSERT...ON DUPLICATE KEY UPDATE is used to update the page
counts in the category table. However, MySQL 5.1.22 and newer, by default,
increment the counter for cat_id before checking for duplicate key errors.
This creates many gaps in the cat_id sequence.
To avoid this, check for existing category rows, and instead UPDATE any
that were found. It is hoped that the extra queries will not significantly
harm performance.
Change-Id: Ic2ab9ff14f04a0c7ea90a5b6756cade0c78e2885