1020 commits

Author SHA1 Message Date
daniel
7f1fa6f51f RefreshLinksJob: Check hasText() before comparing HTML
Why:
- When comparing the newly generated HTML to the cached HTML, there
  might be cases when the new ParserOutput doesn't contain HTML.

What:
- If hasText() returns false, don't compare HTML and use the "unknown"
  value for the html_changed stats label.

Bug: T388406
Change-Id: Ibc3e79e79a6421d4780739104a949bac50a5b01f
(cherry picked from commit a275e02771bc2ed4243804d5294188f54e47f9fc)
2025-06-25 08:21:23 +00:00
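The guard described in this commit can be sketched roughly as follows. This is a minimal Python illustration, not the MediaWiki PHP code: the `ParserOutput` class is a hypothetical stand-in, the label values "yes" and "no" are assumptions, and checking the cached side as well as the new side is an extra precaution of this sketch. Only `hasText()` and the "unknown" value come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParserOutput:
    """Hypothetical stand-in for a parse result that may carry no HTML."""
    text: Optional[str] = None

    def has_text(self) -> bool:
        return self.text is not None


def html_changed_label(cached: ParserOutput, fresh: ParserOutput) -> str:
    """Return the value to use for the html_changed stats label."""
    # If either side has no HTML, a comparison would be meaningless,
    # so report "unknown" instead of comparing.
    if not cached.has_text() or not fresh.has_text():
        return "unknown"
    # "yes"/"no" are assumed label values for this sketch.
    return "no" if cached.text == fresh.text else "yes"
```

With this shape, a new output without HTML never reaches the string comparison, so the label is always one of a small fixed set of values.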
Paladox
fe9818379d RefreshLinksJob: Don't retry job if "Revision x is not current" is returned
There is no point in retrying a job that fails with "Revision x is not current".
Retrying only causes log spam, and because the message is logged to the error
channel, it makes people think there is a problem when there isn't.

Bug: T379656
Change-Id: Iaa5bd006bf3f26277e81ad5bea1387ef4b925f68
2024-11-19 16:08:11 +00:00
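One way to picture the idea is a job result that distinguishes retryable failures from permanent ones. The sketch below is a hypothetical Python model, not the actual RefreshLinksJob or JobQueue API: `JobResult`, `run_refresh_links` and `should_requeue` are invented names, and the real change may signal "do not retry" differently.

```python
from dataclasses import dataclass


@dataclass
class JobResult:
    """Hypothetical outcome of one job run."""
    success: bool
    retry: bool  # only meaningful when success is False
    error: str = ""


def run_refresh_links(job_revision_id: int, current_revision_id: int) -> JobResult:
    """Pretend to refresh links for the revision the job was queued for."""
    if job_revision_id != current_revision_id:
        # The job is stale: a newer revision exists, so running the same
        # job again can never succeed. Mark the failure as non-retryable.
        return JobResult(False, False, f"Revision {job_revision_id} is not current")
    return JobResult(True, False)


def should_requeue(result: JobResult) -> bool:
    """A runner would only requeue failures that are flagged as retryable."""
    return not result.success and result.retry
```

A transient failure, for example `JobResult(False, True, "db timeout")`, would still be requeued, while the stale-revision case is dropped after one attempt.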
Timo Tijhof
e0f6f8f527 objectcache: Move RedisConnRef.php to /libs/objectcache/
Change-Id: I4c6a349afcc4039bec27413af9511639f8c0c4b0
(cherry picked from commit 72338de32b249a7cff0b758746c0896fd649e53b)
2024-11-07 08:40:54 +00:00
jenkins-bot
a92f3be569 Merge "rdbms: clear replica snapshots in (begin|commit|rollback)PrimaryChanges()" 2024-10-22 15:25:44 +00:00
Umherirrender
1b29f07440 Use namespaced classes
Changes to the use statements done automatically via script
Addition of missing use statement done manually

Change-Id: I73fb416573f5af600e529d224b5beb5d2e3d27d3
2024-10-21 20:41:20 +02:00
jenkins-bot
746a0751d8 Merge "jobqueue: expand architecture documentation" 2024-10-18 14:40:58 +00:00
Máté Szabó
f414f8f91b jobqueue: expand architecture documentation
Why:

- The JobQueue docs currently do not describe the
  semantics of handling job execution failures.
- Some other features, such as job deduplication, enqueue semantics and
  the overall execution flow, are also missing from the architecture
  document.

What:

- Expand the JobQueue architecture document to describe how jobs that
  return false from Job::run() or complete exceptionally are treated.
- Rework some preexisting sections of the same document for brevity.
- Describe JobSpecification and JobQueueGroup in the same document.
- Describe the difference between push() and lazyPush() semantics.
- Describe the execution flow of a Job.
- Briefly summarize how job deduplication is expected to function.

Change-Id: Ib0c1f165feabefe862710b28977b598faf637ec5
2024-10-17 22:14:07 +00:00
jenkins-bot
abc8da60be Merge "Use explicit nullable type on parameter arguments" 2024-10-16 23:10:14 +00:00
C. Scott Ananian
b41f95e3a3 Use MetricsInterface::setLabels() for parsercache_selective_* stats
Depends-On: Ifb51bca3b8762e97e349d2868e42789494f262cb
Change-Id: Ie915a2f5debf74c66c91ff256f3b1632bd078435
2024-10-16 17:01:20 -04:00
Umherirrender
e662614f95 Use explicit nullable type on parameter arguments
Implicitly marking parameter $... as nullable is deprecated in PHP 8.4;
the explicit nullable type must be used instead.

Created with autofix from Ide15839e98a6229c22584d1c1c88c690982e1d7a

Break one long line in SpecialPage.php

Bug: T376276
Change-Id: I807257b2ba1ab2744ab74d9572c9c3d3ac2a968e
2024-10-16 20:58:33 +02:00
Aaron Schulz
b69c1839d4 rdbms: clear replica snapshots in (begin|commit|rollback)PrimaryChanges()
Automatically perform the empty-transaction commit logic in methods
like beginPrimaryChanges() and commitPrimaryChanges(). This avoids
the proliferation of callers having to call the same set of methods
one after another for non-obvious reasons. It also discourages code
from making brittle assumptions that might fail for setups where
there is only a primary or the primary has non-zero read load.

Deprecate flushReplicaSnapshots() and remove callers.

Clarify the related method documentation.

Bug: T315664
Change-Id: I255afd22ffcaeac0fad2d4e4a2a0c55c99be7905
2024-10-09 20:34:25 +00:00
C. Scott Ananian
06d6d20b6e Parsoid selective update metrics: add labels for wiki id and content model
While investigating T376433 I realized we were also generating metrics
for wikidatawiki for the various content models implemented by
Wikibase.  Add the wiki ID and main content model to the metrics
labels so we can distinguish or filter these out.

While we're at it, fix the case of the `parsercache_` metrics prefix to match
the other `ParserCache_` metrics already being recorded.

Bug: T371713
Depends-On: I11386e307caaa9fce34870b08bd4dce4c5e6eb25
Change-Id: Iaf9d8cac1fe008f1441c46e5bc70e7d060358b27
2024-10-08 18:01:40 -04:00
Máté Szabó
8661130e02 jobqueue: Mark JobSpecification as newable
Why:

- JobSpecification is the preferred way to enqueue jobs without
  instantiating a full Job subclass instance, and the only way to
  enqueue a job in the context of a foreign wiki.
- As such, it's used by several extensions. However, it's not explicitly
  marked as newable.

What:

- Mark JobSpecification as newable.

Change-Id: I3dca96857c875da1ee6f0f6054a12aa6ec276697
2024-10-04 23:05:59 +02:00
jenkins-bot
c35a006fe6 Merge "Use import actor store where needed in RC categorisation" 2024-10-03 15:30:13 +00:00
jenkins-bot
37f7b7af7c Merge "RefreshLinksJob: Fix exception due to null/false confusion (take 2)" 2024-10-03 14:54:42 +00:00
C. Scott Ananian
05435ec623 RefreshLinksJob: Fix exception due to null/false confusion (take 2)
Another instance of the null/false confusion over the return value of
ParserCache::getDirty() snuck in.

Follows-Up: I497f6956cd4f5b22f13a97c01029f3201e56e7c0
Follows-Up: I208aeac1b315a96bdb9669427cd03de461b914b4
Change-Id: I42bbd370c4eba46de40261511cf49d7c462f5bfe
2024-10-02 20:01:41 -04:00
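The class of bug named here can be illustrated with a small Python analogue, assuming the cache lookup signals a miss with false rather than null. The function names and the dict-backed cache are hypothetical; they only mimic the shape of a `...|false` return type.

```python
from typing import Optional, Union


def get_dirty(cache: dict, key: str) -> Union[str, bool]:
    """Return the cached value, or False on a miss (mimics a '...|false' return)."""
    return cache.get(key, False)


def previous_output_buggy(cache: dict, key: str) -> Union[str, bool, None]:
    value = get_dirty(cache, key)
    # Bug: a miss is False, not None, so this check lets False through
    # to callers that expect either a real value or None.
    return value if value is not None else None


def previous_output_fixed(cache: dict, key: str) -> Optional[str]:
    value = get_dirty(cache, key)
    # Normalise the miss marker to None before handing the value on.
    return None if value is False else value
```

On a cache miss the buggy variant hands `False` to its caller, which typically fails later when the caller treats it as a parse result; the fixed variant returns `None`.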
Dreamy Jazz
649b0f4954 Use import actor store where needed in RC categorisation
Why:
- When temporary users are enabled, creating IP actors is disallowed
  apart from specific cases, such as importing revisions authored by
  anonymous users.
- If such a revision includes a category link and
  wgRCWatchCategoryMembership is true, MediaWiki will fire a job to
  create a corresponding RC entry, which will attempt to attribute the RC
  to the anonymous IP that authored the imported revision and fail in
  doing so.

What:
- Track whether a category membership change job was triggered by an
  import, and allow RecentChange objects created by such jobs to create
  anonymous actors.

Test Plan:
 1. On a wiki with temporary accounts enabled and wgRCWatchCategoryMembership = true,
    import a revision via Special:Import that was authored by an anonymous user
    and contains a category link.
 2. Verify that the import succeeds and that the corresponding RC entry
    shows up.

Bug: T373318
Change-Id: I89abdca9c4ab8796a211df8b37c1bd7173a496e5
2024-10-02 00:45:22 +02:00
James D. Forrester
cc28acc455 Add namespace to remaining parts of Wikimedia\Mime and Wikimedia\Stats
Bug: T353458
Change-Id: If0137003ab625017d322d57870448a02569668c3
2024-09-27 16:19:10 -04:00
James D. Forrester
9e5c1e8ac7 Add namespace to IDBAccessObject and DBAccessObjectUtils
Bug: T353458
Change-Id: I23cf7991f8792d4d000d1780463d8ce76dc0aee0
2024-09-27 16:19:10 -04:00
James D. Forrester
53b67ae0a6 Add namespace to remaining parts of Wikimedia\ObjectCache
Bug: T353458
Change-Id: I3b736346550953e3b2977c14dc3eb10edc07cf97
2024-09-27 16:19:10 -04:00
James D. Forrester
2144fef6d1 Add namespace to Wikimedia\Redis libs
Bug: T353458
Change-Id: I7a874e1ee1d41a75e34b8a6b6f4d065b5b812c43
2024-09-27 16:19:10 -04:00
jenkins-bot
7de7ccd327 Merge "RefreshLinksJob: Minor refactor for html_changed metric label" 2024-09-26 22:09:08 +00:00
C. Scott Ananian
9b260caeb5 stats: collect timing information for parsercache_selective_* sample
Collect parse time statistics as a counter in order to determine both
the number of opportunities for selective update as well as the
proportion of cpu time spent on parses where selective update is
feasible.

Bug: T371713
Change-Id: I5b8c7ab48d5a1d6c1e311149fcac6abdc523aa13
2024-09-19 15:01:09 -04:00
C. Scott Ananian
65ecdc0eea Fix names of parsercache_selective_* stats
Rename to use a unit type as a suffix to match the guidance in
 https://www.mediawiki.org/wiki/Manual:Stats#Metrics

Change-Id: Ied4c1c3a1ab7fa6148d10a7fc89094c46f568453
2024-09-19 14:41:19 -04:00
Adam Wight
188d2cbbb0 Remove unchecked exception annotations
Callers should not catch an unchecked exception, so it doesn't belong
in a function signature.  Unchecked exceptions indicate a coding error,
which by definition the code will not be able to handle correctly.

If any of these exceptions were supposed to be in response to an edge
case, user input, or initial conditions, then they should be changed
to a runtime error.  If the exception class cannot be changed, then
the annotation should include a comment explaining its purpose and
prognosis.

Bug: T240672
Change-Id: I2e640b9737cb68090a8e1cb70067d1b74037d647
2024-09-17 22:20:58 +02:00
C. Scott Ananian
92ca7f68a4 Randomly sample statistics for Parsoid Selective Update
Controlled by $wgParsoidSelectiveUpdateSampleRate (which defaults to off),
randomly sample 1 in N parses to collect statistics to inform the design
of Parsoid selective update:

* For both legacy parses and Parsoid, count how many times a previous
  parse is in the cache when a new parse is requested.  This needs to
  sample the legacy parser as well as Parsoid because Parsoid is not
  yet invoked from the RefreshLinksJob.  We also count the relative
  number of parses from the different
  RevisionRenderer::getRenderedRevision() call sites to determine
  which pathways might account for the most opportunities for
  optimized selective update.

* For sampled parses using the Parsoid parser where a previous parse
  result is available, also fetch the previous wikitext source from the
  database.

Bug: T371713
Change-Id: I208aeac1b315a96bdb9669427cd03de461b914b4
2024-09-13 19:29:18 -04:00
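Sampling 1 in N events is commonly done with a uniform random draw. The Python sketch below shows that general technique; it is not taken from the MediaWiki source, and treating a rate of 0 as "off" is an assumption made for illustration.

```python
import random


def should_sample(sample_rate: int, rng: random.Random) -> bool:
    """Return True for roughly 1 in `sample_rate` calls; 0 or less disables sampling."""
    if sample_rate <= 0:
        # Assumed convention: a non-positive rate means sampling is off.
        return False
    # randrange(n) draws uniformly from 0..n-1, so 0 comes up 1 time in n.
    return rng.randrange(sample_rate) == 0
```

A caller would wrap the statistics collection in `if should_sample(rate, rng):` so that the cost of the extra cache and database lookups is only paid on the sampled fraction of parses.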
James D. Forrester
2b11d61577 Migrate all uses of deprecated URL global functions to use wfGetUrlUtils()
wfGetUrlUtils() is also deprecated, but less so, so we can do this first
and then properly replace the individual uses with dependency injection
in local pieces of work.

Also:
* Switching Parser::getExternalLinkRel to UrlUtils::matchesDomainList
  exposed a type error in media.txt where $wgNoFollowDomainExceptions
  was set to a string (which is invalid) instead of an array.

Bug: T319340
Change-Id: Icb512d7241954ee155b64c57f3782b86acfd9a4c
2024-09-10 16:50:02 -07:00
Timo Tijhof
0ecce2dc49 RefreshLinksJob: Minor refactor for html_changed metric label
Follows-up I18e4236b673396e, to ensure that:

1. The metric labels remain together for easier understanding
   and discovery, which reduces the chance of setting them too early,
   incompletely, too often, or not at all in some cases.

2. The metric does not become invalid because a label only exists
   sometimes. Prometheus generally requires consistent labelling over
   time. By setting it once we gain the benefit of Phan static analysis
   noticing if the variable is unset, instead of just not setting one
   of the labels in some branches.

Change-Id: I9d27e5a7220e565fcc23f4f19a5824be57676552
2024-09-07 12:37:58 -07:00
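The "set all labels once, in one place" pattern can be sketched as below. This is a toy Python model, not the MediaWiki stats library: the `Counter` class, the `status` label and the "yes"/"no" values are assumptions, while `html_changed` and the metric name used in the usage note appear elsewhere in this log.

```python
from typing import Dict, List, Optional


class Counter:
    """Toy counter that records the label set of every increment."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.samples: List[Dict[str, str]] = []

    def increment(self, labels: Dict[str, str]) -> None:
        self.samples.append(dict(labels))


def record_operation(counter: Counter, status: str, html_changed: Optional[bool]) -> None:
    """Emit one sample with a complete, consistent label set."""
    # Decide every label value first, on all branches...
    if html_changed is None:
        changed_label = "unknown"
    else:
        changed_label = "yes" if html_changed else "no"
    # ...then emit in exactly one place, so no branch can omit a label.
    counter.increment({"status": status, "html_changed": changed_label})
```

For example, calling `record_operation(Counter("refreshlinks_parsercache_operations_total"), "ok", None)` still produces a sample that carries both labels, which keeps the time series shape stable.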
thiemowmde
dca4931b42 Make use of the ??= and ?? operators where it makes sense
This touches various production classes and maintenance scripts.
The code should do the exact same as before. The main benefit is that
the syntax avoids any repetition.

Change-Id: I5c552125469f4d7fb5b0fe494d198951b05eb35f
2024-08-26 09:26:36 +02:00
Bartosz Dziewoński
006e22a514 RefreshLinksJob: Fix exception due to null/false confusion
Follow-up to a0503debb0.

Change-Id: I497f6956cd4f5b22f13a97c01029f3201e56e7c0
2024-08-24 20:26:00 +02:00
C. Scott Ananian
a0503debb0 Provide previous parse results to parser when rendering
This patch lays the groundwork for incremental/selective parsing in
Parsoid by ensuring that we can pass previous cached parses through
the parse pipeline to Parsoid.  We do this by adding a new render
hint type, `previous-output`, and ensuring it is passed along.

Because revisions can contain a ParserOutput which is the combination
of separate ParserOutput objects for each of their slots, RenderedRevision
also contains a method to unsplit the combined ParserOutput to reconstruct
an original ParserOutput for use in incremental parsing.  Currently this
is mostly a stub, but illustrates how slot combination and splitting can
work, assuming those transformations are reversible.

Extra calls to ParserCache::getDirty() are added to some code paths
in order to ensure that any previously-cached ParserOutput is available
for selective update.  In order to mitigate any performance concerns,
these are only done for the Parsoid parser at the moment.  Future
patches will add additional metrics to quantify the cost/benefit ratio
of the additional cache lookups on these paths.

Bug: T363421
Bug: T371713
Change-Id: I440884f1d7e09c1ff9806f848b7b53a636367690
2024-08-23 17:41:55 -04:00
Aaron Schulz
c924df89e2 jobqueue: remove JobQueueDB::getScopedNoTrxFlag() method
We already use CONN_TRX_AUTOCOMMIT for mysql/postgres and
a separate DB file (e.g. "server" config) for sqlite (via
the installer). For any legacy case where the main DB is
still used with sqlite, the removed method was unlikely to
be very effective.

Ignore DBO_TRX/DBO_DEFAULT in JobQueueDB "server" config
for good measure, similar to SqlBagOStuff.

This removes more direct uses of IDatabase::setFlags(),
a method which does not play well with DBConnRef.

Bug: T311090
Change-Id: Ia8457dea2ed30539e23345f89cb6b382be442975
2024-08-12 17:16:17 +00:00
daniel
21482f88c6 RefreshLinksJob: fix missing metrics label
Set the html_changed label on all code paths that emit the
refreshlinks_parsercache_operations_total metric.

Change-Id: I219547d0947a81f2a78c93a46c376eeaf82d6fa3
2024-08-05 18:10:25 +02:00
Bartosz Dziewoński
df4cbf5ac6 Replace gettype() with get_debug_type() in debug/log/test output
get_debug_type() does the same thing but better (spelling type names
in the same way as in type declarations, and including names of
object classes and resource types). It was added in PHP 8, but the
symfony/polyfill-php80 package provides it while we still support 7.4.

Also remove uses of get_class() and get_resource_type() where the new
method already provides the same information.

For reference:
https://www.php.net/manual/en/function.get-debug-type.php
https://www.php.net/manual/en/function.gettype.php

In this commit I'm only changing code where it looks like the result
is used only for some kind of debug, log, or test output. This
probably won't break anything important, but I'm not sure whether
anything might depend on the exact values.

Change-Id: I7c1f0a8f669228643e86f8e511c0e26a2edb2948
2024-07-31 19:33:57 +02:00
Bartosz Dziewoński
c045fa0291 Replace gettype() with get_debug_type() in exception messages
get_debug_type() does the same thing but better (spelling type names
in the same way as in type declarations, and including names of
object classes and resource types). It was added in PHP 8, but the
symfony/polyfill-php80 package provides it while we still support 7.4.

Also remove uses of get_class() and get_resource_type() where the new
method already provides the same information.

For reference:
https://www.php.net/manual/en/function.get-debug-type.php
https://www.php.net/manual/en/function.gettype.php

To keep this safe and simple to review, I'm only changing cases where
the type is immediately used in an exception message.

Change-Id: I325efcddcb58be63b1592b9c20ac0845393c15e2
2024-07-31 19:24:39 +02:00
Umherirrender
9a107e6b03 Use expression builder instead of raw sql
Bug: T361023
Change-Id: Ibf1c93ddbf8f680e8fb9442816f6fed94a069c0a
2024-07-23 23:30:45 +02:00
jenkins-bot
f27c23e1b2 Merge "RefreshLinksJob: collect stats on redundant parses" 2024-07-23 14:15:14 +00:00
C. Scott Ananian
3e41554b30 Remove unnecessary try/catch in ParsoidCachePrewarmJob
The two exceptions caught here should never be thrown in actual practice,
and indicate programming errors if they are thrown: the first would only
occur for an invalid language variant code or invalid DOM name characters,
and the second would only be thrown if the wikitext size limit is exceeded,
which should never happen with articles retrieved from the database.

(Parsoid's handling of overlarge articles also differs from legacy,
and it should probably match legacy and just truncate to the size
limit rather than throw an exception, but that's a separate issue.)

In any case, there's not much gained here by catching and rethrowing
rather than just letting the original Parsoid exception, were it to
ever occur, bubble up and be logged directly as an uncaught exception.

Followup-To: I96161a64952e1809c0aec773d5a3dd4c71105657
Change-Id: Ibac2d01306d0637097c0cc3363e63f238bb649fe
2024-07-19 16:09:30 -04:00
daniel
a2c6e8ba89 RefreshLinksJob: collect stats on redundant parses
Collect statistics on how often we re-parse pages in refreshLinksJob but
end up generating the same HTML we already had. This typically happens
when pages get re-rendered because a template was changed, but that
change doesn't actually affect the page in question.

Bug: T369898
Change-Id: I18e4236b673396e28d638372ce0f1e24a07dca16
2024-07-19 09:12:45 +00:00
Ebrahim Byagowi
fab78547ad Add namespace to the root classes of ObjectCache
Add deprecated aliases for the non-namespaced classes.

ReplicatedBagOStuff, which is already deprecated, isn't moved.

Bug: T353458
Change-Id: Ie01962517e5b53e59b9721e9996d4f1ea95abb51
2024-07-10 00:14:54 +03:30
jenkins-bot
2f17ad487c Merge "Use namespaced classes" 2024-07-05 15:12:37 +00:00
jenkins-bot
be3e67484a Merge "BacklinkJobUtils: don't miss pages falling in the gaps between batches" 2024-07-05 11:13:33 +00:00
Umherirrender
e66f66d875 Use namespaced classes
Change-Id: Ie08a616eb07c8da50e971a5fc3f6207c34c3f342
2024-07-05 00:16:44 +02:00
jenkins-bot
68cebdfbb0 Merge "[ParsoidCachePrewarmJob] Use ParserOutputAccess" 2024-06-28 11:44:24 +00:00
Tim Starling
89dcc914b3 BacklinkJobUtils: don't miss pages falling in the gaps between batches
BacklinkCache::partition() is persistently cached with an expiry time
of one hour, with no invalidation. If a page starts using a template
after the partition cache is saved, and its page_id happens to fall in
a gap between batch ranges, a subsequent update to the template would
miss the page. Its page_touched would be permanently incorrect.

So, when loading the page IDs in a batch, use the start of the next
batch as the endpoint, rather than the highest page ID at the time of
the cache update.

Change-Id: I4459fc1c4242cd59505f0e05bf5c20a0b96cab33
2024-06-28 13:16:30 +10:00
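The gap problem and its fix can be shown with a small Python model. It assumes the cached partition is a sorted list of (first ID, last ID) pairs captured at cache time and that the last batch is open-ended; both are simplifications for illustration, and the function names are hypothetical rather than the BacklinkJobUtils API.

```python
from typing import List, Tuple

Batch = Tuple[int, int]  # (first page ID, last page ID) as seen at cache time


def ids_by_cached_end(batches: List[Batch], index: int, live_ids: List[int]) -> List[int]:
    """Old behaviour: select IDs between the cached start and cached end."""
    start, end = batches[index]
    return [i for i in live_ids if start <= i <= end]


def ids_by_next_start(batches: List[Batch], index: int, live_ids: List[int]) -> List[int]:
    """New behaviour: select IDs from this start up to the next batch's start."""
    start, _ = batches[index]
    if index + 1 < len(batches):
        next_start = batches[index + 1][0]
        return [i for i in live_ids if start <= i < next_start]
    # Assumption of this sketch: the last batch has no upper bound.
    return [i for i in live_ids if i >= start]
```

With cached batches `[(1, 40), (55, 90)]` and a page with ID 47 that started using the template after the cache was written, the old selection skips 47 in both batches, while the new one picks it up in the first batch.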
C. Scott Ananian
105bb58ae2 [ParsoidCachePrewarmJob] Use ParserOutputAccess
One more step in gradually replacing uses of ParsoidOutputAccess.  This
one was pretty easy, as ParsoidOutputAccess was pretty much directly
calling ParserOutputAccess when provided with an ExistingPageRecord
and RevisionRecord.

Bug: T367074
Change-Id: I96161a64952e1809c0aec773d5a3dd4c71105657
2024-06-17 13:24:39 +00:00
Umherirrender
472891385d Use namespaced classes (2)
Changes to the use statements done automatically via script
Addition of missing use statement done manually

Change-Id: Id9f3e775e143d1a17b6b96812a8230cfba14d9d3
2024-06-16 20:23:55 +02:00
Wandji69
1665ea876f Use ObjectCacheFactory methods, not deprecated ObjectCache methods
Bug: T363770
Change-Id: I2335b315bec6a540409492df4891c518640966d5
2024-06-06 09:59:24 +01:00
Amir Sarabadani
f33b5515b5 rdbms: Remove ILoadBalancer::getWriterIndex()
It doesn't need its own method; we can just use the constant
instead.

Bug: T363839
Change-Id: Iaec5a8e88dc3e5ae4eaf1f24aebf4c5d73f4b350
2024-06-03 14:17:57 -07:00
jenkins-bot
cdd4f95f19 Merge "Document needsPage:false for GenericParameterJob" 2024-06-02 20:46:09 +00:00