Add a multi-primary mode option that supports MySQL DB setups
that use circular replication with STATEMENT formatted binlogs.
The `modtoken` column is only used when multi-primary mode is
explicitly enabled in configuration. The column is used by write
queries to determine the "winning" version of keys, with the goal
of approximating "Last-Write-Wins" eventual consistency.
Writes with different timestamps can be handled by picking the
one with the highest timestamp as the "winner". Writes with the
same timestamp, from different primary DBs, can be handled by
picking the one from the primary DB with the highest server_id.
Writes with the same token timestamp from the same primary DB can
be handled by picking the last write to appear in the binlog.
The delete() operation uses tombstones in multi-primary mode,
since there must be a key version to actually compare with the
versions from other operations.
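The tie-breaking order above can be sketched as follows. This is a minimal illustration in Python, not the actual SQL used by the write queries; the `Version` shape and the `pick_winner` name are hypothetical, and a tombstone is modelled as a version whose value is `None`.

```python
from typing import NamedTuple, Optional


class Version(NamedTuple):
    """One stored version of a key (hypothetical shape, for illustration)."""
    timestamp: float      # write time carried in the token
    server_id: int        # ID of the primary DB that accepted the write
    value: Optional[str]  # None marks a tombstone left by delete()


def pick_winner(current: Optional[Version], incoming: Version) -> Version:
    """Return the version to keep after applying `incoming` on top of `current`."""
    if current is None:
        return incoming
    # Higher timestamp wins; equal timestamps go to the higher server_id.
    # A full tie keeps `incoming`, i.e. the write that appears later in the binlog.
    if (incoming.timestamp, incoming.server_id) >= (current.timestamp, current.server_id):
        return incoming
    return current
```

Because a delete is stored as a tombstone version, it takes part in the same comparison as any other write.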
Also:
* Remove "LOCK IN SHARE MODE" that was made obsolete by the
CONN_TRX_AUTOCOMMIT flag. For the SQLite transaction case,
it is serializable anyway.
* Simplify handleWriteError() to match handleReadError()
and merge them into handleDBError().
Bug: T274174
Change-Id: Icc5eff9a032dd3403b5718058f20e38f8ea84af5
Avoid the duplication of the reentrant lock logic in subclasses.
Move "lock already expired" logic from getScopedLock() to unlock()
so that it applies more generally. Also warn when unlock() is called
on a key that is not even locked.
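A rough sketch of shared reentrant lock bookkeeping with both checks living in unlock(). It is written in Python for illustration only; the class name, the `warnings` list, the default timeout, and the return values are assumptions, and the backend acquire/release calls that subclasses would make are omitted.

```python
import time
from typing import Callable, Dict, List


class ReentrantLocks:
    """Hypothetical base-class bookkeeping for reentrant locks."""

    def __init__(self, clock: Callable[[], float] = time.time):
        self._clock = clock
        self._held: Dict[str, List[float]] = {}  # key -> [depth, expiry time]
        self.warnings: List[str] = []

    def lock(self, key: str, exptime: float = 6) -> bool:
        if key in self._held:
            self._held[key][0] += 1  # reentrant acquire: only bump the depth
            return True
        # A subclass would acquire the lock in its backend at this point.
        self._held[key] = [1, self._clock() + exptime]
        return True

    def unlock(self, key: str) -> bool:
        entry = self._held.get(key)
        if entry is None:
            self.warnings.append(f"unlock() called on unlocked key: {key}")
            return False
        entry[0] -= 1
        if entry[0] > 0:
            return True  # still held by an outer caller
        del self._held[key]
        if self._clock() >= entry[1]:
            # The lock timed out before release, so it may already belong to
            # another process; a subclass would skip the backend release here.
            self.warnings.append(f"lock already expired: {key}")
        return True
```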
Also:
* Rename lock() $expiry argument to $exptime for consistency
* Fix return types for serialize()/unserialize()
Bug: T274174
Change-Id: I211536e616cf7f1cc60181c378bbf9b35ffa40a4
Only used in one place (getServerShardIndexes), where a simpler solution
can (and arguably should) be taken instead, based on the indexes
of the array rather than assuming/re-creating them on the spot.
Change-Id: I807f53fd4f47237e9192bb1ad004b901ecbaa5d6
Follows-up 0545fdc3af, which added `--tag` to purgeParserCache.php
to purge only one of the server shards (by its tag).
However, the percentage printing was still based on the total count
instead of on the count of the local $shardIndexes array, thus the
progress reported was artificially slowed down to a low percentage
(e.g. 33.3% if there are three shards), and then it stops, with 100.0%
added to the end by the CLI script:
```
... 0.0%
[…]
... 33.3%
... 100.0%
```
Fix this to only count the servers being iterated.
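The corrected arithmetic can be sketched as below. This is an illustrative Python rendering rather than the PHP in SqlBagOStuff; the function name and the per-table callback granularity are assumptions. The point is that the denominator is the number of shards being iterated, not the total number configured.

```python
from typing import Callable, Sequence


def report_progress(
    shard_indexes: Sequence[int],
    tables_per_shard: int,
    progress_callback: Callable[[float], None],
) -> None:
    """Report overall progress as a percentage of the selected shards only."""
    num_selected = len(shard_indexes)
    for servers_done, _shard_index in enumerate(shard_indexes):
        for tables_done in range(1, tables_per_shard + 1):
            tables_done_ratio = tables_done / tables_per_shard
            # High-level part (finished servers) plus the current server's share.
            overall_ratio = (servers_done + tables_done_ratio) / num_selected
            progress_callback(overall_ratio * 100)
```

With one shard selected out of three and two tables per shard, this reports 50.0 and then 100.0, instead of stalling at 33.3.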
Bug: T282761
Change-Id: Id87868ceb9b7980442d57551e728d9f0d2784e01
This is preparation for Id87868ceb9b7, as otherwise this internal
signature is getting increasingly messy and awkward to use for the
occasionallyGarbageCollect() callsite, where one would effectively have
to pass non-sensical values that are never used just to satisfy the
contract.
Change-Id: Ic8d925110a0826811c7e8d3d7bc5077d03f38085
Large wiki farms may have to purge servers concurrently (instead of
one at a time) in order to keep up with new writes and delete expired
rows faster than new rows are written.
The parameter for this uses server tags for three reasons:
* Maintenance risk and complexity.
This requires the least information about MW configuration to be
hardcoded in the scheduled maintenance cronjob (compared to: server
indexes which are a runtime concept, or specific hostnames/IP/
tableprefixes which may change and should not require
coordinating changes elsewhere).
* Operational convenience.
By using server tags, the parameters don't have to vary between
data centers.
* Code complexity.
The current code for obtaining connections is based on server indexes,
which are easy to map at runtime from server tags. Other ways
of identifying a shard, like hostnames, are either an awkward fit (as
they don't uniquely identify a shard per-se, with multiple instances
on the same hardware at WMF), or require SqlBagOStuff to store and
maintain more information about connections than it currently has
readily available.
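A sketch of the tag-to-index mapping the last point relies on, written in Python for illustration. The function name, the list-of-tags input, and the error handling are assumptions, not the actual SqlBagOStuff code.

```python
from typing import List, Optional


def shard_indexes_for_tag(server_tags: List[str], tag: Optional[str]) -> List[int]:
    """Map a server tag to the runtime server indexes it selects.

    `server_tags` holds one tag per configured server, in index order.
    With no tag given, every server is selected.
    """
    if tag is None:
        return list(range(len(server_tags)))
    indexes = [i for i, t in enumerate(server_tags) if t == tag]
    if not indexes:
        raise ValueError(f"Unknown server tag: {tag}")
    return indexes
```

A scheduled job can then pass the same tag in every data center, and the index is resolved from the local configuration at runtime.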
Bug: T282761
Change-Id: I618bc1e8ca3008a4dd54778ac24aa5948f27c52e
As part of working on I, I noticed that, despite there being only
one shard on my localhost (getServerShardIndexes: ["local"]), only one
table index ([0]), and no data in them at all, purgeParserCache.php was
still, incorrectly, reporting the (only) progress callback from SqlBag
as being at 50%, with the final 100% only being printed due to the
CLI script forcing it so at the end - if not already done by SqlBag.
The following diff helped me understand this:
```
@@ -813,6 +814,10 @@ class SqlBagOStuff extends MediumSpecificBagOStuff {
+ echo json_encode( [
+ "tablesDoneRatio" => $tablesDoneRatio,
+ "numServerShards" => $this->numServerShards,
+ ], JSON_PRETTY_PRINT );
call_user_func( $progressCallback, $overallRatio * 100 );
```
This should have printed 1 and 1, but printed 1 and 2. Thus the reason
it was only able to go up to 50% was because it wrongly thought there
was a second shard.
Change-Id: I9500a40d5c2a18750f043e33d825d748b79ae202
Also added token and flags fields. The token field can
be used as a tie-breaker for modtime and also for faster
cas() operations. The flags field makes serialization and
compression format changes easier in the future.
Bug: T274174
Change-Id: I45731a877b21835652993c2d285165a76eeae3e9
Add a helper method for the common use case of temporarily silencing
transaction profiler warnings.
Change-Id: I40de4daf8756da693de969e5526b471b624b2cee
* There was a carriage return, as part of a self-overwriting progress
bar. While nice for humans running the script by hand, this made
the script difficult to understand in WMF production where the logs
go to syslog/journalctl, which doesn't like \r and thus showed only
`[2KB blob]` instead of the actual output. This can be bypassed with
`journalctl -u … --all`, but that still leaves one with a sea of
misaligned progress bars, concatenated as one long "line".
It seems most of our maintenance scripts nowadays just keep printing
lines to notify of progress made, so change the script to that,
by just printing a boring "... % done\n" line (at most) every 0.1%.
* Change hardcoded batch size of 100 to use $wgUpdateRowsPerQuery.
This is generally a no-op as this configuration defaults to 100 in
DefaultSettings.php, and is not overridden in WMF production.
* Add a "--dry-run" option to allow for last-minute verification of
the age and date calculation which is rather non-trivial given that
we don't index creation time and assume $wgParserCacheExpireTime
is constant and unchanged, which in practice, when this script is
considered for running, tends not to be the case, making it hard to
be sure exactly what it will do.
Minor stuff:
* Make the ORDER BY clause explicitly use ascending order.
* Try to retroactively document why we use a --msleep parameter
instead of waiting for replication.
* Revert some of the variable name changes of I723e6377c26750ff9
which imho made the code harder to understand.
* Flip order of $overallRatio additions to match the order of
$tablesDoneRatio (e.g. high level + current thing).
* Remove use of Language->timeanddate() in favour of native
date formatting, mainly to include seconds and timezone,
but also because it's simpler not to bring in all of Language
and LocalisationCache into the script.
* Pass unix time directly to SqlBagOStuff::deleteObjectsExpiringBefore,
as other callers do, instead of TS_MW-formatting first. There is
also additional casting to TS_UNIX within the method.
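The line-based progress output described in the first point (one line at most every 0.1%, no carriage return) could look roughly like this. It is a Python sketch with a hypothetical `ProgressPrinter` class, not the code in purgeParserCache.php.

```python
from typing import Callable, Optional


class ProgressPrinter:
    """Print a plain progress line only when the 0.1%-rounded value changes."""

    def __init__(self, write: Callable[[str], None] = print):
        self._write = write
        self._last_shown: Optional[str] = None

    def show(self, percent: float) -> None:
        text = f"{percent:.1f}"
        if text == self._last_shown:
            return  # same value at 0.1% resolution: stay quiet
        self._last_shown = text
        self._write(f"... {text}% done")
```

Each update is a separate newline-terminated line, which log collectors such as journald can store and display as ordinary text.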
Before:
> Deleting objects expiring before 21:43, 7 May 2021
> [**************************************************] 100.00%
> Done
After:
> Deleting objects expiring before Fri, 07 May 2021 21:43:52 GMT
> ... 50.0% done
> ... 100.0% done
> Done
Bug: T280605
Bug: T282761
Change-Id: I563886be0b3aeb187c274c0de4549374e0ff18f0
List public methods and methods implementing abstract parent methods first.
This also matches the order of BagOStuff.
Change-Id: If609f5c88fefa9afaa1027555cb42eec7361e223
Always use "99991231235959" instead of using either 0x7fffffff
or (1 << 62), depending on the current timestamp. The bit shifted
case always returned false, and thus did not work.
Refactor DB timestamp logic into encodeDbExpiry()/decodeDbExpiry().
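A sketch of what such an encode/decode pair might do, in Python for illustration. It assumes that an expiry of 0 means "never expires" and that the DB column holds a 14-digit YYYYMMDDHHMMSS string in UTC; both are assumptions here, and the real methods also have to deal with DB-specific timestamp formats.

```python
from datetime import datetime, timezone

# Single sentinel for "never expires", as named in this change.
INFINITY_EXPIRY = "99991231235959"
_DB_FORMAT = "%Y%m%d%H%M%S"


def encode_db_expiry(exptime: int) -> str:
    """Convert a UNIX expiry time (0 = indefinite, by assumption) to a DB value."""
    if exptime == 0:
        return INFINITY_EXPIRY
    return datetime.fromtimestamp(exptime, tz=timezone.utc).strftime(_DB_FORMAT)


def decode_db_expiry(db_expiry: str) -> int:
    """Convert a DB value back to a UNIX expiry time (0 = indefinite)."""
    if db_expiry == INFINITY_EXPIRY:
        return 0
    parsed = datetime.strptime(db_expiry, _DB_FORMAT).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())
```

Keeping the sentinel in one place avoids having the "infinity" value depend on the current timestamp.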
Change-Id: Ic78bca1fa0234468c30e17ec603565636348b0de
Remove WRITE_SYNC flag from ChronologyProtector since the current
plan is to simply use a datacenter-local storage cluster.
Move the touched timestamps into the same stash key that holds the
replication positions. Update the ChronologyProtector::getTouched()
comments.
Also:
* Use $wgMainCacheType as a $wgChronologyProtectorStash fallback
since the main stash will be 'db-replicated' for most sites.
* Remove HashBagOStuff default for position store since that can
result in timeouts waiting on a write position index to appear
since the data does not actually persist across requests.
* Rename ChronologyProtector::saveSessionReplicationPosition()
since it does not actually save replication positions to storage.
* Make ChronologyProtector::getTouched() check the "enabled" field.
* Allow mocking the current time in ChronologyProtector.
* Mark some internal methods with @internal.
* Migrate various comments from $wgMainStash to BagOStuff.
* Update some other ObjectCache related comments.
Bug: T254634
Change-Id: I0456f5d40a558122a1b50baf4ab400c5cf0b623d
Make the method accept either an integer-indexed list of keys or a map
of keys to the send/receive payload size information tuple.
Make sure that no BagOStuff subclasses are passing in $keys directly
from the call to the public BagOStuff methods; these allow keys in the
form of a list or a map of arbitrary strings to keys.
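To illustrate the two accepted input shapes, here is a small Python sketch. The `normalize_key_info` name and the use of `None` for unknown sizes are hypothetical. It also shows why a map of arbitrary strings to keys must be reduced to its values before being passed in: otherwise the labels would be mistaken for keys.

```python
from typing import Dict, List, Mapping, Optional, Tuple, Union

SizeInfo = Tuple[Optional[int], Optional[int]]  # (bytes sent, bytes received)


def normalize_key_info(
    keys: Union[List[str], Mapping[str, SizeInfo]]
) -> Dict[str, SizeInfo]:
    """Accept a plain list of keys, or a map of key -> (sent, received) sizes."""
    if isinstance(keys, Mapping):
        return {key: (sent, received) for key, (sent, received) in keys.items()}
    # Plain list: the payload sizes are not known.
    return {key: (None, None) for key in keys}


# A caller holding a label -> key map passes only the values (the real keys).
labelled = {"first": "enwiki:foo:1", "second": "enwiki:foo:2"}
from_labels = normalize_key_info(list(labelled.values()))
```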
Change-Id: I9687d25a4dd1c7b4b304f9fd543cc0a26a595962
This is a micro-optimization of closure code to avoid binding the closure
to $this where it is not needed.
Created by I25a17fb22b6b669e817317a0f45051ae9c608208
Change-Id: I0ffc6200f6c6693d78a3151cb8cea7dce7c21653
Update SQL, REST, and redis subclasses to emit call count and
payload size metrics for cache key operations. These metrics
are bucketed by cache key collection (similar to WANCache).
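One way to picture the bucketing, as a Python sketch. It assumes keys of the form `<keyspace>:<collection>:<rest>` and uses in-memory counters in place of a real stats client; the `KeyMetrics` class and the "UNKNOWN" fallback bucket are hypothetical.

```python
from collections import Counter
from typing import Mapping, Tuple


def key_collection(key: str) -> str:
    """Return the collection segment of a key, or a fallback bucket name."""
    parts = key.split(":", 2)
    return parts[1] if len(parts) >= 2 else "UNKNOWN"


class KeyMetrics:
    """Count calls and payload bytes per (operation, key collection)."""

    def __init__(self) -> None:
        self.calls: Counter = Counter()
        self.bytes_sent: Counter = Counter()
        self.bytes_received: Counter = Counter()

    def record(self, op: str, key_sizes: Mapping[str, Tuple[int, int]]) -> None:
        for key, (sent, received) in key_sizes.items():
            bucket = (op, key_collection(key))
            self.calls[bucket] += 1
            self.bytes_sent[bucket] += sent
            self.bytes_received[bucket] += received
```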
Bug: T235705
Change-Id: Icaa3fa1ae9c8b0f664c26ce70b7e1c4fc5f92767
For Postgres:
- Drop Unique constraint on `keyname` and make primary key
- Change type of `value` from BYTEA to TEXT and drop its default
- Make `value` nullable to sync with MySQL/SQLite
MySQL:
- Change exptime from DATETIME to TIMESTAMP
MySQL/SQLite:
- Make 'exptime' not nullable
Bug: T230428
Bug: T164898
Change-Id: Iab9de8a1bb2cb01b6e3e69e66f1bbe089d53d0a7
This saves a few bytes in the response size and makes it easy
for memcached proxies to distinguish key fetches that are part
of check-and-set cycles from those that are not.
MediumSpecificBagOStuff now requires PASS_BY_REF to fetch CAS
tokens. BagOStuff::merge() and WinCacheBagOStuff::doCas() are
the only callers that need this mode.
Bug: T257003
Change-Id: If91963f58adc4cda94f6d634ee0252a479a0fc5e
Clean up the recursive DB dependency mitigation logic by having
ServiceContainer detect recursion and throw an appropriate error.
Catch the error and use EmptyBagOStuff in such cases. This works
better than checking getQoS() since that begs the question by
requiring the cache instance to begin with.
Also add support for using different LoadBalancer instances for
local and global keys in SqlBagOStuff. This makes it easier to
share keys between projects.
Bug: T229062
Change-Id: Ib8ec1845bcf1b86cbb3bededa0ca7621a0ca293a
Follows-up 746d67f5fc which implicitly caused the APCUBagOStuff
object to no longer have a wiki-dependent keyspace. This meant
that all cache keys were shared across wikis, even if they used
makeKey() instead of makeGlobalKey() because the keyspace was
not defined (e.g. it was the string "local" for all wikis, instead
of a string like "enwiki").
Bug: T247562
Change-Id: I469e107a54aae91b8782a4dd9a2f1390ab9df2e5
This was added in ce84590988e, and is no longer needed as
of 10dce13709.
Also remove the comment that announces deprecation/removal.
Given that newFromParams is used as a callback in wgObjectCaches,
that might not be feasible. We can keep supporting it as an
optional parameter for now for uses in tests and for uses
outside configuration (e.g ServiceWiring), so that best
practices can be followed where they make sense, but still
allow bypass for config use case, since that would only ever
inject the One True Config object anyway.
Change-Id: I8cc4bfb1862b81df2c31fdc0886364b092636cc2
This bypasses the indirection of global wgObjectCaches config and
ObjectCache::newFromParams static methods, which don't do anything
in practice.
This solves the ExtensionRegistry use-case where it couldn't use
the service container, which in turn opens the path for
being able to deprecate absence of Config being passed to
ObjectCache::newFromParams(), which isn't possible right now because
ExtensionRegistry depended on being able to call it without that,
and that is a hard requirement because ExtensionRegistry isn't
allowed to use the service container.
Change-Id: Ic88da279662f33d3585cb2232358d6faf590b5b3
* Avoid direct $GLOBALS lookups.
* Avoid MediaWikiServices singleton for Config object.
Also given that newFromParams() constructs its Logger object
unconditionally as of I2aa835c5ec79, stop creating it ahead of
time in ServiceWiring.
This code, including the default loggroup value, was duplicated by
me from ObjectCache.php to ServiceWiring.php in commits 3828558140
and bd16c5eb34 because I needed to obtain the Logger object
in order to send a message to it about how it was created.
Solve this debt by letting ServiceWiring access the actual logger
that was created by newFromParams() through a new getLogger()
method.
Change-Id: Ib2de7b22f6c79db0c700fae06269d04fbe27831d
Apply the same duplication logging and asyncHandler defaults for
factory-based entries as constructor-based entries.
Change-Id: I2aa835c5ec7932432d2c739ffa761a7bd9c21198
Introduced in c387e5c together with the method `getSeparateMainLB()`.
That method, and some uses of the member, were removed in 99c80a8, but
the declaration of the class member above wasn't cleaned up.
Change-Id: Icfd382de782ec8793be4894c5aa06320a97689d0