objectcache: add size metrics to WANObjectCache::getWithSetCallback()

Track the serialized value size, in bytes/second, written by cache
backfills for each key class. Emitting both a Count metric and a Timing
metric from checkAndSetCooloff() also permits calculating bytes/call.

Bug: T248962
Change-Id: I324e29e85bc4df7689bd2d5fb45cf8750b92a8d9
Aaron Schulz 2020-04-09 18:47:18 -07:00 committed by Krinkle
parent ee2b258210
commit f7141fa09f
2 changed files with 55 additions and 46 deletions


@@ -25,8 +25,9 @@ Call counter from `WANObjectCache::getWithSetCallback()`.
#### `wanobjectcache.{kClass}.regen_set_delay`
Upon cache miss, this measures the time spent in `WANObjectCache::getWithSetCallback()`,
from the start of the method to right after the new value has been computed by the callback.
Upon cache update due to a cache miss, this measures the time spent in
`WANObjectCache::getWithSetCallback()`, from the start of the method to right after
the new value has been computed by the callback.
This essentially measures the whole method (including retrieval of any old value,
validation, any locks for `lockTSE`, and the callbacks), except for the time spent
@@ -35,10 +36,19 @@ in sending the value to the backend server.
* Type: Measure (in milliseconds).
* Variable `kClass`: The first part of your cache key.
#### `wanobjectcache.{kClass}.regen_set_bytes`
Upon cache update due to a cache miss, this estimates the size of the new value
sent from `WANObjectCache::getWithSetCallback()`.
* Type: Counter (in bytes).
* Variable `kClass`: The first part of your cache key.
#### `wanobjectcache.{kClass}.regen_walltime`
Upon cache miss, this measures the time spent in `WANObjectCache::getWithSetCallback()`
from the start of the callback to right after the new value has been computed.
Upon cache update due to a cache miss, this measures the time spent in
`WANObjectCache::getWithSetCallback()` from the start of the callback to
right after the new value has been computed.
* Type: Measure (in milliseconds).
* Variable `kClass`: The first part of your cache key.
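
The commit message notes that emitting `regen_set_bytes` together with `regen_set_delay` allows bytes/call to be derived. A minimal sketch of that arithmetic follows. It assumes the metrics backend exposes the summed byte counter and the number of timing samples for an interval; the function names `averageBytesPerSet()` and `bytesPerSecond()` are hypothetical and are not part of MediaWiki.

```php
<?php
/**
 * Hypothetical helper (not part of MediaWiki): average bytes per cache write
 * for one key class over some aggregation interval.
 *
 * @param int $totalBytes Sum of regen_set_bytes over the interval
 * @param int $setCount Number of regen_set_delay samples over the interval
 * @return float|null Average bytes per write, or null if nothing was written
 */
function averageBytesPerSet( int $totalBytes, int $setCount ): ?float {
	if ( $setCount <= 0 ) {
		// No writes were recorded, so the average is undefined
		return null;
	}
	return $totalBytes / $setCount;
}

/**
 * Hypothetical helper (not part of MediaWiki): write rate in bytes/second.
 *
 * @param int $totalBytes Sum of regen_set_bytes over the interval
 * @param float $intervalSeconds Length of the aggregation interval
 * @return float|null Bytes per second, or null for an empty interval
 */
function bytesPerSecond( int $totalBytes, float $intervalSeconds ): ?float {
	if ( $intervalSeconds <= 0 ) {
		return null;
	}
	return $totalBytes / $intervalSeconds;
}
```

For instance, 30000 bytes recorded across 3 timing samples averages to 10000 bytes per write. The sample count is usable as a call count only because both metrics are emitted at the same point in `checkAndSetCooloff()`.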


@@ -1665,58 +1665,57 @@ class WANObjectCache implements
* @param string $key
* @param string $kClass
* @param mixed $value The regenerated value
* @param float $elapsed Seconds spent regenerating the value
* @param bool $hasLock
* @param float $elapsed Seconds spent fetching, validating, and regenerating the value
* @param bool $hasLock Whether this thread has an exclusive regeneration lock
* @return bool Whether it is OK to proceed with a key set operation
*/
private function checkAndSetCooloff( $key, $kClass, $value, $elapsed, $hasLock ) {
$this->stats->timing( "wanobjectcache.$kClass.regen_set_delay", 1e3 * $elapsed );
if ( $hasLock ) {
// No concurrent I/O risk due to mutex
return true;
}
// Serialized value size or estimate
$valueKey = $this->makeSisterKey( $key, self::$TYPE_VALUE );
list( $estimatedSize ) = $this->cache->setNewPreparedValues( [ $valueKey => $value ] );
// Suppose that this cache key is very popular (KEY_HIGH_QPS reads/second).
// After eviction, there will be cache misses until it gets regenerated and saved.
// If the time window when the key is missing lasts less than one second, then the
// number of misses will not reach KEY_HIGH_QPS. This window largely corresponds to
// the key regeneration time. Estimate the count/rate of cache misses, e.g.:
// - 100 QPS, 20ms regeneration => ~2 misses (< 1s)
// - 100 QPS, 100ms regeneration => ~10 misses (< 1s)
// - 100 QPS, 3000ms regeneration => ~300 misses (100/s for 3s)
$missesPerSecForHighQPS = ( min( $elapsed, 1 ) * $this->keyHighQps );
// Determine whether there is enough I/O stampede risk to justify throttling set().
// Estimate the unthrottled set() overhead, as bps, from miss count/rate and value size,
// comparing it to the preferred per-key uplink bps limit (KEY_HIGH_UPLINK_BPS), e.g.:
// - 2 misses (< 1s), 10KB value, 1250000 bps limit => 160000 bits (low risk)
// - 2 misses (< 1s), 100KB value, 1250000 bps limit => 1600000 bits (high risk)
// - 10 misses (< 1s), 10KB value, 1250000 bps limit => 800000 bits (low risk)
// - 10 misses (< 1s), 100KB value, 1250000 bps limit => 8000000 bits (high risk)
// - 300 misses (100/s), 1KB value, 1250000 bps limit => 800000 bits/s (low risk)
// - 300 misses (100/s), 10KB value, 1250000 bps limit => 8000000 bits/s (high risk)
// - 300 misses (100/s), 100KB value, 1250000 bps limit => 80000000 bits/s (high risk)
if ( ( $missesPerSecForHighQPS * $estimatedSize ) >= $this->keyHighUplinkBps ) {
$this->cache->clearLastError();
if (
!$this->cache->add(
$this->makeSisterKey( $key, self::$TYPE_COOLOFF ),
1,
self::$COOLOFF_TTL
) &&
// Don't treat failures due to I/O errors as the key being in cooloff
$this->cache->getLastError() === BagOStuff::ERR_NONE
) {
$this->stats->increment( "wanobjectcache.$kClass.cooloff_bounce" );
if ( !$hasLock ) {
// Suppose that this cache key is very popular (KEY_HIGH_QPS reads/second).
// After eviction, there will be cache misses until it gets regenerated and saved.
// If the time window when the key is missing lasts less than one second, then the
// number of misses will not reach KEY_HIGH_QPS. This window largely corresponds to
// the key regeneration time. Estimate the count/rate of cache misses, e.g.:
// - 100 QPS, 20ms regeneration => ~2 misses (< 1s)
// - 100 QPS, 100ms regeneration => ~10 misses (< 1s)
// - 100 QPS, 3000ms regeneration => ~300 misses (100/s for 3s)
$missesPerSecForHighQPS = ( min( $elapsed, 1 ) * $this->keyHighQps );
return false;
// Determine whether there is enough I/O stampede risk to justify throttling set().
// Estimate unthrottled set() overhead, as bps, from miss count/rate and value size,
// comparing it to the per-key uplink bps limit (KEY_HIGH_UPLINK_BPS), e.g.:
// - 2 misses (< 1s), 10KB value, 1250000 bps limit => 160000 bits (low risk)
// - 2 misses (< 1s), 100KB value, 1250000 bps limit => 1600000 bits (high risk)
// - 10 misses (< 1s), 10KB value, 1250000 bps limit => 800000 bits (low risk)
// - 10 misses (< 1s), 100KB value, 1250000 bps limit => 8000000 bits (high risk)
// - 300 misses (100/s), 1KB value, 1250000 bps limit => 800000 bps (low risk)
// - 300 misses (100/s), 10KB value, 1250000 bps limit => 8000000 bps (high risk)
// - 300 misses (100/s), 100KB value, 1250000 bps limit => 80000000 bps (high risk)
if ( ( $missesPerSecForHighQPS * $estimatedSize ) >= $this->keyHighUplinkBps ) {
$this->cache->clearLastError();
if (
!$this->cache->add(
$this->makeSisterKey( $key, self::$TYPE_COOLOFF ),
1,
self::$COOLOFF_TTL
) &&
// Don't treat failures due to I/O errors as the key being in cooloff
$this->cache->getLastError() === BagOStuff::ERR_NONE
) {
$this->stats->increment( "wanobjectcache.$kClass.cooloff_bounce" );
return false;
}
}
}
// Corresponding metrics for cache writes that actually get sent over the wire
$this->stats->timing( "wanobjectcache.$kClass.regen_set_delay", 1e3 * $elapsed );
$this->stats->updateCount( "wanobjectcache.$kClass.regen_set_bytes", $estimatedSize );
return true;
}
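
The throttling check in `checkAndSetCooloff()` can be read in isolation as a pure function. The sketch below copies only the comparison from the diff above; the function name `exceedsUplinkLimit()` and its parameter list are hypothetical, and the limit is passed in rather than read from `$this->keyHighUplinkBps`. It makes no claim about the units of that limit beyond what the comparison itself does.

```php
<?php
/**
 * Hypothetical standalone version of the stampede-risk comparison in
 * checkAndSetCooloff(). Not part of MediaWiki.
 *
 * @param float $elapsed Seconds spent fetching, validating, and regenerating
 * @param int $estimatedSize Estimated serialized value size
 * @param float $keyHighQps Assumed read rate of a "popular" key
 * @param float $keyHighUplinkBps Per-key uplink limit, in the same unit as
 *  $estimatedSize per second
 * @return bool Whether the estimated set() traffic reaches the limit
 */
function exceedsUplinkLimit(
	float $elapsed,
	int $estimatedSize,
	float $keyHighQps,
	float $keyHighUplinkBps
): bool {
	// The miss window is capped at one second, so this is at most $keyHighQps
	$missesPerSecForHighQPS = min( $elapsed, 1 ) * $keyHighQps;

	// Same comparison as in the diff: estimated misses times value size
	return ( $missesPerSecForHighQPS * $estimatedSize ) >= $keyHighUplinkBps;
}
```

With a 20 ms regeneration at 100 QPS, about 2 misses are estimated, so a size of 10000 gives a product of 20000, which is below a limit of 1250000. With a 3 s regeneration the window is capped at one second, so a size of 100000 gives 10000000, which reaches that limit. In the real method a `true` result only triggers the cooloff `add()` attempt, and the set is skipped only if that `add()` fails without an I/O error.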