
%ExternalStore

%ExternalStore is an optional feature that enables persistent object storage outside the main database, primarily for revision text (also known as a "blob").

The main public interface for interacting with %ExternalStore is ExternalStoreAccess. Note, however, that higher-level concepts such as {@link MediaWiki\Revision\RevisionRecord} and text blobs have their own dedicated interfaces: {@link MediaWiki\Revision\RevisionStore} and {@link MediaWiki\Storage\BlobStore}.

Concepts

URL

Objects in external stores are internally identified by a special URL. The URL is of the form <store protocol>://<location>/<object name>.
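A minimal Python sketch of splitting such a URL into its three parts. It assumes that everything after the location is the object name. `ExternalStoreUrl` and `parse_external_store_url` are hypothetical names for illustration, not part of MediaWiki's (PHP) API.

```python
from typing import NamedTuple


class ExternalStoreUrl(NamedTuple):
    """Hypothetical container for the parts of <protocol>://<location>/<object name>."""
    protocol: str
    location: str
    object_name: str  # assumption: everything after the location, kept verbatim


def parse_external_store_url(url: str) -> ExternalStoreUrl:
    """Split an external store URL into protocol, location and object name."""
    protocol, sep, rest = url.partition("://")
    if not sep or not protocol:
        raise ValueError(f"missing protocol in {url!r}")
    location, sep, object_name = rest.partition("/")
    if not sep or not location or not object_name:
        raise ValueError(f"missing location or object name in {url!r}")
    return ExternalStoreUrl(protocol, location, object_name)
```

For example, `parse_external_store_url("DB://cluster1/12345")` yields protocol `DB`, location `cluster1` and object name `12345`. The cluster name and ID are made-up values.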

Protocol

The protocol represents which ExternalStoreMedium class is used. The following protocols are supported by default:

  • DB: ExternalStoreDB
  • http: ExternalStoreHttp
  • mwstore: ExternalStoreMwstore

Multiple protocols may be enabled at the same time, for example to support reading older data while using a different protocol for new data.

Protocols are configured via {@link $wgExternalStores}. The ExternalStoreMedium subclass for a protocol is chosen by appending the protocol name from $wgExternalStores to the string ExternalStore, with ucfirst() applied to the protocol name.

A custom protocol called "foobar" could be configured by implementing ExternalStoreMedium in a subclass called ExternalStoreFoobar.
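The naming rule can be illustrated with a short Python sketch. `medium_class_name` is a hypothetical helper. MediaWiki performs the real lookup in PHP.

```python
def medium_class_name(protocol: str) -> str:
    """Return the ExternalStoreMedium subclass name for a protocol.

    Follows the rule described above: "ExternalStore" + ucfirst(protocol).
    As with PHP's ucfirst(), only the first character is changed.
    """
    return "ExternalStore" + protocol[:1].upper() + protocol[1:]
```

With this rule the default protocols map to the classes listed earlier. For example, `"http"` gives `ExternalStoreHttp`, and a custom `"foobar"` protocol gives `ExternalStoreFoobar`.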

Location

The location identifies a particular instance of a given store protocol.

In the case of ExternalStoreDB, the location represents a database cluster (one or more database servers that hold the same data).

When using the default {@link Wikimedia::Rdbms::LBFactorySimple LBFactorySimple}, these clusters can be configured via {@link $wgExternalServers}. Otherwise, external clusters must be configured via {@link $wgLBFactoryConf}.

New insertions

The destination of newly stored text blobs is configured via {@link $wgDefaultExternalStore}. To enable use of %ExternalStore for new blobs, this must be set to a non-empty array. It can be left empty to store new blobs in the main database instead; this does not affect how existing blobs are read.

Each destination uses a partial URL of the form <store protocol>://<location>. When a blob is inserted, we randomly pick an available protocol/location pair from this list. If the chosen destination is unavailable, the insertion fails over to another default destination.
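The selection and fail-over behaviour can be sketched in Python under some assumptions. `insert_blob`, `StoreUnavailable` and the `store` callable are hypothetical stand-ins. The real logic lives in MediaWiki's PHP code and may differ in details, such as how read-only destinations are skipped or how errors are logged.

```python
import random
from typing import Callable, List, Optional


class StoreUnavailable(Exception):
    """Hypothetical error raised when a destination cannot accept writes."""


def insert_blob(
    data: bytes,
    destinations: List[str],
    store: Callable[[str, bytes], str],
    rng: Optional[random.Random] = None,
) -> str:
    """Store `data` at a randomly chosen destination, failing over to the rest.

    `destinations` holds partial URLs such as "DB://cluster1". `store` is a
    hypothetical callable that writes to one destination and returns the full
    URL of the new object, or raises StoreUnavailable.
    """
    if rng is None:
        rng = random.Random()
    remaining = list(destinations)
    last_error: Optional[Exception] = None
    while remaining:
        # Pick one of the not-yet-tried destinations at random.
        base_url = remaining.pop(rng.randrange(len(remaining)))
        try:
            return store(base_url, data)
        except StoreUnavailable as exc:
            last_error = exc  # remember the failure and try another destination
    raise StoreUnavailable("all default external stores failed") from last_error
```

Random choice spreads writes across clusters. Removing each failed destination from the candidate list guarantees the loop terminates after trying every destination at most once.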

Append-only

%ExternalStore is designed as an append-only system, to persist data in a way that is highly reliable and immutable. As such, the interface is restricted to fetch and insert operations, and specifically does not permit modification or deletion once data is stored.
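To illustrate the restricted interface, here is a hypothetical in-memory Python class that exposes only an insert operation (`store`) and `fetch`, using URLs of the form described above. The class name, the `memory` protocol string and the sequential numeric object names are assumptions for illustration only.

```python
import itertools
from typing import Dict, Optional


class AppendOnlyStore:
    """Hypothetical in-memory store that supports only insert and fetch."""

    def __init__(self, protocol: str, location: str) -> None:
        self._prefix = f"{protocol}://{location}/"
        self._blobs: Dict[str, bytes] = {}
        self._ids = itertools.count(1)  # object names are never reused

    def store(self, data: bytes) -> str:
        """Insert a new blob and return its URL; existing blobs are untouched."""
        url = self._prefix + str(next(self._ids))
        self._blobs[url] = bytes(data)  # keep an immutable copy
        return url

    def fetch(self, url: str) -> Optional[bytes]:
        """Return the blob stored at `url`, or None if it is unknown."""
        return self._blobs.get(url)

    # Deliberately no update() or delete(): once written, a URL always
    # resolves to the same bytes.
```

Because the class provides no way to modify a stored blob, any URL returned by `store()` keeps resolving to the same bytes.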

This design benefits MediaWiki in a number of ways:

  • The limited interface provides flexibility to each protocol implementation.
  • Caching is trivial and safe.
  • Stable references to external store objects can be kept outside of it (in the core database, and anywhere else in caching or other storage layers) without needing to track or propagate changes.
  • Historical data can be stored with high reliability guarantees and operational safety:
    • External database clusters may be operated in read-only mode, directly through MySQL.
    • Each replica within the cluster may operate as an independent static backup.
    • Database replication between hosts may be turned off.
    • Even command-line access from outside MediaWiki can't accidentally affect historical data.
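Stored objects never change, so a cache keyed by URL never needs invalidation. A minimal Python sketch of such a read-through cache follows. `CachingFetcher` and the `fetch` callable it wraps are hypothetical. Misses are not cached because a `None` result could come from a temporarily unreachable server rather than a missing blob.

```python
from typing import Callable, Dict, Optional


class CachingFetcher:
    """Hypothetical read-through cache in front of an append-only fetch."""

    def __init__(self, fetch: Callable[[str], Optional[bytes]]) -> None:
        self._fetch = fetch
        self._cache: Dict[str, bytes] = {}

    def get(self, url: str) -> Optional[bytes]:
        if url in self._cache:
            return self._cache[url]
        data = self._fetch(url)
        if data is not None:
            # Safe to keep indefinitely: the blob behind `url` can never change.
            self._cache[url] = data
        return data
```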

For maintenance tasks such as recompression, we generally iterate through known blobs, write new blobs as needed, and then gracefully update the pointers to them. Once an entire cluster has been copied or recompressed to a new location, the old cluster can be taken out of rotation and its storage space freed at that time. Note that multiple locations may be physically colocated on the same hardware, e.g. by running multiple instances of MySQL. That said, it may be simpler to free space by doing recompression during other routine maintenance, such as when migrating data from old to new hardware.
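The copy-then-repoint workflow can be sketched in Python. The `pointers` dictionary is a hypothetical stand-in for the references held in the core database. `fetch`, `store_new` and `transform` are assumed callables, with `transform` standing in for recompression. Old blobs are never modified. They simply stop being referenced, so the old cluster can later be retired.

```python
from typing import Callable, Dict


def migrate_blobs(
    pointers: Dict[int, str],
    fetch: Callable[[str], bytes],
    store_new: Callable[[bytes], str],
    old_prefix: str,
    transform: Callable[[bytes], bytes] = lambda b: b,
) -> int:
    """Copy every blob under `old_prefix` to a new location and repoint to it.

    Returns the number of pointers that were updated.
    """
    migrated = 0
    for row_id, url in list(pointers.items()):
        if not url.startswith(old_prefix):
            continue  # references to other locations are left alone
        new_url = store_new(transform(fetch(url)))  # write the new blob first...
        pointers[row_id] = new_url  # ...then update the pointer to it
        migrated += 1
    return migrated
```

Writing the new blob before updating the pointer means that, if the process stops midway, every pointer still refers to a readable blob.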