wiki.techinc.nl/includes/filerepo
Tim Starling b4849e03b7 Use the unserialized form of image metadata internally
Image metadata is usually a serialized string representing an array.
Passing the string around internally and having everything unserialize
it is an awkward convention.

Also, many image handlers were reading the file twice: once for
getMetadata() and again for getImageSize(). Often getMetadata()
would actually read the width and height and then throw it away.

So, in filerepo:

* Add File::getMetadataItem(), which promises to allow partial
  loading of metadata per my proposal on T275268 in a future commit.
* Add File::getMetadataArray(), which returns the unserialized array.
  Some file handlers were returning non-serializable strings from
  getMetadata(), so I gave them a legacy array form ['_error' => ...]
* Changed MWFileProps to return the array form of metadata.
* Deprecate the weird File::getImageSize(). It was apparently not
  called by anything, but was overridden by UnregisteredLocalFile.
* Wrap serialize/unserialize with File::getMetadataForDb() and
  File::loadMetadataFromDb() in preparation for T275268.

In MediaHandler:

* Merged MediaHandler::getImageSize() and MediaHandler::getMetadata()
  into getSizeAndMetadata(). Deprecated the old methods.
* Instead of isMetadataValid() we now have isFileMetadataValid(), which
  only gets a File object, so it can decide what data it needs to load.
* Simplified getPageDimensions() by having it return false for non-paged
  media. It was not called in that case, but was implemented anyway.

In specific handlers:

* Rename DjVuHandler::getUnserializedMetadata() and
  extractTreesFromMetadata() for clarity. "Metadata" in these function
  names meant an XML string.
* Updated DjVuImage::getImageSize() to provide image sizes in the new
  style.
* In ExifBitmapHandler, getRotationForExif() now takes just the
  Orientation tag, rather than a serialized string. Also renamed for
  clarity.
* In GIFMetadataExtractor, return the width, height and bits per channel
  instead of throwing them away. There was some conflation in
  decodeBPP() which I picked apart. Refer to GIF89a section 18.
* In JpegMetadataExtractor, process the SOF0/SOF2 segment to extract
  bits per channel, width, height and components (channel count). This
  is essentially a port of PHP's getimagesize(), so should be bugwards
  compatible.
* In PNGMetadataExtractor, return the width and height, which were
  previously assigned to unused local variables. I verified the
  implementation by referring to the specification.
* In SvgHandler, retain the version validation from unpackMetadata(),
  but rename the function since it now takes an array as input.

In tests:

* In ExifBitmapTest, refactored some tests by using a provider.
* In GIFHandlerTest and PNGHandlerTest, I removed the tests in which
  getMetadata() returns null, since it doesn't make sense when ported to
  getMetadataArray(). I added tests for empty arrays instead.
* In tests, I retained serialization of input data since I figure it's
  useful to confirm that existing database rows will continue to be read
  correctly. I removed serialization of expected values, replacing them
  with plain data.
* In tests, I replaced access to private class constants like
  BROKEN_FILE with string literals, since stability is essential. If
  the class constant changes, the test should fail.

Elsewhere:

* In maintenance/refreshImageMetadata.php, I removed the check for
  shrinking image metadata, since it's not easy to implement and is
  not future compatible. Image metadata is expected to shrink in
  future.

Bug: T275268
Change-Id: I039785d5b6439d71dcc21dcb972177dba5c3a67d
2021-06-08 17:04:01 +10:00
..
file Use the unserialized form of image metadata internally 2021-06-08 17:04:01 +10:00
Hook Add list of thumbnail urls to LocalFilePurgeThumbnails hook 2020-12-05 21:58:43 +00:00
FileBackendDBRepoWrapper.php Replace uses of DB_MASTER with DB_PRIMARY 2021-04-29 09:24:31 -07:00
FileRepo.php FileRepo::findFile - support Authority 2021-05-26 19:01:12 -07:00
ForeignAPIRepo.php FileRepo: replace Title in method signatures 2021-04-16 11:01:48 +00:00
ForeignDBRepo.php Replace uses of DB_MASTER with DB_PRIMARY 2021-04-29 09:24:31 -07:00
ForeignDBViaLBRepo.php Replace uses of DB_MASTER with DB_PRIMARY 2021-04-29 09:24:31 -07:00
LocalRepo.php FileRepo::findFile - support Authority 2021-05-26 19:01:12 -07:00
NullRepo.php Set method visibility for various constructors 2019-12-03 20:17:30 +01:00
README.md docs: Remove ingroup tag from Markdown files 2019-11-12 16:11:30 -08:00
RepoGroup.php Merge "FileRepo::findFile - support Authority" 2021-06-01 15:24:39 +00:00
TempFileRepo.php Move TempFileRepo to a separate file 2016-12-18 05:52:49 +00:00

FileRepo Architecture

Some quick notes on the file/repository architecture.

Functionality is, as always, driven by data model.

  • The repository object stores configuration information about a file storage method.

  • The file object is a process-local cache of information about a particular file.

Thus the file object is the primary public entry point for obtaining information about files, since access via the file object can be cached, whereas access via the repository should not be cached.

Functions which can act on any file specified in their parameters typically find their place either in the repository object, where reference to repository-specific configuration is needed, or in static members of File or FileRepo, where no such configuration is needed.

File objects are generated by a factory function from the repository. The repository thus has full control over the behavior of its subsidiary file class, since it can subclass the file class and override functionality at its whim. Thus there is no need for the File subclass to query its parent repository for information about repository-class-dependent behavior -- the file subclass is generally fully aware of the static preferences of its repository. Limited exceptions can be made to this rule to permit sharing of functions, or perhaps even entire classes, between repositories.

These rules alone still do lead to some ambiguity -- it may not be clear whether to implement some functionality in a repository function with a filename parameter, or in the file object itself.

So we introduce the following rule: the file subclass is smarter than the repository subclass. The repository should in general provide a minimal API needed to access the storage backend efficiently.

In particular, note that I have not implemented any database access in LocalRepo.php. LocalRepo provides only file access, and LocalFile provides database access and higher-level functions such as cache management.

Tim Starling, June 2007