Image metadata is usually a serialized string representing an array.
Passing the string around internally and having everything unserialize
it is an awkward convention.
Also, many image handlers were reading the file twice: once for
getMetadata() and again for getImageSize(). Often getMetadata()
would actually read the width and height and then throw it away.
So, in filerepo:
* Add File::getMetadataItem(), which promises to allow partial
loading of metadata per my proposal on T275268 in a future commit.
* Add File::getMetadataArray(), which returns the unserialized array.
Some file handlers were returning non-serializable strings from
getMetadata(), so I gave them a legacy array form ['_error' => ...]
* Changed MWFileProps to return the array form of metadata.
* Deprecate the weird File::getImageSize(). It was apparently not
called by anything, but was overridden by UnregisteredLocalFile.
* Wrap serialize/unserialize with File::getMetadataForDb() and
File::loadMetadataFromDb() in preparation for T275268.
In MediaHandler:
* Merged MediaHandler::getImageSize() and MediaHandler::getMetadata()
into getSizeAndMetadata(). Deprecated the old methods.
* Instead of isMetadataValid() we now have isFileMetadataValid(), which
only gets a File object, so it can decide what data it needs to load.
* Simplified getPageDimensions() by having it return false for non-paged
media. It was not called in that case, but was implemented anyway.
In specific handlers:
* Rename DjVuHandler::getUnserializedMetadata() and
extractTreesFromMetadata() for clarity. "Metadata" in these function
names meant an XML string.
* Updated DjVuImage::getImageSize() to provide image sizes in the new
style.
* In ExifBitmapHandler, getRotationForExif() now takes just the
Orientation tag, rather than a serialized string. Also renamed for
clarity.
* In GIFMetadataExtractor, return the width, height and bits per channel
instead of throwing them away. There was some conflation in
decodeBPP() which I picked apart. Refer to GIF89a section 18.
* In JpegMetadataExtractor, process the SOF0/SOF2 segment to extract
bits per channel, width, height and components (channel count). This
is essentially a port of PHP's getimagesize(), so should be bugwards
compatible.
* In PNGMetadataExtractor, return the width and height, which were
previously assigned to unused local variables. I verified the
implementation by referring to the specification.
* In SvgHandler, retain the version validation from unpackMetadata(),
but rename the function since it now takes an array as input.
In tests:
* In ExifBitmapTest, refactored some tests by using a provider.
* In GIFHandlerTest and PNGHandlerTest, I removed the tests in which
getMetadata() returns null, since it doesn't make sense when ported to
getMetadataArray(). I added tests for empty arrays instead.
* In tests, I retained serialization of input data since I figure it's
useful to confirm that existing database rows will continue to be read
correctly. I removed serialization of expected values, replacing them
with plain data.
* In tests, I replaced access to private class constants like
BROKEN_FILE with string literals, since stability is essential. If
the class constant changes, the test should fail.
Elsewhere:
* In maintenance/refreshImageMetadata.php, I removed the check for
shrinking image metadata, since it's not easy to implement and is
not future compatible. Image metadata is expected to shrink in
future.
Bug: T275268
Change-Id: I039785d5b6439d71dcc21dcb972177dba5c3a67d
A terminating line break has not been required in wfDebug() since 2014,
however no migration was done. Some of these line breaks found their way
into LoggerInterface::debug() calls, where they mess up the formatting
of the debug log.
So, remove terminating line breaks from wfDebug() and
LoggerInterface::debug() calls.
Also:
* Fix the stripping of leading line breaks from the log header emitted
by Setup.php. This feature, accidentally broken in 2014, allows
requests to be distinguished in the log file.
* Avoid using the global variable $self.
* Move the logging of the client IP back to Setup.php. It was moved to
WebRequest in the hopes that it would not always be needed, however
$wgRequest->getIP() is now called unconditionally a few lines up in
Setup.php. This means that it is put in its proper place after the
"start request" message.
* Wrap the log header code in a closure so that variables like $name do
not leak into global scope.
* In Linker.php, remove a few instances of an unnecessary second
parameter to wfDebug().
Change-Id: I96651d3044a95b9d210b51cb8368edc76bebbb9e
Scalar casts are still allowed (for now), because there's a huge amount
of false positives. Ditto for invalid array offsets.
Thoughts about the rest: luckily, many false positives with array offsets
have gone. Moreover, since *Internal issues are suppressed in the base
config, we can remove inline suppressions.
Unfortunately, there are a couple of new issues about array additions
with only false positives, because apparently they don't take
branches into account.
Change-Id: I5a3913c6e762f77bfdae55051a395fae95d1f841
All of these suppression prevent the detection of many common mistakes,
and could easily prevent things like T231488. Especially if there are
few issues of a given type, it's way better to suppress them inline,
instead of disabling them for the whole core.
This patch only touches the one with a lower count (although those
counts may be out of date).
Bug: T231636
Change-Id: Ica50297ec7c71a81ba2204f9763499da925067bd
This allows us to populate X-Content-Dimensions without touching the
existing metadata format. Which makes the migration of existing content a lot faster by
only having to run refreshFileHeaders.
Bug: T150741
Change-Id: I2c0f39b2b01f364c3fab997ccc2f874b7f101d8a
For storage repos that support headers (such as Swift), this will store the original
media dimensions as an extra custom header, X-Content-Dimensions.
The header is formatted to minimize its length when dealing with multipage
documents, by expressing the information as page ranges keyed by dimensions.
Example for a multipage documents with some pages of different sizes:
X-Content-Dimensions: 1903x899:1-9,11/1903x873:10
Example for a single page document:
X-Content-Dimensions: 800x600:1
Bug: T150741
Change-Id: Ic4c6a86557b3705cf75d074753e9ce2ee070a6df
I searched for /\$(\S+) = (.+?\(.*?\);)\n.*?\$\1\[/, ignored
everything involving isset(), unset() or array assigments, then
skimmed through the remaining results and changed things where they
made sense. These changes were not automated, so please review them.
Change-Id: Ib37b4c66fc57648470f151ad412210b3629c2538
wfSuppressWarnings() and wfRestoreWarnings() were split out into a
separate library. All usages in core were replaced with the new
functions, and the wf* global functions are marked as deprecated.
Additionally, some uses of @ were replaced due to composer's autoloader
being loaded even earlier.
Ie1234f8c12693408de9b94bf6f84480a90bd4f8e adds the library to
mediawiki/vendor.
Bug: T100923
Change-Id: I5c35079a0a656180852be0ae6b1262d40f6534c4
This drops support for the custom utf8 normal PHP extension in favor
of the intl extension.
Bug: T90825
Change-Id: Ifbaeb2ef684217cf6187ccc4fb4d303f89608300
The Line continuation Coding conventions prefers the closing parenthesis
on the same line than the beginning curly braces. This is done for ifs
and functions.
Also move some boolean operator from the end of a line to the beginning
and changed some indentation to make the condition hopefully better
readable.
Change-Id: Id0437b06bde86eb5a75bc59eefa19e7edb624426
Added spaces after/before parenthesis
Removed unneeded parenthesis around some statements
Broke a long line
Change-Id: I7fbe129f7bbf524dd0598ece2a9708643f08453b
And added/removed spaces around some other tokens,
like +, -, *, /, <, >, =, !
Fixed windows newline style
Change-Id: I0b9c8c408f3f6bfc0d685a074d7ec468fb848fc8
Added/removed spaces around logical/arithmetic operator
Reduced multiple empty lines to one empty line
Removed wrong tabs before comments at end of line
Removed too many spaces in assigments
Change-Id: I2bba4e72f9b5f88c53324d7b70e6042f1aad8f6b
I created a new wrapper around unpack - wfUnpack which throws an exception if it runs out of input
(but i didn't use that on GIFMetadataExtractor/PNGMetadataExtractor because those files have a header saying that they weren't external dependencies minimized for potential re-users. I don't think anyone actually re-uses those files, but I didn't want to add a dependency on wfUnpack just in case).
I also changed fopen( blah, "r" ) -> fopen( blah, "rb" ) since these are binary formats, so we don't want newlines converted on windows.
# Some really obscure Exif properties did not have the Exif byte order taken into account
and were being extracted with the bytes reversed (for example user comment when encoded as utf-16).
Not a major issue as these properties are very rare in practise, but certainly not a good thing.
( One would think php's exif support would take care of that, but no it does not...)
# Change the fallback encoding for Gif comments to be windows-1252 instead of iso 8859-1. More
to be consitent with jpg and iptc then anything else.
Hope I did this in an ok fashion. svn merge --re-integrate was giving me issues
so I just essentially over-wrote my working copy with the version at img_metadata.
There was a missing parameter in GIFMetadataExtractor's skipBlock() call for 'netscape 2.0' data blocks, which threw a monkey in the works.
Now also checking for exceptions thrown by the metadata load and stubbing out null metadata for files which can't be read, rather than letting the exception bubble up and kill MediaWiki. :)
* Fast failure when inconsistencies are detected.
* Skip non-GCF, non-application extensions too.
* Fix error where seeking was not being done properly for non-netscape application extensions.
* Creates a new media handler for GIF files, and extracts metadata such as duration, whether or not the GIF is looped, and the number of frames.
* Uses the extracted metadata for the long file description.
* Considers number of frames in the calculation of image area (multiplies by width and height to get the "time-area", or so to speak).
After this patch is deployed, the work-around to disable GIF scaling on Wikimedia sites can be eliminated.