Commit graph

889 commits

Author SHA1 Message Date
Umherirrender
0bd6c0f180 media: Add missing false return types to doc
Change-Id: Ieefa5bbfddb803ceb41196725c1f344f175c10c3
2021-10-16 21:49:29 +00:00
jenkins-bot
ae5c51f354 Merge "Use Message::sizeParams to simplify code when building messages" 2021-10-15 23:21:38 +00:00
Umherirrender
02c0e8b8e7 Use Message::sizeParams to simplify code when building messages
Change-Id: Ic04d4dea86e61fb07b2a3b17acb6021fab6ae5ee
2021-10-13 19:52:41 +00:00
Kunal Mehta
a7c90ecc5f Allow skipping $wgMaxImageArea check
If $wgMaxImageArea is false, MediaWiki will no longer check if the image
fits within that area before trying to scale it. Extensions can still
use the BitmapHandlerCheckImageArea hook to override it.

This is primarily useful when using an external scaler like Wikimedia
does with Thumbor, which decides whether it can scale images by using a
timeout rather than based on size.

Codesearch indicates that the only extension checking this setting is
PagedTiffHandler, which will be updated in Iefa67321d07f7.

Bug: T291014
Depends-On: Iefa67321d07f79d982388231e02e87e2f18aed40
Change-Id: Id10173bbddb32bc70e036f426369cfbea52cecf4
2021-09-23 15:33:37 -07:00
Amir Sarabadani
64752c0fc4 Drop $wgDjvuToXML
The software behind this is abandoned and we are migrating djvu metadata
to json instead.

This doesn't affect production as we already use djvudump

Bug: T275268
Change-Id: If45ae5746ba91ba305f93603dc1e3aafba80a369
2021-09-10 23:40:31 +02:00
Umherirrender
07b499fbcf build: Update mediawiki/mediawiki-phan-config to 0.11.0
Suppressions need to be added and removed together with the version
update.

Change-Id: I3288b3cefa744b507eadebb67b8ab08c86517c1c
2021-09-07 17:19:05 +02:00
Umherirrender
084f2fedfd build: Enable phan plugin UseReturnValuePlugin
Suppress false positives

Bug: T240141
Change-Id: Ie356512ad76de465b1fda5b913fa30702339cb11
2021-08-26 21:08:19 +00:00
jenkins-bot
c039f2db81 Merge "Simplify loops over array_keys" 2021-08-24 13:19:38 +00:00
jenkins-bot
40c423d441 Merge "Simplify if-then-else-return statements with explicit true/false" 2021-08-24 11:44:23 +00:00
jenkins-bot
916c0307a0 Merge "Remove unneeded continue/return statements" 2021-08-18 09:10:19 +00:00
Umherirrender
220fd020c4 Simplify if-then-else-return statements with explicit true/false
When both branches return a bool, use the condition as the return value.

Change-Id: I59416aa021d0ada77d84fda4aaf7def0eea54009
2021-08-17 23:19:04 +02:00
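Illustration (not part of the commit): a minimal sketch of the refactoring pattern this message describes, written in Python rather than MediaWiki's PHP. The function names are hypothetical and exist only for this example.

```python
def is_landscape_before(width: int, height: int) -> bool:
    # Both branches return a literal boolean.
    if width > height:
        return True
    else:
        return False


def is_landscape_after(width: int, height: int) -> bool:
    # The condition already evaluates to the boolean we want.
    return width > height
```

Both functions behave identically; the second simply drops the redundant branching.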
Umherirrender
1d178e177b Remove unneeded continue/return statements
Change-Id: I26f9845b09ecc15de8b6e0213ab369b386194c9d
2021-08-17 22:53:53 +02:00
Umherirrender
244ea7c0b5 Simplify else-branches after continue/break
When the if branch continues the loop,
then the next branch does not need to be an else branch

Change-Id: Ia158709b7fd2ea811f1049cf8f53ed12c89719e3
2021-08-17 22:51:43 +02:00
Matěj Suchánek
5267cc09ea Simplify loops over array_keys
Use native PHP feature of iterating over key-value pairs
instead of looking up the value if it's used.

Change-Id: Id55f774b3a9d97463b97581c5b2ffe081489863a
2021-08-12 07:08:36 +00:00
jenkins-bot
5d1bca6a1d Merge "PNGMetadataExtractor: skip oversize chunks instead of aborting" 2021-07-29 05:12:11 +00:00
Matěj Suchánek
d71ff53639 Add missing spaces to imploded debug strings
Change-Id: I32d921aaa3a5799777ff62b35608cbedcfff907d
2021-07-28 11:07:17 +02:00
Tim Starling
d0d73ff1f9 PNGMetadataExtractor: skip oversize chunks instead of aborting
Bug: T286273
Change-Id: Iceaf92647e74a1e20f94fc36822d0735f70764dc
2021-07-28 14:14:22 +10:00
Tim Starling
2e507003ca Ignore invalid chunks in PNG files, instead of aborting metadata extraction
* Factor out file read errors and unexpected EOF errors.
* For errors relating to chunk content, instead of throwing an
  exception which is silently discarded, just log an error and continue
  to the next chunk. This allows the dimensions to be extracted even if
  other metadata is mangled.
* As an additional sanity check, verify the CRC of each chunk.

Bug: T286273
Change-Id: I11d0186496324e0bb1bb0a143f438e0368a8e902
2021-07-13 11:11:03 +10:00
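Illustration (not part of the commit): a rough Python sketch of the approach described above, assuming the standard PNG chunk layout (4-byte length, 4-byte type, data, 4-byte CRC over type and data). This is not the PNGMetadataExtractor code; `build_chunk` and `read_png_dimensions` are hypothetical names. File-level problems (bad signature, unexpected EOF) still raise, while a chunk with a bad CRC or malformed content is recorded and skipped so that the dimensions can still be returned.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def build_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialize one chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def read_png_dimensions(blob: bytes):
    """Return (width, height, errors). Bad chunks are logged and skipped."""
    if not blob.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    width = height = None
    errors = []
    while pos < len(blob):
        if pos + 8 > len(blob):
            raise EOFError("unexpected end of file in chunk header")
        length, chunk_type = struct.unpack(">I4s", blob[pos:pos + 8])
        end = pos + 8 + length + 4
        if end > len(blob):
            raise EOFError("unexpected end of file in chunk body")
        data = blob[pos + 8:pos + 8 + length]
        (stored_crc,) = struct.unpack(">I", blob[end - 4:end])
        pos = end
        if zlib.crc32(chunk_type + data) & 0xFFFFFFFF != stored_crc:
            # Content error: note it and move on to the next chunk.
            errors.append(f"bad CRC in {chunk_type!r} chunk")
            continue
        if chunk_type == b"IHDR":
            if length != 13:
                errors.append("IHDR chunk has unexpected length")
                continue
            width, height = struct.unpack(">II", data[:8])
        elif chunk_type == b"IEND":
            break
    return width, height, errors
```

With this structure, a corrupted text chunk no longer prevents the width and height in IHDR from being read.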
jenkins-bot
6eb8c5a6da Merge "Use IEC prefixes instead of SI prefixes for byte sizes (docs+backend)" 2021-06-29 10:34:41 +00:00
jenkins-bot
0921479b01 Merge "media: Ignore EXIF tag GPSAltitudeRef in FormatMetadata" 2021-06-29 01:04:36 +00:00
Tim Starling
5150f19d65 media: Ignore EXIF tag GPSAltitudeRef in FormatMetadata
GPSAltitudeRef is no longer written to img_metadata, but for
compatibility with old rows, don't raise a formatnum warning, just
ignore the tag.

Bug: T285213
Change-Id: Icc074bc0d7bd6a84f73a26e8dd001be85cbef165
2021-06-29 00:44:25 +00:00
jenkins-bot
e6c42d7d79 Merge "media: Handle lack of 'metadata' key from getSizeAndMetadata gracefully" 2021-06-28 12:17:03 +00:00
Fomafix
356f1b72ef Use IEC prefixes instead of SI prefixes for byte sizes (docs+backend)
This change doesn't change any UI messages.

Bug: T54687
Change-Id: Ia62899a2a6fe8910618c35cd667291e397ddb055
2021-06-28 11:59:09 +01:00
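Illustration (not part of the commit): IEC prefixes (KiB, MiB, GiB) denote powers of 1024, whereas SI prefixes (kB, MB, GB) denote powers of 1000. The Python helper below is a hypothetical sketch of formatting a byte count with IEC units; it is not taken from MediaWiki.

```python
def format_iec(num_bytes: int) -> str:
    """Format a byte count using IEC (power-of-1024) unit prefixes."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    value = float(num_bytes)
    for unit in units:
        # Stop once the value fits the current unit, or no larger unit exists.
        if value < 1024 or unit == units[-1]:
            break
        value /= 1024
    return f"{value:g} {unit}"
```

For example, 1024 bytes is exactly 1 KiB, while 1000 bytes stays below the first IEC step.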
Amir Sarabadani
0c48236c60 media: Handle lack of 'metadata' key from getSizeAndMetadata gracefully
*Handler::getSizeAndMetadata() can return an array without 'metadata' as
BmpHandler does and it's currently causing issues.

Bug: T285490
Change-Id: Ib6bc798508002b1cf2a33325b0ddf6e473d6f287
2021-06-28 10:25:24 +00:00
jenkins-bot
2739509d38 Merge "Optionally split out parts of file metadata to BlobStore" 2021-06-26 20:32:04 +00:00
Arlo Breault
fdd8f864b8 Emit media structure as piloted in Parsoid
Gated behind the flag $wgParserEnableLegacyMediaDOM.  The scattershot
usage of it is a little unfortunate but isn't expected to live very long
so maybe that's acceptable.

Further details can be found at,
https://www.mediawiki.org/wiki/Parsing/Media_structure

Bug: T51097
Bug: T266148
Bug: T271129
Change-Id: I978187f9f6e9e0a105521ab3e26821e36a96b911
2021-06-24 23:32:40 +00:00
Amir Sarabadani
3ff89b6295 media: Make the file metadata "_error" check looser
Follows-up I039785d5b6 and I0ccb9971c7b6d99.
The check was effectively skipped when the error value was 0, which is falsy.

Bug: T285431
Change-Id: I79a3c021973e43cf8012a783c24e40bbd33d8652
2021-06-24 00:57:50 +00:00
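Illustration (not part of the commit): a Python analogue of the pitfall described above, assuming the earlier check tested the truthiness of the `_error` value. The function names are hypothetical; the actual PHP code may differ.

```python
def has_error_by_truthiness(metadata: dict) -> bool:
    # Misses an error whose stored value is 0, because 0 is falsy.
    return bool(metadata.get("_error"))


def has_error_by_presence(metadata: dict) -> bool:
    # Looser check: any '_error' key counts, whatever its value.
    return "_error" in metadata
```

A metadata array of `{"_error": 0}` is reported as an error only by the presence-based check.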
Amir Sarabadani
9d4a25a7c7 Check for _error in getting metadata array in GIFHandler
Before I039785d5b6, it would just suppress errors and return an empty
value, which short-circuited the method. Now it returns
[ '_error' => some value ], which is not an empty value but doesn't
contain anything this method wants either.

Bug: T285431
Change-Id: I0ccb9971c7b6d99937a6b78611cb493795228aee
2021-06-24 00:15:31 +02:00
Amir Sarabadani
cafb14dffb Check for _error in getting metadata array in PNGHandler
Before I039785d5b6, it would just suppress errors and return an empty
value, which short-circuited the method. Now it returns
[ '_error' => some value ], which is not an empty value but doesn't
contain anything this method wants either.

Bug: T285431
Change-Id: Ia2bc0982ffaeda0575af1481f9b84faad7d784ad
2021-06-23 23:59:06 +02:00
Thiemo Kreuz
2ba01c7ee7 Remove some more comments that literally repeat the code
… including PHPDoc tags like `@return <type> $variableName`.
A return value doesn't have a variable name. I can see that
some people do this intentionally, repeating the variable
name that was used in the final `return $var;` at the end
of a method. This can indeed be helpful. I leave a lot of
these untouched and removed them only when it's obviously
wrong, or does not provide any additional information in
addition to what the code already says.

Change-Id: Ia18cd9f25ef658b08ad25b97a744897e2a8deffc
2021-06-18 21:23:56 +00:00
Tim Starling
9c3c0b704b Use array_fill_keys() instead of array_flip() if that reflects the developer's intention
array_fill_keys() was introduced in PHP 5.2.0 and works like
array_flip() except that it does only one thing (copying keys) instead
of two things (copying keys and values). That makes it faster and more
obvious.

When array_flip() calls were paired, I left them as is, because that
pattern is too cute. I couldn't kill something so cute.

Sometimes it was hard to figure out whether the values in array_flip()
result were used. That's the point of this change. If you use
array_fill_keys(), the intention is obvious.

Change-Id: If8d340a8bc816a15afec37e64f00106ae45e10ed
2021-06-15 00:11:10 +00:00
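Illustration (not part of the commit): a loose Python analogue of the distinction drawn above. PHP's array_flip() swaps keys and values, so a list of names becomes a map from name to index. array_fill_keys() maps every name to one fixed value. The function names below are hypothetical.

```python
def flip(names: list) -> dict:
    # Analogue of array_flip(): values become keys, indexes become values.
    return {name: index for index, name in enumerate(names)}


def fill_keys(names: list, value=True) -> dict:
    # Analogue of array_fill_keys(): only the keys matter, the value is fixed.
    return dict.fromkeys(names, value)
```

When the result is used only for membership tests, the second form makes it clear that the stored values carry no meaning.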
Tim Starling
68ebdfc77f Optionally split out parts of file metadata to BlobStore
* Optionally store metadata in the database in JSON format instead of
  PHP serialization. The new JSON format has a top-level "envelope"
  array which gives us a place to store things that are not part of the
  handler metadata.
* Optionally split metadata items, putting items above a threshold into
  the text table. The FileRepo and MediaHandler must both opt in.
* For staged deployment, the read side of these changes is always
  active. Only the write side is configurable.

Bug: T275268
Change-Id: I876ea5c9d3a1881e278f689d2f8a3ae20240c703
2021-06-11 08:01:26 +10:00
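Illustration (not part of the commit): a hypothetical Python sketch of the idea described above, a JSON "envelope" holding small metadata items inline and the addresses of large items stored elsewhere. The key names `data` and `blobs`, the address format, and the dict standing in for the blob store are all assumptions made for this example, not the actual MediaWiki format.

```python
import json


def encode_metadata(items: dict, threshold: int, blob_store: dict) -> str:
    """Keep small items inline; move items above the threshold to blob_store."""
    envelope = {"data": {}, "blobs": {}}
    for name, value in items.items():
        encoded = json.dumps(value)
        if len(encoded) > threshold:
            address = f"blob:{len(blob_store)}"  # hypothetical address scheme
            blob_store[address] = encoded
            envelope["blobs"][name] = address
        else:
            envelope["data"][name] = value
    return json.dumps(envelope)


def decode_metadata(encoded: str, blob_store: dict) -> dict:
    """Reassemble all items, loading split-out ones from blob_store."""
    envelope = json.loads(encoded)
    items = dict(envelope["data"])
    for name, address in envelope["blobs"].items():
        items[name] = json.loads(blob_store[address])
    return items
```

In this sketch the reader always understands both inline and split items, which mirrors the staged-deployment point that the read side is always active and only the write side is configurable.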
Tim Starling
b4849e03b7 Use the unserialized form of image metadata internally
Image metadata is usually a serialized string representing an array.
Passing the string around internally and having everything unserialize
it is an awkward convention.

Also, many image handlers were reading the file twice: once for
getMetadata() and again for getImageSize(). Often getMetadata()
would actually read the width and height and then throw it away.

So, in filerepo:

* Add File::getMetadataItem(), which promises to allow partial
  loading of metadata per my proposal on T275268 in a future commit.
* Add File::getMetadataArray(), which returns the unserialized array.
  Some file handlers were returning non-serializable strings from
  getMetadata(), so I gave them a legacy array form ['_error' => ...]
* Changed MWFileProps to return the array form of metadata.
* Deprecate the weird File::getImageSize(). It was apparently not
  called by anything, but was overridden by UnregisteredLocalFile.
* Wrap serialize/unserialize with File::getMetadataForDb() and
  File::loadMetadataFromDb() in preparation for T275268.

In MediaHandler:

* Merged MediaHandler::getImageSize() and MediaHandler::getMetadata()
  into getSizeAndMetadata(). Deprecated the old methods.
* Instead of isMetadataValid() we now have isFileMetadataValid(), which
  only gets a File object, so it can decide what data it needs to load.
* Simplified getPageDimensions() by having it return false for non-paged
  media. It was not called in that case, but was implemented anyway.

In specific handlers:

* Rename DjVuHandler::getUnserializedMetadata() and
  extractTreesFromMetadata() for clarity. "Metadata" in these function
  names meant an XML string.
* Updated DjVuImage::getImageSize() to provide image sizes in the new
  style.
* In ExifBitmapHandler, getRotationForExif() now takes just the
  Orientation tag, rather than a serialized string. Also renamed for
  clarity.
* In GIFMetadataExtractor, return the width, height and bits per channel
  instead of throwing them away. There was some conflation in
  decodeBPP() which I picked apart. Refer to GIF89a section 18.
* In JpegMetadataExtractor, process the SOF0/SOF2 segment to extract
  bits per channel, width, height and components (channel count). This
  is essentially a port of PHP's getimagesize(), so should be bugwards
  compatible.
* In PNGMetadataExtractor, return the width and height, which were
  previously assigned to unused local variables. I verified the
  implementation by referring to the specification.
* In SvgHandler, retain the version validation from unpackMetadata(),
  but rename the function since it now takes an array as input.

In tests:

* In ExifBitmapTest, refactored some tests by using a provider.
* In GIFHandlerTest and PNGHandlerTest, I removed the tests in which
  getMetadata() returns null, since it doesn't make sense when ported to
  getMetadataArray(). I added tests for empty arrays instead.
* In tests, I retained serialization of input data since I figure it's
  useful to confirm that existing database rows will continue to be read
  correctly. I removed serialization of expected values, replacing them
  with plain data.
* In tests, I replaced access to private class constants like
  BROKEN_FILE with string literals, since stability is essential. If
  the class constant changes, the test should fail.

Elsewhere:

* In maintenance/refreshImageMetadata.php, I removed the check for
  shrinking image metadata, since it's not easy to implement and is
  not future compatible. Image metadata is expected to shrink in
  future.

Bug: T275268
Change-Id: I039785d5b6439d71dcc21dcb972177dba5c3a67d
2021-06-08 17:04:01 +10:00
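Illustration (not part of the commit): a simplified Python sketch of reading the SOF0/SOF2 segment mentioned above, assuming the standard JPEG layout in which the frame header stores sample precision, height, width and component count in that order. It is not the JpegMetadataExtractor code, the function name is hypothetical, and it ignores edge cases such as standalone markers before the frame header.

```python
import struct


def read_jpeg_frame_header(blob: bytes):
    """Return bits, width, height and components from the first SOF0/SOF2."""
    if blob[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    pos = 2
    while pos + 4 <= len(blob):
        if blob[pos] != 0xFF:
            raise ValueError("expected a marker")
        marker = blob[pos + 1]
        if marker == 0xFF:
            # Fill byte before a marker; skip it.
            pos += 1
            continue
        (length,) = struct.unpack(">H", blob[pos + 2:pos + 4])
        if marker in (0xC0, 0xC2):  # SOF0 (baseline) or SOF2 (progressive)
            if pos + 10 > len(blob):
                raise EOFError("truncated frame header")
            bits, height, width, components = struct.unpack(
                ">BHHB", blob[pos + 4:pos + 10]
            )
            return {
                "bits": bits,
                "width": width,
                "height": height,
                "components": components,
            }
        if marker == 0xDA:
            # Start of scan reached without finding a frame header.
            break
        # The length field counts itself but not the two marker bytes.
        pos += 2 + length
    return None
```

Reading these four fields in one pass is what lets a handler report size and bit depth without reading the file a second time.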
Tim Starling
f5d86ec75e Replace usage of custom File properties
Some MediaHandler subclasses were setting custom properties on the File
object in order to cache file-associated state. So:

* Add File::getHandlerState() and File::setHandlerState().
* Put them in an interface, which will be used in a subsequent commit in
  MediaHandler::getSizeAndMetadata().
* Use them in DjvuHandler.
* Provide a trivial implementation of the interface, for use in testing
  and in the subsequent commit.

Change-Id: Ic365384ff13f7898c1203da38c4405abf03d7563
2021-05-27 18:48:06 +10:00
Derk-Jan Hartman
3db119428a Basic JPEG2000 handler
Basic handler for JPEG2000 files. Both jp2 and jpx are supported by
php's image functions.

No support for:
- metadata
- lossy vs lossless thumbnail
- bucketing
- thumbor

Bug: T161934
Change-Id: I1a72d4dfb034f3ae24661db515cf03b35ec18fa2
2021-05-19 12:42:08 -07:00
Thiemo Kreuz
6805f39a30 Remove unused default values from class properties
In all these cases the property is unconditionally set in
the constructor. The extra initialisation is effectively
dead code and an extra source of errors and confusion.

Change-Id: Icae13390d5ca5c14e2754f3be4eb956dd7f54ac4
2021-05-12 13:44:28 +02:00
jenkins-bot
24c72c6079 Merge "MediaHandlerFactory: inject a logger" 2021-04-14 15:17:28 +00:00
Umherirrender
03ed01445d Avoid double escape of exif message in FormatMetadata
Change-Id: I1aece2b2c26a06b887c6aa71719de8d4574f9dcc
2021-04-12 23:24:33 +02:00
DannyS712
6fb338ae40 MediaHandlerFactory: inject a logger
Instead of using wfDebug
Also normalize the entries

Change-Id: Ie539233c8b95eaae370732f97681989821157299
2021-03-31 17:45:51 +00:00
jenkins-bot
e623a91aab Merge "Fix replacement of control chars in DJVU text output" 2021-03-29 15:38:33 +00:00
Inductiveload
32ea3a3fbe Fix replacement of control chars in DJVU text output
The control characters are presented as text, not actual
control characters, so the regexes to replace them are
incorrect.

Added a column and para to the Djvu text on the first page
of the test LoremIpsum.djvu file

Bug: T230415
Change-Id: I4970bc30b3935ce4da062ee7ff687aa667027a00
2021-03-29 06:03:34 +00:00
Reedy
cce3fb49d0 Use more neutral or alternative language
Bug: T277987
Change-Id: Iafc4b3e3137936046487119b7e17635f4e560277
2021-03-20 19:47:18 +00:00
Umherirrender
8de3b7d324 Use static closures where safe to use
This is micro-optimization of closure code to avoid binding the closure
to $this where it is not needed.

Created by I25a17fb22b6b669e817317a0f45051ae9c608208

Change-Id: I0ffc6200f6c6693d78a3151cb8cea7dce7c21653
2021-02-11 00:13:52 +00:00
Reedy
d7decde5f5 SVGReader.php: Reduce code duplication by using finally {}
Change-Id: I916171216dc96b46120d11b492f14b8d791c1b3c
2021-02-07 02:48:01 +00:00
Umherirrender
e4d1a2c8bd Use __CLASS__/::class to define callback for array_map/_filter/usort
Change-Id: I3519dd5a1ce1ea688de602190cd74755c400c717
2021-01-22 16:39:29 +00:00
James D. Forrester
75b2aafb6f Exif::isSlong: Cast input to float so PHP 8.0 abs() doesn't whine
Bug: T272327
Change-Id: I8d2ce893205d1e04a3f07fb5ea76670433cd79ef
2021-01-18 16:02:35 -08:00
Umherirrender
a30fe542ae build: Enable SecurityCheck-DoubleEscaped and suppress issues
This issue type was globally suppressed in
I849ac4f120fd15b483e8939d4db45c98dc351259 to make review easier.

This adds inline suppressions or @suppress directives on function
docs for false positives, mostly restoring those removed in
I849ac4f120fd15b483e8939d4db45c98dc351259

Bug: T231311
Change-Id: I1b1d814bd907e9d49fcc39f777982936574fc7c6
2020-12-30 23:34:20 +00:00
Umherirrender
f46ca9a63c build: Updating mediawiki/mediawiki-phan-config to 0.10.5
Change-Id: I343d2bae626a3903eb1e67c05bf5caef4314b7dd
2020-12-12 14:42:25 +01:00
jenkins-bot
9b16a2e3c7 Merge "Stop ignoring paragraph and region separators in DjVu file OCR text layer" 2020-12-04 19:06:12 +00:00
Reedy
7acc57cff9 media: Swap second if for elseif in FormatMetadata::sanitizeKeyForAPI()
Also swap == for ===

Noted in T268133 as a minor optimisation

Change-Id: I7c2198b68cd91dc7a642d0c1f6ce3bf39aeccc41
2020-11-18 12:58:55 +00:00