(bug 16583) This was detecting PHP if any of a few three-byte strings
occurred anywhere in the first 1024 bytes of the file. This is too
paranoid -- it creates a significant number of false positives for
binary files, reportedly on the order of one in every 4096 uploads.
It's hard to see what security advantage this check ever conveyed,
because it only looks in the first 1024 bytes anyway. For the purposes
of upload it could surely be removed entirely, but I didn't check all
callers, so maybe some caller wants to guess whether the file is PHP for
some purpose other than banning it. So for now I only removed the
checks for the shortest strings, which were most likely to get hit.
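As a rough sanity check on the reported rate, here is a back-of-the-envelope sketch. It assumes the first 1024 bytes of a binary file behave like uniformly random bytes (real files do not, so treat this as an order-of-magnitude estimate only), and the number of patterns is a hypothetical parameter, since the message does not list the strings that were checked.

```python
def approx_false_positive_rate(num_patterns, pattern_len=3, window=1024):
    """Approximate chance that any of `num_patterns` fixed byte strings of
    length `pattern_len` occurs somewhere in `window` random bytes.

    Uses a simple union bound: (start positions) * (patterns) / 256**len.
    """
    positions = window - pattern_len + 1
    per_pattern = positions / 256 ** pattern_len
    return num_patterns * per_pattern


# Hypothetical: four distinct three-byte strings.
rate = approx_false_positive_rate(4)
print(f"about one in {1 / rate:.0f} files")
```

With four three-byte patterns the estimate comes out at roughly one in 4100, which is consistent with the reported figure, while a five-byte pattern on its own is tens of thousands of times less likely to appear by chance.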
Previously, the shorter types like 'text' matched before the longer ones like 'text-template', causing an .ott file to be misdetected as an .odt... and thus rejected for being the wrong type.
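A minimal sketch of the ordering fix, assuming the subtype is found by prefix matching on the part of the MIME string after the common OpenDocument prefix. The function name and the subtype list are hypothetical; the point is only that candidates are tried longest first.

```python
ODF_PREFIX = "application/vnd.oasis.opendocument."


def match_opendocument_subtype(mime, subtypes):
    """Return the longest subtype that prefixes the ODF part of `mime`."""
    if not mime.startswith(ODF_PREFIX):
        return None
    rest = mime[len(ODF_PREFIX):]
    # Longest candidates first, so 'text-template' wins over 'text'.
    for subtype in sorted(subtypes, key=len, reverse=True):
        if rest.startswith(subtype):
            return subtype
    return None


subtypes = ["text", "text-template", "spreadsheet"]
print(match_opendocument_subtype(ODF_PREFIX + "text-template", subtypes))
```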
Added a check for the magic value header in OpenDocument zip archives which specifies which subtype it is. Such files will get detected with the appropriate mime type and matching extension, so ODT etc uploads will work again where enabled.
(Previously the general ZIP check and blacklist would disable them.)
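For illustration, a sketch of how such a header check can work. OpenDocument packages store an uncompressed entry named `mimetype` first in the archive, so the MIME string sits right after the first ZIP local file header. The function name is hypothetical and this is not the code from the commit, only one way to read the value from the start of the file.

```python
import struct

ODF_PREFIX = "application/vnd.oasis.opendocument."


def opendocument_mime(head):
    """Return the ODF MIME string stored at the start of a ZIP, or None."""
    if len(head) < 30 or head[:4] != b"PK\x03\x04":
        return None
    # ZIP local file header: 30 fixed bytes, little-endian.
    (_sig, _ver, flags, method, _time, _date, _crc,
     csize, _usize, name_len, extra_len) = struct.unpack(
        "<4sHHHHHIIIHH", head[:30])
    name = head[30:30 + name_len]
    # Must be the stored (uncompressed) 'mimetype' entry with known size.
    if name != b"mimetype" or method != 0 or flags & 0x08:
        return None
    start = 30 + name_len + extra_len
    value = head[start:start + csize]
    if len(value) != csize:
        return None  # header chunk was too short to hold the value
    try:
        text = value.decode("ascii")
    except UnicodeDecodeError:
        return None
    return text if text.startswith(ODF_PREFIX) else None
```

Anything that returns None here would fall through to the general ZIP handling.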
I very much like the idea of making this extensible, but the current implementation has a couple of problems. I'd recommend addressing the following:
* The format of the array isn't documented; it has neither examples nor a description of the content format in its comment. If I wanted to add something to it, I wouldn't know what the result should look like without looking up the code.
* Rather than "additional" types, it might be best to simply list *all* the types we recognize in the default array -- then it can be modified and extended in local configuration. This would have the following benefits:
** Allows modifying existing types
** Defaults are an example of format, making the structure self-documenting
** Avoids code duplication -- we only have to check one array, not two, and don't have to worry about their formats getting out of sync.
* Switch XML type detection/validity check from dipping for XML processing instructions, doctypes, or subtags to just trying to parse it and checking the root element's name and namespace. This lets us properly handle SVG files which specify a namespace but no doctype, as well as rejecting files that aren't well-formed. (See http://meta.wikimedia.org/wiki/SVG_validity_checks for some samples of bad files I encountered.) Non-XML files will abort parsing pretty quickly, so this shouldn't be a big burden on other types that didn't hit a magic check.
* Fix Unicode unix script checks (er.... is that even right? :D), remove the iconv dependency
* Make the autodetection work for UTF-16LE and UTF-16BE XML, which never worked before due to using the wrong string compare length
* Allow doctype strings to break over newlines
* Detect XML if there's a doctype even if there's no XML header (the xml header isn't required for UTF-8 files)
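The parse-and-inspect-the-root approach can be sketched as follows. This uses Python's standard `xml.etree.ElementTree` purely as an illustration of the idea; it is not the parser used in the commit, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def xml_root_identity(data):
    """Parse `data` and return (namespace, local_name) of the root element.

    Returns None if the input is not well-formed XML.
    """
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return None
    tag = root.tag
    if tag.startswith("{"):
        # ElementTree writes namespaced tags as '{namespace}local'.
        namespace, _, local = tag[1:].partition("}")
        return namespace, local
    return "", tag


svg = b'<svg xmlns="http://www.w3.org/2000/svg"><rect/></svg>'
print(xml_root_identity(svg))
```

An SVG with a namespace but no doctype is identified by its root element, while malformed markup and binary data both fail to parse and yield None. A production check would also want a parser hardened against hostile input, which this sketch does not address.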
Adds magic header checks for the following types:
* MIDI
* Ogg
* PDF
* XCF
* DOS/Windows, Mach-O, and ELF executables
Locks down detection to prevent uploading different file types for the following extensions:
* mid, ogg, pdf, svg, wmf, xcf
This should now cover all the file types we have uploadable at Wikimedia public sites. (I've disabled the old StarOffice formats.)
Changed priority so our own checks happen in favor of the external checks, since we don't trust that stuff. Would like to see much further work here to replace it all.
Hopefully I haven't broken SVG files; I'm not 100% certain the built-in checks are correct.
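A minimal sketch of prefix-based magic checks plus the extension lock-down described above. The byte signatures are the well-known ones for each format; the kind labels, table layout, and function names are assumptions made for this example and are not taken from the commit.

```python
# (magic prefix, kind label); labels are illustrative, not MIME types.
MAGIC_PREFIXES = [
    (b"MThd", "midi"),
    (b"OggS", "ogg"),
    (b"%PDF", "pdf"),
    (b"gimp xcf", "xcf"),
    (b"MZ", "dos-executable"),
    (b"\x7fELF", "elf-executable"),
    (b"\xfe\xed\xfa\xce", "mach-o-executable"),  # 32-bit
    (b"\xfe\xed\xfa\xcf", "mach-o-executable"),  # 64-bit
    (b"\xce\xfa\xed\xfe", "mach-o-executable"),  # 32-bit, byte-swapped
    (b"\xcf\xfa\xed\xfe", "mach-o-executable"),  # 64-bit, byte-swapped
]

# Extensions whose content must match one specific kind.
LOCKED_EXTENSIONS = {"mid": "midi", "ogg": "ogg", "pdf": "pdf", "xcf": "xcf"}


def detect_kind(head):
    """Return the kind whose magic prefix starts `head`, or None."""
    for magic, kind in MAGIC_PREFIXES:
        if head.startswith(magic):
            return kind
    return None


def extension_allowed(extension, head):
    """Reject a locked extension whose content is some other type."""
    expected = LOCKED_EXTENSIONS.get(extension.lower())
    if expected is None:
        return True  # extension is not locked down in this sketch
    return detect_kind(head) == expected
```

For example, `extension_allowed("pdf", b"MZ\x90\x00")` is False, because a Windows executable renamed to `.pdf` does not carry the PDF signature.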
* JavaScript video player based loosely on Greg Maxwell's player
* Image page text snippet customisation
* Abstraction of transform parameters in the parser. Introduced Linker::makeImageLink2().
* Made canRender(), mustRender() depend on file, not just on handler. Moved width=0, height=0 checking to ImageHandler::canRender(), since audio streams have width=height=0 but should be rendered.
Also:
* Automatic upgrade for oldimage rows on image page view, allows media handler selection based on oi_*_mime
* oi_*_mime unconditionally referenced, REQUIRES SCHEMA UPGRADE
* Don't destroy file info for missing files on upgrade
* Simple, centralised extension message file handling
* Made MessageCache::loadAllMessages non-static, optimised for repeated-call case due to abuse in User.php
* Support for lightweight parser output hooks, with callback whitelist for security
* Moved Linker::formatSize() to Language, to join the new formatTimePeriod() and formatBitrate()
* Introduced MagicWordArray, regex capture trick requires that magic word IDs DO NOT CONTAIN HYPHENS.
* Seems like an opportune time to introduce "@addtogroup Media" documentation tags.
* Merge "@addtogroup Metadata" (used by Exif.php) into "@addtogroup Media".
* Moved a few more comment blocks to above their classes.
* Deprecated $wgUseImageResize, thumbnailing will be enabled unconditionally.
* Fixed interaction of page parameter to ImagePage with the HTML file cache
* Improved error reporting for image thumbnailing
* Fixed MIME type for SVG files, will be silently changed from image/svg to image/svg+xml after loading from the database.
* Workaround for djvutoxml bug #1704049 (poor performance). Use djvudump instead.
* Fixed odd behaviour in ImagePage on DjVu thumbnailing errors
* Improved error reporting for image thumbnailing
* Added sharpening option for ImageMagick thumbnailing
* Removed Image::selectPage(), added page parameters to getWidth() and getHeight(), deprecated Image::renderThumb() and Image::getThumbnail()
* Changed default contents of img_metadata to empty string instead of a:0:{}
* Moved responsibility for respecting $wgGenerateThumbnailOnParse from the UI to Image.php
* Specify output type in ImageMagick SVG rendering command line
* Make some Image functions static, for the benefit of WebStore.
* Fixed SVG MIME type, will be image/svg+xml from now on with both accepted.
but $wgDebugLogFile does not exist yet: "filesize(): stat failed for sql-log.txt in includes/GlobalFunctions.php on line 219"
* Removing unused global $IP.
* Indentation of an if/else block.
* Trivial comment typo.
* Prevent PHP Fatal error: "Call to a member function getText() on a non-object in includes/SpecialListusers.php on line 46",
when opening a URL such as http://192.168.0.64/wiki/index.php/Special:Listusers?username=%22%27%3E
(i.e. when "Display users starting at:" username supplied in Special:Listusers is not a valid MediaWiki title).
* Fix HTML validation of protection form (i.e. when "action=protect").
* removing unused local vars
* removing unused global declarations
* adding FIXMEs against extract() calls and lines that seem to be using uninitialized variables
* adding some array() declarations.
* Strict Standards: Undefined index: application/ogg in includes/MimeMagic.php on line 154
* Strict Standards: Undefined index: ogm in includes/MimeMagic.php on line 163
* removing some unused global declarations.
* removing or commenting out or adding comments for unused local vars.
* Adding one or two local var declarations.
* Declaring $matches array passed to preg_match() / preg_match_all() as array() before using [not required, just have a slight preference for the explicitness].
* remove one or two pass-by-reference function declarations where the value is not modified.
* Adding some braces to if-else blocks.
* In Parser.php, the strip state is now an object rather than an array as per r17820, so we no longer need to ask for a reference to it (as in "$x =& $this->mStripState;"), and in fact it's probably just simpler to get rid of $x altogether.
* Moving some preg regexes from "" quoting to '' quoting to stop static analyzer whinging about bad escape sequences.
... up to "LinksUpdate.php" in the includes/ directory.
* Deleted DatabaseMysql.php, no longer necessary, database classes are autoloaded.
* Moved wfGetMimeMagic() to MimeMagic::singleton()
* Fixed a couple of __CLASS__.'::'.__FUNCTION__ things.