* Work around HipHop issue 314 (volatile broken) and issue 308 (no compilation detection) by adding some large and ugly compilation detection code to WebStart.php and doMaintenance.php.
* Provide an MW_COMPILED constant which can be used to detect compiled mode throughout the codebase.
* Introduced wfIsHipHop(), which detects HipHop in either compiled or interpreted mode. Used this to work around an unusual eval() return value in eval.php.
* Work around the lack of ini_get() in Maintenance.php by duplicating wfIsHipHop().
* In Maintenance::shouldExecute(), accept "include" as an inclusion function name, since all kinds of inclusion give this string in HipHop.
* Introduced new class MWInit, which provides some static functions in the pre-autoloader environment.
* Introduced MWInit::compiledPath(), which provides a relative path for invoking a compiled file, and MWInit::interpretedPath(), which provides an absolute path for interpreting a PHP file. Used these new functions in the appropriate places.
* When we are running compiled code, don't include files which would generate duplicate class, function or constant definitions. Documented the new requirements on the contents of Defines.php and UtfNormalDefines.php.
* In HipHop compiled mode, it's not possible to have executable code in the same file as a class definition.
** Moved MimeMagic initialisation to the constructor.
** Moved Namespace.php global variable initialisation to Setup.php.
** Moved MemcachedSessions.php initialisation to the caller in GlobalFunctions.php.
** Moved Sanitizer.php constants and global variables to static class members. Introduced an accessor function for the attribs regex, as a new place to put code formerly at file level.
** Moved Language.php initialisation of $wgLanguageNames to Language::getLanguageNames(). Removed the global variable, which had been marked "private" since forever.
* In two places, don't use error_log() with type=3 to append to a file, since HipHop doesn't support it. Use file_put_contents() with FILE_APPEND instead.
* Work around the terrible breakage of class_exists() by using MWInit::classExists() instead in various places. In WebInstaller::getPageByName(), the class_exists() was marked with a fixme comment already, so I replaced it with an autoloader solution.
a) avoid redundant inspection of file contents when validating uploads, caused by multiple calls to guessMimeType()
b) deprecate the obscure use of the file extension when guessing mime types; call improveTypeFromExtension() explicitly instead
Note that File::getPropsFromPath() will now return an additional field: $props['file-mime'] contains the mime type as determined solely from the file's content, $props['mime'] contains the type that was derived considering the file extension too.
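A minimal Python sketch of the two-stage lookup described above. The content sniffer and extension table are hypothetical stand-ins for guessMimeType() and improveTypeFromExtension(), not their real logic. The point is that the file's bytes are inspected once, and both the content-only type and the extension-refined type are kept.

```python
# Hypothetical stand-ins: MediaWiki itself uses MimeMagic::guessMimeType()
# and MimeMagic::improveTypeFromExtension(); these are simplified sketches.
EXTENSION_TYPES = {
    "odt": "application/vnd.oasis.opendocument.text",
    "txt": "text/plain",
}


def guess_from_content(data: bytes) -> str:
    """Guess a type from the bytes alone (tiny illustrative sniffer)."""
    if data.startswith(b"%PDF"):
        return "application/pdf"
    if data.startswith(b"PK\x03\x04"):
        return "application/zip"
    return "application/octet-stream"


def improve_from_extension(mime: str, ext: str) -> str:
    """Refine a generic content-based guess using the file extension."""
    if mime in ("application/zip", "application/octet-stream"):
        return EXTENSION_TYPES.get(ext.lower(), mime)
    return mime


def get_props(data: bytes, ext: str) -> dict:
    # Inspect the content exactly once, then derive the extension-aware type.
    file_mime = guess_from_content(data)
    return {
        "file-mime": file_mime,
        "mime": improve_from_extension(file_mime, ext),
    }
```

For a zip archive named `report.odt`, `file-mime` stays `application/zip` while `mime` becomes the OpenDocument text type.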
Currently all webm files are stored as video/webm. It is not possible to detect
whether a given file is an audio file without using a full parser. This is why we should
really move mime and mediatype accessors to the MediaHandlers.
Using video/x-matroska for MKV files. There is no official mime type for MKV (though
the webm type isn't official either, everyone apparently already uses it).
OpenXML files are Open Packaging Conventions (OPC) files. Internally, we use the custom mime type application/x-opc+zip for these files. In the database, we store the 'proper' mime type, which we guess from the file extension, falling back to application/zip if the extension is not supported. All OPC files are blacklisted by $wgMimeTypeBlacklist by default, just like other zip files.
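The split between the internal type and the stored type can be sketched as follows. The extension table is a hypothetical subset. The two OpenXML MIME strings are the standard registered types for .docx and .xlsx, but this note does not say which extensions MediaWiki itself maps.

```python
INTERNAL_OPC_MIME = "application/x-opc+zip"

# Hypothetical subset of extension -> "proper" MIME types for OPC files.
OPC_TYPES_BY_EXTENSION = {
    "docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def mime_for_database(internal_mime: str, ext: str) -> str:
    """Map the internal OPC type to the MIME type stored in the database."""
    if internal_mime != INTERNAL_OPC_MIME:
        return internal_mime
    # Unknown OPC extensions fall back to plain zip.
    return OPC_TYPES_BY_EXTENSION.get(ext.lower(), "application/zip")
```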
(bug 16583) This was detecting PHP if any of a few three-byte strings
occurred anywhere in the first 1024 bytes of the file. This is too
paranoid -- it creates a significant number of false positives for
binary files, reportedly about one in every 4096 uploads.
It's hard to see what security advantage this check ever conveyed,
because it only looks in the first 1024 bytes anyway. For the purposes
of upload it could surely be removed entirely, but I didn't check all
callers, so maybe some caller wants to guess whether the file is PHP for
some purpose other than banning it. So for now I only removed the
checks for the shortest strings, which were most likely to get hit.
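A rough back-of-the-envelope estimate shows why the shortest strings dominate the false positives. It uses a simplifying assumption that is not part of the original report: binary content behaves like uniformly random bytes. Under that assumption, a fixed L-byte string can start at 1024 - L + 1 offsets in the scanned window, and each offset matches with probability 256^-L. The marker passed in the example is only illustrative, since this note does not list which strings were kept.

```python
WINDOW = 1024  # only the first 1024 bytes are scanned


def expected_false_positive_rate(marker_lengths, window=WINDOW):
    """Approximate per-file false positive rate for random binary data.

    Sums (number of start offsets) * 256**-length over all markers, a
    union-bound style approximation that assumes uniformly random bytes.
    """
    return sum((window - n + 1) / 256 ** n for n in marker_lengths)


def looks_like_php(data: bytes, markers) -> bool:
    """Report whether any marker occurs within the scanned window."""
    head = data[:WINDOW]
    return any(m in head for m in markers)
```

Under this model, a single three-byte string fires on roughly 1 in 16,400 random files. Four of them would give about 1 in 4,100, which is the same order as the reported figure. A five-byte string is tens of thousands of times less likely to match by chance.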
Previously, the shorter types like 'text' matched before the longer ones like 'text-template', causing an .ott file to be misdetected as an .odt... and thus rejected for being the wrong type.
Added a check for the magic value header in OpenDocument zip archives which specifies which subtype it is. Such files will get detected with the appropriate mime type and matching extension, so ODT etc uploads will work again where enabled.
(Previously the general ZIP check and blacklist would disable them.)
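Here is a sketch of the subtype check. It assumes the package follows the OpenDocument convention of storing an uncompressed `mimetype` entry as the first zip member, with no extra field. Under that assumption, the entry name starts at byte 30 of the file and its content follows immediately. Trying the longest subtype names first keeps 'text' from shadowing 'text-template'. The subtype-to-extension table is a hypothetical subset.

```python
ODF_PREFIX = b"application/vnd.oasis.opendocument."

# Hypothetical subset: subtype -> file extension.
ODF_SUBTYPES = {
    "text": "odt",
    "text-template": "ott",
    "spreadsheet": "ods",
    "spreadsheet-template": "ots",
    "presentation": "odp",
}


def detect_odf(data: bytes):
    """Return (mime, extension) for an OpenDocument zip, or None."""
    # 30-byte local file header, then the 8-byte name "mimetype".
    if not data.startswith(b"PK\x03\x04") or data[30:38] != b"mimetype":
        return None
    payload = data[38:]
    if not payload.startswith(ODF_PREFIX):
        return None
    rest = payload[len(ODF_PREFIX):]
    # Longest names first, so "text-template" is tried before "text".
    for subtype in sorted(ODF_SUBTYPES, key=len, reverse=True):
        if rest.startswith(subtype.encode("ascii")):
            return ODF_PREFIX.decode("ascii") + subtype, ODF_SUBTYPES[subtype]
    return None
```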
I very much like the idea of making this extensible, but the current implementation has a couple of problems. I'd recommend addressing the following:
* The format of the array isn't documented; it has neither examples nor a description of the content format in its comment. If I wanted to add something to it, I wouldn't know what the result should look like without looking up the code.
* Rather than "additional" types, it might be best to simply list *all* the types we recognize in the default array -- then it can be modified and extended in local configuration. This would have the following benefits:
** Allows modifying existing types
** Defaults are an example of format, making the structure self-documenting
** Avoids code duplication -- we only have to check one array, not two, and don't have to worry about their formats getting out of sync.
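To illustrate the suggestion, here is a hypothetical Python analogue of a single default table that local configuration copies and then modifies or extends. The names `DEFAULT_MAGIC_TYPES` and `build_magic_types` are invented for this sketch and are not MediaWiki settings.

```python
# Every recognised type lives in one default table, so the defaults double as
# documentation of the entry format: magic prefix (bytes) -> MIME type.
DEFAULT_MAGIC_TYPES = {
    b"%PDF": "application/pdf",
    b"OggS": "application/ogg",
}


def build_magic_types(overrides=None):
    """Start from the defaults, then apply local edits (modify or add)."""
    types = dict(DEFAULT_MAGIC_TYPES)
    types.update(overrides or {})
    return types
```

With this design, a local override such as `{b"OggS": "audio/ogg"}` changes an existing entry, and any new key adds a type. Only one table is ever consulted.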
* Switch XML type detection/validity check from sniffing for XML processing instructions, doctypes, or subtags to just trying to parse it and checking the root element's name and namespace. This lets us properly handle SVG files which specify a namespace but no doctype, as well as reject files that aren't well-formed. (See http://meta.wikimedia.org/wiki/SVG_validity_checks for some samples of bad files I encountered.) Non-XML files will abort parsing pretty quickly, so this shouldn't be a big burden on other types that didn't hit a magic check.
* Fix the Unicode Unix script checks (er... is that even right? :D) and remove the iconv dependency
* Make the autodetection work for UTF-16LE and UTF-16BE XML, which never worked before due to using the wrong string compare length
* Allow doctype strings to break over newlines
* Detect XML if there's a doctype even if there's no XML header (the XML header isn't required for UTF-8 files)
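A Python sketch of the parse-and-check approach, using the standard library's ElementTree (expat) as a stand-in for whatever parser MediaWiki uses. The whole input is parsed, so malformed files are rejected. The root element is recorded as `{namespace}name`. Encoding detection, including UTF-16 with a byte-order mark, is left to the parser. The function name is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

SVG_ROOT = "{http://www.w3.org/2000/svg}svg"


def xml_root_tag(data: bytes):
    """Return the root element as '{namespace}name', or None if not well-formed XML."""
    root = None
    try:
        for _event, elem in ET.iterparse(io.BytesIO(data), events=("start",)):
            if root is None:
                root = elem.tag  # the first start event is the document element
        # Leaving the loop normally means the parser reached the end without errors.
    except ET.ParseError:
        return None
    return root
```

A caller would then treat the file as SVG only if `xml_root_tag(data) == SVG_ROOT`, regardless of whether a doctype is present.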
Adds magic header checks for the following types:
* MIDI
* Ogg
* PDF
* XCF
* DOS/Windows, Mach-O, and ELF executables
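Checks like these are naturally table-driven. In the sketch below, the byte signatures are the well-known ones for each format. The MIME strings on the right are illustrative and not necessarily the exact values MediaWiki assigns.

```python
# Magic prefixes -> illustrative MIME types (checked with bytes.startswith).
MAGIC_PREFIXES = [
    (b"MThd", "audio/midi"),
    (b"OggS", "application/ogg"),
    (b"%PDF", "application/pdf"),
    (b"gimp xcf", "image/x-xcf"),
    (b"MZ", "application/x-dosexec"),
    (b"\x7fELF", "application/x-executable"),
    # Mach-O: 32- and 64-bit magic numbers in both byte orders.
    (b"\xfe\xed\xfa\xce", "application/x-mach-binary"),
    (b"\xce\xfa\xed\xfe", "application/x-mach-binary"),
    (b"\xfe\xed\xfa\xcf", "application/x-mach-binary"),
    (b"\xcf\xfa\xed\xfe", "application/x-mach-binary"),
]


def sniff_magic(data: bytes):
    """Return the MIME type for the first matching magic prefix, else None."""
    for prefix, mime in MAGIC_PREFIXES:
        if data.startswith(prefix):
            return mime
    return None
```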
Locks down detection to prevent uploading different file types for the following extensions:
* mid, ogg, pdf, svg, wmf, xcf
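The lock-down amounts to comparing the type expected for the extension with the type detected from the content, and refusing a mismatch. Here is a minimal sketch, assuming a hypothetical expected-type table that covers a few of the listed extensions.

```python
# Hypothetical expected types for some locked-down extensions.
LOCKED_EXTENSIONS = {
    "mid": "audio/midi",
    "ogg": "application/ogg",
    "pdf": "application/pdf",
    "svg": "image/svg+xml",
    "xcf": "image/x-xcf",
}


def extension_allowed(ext: str, detected_mime) -> bool:
    """Reject uploads whose content does not match a locked-down extension."""
    expected = LOCKED_EXTENSIONS.get(ext.lower())
    if expected is None:
        return True  # not locked down; other checks still apply
    return detected_mime == expected
```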
This should now cover all the file types we have uploadable at Wikimedia public sites. (I've disabled the old StarOffice formats.)
Changed priority so our own checks take precedence over the external checks, since we don't trust that stuff. I'd like to see much more work here to replace it all.
Hopefully I haven't broken SVG files; I'm not 100% certain the built-in checks are correct.
* JavaScript video player based loosely on Greg Maxwell's player
* Image page text snippet customisation
* Abstraction of transform parameters in the parser. Introduced Linker::makeImageLink2().
* Made canRender(), mustRender() depend on file, not just on handler. Moved width=0, height=0 checking to ImageHandler::canRender(), since audio streams have width=height=0 but should be rendered.
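The division of responsibility in the last item can be sketched in Python, with classes and a simple file record standing in for MediaWiki's PHP handlers. The class names mirror the text, but the signatures and the `File` record are assumptions.

```python
from dataclasses import dataclass


@dataclass
class File:
    """Hypothetical minimal file record."""
    width: int
    height: int


class MediaHandler:
    def can_render(self, file: File) -> bool:
        # Generic handlers (e.g. audio) may render even with zero dimensions.
        return True


class ImageHandler(MediaHandler):
    def can_render(self, file: File) -> bool:
        # Only image handlers require non-zero dimensions.
        return file.width > 0 and file.height > 0
```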
Also:
* Automatic upgrade for oldimage rows on image page view, allows media handler selection based on oi_*_mime
* oi_*_mime unconditionally referenced, REQUIRES SCHEMA UPGRADE
* Don't destroy file info for missing files on upgrade
* Simple, centralised extension message file handling
* Made MessageCache::loadAllMessages non-static, optimised for the repeated-call case due to abuse in User.php
* Support for lightweight parser output hooks, with callback whitelist for security
* Moved Linker::formatSize() to Language, to join the new formatTimePeriod() and formatBitrate()
* Introduced MagicWordArray; its regex capture trick requires that magic word IDs DO NOT CONTAIN HYPHENS.
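The hyphen restriction follows from building one combined regex whose named capture groups encode the magic word IDs. Group names must be plain identifiers, so a hyphenated ID cannot be used. Python's `re` module rejects such names just as PCRE does. In the Python sketch below, the `<id>_<index>` group naming scheme, the function names, and the case-insensitive flag are hypothetical and are not MagicWordArray's actual code.

```python
import re


def build_magic_regex(words):
    """Combine {id: [synonyms]} into one alternation of named groups.

    Each synonym gets a group named '<id>_<index>', so the ID must be a
    valid group name; an ID containing a hyphen makes re.compile fail.
    """
    parts = []
    for word_id, synonyms in words.items():
        for i, synonym in enumerate(synonyms):
            parts.append(f"(?P<{word_id}_{i}>{re.escape(synonym)})")
    return re.compile("|".join(parts), re.IGNORECASE)


def matched_id(regex, text):
    """Return the magic word ID whose synonym matched at the start of text."""
    m = regex.match(text)
    if m is None:
        return None
    # lastgroup is the name of the group that matched, e.g. 'notoc_0'.
    return m.lastgroup.rsplit("_", 1)[0]
```

A single match then identifies the magic word directly from the group name, without trying each ID's pattern separately.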