Fix mime detection of easily-confused-with text/plain formats
json, csv, and tsv are often detected as text/plain. However that's
not right. This patch causes MediaWiki to look at the file extension
of files detected as text/plain, and if the file extension is
for a "textual" type, use the mime type associated with that extension.
This change also changes the "does mime type match uploaded file
extension" check to use the mime based on the file contents
plus extension, as opposed to just the file contents. Various
documentation suggests this is more appropriate (e.g. line 807
of MimeMagic.php). In my opinion we should use just the file
contents when verifying that a file is not on the blacklist, but
use the extension when verifying that the file type matches the
extension, and when deciding which handler-specific checks to run.
Note that detecting the mime type with the extension doesn't
override the content-based mime type; the extension is only used
if content-based detection is ambiguous or not specific enough.
This patch should be reviewed by csteipp before merge for
any potential security implications.
Note: This partially fixes a regression from 3846d1048766a7.
Previously csv and json files were allowed to be uploaded,
and that change prevented them from being uploaded.
Bug: 66036
Bug: 45424
Change-Id: Ib637fe6850a81b26f84dc8c00ab4772f3d3a1f34
2014-06-24 19:15:32 +00:00
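As a rough illustration of the fallback described in the commit message above, the following standalone PHP sketch refines a text/plain guess using the file extension. The function name `refineTextualMime` and its hard-coded extension map are assumptions made for this example only; they are not MediaWiki's actual implementation, which is configured from the mime.types and mime.info files.

```php
<?php
/**
 * Hypothetical helper (not part of MediaWiki): refine an ambiguous
 * content-based MIME guess using the file extension.
 *
 * @param string $detectedMime MIME type guessed from the file contents
 * @param string $ext File extension (no leading dot)
 * @return string Possibly more specific MIME type
 */
function refineTextualMime( string $detectedMime, string $ext ): string {
	// Assumed map of "textual" extensions, hard-coded for illustration.
	$textualTypes = [
		'csv' => 'text/csv',
		'tsv' => 'text/tab-separated-values',
		'json' => 'application/json',
	];
	$ext = strtolower( $ext );

	// Only consult the extension when content sniffing was not specific enough.
	if ( $detectedMime === 'text/plain' && isset( $textualTypes[$ext] ) ) {
		return $textualTypes[$ext];
	}

	// Otherwise the content-based result stands; the extension never overrides it.
	return $detectedMime;
}

echo refineTextualMime( 'text/plain', 'csv' ), "\n";
echo refineTextualMime( 'image/gif', 'csv' ), "\n";
```

The second call shows the design choice the commit message stresses: a specific content-based result such as image/gif is returned unchanged, whatever the extension claims.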
<?php
/**
 * @group Media
 * @covers MimeAnalyzer
 */
class MimeAnalyzerTest extends PHPUnit\Framework\TestCase {

	use MediaWikiCoversValidator;

	/** @var MimeAnalyzer */
	private $mimeAnalyzer;

	function setUp() {
		global $IP;

		$this->mimeAnalyzer = new MimeAnalyzer( [
			'infoFile' => $IP . "/includes/libs/mime/mime.info",
			'typeFile' => $IP . "/includes/libs/mime/mime.types",
			'xmlTypes' => [
				'http://www.w3.org/2000/svg:svg' => 'image/svg+xml',
				'svg' => 'image/svg+xml',
				'http://www.lysator.liu.se/~alla/dia/:diagram' => 'application/x-dia-diagram',
				'http://www.w3.org/1999/xhtml:html' => 'text/html', // application/xhtml+xml?
				'html' => 'text/html', // application/xhtml+xml?
			]
		] );
		parent::setUp();
	}

	function doGuessMimeType( array $parameters = [] ) {
		$class = new ReflectionClass( get_class( $this->mimeAnalyzer ) );
		$method = $class->getMethod( 'doGuessMimeType' );
		$method->setAccessible( true );
		return $method->invokeArgs( $this->mimeAnalyzer, $parameters );
	}

	/**
	 * @dataProvider providerImproveTypeFromExtension
	 * @param string $ext File extension (no leading dot)
	 * @param string $oldMime Initially detected MIME
	 * @param string $expectedMime MIME type after taking extension into account
	 */
	function testImproveTypeFromExtension( $ext, $oldMime, $expectedMime ) {
		$actualMime = $this->mimeAnalyzer->improveTypeFromExtension( $oldMime, $ext );
		$this->assertEquals( $expectedMime, $actualMime );
	}

	function providerImproveTypeFromExtension() {
		return [
			[ 'gif', 'image/gif', 'image/gif' ],
			[ 'gif', 'unknown/unknown', 'unknown/unknown' ],
			[ 'wrl', 'unknown/unknown', 'model/vrml' ],
			[ 'txt', 'text/plain', 'text/plain' ],
			[ 'csv', 'text/plain', 'text/csv' ],
			[ 'tsv', 'text/plain', 'text/tab-separated-values' ],
			[ 'js', 'text/javascript', 'application/javascript' ],
			[ 'js', 'application/x-javascript', 'application/javascript' ],
			[ 'json', 'text/plain', 'application/json' ],
			[ 'foo', 'application/x-opc+zip', 'application/zip' ],
			[ 'docx', 'application/x-opc+zip',
				'application/vnd.openxmlformats-officedocument.wordprocessingml.document' ],
			[ 'djvu', 'image/x-djvu', 'image/vnd.djvu' ],
			[ 'wav', 'audio/wav', 'audio/wav' ],
		];
	}

	/**
	 * Test to make sure that encoder=ffmpeg2theora doesn't trigger
	 * MEDIATYPE_VIDEO (T65584)
	 */
	function testOggRecognize() {
		$oggFile = __DIR__ . '/../../../data/media/say-test.ogg';
		$actualType = $this->mimeAnalyzer->getMediaType( $oggFile, 'application/ogg' );
		$this->assertEquals( MEDIATYPE_AUDIO, $actualType );
	}

	/**
	 * Test to make sure that Opus audio files don't trigger
	 * MEDIATYPE_MULTIMEDIA (bug T151352)
	 */
	function testOpusRecognize() {
		$oggFile = __DIR__ . '/../../../data/media/say-test.opus';
		$actualType = $this->mimeAnalyzer->getMediaType( $oggFile, 'application/ogg' );
		$this->assertEquals( MEDIATYPE_AUDIO, $actualType );
	}

	/**
	 * Test to make sure that mp3 files are detected as audio type
	 */
	function testMP3AsAudio() {
		$file = __DIR__ . '/../../../data/media/say-test-with-id3.mp3';
		$actualType = $this->mimeAnalyzer->getMediaType( $file );
		$this->assertEquals( MEDIATYPE_AUDIO, $actualType );
	}

	/**
	 * Test to make sure that MP3 with id3 tag is recognized
	 */
	function testMP3WithID3Recognize() {
		$file = __DIR__ . '/../../../data/media/say-test-with-id3.mp3';
		$actualType = $this->doGuessMimeType( [ $file, 'mp3' ] );
		$this->assertEquals( 'audio/mpeg', $actualType );
	}

	/**
	 * Test to make sure that MP3 without id3 tag is recognized (MPEG-1 sample rates)
	 */
	function testMP3NoID3RecognizeMPEG1() {
		$file = __DIR__ . '/../../../data/media/say-test-mpeg1.mp3';
		$actualType = $this->doGuessMimeType( [ $file, 'mp3' ] );
		$this->assertEquals( 'audio/mpeg', $actualType );
	}

	/**
	 * Test to make sure that MP3 without id3 tag is recognized (MPEG-2 sample rates)
	 */
	function testMP3NoID3RecognizeMPEG2() {
		$file = __DIR__ . '/../../../data/media/say-test-mpeg2.mp3';
		$actualType = $this->doGuessMimeType( [ $file, 'mp3' ] );
		$this->assertEquals( 'audio/mpeg', $actualType );
	}

	/**
	 * Test to make sure that MP3 without id3 tag is recognized (MPEG-2.5 sample rates)
	 */
	function testMP3NoID3RecognizeMPEG2_5() {
		$file = __DIR__ . '/../../../data/media/say-test-mpeg2.5.mp3';
		$actualType = $this->doGuessMimeType( [ $file, 'mp3' ] );
		$this->assertEquals( 'audio/mpeg', $actualType );
	}
}