Remove $wgShellLocale, always use C

$wgShellLocale was a flawed solution to the problem of locale
dependence. MediaWiki has its own concept of locale (the Language
hierarchy) and any kind of dependence on the server's libc locale is
incorrect and harmful, leading to bugs. Developers have an expectation
that functions like strtolower() will work in a certain way, and
respecting the locale set in the environment at install time violates
this expectation.

The problems with using C as a locale, which led to $wgShellLocale, are:

* escapeshellarg() will strip non-ASCII characters. This can be worked
  around by not using it. The security vulnerability it was trying to
  fix can be prevented in another way.
* Shell commands like rsvg will fail to correctly interpret UTF-8
  arguments. This is the reason for the putenv(). On Linux, this can
  be fixed by using C.UTF-8, which we didn't know at the time. On
  Windows, the problem is not relevant (there are unrelated issues
  with UTF-8 arguments).

Bug: T291234
Change-Id: Ib5ac0e7bc720dcc094303a358ee1c7bbdcfc6447
This commit is contained in:
Tim Starling 2021-09-21 15:40:55 +10:00 committed by Kunal Mehta
parent 041dadd69d
commit 65b1b6b56a
12 changed files with 27 additions and 230 deletions

View file

@ -180,15 +180,6 @@
<rule ref="MediaWiki.Usage.AssignmentInReturn.AssignmentInReturn">
<exclude-pattern>*/tests/phpunit/*</exclude-pattern>
</rule>
<rule ref="MediaWiki.Usage.ForbiddenFunctions.escapeshellarg">
<!--
Continue to allow existing violations, but enable the sniff to prevent
any new occurrences.
-->
<exclude-pattern>*/includes/libs/filebackend/FSFileBackend\.php</exclude-pattern>
<exclude-pattern>*/includes/shell/Command\.php</exclude-pattern>
<exclude-pattern>*/includes/shell/Shell\.php</exclude-pattern>
</rule>
<rule ref="MediaWiki.Usage.ForbiddenFunctions.popen">
<!--
Continue to allow existing violations, but enable the sniff to prevent

View file

@ -33,6 +33,11 @@ of automatic detection of possible phone numbers in a webpage in Safari on iOS.
* …
==== Removed configuration ====
* $wgShellLocale - This setting has been removed as it was a flawed solution
to the problem of locale dependence, MediaWiki will now always set a locale
of C.UTF-8 or C and works around the remaining problems of the C locale by
not using escapeshellarg. This follows the direction of PHP 8.0, which sets
a locale of C by default instead of respecting LC_CTYPE.
* …
=== New user-facing features in 1.38 ===

View file

@ -9535,42 +9535,6 @@ $wgShellCgroup = false;
*/
$wgPhpCli = '/usr/bin/php';
/**
* Locale for LC_ALL, to provide a known environment for locale-sensitive operations
*
* For Unix-like operating systems, this should be set to C.UTF-8 or an
* equivalent to provide the most consistent behavior for locale-sensitive
* C library operations across different-language wikis. If that locale is not
* available, use another locale that has a UTF-8 character set.
*
* This setting mainly affects the behavior of C library functions, including:
* - String collation (order when sorting using locale-sensitive comparison)
* - For example, whether "Å" and "A" are considered to be the same letter or
* different letters and if different whether it comes after "A" or after
* "Z", and whether sorting is case sensitive.
* - String character set (how characters beyond basic ASCII are represented)
* - We need this to be a UTF-8 character set to work around
* https://bugs.php.net/bug.php?id=45132
* - Language used for low-level error messages.
* - Formatting of date/time and numeric values (e.g. '.' versus ',' as the
* decimal separator)
*
* MediaWiki provides its own methods and classes to perform many
* locale-sensitive operations, which are designed to be able to vary locale
* based on wiki language or user preference:
* - MediaWiki's Collation class should generally be used instead of the C
* library collation functions when locale-sensitive sorting is needed.
* - MediaWiki's Message class should be used for localization of messages
* displayed to the user.
* - MediaWiki's Language class should be used for formatting numeric and
* date/time values.
*
* @note If multiple wikis are being served from the same process (e.g. the
* same fastCGI or Apache server), this setting must be the same on all those
* wikis.
*/
$wgShellLocale = 'C.UTF-8';
/**
* Method to use to restrict shell commands
*

View file

@ -1870,11 +1870,12 @@ function wfStringToBool( $val ) {
}
/**
* Version of escapeshellarg() that works better on Windows.
* Locale-independent version of escapeshellarg()
*
* Originally, this fixed the incorrect use of single quotes on Windows
* (https://bugs.php.net/bug.php?id=26285) and the locale problems on Linux in
* PHP 5.2.6+ (bug backported to earlier distro releases of PHP).
* PHP 5.2.6+ (https://bugs.php.net/bug.php?id=54391). The second bug is still
* open as of 2021.
*
* @param string|string[] ...$args strings to escape and glue together,
* or a single array of strings parameter

View file

@ -169,10 +169,13 @@ ExtensionRegistry::getInstance()->loadFromQueue();
// Don't let any other extensions load
ExtensionRegistry::getInstance()->finish();
// Set the configured locale on all requests for consistency
// This must be after LocalSettings.php (and is informed by the installer).
putenv( "LC_ALL=$wgShellLocale" );
setlocale( LC_ALL, $wgShellLocale );
// Set an appropriate locale (T291234)
// setlocale() will return the locale name actually set.
// The putenv() is meant to propagate the choice of locale to shell commands
// so that they will interpret UTF-8 correctly. If you have a problem with a
// shell command and need to send a special locale, you can override the locale
// with Command::environment().
putenv( "LC_ALL=" . setlocale( LC_ALL, 'C.UTF-8', 'C' ) );
/**
* Expand dynamic defaults and shortcuts

View file

@ -28,7 +28,6 @@ use MediaWiki\HookContainer\HookContainer;
use MediaWiki\HookContainer\StaticHookRegistry;
use MediaWiki\Interwiki\NullInterwikiLookup;
use MediaWiki\MediaWikiServices;
use MediaWiki\Shell\Shell;
/**
* The Installer helps admins create or upgrade their wiki.
@ -144,7 +143,6 @@ abstract class Installer {
'envCheckGit',
'envCheckServer',
'envCheckPath',
'envCheckShellLocale',
'envCheckUploadsDirectory',
'envCheckLibicu',
'envCheckSuhosinMaxValueLength',
@ -191,7 +189,6 @@ abstract class Installer {
'wgMetaNamespace',
'wgDeletedDirectory',
'wgEnableUploads',
'wgShellLocale',
'wgSecretKey',
'wgUseInstantCommons',
'wgUpgradeKey',
@ -1011,87 +1008,6 @@ abstract class Installer {
return true;
}
/**
* Environment check for preferred locale in shell
* @return bool
*/
protected function envCheckShellLocale() {
$os = php_uname( 's' );
$supported = [ 'Linux', 'SunOS', 'HP-UX', 'Darwin' ]; # Tested these
if ( !in_array( $os, $supported ) ) {
return true;
}
if ( Shell::isDisabled() ) {
return true;
}
# Get a list of available locales.
$result = Shell::command( '/usr/bin/locale', '-a' )->execute();
if ( $result->getExitCode() != 0 ) {
return true;
}
$lines = $result->getStdout();
$lines = array_map( 'trim', explode( "\n", $lines ) );
$candidatesByLocale = [];
$candidatesByLang = [];
foreach ( $lines as $line ) {
if ( $line === '' ) {
continue;
}
if ( !preg_match( '/^([a-zA-Z]+)(_[a-zA-Z]+|)\.(utf8|UTF-8)(@[a-zA-Z_]*|)$/i', $line, $m ) ) {
continue;
}
list( , $lang, , , ) = $m;
$candidatesByLocale[$m[0]] = $m;
$candidatesByLang[$lang][] = $m;
}
# Try the current value of LANG.
if ( isset( $candidatesByLocale[getenv( 'LANG' )] ) ) {
$this->setVar( 'wgShellLocale', getenv( 'LANG' ) );
return true;
}
# Try the most common ones.
$commonLocales = [ 'C.UTF-8', 'en_US.UTF-8', 'en_US.utf8', 'de_DE.UTF-8', 'de_DE.utf8' ];
foreach ( $commonLocales as $commonLocale ) {
if ( isset( $candidatesByLocale[$commonLocale] ) ) {
$this->setVar( 'wgShellLocale', $commonLocale );
return true;
}
}
# Is there an available locale in the Wiki's language?
$wikiLang = $this->getVar( 'wgLanguageCode' );
if ( isset( $candidatesByLang[$wikiLang] ) ) {
$m = reset( $candidatesByLang[$wikiLang] );
$this->setVar( 'wgShellLocale', $m[0] );
return true;
}
# Are there any at all?
if ( count( $candidatesByLocale ) ) {
$m = reset( $candidatesByLocale );
$this->setVar( 'wgShellLocale', $m[0] );
return true;
}
# Give up.
return true;
}
/**
* Environment check for the permissions of the uploads directory
* @return bool

View file

@ -56,7 +56,7 @@ class LocalSettingsGenerator {
$confItems = array_merge(
[
'wgServer', 'wgScriptPath',
'wgPasswordSender', 'wgImageMagickConvertCommand', 'wgShellLocale',
'wgPasswordSender', 'wgImageMagickConvertCommand',
'wgLanguageCode', 'wgLocaltimezone', 'wgEnableEmail', 'wgEnableUserEmail',
'wgDiff3', 'wgEnotifUserTalk', 'wgEnotifWatchlist', 'wgEmailAuthentication',
'wgDBtype', 'wgSecretKey', 'wgRightsUrl', 'wgSitename', 'wgRightsIcon',
@ -242,13 +242,6 @@ class LocalSettingsGenerator {
$magic = '';
}
if ( !$this->values['wgShellLocale'] ) {
$this->values['wgShellLocale'] = 'C.UTF-8';
$locale = '#';
} else {
$locale = '';
}
$metaNamespace = '';
if ( $this->values['wgMetaNamespace'] !== $this->values['wgSitename'] ) {
$metaNamespace = "\$wgMetaNamespace = \"{$this->values['wgMetaNamespace']}\";\n";
@ -392,14 +385,6 @@ ${serverSetting}
# with MediaWiki developers to help guide future development efforts.
\$wgPingback = {$this->values['wgPingback']};
## If you use ImageMagick (or any other shell command) on a
## Linux server, this will need to be set to the name of an
## available UTF-8 locale. This should ideally be set to an English
## language locale so that the behaviour of C library functions will
## be consistent with typical installations. Use \$wgLanguageCode to
## localise the wiki.
{$locale}\$wgShellLocale = \"{$this->values['wgShellLocale']}\";
# Site language code, should be one of the list in ./languages/data/Names.php
\$wgLanguageCode = \"{$this->values['wgLanguageCode']}\";

View file

@ -44,32 +44,10 @@ class FloatDef extends NumericDef {
return $this->checkRange( $ret, $name, $value, $settings, $options );
}
/**
* Attempt to fix locale weirdness
*
* We don't have any usable number formatting function that's not locale-aware,
* and `setlocale()` isn't safe in multithreaded environments. Sigh.
*
* @param string $value Value to fix
* @return string
*/
private function fixLocaleWeirdness( $value ) {
$localeData = localeconv();
if ( $localeData['decimal_point'] !== '.' ) {
$value = strtr( $value, [
$localeData['decimal_point'] => '.',
// PHP's number formatting currently uses only the first byte from 'decimal_point'.
// See upstream bug https://bugs.php.net/bug.php?id=78113
$localeData['decimal_point'][0] => '.',
] );
}
return $value;
}
public function stringifyValue( $name, $value, array $settings, array $options ) {
// Ensure sufficient precision for round-tripping. PHP_FLOAT_DIG was added in PHP 7.2.
$digits = defined( 'PHP_FLOAT_DIG' ) ? PHP_FLOAT_DIG : 15;
return $this->fixLocaleWeirdness( sprintf( "%.{$digits}g", $value ) );
return sprintf( "%.{$digits}g", $value );
}
public function getHelpInfo( $name, array $settings, array $options ) {

View file

@ -41,6 +41,7 @@
* @ingroup FileBackend
*/
use Shellbox\Shellbox;
use Wikimedia\AtEase\AtEase;
use Wikimedia\Timestamp\ConvertibleTimestamp;
@ -822,9 +823,9 @@ class FSFileBackend extends FileBackendStore {
// inode are unaffected since it writes to a new inode, and (c) new threads reading
// the file will either totally see the old version or totally see the new version
$fsStagePath = $this->makeStagingPath( $fsDstPath );
$encSrc = escapeshellarg( $this->cleanPathSlashes( $fsSrcPath ) );
$encStage = escapeshellarg( $this->cleanPathSlashes( $fsStagePath ) );
$encDst = escapeshellarg( $this->cleanPathSlashes( $fsDstPath ) );
$encSrc = Shellbox::escape( $this->cleanPathSlashes( $fsSrcPath ) );
$encStage = Shellbox::escape( $this->cleanPathSlashes( $fsStagePath ) );
$encDst = Shellbox::escape( $this->cleanPathSlashes( $fsDstPath ) );
if ( $this->os === 'Windows' ) {
// https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/copy
// https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/move
@ -854,8 +855,8 @@ class FSFileBackend extends FileBackendStore {
private function makeMoveCommand( $fsSrcPath, $fsDstPath, $ignoreMissing = false ) {
// https://manpages.debian.org/buster/coreutils/mv.1.en.html
// https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/move
$encSrc = escapeshellarg( $this->cleanPathSlashes( $fsSrcPath ) );
$encDst = escapeshellarg( $this->cleanPathSlashes( $fsDstPath ) );
$encSrc = Shellbox::escape( $this->cleanPathSlashes( $fsSrcPath ) );
$encDst = Shellbox::escape( $this->cleanPathSlashes( $fsDstPath ) );
if ( $this->os === 'Windows' ) {
$writeCmd = "MOVE /Y $encSrc $encDst 2>&1";
$cmd = $ignoreMissing ? "IF EXIST $encSrc $writeCmd" : $writeCmd;
@ -875,7 +876,7 @@ class FSFileBackend extends FileBackendStore {
private function makeUnlinkCommand( $fsPath, $ignoreMissing = false ) {
// https://manpages.debian.org/buster/coreutils/rm.1.en.html
// https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/del
$encSrc = escapeshellarg( $this->cleanPathSlashes( $fsPath ) );
$encSrc = Shellbox::escape( $this->cleanPathSlashes( $fsPath ) );
if ( $this->os === 'Windows' ) {
$writeCmd = "DEL /Q $encSrc 2>&1";
$cmd = $ignoreMissing ? "IF EXIST $encSrc $writeCmd" : $writeCmd;

View file

@ -4338,7 +4338,6 @@ class Parser {
}
# HTML IDs must be case-insensitively unique for IE compatibility (T12721).
# @todo FIXME: We may be changing them depending on the current locale.
$arrayKey = strtolower( $safeHeadline );
if ( $fallbackHeadline === false ) {
$fallbackArrayKey = false;

View file

@ -151,11 +151,12 @@ class Shell {
}
/**
* Version of escapeshellarg() that works better on Windows.
* Locale-independent version of escapeshellarg()
*
* Originally, this fixed the incorrect use of single quotes on Windows
* (https://bugs.php.net/bug.php?id=26285) and the locale problems on Linux in
* PHP 5.2.6+ (bug backported to earlier distro releases of PHP).
* PHP 5.2.6+ (https://bugs.php.net/bug.php?id=54391). The second bug is still
* open as of 2021.
*
* @param string|string[] ...$args strings to escape and glue together, or a single
* array of strings parameter. Null values are ignored.

View file

@ -154,53 +154,6 @@ class FloatDefTest extends TypeDefTestCase {
];
}
/** @dataProvider provideLocales */
public function testStringifyValue_localeWeirdness( $locale ) {
static $cats = [ LC_ALL, LC_MONETARY, LC_NUMERIC ];
$curLocales = [];
foreach ( $cats as $c ) {
$curLocales[$c] = setlocale( $c, '0' );
if ( $curLocales[$c] === false ) {
$this->markTestSkipped( 'Locale support is unavailable' );
}
}
try {
foreach ( $cats as $c ) {
if ( setlocale( $c, $locale ) === false ) {
$this->markTestSkipped( "Locale \"$locale\" is unavailable" );
}
}
$typeDef = $this->getInstance( new SimpleCallbacks( [] ), [] );
$this->assertSame( '123456.789', $typeDef->stringifyValue( 'test', 123456.789, [], [] ) );
$this->assertSame( '-123456.789', $typeDef->stringifyValue( 'test', -123456.789, [], [] ) );
$this->assertSame( '1.0e+20', $typeDef->stringifyValue( 'test', 1e20, [], [] ) );
$this->assertSame( '1.0e-20', $typeDef->stringifyValue( 'test', 1e-20, [], [] ) );
} finally {
foreach ( $curLocales as $c => $v ) {
setlocale( $c, $v );
}
}
}
public function provideLocales() {
return [
// May as well test these.
[ 'C' ],
[ 'C.UTF-8' ],
// Some hopefullt-common locales with decimal_point = ',' and thousands_sep = '.'
[ 'de_DE' ],
[ 'de_DE.utf8' ],
[ 'es_ES' ],
[ 'es_ES.utf8' ],
// This one, on my system at least, has decimal_point as U+066B.
[ 'ps_AF' ],
];
}
public function provideGetInfo() {
return [
'Basic test' => [