wiki.techinc.nl/includes/collation/NumericUppercaseCollation.php

<?php
/**
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License along
 * with this program; if not, write to the Free Software Foundation, Inc.,
 * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
 * http://www.gnu.org/copyleft/gpl.html
 *
 * @file
 */

use MediaWiki\Languages\LanguageFactory;

/**
 * Collation that orders text with numbers "naturally", so that 'Foo 1' < 'Foo 2' < 'Foo 12'.
 *
 * Note that this only works in terms of sequences of digits, and the behavior for decimal fractions
 * or pretty-formatted numbers may be unexpected.
 *
 * Digits are based on the wiki's content language settings. If
 * you change the content language of a wiki you will need to run
 * updateCollation.php --force. Only ASCII digits (0-9) and the content
 * language's localized digits are recognized. Localized digits from other languages
 * or other Unicode digit equivalents (e.g. ４, 𝟜, ⓸, ⁴) will not count.
 *
 * @since 1.28
 */
class NumericUppercaseCollation extends UppercaseCollation {

	/**
	 * @var Language How to convert digits (usually the content language)
	 */
	private $digitTransformLang;

	/**
	 * @param LanguageFactory $languageFactory
	 * @param string|Language $digitTransformLang How to convert digits.
	 *  For example, if given language "my" then ၇ is treated like 7.
	 *  Usually this is the content language.
	 */
	public function __construct(
		LanguageFactory $languageFactory,
		$digitTransformLang
	) {
		$this->digitTransformLang = $digitTransformLang instanceof Language
			? $digitTransformLang
			: $languageFactory->getLanguage( $digitTransformLang );
		parent::__construct( $languageFactory );
	}

	public function getSortKey( $string ) {
		$sortkey = parent::getSortKey( $string );
		$sortkey = $this->convertDigits( $sortkey );
		// For each sequence of digits, strip its leading zeros, then insert the digit '0' followed by
		// the length of the remaining sequence (encoded in two bytes) before it. The '0' keeps the
		// number in the position where digits would normally sort; the length is then compared, putting
		// shorter numbers before longer ones; if the lengths are identical, the digits themselves are
		// compared, which gives the correct order for numbers of equal length.
		$sortkey = preg_replace_callback( '/\d+/', static function ( $matches ) {
			// Strip any leading zeros
			$number = ltrim( $matches[0], '0' );
			$len = strlen( $number );
			// This allows sequences of up to 65535 numeric characters to be handled correctly. One byte
			// would allow only for 255, which doesn't feel future-proof.
			$prefix = chr( intdiv( $len, 256 ) ) . chr( $len % 256 );
			return '0' . $prefix . $number;
		}, $sortkey );

		return $sortkey;
	}

	/**
	 * Convert localized digits to English (ASCII) digits.
	 *
	 * Based on Language::parseFormattedNumber but without commas.
	 *
	 * @param string $string Sortkey whose localized digits should be converted
	 * @return string Sortkey with all localized digits replaced with ASCII digits.
	 */
	private function convertDigits( $string ) {
		$table = $this->digitTransformLang->digitTransformTable();
		if ( $table ) {
			$table = array_filter( $table );
			$flipped = array_flip( $table );
			// Some languages also have separators such as commas in this table; keep only digits.
			$flipped = array_filter( $flipped, 'is_numeric' );
			$string = strtr( $string, $flipped );
		}
		return $string;
	}

	public function getFirstLetter( $string ) {
		$convertedString = $this->convertDigits( $string );

		if ( preg_match( '/^\d/', $convertedString ) ) {
			return wfMessage( 'category-header-numerals' )
				->numParams( 0, 9 )
				->text();
		} else {
			return parent::getFirstLetter( $string );
		}
	}
}
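
The sort-key scheme above can be tried outside MediaWiki with a small standalone sketch. The helpers sketchConvertDigits() and sketchNumericSortKey() are hypothetical names used only for illustration, strtoupper() stands in for the parent collation's uppercasing, and the Burmese table is written by hand on the assumption that a digit transform table maps ASCII digits to localized digits.

```php
<?php
// Illustrative sketch only: sketchConvertDigits() and sketchNumericSortKey()
// are hypothetical helpers, not MediaWiki APIs.

/**
 * Map localized digits back to ASCII digits.
 * $table is assumed to have the shape: ASCII digit => localized digit.
 */
function sketchConvertDigits( string $text, array $table ): string {
	// Drop empty entries, then flip so that localized digits become the keys.
	$flipped = array_flip( array_filter( $table ) );
	// Keep only entries that map to a digit; separators such as ',' are dropped.
	$flipped = array_filter( $flipped, 'is_numeric' );
	// array_flip() yields integer values for digit keys; strtr() gets strings.
	return strtr( $text, array_map( 'strval', $flipped ) );
}

/**
 * Build a byte string whose plain binary order matches "natural" numeric order.
 */
function sketchNumericSortKey( string $text, array $table = [] ): string {
	// Stand-in for UppercaseCollation::getSortKey() (assumption: ASCII uppercasing).
	$key = sketchConvertDigits( strtoupper( $text ), $table );
	return preg_replace_callback( '/\d+/', static function ( array $m ): string {
		// Leading zeros do not change the value, so they are stripped first.
		$number = ltrim( $m[0], '0' );
		$len = strlen( $number );
		// '0' marker, two-byte big-endian length, then the digits themselves.
		return '0' . chr( intdiv( $len, 256 ) ) . chr( $len % 256 ) . $number;
	}, $key );
}

// Hand-written Burmese ("my") digit table, assumed shape as described above.
$burmese = [
	'0' => '၀', '1' => '၁', '2' => '၂', '3' => '၃', '4' => '၄',
	'5' => '၅', '6' => '၆', '7' => '၇', '8' => '၈', '9' => '၉',
];

$titles = [ 'Foo 12', 'Foo 2', 'Foo 007', 'Foo 1' ];
usort( $titles, static function ( string $a, string $b ): int {
	return strcmp( sketchNumericSortKey( $a ), sketchNumericSortKey( $b ) );
} );
// $titles is now ordered: Foo 1, Foo 2, Foo 007, Foo 12
```

Because the length prefix is compared before the digits, 'Foo 12' sorts after every one-digit title, and stripping leading zeros makes 'Foo 007' compare as 7. Passing $burmese as the second argument makes 'Foo ၁၂' produce the same key as 'Foo 12'.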