wiki.techinc.nl/languages/classes/LanguageZh_hans.php

<?php

/**
 * @ingroup Language
 */
class LanguageZh_hans extends Language {
	function hasWordBreaks() {
		return false;
	}
	
	function stripForSearch( $string ) {
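		// Eventually this should do real word segmentation;
		// for now, just treat each character as a word.
		//
		// Note that we put a space on both sides to cover cases
		// where a number or Latin char follows a Han char.
		//
		// @fixme Only do this for Han characters. The byte pattern
		// below matches every multibyte UTF-8 sequence (a lead byte
		// 0xC0-0xFF followed by continuation bytes 0x80-0xBF), so
		// accented Latin, Cyrillic, etc. are split up as well.
		$t = preg_replace(
			"/([\\xc0-\\xff][\\x80-\\xbf]*)/",
			" $1 ",
			$string
		);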
		$t = preg_replace( '/ +/', ' ', $t );
		$t = trim( $t );
		return parent::stripForSearch( $t );
	}
}
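The segmentation step in stripForSearch() can be tried in isolation. The following is a minimal standalone sketch, assuming PHP 7 or later with PCRE. The function names segmentLikeZhHans() and segmentHanOnly() are hypothetical, and neither calls parent::stripForSearch(), so the base Language Unicode stripping is not reproduced. The second function is only one possible way to address the @fixme, using PCRE's \p{Han} script property in UTF-8 mode; it is not MediaWiki code.

```php
<?php
// Hypothetical standalone copy of the segmentation step in
// LanguageZh_hans::stripForSearch(), without the parent:: call.
function segmentLikeZhHans(string $string): string {
    // Byte-oriented match: a UTF-8 lead byte plus its continuation bytes,
    // so every multibyte character (not only Han) gets spaces around it.
    $t = preg_replace("/([\\xc0-\\xff][\\x80-\\xbf]*)/", " $1 ", $string);
    $t = preg_replace('/ +/', ' ', $t); // collapse runs of spaces
    return trim($t);
}

// Hypothetical Han-only variant (one way to address the @fixme):
// the /u modifier makes PCRE treat the subject as UTF-8, and \p{Han}
// matches only characters of the Han script.
function segmentHanOnly(string $string): string {
    // preg_replace() returns null on error, e.g. invalid UTF-8 under /u.
    $t = preg_replace('/(\p{Han})/u', ' $1 ', $string) ?? '';
    $t = preg_replace('/ +/', ' ', $t);
    return trim($t);
}

echo segmentLikeZhHans("中文2009"), "\n"; // 中 文 2009
echo segmentLikeZhHans("café"), "\n";     // caf é
echo segmentHanOnly("café中文"), "\n";     // café 中 文
```

The "café" case shows why the @fixme exists: the byte pattern splits off the accented "é", while the \p{Han} version leaves non-Han text intact.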
Commit history (from the repository blame view):

- 2004-09-16 20:18:49 +00:00: Moved most content of LanguageZh.php to LanguageZh_cn.php. Now LanguageZh.php mainly handles the conversion between Traditional and Simplified Chinese.
- 2006-07-01 19:05:46 +00:00: Revert breakage from 15190. Broke 'Wikipedia' meta namespace.
- 2007-09-27 15:40:35 +00:00: (bug 11284) Update Chinese translations. Patch by Shinjiman.
- 2008-05-20 17:13:28 +00:00: WARNING: HUGE COMMIT. Doxygen documentation update:
  - Changed all @addtogroup to @ingroup. @addtogroup adds the comment to the group description, but doesn't add the file, class, function, ... to the group like @ingroup does. See for example http://svn.wikimedia.org/doc/group__SpecialPage.html where it's impossible to see related files, classes, ... that should belong to that group.
  - Added @file to file descriptions; it seems that it should be explicitly declared for file descriptions, otherwise doxygen will think that the comment documents the first class, variable, function, ... that is in that file.
  - Removed some empty comments.
  - Removed some ?>
  - Added the following groups: ExternalStorage, JobQueue, MaintenanceLanguage.
  - One more thing: there are still a lot of warnings when generating the doc.
- 2009-06-24 02:27:51 +00:00: (bug 8445) Multiple-character search terms are now handled properly for Chinese. Big fixup for Chinese word breaks and variant conversions in the MySQL search backend:
  - Removed redundant variant terms for Chinese, which forces all search indexing to canonical zh-hans.
  - Added parens to properly group variants for languages such as Serbian, which do need them at search time.
  - Added quotes to properly group multi-word terms coming out of stripForSearch, as for Chinese where we segment up the characters. This is based on the Language::hasWordBreaks() check.
  - Also cleaned up LanguageZh_hans::stripForSearch() to just do segmentation and pass on the Unicode stripping to the base Language implementation, avoiding scary code duplication. Segmentation was already pulled up to LanguageZh, but was being run again at the second level.
  - Made a fix to Chinese word segmentation to handle the case where a Han character is followed by a Latin char or numeral; a space is now added after as well. Spaces are then normalized for prettiness.
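The bug 8445 commit describes how the MySQL search backend uses Language::hasWordBreaks() to quote the multi-word output of stripForSearch() and uses parentheses to group variants. A minimal sketch of that idea follows, assuming MySQL boolean-mode full-text syntax (a leading "+" marks a required term, double quotes a phrase, parentheses a group). The helper buildBooleanTerm() and its parameters are hypothetical; this is not the actual MediaWiki search backend code, and the Serbian spellings are illustrative only.

```php
<?php
// Hypothetical helper, not MediaWiki's MySQL search backend: builds one
// boolean-mode term from the search variants of a single word.
function buildBooleanTerm(array $variants, bool $hasWordBreaks): string {
    $parts = [];
    foreach (array_unique($variants) as $variant) {
        // For languages without word breaks (hasWordBreaks() === false),
        // stripForSearch() yields space-separated characters such as "中 文";
        // quoting asks MySQL to match them as one phrase.
        if (!$hasWordBreaks && strpos($variant, ' ') !== false) {
            $variant = '"' . $variant . '"';
        }
        $parts[] = $variant;
    }
    if (count($parts) === 0) {
        return '';
    }
    // Parentheses group several variants (e.g. Latin and Cyrillic
    // spellings) so the leading "+" applies to the group as a whole.
    return count($parts) > 1
        ? '+(' . implode(' ', $parts) . ')'
        : '+' . $parts[0];
}

echo buildBooleanTerm(['中 文'], false), "\n";     // +"中 文"
echo buildBooleanTerm(['pas', 'пас'], true), "\n"; // +(pas пас)
```

Because Chinese is indexed only in canonical zh-hans, a Chinese term typically reaches this step as a single variant, so only the quoting applies.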