requiring customization of MySQL server settings

Short words are padded so they now get indexed. Yay!

Adapted part of Werdna's patch, with some additional cleanup:
* Using 'U00' to pad instead of 'SMALL' to reduce false positives (e.g. a search for "small*" could match "Smallville" and "SMALLc")
* Checking the server's ft_min_word_len variable to see if we need to do anything. This preserves index compatibility with existing installations that have customized their index length.
* Some further cleanup of redundant code -- just toss everything through lc() and be done with it :D
* Cleaned out some more evals in the zh and yue classes :P
* Fixed the yue class to call the parent adjustor properly
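The padding idea above can be illustrated with a minimal sketch. It is not the code from this commit: the helper name `padShortWords()`, appending the marker as a suffix, and repeating it until the token reaches the minimum length are all assumptions made for illustration. The minimum length is passed in as a parameter; on a real server it would come from the `ft_min_word_len` variable (e.g. via `SHOW GLOBAL VARIABLES LIKE 'ft_min_word_len'`). The word pattern is ASCII-only for simplicity.

```php
<?php
/**
 * Hypothetical sketch: pad words shorter than MySQL's ft_min_word_len so the
 * fulltext index keeps them. Name, suffix placement and repeat rule are
 * assumptions, not the commit's actual implementation.
 */
function padShortWords( string $text, int $minWordLen, string $pad = 'U00' ): string {
	// A minimum of 1 (or less) means every word is already indexed, so leave
	// the text untouched to stay compatible with customized servers.
	if ( $minWordLen <= 1 ) {
		return $text;
	}
	return preg_replace_callback(
		'/\b\w+\b/',
		function ( array $m ) use ( $minWordLen, $pad ): string {
			$word = $m[0];
			// Append the marker until the token reaches the minimum length.
			while ( strlen( $word ) < $minWordLen ) {
				$word .= $pad;
			}
			return $word;
		},
		$text
	);
}

// With the MySQL default of 4, "a", "cat" and "bob" get padded.
echo padShortWords( 'a cat named bob', 4 ), "\n"; // prints "aU00 catU00 named bobU00"
```

Using an unusual marker such as `U00` rather than a dictionary word keeps padded tokens from matching ordinary prefix searches, which is the false-positive problem the commit message describes with `SMALL`.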
21 lines
502 B
PHP
<?php
/**
 * @ingroup Language
 */
class LanguageYue extends Language {
	function stripForSearch( $string ) {
		wfProfileIn( __METHOD__ );

		// eventually this should be a word segmentation
		// for now just treat each character as a word
		// @fixme only do this for Han characters...
		// Insert a space before each multibyte UTF-8 sequence (a lead byte
		// 0xC0-0xFF followed by any continuation bytes 0x80-0xBF).
		$t = preg_replace(
			"/([\\xc0-\\xff][\\x80-\\xbf]*)/",
			" $1", $string );

		// Do general case folding and UTF-8 armoring
		$t = parent::stripForSearch( $t );
		wfProfileOut( __METHOD__ );
		return $t;
	}
}