wiki.techinc.nl/includes/language/LanguageConverterIcu.php
David Kamholz 9cb5187944 Implement Balinese language converter
This patch implements the BanConverter class for Balinese. Its purpose is to transliterate Balinese text from Balinese script to Latin script. Latin to Balinese is not currently supported, because (1) the Latin transliteration is not fully one-to-one, and (2) I'm not aware of any users who currently need Latin to Balinese.

The converter supports three distinct Latin transliteration variants: ban-dharma, ban-palmleaf, and ban-puri-kauhan-ubud. All three variants have been requested by different Balinese community members working with Balinese palm-leaf manuscripts. ban-puri-kauhan-ubud is the default, as it is the most familiar to lontar scholars, but Balinese Wikisource users will be able to select their preferred variant via a user script.

Conversion is accomplished via ICU Rule-Based Transliterators, bindings for which are available through the Intl extension.
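
The Intl extension exposes ICU's rule-based transliterators through PHP's Transliterator class. A minimal sketch of compiling and applying a rule set follows, assuming PHP 7+ with Intl loaded. The two rules (Balinese KA, U+1B13, with and without the virama ADEG ADEG, U+1B44) are a hypothetical toy example for illustration, not the rules BanConverter ships.

```php
<?php
// Minimal sketch, assuming PHP 7+ with the Intl extension loaded.
// Hypothetical toy rules, NOT BanConverter's real transliteration rules.

// ICU tries the rules in order at each position, so the longer source
// (KA followed by ADEG ADEG, which suppresses the inherent vowel) comes
// before the bare KA, which is given the inherent "a".
$rules = '\u1B13\u1B44 > k; \u1B13 > ka;';

$transliterator = Transliterator::createFromRules( $rules );
if ( $transliterator === null ) {
	// createFromRules() returns null when the rules fail to compile.
	throw new RuntimeException( intl_get_error_message() );
}

// Two KA letters, the second one closed with ADEG ADEG.
$input = "\u{1B13}\u{1B13}\u{1B44}";
$output = $transliterator->transliterate( $input );
echo $output, "\n"; // kak
```

Listing the longer source first matters: with the order reversed, the bare KA rule would always match first and hide the longer rule, which ICU rejects as a rule-masking error.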

This patchset adds the abstract class LanguageConverterIcu and makes BanConverter inherit from it, which should make future ICU-based LanguageConverters easier to write.
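
The hooks that LanguageConverterIcu leaves to subclasses can be sketched outside MediaWiki. The standalone class below is hypothetical: ToyIcuConverter, the toy-* variant codes and the Greek chi rules are invented for illustration, and the class does not extend LanguageConverterIcu. It mirrors getIcuRules(), getTransliteratorAliases() and the lazy, cached construction in getTransliterators(), and adds a pass-through for variants that have no rules.

```php
<?php
// Minimal standalone sketch, assuming PHP 7+ with the Intl extension.
// ToyIcuConverter and the "toy-*" variant codes are hypothetical; the class
// only mirrors the hooks of LanguageConverterIcu and does not extend it.

class ToyIcuConverter {
	/** @var Transliterator[]|null Built lazily, like $mTransliterators */
	private $transliterators = null;

	/** Variant code => ICU rules (the role of a subclass's getIcuRules()). */
	protected function getIcuRules(): array {
		return [
			// Two toy romanizations of Greek chi, differing in one rule.
			'toy-kh' => 'χ > kh; α > a;',
			'toy-ch' => 'χ > ch; α > a;',
		];
	}

	/** Alias => main variant (the role of getTransliteratorAliases()). */
	protected function getTransliteratorAliases(): array {
		return [ 'toy' => 'toy-kh' ];
	}

	/** Compile each rule set once, then point aliases at the same objects. */
	private function getTransliterators(): array {
		if ( $this->transliterators === null ) {
			$this->transliterators = [];
			foreach ( $this->getIcuRules() as $variant => $rules ) {
				$this->transliterators[$variant] = Transliterator::createFromRules( $rules );
			}
			foreach ( $this->getTransliteratorAliases() as $alias => $variant ) {
				$this->transliterators[$alias] = $this->transliterators[$variant];
			}
		}
		return $this->transliterators;
	}

	public function icuTranslate( string $text, string $variant ): string {
		$transliterators = $this->getTransliterators();
		if ( !isset( $transliterators[$variant] ) ) {
			return $text; // no rules for this variant: leave the text as-is
		}
		return $transliterators[$variant]->transliterate( $text );
	}
}

$converter = new ToyIcuConverter();
echo $converter->icuTranslate( 'χα', 'toy-kh' ), "\n"; // kha
echo $converter->icuTranslate( 'χα', 'toy-ch' ), "\n"; // cha
echo $converter->icuTranslate( 'χα', 'toy' ), "\n"; // kha (alias of toy-kh)
```

Because aliases reuse the compiled object of their main variant, adding an alias costs no extra rule compilation.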

Bug: T263082
Change-Id: Ic3a46a215fbf020a022726e6b130b1d25496e284
2020-12-21 12:45:41 -08:00


<?php
/**
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License along
 * with this program; if not, write to the Free Software Foundation, Inc.,
 * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
 * http://www.gnu.org/copyleft/gpl.html
 *
 * @file
 */

/**
 * A class that extends LanguageConverterSpecific for converters that use
 * ICU rule-based transliterators.
 *
 * @ingroup Language
 */
abstract class LanguageConverterIcu extends LanguageConverterSpecific {

	/**
	 * @var Transliterator[]
	 */
	protected $mTransliterators;

	/**
	 * Creates empty tables. mTransliterators will be used instead.
	 */
	protected function loadDefaultTables() {
		$this->mTables = [];
		foreach ( $this->mVariants as $variant ) {
			$this->mTables[$variant] = new ReplacementArray();
		}
	}

	/**
	 * Convert text to the given variant. The parent class applies the
	 * replacement tables, which loadDefaultTables() leaves empty, so the
	 * actual conversion is done by the ICU transliterator. Blank text is
	 * returned unchanged.
	 *
	 * @param string $text Text to convert
	 * @param string $variant Variant language code
	 * @return string Translated text
	 */
	public function translate( $text, $variant ) {
		$text = parent::translate( $text, $variant );
		if ( trim( $text ) ) {
			$text = $this->icuTranslate( $text, $variant );
		}
		return $text;
	}

	/**
	 * Translate a string to a variant using an ICU transliterator.
	 *
	 * @param string $text Text to convert
	 * @param string $variant Variant language code
	 * @return string Translated text
	 */
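	public function icuTranslate( $text, $variant ) {
		$transliterators = $this->getTransliterators();
		// Variants with no ICU rules (and no alias) are returned unchanged
		// instead of calling transliterate() on an undefined entry.
		if ( !isset( $transliterators[$variant] ) ) {
			return $text;
		}
		return $transliterators[$variant]->transliterate( $text );
	}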

	/**
	 * Get the array mapping variants to ICU transliteration rules.
	 * Subclasses must implement this.
	 *
	 * @return string[]
	 */
	abstract protected function getIcuRules();

	/**
	 * Get the array mapping variants to ICU transliterators.
	 *
	 * @return Transliterator[]
	 */
	protected function getTransliterators() {
		// Compile each variant's rules once, on first use, and cache them.
		if ( $this->mTransliterators === null ) {
			$this->mTransliterators = [];
			foreach ( $this->getIcuRules() as $variant => $rule ) {
				$this->mTransliterators[$variant] = Transliterator::createFromRules( $rule );
			}
			// Aliases share the transliterator object of their main variant.
			foreach ( $this->getTransliteratorAliases() as $alias => $variant ) {
				$this->mTransliterators[$alias] = $this->mTransliterators[$variant];
			}
		}
		return $this->mTransliterators;
	}

	/**
	 * Get the array mapping variant aliases to their main variants.
	 *
	 * @return string[]
	 */
	protected function getTransliteratorAliases() {
		return [];
	}
}