Commit graph

93 commits

Author SHA1 Message Date
Amir Aharoni
14d363c29f Add converter for the Talysh language (tly)
Mostly copied from UzConverter.

This is a very simple converter, with bidirectional one to one
correspondence: for every Latin letter there is a corresponding
Cyrillic letter and vice versa. There are no digraphs or punctuation
to convert.

The Latin alphabet is the primary one used for this language today,
and will probably remain so for the foreseeable future, so "tly" remains
the usual code, and "tly-cyrl" is added for Cyrillic.

Language name is changed:
* The main language name is now Latin.
* The word "language" ("зывон") is removed.
* The spelling of the word "Talysh" is based on the Pireyko dictionary.

Bug: T258975
Change-Id: I552e07967ea82e03c413a0b10b129a846aa007c7
2021-02-17 13:49:36 +00:00
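
The one-to-one correspondence described in 14d363c29f can be sketched roughly as follows. This is a minimal Python illustration, not the converter's actual code: the three letter pairs are an assumed subset chosen for the example, the names LATIN_TO_CYRILLIC, CYRILLIC_TO_LATIN and convert are hypothetical, and upper case is not handled.

```python
# Illustrative subset only: these pairs are an assumption for the sketch,
# not the converter's real table. Values are Cyrillic a, be, de.
LATIN_TO_CYRILLIC = {"a": "\u0430", "b": "\u0431", "d": "\u0434"}

# Because the mapping is one to one, inverting it gives the other direction.
CYRILLIC_TO_LATIN = {cyr: lat for lat, cyr in LATIN_TO_CYRILLIC.items()}

# A one-to-one table must not lose entries when it is inverted.
assert len(CYRILLIC_TO_LATIN) == len(LATIN_TO_CYRILLIC)


def convert(text, table):
    """Replace each character found in the table; pass everything else through."""
    return "".join(table.get(ch, ch) for ch in text)
```

With no digraphs and no punctuation to convert, a per-character lookup is enough, and converting in one direction and then back returns the original text.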
Amir Aharoni
b025479670 Update the autonym of language shi
This language is known in English as Shilha, Tashelhit,
Tachelhit, and also by some other names.

The most authoritative modern name in the language itself
is Taclḥit, with the letter c. It appears in the textbook
"Méthode de tachelhit, langue amazighe (berbère) du sud du Maroc"
by Abdallah El Mountassir and also in the online
"Dictionnaire Général de la Langue Amazighe Informatisé"
( https://tal.ircam.ma/dglai/lexieam.php ). It also directly
corresponds to the spelling in the Tifinagh script.

This change was requested by the localizers in translatewiki
and authors in the Wikipedia Incubator in this language.
They asked to define the Latin script as the primary.

The names of the variants are updated in the i18n file:
* The Latin names are changed to Taclḥit, as the Latin autonym.
* The Tifinagh name is added: it was missing, which caused
  the item in the dropdown to be invisible.

Change-Id: I23689a90460110570c54a05587e5084607268035
2021-01-01 10:17:44 +02:00
Amir Aharoni
ad12bd92d8 Add Tyap to Names.php
Bug: T270365
Change-Id: I342318fb170339c8d80c66d08a1018ed15a47dca
2020-12-17 13:45:46 +00:00
Pikne
dbb1a9ef64 Decapitalize autonyms for Võro, Livvi-Karelian and Komi-Permyak
Võro speaker reports that language names in Võro are not
capitalized, just like in Estonian:
https://et.wikipedia.org/w/index.php?diff=5771671
Also decapitalizing Livvinkarjala and Перем Коми, respectively
per usage at https://olo.wikipedia.org/wiki/Piäsivu and
https://koi.wikipedia.org/wiki/Коми-пермяцкӧй_кыв

Change-Id: Ic5c5237f101ffde70990c388bacb202d1af26a79
2020-12-13 08:40:33 +00:00
Amir Aharoni
fa5013dadb Change the autonym of the Altai language
Making it the same as in language-data. This is the main
variety, so there's no need to call it "Southern".

Change-Id: I8c667b55d59b9c40b00e835f5ff1da6dbdf9bdf7
2020-12-09 10:29:58 +00:00
MarcoAurelio
57322997cb Add language support for Madurese (mad)
Bug: T264582
Change-Id: I5f112a8cdd1d441b73c044dfa9363db41cc4597a
2020-10-05 10:13:33 +00:00
MarcoAurelio
c5b6d7901d Names.php: Add support for Nias/Li Niha language
Bug: T263968
Change-Id: Idf8922e80c386a9f22e6e8b0ead19817b118fc38
2020-09-28 08:49:11 +00:00
jenkins-bot
1d85e0df83 Merge "Add mrh Mara to Names.php"
2020-09-15 16:22:49 +00:00
David Kamholz
a7d7c2c7db Add ban-bali to list of known language codes
This change adds ban-bali (Balinese language in Balinese script), falling back to ban (Balinese language in Latin script).

The language code ban-bali is already known to UniversalLanguageSelector and will be used on Balinese Wikisource (currently in development, hosted on Multilingual Wikisource). ban-bali will likely be used as a page language only for now, or for pieces of text within pages. It's important to make it available as a page language so that ULS can apply the correct font.

Bug: T245359
Change-Id: Icaff92904e9d1250c8c84bfcc6cfa3ebcb5230bd
2020-08-18 14:46:32 -07:00
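
The fallback relationship mentioned in a7d7c2c7db can be pictured as a chain lookup. The sketch below is only an illustration in Python: the table holds direct fallbacks stated in commit messages in this log, while the function name fallback_chain and the final fallback to "en" are assumptions made for the example.

```python
# Direct fallbacks as stated in commit messages in this log.
FALLBACKS = {
    "ban-bali": "ban",  # Balinese in Balinese script -> Balinese in Latin script
    "hyw": "hy",        # Western Armenian -> Armenian
    "zgh": "kab",       # Standard Moroccan Amazigh -> Kabyle
}

FINAL_FALLBACK = "en"  # assumption for this sketch


def fallback_chain(code):
    """Return the codes tried after `code`, ending with the final fallback."""
    chain = []
    seen = {code}
    current = code
    while current in FALLBACKS:
        current = FALLBACKS[current]
        if current in seen:  # guard against cycles in the table
            break
        seen.add(current)
        chain.append(current)
    if FINAL_FALLBACK not in seen:
        chain.append(FINAL_FALLBACK)
    return chain
```

For example, fallback_chain("ban-bali") walks to "ban" first and only then to the final fallback, which is why text missing in ban-bali would be looked up in ban.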
Jon Harald Søby
75d22c7659 Add mrh Mara to Names.php
Bug: T259330
Change-Id: I21eb638434a3ca77d55491c08b881383c89911ed
2020-08-07 12:28:15 +02:00
MarcoAurelio
9604fd92d3 languages: Add Southern Altay (alt)
Bug: T254854
Change-Id: I62c5c4a7115d0d9d3a08f038ae90b274934930a2
2020-06-09 09:53:53 +00:00
jenkins-bot
073dbe9ad3 Merge "Fix native sty name from cебертатар to себертатар"
2020-05-21 22:16:06 +00:00
Amir Aharoni
62642ffb49 Add Seediq (trv) to Names.php
Also known as Taroko, e.g. at
https://iso639-3.sil.org/code/trv

Bug: T215023
Change-Id: I1da162c7b996e430b5d636d3f7a7c30321262fac
2020-05-11 09:46:44 +03:00
RhinosF1
0bdcc942fc Add Ladin (lld) to Names.php
Bug: T251369
Change-Id: I238c50d5f667c1989c37caf11865e4d9546f4135
2020-04-29 07:49:54 +00:00
jenkins-bot
9ff388923d Merge "Change Moroccan Arabic (ary) to Arabic script"
2020-04-27 13:42:37 +00:00
Amir Aharoni
99f7e85ef9 Change Moroccan Arabic (ary) to Arabic script
Requested in bug T237672, and the Language committee
doesn't object.

Bug: T237672
Change-Id: I09925647aa03b736c95aaf34c22d16dc7ea4d102
2020-04-23 10:10:57 +03:00
Amir Aharoni
80f5d9a8bd Add Amis (ami) to Names.php
Bug: T201269
Change-Id: Icae6ccd9d67491eceef22418ae078fe08cc680d5
2020-04-17 14:16:36 +03:00
Fomafix
3dcb9d26a1 Add Inari Sami (smn) to Names.php
Autonym in lower case: anarâškielâ

https://iso639-3.sil.org/code/smn
https://www.ethnologue.com/language/smn

Also change the autonym of Southern Sami (sma) from upper case
(Åarjelsaemien) to lower case (åarjelsaemien) to be consistent with
langdb.yaml

Bug: T248299
Change-Id: I69f3322b1414a22dfa707792779d9d59cd7b837d
2020-03-26 14:01:52 +01:00
Reedy
97c6b8db99 Fix native sty name from cебертатар to себертатар
Bug: T247170
Change-Id: Iafd3d5d2de970d53fa90280bd021c594f930eb15
2020-03-07 19:52:46 +00:00
Jon Harald Søby
bd66771202 Fix Shawiya's autonym in Names.php
Change from "tachawit" to "tacawit" per the community.

Bug: T194047
Change-Id: I63c8ae88a341760f6754e7f9716956f3bc33e153
2019-11-25 12:10:44 +01:00
Jon Harald Søby
14c0ed3928 Add [szy] Sakizaya to Names.php
Bug: T174601
Change-Id: Idd8646917dca143d574a32c3989d23f10ed9123d
2019-10-16 14:22:55 +02:00
Aryeh Gregor
6d80b6c082 Split some Language methods to LanguageNameUtils
These are static methods that have to do with processing language names
and codes. I didn't include fallback behavior, because that would mean a
circular dependency with LocalisationCache.

In the new class, I renamed AS_AUTONYMS to AUTONYMS, and added a class
constant DEFINED for 'mw' to match the existing SUPPORTED and ALL. I
also renamed fetchLanguageName(s) to getLanguageName(s).

There is 100% test coverage for the code in the new class.

This was previously committed as 2e52f48c2e and reverted because it
depended on e4468a1d6b, which had to be reverted for performance
issues. There should be no changes other than rebasing.

Bug: T201405
Change-Id: Ifa346c8a92bf1eb57dc5e79458b32b7b26f1ee8a
2019-10-07 15:20:52 -07:00
Amir Aharoni
17ff5193b9 Change the Balinese language autonym to "Bali"
This is done for consistency with how it is in language-data,
and with how most other names of languages of Indonesia are written.
The word "Basa" means "language" and is usually not necessary
in lists of languages.

Change-Id: I1b5195a0d00c21b51ddb1a10cd3ed7b787363fa3
2019-09-11 11:54:35 +03:00
Mahuton
6cd242011e Change the autonym of Banjar from "Bahasa Banjar" to "Banjar"
Change requested on the Sundanese Wikipedia village pump

Bug: T231283
Change-Id: Ib2f49f77634c497135c0b32256fca4e919866a38
2019-08-31 12:05:39 +00:00
Mahuton
a4d29699bf Change the autonym of Sunda from "Basa Sunda" to "Sunda"
Change requested on the Sundanese Wikipedia village pump

Bug: T228832
Change-Id: Id26493395a028b72f0254c7b866eb074eccbe1f9
2019-08-31 12:02:35 +00:00
Amir Sarabadani
308e6427ae Revert "Make LocalisationCache a service"
This reverts commits:
 - 76a940350d
 - b78b8804d0
 - 2e52f48c2e
 - e4468a1d6b

Bug: T231200
Bug: T231198
Change-Id: I1a7e46a979ae5c9c8130dd3927f6663a216ba753
2019-08-26 18:28:26 +02:00
Aryeh Gregor
2e52f48c2e Split some Language methods to LanguageNameUtils
These are static methods that have to do with processing language names
and codes. I didn't include fallback behavior, because that would mean a
circular dependency with LocalisationCache.

In the new class, I renamed AS_AUTONYMS to AUTONYMS, and added a class
constant DEFINED for 'mw' to match the existing SUPPORTED and ALL. I
also renamed fetchLanguageName(s) to getLanguageName(s).

There is 100% test coverage for the code in the new class.

Change-Id: I245ae94bfc1f62b6af75ea57525139adf2539fe6
2019-08-23 12:52:35 +03:00
Amir Aharoni
a5e4af06bf Change the autonym of Manipuri to Meetei script
Meetei Mayek script is the one used in schools in Manipur
(see the Wikipedia article [[Meitei script]] for references).
It is also the script that is used for all the translations
in translatewiki [1] and for all the test articles
in the Wikimedia Incubator.[2]

[1] https://translatewiki.net/w/i.php?title=Special:Translate&filter=translated&group=mediawiki&language=mni&action=translate
[2] https://incubator.wikimedia.org/wiki/Special:PrefixIndex/wp/mni

Corresponding commit in language-data:
https://github.com/wikimedia/language-data/pull/57

Change-Id: I892320f2dfaac05bfc2a544502823a244a9ad1c8
2019-08-02 15:46:20 +03:00
Mahuton
1aaadd41ff Change the autonym of Minangkabau from "Baso Minangkabau" to "Minangkabau"
Change requested on the Minangkabau Wikipedia village pump

Bug: T224806
Change-Id: I40ddc71f4f6ab73b6d4c4f19ec57bc9c9e221b14
2019-07-08 07:06:36 +02:00
Siebrand Mazeland
193636c33e Correct autonym for rmy (Vlax Romani)
Bug: T223524
Change-Id: Id128199119001bd318012d034ef727cdfa256f54
2019-05-21 09:00:14 +02:00
Amir Aharoni
f6f5987877 Add N'Ko to Names.php
Bug: T221994
Change-Id: I8c8e05ebabec8d50fa089b4d88061667a29d8b83
2019-04-27 07:51:40 +03:00
jenkins-bot
3f27499122 Merge "Change the autonym of Javanese from "Basa Jawa" to "Jawa""
2019-04-25 23:19:07 +00:00
jhsoby
85c5164bab Capitalize native name of Western Armenian
Language names in Western Armenian start with capital letters.

Bug: T219975
Change-Id: Ic4e1c8ce395324a0e68a2212576fcfbc3b22bb2f
2019-04-23 13:30:50 +00:00
Amir Aharoni
250c10bca0 Change the autonym of Javanese from "Basa Jawa" to "Jawa"
The word "Basa" simply means "language" and it is unnecessary.

This was requested at the Javanese Wikipedia village pump:
https://jv.wikipedia.org/w/index.php?title=Wikipedia:Warung_Kopi&oldid=1478057#Jawa_utawa_Basa_Jawa

Change-Id: Ie5546c868fce2722f70893ece49c05d75302e804
2019-04-20 14:31:59 +03:00
MarcoAurelio
ccf2039021 Add language support for Saisiyat (xsy)
Bug: T216479
Change-Id: Ide1c708c2cf3124794da650667ab140b1f4e9f5e
2019-02-21 17:39:27 +00:00
Zoranzoki21
995941d3f8 Changed the name of the (gcr) language from "Kreyol Gwiyanè" to "Kriyòl Gwiyannen"
Per: https://translatewiki.net/wiki/Thread:Support/Need_Help_!

Change-Id: Idb03ad0dd163801ddfd4a7c77efc337303cfb678
2019-02-09 14:38:47 +00:00
shandrenkoff
3102742206 Add language 'kjp' Eastern Pwo
This change adds the Eastern Pwo language with ISO 639-3 code 'kjp' with
'bo' (Tibetan) fallback. The default script is the Burmese script.

Bug: T203908
Change-Id: Ic69c5f1398bcd96674254b69f678f21b71feb475
2018-10-29 15:21:09 +00:00
C. Scott Ananian
21ead7a98d Ensure LanguageCode::bcp47() returns a valid BCP 47 language code
MediaWiki uses a number of nonstandard codes which do not validate
according to the IANA language subtag registry.  Some of them have
the wrong semantics entirely: MediaWiki's `sr-ec` variant maps to
BCP 47 `sr-EC` which is "Serbian as used in Ethiopia" (!).

Extend LanguageCode::bcp47() to map our nonstandard codes to valid
BCP 47 language codes.  Export the mapping so that it can be used
in JavaScript's corresponding mw.language.bcp47() implementation
as well, and return the standard BCP 47 codes in the siteinfo
API.

Thanks to TheDJ (I10b4473c7e53f027812bbccf26bb47aec15fddfd) and
Fomafix (I93efc190714ba76247d30ba49fc21ae872fc3555) for previous
attempts at this!

Also removed a fixme for the name of 'Twi', dating back to 2004
(f59c3be23b) -- checking
tw.wikipedia.org it certainly appears that the autonym of 'Twi'
is correctly 'Twi'.

Tracking bugs for invalid language codes are T125073 and T145535.
Discussion of zh-XX => zh-HanX-XX mapping is at T198419.

This is a replay of an earlier merged patch,
8380f0173e, which had to be reverted
because it caused regressions in the Babel extension (T199941).

Bug: T34483
Bug: T106367
Bug: T120847
Depends-On: I27a5b8e45b34c6b57c1b612b11548001c88cd483
Change-Id: Iebbc604af21d7f2af9c1f1ab2574cb5f309bf6ed
2018-10-11 01:53:54 -04:00
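
The problem 21ead7a98d addresses can be shown with a small sketch. This Python example is not MediaWiki's implementation: the names NONSTANDARD_TO_BCP47, naive_case and bcp47 are hypothetical, and the two Serbian entries are assumed here for illustration only. It contrasts applying BCP 47 case conventions alone with looking the code up in a mapping first.

```python
# Illustrative entries only (assumed for this sketch); the real mapping
# lives in the MediaWiki source.
NONSTANDARD_TO_BCP47 = {
    "sr-ec": "sr-Cyrl",
    "sr-el": "sr-Latn",
}


def naive_case(code):
    """Apply case conventions only: 2-letter subtags upper, 4-letter title case."""
    parts = code.lower().split("-")
    out = [parts[0]]
    for part in parts[1:]:
        if len(part) == 2:
            out.append(part.upper())   # looks like a region subtag
        elif len(part) == 4:
            out.append(part.title())   # looks like a script subtag
        else:
            out.append(part)
    return "-".join(out)


def bcp47(code):
    """Map known nonstandard codes first, then apply the case conventions."""
    mapped = NONSTANDARD_TO_BCP47.get(code.lower(), code)
    return naive_case(mapped)
```

Without the table, naive_case("sr-ec") yields "sr-EC", where the second subtag reads as a region rather than a script; with the lookup, bcp47("sr-ec") yields a script-tagged code instead.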
Fomafix
5632815976 Write Latin and other scripts with capital letter
Change-Id: I16c660e54191b63cd6eb3407cb00504665930c4e
2018-10-05 18:49:08 +02:00
jhsoby
109efd4c99 Fix autonym for Armenian
The Armenian autonym should not have a capital
initial, as names of languages are not proper
nouns in that language.

Bug: T202611
Change-Id: I17cd8706f5fee2f39255c3407b758103e4cb5455
2018-09-03 04:32:27 +00:00
MarcoAurelio
fa6d6eb9bf Add language support for Mon (mnw)
Bug: T201583
Change-Id: Ic03b910c3cfc2419ece783d04adb486570416ba3
2018-08-09 18:53:56 +00:00
jenkins-bot
3de7bf779d Merge "Change name of Santali to localized version"
2018-08-06 13:46:30 +00:00
MarcoAurelio
3d6a189e3a Add language support for Western Armenian (hyw)
* Language name: Western Armenian <https://www.ethnologue.com/language/hyw>.
 * Local version: արեւմտահայերէն
 * ISO639-3: hyw <https://iso639-3.sil.org/code/hyw>.
 * Fallback: Armenian (hy).

Bug: T201276
Change-Id: Ic76d7a9a1fa8541fd422a4287044de4daaa6665d
2018-08-06 12:38:14 +02:00
Martin Urbanec
8f85bbd7a9 Change name of Santali to localized version
Bug: T198400
Change-Id: Id2bbcfebf32903c4d8882e2b4f18f37a8a5c3366
2018-08-05 18:30:28 +02:00
Greg Grossmeier
b302b0cd1c Revert "Ensure LanguageCode::bcp47() returns a valid BCP 47 language code"
This reverts commit 8380f0173e.

Reason for revert: Caused T199941

Bug: T199941
Change-Id: I93af756a2d70d6bc91f828fe6ac19bf10ca8788f
2018-07-23 17:27:23 +00:00
C. Scott Ananian
8380f0173e Ensure LanguageCode::bcp47() returns a valid BCP 47 language code
MediaWiki uses a number of nonstandard codes which do not validate
according to the IANA language subtag registry.  Some of them have
the wrong semantics entirely: MediaWiki's `sr-ec` variant maps to
BCP 47 `sr-EC` which is "Serbian as used in Ethiopia" (!).

Extend LanguageCode::bcp47() to map our nonstandard codes to valid
BCP 47 language codes.  Export the mapping so that it can be used
in JavaScript's corresponding mw.language.bcp47() implementation
as well.

Thanks to TheDJ (I10b4473c7e53f027812bbccf26bb47aec15fddfd) and
Fomafix (I93efc190714ba76247d30ba49fc21ae872fc3555) for previous
attempts at this!

Also removed a fixme for the name of 'Twi', dating back to 2004
(f59c3be23b) -- checking
tw.wikipedia.org it certainly appears that the autonym of 'Twi'
is correctly 'Twi'.

Tracking bugs for invalid language codes are T125073 and T145535.
Discussion of zh-XX => zh-HanX-XX mapping is at T198419.

Bug: T34483
Bug: T106367
Bug: T120847
Change-Id: I807dd55d49e9bd19443329231326a5b0d3e6c453
2018-07-13 14:56:18 -04:00
MarcoAurelio
bc9e865ab7 Add Manipuri/Meitei to Names.php
Bug: T198132
Change-Id: I43620c1f34eecda69c61ea0bb13a213e0e6a457d
2018-06-29 05:40:48 +00:00
Niklas Laxström
a19320bf90 Add the es-419 language code to support Latin American Spanish
This code is useful for targeting Spanish as spoken in Latin America
and the Caribbean. There are no plans to make this available as
an interface language, hence I am not adding a language file with a
fallback to 'es'.

Bug: T112889
Change-Id: If7f0ed7a13f1cc86985ce5ce509dcf543cc1c0ff
2018-06-24 18:28:02 +00:00
Étienne Beaulé
ef7ff1c26d Add language 'zgh' Standard Moroccan Amazigh
This change adds the Standard Moroccan Amazigh language with ISO
639-3 code 'zgh' with 'kab' (Kabyle) fallback. The default script is the
Neo-Tifinagh script.

Bug: T137491
Change-Id: Idd13f92d7ae05cd47267558c8ff4fa368b701e24
2018-06-11 10:24:08 -03:00
Bartosz Dziewoński
0313128b10 Use PHP 7 "\u{NNNN}" Unicode codepoint escapes in string literals
In cases where we're operating on text data (and not binary data),
use e.g. "\u{00A0}" to refer directly to the Unicode character
'NO-BREAK SPACE' instead of "\xc2\xa0" to specify the bytes C2h A0h
(which correspond to the UTF-8 encoding of that character). This
makes it easier to look up those mysterious sequences, as not all
are as recognizable as the no-break space.

This is not enforced by PHP, but I think we should write those in
uppercase and zero-padded to at least four characters, like the
Unicode standard does.

Note that not all "\xNN" escapes can be automatically replaced:
* We can't use Unicode escapes for binary data that is not UTF-8
  (e.g. in code converting from legacy encodings or testing the
  handling of invalid UTF-8 byte sequences).
* '\xNN' escapes in regular expressions in single-quoted strings
  are actually handled by PCRE and have to be dealt with carefully
  (those regexps should probably be changed to use the /u modifier).
* "\xNN" referring to ASCII characters ("\x7F" and lower) should
  probably be left as-is.

The replacements in this commit were done semi-manually by piping
the existing "\xNN" escapes through the following terrible Ruby
script I devised:

  chars = eval('"' + ARGV[0] + '"').force_encoding('utf-8')
  puts chars.split('').map{|char|
    '\\u{' + char.ord.to_s(16).upcase.rjust(4, '0') + '}'
  }.join('')

Change-Id: Idc3dee3a7fb5ebfaef395754d8859b18f1f8769a
2018-06-04 16:20:13 +00:00
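
The conversion described in 0313128b10 can also be sketched without eval. The Python function below is a hypothetical helper, not part of the commit: it assumes the input is the raw UTF-8 byte string rather than PHP escape syntax, writes code points in upper case zero-padded to at least four digits, and leaves ASCII characters as they are, following the caveats listed in the commit message.

```python
def to_unicode_escapes(raw):
    """Decode UTF-8 bytes and render each non-ASCII code point as a \\u{NNNN} escape."""
    # Raises UnicodeDecodeError for byte sequences that are not valid UTF-8,
    # which is the case the commit message says cannot be converted.
    text = raw.decode("utf-8")
    return "".join(
        ch if ord(ch) < 0x80 else "\\u{%04X}" % ord(ch)
        for ch in text
    )
```

For instance, the two bytes C2h A0h decode to the single code point U+00A0, so the result names the character directly instead of its encoded bytes.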