Keynote Talk

Indonesia Language Sphere: An Ecosystem for Dictionary Development in Low-Resourced Languages

There are more than 7,000 languages in the world. However, 95% of the world's population speaks only 5% of them, at most 400 languages, and more than half of all languages have fewer than 10,000 speakers. In 2010, UNESCO released a list of 2,464 endangered languages; in Indonesia alone, 144 languages are endangered. To preserve these languages and increase their use, we have started the Indonesia Language Sphere project, whose purpose is to develop comprehensive sets of bilingual dictionaries among Indonesian ethnic languages. To this end, we have proposed a generalized bilingual lexicon induction method that combines two existing dictionaries. Furthermore, to reduce the total cost of bilingual dictionary creation, we have combined machine and manual creation and constructed a planner that optimizes the order in which dictionaries are created. This talk introduces the proposed methods and reports preliminary experimental results for Indonesian, Malay, Javanese, Sundanese, and Minangkabau.
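The abstract does not describe how the two existing dictionaries are combined. As a rough illustration only, the sketch below shows the simplest pivot-based baseline: an A-to-B dictionary and a B-to-C dictionary are composed through the shared pivot language B, and each A-to-C candidate is ranked by how many distinct pivot words link it. This is not the generalized method proposed in the talk. The function name `induce_via_pivot` and the placeholder entries (`a1`, `b1`, `c1`, ...) are hypothetical.

```python
from collections import Counter


def induce_via_pivot(a_to_b, b_to_c):
    """Naive transitive induction of an A->C dictionary through pivot language B.

    Hypothetical baseline, not the talk's generalized method.
    a_to_b: dict mapping each A word to a set of B translations
    b_to_c: dict mapping each B word to a set of C translations
    Returns a dict mapping each A word to a list of (C word, score) pairs,
    where score is the number of distinct B pivots linking the two words,
    sorted by descending score and then alphabetically.
    """
    result = {}
    for a_word, b_words in a_to_b.items():
        votes = Counter()
        for b_word in b_words:
            # Every C translation of this pivot word gets one vote.
            for c_word in b_to_c.get(b_word, ()):
                votes[c_word] += 1
        if votes:
            result[a_word] = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return result


# Placeholder entries standing in for real dictionary data.
a_to_b = {"a1": {"b1", "b2"}, "a2": {"b3"}}
b_to_c = {"b1": {"c1"}, "b2": {"c1", "c2"}, "b3": {"c3"}}
induced = induce_via_pivot(a_to_b, b_to_c)
print(induced)
```

Here `a1` maps to `c1` with score 2 because two pivot words support it, and to `c2` with score 1. Plain composition like this tends to produce spurious pairs when a pivot word is polysemous, which is one reason more constrained induction methods are needed for closely related languages.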