The ASEAN Economic Community has a population of over 600 million people who speak many different languages. Consequently, natural language processing (NLP) is needed to handle this linguistic diversity.
State-of-the-art NLP technologies are based on treebanks. A treebank is a collection of natural language texts annotated with linguistic knowledge. The basic annotations in a treebank are word segmentation, part-of-speech (POS) tagging, and syntactic parsing. Almost all NLP research and tools are based on treebanks in a broad sense.
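As an illustration of the three annotation layers, the sketch below reads one bracketed parse tree and recovers the word segmentation and POS tags from it. The Penn-Treebank-style bracket notation, the tag names, the example sentence, and the helper names `parse_bracketed` and `tagged_words` are all assumptions made for this example; they are not taken from the ALT data or its annotation guidelines.

```python
def parse_bracketed(text):
    """Parse a bracketed tree string into nested lists: [label, child, ...]."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                      # skip "("
            label = tokens[pos]
            pos += 1
            children = []
            while tokens[pos] != ")":
                children.append(read())
            pos += 1                      # skip ")"
            return [label] + children
        word = tokens[pos]                # a bare token is a word (leaf)
        pos += 1
        return word

    return read()


def tagged_words(tree):
    """Collect (word, POS tag) pairs from the leaves, left to right."""
    if len(tree) == 2 and isinstance(tree[1], str):
        return [(tree[1], tree[0])]       # preterminal node: [POS, word]
    pairs = []
    for child in tree[1:]:
        pairs.extend(tagged_words(child))
    return pairs


# Hypothetical annotated sentence (parsing layer).
example = ("(S (NP (DT The) (NN treebank)) "
           "(VP (VBZ covers) (NP (JJ Asian) (NNS languages))))")
tree = parse_bracketed(example)
pairs = tagged_words(tree)                # POS tagging layer
words = [word for word, _ in pairs]       # word segmentation layer
```

The parse tree is the richest layer here: the segmentation and the POS tags can both be read off its leaves, which is why a single tree annotation can serve tools at all three levels.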
The main problem in creating a treebank is that it requires extensive linguistic knowledge of the language. As a result, existing treebanks are limited in size, annotation types, and language coverage. In particular, there are no publicly available treebanks for most Asian languages.
Against this background, we propose this project to develop the Asian Language Treebank (ALT). The objective of ALT is to develop a parallel treebank for Asian languages.
ASEAN IVO is an ideal organization for developing ALT because it consists of top-level NLP research institutes for Asian languages. Without ASEAN IVO, it would be impossible to cooperate and cover the main Asian languages in building treebanks.
The development of ALT has already started. NICT and UCSY started building Japanese, English, and Myanmar treebanks in FY 2015. NICT has also finished translating 20,000 English sentences (from Wikinews) into Indonesian, Vietnamese, Thai, Khmer, Lao, Malay, and Filipino.
In this project, BPPT, I2R, IOIT, NIPTICT, UCSY, and NICT will develop ALT for Indonesian, Malay, Vietnamese, Khmer, Myanmar, and Japanese, respectively (NICT will also develop the English ALT). These treebanks will be built from the already translated Wikinews sentences. Once ALT is complete, it will be used to develop NLP tools within this project.
The members of this project are as follows:
For more information: Asian Language Treebank (ALT) Project