Open Collaboration for Developing and Using Asian Language Treebank

The ASEAN Economic Community has a population of over 600 million people who speak many different languages. Consequently, natural language processing (NLP) technologies that can handle many languages are needed.

State-of-the-art NLP technologies are based on treebanks. A treebank is a collection of natural language texts annotated with linguistic knowledge. The basic annotations in a treebank are word segmentation, part-of-speech (POS) tags, and syntactic parse trees. In a broad sense, almost all NLP research and tools rely on treebanks.
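As a rough illustration of these three annotation layers, the Python sketch below reads one tree written in Penn Treebank-style bracketed notation and recovers the word segmentation and POS tags from its leaves, while the nested brackets carry the parse. The bracketed format, the tag names (NNP, VBZ, NNS) and the helper functions `tokenize`, `parse` and `leaves` are assumptions made for illustration; this page does not specify the file format ALT actually uses.

```python
from typing import List, Tuple, Union

# Hypothetical node type: (label, body). A preterminal's body is the word string;
# a phrase node's body is a list of child nodes.
Tree = Tuple[str, Union[str, List["Tree"]]]


def tokenize(text: str) -> List[str]:
    """Split a bracketed tree into '(', ')' and atoms (labels or words)."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens: List[str], i: int = 0) -> Tuple[Tree, int]:
    """Parse one bracketed node starting at tokens[i]; return (node, next index)."""
    if tokens[i] != "(":
        raise ValueError(f"expected '(' at token {i}")
    label = tokens[i + 1]
    i += 2
    if tokens[i] != "(":
        # Preterminal such as (NNP NICT): one segmented word plus its POS tag.
        word = tokens[i]
        if tokens[i + 1] != ")":
            raise ValueError(f"expected ')' after word {word!r}")
        return (label, word), i + 2
    # Phrase node such as (NP ...): the parsing (constituency) layer.
    children: List[Tree] = []
    while tokens[i] == "(":
        child, i = parse(tokens, i)
        children.append(child)
    if tokens[i] != ")":
        raise ValueError(f"expected ')' to close {label}")
    return (label, children), i + 1


def leaves(tree: Tree) -> List[Tuple[str, str]]:
    """Collect (word, POS) pairs from left to right."""
    label, body = tree
    if isinstance(body, str):
        return [(body, label)]
    pairs: List[Tuple[str, str]] = []
    for child in body:
        pairs.extend(leaves(child))
    return pairs


if __name__ == "__main__":
    sample = "(S (NP (NNP NICT)) (VP (VBZ builds) (NP (NNS treebanks))))"
    tree, _ = parse(tokenize(sample))
    tagged = leaves(tree)
    print([word for word, _ in tagged])  # word segmentation layer
    print(tagged)                        # POS tagging layer
```

For an English sentence the segmentation layer is nearly trivial, but for languages whose scripts do not mark word boundaries with spaces, such as Myanmar, Thai, Khmer and Lao, the sequence of leaves records segmentation decisions that annotators had to make. This toy reader also assumes that no word contains a parenthesis or whitespace.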

The main problem in creating a treebank is that it requires extensive linguistic knowledge of the language. As a result, existing treebanks are limited in size, annotation types, and languages covered. In particular, no treebanks are publicly available for most Asian languages.

This background led us to propose this project to develop the Asian Language Treebank (ALT). The objective of ALT is to develop a parallel treebank for Asian languages. ASEAN IVO is an ideal organization for developing ALT because it consists of top-level NLP research institutes for Asian languages. Without ASEAN IVO, it would be impossible to collaborate across institutes and cover the main Asian languages in building treebanks.


Project Theme
  • ICT Solutions to the Challenges surrounding Urbanization
  • Social Renovation in Rural Areas and/or Urban Areas

Leveraged Resources and Participants


Development of ALT has already begun. NICT and UCSY started building Japanese, English, and Myanmar treebanks in FY 2015. NICT has also completed the translation of 20,000 English sentences (from Wikinews) into Indonesian, Vietnamese, Thai, Khmer, Lao, Malay, and Filipino.

In this project, BPPT, I2R, IOIT, NIPTICT, UCSY, and NICT will develop ALT for Indonesian, Malay, Vietnamese, Khmer, Myanmar, and Japanese, respectively (NICT will also develop the English ALT). These treebanks will be built from the already translated Wikinews sentences. Once ALT is completed, it will be used to develop NLP tools within this project.

The members of this project are as follows:

  • BPPT
    • Dr. Michael Purwoadi, Director of the ICT Center, who oversees the Intelligent Computing and Language Technology activities at BPPT.
    • Gunarso, Leader of the Language Technology working group.
    • Dr. Teduh Uliniansyah, Researcher in the Language Technology working group.
  • I2R
    • Ms Aw Ai Ti, a senior researcher at I2R, who is an expert in NLP and machine translation.
    • Ms Sharifah Mahani Aljunied, a researcher at I2R, who is an expert in NLP and Malay linguistics.
  • IOIT
    • Vu Tat Thang, PhD.
    • Luong Chi Mai, Assoc. Prof., PhD.
    • Nguyen Phuong Thais, Assoc. Prof., PhD.
  • NIPTICT
    • Mr. Rapid Sun, Director of the Research and Development Center, who supervises NLP projects.
    • Mr. Vichet Chea, a researcher at NIPTICT, who is an expert in NLP and machine translation.
  • UCSY
    • Dr. Khin Mar Soe, a Professor at the NLP Lab, UCSY, who is currently doing research in NLP and machine translation.
    • Dr. Khin Thandar Nwet, a researcher at the NLP Lab, UCSY, who is currently doing research in NLP and machine translation.
  • NICT
    • Dr. Masao Utiyama, a senior researcher at NICT, who is an expert in NLP and machine translation.
    • Dr. Chenchen Ding, a researcher at NICT, who is an expert in NLP and machine translation.

For more information: Asian Language Treebank (ALT) Project


LATEST NEWS

Call for Papers (ICCA 2021)

ICCA 2021, the 19th in the series that has been held annually since 2003, will bring together leading engineers and scientists in computer and information technology from around the world.

MCPA Day (UCSY)

The 9th MCPA Day, organized jointly by the University of Computer Studies, Yangon and the Myanmar Computer Professionals Association (MCPA), will be held on 4 March 2020 at 9:30 am.

Research Collaboration Meeting

Research showcase and discussion of possible research collaboration with foreign universities.

2019-2020 Academic Year Examination Time Table

The midterm examinations for the 2019-2020 academic year at the University of Computer Studies, Yangon will be held according to the timetable.

Oriental COCOSDA 2020

The 23rd Conference of Oriental COCOSDA will be hosted by the University of Computer Studies, Yangon (UCSY). With Myanmar hosting Oriental COCOSDA for the first time ...


Previous Programming Contests

MCPC 2019

2019 Myanmar Collegiate Programming Contest


ICPC 2018

2018 ICPC Asia-Yangon Regional Programming Contest


2018 Myanmar Collegiate Programming Contest


ICPC 2017

2017 ACM-ICPC Asia-Yangon Regional Programming Contest


2017 Myanmar Collegiate Programming Contest


ICPC 2016

2016 Asia-Yangon Regional Programming Contest


2016 Asia-Yangon National Programming Contest
