A corpus is essential for developing language tools. I found an existing open-source hosted project, KhmerText, which mainly provides free, open-source data: a collection of Khmer corpora.
About the project
Open data for a Khmer language corpus and lexicographic data that can be used for the development of free language tools for Khmer language, such as automatic translators, dictionaries, linguistic analysis tools, etc.
Thursday, December 19, 2013
Tuesday, December 17, 2013
Ref: Ethnologue Map
A language of Cambodia
- Alternate names: Cambodian, Khmer
- Population: 12,900,000 in Cambodia (2008 census). Population total all countries: 14,224,500.
- Location: Widespread. Also in Canada, China, France, Laos, United States, Viet Nam.
- Language status: 1 (National). Statutory national language (1993, Constitution, Article 5).
- Dialects: Battambang Khmer, Cardamom Khmer, Khmer Kandal (Central), Khmer Keh (Stung Treng), Khmer Krom (Southern). Distinct from Northern Khmer [kxm] of Thailand.
- Language use: 1,000,000 L2 speakers.
- Language development: 35% of the population over 15 cannot read or write Khmer. Radio programs. Grammar. Bible: 1954–1998.
- Writing: Khmer script.