Wednesday, December 30, 2015

Try on between NIPTICT OCR and

I tested the tool with a PDF that I made with iText for Khmer rendering, as described in my earlier post, which uses the Khmer OS Battambang font to generate the PDF.

I tested on two URLs:

1. by providing the image file
2. by uploading the PDF file

Here is the result so far

Wednesday, September 9, 2015

National Conference about Khmer Natural Language Processing

The Khmer Language Processing Consortium is happy to announce the Second Annual Conference on Khmer Natural Language Processing (KNLP 2015), where its members and others working in this field bring their work together in a collaborative effort to build practical Natural Language Processing for Khmer. The first annual conference took place in October 2014.
The Khmer Natural Language Processing Consortium, created in 2014, groups universities, NGOs, private companies and researchers interested in accelerating, through close coordination and collaboration, the creation of effective natural language processing tools for the Khmer language. These tools will be used to improve access to information and communication in this language.
Check on the website now:

Call for Papers

The Conference will address a range of critically important issues and themes relating to the Khmer Natural Language Processing community. Plenary speakers include some of the leading thinkers in these areas.
The Khmer Language Processing Consortium is inviting proposals for paper presentations that address Khmer Natural Language Processing in one of the following areas:
• Text and Speech processing
• Optical Character recognition for Khmer and similar complex scripts
• Automatic Translation to and from Khmer
• Interpreting and generating spoken and written language
• Natural language interfaces and dialogue systems
• Pattern recognition, applied NLP systems
• Cognitive aspects of natural language processing
• Computational aspects of natural language processing
• NLP-based knowledge science and service science
• Corpus and Language Resources
• Corpus-based language modeling
• Tools and resources for natural language processing
• Theoretical and Applied Linguistics, NLP Applications
• Semantics, syntax and lexicon
• Evaluation of natural language systems
• Information retrieval and extraction, text mining
• Human processing of language and speech
• Languages for Disability
• Ontology Engineering
• Phonetics, phonology and morphology
• Pragmatics and discourse

Important Dates

21 September 2015: Paper Submission Deadline
21 October 2015: Acceptance/Rejection Notification
4 November 2015: Final Submission Deadline
4 December 2015: Annual Conference

Tuesday, May 26, 2015

KhmerOCR Demo App Released on GitHub

First of all, as I have already stated on my GitHub, do not expect this release to be a full OCR system. It is only a first demo, built to answer my research on Support Vector Machines in 2013 and slightly updated in 2014. Thanks for understanding.

Since I cannot commit time to continuing on this topic, I prefer to publish the demo now, and I will soon make the source code public as well.

Currently people are working on TesseractOCR, and we are waiting for the results. Some results are already available from the OCR Team; please try them out and support the team if you can.

If you are still interested in seeing mine, please download it from GitHub: KhmerOCR.NET-App