Wednesday, March 13, 2019

Tools That Will Help Writers in Khmer

KhmerWriters.com is a new tool from Danh Hong, who has long supported the Khmer language in computing.

Danh Hong was also the one who ran khmerocr.org, which is no longer maintained and whose domain is already listed for sale. This time he is back with another tool that will be very helpful for Khmer writers, who mostly face issues such as wrong Khmer spelling and typing without zero-width spaces (ZWSP).

Khmer Spelling Checker (with Auto correction)



Just paste the text and click "check spelling" (ពិនិត្យអក្ខរាវិរុទ្ធ), then click the next button for auto correction (កែដោយស្វ័យប្រវត្តិ).
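KhmerWriters.com doesn't document how its checker works. As a rough illustration of the general idea behind dictionary-based correction (not Danh Hong's actual method), the sketch below suggests the dictionary word with the smallest Levenshtein edit distance. The two-word dictionary is a hypothetical placeholder; a real checker needs a large Khmer word list and would ideally compare grapheme clusters rather than single code points, since one Khmer syllable spans several code points.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance over Unicode code points."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def suggest(word: str, dictionary: set[str], max_distance: int = 2) -> str:
    """Return word if known, else the closest dictionary word within max_distance."""
    if word in dictionary:
        return word
    # sorted() makes tie-breaking deterministic
    best = min(sorted(dictionary), key=lambda w: edit_distance(word, w))
    return best if edit_distance(word, best) <= max_distance else word


# Hypothetical toy dictionary; a real checker needs a full Khmer lexicon.
DICTIONARY = {"ភាសា", "ខ្មែរ"}
print(suggest("ភាសា" + "\u179a", DICTIONARY))  # extra រ at the end; prints ភាសា
```

Unknown words farther than `max_distance` edits from every entry are returned unchanged, so the checker flags rather than mangles text it cannot match.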



Auto Zero Space

The next thing to be aware of in Khmer writing is inserting zero-width spaces so that justified alignment and line breaking work well. Just click to put ZWSP as shown on the screen, then check the result by pasting the text into your Word document; you will see the difference as follows:
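ZWSP is the Unicode character U+200B ZERO WIDTH SPACE: it is invisible, but word processors and browsers treat it as a line-break opportunity, which is what lets Khmer text wrap and justify. Assuming the text has already been split into words (the hard part, which the tool does for you), inserting ZWSP is just a join. A minimal Python sketch, with a hand-segmented example sentence:

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def insert_zwsp(words: list[str]) -> str:
    """Join already-segmented words with an invisible break opportunity."""
    return ZWSP.join(words)


def remove_zwsp(text: str) -> str:
    """Strip ZWSP again, e.g. before re-segmenting or counting characters."""
    return text.replace(ZWSP, "")


words = ["ខ្ញុំ", "ស្រលាញ់", "ភាសា", "ខ្មែរ"]  # "I love the Khmer language"
marked = insert_zwsp(words)
print(len(marked) - len(remove_zwsp(marked)))  # prints 3: one ZWSP between each pair
```

The result looks identical on screen to the unspaced text; only the line-breaking behaviour changes.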



Of course, some words might not be detected correctly yet; the tool needs a larger corpus behind it, so hopefully it will soon become more accurate, but it is already usable at this stage.

Try it and let me know your thoughts in the comments.

Do you find it helpful?
If you do, please also consider donating to Danh Hong's team so he can continue his work.

Friday, July 27, 2018

Naming Transliteration Tool (Khmer to Latin/English)

I just saw that the NIPTICT institute has a tool for writing Khmer names in Latin script. It is very useful for Cambodian people who need to write their names in Latin characters.


For example, for the Khmer name កុសល ចំរើន the tool provides three kinds of output:

  1. Character Model: e.g. KOSOL CHAMREUN
  2. Syllable Model: e.g. KOSOL CHAMRAEUN
  3. SMT Model(*): e.g. KOL CHAMROEUN
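The post doesn't describe how NIPTICT's models work internally, so the following is only a toy illustration of what a rule-based character model can look like: each consonant and vowel sign maps to a Latin string, and a consonant followed directly by another consonant gets an inherent vowel. The mapping tables are hypothetical, cover only the letters of the example name, and are not NIPTICT's rules; real Khmer romanization also depends on consonant series, subscript consonants and many other cases.

```python
# Hypothetical toy tables covering only the letters in កុសល ចំរើន (NOT NIPTICT's rules).
CONSONANTS = {
    "\u1780": "K",   # ក
    "\u1785": "CH",  # ច
    "\u1793": "N",   # ន
    "\u179a": "R",   # រ
    "\u179b": "L",   # ល
    "\u179f": "S",   # ស
}
VOWEL_SIGNS = {
    "\u17bb": "O",   # ុ
    "\u17be": "EU",  # ើ
    "\u17c6": "AM",  # ំ (nikahit, treated here like a vowel sign)
}
INHERENT = "O"  # hypothetical inherent vowel before another consonant


def transliterate_word(word: str) -> str:
    out = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else None
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            if nxt in VOWEL_SIGNS:       # consonant + vowel sign
                out.append(VOWEL_SIGNS[nxt])
                i += 2
                continue
            if nxt in CONSONANTS:        # consonant + consonant: add inherent vowel
                out.append(INHERENT)
        else:
            out.append(VOWEL_SIGNS.get(ch, "?"))  # "?" marks unmapped characters
        i += 1
    return "".join(out)


def transliterate_name(name: str) -> str:
    return " ".join(transliterate_word(w) for w in name.split())


print(transliterate_name("កុសល ចំរើន"))  # prints KOSOL CHAMREUN
```

Tables like these break down quickly on real names, which is presumably why the tool also offers syllable-based and statistical models.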

So if you are looking for a Khmer transliteration tool, you can try this R&D research tool: http://rnd.niptict.edu.kh/tran/.

Remark:
(*) I couldn't find a source explaining the term; I think "SMT" stands for Statistical Machine Translation, which is the subject of another research project by this organization.

Thursday, April 12, 2018

Segmentation - New Zero Width Space (ZWSP) Online Tool - ondra.cf by Danh Hong

Thanks to Danh Hong, who has always worked on Khmer Unicode solutions, from font design and OCR to, now, a segmentation tool: ondra.cf

Danh Hong's Tool for ZWSP


Online tools, and even APIs, are needed to push more Khmer-related products forward.
I mostly use the tool from kheng.info, which is already in my list. Both tools are great to have in the community, and I hope organizations that produce a lot of content will support them so development can continue.

kheng.info

I tried both tools on the same text to compare the results; some points are highlighted in yellow:
ondra.cf vs kheng.info

Of course, based on the highlights above, results will improve once there is enough training data. Still, I could see that Danh Hong's tool handled numeric data correctly, although it needs more training data to segment some specific words, such as country names, correctly.
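To repeat this kind of comparison on more text, one simple approach (my assumption, not how the highlights above were made) is to convert each tool's ZWSP-separated output into the set of character offsets where a word ends, then list the offsets where the tools disagree. This assumes both outputs cover exactly the same text once ZWSP is removed.

```python
ZWSP = "\u200b"


def boundaries(segmented: str) -> set[int]:
    """Offsets (in the unsegmented text) where a ZWSP-separated word ends."""
    words = [w for w in segmented.split(ZWSP) if w]
    offsets, pos = set(), 0
    for w in words[:-1]:  # the end of the last word is not a boundary
        pos += len(w)
        offsets.add(pos)
    return offsets


def compare(output_a: str, output_b: str) -> dict[str, list[int]]:
    """Report boundary agreement between two ZWSP-segmented outputs."""
    if output_a.replace(ZWSP, "") != output_b.replace(ZWSP, ""):
        raise ValueError("outputs are not segmentations of the same text")
    a, b = boundaries(output_a), boundaries(output_b)
    return {
        "both": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
    }


print(compare("ab\u200bcd", "a\u200bb\u200bcd"))
# prints {'both': [2], 'only_a': [], 'only_b': [1]}
```

The `only_a` and `only_b` offsets are the places worth highlighting by hand, like the yellow marks above.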

Anyway, the tool will help our community grow.

Thanks, everyone, for your hard work and for sharing it with us.

Tuesday, March 14, 2017

Where Are We Now with Khmer OCR?

Recently some students contacted me asking about Khmer OCR research.

At the moment, the situation is hard to describe.

I do hope students keep improving their work, and that professional researchers and institutes continue to share more details on the topic.

KhmerOCR.org, by Danh Hong's team: no further updates due to lack of funding (sic).

NIPTICT: still in progress, with a demo/testing of their OCR research work (sic).

At the KhmerNLP conference last December, there were some research papers, but none on OCR.

So if you follow the topic and know of other work related to this research or product, please share it with us in the comments.

Friday, September 9, 2016

New Online Demo Release - Word Segmentation for Khmer Unicode by NIPTICT

Khmer word segmentation is still in demand and a hot topic in natural language processing for the Khmer language.

Several methods have been introduced so far, but online tools that people can actually use are still few (my collection here).

Why Is Word Segmentation Important?
In language processing, we need to identify words and sentences clearly. Khmer does not put spaces between words, and a sentence runs on with very few spaces, which is why it is hard for machines to process.

To segment a sentence into words, we currently need a large dictionary plus a method that can split each word off, marking it with a zero-width space, as fast as possible.
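NIPTICT's method isn't described in this post, so as a generic illustration of the dictionary approach, here is greedy longest matching (often called maximal matching): at each position, take the longest dictionary word that fits, and fall back to a single character when nothing matches. The four-word dictionary is a hypothetical placeholder; real segmenters use large dictionaries and usually statistical models to resolve ambiguous splits that greedy matching gets wrong.

```python
def longest_match_segment(text: str, dictionary: set[str]) -> list[str]:
    """Greedy left-to-right longest-match word segmentation."""
    max_len = max((len(w) for w in dictionary), default=1)
    words: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary hit.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                words.append(text[i:j])
                i = j
                break
        else:
            # No dictionary word starts here: emit one code point as unknown.
            words.append(text[i])
            i += 1
    return words


DICTIONARY = {"ខ្ញុំ", "ស្រលាញ់", "ភាសា", "ខ្មែរ"}  # hypothetical toy dictionary
words = longest_match_segment("ខ្ញុំស្រលាញ់ភាសាខ្មែរ", DICTIONARY)
print("\u200b".join(words))  # words separated by invisible ZWSP
```

Joining the result with U+200B gives exactly the ZWSP-marked text that the online tools produce for you.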

Now NIPTICT has just released the first online demo of its method.


This tool is very useful in offices and for entering data into websites.

Another online tool that I usually use is kheng.info, so now we have at least two online tools available.

I will look for an explanation of the method NIPTICT uses and post an update later.
Meanwhile, if you want to join the research, you can submit your paper to the Khmer NLP conference from now until mid-October.

Tuesday, August 23, 2016

Paper: Experimental Comparison of the Performance of SVMs

The research paper on:

Experimental Comparison of the Performance of SVMs with Different Kernel Functions for Recognizing Arabic Characters


Said Ghoniemy, Sayed Fadel, M. Asif

Abstract


A considerable progress in the recognition of Latin and Chinese characters has been achieved. By contrast, Arabic Optical Character Recognition is still lagging. This is because Arabic language is a cursive language, written from right to left, and each character has different forms according to its position in the word. Support vector machines using kernel classifiers represent a typical approach for character recognition. Choosing the most appropriate kernel highly depends on the problem at hand – and fine tuning its parameters can easily become a tedious and cumbersome task. The present study is devoted to an experimental comparison of the performance of SVM machines with different kernel functions for recognizing Arabic Characters. Two groups of kernel functions were used throughout the study, each group contains 7 kernel functions. The obtained results show that, in the radial basis group, Laplacian kernel gives the best results. In the special functions group, the T-Student approach gives the best results. However, combining both kernels did not yield better performance.
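The abstract compares kernels without giving their formulas. For reference, here are commonly cited forms of the Gaussian RBF, Laplacian and T-Student kernels in plain Python. Parameterizations vary between papers and libraries (for example, scikit-learn's `laplacian_kernel` uses the L1 distance), so treat these as one common convention rather than the paper's exact definitions.

```python
import math
from collections.abc import Sequence


def _euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def gaussian_rbf(x, y, sigma=1.0):
    """exp(-||x - y||^2 / (2 sigma^2))"""
    return math.exp(-_euclidean(x, y) ** 2 / (2 * sigma ** 2))


def laplacian(x, y, sigma=1.0):
    """exp(-||x - y|| / sigma): like the RBF, but the distance is not squared."""
    return math.exp(-_euclidean(x, y) / sigma)


def t_student(x, y, degree=2):
    """1 / (1 + ||x - y||^degree)"""
    return 1.0 / (1.0 + _euclidean(x, y) ** degree)


x, y = [0.0, 0.0], [3.0, 4.0]  # Euclidean distance 5
print(gaussian_rbf(x, y, sigma=5.0))  # exp(-0.5)
print(laplacian(x, y, sigma=5.0))     # exp(-1)
print(t_student(x, y, degree=2))      # 1/26
```

Because the Laplacian kernel decays with the plain distance rather than its square, it falls off more slowly for distant points than the Gaussian. Functions like these can be wrapped into a Gram-matrix callable for SVM libraries that accept custom kernels, such as scikit-learn's `SVC(kernel=...)`.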


[..]
Sok, P. and Taing, N., "Support Vector Machine (SVM) Based Classifier For Khmer Printed Character-set Recognition", Asia-Pacific Signal and Information Processing Association, 2014 Annual Summit and Conference (APSIPA) (pp. 1-9). IEEE. December 2014.
[..] 

Thanks for citing my research on the SVM-related method.

Sunday, August 14, 2016

Khmer NLP Conference 2016


The upcoming Khmer Natural Language Processing Conference (Khmer NLP Conference 2016) is calling for papers related to natural language processing, especially work that solves problems of the Khmer language.

As shown in the poster banner, there are many topics in which students, professionals and the private sector can participate, to help solve our language issues together, promote research and encourage more people to join in solving the problem.

This year, besides research papers, you can also present your research or products as posters exhibited during the conference. Please check the official website for details: http://khmernlp.org