Showing posts with label Khmer Segmentation. Show all posts

Thursday, April 12, 2018

Segmentation - New Zero Width Space (ZWSP) Online Tool - ondra.cf by Danh Hong

Thanks to Danh Hong, who has always been there with Khmer Unicode solutions, from font design and OCR to, now, a segmentation tool: ondra.cf

Danh Hong's Tool for ZWSP


Online tools, and even more so tools with an API, are needed to push forward more Khmer-related products.
I mostly use the tool from kheng.info, which I have already included in my list. Both tools are great to have in the community, and I hope that organizations producing a lot of content will support them so they can keep developing.

kheng.info

I've tried out both tools to compare the results; the points worth noting are highlighted in yellow in the text:
ondra.cf vs kheng.info

Of course, based on the highlights above, the results will improve once there is enough training data. I could see that Danh Hong's tool handled numeric data correctly, although it still needs more training data to get some specific words right, such as country names.
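Spotting the differences by eye works for a short text. For longer texts, a small script can list the positions where two tools split the text differently. This is a minimal Python sketch, assuming both outputs use ZWSP (U+200B) as the only word separator; the function names are hypothetical and not part of either tool.

```python
ZWSP = "\u200b"  # zero width space, the word separator assumed here


def boundaries(segmented):
    """Return the split offsets, counted in characters of the unsegmented text."""
    offsets = set()
    pos = 0
    # Every word except the last one is followed by a ZWSP boundary.
    for word in segmented.split(ZWSP)[:-1]:
        pos += len(word)
        offsets.add(pos)
    return offsets


def compare(output_a, output_b):
    """Return (splits only in A, splits only in B) for two segmentations of one text."""
    if output_a.replace(ZWSP, "") != output_b.replace(ZWSP, ""):
        raise ValueError("outputs are not segmentations of the same text")
    a, b = boundaries(output_a), boundaries(output_b)
    return sorted(a - b), sorted(b - a)
```

Each offset in the result points to a place where one tool inserted a ZWSP and the other did not, which is exactly what the yellow highlights mark by hand.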

Anyway, the tool will help our community grow.

Thanks to everyone for the hard work and for sharing it with us.

Friday, September 9, 2016

New Online Demo Release - Word Segmentation for Khmer Unicode by NIPTICT

Khmer word segmentation is still in demand and remains a hot topic in natural language processing for the Khmer language.

Several methods have been introduced so far, but online tools that people can actually use are still few (see my collection here).

Why is word segmentation important?
In language processing, we need to clearly identify the words and sentences. Khmer does not put spaces between words, and a sentence runs on with very few spaces, which is why it is hard for a machine to understand the text.

To segment a sentence into words, we nowadays need a large dictionary together with a method that can mark each word boundary with a zero width space, as fast as possible.
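To make the dictionary idea concrete, here is a minimal Python sketch of one classic dictionary approach, greedy longest matching (forward maximum matching), which joins the found words with ZWSP (U+200B). It is not the method used by NIPTICT, ondra.cf or kheng.info. The tiny word list, the function names and the rough cluster fallback for unknown text are assumptions for illustration only; a real tool needs a large dictionary and proper Khmer cluster rules.

```python
import unicodedata

ZWSP = "\u200b"   # zero width space inserted between words
COENG = "\u17d2"  # Khmer sign COENG, which introduces a subscript consonant


def _cluster_end(text, i):
    """Rough fallback (an assumption, not full Khmer cluster rules):
    take one character plus following combining marks and COENG subscripts."""
    j = i + 1
    while j < len(text):
        if text[j - 1] == COENG:
            j += 1  # the consonant written under the previous one
        elif unicodedata.category(text[j]).startswith("M"):
            j += 1  # vowel signs and other combining marks
        else:
            break
    return j


def segment(text, dictionary):
    """Greedy longest match: at each position take the longest dictionary word."""
    max_len = max(map(len, dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        end = None
        # Try the longest candidate first, then shorter ones.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                end = j
                break
        if end is None:
            end = _cluster_end(text, i)  # unknown text: emit one rough cluster
        words.append(text[i:end])
        i = end
    return words


def insert_zwsp(text, dictionary):
    """Return the text with a ZWSP between the segmented words."""
    return ZWSP.join(segment(text, dictionary))
```

For example, with a dictionary containing ខ្ញុំ, ស្រលាញ់, ភាសា and ខ្មែរ, the text ខ្ញុំស្រលាញ់ភាសាខ្មែរ comes back as those four words separated by ZWSP. Greedy matching is fast, but it can choose a long word that swallows the start of the next one, which is one reason newer tools add statistics or training data on top of the dictionary.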

Now NIPTICT has just released the first online demo of its method.


This kind of tool is very useful for office work and for entering data on websites.

Another online tool that I usually use is the one on Kheng.info, so now we have at least two online tools available.

I will post an update later explaining the method that NIPTICT uses.
Anyway, if you want to join the research, you can submit your work to the Khmer NLP conference from now until mid-October.

Tuesday, July 1, 2014

Kheng.info - Another Site of Khmer Tools

I saw this when a friend shared it on Facebook. It is always good to be able to recommend more research blogs and sites related to the Khmer language.

Kheng.info is one such website. It has a Khmer segmentation tool that works on input of around 1,000 characters. The website is built on a Khmer corpus of around 3 million lines, and it is also an English-Khmer and Khmer-English dictionary with Khmer audio.



It is an interesting website for other researchers and users to explore.