KhmerOCR: Research Paper


Tuesday, August 23, 2016

Paper: Experimental Comparison of the Performance of SVMs

The research paper on:

Experimental Comparison of the Performance of SVMs with Different Kernel Functions for Recognizing Arabic Characters

Said Ghoniemy, Sayed Fadel, M. Asif

Abstract

Considerable progress in the recognition of Latin and Chinese characters has been achieved. By contrast, Arabic Optical Character Recognition is still lagging. This is because the Arabic language is a cursive language, written from right to left, and each character has different forms according to its position in the word. Support vector machines using kernel classifiers represent a typical approach for character recognition. Choosing the most appropriate kernel highly depends on the problem at hand – and fine tuning its parameters can easily become a tedious and cumbersome task. The present study is devoted to an experimental comparison of the performance of SVM machines with different kernel functions for recognizing Arabic characters. Two groups of kernel functions were used throughout the study, each containing 7 kernel functions. The obtained results show that, in the radial basis group, the Laplacian kernel gives the best results. In the special functions group, the T-Student approach gives the best results. However, combining both kernels did not yield better performance.

Read detail
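To make the two kernels named in the abstract more concrete, here is a minimal sketch in plain Python. The formulas are the commonly cited forms of the Laplacian and (generalized) T-Student kernels; the abstract does not quote the paper's exact parameterization, so the `sigma` and `degree` parameters and all function names here are my own assumptions.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def laplacian_kernel(x, y, sigma=1.0):
    """Laplacian kernel in its common form: exp(-||x - y|| / sigma).

    sigma is an assumed width parameter, not a value from the paper.
    """
    return math.exp(-euclidean_distance(x, y) / sigma)


def t_student_kernel(x, y, degree=2):
    """Generalized T-Student kernel in its common form: 1 / (1 + ||x - y||^degree).

    degree is an assumed parameter, not a value from the paper.
    """
    return 1.0 / (1.0 + euclidean_distance(x, y) ** degree)


def gram_matrix(samples, kernel):
    """Kernel value for every pair of samples, as a list of rows."""
    return [[kernel(a, b) for b in samples] for a in samples]


# Toy feature vectors (hypothetical, only to show the calls).
samples = [(0.0, 0.0), (3.0, 4.0)]
print(gram_matrix(samples, laplacian_kernel))
print(gram_matrix(samples, t_student_kernel))
```

Both kernels return 1.0 for identical inputs and decay as the vectors move apart; a Gram matrix like this is what an SVM library would typically accept as a precomputed kernel.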

[..]

Sok, P. and Taing, N., "Support Vector Machine (SVM) Based Classifier For Khmer Printed Character-set Recognition", Asia-Pacific Signal and Information Processing Association, 2014 Annual Summit and Conference (APSIPA) (pp. 1-9). IEEE. December 2014.

[..]

Thanks for citing my research on the SVM-based method.

Tuesday, September 30, 2014

Want to Write a Scientific Paper or Article? Start Here

If you want to start writing that kind of research, you may need to learn about the Structure, Format, Content, and Style of a Journal-Style Scientific Paper to understand what you will write.

The following table is a short snapshot to help you out; for details, please read this article.

Wednesday, April 23, 2014

State of The Art Of KhmerOCR Implementation

There aren't many articles on the Internet about the KhmerOCR topic; I haven't found a lot myself either.

Of course, I believe some people or companies may be quietly implementing solutions without announcing them. Still, I believe my assumptions below are relevant enough for people to understand the current situation of Khmer OCR.

Let's share the "State of The Art Of KhmerOCR" today ;)

I could find several Khmer OCR research works published through various organizations, websites and universities.

Methodologies

When we talk about Khmer OCR, we mean solutions that convert characters from scanned images of handwritten, typewritten or printed text into machine-encoded text.

Solutions for an OCR system mostly focus on:

Pre-processing (usually noise removal)
Segmentation
    Line segmentation
    Character segmentation
Recognition
Mapping (character assembling)
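As an illustration of the line segmentation step, here is a minimal sketch of one common approach, the horizontal projection profile. It is not taken from any of the papers discussed in this post; the function names, the `min_ink` threshold and the tiny sample "page" are all hypothetical. The image is assumed to be already binarized (1 = ink, 0 = background).

```python
def horizontal_projection(image):
    """Count ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in image]


def segment_lines(image, min_ink=1):
    """Return (top, bottom) row ranges, bottom exclusive, one per text line.

    A row belongs to a line when it holds at least min_ink ink pixels;
    consecutive such rows are grouped into one line.
    """
    lines = []
    start = None
    for y, ink in enumerate(horizontal_projection(image)):
        if ink >= min_ink and start is None:
            start = y                      # first inked row of a new line
        elif ink < min_ink and start is not None:
            lines.append((start, y))       # blank row closes the line
            start = None
    if start is not None:                  # line touching the bottom edge
        lines.append((start, len(image)))
    return lines


# Hypothetical 6x5 binary page with two text lines.
page = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 1, 1],
]
print(segment_lines(page))  # [(1, 3), (5, 6)]
```

The same idea applied to columns inside one line gives a first cut at character segmentation, although Khmer vowels and subscripts stacked above and below the base consonant usually need extra handling.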

Several methods have already been used for the segmentation or recognition part of Khmer OCR, such as:

Legendre Moment Descriptor,
Wavelet Descriptor,
Hidden Markov Model (HMM),
Back propagation,
Scale-Invariant Feature Transform (SIFT),
Fourier Descriptor,
Hole detection,
Template Matching,
etc.
And (it seems) the most recent one is: Support Vector Machine (SVM)
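To give a feel for the simplest method in this list, here is a toy sketch of template matching on binary glyph images. It is only an illustration under my own assumptions (pixel agreement as the score, equally sized images, made-up 3x3 templates) and is not the implementation from any of the papers mentioned in this post.

```python
def match_score(glyph, template):
    """Fraction of pixels that agree between two equally sized binary images."""
    total = 0
    agree = 0
    for glyph_row, template_row in zip(glyph, template):
        for g, t in zip(glyph_row, template_row):
            total += 1
            if g == t:
                agree += 1
    return agree / total


def classify(glyph, templates):
    """Return (label, score) for the template that best matches the glyph."""
    scored = ((label, match_score(glyph, tpl)) for label, tpl in templates.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical 3x3 templates; real templates would be rendered from a font.
templates = {
    "vertical_bar": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "ring": [[1, 1, 1], [1, 0, 1], [1, 1, 1]],
}

# A vertical bar with one noisy pixel in the bottom-right corner.
noisy_glyph = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]
label, score = classify(noisy_glyph, templates)
print(label, round(score, 3))  # vertical_bar 0.889
```

This kind of matching works best when font and size are fixed, which fits the single-font, single-size setups reported in some of the studies.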

Literature Review/History

I might have missed some, but here is what I could find about what has been done so far on this topic.

If you know of any more, please share them through the comment form. I will check and update.

1. The Khmer Printed Characters Recognition using Legendre Moment Descriptor, by Chey Chanoeurn et al., got 92% accuracy on 10 Khmer consonants: ប ព ជ ក ភ ណ ឃ ស វ and ឆ.
2. 2005, The Khmer Printed Character Recognition Using Wavelet Descriptors, by Chey Chanoeurn et al., got accuracies of 92.85%, 91.66% and 89.27% on 10 Khmer fonts in 3 different sizes.
3. 2008, The Khmer Segmentation for font Limon S1, size 22, by Ing Leng Ieng, PAN Localization Project, got an accuracy of 99.11%.
4. 2009, The Khmer OCR for Limon R1, size 22, by Ing Leng Ieng, PAN Localization Project, used framing and Discrete Cosine Transform calculation, with recognition based on a Hidden Markov Model, and got an accuracy of 98.88%.
5. 2011, The Khmer Optical Character Recognition (OCR), by Mr. Kruy Vanna, using Fourier Descriptors, Component’s Holes, and Component’s Location, got an accuracy of 97.9% on 19 Khmer fonts.
6. 2012, The Khmer Printed Character Recognition combining Edge Detection and Template Matching, by Iech Setha et al., for one font, “Khmer OS Content”, at 36pt, got an accuracy of 99%.
7. 2013, The Khmer Printed Character Recognition based on Support Vector Machine (SVM), by Pongsametrey SOK, for one font, “Khmer OS Content”, at 36pt, got an accuracy of 98.54% (32pt = 98.62%, 28pt = 98.18%) with a training set at font size 32pt.

Research No. 7 is my own work and was submitted to the Royal University of Phnom Penh (RUPP), so it has not been published publicly anywhere yet.

Who Is Doing It Nowadays

These are only the ones I know of in Cambodia; people studying abroad may also be working on it. Anyway, here is what I know:

Institute of Technology of Cambodia (ITC): it seems some KhmerOCR implementation work is continuing there.
Royal University of Phnom Penh (RUPP): also doing more research on this topic through students' research and theses, together with their lecturers.
Open Institute (Open Forum, KhmerOS.info): I believe this NGO is still interested in the topic.
And there are some other individuals as well, as I have heard (?)

What's Interesting

There is one open-source OCR engine, Tesseract OCR, which is a complete engine covering everything from image processing to recognition and output.

To make it work for the Khmer language, we need to work out how to train it on our dataset.

I have also trained Tesseract on a few Khmer letters. The system seems good to go, but there are some things we need to be aware of first, as I described in a question I posted here.
I will try to write a post on how I trained those characters.

Few Training Char, All Are Error

Why Tesseract at this time?
Previous research mostly used its own combination of methods to solve particular issues for the Khmer language, such as segmentation or recognition, but pre-processing (image processing) is also important for a real OCR system and its accuracy.
And I can see that Tesseract OCR is ready for all of that.

Has Anyone Already Tried Tesseract?
Yes. From my searches on Google and elsewhere, people have been trying since 2009.
It might already have been done by some universities or lecturers, but that remains unclear to me.

So, Is There Any Ready Tesseract OCR for Khmer?
My presumed answer: No. I have never heard of a ready-made Khmer training set for the Tesseract OCR engine.

But just today I checked the Tesseract repo again (14 January 2014) and saw that some Khmer config has been added (files: Khmer.unicharset, Khmer.xheights). We need to test whether they work.

Therefore, students, lecturers, NGOs and the community should take part and help with this.
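For anyone who wants to help with testing, here is a small sketch of how the Tesseract command-line tool can be driven from Python. The general form `tesseract imagename outputbase -l lang` is the tool's standard usage; the language code `khm`, the file names and the helper functions are my assumptions, and the command only works if a matching trained-data file for that code is actually installed.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, lang):
    """Build the argument list for: tesseract imagename outputbase -l lang."""
    return ["tesseract", image_path, output_base, "-l", lang]


def run_ocr(image_path, output_base, lang):
    """Run tesseract if it is installed and return the text file it writes.

    Returns None when the tesseract executable is not on the PATH.
    """
    if shutil.which("tesseract") is None:
        return None
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    return output_base + ".txt"


# "khm" and the file names are assumptions for illustration only.
command = build_tesseract_command("sample_page.png", "sample_page", "khm")
print(" ".join(command))  # tesseract sample_page.png sample_page -l khm
```

Comparing the resulting text file against a hand-typed transcription of the same page would give a first rough measure of whether the Khmer data works.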

Conclusions

OCR systems are of great interest to people nowadays.

We have been using Khmer Unicode since it was established in the Kingdom in 2003, and thanks to Unicode we recently got Google Translate. So Khmer OCR should be solved somehow as well.

We need more people to do it, to help, to share and publish.

---

Article Revision

23/04/2014: Initial version of the article

Remark:

If there is any mistake in the research above, please alert me in the comments.
For more detail on each piece of research, please find and read the published paper.

Thursday, February 13, 2014

Displaying Khmer - Development Document

By Javier Solar, 2004

The purpose of this document is to help developers who do not know the Khmer script to understand what is involved in displaying Khmer Unicode correctly.

Monday, January 20, 2014

Khmer Protocols Research

I haven't read it in enough detail to summarize it, but here is a quick share:

The presentation:

2014 khmer protocols from c.titus.brown

About
This is a set of protocols for doing genomic data analysis – specifically, de novo mRNAseq assembly and de novo metagenome assembly – in the cloud.
The latest released version of these protocols can always be found at:

http://khmer-protocols.readthedocs.org/

You are reading v0.8.4; please use the following URL in citations and discussions:

https://khmer-protocols.readthedocs.org/en/v0.8.4/

Brown, C. Titus; Scott, Camille; Crusoe, Michael; Sheneman, Leigh;
Rosenthal, Josh; Adina Howe (2013): khmer-protocols documentation.
figshare. http://dx.doi.org/10.6084/m9.figshare.878460

Monday, January 13, 2014

Paper - Mobile Tech for Improved Family Planning Services (MOTIF)

Background

Providing women with contraceptive methods following abortion is important to reduce repeat abortion rates, yet evidence for effective post-abortion family planning interventions are limited. This protocol outlines the evaluation of a mobile phone-based intervention using voice messages to support post-abortion family planning in Cambodia.

Mobile Tech for Improved Family Planning Services (MOTIF), powered by our Verboice, an IVR tech http://t.co/wFohNwLIEd #cambodia #ictkh
— iLab Southeast Asia (@iLabSEA) January 14, 2014
