tag:blogger.com,1999:blog-42926675103898184672024-02-19T09:01:31.765-08:00KhmerOCRUnknownnoreply@blogger.comBlogger42125tag:blogger.com,1999:blog-4292667510389818467.post-78186577597860773402023-11-29T01:14:00.000-08:002023-11-29T01:14:27.863-08:00Research Article: Toward a Low-Resource Non-Latin-Complete Baseline: An Exploration of Khmer Optical Character Recognition<p>A research article published on IEEE Xplore: "Toward a Low-Resource Non-Latin-Complete
Baseline: An Exploration of Khmer Optical
Character Recognition"</p><p>by RINA BUOY<sup>1</sup> (Graduate Student Member, IEEE), MASAKAZU IWAMURA<sup>1</sup> (Member, IEEE), SOVILA SRUN<sup>2</sup>, AND KOICHI KISE<sup>1</sup></p><p>ABSTRACT </p><p>Many existing text recognition methods rely on the structure of Latin characters and words.
Such methods may not be able to deal with non-Latin scripts that have highly complex features, such as
character stacking, diacritics, ligatures, non-uniform character widths, and writing without explicit word
boundaries. In addition, from a natural language processing (NLP) perspective, most non-Latin languages
are considered low-resource due to the scarcity of large-scale data. This paper presents a convolutional
Transformer-based text recognition method for low-resource non-Latin scripts, which uses local two-dimensional (2D) feature maps. The proposed method can handle images of arbitrarily long text lines, which
may occur with non-Latin writing without explicit word boundaries, without resizing them to a fixed size by
using an improved image chunking and merging strategy. It has a low time complexity in self-attention layers
and allows efficient training. The Khmer script is used as the representative of non-Latin scripts because it
shares many features with other non-Latin scripts, which makes the construction of an optical character
recognition (OCR) method for Khmer as hard as that for other non-Latin scripts. Thus, by analogy with the
AI-complete concept, a Khmer OCR method can be considered as one of the non-Latin-complete methods
and can be used as a low-resource non-Latin baseline method. The proposed 2D method was trained on
synthetic datasets and outperformed the baseline models on both synthetic and real datasets. Fine-tuning
experiments using Khmer handwritten palm leaf manuscripts and other non-Latin scripts demonstrated the
feasibility of transfer learning from the Khmer OCR method. To contribute to the low-resource language
community, the training and evaluation datasets will be made publicly available.</p><p><br /></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEisaSX6c94_UckbRW0zuPsFIpOHOSBRNzS-R8Vix2E8lA7uoVQVxruoTrq9Cq7J99MKbXcXxKk6QvhcJprnQN8-IxKGVDPm4DDfY5RwavnYycgpCs6uIpUXxKLXSGLSrKgtuC9v7iAgElkd9boECBcMLoy30kS3F4_Lvh6NlXg3C12tvgsczg-l0LavOSI" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="724" data-original-width="1686" height="274" src="https://blogger.googleusercontent.com/img/a/AVvXsEisaSX6c94_UckbRW0zuPsFIpOHOSBRNzS-R8Vix2E8lA7uoVQVxruoTrq9Cq7J99MKbXcXxKk6QvhcJprnQN8-IxKGVDPm4DDfY5RwavnYycgpCs6uIpUXxKLXSGLSrKgtuC9v7iAgElkd9boECBcMLoy30kS3F4_Lvh6NlXg3C12tvgsczg-l0LavOSI=w640-h274" width="640" /></a></div><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: center;">You can find the full article at: <a href="https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10316307">https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10316307</a> </div><br /><p></p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4292667510389818467.post-67079693279458493252023-04-19T04:03:00.001-07:002023-04-19T04:03:16.532-07:00Master Thesis of OCR using deep learning - Pavel Andrlik, 2022<p>Thanks to Pavel Andrlik for another citation, this time in his master's thesis on OCR using deep learning.</p><p>University of West Bohemia<br />Faculty of Applied Sciences<br />Department of Cybernetics<br />(<span style="background-color: white; color: #4d5156; font-family: arial, sans-serif; font-size: 14px;">Czech Republic)</span></p><p><span style="background-color: white; color: #4d5156; font-family: arial, sans-serif; font-size: 14px;"><br /></span></p><blockquote><p>The author highlighted my paper when discussing the choice of a Support Vector Machine as the classification method...[]</p></blockquote><p><br /></p><p><i> Abstract of the 
Thesis</i></p><p>This diploma thesis deals with the problem of optical character recognition (OCR) using
neural networks. I am focusing on improving text detection and OCR by fine-tuning an
E2E-MLT scene text detector by training it on synthetic data which emulates real data. The
model was fine-tuned on several datasets with synthetically generated data and real data,
then the models were tested on one synthetic and two real datasets, one with the majority
of the wild text, the second with the majority of TV news imprinted text. On the dataset
with majority of TV news imprinted texts the fine-tuned models achieved improvement by
decreasing character error rate from 52% to 31.6% and word error rate from 56.5% to 22%.
It was also experimentally discovered that training models on synthetic data simulating real
TV news images deteriorate detection and reading model capability on wild text data.</p><p>----------</p><p><i>What interests me is the motivation side!</i></p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEhX544ti9H4nwBXjAEodXWj6V0mJBBwrvOHIYd0P_m7eMomzelvlNSaGW5cD2dIA9WaSvF9iKGGan15WqRW31nX7IrBFVzONwivk8-4w636gFR63jaznS3TSftB7huyDY4rJeRfb0t7Iyw58O6vtast9YKdKu7pMC38ixb2W7Duxr_3WgwwIfORzhP4" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="1058" data-original-width="1266" height="535" src="https://blogger.googleusercontent.com/img/a/AVvXsEhX544ti9H4nwBXjAEodXWj6V0mJBBwrvOHIYd0P_m7eMomzelvlNSaGW5cD2dIA9WaSvF9iKGGan15WqRW31nX7IrBFVzONwivk8-4w636gFR63jaznS3TSftB7huyDY4rJeRfb0t7Iyw58O6vtast9YKdKu7pMC38ixb2W7Duxr_3WgwwIfORzhP4=w640-h535" width="640" /></a></div><div class="separator" style="clear: both; text-align: center;">The full thesis can be found at: <a href="https://otik.uk.zcu.cz/bitstream/11025/48953/1/Thesis___Pavel_Andrlik.pdf">https://otik.uk.zcu.cz/bitstream/11025/48953/1/Thesis___Pavel_Andrlik.pdf</a></div><p><i>My quick reflection on the motivation side!</i></p><p>The use case could also apply to written papers collected as data, such as artists' ideas, random articles, etc.
We have a lot of handwriting and printed writing that should also be considered part of the collection for our language.</p><br /><p></p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4292667510389818467.post-32062492051659054772021-10-17T09:27:00.002-07:002021-10-17T09:27:12.132-07:00A compact deep learning model for Khmer handwritten text recognition<p>Bayram Annanurov, Norliza Mohd Noor </p><p>Department of Computer Science, Paragon International University, Cambodia </p><p>Department of Engineering, Razak Faculty of Technology and Informatics, Universiti Teknologi Malaysia, Malaysia</p><p><b>Abstract (of the Paper)</b></p><p>The motivation of this study is to develop a compact offline recognition
model for Khmer handwritten text that would be successfully applied under
limited access to high-performance computational hardware. Such a task aims
to ease the ad-hoc digitization of vast handwritten archives in many spheres.
Data collected for previous experiments were used in this work. The one-against-all classification was completed with state-of-the-art techniques. A
compact deep learning model (2+1CNN), with two convolutional layers and
one fully connected layer, was proposed. The recognition rate came out to be
within 93-98%. The compact model is performed on par with the state-of-the-art models. It was discovered that computational capacity requirements
usually associated with deep learning can be alleviated, therefore allowing
applications under limited computational power.</p><p><a href="https://www.proquest.com/openview/71284580ea5279c705d18f1a1cb1f8d4/1?pq-origsite=gscholar&cbl=1686339" target="_blank">Link To the Page</a> </p><p><br /></p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgsy9KY-cwrjKnl-P9QI2EZ4pVlZLhs9NxbSHgdKbMLK3EPLLsJkjhgeWskgt59PahrHdtPQj7mZ_M0oL2CauF0jhTHIZfztYSrK-kZ_pLPAtn8U8xo2WNM-OmM68Hnl7hmZyAfu_VkQlU/" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="582" data-original-width="1468" height="254" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgsy9KY-cwrjKnl-P9QI2EZ4pVlZLhs9NxbSHgdKbMLK3EPLLsJkjhgeWskgt59PahrHdtPQj7mZ_M0oL2CauF0jhTHIZfztYSrK-kZ_pLPAtn8U8xo2WNM-OmM68Hnl7hmZyAfu_VkQlU/w640-h254/image.png" width="640" /></a></div><br /><br /><p></p><p><br /><br /></p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4292667510389818467.post-12638052913451753392021-02-19T19:18:00.004-08:002021-02-19T19:20:39.687-08:00Optical character recognition system for Baybayin scripts using support vector machine<p>A new publication applying the SVM method to OCR: "Optical character recognition system for Baybayin scripts using support vector machine" - <a href="https://peerj.com/articles/cs-360/">https://peerj.com/articles/cs-360/</a></p><p><br /></p><p>Thanks for the citation; it makes it clearer that the method can work in other cases as well.</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiM2DEhLxZFjeLMud32h9XOit4wy4fOPdJ6d8LFVBxJUGAuigAGoHYjnfcceUuo-ZHYb22Zkj9TyYeRrv_gAoeRF-iPRr_PaIQY_AyU7RGWJk6jDSH3LVHBv9MQM4J0OPxoboXVSTwsLHM/" style="margin-left: 1em; margin-right: 1em;"><img data-original-height="320" data-original-width="715" height="286" 
src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiM2DEhLxZFjeLMud32h9XOit4wy4fOPdJ6d8LFVBxJUGAuigAGoHYjnfcceUuo-ZHYb22Zkj9TyYeRrv_gAoeRF-iPRr_PaIQY_AyU7RGWJk6jDSH3LVHBv9MQM4J0OPxoboXVSTwsLHM/w640-h286/image.png" width="640" /></a></div><br /><br /><p></p><p style="text-align: center;">This part is delight me and remind it back.</p><p><br /></p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi924VJ34aZrVfYG7J7_C3Yg45o_QvxRVoxlTDItYqvK21jJPEvyyhz4xllT2NO23-5oV7AeWMtjQSTAUQo47T1o_hvFxHzSusWV-UE8k3-b6fUuvKeRqlF8em4uEKhreCx8IC5h_-N8Rw/" style="margin-left: 1em; margin-right: 1em;"><img data-original-height="441" data-original-width="670" height="422" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi924VJ34aZrVfYG7J7_C3Yg45o_QvxRVoxlTDItYqvK21jJPEvyyhz4xllT2NO23-5oV7AeWMtjQSTAUQo47T1o_hvFxHzSusWV-UE8k3-b6fUuvKeRqlF8em4uEKhreCx8IC5h_-N8Rw/w640-h422/image.png" width="640" /></a></div><br /><br /><p></p><div><div class="row-fluid row-article-item-section-heading"><div class="span1 article-main-left-span1"><i class="article-section-indicator icon-chevron-down"></i></div><h2>Abstract (of the paper)</h2><div class="article-item-section-toggle" id="article-item-abstract"></div></div><div class="row-fluid row-article-item-section"><div class="span1 article-main-left-span1"> In 2018, the Philippine Congress signed House Bill 1022
declaring the Baybayin script as the Philippines’ national writing
system. In this regard, it is highly probable that the Baybayin and
Latin scripts would appear in a single document. In this work, we
propose a system that discriminates the characters of both scripts. The
proposed system considers the normalization of an individual character
to identify if it belongs to Baybayin or Latin script and further
classify them as to what unit they represent. This gives us four
classification problems, namely: (1) Baybayin and Latin script
recognition, (2) Baybayin character classification, (3) Latin character
classification, and (4) Baybayin diacritical marks classification. To
the best of our knowledge, this is the first study that makes use of
Support Vector Machine (SVM) for Baybayin script recognition. This work
also provides a new dataset for Baybayin, its diacritics, and Latin
characters. Classification problems (1) and (4) use binary SVM while (2)
and (3) apply the multiclass SVM classification. On average, our
numerical experiments yield satisfactory results: (1) has 98.5%
accuracy, 98.5% precision, 98.49% recall, and 98.5% F1 Score; (2) has
96.51% accuracy, 95.62% precision, 95.61% recall, and 95.62% F1 Score;
(3) has 95.8% accuracy, 95.85% precision, 95.8% recall, and 95.83% F1
Score; and (4) has 100% accuracy, 100% precision, 100% recall, and 100%
F1 Score.</div></div></div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4292667510389818467.post-16008027091182474842020-07-22T03:20:00.000-07:002020-07-22T03:20:35.748-07:00i2ocr - Free Online Khmer OCR, It works!i2OCR.com provides Khmer OCR free for everyone to use. I have tested it and got a good enough result: I would say around 95% is correct when the text is Khmer Unicode text.<div><br /></div><div>I will try it out sometime on the old Limon or ABC fonts; for handwritten text it does not work.</div><div><br /></div><div>So you may try it when you need it: <a href="http://www.i2ocr.com/free-online-khmer-ocr">http://www.i2ocr.com/free-online-khmer-ocr</a></div><div><br /></div><div>The tool has now been added to my list, <a href="http://blog.khmerocr.com/p/khmer-tools.html">Khmer Tools</a>.<br /><div><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgivEgoV_pOaBg-0TVj7K3kpBtV8ZDILo3-_LH4RJD8_0tTaPNeQPHnxu3WCbzsVxOYMYkE735gxjSh9TRENBF3tRLj6tx89SIAWtgi4fYNlSmlGFt7_iw4pTR5I9gDgHWIp_frgBIrxZk/s1264/i2ocr-khmer-2020.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="626" data-original-width="1264" height="316" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgivEgoV_pOaBg-0TVj7K3kpBtV8ZDILo3-_LH4RJD8_0tTaPNeQPHnxu3WCbzsVxOYMYkE735gxjSh9TRENBF3tRLj6tx89SIAWtgi4fYNlSmlGFt7_iw4pTR5I9gDgHWIp_frgBIrxZk/w640-h316/i2ocr-khmer-2020.png" width="640" /></a></div><div><br /></div><div><br /></div><div>Sample testing</div><div><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgY7ZYV5IAAHjpFTk1u98Z8USwOgKLfVUey4H-YXzDQPzPU9yoCkNeuoO5SoGbDTKBEphWkqyIdXO_Obauw1NWYWAEcqAkpAl8lQ6XkG0YjDM1X10lf73v3NBtqAp_cXQ2OhXzlu7G-3pU/s1244/i2ocr-khmer-2020-2.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="704" 
data-original-width="1244" height="362" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgY7ZYV5IAAHjpFTk1u98Z8USwOgKLfVUey4H-YXzDQPzPU9yoCkNeuoO5SoGbDTKBEphWkqyIdXO_Obauw1NWYWAEcqAkpAl8lQ6XkG0YjDM1X10lf73v3NBtqAp_cXQ2OhXzlu7G-3pU/w640-h362/i2ocr-khmer-2020-2.PNG" width="640" /></a></div><div><br /></div></div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4292667510389818467.post-60065955855321730652019-11-24T21:41:00.000-08:002019-11-24T21:41:06.641-08:00Application of Support Vector Machine in Prediction Secondary Structure ProteinApplication of Support Vector Machine in Prediction Secondary Structure Protein
<br />
<div style="background-color: white; color: #006621; font-family: arial, sans-serif; font-size: 13px;">
NI Jabbar, RI Jabbar</div>
<div style="background-color: white; color: #006621; font-family: arial, sans-serif; font-size: 13px;">
<br /></div>
<div class="m_-8253516417177703677gse_alrt_sni" style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13px;">
This paper studies the prediction of secondary structure protein from primary<br />
structure protein using support vector machine (SVM). We classify 64 types of<br />
proteins in three types: Helices (H), Strand (E) and Coil (C).<br />
<br />
<br />
<a href="https://s3.amazonaws.com/academia.edu.documents/61170183/application-of-support-vector-machine-in-prediction-IJERTV8IS10028620191109-104909-1otd3gp.pdf?response-content-disposition=inline%3B%20filename%3DIJERT-Application_of_Support_Vector_Mach.pdf&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWOWYYGZ2Y53UL3A%2F20191125%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20191125T052355Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=801d1d1c8dbfc64299b259bce05020ffff49671832c7a55a12cd5a62aa56226e" target="_blank">View Here</a>.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiWDZWU4nxmtxqYUocg__pIILpdXVhl9BoyCfMrpffyQnwI0E5KWImJmFKNZfEAU2YLf7VK7N9jHXNQXUhfGJwaZ_UWxroS2DqupAvRi2uuSqyMwB6F7JHK2GewMTtASaQWyMck9YLU6TQ/s1600/citation-svm-2019.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="364" data-original-width="815" height="177" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiWDZWU4nxmtxqYUocg__pIILpdXVhl9BoyCfMrpffyQnwI0E5KWImJmFKNZfEAU2YLf7VK7N9jHXNQXUhfGJwaZ_UWxroS2DqupAvRi2uuSqyMwB6F7JHK2GewMTtASaQWyMck9YLU6TQ/s400/citation-svm-2019.PNG" width="400" /></a></div>
<br />
<br />
Thanks for the citation.</div>
Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4292667510389818467.post-46753662973048314302019-07-01T02:57:00.001-07:002021-02-08T02:44:11.666-08:00Writing Lunar Date in Khmer: Khmer Date ConverterWriting the lunar date in Khmer is required on all official documents used in Cambodia's government, if you have noticed.<br />
<br />
And of course, it is also the hardest part for people who do not work in a government office, as we mostly do not use the lunar calendar in private life; when we do need it, we always check the lunar calendar issued by the corresponding organization.<br />
<br />
Here is a help tool: <b><u><strike><a href="https://tools.wikischool.asia/KhmerLunar" target="_blank">https://tools.wikischool.asia/KhmerLunar</a></strike></u></b><br /><b><br /></b>
The tool helps you write the correct date in the Khmer language.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEghZb4miENQ2MM6jGNyWNgFEtUBmvzpysTtZxlbZMzufbhaTEKpxTeZBKsw1pBrJL4teLSuabiFYYWE7wXtEPxN2vS4d1fG5oC3opi35asMDWMIB-L1HscfNi5ddhifFjM2jfhWPRgXsoA/s1600/tools.wikischoo.asia.lunar-date.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="481" data-original-width="1183" height="257" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEghZb4miENQ2MM6jGNyWNgFEtUBmvzpysTtZxlbZMzufbhaTEKpxTeZBKsw1pBrJL4teLSuabiFYYWE7wXtEPxN2vS4d1fG5oC3opi35asMDWMIB-L1HscfNi5ddhifFjM2jfhWPRgXsoA/s640/tools.wikischoo.asia.lunar-date.PNG" width="640" /></a></div>
<br />
If you find this useful and want to keep engaging with WikiSchool, reach them on <a href="https://www.facebook.com/wikischool.asia" target="_blank">Facebook</a>.<div><br /></div><div><b>Updated (2021/02):</b></div><div><ul style="text-align: left;"><li>The tool is unpublished for the time being; I will let you know when it is back.</li></ul></div>Unknownnoreply@blogger.com2tag:blogger.com,1999:blog-4292667510389818467.post-37771493441471094562019-03-13T02:23:00.000-07:002019-03-13T02:32:29.976-07:00Tools That Will Help Writer in Khmer<a href="http://khmerwriters.com/" target="_blank">KhmerWriters.com</a> is a new tool provided by <a href="https://blog.khmerocr.com/search/label/Danh%20Hong" target="_blank">Danh Hong</a>, who is always supporting the Khmer language in computing.<br />
<br />
Danh Hong was also the one who handled khmerocr.org, which is no longer maintained and whose domain is already listed for sale. This time he is back with another tool that will be very helpful for Khmer writers, who mostly face issues such as wrong Khmer spelling and typing without the zero-width space (ZWSP).<br />
<br />
<b>Khmer Spelling Checker (with Auto correction)</b><br />
<b><br /></b>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQ-PUH6H8epSgbMicuUUR6DWfh6iY58x4dvp6Ip2oswsKOpx2jGP1M1FyFK8eBast_8AKzttbhpkOUD6i8Q1-GXeZF6hn166djLieeJdsm_rq9xBUwNcCIny126vin27cyW4JcFx9XLD4/s1600/danh-hong-khmerwriters-1.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="1193" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQ-PUH6H8epSgbMicuUUR6DWfh6iY58x4dvp6Ip2oswsKOpx2jGP1M1FyFK8eBast_8AKzttbhpkOUD6i8Q1-GXeZF6hn166djLieeJdsm_rq9xBUwNcCIny126vin27cyW4JcFx9XLD4/s640/danh-hong-khmerwriters-1.JPG" width="640" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
You can just paste the text and click check spelling (ពិនិត្យអក្ខរាវិរុទ្ធ), then click the next button for auto correction (កែដោយស្វ័យប្រវត្តិ).<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh5f5zEHn_GnQSx3c5CYwbTELYdCTI615RXGQ4UA_h-zUeAEiIrMwgnnMIrA5Co0ULr5AAB9WZcjZ61muM7iI3Zfc1SX7Zr_NQX4AfFjXNpNS9vG-1QLAsP8pqGLugfamUNwNofn5dedcA/s1600/danh-hong-khmerwriters-2.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="571" data-original-width="1200" height="304" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh5f5zEHn_GnQSx3c5CYwbTELYdCTI615RXGQ4UA_h-zUeAEiIrMwgnnMIrA5Co0ULr5AAB9WZcjZ61muM7iI3Zfc1SX7Zr_NQX4AfFjXNpNS9vG-1QLAsP8pqGLugfamUNwNofn5dedcA/s640/danh-hong-khmerwriters-2.JPG" width="640" /></a></div>
<br />
<b><br /></b>
<b>Auto Zero Space</b><br />
<b><br /></b>
The next thing you need to be aware of in Khmer writing is adding zero-width spaces so that justified alignment and line breaking work well. Just click the put-ZWSP button shown on the screen, then check the result by pasting it into your Word document; you will see the difference as follows:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEix7e-tPCEx6hnZC5tSo1scH0Qy6x99iS9O76oERVt0p_CNgJ_m6n6zMoAu7BJ-SEt67jr2kh_lGcl-HlQ5f0fIpWrhyphenhyphenG5NDNf4kdadzU0YQHhwesGxAZrJDKinLcluvATyveZxjyW1tp0/s1600/danh-hong-khmerwriters-3.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="471" data-original-width="1073" height="280" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEix7e-tPCEx6hnZC5tSo1scH0Qy6x99iS9O76oERVt0p_CNgJ_m6n6zMoAu7BJ-SEt67jr2kh_lGcl-HlQ5f0fIpWrhyphenhyphenG5NDNf4kdadzU0YQHhwesGxAZrJDKinLcluvATyveZxjyW1tp0/s640/danh-hong-khmerwriters-3.JPG" width="640" /></a></div>
<br />
<br />
Of course, when you use it, some words might not be detected well; this requires a larger corpus on his side, so I hope it will soon be more accurate, but even at this stage it is usable.<br />
<br />
Give it a try and let me know your thoughts in the comments.<br />
<br />
Do you think it is helpful for you?<br />
If you find it helpful, please also consider donating to Danh Hong's team so he can continue his work.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4292667510389818467.post-72006937965393046402018-07-27T00:35:00.000-07:002018-07-27T00:46:36.270-07:00Naming Transliteration Tool (Khmer to Latin/English)I just saw that there is a tool by the NIPTICT institute that helps write Khmer names in Latin script; it is very useful for Cambodian people who need to write their names in Latin characters.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhEKLYO7cFLYGufpSjOYbLt34ysUojpein_t672aGuymwfqgWcSc8aUGlH0UsEtzLyhRcEBDpM7x8xbotz-6kfC1d_b74GQwUygbcxbWwvWWBCP_QgkxI7LM0KV3v5uw8d0RmmEbgREM2s/s1600/transition-khmer-name-to-english-written.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="526" data-original-width="1160" height="181" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhEKLYO7cFLYGufpSjOYbLt34ysUojpein_t672aGuymwfqgWcSc8aUGlH0UsEtzLyhRcEBDpM7x8xbotz-6kfC1d_b74GQwUygbcxbWwvWWBCP_QgkxI7LM0KV3v5uw8d0RmmEbgREM2s/s400/transition-khmer-name-to-english-written.PNG" width="400" /></a></div>
<br />
For example, for the Khmer name កុសល ចំរើន, the tool provides three kinds of transliteration:<br />
<br />
<ol>
<li>Character Model: e.g. KOSOL CHAMREUN</li>
<li>Syllable Model: e.g. KOSOL CHAMRAEUN</li>
<li>SMT Model(*): e.g. KOL CHAMROEUN</li>
</ol>
<br />
So if you are looking for a Khmer transliteration tool, you can try this research and development tool: <a href="http://rnd.niptict.edu.kh/tran/" target="_blank">http://rnd.niptict.edu.kh/tran/</a>.<br />
<br />
<b>Remark:</b><br />
(*) I could not find a source for the abbreviation; I think "SMT" stands for <a href="http://rnd.niptict.edu.kh/smt/" target="_blank">Statistical Machine Translation</a>, which is another research project by this organization.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4292667510389818467.post-67331271289895297762018-04-12T01:54:00.000-07:002018-04-12T03:37:39.209-07:00Segmentation - New Zero Width Space (ZWSP) Online Tool - ondra.cf by Danh HongThanks to Danh Hong, who has always been there with Khmer Unicode solutions, from font design to OCR... and now a segmentation tool: <a href="http://ondra.cf/" target="_blank">ondra.cf</a><br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj7ufUOedE8W-pGj5umlOGH_69NoAIRT_lMebKX21B3w6t5QEZ0leBXmidUz9Oj68t6IS504L-ePW60afknttBpqrkqVpdV84MHjQ4olRfEp5KBifeHhSGyetOEZu5zWutztOzsuC4fNhs/s1600/ondra.cf-zwsp.PNG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="508" data-original-width="1147" height="176" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj7ufUOedE8W-pGj5umlOGH_69NoAIRT_lMebKX21B3w6t5QEZ0leBXmidUz9Oj68t6IS504L-ePW60afknttBpqrkqVpdV84MHjQ4olRfEp5KBifeHhSGyetOEZu5zWutztOzsuC4fNhs/s400/ondra.cf-zwsp.PNG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Danh Hong's Tool for ZWSP</td></tr>
</tbody></table>
<br />
<br />
Online tools, and even tools available through an API, are needed to push more Khmer-related products.<br />
Mostly I use the tool from <a href="http://kheng.info/" target="_blank">kheng.info</a>, which I have included in my <a href="http://blog.khmerocr.com/p/khmer-tools.html" target="_blank">list</a>. I can see both tools are great to have in the community, and I hope content-heavy organizations will support them for continuous development.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgx1UZ1NRyHMwkSgbPDPeYpyK6Jn7WTopEoJYTzXjrYE-IWvF6eQvpKUT-IFFm-vOfdgPjgLqnjhyJ0judUjgUtm8zLAR5WJHm4nXbVatUSHKrGNBBHAoAWtrQSgbqgvk0a1XLfijqFCVo/s1600/kheng.info-zwsp.PNG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="386" data-original-width="1216" height="126" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgx1UZ1NRyHMwkSgbPDPeYpyK6Jn7WTopEoJYTzXjrYE-IWvF6eQvpKUT-IFFm-vOfdgPjgLqnjhyJ0judUjgUtm8zLAR5WJHm4nXbVatUSHKrGNBBHAoAWtrQSgbqgvk0a1XLfijqFCVo/s400/kheng.info-zwsp.PNG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">kheng.info</td></tr>
</tbody></table>
<br />
I've tried out both tools to compare the results; some points are marked in yellow on the text:<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiAW7pYmVkwB9fmnz-KIS7AEj_83zux5pnyVqbrAMoz7g3M625KN8DnRiKTmk4f7V_xwy8QFhInM35jyFtBu88oevXX0XrfRXnG17OsT2FCaxwdOVynd6Ov2Xg0MizZEweVSw-r_hpIrWI/s1600/zwsp-compare-two-online-result.PNG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="513" data-original-width="720" height="285" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiAW7pYmVkwB9fmnz-KIS7AEj_83zux5pnyVqbrAMoz7g3M625KN8DnRiKTmk4f7V_xwy8QFhInM35jyFtBu88oevXX0XrfRXnG17OsT2FCaxwdOVynd6Ov2Xg0MizZEweVSw-r_hpIrWI/s400/zwsp-compare-two-online-result.PNG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">ondra.cf vs kheng.info</td></tr>
</tbody></table>
<br />
Of course, based on the highlights above, the results would be better with enough training data. I could see that Danh Hong's tool handles numeric data correctly, although it needs more training data to get some specific words right, such as country names.<br />
<br />
Anyway, the tool will help our community grow.<br />
<br />
Thanks, everyone, for your hard work and for sharing it with us.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4292667510389818467.post-31689004658110685992017-03-14T04:18:00.000-07:002017-03-14T04:18:47.731-07:00How we are now about the OCR?Recently I was contacted by students asking about Khmer OCR research.<br />
<br />
At the moment, it's hard to describe.<br />
<br />
I do hope students keep improving it and that professional researchers and institutes continue to give more detail on the topic.<br />
<br />
KhmerOCR.org, a team led by <a href="http://khmertype.blogspot.com/" target="_blank">Danh Hong</a>: no more updates due to lack of funding (sic).<br />
<br />
NIPTICT's work is still in progress, with a demo/test of their <a href="http://rnd.niptict.edu.kh/ocr/" target="_blank">OCR</a> research (sic).<br />
<br />
At the <a href="http://khmernlp.org/" target="_blank">KhmerNLP</a> conference last December there were some research papers, but none on the OCR topic.<br />
<br />
So, if you follow the topic and know of other research or products related to it, please share them with us in the comments.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4292667510389818467.post-65208292451872457762016-09-09T00:31:00.000-07:002016-12-12T21:08:23.954-08:00New Online Demo Release - Word Segmentation for Khmer Unicode by NIPTICTKhmer word segmentation is still in demand and a hot topic in natural language processing for the Khmer language.<br />
<br />
Several methods have been introduced so far, but online tools that people can actually use are still few (<a href="http://blog.khmerocr.com/p/khmer-tools.html" target="_blank">my collection here</a>).<br />
<br />
Why is word segmentation important?<br />
In language processing, we need to identify clearly what the words and sentences are. In Khmer we do not put spaces between words; a sentence runs on with few spaces, which is why it is hard for a machine to understand.<br />
<br />
To segment a sentence into words today, we need a big dictionary and a method that can split the words with zero-width spaces as fast as possible.<br />
<br />
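To make the dictionary idea above concrete, here is a minimal sketch in Python. It is only my own toy illustration, not the method NIPTICT or any other tool uses: the function name segment, the four-word dictionary, and the sample sentence are all assumptions made up for the example. It takes the longest dictionary word at each position and joins the words with the zero-width space (U+200B).<br />

```python
ZWSP = "\u200b"  # zero-width space (U+200B)


def segment(text, dictionary):
    """Greedy longest-match segmentation (toy sketch).

    At each position, take the longest dictionary word that matches;
    if nothing matches, fall back to a single character.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy data: a real tool needs a much larger dictionary.
toy_words = ["ខ្ញុំ", "ស្រឡាញ់", "ភាសា", "ខ្មែរ"]
dictionary = set(toy_words)
sentence = "".join(toy_words)  # Khmer text written without spaces

words = segment(sentence, dictionary)
joined = ZWSP.join(words)  # looks the same on screen, but can now break between words
print(len(words), "words,", joined.count(ZWSP), "zero-width spaces inserted")
```

Greedy matching like this is fast but makes mistakes on ambiguous or unknown words, which is one reason real tools need more data than a plain word list.<br />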
Now the NIPTICT institute has just released the first demo of its method <a href="http://rnd.niptict.edu.kh/seg/" target="_blank">online</a>.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjwIrjPvPHEzFaWAVK3Ky2im9MjtDKchHCoEq4wAY2pMNpRmL5gl0s7FnLiUTbgtHTA840N4oZoIeO4BxkI2lw8zj3DqrD3MKUsks2AtOJ-GeBmDlTjPBA4PYlZyc5s1sGqSHeYFt1i2p8/s1600/niptict-word-segmentation-demo-2016.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjwIrjPvPHEzFaWAVK3Ky2im9MjtDKchHCoEq4wAY2pMNpRmL5gl0s7FnLiUTbgtHTA840N4oZoIeO4BxkI2lw8zj3DqrD3MKUsks2AtOJ-GeBmDlTjPBA4PYlZyc5s1sGqSHeYFt1i2p8/s400/niptict-word-segmentation-demo-2016.png" width="400" /></a></div>
<br />
This tool is very useful for office work and for website data entry.<br />
<br />
Another online tool that I often use is <a href="http://kheng.info/word_segmentation/">Kheng.info</a>, so we now have at least two online tools available.<br />
<br />
I will post an update later explaining the method that NIPTICT uses.<br />
Anyway, to join the research, you can submit your work to the <a href="http://khmernlp.org/" target="_blank">Khmer NLP</a> conference from now until mid-October.<br />
<br />Unknownnoreply@blogger.com2tag:blogger.com,1999:blog-4292667510389818467.post-80955184558252584642016-08-23T01:22:00.000-07:002016-08-23T01:22:01.228-07:00Paper: Experimental Comparison of the Performance of SVMsThe research paper on:<br />
<div id="articleTitle">
<h3>
Experimental Comparison of the Performance of SVMs with Different Kernel Functions for Recognizing Arabic Characters</h3>
</div>
<br />
<div id="authorString">
<em>Said Ghoniemy, Sayed Fadel, M. Asif</em></div>
<div id="authorString">
<h4>
Abstract</h4>
<br />
<div>
A considerable progress in the recognition of Latin and
Chinese characters has been achieved. By contrast, Arabic Optical
character Recognition is still lagging. This is because Arabic language
is a cursive language, written from right to left, and each character
has different forms according to its position in the word. Support
vector machines using kernel classifiers represent a typical approach
for character recognition. Choosing the most appropriate kernel highly
depends on the problem at hand – and fine tuning its parameters can
easily become a tedious and cumbersome task. The present study is
devoted to an experimental comparison of the performance of SVM machines
with different kernel functions for recognizing Arabic Characters. Two
groups of kernel functions were used throughout the study, each group
contains 7 kernel functions. The obtained results show that, in the
radial basis group, Laplacian kernel gives the best results. In the
special functions group, the T-Student approach gives the best results.
However, combining both kernels did not yield better performance.</div>
</div>
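For readers unfamiliar with the kernels named in the abstract, here is a small sketch of two radial basis kernels, the Gaussian and the Laplacian. The formulas are the common textbook forms and are an assumption on my part. The abstract does not give the exact parameterization used in the paper, and the function names and the `sigma` parameter are hypothetical.<br />

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def gaussian_rbf_kernel(x, y, sigma=1.0):
    """Gaussian RBF: exp(-||x - y||^2 / (2 * sigma^2)). Assumed textbook form."""
    d = euclidean_distance(x, y)
    return math.exp(-(d ** 2) / (2 * sigma ** 2))


def laplacian_kernel(x, y, sigma=1.0):
    """Laplacian: exp(-||x - y|| / sigma). Assumed textbook form."""
    return math.exp(-euclidean_distance(x, y) / sigma)


# Two toy feature vectors at Euclidean distance 5 from each other.
u = [0.0, 0.0]
v = [3.0, 4.0]
k_gauss = gaussian_rbf_kernel(u, v, sigma=5.0)
k_laplace = laplacian_kernel(u, v, sigma=5.0)
```

Both kernels equal 1 for identical inputs and shrink as the inputs move apart. The Laplacian depends on the distance itself rather than on its square.<br />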
<div id="authorString">
<em><br /></em></div>
<div id="authorString">
<a href="http://sci-coll.com/index.php/IJIMDP/article/view/9" target="_blank">Read detail</a></div>
<div id="authorString">
<em><br /></em></div>
<blockquote class="tr_bq">
[..]</blockquote>
<blockquote class="tr_bq">
Sok, P. and Taing, N., "Support Vector Machine (SVM) Based Classifier
For Khmer Printed Character-set Recognition", Asia-Pacific Signal and
Information Processing Association, 2014 Annual Summit and Conference
(APSIPA) (pp. 1-9). IEEE. December 2014.</blockquote>
<blockquote class="tr_bq">
[..] </blockquote>
<br />
Thanks for citing my research on an SVM-based method.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4292667510389818467.post-80826697353038611472016-08-14T21:01:00.000-07:002016-08-14T21:01:04.224-07:00Khmer NLP Conference 2016<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgyhfbLqEq8eqxk1L5EXMzKKss_DB9LOvmQ2ytnyZTz9NJUv6Oiv1DS4eAkoP5FYlDHraBMb05pVWQsRq5jX3MXawV_xxnYW3l0hk9cqGpw__pFj4A5FxP_aTFw0oAg8HmD2koDMeWSB2U/s1600/khmer-nlp-2016.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="182" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgyhfbLqEq8eqxk1L5EXMzKKss_DB9LOvmQ2ytnyZTz9NJUv6Oiv1DS4eAkoP5FYlDHraBMb05pVWQsRq5jX3MXawV_xxnYW3l0hk9cqGpw__pFj4A5FxP_aTFw0oAg8HmD2koDMeWSB2U/s400/khmer-nlp-2016.jpg" width="400" /></a></div>
<br />
The upcoming Khmer Natural Language Processing Conference (Khmer NLP Conference 2016) is calling for papers related to natural language processing, especially work that solves problems of our Khmer language.<br />
<br />
As the poster shows, there are many topics on which students, professionals and the private sector can take part to help solve our language issues together, promote research and encourage more people to join in.<br />
<br />
This year, besides research papers, you can also present your research or products as a poster exhibited during the conference. Please check the official website for details: <a href="http://khmernlp.org/" target="_blank">http://khmernlp.org</a><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjSXdA4tN_RN4kLEgGhyMR8GjRVRHRPLT8FD2kNuJjgHoid1OO0no3EHZO7NESjVHfdTPY9a-V634k__H1wmrAIninP4_yWtHkNZ3BFwe5xtWKOezRQAcaRE8WcHuV-hoPKY0QSy2G2vjY/s1600/3rd+KNLP+Annual+Conference+2016-page-001.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjSXdA4tN_RN4kLEgGhyMR8GjRVRHRPLT8FD2kNuJjgHoid1OO0no3EHZO7NESjVHfdTPY9a-V634k__H1wmrAIninP4_yWtHkNZ3BFwe5xtWKOezRQAcaRE8WcHuV-hoPKY0QSy2G2vjY/s640/3rd+KNLP+Annual+Conference+2016-page-001.jpg" width="492" /></a></div>
<br />Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-4292667510389818467.post-13344236411975530742015-12-30T19:51:00.000-08:002016-12-12T21:06:57.319-08:00Try on between NIPTICT OCR and KhmerOCR.orgI ran a test using the output of the tool I made with iText for rendering Khmer in PDF, described in my post at <a href="http://ask.osify.com/qa/613">http://ask.osify.com/qa/613</a>, which uses Khmer OS Battambang to generate the PDF.<br />
<br />
I tested on two URLs:<br />
<br />
1. <a href="http://khmerocr.org/">http://KhmerOCR.org</a> by providing the image file<br />
2. <a href="http://rnd.niptict.edu.kh/ocr/">http://rnd.niptict.edu.kh/ocr/</a> by uploading the PDF file<br />
<br />
Here are the results so far:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgeMXay6-5LIo4SSrUXsMoUUhIvXz1ISpxkNH7FthiUyMvukBxpe5UYUROxtEOPOzaPXv6YnGL51yv2Q-Cr1ytMgZZcik7RBU6HORpsIBYzR1tIIu2D6yaiwzsq3nTp23k59ha0-6sUxI0/s1600/khmerocr-testing-niptict-and-org.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="297" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgeMXay6-5LIo4SSrUXsMoUUhIvXz1ISpxkNH7FthiUyMvukBxpe5UYUROxtEOPOzaPXv6YnGL51yv2Q-Cr1ytMgZZcik7RBU6HORpsIBYzR1tIIu2D6yaiwzsq3nTp23k59ha0-6sUxI0/s400/khmerocr-testing-niptict-and-org.png" width="400" /></a></div>
<br />Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4292667510389818467.post-82595339755482951812015-09-09T02:02:00.003-07:002015-09-09T02:02:41.052-07:00National Conference about Khmer Natural Language Processing<div style="-webkit-font-smoothing: antialiased; background-color: white; box-sizing: border-box; color: #444444; font-family: 'Open Sans', 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 20px; line-height: 1.6em; margin-bottom: 10px; text-rendering: optimizeLegibility; transition: all 0.3s ease;">
The Khmer Language Processing Consortium is happy to announce the Second Annual Conference on Khmer Natural language Processing (KNLP 2015), where all its members and others working in this field bring together their work in an effort to collaboratively advance together towards building practical Natural Language Processing for Khmer. The first annual conference took place in October 2014.</div>
<div style="-webkit-font-smoothing: antialiased; background-color: white; box-sizing: border-box; color: #444444; font-family: 'Open Sans', 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 20px; line-height: 1.6em; margin-bottom: 10px; text-rendering: optimizeLegibility; transition: all 0.3s ease;">
The Khmer Natural Language Processing Consortium, created in 2014, groups universities, NGOs, private companies and researchers interested in accelerating – through close coordination and collaboration – the creation of effective natural language processing tools for the Khmer language. These tools will be used to improve access to information and communication in this language.</div>
<div style="-webkit-font-smoothing: antialiased; background-color: white; box-sizing: border-box; color: #444444; font-family: 'Open Sans', 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 20px; line-height: 1.6em; margin-bottom: 10px; text-rendering: optimizeLegibility; transition: all 0.3s ease;">
Check on the website now: <a href="http://khmernlp.org/" target="_blank">http://khmernlp.org/</a></div>
<h1 style="-webkit-font-smoothing: antialiased; background-color: white; box-sizing: border-box; color: #444444; font-family: 'Roboto Condensed'; font-size: 36px; line-height: 1.6em; margin: 0px 0px 10px; padding: 0px; text-align: center; text-rendering: optimizeLegibility; transition: all 0.3s ease;">
Call for Papers</h1>
<div style="-webkit-font-smoothing: antialiased; background-color: white; box-sizing: border-box; color: #444444; font-family: 'Open Sans', 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 20px; line-height: 1.6em; margin-bottom: 10px; text-rendering: optimizeLegibility; transition: all 0.3s ease;">
The Conference will address a range of critically important issues and themes relating to the Khmer Natural Language Processing community. Plenary speakers include some of the leading thinkers in these areas.</div>
<div style="-webkit-font-smoothing: antialiased; background-color: white; box-sizing: border-box; color: #444444; font-family: 'Open Sans', 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 20px; line-height: 1.6em; margin-bottom: 10px; text-rendering: optimizeLegibility; transition: all 0.3s ease;">
The Khmer Language Processing Consortium is inviting proposals for paper presentations that address Khmer Natural Language Processing in one of the following areas:</div>
<div style="-webkit-font-smoothing: antialiased; background-color: white; box-sizing: border-box; color: #444444; font-family: 'Open Sans', 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 20px; line-height: 1.6em; margin-bottom: 10px; text-align: center; text-rendering: optimizeLegibility; transition: all 0.3s ease;">
• Text and Speech processing<br style="-webkit-font-smoothing: antialiased; box-sizing: border-box; line-height: 1.6em; text-rendering: optimizeLegibility; transition: all 0.3s ease;" />• Optical Character recognition for Khmer and similar complex scripts</div>
<div style="-webkit-font-smoothing: antialiased; background-color: white; box-sizing: border-box; color: #444444; font-family: 'Open Sans', 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 20px; line-height: 1.6em; margin-bottom: 10px; padding-left: 60px; text-align: center; text-rendering: optimizeLegibility; transition: all 0.3s ease;">
• Automatic Translation to and from Khmer.</div>
<div style="-webkit-font-smoothing: antialiased; background-color: white; box-sizing: border-box; color: #444444; font-family: 'Open Sans', 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 20px; line-height: 1.6em; margin-bottom: 10px; padding-left: 60px; text-align: center; text-rendering: optimizeLegibility; transition: all 0.3s ease;">
• Interpreting and generating spoken and written language</div>
<div style="-webkit-font-smoothing: antialiased; background-color: white; box-sizing: border-box; color: #444444; font-family: 'Open Sans', 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 20px; line-height: 1.6em; margin-bottom: 10px; padding-left: 60px; text-align: center; text-rendering: optimizeLegibility; transition: all 0.3s ease;">
• Natural language interfaces and dialogue systems</div>
<div style="-webkit-font-smoothing: antialiased; background-color: white; box-sizing: border-box; color: #444444; font-family: 'Open Sans', 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 20px; line-height: 1.6em; margin-bottom: 10px; padding-left: 60px; text-align: center; text-rendering: optimizeLegibility; transition: all 0.3s ease;">
• Pattern recognition, applied NLP systems</div>
<div style="-webkit-font-smoothing: antialiased; background-color: white; box-sizing: border-box; color: #444444; font-family: 'Open Sans', 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 20px; line-height: 1.6em; margin-bottom: 10px; padding-left: 60px; text-align: center; text-rendering: optimizeLegibility; transition: all 0.3s ease;">
• Cognitive aspects of natural language processing</div>
<div style="-webkit-font-smoothing: antialiased; background-color: white; box-sizing: border-box; color: #444444; font-family: 'Open Sans', 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 20px; line-height: 1.6em; margin-bottom: 10px; padding-left: 60px; text-align: center; text-rendering: optimizeLegibility; transition: all 0.3s ease;">
• Computation aspects of natural language processing</div>
<div style="-webkit-font-smoothing: antialiased; background-color: white; box-sizing: border-box; color: #444444; font-family: 'Open Sans', 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 20px; line-height: 1.6em; margin-bottom: 10px; padding-left: 60px; text-align: center; text-rendering: optimizeLegibility; transition: all 0.3s ease;">
• NLP-based knowledge science and service science</div>
<div style="-webkit-font-smoothing: antialiased; background-color: white; box-sizing: border-box; color: #444444; font-family: 'Open Sans', 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 20px; line-height: 1.6em; margin-bottom: 10px; padding-left: 60px; text-align: center; text-rendering: optimizeLegibility; transition: all 0.3s ease;">
• Corpus and Language Resources</div>
<div style="-webkit-font-smoothing: antialiased; background-color: white; box-sizing: border-box; color: #444444; font-family: 'Open Sans', 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 20px; line-height: 1.6em; margin-bottom: 10px; padding-left: 60px; text-align: center; text-rendering: optimizeLegibility; transition: all 0.3s ease;">
• Corpus-based language modeling</div>
<div style="-webkit-font-smoothing: antialiased; background-color: white; box-sizing: border-box; color: #444444; font-family: 'Open Sans', 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 20px; line-height: 1.6em; margin-bottom: 10px; padding-left: 60px; text-align: center; text-rendering: optimizeLegibility; transition: all 0.3s ease;">
• Tools and resources for natural language processing</div>
<div style="-webkit-font-smoothing: antialiased; background-color: white; box-sizing: border-box; color: #444444; font-family: 'Open Sans', 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 20px; line-height: 1.6em; margin-bottom: 10px; padding-left: 60px; text-align: center; text-rendering: optimizeLegibility; transition: all 0.3s ease;">
• Theoretical and Applied Linguistics, NLP Applications</div>
<div style="-webkit-font-smoothing: antialiased; background-color: white; box-sizing: border-box; color: #444444; font-family: 'Open Sans', 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 20px; line-height: 1.6em; margin-bottom: 10px; padding-left: 60px; text-align: center; text-rendering: optimizeLegibility; transition: all 0.3s ease;">
• Semantics, syntax and lexicon</div>
<div style="-webkit-font-smoothing: antialiased; background-color: white; box-sizing: border-box; color: #444444; font-family: 'Open Sans', 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 20px; line-height: 1.6em; margin-bottom: 10px; padding-left: 60px; text-align: center; text-rendering: optimizeLegibility; transition: all 0.3s ease;">
• Evaluation of natural language systems</div>
<div style="-webkit-font-smoothing: antialiased; background-color: white; box-sizing: border-box; color: #444444; font-family: 'Open Sans', 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 20px; line-height: 1.6em; margin-bottom: 10px; padding-left: 60px; text-align: center; text-rendering: optimizeLegibility; transition: all 0.3s ease;">
• Information retrieval and extraction, text mining</div>
<div style="-webkit-font-smoothing: antialiased; background-color: white; box-sizing: border-box; color: #444444; font-family: 'Open Sans', 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 20px; line-height: 1.6em; margin-bottom: 10px; padding-left: 60px; text-align: center; text-rendering: optimizeLegibility; transition: all 0.3s ease;">
• Human processing of language and speech</div>
<div style="-webkit-font-smoothing: antialiased; background-color: white; box-sizing: border-box; color: #444444; font-family: 'Open Sans', 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 20px; line-height: 1.6em; margin-bottom: 10px; padding-left: 60px; text-align: center; text-rendering: optimizeLegibility; transition: all 0.3s ease;">
• Languages for Disability</div>
<div style="-webkit-font-smoothing: antialiased; background-color: white; box-sizing: border-box; color: #444444; font-family: 'Open Sans', 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 20px; line-height: 1.6em; margin-bottom: 10px; padding-left: 60px; text-align: center; text-rendering: optimizeLegibility; transition: all 0.3s ease;">
• Ontology Engineering</div>
<div style="-webkit-font-smoothing: antialiased; background-color: white; box-sizing: border-box; color: #444444; font-family: 'Open Sans', 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 20px; line-height: 1.6em; margin-bottom: 10px; padding-left: 60px; text-align: center; text-rendering: optimizeLegibility; transition: all 0.3s ease;">
• Phonetics, phonology and morphology</div>
<div style="-webkit-font-smoothing: antialiased; background-color: white; box-sizing: border-box; color: #444444; font-family: 'Open Sans', 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 20px; line-height: 1.6em; margin-bottom: 10px; padding-left: 60px; text-align: center; text-rendering: optimizeLegibility; transition: all 0.3s ease;">
• Pragmatics and discourse</div>
<h1 style="-webkit-font-smoothing: antialiased; background-color: white; box-sizing: border-box; color: #444444; font-family: 'Roboto Condensed'; font-size: 36px; line-height: 1.6em; margin: 0px 0px 10px; padding: 0px; text-align: center; text-rendering: optimizeLegibility; transition: all 0.3s ease;">
Important Dates</h1>
<table id="tbl1" style="-webkit-font-smoothing: antialiased; background-color: white; border-collapse: collapse; border-spacing: 0px; border: 1px solid black; box-sizing: border-box; color: #444444; font-family: 'Open Sans', 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 14px; line-height: 22.4px; margin: auto; max-width: 100%; text-rendering: optimizeLegibility; transition: all 0.3s ease; width: 70%px;"><tbody style="-webkit-font-smoothing: antialiased; box-sizing: border-box; line-height: 1.6em; text-rendering: optimizeLegibility; transition: all 0.3s ease;">
<tr style="-webkit-font-smoothing: antialiased; box-sizing: border-box; line-height: 1.6em; text-rendering: optimizeLegibility; transition: all 0.3s ease;"><th style="-webkit-font-smoothing: antialiased; border-collapse: collapse; border: 1px solid black; box-sizing: border-box; line-height: 1.6em; text-align: center; text-rendering: optimizeLegibility; transition: all 0.3s ease;" width="40%"><span style="-webkit-font-smoothing: antialiased; box-sizing: border-box; font-size: 18px; line-height: 1.6em; text-rendering: optimizeLegibility; transition: all 0.3s ease;">Date</span></th><th style="-webkit-font-smoothing: antialiased; border-collapse: collapse; border: 1px solid black; box-sizing: border-box; line-height: 1.6em; padding-left: 30px; text-rendering: optimizeLegibility; transition: all 0.3s ease;"><span style="-webkit-font-smoothing: antialiased; box-sizing: border-box; font-size: 18px; line-height: 1.6em; text-rendering: optimizeLegibility; transition: all 0.3s ease;">Description</span></th></tr>
<tr class="rowObsoleted" style="-webkit-font-smoothing: antialiased; box-sizing: border-box; line-height: 1.6em; text-rendering: optimizeLegibility; transition: all 0.3s ease;"><td style="-webkit-font-smoothing: antialiased; border-collapse: collapse; border: 1px solid black; box-sizing: border-box; line-height: 1.6em; text-align: center; text-rendering: optimizeLegibility; transition: all 0.3s ease;"><span style="-webkit-font-smoothing: antialiased; box-sizing: border-box; font-size: 18px; line-height: 1.6em; text-rendering: optimizeLegibility; transition: all 0.3s ease;">21 September 2015</span></td><td style="-webkit-font-smoothing: antialiased; border-collapse: collapse; border: 1px solid black; box-sizing: border-box; line-height: 1.6em; padding-left: 30px; text-rendering: optimizeLegibility; transition: all 0.3s ease;"><span style="-webkit-font-smoothing: antialiased; box-sizing: border-box; font-size: 18px; line-height: 1.6em; text-rendering: optimizeLegibility; transition: all 0.3s ease;">Paper Submission Deadline</span></td></tr>
<tr class="rowObsoleted" style="-webkit-font-smoothing: antialiased; box-sizing: border-box; line-height: 1.6em; text-rendering: optimizeLegibility; transition: all 0.3s ease;"><td style="-webkit-font-smoothing: antialiased; border-collapse: collapse; border: 1px solid black; box-sizing: border-box; line-height: 1.6em; text-align: center; text-rendering: optimizeLegibility; transition: all 0.3s ease;"><span style="-webkit-font-smoothing: antialiased; box-sizing: border-box; font-size: 18px; line-height: 1.6em; text-rendering: optimizeLegibility; transition: all 0.3s ease;">21 October 2015</span></td><td style="-webkit-font-smoothing: antialiased; border-collapse: collapse; border: 1px solid black; box-sizing: border-box; line-height: 1.6em; padding-left: 30px; text-rendering: optimizeLegibility; transition: all 0.3s ease;"><span style="-webkit-font-smoothing: antialiased; box-sizing: border-box; font-size: 18px; line-height: 1.6em; text-rendering: optimizeLegibility; transition: all 0.3s ease;">Acceptance/Rejection Notification</span></td></tr>
<tr class="rowObsoleted" style="-webkit-font-smoothing: antialiased; box-sizing: border-box; line-height: 1.6em; text-rendering: optimizeLegibility; transition: all 0.3s ease;"><td style="-webkit-font-smoothing: antialiased; border-collapse: collapse; border: 1px solid black; box-sizing: border-box; line-height: 1.6em; text-align: center; text-rendering: optimizeLegibility; transition: all 0.3s ease;"><span style="-webkit-font-smoothing: antialiased; box-sizing: border-box; font-size: 18px; line-height: 1.6em; text-rendering: optimizeLegibility; transition: all 0.3s ease;">4 November 2015</span></td><td style="-webkit-font-smoothing: antialiased; border-collapse: collapse; border: 1px solid black; box-sizing: border-box; line-height: 1.6em; padding-left: 30px; text-rendering: optimizeLegibility; transition: all 0.3s ease;"><span style="-webkit-font-smoothing: antialiased; box-sizing: border-box; font-size: 18px; line-height: 1.6em; text-rendering: optimizeLegibility; transition: all 0.3s ease;">Final Submission Deadline</span></td></tr>
<tr class="rowObsoleted" style="-webkit-font-smoothing: antialiased; box-sizing: border-box; line-height: 1.6em; text-rendering: optimizeLegibility; transition: all 0.3s ease;"><td style="-webkit-font-smoothing: antialiased; border-collapse: collapse; border: 1px solid black; box-sizing: border-box; line-height: 1.6em; text-align: center; text-rendering: optimizeLegibility; transition: all 0.3s ease;"><span style="-webkit-font-smoothing: antialiased; box-sizing: border-box; font-size: 18px; line-height: 1.6em; text-rendering: optimizeLegibility; transition: all 0.3s ease;">4 December 2015</span></td><td style="-webkit-font-smoothing: antialiased; border-collapse: collapse; border: 1px solid black; box-sizing: border-box; line-height: 1.6em; padding-left: 30px; text-rendering: optimizeLegibility; transition: all 0.3s ease;"><span style="-webkit-font-smoothing: antialiased; box-sizing: border-box; font-size: 18px; line-height: 1.6em; text-rendering: optimizeLegibility; transition: all 0.3s ease;">Annual Conference</span></td></tr>
</tbody></table>
<div style="-webkit-font-smoothing: antialiased; background-color: white; box-sizing: border-box; color: #444444; font-family: 'Open Sans', 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 20px; line-height: 1.6em; margin-bottom: 10px; text-rendering: optimizeLegibility; transition: all 0.3s ease;">
<br /></div>
Unknownnoreply@blogger.com2tag:blogger.com,1999:blog-4292667510389818467.post-21140569913486213702015-05-26T04:06:00.000-07:002015-05-26T04:06:06.324-07:00KhmerOCR Demo App Released on GitHubFirst of all, as I have already stated on my GitHub, do not expect this release to be a full OCR system. It is only a first demo of my 2013 research using Support Vector Machines, slightly updated in 2014. Thanks for understanding.<br />
<br />
Since I cannot commit time to continue working on this topic, I prefer to publish the demo, and I will soon make the source code public as well.
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj6bPxw1OKVEidlpS5WA29PoYfxUNxxUzsj0usJdtEiNvU2ecXEl900JHRyZeqjrtpmr8qqLSjBvPUwkovnVGxxVdFEUZzO0gLB3tm48o7F0CoVDHo75l_eBWMT6mMeEAsexqLuMSXOzlo/s1600/khmerocrnet-sample-front.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="286" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj6bPxw1OKVEidlpS5WA29PoYfxUNxxUzsj0usJdtEiNvU2ecXEl900JHRyZeqjrtpmr8qqLSjBvPUwkovnVGxxVdFEUZzO0gLB3tm48o7F0CoVDHo75l_eBWMT6mMeEAsexqLuMSXOzlo/s400/khmerocrnet-sample-front.png" width="400" /></a></div>
<br />
People are currently working with TesseractOCR and we are waiting for results. Some results are already available from the OCR team at khmerocr.org, so please try it out and support this team if you can.<br />
<br />
If you are still interested in seeing mine, please download it from GitHub: <a href="https://github.com/metrey/khmerocr.net-app">KhmerOCR.NET-App</a><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEimnjB43GwTwi6iJwOnoqNyFmckm_dy21sWnzCTu-PBb804FwpKjQLKpOBM4CfdVenA8pGTB5406qJjkK31H19HMn3zdSyPr6vfBWrCDhAmCddk4GzaGF3v-ETjYMYP0ZGT5JYFiy9idnc/s1600/khmerocrnet-sample-test.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="278" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEimnjB43GwTwi6iJwOnoqNyFmckm_dy21sWnzCTu-PBb804FwpKjQLKpOBM4CfdVenA8pGTB5406qJjkK31H19HMn3zdSyPr6vfBWrCDhAmCddk4GzaGF3v-ETjYMYP0ZGT5JYFiy9idnc/s400/khmerocrnet-sample-test.png" width="400" /></a></div>
<br />Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4292667510389818467.post-16069553125391999932014-11-07T07:37:00.001-08:002014-11-07T07:37:13.362-08:00More Update From Khmer Type: Khmer OCR AccuracyKhmerType has pushed more updates to <a href="http://khmerocr.org/" rel="nofollow" target="_blank">khmerocr.org</a>, which now works better with the Khmer OS Battambang font at 26pt. That is good. I hope the solution will also work well enough for legacy fonts such as Limon and ABC, since so many documents were written in those fonts in the 90s.<br /><br />
<br /><br />
The target documents we should focus on are laws, story books and the many other educational books in libraries.<br /><br />
<br /><br />
Here is the post on his blog:<br /><br />
<br /><br />
<a href="http://www.khmertype.org/2014/11/khmer-ocr.html?spref=bl">Khmer Type: Khmer OCR ត្រូវបានប៉ុន្មាន % ហើយ? (How many percent does Khmer OCR get right so far?)</a>: Today I took the book "ប្រវត្តិប្រជាជាតិខ្មែរ" (History of the Khmer Nation), which was retyped to make an e-book, and converted it to the font ...<br /><br />
<br /><br />
<br />Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-4292667510389818467.post-52168010545032209172014-10-14T04:57:00.000-07:002014-10-14T20:20:43.982-07:00First online TesseractOCR Engine Based for KhmerOCR<a href="http://www.khmertype.org/2014/10/khmer-ocr.html" target="_blank">KhmerType</a> has just announced his online KhmerOCR, built on the TesseractOCR engine.<br />
<br />
This is the year of the <a href="https://code.google.com/p/tesseract-ocr/" target="_blank">Tesseract OCR</a> engine, since researchers everywhere in Cambodia are focusing on it. TesseractOCR is an open-source OCR engine maintained by Google. Some people have tried to train the engine on Khmer characters since 2009, but the results were not good enough to use.<br />
<br />
Today, <a href="http://www.khmertype.org/2014/10/khmer-ocr.html" target="_blank">Danh Hong</a>, the leader of his OCR project and well known as the designer of the Khmer OS fonts, with Thim Rithy, founder of moonOS (Unix Kernel OS), has announced his results with an online tool, <a href="http://khmerocr.org/" target="_blank">khmerocr.org</a>, which lets people convert scanned documents into Khmer Unicode text.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhdNL3_2WHLi134TMuuiMMaz0zIjSqArR9tYl6ujD-Z-NjyrPJhv9EbwSs_V2BnebF0Zpkepuw_1Cj7OiP0q82ogRflXbSswYGMwGcqB6r_lwnVpJe9h5wifu1M6gc-9zqrPyNYX4v5Aew/s1600/khmertype-ocr-scanned-doc-test-0.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhdNL3_2WHLi134TMuuiMMaz0zIjSqArR9tYl6ujD-Z-NjyrPJhv9EbwSs_V2BnebF0Zpkepuw_1Cj7OiP0q82ogRflXbSswYGMwGcqB6r_lwnVpJe9h5wifu1M6gc-9zqrPyNYX4v5Aew/s1600/khmertype-ocr-scanned-doc-test-0.png" height="342" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Web Base Interface: KhmerOCR.org</td></tr>
</tbody></table>
<br />
He is currently asking people to test it and report errors to help improve the system.<br />
<br />
I ran some tests with the test files I used in my previous research. The results show that a lot of time and work is still needed to improve it.<br />
<br />
All of my test cases below use the font: <b>Khmer OS Content</b><br />
<br />
<h4>
Case 1: Real Scanned Document (not much noise), font size: 32pt</h4>
<div>
The document is set in Khmer OS Content at 32pt and was scanned on an HP Scanjet G3110 at high resolution, so it is clear enough. The result is not yet good enough.</div>
<div>
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj2UnFtUwU26o8mgPi100zuObEiwehZvv3AQNGT3xLmR7FulAH1ayN4894v0vLdv5F2IBRbuBiA5UastzA_tQTCgsd-nfkInJXeAKlxP8p9B44EPbxJZiaSLOrAL-kjq86ABsDpM89-Pe8/s1600/khmertype-ocr-scanned-doc-test-1.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj2UnFtUwU26o8mgPi100zuObEiwehZvv3AQNGT3xLmR7FulAH1ayN4894v0vLdv5F2IBRbuBiA5UastzA_tQTCgsd-nfkInJXeAKlxP8p9B44EPbxJZiaSLOrAL-kjq86ABsDpM89-Pe8/s1600/khmertype-ocr-scanned-doc-test-1.png" height="280" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Case 1 : Scanned Document</td></tr>
</tbody></table>
<b>Updated 15/10: </b>Danh Hong helped train on the expected document, and the result is good. (See the comments.)<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi619gHmPOaynspDeVQ_gQUMWCEKWVRznt3Y2fqmFo9MZdHVsBU04gvbY22fLDfE3p6OVtuAhgOXaW8cu5rQc3NGFFc_6sN2nQk4Oxh_sGb5fWt4E-K5m-QAGwoUgmRYCIWPtXm_3dKbaNM/s1600/Sokpongsametrey.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi619gHmPOaynspDeVQ_gQUMWCEKWVRznt3Y2fqmFo9MZdHVsBU04gvbY22fLDfE3p6OVtuAhgOXaW8cu5rQc3NGFFc_6sN2nQk4Oxh_sGb5fWt4E-K5m-QAGwoUgmRYCIWPtXm_3dKbaNM/s1600/Sokpongsametrey.png" height="312" width="400" /></a></div>
<b><br /></b>
<h4>
Case 2: Printed Text Using MS Paint (no noise), font size: 11pt</h4>
<div>
I tried this more realistic document because its font size is close to what people actually use.</div>
<div>
Even though it has no noise, the result is not yet good enough.</div>
<div>
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg96MKC8YEj73w4-_GjsOOTSrZ9fFEH8vLbBOTOuzyOBV4XeTz571YwWbS5KnzHF0bodJIidsjUcdULfIJe4Fc5eLlySyz7N9firryp-sQm6DeMM2Lwdyw0gDMr8iOwFFzLXQ4DVk9Z2PY/s1600/khmertype-ocr-scanned-doc-test-2.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg96MKC8YEj73w4-_GjsOOTSrZ9fFEH8vLbBOTOuzyOBV4XeTz571YwWbS5KnzHF0bodJIidsjUcdULfIJe4Fc5eLlySyz7N9firryp-sQm6DeMM2Lwdyw0gDMr8iOwFFzLXQ4DVk9Z2PY/s1600/khmertype-ocr-scanned-doc-test-2.png" height="400" width="371" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Case 2: Printed Text with Small font size</td></tr>
</tbody></table>
<div>
<br /></div>
<h4>
Case 3: Printed Text Using MS Paint (no noise), font size: 48pt</h4>
<div>
At this font size, my own method reaches around 98% accuracy on printed text. The result here is also good.</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjHrpYk7y1HvcAwsN9-25OHdAeIs_V1RrMMcBz6f2YQ4gycaiJ7eDgUk_N9KFZQTajoKcQetYraKIgFhYzTg60EcnFP4g1Ur52UgZInieumm03zrDM91RshdeuGRGxJnXIi_DZVGaV7gkE/s1600/khmertype-ocr-scanned-doc-test-3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjHrpYk7y1HvcAwsN9-25OHdAeIs_V1RrMMcBz6f2YQ4gycaiJ7eDgUk_N9KFZQTajoKcQetYraKIgFhYzTg60EcnFP4g1Ur52UgZInieumm03zrDM91RshdeuGRGxJnXIi_DZVGaV7gkE/s1600/khmertype-ocr-scanned-doc-test-3.png" height="400" width="320" /></a></div>
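<div>
The accuracy figures in these test cases are easier to compare when they are computed the same way. The post does not say how the 98% figure was measured, so the following is only a minimal sketch of one common option: character accuracy derived from the edit distance between the expected text and the OCR output. The function names are hypothetical, and comparing Khmer text code point by code point (rather than by character cluster) is an assumption.</div>

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance between two strings, counted in Unicode code points."""
    # prev[j] holds the distance between the processed prefix of `reference`
    # and the first j code points of `hypothesis`.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_ch in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_ch in enumerate(hypothesis, start=1):
            cost = 0 if ref_ch == hyp_ch else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_accuracy(reference: str, hypothesis: str) -> float:
    """Character accuracy in [0, 1]: 1 - edit_distance / len(reference)."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    errors = edit_distance(reference, hypothesis)
    return max(0.0, 1.0 - errors / len(reference))


if __name__ == "__main__":
    expected = "abcd"   # ground-truth text typed by hand
    ocr_out = "abxd"    # text returned by the OCR engine
    print(f"accuracy: {char_accuracy(expected, ocr_out):.2%}")
```

<div>
With a measure like this, each case above could be reported as a single number per font size, which would make results from different people easier to compare.</div>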
<div>
<br /></div>
<div>
<br /></div>
<div>
These tests cover only one font face, and the results are offered as feedback for developers to improve on.</div>
<div>
I believe TesseractOCR will do better once more training data is produced, but it will not reach the 100% accuracy that people expect. We need more people involved to bring this to at least 95% accuracy so that people can use it. That's why the <a href="http://blog.khmerocr.com/2014/10/welcome-to-khmer-ocr-conference-28th.html" target="_blank">OCR conference</a> was formed.</div>
<div>
<br /></div>
<div>
Thanks to Danh Hong and his team for this public initiative.</div>
<div>
We are waiting for other people's results as well.</div>
<h3>
Welcome to the Khmer OCR Conference, 28th October</h3>
Posted 2014-10-13<br />
The Khmer OCR conference has now been announced to the public. It will be hosted at the conference hall of the Ministry of Posts and Telecommunication.<br />
<br />
OCR (Optical Character Recognition) software has long been used around the world to convert printed text in images, PDFs, or scanned paper into computer text.<br />
<br />
Khmer OCR has been in the research phase for a long time, with <a href="http://blog.khmerocr.com/2014/04/state-of-art-of-khmerocr-implementation.html" target="_blank">some researchers</a> already working on it, but in most cases each researcher is trying to solve a different problem in OCR technology, such as segmentation (line or character separation), recognition, or classification.<br />
<br />
The conference will now focus on producing software that works for public use, and it invites all related researchers to discuss different solutions and a TODO list for the next steps.<br />
<br />
The event is by invitation only; please contact the host at <a href="mailto:research@niptict.edu.kh" target="_blank">research@niptict.edu.kh</a> if you would like to participate.<br />
<br />
The event comes after several discussions and meetings <a href="http://blog.khmerocr.com/2014/09/the-1st-khmer-ocr-conference-change-to.html" target="_blank">with the team so far</a>.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhty_vMNs6HzlWSif3LSkjR0QflDNdGpE2xQmX4Zv0OpJlHkWXlscuoayO8ixeCjAI1HJ_70tFML9UqL4ijb9i6MNIYVTQx3pTDas8Msce7WdY2L2XewKcMI_GmIIziW6TWR0uWsc4tM1E/s1600/KhmerOCR-Conference-Consortium-28thOctober2014.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhty_vMNs6HzlWSif3LSkjR0QflDNdGpE2xQmX4Zv0OpJlHkWXlscuoayO8ixeCjAI1HJ_70tFML9UqL4ijb9i6MNIYVTQx3pTDas8Msce7WdY2L2XewKcMI_GmIIziW6TWR0uWsc4tM1E/s1600/KhmerOCR-Conference-Consortium-28thOctober2014.jpg" height="400" width="282" /></a></div>
<br />
<h3>
Want to Write a Scientific Paper or Article? Start From Here</h3>
Posted 2014-09-30<br />
If you want to start writing research of this kind, you may first need to learn about the <i>Structure, Format, Content, and Style of a Journal-Style Scientific Paper</i> to understand what you will be writing.<br />
<br />
The following table is a short snapshot to help you out; for details, please read <a href="http://abacus.bates.edu/~ganderso/biology/resources/writing/HTWsections.html#discussion" target="_blank">this article</a>.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjndJKtN24EEzWj0fJFYpI10q4kYXKOzpuf1f_yzuSNgj5GQzDOj-kPZtK7zI_BeKV01qEtg7NS7w5J9a0RttkdHh1OyXAQBnDiD2jFXHy498Net2_L7wgxzMW-Fi4TuWejS4MxB6cxFh4/s1600/scient-research-article.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjndJKtN24EEzWj0fJFYpI10q4kYXKOzpuf1f_yzuSNgj5GQzDOj-kPZtK7zI_BeKV01qEtg7NS7w5J9a0RttkdHh1OyXAQBnDiD2jFXHy498Net2_L7wgxzMW-Fi4TuWejS4MxB6cxFh4/s1600/scient-research-article.png" height="272" width="400" /></a></div>
<h3>
Stay Tuned, the OCR Conference Is Delayed Again</h3>
Posted 2014-09-28<br />
According to an update received today, the venue has informed us that the conference, which was <a href="http://blog.khmerocr.com/2014/09/the-1st-khmer-ocr-conference-change-to.html" target="_blank">first delayed to 1st of October</a>, has been postponed again to the end of October, because of the invitation process and the wish to have all potential researchers on board.<br />
<br />
The invitation is promised to arrive by next week; wait and see.<br />
<h3>
The 1st Khmer OCR Conference Changes to 1st October</h3>
Posted 2014-09-08<br />
Following up on the information in my recent <a href="http://blog.khmerocr.com/2014/08/ocr-2nd-meetup-well-done-welcome-to-ocr.html" target="_blank">post</a>, the OCR team has now delayed the conference to <b>1st of October</b> because there was too little time for administrative matters at the venue.<br />
<br />
During the conference, the people mentioned in the article "<a href="http://blog.khmerocr.com/2014/04/state-of-art-of-khmerocr-implementation.html" target="_blank">The State of the Art of Khmer OCR</a>" will present their methods, followed by a discussion to plan the future with "TesseractOCR".<br />
<br />
I hope the team can make a difference for our society, and of course everyone's input will help.<br />
<br />
I'll post an update if anything changes before the conference.<br />
<br />
<div style="text-align: center;">
Here are pictures of the recent meeting (2nd Meetup)</div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhIfhk_asoadLSUcSkXCx1ExSfrE8OpjNIhrYQibyxRCFXsIPOslYVzMz0XYHFjqfs5XJ3lg11wwGUQAn7kSaaMPIqS3jBDZdFg7stChqynXl3uoL-BuoXW5zairW-X5ab9e6BN1pMB03Q/s1600/IMG_20140821_161707.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhIfhk_asoadLSUcSkXCx1ExSfrE8OpjNIhrYQibyxRCFXsIPOslYVzMz0XYHFjqfs5XJ3lg11wwGUQAn7kSaaMPIqS3jBDZdFg7stChqynXl3uoL-BuoXW5zairW-X5ab9e6BN1pMB03Q/s1600/IMG_20140821_161707.jpg" height="300" width="400" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0PH4Fxvpx1hpJM35vKmhRrTqrABXmtp3DrEko2QWf5uBJxReL0SmteQoyxWlBdckUEitnYhejg47SCdFnL1XWQlJy7hPSQZOSpmV2-ZaGQ_PDJ4dh_a0oKsBhsjWwE-NblwRjJ2l6Ubg/s1600/IMG_20140821_161701.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0PH4Fxvpx1hpJM35vKmhRrTqrABXmtp3DrEko2QWf5uBJxReL0SmteQoyxWlBdckUEitnYhejg47SCdFnL1XWQlJy7hPSQZOSpmV2-ZaGQ_PDJ4dh_a0oKsBhsjWwE-NblwRjJ2l6Ubg/s1600/IMG_20140821_161701.jpg" height="300" width="400" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEipP_30EDZX2dXDnYQa0GSyFqcZENo7qOqlvPEEmxP8pS6yOIxg2Smudt-xhac-fQ_rNN9QcyvaPbsnRaWwpY3hVNLhcxKY620vO6t9vKB32v_B8eRgo-v8hdoedgahrlJbY53pHrNibcg/s1600/IMG_20140821_173458.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEipP_30EDZX2dXDnYQa0GSyFqcZENo7qOqlvPEEmxP8pS6yOIxg2Smudt-xhac-fQ_rNN9QcyvaPbsnRaWwpY3hVNLhcxKY620vO6t9vKB32v_B8eRgo-v8hdoedgahrlJbY53pHrNibcg/s1600/IMG_20140821_173458.jpg" height="300" width="400" /></a></div>
<br />
<div style="text-align: center;">
1st Meetup</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhasHZXFZtjA8y9rhMmUWn1dfGPG4HFDiMDYgaLioaqm2xjAGVCaIotyPOMifDAhw5351haIzjU_PoPlhTf_zqDfATm31BSIq81xiTfaLSCj7Y-k0JTPbDY0xiNiIW3Cd8WO_vFHT3-aKA/s1600/IMG_20140731_171243.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhasHZXFZtjA8y9rhMmUWn1dfGPG4HFDiMDYgaLioaqm2xjAGVCaIotyPOMifDAhw5351haIzjU_PoPlhTf_zqDfATm31BSIq81xiTfaLSCj7Y-k0JTPbDY0xiNiIW3Cd8WO_vFHT3-aKA/s1600/IMG_20140731_171243.jpg" height="300" width="400" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgeVvoZ9QnWa6gWcUaYU489YTxyFT8RmRgdPx6Wn44Ja1vFcyEdMrciIUxOkBXNy2DCSNIuovCrcXdTm5V0ZqU9kzTjZa09yK6GTKE-Jm9qeSOI9rxvOeS10QHdrWBzOfT5lYdtpYLcW4M/s1600/IMG_20140731_171233.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgeVvoZ9QnWa6gWcUaYU489YTxyFT8RmRgdPx6Wn44Ja1vFcyEdMrciIUxOkBXNy2DCSNIuovCrcXdTm5V0ZqU9kzTjZa09yK6GTKE-Jm9qeSOI9rxvOeS10QHdrWBzOfT5lYdtpYLcW4M/s1600/IMG_20140731_171233.jpg" height="300" width="400" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJIntwltpmDH4Y5Peqipv2Wx7bNWD6tEDjySuKM1Y87gYDRyJLxm5FqFuBioihj58UpVxnfrjmMOMmjZGS7UMdKiOxd6UcrexqSAChNwJba9_y8jx4ioRpf2CcA9338AVxQ7S_UXcg0jI/s1600/IMG_20140731_171257.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJIntwltpmDH4Y5Peqipv2Wx7bNWD6tEDjySuKM1Y87gYDRyJLxm5FqFuBioihj58UpVxnfrjmMOMmjZGS7UMdKiOxd6UcrexqSAChNwJba9_y8jx4ioRpf2CcA9338AVxQ7S_UXcg0jI/s1600/IMG_20140731_171257.jpg" height="320" width="240" /></a></div>
<div style="text-align: center;">
<br /></div>
<h3>
OCR 2nd Meetup, Well Done - Welcome to OCR Conference in September</h3>
Posted 2014-08-21, updated 2014-09-08<br />
I am just back from the OCR team meeting, and it was great.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiR2aZv7YZAnNCJQugpC2P4gBeXOs1vs_uRxy5szeBI0xjqz-7KIGztpZdfZQy9B3yn1gupR7785qbinMz_vLpC_AXZQVjt_BRCSIv0K2bBlSq36SocR-qA1BcLFjQYsSzFc_RHBHASAxM/s1600/Khmer-OCR-New_Team-Meetup-21082014.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiR2aZv7YZAnNCJQugpC2P4gBeXOs1vs_uRxy5szeBI0xjqz-7KIGztpZdfZQy9B3yn1gupR7785qbinMz_vLpC_AXZQVjt_BRCSIv0K2bBlSq36SocR-qA1BcLFjQYsSzFc_RHBHASAxM/s1600/Khmer-OCR-New_Team-Meetup-21082014.jpg" height="171" width="400" /></a></div>
<br />
The participants came from various stakeholders:<br />
<br />
<ul>
<li>Universities: ITC, RUPP, NIPTICT</li>
<li>NGO: OI</li>
<li>Private sector/individuals: myself</li>
</ul>
<br />
<br />
Now it's time to be more open. I'll post more details later, but first, if you are interested in presenting Khmer OCR research or a product, or would simply like to attend the researchers' presentations, please save the date: Tuesday, 16th of September 2014. The venue will be announced soon.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiFbLuWverSAKWJ_mOI0gOEYSGJ39Vh2wvy9BQqQCK8cHYP0P44Rsgn-pzqF25jJo5NCvLOUAXM14JfE5c8_a-tUxep8jOsWjnl6cWfZYh_X_lTocse7VUEJeDG9uzT0sxZrmaYibK70pE/s1600/Khmer-OCR-Conference-16-09-2014.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiFbLuWverSAKWJ_mOI0gOEYSGJ39Vh2wvy9BQqQCK8cHYP0P44Rsgn-pzqF25jJo5NCvLOUAXM14JfE5c8_a-tUxep8jOsWjnl6cWfZYh_X_lTocse7VUEJeDG9uzT0sxZrmaYibK70pE/s1600/Khmer-OCR-Conference-16-09-2014.jpg" height="200" width="400" /></a></div>
<br />
<br />
The team will send invitations to individuals and groups whose work on OCR it is aware of. An official announcement will follow soon.<br />
<br />
The team is called: Khmer Natural Language Processing Consortium (Khmer NLP Consortium).<br />
<br />
<blockquote class="tr_bq">
It's all about open source, open data, and open ideas and methods to make a difference.</blockquote>
<br />
Are you interested in joining the conference? If you have anything that could make a difference for our community, please come and join us.<br />
<br />
<h3>
Updated 08/09</h3>
<div>
<ul>
<li>The conference has been moved to 1st of October; <a href="http://blog.khmerocr.com/2014/09/the-1st-khmer-ocr-conference-change-to.html" target="_blank">see this post</a></li>
</ul>
</div>
<h3>
We are in need of the product, OCR Project for Khmer Language</h3>
Posted 2014-08-02<br />
As I stated in the previous article, "<a href="http://blog.khmerocr.com/2014/04/state-of-art-of-khmerocr-implementation.html" target="_blank">The State of The Art</a>", Khmer OCR has always been at the research stage, and there is not yet a ready product.<br />
<br />
More recently, some groups have taken up this challenge and are willing to deliver a product by forming a dedicated team for it. Alongside them, some individual teams are working toward the same goal.<br />
<br />
A joint team of several universities and individual researchers held a meeting recently, on 31st July.<br />
It is too early to give details of how it will work, but it's great to see that more people are happy and willing to contribute to the project for Cambodia.<br />
<br />
And a surprise: I just saw another project being presented and asking for funding, <b>OCR Khmer</b>. It seems to be an online tool; let's watch their promotional video:<br />
<iframe allowfullscreen="" frameborder="0" height="315" src="//www.youtube.com/embed/i9cmZwi6X4c" width="560"></iframe>
<br />
According to the video, the online OCR project appears to run on images of printed text at a font size of 36pt.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://3b4efb995be6c5c64252-c03f075f8191fb4e60e74b907071aee8.r12.cf1.rackcdn.com/1811602_1406375093.0808.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3b4efb995be6c5c64252-c03f075f8191fb4e60e74b907071aee8.r12.cf1.rackcdn.com/1811602_1406375093.0808.jpg" height="265" width="400" /></a></div>
<br />
<br />
The project is asking for $4,000 in funding on <a href="http://www.gofundme.com/ocr-khmer" target="_blank">gofundme.com</a>.<br />
It's great to see a product emerging; let's help him. You can click <a href="http://www.gofundme.com/ocr-khmer" target="_blank">here</a> for more info.<br />
<br />
<br />