Comments on KhmerOCR: OCR 2nd Meetup, Well done - Welcome to OCR Conference in September

Bong MT, 2014-09-08 23:50 -07:00
Thanks.

Unknown, 2014-09-08 04:56 -07:00
Hi,

I have been living in PP for a few years, though I don't speak Khmer. I develop in C#, JavaScript or PHP, while Tesseract is in C++. I will think about it if I have free time, though as I said I don't speak Khmer, which will likely be the language spoken. However, I found a solution to a problem that one of the previous teams, in one of the earlier posts on this website, was unable to solve: the inability of Tesseract to process skewed lines of text in images.

This library, http://felix.abecassis.me/2011/09/opencv-detect-skew-angle/ , will compute the number of degrees the text is tilted in each image, and then it's just a matter of rotating the image by that number of degrees, which is the easy part.

Bong MT, 2014-09-08 03:04 -07:00
You are right. This team is focusing on Tesseract, and we will work on improving the steps before and after the Tesseract result. We will find out what to improve in Tesseract in order to have a good OCR for Khmer.

If you are interested in sharing your ideas, please come to the conference, but note that the date has changed to the 1st of October.

Unknown, 2014-09-08 01:37 -07:00
I believe the best approach would be to use Tesseract instead of building something completely new. The software is open source and was originally built by a strong team. From what I can see, it just needs a dictionary and the Unicode characters for the alphabet, which look like they are available, and then it is just a matter of "training it", so to speak.

A better way to utilize the team would be to create a Khmer GUI for the engine, or to look into ways to improve its ability to read text lines from an image that may be tilted a few degrees or have a discolored background.

I think the end result should be an online tool accepting multi-file png/jpg/tiff submissions and returning a text file, at least for starters, to get it out there and then move forward after seeing how it's being used, etc. Just my penny thrown in, but I am sure whoever is running this has already considered this approach.
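
The skew-correction idea raised in the comments (measure how many degrees the text is tilted, then rotate by that amount) can be sketched in a few lines. This is only an illustration in plain Python, not the OpenCV code from the linked post. It assumes a single line of text whose dark pixels are already available as (x, y) coordinates, and it swaps in a simple least-squares line fit to estimate the angle. The names estimate_skew_degrees and rotate_point are made up for this example.

```python
import math


def estimate_skew_degrees(pixels):
    """Estimate the tilt of one text line from its foreground pixels.

    pixels: iterable of (x, y) coordinates of dark pixels (assumed input).
    Returns the angle, in degrees, of the least-squares line through them.
    """
    pts = list(pixels)
    if len(pts) < 2:
        raise ValueError("need at least two foreground pixels")
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    # slope = sxy / sxx; atan2 avoids a division by zero when sxx == 0
    return math.degrees(math.atan2(sxy, sxx))


def rotate_point(x, y, degrees, cx=0.0, cy=0.0):
    """Rotate (x, y) about (cx, cy) by the given angle in degrees."""
    t = math.radians(degrees)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(t) - dy * math.sin(t),
            cy + dx * math.sin(t) + dy * math.cos(t))


if __name__ == "__main__":
    # Synthetic "text line" tilted by 5 degrees
    tilted = [(x, x * math.tan(math.radians(5))) for x in range(100)]
    angle = estimate_skew_degrees(tilted)
    # Undo the tilt by rotating every pixel by the negative angle
    straight = [rotate_point(x, y, -angle) for x, y in tilted]
    print(round(angle, 3), round(estimate_skew_degrees(straight), 3))
```

On a real scanned page there are many text lines and noise, so a line fit over all pixels would not be enough. That is where an approach like the one in the linked post, applied to the whole image before handing it to Tesseract, would come in.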