Gimp-Forum.net

Full Version: Text to OCR - offline
Hi,

Could anyone recommend free software for Windows that converts text (txt, pdf, etc.) to OCR offline?

So far I have only found options for online or mobile use.

Are any compatible with Gimp?

Thank you.
What kind of OCR? There are OCR fonts, there is a Gnu barcode generator...
(12-13-2019, 10:55 AM)Ofnuts Wrote: [ -> ]What kind of OCR? There are OCR fonts, there is a Gnu barcode generator...
I just want to extract the text from images (scanned books as jpg, tiff, etc.) and translate it (paste it into a translator).

On mobile I have this feature, but for desktop Windows I only find online options, many of them unreliable.
The FOSS software most used for this in the Linux world seems to be "Tesseract". A version for Windows can be found here.
The problem with basic Tesseract is that it is command-line only. That is obviously the best way if you are OCR-ing a whole book. One problem is loss of formatting: you tend to get long lines of text with no breaks, no headings, etc.
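For the whole-book case, the command-line route can be scripted. Below is a minimal Python sketch, assuming the `tesseract` executable is installed and on the PATH, and that the scanned pages sit as image files in one folder. The folder names, the helper names `build_command` and `ocr_folder`, and the default language `eng` are assumptions made for illustration, not something from this thread.

```python
import subprocess
from pathlib import Path

# File types treated as page scans (an assumption; adjust as needed).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def build_command(image, out_dir, lang="eng"):
    """Return the tesseract command line for one page image."""
    image = Path(image)
    # Tesseract adds the ".txt" extension to the output base name itself.
    out_base = Path(out_dir) / image.stem
    return ["tesseract", str(image), str(out_base), "-l", lang]


def ocr_folder(scan_dir, out_dir, lang="eng"):
    """Run tesseract on every image in scan_dir, one text file per page."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    pages = sorted(
        p for p in Path(scan_dir).iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
    )
    for page in pages:
        # check=True stops the batch if tesseract reports an error.
        subprocess.run(build_command(page, out_dir, lang), check=True)
    return [out_dir / (page.stem + ".txt") for page in pages]
```

Calling `ocr_folder("scans", "ocr_text")` would then leave one `.txt` file per page in `ocr_text`, ready to paste into a translator. The formatting loss mentioned above still applies to each file.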

I use it in Linux for small 'screen captured' text images using a GUI (I prefer YAGF, but it is not working in 'buntu 18.04, so I use gImageReader).

A screen capture always needs some pre-processing in Gimp: scaling up to 200% - 300%, cleaning the background, etc.
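To show what those two pre-processing steps do to the pixels, here is a small pure-Python sketch on a tiny grayscale image held as a list of rows. It is only an illustration under simplifying assumptions: it uses nearest-neighbour scaling and a fixed threshold of 128, whereas Gimp offers better interpolation and adjustable tools, and the function names are made up for this example.

```python
def upscale(pixels, factor):
    """Nearest-neighbour upscale of a grayscale image (list of rows)."""
    out = []
    for row in pixels:
        # Repeat each pixel horizontally, then repeat the row vertically.
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def clean_background(pixels, threshold=128):
    """Push every pixel to pure black (0) or pure white (255)."""
    return [[0 if value < threshold else 255 for value in row]
            for row in pixels]


# A 2x2 "capture": off-white background with one dark text pixel.
capture = [
    [230, 40],
    [225, 210],
]

# Scale to 300%, then flatten the uneven background to white.
prepared = clean_background(upscale(capture, 3))
```

After this, `prepared` is a 6x6 image in which the uneven light background has become uniform white and the dark pixel has become a solid black 3x3 block, which is the kind of input OCR engines tend to handle better.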

There is a Tesseract for Windows with GUI here: https://ocr.space/blog/p/free-ocr-windows.html

Here is a quick try-out in a Win10 VM: https://i.imgur.com/H7fvKCu.jpg and that is typical, with some post-OCR corrections needed. Still better than typing out the whole thing ;)
(12-14-2019, 12:33 AM)Ofnuts Wrote: [ -> ]A version for Windows can be found here.

(12-14-2019, 10:26 AM)rich2005 Wrote: [ -> ]There is a Tesseract for Windows with GUI here:  https://ocr.space/blog/p/free-ocr-windows.html
Ofnuts and Rich2005,

I have already downloaded and installed both (Tesseract and a9t9).
I will try them out later when I have time.
Thanks a lot for the help!