Erase / remove borders from a table (scanned pdf)
#9
(10-02-2020, 08:38 AM)rich2005 Wrote: The ofn-remove-grid.py plugin works really well. It answers the topic question on removing that table. Here it is with a more typical scan, greyscale at 300 ppi:



1. Scan straight into GIMP. Not great: uneven colour, slightly skewed. A typical scan.
2. Apply Levels to even out the colour.
3. Still a bit faint, so apply Threshold to get more contrast.
4. Apply the plugin (grow selection = 1) and the dividing lines are gone. A few speckles remain.
5. Export that to a PNG and run it through Tesseract: the text is recognised,
but
6. Run the original scan through Tesseract and the result is better. Tesseract ignores the lines.
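
For anyone who wants to see what steps 2 and 3 do to the pixel values, here is a rough sketch in plain Python. It is not GIMP's own Levels or Threshold code; the function names, the tiny sample "scan" and the cutoff values are all made up for illustration, and it assumes an 8-bit greyscale image held as a list of rows.

```python
def stretch_levels(pixels, low, high):
    """Linearly map the range [low, high] onto [0, 255], clipping the rest.

    Hypothetical helper, roughly what moving the black and white input
    sliders in a Levels dialog does (no gamma adjustment here).
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    scale = 255.0 / (high - low)
    return [
        [min(255, max(0, round((p - low) * scale))) for p in row]
        for row in pixels
    ]


def threshold(pixels, cutoff):
    """Turn every pixel pure white (>= cutoff) or pure black (< cutoff)."""
    return [[255 if p >= cutoff else 0 for p in row] for row in pixels]


# Made-up 4x3 greyscale patch: greyish paper (180-200) and faint ink (60-75).
scan = [
    [200, 190, 60, 195],
    [185, 70, 65, 180],
    [190, 200, 75, 185],
]

levelled = stretch_levels(scan, low=60, high=200)  # step 2: even out the range
binary = threshold(levelled, cutoff=128)           # step 3: force black/white

for row in binary:
    print(row)
```

After the stretch the paper sits near 255 and the ink near 0, so the threshold cutoff is much less critical than it would be on the raw scan.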

I am gradually adding to my Kubuntu 20.04 desktop. I think I will give Tesseract 5 a go and see if YAGF works with that version. There are PPAs for Tesseract.

IMHO you have been lucky, because ofn-remove-grid looks at areas, and the area of the single lines must not be very big (at least not much bigger than that of other features). However, looking at your image, the sheer width of the feature could be a good criterion. I will consider this for the next version.
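
To illustrate the difference between the two criteria, here is a small stand-alone sketch in plain Python. It is not the ofn-remove-grid code and not necessarily how a future version would work; the helper names, the 4-connected flood fill, the 0.8 fraction and the toy mask are all assumptions made for the example.

```python
from collections import deque


def components(mask):
    """Return area and bounding-box size of each 4-connected dark blob."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    found = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited dark pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            area = 0
            min_r = max_r = r
            min_c = max_c = c
            while queue:
                y, x = queue.popleft()
                area += 1
                min_r, max_r = min(min_r, y), max(max_r, y)
                min_c, max_c = min(min_c, x), max(max_c, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            found.append({
                "area": area,
                "width": max_c - min_c + 1,
                "height": max_r - min_r + 1,
            })
    return found


def looks_like_rule(comp, image_width, min_fraction=0.8):
    """Hypothetical width test: the blob spans most of the image width."""
    return comp["width"] >= min_fraction * image_width


# Toy mask: a one-pixel rule across the top and a chunkier blob below it.
art = [
    "##########",
    "..........",
    ".####.....",
    ".####.....",
    ".####.....",
]
mask = [[ch == "#" for ch in row] for row in art]

found = components(mask)
rules = [comp for comp in found if looks_like_rule(comp, len(mask[0]))]
print(found)
print(rules)
```

In this toy case the rule has a smaller area than the blob, so an area test alone would not single it out, while its width relative to the image does.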


Messages In This Thread
RE: Erase / remove borders from a table (scanned pdf) - by Ofnuts - 10-02-2020, 10:28 AM