Gimp-Forum.net

Full Version: Blackest Pixel
Hi

I'm writing a Python script to clean up scanned text documents (I'll have to clean up thousands of pages).
Is there a way to determine the darkest pixel in the image?
I want to use "pdb.gimp_image_select_color" with the darkest pixel, which is not always pure black (0, 0, 0).
No direct way as far as I can tell. You can make several calls to the Gimp histogram, lowering the upper limit of the range until you no longer get any pixels in the range.

The call is:
Code:
mean, std_dev, median, pixels, count, percentile = pdb.gimp_drawable_histogram(drawable, channel, start_range, end_range)

The values you are interested in are count and/or percentile (as far as I can tell, count=pixels*percentile).

You use:

Code:
_,_,_,_, count,_ = pdb.gimp_drawable_histogram(drawable, HISTOGRAM_VALUE, 0.,max)

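and you try max values (with a dichotomic search and the 0.001 tolerance used below, ten calls are enough), something like:

Code:
from gimpfu import *  # provides pdb and HISTOGRAM_VALUE inside Gimp

def blackest(drawable):
    # Bisect on the upper limit of the range [0, threshold]
    bot,top=0.,1.
    while top-bot>.001:
        print "%5.3f < x < %5.3f" % (bot,top)
        threshold=(top+bot)/2.
        _,_,_,_,count,_ = pdb.gimp_drawable_histogram(drawable, HISTOGRAM_VALUE, 0.,threshold)
        print "%5.3f px @ %5.3f" % (count,threshold)
        if count:
            top=threshold   # some pixels are this dark or darker
        else:
            bot=threshold   # nothing this dark, look higher
    # top is the lowest limit known to contain pixels; the last
    # threshold tried may have been an empty range
    return top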
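
To try the search logic outside Gimp, here is a minimal sketch where a plain list of values between 0 and 1 stands in for the drawable. `count_in_range` is a hypothetical stand-in for the histogram call, not a Gimp function, and the 1/256 tolerance is an assumption that suits 8-bit data (it stops after eight halvings).

```python
def count_in_range(values, low, high):
    """Hypothetical stand-in for the histogram call: how many values lie in [low, high]."""
    return sum(1 for v in values if low <= v <= high)


def blackest_value(values, tolerance=1.0 / 256):
    """Bisect for the lowest upper limit whose range [0, limit] still contains a value."""
    bot, top = 0.0, 1.0
    while top - bot > tolerance:
        threshold = (top + bot) / 2.0
        if count_in_range(values, 0.0, threshold):
            top = threshold   # something is this dark or darker
        else:
            bot = threshold   # nothing this dark, look higher
    return top                # top always bounds a non-empty range


# Made-up sample values, assumed to be a non-empty list within [0, 1].
pixels = [0.12, 0.35, 0.9, 0.97, 1.0]
darkest = blackest_value(pixels)
```

The result is an upper bound on the darkest value, at most one tolerance step above it, so a range from 0 to that limit always contains at least one pixel.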


However, doing a color selection on the result may select only a single pixel, which may not be what you want. What is the whole process?
Hi Ofnuts, thanks for taking the time to respond, and for moving this to the right board.

We are scanning very old documents that were still typed on a typewriter. We scan them at 600dpi in grayscale. The mission now is to clean up the scans and reduce the file size as much as possible (300dpi black and white) so that we can share these documents. Some documents are in very bad shape, with a lot of noise etc.

Pointing me to the histogram solved my problem. Based on the percentile, I decided to find the range that is used the most in the lower part of the histogram, and set all pixels up to that level to black.


Code:
def SetBlack(image, drawable):
    # Find the "color" that is used the most in the "darker" side of the histogram - this is the black of the text
    MaxPercentile = 0.0
    MaxEndRange = 0.0
    Increment = 0.025  # bin width: 2.5% of the value range

    # Scan 21 consecutive bins, covering 0.0 up to 0.525
    for x in range(0, 21):
        start_range = x * Increment
        end_range = start_range + Increment
        mean, std_dev, median, pixels, count, percentile = pdb.gimp_drawable_histogram(drawable, HISTOGRAM_VALUE, start_range, end_range)
        if percentile > MaxPercentile:
            MaxPercentile = percentile
            MaxEndRange = end_range

    MaxEndRange = MaxEndRange * 1.2  # Add 20% margin above the peak
    # Curve through (0, 0), (MaxEndRange, 0) and (0.9, 1): the range up to MaxEndRange maps to black, 0.9 maps to white
    pdb.gimp_drawable_curves_spline(drawable, HISTOGRAM_VALUE, 6, (0.0, 0.0, MaxEndRange, 0.0, 0.9, 1.0))
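
The same idea can be checked without Gimp on a plain list of values between 0 and 1. This is only a rough sketch: `dark_peak_end` and `stretch` are made-up helper names, the bins are half-open rather than the inclusive ranges the histogram call takes, and `stretch` is a straight-line approximation of the curve (Gimp fits a smooth spline through the points).

```python
def dark_peak_end(values, increment=0.025, bins=21):
    """Upper edge of the most populated bin among the darkest `bins` bins."""
    counts = [0] * bins
    for v in values:
        index = int(v / increment)
        if 0 <= index < bins:
            counts[index] += 1
    peak = counts.index(max(counts))   # first bin holding the highest count
    return (peak + 1) * increment


def stretch(value, black_point, white_point=0.9):
    """Straight-line approximation of the curve: black below black_point, white above white_point."""
    if value <= black_point:
        return 0.0
    if value >= white_point:
        return 1.0
    return (value - black_point) / (white_point - black_point)


# Made-up sample: typewriter ink around 0.2, paper around 0.95.
sample = [0.19, 0.21, 0.21, 0.22, 0.41, 0.93, 0.95, 0.96, 0.97]
black_point = dark_peak_end(sample) * 1.2   # same 20% margin as in SetBlack
cleaned = [stretch(v, black_point) for v in sample]
```

With this sample the ink values all end up at 0.0 and the paper values at 1.0, while the lone mid-gray value stays in between.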


Apologies for my coding style, I'm new to gimp and python.
Looks like you are re-inventing the automatic contrast stretch. Either of these does it:

Code:
pdb.gimp_drawable_levels_stretch(drawable)
pdb.plug_in_autostretch_hsv(image, drawable)

Also, if you batch-process, using ImageMagick instead of Gimp is likely to be a better idea.
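
A rough sketch of what that could look like when driven from Python, assuming ImageMagick is installed and reachable as `convert` (version 7 uses `magick` instead). The function names and all the percentages are placeholders to tune on real scans, not tested settings.

```python
import subprocess


def build_cleanup_command(src, dst, black="25%", white="90%"):
    """Build an ImageMagick command line; every value here is an illustrative guess."""
    return [
        "convert", src,
        "-deskew", "40%",                      # straighten slightly rotated scans
        "-level", "%s,%s" % (black, white),    # stretch contrast between the two points
        "-resize", "50%",                      # 600dpi scan down to 300dpi
        "-threshold", "50%",                   # pure black and white
        dst,
    ]


def cleanup(src, dst):
    """Run the command on one file; raises if ImageMagick reports an error."""
    subprocess.check_call(build_cleanup_command(src, dst))
```

Calling `cleanup("page-0001.png", "clean-0001.png")` in a loop over the scanned files would then process the whole batch without opening Gimp.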
Thanks, will look into these. I don't want to re-invent something.
Yes, I thought that ImageMagick might be the way to go, when I was looking for an auto de-skew routine, and came across it.
Now for a few more sleepless nights, playing with a new program.