
Author How to downsample and convert from grayscale-> black&white?
Ramon F Herrera

2004-06-05, 4:14 am

I have a bunch of TIFF images that were scanned in grayscale
mode at 600 dpi. Each one takes ~32 MBytes of disk space, and
the images are typical office documents (mostly text with
a few logos), which are being processed by OCR.

My main concern is: what is the best way to get as much text
recognized as possible? I chose 600 dpi in order to capture even
the smallest type. The grayscale leaves a lot of "gray dust"
in the areas where the original paper page was the purest white.
Is there a Photoshop filter that will leave the white background
really white? If such a filter exists and I apply it, will it
affect the OCR recognition (positively or negatively)?

Since I won't have access to the documents forever, I am trying
to get the most complete file at scan time, but this may be
overkill.

Should I reduce the sampling to 300 dpi? Or perhaps I should stick
with 600 dpi but scan in black and white?

Finally, how do I change a 600dpi TIFF to 300 dpi?
How do I change a grayscale to B&W? (both with Acrobat)

My OCR software (ABBYY FineReader) takes the original file that
I provide and makes a working copy, which is the one that actually
gets OCR'd. The copy that I provide is 32 MBytes and the working
copy is 100 KBytes. They achieve that by (1) converting from
grayscale to B&W and (2) doing some compression (lossy or lossless?
I don't know).

Thanks in advance,

-Ramon F. Herrera
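
For reference, the ~32 MByte figure is about what an uncompressed 8-bit grayscale page at 600 dpi should occupy. A rough sketch of the arithmetic in Python, assuming a US Letter page (8.5 x 11 in; the page size is an assumption, not stated in the post) and ignoring TIFF headers and row padding:

```python
def raw_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed pixel data size, ignoring headers and row padding."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Assumed page size: US Letter, in inches (not stated in the thread).
PAGE = (8.5, 11.0)

gray_600 = raw_size_bytes(*PAGE, 600, 8)  # 8-bit grayscale at 600 dpi
gray_300 = raw_size_bytes(*PAGE, 300, 8)  # 8-bit grayscale at 300 dpi
bw_600 = raw_size_bytes(*PAGE, 600, 1)    # 1-bit black and white at 600 dpi

for label, size in [
    ("grayscale, 600 dpi", gray_600),
    ("grayscale, 300 dpi", gray_300),
    ("1-bit B&W, 600 dpi", bw_600),
]:
    print(f"{label}: {size:,} bytes ({size / 2**20:.1f} MiB)")
```

Under these assumptions the three cases come to roughly 32 MiB, 8 MiB and 4 MiB. Halving the resolution cuts the size by four, and going from 8-bit grayscale to 1-bit cuts it by eight. Getting from about 4 MiB of 1-bit data down to ~100 KBytes would then have to come from compression; which scheme FineReader uses is not stated in the thread.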
arrooke1

2004-06-05, 7:14 am

> I have a bunch of TIFF images that were scanned in grayscale
> mode at 600 dpi. Each one takes ~32MBytes of disk space, and
> the images are typical office documents -mostly text with
> a few logos-, which are being processed by OCR.
>
> My main concern is: what is the best way to obtain as much text
> recognized as possible? I chose 600 dpi in order to get even
> the smallest type. The grayscale leaves a lot of "gray dust"
> in the areas were the original paper page was the purest white.
> Is there an Photoshop filter that will leave the white background
> really white? If such filter exists and I apply it, will it
> affect the OCR recognition? (in a positive, negative way?).
>
> Since I won't have access to the documents forever, I am trying
> to get the most complete file at scan time, but I may be doing
> an overkill.
>
> Should I reduce the sampling to 300 dpi? Or perhaps I should stick
> with 600 dpi but scan in black and white?
>


Scan as line copy (black & white) at 600 ppi. Adjust your exposure to obtain
a suitable balance between background noise and image quality. If some pages
contain images (fancy colour logos), you can scan just the image
as greyscale and place it into your line copy.
Keith.


Xalinai

2004-06-05, 12:14 pm

On 4 Jun 2004 21:35:17 -0700, ramon@conexus.net (Ramon F Herrera)
wrote:

>I have a bunch of TIFF images that were scanned in grayscale
>mode at 600 dpi. Each one takes ~32MBytes of disk space, and
>the images are typical office documents -mostly text with
>a few logos-, which are being processed by OCR.
>
>My main concern is: what is the best way to obtain as much text
>recognized as possible? I chose 600 dpi in order to get even
>the smallest type. The grayscale leaves a lot of "gray dust"
>in the areas were the original paper page was the purest white.
>Is there an Photoshop filter that will leave the white background
>really white? If such filter exists and I apply it, will it
>affect the OCR recognition? (in a positive, negative way?).
>
>Since I won't have access to the documents forever, I am trying
>to get the most complete file at scan time, but I may be doing
>an overkill.
>
>Should I reduce the sampling to 300 dpi? Or perhaps I should stick
>with 600 dpi but scan in black and white?


It depends on your OCR software. Older software needed clean
black and white scans at as high a resolution as possible. Modern
software works better on grayscale scans with a moderate
dynamic range.
If you try to clean up the images before OCR, you sometimes
end up with the software assuming a perfect scan and trying to
interpret every stray pixel as text.
If you feed the software the raw scan, it corrects the contrast
itself, makes a better guess at what is paper structure and what is
real text, and produces better results.

>Finally, how do I change a 600dpi TIFF to 300 dpi?
>How do I change a grayscale to B&W? (both with Acrobat)


>My OCR software (ABBYY FineReader) takes the original file that
>I provide and makes a working copy which is the one that actually
>gets OCR'd. The copy that I provide is 32MBytes and the working
>copy is 100 KBytes. They achieve that by (1) converting from
>grayscale to B&W and (2) doing some compression (lossy or non-lossy?
>I don't know).


FineReader works even with moderately compressed greyscale JPGs,
which saves a lot of disk space and scanning time.


Michael

>Thanks in advance,
>
>-Ramon F. Herrera
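
The 600 dpi to 300 dpi question is not answered directly above, so here is a minimal sketch of one way to do it outside Photoshop, combined with the greyscale JPG suggestion. It assumes Python with the Pillow library installed; the function names, the file names in the usage note and the quality value of 75 are illustrative choices, not anything recommended in the thread.

```python
def target_size(width_px, height_px, src_dpi, dst_dpi):
    """Pixel dimensions after resampling from src_dpi to dst_dpi."""
    scale = dst_dpi / src_dpi
    return max(1, round(width_px * scale)), max(1, round(height_px * scale))


def downsample_to_jpeg(src_path, dst_path, src_dpi=600, dst_dpi=300, quality=75):
    """Resample a scan to a lower resolution and save it as a greyscale JPEG."""
    # Pillow is imported here so target_size() stays usable without it.
    from PIL import Image

    with Image.open(src_path) as img:
        gray = img.convert("L")  # make sure we have 8-bit greyscale
        new_size = target_size(gray.width, gray.height, src_dpi, dst_dpi)
        # LANCZOS is a high-quality resampling filter for downscaling.
        small = gray.resize(new_size, Image.LANCZOS)
        # Record the new resolution so the physical page size stays the same.
        small.save(dst_path, "JPEG", quality=quality, dpi=(dst_dpi, dst_dpi))
```

A call such as downsample_to_jpeg("page.tif", "page.jpg") (hypothetical file names) writes a copy with half the pixel dimensions and leaves the source file untouched. Since the documents will not be available for rescanning, it is probably safer to keep the 600 dpi originals and treat the smaller files as working copies.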


Tacit

2004-06-05, 7:14 pm

>The grayscale leaves a lot of "gray dust"
>in the areas were the original paper page was the purest white.
>Is there an Photoshop filter that will leave the white background
>really white?


Don't use a filter for this. Use the Levels command.

Once you've created a good, crisp image, leave it at 600 pixels per inch and
turn it into a bitmap; this is usually what OCR software will perform best
with.
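
For batches too large to do by hand, the same two steps can be approximated in a script. This is only a sketch, assuming Python with the Pillow library; it is not Photoshop's own Levels implementation (there is no midtone gamma here), and the black point, white point and threshold values are placeholders to be tuned against real scans.

```python
def levels_lut(black_point, white_point):
    """256-entry lookup table that stretches [black_point, white_point] to [0, 255].

    Values at or below black_point become 0 (pure black); values at or
    above white_point become 255 (pure white), which removes light grey dust.
    """
    if not 0 <= black_point < white_point <= 255:
        raise ValueError("need 0 <= black_point < white_point <= 255")
    span = white_point - black_point
    return [min(255, max(0, round((v - black_point) * 255 / span)))
            for v in range(256)]


def clean_and_binarize(src_path, dst_path, black_point=60, white_point=200,
                       threshold=128, dpi=600):
    """Apply a levels-style stretch, then convert to a 1-bit bitmap."""
    # Pillow is imported here so levels_lut() stays usable without it.
    from PIL import Image

    with Image.open(src_path) as img:
        gray = img.convert("L")
        leveled = gray.point(levels_lut(black_point, white_point))
        # Threshold to pure black and white (Pillow mode "1").
        bitmap = leveled.point(lambda p: 255 if p >= threshold else 0, mode="1")
        # Keep the resolution at 600 pixels per inch, as suggested above.
        bitmap.save(dst_path, dpi=(dpi, dpi))
```

For example, clean_and_binarize("page.tif", "page_bw.tif") (hypothetical file names) writes a 1-bit TIFF next to the original. A white point set too low will start erasing faint or small type, so it is worth checking a few pages by eye before running a whole batch.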


--
Biohazard? Radiation hazard? SO last-century.
Nanohazard T-shirts now available! http://www.villaintees.com
Art, literature, shareware, polyamory, kink, and more:
http://www.xeromag.com/franklin.html
