Re: How to downsample and convert from grayscale -> black & white?
  06-05-04 - 05:14 PM
On 4 Jun 2004 21:35:17 -0700, ramon@conexus.net (Ramon F Herrera)
wrote:
>I have a bunch of TIFF images that were scanned in grayscale
>mode at 600 dpi. Each one takes ~32MBytes of disk space, and
>the images are typical office documents -mostly text with
>a few logos-, which are being processed by OCR.
>
>My main concern is: what is the best way to obtain as much text
>recognized as possible? I chose 600 dpi in order to get even
>the smallest type. The grayscale leaves a lot of "gray dust"
>in the areas where the original paper page was the purest white.
>Is there a Photoshop filter that will leave the white background
>really white? If such a filter exists and I apply it, will it
>affect the OCR recognition? (in a positive or negative way?).
>
>Since I won't have access to the documents forever, I am trying
>to get the most complete file at scan time, but I may be doing
>an overkill.
>
>Should I reduce the sampling to 300 dpi? Or perhaps I should stick
>with 600 dpi but scan in black and white?

It depends on your OCR software. Older software needed clean
black and white scans at as high a resolution as possible. Modern
software works better on grayscale scans whose dynamic range is
not too wide.
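
For a sense of the file sizes involved, here is a rough back-of-the-envelope sketch in Python. It assumes a US Letter page (8.5 x 11 in) and uncompressed pixel data with no row padding, none of which is stated in the thread; `raw_size_bytes` is only an illustrative helper name.

```python
def raw_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed pixel data size for a scanned page, in bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# Assumed page size: US Letter, 8.5 x 11 inches.
PAGE = (8.5, 11.0)

gray_600 = raw_size_bytes(*PAGE, dpi=600, bits_per_pixel=8)
gray_300 = raw_size_bytes(*PAGE, dpi=300, bits_per_pixel=8)
bw_600 = raw_size_bytes(*PAGE, dpi=600, bits_per_pixel=1)

for label, size in [("600 dpi gray", gray_600),
                    ("300 dpi gray", gray_300),
                    ("600 dpi B&W", bw_600)]:
    print(f"{label}: {size / 2**20:.1f} MiB")
```

Under those assumptions the 600 dpi grayscale page comes to about 32 MiB, in line with the ~32MBytes quoted above. Halving the resolution cuts the raw size to a quarter, and 1-bit black and white at 600 dpi cuts it to an eighth, before any compression.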

If you clean up the images before OCR, the software sometimes
assumes a perfect scan and tries to interpret every stray pixel
as text.

If you feed the software the raw scan, it corrects the contrast
itself, makes a better guess at what is paper structure and what is
real text, and produces better results.
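
For the "gray dust" question, one gentle option is a white-point clip: every pixel at or above a cutoff becomes pure white, and darker pixels are left untouched so the grayscale text edges survive. This is only a sketch, not a description of what Photoshop or FineReader does internally. The cutoff of 230 and the name `clip_white_point` are assumptions for illustration, and the image is a plain list of rows of 0-255 values.

```python
def clip_white_point(pixels, white_point=230):
    """Set every pixel at or above white_point to 255; leave the rest as-is."""
    return [[255 if value >= white_point else value for value in row]
            for row in pixels]


# Tiny synthetic example: light dust (235-250) around darker text strokes.
page = [
    [250, 240, 255, 238],
    [245,  30,  40, 250],
    [235, 120,  25, 243],
]

cleaned = clip_white_point(page)
for row in cleaned:
    print(row)
```

Whether this helps or hurts recognition depends on the OCR engine, so it is worth keeping the raw scan and comparing results on a few pages.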
>Finally, how do I change a 600dpi TIFF to 300 dpi?
>How do I change a grayscale to B&W? (both with Acrobat)
>My OCR software (ABBYY FineReader) takes the original file that
>I provide and makes a working copy which is the one that actually
>gets OCR'd. The copy that I provide is 32MBytes and the working
>copy is 100 KBytes. They achieve that by (1) converting from
>grayscale to B&W and (2) doing some compression (lossy or non-lossy?
>I don't know).

FineReader works even with moderately compressed grayscale JPEGs,
which saves a lot of disk space and scanning time.
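
As for the how-to part of the question, here is a minimal Python sketch of the two operations on a grayscale image stored as a list of rows of 0-255 values: halving the resolution (600 -> 300 dpi) by averaging 2x2 blocks, and converting to black and white with a fixed global threshold. This is not the Acrobat or FineReader procedure. The function names and the threshold of 128 are assumptions, and real tools typically use better resampling filters and adaptive thresholds.

```python
def downsample_by_2(pixels):
    """Halve width and height by averaging each 2x2 block.

    Assumes non-empty rows of equal length; a trailing odd row or
    column is dropped.
    """
    out_h = len(pixels) // 2
    out_w = len(pixels[0]) // 2
    result = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            total = (pixels[2 * y][2 * x] + pixels[2 * y][2 * x + 1] +
                     pixels[2 * y + 1][2 * x] + pixels[2 * y + 1][2 * x + 1])
            row.append((total + 2) // 4)  # rounded integer mean
        result.append(row)
    return result


def to_black_and_white(pixels, threshold=128):
    """Map each pixel to 0 (black) if below threshold, else 255 (white)."""
    return [[0 if value < threshold else 255 for value in row]
            for row in pixels]


# Tiny synthetic 4x4 "scan" standing in for a real page.
scan = [
    [255, 250,  10,  20],
    [245, 240,  30,  40],
    [200, 100,   0, 255],
    [100, 200, 255,   0],
]

half = downsample_by_2(scan)
bw = to_black_and_white(half)
print(half)
print(bw)
```

Averaging before thresholding matters: the fine checkerboard in the lower right of the sample averages to mid-gray, so which side of the threshold it lands on is somewhat arbitrary. That is one reason to keep the grayscale original around.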
Michael
>Thanks in advance,
>
>-Ramon F. Herrera
All times are GMT.