How to downsample and convert from grayscale-> black&white?
 

Ramon F Herrera





Old Post  06-05-04 - 09:14 AM  
I have a bunch of TIFF images that were scanned in grayscale
mode at 600 dpi.  Each one takes ~32 MB of disk space, and
the images are typical office documents (mostly text with
a few logos), which are being processed by OCR.

My main concern is: what is the best way to get as much text
recognized as possible?  I chose 600 dpi in order to capture even
the smallest type.  The grayscale leaves a lot of "gray dust"
in the areas where the original paper page was the purest white.
Is there a Photoshop filter that will leave the white background
really white?  If such a filter exists and I apply it, will it
affect the OCR recognition (positively or negatively)?

Since I won't have access to the documents forever, I am trying
to get the most complete file at scan time, but this may be
overkill.

Should I reduce the sampling to 300 dpi?  Or perhaps I should stick
with 600 dpi but scan in black and white?

Finally, how do I change a 600dpi TIFF to 300 dpi?
How do I change a grayscale to B&W? (both with Acrobat)

My OCR software (ABBYY FineReader) takes the original file that
I provide and makes a working copy, which is the one that actually
gets OCR'd.  The file that I provide is 32 MB and the working
copy is 100 KB.  It achieves that by (1) converting from
grayscale to B&W and (2) doing some compression (lossy or lossless?
I don't know).
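
As a rough check on the ~32 MB figure, the uncompressed size of a scan can be estimated from page size, resolution and bit depth. The sketch below assumes US Letter pages (8.5 x 11 in), which the post does not state, and ignores file headers and compression; `raw_size_bytes` is a made-up helper name.

```python
# Rough uncompressed size of a scanned page.
# Assumption: US Letter (8.5 x 11 in); the post does not give the page size.

def raw_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed pixel data size in bytes (headers and compression ignored)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    # Each row is padded up to a whole number of bytes.
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px

gray_600 = raw_size_bytes(8.5, 11, 600, 8)  # 8-bit grayscale at 600 dpi
gray_300 = raw_size_bytes(8.5, 11, 300, 8)  # 8-bit grayscale at 300 dpi
bw_600 = raw_size_bytes(8.5, 11, 600, 1)    # 1-bit black & white at 600 dpi

for label, size in [
    ("grayscale, 600 dpi", gray_600),
    ("grayscale, 300 dpi", gray_300),
    ("black & white, 600 dpi", bw_600),
]:
    print(f"{label}: {size / 2**20:.1f} MiB")
```

Under these assumptions a 600 dpi grayscale page comes to about 32 MiB, which is consistent with the figure above. Halving the resolution cuts the size to a quarter, and going to 1 bit per pixel at 600 dpi cuts it to roughly an eighth, before any compression is applied.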

Thanks in advance,

-Ramon F. Herrera


Re: How to downsample and convert from grayscale-> black&white?
 

arrooke1





Old Post  06-05-04 - 12:14 PM  
> I have a bunch of TIFF images that were scanned in grayscale
> mode at 600 dpi.  Each one takes ~32MBytes of disk space, and
> the images are typical office documents -mostly text with
> a few logos-, which are being processed by OCR.
>
> My main concern is: what is the best way to obtain as much text
> recognized as possible?  I chose 600 dpi in order to get even
> the smallest type.  The grayscale leaves a lot of "gray dust"
> in the areas were the original paper page was the purest white.
> Is there an Photoshop filter that will leave the white background
> really white?  If such filter exists and I apply it, will it
> affect the OCR recognition? (in a positive, negative way?).
>
> Since I won't have access to the documents forever, I am trying
> to get the most complete file at scan time, but I may be doing
> an overkill.
>
> Should I reduce the sampling to 300 dpi?  Or perhaps I should stick
> with 600 dpi but scan in black and white?
>

Scan as line copy (black & white) at 600 ppi. Adjust your exposure to obtain
a suitable balance between background noise and image quality. If some pages
have images (fancy colour logos), you can scan the image only,
as greyscale, and place it into your line copy.
Keith.




Re: How to downsample and convert from grayscale-> black&white?
 

Xalinai





Old Post  06-05-04 - 05:14 PM  
On 4 Jun 2004 21:35:17 -0700, ramon@conexus.net (Ramon F Herrera)
wrote:

>I have a bunch of TIFF images that were scanned in grayscale
>mode at 600 dpi.  Each one takes ~32MBytes of disk space, and
>the images are typical office documents -mostly text with
>a few logos-, which are being processed by OCR.
>
>My main concern is: what is the best way to obtain as much text
>recognized as possible?  I chose 600 dpi in order to get even
>the smallest type.  The grayscale leaves a lot of "gray dust"
>in the areas were the original paper page was the purest white.
>Is there an Photoshop filter that will leave the white background
>really white?  If such filter exists and I apply it, will it
>affect the OCR recognition? (in a positive, negative way?).
>
>Since I won't have access to the documents forever, I am trying
>to get the most complete file at scan time, but I may be doing
>an overkill.
>
>Should I reduce the sampling to 300 dpi?  Or perhaps I should stick
>with 600 dpi but scan in black and white?

It depends on your OCR software. Older software needed clean
black and white scans at as high a resolution as possible. Modern
software works better on grayscale scans with a moderate
dynamic range.
If you try to clean the images for the OCR software, you sometimes
end up with the software assuming a perfect scan and trying to
interpret each stray pixel as text.
If you feed the software the raw scan, it corrects contrast by
itself, can better distinguish between paper structure and
real text, and produces better results.

>Finally, how do I change a 600dpi TIFF to 300 dpi?
>How do I change a grayscale to B&W? (both with Acrobat)

>My OCR software (ABBYY FineReader) takes the original file that
>I provide and makes a working copy which is the one that actually
>gets OCR'd.  The copy that I provide is 32MBytes and the working
>copy is 100 KBytes.  They achieve that by (1) converting from
>grayscale to B&W and (2) doing some compression (lossy or non-lossy?
>I don't know).

FineReader works even with moderately compressed greyscale JPEGs.
That saves a lot of disk space and scanning time.


Michael

>Thanks in advance,
>
>-Ramon F. Herrera



Re: How to downsample and convert from grayscale-> black&white?
 

Tacit





Old Post  06-06-04 - 12:14 AM  
>The grayscale leaves a lot of "gray dust"
>in the areas were the original paper page was the purest white.
>Is there an Photoshop filter that will leave the white background
>really white?

Don't use a filter for this. Use the Levels command.

Once you've created a good, crisp image, leave it at 600 pixels per inch and
turn it into a bitmap; this is usually what OCR software will perform best
with.
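
For anyone who wants to see what those two steps do to the pixel values, here is a minimal sketch in plain Python. It is a simplification, not Photoshop's actual implementation: the levels step only moves the black and white input points (no gamma slider), and the bitmap step is a plain 50% threshold. The function names, the sample pixel values and the 40/230 points are all made up for illustration.

```python
def apply_levels(pixels, black_point, white_point):
    """Linear levels adjustment on rows of 8-bit grayscale values.

    Values at or below black_point become 0, values at or above
    white_point become 255, and everything between is stretched linearly.
    """
    if not 0 <= black_point < white_point <= 255:
        raise ValueError("need 0 <= black_point < white_point <= 255")
    span = white_point - black_point
    result = []
    for row in pixels:
        new_row = []
        for value in row:
            scaled = (value - black_point) * 255 / span
            new_row.append(max(0, min(255, round(scaled))))
        result.append(new_row)
    return result


def to_bilevel(pixels, threshold=128):
    """Convert grayscale rows to pure black (0) or pure white (255)."""
    return [[255 if value >= threshold else 0 for value in row] for row in pixels]


# Hypothetical scan fragment: "gray dust" in the 236-252 range around dark text.
page = [
    [250, 238, 245, 252],
    [241, 30, 25, 236],
    [247, 40, 120, 243],
]

cleaned = apply_levels(page, black_point=40, white_point=230)
bitmap = to_bilevel(cleaned)

for row in bitmap:
    print("".join("#" if value == 0 else "." for value in row))
```

With the white point pulled down to 230, every background pixel in the sample clips to pure white before the threshold is applied, so the dust never reaches the bitmap. On a real scan the two points would be picked by looking at the histogram.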


--
Biohazard? Radiation hazard? SO last-century.
Nanohazard T-shirts now available! http://www.villaintees.com
Art, literature, shareware, polyamory, kink, and more:
http://www.xeromag.com/franklin.html


