This topic gives detailed information about the PDF formats supported by Readiris and explains how you can make good use of the resulting PDF files.
The format PDF Text creates a searchable PDF file that contains the recognized text (and possibly graphic zones for photographs, artwork, etc.). The page image is not contained in this single-layered PDF file.
The format PDF Image-Text creates a searchable PDF file that contains the page image and the recognized text. The page image is contained in the two-layered PDF file above the text.
Note: all elements are compressed. Black-and-white images are stored as Group 4 compressed TIFF images; greyscale and color images are stored as JPEG images of high quality (0.8). The text is compressed using the Gzip mode.
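The effect of compressing the text can be illustrated with a minimal Python sketch. It assumes that the "Gzip mode" refers to the Deflate algorithm used by gzip, which Python exposes through its standard zlib module; the sample text is hypothetical and is not actual Readiris output.

```python
import zlib

# Hypothetical sample of recognized text; real recognition output will differ.
recognized_text = ("Readiris recognizes the text on the page. " * 50).encode("utf-8")

# Compress with Deflate at the highest compression level (9).
compressed = zlib.compress(recognized_text, 9)

# Decompression restores the original bytes exactly: the compression is lossless.
restored = zlib.decompress(compressed)

print(len(recognized_text), "bytes before,", len(compressed), "bytes after")
```

Unlike the JPEG compression of greyscale and color images, this kind of compression loses no information, so the text remains fully searchable.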
"Text only" PDF files are much more compact than image files!
Text-based PDF files are searchable. (Bitmap images, that is "image only" PDF files, can be viewed but not searched.)
Text-based PDF files are editable. (Bitmap images can be viewed but not edited.)
The recognized text can be edited and re-used.
Use the TouchUp Text tool of the Acrobat software to correct small recognition errors in the PDF file.
Tip: you need the appropriate version of Acrobat (Reader) to display the resulting PDF files correctly! To view and print Central-European texts (such as Czech and Polish), Baltic texts, Turkish and Cyrillic ("Russian") texts in the PDF format, you must have the special "CE" (Central-European) version of Acrobat (Reader). (You can find this software on the Readiris CD-ROM.)
You can isolate the text from an "image-text" PDF file. You can also convert text-only PDF files into RTF files. Open the file with Adobe Acrobat and use the command Save As to save it as an RTF file.
To re-use small text portions from a PDF file in other applications, select the Text Select tool of the Adobe Acrobat software, select the required text and copy-paste it to another application. (Select the tool Table/Formatted Text to maintain the text formatting.) (The command Select All selects all text of the current page, not of the entire PDF file.)
Use the Find command of your Acrobat (Reader) software for simple searches within a single document. Use the Search command for advanced searching across several PDF documents.
Warning: not all versions of the Adobe Acrobat Reader software include the Search function!
The button Find of the Adobe Acrobat (Reader) software finds complete words or word parts in the current PDF document. Acrobat looks for the word by sequentially reading every word on every page in the file.
The button Search of the Adobe Acrobat (Reader) software allows you to perform advanced and fast searching on a collection of indexed PDF documents.
You can search for a simple word or phrase.
You can expand your search query by using wildcard characters and Boolean operators.
You can use the search options to refine your search further.
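The idea behind wildcard characters and Boolean operators can be sketched in Python. This is only an illustration of the concept: the document names and word lists are hypothetical, and the wildcard syntax is that of Python's fnmatch module, not necessarily the query syntax of Acrobat.

```python
from fnmatch import fnmatchcase

# Hypothetical documents, each reduced to the set of lowercase words it contains.
documents = {
    "invoice.pdf": {"invoice", "payment", "total", "scanner"},
    "manual.pdf": {"scanner", "scanning", "recognition", "text"},
    "letter.pdf": {"dear", "customer", "payment"},
}

def matching(pattern):
    """Return the names of the documents that contain a word matching the pattern."""
    return {
        name
        for name, words in documents.items()
        if any(fnmatchcase(word, pattern) for word in words)
    }

# Wildcard: "scan*" matches "scanner" and "scanning".
wildcard_hits = matching("scan*")

# Boolean operators map onto set operations on the result sets.
and_hits = matching("scanner") & matching("payment")      # both words present
or_hits = matching("recognition") | matching("customer")  # either word present
not_hits = matching("payment") - matching("scanner")      # first word but not second
```

A wildcard widens the query to more words, while AND and NOT narrow the list of matching documents and OR widens it.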
Index-based searching requires that a "full-text" index has been created for a collection of PDF files with the command Catalog. (A "full-text" index is an alphabetized list of every word used in a document or a series of documents. Index-based searching is much faster than the Find command: Acrobat goes right to the word in the list rather than progressively reading through the documents.)
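The difference between sequential reading and index-based searching can be sketched in Python. This is a simplified model, not the way Acrobat or Catalog is implemented: the page texts and all function names are hypothetical, and only complete words are matched.

```python
import re
from collections import defaultdict

# Hypothetical page texts standing in for a collection of PDF documents.
collection = {
    "report.pdf": ["The scanner reads the page.", "Recognition follows scanning."],
    "notes.pdf": ["Index the page once.", "Search the index many times."],
}

def words_of(text):
    """Split a page text into lowercase words."""
    return re.findall(r"[a-z]+", text.lower())

def find_sequential(word):
    """Read every word on every page, in order, and report where the word occurs."""
    hits = []
    for name, pages in collection.items():
        for page_number, text in enumerate(pages, start=1):
            if word in words_of(text):
                hits.append((name, page_number))
    return hits

def build_index():
    """Build an alphabetized word list; each word maps to the places it occurs."""
    index = defaultdict(set)
    for name, pages in collection.items():
        for page_number, text in enumerate(pages, start=1):
            for word in words_of(text):
                index[word].add((name, page_number))
    return dict(sorted(index.items()))

# The index is built once, up front, for the whole collection.
index = build_index()

def search_indexed(word):
    """Go straight to the word in the index instead of rereading the pages."""
    return sorted(index.get(word, ()))
```

Both functions report the same occurrences, but find_sequential rereads the whole collection for each query, whereas search_indexed performs a single lookup in an index that was prepared in advance.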