User Lexicons

Operation

Recognition is assisted by linguistic databases for all supported languages. Readiris extensively uses linguistic information to validate good solutions and mark suspicious ones.

Don't confuse (user) lexicons with font dictionaries! (User) lexicons are linguistic databases that assist the recognition; font dictionaries contain character shapes learnt during the interactive OCR phase.

As powerful as these standard lexicons may be, users can "boost" the OCR accuracy further by loading user lexicons.

User lexicons are word lists containing any term that does not occur in the "basic", general-purpose lexicons. Think for instance of technical, scientific, legal or other company-specific terms.

User lexicons only take effect when the symbols they contain are covered by the selected OCR language(s).

Example 1: the OCR language is English. My user lexicon contains American city names such as "Poughkeepsie" and "Massapequa". The OCR process makes good use of the user lexicon.

Example 2: the OCR language is English. My user lexicon contains French proper names such as "Auxerres", "François" and "Véllères". The OCR process only uses a portion of the user lexicon. Words that contain symbols not covered by the English character set get ignored: "Auxerres" will be used, "François" and "Véllères" will not be used by the OCR process.

Example 3: the OCR language is English. My user lexicon contains Russian terms. The OCR process will ignore the user lexicon altogether: the English character set does not include the "Cyrillic" alphabet.
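
The three examples above can be illustrated with a short Python sketch. It is not how Readiris works internally: the function name usable_words, the sample word lists and the simplified "English" character set (ASCII letters, digits, apostrophe and hyphen) are all assumptions made for the illustration. It only shows the rule that a word counts when every one of its symbols is covered by the selected language.

```python
import string

# Assumption: a simplified stand-in for the English character set.
# The actual character set Readiris uses per language is not given here.
ENGLISH_CHARSET = set(string.ascii_letters + string.digits + "'-")


def usable_words(lexicon, charset):
    """Return the words whose symbols are all covered by charset."""
    return [word for word in lexicon if set(word) <= charset]


# Hypothetical user lexicons mirroring Examples 1 to 3.
american_cities = ["Poughkeepsie", "Massapequa"]
french_names = ["Auxerres", "François", "Véllères"]
russian_terms = ["словарь", "распознавание"]

print(usable_words(american_cities, ENGLISH_CHARSET))  # every word is kept
print(usable_words(french_names, ENGLISH_CHARSET))     # only "Auxerres" is kept
print(usable_words(russian_terms, ENGLISH_CHARSET))    # no word is kept
```

Passing a larger character set (for instance the union of two languages' sets) to the same function keeps more words, which is the effect of activating several OCR languages at once.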

Tip: Readiris Corporate allows you to activate multiple languages simultaneously!

How to...?

Readiris Corporate includes the User Lexicon Editor utility, which allows you to create and maintain user lexicons.

Tip: the tooltip of the Language button indicates the active user lexicon.