- Path: sparky!uunet!cs.utexas.edu!asuvax!ncar!csn!sgmlinc!brian
- From: brian@sgmlinc.com (Brian E. Travis)
- Newsgroups: comp.text.sgml
- Subject: Tool for 'automatic' markup
- Distribution: world
- Message-ID: <725605975snx@sgmlinc.com>
- References: <Bzpxqy.C2E@watserv2.uwaterloo.ca>
- Date: Tue, 29 Dec 92 05:12:55 GMT
- Organization: SGML Associates, Inc.
- Reply-To: brian@sgmlinc.com
- Lines: 60
-
- In article <Bzpxqy.C2E@watserv2.uwaterloo.ca> eric@csg.uwaterloo.ca writes:
- > In article <20180.2b34a24a@ul.ie> murraya@ul.ie writes:
- > >Do anybody out there know anything about an application to `automatically`
- ^^^^^^^^^^^^^
- Be careful!
-
- > >generate markuped text, (possibly according to an sgml DTD) from an ordinary
- > >ASCII file.
- > >
- > There is a product called OmniMark from Software Exoterica in Ottawa, Canada
- > which we have been using to translate text to SGML from a number of different
- > formats including ASCII and WordPerfect. We have found this to be an
- > excellent tool and have used it to translate thousands of pages of text.
-
- While OmniMark is an excellent tool for translating text, it is
- far from "automatic". I've used OmniMark and its older brother,
- XTRAN, for several years, and am delighted with their performance
- and reliability. However, they do require a lot of care and feeding.
- The new release of OmniMark (v.2.0) is *very* fast and a joy to
- work with.
-
- Another product, FastTAG from Avalanche Development Co. in Boulder,
- Colorado, is good at recognizing objects on a page, but takes
- some work to separate objects if its recognition engine
- determines that two objects look like one. FastTAG does not have
- a parser in it, but relies on the programmer to insert start and
- end tags as necessary. FastTAG is very good at recognizing
- tabular material. It is almost human in that respect, and the
- developers should be commended.
-
- FastTAG employs a "Visual Recognition Engine". This allows the
- program to work with "fuzzy" specifications (e.g., heads contain
- 60% capitalized words). It is also considerably cheaper than
- OmniMark ($1,500-2,500 vs. $15,000-25,000 -- these could be old
- prices).
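
To make the "fuzzy" specification idea concrete, here is a minimal Python sketch of the 60%-capitalized-words rule for spotting heads. It is not FastTAG's rule language or its recognition engine; the function names, the <head> and <para> tags, and the default threshold are all assumptions made for illustration.

```python
def capitalized_ratio(line):
    """Return the fraction of words in `line` that start with a capital letter.

    Only tokens beginning with a letter are counted as words, so bullets,
    numbers, and stray punctuation do not skew the ratio.
    """
    words = [w for w in line.split() if w[0].isalpha()]
    if not words:
        return 0.0
    return sum(1 for w in words if w[0].isupper()) / len(words)


def looks_like_head(line, threshold=0.6):
    """Fuzzy test: treat the line as a head if enough words are capitalized.

    The 0.6 default mirrors the "60% capitalized words" example; it is a
    tunable guess, not a value taken from any product.
    """
    return capitalized_ratio(line) >= threshold


def tag_lines(lines, threshold=0.6):
    """Wrap each non-blank line in hypothetical <head> or <para> tags."""
    tagged = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # blank lines carry no content to mark up
        tag = "head" if looks_like_head(text, threshold) else "para"
        tagged.append(f"<{tag}>{text}</{tag}>")
    return tagged
```

Calling tag_lines(["Getting Started With SGML", "", "the parser reads the file."]) marks the first line as a head and the last as a paragraph. The sketch does no escaping of "<" or "&" and knows nothing about a DTD, so it only shows the shape of a threshold-based rule; real input would need the care and feeding described above.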
-
- In a recent project, I used OmniMark to recognize the basic
- structure of an ASCII input file, then fired off a FastTAG
- session to mark up tables when encountered. It worked quite well,
- and with a minimum of programming. Look for details in an
- upcoming issue of <TAG>.
-
- Then there are the general-purpose and cheap alternatives: AWK,
- Perl, Lex, and the dreaded Snobol. Most of my clients recognize
- that the support provided by vendors selling a specialized tool
- more than outweighs the cost of the product, and select OmniMark
- or FastTAG (or both) as a strategic tool. There are other
- commercial products that do SGML translations, but I have not had
- much experience with them.
-
- As a rule, in any translation system, the quality of the data
- coming out is only as good as the consistency (or regularity) of
- the data going in.
-
- There is no magic.
-
- --
- Brian E. Travis brian@sgml.com
- SGML Architect, Managing Editor, Tele: +1 303 680-0875
- InfoDesign Corp. <TAG> The SGML Newsletter Fax: +1 303 680-4906
-