
DOCLIFTER

Section:  (1)
Updated: 09 September 2001
 

NAME

doclifter - translate troff macros into DocBook  

SYNOPSIS

doclifter [-qtvx] [-I path] file ...

 

DESCRIPTION

doclifter translates documents written in troff macros to DocBook. Structural subsets of the requests in man(7), mdoc(7), ms(7), me(7), and troff(1) are supported.

Lifting documents from presentation level to semantic level is hard, and a really good job requires human polishing. This tool aims to do everything that can be mechanized, and to preserve, in SGML comments, any troff-level information that might have structural implications.

This tool does some of the hard parts, but not all. Tables (TBL markup) are translated into DocBook table markup, but equations (EQN markup) and box-and-arrow pictures (PIC markup) are not translated. Command synopses are (usually) translated automatically, but human intervention is required to translate function synopses.

Some stereotyped patterns of markup and content are automatically lifted into structural markup. This is much better than requiring the user to do these translations by hand, but means there is some risk of false positives.  

OPTIONS

If called without arguments doclifter acts as a filter, translating troff source input on standard input to DocBook markup on standard output. If called with arguments, each argument file is translated separately; any extension beginning with dot on each filename is stripped and the suffix .sgml (or .xml) given to the translated output.
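The output naming rule can be illustrated with a minimal sketch. The helper name output_name is hypothetical, and the sketch assumes that only the final dot-extension is stripped; it is not taken from doclifter's source.

```python
import os.path

def output_name(source: str, xml: bool = False) -> str:
    """Derive an output filename from an input filename (hypothetical helper)."""
    # Assumption: only the last dot-extension is removed, so "ls.1" -> "ls".
    stem, _ext = os.path.splitext(source)
    # .xml corresponds to the -x option; .sgml is the default suffix.
    return stem + (".xml" if xml else ".sgml")

print(output_name("ls.1"))                # ls.sgml
print(output_name("paper.ms", xml=True))  # paper.xml
```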

-I
The -I option adds its argument to the include path used when doclifter searches for inclusions. The include path is initially just the current directory.
-q
Normally, requests that doclifter could not interpret (usually because they're presentation-level) are passed through to SGML/XML comments in the output. The -q option suppresses this. Messages about macros that are unrecognized or cannot be translated go to standard error whatever the state of this option.
-t
Enable the .tm command, which in troff prints its arguments to standard error (normally used to generate indexes). Since doclifter is a translation tool rather than a production formatter, this feature is disabled by default.
-v
The -v option makes doclifter noisier about what it's doing. This is mainly useful for debugging.
-x
The -x option tries to produce XML-style output (the preamble changes, and tags with an empty content model get a trailing slash). It produces output files with an .xml extension.
 

TRANSLATION RULES

Overall, you can expect that font changes will be turned into Emphasis tags with a Remap attribute taken from the troff font name. The basic font names are R, I, B, U, and SM.
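As a rough illustration of the font rule, the sketch below turns inline \f font escapes into emphasis tags carrying a remap attribute. It is not doclifter's code: the function name lift_fonts is hypothetical, and treating \fR and \fP purely as "end of the current highlight" is a simplifying assumption.

```python
import re
from xml.sax.saxutils import escape

# Matches \fX (one-character font name) or \f(XX (two-character font name).
FONT_ESCAPE = re.compile(r"\\f(\(..|[A-Za-z0-9])")

def lift_fonts(line: str) -> str:
    """Wrap font-changed text in <emphasis remap="..."> (illustrative only)."""
    out = []
    open_font = None
    pos = 0
    for match in FONT_ESCAPE.finditer(line):
        out.append(escape(line[pos:match.start()]))
        pos = match.end()
        font = match.group(1).lstrip("(")
        if open_font is not None:
            out.append("</emphasis>")
            open_font = None
        # Assumption: R, P and 1 only close the current highlight.
        if font not in ("R", "P", "1"):
            out.append('<emphasis remap="%s">' % font)
            open_font = font
    out.append(escape(line[pos:]))
    if open_font is not None:
        out.append("</emphasis>")
    return "".join(out)

print(lift_fonts(r"use \fB-x\fR for XML"))
```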

Troff and macro-package special character escapes are mapped into ISO character entities.

When doclifter encounters a .so directive, it searches for the file. If the file can be opened for reading and consists entirely of command lines and comments, it is included. If any of these conditions fails, an entity reference for it is generated.
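The inclusion test for .so targets might look roughly like the following. This is a sketch under stated assumptions, not the tool's implementation: should_inline is a hypothetical name, a "command line" is taken to mean a line starting with a troff control character (. or '), and a blank line is treated as disqualifying.

```python
import os

def should_inline(path: str) -> bool:
    """Decide whether a .so target is included or left as an entity reference."""
    # Condition 1: the file must be readable.
    if not os.access(path, os.R_OK):
        return False
    try:
        with open(path, encoding="latin-1") as handle:
            lines = handle.read().splitlines()
    except OSError:
        return False
    # Condition 2 (assumption): every line is a request or a comment,
    # i.e. it starts with "." or "'". Comments (.\") also start with ".".
    return all(line.startswith((".", "'")) for line in lines)
```

A caller would include the file when this returns True and emit an entity reference otherwise.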

Some notes on specific translations:  

MAN TRANSLATION

doclifter does a good job on most man pages, with the large exception that function synopses are not translated (command synopses are, however). It knows about the extended UR/UE/UN requests supported under Linux. If any UR request is present, it will translate these requests but will not wrap URLs outside them in Ulink tags.

The .TH macro is used to generate a RefMeta section. If present, the date/source/manual arguments (see man(7)) are wrapped in RefMiscInfo tag pairs with those class attributes.

The following man macros are translated into emphasis tags with a remap attribute: .B, .BI, .BR, .I, .IB, .IR, .RB, .RI, .SB, .SM. Some stereotyped patterns involving these macros are recognized and turned into semantic markup.

The following macros are translated into paragraph breaks: .LP, .PP, .P, .HP, and the single-argument form of .IP.

The two-argument form of .IP is translated either as a VariableList (usually) or ItemizedList (if the tag is the troff bullet or square character).
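The choice between the two list types can be sketched as a simple lookup. The function name is hypothetical, and the sketch assumes the bullet and square characters appear as the standard troff escapes \(bu and \(sq.

```python
# Assumption: bullet and square are written as the troff escapes \(bu and \(sq.
BULLET_TAGS = {r"\(bu", r"\(sq"}

def ip_list_element(tag: str) -> str:
    """Pick the DocBook list type for a tagged .IP (illustrative only)."""
    if tag.strip() in BULLET_TAGS:
        return "ItemizedList"
    return "VariableList"

print(ip_list_element(r"\(bu"))  # ItemizedList
print(ip_list_element("-v"))     # VariableList
```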

The following macros are translated semantically: .SH, .SS, .TP, .UR, .UE, .UN, .IX.

The \*R, \*(Tm, \*(lq, and \*(rq symbols are translated.

The following macros are ignored: .RS, .RE, .PD.  

POD2MAN TRANSLATION

doclifter recognizes the extension macros produced by pod2man (Sh, Sp, Ip, Vb, Ve) and translates them structurally.  

MANDOC TRANSLATION

doclifter should be able to do an excellent job on most mdoc(7) pages, because this macro package expresses a lot of semantic structure.

Weak spots in the translation: All .Bd/.Ed display blocks are translated as LiteralLayout tag pairs. The .D1 and .Dl macros are ignored.  

MS TRANSLATION

doclifter does a good job on most ms pages. One weak spot to watch out for is the generation of Author and Affiliation tags. The heuristics used to mine this information out of the .AU section work for authors who format their names in the way usual for English (e.g. "M. E. Lesk", "Eric S. Raymond") but are quite brittle.

The .TL, .AU, .AI, and .AE macros turn into article metainformation in the expected way. The .PP, .LP, .SH, and .NH macros turn into paragraph and section structure. The tagged form of .IP is translated either as a VariableList (usually) or ItemizedList (if the tag is the troff bullet or square character); the untagged version is treated as an ordinary paragraph break.

The .DS/.DE pair is translated to a LiteralLayout tag pair. The .FS/.FE pair is translated to a Footnote tag pair. The .QP/.QS/.QE requests define BlockQuotes.

The .UL font change is mapped to U. The .SM and .LG macros become numeric plus or minus size steps suffixed to the Remap attribute.

All macros relating to page footers, multicolumn mode, boxes, and keeps are ignored (.ND, .DA, .1C, .2C, .MC, .BX, .B1, .B2, .KS, .KE, .KF). The .R, .RS, and .RE macros are ignored as well.  

ME TRANSLATION

Translation of me documents tends to produce crude results that need a lot of hand-hacking. The format has little usable structure, and documents written in it tend to use a lot of low-level troff macros; both these properties tend to confuse doclifter.

The following macros are translated into paragraph breaks: .lp, .pp. The .ip macro is translated into a VariableList. The .bp macro is translated into an ItemizedList. The .np macro is translated into an OrderedList.

The b, i, and r fonts are mapped to emphasis tags with B, I, and R Remap attributes. The .rb ("real bold") font is treated the same as .b.

Most other requests are ignored.  

TBL TRANSLATION

All structural features of TBL tables are translated, including both horizontal and vertical spanning with `s' and `^'. The `l', `r', and `c' formats are supported; the `n' column format is rendered as `r'. Line continuations with T{ and T} are handled correctly. So is .TH.
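The column-format mapping can be pictured with a short sketch. The function name and the one-line format parsing are assumptions made for illustration; span keys (`s' and `^') yield None here because they carry no alignment of their own.

```python
# l, r, c map directly; n (numeric) is rendered as right-aligned.
ALIGN = {"l": "left", "r": "right", "c": "center", "n": "right"}

def column_alignments(format_line: str) -> list:
    """Map one TBL format line to alignment values (illustrative only)."""
    keys = format_line.strip().rstrip(".").split()
    # Only the first letter of each key is examined; qualifiers such as
    # b or point sizes are not interpreted in this sketch.
    return [ALIGN.get(key[0].lower()) for key in keys]

print(column_alignments("l c n."))  # ['left', 'center', 'right']
```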

The expand, box, doublebox, allbox, center, left, and right options are supported. The GNU synonyms frame and doubleframe are also recognized. But the distinction between single and double rules and boxes is lost.

Table continuations (.T&) are not supported.

Most other presentation-level TBL commands are ignored. The `b' format qualifier is processed, but point size and width qualifiers are ignored.  

PIC TRANSLATION

PIC sections are passed through enclosed in LiteralLayout tags.  

EQN TRANSLATION

EQN sections are passed through enclosed in LiteralLayout tags. After a delim statement has been seen, inline eqn delimiters are translated into comments containing an "eqn" pseudo-tag-pair.  

TROFF TRANSLATION

The troff translation is meant only to support interpretation of the macro sets. It is not useful standalone.

The .nf and .fi macros are interpreted as literal-layout boundaries. Calls to the .so macro either cause inclusion or are translated into SGML/XML entity inclusions (see above). Calls to the .ul and .cu macros cause following lines to be wrapped in an Emphasis tag with a Remap attribute of "U". Calls to .ft generate corresponding start or end emphasis tags. Calls to .tr cause character translation on output. These are the only troff requests we translate to DocBook. The rest of the troff emulation exists because macro packages use it internally to expand macros into elements that might be structural.

Requests relating to macro definitions and strings (.ds, .as, .de, .am, .rm, .rn, .em) are processed and expanded. The .ig macro is also processed.

Conditional macros (.if, .ie, .el) are handled. The built-in conditions o, n, t, and e are evaluated as if for nroff on page one of a document. String comparisons are evaluated by straight textual comparison. All numeric expressions evaluate to true.
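The evaluation rules for conditions can be summarized in a small sketch. This is an interpretation, not the actual implementation: eval_condition is a hypothetical name, string comparisons are assumed to be delimited by a quote character, and negation with ! is not handled.

```python
def eval_condition(cond: str) -> bool:
    """Evaluate a troff condition the way the text describes (sketch)."""
    cond = cond.strip()
    # nroff on page one: page is odd (o) and nroff is running (n);
    # troff (t) and even page (e) are therefore false.
    if cond in ("o", "n"):
        return True
    if cond in ("t", "e"):
        return False
    # Assumption: a leading quote marks a string comparison 'a'b'.
    if cond[:1] in ("'", '"'):
        parts = cond[1:].split(cond[0])
        return len(parts) >= 2 and parts[0] == parts[1]
    # Everything else is treated as a numeric expression: always true.
    return True

print(eval_condition("n"), eval_condition("t"))  # True False
```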

The .tm macro writes its arguments to standard error. The .pm macro reports on defined macros and strings. These facilities may aid in debugging your translation.

All other troff requests are ignored but passed through into SGML/XML comments. A few (such as .ce) also trigger a warning message.  

RETURN VALUES

On successful completion, the program returns status 0. Any error in reading or writing files causes it to return 1. It returns 2 if some file or standard input could not be translated. Translation can fail under the following conditions:

*
One of the input sources is actually a reference to another file via .so inclusion.
*
The NAME section of a manual page is ill-formed (doesn't consist of one line with a recognizable name/description separator).
*
Two or more consecutive .TP headers precede a list item, but are not parseable as a CommandSynopsis section.

Note that a zero return does not guarantee that the output is valid DocBook. In some cases fixups by hand may be necessary.  
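A wrapper script might interpret these statuses along the following lines. The helper names are hypothetical, and the sketch assumes that a doclifter executable is on the PATH when lift is called.

```python
import subprocess

# Meanings taken from the status descriptions in this section.
STATUS_MEANING = {
    0: "translated (output may still need hand fixups)",
    1: "error reading or writing files",
    2: "some file or standard input could not be translated",
}

def describe_status(status: int) -> str:
    """Return a human-readable meaning for a doclifter exit status."""
    return STATUS_MEANING.get(status, "unexpected status %d" % status)

def lift(path: str) -> str:
    """Run doclifter on one file and describe the result (needs doclifter installed)."""
    proc = subprocess.run(["doclifter", path])
    return describe_status(proc.returncode)
```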

BUGS AND WARNINGS

Function synopsis translation in man pages is not implemented.

It is debatable how the man macros .HP and .IP without tag should be translated. We treat them as an ordinary paragraph break. We could visually simulate a hanging paragraph with list markup, but this would not be a structural translation.

Translating eqn delimiters as a comment pseudo-tag is cheesy, but the DocBook 4.1 DTD doesn't have any better alternatives. Neither Phrase nor Emphasis can occur inside footnotes.

There is a conflict between Berkeley ms's documented .P1 print-header-on-page request and an undocumented Bell Labs use for displayed program and equation listings. The ms translator chooses the Bell Labs interpretation because (a) it's structural, and (b) otherwise we'd have to throw out the paired .P2 request.

The crude treatment of conditionals relies on the assumption that conditional macros never generate structural or font-highlight markup that differs between the if and else branches. This appears to be true of all the standard macro packages, but if you roll any of your own macros you're on your own.

The Bell logo, pointing-left-hand, pointing-right-hand, and composition characters for large math brackets from old troff are not supported, as there are no ISO entity equivalents.  

REQUIREMENTS

doclifter was written in Python 2.2a1. It will not work under Python 1.5.2. It should work under Python 2.1, but this has not been tested.  

SEE ALSO

man(7), ms(7), me(7), troff(1).  

AUTHOR

Eric S. Raymond <esr@thyrsus.com>


 

