- (Message inbox:31)
- Return-Path: comp.mail.sendmail
- From: rickert@mp.cs.niu.edu (Neil Rickert)
- Date: Mon, 29 Apr 1991 14:40:19 GMT
- Newsgroup: comp.mail.sendmail/2224
- Message-Id: <1991Apr29.144019.22206@mp.cs.niu.edu>
- Subject: Sendmail address rewrite sequence (was Re: From: , relaying and FAQ)
-
- ----------------------------
-
- ADDRESS REWRITING IN SENDMAIL.
-
- Prepared by Neil Rickert, Northern Illinois University, Apr 29, 1991.
- <rickert@cs.niu.edu>
-
- The author of this document has no relation whatsoever with the developers of
- sendmail at Berkeley. Consequently any errors are in no way the fault of
- Berkeley. No guarantee is made as to the accuracy of this document. Those
- who wish guaranteed accurate information must read the source code themselves.
-
- Any comments or corrections are welcome.
-
- -------------------------
-
- This description is intended to be a more thorough description of address
- parsing than is contained in the usual manual. Some basic familiarity with
- the manual is assumed. In particular, we do not attempt to explain the
- rewriting which occurs in a single rule, or the process in which an address
- passes from one rule to the next.
-
- The description contains a few oversimplifications, such as omitting mention
- of some special cases during internal processing of alias expansions. These
- simplifications are not directly related to the main issues covered.
-
- I make no guarantees as to the accuracy of this document. It is based on
- experience with developing and debugging rulesets, and with reading parts of
- the code. It mostly reflects my understanding of the status of sendmail based
- on the 5.65+IDA versions but with some cross checks with the Berkeley 5.65
- code.
-
- 1. A BRIEF OVERVIEW OF THE REWRITING PROCESS.
-
- Each address is initially rewritten by ruleset 3. Thereafter the
- processing depends on whether this is a recipient address or a sender
- address. A sender address is processed by ruleset 1, then by the ruleset
- declared in the mailer, and finally by ruleset 4. A recipient address,
- after processing by ruleset 3, is normally processed by ruleset 2, the
- ruleset declared in the mailer, and by ruleset 4.
-
- The above description is, of course, a gross oversimplification. We shall
- fill in some of the details below.
-
- 2. BASIC ADDRESS PARSING.
-
- In order to send a message, the recipient address must be parsed, so as to
- determine the transport mechanism and the next step in the chain of mail
- relays. This is handled in the function parseaddr(), which is in
- parseaddr.c
-
- The basic parsing strategy is as follows:
-
- As with all rewriting, the address is first rewritten by ruleset 3. The
- result of this rewrite is now rewritten by ruleset 0. The purpose of
- ruleset 0 is to resolve the address into a triple: Mailer, Host,
- User_address. This resolution occurs when the left hand side matches a
- suitable rule, and the RHS is rewritten as:
-
- $# MAILER_NAME $@ HOST_NAME $: USER_ADDRESS
-
- Apart from the exceptions noted below, every address must resolve into
- such a triple. If not, 'sendmail' will complain. We should clarify,
- however, that the user address is allowed to contain a host component.
- Thus a UUCP address of john@uunet.UUCP might resolve to:
- $# UUCP $@ uunet $: john
- while uunet!seismo!harry might resolve to
- $# UUCP $@ uunet $: seismo!harry
- (In all the above examples the spacing is for readability. We will comment
- more on spacing when we discuss the function prescan() later in this
- document).
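-
- As a rough illustration only, the following Python sketch splits a resolved
- address of the form shown above into its three parts. The function name
- parse_triple is made up for this example; 'sendmail' itself works on
- tokens rather than on a string, so this is not how the real code does it.
-
```python
import re

def parse_triple(resolved):
    """Split '$# MAILER $@ HOST $: USER' into (mailer, host, user).

    The '$@ HOST' part is optional; host is None when it is absent.
    """
    pattern = r"^\s*\$#\s*(.*?)\s*(?:\$@\s*(.*?)\s*)?\$:\s*(.*?)\s*$"
    m = re.match(pattern, resolved)
    if m is None:
        raise ValueError("address did not resolve to a mailer: %r" % resolved)
    return m.group(1), m.group(2), m.group(3)

print(parse_triple("$# UUCP $@ uunet $: seismo!harry"))
# ('UUCP', 'uunet', 'seismo!harry')
```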
-
- It should also be clarified that the HOST_NAME as returned by ruleset 0
- does not necessarily have anything to do with the domain component of the
- original address. It is the name of the host to which the message will
- be next sent as known to the transmission software. For the tcp mailers,
- this is the fully qualified domain name. For other mailers it may be
- different. In our example above, although the fully qualified uucp name
- of uunet may be uunet.UUCP, the name known to the uucp software is just
- uunet, so that must be the host name returned from ruleset 0.
-
- There are two exceptions to the above description: the local mailer and
- the ERROR mailer. Both of these must resolve to just MAILER_NAME and
- USER_ADDRESS, since a host does not make sense. (The IDA versions of
- 'sendmail' permit the use of a triple with the local mailer, but ignore
- the host). In the case of the ERROR mailer, the USER_ADDRESS is actually
- the error message. Thus one could imagine a rewrite rule such as the
- following in ruleset 0:
-
- R$+@annex1.$D $#ERROR $:terminal servers don't accept mail
-
- It is a basic principle of 'sendmail' that ruleset 0 must select a mailer,
- even if only the ERROR mailer.
-
- An address which resolves to the ERROR mailer is said to be 'unparseable'
- and normally leads to bounced mail.
-
- 3. DIFFERENT CLASSES OF ADDRESSES: ENVELOPES AND HEADERS.
-
- 'sendmail' must deal with addresses on the message envelope, and addresses
- on the headers inside the message. The distinction between envelope and
- header is quite important for an understanding of how 'sendmail'
- functions. However it can be confusing at first.
-
- In basic terms, the "header recipients" are the addresses on the "To:" and
- "Cc:" header lines. (We could include the "Bcc:" header, but this is
- usually discarded.) The "header senders" are the addresses on the "From:"
- and the "Reply-To:" headers. Sometimes there may also be "Resent-To:",
- "Resent-Cc:", "Resent-From:" headers with additional header addresses.
-
- The "envelope recipients" are the addresses the message is really being
- sent to. They are often different from the "header recipients". The
- "envelope sender" is the real sender of the message, and is occasionally
- different from the "header senders".
-
- To the novice it is often not obvious why there need be a distinction between
- header and envelope addresses. When a typical message originates the
- envelope and header addresses are the same. However much can change
- during processing. A message might be sent to two recipients A and B on
- different machines, host-A and host-B. Initially both addresses are in
- the envelope. However when the message is relayed to host-A, only
- recipient A remains in the envelope. Likewise the copy of the message
- sent to host-B will contain only B in the envelope. Roughly speaking, as
- a message is delivered to a recipient address, that address is deleted
- from the envelope. However all recipient addresses remain in the headers,
- as these provide documentation to the human reader of the message as to
- whom the message was sent.
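-
- The two-recipient example can be pictured with a small Python sketch. The
- helper split_envelope is hypothetical and assumes every recipient is
- written as user@host; it only shows that the envelope is divided per
- destination while the headers travel unchanged with every copy.
-
```python
from collections import defaultdict

def split_envelope(recipients):
    """Group envelope recipients (assumed to be user@host) by host."""
    per_host = defaultdict(list)
    for rcpt in recipients:
        _user, _, host = rcpt.rpartition("@")
        per_host[host].append(rcpt)
    return dict(per_host)

headers = {"To": "A@host-A, B@host-B"}   # identical on every copy
envelope = ["A@host-A", "B@host-B"]      # as originally submitted
copies = split_envelope(envelope)
print(copies)
# {'host-A': ['A@host-A'], 'host-B': ['B@host-B']}
```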
-
- As another example, you may have a '.forward' file in your home directory
- to forward your mail. The forwarding action applies only to the envelope
- recipients of a message, while the addresses on the header are not
- affected.
-
- The distinction between header senders and envelope sender is sometimes
- harder to explain. We start with an example unrelated to computing.
- Suppose as a gift you purchase for a friend a subscription to "Scientific
- American". As part of the process you fill out a gift card which the
- publishers will send. When the gift card arrives, the card says it is
- from you. However the return address on the envelope is that of
- Scientific American. Essentially you were the header sender, but
- Scientific American was the envelope sender. For a comparable example in
- computing, suppose you post an article in Usenet news. This article comes
- from you, and this is shown in the "From:" header. However at some other
- site, a copy of your article is mailed to someone without direct Usenet
- news access. The Usenet news system on the mailing host is the real
- sender, or envelope sender, while as the author of the article you remain
- the header sender.
-
- In simple terms, the addresses in the headers are for humans to read,
- while the addresses in the envelope are for machines to read.
-
- 4. WHERE DO ENVELOPES COME FROM AND WHERE DO THEY GO.
-
- When mail is sent or received over the network with SMTP, the envelope
- sender is transmitted first in the 'MAIL FROM:' SMTP command. Then a
- sequence of 'RCPT TO:' commands transfers the envelope recipients.
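-
- On the wire the two command verbs are written 'MAIL FROM:' and 'RCPT TO:'
- (RFC 821). The Python sketch below only formats the envelope part of such
- a dialogue; the function name and the sample addresses are invented for
- illustration, and no network connection is made.
-
```python
def smtp_envelope_commands(sender, recipients):
    """Return the SMTP command lines that carry the envelope."""
    lines = ["MAIL FROM:<%s>" % sender]
    lines += ["RCPT TO:<%s>" % rcpt for rcpt in recipients]
    return lines

for line in smtp_envelope_commands("sender@host-S", ["A@host-A", "B@host-B"]):
    print(line)
# MAIL FROM:<sender@host-S>
# RCPT TO:<A@host-A>
# RCPT TO:<B@host-B>
```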
-
- When you send a normal message on your system, the Mail User Agent (MUA)
- invokes sendmail as:
-
- /usr/lib/sendmail recipient1 recipient2 recipient3 ...
-
- Thus the envelope recipients are passed as parameters on the argument
- list. The envelope sender, in that case, is taken from the uid of the
- person sending the message. It is also possible to invoke sendmail as
-
- /usr/lib/sendmail -fsender recipient1 recipient2 recipient3 ...
-
- and thereby give a separate envelope sender. However sendmail normally
- will ignore the '-fsender' operand unless given by a trusted user such as
- 'root' or 'uucp' or 'daemon'.
-
- When 'sendmail' invokes /bin/mail for delivery to your mailbox, it uses a
- parameter list something like:
-
- /bin/mail -r sender -d recipient1 recipient2 recipient3 ...
-
- Thus the envelope information is passed along to /bin/mail in a manner
- similar to the way 'sendmail' received it. If mail is sent out through
- UUCP, the envelope sender is recorded on the unix "From " line which is
- the first line of the message, and the envelope recipients become the
- command operands to 'rmail' on the remote system.
-
- 5. TOKENIZING AND REBUILDING ADDRESSES. prescan() AND cataddr().
-
- Before an address is processed by one of the rewrite rulesets, it must be
- tokenized. The rewriting rules only examine complete tokens and strings
- of tokens during their matching. After the rewriting is complete the
- tokenized address must be converted back to a character string. The
- function prescan() does the tokenizing, while cataddr() converts a
- tokenized address back to a character string.
-
- The output of prescan() is an array of strings, with one array entry for
- each token. The tokenizing is done based on the characters in $o, defined
- in the line beginning 'Do' in 'sendmail.cf'. There are, additionally, a
- few characters whose special handling is built into prescan().
-
- In an address such as the following (from RFC822):
-
- Wilt . (the Stilt) Chamberlain@NBA.US
-
- the parenthesized comments "(the Stilt)" are filtered out by prescan().
- (However when such an address appears on a header, the comments are saved
- for reinsertion after the address has been rewritten). prescan() also
- insists on properly balanced parentheses and properly balanced <angle
- brackets>. A string between "double quotes" becomes a single token. With
- one exception, an escaped character (such as \@) loses any special
- properties. Thus the address 'user@host' would ordinarily become the
- three tokens "user", "@", "host", while the address 'user\@host' becomes a
- single token. The one exception to the backslash escaping is \!, which is
- simply converted to !. The special handling is an attempt to be somewhat
- forgiving to csh users who sometimes become overly zealous in escaping
- their bangs.
-
- In tokenizing, special characters (those characters defined in $o) become
- single character special tokens. A string of ordinary characters becomes
- a single token. The space character, however, is never part of a token
- except when in a quoted string or when backslash escaped. The space
- character is a token separator. Thus the string AB becomes one token "AB",
- while the string A B becomes two tokens "A", "B". A space before or after
- a special character, however, is completely superfluous. Thus user @ host
- is tokenized as "user","@","host", exactly as would be user@host
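-
- The tokenizing rules just described can be approximated by the Python
- sketch below. It is a simplification and not a transcription of prescan():
- angle brackets are not handled, comments are simply dropped rather than
- saved, and the default value of 'specials' is only an example standing in
- for $o.
-
```python
def prescan(address, specials=".:%@!^=/[]"):
    """Simplified tokenizer in the spirit of prescan(); see caveats above."""
    tokens, word, depth, i = [], "", 0, 0

    def flush():
        nonlocal word
        if word:
            tokens.append(word)
            word = ""

    while i < len(address):
        c = address[i]
        if depth:                           # inside a (comment): discard
            depth += (c == "(") - (c == ")")
        elif c == "(":
            flush()
            depth = 1
        elif c == ")":
            raise ValueError("unbalanced parenthesis")
        elif c == "\\" and i + 1 < len(address):
            if address[i + 1] == "!":       # \! is simply converted to !
                i += 1
                continue
            word += address[i:i + 2]        # escaped char becomes ordinary
            i += 1
        elif c == '"':                      # quoted string stays in one token
            end = address.find('"', i + 1)
            if end < 0:
                raise ValueError("unbalanced quote")
            word += address[i:end + 1]
            i = end
        elif c.isspace():                   # space separates, never a token
            flush()
        elif c in specials:                 # each special is its own token
            flush()
            tokens.append(c)
        else:
            word += c
        i += 1
    if depth:
        raise ValueError("unbalanced parenthesis")
    flush()
    return tokens

print(prescan("user @ host"))     # ['user', '@', 'host']
print(prescan("A B"))             # ['A', 'B']
```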
-
- The function cataddr() sounds simple. Just concatenate all the tokens to
- form a new string. This is almost what it does. But you have to make a
- special case where there are two ordinary tokens. If 'A B' is tokenized
- as "A","B", then just concatenating the strings would produce the
- incorrect 'AB'. Instead, when cataddr() discovers two consecutive
- ordinary tokens, it inserts between them the space substitute character
- defined on the 'OB' line of 'sendmail.cf'. In most versions of
- 'sendmail.cf' this character is the period '.' leading to the effect that
- the original 'A B' was tokenized to "A","B", then is untokenized to 'A.B'
- If you don't like this, you can define the space substitute as a blank.
- If you do so, but forward the message to another 'sendmail' it will
- probably be converted to a period again. (One problem with defining the
- space substitute as a blank is that this can easily become invisible in
- 'sendmail.cf', and if it is accidentally deleted you might find you are
- accidentally defining the space substitute to be '\n' or '\0').
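-
- A minimal Python sketch of the joining step follows. It assumes a token
- is "special" when it is a single character from $o (represented here by
- the example string 'specials'), and that the space substitute is a period
- unless told otherwise. It is meant to show the idea, not the real code.
-
```python
def cataddr(tokens, specials=".:%@!^=/[]", space_sub="."):
    """Join tokens, putting space_sub between two adjacent ordinary tokens."""
    out = []
    prev_ordinary = False
    for tok in tokens:
        ordinary = not (len(tok) == 1 and tok in specials)
        if ordinary and prev_ordinary:
            out.append(space_sub)
        out.append(tok)
        prev_ordinary = ordinary
    return "".join(out)

print(cataddr(["user", "@", "host"]))       # user@host
print(cataddr(["A", "B"]))                  # A.B
print(cataddr(["A", "B"], space_sub=" "))   # A B
```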
-
-
- 6. THE C-FLAG PROCESSING.
-
- Certain mailers have the C flag set in their mailer definition. It
- affects processing of addresses as follows:
-
- After an address has been rewritten by ruleset 3, a check is done to see
- if the address now contains an "@". If it already contains an "@", then
- C-FLAG processing has no effect. If there is no "@", the 'receiving
- mailer' is checked. (The determination of the receiving mailer is
- described below). If the receiving mailer has the C-FLAG defined, and if
- the sender address in $f contains an '@', everything from the '@' onward
- in $f is appended to the end of the address as outputted by ruleset 3, and
- the modified address is now reprocessed by ruleset 3. The actual check
- for the '@' in $f is done while the sender address is still tokenized, so
- there is no additional call to prescan() in the C-FLAG processing.
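-
- The decision can be sketched in Python as follows. Addresses are lists of
- tokens, as they would be after ruleset 3; the function name apply_c_flag
- and the flag string in the example are hypothetical. The caller would pass
- the returned tokens through ruleset 3 again whenever they were changed.
-
```python
def apply_c_flag(tokens, sender_tokens, mailer_flags):
    """Append the sender's '@...' tail when C-FLAG processing applies."""
    if "@" in tokens:                    # already has an '@': no effect
        return tokens
    if "C" not in mailer_flags:          # receiving mailer lacks the C flag
        return tokens
    if "@" not in sender_tokens:         # $f has no '@' to borrow
        return tokens
    tail = sender_tokens[sender_tokens.index("@"):]
    return tokens + tail

sender = ["mary", "@", "cs", ".", "niu", ".", "edu"]
print(apply_c_flag(["john"], sender, "mDFMuXC"))
# ['john', '@', 'cs', '.', 'niu', '.', 'edu']
```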
-
-
- 7. MAILER-SPECIFIC RULESETS.
-
- A typical mailer definition looks something like the following:
-
- Mlocal, P=/bin/mail, F=MFlusr, S=10, R=12, A=mail -d $u
-
- Here the S= and R= operands specify address rewrite rulesets to be used
- for mail sent by this mailer. The S= operand is for sender addresses, and
- the R= is for recipient addresses. We shall refer to these as the
- mailer-specific rulesets. In the IDA version of sendmail, these can
- optionally be specified as, say, S=13/15. This would use ruleset 13 for envelope
- sender addresses and ruleset 15 for header sender addresses. The simple
- definition S=10 is equivalent to S=10/10.
-
- If the mailer specific ruleset is omitted, or is defined as 0, this means
- that there is no mailer-specific ruleset. Ruleset 0 itself is never used
- as a mailer-specific ruleset.
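-
- To make the S= and R= conventions concrete, here is a small Python sketch
- that reads a mailer definition line. The function parse_mailer is invented
- for this document; it splits naively on commas (so an A= value containing
- a comma would confuse it) and returns each of S and R as an
- (envelope, header) pair, with 0 meaning "no mailer-specific ruleset".
-
```python
def parse_mailer(line):
    """Parse 'Mname, K=value, ...' into (name, fields)."""
    name, *parts = [p.strip() for p in line[1:].split(",")]
    fields = {}
    for part in parts:
        key, _, value = part.partition("=")
        fields[key] = value
    for key in ("S", "R"):
        env, _, hdr = fields.get(key, "0").partition("/")
        fields[key] = (int(env), int(hdr or env))   # S=10 means S=10/10
    return name, fields

name, fields = parse_mailer("Mlocal, P=/bin/mail, F=MFlusr, S=10, R=12, A=mail -d $u")
print(name, fields["S"], fields["R"])
# local (10, 10) (12, 12)
```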
-
- 8. A MORE DETAILED LOOK AT SENDMAIL PROCESSING.
-
- 8a. Parsing the sender address.
-
- One of the first steps is to parse the envelope sender address. This
- follows the procedure of address parsing as described above.
-
- prescan()
- rewrite with ruleset 3.
- rewrite with ruleset 0.
- rewrite the user address portion with ruleset 2, the mailer specific
- ruleset, and ruleset 4.
- process the output of ruleset 4 with cataddr()
-
- If the address proves unparseable a message is written to the log, and the
- address 'Postmaster' is parsed in its place. One effect is that should
- the mail be undeliverable, and the sender address is unparseable, any
- bounced mail will in this case be delivered to Postmaster.
-
- The mailer returned from ruleset 0 is called the receiving mailer, and is
- examined for the presence of a C flag as discussed above in our
- description of C-FLAG processing.
-
- If the receiving mailer is the local mailer, the sender is assumed local.
- In that case the user address is looked up in the password file to find
- the full name (for possible use on the 'From:' header) and the home
- directory (in case the message should be stored in dead.letter).
-
- 8b. Defining $f.
-
- The sender address is processed with
-
- prescan()
- rulesets 3,1,4
-
- Search for an '@' in the address, and save a copy of the portion of the
- tokenized address starting with the '@', in case needed for C-FLAG
- processing.
-
- apply cataddr() to the output of ruleset 4, and save as the value of $f
-
- NOTE: Even if the original address was determined to be unparseable in
- step 8a, and replaced by Postmaster, it is still the original address and
- not 'Postmaster' which is used for defining $f. However there is one
- anomaly here. If the message cannot be delivered immediately but must be
- queued, and if the original sender address was unparseable, the original
- sender address is not saved in the queue file, so the sender will become
- 'Postmaster' in that case.
-
- 8c. Building the recipient list.
-
- Each envelope recipient address is now added to an internal list of
- recipients. Before an address is added to the recipient list, it goes
- through the following procedures:
-
- prescan()
- rewrite with ruleset 3
- rewrite with ruleset 0
- The mailer and host returned are saved. The user portion is further
- processed:
- rewrite with ruleset 2
- rewrite with the mailer specific ruleset (for the mailer returned by
- ruleset 0)
- rewrite with ruleset 4.
- cataddr()
-
- Next the recipient list is searched to see if this address already
- appears. The search is based on a comparison of the mailer returned by
- ruleset 0, the host returned by ruleset 0 (except the host is ignored if
- the 'l' flag is set for the mailer), and the user name as output from
- cataddr(). If the address is a duplicate it is not actually added to the
- recipient list.
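-
- The duplicate test can be sketched in Python as below. The names are
- hypothetical and the comparison is a plain equality test on strings, which
- may well differ from the comparison the real code performs (for example
- with respect to upper and lower case).
-
```python
def recipient_key(mailer, host, user, mailer_flags):
    """Key for duplicate detection; host is ignored if the 'l' flag is set."""
    return (mailer, None if "l" in mailer_flags else host, user)

def add_recipient(rlist, seen, mailer, host, user, mailer_flags):
    """Append to rlist unless an equal key was seen; return True if added."""
    key = recipient_key(mailer, host, user, mailer_flags)
    if key in seen:
        return False
    seen.add(key)
    rlist.append((mailer, host, user))
    return True

rlist, seen = [], set()
print(add_recipient(rlist, seen, "local", "hostA", "john", "lsDFM"))   # True
print(add_recipient(rlist, seen, "local", "hostB", "john", "lsDFM"))   # False
```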
-
- Next the mailer is checked to see if it is the local mailer. If so, the address
- is looked up in the aliases database. If there is an aliases entry the
- current address is flagged QDONTSEND so that mail will not be sent to it,
- and each entry in the alias expansion recursively goes through the same
- process for adding to the recipient list. An additional flag is set for
- aliases to indicate they are indeed aliases.
-
- Next, if the mailer is the local mailer, and if the QDONTSEND flag is not
- set, there is a test to see if the user address begins with '|' after
- removal of quotes. If so, and if the address is from an alias expansion,
- the mailer is changed to the 'prog' mailer, and the initial '|' is
- removed.
-
- Next, if the mailer is local, the name is looked up in /etc/passwd, and
- the home directory is searched for a '.forward' file which, if found, is
- processed in much the same way as an aliases entry.
-
- If at any stage something goes wrong, the address is flagged as bad for a
- later bounce message.
-
- 8d. Beginning the delivery phase.
-
- Once the recipient list is complete, sendmail is ready to attempt
- delivery. This involves running down the recipient list. As an address
- is selected for a delivery attempt, it is marked QDONTSEND which is
- approximately the equivalent of deleting it from the recipient list. This
- is to ensure it will not be sent twice.
-
- Once an address is found for a delivery attempt, a check is made to see if
- the 'm' flag is set in the mailer. If so, sendmail will attempt to send
- to as many recipients with the same mailer/host combination as possible in
- a single operation.
-
- If delivery requires sending to several hosts, the next few steps will be
- repeated several times.
-
- 8e. Determining the envelope sender for delivery.
-
- Start with the expansion of $f
- prescan()
- rewrite with rulesets 3, C-FLAG processing, 1, the mailer specific
- ruleset, and 4.
- cataddr()
- The result is assigned to $g
-
- Note that, because of the way $f was originally determined, this means the
- original incoming address has been processed by 3,1,4,3,1,mailer
- specific,4
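-
- The accumulated sequence can be reproduced with a few lines of Python.
- Each ruleset is modelled as a function from string to string; the
- do-nothing rulesets below are placeholders, the prescan()/cataddr() and
- C-FLAG steps are left out, and "S" stands for the mailer specific sender
- ruleset.
-
```python
def rewrite_chain(address, rulesets, order):
    """Apply the named rulesets in order and return the final address."""
    for name in order:
        address = rulesets[name](address)
    return address

F_ORDER = ["3", "1", "4"]          # how $f is built (section 8b)
G_ORDER = ["3", "1", "S", "4"]     # how $g is built from $f (section 8e)

trace = []
def make_ruleset(name):
    def ruleset(address):
        trace.append(name)         # record the call, leave address unchanged
        return address
    return ruleset

rulesets = {name: make_ruleset(name) for name in ("3", "1", "S", "4")}
rewrite_chain("john@uunet.UUCP", rulesets, F_ORDER + G_ORDER)
print(",".join(trace))
# 3,1,4,3,1,S,4
```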
-
- 8f. Determining the command.
-
- The command (the P= and A= operands) from the mailer definition is now
- expanded, with $h evaluating to the host. If there are multiple
- recipients, and $u is in the last argument, multiple such arguments are
- created, one for each recipient for this delivery transaction. If there
- is no $u in the argument list, the SMTP protocol is used instead.
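-
- A Python sketch of the argument expansion follows. It is an illustration
- under simplifying assumptions: the A= string is split on white space, only
- $h and $u are expanded, and the case of $u appearing somewhere other than
- the last argument is not covered. The function name build_argv is made up.
-
```python
def build_argv(template, host, recipients):
    """Expand an A= template; return None when there is no $u (use SMTP)."""
    args = [a.replace("$h", host) for a in template.split()]
    if not any("$u" in a for a in args):
        return None                          # no $u anywhere: SMTP instead
    if "$u" not in args[-1]:
        raise NotImplementedError("sketch only covers $u in the last argument")
    # One copy of the last argument per recipient of this transaction.
    return args[:-1] + [args[-1].replace("$u", r) for r in recipients]

print(build_argv("mail -d $u", "localhost", ["john", "mary"]))
# ['mail', '-d', 'john', 'mary']
```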
-
- For SMTP use only, the recipient address is now processed by
- prescan(), ruleset 3, C-FLAG processing, 2, mailer specific, 4, cataddr().
- The IDA versions, however, do not do this additional rewriting step which
- seems superfluous, and which could conceivably cause mailing loops because
- of the C-FLAG processing.
-
- 8g. Header rewriting.
-
- The headers are now sent to the mailer program as part of the message.
- Before they are sent, they are rewritten as described below. If the required
- 'From:' header does not exist it is created from the definition in
- 'sendmail.cf'. If the required 'To:' is missing an 'Apparently-To:'
- header is created.
-
- A special word on the 'From:' header. If the incoming message has a
- 'From:' header whose contents are identical in all respects (except
- leading and trailing white space) to the envelope sender, that 'From:'
- header is deleted. The assumption is that it will be recreated with a
- chance of adding the full name.
-
- 8h. Header sender rewriting.
-
- The address is extracted from the 'From:' header and other similar headers
- such as 'Reply-To:' and 'Resent-From:', carefully saving the comments for
- later use. The address is processed with
- prescan(), 3, C-FLAG processing, 1, Mailer specific, 4, cataddr().
-
- The IDA versions use ruleset 5 in place of ruleset 1. This is part of the
- IDA strategy of allowing a distinction between the formatting of headers
- and the formatting of the envelope.
-
- If there was no "From:", or if it was deleted and must be recreated, the
- usual definition of "From:" in most versions of 'sendmail.cf' begins with
- $g, or with $q which is defined in terms of $g. In that case, and
- remembering how $f and then $g are determined, the address on the 'From:'
- goes through the following steps:
-
- incoming envelope sender
- prescan(), rulesets 3,1,4, cataddr()
- prescan(), rulesets 3, C-FLAG, 1, Mailer specific, 4, cataddr()
- prescan(), rulesets 3, C-FLAG, 1, Mailer specific, 4, cataddr()
-
- You will note the large amount of redundancy. If designing rulesets you
- must keep this in mind. In particular you should be wary of approaches
- which give different results depending on how many times the address is
- passed through rulesets 3,1,Mailer-specific,4. In the current IDA/NIU
- rulesets, the $q variable is defined in terms of $f instead of $g, in
- order to eliminate the most troublesome one of these extra rewrites, in
- which a header address is rewritten with a mailer specific ruleset
- intended for envelope addresses only.
-
- 8i. Recipient header rewriting.
-
- Each recipient address on a "To:" or "Cc:" or "Apparently-To:" or
- "Resent-To:" or "Resent-Cc:" header goes through the following steps:
-
- prescan(), rulesets 3, C-FLAG, 2, Mailer specific, 4, cataddr().
-
- In the above, the IDA versions use ruleset 6 in place of ruleset 2. Again
- this is to allow header addresses to be formatted differently from
- envelope addresses.
-
- 9. GENERAL COMMENTS ON DESIGNING RULESETS.
-
- If you plan on designing your own 'sendmail.cf', or modifying an existing
- one to add more functionality, here are some things to keep in mind:
-
- 9a. Remember the C-FLAG.
-
- Because of the C-FLAG processing, it is desirable that every address which
- contains a host name should contain an '@' by the end of ruleset 3, and
- addresses without a host name should not have one added in ruleset 3.
-
- This also means that if you want to rewrite 'user' as
- 'user@your.full.domain' you should not do it in ruleset 3, but should do
- it somewhat later.
-
- This means that an address like 'uunet!seismo!harry' needs to be converted
- to something like 'seismo!harry@uunet.UUCP' in ruleset 3, so as to ensure
- that there is an '@' in the address and C-FLAG processing doesn't
- incorrectly add another domain level.
-
- 9b. Make major rewriting steps reversible.
-
- Ideally any address processed by ruleset 3 followed by ruleset 4 should
- finish up in its original form. Anything non-reversible done in ruleset 3
- can never be fully compensated for later. This is difficult to do in
- practice, however. In many versions of sendmail.cf, both 'uunet!john' and
- 'john@uunet.uucp' will be rewritten as 'john<@uunet.UUCP>' in ruleset 3,
- so the original form cannot always be recovered by use of ruleset 4. What
- is probably more critical is that using rulesets 3,4,3,4 should yield the
- same result as just using 3,4. If your rulesets don't manage at least
- this degree of consistency you are likely to run into major problems.
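-
- One way to test for this degree of consistency is sketched below in
- Python. The functions toy3 and toy4 are toy stand-ins written for this
- example, not real rulesets; in practice one would feed the same addresses
- to 'sendmail' in address test mode and compare the results by hand.
-
```python
def is_stable(rs3, rs4, address):
    """True if rulesets 3,4,3,4 give the same result as just 3,4."""
    once = rs4(rs3(address))
    twice = rs4(rs3(once))
    return once == twice

def toy3(addr):
    """Toy ruleset 3: focus on the host as user<@host>."""
    if "!" in addr and "@" not in addr:
        host, _, user = addr.partition("!")
        return "%s<@%s.UUCP>" % (user, host)
    user, _, host = addr.partition("@")
    return "%s<@%s>" % (user, host) if host else addr

def toy4(addr):
    """Toy ruleset 4: remove the focus brackets."""
    return addr.replace("<", "").replace(">", "")

print(toy4(toy3("uunet!john")))              # john@uunet.UUCP
print(is_stable(toy3, toy4, "uunet!john"))   # True
```
-
- Note that the toy pair is stable without being reversible: 'uunet!john'
- comes back as 'john@uunet.UUCP', but a second pass changes nothing more.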
-
- The IDA/NIU rulesets are pretty close to the ideal that ruleset 4
- completely reverses ruleset 3. But to achieve this they use an internal
- form which only vaguely looks like the original address. Thus in these
- rulesets, ruleset 3 would rewrite 'uunet!john' as '<@uunet.UUCP>!john' while they
- would rewrite 'john@uunet.UUCP' as '<@uunet.UUCP>,john'. And, of course,
- ruleset 4 is rather more complex also. (This approach in IDA/NIU is not
- simply a matter of purism. It has to do with the need to be able to
- unambiguously merge two addresses in the pathalias file lookup.)
-
- 9c. Allow for the fact that sender addresses may be rewritten multiple times.
-
- The habit of sendmail of reprocessing sender addresses can cause some
- problems and result in incorrect addresses if you do not properly allow
- for it.
-
- 9d. Be particularly cautious with how you handle envelope recipients.
-
- Basically if you mess these up, the mail probably won't go anywhere, or
- won't go where you want it.
-
- As an example, suppose I have a uucp neighbor 'uuhost'. Suppose my
- neighbor wants all mail to leave in the format
- 'uuhost!user@my.full.domain'. Now it might be that I become a little too
- vigorous in my changes, and I even rewrite the addresses that way in mail
- sent to 'uuhost'. If this happens to a header address the problem is not
- very serious. At the worst, when someone on 'uuhost' sends a reply, the
- reply will be sent first to my system, then back to 'uuhost'. But if I
- make the same transformation to the envelope, I will cause a mailing loop.
- In other words, if mail I see destined for 'uuhost!user' is sent to
- 'uuhost' with the user address of 'uuhost!user@my.full.domain', the mailer
- on 'uuhost' is likely to just send it back. Then the same thing happens
- over and over again.
-
-
- --
- =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
- Neil W. Rickert, Computer Science <rickert@cs.niu.edu>
- Northern Illinois Univ.
- DeKalb, IL 60115 +1-815-753-6940
-