- From: Eliot Lear <lear@NET.BIO.NET>
-
- The following was written by Dr. Charles Hedrick of Rutgers University
- sometime in 1985. Please read it with the understanding that rule
- numbers are nothing more than function names. For further reference,
- I suggest the Sun Tutorial on Sendmail in their manuals.
- -eliot
-
- Command: followup
- Newsgroups: net.unix-wizards,net.mail
- To: steve@jplgodo.UUCP
- Subject: a brief tutorial on sendmail rules
- Distribution:
- References: <902@rlgvax.UUCP> <545@jplgodo.UUCP>
-
- A previous message suggested using "sendmail -bt" to see how sendmail
- is going to process an address. This is indeed a handy command for
- testing how an address will be processed. However the instructions
- given were not quite right. To see how sendmail is going to deliver
- mail to a given address, a reasonable thing to type is
- sendmail -bt
- 0,4 address
- Even this isn't quite right, but with "normal" rule sets it should work.
-
- Because there is so much confusion about sendmail rules, the rest of
- this message contains a brief tutorial. My own opinion of sendmail is
- that it is quite a good piece of work. Many people have complained
- about the difficulty of understanding sendmail rule sets. However I
- have also worked with mailers that code address processing directly
- into the program. I much prefer sendmail. The real problem is not
- with sendmail, but with the rules. The rules normally shipped from
- Berkeley have lots of code that does strange Berkeley-specific things,
- and they are not commented. Also, typical complex rule sets are
- trying to handle lots of things, forwarding mail among several
- different mail systems with incompatible addressing conventions. A
- rule set to handle just old-style (non-domain) UUCP mail would be very
- simple and easy to understand. But real rule sets are not doing
- simple things, so they are not simple.
-
- For those not familiar with sendmail, -bt invokes the rule tester. It
- lets you type a set of rule numbers and an address, and then shows you
- what the rules will do to that address. In addition, rule test mode
- automatically applies rule 3 before whatever rule you ask it to apply.
- As we will see shortly, this is a reasonable thing to do.
-
- Before describing the rule sets, let me define two terms: "header" and
- "envelope". Header refers to the lines at the beginning of the
- message, starting with "from:", "to:", "subject:", etc. Sendmail does
- process these lines. E.g. with uucp mail it will add its own host
- name at the beginning of the from line, so that the final recipient
- stands some chance of replying to the message. However sendmail
- normally does not depend upon the from and to lines to perform its
- actual delivery. It has more direct knowledge, passed on to it from
- the program that generated the mail, or if it came from another site,
- the mailer at that site. This information is referred to as the
- "envelope", since it is like the addresses on the outside of an
- envelope. For Arpanet mail, the envelope is passed to the next site
- by the MAIL FROM: and RCPT TO: commands. For UUCP mail, it is passed
- on as arguments to the remote rmail command. To see why there have to
- be separate addresses "on the envelope", consider what happens when
- you send mail to "john@vax, mary@sun". Two copies of the message will
- be dispatched, one to vax and the other to sun. The "to: " line in
- the headers will show both addresses. However the envelope will show
- only the address that this copy is meant to go to. The copy sent
- to vax will show "john@vax" and the copy sent to sun will show
- "mary@sun". If sendmail had to look at the "to: " line, it would
- never know which of the addresses shown there it was responsible for
- handling.
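
To make the header/envelope distinction concrete, here is a minimal Python sketch. It is not how sendmail is implemented; the function name envelope_copies and the grouping by the text after the last "@" are assumptions made purely for illustration.

```python
from collections import defaultdict

def envelope_copies(header_to):
    """Group header recipients by host: one outgoing copy per host.

    Each copy carries the full "to:" header, but its envelope lists only
    the addresses that copy is responsible for.
    """
    by_host = defaultdict(list)
    for addr in header_to:
        _, _, host = addr.rpartition("@")
        by_host[host].append(addr)
    return dict(by_host)

header_to = ["john@vax", "mary@sun"]
copies = envelope_copies(header_to)
for host, envelope in copies.items():
    print(f"copy for {host}: header to={header_to} envelope={envelope}")
```

Both copies print the same header list, while each envelope holds a single address.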
-
- Anyway, here is what the rules do:
-
- 3: always done first. This turns addresses from their normal textual
- form into a form that the rest of the rules understand. In most
- cases, all it does is put < > around the name of the host that is next
- in line. Thus foo@bar turns into foo<@bar>. However it also does a
- few transformations. E.g. it turns foo!bar!user into
- bar!user<@foo.UUCP>. Since sendmail accepts either ! syntax or
- @....UUCP syntax, rule 3 standardizes on @ syntax. It also does a few
- other minor things. But you won't be far off if you just think of it
- as adding < > around the host name.
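
As a rough illustration of what rule 3 does to the two address shapes above, here is a small Python sketch. The function name focus is made up, and real rule sets handle many more cases; this only mimics the two transformations just described.

```python
def focus(addr):
    """Crude imitation of rule 3: put < > around the next host in line."""
    if "!" in addr:
        # UUCP path: the first component is the next host.
        host, rest = addr.split("!", 1)
        return f"{rest}<@{host}.UUCP>"
    if "@" in addr:
        # user@host: the part after the last @ is the host.
        user, host = addr.rsplit("@", 1)
        return f"{user}<@{host}>"
    return addr  # plain local name, nothing to mark

print(focus("foo@bar"))       # foo<@bar>
print(focus("foo!bar!user"))  # bar!user<@foo.UUCP>
```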
-
- 4: always done last. This turns addresses from internal form back
- into external form. It removes the < > around the host name, and
- turns foo@bar.UUCP back into bar!foo. Again, there are one or two
- other minor things, but you won't be too far off if you think of 4 as
- just removing the < > around the host name.
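
The reverse direction can be sketched the same way. Again this is only an approximation in Python with a made-up function name (defocus); it covers just the two cases mentioned for rule 4.

```python
import re

def defocus(addr):
    """Crude imitation of rule 4: internal form back to external form."""
    m = re.fullmatch(r"(.+)<@([^<>]+)\.UUCP>", addr)
    if m:
        # user<@host.UUCP> goes back to host!user
        return f"{m.group(2)}!{m.group(1)}"
    # otherwise just drop the < > markers
    return addr.replace("<", "").replace(">", "")

print(defocus("foo<@bar>"))       # foo@bar
print(defocus("foo<@bar.UUCP>"))  # bar!foo
```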
-
- 0: This is the rule that handles the destination address on the
- envelope. It is in some sense the primary rule. It returns a triple:
- protocol, host, user. The protocol is usually one of local, TCP, or
- UUCP. At the moment, it figures this out syntactically. In our rule
- set, hosts ending in .UUCP are handled by UUCP, the current host is
- local, and everything else is TCP. As domains are integrated into
- UUCP, obviously this rule is going to change. This rule does very
- little other than simply look at the format of the host name, though
- as usual a few other details are involved (e.g. it removes the local
- host, so myhost!foo!bar will be sent directly to foo).
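
The purely syntactic decision made by rule 0 can be sketched as follows. This is an illustration in Python, not sendmail code: the function name resolve and the constant LOCAL_HOST are assumptions, and the stripping of the local host from a UUCP path is left out.

```python
LOCAL_HOST = "topaz"  # assumed name of the current host

def resolve(focused):
    """Return (protocol, host, user) for an address already processed by rule 3."""
    user, sep, rest = focused.partition("<@")
    if not sep:
        # no host part at all: deliver locally
        return ("local", LOCAL_HOST, focused)
    host = rest[:-1] if rest.endswith(">") else rest
    if host.endswith(".UUCP"):
        return ("UUCP", host[:-len(".UUCP")], user)
    if host == LOCAL_HOST:
        return ("local", host, user)
    return ("TCP", host, user)

print(resolve("bar!user<@foo.UUCP>"))  # ('UUCP', 'foo', 'bar!user')
print(resolve("user<@red>"))           # ('TCP', 'red', 'user')
```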
-
- 1 and 2 are protocol-independent transformations used for sender and
- recipient lines in the header (i.e. from: and to: lines). In our
- rule sets, they don't do anything.
-
- Each protocol has its own rules to use for sender and recipient lines
- in the header. E.g. UUCP rules might add the local host name to the
- beginning of the from line and remove it from the to line. In our
- rule set, the complexities in these rules are primarily caused by
- forwarding between UUCP and TCP. The line that defines the mailer for
- a protocol lists the rules to use for sender and recipient, in its S=
- and R= fields.
-
- Finally, here is the exact sequence in which these rules are used.
- For example, the first line means that the destination specified in
- the envelope is processed first by rule 3, then rule 0, then rule 4.
-
- envelope recipient: 3,0,4    [actually rule 4 is applied only to the
-                              user name portion of what rule 0 returns]
- envelope sender:    3,1,4
- header recipient:   3,2,xx,4 [xx is the rule number specified in R=]
- header sender:      3,1,xx,4 [xx is the rule number specified in S=]
-
- I have the impression that the sender from the envelope (the
- return-path) may actually get processed twice, once by 3,1,4 and the
- second time by 3,1,xx,4. However I'm not sure about that.
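
The table above can be written down as a small lookup, which is sometimes handy when working out by hand what to type at the rule tester. This is a Python sketch; the function name rule_sequence is made up, and the mailer_rule argument stands for whatever number the mailer definition gives in S= or R=.

```python
def rule_sequence(kind, mailer_rule=None):
    """Return the list of rule set numbers applied to one kind of address."""
    if kind == "envelope recipient":
        return [3, 0, 4]
    if kind == "envelope sender":
        return [3, 1, 4]
    if kind == "header recipient":   # mailer_rule is the mailer's R= value
        return [3, 2, mailer_rule, 4]
    if kind == "header sender":      # mailer_rule is the mailer's S= value
        return [3, 1, mailer_rule, 4]
    raise ValueError(f"unknown address kind: {kind}")

print(rule_sequence("envelope recipient"))  # [3, 0, 4]
print(rule_sequence("header sender", 13))   # [3, 1, 13, 4]
```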
-
- Now for the format of the rules themselves. I'm just going to show
- some examples, since sendmail comes with a reference manual, which you
- can refer to. However these examples are probably enough to let you
- understand any set of rules that makes sense in the first place (which
- the normal rules do not). This example is from our UUCP definition.
- It is a simplified version of the set of rules used to process the sender
- specification. As such, the major thing it has to do is to add our
- host name to the beginning, so that the guy at the end will know that
- the mail went through us.
-
- S13
- R$+<@$-.UUCP>	$2!$1		u@host.UUCP => host!u
- R$=U!$+	$2		strip local name
- R$+		$:$U!$1		stick on our host name
-
- Briefly, the first rule turns the address from the form foo<@bar.UUCP>
- back into bar!foo. The second rule removes our local host name, if
- it happens to be there already, so we don't get it twice. The third
- rule adds our host name to the beginning.
-
- S13 says that this is the beginning of a new rule set, number 13.
-
- R$+<@$-.UUCP>	$2!$1		u@host.UUCP => host!u
-
- R says that this is a rule. The thing immediately after it,
- $+<@$-.UUCP> is a pattern. If this pattern matches the address, then
- the rule "triggers". If the rule triggers, the address is replaced
- with the "right hand side", i.e. what is after the tab(s). In this
- rule, the right hand side is $2!$1. The thing after the next tab(s) is
- a comment. This rule is used in processing UUCP addresses. As noted
- above, by the time we get to it, rule 3 has already been applied. So
- if we had a UUCP address of the form host1!host2!user, it would now be
- in the form host2!user<@host1.UUCP>. This does match the pattern:
-
- $+        <@$-   .UUCP>
- host2!user<@host1.UUCP>
-
- $+ and $- are "wildcards" that match anything. $- will match exactly
- one word, while $+ will match one or more. (By the way, with the
- increasing use of domains, this production should probably use
- $+.UUCP, not $-.UUCP.) Since the pattern matches, we replace this
- with the "right hand side" of the rule, $2!$1. $ followed by a digit
- means the Nth thing matched by a wildcard. In this case there were
- two wildcards, so
- $1 = host2!user
- $2 = host1
- The final result is
- host1!host2!user
- As you can see, we have simply turned UUCP addresses from the format
- produced by rule 3 back into normal ! format.
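
For readers who think in regular expressions, the first rule can be approximated as below. This is only an analogy, written in Python: sendmail matches whole tokens rather than characters, and the choice of which characters separate tokens is an assumption here. The name rule1 is made up.

```python
import re

# $+ (one or more tokens) is approximated by (.+)
# $- (exactly one token)  is approximated by ([^.!@<>]+)
PATTERN = re.compile(r"(.+)<@([^.!@<>]+)\.UUCP>")

def rule1(addr):
    """Approximate  R$+<@$-.UUCP>  $2!$1  with a regular expression."""
    m = PATTERN.fullmatch(addr)
    if m is None:
        return addr              # the rule does not trigger
    return m.expand(r"\2!\1")    # $2!$1

print(rule1("host2!user<@host1.UUCP>"))  # host1!host2!user
print(rule1("user<@a.b.UUCP>"))          # unchanged: $- cannot match a.b
```

The second call shows the point about domains: because $- takes exactly one token, a dotted host name before .UUCP does not match.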
-
- The second rule is
-
- R$=U!$+	$2		strip local name
-
- This is needed because there are situations in which our host name
- ends up on the beginning of the recipient address. Since we are
- about to add our host name, we don't want it to be there twice.
- So if it was there before, we remove it. $= is used to see if
- something is a member of a specified "class". U happens to be a list
- of our UUCP host name and any nicknames. So $=U!$+ matches
- any address that begins with our host name or nickname, then !, then
- anything else. Suppose we had topaz!host1!host2!user. The
- match would be
-
- $=U  !$+
- topaz!host1!host2!user
-
- The result of the match is that
-
- $1 = topaz
- $2 = host1!host2!user
-
- Since the right hand side of this rule is simply "$2", the result is
-
- host1!host2!user
-
- I.e. we have removed the topaz from the beginning. By the way, the
- class U used by the rule would have been defined earlier in the file
- by the statement
-
- CUtopaz ru-topaz
-
- C defines a class. U is the name of the class. The rest of the
- line is the list of things that will be in the class.
-
- Finally we have the rule
-
- R$+		$:$U!$1		stick on our host name
-
- The $+ matches anything. In this case the name is host1!host2!user, so the
- result of the match is
-
- $1 = host1!host2!user
-
- The result looks slightly obscure. $: is a tag that says to do this
- only once. The problem is that this rule always applies, since the
- pattern matches anything. Normally, rules are applied over and
- over, as long as they apply. In this case, the result would be
- an infinite loop. Putting $: at the beginning says to do it only
- once. $U says to use the value of the macro U. Earlier in the
- file we defined U as our UUCP host name, with a definition
-
- DUtopaz
-
- Note that there can be a class and a macro with the same name.
- $=U tests whether something is in the class U. $U is replaced
- by the value of the macro U.
-
- So the final value of this rule, $:$U!$1, is
-
- topaz!host1!host2!user
-
- So this rule has managed to add our host name to the beginning, as it
- was supposed to. Since there are no further rules in the set (the
- next line is the end of file or the beginning of a new rule set),
- this value is returned.
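
The whole of rule set 13 as walked through above can be imitated with a very small rewriting engine. The Python sketch below is a toy, not a description of sendmail's internals. Assumptions: the set of token-separating characters, shortest-first matching for wildcards, exact (case-sensitive) comparison, and all function names (tokenize, match, expand, apply_ruleset).

```python
import re

# Pattern operators, then single separator characters, then runs of anything else.
TOKEN_RE = re.compile(r"\$[+*-]|\$=\w|[.@!<>:%]|[^.@!<>:%$]+")

CLASSES = {"U": {"topaz", "ru-topaz"}}  # CUtopaz ru-topaz
MACROS = {"U": "topaz"}                 # DUtopaz

def tokenize(text):
    return TOKEN_RE.findall(text)

def match(pat, toks, pi=0, ti=0):
    """Return the list of wildcard captures, or None if there is no match."""
    if pi == len(pat):
        return [] if ti == len(toks) else None
    p, left = pat[pi], len(toks) - ti
    if p in ("$*", "$+", "$-"):
        lo = 0 if p == "$*" else 1
        hi = min(left, 1) if p == "$-" else left
    elif p.startswith("$="):
        if left == 0 or toks[ti] not in CLASSES[p[2]]:
            return None
        lo = hi = 1
    else:  # literal token
        return match(pat, toks, pi + 1, ti + 1) if left and toks[ti] == p else None
    for n in range(lo, hi + 1):          # try the shortest match first
        rest = match(pat, toks, pi + 1, ti + n)
        if rest is not None:
            return ["".join(toks[ti:ti + n])] + rest
    return None

def expand(rhs, caps):
    """Substitute macro values ($U) and then wildcard captures ($1, $2, ...)."""
    rhs = re.sub(r"\$([A-Za-z])", lambda m: MACROS[m.group(1)], rhs)
    return re.sub(r"\$(\d)", lambda m: caps[int(m.group(1)) - 1], rhs)

def apply_ruleset(rules, addr):
    for lhs, rhs in rules:
        pat = tokenize(lhs)
        while True:                      # a rule repeats for as long as it matches
            caps = match(pat, tokenize(addr))
            if caps is None:
                break
            if rhs.startswith("$@"):     # return immediately
                return expand(rhs[2:], caps)
            once = rhs.startswith("$:")  # apply this rule only once
            addr = expand(rhs[2:] if once else rhs, caps)
            if once:
                break
    return addr

S13 = [("$+<@$-.UUCP>", "$2!$1"), ("$=U!$+", "$2"), ("$+", "$:$U!$1")]
print(apply_ruleset(S13, "host2!user<@host1.UUCP>"))
```

Feeding it topaz!host1!host2!user instead exercises the second rule: the leading topaz is stripped and then put back by the third rule, so the name appears only once.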
-
- There are several more magic things that can appear in a pattern.
- The most important are:
-
- $* - this is another wild card. It is similar to $+, but $+ matches
- anything, whereas $* matches both anything and nothing. I.e. $+
- matches 1 or more tokens and $* matches 0 or more tokens. So here
- is a list of the wildcards I have mentioned:
-
- $* 0 or more
- $+ 1 or more
- $- exactly 1
- $=x any member of class x
-
- A typical example of $* is a production where we aren't sure whether
- the user name is before or after the host name:
-
- R$*<@$+.UUCP>$*	$@$1<@$2.UUCP>$3
-
- This production would test for the host name ending in .UUCP, and
- return immediately. $@ is a flag you haven't seen yet. It is simply
- a return statement. It causes the right hand side of this rule to be
- returned as the final value of this rule set.
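
The effect of $* matching nothing, combined with the $@ return, can be approximated with a regular expression. This is again only an analogy in Python (characters instead of tokens); the function name early_return and the boolean it hands back are inventions for the sake of the example.

```python
import re

# $* (zero or more tokens) is approximated by (.*), $+ by (.+)
UUCP_HOST = re.compile(r"(.*)<@(.+)\.UUCP>(.*)")

def early_return(addr):
    """Approximate  R$*<@$+.UUCP>$*  $@$1<@$2.UUCP>$3.

    Returns (value, returned): returned is True when the rule triggered,
    meaning the rule set would stop here and hand back value.
    """
    m = UUCP_HOST.fullmatch(addr)
    if m is None:
        return addr, False
    return m.expand(r"\1<@\2.UUCP>\3"), True

print(early_return("user<@host.UUCP>"))   # user before the host, $3 is empty
print(early_return("<@host.UUCP>:user"))  # user after the host, $1 is empty
print(early_return("user<@red>"))         # no .UUCP host, rule does not trigger
```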
-
- The other magic thing I will mention is $>. This is a subroutine
- call. Here is an example taken from rule set 24, which is used to
- process recipients in TCP mail. Its purpose is to handle the
- situation where we might have an address like topaz!user@red. (Our
- host name is topaz. Red is a local host that we talk to via TCP.)
- I.e. someone is asking us to relay mail to red. Rule 3 will have
- turned this into user@red<@topaz.UUCP>. What we want to do is
- get rid of the topaz.UUCP and treat red as the host. (Rule set 0
- would do this for the recipient on the envelope. This rule is
- used for the to: field in the header.) Here is the rule.
-
- R$+<@$=U.UUCP>	$@$>9$1		in case local!a@b
-
- The pattern matches our example, as follows:
-
- $+      <@$=U  .UUCP>
- user@red<@topaz.UUCP>
-
- Recall that $+ matches anything and $=U tests whether something is our
- UUCP host name or one of our nicknames. The result of the match is
-
- $1 = user@red
- $2 = topaz
-
- The right hand side is $@$>9$1. The $@ is the tag saying to stop the
- rule set here and return this value. $>9 is a subroutine call. It
- says to take the right hand side, pass it to rule set 9, and then
- use the value of rule set 9. The actual right hand side is simply
- $1, which in this case is user@red. Here is rule set 9:
-
- S9
- R$*<$*>$*	$1$2$3		defocus
- R$+		$:$>3$1		make canonical
- R$+		$@$>24$1	and do 24 again
-
- The first rule simply removes < >. It is sort of a quick and dirty
- version of rule 4. In fact we have no < > left, since we have removed
- the <@topaz.UUCP>. So this rule does not trigger. (Now that I think
- about it, I suspect it is probably never going to trigger, and so is
- not needed.)
-
- The next rule is a simple subroutine call. It matches anything ($+
- matches any 1 or more tokens). The right hand side is $:$>3$1. The $:
- says to do it only once. Since the rule matches anything, you need
- this, or you will have an infinite loop. The $>3 says to call rule 3
- as a subroutine. The $1 is the actual right hand side. Since the
- left hand side matched the whole address, what this rule does is
- simply call rule set 3 on the whole address. Recall that rule set 3
- basically locates the host name and puts < > around it. So in this
- case the result is user<@red>. As you can see, it was not enough to
- remove <@topaz.UUCP>. That leaves us with no host name. We have to
- call rule 3 to find the current host name and put < > around it.
-
- The last rule is really just a goto statement. The pattern is $+,
- which matches anything, so it always triggers. The right hand side is
- $@$>24$1. The $@ is the return tag. It says to stop this rule set
- and return that value. $>24 says to call rule set 24. The actual
- right hand side is $1, so we call rule set 24 with the whole address.
- If you recall, this ruleset (9) was called from the middle of 24 when
- we found user@red<@topaz.UUCP>. So what we have done is to change
- this into user<@red> and say to start rule set 24 over again.
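
The call chain 24 -> 9 -> 3 -> 24 is easier to see when each rule set is written as an ordinary function. The Python sketch below does that. It is an illustration only: ruleset3 is a crude stand-in for the real rule 3, only the one rule from set 24 discussed here is modelled, and the names are made up.

```python
import re

LOCAL_UUCP_NAMES = {"topaz", "ru-topaz"}  # class U

def ruleset3(addr):
    """Crude stand-in for rule set 3: put < > around the next host."""
    if "<" in addr:
        return addr
    if "!" in addr:
        host, rest = addr.split("!", 1)
        return f"{rest}<@{host}.UUCP>"
    if "@" in addr:
        user, host = addr.rsplit("@", 1)
        return f"{user}<@{host}>"
    return addr

def ruleset9(addr):
    addr = re.sub(r"[<>]", "", addr)  # R$*<$*>$*  $1$2$3    defocus
    addr = ruleset3(addr)             # R$+        $:$>3$1   make canonical
    return ruleset24(addr)            # R$+        $@$>24$1  and do 24 again

def ruleset24(addr):
    m = re.fullmatch(r"(.+)<@([^.!@<>]+)\.UUCP>", addr)
    if m and m.group(2) in LOCAL_UUCP_NAMES:
        return ruleset9(m.group(1))   # R$+<@$=U.UUCP>  $@$>9$1
    return addr                       # the other rules of set 24 are omitted

print(ruleset24(ruleset3("topaz!user@red")))  # user<@red>
```

On the second pass through ruleset24 the address no longer ends in our own .UUCP name, so the recursion stops.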
-
- I hope you have found this exposition useful. As a final convenience,
- here is a "reference card" for reading rule sets. Note that this
- contains only operators used by the rules. There are plenty of
- other facilities used in the configuration section which I am
- not documenting here. (I'd love to see someone produce a complete
- reference card.)
-
- wildcards:
- $* 0 or more tokens
- $+ 1 or more tokens
- $- exactly one token
- $=x member of class x (x must be a letter, lower/upper case distinct)
- $~x not a member of class x
-
- macro values (usable in pattern or on right hand side)
- $x value of macro x (x must be a letter, lower/upper case distinct)
- At least on the Pyramid, $x is replaced by the macro's value
- when the sendmail.cf file is being read in.
-
- on the right hand side:
- $n string matched by the Nth wildcard
- $>n call rule set N as a subroutine
- $@ return
- $: only do this rule once
-
- in rule 0, defining the return value
- $# protocol
- $@ host
- $: user
-
- Rutgers extensions, usable only on right hand side
- $%n take the string matched by the Nth wildcard, look it up in
- /etc/hosts, and if found use the primary host name
- $&x use the current value of macro x. x must be a letter.
- upper and lower case are treated as distinct.
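
As a footnote to the "in rule 0" entries of the card: the $#, $@ and $: markers simply delimit the three parts of the triple. A small Python sketch of pulling them apart follows. The function name parse_rule0_result is made up, the sample mailer name "uucp" is only an example, and results that lack one of the three parts are not handled.

```python
import re

def parse_rule0_result(text):
    """Split a rule 0 result of the form $#protocol$@host$:user into a triple."""
    m = re.fullmatch(r"\$#(.+?)\$@(.+?)\$:(.+)", text)
    if m is None:
        raise ValueError(f"not a protocol/host/user triple: {text!r}")
    return m.groups()

print(parse_rule0_result("$#uucp$@host1$:host2!user"))
# ('uucp', 'host1', 'host2!user')
```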
-
-
-