-
-
-
-
-
-
- Network Working Group K. Moore
- Request for Comments: 1522 University of Tennessee
- Obsoletes: 1342 September 1993
- Category: Standards Track
-
-
- MIME (Multipurpose Internet Mail Extensions) Part Two:
- Message Header Extensions for Non-ASCII Text
-
- Status of this Memo
-
- This RFC specifies an Internet standards track protocol for the
- Internet community, and requests discussion and suggestions for
- improvements. Please refer to the current edition of the "Internet
- Official Protocol Standards" for the standardization state and status
- of this protocol. Distribution of this memo is unlimited.
-
- Abstract
-
- This memo describes an extension to the message format defined in RFC
- 1521 [1], to allow the representation of character sets other than
- ASCII in RFC 822 (STD 11) message headers. The extensions described
- were designed to be highly compatible with existing Internet mail
- handling software, and to be easily implemented in mail readers that
- support RFC 1521.
-
- 1. Introduction
-
- RFC 1521 describes a mechanism for denoting textual body parts which
- are coded in various character sets, as well as methods for encoding
- such body parts as sequences of printable ASCII characters. This
- memo describes similar techniques to allow the encoding of non-ASCII
- text in various portions of an RFC 822 [2] message header, in a manner
- which is unlikely to confuse existing message handling software.
-
- Like the encoding techniques described in RFC 1521, the techniques
- outlined here were designed to allow the use of non-ASCII characters
- in message headers in a way which is unlikely to be disturbed by the
- quirks of existing Internet mail handling programs. In particular,
- some mail relaying programs are known to (a) delete some message
- header fields while retaining others, (b) rearrange the order of
- addresses in To or Cc fields, (c) rearrange the (vertical) order of
- header fields, and/or (d) "wrap" message headers at different places
- than those in the original message. In addition, some mail reading
- programs are known to have difficulty correctly parsing message
- headers which, while legal according to RFC 822, make use of
- backslash-quoting to "hide" special characters such as "<", ",", or
- ":", or which exploit other infrequently-used features of that
-
-
-
- Moore [Page 1]
-
- RFC 1522 MIME Part Two September 1993
-
-
- specification.
-
- While it is unfortunate that these programs do not correctly
- interpret RFC 822 headers, to "break" these programs would cause
- severe operational problems for the Internet mail system. The
- extensions described in this memo therefore do not rely on little-
- used features of RFC 822.
-
- Instead, certain sequences of "ordinary" printable ASCII characters
- (known as "encoded-words") are reserved for use as encoded data. The
- syntax of encoded-words is such that they are unlikely to
- "accidentally" appear as normal text in message headers.
- Furthermore, the characters used in encoded-words are restricted to
- those which do not have special meanings in the context in which the
- encoded-word appears.
-
- Generally, an "encoded-word" is a sequence of printable ASCII
- characters that begins with "=?", ends with "?=", and has two "?"s in
- between. It specifies a character set and an encoding method, and
- also includes the original text encoded as graphic ASCII characters,
- according to the rules for that encoding method.
-
- A mail composer that implements this specification will provide a
- means of inputting non-ASCII text in header fields, but will
- translate these fields (or appropriate portions of these fields) into
- encoded-words before inserting them into the message header.
-
- A mail reader that implements this specification will recognize
- encoded-words when they appear in certain portions of the message
- header. Instead of displaying the encoded-word "as is", it will
- reverse the encoding and display the original text in the designated
- character set.
-
- NOTES
-
- This memo relies heavily on notation and terms defined in STD 11, RFC
- 822 and RFC 1521. In particular, the syntax for the ABNF used in
- this memo is defined in STD 11, RFC 822, as well as many of the
- terms used in the grammar for the header extensions defined here.
- Successful implementation of this protocol extension requires
- careful attention to the details of both STD 11, RFC 822 and RFC
- 1521.
-
- When the term "ASCII" appears in this memo, it refers to the "7-
- Bit American Standard Code for Information Interchange", ANSI
- X3.4-1986. The MIME charset name for this character set is "US-
- ASCII". When not specifically referring to the MIME charset name,
- this document uses the term "ASCII", both for brevity and for
- consistency with STD 11, RFC 822. However, implementors are
- warned that the character set name must be spelled "US-ASCII" in
- MIME message and body part headers.
-
- 2. Syntax of encoded-words
-
- An "encoded-word" is defined by the following ABNF grammar. The
- notation of RFC 822 is used, with the exception that white space
- characters MUST NOT appear between components of an encoded-word.
-
- encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
-
- charset = token ; see section 3
-
- encoding = token ; see section 4
-
- token = 1*<Any CHAR except SPACE, CTLs, and especials>
-
- especials = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "\" /
- <"> / "/" / "[" / "]" / "?" / "." / "="
-
- encoded-text = 1*<Any printable ASCII character other
- than "?" or SPACE>
- ; (but see "Use of encoded-words in message
- ; headers", section 5)
-
- Both "encoding" and "charset" names are case-independent. Thus the
- charset name "ISO-8859-1" is equivalent to "iso-8859-1", and the
- encoding named "Q" may be spelled either "Q" or "q".
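The grammar above can be sketched as a regular expression. This is a minimal illustration, not a complete header parser: the names TOKEN, ENCODED_WORD and parse_encoded_word are hypothetical, the length limit is not checked here, and the token class assumes that backslash is one of the especials (as it is among the tspecials of RFC 1521).

```python
import re

# "token": printable ASCII except SPACE, CTLs, and the especials
# ( ) < > @ , ; : \ " / [ ] ? . =   (backslash is an assumption, see lead-in)
TOKEN = r"[!#$%&'*+\-0-9A-Z^_`a-z{|}~]+"

# "encoded-text": printable ASCII other than "?" or SPACE
ENCODED_WORD = re.compile(
    r"=\?(?P<charset>" + TOKEN + r")"
    r"\?(?P<encoding>" + TOKEN + r")"
    r"\?(?P<text>[!->@-~]+)\?="
)


def parse_encoded_word(word):
    """Return (charset, encoding, encoded_text), or None if `word` does not
    match the encoded-word grammar."""
    match = ENCODED_WORD.fullmatch(word)
    if match is None:
        return None
    # Charset and encoding names are case-independent, so normalise them.
    return (
        match.group("charset").lower(),
        match.group("encoding").upper(),
        match.group("text"),
    )
```

Normalising the case at parse time means later comparisons such as `encoding == "Q"` do not need to handle "q" separately.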
-
- An encoded-word may not be more than 75 characters long, including
- charset, encoding, encoded-text, and delimiters. If it is desirable
- to encode more text than will fit in an encoded-word of 75
- characters, multiple encoded-words (separated by CRLF SPACE) may be
- used.
-
- While there is no limit to the length of a multiple-line header
- field, each line of a header field that contains one or more
- encoded-words is limited to 76 characters.
-
- The length restrictions are included not only to ease
- interoperability through internetwork mail gateways, but also to
- impose a limit on the amount of lookahead a header parser must employ
- (while looking for a final ?= delimiter) before it can decide whether
- a token is an encoded-word or something else.
-
- The characters which may appear in encoded-text are further
- restricted by the rules in section 5.
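The two length limits above could be checked by a composer before a message is sent. The sketch below is only an illustration: check_lengths is a hypothetical helper, it looks only at white-space-delimited tokens that begin with "=?" and end with "?=", and it ignores encoded-words inside comments.

```python
def check_lengths(field_lines):
    """Return a list of length problems in the physical lines of one header
    field (each line given without its trailing CRLF)."""
    problems = []
    for number, line in enumerate(field_lines, start=1):
        # Tokens that look like encoded-words on this line.
        words = [w for w in line.split()
                 if w.startswith("=?") and w.endswith("?=")]
        for word in words:
            if len(word) > 75:
                problems.append(
                    f"line {number}: encoded-word is {len(word)} characters (limit 75)")
        # The 76-character limit applies only to lines holding an encoded-word.
        if words and len(line) > 76:
            problems.append(
                f"line {number}: line is {len(line)} characters (limit 76)")
    return problems
```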
-
- 3. Character sets
-
- The "charset" portion of an encoded-word specifies the character set
- associated with the unencoded text. A charset can be any of the
- character set names allowed in an RFC 1521 "charset" parameter of a
- "text/plain" body part, or any character set name registered with
- IANA for use with the MIME text/plain content-type [3]. (See section
- 7.1.1 of RFC 1521 for a list of charsets defined in that document).
-
- Some character sets use code-switching techniques to switch between
- "ASCII mode" and other modes. If unencoded text in an encoded-word
- contains control codes to switch out of ASCII mode, it must also
- contain additional control codes such that ASCII mode is again
- selected at the end of the encoded-word. (This rule applies
- separately to each encoded-word, including adjacent encoded-words
- within a single header field.)
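As an illustration of the code-switching rule, the sketch below uses ISO-2022-JP, which is an assumption chosen for the example and is not named in this memo. It relies on Python's "iso2022_jp" codec, which appends the escape sequence that reselects ASCII at the end of each separate encode() call. The helper names are hypothetical.

```python
ASCII_SHIFT = b"\x1b(B"  # ISO-2022-JP escape sequence that selects ASCII


def encode_separately(chunks, charset="iso2022_jp"):
    """Encode each chunk on its own, so every result is self-contained."""
    return [chunk.encode(charset) for chunk in chunks]


def ends_in_ascii_mode(octets):
    """True if the last escape sequence (if any) switches back to ASCII."""
    last_escape = octets.rfind(b"\x1b")
    return last_escape == -1 or octets[last_escape:last_escape + 3] == ASCII_SHIFT


# Three Japanese characters, split into two chunks that would become
# two adjacent encoded-words.
parts = encode_separately(["\u65e5\u672c", "\u8a9e"])
```

Encoding each chunk independently, rather than slicing one long encoded byte string, is what keeps every encoded-word starting and ending in ASCII mode.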
-
- When there is a possibility of using more than one character set to
- represent the text in an encoded-word, and in the absence of private
- agreements between sender and recipients of a message, it is
- recommended that members of the ISO-8859-* series be used in
- preference to other character sets.
-
- 4. Encodings
-
- Initially, the legal values for "encoding" are "Q" and "B". These
- encodings are described below. The "Q" encoding is recommended for
- use when most of the characters to be encoded are in the ASCII
- character set; otherwise, the "B" encoding should be used.
- Nevertheless, a mail reader which claims to recognize encoded-words
- MUST be able to accept either encoding for any character set which it
- supports.
-
- Only a subset of the printable ASCII characters may be used in
- encoded-text. Space and tab characters are not allowed, so that the
- beginning and end of an encoded-word are obvious. The "?" character
- is used within an encoded-word to separate the various portions of
- the encoded-word from one another, and thus cannot appear in the
- encoded-text portion. Other characters are also illegal in certain
- contexts. For example, an encoded-word in a "phrase" preceding an
- address in a From header field may not contain any of the "specials"
- defined in RFC 822. Finally, certain other characters are disallowed
- in some contexts, to ensure reliability for messages that pass
- through internetwork mail gateways.
-
- The "B" encoding automatically meets these requirements. The "Q"
- encoding allows a wide range of printable characters to be used in
- non-critical locations in the message header (e.g., Subject), with
- fewer characters available for use in other locations.
-
- 4.1. The "B" encoding
-
- The "B" encoding is identical to the "BASE64" encoding defined by RFC
- 1521.
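A minimal sketch of the "B" encoding using Python's standard base64 module follows. The function names b_encode and b_decode are hypothetical, and the 75-character limit is not enforced here.

```python
import base64


def b_encode(text, charset):
    """Build a single "B"-encoded encoded-word for `text` (no length check)."""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{charset}?B?{payload}?="


def b_decode(encoded_text, charset):
    """Decode the encoded-text portion of a "B"-encoded encoded-word."""
    return base64.b64decode(encoded_text, validate=True).decode(charset)
```

For example, b_encode("If you can read this yo", "ISO-8859-1") reproduces the first encoded-word of the Subject field in the examples of section 8.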
-
- 4.2. The "Q" encoding
-
- The "Q" encoding is similar to the "Quoted-Printable" content-
- transfer-encoding defined in RFC 1521. It is designed to allow text
- containing mostly ASCII characters to be decipherable on an ASCII
- terminal without decoding.
-
- (1) Any 8-bit value may be represented by a "=" followed by two
- hexadecimal digits. For example, if the character set in use
- were ISO-8859-1, the "=" character would thus be encoded as
- "=3D", and a SPACE by "=20". (Upper case should be used for
- hexadecimal digits "A" through "F".)
-
- (2) The 8-bit hexadecimal value 20 (e.g., ISO-8859-1 SPACE) may be
- represented as "_" (underscore, ASCII 95.). (This character may
- not pass through some internetwork mail gateways, but its use
- will greatly enhance readability of "Q" encoded data with mail
- readers that do not support this encoding.) Note that the "_"
- always represents hexadecimal 20, even if the SPACE character
- occupies a different code position in the character set in use.
-
- (3) 8-bit values which correspond to printable ASCII characters other
- than "=", "?", "_" (underscore), and SPACE may be represented as
- those characters. (But see section 5 for restrictions.)
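The three rules above can be sketched as a pair of functions operating on octets. The names q_encode_bytes and q_decode_bytes are hypothetical, and the context-dependent restrictions of section 5 are not applied here.

```python
HEX_DIGITS = "0123456789ABCDEFabcdef"


def q_encode_bytes(data):
    """Apply the "Q" encoding rules to a bytes object."""
    out = []
    for byte in data:
        ch = chr(byte)
        if byte == 0x20:
            out.append("_")              # rule (2): hexadecimal 20 becomes "_"
        elif 0x21 <= byte <= 0x7E and ch not in "=?_":
            out.append(ch)               # rule (3): printable ASCII as itself
        else:
            out.append(f"={byte:02X}")   # rule (1): "=" plus two upper case hex digits
    return "".join(out)


def q_decode_bytes(encoded_text):
    """Reverse the "Q" encoding, returning the original octets."""
    out = bytearray()
    i = 0
    while i < len(encoded_text):
        ch = encoded_text[i]
        if ch == "_":
            out.append(0x20)
            i += 1
        elif ch == "=":
            pair = encoded_text[i + 1:i + 3]
            if len(pair) != 2 or any(c not in HEX_DIGITS for c in pair):
                raise ValueError('"=" must be followed by two hexadecimal digits')
            out.append(int(pair, 16))
            i += 3
        else:
            out.append(ord(ch))
            i += 1
    return bytes(out)
```

The decoder accepts lower case hexadecimal digits as well, although the encoder only produces upper case ones.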
-
- 5. Use of encoded-words in message headers
-
- An encoded-word may appear in a message header or body part header
- according to the following rules:
-
- (1) An encoded-word may replace a "text" token (as defined by RFC
- 822) in any Subject or Comments header field, any extension
- message header field, or any RFC 1521 body part field for which
- the field body is defined as "*text". An encoded-word may also
- appear in any user-defined ("X-") message or body part header
- field.
-
- Ordinary ASCII text and encoded-words may appear together in the
- same header field. However, an encoded-word that appears in a
- header field defined as "*text" MUST be separated from any
- adjacent encoded-word or "text" by linear-white-space.
-
- (2) An encoded-word may appear within a comment delimited by "(" and
- ")", i.e., wherever a "ctext" is allowed. More precisely, the
- RFC 822 ABNF definition for "comment" is amended as follows:
-
- comment = "(" *(ctext / quoted-pair / comment / encoded-word) ")"
-
- A "Q"-encoded encoded-word which appears in a comment MUST NOT
- contain the characters "(", ")" or "\". An encoded-word that appears in
- a "comment" MUST be separated from any adjacent encoded-word or
- "ctext" by linear-white-space.
-
- (3) An encoded-word may appear as a replacement for a "word" entity
- within a "phrase", for example, one that precedes an address in a
- From, To, or Cc header. The ABNF definition for phrase from RFC
- 822 thus becomes:
-
- phrase = 1*(encoded-word / word)
-
- In this case the set of characters that may be used in a "Q"-
- encoded encoded-word is restricted to: <upper and lower case
- ASCII letters, decimal digits, "!", "*", "+", "-", "/", "=", and
- "_" (underscore, ASCII 95.)>. An encoded-word that appears
- within a "phrase" MUST be separated from any adjacent "word",
- "text" or "special" by linear-white-space.
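The restricted character set for a "phrase" can be sketched as a stricter variant of the "Q" encoder. The names PHRASE_SAFE and q_encode_for_phrase are hypothetical. In this sketch "=" and "_" appear in the output only as encoding markers, so a literal "=" or "_" in the input is always hex-encoded.

```python
import string

# Characters that may stand for themselves inside a "phrase".
PHRASE_SAFE = set(string.ascii_letters + string.digits + "!*+-/")


def q_encode_for_phrase(data):
    """"Q"-encode octets using only the characters permitted in a phrase."""
    out = []
    for byte in data:
        ch = chr(byte)
        if byte == 0x20:
            out.append("_")              # SPACE becomes underscore
        elif ch in PHRASE_SAFE:
            out.append(ch)               # safe character, kept as is
        else:
            out.append(f"={byte:02X}")   # everything else is hex-encoded
    return "".join(out)
```

A display name containing RFC 822 specials, such as a comma or parentheses, therefore comes out with those characters hex-encoded rather than literal.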
-
- These are the ONLY locations where an encoded-word may appear. In
- particular, an encoded-word MUST NOT appear in any portion of an
- "addr-spec". In addition, an encoded-word MUST NOT be used in a
- Received header field.
-
- Each encoded-word MUST encode an integral number of octets. The
- encoded-text in each encoded-word must be well-formed according to
- the encoding specified; the encoded-text may not be continued in the
- next encoded-word. (For example, "=?charset?Q?=?= =?charset?Q?AB?="
- would be illegal, because the two hex digits "AB" must follow the "="
- in the same encoded-word.)
-
- Each encoded-word MUST represent an integral number of characters. A
- multi-octet character may not be split across adjacent encoded-words.
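One way to satisfy both rules when a string is too long for a single encoded-word is to build each encoded-word from whole characters. The sketch below uses the "B" encoding. b_encode_words is a hypothetical helper, and it assumes a charset without code-switching, in which each character can be encoded independently. The "utf-8" charset in the usage note is likewise only an illustrative multi-octet charset.

```python
import base64


def b_encode_words(text, charset, limit=75):
    """Split `text` into "B"-encoded encoded-words of at most `limit`
    characters, never dividing a character between two encoded-words."""
    prefix = f"=?{charset}?B?"
    room = limit - len(prefix) - len("?=")
    max_octets = (room // 4) * 3   # base64 turns 3 octets into 4 characters

    def wrap(octets):
        return prefix + base64.b64encode(octets).decode("ascii") + "?="

    words, chunk = [], b""
    for character in text:
        octets = character.encode(charset)
        # Start a new encoded-word rather than split this character.
        if chunk and len(chunk) + len(octets) > max_octets:
            words.append(wrap(chunk))
            chunk = b""
        chunk += octets
    if chunk:
        words.append(wrap(chunk))
    return words
```

For instance, b_encode_words("\u00e9" * 100, "utf-8") yields several encoded-words, each of which decodes on its own. A composer would then join them with CRLF SPACE as described in section 2.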
-
- Only printable and white space character data should be encoded using
- this scheme. However, since these encoding schemes allow the
- encoding of arbitrary octet values, mail readers that implement this
- decoding should also ensure that display of the decoded data on the
- recipient's terminal will not cause unwanted side-effects.
-
- Use of these methods to encode non-textual data (e.g., pictures or
- sounds) is not defined by this memo. Use of encoded-words to
- represent strings of purely ASCII characters is allowed, but
- discouraged. In rare cases it may be necessary to encode ordinary
- text that looks like an encoded-word.
-
- 6. Support of encoded-words by mail readers
-
- 6.1. Recognition of encoded-words in message headers
-
- A mail reader must parse the message and body part headers according
- to the rules in RFC 822 to correctly recognize encoded-words.
-
- Encoded-words are to be recognized as follows:
-
- (1) Any message or body part header field defined as "*text", or any
- user-defined header field, should be parsed as follows: Beginning
- at the start of the field-body and immediately following each
- occurrence of linear-white-space, each sequence of up to 75
- printable characters (not containing any linear-white-space)
- should be examined to see if it is an encoded-word according to
- the syntax rules in section 2. Any other sequence of printable
- characters should be treated as ordinary ASCII text.
-
- (2) Any header field not defined as "*text" should be parsed
- according to the syntax rules for that header field. However,
- any "word" that appears within a "phrase" should be treated as an
- encoded-word if it meets the syntax rules in section 2.
- Otherwise it should be treated as an ordinary "word".
-
- (3) Within a "comment", any sequence of up to 75 printable characters
- (not containing linear-white-space), that meets the syntax rules
- in section 2, should be treated as an encoded-word. Otherwise it
- should be treated as normal comment text.
-
- 6.2. Display of encoded-words
-
- Any encoded-words so recognized are decoded, and if possible, the
- resulting unencoded text is displayed in the original character set.
-
- When displaying a particular header field that contains multiple
- encoded-words, any linear-white-space that separates a pair of
- adjacent encoded-words is ignored. (This is to allow the use of
- multiple encoded-words to represent long strings of unencoded text,
- without having to separate encoded-words where spaces occur in the
- unencoded text.)
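The display rule for adjacent encoded-words can be sketched for a "*text" field body as follows. This is a simplified illustration under stated assumptions: decode_unstructured and the regular expression are hypothetical, every run of white space between other tokens is displayed as a single space, and malformed or unsupported encoded-words are not handled.

```python
import base64
import re

# Loose pattern for an encoded-word using the "B" or "Q" encoding.
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([^?\s]+)\?=")


def _decode_word(match):
    charset, encoding, text = match.group(1), match.group(2).upper(), match.group(3)
    if encoding == "B":
        raw = base64.b64decode(text)
    else:
        # "Q": underscore is hexadecimal 20, "=XX" is the octet XX.
        raw = re.sub(rb"=([0-9A-Fa-f]{2})",
                     lambda m: bytes([int(m.group(1), 16)]),
                     text.replace("_", " ").encode("ascii"))
    return raw.decode(charset)


def decode_unstructured(field_body):
    """Decode encoded-words in a "*text" field body for display."""
    out = []
    previous_was_encoded = False
    for token in field_body.split():
        match = ENCODED_WORD.fullmatch(token) if len(token) <= 75 else None
        # White space between two adjacent encoded-words is dropped;
        # any other separation is kept (here, as one space).
        if out and not (match and previous_was_encoded):
            out.append(" ")
        out.append(_decode_word(match) if match else token)
        previous_was_encoded = match is not None
    return "".join(out)
```

Applied to the two-line Subject field in the examples of section 8, this joins "yo" and "u" into "you" with no intervening space.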
-
- In the event other encodings are defined in the future, and the mail
- reader does not support the encoding used, it may either (a) display
- the encoded-word as ordinary text, or (b) substitute an appropriate
- message indicating that the text could not be decoded.
-
- If the mail reader does not support the character set used, it may
- (a) display the encoded-word as ordinary text (i.e., as it appears in
- the header), (b) make a "best effort" to display using such
- characters as are available, or (c) substitute an appropriate message
- indicating that the decoded text could not be displayed.
-
- If the character set being used employs code-switching techniques,
- display of the encoded text implicitly begins in "ASCII mode". In
- addition, the mail reader must ensure that the output device is once
- again in "ASCII mode" after the encoded-word is displayed.
-
- 6.3. Mail reader handling of incorrectly formed encoded-words
-
- It is possible that an encoded-word that is legal according to the
- syntax defined in section 2, is incorrectly formed according to the
- rules for the encoding being used. For example:
-
- (1) An encoded-word which contains characters which are not legal for
- a particular encoding (for example, a '-' in the "B" encoding),
- is incorrectly formed.
-
- (2) Any encoded-word which encodes a non-integral number of
- characters or octets is incorrectly formed.
-
- A mail reader need not attempt to display the text associated with an
- encoded-word that is incorrectly formed. However, a mail reader MUST
- NOT prevent the display or handling of a message because an encoded-
- word is incorrectly formed.
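A reader can meet this requirement by falling back to the undecoded text whenever decoding fails. The sketch below covers the "B" encoding only. display_b_word is a hypothetical helper, and treating an unknown charset in the same way is one of the options permitted by section 6.2.

```python
import base64
import binascii


def display_b_word(charset, encoded_text, original):
    """Return the decoded text of a "B"-encoded encoded-word, or `original`
    (the encoded-word as it appears in the header) if it cannot be decoded."""
    try:
        # validate=True rejects characters outside the base64 alphabet.
        octets = base64.b64decode(encoded_text, validate=True)
        return octets.decode(charset)
    except (binascii.Error, UnicodeDecodeError, LookupError):
        # Malformed base64, octets invalid in the charset, or unknown charset:
        # show the raw encoded-word instead of failing the whole message.
        return original
```

The important property is that no exception escapes to the code that displays the rest of the message.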
-
- 7. Conformance
-
- A mail composing program claiming compliance with this specification
- MUST ensure that any string of non-white-space printable ASCII
- characters within a "*text" or "*ctext" that begins with "=?" and
- ends with "?=" be a valid encoded-word. ("begins" means: at the
- start of the field-body or immediately following linear-white-space;
- "ends" means: at the end of the field-body or immediately preceding
- linear-white-space.) In addition, any "word" within a "phrase" that
- begins with "=?" and ends with "?=" must be a valid encoded-word.
-
- A mail reading program claiming compliance with this specification
- must be able to distinguish encoded-words from "text", "ctext", or
- "word"s, according to the rules in section 6, anytime they appear in
- appropriate places in message headers. It must support both the "B"
- and "Q" encodings for any character set which it supports. The
- program must be able to display the unencoded text if the character
- set is "US-ASCII". For the ISO-8859-* character sets, the mail
- reading program must at least be able to display the characters which
- are also in the ASCII set.
-
- 8. Examples
-
- From: =?US-ASCII?Q?Keith_Moore?= <moore@cs.utk.edu>
- To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>
- CC: =?ISO-8859-1?Q?Andr=E9_?= Pirard <PIRARD@vm1.ulg.ac.be>
- Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=
- =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=
-
- From: =?ISO-8859-1?Q?Olle_J=E4rnefors?= <ojarnef@admin.kth.se>
- To: ietf-822@dimacs.rutgers.edu, ojarnef@admin.kth.se
- Subject: Time for ISO 10646?
-
- To: Dave Crocker <dcrocker@mordor.stanford.edu>
- Cc: ietf-822@dimacs.rutgers.edu, paf@comsol.se
- From: =?ISO-8859-1?Q?Patrik_F=E4ltstr=F6m?= <paf@nada.kth.se>
- Subject: Re: RFC-HDR care and feeding
-
- From: Nathaniel Borenstein <nsb@thumper.bellcore.com>
- (=?iso-8859-8?b?7eXs+SDv4SDp7Oj08A==?=)
- To: Greg Vaudreuil <gvaudre@NRI.Reston.VA.US>, Ned Freed
- <ned@innosoft.com>, Keith Moore <moore@cs.utk.edu>
- Subject: Test of new header generator
- MIME-Version: 1.0
- Content-type: text/plain; charset=ISO-8859-1
-
- 9. References
-
- [1] Borenstein, N., and N. Freed, "MIME (Multipurpose Internet Mail
- Extensions) Part One: Mechanisms for Specifying and Describing
- the Format of Internet Message Bodies", RFC 1521, Bellcore,
- Innosoft, September 1993.
-
- [2] Crocker, D., "Standard for the Format of ARPA Internet Text
- Messages", STD 11, RFC 822, UDEL, August 1982.
-
- [3] Reynolds, J., and J. Postel, "Assigned Numbers", STD 2,
- RFC 1340, USC/Information Sciences Institute, July 1992.
-
- 10. Security Considerations
-
- Security issues are not discussed in this memo.
-
- 11. Author's Address
-
- Keith Moore
- University of Tennessee
- 107 Ayres Hall
- Knoxville TN 37996-1301
-
- EMail: moore@cs.utk.edu
-