<HTML>
|
|
<!-- $Id$ -->
|
|
<HEAD>
|
|
<TITLE>Newsgroup Interchange within FidoNet.</TITLE>
|
|
</HEAD>
|
|
|
|
<!-- Background white, links blue (unvisited), navy (visited), red (active) -->
|
|
<BODY
|
|
BGCOLOR="#FFFFFF"
|
|
TEXT="#000000"
|
|
LINK="#0000FF"
|
|
VLINK="#000080"
|
|
ALINK="#FF0000"
|
|
>
|
|
<PRE>
|
|
Document: FSC-0059
|
|
Version: 001
|
|
Date: 08-Mar-1992
|
|
|
|
|
|
|
|
|
|
Newsgroup Interchange within FidoNet
|
|
Jack Decker
|
|
1:154/8@fidonet
|
|
|
|
A proposed standard for the interchange of USENET News messages among
|
|
FidoNet nodes.
|
|
|
|
|
|
Status of this document:
|
|
|
|
This FSC suggests a proposed protocol for the FidoNet(r) community,
|
|
and requests discussion and suggestions for improvements.
|
|
Distribution of this document is unlimited.
|
|
|
|
Fido and FidoNet are registered marks of Tom Jennings and Fido
|
|
Software.
|
|
|
|
|
|
|
|
Introduction:
|
|
|
|
This document defines the standard format for the interchange of USENET
|
|
news messages among FidoNet nodes. It incorporates by reference the
|
|
document RFC-1036, "Standard for Interchange of USENET Messages" by M.
|
|
Horton of AT&T Bell Laboratories and R. Adams of the Center for Seismic
|
|
Studies. A copy of RFC-1036 should be included in the distribution
|
|
archive of this standard. However, RFC-1036 is NOT applicable in its
|
|
entirety to FidoNet. Therefore, unless specifically referenced
|
|
elsewhere in this document, only section 2 of RFC-1036 should be
|
|
considered part of this standard. Section 3, which deals with "control
|
|
messages", may be implemented in FidoNet on an optional basis, and if
|
|
processing of control messages is included in a FidoNet implementation,
|
|
it should be done in accordance with section 3 of RFC-1036 to the
|
|
extent possible. Section 4 of RFC-1036 is *NOT* applicable to FidoNet
|
|
(except for section 4.3, which will be discussed later) and therefore
|
|
is NOT included as part of this standard. Section 5 of RFC-1036 is a
|
|
treatise on the News Propagation Algorithm used within UseNet, and
|
|
should be studied even though it is not directly applicable to FidoNet,
|
|
in particular because it contains a discussion on the prevention of
|
|
loops (what we in FidoNet commonly refer to as "dupe loops").
|
|
|
|
Please note that FidoNet implementations neither recognize nor support
|
|
what is referred to as the "old format" or the "A format" in section 2
|
|
of RFC-1036.
|
|
|
|
The goal of this document is to define a standard for the interchange
|
|
of news messages between FidoNet nodes in a format that will also be
|
|
acceptable to UseNet hosts. In order to simplify the creation of
|
|
software that conforms to this standard, we do not intend to support
|
|
every news format that has ever existed in UseNet. The standard
|
|
described in RFC-1036 is used by the majority of UseNet hosts, and
|
|
therefore it is the standard that will be adopted in this document.
|
|
|
|
This standard will contain three sections: General theory of newsgroup
|
|
transmission, Format and protocols of batched newsgroups, and the
|
|
translation of newsgroup messages to and from FidoNet message format.
|
|
|
|
1. General theory of newsgroup transmission:
|
|
|
|
Prior to the introduction of the DoveMail program, the usual method of
|
|
gating a UseNet newsgroup into FidoNet was to convert it to FidoNet
|
|
echomail, and then send it to "downstream" nodes in echomail format.
|
|
This method is still used at the majority of gateway systems at this
|
|
writing. Unfortunately, no conversion process is perfect, and some
|
|
useful control information is usually lost in the conversion. In
|
|
addition, most FidoNet echomail processors don't handle long messages
|
|
(which are fairly common in newsgroups) well at all, and many gateway
|
|
systems either try to split these messages into multiple parts (a
|
|
somewhat awkward process) or discard them entirely. Because the
|
|
duplicate message detection algorithms used in many FidoNet echomail
|
|
processors incorrectly identify some of the parts of a split message as
|
|
duplicates, parts of long messages often get "lost" when transmitted as
|
|
echomail. Also, UseNet allows a message to be posted to multiple
|
|
newsgroups, and when such messages are converted to echomail, it may be
|
|
necessary to create multiple copies of the message (one for each
|
|
echomail area that it would be placed in), thus increasing the
|
|
transmission time for such messages.
|
|
|
|
Even normal-length newsgroup messages may be falsely discarded as
|
|
duplicates by some "downstream" echomail processors. The reason this
|
|
is a particular problem in newsgroups converted to echomail is that
|
|
some echomail processors use a checksum of parts of FidoNet message
|
|
headers to determine if messages are duplicates. Since all newsgroup
|
|
messages are assumed to be addressed to "All", and since some gateway
|
|
software uses the date and time that the message was converted to
|
|
echomail rather than the original date and time from the message, it's
|
|
quite possible that the remainder of the message header contains
|
|
information that is similar enough to information in another message's
|
|
header to cause it to be discarded as a duplicate message. This
|
|
happens far more frequently with converted newsgroup messages than with
|
|
messages originally entered as echomail.
|
|
|
|
Finally, when a BBS user enters a reply to a news message that has been
|
|
converted to echomail, in many cases the information is simply not
|
|
available in the original message to generate a proper "References:"
|
|
line in the reply, as required by RFC-1036. If the original message
|
|
contained a "Followup-To:" line, which requires that replies be posted
|
|
to a different newsgroup than the one in which the original message was
|
|
entered, this line may not be transmitted in the message as converted to
|
|
echomail. And even if this information is available, no echomail
|
|
processor currently available will modify the reply message as required
|
|
(to add the "References:" line where necessary, or to move the message
|
|
to a different area if it is a reply to a message that contained a
|
|
"Followup-To:" line).
|
|
|
|
Under this proposed standard, none of the UseNet message header
|
|
information is lost in transmission between nodes, and reply messages
|
|
can be generated that conform to UseNet specifications. If a message
|
|
is posted to multiple newsgroups, it is only transmitted once (instead
|
|
of multiple times as it might be if converted to echomail). Also, long
|
|
messages are not truncated or changed in transmission between nodes,
|
|
and finally, there is no chance that a message will be improperly
|
|
discarded as a duplicate.
|
|
|
|
The main thing to remember is that under this standard, news messages
|
|
are never converted to echomail. Echomail is an irrelevant concept in
|
|
this context, since we are not passing echomail between nodes.
|
|
Instead, newsgroups are transmitted in the native format specified by
|
|
RFC-1036, and tossed directly from batched newsgroup packets to the
|
|
FidoNet message format (e.g. the *.msg format) if necessary. Keep in
|
|
mind that most FidoNet BBS software uses the same general format not
|
|
only for echomail messages, but also for netmail and local message
|
|
areas, so it is not necessary to transmit messages between nodes in
|
|
echomail format if another format is more suitable for the type of
|
|
message being transmitted.
|
|
|
|
2. Format and protocols of batched newsgroups:
|
|
|
|
When newsgroup messages are transmitted between systems, the individual
|
|
messages must conform to the specifications of section 2 of RFC-1036,
|
|
and section 3 of this document. Where section 3 of this document
|
|
defines a more restrictive standard than RFC-1036, this document shall
|
|
take precedence.
|
|
|
|
When transmitting news messages between FidoNet nodes, they must be
|
|
sent in a batched newsgroup file (as described in section 4.3 of
|
|
RFC-1036) unless some other format is agreed upon in advance. The
|
|
transmission of unbatched news messages, or the use of any batching
|
|
method other than that described in section 4.3 of RFC-1036 shall be
|
|
considered non-standard. Please note that RFC-1036 section 4.3 refers
|
|
to this batching process as combining several messages into "one large
|
|
message", but we will refer to this "one large message" as a "batched
|
|
newsgroup file", or a "UseNet format mail packet" rather than as a
|
|
"large message", since FidoNet systems do not normally handle large
|
|
"messages".
|
|
|
|
When messages pass through a FidoNet system on their way to other
|
|
nodes, the header lines in the message may be modified to conform with
|
|
the standards given here. However, the text (body) of a message should
|
|
NEVER be altered (one exception: Carriage Returns MAY be converted to
|
|
Line Feeds in order to conform to this standard, but this is neither
|
|
required nor expected of software).
|
|
|
|
The standard format for sending a batched newsgroup file to other
|
|
FidoNet nodes is as follows:
|
|
|
|
First, as will be noted in section 3 of this document, individual lines
|
|
of the batched newsgroup file must be terminated with Line Feeds only,
|
|
and the file must NOT contain Carriage Return characters (ASCII 13).
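
As an illustration of the batching described above, the following
Python sketch splits a batched newsgroup file into its individual
messages. It assumes the RFC-1036 section 4.3 layout, in which each
message is preceded by a "#! rnews" line giving the message length in
bytes. The function names are hypothetical and are not part of this
standard.

```python
def contains_carriage_return(data: bytes) -> bool:
    """Report whether the batch violates the Line-Feed-only rule."""
    return b"\r" in data


def split_batch(data: bytes) -> list[bytes]:
    """Split a batched newsgroup file into its individual messages."""
    marker = b"#! rnews "
    messages = []
    pos = 0
    while pos != len(data):
        if not data.startswith(marker, pos):
            raise ValueError(f"expected '#! rnews' line at offset {pos}")
        eol = data.find(b"\n", pos)
        if eol == -1:
            raise ValueError("'#! rnews' line is not ended by a Line Feed")
        # The count is the number of bytes in the message that follows.
        count = int(data[pos + len(marker):eol])
        start = eol + 1
        message = data[start:start + count]
        if len(message) != count:
            raise ValueError("batch ends before the end of the message")
        messages.append(message)
        pos = start + count
    return messages
```

A tosser built along these lines would probably call
contains_carriage_return() first and decide how to treat an offending
batch before splitting it.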
|
|
|
|
Batched newsgroup files shall be transmitted between FidoNet nodes as
|
|
files named using the filename ????????.PKU, where the eight character
|
|
root name can be any of the hexadecimal digits 0 - 9 or A - F. The
|
|
.PKU extension (which stands for "PacKet - Usenet format") is the news
|
|
equivalent of the .PKT file used to transmit FidoNet format netmail and
|
|
echomail between nodes.
|
|
|
|
Batched newsgroup files with the filespec ????????.PKU may be archived
|
|
into a standard mail archive file (bearing the extension *.MO?, *.TU?,
|
|
*.WE? ... *.SU?). It is assumed that the receiver of batched newsgroup
|
|
files will take any necessary steps to make sure that both *.PKU and
|
|
*.PKT files are extracted from incoming mail archive files before the
|
|
mail archive files are deleted. In certain cases, this may mean that
|
|
an external unarchive shell may have to be used, instead of allowing
|
|
the echomail processor to call the unarchiver (typical external
|
|
unarchive shell programs at this writing are GUS, POLYXARC, and SPAZ).
|
|
|
|
A batched newsgroup file awaiting transmission may be stored in a
|
|
FidoNet system's "outbound" area in uncompressed form, prior to being
|
|
archived for transmission or sent in uncompressed form. It is
|
|
suggested that when a system uses the .OUT extension to indicate an
|
|
uncompressed netmail or echomail packet, the .UUT extension be used to
|
|
indicate an uncompressed batched newsgroup packet. It is expected that
|
|
a .UUT file in a system's "outbound" area will be treated in much the
|
|
same way as an .OUT file, except it will be renamed to a file with an
|
|
extension of .PKU (rather than .PKT) before being archived into the
|
|
mail archive. This implies that the root name of the .UUT file will
|
|
contain the net number and node number of the destination system,
|
|
expressed as four hexadecimal digits each for net and node numbers, in
|
|
the same manner as the root name for a FidoNet .OUT file is
|
|
constructed.
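
The construction of the .UUT root name can be sketched in Python as
follows. The function name is hypothetical, and the use of uppercase
hexadecimal digits is an assumption, since this document does not
specify the case of the digits.

```python
def uut_filename(net: int, node: int) -> str:
    """Build the outbound name: 4 hex digits of net, 4 of node, .UUT."""
    if net not in range(0x10000) or node not in range(0x10000):
        raise ValueError("net and node must each fit in four hex digits")
    return f"{net:04X}{node:04X}.UUT"
```

For example, a batched newsgroup file destined for node 154/8 would be
named 009A0008.UUT under these assumptions.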
|
|
|
|
The root filename of the *.PKU file should be an eight digit
|
|
hexadecimal number, with leading zeroes used if necessary, in order to
|
|
make an eight character root filename. It is suggested that this
|
|
hexadecimal number be based on time of year, with 00000000.PKU
|
|
generated at exactly midnight on January 1 and FFFFFFFF.PKU generated
|
|
at just a moment before midnight on December 31. However, it is
|
|
permissible to use the same algorithm that is used to generate the root
|
|
filename for *.PKT files.
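
One possible reading of the time-of-year suggestion is a linear mapping
of the elapsed fraction of the year onto the range 00000000 through
FFFFFFFF. The Python sketch below takes that approach; the function
name and the linear mapping are assumptions, and any other scheme that
yields eight hexadecimal digits would serve equally well.

```python
from datetime import datetime, timedelta


def pku_root_name(now: datetime) -> str:
    """Map the elapsed fraction of the year onto 00000000..FFFFFFFF."""
    year_start = datetime(now.year, 1, 1)
    next_year = datetime(now.year + 1, 1, 1)
    one_us = timedelta(microseconds=1)
    # Whole microseconds since January 1, and in the entire year.
    elapsed_us = (now - year_start) // one_us
    total_us = (next_year - year_start) // one_us
    # Integer arithmetic keeps the result exact and below 2**32.
    value = elapsed_us * 0x100000000 // total_us
    return f"{value:08X}"
```

The sketch expects a naive (timezone-free) datetime, and appending
".PKU" to the result gives the complete filename.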
|
|
|
|
The normal sequence for transmission of messages between FidoNet nodes
|
|
might then be described as follows:
|
|
|
|
a. Messages created on the originating system are placed into a batched
|
|
newsgroup file conforming to the specifications of RFC-1036 section
|
|
4.3. When this batched newsgroup file is destined for another FidoNet
|
|
node, it will have a filename of the format:
|
|
|
|
[4 hex digit net number][4 hex digit node number].UUT
|
|
|
|
This file will then be placed in the outbound mail area for packing.
|
|
|
|
b. A mail packing program will examine the outbound mail area and, upon
|
|
finding the .UUT file, will rename it to a file with an extension of
|
|
.PKU, and then shell to a compression program in order to place the
|
|
*.PKU file into a new or existing mail archive file for the destination
|
|
node. Mail archive files bear extension names consisting of the first
|
|
two letters of a day of the week (in the English language) plus a
|
|
numeric character in the range 0 - 9 (for example, .MO5 or .TH7). The
|
|
method of compression for the mail archive is as agreed upon between
|
|
the originating and destination nodes. No "standard" method of
|
|
compression for the mail archive is specified in this document. NOTE:
|
|
If the compression program fails for any reason (such as running out of
|
|
disk space), the mail packing program MUST rename the .PKU file back to
|
|
the original *.UUT filename before exiting. Since batched newsgroup
|
|
files do not contain a header that indicates the destination node,
|
|
there would be no way to determine the proper destination node if the
|
|
file were not renamed back to the original filename.
|
|
|
|
c. The mail archive is transmitted in the usual manner by a FidoNet
|
|
compatible mailer, or such other means as may be agreed upon in advance
|
|
by the sysops of the originating and destination nodes.
|
|
|
|
d. At the destination system, the individual files are extracted from
|
|
the mail archive. *.PKT files are processed in the usual manner to
|
|
extract any netmail or echomail messages, while *.PKU files are
|
|
processed by software designed to handle batched newsgroup files. In
|
|
this context, such files could be "handled" by re-processing the
|
|
messages and batching them to be sent on to one or more additional
|
|
node(s), or by tossing the messages to the local message base, or both.
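
Step b above, including the requirement to restore the .UUT filename
when compression fails, might be sketched in Python as follows. The
names pack_uut, archive_extension, and the "compress" callable are all
hypothetical; the callable stands in for whatever archiver the
originating and destination sysops have agreed upon.

```python
import os
from datetime import date
from typing import Callable

DAY_PREFIXES = ["MO", "TU", "WE", "TH", "FR", "SA", "SU"]


def archive_extension(day: date, sequence: int) -> str:
    """Return a mail archive extension such as .MO5 or .TH7."""
    if sequence not in range(10):
        raise ValueError("sequence must be a single digit, 0 - 9")
    # date.weekday() numbers Monday as 0, matching DAY_PREFIXES.
    return "." + DAY_PREFIXES[day.weekday()] + str(sequence)


def pack_uut(uut_path: str, pku_name: str,
             compress: Callable[[str], None]) -> str:
    """Rename a .UUT file to .PKU and hand it to the archiver."""
    pku_path = os.path.join(os.path.dirname(uut_path), pku_name)
    os.rename(uut_path, pku_path)
    try:
        compress(pku_path)
    except Exception:
        # The .PKU file carries no destination address, so the
        # original .UUT name must be restored before giving up.
        os.rename(pku_path, uut_path)
        raise
    return pku_path
```

Whether the .PKU file is deleted after it has been archived is left to
the archiver in this sketch.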
|
|
|
|
Please note that this standard does not anticipate that batched
|
|
newsgroup files will be converted to FidoNet echomail at any point
|
|
along the way. It is realized that this may indeed happen, but such
|
|
conversions should be considered as something to be avoided if at all
|
|
possible due to the problems discussed in section 1 of this document.
|
|
|
|
3. Translation of newsgroup messages to and from FidoNet message
|
|
format:
|
|
|
|
NOTE: Where applicable, the standards defined in this section for
|
|
messages shall apply not only to locally created messages, but also to
|
|
all messages sent to "downstream" FidoNet nodes.
|
|
|
|
In this context, "FidoNet message format" means that format in which
|
|
messages commonly reside on a FidoNet BBS. At this writing, there are
|
|
three formats commonly used for message storage on FidoNet systems, but
|
|
other formats may be in use as well. The three most common formats are
|
|
the "*.msg" format as used by the original Fido program (and a host of
|
|
programs since), also commonly referred to as the "single message per
|
|
file format"; the "Hudson" format, used by QuickBBS, Remote Access, and
|
|
some other products; and the "Squish" format used by the Maximus BBS
|
|
and the "Squish" echomail processor.
|
|
|
|
Because there are so many message formats, some other programs have
|
|
taken the approach of trying to convert UseNet news into echomail,
|
|
creating *.PKT files which can theoretically be processed by any
|
|
FidoNet system. However, since the *.PKT files are processed by the
|
|
echomail processor, all the limitations and pitfalls associated with
|
|
converting newsgroup messages to echomail come into play.
|
|
|
|
The preferred way of handling incoming messages would be to have the
|
|
BBS (or message reader/editor) software directly read batched newsgroup
|
|
files. In this way, the files would not have to be "processed" per se.
|
|
As new batched newsgroup files arrived on a system, they could simply
|
|
be concatenated to the existing message base, and then a utility could
|
|
be run that would build an index to the message base, in a manner
|
|
somewhat similar to the way "flat file" message bases are currently
|
|
implemented on some BBS's. Of course, you'd need to occasionally run a
|
|
utility to delete old messages in order to keep the message base from
|
|
growing too large, and new messages entered on the system would have to
|
|
be exported from the system in a separate batched newsgroup file.
|
|
However, at this writing no FidoNet-compatible BBS or message editor is
|
|
capable of directly reading a batched newsgroup file.
|
|
|
|
The second most preferable method is to convert news messages directly
|
|
to the message format used by that system. At this writing the
|
|
DoveMail software includes utilities (NewsToss and NewsScan) that can
|
|
convert batched newsgroup files to and from messages in the *.msg
|
|
(single message per file) format. It should be possible to convert
|
|
batched newsgroup files to and from other FidoNet message formats as
|
|
well.
|
|
|
|
The method in which messages are stored on a BBS, and the method in
|
|
which it is determined which new (locally-entered) messages need to be
|
|
exported from the system will necessarily be implementation-specific.
|
|
One method that can be used with *.msg type message bases is to
|
|
maintain a "high water mark" in 1.msg, similar to the "high water mark"
|
|
used for echomail messages, and additionally to mark messages received
|
|
from other nodes as "sent" when they arrive, and locally-entered
|
|
messages as "sent" when they have been exported, and to never re-send a
|
|
message marked as "sent".
|
|
|
|
When tossing incoming messages, duplicate messages can be detected by
|
|
comparing the contents of the "Message-ID:" line with those of
|
|
previously received messages. This may slow processing
|
|
considerably, however, and would require storage of a history file of
|
|
"previously seen" messages. Another method is to look in the "Path"
|
|
line and see if we are already listed in the path; if so, the message
|
|
is a duplicate and should be deleted. This method is faster and does
|
|
not require maintenance of a history file, but will not guard against
|
|
duplicate messages arriving from one's feed that have not passed
|
|
through the system twice (for example, a message that arrived from two
|
|
different paths). Fortunately, UseNet folks seem to understand the
|
|
need for proper topology, so those types of dupes are relatively rare.
|
|
FidoNet sysops taking UseNet feeds must understand that it is
|
|
IMPERATIVE that a feed of any one newsgroup be obtained from only ONE
|
|
source, especially if they are then passing that newsgroup to any
|
|
"downstream" nodes. This absolutely does NOT imply that geographic
|
|
restrictions on newsgroup distribution are necessary or desirable!
|
|
|
|
Additional comments on preventing "loops" can be found in section 5 of
|
|
RFC-1036, in the discussion of the News Propagation Algorithm. Please
|
|
note that only two methods of loop prevention are included in this
|
|
standard:
|
|
|
|
1) The history mechanism. Each host keeps track of all messages it has
|
|
seen (by their Message-ID) and whenever a message comes in that it has
|
|
already seen, the incoming message is discarded immediately.
|
|
|
|
2) Not sending a message to a system listed in the "Path" line of the
|
|
header, or to the system that originated the message (which, in
|
|
practice, should be listed in the Path line).
|
|
|
|
No other methods of dupe loop prevention are acceptable. In
|
|
particular, checksums of portions of the message header or message
|
|
itself are NOT permitted to be used for loop prevention, except perhaps
|
|
as a method to quickly identify POTENTIAL duplicate messages before
|
|
doing a full string comparison with the Message-ID data in the history
|
|
file. In no case should a checksum be used as the SOLE method of
|
|
determining whether a message is a duplicate.
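
The two permitted methods of loop prevention can be combined as in the
following Python sketch. The class and method names are hypothetical,
the history is held in memory rather than in a history file, and the
Path value is split on the "!" character only, which is a
simplification (RFC-1036 permits other punctuation as separators).

```python
def path_hosts(path_value: str) -> list[str]:
    """Split the value of a Path line into its '!'-separated names."""
    return [name for name in path_value.strip().split("!") if name]


class DupeFilter:
    """Loop prevention by Message-ID history and by Path inspection."""

    def __init__(self, my_name: str):
        self.my_name = my_name
        self.seen = set()  # Message-IDs of messages already received

    def is_duplicate(self, message_id: str, path_value: str) -> bool:
        # Method 1: the history mechanism, a full string comparison.
        if message_id in self.seen:
            return True
        # Faster check: our own name already appears in the Path.
        if self.my_name in path_hosts(path_value):
            return True
        self.seen.add(message_id)
        return False

    def may_send_to(self, host: str, path_value: str) -> bool:
        # Method 2: never send to a system listed in the Path line.
        return host not in path_hosts(path_value)
```

A checksum could be placed in front of the set lookup as a quick
pre-filter, but the decision itself rests on the Message-ID comparison.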
|
|
|
|
When newsgroup messages are created for transmission to other systems,
|
|
or when received messages are transmitted to other systems, the individual
|
|
messages must conform to the specifications of section 2 of RFC-1036.
|
|
However, in order to simplify programming of software designed to handle
|
|
such messages, the following modifications to the standard are proposed
|
|
for use within FidoNet. Please note that these are slightly more
|
|
restrictive than the standard permitted by RFC-1036:
|
|
|
|
a. The "old format" or "A format" described in section 2 of RFC-1036 is
|
|
NOT supported in FidoNet. Only the format detailed in RFC-1036
|
|
(sometimes referred to as the "B" News format) is supported. The vast
|
|
majority of UseNet sites currently use the "B" News format.
|
|
|
|
b. The UseNet standard permits the use of "white space" to separate
|
|
certain items in the message header, with "white space" defined as
|
|
blanks or tabs. It also states that "the Internet convention of
|
|
continuation header lines (beginning with a blank or tab) is allowed."
|
|
However, it should NOT be ASSUMED that "continuation header lines" will
|
|
be used in any message. It is suggested that when creating newsgroup
|
|
messages for transmission to other systems, the use of tab characters
|
|
be avoided in header lines, and that "continuation header lines" NOT be
|
|
used, even if this means that a header line will be considerably longer
|
|
than the length of a screen line. Software that creates FidoNet-format
|
|
messages (for display to BBS callers) from batched newsgroup files
|
|
(that is, newsgroup message tossers) should break up such extra-long
|
|
header lines, using a single space character ONLY (NOT a tab!) at the
|
|
start of "continuation header lines." Since batched newsgroup files
|
|
received from a UseNet site may contain "continuation header lines"
|
|
and/or tabs as "white space" in header lines, it is necessary to be
|
|
able to decode such header lines properly, but it is strongly suggested
|
|
that FidoNet software not CREATE messages with tabs or "continuation
|
|
header lines" for transmission through the network.
|
|
|
|
c. All lines in news messages, including header lines, shall be
|
|
terminated with a LINE FEED (ASCII 10 decimal) ONLY. Under NO
|
|
circumstances shall a CARRIAGE RETURN (ASCII 13 decimal) appear in news
|
|
messages transmitted through FidoNet (if a Carriage Return is found in
|
|
an in-transit message it MAY be changed to a Line Feed, this being the
|
|
sole exception to the rule about not changing the body of a message,
|
|
but the expectation is that no Carriage Returns will appear in a news
|
|
message). Also, spaces appearing at the end of lines (just prior to
|
|
the Line Feed character) are strongly discouraged since they convey no
|
|
useful information. Finally, there should be only a single line feed
|
|
at the end of each message (blank lines following the last line of a
|
|
message are not allowed, again because they convey no useful
|
|
information). Please note that the use of the Line Feed as a line
|
|
terminator is fairly standard throughout UseNet, and when a news
|
|
message is converted to a FidoNet format message it is a simple matter
|
|
to replace Line Feeds with Carriage Returns so that the message will
|
|
display properly.
|
|
|
|
d. When constructing or adding to "Path" lines, RFC-1036 (section
|
|
2.1.6) states that "The names may be separated by any punctuation
|
|
character or characters (except '.' which is considered part of the
|
|
hostname)." However, in actual practice, only the "!" (exclamation
|
|
point or "bang" character) is commonly used to separate names.
|
|
Therefore, the "!" character will be considered the "standard"
|
|
separator for system names in Path lines in messages generated in
|
|
FidoNet. Also, RFC-1036 states that "Normally, the rightmost name will
|
|
be the name of the originating system. However, it is also permissible
|
|
to include an extra entry on the right, which is the name of the
|
|
sender. This is for upward compatibility with older systems." In
|
|
actual practice, it appears that most Path lines originating in UseNet
|
|
have a user name as the rightmost entry. Therefore, when a Path line
|
|
is created for a message originating in FidoNet, it is suggested that
|
|
the following format be used (assuming a message entered by user John
|
|
Smith at node 1:123/456):
|
|
|
|
Path: f456.n123.z1.fidonet.org!john.smith
|
|
|
|
When a user name is placed in the path, all spaces in the user name
|
|
must be replaced with periods, and all uppercase characters in the name
|
|
should be converted to lowercase. It is permissible to use an alias in
|
|
place of a user's real name if the originating system runs software
|
|
that will recognize that alias in incoming netmail messages, and remap
|
|
such messages to the proper user if necessary. Also, note the
|
|
restrictions on prohibited characters in the user name as specified in
|
|
RFC-1036 section 2.1.1. Although section 2.1.1 deals with the "From"
|
|
line, common sense would indicate that these same restrictions on
|
|
prohibited characters should apply if the user name is placed in the
|
|
Path line (with the obvious exception of the use of the period to
|
|
replace spaces in the user name, which is required).
|
|
|
|
e. Header lines defined as "optional" may be more or less optional
|
|
depending on the keyword. For example, the "Reply-To" and
|
|
"Followup-To" lines should be automatically honored, if at all
|
|
possible, when reply messages are created, and the "References" line,
|
|
even though listed as an "optional" line, is "required for all
|
|
follow-up messages" (replies). On the other hand, lines such as
|
|
"Control" and "Distribution" may have little meaning to FidoNet nodes
|
|
(in particular, "Distribution" is meant to control distribution of a
|
|
message along hierarchical lines, but since FidoNet topology has little
|
|
relation to UseNet hierarchies, it is probably best to just ignore
|
|
"Distribution" lines on in-transit messages).
|
|
|
|
Additional specifications for messages, including required and optional
|
|
header lines, are detailed in section 2 of RFC-1036.
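
The header conventions of items b and d above can be illustrated with
the following Python sketch. It decodes "continuation header lines" and
tabs received from a UseNet site, and builds a Path line for a message
originating in FidoNet. The function names are hypothetical, and the
replacement of every tab by a single space is an assumption made for
brevity.

```python
def unfold_headers(header_block: str) -> list[str]:
    """Join continuation lines and replace tabs in header lines."""
    lines = []
    for raw in header_block.split("\n"):
        if not raw:
            continue
        if raw[0] in " \t" and lines:
            # A leading blank or tab continues the previous header.
            lines[-1] += " " + raw.strip()
        else:
            lines.append(raw.replace("\t", " ").rstrip())
    return lines


def fidonet_path_line(zone: int, net: int, node: int,
                      user_name: str) -> str:
    """Build the Path line for a message entered on a FidoNet node."""
    # Spaces become periods, and uppercase becomes lowercase.
    user = ".".join(user_name.lower().split())
    return f"Path: f{node}.n{net}.z{zone}.fidonet.org!{user}"
```

Under these assumptions, fidonet_path_line(1, 123, 456, "John Smith")
yields the Path line shown in item d. The sketch does not check the
user name for the prohibited characters of RFC-1036 section 2.1.1.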
|
|
|
|
When a newsgroup is moderated, it is the responsibility of the sysop of
|
|
each participating BBS to prevent users from entering messages in that
|
|
area (unless the message exporting software is capable of sending any
|
|
locally-entered messages to the conference moderator via MAIL).
|
|
However, if a software newsgroup processor is written that both imports
|
|
(tosses) messages to a FidoNet-format message base, and exports locally
|
|
entered messages, and if the software does not have a way to send
|
|
replies to the moderator via mail, then some mechanism must be provided
|
|
to prevent the export of messages from a moderated area, so that in the
|
|
unlikely event that there is no easy way to prevent users from posting
|
|
messages in the moderated area, such messages will still not be sent
|
|
out. Since this standard does not deal with the transport of UseNet
|
|
MAIL within FidoNet, the method for transmission of replies in
|
|
moderated newsgroups is undefined by this document. However, software
|
|
authors are encouraged to provide some mechanism for private mail
|
|
replies to newsgroup messages, in both moderated and unmoderated areas.
|
|
|
|
Note that if a moderated newsgroup is carried on a system, it is the
|
|
responsibility of the sysop to provide mail access to users so that
|
|
replies can be (manually) sent to the conference moderator, especially
|
|
if replies in the newsgroup area cannot be automatically routed to the
|
|
conference moderator.
|
|
|
|
One point that needs to be emphasized is that there is NO message length
|
|
limit on UseNet messages. If a FidoNet node passes newsgroup messages
|
|
to, or on behalf of, other FidoNet nodes, it is NOT permissible to
|
|
discard or truncate messages that exceed a preset length limit. Note
|
|
that in a batched newsgroup file, each message is preceded by a header
|
|
of the form "#! rnews &lt;length in bytes&gt;". Since the message text
|
|
length is never changed in processing, it is possible to determine the
|
|
length of a message after processing by reading in all the header
|
|
lines, calculating the combined length of the header lines prior to
|
|
making changes in the header (e.g. the Path line), then calculating the
|
|
combined length of the header lines after making changes. The
|
|
difference between the original and the new length of the header lines
|
|
can then be applied to the value given in the "#! rnews" line to
|
|
determine the new message length, which is then used in the "#! rnews"
|
|
header of the modified message. Also, the number of bytes given in the
|
|
"#! rnews" line, MINUS the length of the message header lines, is the
|
|
length of the body of the message. Once this length is known, the body
|
|
of the message can be copied from the input file to the output file(s)
|
|
in "chunks" small enough to fit in memory, until the end of the message
|
|
is reached.
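
The length arithmetic described above can be sketched in Python as
follows. The function names are hypothetical, and the sketch assumes
that the "header lines" are measured up to and including the blank
line that separates them from the body.

```python
from typing import BinaryIO


def new_rnews_count(old_count: int, old_headers: bytes,
                    new_headers: bytes) -> int:
    """Apply the change in header length to the '#! rnews' value."""
    return old_count + (len(new_headers) - len(old_headers))


def copy_body(src: BinaryIO, dst: BinaryIO, body_length: int,
              chunk_size: int = 4096) -> None:
    """Copy exactly body_length bytes in chunks that fit in memory."""
    remaining = body_length
    while remaining != 0:
        chunk = src.read(min(chunk_size, remaining))
        if not chunk:
            raise EOFError("input ended before the end of the message")
        dst.write(chunk)
        remaining -= len(chunk)
```

Here body_length is the value from the original "#! rnews" line minus
the length of the original header lines, so the body is never held in
memory as a whole and is never altered.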
|
|
|
|
The following comments are implementation suggestions applicable to
|
|
current FidoNet-compatible BBS systems, though not necessarily to
|
|
software that may be written in the future:
|
|
|
|
It should be noted that when a BBS user enters a reply message, most
|
|
FidoNet BBS software will "link" the reply message to the original by
|
|
placing the message number of the original message in the message
|
|
header (this is almost always the case if messages are stored in the
|
|
"*.msg" format, in which case the number of the message being replied
|
|
to is found at bytes 185-186 in the message header). If the
|
|
appropriate header lines have been stored in the text of the original
|
|
message, it is possible to construct a reply message that meets all
|
|
RFC-1036 specifications. For example, a "References" line can be
|
|
constructed from the "Message-ID" line (and the "References" line, if
|
|
any) of the original message. Similarly, if the original message
|
|
contains a "Followup-To:" line, the reply can be posted to the
|
|
newsgroup(s) specified in that line. This may not work as expected if
|
|
a message renumbering program or similar program messes with the
|
|
message base before the reply message is exported, so it is highly
|
|
recommended that locally-entered newsgroup messages be exported as soon
|
|
as practicable after they are entered.
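
As a sketch of the reply handling described above, the following
Python functions build the "References" and "Newsgroups" lines of a
reply from the header lines stored with the original message. The
function names are hypothetical, and the sketch assumes that the
values have already been extracted from the original message text.

```python
def build_references(orig_message_id: str,
                     orig_references: str = "") -> str:
    """Append the original Message-ID to the original References."""
    ids = orig_references.split() + [orig_message_id.strip()]
    return "References: " + " ".join(ids)


def reply_newsgroups(orig_newsgroups: str,
                     orig_followup_to: str = "") -> str:
    """Honor Followup-To when present, else reuse Newsgroups."""
    target = orig_followup_to.strip() or orig_newsgroups.strip()
    return "Newsgroups: " + target
```

If the original message carried no "References" line, the reply simply
lists the Message-ID of the original.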
|
|
|
|
Since the user of a BBS may reply to a message entered by another user
|
|
of the same BBS, it is recommended that when a message is exported, any
|
|
UseNet format header lines created for the exported message also be
|
|
written back to the original message if possible. This will permit
|
|
reply linking to remain intact even if two or more users of the same
|
|
BBS participate in the same message thread.
|
|
|
|
If a message is received that specifies more than one newsgroup in the
|
|
"Newsgroups" header line, and corresponding message areas are available
|
|
on the local system, one copy of the message should be placed in each
|
|
such area. For example, if the message is posted to four different
|
|
newsgroups, and two of those groups are carried on the local BBS, then
|
|
a copy of the message should be placed in the message base for each of
|
|
those groups. If users of a BBS are allowed to post a message to
|
|
multiple newsgroups, then any message thus posted should be copied to
|
|
the message bases of any of the other areas that are also carried on
|
|
that system (and that the message was posted to) at the time the
|
|
message is exported.
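
The cross-posting rule above can be sketched as follows. This is only
an illustration: the mapping from newsgroup names to local message
areas (carried_areas) and the function name are assumptions, not part
of this proposal.

```python
def local_areas_for(newsgroups_line, carried_areas):
    """Return the local areas that should each receive one copy."""
    groups = [g.strip() for g in newsgroups_line.split(",") if g.strip()]
    # Keep the order of the "Newsgroups" line; skip groups not carried here.
    return [carried_areas[g] for g in groups if g in carried_areas]
```

For a message posted to four newsgroups of which two are carried
locally, the function returns the two corresponding area names.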
|
|
|
|
Corrections and Additions to this document:
|
|
|
|
Proposed corrections and additions to this document should be submitted
|
|
to Jack Decker at 1:154/8, or jack.decker@f8.n154.z1.fidonet.org
|
|
</PRE>
|
|
|
|
<HR>
|
|
|
|
<PRE>
|
|
Network Working Group M. Horton
|
|
Request for Comments: 1036 AT&T Bell Laboratories
|
|
Obsoletes: RFC-850 R. Adams
|
|
Center for Seismic Studies
|
|
December 1987
|
|
|
|
|
|
Standard for Interchange of USENET Messages
|
|
|
|
|
|
|
|
STATUS OF THIS MEMO
|
|
|
|
This document defines the standard format for the interchange of
|
|
network News messages among USENET hosts. It updates and replaces
|
|
RFC-850, reflecting version B2.11 of the News program. This memo is
|
|
distributed as an RFC to make this information easily accessible to
|
|
the Internet community. It does not specify an Internet standard.
|
|
Distribution of this memo is unlimited.
|
|
|
|
1. Introduction
|
|
|
|
This document defines the standard format for the interchange of
|
|
network News messages among USENET hosts. It describes the format
|
|
for messages themselves and gives partial standards for transmission
|
|
of news. The news transmission is not entirely specified in order to give a
|
|
good deal of flexibility to the hosts to choose transmission
|
|
hardware and software, to batch news, and so on.
|
|
|
|
There are five sections to this document. Section two defines the
|
|
format. Section three defines the valid control messages. Section
|
|
four specifies some valid transmission methods. Section five
|
|
describes the overall news propagation algorithm.
|
|
|
|
2. Message Format
|
|
|
|
The primary consideration in choosing a message format is that it
|
|
fit in with existing tools as well as possible. Existing tools
|
|
include implementations of both mail and news. (The notesfiles
|
|
system from the University of Illinois is considered a news
|
|
implementation.) A standard format for mail messages has existed
|
|
for many years on the Internet, and this format meets most of the
|
|
needs of USENET. Since the Internet format is extensible,
|
|
extensions to meet the additional needs of USENET are easily made
|
|
within the Internet standard. Therefore, the rule is adopted that
|
|
all USENET news messages must be formatted as valid Internet mail
|
|
messages, according to the Internet standard RFC-822. The USENET
|
|
News standard is more restrictive than the Internet standard,
|
|
|
|
|
|
|
|
Horton & Adams [Page 1]
|
|
|
|
RFC 1036 Standard for USENET Messages December 1987
|
|
|
|
|
|
placing additional requirements on each message and forbidding use
|
|
of certain Internet features. However, it should always be possible
|
|
to use a tool expecting an Internet message to process a news
|
|
message. In any situation where this standard conflicts with the
|
|
Internet standard, RFC-822 should be considered correct and this
|
|
standard in error.
|
|
|
|
Here is an example USENET message to illustrate the fields.
|
|
|
|
From: jerry@eagle.ATT.COM (Jerry Schwarz)
|
|
Path: cbosgd!mhuxj!mhuxt!eagle!jerry
|
|
Newsgroups: news.announce
|
|
Subject: Usenet Etiquette -- Please Read
|
|
Message-ID: <642@eagle.ATT.COM>
|
|
Date: Fri, 19 Nov 82 16:14:55 GMT
|
|
Followup-To: news.misc
|
|
Expires: Sat, 1 Jan 83 00:00:00 -0500
|
|
Organization: AT&T Bell Laboratories, Murray Hill
|
|
|
|
The body of the message comes here, after a blank line.
|
|
|
|
Here is an example of a message in the old format (before the
|
|
existence of this standard). It is recommended that
|
|
implementations also accept messages in this format to ease upward
|
|
conversion.
|
|
|
|
From: cbosgd!mhuxj!mhuxt!eagle!jerry (Jerry Schwarz)
|
|
Newsgroups: news.misc
|
|
Title: Usenet Etiquette -- Please Read
|
|
Article-I.D.: eagle.642
|
|
Posted: Fri Nov 19 16:14:55 1982
|
|
Received: Fri Nov 19 16:59:30 1982
|
|
Expires: Mon Jan 1 00:00:00 1990
|
|
|
|
The body of the message comes here, after a blank line.
|
|
|
|
Some news systems transmit news in the A format, which looks like
|
|
this:
|
|
|
|
Aeagle.642
|
|
news.misc
|
|
cbosgd!mhuxj!mhuxt!eagle!jerry
|
|
Fri Nov 19 16:14:55 1982
|
|
Usenet Etiquette - Please Read
|
|
The body of the message comes here, with no blank line.
|
|
|
|
A standard USENET message consists of several header lines, followed
|
|
by a blank line, followed by the body of the message. Each header
|
|
line consists of a keyword, a colon, a blank, and some additional
|
|
information. This is a subset of the Internet standard, simplified
|
|
to allow simpler software to handle it. The "From" line may
|
|
optionally include a full name, in the format above, or use the
|
|
Internet angle bracket syntax. To keep the implementations simple,
|
|
other formats (for example, with part of the machine address after
|
|
the close parenthesis) are not allowed. The Internet convention of
|
|
continuation header lines (beginning with a blank or tab) is
|
|
allowed.
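
A minimal sketch of reading this header layout, including continuation
lines, might look like the following. The function name is
hypothetical, and repeated headers simply overwrite earlier ones here,
which a real implementation may not want.

```python
def parse_headers(text):
    """Split a message into (headers, body); headers maps keyword -> value."""
    head, _, body = text.partition("\n\n")
    headers = {}
    last = None
    for line in head.split("\n"):
        if line[:1] in (" ", "\t") and last is not None:
            # Continuation line: append to the previous header's value.
            headers[last] += " " + line.strip()
        elif ":" in line:
            keyword, _, value = line.partition(":")
            last = keyword.strip()
            headers[last] = value.strip()
    return headers, body
```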
|
|
|
|
Certain headers are required, and certain other headers are
|
|
optional. Any unrecognized headers are allowed, and will be passed
|
|
through unchanged. The required header lines are "From", "Date",
|
|
"Newsgroups", "Subject", "Message-ID", and "Path". The optional
|
|
header lines are "Followup-To", "Expires", "Reply-To", "Sender",
|
|
"References", "Control", "Distribution", "Keywords", "Summary",
|
|
"Approved", "Lines", "Xref", and "Organization". Each of these
|
|
header lines will be described below.
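
The required-header rule can be checked mechanically. The sketch below
assumes the headers have already been parsed into a dictionary keyed by
header name; the names REQUIRED and missing_required are illustrative.

```python
REQUIRED = ("From", "Date", "Newsgroups", "Subject", "Message-ID", "Path")


def missing_required(headers):
    """Return the required header names absent from a parsed message."""
    # Header keywords are compared without regard to case.
    present = {name.lower() for name in headers}
    return [name for name in REQUIRED if name.lower() not in present]
```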
|
|
|
|
2.1. Required Header lines
|
|
|
|
2.1.1. From
|
|
|
|
The "From" line contains the electronic mailing address of the
|
|
person who sent the message, in the Internet syntax. It may
|
|
optionally also contain the full name of the person, in parentheses,
|
|
after the electronic address. The electronic address is the same as
|
|
the entity responsible for originating the message, unless the
|
|
"Sender" header is present, in which case the "From" header might
|
|
not be verified. Note that in all host and domain names, upper and
|
|
lower case are considered the same, thus "mark@cbosgd.ATT.COM",
|
|
"mark@cbosgd.att.com", and "mark@CBosgD.ATt.COm" are all equivalent.
|
|
User names may or may not be case sensitive, for example,
|
|
"Billy@cbosgd.ATT.COM" might be different from
|
|
"BillY@cbosgd.ATT.COM". Programs should avoid changing the case of
|
|
electronic addresses when forwarding news or mail.
|
|
|
|
RFC-822 specifies that all text in parentheses is to be interpreted
|
|
as a comment. It is common in Internet mail to place the full name
|
|
of the user in a comment at the end of the "From" line. This
|
|
standard specifies a more rigid syntax. The full name is not
|
|
considered a comment, but an optional part of the header line.
|
|
Either the full name is omitted, or it appears in parentheses after
|
|
the electronic address of the person posting the message, or it
|
|
appears before an electronic address which is enclosed in angle
|
|
brackets. Thus, the three permissible forms are:
|
|
|
|
From: mark@cbosgd.ATT.COM
|
|
From: mark@cbosgd.ATT.COM (Mark Horton)
|
|
From: Mark Horton <mark@cbosgd.ATT.COM>
|
|
|
|
Full names may contain any printing ASCII characters from space
|
|
through tilde, except that they may not contain "(" (left
|
|
parenthesis), ")" (right parenthesis), "<" (left angle bracket), or
|
|
">" (right angle bracket). Additional restrictions may be placed on
|
|
full names by the mail standard, in particular, the characters ","
|
|
(comma), ":" (colon), "@" (at), "!" (bang), "/" (slash), "="
|
|
(equal), and ";" (semicolon) are inadvisable in full names.
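
The three permitted "From" forms can be told apart with two patterns
and a fallback. This is a sketch under the syntax stated above, not a
complete RFC-822 address parser; all names are hypothetical.

```python
import re

_ADDR_THEN_NAME = re.compile(r"^(\S+)\s*\(([^()<>]*)\)$")
_NAME_THEN_ADDR = re.compile(r"^([^()<>]*?)\s*<([^<>\s]+)>$")


def parse_from(value):
    """Return (address, full_name); full_name is None when omitted."""
    value = value.strip()
    m = _ADDR_THEN_NAME.match(value)
    if m:
        return m.group(1), m.group(2).strip() or None
    m = _NAME_THEN_ADDR.match(value)
    if m:
        return m.group(2), m.group(1).strip() or None
    if value and not re.search(r"[\s()<>]", value):
        return value, None  # bare address, no full name
    raise ValueError("not one of the three permitted forms")
```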
|
|
|
|
2.1.2. Date
|
|
|
|
The "Date" line (formerly "Posted") is the date that the message was
|
|
originally posted to the network. Its format must be acceptable
|
|
both in RFC-822 and to the getdate(3) routine that is provided with
|
|
the Usenet software. This date remains unchanged as the message is
|
|
propagated throughout the network. One format that is acceptable to
|
|
both is:
|
|
|
|
Wdy, DD Mon YY HH:MM:SS TIMEZONE
|
|
|
|
Several examples of valid dates appear in the sample message above.
|
|
Note in particular that ctime(3) format:
|
|
|
|
Wdy Mon DD HH:MM:SS YYYY
|
|
|
|
is not acceptable because it is not a valid RFC-822 date. However,
|
|
since older software still generates this format, news
|
|
implementations are encouraged to accept this format and translate
|
|
it into an acceptable format.
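
One way to translate the old ctime(3) layout into the acceptable layout
is sketched below. It assumes the old timestamp is already in GMT and
that English day and month names are in effect; the function name is
hypothetical.

```python
from datetime import datetime


def ctime_to_rfc822(stamp):
    """Convert 'Wdy Mon DD HH:MM:SS YYYY' to 'Wdy, DD Mon YY HH:MM:SS GMT'."""
    parsed = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
    # Assumption: the old-format timestamp is already in GMT.
    return parsed.strftime("%a, %d %b %y %H:%M:%S GMT")
```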
|
|
|
|
There is no hope of having a complete list of timezones. Universal
|
|
Time (GMT), the North American timezones (PST, PDT, MST, MDT, CST,
|
|
CDT, EST, EDT) and the +/-hhmm offset specified in RFC-822 should be
|
|
supported. It is recommended that times in message headers be
|
|
transmitted in GMT and displayed in the local time zone.
|
|
|
|
2.1.3. Newsgroups
|
|
|
|
The "Newsgroups" line specifies the newsgroup or newsgroups in which
|
|
the message belongs. Multiple newsgroups may be specified,
|
|
separated by a comma. Newsgroups specified must all be the names of
|
|
existing newsgroups, as no new newsgroups will be created by simply
|
|
posting to them.
|
|
|
|
Wildcards (e.g., the word "all") are never allowed in a
|
|
"Newsgroups" line. For example, a newsgroup comp.all is illegal,
|
|
although a newsgroup rec.sport.football is permitted.
|
|
|
|
If a message is received with a "Newsgroups" line listing some valid
|
|
newsgroups and some invalid newsgroups, a host should not remove
|
|
invalid newsgroups from the list. Instead, the invalid newsgroups
|
|
should be ignored. For example, suppose host A subscribes to the
|
|
classes btl.all and comp.all, and exchanges news messages with host
|
|
B, which subscribes to comp.all but not btl.all. Suppose A receives
|
|
a message with Newsgroups: comp.unix,btl.general.
|
|
|
|
This message is passed on to B because B receives comp.unix, but B
|
|
does not receive btl.general. A must leave the "Newsgroups" line
|
|
unchanged. If it were to remove btl.general, the edited header
|
|
could eventually re-enter the btl.all class, resulting in a message
|
|
that is not shown to users subscribing to btl.general. Also,
|
|
follow-ups from outside btl.all would not be shown to such users.
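
The forwarding decision in this example can be sketched as below. The
pattern matching is deliberately simplified (a pattern such as
"comp.all" is taken to cover every group whose name starts with
"comp."), and both function names are hypothetical.

```python
def matches(group, pattern):
    """True if a newsgroup name falls under a subscription pattern."""
    if pattern.endswith(".all"):
        # Simplification: "comp.all" covers every group under "comp.".
        return group.startswith(pattern[:-3])
    return group == pattern


def should_forward(newsgroups_line, neighbor_patterns):
    """Decide whether a neighbor receives the message."""
    groups = [g.strip() for g in newsgroups_line.split(",")]
    # Unsubscribed groups are ignored here, never removed from the line.
    return any(matches(g, p) for g in groups for p in neighbor_patterns)
```

Note that the function only reads the "Newsgroups" value; the line
itself is passed on unchanged.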
|
|
|
|
2.1.4. Subject
|
|
|
|
The "Subject" line (formerly "Title") tells what the message is
|
|
about. It should be suggestive enough of the contents of the
|
|
message to enable a reader to make a decision whether to read the
|
|
message based on the subject alone. If the message is submitted in
|
|
response to another message (e.g., is a follow-up) the default
|
|
subject should begin with the four characters "Re: ", and the
|
|
"References" line is required. For follow-ups, the use of the
|
|
"Summary" line is encouraged.
|
|
|
|
2.1.5. Message-ID
|
|
|
|
The "Message-ID" line gives the message a unique identifier. The
|
|
Message-ID may not be reused during the lifetime of any previous
|
|
message with the same Message-ID. (It is recommended that no
|
|
Message-ID be reused for at least two years.) Message-ID's have the
|
|
syntax:
|
|
|
|
<string not containing blank or ">">
|
|
|
|
In order to conform to RFC-822, the Message-ID must have the format:
|
|
|
|
<unique@full_domain_name>
|
|
|
|
where full_domain_name is the full name of the host at which the
|
|
message entered the network, including a domain that host is in, and
|
|
unique is any string of printing ASCII characters, not including "<"
|
|
(left angle bracket), ">" (right angle bracket), or "@" (at sign).
|
|
For example, the unique part could be an integer representing a
|
|
sequence number for messages submitted to the network, or a short
|
|
string derived from the date and time the message was created. For
|
|
example, a valid Message-ID for a message submitted from host ucbvax
|
|
in domain "Berkeley.EDU" would be "<4123@ucbvax.Berkeley.EDU>".
|
|
Programmers are urged not to make assumptions about the content of
|
|
Message-ID fields from other hosts, but to treat them as unknown
|
|
character strings. It is not safe, for example, to assume that a
|
|
Message-ID will be under 14 characters, that it is unique in the
|
|
first 14 characters, nor that it does not contain a "/".
|
|
|
|
The angle brackets are considered part of the Message-ID. Thus, in
|
|
references to the Message-ID, such as the ihave/sendme and cancel
|
|
control messages, the angle brackets are included. White space
|
|
characters (e.g., blank and tab) are not allowed in a Message-ID.
|
|
Slashes ("/") are strongly discouraged. All characters between the
|
|
angle brackets must be printing ASCII characters.
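
A syntax check along the lines of this section might look like the
sketch below. It tests only the shape of the identifier (angle
brackets, printing ASCII, a single "@"), not whether the domain exists;
the function name is hypothetical.

```python
def is_valid_message_id(message_id):
    """Check the bracketed unique@full_domain_name shape."""
    if len(message_id) < 5 or message_id[0] != "<" or message_id[-1] != ">":
        return False
    inner = message_id[1:-1]
    # Only printing ASCII (no blanks or tabs), and no extra angle brackets.
    if any(not ("!" <= ch <= "~") or ch in "<>" for ch in inner):
        return False
    unique, _, domain = inner.partition("@")
    return bool(unique) and bool(domain) and "@" not in domain
```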
|
|
|
|
2.1.6. Path
|
|
|
|
This line shows the path the message took to reach the current
|
|
system. When a system forwards the message, it should add its own
|
|
name to the list of systems in the "Path" line. The names may be
|
|
separated by any punctuation character or characters (except "."
|
|
which is considered part of the hostname). Thus, the following are
|
|
valid entries:
|
|
|
|
cbosgd!mhuxj!mhuxt
|
|
cbosgd, mhuxj, mhuxt
|
|
@cbosgd.ATT.COM,@mhuxj.ATT.COM,@mhuxt.ATT.COM
|
|
teklabs, zehntel, sri-unix@cca!decvax
|
|
|
|
(The latter path indicates a message that passed through decvax,
|
|
cca, sri-unix, zehntel, and teklabs, in that order.) Additional
|
|
names should be added from the left. For example, the most recently
|
|
added name in the fourth example was teklabs. Letters, digits,
|
|
periods and hyphens are considered part of host names; other
|
|
punctuation, including blanks, is considered a separator.
|
|
|
|
Normally, the rightmost name will be the name of the originating
|
|
system. However, it is also permissible to include an extra entry
|
|
on the right, which is the name of the sender. This is for upward
|
|
compatibility with older systems.
|
|
|
|
The "Path" line is not used for replies, and should not be taken as
|
|
a mailing address. It is intended to show the route the message
|
|
traveled to reach the local host. There are several uses for this
|
|
information. One is to monitor USENET routing for performance
|
|
reasons. Another is to establish a path to reach new hosts.
|
|
Perhaps the most important use is to cut down on redundant USENET
|
|
traffic by failing to forward a message to a host that is known to
|
|
have already received it. In particular, when host A sends a
|
|
message to host B, the "Path" line includes A, so that host B will
|
|
not immediately send the message back to host A. The name each host
|
|
uses to identify itself should be the same as the name by which its
|
|
neighbors know it, in order to make this optimization possible.
|
|
|
|
A host adds its own name to the front of a path when it receives a
|
|
message from another host. Thus, if a message with path "A!X!Y!Z"
|
|
is passed from host A to host B, B will add its own name to the path
|
|
when it receives the message from A, e.g., "B!A!X!Y!Z". If B then
|
|
passes the message on to C, the message sent to C will contain the
|
|
path "B!A!X!Y!Z", and when C receives it, C will change it to
|
|
"C!B!A!X!Y!Z".
|
|
|
|
Special upward compatibility note: Since the "From", "Sender", and
|
|
"Reply-To" lines are in Internet format, and since many USENET hosts
|
|
do not yet have mailers capable of understanding Internet format, it
|
|
would break the reply capability to completely sever the connection
|
|
between the "Path" header and the reply function. It is recognized
|
|
that the path is not always a valid reply string in older
|
|
implementations, and no requirement to fix this problem is placed on
|
|
implementations. However, the existing convention of placing the
|
|
host name and an "!" at the front of the path, and of starting the
|
|
path with the host name, an "!", and the user name, should be
|
|
maintained when possible.
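
The "Path" handling described in this section can be sketched as
follows: splitting a path into names, prepending the receiving host,
and checking whether a neighbor has already seen the message. All
function names are hypothetical, and the rightmost entry may be a user
name rather than a host.

```python
import re

# Letters, digits, periods and hyphens belong to host names.
_HOST = re.compile(r"[A-Za-z0-9.-]+")


def hosts_in_path(path):
    """Return the names in a path, most recently added first."""
    return _HOST.findall(path)


def add_to_path(path, host):
    """Prepend the receiving host's name, as B does on receipt from A."""
    return host + "!" + path


def already_seen_by(path, host):
    """True if `host` appears in the path (case is not significant)."""
    return host.lower() in (h.lower() for h in hosts_in_path(path))
```

A host would typically call already_seen_by before queueing a message
for a neighbor, and skip the neighbor when it returns True.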
|
|
|
|
2.2. Optional Headers
|
|
|
|
2.2.1. Reply-To
|
|
|
|
This line has the same format as "From". If present, mailed replies
|
|
to the author should be sent to the name given here. Otherwise,
|
|
replies are mailed to the name on the "From" line. (This does not
|
|
prevent additional copies from being sent to recipients named by the
|
|
replier, or on "To" or "Cc" lines.) The full name may be optionally
|
|
given, in parentheses, as in the "From" line.
|
|
|
|
2.2.2. Sender
|
|
|
|
This field is present only if the submitter manually enters a "From"
|
|
line. It is intended to record the entity responsible for
|
|
submitting the message to the network. It should be verified by the
|
|
software at the submitting host.
|
|
|
|
For example, if John Smith is visiting CCA and wishes to post a
|
|
message to the network, using friend Sarah Jones' account, the
|
|
message might read:
|
|
|
|
From: smith@ucbvax.Berkeley.EDU (John Smith)
|
|
Sender: jones@cca.COM (Sarah Jones)
|
|
|
|
If a gateway program enters a mail message into the network at host
|
|
unix.SRI.COM, the lines might read:
|
|
|
|
From: John.Doe@A.CS.CMU.EDU
|
|
Sender: network@unix.SRI.COM
|
|
|
|
The primary purpose of this field is to be able to track down
|
|
messages to determine how they were entered into the network. The
|
|
full name may be optionally given, in parentheses, as in the "From"
|
|
line.
|
|
|
|
2.2.3. Followup-To
|
|
|
|
This line has the same format as "Newsgroups". If present, follow-
|
|
up messages are to be posted to the newsgroup or newsgroups listed
|
|
here. If this line is not present, follow-ups are posted to the
|
|
newsgroup or newsgroups listed in the "Newsgroups" line.
|
|
|
|
If the keyword poster is present, follow-up messages are not
|
|
permitted. The message should be mailed to the submitter of the
|
|
message via mail.
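
The choice of follow-up destination can be sketched as below, assuming
the headers are available as a dictionary. The function name and the
("mail" / "post") return convention are illustrative only.

```python
def followup_destination(headers):
    """Return ("mail", address) or ("post", [newsgroups])."""
    target = headers.get("Followup-To", "").strip()
    if target == "poster":
        # Follow-ups are not permitted; reply to the submitter by mail.
        address = headers.get("Reply-To") or headers["From"]
        return "mail", address
    line = target or headers["Newsgroups"]
    return "post", [g.strip() for g in line.split(",") if g.strip()]
```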
|
|
|
|
2.2.4. Expires
|
|
|
|
This line, if present, is in a legal USENET date format. It
|
|
specifies a suggested expiration date for the message. If not
|
|
present, the local default expiration date is used. This field is
|
|
intended to be used to clean up messages with a limited usefulness,
|
|
or to keep important messages around for longer than usual. For
|
|
example, a message announcing an upcoming seminar could have an
|
|
expiration date the day after the seminar, since the message is not
|
|
useful after the seminar is over. Since local hosts have local
|
|
policies for expiration of news (depending on available disk space,
|
|
for instance), users are discouraged from providing expiration dates
|
|
for messages unless there is a natural expiration date associated
|
|
with the topic. System software should almost never provide a
|
|
default "Expires" line. Leave it out and allow local policies to be
|
|
used unless there is a good reason not to.
|
|
|
|
2.2.5. References
|
|
|
|
This field lists the Message-ID's of any messages prompting the
|
|
submission of this message. It is required for all follow-up
|
|
messages, and forbidden when a new subject is raised.
|
|
Implementations should provide a follow-up command, which allows a
|
|
user to post a follow-up message. This command should generate a
|
|
"Subject" line which is the same as the original message, except
|
|
that if the original subject does not begin with "Re:" or "re:", the
|
|
four characters "Re: " are inserted before the subject. If there is
|
|
no "References" line on the original header, the "References" line
|
|
should contain the Message-ID of the original message (including the
|
|
angle brackets). If the original message does have a "References"
|
|
line, the follow-up message should have a "References" line
|
|
containing the text of the original "References" line, a blank, and
|
|
the Message-ID of the original message.
|
|
|
|
The purpose of the "References" header is to allow messages to be
|
|
grouped into conversations by the user interface program. This
|
|
allows conversations within a newsgroup to be kept together, and
|
|
potentially users might shut off entire conversations without
|
|
unsubscribing to a newsgroup. User interfaces need not make use of
|
|
this header, but all automatically generated follow-ups should
|
|
generate the "References" line for the benefit of systems that do
|
|
use it, and manually generated follow-ups (e.g., typed in well after
|
|
the original message has been printed by the machine) should be
|
|
encouraged to include them as well.
|
|
|
|
It is permissible to not include the entire previous "References"
|
|
line if it is too long. An attempt should be made to include a
|
|
reasonable number of backwards references.
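
The follow-up rules in this section can be sketched as a single helper.
The max_refs limit and the choice to keep the most recent entries when
trimming are assumptions; the text above only asks for a reasonable
number of backwards references.

```python
def followup_headers(original, max_refs=20):
    """Build "Subject" and "References" for a follow-up to `original`."""
    subject = original["Subject"]
    if not subject.startswith(("Re:", "re:")):
        subject = "Re: " + subject
    refs = original.get("References", "").split()
    refs.append(original["Message-ID"])  # angle brackets included
    # Assumption: when the list is too long, keep the most recent entries.
    return {"Subject": subject, "References": " ".join(refs[-max_refs:])}
```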
|
|
|
|
2.2.6. Control
|
|
|
|
If a message contains a "Control" line, the message is a control
|
|
message. Control messages are used for communication among USENET
|
|
host machines, not to be read by users. Control messages are
|
|
distributed by the same newsgroup mechanism as ordinary messages.
|
|
The body of the "Control" header line is the message to the host.
|
|
|
|
For upward compatibility, messages that match the newsgroup pattern
|
|
"all.all.ctl" should also be interpreted as control messages. If no
|
|
"Control" header is present on such messages, the subject is used as
|
|
the control message. However, messages on newsgroups matching this
|
|
pattern do not conform to this standard.
|
|
|
|
Also for upward compatibility, if the first 4 characters of the
|
|
"Subject:" line are "cmsg", the rest of the "Subject:" line should
|
|
be interpreted as a control message.
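
The three ways of recognizing a control message can be combined as in
the sketch below. The reading of the "all.all.ctl" pattern is a
simplification (three or more dot-separated words ending in "ctl"), and
the function names are hypothetical.

```python
def _is_ctl_group(name):
    # Simplified reading of the "all.all.ctl" pattern.
    words = name.strip().split(".")
    return len(words) >= 3 and words[-1] == "ctl"


def control_command(headers):
    """Return the control message text, or None for an ordinary message."""
    if "Control" in headers:
        return headers["Control"].strip()
    subject = headers.get("Subject", "")
    if subject.startswith("cmsg"):
        # Upward compatibility: the rest of the subject is the command.
        return subject[4:].strip()
    groups = headers.get("Newsgroups", "").split(",")
    if any(_is_ctl_group(g) for g in groups):
        # Old convention: with no "Control" header, the subject is used.
        return subject.strip()
    return None
```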
|
|
|
|
2.2.7. Distribution
|
|
|
|
This line is used to alter the distribution scope of the message.
|
|
It is a comma separated list similar to the "Newsgroups" line. User
|
|
subscriptions are still controlled by "Newsgroups", but the message
|
|
is sent to all systems subscribing to the newsgroups on the
|
|
"Distribution" line in addition to the "Newsgroups" line. For the
|
|
message to be transmitted, the receiving site must normally receive
|
|
one of the specified newsgroups AND must receive one of the
|
|
specified distributions. Thus, a message concerning a car for sale
|
|
in New Jersey might have headers including:
|
|
|
|
Newsgroups: rec.auto,misc.forsale
|
|
Distribution: nj,ny
|
|
|
|
so that it would only go to persons subscribing to rec.auto or
|
|
misc.forsale within New Jersey or New York. The intent of this header
|
|
is to restrict the distribution of a newsgroup further, not to
|
|
increase it. A local newsgroup, such as nj.crazy-eddie, will
|
|
probably not be propagated by hosts outside New Jersey that do not
|
|
show such a newsgroup as valid. A follow-up message should default
|
|
to the same "Distribution" line as the original message, but the
|
|
user can change it to a more limited one, or escalate the
|
|
distribution if it was originally restricted and a more widely
|
|
distributed reply is appropriate.
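
The "one newsgroup AND one distribution" test can be sketched as
follows. Treating a missing "Distribution" line as no restriction is an
assumption of this sketch, as are the function and parameter names.

```python
def should_transmit(headers, site_groups, site_distributions):
    """Apply the 'one newsgroup AND one distribution' rule."""
    groups = {g.strip() for g in headers["Newsgroups"].split(",")}
    wants_group = bool(groups & set(site_groups))
    dist_line = headers.get("Distribution")
    if dist_line is None:
        return wants_group  # assumption: no line means no restriction
    dists = {d.strip() for d in dist_line.split(",")}
    return wants_group and bool(dists & set(site_distributions))
```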
|
|
|
|
2.2.8. Organization
|
|
|
|
The text of this line is a short phrase describing the organization
|
|
to which the sender belongs, or to which the machine belongs. The
|
|
intent of this line is to help identify the person posting the
|
|
message, since host names are often cryptic enough to make it hard
|
|
to recognize the organization by the electronic address.
|
|
|
|
2.2.9. Keywords
|
|
|
|
A few well-selected keywords identifying the message should be on
|
|
this line. This is used as an aid in determining if this message is
|
|
interesting to the reader.
|
|
|
|
2.2.10. Summary
|
|
|
|
This line should contain a brief summary of the message. It is
|
|
usually used as part of a follow-up to another message. Again, it
|
|
is very useful to the reader in determining whether to read the
|
|
message.
|
|
|
|
2.2.11. Approved
|
|
|
|
This line is required for any message posted to a moderated
|
|
newsgroup. It should be added by the moderator and consist of his
|
|
mail address. It is also required with certain control messages.
|
|
|
|
2.2.12. Lines
|
|
|
|
This contains a count of the number of lines in the body of the
|
|
message.
|
|
|
|
2.2.13. Xref
|
|
|
|
This line contains the name of the host (with domains omitted) and a
|
|
white space separated list of colon-separated pairs of newsgroup
|
|
names and message numbers. These are the newsgroups listed in the
|
|
"Newsgroups" line and the corresponding message numbers from the
|
|
spool directory.
|
|
|
|
This is only of value to the local system, so it should not be
|
|
transmitted. For example, in:
|
|
|
|
Path: seismo!lll-crg!lll-lcc!pyramid!decwrl!reid
|
|
From: reid@decwrl.DEC.COM (Brian Reid)
|
|
Newsgroups: news.lists,news.groups
|
|
Subject: USENET READERSHIP SUMMARY REPORT FOR SEP 86
|
|
Message-ID: <5658@decwrl.DEC.COM>
|
|
Date: 1 Oct 86 11:26:15 GMT
|
|
Organization: DEC Western Research Laboratory
|
|
Lines: 441
|
|
Approved: reid@decwrl.UUCP
|
|
Xref: seismo news.lists:461 news.groups:6378
|
|
|
|
the "Xref" line shows that the message is message number 461 in the
|
|
newsgroup news.lists, and message number 6378 in the newsgroup
|
|
news.groups, on host seismo. This information may be used by
|
|
certain user interfaces.
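
A user interface that wants this information could split the "Xref"
value as in the sketch below; the function name is hypothetical.

```python
def parse_xref(value):
    """Return (host, {newsgroup: message_number}) from an "Xref" value."""
    host, *pairs = value.split()
    numbers = {}
    for pair in pairs:
        # Each pair is a newsgroup name, a colon, and a message number.
        group, _, number = pair.rpartition(":")
        numbers[group] = int(number)
    return host, numbers
```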
|
|
|
|
3. Control Messages
|
|
|
|
This section lists the control messages currently defined. The body
|
|
of the "Control" header line is the control message. Messages are a
|
|
sequence of zero or more words, separated by white space (blanks or
|
|
tabs). The first word is the name of the control message; remaining
|
|
words are parameters to the message. The remainder of the header
|
|
and the body of the message are also potential parameters; for
|
|
example, the "From" line might suggest an address to which a
|
|
response is to be mailed.
|
|
|
|
Implementors and administrators may choose to allow control messages
|
|
to be carried out automatically, or to queue them for manual
|
|
processing. However, manually processed messages should be dealt
|
|
with promptly.
|
|
|
|
Failed control messages should NOT be mailed to the originator of
|
|
the message, but to the local "usenet" account.
|
|
|
|
3.1. Cancel
|
|
|
|
cancel <Message-ID>
|
|
|
|
|
|
If a message with the given Message-ID is present on the local
|
|
system, the message is cancelled. This mechanism allows a user to
|
|
cancel a message after the message has been distributed over the
|
|
network.
|
|
|
|
If the system is unable to cancel the message as requested, it
|
|
should not forward the cancellation request to its neighbor systems.
|
|
|
|
Only the author of the message or the local news administrator is
|
|
allowed to send this message. The verified sender of a message is
|
|
the "Sender" line, or if no "Sender" line is present, the "From"
|
|
line. The verified sender of the cancel message must be the same as
|
|
either the "Sender" or "From" field of the original message. A
|
|
verified sender in the cancel message is allowed to match an
|
|
unverified "From" in the original message.
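
The sender check for cancel messages can be sketched as below. The
exception for the local news administrator is not modelled, addresses
are compared with host names folded to one case, and all function names
are hypothetical.

```python
import re


def address_of(value):
    """Extract the bare address from a "From"-style value."""
    bracketed = re.search(r"<([^<>\s]+)>", value)
    if bracketed:
        return bracketed.group(1)
    return value.split("(")[0].strip()


def same_address(a, b):
    # Host and domain names compare without regard to case; users may not.
    user_a, _, host_a = address_of(a).rpartition("@")
    user_b, _, host_b = address_of(b).rpartition("@")
    return user_a == user_b and host_a.lower() == host_b.lower()


def may_cancel(cancel_headers, original_headers):
    """True if the cancel's verified sender matches the original message."""
    verified = cancel_headers.get("Sender") or cancel_headers["From"]
    candidates = [original_headers.get("Sender"), original_headers.get("From")]
    return any(bool(c) and same_address(verified, c) for c in candidates)
```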
|
|
|
|
3.2. Ihave/Sendme
|
|
|
|
ihave <Message-ID list> [<remotesys>]
|
|
sendme <Message-ID list> [<remotesys>]
|
|
|
|
This message is part of the ihave/sendme protocol, which allows one
|
|
host (say A) to tell another host (B) that a particular message has
|
|
been received on A. Suppose that host A receives message
|
|
"<1234@ucbvax.Berkeley.edu>", and wishes to transmit the message to
|
|
host B.
|
|
|
|
A sends the control message "ihave <1234@ucbvax.Berkeley.edu> A" to
|
|
host B (by posting it to newsgroup to.B). B responds with the
|
|
control message "sendme <1234@ucbvax.Berkeley.edu> B" (on newsgroup
|
|
to.A), if it has not already received the message. Upon receiving
|
|
the sendme message, A sends the message to B.
|
|
|
|
This protocol can be used to cut down on redundant traffic between
|
|
hosts. It is optional and should be used only if the particular
|
|
situation makes it worthwhile. Frequently, the outcome is that,
|
|
since most original messages are short, and since there is a high
|
|
overhead to start sending a new message with UUCP, it costs as much
|
|
to send the ihave as it would cost to send the message itself.
|
|
|
|
One possible solution to this overhead problem is to batch requests.
|
|
Several Message-ID's may be announced or requested in one message.
|
|
If no Message-ID's are listed in the control message, the body of
|
|
the message should be scanned for Message-ID's, one per line.
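
The batching idea can be sketched from B's side: given the Message-IDs
announced in one ihave message, request only those not yet received.
The function and parameter names are hypothetical, and known_ids stands
for whatever history the local system keeps.

```python
def sendme_for(ihave_ids, known_ids, local_name):
    """Build a batched "sendme" control line, or None if nothing is needed."""
    wanted = [mid for mid in ihave_ids if mid not in known_ids]
    if not wanted:
        return None
    # Several Message-IDs may be requested in one control message.
    return "sendme " + " ".join(wanted) + " " + local_name
```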
|
|
|
|
3.3. Newgroup
|
|
|
|
newgroup <groupname> [moderated]
|
|
|
|
This control message creates a new newsgroup with the given name.
|
|
Since no messages may be posted or forwarded until a newsgroup is
|
|
created, this message is required before a newsgroup can be used.
|
|
The body of the message is expected to be a short paragraph
|
|
describing the intended use of the newsgroup.
|
|
|
|
If the second argument is present and it is the keyword moderated,
|
|
the group should be created moderated instead of the default of
|
|
unmoderated. The newgroup message should be ignored unless there is
|
|
an "Approved" line in the same message header.
|
|
|
|
3.4. Rmgroup
|
|
|
|
rmgroup <groupname>
|
|
|
|
This message removes a newsgroup with the given name. Since the
|
|
newsgroup is removed from every host on the network, this command
|
|
should be used carefully by a responsible administrator. The
|
|
rmgroup message should be ignored unless there is an "Approved:"
|
|
line in the same message header.
|
|
|
|
3.5. Sendsys
|
|
sendsys (no arguments)
|
|
|
|
The sys file, listing all neighbors and the newsgroups to be sent to
|
|
each neighbor, will be mailed to the author of the control message
|
|
("Reply-To", if present, otherwise "From"). This information is
|
|
considered public information, and it is a requirement of membership
|
|
in USENET that this information be provided on request, either
|
|
automatically in response to this control message, or manually, by
|
|
mailing the requested information to the author of the message.
|
|
This information is used to keep the map of USENET up to date, and
|
|
to determine where netnews is sent.
|
|
|
|
The format of the file mailed back to the author should be the same
|
|
as that of the sys file. This format has one line per neighboring
|
|
host (plus one line for the local host), containing four colon
|
|
separated fields. The first field has the host name of the
|
|
neighbor, the second field has a newsgroup pattern describing the
|
|
newsgroups sent to the neighbor. The third and fourth fields are
|
|
not defined by this standard. The sys file is not the same as the
|
|
UUCP L.sys file. A sample response is:
|
|
|
|
From: cbosgd!mark (Mark Horton)
Date: Sun, 27 Mar 83 20:39:37 -0500
Subject: response to your sendsys request
To: mark@cbosgd.ATT.COM

Responding-System: cbosgd.ATT.COM
cbosgd:osg,cb,btl,bell,world,comp,sci,rec,talk,misc,news,soc,to,test
ucbvax:world,comp,to.ucbvax:L:
cbosg:world,comp,bell,btl,cb,osg,to.cbosg:F:/usr/spool/outnews/cbosg
cbosgb:osg,to.cbosgb:F:/usr/spool/outnews/cbosgb
sescent:world,comp,bell,btl,cb,to.sescent:F:/usr/spool/outnews/sescent
npois:world,comp,bell,btl,ug,to.npois:F:/usr/spool/outnews/npois
mhuxi:world,comp,bell,btl,ug,to.mhuxi:F:/usr/spool/outnews/mhuxi
|
|
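
The four-field layout described above can be read with a few lines of
Python. This is a minimal sketch, not a definitive parser: the function
name parse_sys_line and the dictionary keys are invented for the
example, and it assumes each entry has been joined onto a single line.
Because the third and fourth fields are not defined by this standard,
they are returned uninterpreted.

```python
def parse_sys_line(line):
    """Split one sys-file entry into its four colon-separated fields."""
    # Split at most three times so that any colon inside the fourth
    # field is left alone.
    fields = line.strip().split(":", 3)
    # Entries such as the local host line may omit trailing fields.
    fields += [""] * (4 - len(fields))
    host, patterns, third, fourth = fields
    return {
        "host": host,
        "patterns": [p for p in patterns.split(",") if p],
        "third": third,
        "fourth": fourth,
    }
```

For the sample line "ucbvax:world,comp,to.ucbvax:L:" this yields the
host ucbvax, three patterns, a third field of "L" and an empty fourth
field.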
|
|
3.6. Version
|
|
|
|
version (no arguments)
|
|
|
|
The name and version of the software running on the local system is
|
|
to be mailed back to the author of the message ("Reply-To" if
|
|
present, otherwise "From").
|
|
|
|
3.7. Checkgroups
|
|
|
|
The message body is a list of "official" newsgroups and their
descriptions, one group per line. They are compared against the list
|
|
of active newsgroups on the current host. The names of any obsolete
|
|
or new newsgroups are mailed to the user "usenet" and descriptions
|
|
of the new newsgroups are added to the help file used when posting
|
|
news.
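
The comparison performed for a checkgroups message might be sketched in
Python as follows. The function name compare_groups is hypothetical, and
the sketch assumes that each body line holds a group name followed by
white space and a description, a layout the text above does not spell
out. Mailing the result to the user "usenet" and updating the help file
are left out.

```python
def compare_groups(body, active):
    """Compare an "official" group list with the local active list.

    Returns (obsolete, new, descriptions): groups active locally but
    not listed, groups listed but not active locally, and the
    descriptions of the new groups.
    """
    official = {}
    for line in body.splitlines():
        parts = line.split(None, 1)
        if not parts:
            continue  # skip blank lines
        description = parts[1].strip() if len(parts) == 2 else ""
        official[parts[0]] = description
    obsolete = sorted(set(active) - set(official))
    new = sorted(set(official) - set(active))
    descriptions = {group: official[group] for group in new}
    return obsolete, new, descriptions
```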
|
|
|
|
4. Transmission Methods
|
|
|
|
USENET is not a physical network, but rather a logical network
|
|
resting on top of several existing physical networks. These
|
|
networks include, but are not limited to, UUCP, the Internet, an
|
|
Ethernet, the BLICN network, an NSC Hyperchannel, and a BERKNET.
|
|
What is important is that two neighboring systems on USENET have
|
|
some method to get a new message, in the format listed here, from
|
|
one system to the other, and once on the receiving system, processed
|
|
by the netnews software on that system. (On UNIX systems, this
|
|
usually means the rnews program being run with the message on the
|
|
standard input. <1>)
|
|
|
|
It is not a requirement that USENET hosts have mail systems capable
|
|
of understanding the Internet mail syntax, but it is strongly
|
|
recommended. Since "From", "Reply-To", and "Sender" lines use the
|
|
Internet syntax, replies will be difficult or impossible without an
|
|
Internet mailer. A host without an Internet mailer can attempt to
|
|
use the "Path" header line for replies, but this field is not
|
|
guaranteed to be a working path for replies. In any event, any host
|
|
generating or forwarding news messages must have an Internet address
that allows it to receive mail from hosts with Internet mailers,
and it must include its Internet address on its From line.
|
|
|
|
4.1. Remote Execution
|
|
|
|
Some networks permit direct remote command execution. On these
|
|
networks, news may be forwarded by spooling the rnews command with
|
|
the message on the standard input. For example, if the remote
|
|
system is called remote, news would be sent over a UUCP link
|
|
with the command:
|
|
|
|
uux - remote!rnews
|
|
|
|
and on a Berknet:
|
|
|
|
net -mremote rnews
|
|
|
|
It is important that the message be sent via a reliable mechanism,
|
|
normally involving the possibility of spooling, rather than direct
|
|
real-time remote execution. This is because, if the remote system
|
|
is down, a direct execution command will fail, and the message will
|
|
never be delivered. If the message is spooled, it will eventually
|
|
be delivered when both systems are up.
|
|
|
|
4.2. Transfer by Mail
|
|
|
|
On some systems, direct remote spooled execution is not possible.
|
|
However, most systems support electronic mail, and a news message
|
|
can be sent as mail. One approach is to send a mail message which
|
|
is identical to the news message: the mail headers are the news
|
|
headers, and the mail body is the news body. By convention, this
|
|
mail is sent to the user newsmail on the remote machine.
|
|
|
|
One problem with this method is that it may not be possible to
|
|
convince the mail system that the "From" line of the message is
|
|
valid, since the mail message was generated by a program on a
|
|
system different from the source of the news message. Another
|
|
problem is that error messages caused by the mail transmission
|
|
would be sent to the originator of the news message, who has no
|
|
control over news transmission between two cooperating hosts
|
|
and does not know whom to contact. Transmission error messages
|
|
should be directed to a responsible contact person on the
|
|
sending machine.
|
|
|
|
A solution to this problem is to encapsulate the news message into a
|
|
mail message, such that the entire message (headers and body) is
|
|
part of the body of the mail message. The convention here is that
|
|
such mail is sent to user rnews on the remote system. A mail
|
|
message body is generated by prepending the letter N to each line of
|
|
the news message, and then attaching whatever mail headers are
|
|
convenient to generate. The N's are attached to prevent any special
|
|
lines in the news message from interfering with mail transmission,
|
|
and to prevent any extra lines inserted by the mailer (headers,
|
|
blank lines, etc.) from becoming part of the news message. A
|
|
program on the receiving machine receives mail to rnews, extracting
|
|
the message itself and invoking the rnews program. An example in
|
|
this format might look like this:
|
|
|
|
Date: Mon, 3 Jan 83 08:33:47 MST
From: news@cbosgd.ATT.COM
Subject: network news message
To: rnews@npois.ATT.COM

NPath: cbosgd!mhuxj!harpo!utah-cs!sask!derek
NFrom: derek@sask.UUCP (Derek Andrew)
NNewsgroups: misc.test
NSubject: necessary test
NMessage-ID: <176@sask.UUCP>
NDate: Mon, 3 Jan 83 00:59:15 MST
N
NThis really is a test. If anyone out there more than 6
Nhops away would kindly confirm this note I would
Nappreciate it. We suspect that our news postings
Nare not getting out into the world.
N
|
|
|
|
Using mail solves the spooling problem, since mail must always be
|
|
spooled if the destination host is down. However, it adds more
|
|
overhead to the transmission process (to encapsulate and extract the
|
|
message) and makes it harder for software to give different
|
|
priorities to news and mail.
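
The N-prefix encapsulation described in this section is simple enough to
sketch in Python. The names encapsulate and extract are invented for
this example, and generation of the surrounding mail headers is omitted.
The sketch makes no claim about how any particular rnews mail gateway is
written.

```python
def encapsulate(news_text):
    """Prefix every line of a news message with the letter N."""
    lines = news_text.splitlines(keepends=True)
    return "".join("N" + line for line in lines)


def extract(mail_body):
    """Recover the news message from an encapsulated mail body.

    Lines that do not begin with N are taken to have been inserted by
    a mailer and are dropped.
    """
    kept = []
    for line in mail_body.splitlines(keepends=True):
        if line.startswith("N"):
            kept.append(line[1:])
    return "".join(kept)
```

A line added by a mailer that happens to begin with N would survive
extraction, so this simple filter is only as reliable as that
assumption.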
|
|
|
|
4.3. Batching
|
|
|
|
Since news messages are usually short, and since a large number of
|
|
messages are often sent between two hosts in a day, it may make
|
|
sense to batch news messages. Several messages can be combined into
|
|
one large message, using conventions agreed upon in advance by the
|
|
two hosts. One such batching scheme is described here; its use is
|
|
highly recommended.
|
|
|
|
News messages are combined into a script, separated by a header of
|
|
the form:
|
|
|
|
|
|
#! rnews 1234
|
|
|
|
where 1234 is the length of the message in bytes. Each such line is
|
|
followed by a message containing the given number of bytes. (The
|
|
newline at the end of each line of the message is counted as one
|
|
byte, for purposes of this count, even if it is stored as <CARRIAGE
|
|
RETURN><LINE FEED>.) For example, a batch of messages might look
|
|
like this:
|
|
|
|
#! rnews 239
From: jerry@eagle.ATT.COM (Jerry Schwarz)
Path: cbosgd!mhuxj!mhuxt!eagle!jerry
Newsgroups: news.announce
Subject: Usenet Etiquette -- Please Read
Message-ID: <642@eagle.ATT.COM>
Date: Fri, 19 Nov 82 16:14:55 EST
Approved: mark@cbosgd.ATT.COM

Here is an important message about USENET Etiquette.
#! rnews 234
From: jerry@eagle.ATT.COM (Jerry Schwarz)
Path: cbosgd!mhuxj!mhuxt!eagle!jerry
Newsgroups: news.announce
Subject: Notes on Etiquette message
Message-ID: <643@eagle.ATT.COM>
Date: Fri, 19 Nov 82 17:24:12 EST
Approved: mark@cbosgd.ATT.COM

There was something I forgot to mention in the last
message.
|
|
|
|
Batched news is recognized because the first character in the
|
|
message is #. The message is then passed to the unbatcher for
|
|
interpretation.
|
|
|
|
The second argument (in this example rnews) determines which
|
|
batching scheme is being used. Cooperating hosts may use whatever
|
|
scheme is appropriate for them.
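
The batching scheme described in this section can be sketched in Python
as shown here. The function names make_batch and split_batch are
hypothetical. The sketch works on byte strings and assumes every line
already ends in a single newline byte, so the length of each message is
also the count placed in its header line.

```python
def make_batch(messages):
    """Join messages (byte strings) into one rnews batch."""
    parts = []
    for message in messages:
        parts.append(b"#! rnews %d\n" % len(message))
        parts.append(message)
    return b"".join(parts)


def split_batch(batch):
    """Split an rnews batch back into its messages."""
    messages = []
    pos = 0
    while pos != len(batch):
        end = batch.index(b"\n", pos)  # end of the header line
        words = batch[pos:end].split()
        if len(words) != 3 or words[0] != b"#!" or words[1] != b"rnews":
            raise ValueError("unrecognized batch header")
        size = int(words[2])
        start = end + 1
        body = batch[start:start + size]
        if len(body) != size:
            raise ValueError("batch is shorter than its header claims")
        messages.append(body)
        pos = start + size
    return messages
```

Relying on the byte count rather than scanning for the next header line
means that a message body may itself contain a line beginning with "#!"
without confusing the unbatcher.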
|
|
|
|
5. The News Propagation Algorithm
|
|
|
|
This section describes the overall scheme of USENET and the
|
|
algorithm followed by hosts in propagating news to the entire
|
|
logical network. Since all hosts are affected by incorrectly
|
|
formatted messages and by propagation errors, it is important
|
|
for the method to be standardized.
|
|
|
|
USENET is a directed graph. Each node in the graph is a host
|
|
computer, and each arc in the graph is a transmission path from
|
|
one host to another host. Each arc is labeled with a newsgroup
|
|
pattern, specifying which newsgroup classes are forwarded along
|
|
that link. Most arcs are bidirectional, that is, if host A
|
|
sends a class of newsgroups to host B, then host B usually sends
|
|
the same class of newsgroups to host A. This bidirectionality
|
|
is not, however, required.
|
|
|
|
USENET is made up of many subnetworks. Each subnet has a name, such
|
|
as comp or btl. Each subnet is a connected graph, that is, a path
|
|
exists from every node to every other node in the subnet. In
|
|
addition, the entire graph is (theoretically) connected. (In
|
|
practice, some political considerations have caused some hosts to be
|
|
unable to post messages reaching the rest of the network.)
|
|
|
|
A message is posted on one machine to a list of newsgroups. That
|
|
machine accepts it locally, then forwards it to all its neighbors
|
|
that are interested in at least one of the newsgroups of the
|
|
message. (Site A deems host B to be "interested" in a newsgroup if
|
|
the newsgroup matches the pattern on the arc from A to B. This
|
|
pattern is stored in a file on the A machine.) The hosts receiving
|
|
the incoming message examine it to make sure they really want the
|
|
message, accept it locally, and then in turn forward the message to
|
|
all their interested neighbors. This process continues until the
|
|
entire network has seen the message.
|
|
|
|
An important part of the algorithm is the prevention of loops. The
|
|
above process would cause a message to loop along a cycle forever.
|
|
In particular, when host A sends a message to host B, host B will
|
|
send it back to host A, which will send it to host B, and so on.
|
|
One solution to this is the history mechanism. Each host keeps
|
|
track of all messages it has seen (by their Message-ID) and
|
|
whenever a message comes in that it has already seen, the incoming
|
|
message is discarded immediately. This solution is sufficient to
|
|
prevent loops, but additional optimizations can be made to avoid
|
|
sending messages to hosts that will simply throw them away.
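
The history mechanism amounts to a set of Message-IDs consulted on
every arrival. The Python sketch below is illustrative only: the class
name History is an assumption, and it keeps the set in memory, whereas a
real host would need persistent storage and some policy for expiring
old entries, neither of which is addressed here.

```python
class History:
    """Remember which Message-IDs this host has already seen."""

    def __init__(self):
        self.seen = set()

    def accept(self, message_id):
        """Return True for a new message, False for a duplicate."""
        if message_id in self.seen:
            return False  # already seen: discard immediately
        self.seen.add(message_id)
        return True
```

A host would forward a message to its neighbors only when accept
returns True, which is what breaks the A to B to A cycle described
above.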
|
|
|
|
One optimization is that a message should never be sent to a machine
|
|
listed in the "Path" line of the header. When a machine name is
|
|
in the "Path" line, the message is known to have passed through the
|
|
machine. Another optimization is that, if the message originated
|
|
on host A, then host A has already seen the message. Thus, if a
|
|
message is posted to newsgroup misc.misc, it will match the pattern
|
|
misc.all (where all is a metasymbol that matches any string), and
|
|
will be forwarded to all hosts that subscribe to misc.all (as
|
|
determined by what their neighbors send them). These hosts make up
|
|
the misc subnetwork. A message posted to btl.general will reach all
|
|
hosts receiving btl.all, but will not reach hosts that do not get
|
|
btl.all. In effect, the message reaches the btl subnetwork. A
message posted to newsgroups misc.misc,btl.general will reach all
|
|
hosts subscribing to either of the two classes.
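
The forwarding decision described in this section, combining the
newsgroup pattern on an arc with the "Path" optimization, might be
sketched in Python as follows. The names matches and should_forward are
invented for the example. Two assumptions are made: the metasymbol all
is treated as a shell-style wildcard, and the last component of the
"Path" line is taken to be the poster's login name rather than a host.

```python
from fnmatch import fnmatchcase


def matches(pattern, group):
    """Test a newsgroup name against a pattern such as misc.all."""
    words = pattern.split(".")
    glob = ".".join("*" if word == "all" else word for word in words)
    return fnmatchcase(group, glob)


def should_forward(neighbor, patterns, newsgroups, path):
    """Decide whether to send a message to one neighbor.

    patterns is the list of patterns on the arc to that neighbor;
    newsgroups and path are the values of the corresponding headers.
    """
    # Never send a message to a machine it has already passed through.
    hosts = path.split("!")[:-1]
    if neighbor in hosts:
        return False
    groups = newsgroups.split(",")
    return any(matches(p, g) for p in patterns for g in groups)
```

With the pattern misc.all, a message posted to misc.misc,btl.general is
forwarded to a neighbor that is not yet in its path, while a neighbor
whose only pattern is comp.all does not receive it.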
|
|
|
|
Notes
|
|
|
|
<1> UNIX is a registered trademark of AT&T.
|
|
</PRE>
|
|
<A HREF="index.htm"><IMG SRC="../images/b_arrow.png" ALT="Back" Border="0">Go Back</A>
|
|
|
|
</BODY>
|
|
</HTML>
|
|
|