- Path: senator-bedfellow.mit.edu!bloom-beacon.mit.edu!news.kei.com!sol.ctr.columbia.edu!howland.reston.ans.net!darwin.sura.net!news-feed-1.peachnet.edu!concert!decwrl!looking!brad
- Message-ID: <S615.439@clarinet.com>
- Date: Sun, 21 Nov 93 2:40:08 EST
- Expires: Wed, 22 Dec 93 2:40:08 EST
- Newsgroups: clari.net.newusers,news.answers
- From: brad@clarinet.com (Brad Templeton)
- Reply-To: clarinet@clarinet.com
- Followup-to: poster
- Approved: brad@clarinet.com
- Subject: ClariNet: How it works (Sep/93)
- Lines: 420
- Xref: senator-bedfellow.mit.edu clari.net.newusers:119 news.answers:14924
-
- Archive-name: clarinet/howitworks
-
- ClariNet draws news from a variety of sources. This news is
- processed and converted into USENET format at ClariNet
- facilities. It is then sent out via UUCP (the telephone/modem
- based inter-unix communications facility) and TCP/IP (the
- computer communications protocol used by many machines,
- including those on leased line networks like the Internet)
- to ClariNet customers around the world.
-
- We receive UPI (United Press International) wireservice news
- directly via satellite, in the same way that newspapers
- receive it. The wire news comes (more or less) in what is
- known as the ANPA (American Newspaper Publishers
- Association) format.
-
- This format was designed some time ago. In the beginning,
- all wires simply fed directly to printers or teletypes, at
- speeds of 300 bps or less. The ANPA format was eventually
- designed and revised to help newspapers that fed the wire
- directly into the composing computer.
-
- Even so, it is primitive compared to formats like the USENET
- news format and modern electronic mail formats. Only a
- small amount of information is formally specified. By and
- large, the information is intended for use by computer
- assisted humans, not an electronic newspaper system like
- ClariNet.
-
- The satellite feed also provides us with syndicated columns,
- stocks, and other newspaper related services. The
- syndicates all buy transmission time on the two main
- newswire satellite networks (UPI and AP) -- charging it back
- to their customers, of course.
-
- For other sources, we either call pickup points by modem or
- have the sources upload the information to us. Once again,
- our software converts the information and injects it into
- the USENET style news system.
-
- Where possible, news is fed directly to customers with
- minimal human intervention. Our software has been trained
- to deal with the various inconsistencies in the wire feed so
- that news goes out even outside of business hours. This
- ensures that the news gets to you as quickly as possible.
-
- The software takes category information provided by the
- reporters and uses it to classify the articles into one or
- more appropriate newsgroups. For example, all NASA stories
- go to clari.tw.space.
-
- During business hours (and often outside them, too) ClariNet
- editors scan the report. We can delete bad stories, edit
- them to make corrections, or adjust categorizations and
- newsgroups. If a story is corrected, the old version is
- canceled and the update re-issued.
-
- We don't edit every single mistake we find. In general, we
- edit serious errors and add or delete categorizations from
- stories. Most of this news is written quickly, with the
- goal of getting it to the client as soon as possible. As
- such we sometimes let typos and other minor mistakes stand,
- in order to avoid excessive re-issuance of stories.
-
- "Wireservices"
-
-
- Long before USENET existed, the wireservices built the first
- large scale text broadcast systems. Aside from the feeds to
- newspapers -- done at first by telegraph, later by leased
- lines and now by satellite -- the wires have their own
- internal nets as well, where they can issue messages to
- their own people and even engage in limited discussion.
-
- These nets have been around since the 19th century, long
- before computers even existed. Unfortunately, it seems at
- times that their technology hasn't changed much since then.
-
- As you will read, the reporters key in all the headers and
- classifications by hand with cryptic single letter codes.
- This is very prone to error. With luck, this system will be
- replaced in the near future.
-
- The largest wireservice in the world is the Associated
- Press, or AP. AP is owned by member newspapers. It has its
- own reporters, but also draws stories from the member
- papers. In the USA, the #2 wire is United Press International,
- or UPI. UPI is an independent wire, privately owned. UPI
- draws revenue only from fees charged to client newspapers
- and distributors like ClariNet. The third major wire is
- Reuters. Reuters now makes the vast bulk of its revenue not
- from newspapers, but by providing information to people in
- the finance industries. Nonetheless its wireservice
- components in the USA are similar in size to UPI.
-
- As the #2 wire, UPI is far more willing to experiment with
- new concepts like electronic publishing. This is what makes
- ClariNet wireservice news possible.
-
- Just like USENET, wireservices have their own vocabulary.
- You'll see some of it in the advisories on ClariNet stories,
- which we put in the Note: header line.
-
- "Wire Activity"
-
-
- All wire stories have the following main components:
-
- 1. A priority that marks the importance of the story.
-
- 2. A general category from one of about a dozen ANPA
- defined codes.
-
- 3. A *slugword*, or unique keyword that identifies the
- story for that day.
-
- A variety of other fields are optional and described later.
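-
- As a rough illustration only, the three required components
- could be held in a record like the one below. The class name,
- field names and sample values are assumptions made for this
- sketch; they are not the actual ANPA layout or ClariNet code.
-
```python
from dataclasses import dataclass, field
from typing import List

# The three breaking-news priorities named in the "Priorities" section.
BREAKING = ("flash", "bulletin", "urgent")


@dataclass
class WireStory:
    """Hypothetical in-memory form of one wire story."""
    priority: str   # e.g. "urgent" or "regular"
    category: str   # ANPA category code (sample values are illustrative)
    slugword: str   # keyword identifying the story for that day
    keywords: List[str] = field(default_factory=list)  # optional fields

    def is_breaking(self) -> bool:
        # Flash, bulletin and urgent mark breaking news.
        return self.priority in BREAKING
```
-
- For example, WireStory("urgent", "s", "rose").is_breaking()
- is true, while a story of "regular" priority is not breaking.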
-
- "Priorities"
-
- UPI covers a wide variety of topics. The most important
- stories are termed *breaking* news. These stories are
- assigned one of three special priorities -- flash, bulletin
- and urgent.
-
- *Flash* is the most extreme priority there is. Flash
- stories are only one sentence long, and are followed almost
- immediately by a bulletin. The last known flashes were
- "space shuttle explodes" and "U.S. invades Iraq" -- this
- gives you some idea of the importance of these stories. Any
- flash, if and when it comes, will be posted to clari.news.flash.
- If you're a system administrator, you might arrange for
- special treatment and forwarding of such stories.
-
- *Bulletin* is the normal priority for the most important
- breaking stories of the week. Bulletins can range from
- major government announcements up to big events such as the
- U.S. invasion of Panama. One normally doesn't see more
- than a few bulletins per week, although, like world events,
- bulletins come at random.
-
- *Urgent* is a priority assigned reasonably frequently -- 3-6
- times per day. The most important stories of the day get
- this priority.
-
- Most other news gets the *regular* (called *rush* in the
- wire industry) priority. Some other news will see lower
- priorities. These are listed in the description of the
- Priority header line.
-
- All breaking news stories are posted to special groups
- dedicated to news of that priority. When a story is first
- assigned a priority, we maintain it in the group for that
- priority each time it is re-issued, even if the wire has
- dropped the story's priority to a lower value.
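-
- The "sticky" priority rule can be sketched as follows. This
- is a minimal illustration, not our production code: the
- class and function names are made up, all non-breaking
- priorities are lumped together, and promoting a story when
- the wire later raises its priority is an assumption the text
- above does not cover.
-
```python
# Breaking priorities, most important first, as named in the text.
BREAKING_ORDER = ["flash", "bulletin", "urgent"]


def rank(priority):
    """Lower rank means more important; non-breaking news ranks last."""
    if priority in BREAKING_ORDER:
        return BREAKING_ORDER.index(priority)
    return len(BREAKING_ORDER)


class PriorityTracker:
    """Remembers the most important priority seen for each slugword."""

    def __init__(self):
        self._best = {}  # slugword -> priority used for posting

    def effective_priority(self, slugword, wire_priority):
        best = self._best.get(slugword)
        # A later drop in priority on the wire never demotes the story.
        if best is None or rank(wire_priority) < rank(best):
            best = wire_priority
            self._best[slugword] = best
        return best
```
-
- A story first issued as a bulletin and re-issued at regular
- priority would still be posted to the bulletin group.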
-
- "Scheduled News"
-
- A lot of the major news that "moves" on the wires is not
- unexpected. For example, a presidential press conference is
- sure to produce a big story, and everybody knows what time
- that story will arrive -- they just don't (usually) know
- what it will say.
-
- In addition, a number of stories are important, but not
- particularly urgent, and are written with care for release
- at a particular time. This is true of features and analysis
- pieces, or pieces about developing world situations.
-
- These types of stories are known as scheduled stories, or
- "skedded" in the wire lingo. The editors release a schedule
- of upcoming big stories for newspaper editors to use in
- planning their pages. We assign any "skedded" story a
- priority of *major*, and have created some special groups,
- called "top" news groups, for such stories.
-
- "Classification"
-
- The ANPA category provides some useful information, but only
- about a dozen ANPA categories are used regularly. To supplement this,
- UPI has reporters and editors classify stories with special
- custom codes. These map to keywords identifying several
- hundred different story topics. It is these codes, along
- with our own judgement, that classify most of the stories
- into newsgroups.
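-
- In outline, the automatic part of this classification is a
- table lookup. The sketch below is hypothetical: the two
- topic codes are the ones quoted under "Wireservice Errors",
- but every group name beginning with "example." is a
- placeholder, and the real tables cover several hundred topics.
-
```python
# Topic code -> newsgroups. Group names are placeholders, not real groups.
TOPIC_GROUPS = {
    "sfc": ["example.sports.football.college"],
    "bfc": ["example.business.computers"],
}

# Used when no topic code is recognized (also placeholders).
CATEGORY_FALLBACK = {"s": "example.sports.misc"}
DEFAULT_GROUP = "example.misc"


def classify(topic_codes, anpa_category):
    """Return the list of newsgroups for a story, without duplicates."""
    groups = []
    for code in topic_codes:
        for group in TOPIC_GROUPS.get(code.lower(), []):
            if group not in groups:
                groups.append(group)
    if not groups:
        # Fall back on the coarse ANPA category.
        groups.append(CATEGORY_FALLBACK.get(anpa_category, DEFAULT_GROUP))
    return groups
```
-
- A story carrying several recognized codes is cross-posted to
- each matching group; an editor can still adjust the result.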
-
- "Story Updates"
-
- When a newspaper goes to press, it wants the latest version
- of any developing story. For this reason, almost all
- breaking stories get issued several times during the day.
- The reporter keeps the text in his or her laptop, edits it
- as new details, quotes and corrections develop, and
- re-issues the entire story whenever anything important
- happens.
-
- On a big story, as many as 20 updates may come in a day.
- Most major stories see two or three.
-
- All updates (should) come with the same *Slugword* -- the
- unique keyword that identifies the story. When ClariNet
- sees a story come in with the same slugword as a previous
- story, we normally arrange to replace the old story with the
- new one. This is done by canceling the old one (USENET
- cancel message) and issuing the new one.
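-
- A minimal sketch of the cancel-and-reissue step follows. A
- USENET cancel is a control message whose Control: header
- names the Message-ID of the article to remove. The host
- name, addresses, group name and helper names here are
- assumptions for illustration only.
-
```python
import itertools

_serial = itertools.count(1)


def new_message_id(host="news.example.com"):
    """Make a unique Message-ID (the host name is a placeholder)."""
    return "<%d@%s>" % (next(_serial), host)


def cancel_message(target_id, sender, group):
    """Build the text of a control message canceling target_id."""
    return "\n".join([
        "From: " + sender,
        "Newsgroups: " + group,
        "Subject: cmsg cancel " + target_id,
        "Control: cancel " + target_id,
        "Message-ID: " + new_message_id(),
        "",
        "Replaced by an updated version of the story.",
    ])


class SlugIndex:
    """Tracks which Message-ID currently represents each slugword."""

    def __init__(self):
        self.current = {}  # slugword -> Message-ID of the posted version

    def post_update(self, slugword, sender, group):
        """Return (cancel messages to send, Message-ID for the new version)."""
        cancels = []
        old_id = self.current.get(slugword)
        if old_id is not None:
            cancels.append(cancel_message(old_id, sender, group))
        new_id = new_message_id()
        self.current[slugword] = new_id
        return cancels, new_id
```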
-
- Unfortunately, it's not as simple as that, and this feature
- of wireservices is the source of the greatest problem in
- interfacing a wire to USENET format news.
-
- Often updates come only minutes apart. In these cases, the
- cancel and update are done before the original article is
- batched and sent to our clients. This means that you never
- even see that original, which is good.
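-
- One way to picture this: articles wait in an outgoing queue
- until the next batch is built, and a cancel that arrives
- first simply removes the waiting article. The class below is
- a hypothetical sketch of that idea, not a description of the
- real batching software.
-
```python
class OutgoingBatch:
    """Holds articles that have not yet been sent to clients."""

    def __init__(self):
        self.pending = {}  # Message-ID -> article text, in posting order

    def post(self, message_id, text):
        self.pending[message_id] = text

    def cancel(self, message_id):
        """Drop a waiting article.

        Returns True if the article was still waiting, so no cancel
        message needs to go out. Returns False if it was already sent
        and a real cancel message is required.
        """
        return self.pending.pop(message_id, None) is not None

    def flush(self):
        """Return the articles for this batch and empty the queue."""
        batch = list(self.pending.values())
        self.pending.clear()
        return batch
```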
-
- If updates are more widely spaced, you will get both
- versions (or several versions) and the cancel message(s).
- This means your newsgroups -- particularly the groups for
- breaking news -- will be full of gaps formed by deleted
- articles. This causes the original rn program to pause, and
- can cause worse problems for the nn newsreader. This can be
- fixed, however.
-
- The worst question is how to present the updates to the
- reader. This system works well for newspapers, for which it
- was designed. They are only issued once a day, so readers
- only get the story that was current at press time.
-
- On ClariNet, however, if you read an article soon after its
- release, and then come back to read again a few hours later,
- you may well see the same article presented again. You
- aren't seeing the same article, of course; you're seeing an
- update. It is up to you to decide if you wish to read the
- update for the latest details, or skip it.
-
- Fortunately most updates have a Note: line indicating what
- has changed in the article -- but only since the last
- update. If several updates have been sent out since you
- last read news, this may not tell you enough.
-
- It is a dilemma. Either we present the subscriber with
- redundant news that most readers will elect to skip, or we
- keep potentially important updates from eager readers. We
- have decided to do the former. The use of Newsclip, and
- eventually fancier reading tools, can deal with this problem
- in a more suitable fashion.
-
- "Other Duplicates"
-
- The update system isn't perfect, because the input from the
- wire isn't perfect. Reporters sometimes forget to put
- updating flags on stories, for example. Our software is
- keyed to look for changes in the headline or byline on a
- story. A changed headline more than a few hours after the
- original story is treated as a new story by us. This works
- about 95% of the time. Sometimes, however, you will see a
- duplicated story appear under two headlines. We try to
- correct these by hand.
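-
- The headline test can be sketched like this. The three hour
- window is only a guess at "a few hours", the function name
- is invented, and byline changes are left out to keep the
- example short.
-
```python
from datetime import datetime, timedelta

# "A few hours" in the text; the exact value here is an assumption.
NEW_STORY_WINDOW = timedelta(hours=3)


def is_update(prev_headline, prev_time, headline, time,
              window=NEW_STORY_WINDOW):
    """Decide whether a story with a matching slugword replaces the old one.

    Same headline: always an update. Changed headline: an update only
    if it arrives within the window; otherwise it is a new story.
    """
    same = headline.strip().lower() == prev_headline.strip().lower()
    if same:
        return True
    return (time - prev_time) <= window
```
-
- When this returns False the earlier article is left in place,
- which is how a duplicate under two headlines can slip through.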
-
- Another common source of duplicates is changed slugwords.
- Sometimes an update comes to correct a mistyped or incorrect
- slugword. As no information is provided as to what the old
- slugword was, we can't arrange to cancel the story being
- updated. A duplicate ensues.
-
- The final major source of apparent duplicates comes from the
- old concept of a wireservice being split into multiple
- wires. One hears talk of the "news wire," the "sports wire"
- and the "financial wire." In the old days, each wire went
- to a different department in the newspaper. Today it's all
- the same physical channel, processed by a computer.
-
- If a story breaks that belongs in more than one category, it
- may be sent out twice, with two entirely different
- slugwords, and two different ANPA category codes. For
- example, Pete Rose's expulsion from baseball was both a
- sports story and a general news story.
-
- "Standing Stories"
-
-
- The wires put out a large variety of standing stories. These
- are regular features, all with the same slugword, that
- appear at some particular interval, such as every day or
- every week.
-
- A list of most of the major standing stories can be found in
- a subsequent file.
-
- "Wireservice Errors"
-
-
- As noted, the wireservice coding schemes are particularly
- prone to error. We have trained our software to catch many
- typical errors, but the wires have little in the way of
- formal specification for what they do put out, and they
- don't always follow what formal rules they do have.
-
- Thus you can expect some errors to reach you, particularly
- after business hours, or in the lower importance groups
- which don't receive full time scrutiny.
-
- At first, we at ClariNet found these errors quite annoying.
- One realizes, however, that with thousands of stories to put
- out, even the best staff will make a few errors each day. By
- and large, they do not interfere in any significant way with
- your effort to find the news you want to read, and as such,
- they can simply be ignored.
-
- The most annoying are the coding errors, particularly those
- from coding typos. You will sometimes see a story in a
- group that has nothing to do with the topic of that group.
- For example, a college football story, which a reporter
- would code as sfc (Sports-Football-College) may get entered
- as bfc (Business-manuFacturing-Computers) and thus posted to
- our very popular computer group. Until we can convince UPI
- reporters to adopt a new coding scheme, such things are
- unfortunately possible.
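-
- The sfc/bfc mixup is a single wrong keystroke. A small check
- like the one below (a sketch with invented function names)
- shows how one could list the pairs of codes that differ in
- exactly one position and so are easiest to confuse.
-
```python
def one_keystroke_apart(a, b):
    """True if two equal-length codes differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def confusable_pairs(codes):
    """List all pairs of codes that a single typo could swap."""
    codes = sorted(codes)
    return [(a, b)
            for i, a in enumerate(codes)
            for b in codes[i + 1:]
            if one_keystroke_apart(a, b)]
```
-
- With short codes a large share of all pairs are this close,
- which is why such errors cannot be caught automatically.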
-
- "Local/Regional Stories"
-
-
- A great deal of a wireservice's output is regional news,
- collected for newspaper clients in various U.S. states.
- Now, ClariNet releases many of these stories in the
- clari.local hierarchy. We have local hierarchies for 30
- different U.S. and Canadian regions, in addition to our
- international and national news.
-
- Local stories of national importance are cross-posted
- between local and national newsgroups.
-
- In certain national groups, we do publish regional stories.
- For example, the computer group, as well as most of the
- other technical groups, contain regional stories. While
- this sometimes results in the odd truly-local computer
- story ("Computer demo day at local University"), most of the
- time it is worth it. Our editors delete stories of the
- "demo day" form after the fact.
-
- "Broadcast News"
-
-
- ClariNet also buys some wireservice news meant for radio
- stations. These are used to provide our hourly news
- summaries (clari.news.cast and clari.news.headlines) along
- with the various local news summaries in the clari.local
- hierarchy.
-
- Radio station wires contain shorter stories, and the stories
- have no headlines. They are generally a bit sloppier, as
- the reporters do not expect them to see print. In addition,
- they contain phonetic spellings of unusual names, so that
- radio announcers will read things correctly.
-
- "Canadian Broadcast News"
-
-
- To serve Canadian clients, as well as expatriate Canadians
- around the world, ClariNet also offers Canadian news. UPI,
- as a U.S. wire, offers very little coverage of Canada. This
- is normal for U.S. media. The group clari.news.canada
- contains the limited coverage that comes along the main wire
- -- only truly major stories and financial news.
-
- The clari.canada hierarchy carries a Canadian broadcast
- wire, the Standard Broadcast Wire (SBW), to which we have
- arranged access. All the problems of radio wires
- described above apply.
-
- The best group to read for those outside of Canada is
- probably clari.canada.briefs which provides regularly
- updated summaries of major Canadian stories. The group
- clari.canada.newscast provides an hourly newscast outside
- of business hours. It covers U.S. and world events as well
- as Canadian news, so non-Canadian readers may wish to read
- it for late night updates.
-
- Canadian regional summaries (still from SBW) appear in the
- clari.local hierarchy.
-
- "Newsbytes"
-
-
- Newsbytes articles are not as well classified as UPI
- articles, but there is still some useful information. It is
- put on the Keywords: line.
-
- The most important keyword, which appears on each Keywords:
- line, takes the form Bureau-xxx, where "xxx" is a three letter
- code for the location of the bureau. You can use the presence of
- these codes to track or filter stories from certain regions.
- For example, filtering out Bureau-AUS will eliminate
- Australian stories.
-
- (International stories that are more likely to be of
- regional interest are also likely to be coded with a country
- prefix in the subject line, so you can use that in a filter
- as well.)
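-
- A filter on the bureau code might look like the sketch below.
- It assumes keywords are separated by commas or spaces, which
- the text above does not specify, and the function names are
- invented. Bureau-AUS is the code from the example above;
- Bureau-TYO in the usage note is made up.
-
```python
PREFIX = "bureau-"


def bureau_codes(keywords_line):
    """Return the set of three letter bureau codes on a Keywords: line."""
    codes = set()
    for word in keywords_line.replace(",", " ").split():
        if word.lower().startswith(PREFIX) and len(word) == len(PREFIX) + 3:
            codes.add(word[-3:].upper())
    return codes


def keep_article(keywords_line, excluded=("AUS",)):
    """False if the article comes from any excluded bureau."""
    return not (bureau_codes(keywords_line) & set(excluded))
```
-
- With the default setting, an article whose line reads
- "Bureau-AUS, review" is dropped, while one reading
- "Bureau-TYO exclusive" is kept.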
-
- Other keywords include things like exclusive, review and
- correction, but it is less likely that you would filter on
- these.
-
- Newsbytes headlines arrive at ClariNet in upper case. Our
- software converts them to a more readable mixed case.
- Naturally such software can't be perfect, so the odd error
- will occur, but this is surprisingly rare.
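-
- To give the flavor of the problem, here is a much simplified
- sketch of case conversion. It is not our actual software; the
- word lists are short illustrative assumptions.
-
```python
# Words left in upper case, and small words kept in lower case
# except at the start of the headline. Both lists are illustrative.
KEEP_UPPER = {"IBM", "US", "UK"}
SMALL_WORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to"}


def mixed_case(headline):
    """Convert an upper case headline to mixed case."""
    out = []
    for i, word in enumerate(headline.split()):
        if word in KEEP_UPPER:
            out.append(word)
        elif i > 0 and word.lower() in SMALL_WORDS:
            out.append(word.lower())
        else:
            out.append(word.capitalize())
    return " ".join(out)
```
-
- Given "IBM TO CUT PRICES ON THE PS/2" this returns
- "IBM to Cut Prices on the Ps/2". The product name comes out
- wrong, which is the kind of odd error mentioned above.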
-
- Newsbytes also tags important stories. These are
- crossposted to the clari.nb.top newsgroup.
-
- "Features"
-
-
- Feature articles (such as the Dave Barry column) come in a
- fashion similar to UPI material, but they will have no
- keywords or location coding. This is not normally a
- problem, as you usually will read every item in a feature
- group.
-