home *** CD-ROM | disk | FTP | other *** search
- /* Written 3:39 pm Jun 19, 1986 by sources-request@mirror.UUCP in mirror:mod.sources */
- /* ---------- "v06i009: Tools for analyzing netne" ---------- */
- Submitted by: talcott!seismo!epiwrl!epimass!jbuck
- Mod.sources: Volume 6, Issue 9
- Archive-name: getpaths
-
- [ I had to reformat the AWK script in histog, strangely enough;
- otherwise AWK crashed. My version is in the file histog.mod
- --r$ ]
-
- # This is a shell archive. Remove anything before this line,
- # then unpack it by saving it in a file and typing "sh file".
- # Contents: 6sites README backbone-list closure getopt.c getpaths.1 getpaths.c
- # histog histog.mod hostcount makefile makefile.42 makefile.sysv
- # makefile.v7 oscantree.c scantree.c threelink twolink
-
- echo x - 6sites
- sed 's/^XX//' > "6sites" <<'@//E*O*F 6sites//'
- XXBACKBONE!decvax!seismo!ihnp4!tektronix!sdcrdcf!decwrl
- @//E*O*F 6sites//
- chmod u=rw,g=rw,o=rw 6sites
-
- echo x - README
- sed 's/^XX//' > "README" <<'@//E*O*F README//'
- XX A Collection of Usenet path-counting tools
- XX Joseph T. Buck - Entropic Processing, Inc.
-
- XXThis is a set of tools for computing statistics on the Usenet paths
- XXon your machine.
-
- XXThe program 'getpaths' gathers paths from your news spool, taking cross
- XXpostings into account (each path appears in the output once).
- XXSee the man page supplied.
-
- XXThere are three makefiles provided; I hope getpaths can be made to
- XXrun on anything. Use makefile.42 for 4.2bsd or 4.3bsd; makefile.sysv
- XXfor System III or System V; use makefile.v7 for Version 7 or 4.1bsd.
-
- XXIf you have trouble, there are three questions you have to answer:
-
- XX 1) Do you have the 4.2 directory access library? If so, use
- XX scantree.c, otherwise use oscantree.c.
-
- XX 2) v7 and bsd systems have functions "index" and "rindex".
- XX SysIII and SysV changed the names to strchr and strrchr,
- XX for no good reason. See the definition of CFLAGS in
- XX makefile.sysv.
-
- XX 3) Do you have getopt? (SysIII,V do, bsd, v7 don't). I'm
- XX including Henry Spencer's public domain version, since
- XX it's short and valuable and Berkeley folks can't run the
- XX program without it.
-
- XXYou may find the scantree functions useful in other programs. You're
- XXwelcome to them, but leave my name in, please.
-
- XXWith no options, getpaths prints the paths in every article on the standard
- XXoutput. If you expire news after one week, this will result in about 300 Kbytes
- XXof output. To restrict yourself to a subtree, you may give an option, which
- XXis a pathname that uses /usr/spool/news as a base.
-
- XX getpaths mod
- XX getpaths net/politics
-
- XXOnce you've used getpaths to gather some paths, you can use the shell scripts
- XXto compute some statistics on them:
-
- XXhistog -
- XX Computes the average path length and prints a histogram.
- XXhostcount -
- XX Counts hostnames that occur anywhere in a path and prints them
- XX in declining order of frequency.
- XXtwolink -
- XX Prints connected pairs of sites in declining order of frequency.
- XX I used twolink to prepare my most recent map.
- XXthreelink -
- XX Prints connected triplets of sites in declining order of frequency.
-
- XXFinally, there is "closure", which will give "awk" a good workout on your
- XXsystem. It is an automated version of the procedure that I used to build
- XXan extended backbone map.
-
- XXclosure takes three arguments. The first argument is the name of a
- XXfile containing the "core" sites you wish to start with (you can
- XXthink of these as "backbone" sites, but I'm not claiming to define what
- XX"backbone" really means).
-
- XXThe file must be in a somewhat peculiar format; each line must begin
- XXwith the word BACKBONE (in caps), followed by the site names,
- XXseparated by ! marks. For example, if you want to start with hosts
- XXalpha, beta, and gamma, the backbone file could be
-
- XXBACKBONE!alpha!beta!gamma
-
- XXor
-
- XXBACKBONE!alpha
- XXBACKBONE!beta
- XXBACKBONE!gamma
-
- XXYes, it's ugly.
-
- XXThe second argument is the name of the file containing a list of paths
- XX(as output by getpaths, perhaps). Finally, the third argument, an
- XXinteger, is a threshold.
-
- XXclosure searches the paths file for chains of paths that connect two
- XX"core" sites; a chain is disregarded unless it occurs at least
- XX"threshold" times, where "threshold" is the third argument. Every
- XXsite in each such chain is added to the "core". This is repeated
- XXuntil no more sites can be added to the "core". A report of each
- XXiteration and what paths are added to the core is printed on the
- XXstandard output.
-
- XXI've supplied two "initial cores" you can use; "6sites" includes
- XXihnp4, seismo, decvax, decwrl, tektronix, and sdcrdcf, and
- XX"backbone-list" contains all sites on Gene Spafford's backbone map.
-
- XXTry the commands
-
- XXgetpaths net > net.paths
- XXclosure 6sites net.paths 20
-
- XXor
-
- XXclosure backbone-list net.paths 20
-
- XXI'm interested in seeing the results for sites far from the SF Bay
- XXArea. You can write me at
-
- XX oliveb-\
- XX \
- XX ihnp4---pesnta--epimass!jbuck
- XX /
- XX hplabs-/
- @//E*O*F README//
- chmod u=rw,g=rw,o=rw README
-
- echo x - backbone-list
- sed 's/^XX//' > "backbone-list" <<'@//E*O*F backbone-list//'
- XXBACKBONE!cbosgd!clyde!watmath!utzoo
- XXBACKBONE!ihnp4!cuae2!alberta!ubc-vision!burl!ulysses!bellcore
- XXBACKBONE!uw-beaver!tektronix!decvax!linus
- XXBACKBONE!qantel!hplabs!oliveb!glacier!decwrl
- XXBACKBONE!sdcrdcf!hao!sdcsvax!drillsys!gatech!akgua!mcnc!philabs
- XXBACKBONE!lll-lcc!lll-crg!cmcl2!seismo
- @//E*O*F backbone-list//
- chmod u=rw,g=rw,o=rw backbone-list
-
- echo x - closure
- sed 's/^XX//' > "closure" <<'@//E*O*F closure//'
- XX#! /bin/sh
- XX# closure: Joseph T. Buck, Entropic Processing, Inc.
- XX# Usage: closure startfile paths thresh
- XX# This rather awk-ward script builds a "transitive closure" of a backbone.
- XXif test $# != 3
- XXthen echo Usage: $0 backbone-list paths threshold >&2; exit 1
- XXfi
- XX# The file /tmp/$$c contains the current backbone list; /tmp/$$a contains
- XX# The latest set of additions.
- XXcp $1 /tmp/$$c
- XXecho "------ Original backbone: ------"
- XXawk -F! '{ for (i=1; i<=NF; i++) if ($i != "BACKBONE") printf ("%s ",$i); }
- XXEND { printf ("\n");}' /tmp/$$c
- XXi=1
- XXwhile true
- XXdo
- XX awk -F! '{
- XX if ($1 == "BACKBONE") {
- XX for (i = 2; i <= NF; i++)
- XX backbone[$i] = 1;
- XX }
- XX else {
- XX i = 1;
- XX while (i <= NF && !backbone[$i])
- XX i++;
- XX if (i > NF)
- XX next;
- XX while (1) {
- XX while (i <= NF && backbone[$i])
- XX i++;
- XX if (i > NF)
- XX break;
- XX first = i - 1;
- XX while (i <= NF && !backbone[$i])
- XX i++;
- XX if (i > NF)
- XX break;
- XX for (j = first; j < i; j++)
- XX printf ("%s!", $j);
- XX printf ("%s\n", $i);
- XX }
- XX }
- XX}' /tmp/$$c $2 | sort | uniq -c |\
- XX awk '{ if ($1 >= thr) printf("BACKBONE!%s\n", $2); }' thr=$3 - > /tmp/$$a
- XX if test ! -s /tmp/$$a
- XX then echo "------ Final backbone: ------"
- XX awk -F! '{for (i=2; i<NF; i++) print $i}' /tmp/$$c | sort -u
- XX rm /tmp/$$c /tmp/$$a
- XX exit 0
- XX fi
- XX echo "------ Pass $i: add the following: ------"
- XX awk -F! '{ for (i=2; i<NF; i++) printf("%s!", $i); printf("%s\n", $NF);}' /tmp/$$a
- XX cat /tmp/$$a >> /tmp/$$c
- XX i=`expr $i + 1`
- XXdone
-
- @//E*O*F closure//
- chmod u=rwx,g=rwx,o=rwx closure
-
- echo x - getopt.c
- sed 's/^XX//' > "getopt.c" <<'@//E*O*F getopt.c//'
- XX/*
- XX * getopt - get option letter from argv
- XX * by Henry Spencer
- XX * posted to Usenet net.sources list
- XX */
- XX#include <stdio.h>
-
- XXchar *optarg; /* Global argument pointer. */
- XXint optind = 0; /* Global argv index. */
-
- XXstatic char *scan = NULL; /* Private scan pointer. */
-
- XXextern char *index();
-
- XXint
- XXgetopt (argc, argv, optstring)
- XXint argc;
- XXchar *argv[];
- XXchar *optstring;
- XX{
- XX register char c;
- XX register char *place;
-
- XX optarg = NULL;
-
- XX if (scan == NULL || *scan == '\0') {
- XX if (optind == 0)
- XX optind++;
-
- XX if (optind >= argc || argv[optind][0] != '-' || argv[optind][1] == '\0')
- XX return (EOF);
- XX if (strcmp (argv[optind], "--") == 0) {
- XX optind++;
- XX return (EOF);
- XX }
- XX scan = argv[optind] + 1;
- XX optind++;
- XX }
- XX c = *scan++;
- XX place = index (optstring, c);
-
- XX if (place == NULL || c == ':') {
- XX fprintf (stderr, "%s: unknown option -%c\n", argv[0], c);
- XX return ('?');
- XX }
- XX place++;
- XX if (*place == ':') {
- XX if (*scan != '\0') {
- XX optarg = scan;
- XX scan = NULL;
- XX }
- XX else {
- XX optarg = argv[optind];
- XX optind++;
- XX }
- XX }
- XX return (c);
- XX}
- @//E*O*F getopt.c//
- chmod u=rw,g=rw,o=rw getopt.c
-
- echo x - getpaths.1
- sed 's/^XX//' > "getpaths.1" <<'@//E*O*F getpaths.1//'
- XX.TH GETPATHS LOCAL 6/2/86
- XX.SH NAME
- XXgetpaths \- gather Usenet article paths
- XX.SH SYNOPSIS
- XX.B getpaths
- XX[
- XX.BI \-n " file"
- XX] [
- XX.BI \-u
- XX] [
- XX.I distrib
- XX]
- XX.SH DESCRIPTION
- XX.PP
- XX.I Getpaths
- XXscans through the news spool and prints out all article paths. Crossposted
- XXarticles have their paths printed only once.
- XX.PP
- XXBy default, usernames are removed from the paths. If the
- XX.B \-u
- XXflag is given, usernames are left on.
- XX.PP
- XXThe
- XX.BI \-n " file"
- XXoption causes paths to be included only from articles that are newer
- XX(in terms of file modification time) than
- XX.IR file .
- XX.PP
- XXIf no argument is given, the entire news spool will be scanned. The
- XX.I distrib
- XXargument is a relative path from /usr/spool/news. For example,
- XX.PP
- XXgetpaths net
- XX.PP
- XXgets paths from net.*, and
- XX.PP
- XXgetpaths net/micro/atari
- XX.PP
- XXdoes what you think.
- XX.SH BUGS
- XXThe algorithm for eliminating crosspostings only prints a path if
- XXthe first newsgroup on the Newsgroups: line matches the directory currently
- XXbeing scanned. So if you say "getpaths na", an article with the header
- XXline
- XX.PP
- XXNewsgroups: net.wanted,na.forsale
- XX.PP
- XXwill not have its path printed (though it will print if the groups are given
- XXin the reverse order). In other words, the first group is always considered
- XXthe primary group, and other groups are always considered crossposts.
- XX.SH AUTHOR
- XXJoseph T. Buck, Entropic Processing, Inc.
- XX.br
- XX{pesnta,oliveb}!epimass!jbuck
- @//E*O*F getpaths.1//
- chmod u=rw,g=rw,o=rw getpaths.1
-
- echo x - getpaths.c
- sed 's/^XX//' > "getpaths.c" <<'@//E*O*F getpaths.c//'
- XX/*
- XX getpaths.c
- XX Program to snarf up lots of Usenet paths
- XX Joseph T. Buck, Entropic Processing, Inc.
- XX*/
- XX#define SPOOLDIR "/usr/spool/news"
- XX#include <stdio.h>
- XX#define LINLEN 512
- XXchar *strcpy();
-
- XXint stripuser = 1;
-
- XX/* This function assumes the current directory is SPOOLDIR, and that
- XX the paths are of the form net/news/group/2345.
- XX*/
- XXvoid perror(), exit();
-
- XXwp (name, ino)
- XXchar *name;
- XXint ino;
- XX{
- XX int status = 0, lng;
- XX char line[LINLEN], savpath[LINLEN];
- XX FILE *fd;
- XX char *p, *q, *rindex();
- XX char curngr[40]; /* current newsgroup */
-
- XX/* Figure out the current newsgroup, by deleting whatever is
- XX after the last /, and changing '/' to '.'. Skip initial "./"
- XX if present
- XX*/
- XX if (name[0] == '.' && name[1] == '/') name += 2;
- XX (void) strcpy (curngr, name);
- XX p = rindex (curngr, '/');
- XX if (p == NULL) return;
- XX *p = 0;
- XX while (p >= curngr) {
- XX if (*p == '/') *p = '.';
- XX p--;
- XX }
- XX lng = strlen (curngr);
- XX/* Open the file */
- XX if ((fd = fopen (name, "r")) == NULL) {
- XX perror (name);
- XX return;
- XX }
- XX while (status != 3) {
- XX if (fgets (line, LINLEN, fd) == NULL) break;
- XX line[strlen(line)-1] = 0; /* delete \n */
- XX if (strncmp (line, "Newsgroups:", 11) == 0) {
- XX p = line + 11;
- XX while (*p == ' ' || *p == '\t') p++;
- XX/* Detect cases like curngr == "net.micro", NG == "net.micro.pc,net.micro" */
- XX if (p[lng] && p[lng] != ',' && p[lng] != ' ' && p[lng] != '\t')
- XX break;
- XX else if (strncmp (p, curngr, lng) != 0) break;
- XX status |= 2;
- XX }
- XX else if (strncmp (line, "Path:", 5) == 0) {
- XX p = line + 5;
- XX while (*p == ' ' || *p == '\t') p++;
- XX/* Strip the username from the end of the path. Don't write anything
- XX if there's no ! in the path (a locally posted article)
- XX*/
- XX if (stripuser) {
- XX q = rindex (line, '!');
- XX if (q == NULL) break;
- XX *q = '\0'; /* Delete the tail */
- XX }
- XX (void) strcpy (savpath, p);
- XX status |= 1;
- XX }
- XX }
- XX if (status == 3) (void) printf ("%s\n", savpath);
- XX (void) fclose (fd);
- XX return;
- XX}
-
- XXmain (argc, argv)
- XXchar **argv;
- XX{
- XX if (chdir (SPOOLDIR) != 0) {
- XX (void) fprintf (stderr, "Can't chdir to ");
- XX perror ("/usr/spool/news");
- XX exit (1);
- XX }
- XX if (argc > 1 && strcmp (argv[1], "-u") == 0) {
- XX stripuser = 0;
- XX argv++;
- XX argc--;
- XX }
- XX if (argc !=2) argv[1] = ".";
- XX scan_tree (argv[1], wp);
- XX return 0;
- XX}
-
- XX#include <sys/types.h>
- XX#include <sys/stat.h>
-
- XXis_dir (name)
- XXchar *name;
- XX{
- XX struct stat sbuf;
- XX return (stat (name, &sbuf) == 0 && (sbuf.st_mode & S_IFMT) == S_IFDIR);
- XX}
- @//E*O*F getpaths.c//
- chmod u=rw,g=rw,o=rw getpaths.c
-
- echo x - histog
- sed 's/^XX//' > "histog" <<'@//E*O*F histog//'
- XX#! /bin/sh
- XXawk -F! '{ count[NF] += 1; if (NF > mx) mx = NF; sum += NF; }
- XXEND { printf ("Path length statistics\nAverage length: %f\nHistogram:\n",
- XX sum / NR);
- XX for (i = 1; i <= mx; i++) printf ("%d %d\n",i,count[i]); }'
- @//E*O*F histog//
- chmod u=rwx,g=rwx,o=rwx histog
-
- echo x - histog.mod
- sed 's/^XX//' > "histog.mod" <<'@//E*O*F histog.mod//'
- XX#! /bin/sh
- XXawk -F! '{ count[NF] += 1; if (NF > mx) mx = NF; sum += NF; }
- XXEND {
- XX printf "Path length statistics\nAverage length: %f\nHistogram:\n", sum / NR;
- XX for (i = 1; i <= mx; i++) printf ("%d %d\n",i,count[i]);
- XX}'
- @//E*O*F histog.mod//
- chmod u=rwx,g=rwx,o=rwx histog.mod
-
- echo x - hostcount
- sed 's/^XX//' > "hostcount" <<'@//E*O*F hostcount//'
- XX#! /bin/sh
- XX# This script takes a list of paths as input.
- XX# It prints a list of the sites encountered in the paths, sorted
- XX# according to their frequency.
- XXawk -F! '{ for (i = 1; i <= NF; i++) print $i }' $* |\
- XXsort | uniq -c | sort -nr
-
-
-
- @//E*O*F hostcount//
- chmod u=rwx,g=rwx,o=rwx hostcount
-
- echo x - makefile
- sed 's/^XX//' > "makefile" <<'@//E*O*F makefile//'
- XX# This makefile is appropriate for 4.2bsd. For SysV or V7,
- XX# replace "scantree" by "oscantree" everywhere.
- XXCFLAGS = -O
- XX# Use the following for Sys V.
- XX# CFLAGS = -O -Dindex=strchr -Drindex=strrchr
-
- XXgetpaths: getpaths.o scantree.o
- XX cc $(CFLAGS) getpaths.o scantree.o -o getpaths
- @//E*O*F makefile//
- chmod u=rw,g=rw,o=rw makefile
-
- echo x - makefile.42
- sed 's/^XX//' > "makefile.42" <<'@//E*O*F makefile.42//'
- XX# This makefile is appropriate for 4.2bsd or 4.3bsd.
- XXCFLAGS = -O
-
- XXgetpaths: getpaths.o scantree.o getopt.o
- XX cc $(CFLAGS) getpaths.o scantree.o getopt.o -o getpaths
- @//E*O*F makefile.42//
- chmod u=rw,g=rw,o=rw makefile.42
-
- echo x - makefile.sysv
- sed 's/^XX//' > "makefile.sysv" <<'@//E*O*F makefile.sysv//'
- XX# This makefile is appropriate for System III and System V.
- XXCFLAGS = -O -Dindex=strchr -Drindex=strrchr
-
- XXgetpaths: getpaths.o oscantree.o
- XX cc $(CFLAGS) getpaths.o oscantree.o -o getpaths
- @//E*O*F makefile.sysv//
- chmod u=rw,g=rw,o=rw makefile.sysv
-
- echo x - makefile.v7
- sed 's/^XX//' > "makefile.v7" <<'@//E*O*F makefile.v7//'
- XX# This makefile is appropriate for Version 7, or any system that
- XX# has neither getopt nor the 4.2 directory library.
-
- XXCFLAGS = -O
-
- XXgetpaths: getpaths.o oscantree.o
- XX cc $(CFLAGS) getpaths.o oscantree.o -o getpaths
- @//E*O*F makefile.v7//
- chmod u=rw,g=rw,o=rw makefile.v7
-
- echo x - oscantree.c
- sed 's/^XX//' > "oscantree.c" <<'@//E*O*F oscantree.c//'
- XX/*
- XX scan_tree: Joseph T. Buck, Entropic Processing, Inc.
- XX Old-style version
-
- XX scan_tree applies a function to each file in a directory tree
- XX (excluding directories), and returns the number of files processed.
- XX This version reads directories directly and will not work with 4.2bsd.
- XX*/
- XX#include <sys/types.h>
- XX#include <sys/dir.h>
- XX#define MAXL 256
-
-
- XXscan_tree (path, func)
- XXchar *path;
- XXint (*func)();
- XX{
- XX char name[MAXL], *strcpy(), *strncpy(), *pos;
- XX struct direct dirent;
- XX int fd = open (path), nf = 0, len;
-
- XX if (fd < 0) return 0;
- XX
- XX (void) strcpy (name, path);
- XX len = strlen (path);
- XX name[len] = '/';
- XX pos = name + len + 1;
- XX pos[DIRSIZ] = '\0'; /* req'd in case dirent has length DIRSIZ */
- XX while (read (fd, (char *) &dirent, sizeof dirent) == sizeof dirent) {
- XX if (dirent.d_ino == 0) continue;
- XX (void) strncpy (pos, dirent.d_name, DIRSIZ);
- XX if (strcmp (pos, ".") == 0 || strcmp (pos, "..") == 0) continue;
- XX if (is_dir (name)) nf += scan_tree (name, func);
- XX else {
- XX (func) (name, dirent.d_ino);
- XX nf += 1;
- XX }
- XX }
- XX (void) close (fd);
- XX return nf;
- XX}
- @//E*O*F oscantree.c//
- chmod u=rw,g=rw,o=rw oscantree.c
-
- echo x - scantree.c
- sed 's/^XX//' > "scantree.c" <<'@//E*O*F scantree.c//'
- XX/*
- XX scan_tree: Joseph T. Buck, Entropic Processing, Inc.
- XX New-style version
-
- XX scan_tree applies a function to each file in a directory tree
- XX (excluding directories), and returns the number of files processed.
- XX This version uses the 4.2bsd directory access library; another version
- XX that reads directories directly is available.
- XX*/
- XX#include <sys/types.h>
- XX#include <sys/dir.h>
- XX#define MAXL 256
- XXscan_tree (path, func)
- XXchar *path;
- XXint (*func)();
- XX{
- XX char name[MAXL], *strcpy(), *pos;
- XX DIR *dirp = opendir (path);
- XX struct direct *de;
- XX int nf = 0, len;
-
- XX if (dirp == NULL) return 0;
- XX (void) strcpy (name, path);
- XX len = strlen (path);
- XX name[len] = '/';
- XX pos = name + len + 1;
- XX while ((de = readdir (dirp)) != NULL) {
- XX if (strcmp (de->d_name, ".") == 0 || strcmp (de->d_name, "..") == 0)
- XX continue;
- XX (void) strcpy (pos, de->d_name);
- XX if (is_dir (name)) nf += scan_tree (name, func);
- XX else {
- XX (func) (name,de->d_ino);
- XX nf += 1;
- XX }
- XX }
- XX (void) closedir (dirp);
- XX return nf;
- XX}
- @//E*O*F scantree.c//
- chmod u=rw,g=rw,o=rw scantree.c
-
- echo x - threelink
- sed 's/^XX//' > "threelink" <<'@//E*O*F threelink//'
- XX#! /bin/sh
- XX# This script takes a list of paths as input.
- XXcat $* |\
- XXawk -F! '{ for (i=1; i<NF-1; i++) printf ("%s!%s!%s\n", $i,$(i+1),$(i+2)); }' |\
- XXsort | uniq -c | sort -nr
- @//E*O*F threelink//
- chmod u=rwx,g=rwx,o=rwx threelink
-
- echo x - twolink
- sed 's/^XX//' > "twolink" <<'@//E*O*F twolink//'
- XX#! /bin/sh
- XX# This script takes a list of paths as input.
- XXcat $* |\
- XXawk -F! '{ for (i = 1; i < NF; i++) printf ("%s!%s\n", $i, $(i+1)); }' |\
- XXsort | uniq -c | sort -nr
-
-
-
- @//E*O*F twolink//
- chmod u=rwx,g=rwx,o=rwx twolink
-
- exit 0
- /* End of text from mirror:mod.sources */
-