- From: lwall@jato.Jpl.Nasa.Gov (Larry Wall)
- Newsgroups: alt.sources
- Subject: Re: Casefix - "Fix" all upper/lower case messages
- Message-ID: <3083@jato.Jpl.Nasa.Gov>
- Date: 13 Mar 90 20:36:46 GMT
-
- In article <6392@orca.wv.tek.com> jeff@quark.WV.TEK.COM (Jeff Beadles) writes:
- : HAVE YOU BEEN SHOUTED AT ONCE TOO OFTEN IN ALL UPPER CASE?
- :
- : or, do you hate it when you get messages that are in all lower case?
- :
- : If so, then casefix might be the next best thing to user education.
- :
- : I tossed this together one morning, when I had received yet another file that
- : was in all upper case. It works as a filter, so you can do things like
- : (from rn)
- :
- : End of article 123 (of 123)--what next? [npq] | ~/.bin/casefix | more
- :
- :
- : Have fun! I'll send it to comp.sources.misc in a few weeks. (Time to get any
- : remaining kinks out.)
-
- Here's something that does proper nouns too. Think of it as complementary
- to the casefix program, since this trades a little more startup time (and
- running time (and memory)) for the proper nouns. It was written for Peter Yee
- a while back to translate NASA Headline News, which comes out in upper-case
- only, and which he was (bless his heart) translating by hand every time.
-
- So the exception list that comes with it is for NASA stuff, but you could
- make it do whatever you want. Any proper nouns in /usr/dict/words, and
- any acronyms containing no vowels, are automatically capitalized without
- having to be in the exception list.
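
To make the two automatic rules concrete, here is a minimal sketch in present-day Perl (not the Perl 3 the kit targets). It assumes a tiny stand-in hash, %proper, in place of the capitalized entries of /usr/dict/words, and a hypothetical helper named fix_case. Neither is part of unuc itself, and the sketch leaves out sentence-initial capitalization and the exception list.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed sample data standing in for the capitalized entries of
# /usr/dict/words: lowercase key => properly cased form.
my %proper = map { ( lc($_) => $_ ) } qw(Mars Jupiter Florida);

# Hypothetical helper: apply the two automatic rules to one string.
sub fix_case {
    my ($text) = @_;
    $text = lc $text;
    # Rule 1: restore any word the dictionary lists as a proper noun.
    $text =~ s/(\w+)/exists $proper{$1} ? $proper{$1} : $1/eg;
    # Rule 2: a word made only of consonants is taken to be an acronym.
    $text =~ s/\b([b-df-hj-np-tv-xz]+)\b/\U$1/g;
    return $text;
}

print fix_case("THE JPL PROBE PASSED MARS"), "\n";
# prints: the JPL probe passed Mars
```

The consonant-only test is a heuristic: it catches JPL or CNES-style acronyms only when they happen to contain no vowels, which is why names such as NASA still need an entry in the exception list.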
-
- Requires perl 3.0, patchlevel 12.
-
- Larry Wall
- lwall@jpl-devvax.jpl.nasa.gov
-
- #!/bin/sh
- : make a subdirectory, cd to it, and run this through sh.
- echo 'If this kit is complete, "End of kit" will echo at the end'
- echo Extracting unuc
- sed >unuc <<'!STUFFY!FUNK!' -e 's/X//'
- X#!/usr/bin/perl
- X
- Xprint STDERR "Loading proper nouns...\n";
- Xopen(DICT,"/usr/dict/words") || die "Can't find /usr/dict/words: $!\n";
- Xwhile (<DICT>) {
- X    if (/^[A-Z]/) {    # capitalized entry, so treat it as a proper noun
- X        chop;
- X        ($lower = $_) =~ y/A-Z/a-z/;
- X        $proper{$lower} = $_;    # lowercase form => properly cased form
- X    }
- X}
- Xclose DICT;
- Xprint STDERR "Loading exceptions...\n";
- X
- Xopen(PATS,"unuc.pats") || die "Can't find unuc.pats: $!\n";
- X
- X$prog = <<'EOT';
- Xwhile (<>) {
- X    next if /[a-z]/;
- X    y/A-Z/a-z/;
- X    s/(\w+)/$proper{$1} ? $proper{$1} : $1/eg;
- X    s/^(\s*)([a-z])/$1 . (($tmp = $2) =~ y:a-z:A-Z:,$tmp)/e;
- X    s/([-.?!]["']?(\n\s*| \s*)["']?)([a-z])/$1 . (($tmp = $3) =~ y:a-z:A-Z:,$tmp)/eg;
- X    s/\b([b-df-hj-np-tv-xz]+)\b/(($tmp = $1) =~ y:a-z:A-Z:,$tmp)/eg;
- X    s/([a-z])'([SDT])\b/$1 . "'" . (($tmp = $2) =~ y:A-Z:a-z:,$tmp)/eg;
- XEOT
- Xwhile (<PATS>) {
- X    chop;
- X    next if /^$/;
- X    next if /^#/;
- X    if (! /;$/) {    # plain phrase; lines ending in ';' are already perl code
- X        $foo = $_;
- X        $foo =~ y/A-Z/a-z/;
- X        print STDERR "Dup $_\n" if $proper{$foo};
- X        $foo =~ s/([^\w ])/\\$1/g;    # escape regex metacharacters
- X        $foo =~ s/ /(\\s+)/g;    # let a space match any run of whitespace
- X        $foo = "\\b" . $foo if $foo =~ /^\w/;
- X        $foo .= "\\b" if $foo =~ /\w$/;
- X        $i = 0;
- X        ($bar = $_) =~ s/ /'$' . ++$i/eg;    # put the matched whitespace back
- X        $_ = "s/$foo/$bar/gi;";
- X    }
- X    $prog .= ' ' . $_ . "\n";
- X}
- Xclose PATS;
- X$prog .= "}\ncontinue {\n print;\n}\n";
- X
- X$/ = '';
- X#print $prog;
- Xeval $prog; die $@ if $@;
- !STUFFY!FUNK!
- echo Extracting unuc.pats
- sed >unuc.pats <<'!STUFFY!FUNK!' -e 's/X//'
- XA.M.
- XAir Force
- XAir Force Base
- XAir Force Station
- XAmerican
- XApr.
- XAriane
- XAug.
- XAugust
- XBureau of Labor Statistics
- XCIT
- XCaltech
- XCape Canaveral
- XChallenger
- XChina
- XCorporation
- XCrippen
- XDaily News in Brief
- XDaniel Quayle
- XDec.
- XDiscovery
- XEdwards
- XEndeavour
- XFeb.
- XFord Aerospace
- XFri.
- XGeneral Dynamics
- XGeorge Bush
- XHeadline News
- XHOTOL
- XI
- XII
- XIII
- XIV
- XIX
- XInstitute of Technology
- XJPL
- XJan.
- XJul.
- XJun.
- XKennedy Space Center
- XLDEF
- XLong Duration Exposure Facility
- XLong March
- XMar.
- XMarch
- XMartin
- XMartin Marietta
- XMercury
- XMon.
- Xin May
- Xs/\bmay (\d)/May $1/g;
- Xs/\boffice of (\w)/'Office of ' . (($tmp = $1) =~ y:a-z:A-Z:,$tmp)/eg;
- XNational Science Foundation
- XNASA Select
- XNew Mexico
- XNov.
- XOMB
- XOct.
- XOffice of Management and Budget
- XPresident
- XPresident Bush
- XRichard Truly
- XRocketdyne
- XRussian
- XRussians
- XSat.
- XSep.
- XSoviet
- XSoviet Union
- XSoviets
- XSpace Shuttle
- XSun.
- XThu.
- XTue.
- XU.S.
- XUnion of Soviet Socialist Republics
- XUnited States
- XVI
- XVII
- XVIII
- XVice President
- XVice President Quayle
- XWed.
- XWhite Sands
- XKaman Aerospace
- XAerospace Daily
- XAviation Week
- XSpace Technology
- XWashington Post
- XLos Angeles Times
- XNew York Times
- XAerospace Industries Association
- Xpresident of
- XJohnson Space Center
- XSpace Services
- XInc.
- XCo.
- XHughes Aircraft
- XCompany
- XOrbital Sciences
- XSwedish Space
- XArnauld
- XNicogosian
- XMagellan
- XGalileo
- XMir
- XJet Propulsion Laboratory
- XUniversity
- XDepartment of Defense
- XOrbital Science
- XOMS
- XUnited Press International
- XUnited Press
- XUPI
- XAssociated Press
- XAP
- XCable News Network
- XCape York
- XZenit
- XSYNCOM
- XEastern
- XWestern
- XTest Range
- XJcsat
- XJapanese Satellite Communications
- XDefence Ministry
- XDefense Ministry
- XSkynet
- XFixed Service Structure
- XLaunch Processing System
- XAsiasat
- XLaunch Control Center
- XEarth
- XCNES
- XGlavkosmos
- XPacific
- XAtlantic
- !STUFFY!FUNK!
- echo ""
- echo "End of kit"
- : I do not append .signature, but someone might mail this.
- exit
-
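
For anyone adapting the exception list, the following is a rough sketch, in present-day Perl, of how a plain phrase from unuc.pats turns into a substitution statement. The helper name entry_to_code and the sample text are hypothetical. The steps follow the loop over PATS in the script above, but this is an illustration under those assumptions, not a replacement for it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the text of a s///gi statement from one
# plain exception phrase (a line of unuc.pats that does not end in ';').
sub entry_to_code {
    my ($entry) = @_;
    my $pat = lc $entry;
    $pat =~ s/([^\w ])/\\$1/g;           # escape regex metacharacters
    $pat =~ s/ /(\\s+)/g;                # a space matches any whitespace run
    $pat = "\\b$pat" if $pat =~ /^\w/;   # anchor at a word boundary
    $pat .= "\\b"    if $pat =~ /\w$/;
    my $i = 0;
    (my $repl = $entry) =~ s/ /'$' . ++$i/eg;   # spaces become $1, $2, ...
    return "s/$pat/$repl/gi;";
}

print entry_to_code("Air Force Base"), "\n";
# prints: s/\bair(\s+)force(\s+)base\b/Air$1Force$2Base/gi;

# Apply a few generated statements to already-lowercased sample text.
$_ = "launched from cape\ncanaveral by the u.s. air force";
for my $entry ("Cape Canaveral", "U.S.", "Air Force") {
    eval entry_to_code($entry);
    die $@ if $@;
}
print "$_\n";
```

Because each space in the phrase becomes a captured (\s+) that is put back by $1, $2 and so on, a phrase that was split across a line break keeps its line break after being recapitalized.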