- Newsgroups: comp.unix.shell
- Path: sparky!uunet!paladin.american.edu!darwin.sura.net!zaphod.mps.ohio-state.edu!saimiri.primate.wisc.edu!caen!sol.ctr.columbia.edu!eff!news.oc.com!convex!tchrist
- From: Tom Christiansen <tchrist@convex.COM>
- Subject: Re: ksh 1, perl 2 - ksh or perl for scripting?
- Originator: tchrist@pixel.convex.com
- Sender: usenet@news.eng.convex.com (news access account)
- Message-ID: <1992Dec21.174403.26053@news.eng.convex.com>
- Date: Mon, 21 Dec 1992 17:44:03 GMT
- Reply-To: tchrist@convex.COM (Tom Christiansen)
- References: <ASH.92Dec21095237@ulysses.mr.ams.com>
- Nntp-Posting-Host: pixel.convex.com
- Organization: Convex Computer Corporation, Colorado Springs, CO
- X-Disclaimer: This message was written by a user at CONVEX Computer
- Corp. The opinions expressed are those of the user and
- not necessarily those of CONVEX.
- Lines: 186
-
- From the keyboard of ash@ulysses.mr.ams.com (Alan Harder):
- :Hi, all. We are currently trying to decide if we should move from
- :/bin/sh as our language for production scripting to ksh, or if we
- :should move to perl instead. Is anyone out there using perl as their
- :production scripting language of choice?
-
- Certainly. Many shops use perl for various purposes, including
- install scripts, test drivers, database interfaces, menu systems,
- and sysadmin tools.
-
- [The following text contains pieces of semi-canned prose which some of
- you may have seen before.]
-
- What you're really asking is whether you should use perl or ksh for your
- scripting. The problem with that question is that not all problems require
- the same solution. For simple command-oriented tasks, a shell script
- works just fine. It's faster to write, and sometimes faster to execute.
- The built-in test, expr, and other features of ksh will give you a
- performance win in this area.
-
- Perl and ksh do not address precisely the same problem set. There is some
- overlap, but in general, ksh is an interactive shell and command
- language, whereas perl is not a shell but much more a general-purpose
- programming language. Perl fills the gap between sh and C, and indeed
- extends into those languages' problem domains as well. Ksh does not.
-
- Shell programming is inherently cumbersome at expressing certain kinds of
- algorithms. Most of us have written, or at least seen, shell scripts from
- hell. While often touted as one of UNIX's strengths because they're
- conglomerations of small, single-purpose tools, these shell scripts
- quickly grow so complex that they're cumbersome and hard to understand,
- modify and maintain. After a certain point of complexity, the strength of
- the UNIX philosophy of having many programs that each does one thing well
- becomes its weakness.
-
- The big problem with piping tools together is that there is only one
- pipe. This means that several different data streams have to get
- multiplexed into a single data stream, then demuxed on the other end of
- the pipe. This wastes processor time as well as human brain power.
-
- For example, you might be shuffling a list of filenames through a pipe,
- but you also want to indicate that certain files have a particular
- attribute, and others don't. (E.g., certain files are more than ten
- days old.) Typically, this information is encoded in the data stream
- by appending or prepending some special marker string to the filename.
- This means that both the pipe feeder and the pipe reader need to know
- about it. Not a pretty sight.
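-
- Just to make that concrete, here's roughly what it looks like in perl
- once you stop multiplexing: the attribute rides alongside the name in an
- associative array instead of being glued onto it. (The %is_old array and
- the "stale"/"fresh" labels are merely made up for the example.)
-
-     while ($name = <STDIN>) {
-         chop $name;
-         $is_old{$name} = (-M $name > 10);   # days since last modification
-     }
-     foreach $name (sort keys %is_old) {
-         print $is_old{$name} ? "stale: $name\n" : "fresh: $name\n";
-     }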
-
- Because perl is one program rather than a dozen others (sh, awk, sed, tr,
- wc, sort, grep, ...), it is usually clearer to express yourself in perl
- than in sh and allies, and often more efficient as well. You don't need
- as many pipes, temporary files, or separate processes to do the job. You
- don't need to go shoving your data stream out to tr and back and to sed
- and back and to awk and back and to sort and back and then back to sed and
- back again. Doing so can often be slow, awkward, and/or confusing.
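-
- As a rough illustration (the log file and field positions here are
- invented), a grep | awk | sort | uniq -c pipeline collapses into a single
- process making a single pass:
-
-     # the one-pass perl version of something in the spirit of:
-     #   grep login /var/log/sulog | awk '{print $2}' | sort | uniq -c | sort -rn
-     while (<>) {
-         next unless /login/;
-         @F = split;
-         $count{$F[1]}++;                    # second field holds the name
-     }
-     foreach $user (sort { $count{$b} <=> $count{$a} } keys %count) {
-         printf "%7d %s\n", $count{$user}, $user;
-     }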
-
- Anyone who's ever tried to pass command line arguments into a sed script
- of moderate complexity or above can attest to the fact that getting the
- quoting right is not a pleasant task. In fact, quoting in general in the
- shell is just not a pleasant thing to code or to read.
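-
- For contrast, here's about all it takes to hand a pattern in from the
- command line in perl -- one level of quoting, courtesy of the shell, and
- that's the end of it:
-
-     $pat = shift;                           # e.g.  prog 'foo.*bar' file1 file2
-     while (<>) {
-         print if /$pat/;
-     }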
-
- In a heterogeneous computing environment, the available versions of many
- tools vary too much from one system to the next to be utterly reliable.
- Does your sh understand functions on all your machines? What about your
- awk? What about local variables? It is very difficult to do complex
- programming without being able to break a problem up into subproblems of
- lesser complexity. You're forced to resort to using the shell to call
- other shell scripts and let UNIX's power of spawning processes serve as
- your subroutine mechanism, which is inefficient at best. That means your
- script will require several separate scripts to run, and getting all these
- installed, working, and maintained on all the different machines in your
- local configuration is painful. With perl, all you need do is get
- it installed on the system -- which is really pretty easy thanks to
- Larry's Configure program -- and after that you're home free.
-
- Shell scripts can seldom hope to approach a perl program's speed.
- In fact, perl programs are often faster than a C program, at least one
- that hasn't been highly tuned. In general, if you have a perl and a C expert
- working on the same problem, you'll get the perl code to within 2 to 3 times
- the C code's speed, although for some problems, it's much better than that.
- The next release of perl will also be substantially faster (current figures
- indicate that 25% faster is not unlikely) than the current one.
-
- Besides being faster, perl is a more powerful tool than sh, sed, or awk.
- I realize these are fighting words in some camps, but so be it. There
- exists a substantial niche between shell programming and C programming
- that perl conveniently fills. Tasks of this nature seem to arise with
- extreme frequency in the realm of systems administration. Since a system
- administrator almost invariably has far too much to do to devote a week to
- coding up every task before him in C, perl is especially useful for him.
- Larry Wall, perl's author, has been known to call it "a shell for C
- programmers." I like to think of it as a "BASIC for UNIX." I realize
- that this carries both good and bad connotations. So be it.
-
- In what ways is perl more powerful than the individual tools? The list
- is pretty long, so what follows is not necessarily exhaustive.
- To begin with, you don't have to worry about arbitrary and annoying
- restrictions on string length, input line length, or number of elements in
- an array. These are all virtually unlimited, i.e. limited to your
- system's address space and virtual memory size.
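-
- For instance, slurping an entire file into one scalar and then into one
- array is perfectly ordinary perl (the file name is whatever you please):
-
-     undef $/;                               # turn off the input record separator
-     open(F, "big.file") || die "can't open big.file: $!";
-     $whole = <F>;                           # the whole file in one string
-     @lines = split(/\n/, $whole);           # or one element per line
-     close(F);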
-
- Perl's regular expression handling is far and away the best I've ever
- seen. For one thing, you don't have to remember which tool wants which
- particular flavor of regular expressions, or lament the fact that one
- tool doesn't allow (..|..) constructs, or +'s, or \b's, or whatever. With
- perl, it's all the same, and as far as I can tell, a proper superset of
- all the others.
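-
- A quick taste -- alternation, +, \b, and the numbered match variables all
- in one place, with no worrying about which tool supports what (the
- pattern itself is only an example):
-
-     while (<>) {
-         if (/\b(warning|error|fatal)\b:\s+(.+)/) {
-             print "severity=$1 message=$2\n";
-         }
-     }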
-
- Perl has a fully functional symbolic debugger (written, of course, in
- perl) that is an indispensable aid in debugging complex programs. Neither
- the shell nor sed/awk/sort/tr/... have such a thing. There've been folks
- who've switched over to doing all their major production scripting in Perl
- just so that they have access to a real debugger.
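-
- A session looks more or less like this (the script name, line number, and
- variable are of course made up):
-
-     % perl -d myscript.pl somearg
-
-     DB<1> b 42                  (set a breakpoint at line 42)
-     DB<2> c                     (run until we hit it)
-     DB<3> s                     (single-step, descending into subroutines)
-     DB<4> p $somevar            (print any perl expression)
-     DB<5> q                     (quit)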
-
- Perl has a loop control mechanism that's more powerful even than C's. You
- can do the equivalent of a break or continue (last and next in perl) of
- any arbitrary loop, not merely the nearest enclosing one. You can even do
- a kind of continue that doesn't trigger the re-initialization part of a
- loop, something you want to do from time to time.
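-
- In perl that looks something like this (the label and the @boring list of
- patterns to skip are arbitrary names for the example):
-
-     LINE: while (<>) {
-         foreach $pat (@boring) {
-             next LINE if /$pat/;            # "continue" the *outer* loop
-         }
-         last LINE if /^__END__$/;           # "break" out of the outer loop
-         if (s/\\\n$//) {                    # line ends in a backslash?  glue
-             $_ .= <>;                       # the next line on and redo the
-             redo LINE;                      # loop body, skipping the while test
-         }
-         print;
-     }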
-
- Perl's data-types and operators are richer than the shells' or awk's,
- because you have scalars, numerically-indexed arrays (lists), and
- string-indexed (hashed) arrays. Each of these holds arbitrary data
- values, including floating point numbers, for which built-in mathematical
- functions and power operators are available. It can also handle
- binary data of arbitrary size.
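-
- Spelled out (the variable names are nothing special):
-
-     $host   = "pixel";                      # a scalar
-     @nums   = (42, 3.14159, "twelve");      # a list; mixed types are fine
-     %uid    = ("root", 0, "daemon", 1);     # a string-indexed (hashed) array
-
-     $r      = 2.5;
-     $area   = 3.14159 * $r ** 2;            # floating point and ** built in
-     print "area $area, root's uid $uid{'root'}\n";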
-
- As in lisp, you can generate strings, perhaps with sprintf(), and
- then eval them. That way you can generate code on the fly. You can even
- do lambda-type functions that return newly-created functions that you can
- call later. The scoping of variables is dynamic, fully recursive subroutines
- are supported, and you can pass or return any type of data into or out
- of your subroutines.
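-
- For example, you can manufacture accessor subroutines at run time and
- eval them into existence (the field names and %record data are purely
- illustrative):
-
-     %record = ("name", "tchrist", "shell", "/bin/sh");
-     foreach $field ("name", "shell") {
-         $code = sprintf('sub get_%s { return $record{"%s"}; }', $field, $field);
-         eval $code;
-         die $@ if $@;
-     }
-     print &get_name(), "\n";                # prints "tchrist"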
-
- You have a built-in automatic formatter for generating pretty-printed
- forms with automatic pagination and headers and center-justified and
- text-filled fields like "%(|fmt)s" if you can imagine what that would
- actually be were it legal.
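-
- If you haven't seen one, a format declaration reads like a picture of the
- output (the columns below are just an example; note that the picture lines
- and the lone terminating period belong at the left margin in a real script):
-
-     format STDOUT_TOP =
-     Filename              Size  Status
-     --------              ----  ------
-     .
-     format STDOUT =
-     @<<<<<<<<<<<<<<<<<<  @>>>>  @|||||||||
-     $file,               $size, $status
-     .
-
-     ($file, $size, $status) = ("/etc/motd", 1234, "old");
-     write;                                  # prints one line under the headers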
-
- There's a mechanism for writing suid programs that can be made more secure
- than even C programs thanks to an elaborate data-tracing mechanism that
- understands the "taintedness" of data derived from external sources. It
- won't let you do anything really stupid that you might not have thought of.
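-
- The flavor of it is roughly this (the lpr call and the untainting pattern
- are only for illustration, and how you turn the taint checks on depends
- on which perl you're running):
-
-     $file = $ARGV[0];                       # tainted: it came from the user
-     if ($file =~ m#^([\w./-]+)$#) {
-         $file = $1;                         # data extracted by a match is clean
-     } else {
-         die "suspicious characters in '$ARGV[0]'\n";
-     }
-     system "lpr", $file;                    # would be fatal with tainted args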
-
- You have access to just about any system-related function or system call,
- like ioctl's, fcntl, select, pipe and fork, getc, socket and bind and
- connect and accept, and indirect syscall() invocation, as well as things
- like getpwuid(), gethostbyname(), etc. You can read in binary data laid
- out by a C program or system call using structure-conversion templates.
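-
- For instance (the record layout fed to unpack() is strictly an invention
- for the example):
-
-     ($name, $passwd, $uid, $gid) = getpwuid($<);
-     print "running as $name, uid $uid, gid $gid\n";
-
-     open(REC, "records.dat") || die "can't open records.dat: $!";
-     read(REC, $buf, 12) == 12 || die "short read on records.dat\n";
-     ($type, $flags, $len, $offset) = unpack("s s l l", $buf);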
-
- At the same time you can get at the high-level shell-type operations like
- the -r or -w tests on files or `backquote` command interpolation. You can
- do file-globbing with the <*.[ch]> notation or do low-level readdir()s as
- suits your fancy.
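-
- Side by side, for example:
-
-     @sources = <*.[ch]>;                    # file globbing, shell-style
-     $today   = `date`;                      # backquote command interpolation
-     chop $today;
-
-     opendir(DIR, ".") || die "can't opendir .: $!";
-     foreach $f (readdir(DIR)) {
-         print "$f is writable\n" if -w $f;
-     }
-     closedir(DIR);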
-
- Dbm files can be accessed using simple array notation. This is really
- nice for dealing with system databases (aliases, news, ...), efficient
- access mechanisms over large data-sets, and for keeping persistent data.
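-
- Something like this is all it takes (the database path is only an example):
-
-     dbmopen(%visits, "/usr/local/lib/visits", 0644) || die "dbmopen: $!";
-     $visits{$ENV{"USER"}}++;                # the count persists across runs
-     dbmclose(%visits);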
-
- Perl is extensible, and with the next release, will be embeddable.
- People link it with their own libraries for accessing curses, database
- access routines, network management routines, or X graphics libraries.
- Perl's namespace is more flexible than C's, having a sort of package
- notation for public and private data and code as well as
- module-initialization routines.
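-
- The package mechanism looks like this (the Counter package is made up for
- the example; depending on your perl, the package delimiter is spelled ::
- or the older '):
-
-     package Counter;                        # its own namespace from here on
-     $count = 0;                             # really $Counter::count
-     sub bump { $count++; }                  # callable as &Counter::bump
-
-     package main;                           # back to the default namespace
-     &Counter::bump();
-     print $Counter::count, "\n";            # prints 1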
-
- Don't be dismayed by the apparent complexity of what I've just discussed.
- Perl is actually very easy to learn because so much of it derives from
- existing tools. It's like an interpreted C with sh, sed, awk, and a lot
- more built into it. There's a very considerable quantity of code out
- there already written in perl, including libraries to handle things
- you don't feel like reimplementing.
-
- Don't give up your shell programming. You'll want it for writing
- makefiles and making shell callouts in perl, if for no other reason. :-)
- My personal rule of thumb is usually that if I can do the task in under a
- dozen or so lines of shell code that is straightforward and clean and not
- too slow, then I use the Bourne shell, otherwise I use perl, except for
- the occasional task for which C is very clearly the optimal solution.
-
- --tom
- --
- Tom Christiansen tchrist@convex.com convex!tchrist
-
-
- Emacs is a fine operating system, but I still prefer UNIX. -me
-