Chapter 19. Regular Expressions

To fully utilize the power of shell scripting, you need to master Regular Expressions. Certain commands and utilities commonly used in scripts, such as expr, sed, and awk, interpret and use REs.

19.1. A Brief Introduction to Regular Expressions

An expression is a string of characters. Those characters that have an interpretation above and beyond their literal meaning are called metacharacters. A quote symbol, for example, may denote speech by a person, ditto, or a meta-meaning for the symbols that follow. Regular Expressions are sets of characters and/or metacharacters that UNIX utilities interpret as patterns for matching text. [1]

The main uses for Regular Expressions (REs) are text searches and string manipulation. An RE matches a single character or a set of characters (a substring or an entire string).
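As a minimal sketch of the three cases just named (a single character, a substring, an entire string), the script below prints the part of a string that an RE matches. The helper name first_match and the sample string are made up for illustration, and the -o option of grep is assumed to be available (it is provided by GNU and BSD grep, but it is not required by POSIX).

```shell
#!/bin/bash

# first_match STRING RE
# Hypothetical helper: print the first portion of STRING matched by RE.
# Assumes a grep that supports -o (print only the matched text).
first_match () {
  printf '%s\n' "$1" | grep -o -- "$2" | head -n 1
}

first_match "bash scripting" "s"        # matches a single character
first_match "bash scripting" "script"   # matches a substring
first_match "bash scripting" '^.*$'     # matches the entire string
```

Run as written, the three calls print "s", "script", and "bash scripting" in turn.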

Sed, awk, and Perl, used as filters in scripts, take REs as arguments when "sifting" or transforming files or I/O streams. See Example A-7 and Example A-12 for illustrations of this.
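The following sketch shows sed and awk used as filters on a stream, each taking an RE as part of its argument. The log lines and the function names (sample_log, filter_errors, info_second_field) are invented for this illustration and do not come from the examples cited above.

```shell
#!/bin/bash

# Made-up sample data standing in for a file or I/O stream.
sample_log () {
  printf '%s\n' "error: disk full" "info: backup done" "error: net down"
}

# sed as a filter: keep only lines matching the RE,
# rewriting each one with the captured text (\1).
filter_errors () {
  sed -n 's/^error: \(.*\)$/[E] \1/p'
}

# awk as a filter: select lines by RE, then print the second field.
info_second_field () {
  awk '/^info:/ { print $2 }'
}

sample_log | filter_errors
sample_log | info_second_field
```

The first pipeline passes through only the two "error:" lines, rewritten with an "[E]" prefix; the second prints the word "backup".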

"Sed & Awk," by Dougherty and Robbins, gives a very complete and lucid treatment of REs (see the Bibliography).

Notes

[1]

The simplest type of Regular Expression is a character string that retains its literal meaning, not containing any metacharacters.

[2]

Since sed, awk, and grep process input one line at a time, there will usually not be a newline to match. In those cases where there is a newline in a multi-line expression, the dot will match the newline.
#!/bin/bash

sed -e 'N;s/.*/[&]/' << EOF   # Here Document
line1
line2
EOF
# OUTPUT:
# [line1
# line2]

echo

awk '{ $0=$1 "\n" $2; if (/line.1/) {print}}' << EOF
line 1
line 2
EOF
# OUTPUT:
# line
# 1

# Thanks, S.C.

exit 0