-
- =head1 NAME
-
- perlreftut - Mark's very short tutorial about references
-
- =head1 DESCRIPTION
-
- One of the most important new features in Perl 5 was the capability to
- manage complicated data structures like multidimensional arrays and
- nested hashes. To enable these, Perl 5 introduced a feature called
- `references', and using references is the key to managing complicated,
- structured data in Perl. Unfortunately, there's a lot of funny syntax
- to learn, and the main manual page can be hard to follow. The manual
- is quite complete, and sometimes people find that a problem, because
- it can be hard to tell what is important and what isn't.
-
- Fortunately, you only need to know 10% of what's in the main page to get
- 90% of the benefit. This page will show you that 10%.
-
- =head1 Who Needs Complicated Data Structures?
-
- One problem that came up all the time in Perl 4 was how to represent a
- hash whose values were lists. Perl 4 had hashes, of course, but the
- values had to be scalars; they couldn't be lists.
-
- Why would you want a hash of lists? Let's take a simple example: You
- have a file of city and country names, like this:
-
-     Chicago, USA
-     Frankfurt, Germany
-     Berlin, Germany
-     Washington, USA
-     Helsinki, Finland
-     New York, USA
-
- and you want to produce an output like this, with each country mentioned
- once, and then an alphabetical list of the cities in that country:
-
-     Finland: Helsinki.
-     Germany: Berlin, Frankfurt.
-     USA: Chicago, New York, Washington.
-
- The natural way to do this is to have a hash whose keys are country
- names. Associated with each country name key is a list of the cities in
- that country. Each time you read a line of input, split it into a country
- and a city, look up the list of cities already known to be in that
- country, and append the new city to the list. When you're done reading
- the input, iterate over the hash as usual, sorting each list of cities
- before you print it out.
-
- If hash values can't be lists, you lose. In Perl 4, hash values can't
- be lists; they can only be strings. You lose. You'd probably have to
- combine all the cities into a single string somehow, and then when
- time came to write the output, you'd have to break the string into a
- list, sort the list, and turn it back into a string. This is messy
- and error-prone. And it's frustrating, because Perl already has
- perfectly good lists that would solve the problem if only you could
- use them.
-
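To make the mess concrete, here is a rough sketch of that string-packing workaround. It is written in present-day Perl for convenience, but the hash holds only strings. The C<|> separator and the in-memory C<@lines> array are assumptions of mine, not part of the original problem, and the sketch breaks as soon as a city name contains the separator.

```perl
use strict;
use warnings;

# Hypothetical input, held in memory for the sake of the sketch.
my @lines = ("Chicago, USA", "Frankfurt, Germany", "Berlin, Germany");

# Each hash value is one string of city names joined with "|".
my %table;
for my $line (@lines) {
    my ($city, $country) = split /, /, $line;
    if (defined $table{$country}) {
        $table{$country} .= "|$city";
    } else {
        $table{$country} = $city;
    }
}

# To print, every string has to be unpacked, sorted, and re-joined.
for my $country (sort keys %table) {
    my @cities = split /\|/, $table{$country};
    @cities = sort @cities;
    print "$country: ", join(', ', @cities), ".\n";
}
```

The packing and unpacking code is longer than the logic it supports, which is the frustration described above.
-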
- =head1 The Solution
-
- By the time Perl 5 rolled around, we were already stuck with this
- design: Hash values must be scalars. The solution to this is
- references.
-
- A reference is a scalar value that I<refers to> an entire array or an
- entire hash (or to just about anything else). Names are one kind of
- reference that you're already familiar with. Think of the President:
- a messy, inconvenient bag of blood and bones. But to talk about him,
- or to represent him in a computer program, all you need is the easy,
- convenient scalar string "Bill Clinton".
-
- References in Perl are like names for arrays and hashes. They're
- Perl's private, internal names, so you can be sure they're
- unambiguous. Unlike "Bill Clinton", a reference only refers to one
- thing, and you always know what it refers to. If you have a reference
- to an array, you can recover the entire array from it. If you have a
- reference to a hash, you can recover the entire hash. But the
- reference is still an easy, compact scalar value.
-
- You can't have a hash whose values are arrays; hash values can only be
- scalars. We're stuck with that. But a single reference can refer to
- an entire array, and references are scalars, so you can have a hash of
- references to arrays, and it'll act a lot like a hash of arrays, and
- it'll be just as useful as a hash of arrays.
-
- We'll come back to this city-country problem later, after we've seen
- some syntax for managing references.
-
-
- =head1 Syntax
-
- There are just two ways to make a reference, and just two ways to use
- it once you have it.
-
- =head2 Making References
-
- B<Make Rule 1>
-
- If you put a C<\> in front of a variable, you get a
- reference to that variable.
-
-     $aref = \@array;         # $aref now holds a reference to @array
-     $href = \%hash;          # $href now holds a reference to %hash
-
- Once the reference is stored in a variable like $aref or $href, you
- can copy it or store it just the same as any other scalar value:
-
-     $xy = $aref;             # $xy now holds a reference to @array
-     $p[3] = $href;           # $p[3] now holds a reference to %hash
-     $z = $p[3];              # $z now holds a reference to %hash
-
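A small sketch may help show what copying a reference does and does not do. The sample values are made up, and the C<@{$xy}> notation peeks ahead at a rule explained further down; for now, read it as "the array that C<$xy> refers to".

```perl
use strict;
use warnings;

# Hypothetical sample data.
my @array = (10, 20, 30);

my $aref = \@array;   # $aref holds a reference to @array
my $xy   = $aref;     # copies the reference, not the array

# Both scalars refer to the one and only @array,
# so a change made through $xy shows up in @array itself.
push @{$xy}, 40;

print "array now has ", scalar(@array), " elements\n";   # 4 elements
print "last element: $array[-1]\n";                      # 40
```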
-
- These examples show how to make references to variables with names.
- Sometimes you want to make an array or a hash that doesn't have a
- name. This is analogous to the way you like to be able to use the
- string C<"\n"> or the number 80 without having to store it in a named
- variable first.
-
- B<Make Rule 2>
-
- C<[ ITEMS ]> makes a new, anonymous array, and returns a reference to
- that array. C<{ ITEMS }> makes a new, anonymous hash, and returns a
- reference to that hash.
-
-     $aref = [ 1, "foo", undef, 13 ];
-     # $aref now holds a reference to an array
-
-     $href = { APR => 4, AUG => 8 };
-     # $href now holds a reference to a hash
-
-
- The references you get from rule 2 are the same kind of
- references that you get from rule 1:
-
-     # This:
-     $aref = [ 1, 2, 3 ];
-
-     # Does the same as this:
-     @array = (1, 2, 3);
-     $aref = \@array;
-
-
- The first line is an abbreviation for the following two lines, except
- that it doesn't create the superfluous array variable C<@array>.
-
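As a quick check on Make Rule 2, the sketch below builds a few anonymous structures and inspects them. It leans on the C<ref> function and on comparing references with C<!=>, neither of which the rules above require; the variable names are only illustrative.

```perl
use strict;
use warnings;

my $aref = [ 1, "foo", undef, 13 ];   # reference to an anonymous array
my $href = { APR => 4, AUG => 8 };    # reference to an anonymous hash

# ref() reports what kind of thing a reference points to.
print ref($aref), "\n";   # ARRAY
print ref($href), "\n";   # HASH

# Every [ ... ] builds a separate array, even when the contents match.
my $first  = [ 1, 2, 3 ];
my $second = [ 1, 2, 3 ];
print "two different arrays\n" if $first != $second;
```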
-
- =head2 Using References
-
- What can you do with a reference once you have it? It's a scalar
- value, and we've seen that you can store it as a scalar and get it back
- again just like any scalar. There are just two more ways to use it:
-
- B<Use Rule 1>
-
- If C<$aref> contains a reference to an array, then you
- can put C<{$aref}> anywhere you would normally put the name of an
- array. For example, C<@{$aref}> instead of C<@array>.
-
- Here are some examples of that:
-
- Arrays:
-
-
-     @a              @{$aref}            An array
-     reverse @a      reverse @{$aref}    Reverse the array
-     $a[3]           ${$aref}[3]         An element of the array
-     $a[3] = 17      ${$aref}[3] = 17    Assigning an element
-
-
- On each line are two expressions that do the same thing. The
- left-hand versions operate on the array C<@a>, and the right-hand
- versions operate on the array that is referred to by C<$aref>, but
- once they find the array they're operating on, they do the same things
- to the arrays.
-
- Using a hash reference is I<exactly> the same:
-
-     %h                %{$href}                A hash
-     keys %h           keys %{$href}           Get the keys from the hash
-     $h{'red'}         ${$href}{'red'}         An element of the hash
-     $h{'red'} = 17    ${$href}{'red'} = 17    Assigning an element
-
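The two tables can be tried out directly. In this sketch the contents of C<@a> and C<%h> are invented sample data; each right-hand expression is used on a reference, and the comment names the plain expression it stands for.

```perl
use strict;
use warnings;

my @a    = (1, 2, 3, 4);
my $aref = \@a;

my @reversed = reverse @{$aref};   # same as reverse @a
${$aref}[3]  = 17;                 # same as $a[3] = 17
my $count    = scalar @{$aref};    # same as scalar @a

my %h    = (red => 1, blue => 2);
my $href = \%h;

my @keys = sort keys %{$href};     # same as sort keys %h
${$href}{'red'} = 17;              # same as $h{'red'} = 17

print "@reversed\n";       # 4 3 2 1
print "$a[3] $count\n";    # 17 4
print "@keys $h{red}\n";   # blue red 17
```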
-
- B<Use Rule 2>
-
- C<${$aref}[3]> is too hard to read, so you can write C<< $aref->[3] >>
- instead.
-
- C<${$href}{red}> is too hard to read, so you can write
- C<< $href->{red} >> instead.
-
- Most often, when you have an array or a hash, you want to get or set a
- single element from it. C<${$aref}[3]> and C<${$href}{'red'}> have
- too much punctuation, and Perl lets you abbreviate.
-
- If C<$aref> holds a reference to an array, then C<< $aref->[3] >> is
- the fourth element of the array. Don't confuse this with C<$aref[3]>,
- which is the fourth element of a totally different array, one
- deceptively named C<@aref>. C<$aref> and C<@aref> are unrelated the
- same way that C<$item> and C<@item> are.
-
- Similarly, C<< $href->{'red'} >> is part of the hash referred to by
- the scalar variable C<$href>, perhaps even one with no name.
- C<$href{'red'}> is part of the deceptively named C<%href> hash. It's
- easy to leave out the C<< -> >> by mistake, and if you do, you'll get
- bizarre results when your program gets array and hash elements out of
- totally unexpected hashes and arrays that weren't the ones you wanted
- to use.
-
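Here is the arrow mistake in miniature. The array C<@aref> and its contents are made up for the demonstration; it exists only so that the arrow-less expression has something wrong to fetch.

```perl
use strict;
use warnings;

my @aref = ('wrong', 'array', 'entirely', 'oops');   # deceptively named array
my $aref = [ 'a', 'b', 'c', 'd' ];                   # reference to a different array

my $by_rule_1 = ${$aref}[3];   # Use Rule 1
my $by_rule_2 = $aref->[3];    # Use Rule 2: the same element
my $no_arrow  = $aref[3];      # element of @aref, not of the referred-to array

print "$by_rule_1 $by_rule_2 $no_arrow\n";   # d d oops
```

If no array named C<@aref> had been declared, C<use strict> would refuse to compile the arrow-less line, which is one good reason to turn it on.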
-
- =head1 An Example
-
- Let's see a quick example of how all this is useful.
-
- First, remember that C<[1, 2, 3]> makes an anonymous array containing
- C<(1, 2, 3)>, and gives you a reference to that array.
-
- Now think about
-
-     @a = ( [1, 2, 3],
-            [4, 5, 6],
-            [7, 8, 9]
-          );
-
- @a is an array with three elements, and each one is a reference to
- another array.
-
- C<$a[1]> is one of these references. It refers to an array, the array
- containing C<(4, 5, 6)>, and because it is a reference to an array,
- B<Use Rule 2> says that we can write C<< $a[1]->[2] >> to get the
- third element from that array. C<< $a[1]->[2] >> is the 6.
- Similarly, C<< $a[0]->[1] >> is the 2. What we have here is like a
- two-dimensional array; you can write C<< $a[ROW]->[COLUMN] >> to get
- or set the element in any row and any column of the array.
-
- The notation still looks a little cumbersome, so there's one more
- abbreviation:
-
- =head1 Arrow Rule
-
- In between two B<subscripts>, the arrow is optional.
-
- Instead of C<< $a[1]->[2] >>, we can write C<$a[1][2]>; it means the
- same thing. Instead of C<< $a[0]->[1] >>, we can write C<$a[0][1]>;
- it means the same thing.
-
- Now it really looks like two-dimensional arrays!
-
- You can see why the arrows are important. Without them, we would have
- had to write C<${$a[1]}[2]> instead of C<$a[1][2]>. For
- three-dimensional arrays, they let us write C<$x[2][3][5]> instead of
- the unreadable C<${${$x[2]}[3]}[5]>.
-
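The following sketch puts the three spellings side by side on the grid from the example. The assignment and the printing loop are additions of mine, meant only to show that setting and walking the structure work the same way.

```perl
use strict;
use warnings;

my @a = ( [1, 2, 3],
          [4, 5, 6],
          [7, 8, 9],
        );

# Three spellings of the same element: row 1, column 2.
my @same = ( ${$a[1]}[2], $a[1]->[2], $a[1][2] );
print "@same\n";   # 6 6 6

# Setting an element uses the same notation.
$a[0][1] = 20;

# Each element of @a is a reference to one row.
for my $row (@a) {
    print join(' ', @{$row}), "\n";
}
```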
-
- =head1 Solution
-
- Here's the answer to the problem I posed earlier, of reformatting a
- file of city and country names.
-
-     1   while (<>) {
-     2     chomp;
-     3     my ($city, $country) = split /, /;
-     4     push @{$table{$country}}, $city;
-     5   }
-     6
-     7   foreach $country (sort keys %table) {
-     8     print "$country: ";
-     9     my @cities = @{$table{$country}};
-    10     print join ', ', sort @cities;
-    11     print ".\n";
-    12   }
-
-
- The program has two pieces: Lines 1--5 read the input and build a
- data structure, and lines 7--12 analyze the data and print out the
- report.
-
- In the first part, line 4 is the important one. We're going to have a
- hash, C<%table>, whose keys are country names, and whose values are
- (references to) arrays of city names. After acquiring a city and
- country name, the program looks up C<$table{$country}>, which holds (a
- reference to) the list of cities seen in that country so far. Line 4 is
- totally analogous to
-
-     push @array, $city;
-
- except that the name C<array> has been replaced by the reference
- C<{$table{$country}}>. The C<push> adds a city name to the end of the
- referred-to array.
-
- In the second part, line 9 is the important one. Again,
- C<$table{$country}> is (a reference to) the list of cities in the country, so
- we can recover the original list, and copy it into the array C<@cities>,
- by using C<@{$table{$country}}>. Line 9 is totally analogous to
-
-     @cities = @array;
-
- except that the name C<array> has been replaced by the reference
- C<{$table{$country}}>. The C<@> tells Perl to get the entire array.
-
- The rest of the program is just familiar uses of C<chomp>, C<split>, C<sort>,
- and C<print>, and doesn't involve references at all.
-
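If you'd like to run the program without preparing an input file, the variant below may be handy. It is a sketch under a few assumptions of my own: the input lines sit in an in-memory array C<@lines> instead of arriving through C<< <> >>, every variable is declared with C<my> so that it passes C<use strict>, and the report is collected in C<$report> before printing. The logic is otherwise the same.

```perl
use strict;
use warnings;

# The sample input from the article, held in memory.
my @lines = (
    "Chicago, USA",
    "Frankfurt, Germany",
    "Berlin, Germany",
    "Washington, USA",
    "Helsinki, Finland",
    "New York, USA",
);

# Country name => reference to an array of city names.
my %table;
for my $line (@lines) {
    my ($city, $country) = split /, /, $line;
    push @{$table{$country}}, $city;
}

# Build the report, one sorted country per line.
my $report = '';
for my $country (sort keys %table) {
    my @cities = sort @{$table{$country}};
    $report .= "$country: " . join(', ', @cities) . ".\n";
}
print $report;
```

Run on the sample data, this should print the three-line report shown at the start of the article.
-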
- There's one fine point I skipped. Suppose the program has just read
- the first line in its input that happens to mention Greece.
- Control is at line 4, C<$country> is C<'Greece'>, and C<$city> is
- C<'Athens'>. Since this is the first city in Greece,
- C<$table{$country}> is undefined; in fact there isn't a C<'Greece'> key
- in C<%table> at all. What does line 4 do here?
-
-     4     push @{$table{$country}}, $city;
-
-
- This is Perl, so it does the exact right thing. It sees that you want
- to push C<Athens> onto an array that doesn't exist, so it helpfully
- makes a new, empty, anonymous array for you, installs it in the table,
- and then pushes C<Athens> onto it. This is called `autovivification'.
-
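You can watch autovivification happen with a few lines. This sketch starts from an empty C<%table> and checks for the key before and after the C<push>; the C<$before> and C<$after> variables are just bookkeeping for the demonstration.

```perl
use strict;
use warnings;

my %table;
my ($city, $country) = ('Athens', 'Greece');

my $before = exists $table{$country} ? 'yes' : 'no';
push @{$table{$country}}, $city;    # the array springs into existence here
my $after  = exists $table{$country} ? 'yes' : 'no';

print "key present before push: $before\n";             # no
print "key present after push: $after\n";               # yes
print "kind of value: ", ref($table{$country}), "\n";   # ARRAY
print "cities: @{$table{$country}}\n";                  # Athens
```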
-
- =head1 The Rest
-
- I promised to give you 90% of the benefit with 10% of the details, and
- that means I left out 90% of the details. Now that you have an
- overview of the important parts, it should be easier to read the
- L<perlref> manual page, which discusses 100% of the details.
-
- Some of the highlights of L<perlref>:
-
- =over 4
-
- =item *
-
- You can make references to anything, including scalars, functions, and
- other references.
-
- =item *
-
- In B<Use Rule 1>, you can omit the curly brackets whenever the thing
- inside them is an atomic scalar variable like C<$aref>. For example,
- C<@$aref> is the same as C<@{$aref}>, and C<$$aref[1]> is the same as
- C<${$aref}[1]>. If you're just starting out, you may want to adopt
- the habit of always including the curly brackets.
-
- =item *
-
- To see if a variable contains a reference, use the `ref' function.
- It returns true if its argument is a reference. Actually it's a
- little better than that: It returns HASH for hash references and
- ARRAY for array references.
-
- =item *
-
- If you try to use a reference like a string, you get strings like
-
-     ARRAY(0x80f5dec)   or   HASH(0x826afc0)
-
- If you ever see a string that looks like this, you'll know you
- printed out a reference by mistake.
-
- A side effect of this representation is that you can use C<eq> to see
- if two references refer to the same thing. (But you should usually use
- C<==> instead because it's much faster.)
-
- =item *
-
- You can use a string as if it were a reference. If you use the string
- C<"foo"> as an array reference, it's taken to be a reference to the
- array C<@foo>. This is called a I<soft reference> or I<symbolic reference>.
-
- =back
-
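Several of these highlights fit into one short sketch. All the variable names and sample values here are invented. Symbolic references are left out on purpose, because C<use strict> forbids them.

```perl
use strict;
use warnings;

my @array = (1, 2, 3);
my $aref  = \@array;
my $alias = $aref;         # a second reference to the same array
my $copy  = [ @array ];    # a reference to a separate array with equal contents
my $href  = { APR => 4 };
my $cref  = sub { return 1 };
my $sref  = \"some text";

# ref() names the kind of referent, and is false for a non-reference.
my @kinds;
for my $thing ($aref, $href, $cref, $sref, 'plain string') {
    push @kinds, ref($thing) || 'not a reference';
}
print join(', ', @kinds), "\n";
# ARRAY, HASH, CODE, SCALAR, not a reference

# The braces are optional around a simple scalar variable.
my @elements = @$aref;       # same as @{$aref}
my $second   = $$aref[1];    # same as ${$aref}[1]

# == compares references by identity, not by contents.
my $same_array = ($aref == $alias) ? 'yes' : 'no';   # yes
my $same_copy  = ($aref == $copy)  ? 'yes' : 'no';   # no
print "alias: $same_array, copy: $same_copy\n";

# Used as a string, a reference looks like ARRAY(0x...).
print "stringified: $aref\n";
```
-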
- You might prefer to go on to L<perllol> instead of L<perlref>; it
- discusses lists of lists and multidimensional arrays in detail. After
- that, you should move on to L<perldsc>; it's a Data Structure Cookbook
- that shows recipes for using and printing out arrays of hashes, hashes
- of arrays, and other kinds of data.
-
- =head1 Summary
-
- Everyone needs compound data structures, and in Perl the way you get
- them is with references. There are four important rules for managing
- references: Two for making references and two for using them. Once
- you know these rules you can do most of the important things you need
- to do with references.
-
- =head1 Credits
-
- Author: Mark-Jason Dominus, Plover Systems (C<mjd-perl-ref@plover.com>)
-
- This article originally appeared in I<The Perl Journal>
- (http://tpj.com) volume 3, #2. Reprinted with permission.
-
- The original title was I<Understand References Today>.
-
- =head2 Distribution Conditions
-
- Copyright 1998 The Perl Journal.
-
- When included as part of the Standard Version of Perl, or as part of
- its complete documentation whether printed or otherwise, this work may
- be distributed only under the terms of Perl's Artistic License. Any
- distribution of this file or derivatives thereof outside of that
- package require that special arrangements be made with copyright
- holder.
-
- Irrespective of its distribution, all code examples in these files are
- hereby placed into the public domain. You are permitted and
- encouraged to use this code in your own programs for fun or for profit
- as you see fit. A simple comment in the code giving credit would be
- courteous but is not required.
-
-
-
-
- =cut
-