One reason that C is so popular is that it's very effective for performing low-level manipulation on individual bits, single bytes, and other simple memory structures. There are few, if any, low-level operations that you can't perform more easily in C than in an assembler.
As the successor to C, C++ contains this low-level programming
power as well. However, if you're relatively new to C++
programming, you may not realize how powerful some of these C-style
features are. In this article, we'll show you some handy
coding techniques that take advantage of unions, simple
data structures that specify how two or more variables can occupy
the same memory space.
We've discussed anonymous unions before, but in case you're
not familiar with them, anonymous unions are simply unions
that don't have type identifiers, instances, or instance
pointers associated with them. For example, the declaration
union { int member1; long int member2; char* member3; };
specifies that member1, member2, and member3
are all visible in the current scope, just as if you'd
declared
int member1; long int member2; char* member3;
as three independent variables.
However, when you declare these variables as part of an anonymous
union, they all refer to the same memory address. If you modify
member2 and then read the member1 value, the
compiler will reinterpret those bytes as an int value
instead of as a long int value.
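As a minimal sketch of the shared-address claim, the function below declares the same anonymous union at block scope and compares the addresses of its members. The function name membersShareAddress is hypothetical and not part of the article.

```cpp
// Hypothetical helper: reports whether the members of a local anonymous
// union all start at the same memory address.
bool membersShareAddress()
{
    // The members are visible directly in this scope, with no union name.
    union { int member1; long int member2; char* member3; };

    // Compare the addresses as untyped pointers.
    const void* a = &member1;
    const void* b = &member2;
    const void* c = &member3;
    return a == b && b == c;
}
```

Calling membersShareAddress() should return true, because every member of a union begins at the union's own address.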
According to the current C/C++ syntax rules, you can initialize
a union by its first member only. For example, to initialize a
union of the following type
union IntToChar { int IntValue; char charValue[2]; };
you'd write
IntToChar newUnion = { 456 };
You can't use
IntToChar newUnion = { "a" };
unless you reverse the order of the members in the IntToChar union.
If you want to initialize the union using a member other than
the first, you must create a constructor for the union that accepts
a parameter of the same type as the other member. For example,
to initialize the IntToChar union we just mentioned,
you could write
union IntToChar
{
    int IntValue;
    char charValue[2];
    // Taking const char* lets the constructor accept string literals.
    IntToChar(const char* charP)
    {
        charValue[0] = charP[0];
        charValue[1] = charP[1];
    }
};
Now, the compiler will call the IntToChar() constructor
to initialize an IntToChar union with a char
array.
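The sketch below shows one way the constructor approach might look in use. Once a union has a user-declared constructor it is no longer an aggregate, so this version also adds an int constructor. That second constructor is an assumption for illustration and does not appear in the article.

```cpp
union IntToChar
{
    int  IntValue;
    char charValue[2];

    // Assumed addition: keeps initialization from an int available.
    IntToChar(int value) : IntValue(value) {}

    // Initializes the second member from the first two characters of charP.
    // The caller must supply at least two readable characters; a one-letter
    // string literal qualifies because of its terminating '\0'.
    IntToChar(const char* charP)
    {
        charValue[0] = charP[0];
        charValue[1] = charP[1];
    }
};
```

With this declaration, IntToChar fromText("ab"); fills charValue with 'a' and 'b', while IntToChar fromNumber(456); sets IntValue.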
As we mentioned earlier, an anonymous union's members are accessible in the same scope that the union is in, and therefore, you can use those anonymous union members without specifying a union name. However, if you needed to pass values from a named union to a function, you'd have to specify the appropriate member by name.
Instead, you can create conversion operators for a union that
allow you to pass that union's name in place of a variable
of a given type. For example, in the IntToChar union
we mentioned earlier, you could add
operator int() { return IntValue; }
to the union declaration, and then pass an IntToChar
union object to any function that accepts int parameters.
(Since an anonymous union doesn't have a union name, you
can't create a conversion operator for it.)
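Here is a small sketch of the conversion operator at work. The doubleIt function and the int constructor are hypothetical additions used only to make the example self-contained.

```cpp
union IntToChar
{
    int  IntValue;
    char charValue[2];

    // Assumed constructor so the example can create an initialized object.
    IntToChar(int value) : IntValue(value) {}

    // Lets an IntToChar object stand in wherever an int is expected.
    operator int() const { return IntValue; }
};

// Hypothetical function that knows nothing about IntToChar.
int doubleIt(int n)
{
    return n * 2;
}
```

Given IntToChar u(21);, the call doubleIt(u) compiles without naming IntValue, because the compiler applies operator int() to the argument.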
If you're writing software that shares files with Macintosh computers, you're probably familiar with the byte-ordering differences these machines exhibit compared to PC compatibles. A Macintosh computer (or any other Motorola processor system) stores integral values larger than one byte with the most significant byte first and the least significant byte last. PC-compatible systems, on the other hand, store the least significant byte first and the most significant byte last.
By the way, the formatting differences that exist between Intel and Motorola processors are referred to as endianness. A big-endian system uses the Motorola storage format, and a little-endian system uses Intel's method. (The name derives from Jonathan Swift's Gulliver's Travels, in which the Big-Endian people cracked eggs beginning with the large end, directly opposite of the way Lilliputians cracked them.)
For example, if you write the following code with Borland C++
int IntelInteger = 0x1122;
you'll find the value 0x22 at the first byte and 0x11 at the second byte. If you're going to use data files from a Macintosh-based software package, and the data file contains Macintosh-format integers or long integers, you'll need to reverse the ordering of the bytes before you use them in your program.
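To see the byte order on a given machine, you could inspect the first byte of a 16-bit value, as in the sketch below. It uses std::uint16_t and an unsigned char pointer rather than Borland's 16-bit int, and the helper names firstByte and isLittleEndian are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: returns the byte stored at the lowest address of a
// 16-bit value. Examining an object through unsigned char* is well defined.
unsigned char firstByte(std::uint16_t value)
{
    const unsigned char* bytes =
        reinterpret_cast<const unsigned char*>(&value);
    return bytes[0];
}

// True on little-endian (Intel-style) systems, false on big-endian ones.
bool isLittleEndian()
{
    return firstByte(0x1122) == 0x22;
}
```

On a PC-compatible system firstByte(0x1122) should yield 0x22, and on a big-endian system it should yield 0x11.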
A simple way to do so is to provide a union that stores the integral
value and an array of bytes that you can use to rearrange the
integer's format. The following class demonstrates this:
union BigToLittle
{
    unsigned int LittleValue;
    unsigned char charValue[2];
    BigToLittle(unsigned int BigVal) : LittleValue(BigVal)
    {
        // Swap the two bytes to convert big-endian to little-endian.
        unsigned char temp = charValue[0];
        charValue[0] = charValue[1];
        charValue[1] = temp;
    }
    operator unsigned int() { return LittleValue; }
};
When you create a BigToLittle union, you'll pass a big-endian number to the constructor. (The two-element charValue array assumes that unsigned int occupies exactly two bytes.)
The constructor's initializer list assigns the big-endian
value to the LittleValue member. Then, in the body of
the constructor, we swap the byte values. To add the finishing
touch, we've included a conversion operator to allow you
to use objects of this union's type anywhere you'd
use an unsigned int value.
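On compilers where unsigned int is wider than two bytes, a fixed-width variant may be easier to reason about. The sketch below is an adaptation, not the article's code: BigToLittle16 and swapWithShifts are hypothetical names, and the union version relies on the compiler allowing a read from a member other than the one last written. Common compilers permit this, but the C++ standard does not guarantee it. The shift-based function is included as a cross-check that avoids that assumption.

```cpp
#include <cstdint>

// Assumed 16-bit adaptation of the article's BigToLittle union.
union BigToLittle16
{
    std::uint16_t LittleValue;
    unsigned char charValue[2];

    BigToLittle16(std::uint16_t BigVal) : LittleValue(BigVal)
    {
        // Exchange the two bytes in place.
        unsigned char temp = charValue[0];
        charValue[0] = charValue[1];
        charValue[1] = temp;
    }

    operator std::uint16_t() const { return LittleValue; }
};

// Hypothetical cross-check: the same byte swap done with shifts only.
std::uint16_t swapWithShifts(std::uint16_t v)
{
    return static_cast<std::uint16_t>((v << 8) | (v >> 8));
}
```

Because of the conversion operator, an expression such as std::uint16_t fixed = BigToLittle16(0x1122); reads like an ordinary assignment, and both approaches should produce 0x2211.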
In C and C++, an explicit cast tells the compiler, "Treat the following identifier as if it were really this type of data." While object-oriented programming may reduce the need for explicit casts, you may still find them necessary.
Depending on your experiences, you may consider C-style casts to be an incredible demonstration of the language's power, or a good example of what's wrong with C. If, like many programmers, you don't want to litter your code with explicit casts, you can use unions to transparently cast for you.
To hide your casts inside a union, simply place inside a union
each of the types that you'd cast to and from. For example,
if you use two identically sized data structures,
struct Skipper { int Weight; int Height; };
and
struct Professor { int IQ; int Height; };
you can create the union
union Castaway { Skipper AsSkipper; Professor AsProfessor; };
to handle the casts. Now, you can create a Castaway union
and treat it as either type without a cast, as shown below.
#include <iostream>
using namespace std;
int main()
{
    Skipper s = { 250, 6 };
    Castaway c;
    c.AsSkipper = s;
    // Read the same bytes back through the Professor view.
    cout << "IQ = " << c.AsProfessor.IQ << endl;
    cout << "Height = " << c.AsProfessor.Height;
    return 0;
}
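Since the technique depends on the two structures being identically sized, you might let the compiler verify that. The sketch below assumes a compiler that supports static_assert (C++11 or later), and the professorIQ helper is a hypothetical name used for illustration.

```cpp
struct Skipper   { int Weight; int Height; };
struct Professor { int IQ;     int Height; };

union Castaway { Skipper AsSkipper; Professor AsProfessor; };

// Compilation fails if the two views ever stop matching in size.
static_assert(sizeof(Skipper) == sizeof(Professor),
              "Skipper and Professor must be the same size");

// Hypothetical helper: stores a Skipper and reads its first field back
// through the Professor view of the same memory.
int professorIQ(const Skipper& s)
{
    Castaway c;
    c.AsSkipper = s;
    return c.AsProfessor.IQ;
}
```

For a Skipper initialized with { 250, 6 }, professorIQ returns 250, since Weight and IQ occupy the same position in the two structures.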
Copyright (c) 1996 The Cobb Group, a division of Ziff-Davis Publishing Company. All rights reserved. Reproduction in whole or in part in any form or medium without express written permission of Ziff-Davis Publishing Company is prohibited. The Cobb Group and The Cobb Group logo are trademarks of Ziff-Davis Publishing Company.