C++ BASICS - Unusual union uses

One reason that C is so popular is that it's very effective for performing low-level manipulation on individual bits, single bytes, and other simple memory structures. There are few, if any, low-level operations that you can't perform more easily in C than in an assembler.

As the successor to C, C++ contains this low-level programming power as well. However, if you're relatively new to C++ programming, you may not realize how powerful some of these C-style features are. In this article, we'll show you some handy coding techniques that take advantage of unions, simple data structures that specify how two or more variables can occupy the same memory space.

Unions anonymous

We've discussed anonymous unions before, but in case you're not familiar with them, anonymous unions are simply unions that don't have type identifiers, instances, or instance pointers associated with them. For example, the declaration

union {
  int member1;
  long int member2;
  char* member3;
};

specifies that member1, member2, and member3 are all visible in the current scope, just as if you'd declared

int member1;
long int member2;
char* member3;

as three independent variables.

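However, when you declare these variables as part of an anonymous union, they all refer to the same memory address. If you modify member2 and then read the member1 value, the compiler will typically reinterpret those bytes as an int value instead of as a long int value. (Standard C++ formally leaves the result of reading a member other than the one you last wrote undefined, although compilers commonly support it.)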
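To see the shared storage for yourself, you can compare the members' addresses. The following is only a minimal sketch: the function name membersShareAddress() is our own invention for illustration, and the union is declared inside a function because an anonymous union at file scope would have to be static.

```cpp
// Hypothetical helper: reports whether two members of a local
// anonymous union start at the same address.
bool membersShareAddress()
{
  union {
    int  member1;
    long member2;
  };
  member2 = 0;  // write through one member so the storage is initialized

  // Both names are visible directly in this scope, with no union name.
  return static_cast<void*>(&member1) == static_cast<void*>(&member2);
}
```

Calling membersShareAddress() should return true, since every member of a union begins at the union's own address.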

Initialization by aggregation

According to the current C/C++ syntax rules, you can initialize a union by its first member only. For example, to initialize a union of the following type

union IntToChar {
  int IntValue;
  char charValue[2];
};

you'd write

IntToChar newUnion = { 456 };

You can't use

IntToChar newUnion = { "a" };

unless you reverse the order of the members in the IntToChar union.

If you want to initialize the union using a member other than the first, you must create a constructor for the union that accepts a parameter of the same type as the other member. For example, to initialize the IntToChar union we just mentioned, you could write

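union IntToChar {
  int IntValue;
  char charValue[2];

  // Takes const char* so that string literals are accepted.
  // charP must point to at least two characters.
  IntToChar(const char* charP)
  { charValue[0] = charP[0];
    charValue[1] = charP[1];
  }
};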

Now, the compiler will call the IntToChar() constructor to initialize an IntToChar union with a char array.
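As a rough illustration of how such a constructor might be used, the sketch below repeats the union and adds a small helper. The helper name secondChar() is hypothetical, and we've assumed a const char* parameter so that current compilers accept string literals.

```cpp
union IntToChar {
  int  IntValue;
  char charValue[2];

  // Copies the first two characters; the caller must supply at least two.
  IntToChar(const char* charP)
  { charValue[0] = charP[0];
    charValue[1] = charP[1];
  }
};

// Hypothetical helper: builds a union from text and returns
// the second character it stored.
char secondChar(const char* text)
{
  IntToChar u(text);      // direct initialization calls the constructor
  return u.charValue[1];
}
```

For instance, secondChar("ab") yields 'b'. Keep in mind that once you declare a constructor, the union is no longer an aggregate, so the brace form = { 456 } stops compiling unless you also supply a constructor that accepts an int.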

Union do's

As we mentioned earlier, an anonymous union's members are accessible in the same scope that the union is in, and therefore, you can use those anonymous union members without specifying a union name. However, if you needed to pass values from a named union to a function, you'd have to specify the appropriate member by name.

Instead, you can create conversion operators for a union that allow you to pass that union's name in place of a variable of a given type. For example, in the IntToChar union we mentioned earlier, you could add

operator int()
{ return IntValue; }

to the union declaration, and then pass an IntToChar union object to any function that accepts int parameters. (Since an anonymous union doesn't have a union name, you can't create a conversion operator for it.)
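Here's one way the conversion operator might look in use. This is a sketch rather than a definitive implementation: the functions doubleIt() and doubleUnion() are hypothetical names we've made up to stand in for any code that expects an int.

```cpp
union IntToChar {
  int  IntValue;
  char charValue[2];

  // Lets an IntToChar object stand in wherever an int is expected.
  operator int() const
  { return IntValue; }
};

// Hypothetical function that knows nothing about the union.
int doubleIt(int n)
{ return n * 2; }

// Hypothetical caller: passes the union itself, not a named member.
int doubleUnion()
{
  IntToChar u = { 21 };   // no constructors, so first-member initialization works
  return doubleIt(u);     // implicit conversion through operator int()
}
```

Here doubleUnion() returns 42. Without the operator, you'd have to write doubleIt(u.IntValue) instead.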

Big endian, little endian

If you're writing software that shares files with Macintosh computers, you're probably familiar with the byte-ordering differences these machines exhibit compared to PC compatibles. A Macintosh computer (or any other Motorola processor system) stores integral values larger than one byte with the most significant byte first and the least significant byte last. PC-compatible systems, on the other hand, store the least significant byte first and the most significant byte last.

By the way, the formatting differences that exist between Intel and Motorola processors are referred to as endianness. A big-endian system uses the Motorola storage format, and a little-endian system uses Intel's method. (The name derives from Jonathan Swift's Gulliver's Travels, in which the Big-Endian people cracked eggs beginning with the large end, directly opposite of the way Lilliputians cracked them.)

For example, if you write the following code with Borland C++

int IntelInteger = 0x1122;

you'll find the value 0x22 at the first byte and 0x11 at the second byte. If you're going to use data files from a Macintosh-based software package, and the data file contains Macintosh-format integers or long integers, you'll need to reverse the ordering of the bytes before you use them in your program.
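If you'd like to confirm the byte order on your own machine, a small test along the following lines may help. It's a sketch under a few assumptions: the names firstByteInMemory() and hostIsLittleEndian() are ours, and we've used std::uint16_t so the value is two bytes wide regardless of how large int is on your compiler.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the byte stored at the lowest
// address of a 16-bit value.
unsigned char firstByteInMemory(std::uint16_t value)
{
  unsigned char bytes[sizeof value];
  std::memcpy(bytes, &value, sizeof value);  // copy the raw storage
  return bytes[0];
}

// True when the least significant byte comes first (Intel order).
bool hostIsLittleEndian()
{
  return firstByteInMemory(0x1122) == 0x22;
}
```

On a PC-compatible system, hostIsLittleEndian() should return true, matching the 0x22-first layout described above.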

A simple way to reverse the bytes is to provide a union that holds the integral value alongside an array of bytes, which you can use to rearrange the integer's format. The following union demonstrates this:

union BigToLittle {
  unsigned int LittleValue;
  unsigned char charValue[2];   // assumes a 2-byte unsigned int
  // Store the big-endian value, then swap its two bytes in place.
  BigToLittle(unsigned int BigVal) :
    LittleValue(BigVal)
  { unsigned char temp = charValue[0];
    charValue[0] = charValue[1];
    charValue[1] = temp;
  }
  operator unsigned int()
  { return LittleValue; }
};

When you create a BigToLittle union, you'll pass a big-endian number to the constructor.

The constructor's initializer list assigns the big-endian value to the LittleValue member. Then, in the body of the constructor, we swap the byte values. To add the finishing touch, we've included a conversion operator to allow you to use objects of this union's type anywhere you'd use an unsigned int value.
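Because the size of unsigned int varies from compiler to compiler, you may prefer a variation that fixes the width at two bytes. The sketch below is one possibility, not the only approach: the names BigToLittle16 and swapViaUnion() are hypothetical, and the code relies on the compiler letting you read a union member other than the one last written, which standard C++ doesn't guarantee.

```cpp
#include <cstdint>

union BigToLittle16 {
  std::uint16_t value;      // always two bytes wide
  unsigned char bytes[2];

  // Store the incoming value, then swap its two bytes in place.
  BigToLittle16(std::uint16_t bigValue) : value(bigValue)
  {
    unsigned char temp = bytes[0];
    bytes[0] = bytes[1];
    bytes[1] = temp;
  }

  operator std::uint16_t() const
  { return value; }
};

// Hypothetical helper: returns bigValue with its two bytes reversed.
std::uint16_t swapViaUnion(std::uint16_t bigValue)
{
  BigToLittle16 converted(bigValue);
  return converted;         // uses the conversion operator
}
```

With this version, swapViaUnion(0x1122) gives 0x2211, and applying it twice returns the original value.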

Casts away!

In C and C++, an explicit cast tells the compiler, "Treat the following identifier as if it were really this type of data." While object-oriented programming may reduce the need for explicit casts, you may still find them necessary.

Depending on your experiences, you may consider C-style casts to be an incredible demonstration of the language's power or a good example of what's wrong with C. If, like many programmers, you don't want to litter your code with explicit casts, you can use unions to transparently cast for you.

To hide your casts inside a union, simply place inside a union each of the types that you'd cast to and from. For example, if you use two identically sized data structures,

struct Skipper 
{ int Weight;
  int Height;
};

and

struct Professor 
{ int IQ;
  int Height;
};

you can create the union

union Castaway
{ Skipper   AsSkipper;
  Professor AsProfessor;
};

to handle the casts. Now, you can create a Castaway union and treat it as either type without a cast, as shown below.

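#include <iostream>
using namespace std;

int main()
{ Skipper s = { 250, 6 };
  Castaway c;
  c.AsSkipper = s;

  // Read the Skipper data back through the Professor view.
  cout << "IQ = " << c.AsProfessor.IQ << endl;
  cout << "Height = " << c.AsProfessor.Height << endl;
  return 0;
}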

Copyright (c) 1996 The Cobb Group, a division of Ziff-Davis Publishing Company. All rights reserved. Reproduction in whole or in part in any form or medium without express written permission of Ziff-Davis Publishing Company is prohibited. The Cobb Group and The Cobb Group logo are trademarks of Ziff-Davis Publishing Company.