22.10 What should sizeof (struct xyzzy) return?

Q: When I call sizeof on a struct, I sometimes get values which are larger than the sum of the sizes of the struct members, whereas in Borland C++ I always get the correct result. Is it a bug in GCC?

Q: I have a program that reads struct contents from a binary file. It works OK when compiled with BC, but reads garbage when compiled with DJGPP. This must be a bug in DJGPP, right?


A: No, it's not a bug. GCC generates 32-bit code, and in that mode, there is a significant penalty (in terms of run-time performance) for unaligned accesses, like accessing a 16-bit short which isn't aligned on a word boundary, or accessing a 32-bit int which isn't aligned on a dword boundary. To produce faster code, GCC pads struct members so that each one can be accessed without delays; this sometimes produces a struct size which is larger than the sum of the sizes of its members. If you need to minimize this padding (e.g., if your program uses large arrays of such structs, where padding will waste a lot of memory), lay out your structures so that the longer members come before the shorter ones. For example, let's say that you have a struct defined thus:
       struct my_struct {
         char name[7];
         unsigned long offset;
         double quality;
       };

To make such a struct use the least number of bytes, rearrange the members like this (note that this still allows the struct to be padded at the end):
       struct my_struct {
         double quality;
         unsigned long offset;
         char name[7];
       };
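
To see what the compiler actually does with the two layouts above, one can print the member offsets with the ANSI offsetof macro. The following is only a sketch: the tags original and reordered and the helper names are made up for this illustration, and the numbers it reports depend on the compiler, its switches, and the target, so none are quoted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* The two layouts from the text, renamed so that both can coexist. */
struct original {
  char name[7];
  unsigned long offset;
  double quality;
};

struct reordered {
  double quality;
  unsigned long offset;
  char name[7];
};

/* Sum of the member sizes; identical for both layouts. */
#define MEMBER_BYTES (7 + sizeof (unsigned long) + sizeof (double))

/* Padding bytes (interior holes plus tail padding) in each layout. */
static size_t padding_original (void)
{
  assert (sizeof (struct original) >= MEMBER_BYTES);
  return sizeof (struct original) - MEMBER_BYTES;
}

static size_t padding_reordered (void)
{
  assert (sizeof (struct reordered) >= MEMBER_BYTES);
  return sizeof (struct reordered) - MEMBER_BYTES;
}

/* Print where the compiler actually placed each member. */
static void report_layouts (void)
{
  printf ("original:  name@%lu offset@%lu quality@%lu size=%lu padding=%lu\n",
          (unsigned long) offsetof (struct original, name),
          (unsigned long) offsetof (struct original, offset),
          (unsigned long) offsetof (struct original, quality),
          (unsigned long) sizeof (struct original),
          (unsigned long) padding_original ());
  printf ("reordered: quality@%lu offset@%lu name@%lu size=%lu padding=%lu\n",
          (unsigned long) offsetof (struct reordered, quality),
          (unsigned long) offsetof (struct reordered, offset),
          (unsigned long) offsetof (struct reordered, name),
          (unsigned long) sizeof (struct reordered),
          (unsigned long) padding_reordered ());
}
```

Calling report_layouts() from main shows whether the reordering removed any holes on your target or merely moved the padding to the end of the struct.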

If the layout of the structure cannot be changed (e.g., when it must match some external specification, like a block of data returned by a system call), you can use the __attribute__((packed)) extension of GCC (see the "Type Attributes" section of the "GNU C/C++ Manual") to prevent GCC from padding the structure members; this will make accesses to some of the members significantly slower.

Beginning with version 2.7.0, GCC has a command-line option -fpack-struct which causes GCC to pack all members of all structs together without any holes, just as if you used __attribute__((packed)) on every struct declaration in the source file you compile with that switch. If you use this switch, be sure that source files which you compile with it don't use any of the structures defined by library functions, or you will get some members garbled (because the library functions weren't compiled with that switch). Alternatively, you could declare any single structure to be packed, like so:

       struct my_struct {
         char name[7];
         unsigned long offset;
         double quality;
       } __attribute__ ((packed));

However, note that the latter will only work when you compile it as a C source; C++ doesn't allow such syntax, and you will have to fall back to declaring each struct member with the packed attribute. Therefore, it's best to use declarations such as the above only if you are certain the code will never be compiled as a C++ source.
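
The per-member form mentioned above might look like the following sketch. It relies on the GCC __attribute__ extension, so it is not portable to other compilers; the tag packed_record and the helper fill_packed_record are hypothetical names chosen for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* GCC extension: the attribute is attached to individual members.
   name needs no attribute, since char already has an alignment of 1. */
struct packed_record {
  char name[7];
  unsigned long offset __attribute__ ((packed));
  double quality __attribute__ ((packed));
};

/* Fill a record; packed members are assigned like any other member. */
static void fill_packed_record (struct packed_record *r, const char *name,
                                unsigned long offset, double quality)
{
  assert (r != NULL && name != NULL);
  memset (r->name, 0, sizeof r->name);          /* keep name zero-terminated */
  strncpy (r->name, name, sizeof r->name - 1);  /* copy at most 6 characters */
  r->offset = offset;
  r->quality = quality;
}
```

With this declaration, offset is placed directly after the 7 bytes of name instead of at the next aligned address. Prefer assigning packed members by value, as above; a pointer taken to a packed member may be misaligned.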

The padding of struct members should be considered when you read or write struct contents from or to a disk file. In general, this should only be done if the file is read and written by the same program, because the exact layout of the struct members depends on some subtle aspects of code generation and the compiler switches used, and these may differ between programs, even if they were compiled by the same compiler on the same system. If you do need this method, be aware of the struct member padding and don't assume that the number of file bytes that the structure uses is equal to the sum of the members' sizes, even if you instructed the compiler to pack structs: GCC can still add some padding after the last member. So always use sizeof (struct foo) to read and write a structure.
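
A minimal sketch of that rule, assuming the file is written and read back by the same program: the struct foo below simply reuses the members from the earlier example, and write_foo and read_foo are hypothetical helper names, not library functions.

```c
#include <assert.h>
#include <stdio.h>

struct foo {
  char name[7];
  unsigned long offset;
  double quality;
};

/* Write one struct image, padding included; returns 1 on success.
   fp should be open in binary mode ("wb"), which matters on DOS. */
static int write_foo (FILE *fp, const struct foo *f)
{
  assert (fp != NULL && f != NULL);
  return fwrite (f, sizeof (struct foo), 1, fp) == 1;
}

/* Read one struct image back; returns 1 on success, 0 on a short read.
   fp should be open in binary mode ("rb"). */
static int read_foo (FILE *fp, struct foo *f)
{
  assert (fp != NULL && f != NULL);
  return fread (f, sizeof (struct foo), 1, fp) == 1;
}
```

Because both helpers use sizeof (struct foo) rather than a hand-computed byte count, they stay consistent with each other whatever padding the compiler chooses. They do not make the file readable by a program built with a different compiler or different switches.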

Another problem with porting programs that read structs from binary files is that the size of some data types might be different under different compilers. Specifically, an int is 16 bits wide in most DOS-based compilers, but in DJGPP it is 32 bits wide.

The best, most robust and portable way to read and write structs is through a char buffer, which your code then uses to move the contents into or out of the struct members, one by one. This way, you always know what you are doing and your program will not break down if the padding rules change one day, or if you port it to another OS/compiler. The ANSI-standard offsetof macro comes in handy in many such cases.
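
The char-buffer approach can be sketched as follows. Everything specific here is an assumption made for the illustration: a hypothetical 13-byte file record holding a 7-byte name, a 4-byte little-endian offset, and a 2-byte little-endian signed count (such as a 16-bit compiler's int would produce), plus the names record, pack_record, and unpack_record.

```c
#include <assert.h>
#include <string.h>

#define RECORD_FILE_BYTES 13   /* 7 + 4 + 2; fixed by the (assumed) file format */

/* In-memory form; its padding and member sizes no longer matter. */
struct record {
  char name[7];
  unsigned long offset;
  int count;
};

/* Move the members, one by one, from the file image into the struct. */
static void unpack_record (const unsigned char *buf, struct record *r)
{
  unsigned int raw;

  assert (buf != NULL && r != NULL);
  memcpy (r->name, buf, 7);
  r->offset = (unsigned long) buf[7]
            | ((unsigned long) buf[8] << 8)
            | ((unsigned long) buf[9] << 16)
            | ((unsigned long) buf[10] << 24);
  raw = (unsigned int) buf[11] | ((unsigned int) buf[12] << 8);
  /* Sign-extend the 16-bit two's complement value. */
  r->count = (raw & 0x8000u) ? (int) ((long) raw - 0x10000L) : (int) raw;
}

/* Move the members, one by one, from the struct into the file image. */
static void pack_record (const struct record *r, unsigned char *buf)
{
  unsigned int raw;

  assert (buf != NULL && r != NULL);
  raw = (unsigned int) r->count & 0xFFFFu;   /* keep the low 16 bits */
  memcpy (buf, r->name, 7);
  buf[7]  = (unsigned char) (r->offset & 0xFF);
  buf[8]  = (unsigned char) ((r->offset >> 8) & 0xFF);
  buf[9]  = (unsigned char) ((r->offset >> 16) & 0xFF);
  buf[10] = (unsigned char) ((r->offset >> 24) & 0xFF);
  buf[11] = (unsigned char) (raw & 0xFF);
  buf[12] = (unsigned char) ((raw >> 8) & 0xFF);
}
```

The program then reads and writes RECORD_FILE_BYTES bytes with fread and fwrite and never passes the struct itself to them, so neither the padding rules nor the width of int affects the file format.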


Copyright © 1998 by Eli Zaretskii. Updated Sep 1998.
