September, 1995 - Vol. 2 No. 9
by David Reid
Arguably the most common bugs you'll find in C and, to some extent, C++ programs are memory leaks, which are caused by code that allocates, but never releases, blocks of memory. Unlike most bugs, memory leaks are fairly benign problems. In fact, unless a memory leak causes a program to waste huge amounts of memory, you might never know the problem exists. And there's the crux of the matter: Because memory leaks can be difficult to detect, developers sometimes release a program without realizing it contains a memory leak.
In this article, we'll present a technique that can help
you eliminate most, if not all, memory leaks from C programs.
This technique is also helpful in C++ programs that use the malloc
and free functions.
Actually, determining that your program contains a memory leak is the easy part; finding the cause of the leak can be a nightmare. All you know is that there's less free memory just before your program terminates than there was just after it started.
For some reason, your program failed to free one or more blocks of memory. However, where the memory is and what your program used it for are difficult questions to answer. After all, the pointers that referenced the memory are no longer accessible by the time the program ends, so you have no easy way of looking at the memory to see what it contains.
When you're trying to track down a memory leak, you need
two pieces of information: the location of the memory that isn't
being freed and the location of the code that allocated the memory
in the first place. Armed with this information, you shouldn't
need to do too much work to locate and eliminate the bug. Let's
see how you might extract this information from your program.
In C and C++ programs, there are a limited number of functions you can call to allocate memory. Among these are the malloc family of functions (malloc, fmalloc, calloc, realloc, and so on) and a few miscellaneous functions such as strdup. Therefore, you should be able to name every function your program uses to allocate memory. Of course, your program may call these functions in hundreds of different locations, but that's not a problem.
For every memory allocation function your program uses, you can define a thunk, which is an associated function that your program calls instead. The thunk function performs some intermediate task (such as displaying or recording debugging information), and then calls the original memory allocation function. For example, if you want to monitor the memory allocation activity of the malloc function, you might define a function called dmalloc. Then, each time the program's code specifies a call to malloc, you force it to call dmalloc instead.
The dmalloc function notes the location in the code of the call to malloc. Then, it calls malloc, notes the memory address returned by malloc, and returns the memory address just as malloc itself would have done. In addition, you could define the function dfree to serve as a thunk for the free function. That way, you'd know whenever the program properly frees a memory block allocated by malloc. The beauty of this thunking scheme is that your program functions just as it always has (albeit a little slower). In addition, just before your program terminates, you can examine the information stored by dmalloc and dfree to determine the memory blocks that haven't been freed, where they are, what they contain, and where in your code you can find the malloc statement that created them.
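The bookkeeping described above can be sketched in a few lines. This is only a minimal illustration, not the article's own implementation: the alloc_record structure, the MAX_RECORDS limit, and the doutstanding helper are names we've made up for the example, and the sketch stores the pointer that __FILE__ supplies rather than copying the file name.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical bookkeeping record; the field names are assumptions. */
struct alloc_record {
    void       *ptr;   /* address returned by malloc, NULL if the slot is unused */
    const char *file;  /* source file that made the call */
    int         line;  /* line number of the call */
};

#define MAX_RECORDS 1024   /* arbitrary limit chosen for this sketch */

static struct alloc_record records[MAX_RECORDS];

/* Thunk for malloc: forward the request, then remember who asked. */
void *dmalloc(const char *file, int line, size_t size)
{
    void *ptr;
    int i;

    assert(file != NULL);
    ptr = malloc(size);
    if (ptr == NULL)
        return NULL;

    for (i = 0; i < MAX_RECORDS; i++) {
        if (records[i].ptr == NULL) {
            records[i].ptr  = ptr;
            records[i].file = file;
            records[i].line = line;
            break;
        }
    }
    /* If the table is full, the block is still returned but goes untracked. */
    return ptr;
}

/* Thunk for free: forget the block, then release it. */
void dfree(void *ptr)
{
    int i;

    if (ptr != NULL) {
        for (i = 0; i < MAX_RECORDS; i++) {
            if (records[i].ptr == ptr) {
                records[i].ptr = NULL;
                break;
            }
        }
    }
    free(ptr);
}

/* Number of blocks obtained through dmalloc and not yet passed to dfree. */
int doutstanding(void)
{
    int i, count = 0;

    for (i = 0; i < MAX_RECORDS; i++)
        if (records[i].ptr != NULL)
            count++;
    return count;
}
```

Any record still in the table when the program is about to exit identifies a block that was never freed, along with the file and line that allocated it.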
Since you want the thunks to exist only in the debug version of
your code, they need to be easily turned off when you switch from
the debug to the release version. Therefore, you need to design
them to key off the __DEBUG preprocessor macro. You can
do so by placing statements such as
    #ifdef __DEBUG
    #define malloc(n) dmalloc(__FILE__,__LINE__,n)
    #define free(p) dfree(p)
    #endif
in a header file that's common to every C and CPP module in your program.
When you compile a debug version of your program, the preprocessor
will replace each call to malloc with a call to dmalloc,
effectively generating a thunk. Furthermore, dmalloc will
receive two additional parameters not found in the original call
to malloc. At compile time, the preprocessor will expand the
__FILE__ and __LINE__ macros to the name of
the source file that contains the statement and to the line number
of the statement inside the file, respectively.
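To see how __FILE__ and __LINE__ behave on their own, consider the following small sketch. The note_location function and the NOTE_HERE macro are hypothetical names we've invented for illustration. Because the macro is expanded wherever it's used, the values passed along describe the line that invokes NOTE_HERE, not the line where the macro is defined.

```c
#include <assert.h>
#include <stdio.h>

/* Most recent location reported; kept so a caller can inspect it. */
static const char *last_file = NULL;
static int last_line = 0;

/* Hypothetical helper that records and prints a source location. */
void note_location(const char *file, int line)
{
    assert(file != NULL);
    last_file = file;
    last_line = line;
    printf("called from %s, line %d\n", file, line);
}

/* Expands at each point of use, so __FILE__ and __LINE__
   describe the statement that contains NOTE_HERE(). */
#define NOTE_HERE() note_location(__FILE__, __LINE__)
```

Placing NOTE_HERE(); on two different lines of a program prints two different line numbers, which is the same effect the malloc substitution relies on.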
Now that you understand the technique, let's see how to apply it to an actual program. Even though the sample program we'll present is very simple, we broke it into several source files since most programs comprise more than one source file. This approach also lets us illustrate how important header files are to the successful implementation of this debugging technique.
Listing A: MYPROG.C
    //*************************************************
    // MYPROG.C - Some fictitious program
    //

    #include <stdlib.h>
    #include <stdio.h>
    #include "myprog.h"

    int main( void )
    {
       module1();
       module2();
       dreport();
       return EXIT_SUCCESS;
    }
Listing B: MODULE1.C
    //*************************************************
    // MODULE1.C - Allocates and frees some
    //             memory

    #include <stdlib.h>
    #include <stdio.h>
    #include "myprog.h"

    int module1( void )
    {
       int i;
       void *ptr_array[ARRAYSIZE];

       //----Allocate ARRAYSIZE memory blocks
       for ( i=0; i<ARRAYSIZE; i++ )
       {
          ptr_array[i] = malloc( 16 );
          if ( !ptr_array[i] )
          {
             puts("Error allocating memory");
             return 0;
          }
       }

       //----Free all but the unlucky ones
       for ( i=0; i<ARRAYSIZE; i++ )
          if ( i%13 )     // Oops
             free( ptr_array[i] );

       return 1;
    }
Listing C: MODULE2.C
    //*************************************************
    // MODULE2.C - Allocates and frees some more
    //             memory. Because this file contains
    //             extra comments, the call to malloc
    //             will be on a different line than
    //             the one in MODULE1.C.

    #include <stdlib.h>
    #include <stdio.h>
    #include "myprog.h"

    int module2( void )
    {
       int i;
       void *ptr_array[ARRAYSIZE];

       //----Allocate ARRAYSIZE memory blocks
       for ( i=0; i<ARRAYSIZE; i++ )
       {
          ptr_array[i] = malloc( 16 );
          if ( !ptr_array[i] )
          {
             puts( "Error allocating memory" );
             return 0;
          }
       }

       //----Free all but every other unlucky one
       for ( i=0; i<ARRAYSIZE; i++ )
          if ( i%26 )     // Oops
             free( ptr_array[i] );

       return 1;
    }
We've saved the best part for last. Based on the code in Listing A through Listing D, there's no indication that the dmalloc and dfree functions will ever be called. That's where the header file MYPROG.H, shown in Listing E, comes into play. Notice that in addition to normal definitions and prototypes needed by MYPROG, the header file contains a section enclosed by #ifdef __DEBUG and #endif statements. The statements in this section provide the link between the application and our thunk functions.
Listing E: MYPROG.H
    //*************************************************
    // MYPROG.H - Definitions for MYPROG

    #ifdef __DEBUG

    //----Prototypes for debug functions
    void *dmalloc( char *file, int line, size_t size );
    void dfree( void *ptr );
    int dreport( void );

    //----Thunk substitutions
    #define malloc(n) dmalloc(__FILE__,__LINE__,n)
    #define free(p) dfree(p)

    #else

    #define dreport() 0

    #endif

    //----Normal MYPROG definitions
    #define ARRAYSIZE 100
    int module1( void );
    int module2( void );
Because every source file in the MYPROG application includes MYPROG.H,
the thunks will be activated throughout the program whenever you
build a debug version of the program. The prototypes for dmalloc
and dfree are necessary in MYPROG.H because they're
the function calls that the compiler sees after the preprocessor
#define statements have done their work. Just as important,
notice that we don't include MYPROG.H in the DMALLOC.C
source file, because the functions in this source file need to
be able to call the real malloc and free functions.
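The reason for keeping the substitutions out of the thunk source file is easy to demonstrate. If the malloc macro were visible inside dmalloc, the call to malloc in the function body would be rewritten into another call to dmalloc, and the function would recurse forever. The sketch below deliberately defines the macro first and then reaches the real library function anyway by putting the function name in parentheses, which prevents a function-like macro from being expanded. This is an alternative we're showing for illustration, not the approach the article takes; the thunk_calls counter is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>

static int thunk_calls = 0;   /* how many times the thunk has run */

void *dmalloc(const char *file, int line, size_t size);

/* The same substitution a shared header would make in a debug build. */
#define malloc(n) dmalloc(__FILE__, __LINE__, n)

void *dmalloc(const char *file, int line, size_t size)
{
    assert(file != NULL);
    (void)line;               /* not used in this sketch */
    thunk_calls++;

    /* Writing malloc(size) here would expand to dmalloc(...) and recurse.
       The parentheses around the name suppress the macro, so this call
       reaches the library's malloc. */
    return (malloc)(size);
}
```

Simply leaving the header out of the thunk source file, as the article does, achieves the same result with less trickery.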
After compiling and running a debug version of MYPROG, you can examine the DREPORT.LST file to see what, if any, memory leaks occurred. As the listing in Figure A shows, the first module failed to free eight memory blocks, and the second module failed to free four. Although this report doesn't include the addresses of the memory blocks, that information is available in the dlist structure, should you need it.
Figure A - The DREPORT.LST file lists memory leaks.
    12 entries in dlist
    Line: 00017 File: C:\work\module1.c
    Line: 00017 File: C:\work\module1.c
    Line: 00017 File: C:\work\module1.c
    Line: 00017 File: C:\work\module1.c
    Line: 00017 File: C:\work\module1.c
    Line: 00017 File: C:\work\module1.c
    Line: 00017 File: C:\work\module1.c
    Line: 00017 File: C:\work\module1.c
    Line: 00020 File: C:\work\module2.c
    Line: 00020 File: C:\work\module2.c
    Line: 00020 File: C:\work\module2.c
    Line: 00020 File: C:\work\module2.c
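The counts in the report follow directly from the loop conditions in MODULE1.C and MODULE2.C. Each loop frees a block only when i%13 (or i%26) is nonzero, so every index that's an exact multiple is skipped. The short function below, whose name count_skipped is our own invention, reproduces that arithmetic so you can check the totals.

```c
#include <assert.h>

/* Counts the indices in [0, n) that are exact multiples of k.
   These are the blocks the faulty loops never pass to free. */
int count_skipped(int n, int k)
{
    int i, skipped = 0;

    assert(k > 0);
    for (i = 0; i < n; i++)
        if (i % k == 0)      /* the listings free only when i % k is nonzero */
            skipped++;
    return skipped;
}
```

With ARRAYSIZE set to 100, count_skipped(100, 13) gives 8 (indices 0, 13, 26, 39, 52, 65, 78, and 91), and count_skipped(100, 26) gives 4 (indices 0, 26, 52, and 78), for the 12 entries shown.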
Obviously, our example program is much simpler than any real application. In particular, a more complex program would probably use additional functions besides malloc to allocate memory. In order to trap all memory leaks in complex programs, you'll need to create additional functions such as dcalloc and drealloc, using the code in dmalloc as a starting point for these functions.
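As a rough idea of what such additional thunks might look like, here's a sketch of dcalloc and drealloc. It's built on a small table of its own so that it stands alone; the blocks table, the MAX_BLOCKS limit, and the find_slot and dlive helpers are assumptions made for this example rather than part of the article's code. The interesting case is realloc, which may move a block, so the thunk updates the existing record instead of adding a second one.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_BLOCKS 256   /* arbitrary limit chosen for this sketch */

/* Hypothetical record of one live allocation. */
static struct {
    void       *ptr;     /* NULL marks an unused slot */
    const char *file;
    int         line;
} blocks[MAX_BLOCKS];

/* Returns the slot holding ptr, or -1. Passing NULL finds an unused slot. */
static int find_slot(void *ptr)
{
    int i;

    for (i = 0; i < MAX_BLOCKS; i++)
        if (blocks[i].ptr == ptr)
            return i;
    return -1;
}

/* Thunk for calloc. */
void *dcalloc(const char *file, int line, size_t count, size_t size)
{
    void *ptr = calloc(count, size);
    int slot = find_slot(NULL);

    assert(file != NULL);
    if (ptr != NULL && slot >= 0) {
        blocks[slot].ptr  = ptr;
        blocks[slot].file = file;
        blocks[slot].line = line;
    }
    return ptr;
}

/* Thunk for realloc. A size of zero is not handled in this sketch. */
void *drealloc(const char *file, int line, void *old, size_t size)
{
    void *ptr;
    int slot = find_slot(old);    /* old == NULL yields an unused slot */

    assert(file != NULL);
    if (slot < 0)
        slot = find_slot(NULL);   /* old block was never tracked */

    ptr = realloc(old, size);
    if (ptr != NULL && slot >= 0) {
        blocks[slot].ptr  = ptr;  /* the block may have moved */
        blocks[slot].file = file;
        blocks[slot].line = line;
    }
    /* On failure the old block is still valid, so its record is left alone. */
    return ptr;
}

/* Number of blocks currently recorded. */
int dlive(void)
{
    int i, count = 0;

    for (i = 0; i < MAX_BLOCKS; i++)
        if (blocks[i].ptr != NULL)
            count++;
    return count;
}
```

The matching substitutions would follow the pattern already used for malloc, passing __FILE__ and __LINE__ ahead of the original arguments.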
Furthermore, because of the way we designed the dlist array, this technique will allow a maximum of 16380 memory allocations. (For far pointers, 16380 is the maximum number that can fit in a 64-KB memory segment.) Although this limitation shouldn't affect most applications, you'll need to redesign the list if your application allocates more than this number of memory blocks.
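One possible redesign is to replace the fixed-size array with a linked list, so that the only limit is available memory. The following is a sketch under our own assumptions, not the article's design: the dnode structure and the dlist_add, dlist_remove, and dlist_count functions are hypothetical names. Each node is obtained with its own call to the real malloc, which costs extra time and memory per allocation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical list node describing one live allocation. */
struct dnode {
    struct dnode *next;
    void         *ptr;
    const char   *file;
    int           line;
};

static struct dnode *dhead = NULL;

/* Records a block. Returns 1 on success, 0 if no node could be allocated. */
int dlist_add(void *ptr, const char *file, int line)
{
    struct dnode *node;

    assert(ptr != NULL);
    node = malloc(sizeof *node);
    if (node == NULL)
        return 0;
    node->ptr  = ptr;
    node->file = file;
    node->line = line;
    node->next = dhead;      /* push onto the front of the list */
    dhead = node;
    return 1;
}

/* Removes the record for a block. Returns 1 if found, 0 otherwise. */
int dlist_remove(void *ptr)
{
    struct dnode **link = &dhead;

    while (*link != NULL) {
        if ((*link)->ptr == ptr) {
            struct dnode *dead = *link;
            *link = dead->next;   /* unlink, then release the node */
            free(dead);
            return 1;
        }
        link = &(*link)->next;
    }
    return 0;
}

/* Number of blocks currently recorded. */
int dlist_count(void)
{
    const struct dnode *node;
    int count = 0;

    for (node = dhead; node != NULL; node = node->next)
        count++;
    return count;
}
```

The trade-off is that removing a record now requires walking the list, so a program that frees blocks long after allocating many others will slow down more than it would with an array.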
On a related note, the thunks use a considerable amount of memory to store the information for each memory allocation. The exact amount varies from around 20 to 100 bytes per memory allocation, depending on the length of the source file's pathname. Therefore, the thunk functions may cause the application to run out of memory on machines where memory is tight to begin with. In addition, the thunk functions will slow down the application, although the effect probably won't be noticeable unless the application allocates and frees many, many memory blocks.
Finally, the thunk functions will adapt to the memory model you use to compile your program. However, if your program allocates both far and near memory addresses, you'll need to redesign the thunk functions to handle both types.
Lest we scare you off with all these caveats, let us reassure you. These functions are very reliable and make detecting and finding memory leaks relatively easy. Furthermore, since you'll be using them only to debug your program, you don't have to worry about them causing problems for your end users.
For most programmers, detecting and correcting memory leaks is
not a pleasant task. Even finding out which part of your program
allocated the leaked memory is just about impossible without special
debugging tools. In the next issue, we'll show some more
advanced methods you can use to track allocation problems.
Copyright (c) 1996 The Cobb Group, a division of Ziff-Davis Publishing Company. All rights reserved. Reproduction in whole or in part in any form or medium without express written permission of Ziff-Davis Publishing Company is prohibited. The Cobb Group and The Cobb Group logo are trademarks of Ziff-Davis Publishing Company.