Debugging Tip - Using thunk functions to track down memory leaks

by David Reid

Arguably the most common bugs you'll find in C and, to some extent, C++ programs are memory leaks, which are caused by code that allocates, but never releases, blocks of memory. Unlike most bugs, memory leaks are fairly benign problems. In fact, unless a memory leak causes a program to waste huge amounts of memory, you might never know the problem exists. And there's the crux of the matter: Because memory leaks can be difficult to detect, developers sometimes release a program without realizing it contains a memory leak.

In this article, we'll present a technique that can help you eliminate most, if not all, memory leaks from C programs. This technique is also helpful in C++ programs that use the malloc and free functions.

An overview

Actually, determining that your program contains a memory leak is the easy part; finding the cause of the leak can be a nightmare. All you know is that there's less free memory just before your program terminates than there was just after it started.

For some reason, your program failed to free one or more blocks of memory. However, where the memory is and what your program used it for are difficult questions to answer. After all, the pointers that referenced the memory are no longer visible to that part of the program, so you have no easy way of looking at the memory to see what it contains.

When you're trying to track down a memory leak, you need two pieces of information: the location of the memory that isn't being freed and the location of the code that allocated the memory in the first place. Armed with this information, you shouldn't need to do too much work to locate and eliminate the bug. Let's see how you might extract this information from your program.

I thunk, therefore I am

In C and C++ programs, there are a limited number of functions you can call to allocate memory. Among these are the malloc family of functions (malloc, fmalloc, calloc, realloc, and so on) and a few miscellaneous functions such as strdup. Therefore, you should be able to name every function your program uses to allocate memory. Of course, your program may call these functions in hundreds of different locations, but that's not a problem.

For every memory allocation function your program uses, you can define a thunk, which is an associated function that your program calls instead. The thunk function performs some intermediate task (such as displaying or recording debugging information), and then calls the original memory allocation function. For example, if you want to monitor the memory allocation activity of the malloc function, you might define a function called dmalloc. Then, each time the program's code specifies a call to malloc, you force it to call dmalloc instead.

The dmalloc function notes the location in the code of the call to malloc. Then, it calls malloc, notes the memory address returned by malloc, and returns the memory address just as malloc itself would have done. In addition, you could define the function dfree to serve as a thunk for the free function. That way, you'd know whenever the program properly frees a memory block allocated by malloc. The beauty of this thunking scheme is that your program functions just as it always has (albeit a little slower). In addition, just before your program terminates, you can examine the information stored by dmalloc and dfree to determine the memory blocks that haven't been freed, where they are, what they contain, and where in your code you can find the malloc statement that created them.

Since you want the thunks to exist only in the debug version of your code, they need to be easily turned off when you switch from the debug to the release version. Therefore, you need to design them to key off the __DEBUG preprocessor macro. You can do so by placing statements such as

#ifdef __DEBUG
#define malloc(n) dmalloc(__FILE__,__LINE__,n)
#define free(p) dfree(p)
#endif

in a header file that's common to every C and CPP module in your program.

When you compile a debug version of your program, the preprocessor will replace each call to malloc with a call to dmalloc, effectively generating a thunk. Furthermore, dmalloc will receive two additional parameters not found in the original call to malloc. During preprocessing, the __FILE__ and __LINE__ macros expand to the name of the source file that contains the statement and to the line number of the statement inside that file, respectively.
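To make the substitution concrete, here's a minimal sketch of the macro at work in a single source file. The dmalloc shown is only a stand-in that remembers the most recent call site; the last_file and last_line variables and the call_site_recorded function are hypothetical names we made up for this illustration, and we declared the file parameter as const char * since __FILE__ is a string literal.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the real thunk: it remembers the most
   recent call site, then forwards the request to the real malloc. */
static const char *last_file = NULL;
static int         last_line = 0;

void *dmalloc( const char *file, int line, size_t size )
{
    last_file = file;
    last_line = line;
    return malloc( size );  /* the macro isn't defined yet: real malloc */
}

/* From this point on, every malloc(n) in this file is rewritten to
   dmalloc(__FILE__,__LINE__,n) before the compiler proper sees it. */
#define malloc(n) dmalloc(__FILE__,__LINE__,n)

/* Allocates one block through the macro and returns nonzero if the
   thunk recorded the file and line of that call. */
int call_site_recorded( void )
{
    int   expected = __LINE__ + 1;  /* the malloc call is on the next line */
    void *p = malloc( 16 );

    free( p );
    return last_file != NULL && last_line == expected;
}
```

Because the thunk is defined above the #define, its own call to malloc is left alone. Anything below the #define goes through dmalloc without any change to the calling code.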

An example

Now that you understand the technique, let's see how to apply it to an actual program. Even though the sample program we'll present is very simple, we broke it into several source files since most programs comprise more than one source file. This approach also lets us illustrate how important header files are to the successful implementation of this debugging technique.

Listing A: MYPROG.C

//*************************************************
// MYPROG.C - Some fictitious program
//

#include <stdlib.h>
#include <stdio.h>

#include "myprog.h"

int main( void )
{
    module1();
    module2();
    dreport();

    return EXIT_SUCCESS;
}

Listing B: MODULE1.C

//*************************************************
// MODULE1.C - Allocates and frees some
// memory

#include <stdlib.h>
#include <stdio.h>

#include "myprog.h"

int module1( void )
{
    int   i;
    void *ptr_array[ARRAYSIZE];

    //----Allocate ARRAYSIZE memory blocks
    for ( i=0; i<ARRAYSIZE; i++ )
    {
        ptr_array[i] = malloc( 16 );
        if ( !ptr_array[i] )
        {
            puts("Error allocating memory");
            return 0;    
        }
    }

    //----Free all but the unlucky ones
    for ( i=0; i<ARRAYSIZE; i++ )
        if ( i%13 )     // Oops
            free( ptr_array[i] );

    return 1;
}

Listings A, B, and C show the main source file and two source modules for a program called MYPROG. Upon inspection of the logic in MODULE1.C and MODULE2.C, you'll notice that this program has a severe memory leak: quite a few of the allocated memory blocks are never freed. Listing D shows the file DMALLOC.C, which contains our thunk functions, dmalloc and dfree. In addition, this file contains a support function called dreport, which writes information describing the status of the allocated memory to the DREPORT.LST file.
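As a rough illustration of the kind of code a file like DMALLOC.C might contain, here's a minimal sketch of the three functions in standard C. It isn't the original listing. The DENTRY record, the MAXENTRIES capacity, the linear search, the const char * file parameter, and the choice to have dreport return the number of unfreed blocks are all assumptions we made for this sketch. The report layout simply imitates the "Line: ... File: ..." format of DREPORT.LST.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAXENTRIES 1024           /* assumed capacity for this sketch */

typedef struct {
    void       *ptr;              /* block address; NULL marks an empty slot */
    const char *file;             /* __FILE__ of the call site */
    int         line;             /* __LINE__ of the call site */
} DENTRY;

static DENTRY dlist[MAXENTRIES];  /* zero-initialized: all slots empty */

void *dmalloc( const char *file, int line, size_t size )
{
    void *ptr = malloc( size );   /* real malloc: no thunk macros here */
    int   i;

    if ( ptr == NULL )
        return NULL;

    for ( i = 0; i < MAXENTRIES; i++ )
        if ( dlist[i].ptr == NULL )
        {
            dlist[i].ptr  = ptr;
            dlist[i].file = file;
            dlist[i].line = line;
            break;
        }
    return ptr;                   /* if the table is full, the block is untracked */
}

void dfree( void *ptr )
{
    int i;

    if ( ptr == NULL )
        return;                   /* free(NULL) does nothing anyway */

    for ( i = 0; i < MAXENTRIES; i++ )
        if ( dlist[i].ptr == ptr )
        {
            dlist[i].ptr = NULL;  /* the slot becomes available again */
            break;
        }
    free( ptr );
}

/* Writes one line per unfreed block and returns the number of
   unfreed blocks, even if the report file can't be opened. */
int dreport( void )
{
    FILE *fp = fopen( "DREPORT.LST", "w" );
    int   i, count = 0;

    for ( i = 0; i < MAXENTRIES; i++ )
        if ( dlist[i].ptr != NULL )
            count++;

    if ( fp == NULL )
        return count;

    fprintf( fp, "%d entries in dlist\n\n", count );
    for ( i = 0; i < MAXENTRIES; i++ )
        if ( dlist[i].ptr != NULL )
            fprintf( fp, "Line: %05d File: %s\n",
                     dlist[i].line, dlist[i].file );
    fclose( fp );
    return count;
}
```

Storing only the pointer to the file name works here because __FILE__ expands to a string literal, which stays valid for the life of the program.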

Listing C: MODULE2.C

//*************************************************
// MODULE2.C - Allocates and frees some more
//     memory. Because this file contains
//     extra comments, the call to malloc
//     will be on a different line than
//     the one in MODULE1.C.

#include <stdlib.h>
#include <stdio.h>

#include "myprog.h"

int module2( void )
{
    int   i;
    void *ptr_array[ARRAYSIZE];

    //----Allocate ARRAYSIZE memory blocks
    for ( i=0; i<ARRAYSIZE; i++ )
    {
        ptr_array[i] = malloc( 16 );
        if ( !ptr_array[i] )
        {
            puts( "Error allocating memory" );
            return 0;    
        }
    }

    //----Free all but every other unlucky one 
    for ( i=0; i<ARRAYSIZE; i++ )
        if ( i%26 )     // Oops
            free( ptr_array[i] );

    return 1;
}

We've saved the best part for last. Based on the code in Listing A through Listing D, there's no indication that the dmalloc and dfree functions will ever be called. That's where the header file MYPROG.H, shown in Listing E, comes into play. Notice that in addition to the normal definitions and prototypes needed by MYPROG, the header file contains a section enclosed by #ifdef __DEBUG and #endif statements. The statements in this section provide the link between the application and our thunk functions.

Listing E: MYPROG.H

//*************************************************
// MYPROG.H - Definitions for MYPROG

#ifdef __DEBUG
//----Prototypes for debug functions
void *dmalloc( char *file, int line, 
	       size_t size );
void dfree( void *ptr );
int dreport( void );

//----Thunk substitutions
#define malloc(n) dmalloc(__FILE__,__LINE__,n)
#define free(p) dfree(p)
#else
#define dreport() 0
#endif

//----Normal MYPROG definitions
#define ARRAYSIZE 100

int module1( void );
int module2( void );

Because every source file in the MYPROG application includes MYPROG.H, the thunks will be activated throughout the program whenever you build a debug version of the program. The prototypes for dmalloc and dfree are necessary in MYPROG.H because they're the function calls that the compiler sees after the preprocessor #define statements have done their work. Just as important, notice that we don't include MYPROG.H in the DMALLOC.C source file, because the functions in this source file need to be able to call the real malloc and free functions.
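If a source file ever needs both the thunk macros and the real library functions, standard C offers a way around the substitution: a function-like macro is expanded only when its name is immediately followed by an opening parenthesis, so writing (malloc)(n) reaches the real function. The sketch below demonstrates the idea. The thunk_calls counter and the count_thunk_calls function are hypothetical names used only for this demonstration, and the stripped-down dmalloc is not the article's implementation.

```c
#include <stdlib.h>

/* Stand-in for the substitution that MYPROG.H makes. */
void *dmalloc( const char *file, int line, size_t size );
#define malloc(n) dmalloc(__FILE__,__LINE__,n)

static int thunk_calls = 0;  /* hypothetical counter for the demo */

void *dmalloc( const char *file, int line, size_t size )
{
    (void)file;              /* unused in this stripped-down thunk */
    (void)line;
    thunk_calls++;

    /* The parentheses around the name suppress macro expansion, so
       this is the real malloc. Without them, dmalloc would call
       itself forever. */
    return (malloc)( size );
}

/* Makes one call through the thunk and one direct call, then
   returns how many times the thunk ran. */
int count_thunk_calls( void )
{
    void *p = malloc( 8 );    /* expands to dmalloc(...): counted */
    void *q = (malloc)( 8 );  /* real malloc: not counted */

    free( p );
    free( q );
    return thunk_calls;
}
```

Keeping the header out of DMALLOC.C, as we do, is the simpler arrangement. The parenthesized form is mainly useful when the thunks have to live in a file that also includes the application's common header.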

Running the program

After compiling and running a debug version of MYPROG, you can examine the DREPORT.LST file to see what, if any, memory leaks occurred. As Figure A shows, the first module failed to free eight memory blocks, and the second module failed to free four. Although this report doesn't include the addresses of the memory blocks, that information is available in the DLIST structure, should you need it.

Figure A - The DREPORT.LST file lists memory leaks.

12 entries in dlist

Line: 00017 File: C:\work\module1.c
Line: 00017 File: C:\work\module1.c
Line: 00017 File: C:\work\module1.c
Line: 00017 File: C:\work\module1.c
Line: 00017 File: C:\work\module1.c
Line: 00017 File: C:\work\module1.c
Line: 00017 File: C:\work\module1.c
Line: 00017 File: C:\work\module1.c
Line: 00020 File: C:\work\module2.c
Line: 00020 File: C:\work\module2.c
Line: 00020 File: C:\work\module2.c
Line: 00020 File: C:\work\module2.c
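You can check these counts against the loops in MODULE1.C and MODULE2.C. Each loop skips the call to free whenever the index is an exact multiple of the divisor (13 or 26), so the number of leaked blocks equals the number of such multiples below ARRAYSIZE. The leaked_blocks function below is a hypothetical helper we wrote just to do that arithmetic.

```c
#define ARRAYSIZE 100

/* Counts the indexes in 0..ARRAYSIZE-1 for which the test
   "if ( i % divisor )" is false, that is, the blocks that the
   module never passes to free. */
int leaked_blocks( int divisor )
{
    int i, count = 0;

    for ( i = 0; i < ARRAYSIZE; i++ )
        if ( i % divisor == 0 )
            count++;
    return count;
}
```

Calling leaked_blocks(13) returns 8 (indexes 0, 13, 26, 39, 52, 65, 78, and 91), and leaked_blocks(26) returns 4 (indexes 0, 26, 52, and 78), which accounts for the 12 entries in the report.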

A few caveats

Obviously, our example program is much simpler than any real application. In particular, a more complex program would probably use additional functions besides malloc to allocate memory. In order to trap all memory leaks in complex programs, you'll need to create additional functions such as dcalloc and drealloc, using the code in dmalloc as a starting point for these functions.
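Here's one way such companion thunks might look in standard C. This is a sketch under our own assumptions rather than code derived from the original dmalloc: the DENTRY table, the find_slot and record helpers, and the dcount function are hypothetical, and the case of realloc with a size of zero is deliberately left out.

```c
#include <stdlib.h>

#define MAXENTRIES 1024  /* assumed capacity for this sketch */

typedef struct { void *ptr; const char *file; int line; } DENTRY;
static DENTRY dlist[MAXENTRIES];  /* ptr == NULL marks an empty slot */

/* Returns the slot that holds ptr, or -1. Because empty slots hold
   NULL, find_slot(NULL) returns the first empty slot. */
static int find_slot( void *ptr )
{
    int i;
    for ( i = 0; i < MAXENTRIES; i++ )
        if ( dlist[i].ptr == ptr )
            return i;
    return -1;
}

static void record( int slot, void *ptr, const char *file, int line )
{
    if ( slot < 0 )
        return;                   /* table full: block is untracked */
    dlist[slot].ptr  = ptr;
    dlist[slot].file = file;
    dlist[slot].line = line;
}

void *dcalloc( const char *file, int line, size_t num, size_t size )
{
    void *ptr = calloc( num, size );
    if ( ptr != NULL )
        record( find_slot( NULL ), ptr, file, line );
    return ptr;
}

void *drealloc( const char *file, int line, void *old, size_t size )
{
    int   slot = find_slot( old );  /* look up before old is released */
    void *ptr  = realloc( old, size );

    if ( ptr == NULL )
        return NULL;    /* on failure the old block stays allocated and tracked */
    if ( slot < 0 )
        slot = find_slot( NULL );   /* old wasn't tracked: take an empty slot */
    record( slot, ptr, file, line );
    return ptr;
}

void dfree( void *ptr )
{
    int slot = ( ptr != NULL ) ? find_slot( ptr ) : -1;
    if ( slot >= 0 )
        dlist[slot].ptr = NULL;
    free( ptr );
}

/* Number of blocks currently tracked. */
int dcount( void )
{
    int i, count = 0;
    for ( i = 0; i < MAXENTRIES; i++ )
        if ( dlist[i].ptr != NULL )
            count++;
    return count;
}

/* Matching substitutions for the common header might read:
   #define calloc(n,s)  dcalloc(__FILE__,__LINE__,n,s)
   #define realloc(p,n) drealloc(__FILE__,__LINE__,p,n)       */
```

Note that drealloc reuses the existing entry when the block moves, so the report shows the location of the most recent realloc call rather than the original allocation. Whether that's the more useful location is a design choice.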

Furthermore, because of the way we designed the dlist array, this technique will allow a maximum of 16380 memory allocations. (For far pointers, 16380 is the maximum number that can fit in a 64-KB memory segment.) Although this limitation shouldn't affect most applications, you'll need to redesign the list if your application allocates more than this number of memory blocks.
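One possible redesign replaces the fixed-size array with a linked list, so the number of tracked blocks is limited only by available memory. The sketch below assumes standard C with a flat address space and doesn't attempt to address segmented memory models. The DNODE type, the dhead pointer, and the dcount function are hypothetical names chosen for this illustration.

```c
#include <stdlib.h>

typedef struct dnode {
    void         *ptr;    /* address of the tracked block */
    const char   *file;   /* points at the __FILE__ literal; no copy */
    int           line;
    struct dnode *next;
} DNODE;

static DNODE *dhead = NULL;  /* most recent allocation comes first */

void *dmalloc( const char *file, int line, size_t size )
{
    void  *ptr  = malloc( size );
    DNODE *node = ( ptr != NULL ) ? malloc( sizeof *node ) : NULL;

    if ( node != NULL )
    {
        node->ptr  = ptr;
        node->file = file;
        node->line = line;
        node->next = dhead;
        dhead      = node;
    }
    return ptr;  /* if no node could be allocated, the block is untracked */
}

void dfree( void *ptr )
{
    DNODE **link;

    /* Walk the list through a pointer to each link so that the
       matching node can be unhooked without a special case for
       the head of the list. */
    for ( link = &dhead; *link != NULL; link = &(*link)->next )
        if ( (*link)->ptr == ptr )
        {
            DNODE *dead = *link;
            *link = dead->next;
            free( dead );
            break;
        }
    free( ptr );
}

/* Number of blocks that have been allocated but not yet freed. */
int dcount( void )
{
    DNODE *node;
    int    count = 0;

    for ( node = dhead; node != NULL; node = node->next )
        count++;
    return count;
}
```

The trade-off is one extra allocation per tracked block and a linear search in dfree. Because each node stores only a pointer to the file name instead of a copy, the per-block overhead stays fixed regardless of the length of the path.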

On a related note, the thunks use a considerable amount of memory to store the information for each memory allocation. The exact amount varies from around 20 to 100 bytes per memory allocation, depending on the length of the source file's pathname. Therefore, the thunk functions may cause the application to run out of memory on machines where memory is tight to begin with. In addition, the thunk functions will slow down the application, although the effect probably won't be noticeable unless the application allocates and frees many, many memory blocks.

Finally, the thunk functions will adapt to the memory model you use to compile your program. However, if your program allocates both far and near memory addresses, you'll need to redesign the thunk functions to handle both types.

Lest we scare you off with all these caveats, let us reassure you. These functions are very reliable and make detecting and finding memory leaks relatively easy. Furthermore, since you'll be using them only to debug your program, you don't have to worry about them causing problems for your end users.

Conclusion

For most programmers, detecting and correcting memory leaks is not a pleasant task. Even finding out which part of your program allocated the leaked memory is just about impossible without special debugging tools. In the next issue, we'll show some more advanced methods you can use to track allocation problems.



Copyright (c) 1996 The Cobb Group, a division of Ziff-Davis Publishing Company. All rights reserved. Reproduction in whole or in part in any form or medium without express written permission of Ziff-Davis Publishing Company is prohibited. The Cobb Group and The Cobb Group logo are trademarks of Ziff-Davis Publishing Company.