Overview of the Garbage Collection Architecture

Goals

GC should be transparent as far as replacing malloc based C programs
GC overhead should be minimal (in terms of time and percentage execution) for most Java applets and applications.

The key to countering the overhead for determining the difference between live and dead pointers is by making allocating memory very efficient. Most modern, high performance GCs start with a simple way of allocating memory. To allocate, we have a current block with a part that's used and a part that's free. The new object gets added on the end of the used pile and the pointer in the used pile is moved on. When we run off the end of the block, we allocate another block. The blocks are fixed size - if you need a larger block ("large" is dependent on tuning - typically this will be between 4k and 8k), then we allocate a custom block that will have only that object. All the blocks are linked together. The decision about the size of the block is the same tradeoff as page size within the operating system

To distinguish live pointers from dead pointers, we start from a root set where we know there are live pointers, and we follow those pointers by copying those into new blocks, then work through the each object that these pointers point to in these roots, and do the same with them, and so on. When we reach the end of scanning the new objects in the new copy, we know we've copied over every live object. This scheme can be fairly expensive if you always copy many objects that are very stable. We improve on this by taking a snapshot of the last point we GC'd and the new point we GC'd and copy only the objects that are in between - leaving old objects alone. This is called generational GC - using two generations in this case.

The large blocks are not copied - they're just logically moved from the old set to the new. We may (for version 2) we may make further improvements by considering only roots between the phases the part of memory that has been modified.

Garbage collection is called on allocation of new objects. When a new object is alloacted, the garbage collector detects memory threshold (a heuristic is used to decide if it needs to GC and, if so, sets a flag). The VM looks at the flag and initiates the process of checking the blockcount. We allow for one GC per namespace.

Mark and compact is not used as a garbage collection method, because they are inherently less efficient in situations where you have a lot of garbage - which will be typical in Java programs.

Note This document is an early release of the final specification. It is meant to specify and accompany software that is still in development. Some of the information in this documentation may be inaccurate or may not be an accurate representation of the functionality of the final specification or software. Microsoft assumes no responsibility for any damages that might occur either directly or indirectly from these inaccuracies. Microsoft may have trademarks, copyrights, patents or pending patent applications, or other intellectual property rights covering subject matter in this document. The furnishing of this document does not give you a license to these trademarks, copyrights, patents, or other intellectual property rights.

Conservative scanning or garbage collection is the term used when nothing is known about the data which is being checked. It may just be characters or some other simple data type, but if it looks like it may be a pointer, it is treated as such to err on the side of caution. Since the data is not known to be a pointer, it cannot be modified to refer to a new copy of the object to which it seems to point. Instead, the object in question must be fixed in memory and made immovable.

Overview of the Garbage Collection Architecture

Goals

Garbage Collection Mechanism