GCO4020/CSC428 - Advanced Object Oriented Techniques In C++
Week 4

 

Topic 7: Reference Counting

Introduction

The appropriate use of destructors can greatly reduce the amount of "manual" memory management that must be implemented in a program. However, memory management problems can still arise when dynamically-allocated memory is to be accessed through multiple pointers. Reference counting is a technique which can solve some of these problems.


Synopsis


Source Code Examples


Why count references?

The problem of reclaiming dynamically-allocated memory that is no longer required by a program is known as "garbage collection¤". Ideally, we would like to allocate memory as it is required and then have it automatically reclaimed when it is no longer useful.

The problem is not so much knowing what to reclaim (since the program is aware of all the memory it has allocated), but knowing when any specific piece of memory is no longer needed.

One way of assessing whether a piece of memory is still required is to check whether any pointers exist which point to that memory. Clearly if no pointers are currently pointing to a piece of memory, it is no longer accessible from the program (in programming jargon, it has "leaked") and so the program will not be affected it we reclaim that memory. On the other hand, if even one pointer still points at the memory, it might be accessed at a later point in the program, so we can't reclaim it.

If every piece of dynamically-allocated memory was only ever pointed to by a single pointer (or, at least, only by a single pointer at any one time):


        [Diagram showing three distinct pointers pointing to three 
        distinct pieces of memory.]

garbage collection¤ would be easy. We could simply write a class whose objects each encapsulated a single pointer and have that class's destructor call delete on the encapsulated pointer whenever an object went out of scope (which would make the pointer inaccessible).

In fact, some applications need to use dynamic memory in exactly this way and the Standard Template Library¤ provides a template class called auto_ptr (TDCS, § 20.4.5) which implements a version of the "encapsulated pointer" class outlined above.

A much harder problem is knowing when to reclaim memory that is simultaneously accessible via two or more pointers:


        Diagram showing three distinct pointers pointing to a 
        single piece of memory.

It is no longer correct to reclaim memory when a pointer to it is destroyed, because other pointers might still be using that memory. Only when the last pointer pointing at a piece of memory is destroyed can the memory itself be reclaimed.

The problem is that individual pointers cannot know whether they are the last such pointer (since they have no information about each other, knowing only about the common memory they all point at).

To enable the pointers' destructors to decide whether they should delete the common memory, the common memory itself must keep track of how many pointers currently refer to it. That is, it must keep a reference count.

See also: Exercise 1


The structure of a reference counting system

Typically, references are counted by interposing another object - a reference counter - between the pointers and their common memory:


        Diagram showing three distinct pointers pointing to a 
        reference counter object which in turn points to a
        single piece of memory.

When a new pointer is made to point at the common memory through the reference counter, the new pointer informs the counter of this fact and the counter then increments its count. When each pointer ceases to point at the common memory (either because it is destroyed, or because it is made to point elsewhere), it again informs the reference counter, which then decrements its reference count:


        Diagram showing two distinct pointers pointing to a 
        reference counter object which in turn points to a
        single piece of memory.

If that count ever becomes zero during a decrement, the reference counter knows that the common memory it is guarding is no longer pointed to by any pointer. In this situation it can immediately delete the guarded memory and then itself (since it has nothing left to do, and no dependent pointer to do it for).


The reference count class

The class used to count references is actually very simple (see source code listing: RefCount.C):

template <class Type> class RefCount { public: RefCount(Type* ptr);

RefCount* Incr(void) const; void Decr(void);

Type* Ptr(void);

RefCount* Clone(void);

private: ~RefCount(void);

mutable int myCount; Type* myPtr; };

The constructor is given a pointer to the memory the RefCount is to "guard" and stores it away in myPtr. It also initializes its reference count to 1 (on the assumption that the new RefCount is going to be immediately stored in a single pointer):

        template <class Type>
        RefCount<Type>::RefCount(Type * ptr)
        : myCount(1)
        , myPtr(ptr)
        {}

The Incr() and Decr() members control incrementing and decrementing of the reference count. In addition, Decr() checks whether the count was decremented to zero, in which case it deletes the entire RefCount object on which it was called:

        template <class Type>
        RefCount<Type>*
        RefCount<Type>::Incr(void) const
        {
        	myCount++;
        	return const_cast<RefCount*>(this);
        }

template <class Type> void RefCount<Type>::Decr(void) { if (--myCount==0) { delete this; } }

Incr() is declared as a const member function to facilitate assignment of reference-counted pointers (see below). It is this member that makes it necessary to declare the actual counter (myCount) mutable.

Note that Incr() returns a pointer to the RefCount on which it is called. This is because it is usually used in this way (see "The reference-counted pointer class" for examples):

        newRefCount = oldRefCount->Incr();

The Ptr() member simply provides (totally unrestricted) access to the guarded pointer:

        template <class Type>
        Type*
        RefCount<Type>::Ptr(void)
        {
        	return myPtr;
        }

The Clone() member provides a mechanism for creating a copy of a RefCount and the memory it is guarding. It does so (in the generic¤ case) by simply calling the copy constructor for Type:

        template <class Type>
        RefCount<Type>*
        RefCount<Type>::Clone(void)
        {
        	return new RefCount<Type> (new Type (*myPtr));
        }

Note that this assumes that the myPtr member of the RefCount object is being used to point to a single object of class Type, rather than to the first object in an array of Type. The most common use of reference counting is, however, to guard a null-terminated character string pointed to be a char* - a case which violates this assumption. Hence it is useful to provide a specialization¤ of the RefCount::Clone() member for guarding chars:

        template <>
        RefCount<char>*
        RefCount<char>::Clone(void)
        {
        	char* clonedPtr = 0;
        	if (myPtr != 0)
        	{
        		clonedPtr = new char [strlen(myPtr)+1];
        		strcpy(clonedPtr,myPtr);
        	}
        	return new RefCount<char> (clonedPtr);
        }

The RefCount destructor reclaims the dynamically-allocated memory which the RefCount is guarding:

        template <class Type>
        RefCount<Type>::~RefCount(void)
        {
        	delete myPtr;
        }

Here too, since a char* typically points at a vector¤ of char (which must be deallocated with delete[]), we would have to provide another explicit specialization¤:

        RefCount<char>::~RefCount(void)
        {
        	delete[] myPtr;
        }

Note that the RefCount destructor is declared private. This ensures that a RefCount cannot be explicitly deleted, except from within a member of RefCount. Thus it is guaranteed that no user's code can accidently reclaim the memory prematurely:

        newRefCount = oldRefCount->Incr();
        delete oldRefCount;			// COMPILE-TIME ERROR

See also: Exercise 2

See also: Exercise 3


The reference-counted pointer class

By itself, the RefCount class is not sufficient to ensure that the dynamically-allocated memory it guards is properly garbage collected. RefCount's will only work correctly if they are properly "driven". That is, if their Incr() function is called whenever another pointer is made to point to them, and if their Decr() function is called every time a pointer ceases to do so.

To automate and ensure this behaviour, it is convenient to encapsulate it, by creating another class (RefCountedPtrTo) to represent the pointers themselves. Then, instead of writing:

        char* ptr1;
        char* ptr2;
        char* ptr3;

we will write:

        RefCountedPtrTo<char> ptr1;
        RefCountedPtrTo<char> ptr2;
        RefCountedPtrTo<char> ptr3;

and the ptrN objects will take care of the reference counting and garbage collection¤ automatically.

The class template itself provides members to mimic the functions of the raw pointer it replaces (that is, it acts like a proxy¤ - see "Topic 2: Proxies"):

        template <class Type>
        class RefCountedPtrTo
        {
        public:
        	RefCountedPtrTo(void);
        	RefCountedPtrTo(Type* ptr);
        	RefCountedPtrTo(const RefCountedPtrTo & p);

~RefCountedPtrTo(void);

Type & operator[] (int index); Type & operator* (void); Type * operator-> (void);

operator const Type* (void);

RefCountedPtrTo& operator= (const RefCountedPtrTo& p);

void DetachClone(void);

private: RefCount<Type>* myRefCount; };

The constructors do nothing except initialize the myRefCount pointer to point to a (possibly already existing) RefCount which guards the underlying shared memory:

        template <class Type>
        RefCountedPtrTo<Type>::RefCountedPtrTo(void) 
        : myRefCount(new RefCount<Type>(0))
        {}

template <class Type> RefCountedPtrTo<Type>::RefCountedPtrTo(Type* ptr) : myRefCount(new RefCount<Type>(ptr)) {}

template <class Type> RefCountedPtrTo<Type>::RefCountedPtrTo(const RefCountedPtrTo & p) : myRefCount(p.myRefCount->Incr()) {}

Note that the copy constructor doesn't need to create a new RefCount. It simply increments the counter of the RefCount in the RefCountedPtrTo which is being copied. This explains the popularity of reference counting in some applications: because the copy constructor doesn't actually have to copy anything, it is very cheap to copy objects guarded by reference counters. This can result in a big saving in both space and time costs if those guarded objects are large or otherwise expensive to copy.

The destructor does not delete the RefCount in myRefCount. Instead, it simply decrements the RefCount's counter, letting the RefCount decide whether it is appropriate to delete itself:

        template <class Type>
        RefCountedPtrTo<Type>::~RefCountedPtrTo(void)
        {
        	myRefCount->Decr();
        }

RefCountedPtrTo provides the three standard pointer dereferencing mechanisms offered by raw pointers: indexing (operator[]), unary dereference (operator*), and member dereference (operator->). Accessing the guarded memory through these mechanisms will not directly alter the reference count, so these members simply return the appropriate dereference of the guarded memory:

        template <class Type>
        Type&
        RefCountedPtrTo<Type>::operator[] (int index)
        {
        	return (myRefCount->Ptr())[index];
        }

template <class Type> Type& RefCountedPtrTo<Type>::operator* (void) { return * myRefCount->Ptr(); }

template <class Type> Type* RefCountedPtrTo<Type>::operator-> (void) { return myRefCount->Ptr(); }

To facilitate a limited amount of backwards compatibility, RefCountedPtrTo also provides a conversion to raw const Type*:

        template <class Type>
        RefCountedPtrTo<Type>::operator const Type* (void) const
        {
        	return myRefCount->Ptr();
        }

This, for example, allows the following code to execute without requiring additional overloading of operator<<:

        RefCountedPtrTo<char> rcstr1 = "str 1";
        RefCountedPtrTo<char> rcstr2 = rcstr1;

cout << rcstr1 << " shares memory with " << rcstr2 << endl;

Assignment between RefCountedPtrTo's causes a change in the reference count of both the assigned and assigning pointers. Note that this explains why the RefCount::Incr() was declared as a const member function (since it must be able to operate through the const pointer p.myRefCounter):

        template <class Type>
        RefCountedPtrTo<Type>&
        RefCountedPtrTo<Type>::operator= (const RefCountedPtrTo& p)
        {
        	if (myRefCount != p.myRefCount)
        	{
        		myRefCount->Decr();
        		myRefCount = p.myRefCount->Incr();
        	}
        	return *this;
        }

Finally, RefCountedPtrTo provides one feature not found in raw pointers: the ability to "detach" a "clone" of the memory it is pointing to. The DetachClone() member function leaves the RefCountedPtrTo on which it is called pointing to a reference-counted copy of the original guarded memory. That is, if ptr1, ptr2 and ptr3 are RefCountedPtrTo<char>'s, all of which point to the same guarded memory:


        Diagram showing three RefCountedPtrTo's pointing
        at a single RefCount guarding an array of char.

then after calling ptr1.DetachClone(), ptr1, ptr2 and ptr3 will look like this:


        Diagram showing two RefCountedPtrTo's pointing
        at a single RefCount guarding the original array of char
        and one RefCountedPtrTo pointing
        at a new RefCount guarding a copy of the original array of char.

Note that the other two pointers (ptr2 and ptr3) still point to the original reference-counted memory.

To accomplish this useful behaviour, DetachClone() first saves a pointer to the original RefCount, then clones that RefCount (see above) and stores the clone (which thus becomes its new RefCount). Finally it decrements the old RefCount's counter:

        template <class Type>
        void 
        RefCountedPtrTo<Type>::DetachClone(void)	
        {
        	RefCount<Type>* oldRefCount = myRefCount;
        	myRefCount = oldRefCount->Clone();
        	oldRefCount->Decr();
        }

See also: Exercise 4

See also: Exercise 5

See also: Exercise 6


Limitations of reference counting

Reference counting is a useful technique when the data structures used by a program lend themselves to it. However, it is not a complete solution to garbage collection, and dynamically-allocated memory guarded by a reference counter can still "leak" under certain circumstances.

For example, consider a dynamically-allocated circular list:


        [Diagram showing a linked list of nodes where the last node's
         myNext pointer points back to the first node.]

If the pointers in the nodes were reference counted the structure would look like this:


        [Diagram showing the previous circular list with reference
         counters between each node.]

Consider what happens when the head pointer is destroyed. As it is the only link between the program and the dynamically-allocated list, we would hope that it's destruction would cause the entire list to be reclaimed. However, when head's destructor decrements the first node's reference count, that count will only reduce to 1, and hence the first list node will not be deleted:


        [Diagram showing the previous circular list without it's head
         pointer and with all reference counts still at 1.]

Solving this particular case is not very difficult, as it simply requires that the circle of links be explicitly broken (at any point) before the head pointer is destroyed. For example:

        {
            RefCountedPtrTo<CircularNode> head;
            	:
            // CREATE CIRCULAR LIST
            	:
            head->myNext = 0;	// ENTIRE LIST (EXCEPT 1st NODE) RECLAIMED HERE
        }			// 1st NODE RECLAIMED HERE AS head DESTROYED

This code reclaims everything because the reassignment of head->myNext decrements the reference count of the second node in the list to zero, causing it to be automatically reclaimed. During this deletion, the destructor for the second node's myNext member (itself a RefCountedPtrTo<CircularNode>) is called, which decrements the third node's reference count to zero, causing it to be automatically reclaimed as well. This cascade effect continues along the list until the last node is destroyed (which automatically reduces the first node's reference count to 1). When head is subsequently destroyed (at the end of its scope) the first reference count is again decremented, this time to zero, which causes the first node to be automatically reclaimed as well.

Self-referential data structures are a general problem for reference counters, and will typically not be as easy to spot or overcome as in this simple circular list example.

Designing and implementing general, robust and inexpensive garbage collection is a complex and challenging problem, but sophisticated techniques are being developed, and there is even a class library available which implements an efficient, general, and automatic garbage collection system for C++.

See also: Exercise 7


Exercises

There are 7 exercises associated with this topic.

 


This material is part of the GCO4020/CSC428 - Advanced Object Oriented Techniques In C++ course.
Copyright © Damian Conway, 1997. All rights reserved.

Last updated: Fri Feb 18 11:17:28 2000