GCO4020/CSC428 - Advanced Object Oriented Techniques In C++
Week 4
The appropriate use of destructors can greatly reduce the amount of "manual" memory management that must be implemented in a program. However, memory management problems can still arise when dynamically-allocated memory is to be accessed through multiple pointers. Reference counting is a technique which can solve some of these problems.
The problem of reclaiming dynamically-allocated memory that is no longer required by a program is known as "garbage collection¤". Ideally, we would like to allocate memory as it is required and then have it automatically reclaimed when it is no longer useful.
The problem is not so much knowing what to reclaim (since the program is aware of all the memory it has allocated), but knowing when any specific piece of memory is no longer needed.
One way of assessing whether a piece of memory is still required is to check whether any pointers exist which point to that memory. Clearly if no pointers are currently pointing to a piece of memory, it is no longer accessible from the program (in programming jargon, it has "leaked") and so the program will not be affected it we reclaim that memory. On the other hand, if even one pointer still points at the memory, it might be accessed at a later point in the program, so we can't reclaim it.
If every piece of dynamically-allocated memory was only ever pointed to by a single pointer (or, at least, only by a single pointer at any one time):
garbage collection¤ would be easy. We could simply write a class whose
objects each encapsulated a single pointer and have that class's
destructor call delete on the encapsulated pointer whenever an
object went out of scope (which would make the pointer inaccessible).
In fact, some applications need to use dynamic memory in exactly this way
and the Standard Template Library¤ provides a template class called
auto_ptr (TDCS, § 20.4.5) which implements
a version of the "encapsulated pointer" class outlined above.
A much harder problem is knowing when to reclaim memory that is simultaneously accessible via two or more pointers:
It is no longer correct to reclaim memory when a pointer to it is destroyed, because other pointers might still be using that memory. Only when the last pointer pointing at a piece of memory is destroyed can the memory itself be reclaimed.
The problem is that individual pointers cannot know whether they are the last such pointer (since they have no information about each other, knowing only about the common memory they all point at).
To enable the pointers' destructors to decide whether they should delete the common memory, the common memory itself must keep track of how many pointers currently refer to it. That is, it must keep a reference count.
See also: Exercise 1
Typically, references are counted by interposing another object - a reference counter - between the pointers and their common memory:
When a new pointer is made to point at the common memory through the reference counter, the new pointer informs the counter of this fact and the counter then increments its count. When each pointer ceases to point at the common memory (either because it is destroyed, or because it is made to point elsewhere), it again informs the reference counter, which then decrements its reference count:
If that count ever becomes zero during a decrement, the reference counter knows that the common memory it is guarding is no longer pointed to by any pointer. In this situation it can immediately delete the guarded memory and then itself (since it has nothing left to do, and no dependent pointer to do it for).
The class used to count references is actually very simple (see source code listing: RefCount.C):
template <class Type> class RefCount { public: RefCount(Type* ptr);
RefCount* Incr(void) const; void Decr(void);
Type* Ptr(void);
RefCount* Clone(void);
private: ~RefCount(void);
mutable int myCount; Type* myPtr; };
The constructor is given a pointer to the memory the RefCount is
to "guard" and stores it away in myPtr. It also initializes its
reference count to 1 (on the assumption that the new RefCount
is going to be immediately stored in a single pointer):
template <class Type>
RefCount<Type>::RefCount(Type * ptr)
: myCount(1)
, myPtr(ptr)
{}
The Incr() and Decr() members control incrementing and decrementing
of the reference count. In addition, Decr() checks whether the count
was decremented to zero, in which case it deletes the entire RefCount
object on which it was called:
template <class Type>
RefCount<Type>*
RefCount<Type>::Incr(void) const
{
myCount++;
return const_cast<RefCount*>(this);
}
template <class Type>
void
RefCount<Type>::Decr(void)
{
if (--myCount==0) { delete this; }
}
Incr() is declared as a const member function to facilitate
assignment of reference-counted pointers (see below). It is this member that makes it necessary to declare the
actual counter (myCount) mutable.
Note that Incr() returns a pointer to the RefCount on which it is
called. This is because it is usually used in this way (see
"The reference-counted pointer class" for examples):
newRefCount = oldRefCount->Incr();
The Ptr() member simply provides (totally unrestricted) access to the
guarded pointer:
template <class Type>
Type*
RefCount<Type>::Ptr(void)
{
return myPtr;
}
The Clone() member provides a mechanism for creating a copy of a
RefCount and the memory it is guarding. It does so (in the generic¤
case) by simply calling the copy constructor for Type:
template <class Type>
RefCount<Type>*
RefCount<Type>::Clone(void)
{
return new RefCount<Type> (new Type (*myPtr));
}
Note that this assumes that the myPtr member of the RefCount object
is being used to point to a single object of class Type, rather than
to the first object in an array of Type. The most common use of
reference counting is, however, to guard a null-terminated character
string pointed to be a char* - a case which violates this assumption.
Hence it is useful to provide a specialization¤ of the
RefCount::Clone() member for guarding chars:
template <>
RefCount<char>*
RefCount<char>::Clone(void)
{
char* clonedPtr = 0;
if (myPtr != 0)
{
clonedPtr = new char [strlen(myPtr)+1];
strcpy(clonedPtr,myPtr);
}
return new RefCount<char> (clonedPtr);
}
The RefCount destructor reclaims the dynamically-allocated memory which the
RefCount is guarding:
template <class Type>
RefCount<Type>::~RefCount(void)
{
delete myPtr;
}
Here too, since a char* typically points at a vector¤ of char (which must be
deallocated with delete[]), we would have to provide
another explicit specialization¤:
RefCount<char>::~RefCount(void)
{
delete[] myPtr;
}
Note that the RefCount destructor is declared private. This
ensures that a RefCount cannot be explicitly deleted, except from
within a member of RefCount. Thus it is guaranteed that no user's code
can accidently reclaim the memory prematurely:
newRefCount = oldRefCount->Incr();
delete oldRefCount; // COMPILE-TIME ERROR
See also: Exercise 2
See also: Exercise 3
By itself, the RefCount class is not sufficient to ensure that the
dynamically-allocated memory it guards is properly garbage collected.
RefCount's will only work correctly if they are properly "driven".
That is, if their Incr() function is called whenever another
pointer is made to point to them, and if their Decr() function is
called every time a pointer ceases to do so.
To automate and ensure this behaviour, it is convenient to
encapsulate it, by creating another class (RefCountedPtrTo) to
represent the pointers themselves. Then, instead of writing:
char* ptr1;
char* ptr2;
char* ptr3;
we will write:
RefCountedPtrTo<char> ptr1;
RefCountedPtrTo<char> ptr2;
RefCountedPtrTo<char> ptr3;
and the ptrN objects will take care of the
reference counting and garbage collection¤ automatically.
The class template itself provides members to mimic the functions of the raw pointer it replaces (that is, it acts like a proxy¤ - see "Topic 2: Proxies"):
template <class Type>
class RefCountedPtrTo
{
public:
RefCountedPtrTo(void);
RefCountedPtrTo(Type* ptr);
RefCountedPtrTo(const RefCountedPtrTo & p);
~RefCountedPtrTo(void);
Type & operator[] (int index);
Type & operator* (void);
Type * operator-> (void);
operator const Type* (void);
RefCountedPtrTo& operator= (const RefCountedPtrTo& p);
void DetachClone(void);
private:
RefCount<Type>* myRefCount;
};
The constructors do nothing except initialize the myRefCount pointer
to point to a (possibly already existing) RefCount which guards the
underlying shared memory:
template <class Type>
RefCountedPtrTo<Type>::RefCountedPtrTo(void)
: myRefCount(new RefCount<Type>(0))
{}
template <class Type>
RefCountedPtrTo<Type>::RefCountedPtrTo(Type* ptr)
: myRefCount(new RefCount<Type>(ptr))
{}
template <class Type>
RefCountedPtrTo<Type>::RefCountedPtrTo(const RefCountedPtrTo & p)
: myRefCount(p.myRefCount->Incr())
{}
Note that the copy constructor doesn't need to create a new RefCount.
It simply increments the counter of the RefCount in the
RefCountedPtrTo which is being copied. This
explains the popularity of reference counting in some applications:
because the copy constructor doesn't actually have to copy anything,
it is very cheap to copy objects guarded by reference counters. This can
result in a big saving in both space and time costs if those guarded
objects are large or otherwise expensive to copy.
The destructor does not delete the RefCount in myRefCount.
Instead, it simply decrements the RefCount's counter, letting the
RefCount decide whether it is appropriate to delete itself:
template <class Type>
RefCountedPtrTo<Type>::~RefCountedPtrTo(void)
{
myRefCount->Decr();
}
RefCountedPtrTo provides the three standard pointer dereferencing
mechanisms offered by raw pointers: indexing (operator[]), unary
dereference (operator*), and member dereference (operator->).
Accessing the guarded memory through these mechanisms will not
directly alter the reference count, so these members simply return
the appropriate dereference of the guarded memory:
template <class Type>
Type&
RefCountedPtrTo<Type>::operator[] (int index)
{
return (myRefCount->Ptr())[index];
}
template <class Type>
Type&
RefCountedPtrTo<Type>::operator* (void)
{
return * myRefCount->Ptr();
}
template <class Type>
Type*
RefCountedPtrTo<Type>::operator-> (void)
{
return myRefCount->Ptr();
}
To facilitate a limited amount of backwards compatibility, RefCountedPtrTo
also provides a conversion to raw const Type*:
template <class Type>
RefCountedPtrTo<Type>::operator const Type* (void) const
{
return myRefCount->Ptr();
}
This, for example, allows the following code to execute without requiring
additional overloading of operator<<:
RefCountedPtrTo<char> rcstr1 = "str 1";
RefCountedPtrTo<char> rcstr2 = rcstr1;
cout << rcstr1 << " shares memory with " << rcstr2 << endl;
Assignment between RefCountedPtrTo's causes a change in the
reference count of both the assigned and assigning pointers. Note that
this explains why the RefCount::Incr() was declared as a const
member function (since it must be able to operate through the const
pointer p.myRefCounter):
template <class Type>
RefCountedPtrTo<Type>&
RefCountedPtrTo<Type>::operator= (const RefCountedPtrTo& p)
{
if (myRefCount != p.myRefCount)
{
myRefCount->Decr();
myRefCount = p.myRefCount->Incr();
}
return *this;
}
Finally, RefCountedPtrTo provides one feature not found in raw
pointers: the ability to "detach" a "clone" of the memory it is
pointing to. The DetachClone() member function leaves the
RefCountedPtrTo on which it is called pointing to a reference-counted
copy of the original guarded memory. That is, if ptr1, ptr2
and ptr3 are RefCountedPtrTo<char>'s, all of which point to the
same guarded memory:
then after calling ptr1.DetachClone(), ptr1, ptr2 and
ptr3 will look like this:
Note that the other two pointers (ptr2 and ptr3) still point to the
original reference-counted memory.
To accomplish this useful behaviour, DetachClone() first saves a
pointer to the original RefCount, then clones that RefCount (see
above) and stores the clone (which thus becomes its
new RefCount). Finally it decrements the old RefCount's
counter:
template <class Type>
void
RefCountedPtrTo<Type>::DetachClone(void)
{
RefCount<Type>* oldRefCount = myRefCount;
myRefCount = oldRefCount->Clone();
oldRefCount->Decr();
}
See also: Exercise 4
See also: Exercise 5
See also: Exercise 6
Reference counting is a useful technique when the data structures used by a program lend themselves to it. However, it is not a complete solution to garbage collection, and dynamically-allocated memory guarded by a reference counter can still "leak" under certain circumstances.
For example, consider a dynamically-allocated circular list:
If the pointers in the nodes were reference counted the structure would look like this:
Consider what happens when the head pointer is destroyed. As it is the only
link between the program and the dynamically-allocated list, we would hope that
it's destruction would cause the entire list to be reclaimed. However,
when head's destructor decrements the first node's reference
count, that count will only reduce to 1, and hence the first list node will
not be deleted:
Solving this particular case is not very difficult, as it simply
requires that the circle of links be explicitly broken (at any point)
before the head pointer is destroyed. For example:
{
RefCountedPtrTo<CircularNode> head;
:
// CREATE CIRCULAR LIST
:
head->myNext = 0; // ENTIRE LIST (EXCEPT 1st NODE) RECLAIMED HERE
} // 1st NODE RECLAIMED HERE AS head DESTROYED
This code reclaims everything because the reassignment of head->myNext
decrements the reference count of the second node in the
list to zero, causing it to be automatically reclaimed. During this
deletion, the destructor for the second node's myNext member
(itself a RefCountedPtrTo<CircularNode>) is called, which
decrements the third node's reference count to zero, causing it
to be automatically reclaimed as well. This cascade effect continues along the
list until the last node is destroyed (which automatically reduces
the first node's reference count to 1). When head is subsequently
destroyed (at the end of its scope) the first reference count is again
decremented, this time to zero, which causes the first node to be
automatically reclaimed as well.
Self-referential data structures are a general problem for reference counters, and will typically not be as easy to spot or overcome as in this simple circular list example.
Designing and implementing general, robust and inexpensive garbage collection is a complex and challenging problem, but sophisticated techniques are being developed, and there is even a class library available which implements an efficient, general, and automatic garbage collection system for C++.
See also: Exercise 7
Last updated: Fri Feb 18 11:17:28 2000