Multiple Dispatch and Subroutine Overloading in Perl

Damian Conway

School of Computer Science and Software Engineering
Monash University
Clayton 3168, Australia

damian@csse.monash.edu.au
http://www.csse.monash.edu.au/~damian

Abstract

Sometimes Perl's standard polymorphic method dispatch mechanism isn't sophisticated enough to cope with the complexities of finding the right method to handle a given situation. For example, in a graphical user interface, objects representing events may be passed to objects representing windows. What happen next depends on the type of window, but also on the type of event. It's not enough to invoke a receive_event method on the window object, since that won't distinguish between the various possible kinds of event. Nor is it sufficient to invoke a send_to_window method on the event object, since that won't distinguish between the various possible kinds of window. What's needed is the ability to polymorphically select a suitable method for the appropriate combination of window and event types. This paper describes a new CPAN module--Class::Multimethods--that provides such a mechanism.

What is multiple dispatch?

In object-oriented Perl, the selection of which subroutine to call in response to a method invocation (e.g. $objref->method(@args)) is performed polymorphically. That means the subroutine that's invoked is the one defined in the class that the invoking object belongs to. So a call such as $objref->method(@args) invokes the method CLASSNAME::method, where "CLASSNAME" is the class name returned by ref($objref).

If the class in question doesn't have a suitable method, then the dispatch procedure searches upwards through the various ancestors of the original class, looking for an appropriate subroutine. If that search fails, the dispatch procedure attempts to invoke an AUTOLOAD subroutine somewhere in the invoking object's inheritance hierarchy.

The important point is that, whichever subroutine the method dispatcher eventually selects, it was all determined by the class of the original invoking object (i.e. according to the class of the first argument).

For most applications, the ability to select behaviour based on the type of a single argument is sufficient. However, some applications, such as the GUI event handler mentioned above, need to select the most applicable polymorphic method on the basis of more than one argument. This behaviour is known as multiple dispatch.

Generally speaking, multiple dispatch is needed whenever two or more objects belonging to different class hierarchies are going to interact, and we need to do different things depending on the combination of actual types of those objects. Typical applications that need this kind of ability include graphical user interfaces, image processing libraries, mixed-precision numerical computation systems, and most types of simulations.

It's possible to build "hand-crafted" multiply dispatched methods that look at the types of each of their arguments and react accordingly. For example, a normal (singly-dispatched) method could use ref or isa to determine the types of its other arguments and then select the correct behaviour in a cascaded if statement. Alternatively, it's possible to use hashes of hashes to set up a multidimensional table of subroutine references, then use the class names of the arguments (again found with ref) to index into it. Both these approaches are described in detail in [1,2].

The problem is that such hand-crafted mechanisms are complicated to construct and even harder to debug. And because every hand-built method is structurally similar, they're also tedious to code and maintain. Life would be very much easier if it were possible to define a set of identically named methods with distinct parameter lists, and then the program would "automagically" find the right one. Such a set of multiply dispatched methods is known as a multimethod, and each alternative method in the set is known as a variant.

But Perl has no mechanism for specifying parameter types, or for overloading subroutine names. And certainly there's no mechanism for automatically selecting between several (hypothetical) overloaded subroutines on the basis of the inheritance relationships of those (unspecifiable) parameter types.

Until now.

The Class::Multimethods module

The Class:Multimethod module[3] exports a subroutine (named multimethod) that can be used to declare other subroutines that are to be multiply dispatched.

The new dispatch mechanism looks at the classes or types of each argument to the multimethod (by calling ref on each) and determines the "closest" matching variant of the multimethod, according to the parameter types specified in the variants' definitions (see below for a definition of "closest").

The result is something akin to C++'s function overloading but more sophisticated, since multimethods take the inheritance relationships of each argument into account. Another way of thinking of the mechanism is that it performs polymorphic dispatch on every argument of a method, not just on the first.

Defining multimethods

Class::Multimethods::multimethod can be used to specify multimethod variants with the dispatch behaviour described above. It takes the name of the desired multimethod, a list of class names, and a subroutine reference, and uses this information to generate a corresponding multimethod variant within the current package.

For example:

package LargeInt;    @ISA = (LargeNum);
package LargeFloat;  @ISA = (LargeNum);

package LargeNum;
use Class::Multimethods;

multimethod
        divide => (LargeInt, LargeInt) =>
                sub {
                        LargeInt::divide($_[0],$_[1])
                };

multimethod
        divide => (LargeInt, LargeFloat),
                sub {
                        LargeFloat::divide($_[0]->AsLargeFloat(), $_[1]);
                };
This creates a (single) multimethod called LargeNum::divide with two variants. If the multimethod is called with two references to LargeInt objects as arguments, the first variant is invoked. If the multimethod is called with a LargeInt reference and a LargeFloat reference, the second variant is called.

Calling the multimethod with any other combination of LargeNum reference arguments (e.g. a reference to a LargeFloat and a reference to a LargeInt, or two LargeFloat references) results in an exception being thrown, with the message:

No viable candidate for call to multimethod LargeNum::divide
To avoid this, a "catch-all" variant could be specified:
multimethod
        divide => (LargeNum, LargeNum) =>
                sub {
                        LargeFloat::divide($_[0]->AsLargeFloat(), $_[1]->AsLargeFloat());
                };
Now, calling LargeNum::divide with (for example) a LargeFloat reference and a LargeInt reference causes this third variant to be invoked. That's because a LargeFloat is-a LargeNum (so the first argument is compatible with the first parameter), and a LargeInt is-a LargeNum too (so the second argument is compatible with the second parameter). Note that adding this third variant doesn't affect calls to the other two, since Class::Multimethods always selects the nearest match (see the next section for details of what nearest means).

This general "best fit" behaviour is extremely useful, because it means you can code the specific cases you want to handle (e.g. (LargeInt, LargeFloat)), and then provide one or more "catch-all" cases (e.g. (LargeNum, LargeNum)) to deal with any other combination of arguments. The multimethod automatically picks the most suitable variant to handle each actual argument list.

Finding the "nearest" multimethod

Of course, the usefulness of the entire system depends on how intelligently Class::Multimethods decides which version of a multimethod is "nearest" to a given set of arguments. That decision process is called dispatch resolution, and Class::Multimethods does it like this:
  1. If the types of the arguments given (as determined by applying ref to each in turn) exactly match the set of parameter types specified in any variant of the multimethod, that variant is immediately called.
  2. Otherwise, Class::Multimethods compiles a list of viable targets. A viable target is a variant of the multimethod with the same number of parameters as there were arguments passed. In addition, for each parameter the specified parameter type must be a base class of the actual type of the corresponding argument in the actual call (i.e. each argument must belong to a subclass of the corresponding parameter type).
  3. If there is only one viable target, it is called immediately. If there are no viable targets, an exception is thrown indicating that the multimethod can't handle the specific set of arguments.
  4. Otherwise, Class::Multimethod examines each viable target and computes its inheritance distance from the actual set of arguments. The inheritance distance from a single argument to the corresponding parameter is the number of inheritance steps between their respective classes (working up the tree from argument to parameter). If there's no inheritance path between them, the distance is infinite. The inheritance distance for a set of arguments is just the sum of their individual inheritance distances.
  5. Class::Multimethod then chooses the viable target with the smallest inheritance distance as the actual target. If more than one viable target has the same smallest distance, the call is ambiguous. In that case, the dispatch process fails and an exception is thrown. If there's only a single actual target, Class::Multimethod records its identity in a special cache, so the distance computations don't have to be repeated next time the same set of argument types is used . The actual target is then called and the dispatch process is complete.

Declaring multimethods

Class::Multimethods doesn't care which packages the individual variants of a multimethod are defined in. Every variant of a multimethod is visible to the underlying multimethod dispatcher, no matter where it was defined. In other words, all multimethod variants share a common namespace that is independent of their individual package namespaces.

For example, the three variants for the divide multimethod shown above could all be defined in the LargeNum package, or the LargeFloat package or the LargeInt package, or in the main package, or anywhere else. They don't even have to be declared in the same package.

Of course, to enable a specific multimethod to be called within a given package, the package must know about it. That can be achieved by specifying just the name of the multimethod (i.e. with no argument list or variant code), much like a standard Perl subroutine declaration:

package Some::Other::Package;
use Class::Multimethods;

# import "divide" multimethod
multimethod "divide";
For convenience, the two steps can be consolidated, and the declaration abbreviated to:
package Some::Other::Package;
use Class::Multimethods "divide";

Subroutine overloading

Class::Multimethod also doesn't care whether multimethods are called as methods or as regular subroutines. This is quite different from the behaviour of normal Perl methods and subroutines, where how you call them determines how they're dispatched.

With multimethods, since all arguments participate in the polymorphic resolution of a call (instead of just the first), it make no difference whether a multimethod is called as a method:

$num3 = $num1->divide($num2);
or a subroutine:
$num3 = divide($num1, $num2);
That means that Class::Multimethods also provides general subroutine overloading. For example:
package main;
use IO;
use Class::Multimethods;

multimethod
        test => (IO::File, DataSource) =>
                sub {
                        $_[0]->print( $_[1]->data() )
                };

multimethod
        test => (IO::Pipe, Queue) =>
                sub {
                        $_[0]->print( $_[1]->next() )
                                 while $_[1]->count();
                };

multimethod
        test => (IO::Socket, Stack) =>
                sub {
                        $_[0]->print( $_[1]->pop() )
                };

# and later...

test($some_handle, $some_data_ref);

Non-class types as parameters

Yet another thing Class::Multimethods doesn't care about is whether the parameter types for each multimethod variant are the names of "real" classes, or just the identifiers returned when raw Perl data types are passed to the built-in ref function. That means multimethod variants can also be defined like this:
multimethod stringify => (ARRAY) =>
        sub {
                my @arg = @{$_[0]};
                return  "[" . join(", ",@arg) . "]";
        };

multimethod stringify => (HASH) =>
        sub {
                my %arg = %{$_[0]};
                return  "{" . join(", ", map(    "$_=>$arg{$_}", keys %arg) ) . "}";
        };

multimethod stringify => (CODE) =>
        sub { return "sub {???}" };

# and later...

print stringify([1,2,3]);
print stringify({a=>1,b=>2,c=>3});
print stringify($array_or_hash_ref);
In other words, the names of built-in types (i.e. those returned by ref) are perfectly acceptable as multimethod parameters. That's a nice bonus, but there's a problem. Because ref returns undef when given any literal string or numeric value, the following code:
$str = "a multiple dispatch oddity";
print stringify( 2001 );
print stringify( $str );
will produce a nasty surprise:
No viable candidate for call to multimethod stringify() at line 1
That's because the dispatch resolution process first calls ref(2001) to get the class name for the first argument, and therefore thinks it's of class undef. Since there's no stringify variant with undef as its parameter type, there are no viable targets for the multimethod call. Hence the exception.

To overcome this limitation, Class::Multimethods allows three special pseudo-type names within the parameter lists of multimethod variants. The first pseudo-type--'$'--is the class Class::Multimethods pretends any scalar value (except a reference) belongs to. Hence, the following definition makes the two recalcitrant stringifications of scalars work correctly:

multimethod stringify => ('$') =>
        sub { return qq("$_[0]") };
With that definition in place, the two calls:
print stringify( 2001 );
print stringify( $str );
would produce:
"2001"
"a multiple dispatch oddity"
That solves the problem, but not as elegantly as it might. It would be better if numeric values were left unquoted. To this end, Class::Multimethods offers a second pseudo-type--"#"--which is the class it pretends numeric scalar values belong to. Hence, the following additional variant removes the quotes from stringified numbers:
multimethod stringify => ('#') =>
        sub { return $_[0] };
The final pseudo-type--"*"--is a wild-card or "don't care" type specifier, which matches any argument type exactly. For example, we could provide a "catch-all" stringify variant (to handle "GLOB" or "IO" references, for example):
multimethod stringify => ('*') =>
        sub {
                croak   "can't stringify a " . ref($_[0]);
        }
Note that, even though the "*" variant matches any possible argument type, it does so with a greater inheritance distance than any other possible match. In other words, a "*" variant is a last resort, used only if every other variant is unviable.

Recursive multiple dispatch

As defined above, the stringify multimethod still fails rather badly on nested data structures. For example:
print stringify(
        {  a => [1,2,3],
           b => {b1=>4,b2=>5},
           c => sub{3}
        }
);
will print out something like:
{a=>ARRAY(0x1001c23e), b=>HASH(0x10023ae6), c=>CODE(0x10027698)}
because when the hash reference is passed to the HASH variant of stringify, each of its keys and values is interpolated directly into the returned string, rather than being individually stringified.

Fortunately a small tweak to the ARRAY and HASH variants solves the problem:

multimethod stringify => (ARRAY) =>
        sub {
                my @arg = map { stringify($_) } @{$_[0]};                 
                return  "[" . join(", ",@arg) . "]";
        };

multimethod stringify => (HASH) =>
        sub {
                my %arg = map { stringify($_) } %{$_[0]}; 
                return  "{" . join(", ", map("$_=>$arg{$_}", keys %arg)) . "}";
};
The difference here is that each element in the array or hash is recursively stringified (within the map operation) before the container itself is processed. And because stringify is a multimethod, there's no need for any special logic inside the map block to distinguish the various possible nested data types. Instead, the recursive calls automatically select the appropriate variant for each element, so nested references and values are correctly processed. So now the call:
print stringify(
        {  a => [1,2,3],
           b => {b1=>4,b2=>5},
           c => sub{3}
        }
);
prints:
{"a"=>[1, 2, 3], "b"=>{"b1"=>4, "b2"=>5}, "c"=>sub{???}}

Resolving ambiguities and non-dispatchable calls

It's relatively easy to set up a multimethod such that particular combinations of argument types cannot be correctly dispatched. For example, consider the following variants of a multimethod called put_peg:
class RoundPeg;     @ISA = ( 'Peg' );
class SquareHole;   @ISA = ( 'Hole' );

multimethod
        put_peg => (RoundPeg,Hole) =>
                sub {
                        print "round peg in hole\n"
                };

multimethod
        put_peg => (Peg,SquareHole) =>
                sub {
                        print "peg in square hole\n"
                };

multimethod
        put_peg => (Peg,Hole) =>
                sub {
                        print "a peg in a hole\n"
                };
If put_peg is called like this:
my $peg  = RoundPeg->new();
my $hole = SquareHole->new();

put_peg($peg, $hole);
then Class::Multimethods can't dispatch the call, because it cannot decide between the variants (RoundPeg,Hole) and (Peg,SquareHole), each of which is the same inheritance distance (i.e. 1 derivation) from the actual arguments.

The default behaviour is to throw an exception like this:

Cannot resolve call to multimethod put_peg(RoundPeg,SquareHole).
The multimethods:
        put_peg(RoundPeg,Hole)
        put_peg(Peg,SquareHole)
are equally viable
Sometimes, however, the more specialized variants are only optimizations, and a more general variant (in this case, the (Peg,Hole) variant) would suffice as a default where such an ambiguity exists. In such situations, it's possible to tell Class::Multimethods to resolve the ambiguity by calling that general variant.

The resolve_ambiguous subroutine is automatically exported by Class::Multimethods and is used like this:

resolve_ambiguous
        put_peg => (Peg,Hole);
That is, it takes the name of the multimethod being "disambiguated", and the parameter list of the variant that is to be the default for ambiguous cases. Of course, the specified variant must actually exist at the time of the call. If it doesn't, Class::Multimethod ignores it and throws the usual exception.

Alternatively, if no variant is suitable as a default, some other (non-multimethod) subroutine can be registered instead:

resolve_ambiguous
        put_peg => \&disambiguator;
Now, whenever put_peg can't dispatch a call because it's ambiguous, disambiguator will be called instead, with the same argument list as put_peg was given.

Of course, resolve_ambiguous doesn't care what kind of subroutine it's given a reference to, so you can also use an anonymous subroutine:

resolve_ambiguous
        put_peg => sub {
                print "can't put a ", ref($_[0]), " into a ", ref($_[1]), "\n";
        };
Dispatch can also fail if there are no suitable variants available to handle a particular call. For example:
my $peg  = JPEG->new();
my $hole = Loophole->new();

put_peg($peg, $hole);
which would normally produce the exception:
No viable candidate for call to multimethod put_peg(JPEG,Loophole)
since classes JPEG and Loophole aren't in the Peg and Hole hierarchies, so there's no inheritance path back to a more general variant.

The resolve_no_match subroutine, which is also exported from Class::Multimethods, can be used to set up a handler for such cases. For example:

resolve_no_match
        put_peg => sub {
                my ($p, $h) = map {ref} @_;

                $_[0]->display($_[1]) 
                                if $p =~ /[JM]PEG/;

                call_plumber()
                                if $p eq 'ClothesPeg' && $h eq 'DrainHole';

                # etc.
        };
As with resolve_ambiguous, the variant or subroutine registered with resolve_no_match is called with the same set of arguments that were passed to the original multimethod call.

Debugging a multimethod

Class::Multimethods provides a (non-exported) subroutine called analyse, which takes the name of a multimethod and generates a report (to STDERR) listing the behaviour of that multimethod under all feasible combinations of its various potential arguments. For example, given the definitions of the test multimethod shown earlier, a call to:
Class::Multimethods::analyse("test");
will print out an analysis of the dispatch behaviour for all possible combinations of an IO::File, IO::Pipe, or IO::Socket object (as the first argument), and a DataSource, Queue, or Stack object (as the second argument). Furthermore analyse will examine the class hierarchies of which these classes are a part, and generate test cases for any ancestral or descendant classes as well. For instance, for the first argument it will also test objects of the classes IO::Handle, and IO::Seekable, (since these are both ancestral classes of IO::File), and for the second argument it might also test objects of the classes PriorityQueue and FixedLengthQueue, if these where derived from the Queue class.

The analyse method iterates through every possible combination of argument types and reports which variant (if any) would have been called for that set of arguments. Combinations that result in ambiguities or failure to dispatch are reported separately. Even more usefully, for argument sets where a single variant would be successfully dispatched, analyse also reports any other viable candidates (i.e. other variants that could handle the call, but which were at a greater inheritance distance from the argument list, and so not selected). This can be especially useful in determining why a particular variant was not called as expected.

Conclusion

Multiple dispatch is a specialized technique that handles a small but important class of problems where two or more objects drawn from different hierarchies must interact polymorphically. Although Perl doesn't provide an built-in multiple dispatch mechanism, one can be added to it.

The Class::Multimethods module enables variants of a multimethod to be declared and used, either as object methods or as independent, overloaded subroutines. It provides a sophisticated breadth-first dispatch resolution mechanism and allows the implementor to dictate resolution strategies when dispatch would normally fail.

The module is available from the CPAN.

References

  1. Conway, D., Object Oriented Perl, Chapter 13, Manning Publications, 1999.
  2. Conway, D., Multiple Dispatch in Perl, The Perl Journal (to appear).
  3. http://www.perl.com/CPAN/authors/id/DCONWAY/