Backplanes or Cables?
Originally published November, 2000
© 2000, 2005 Carlo Kopp
The quest for speed in computer hardware is without doubt one of the principal drivers of our industry. Recent advances in achievable processor speeds are now pushing the boundaries of many well established techniques in computer system engineering.
Recent issues explored Moore's Law and related developments in machine architecture. In this month's issue the future of the backplane, and the motherboard, will be scrutinised in a similar manner.
Busses, Backplanes and Motherboards
The bus is a feature fundamental to modern computer design. Colloquially stated, busses come in all shapes and sizes, but most internal busses in a computer share the architectural attribute of being a bunch of parallel wires, each of which carries 1 bit of data, with some carrying control signals.
In a sense, a bus is a digital highway which connects the various functional blocks within the machine. Regardless of the protocol in use, a device must gain access to the bus in order to transfer data to and from another device.
Busses have evolved considerably over recent decades. An early minicomputer bus would have a datapath, the portion of the bus reserved for transferring data, which was either 8- or 16-bits wide. The address path, the portion of the bus reserved for the transfer of addresses between devices in the machine, started out in the region of 16-bits, but very quickly expanded to accommodate growing memory sizes.
Bandwidth is almost everything in a bus, from a performance perspective. There are two basic ways in which the bandwidth of a bus can be increased. Either you make the datapath wider, or you run the bus at a faster speed.
While bus protocols can have important effects on performance, they cannot typically compete with the more dramatic effects of bus width and bus clock speed on throughput.
Consider the effect of quadrupling the width of a bus. At the same speed, and using the same protocol, the bandwidth is increased fourfold. Alternately, consider the effect of clocking the bus four times faster, using the same or a similar protocol, again the bandwidth is increased fourfold.
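The fourfold scaling argument can be sketched numerically. This is a back-of-the-envelope illustration, assuming an idealised bus transferring one word per clock cycle; real bus protocols add arbitration and handshaking overheads on top of these peak figures.

```python
# Peak bus bandwidth scales linearly with both datapath width and clock speed.
# Idealised model: one transfer per clock cycle, no protocol overhead.

def bus_bandwidth_bytes_per_s(width_bits: int, clock_hz: float) -> float:
    """Peak bandwidth of a bus, assuming one transfer per clock cycle."""
    return (width_bits / 8) * clock_hz

base   = bus_bandwidth_bytes_per_s(64, 100e6)    # 64-bit bus at 100 MHz
wider  = bus_bandwidth_bytes_per_s(256, 100e6)   # quadruple the width
faster = bus_bandwidth_bytes_per_s(64, 400e6)    # quadruple the clock

print(base / 1e6)       # 800.0 MB/s for the baseline bus
print(wider / base)     # 4.0 - fourfold from width alone
print(faster / base)    # 4.0 - fourfold from clock speed alone
```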
In practice there are penalties which arise from higher clock speeds and wider datapaths.
Let us first consider the effects of increasing datapath width. The first and obvious penalty is that we require N-fold the number of wires to achieve an N-fold increase in bandwidth. Since these wires must be fabricated as tracks in printed circuit boards, and connected to chip packages on those boards, an N-fold increase in the number of datapath wires in the bus requires an N-fold greater printed circuit board area, per plane in the board, and N-fold the number of pins on the bus interface chip. This may not result in an N-fold increase in cost, but it will certainly be reflected in the final bill. Are there other costs to be paid? Reliability depends upon the number of solder joints, pins and vias in a circuit board, and increasing these N-fold increases N-fold the number of failure items in the board or assembly. What about power dissipation? Increase the number of lines to be driven by any chip N-fold, and you increase the power dissipated by the line drivers in every bus interface in the machine N-fold.
Another problem not to be scoffed at is density, especially in tightly packed hardware such as motherboards. Increasing the bus width may simply become impossible beyond a certain point, as it becomes too hard to route the bus lines inside the board.
In practice modern busses range between 64-bits and 256-bits wide, beyond which further gains become increasingly difficult to achieve.
Speed is no less traumatic a performance parameter to improve upon in a bus. The speed at which any bus can be clocked is a complex function of a number of electrical parameters.
The first limit we must grapple with is the speed performance of the driver circuits and receiver circuits, embedded these days inside bus interface chips, or bus interfaces in more complex chips.
The speed of drivers and receivers is largely limited by the type of transistor device used in these, and the electrical circuits used in the design. A driver must be capable of squirting or sinking a large spike of current into or from the highly capacitive load presented by the bus wire, quickly enough to force the voltage on the bus to a logical 1 or logical 0 well within the duration of a single clock cycle. If that cycle is of the order of a nanosecond, then this becomes a challenging task. This is because the easiest way of making transistors faster is to make them smaller, which is at odds with the need to handle large peak electrical currents. The problem is not unlike that seen with microwave power transistors, in that the designer has to deal with the conflicting needs of speed and transistor area on the die. Making transistors big costs speed, making them small can result in a localised pool of molten silicon.
While bus receivers do not need to handle large currents, they do need to have a high electrical gain, to force the output of the receiver to a logical 1 or logical 0 as quickly as possible, once the voltage at the input of the receiver circuit crosses a specified threshold. Gain and bandwidth in transistors tend to be mutually exclusive, in the sense that any given transistor design has a hard limit, called the gain-bandwidth-product. Increase one in a circuit, and you lose the other.
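The gain-bandwidth trade-off can be illustrated with a trivial calculation. The 10 GHz figure used here is an assumed, purely illustrative gain-bandwidth product, not a number from any particular process.

```python
# Gain-bandwidth product: for a given transistor design, gain x bandwidth is
# roughly a constant, so trading one for the other is unavoidable.
# Assumption: an illustrative gain-bandwidth product of 10 GHz.

GBW_HZ = 10e9

def bandwidth_for_gain(gain: float) -> float:
    """Available bandwidth once a given gain is demanded of the circuit."""
    return GBW_HZ / gain

print(bandwidth_for_gain(10) / 1e9)    # 1.0  - 1 GHz of bandwidth at a gain of 10
print(bandwidth_for_gain(100) / 1e6)   # 100.0 - only 100 MHz at a gain of 100
```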
Therefore, the process of driving ones and zeroes at a very high speed through a piece of wire connecting a bus driver chip and a bus receiver chip can become a major challenge for an electrical engineer, in circuit design issues alone. It is worth pointing out that the integrated bus drivers and receivers which many logic hardware designers take for granted as simple building blocks, kept a circuit design engineer in the semiconductor plant awake for many a late night session.
Does the pathology end here? Not quite, since there is another major electrical design challenge the bus designer must grapple with. This is the transmission line behaviour inherent in any electrical wire or circuit board trace of appreciable length when driven at a decent speed.
Transmission line behaviour is a wave phenomenon very common in nature. When an electromagnetic wave hits the boundary between two materials with different electrical properties, part of the wave propagates through the boundary and part of the wave is reflected back in the direction it came from. Indeed, this is the idea upon which radar and time domain reflectometry are built.
At any appreciable speed, any wire, cable or track in a circuit board starts to exhibit transmission line behaviour, usually at the point where the length of the connection becomes comparable to or greater than the length of a pulse on the wire, cable or track. Travelling down the wire at the speed of light, a one nanosecond pulse will be around 1 foot in length.
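The one-foot figure follows directly from length = velocity × duration. A small sketch, using the vacuum speed of light; the velocity factor of roughly 0.5 for a signal in fibreglass board material is an assumed, typical value, not a figure from the text.

```python
# Physical length of a pulse on a line: length = velocity * duration.
# In a real circuit board the signal travels slower than c due to the dielectric.

C_VACUUM = 3.0e8  # speed of light in vacuum, m/s

def pulse_length_m(duration_s: float, velocity_factor: float = 1.0) -> float:
    """Spatial length of a pulse of the given duration on the line."""
    return C_VACUUM * velocity_factor * duration_s

print(pulse_length_m(1e-9))        # 0.3 m - about 1 foot for a 1 ns pulse in vacuum
print(pulse_length_m(1e-9, 0.5))   # 0.15 m - assumed velocity factor ~0.5 in fibreglass
```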
Under these conditions the electrical impedance of the line becomes important. Electrical impedance is determined by the geometry of the line and the dielectric properties of the insulator used. A 2.5 mm wide trace on a circuit board using standard fibreglass has around 50 Ohms of impedance, a ribbon cable around 120 Ohms.
Ideally, if the line has a constant electrical impedance, and we place a resistive load with identical impedance at either end of this line, no reflections will occur and whatever waveform we drive into this line appears at the other end with the same shape, albeit a little diminished in size. This is because the load at either end appears to the impinging waveform as more of the same transmission medium, with the same impedance.
What happens if we change the impedance of the line at some point, for instance by changing the geometry of the connection, putting a kink into it, or removing a load from either end? The place at which the change occurs begins to reflect the impinging wave. What happens if we have more than one impedance change in the line? We get reflections bouncing back and forth along the line.
If we observe this behaviour at either end of the cable, we will find that it results in nasty distortion and noise-like hash added to the waveform we put into the line. This distortion can result in spurious doubling of leading and trailing edges on waveforms, or an actual delay before the logical 1 or 0 is understood by a receiver.
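The strength of such a reflection is captured by the standard transmission line reflection coefficient, (Z_load − Z_line)/(Z_load + Z_line). A sketch, using the 50 Ohm board trace and 120 Ohm ribbon cable impedances mentioned earlier:

```python
# Reflection coefficient at an impedance discontinuity, the standard
# transmission line result: gamma = (Z_load - Z_line) / (Z_load + Z_line).
# gamma = 0 means no reflection; |gamma| = 1 means total reflection.

def reflection_coefficient(z_line_ohm: float, z_load_ohm: float) -> float:
    """Fraction of the incident wave amplitude reflected at the discontinuity."""
    return (z_load_ohm - z_line_ohm) / (z_load_ohm + z_line_ohm)

print(reflection_coefficient(50, 50))     # 0.0  - matched load, no reflection
print(reflection_coefficient(50, 120))    # ~0.41 - e.g. board trace into ribbon cable
print(reflection_coefficient(50, 1e12))   # ~1.0  - open circuit reflects everything
```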
Herein lies the greatest single challenge to running busses at high speeds.
Backplanes and Motherboards
A backplane is a printed circuit board into which other circuit boards are plugged, at right angles. The backplane contains the parallel tracks of a bus. Connectors soldered or pressed into the backplane allow each board to tap into the bus connections in the backplane. While the backplane itself usually contains no driver or receiver chips, the circuit boards plugged into it, with CPUs, memory arrays and I/O devices, always do.
If we are clocking the backplane bus at 10, 20 or 50 MHz, or even a bit faster, this is a fairly tolerable environment from a transmission line perspective. This is because the backplane is electrically short relative to the waveforms we drive into it, and therefore little distortion will arise. Such backplanes were the basic building block of 1970s and 1980s super-minicomputers used as servers, and since then, most large multi-processing servers.
What happens if we try to drive it faster, say at 100, 200 or more MegaHertz? The first and obvious problem is that every connector in the backplane results in a localised change in the electrical impedance of every single connection in the backplane. This means that the backplane becomes noisy with reflections. If it is long enough, and we drive it fast enough, then another problem will arise, termed skew. A receiver which is closer to a driver asserting a 1 or a 0 on this bus will see it before a receiver which is further down the bus. Getting the timing right then becomes a serious challenge in itself.
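The skew between a near and a far receiver is simply the difference in their propagation delays. A sketch, assuming, as an illustration, a propagation velocity of half the speed of light, a typical figure for signals in fibreglass board material; the receiver positions are hypothetical:

```python
# Skew: difference in signal arrival time between a near and a far receiver
# sharing the same bus line. Assumption: propagation at half the speed of
# light (~15 cm/ns), typical for fibreglass circuit board material.

V_PROP = 1.5e8  # assumed propagation velocity, m/s

def skew_s(near_m: float, far_m: float) -> float:
    """Arrival time difference between receivers at the given distances."""
    return (far_m - near_m) / V_PROP

s = skew_s(0.05, 0.45)   # hypothetical receivers 5 cm and 45 cm from the driver
print(s * 1e9)           # ~2.67 ns - over half a clock cycle at 200 MHz
```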
Dr Seymour Cray was the first designer to run into genuine problems with skew, in the early Cray machines. The characteristic round shape of a Cray 1 or 2 is the result of an annular shaped backplane placed into the core of the machine, into which the circuit boards carrying the logic are plugged, not unlike petals on a flower. By wrapping the backplane around a cylinder, Cray was able to get the minimal bus length geometrically possible for his machine design.
Motherboards are much smaller than backplanes, but create no less daunting problems. While the electrical length of a motherboard is much smaller than that of a backplane, unlike the backplane, which has nice parallel connections of constant thickness, usually sandwiched between copper ground planes, the motherboard is typically the proverbial 'cat's breakfast' of wiring. Since the motherboard has a number of chips on it, each with variously irregular pin layouts, a bus line must snake its way from chip to chip all over the board. With a 64- or 128-bit wide bus, this can become an almighty bird's nest of connections within the board.
Every time a pair of connections must cross, they have to be routed through different layers in the board. Every jump between layers produces a localised change in impedance, and thus reflections. Since the wiring geometry is irregular, there can be large variations in length between lines, creating not only skew but also differing load behaviour. Like all nice wiring birds' nests, crosstalk between wires becomes an issue.
Having done considerable work with backplane and especially motherboard designs in a previous life, I am undecided as to which is the bigger headache to design and debug! Perhaps it is best to say that both have their better moments.
What is the speed limit for backplanes and motherboards?
In practice, clocking them beyond 100 MHz becomes difficult. While transmission line behaviour, crosstalk and skew are headaches within themselves, the challenges of driving signals out of chips at such speeds also become daunting.
With 1 GHz and faster CPUs now becoming available, bus bandwidth off the chip will become the biggest barrier to computer system performance growth.
Breaking The Speed Barriers
What choices do contemporary machine designers have to get more speed out of busses? Several means exist to tackle the problems which we see with established motherboard and backplane technology.
A technique which has become very popular in Unix workstation and server designs is that of replacing the snaking bus connections between processor, memory and device controller chips on the board with a high density crossbar switch device. Instead of having each and every device tap into a single wire, each and every device uses a simple and dedicated set of short connections to pins on the switch device. When one device wishes to transfer data to or from another, it routes the data through the switch device. In this manner, the wiring is much simplified and also made much shorter, allowing for much faster clock speeds. The price to be paid is that a large, expensive and power hungry crossbar switch array sits somewhere in the middle of the board.
A solid state crossbar switch solves one set of problems but introduces another, in that we end up with a monstrously large device with hundreds of pins, to support even a modest number of CPUs, memory controllers and I/O controllers. For the foreseeable future, switches will dominate the server motherboard market.
An alternative play is to replicate the traditional backplane strategy within the motherboard, but in miniature. If we plug the CPU modules, memory array modules, and perhaps some I/O controllers into the motherboard, and make them all small enough, we can produce a miniature implementation of the parallel wire backplane which we know and love. Shrunk to motherboard dimensions, with bus line lengths of inches, many of the troubles can be staved off for a good few tens of MegaHertz more.
What is the next step beyond these strategies, seeing that inevitably they too will run out of speed over the next few years?
One approach is to adopt a trick used by designers of Emitter Coupled Logic (ECL) boards for at least two decades: reverting to cables for interconnections between modules, rather than hard wiring in a printed circuit board. For example, in a 400 MHz clocked ECL logic board, critical data and clock connections would be run across the top of the board using identical lengths of coaxial cable, bypassing the pathology of embedded connections.
At very high speeds, a good quality ribbon cable, twisted pair ribbon cable, or coaxial ribbon cable exhibits much nicer electrical behaviour than a track embedded in a circuit board. Moreover, every connection within the cable is identical in length, avoiding the skew problem.
In such an arrangement, the wiring and connectors in the printed circuit board are committed to power feeds and earth plane connections. Signals are then run between modules using flat high density cables.
A good example of such a strategy being applied at a system level is the new SGI design approach in the NUMA series machines. These are large servers or power user machines, which are logically and functionally divided into modules termed bricks. A brick contains its own internal motherboard carrying devices such as CPU and memory chips, a power supply and its own cooling. The bricks are then interconnected using high speed parallel cables, each of which is clocked at hundreds of MegaHertz, providing point to point throughput claimed to be as high as 3.2 GigaBytes/s.
The reality of trying to cram dozens or hundreds of CPUs into a large modular multiprocessor was incompatible with any shared system level backplane design, for the required throughput between modules.
Large systems such as the NUMA series aside, we are likely to see a growing trend toward very high density and high speed parallel flat cables for connections inside high performance machines. Not surprisingly, the latest SCSI-3 incarnation is based upon a star topology using connections from an electrical bus interface device to multiple drives, departing fundamentally from the traditional SCSI cable snaking from drive to drive.
It is not inconceivable that a desktop system in the near future will be built around a stack of modules plugged into a power distribution motherboard, with short high density flat cables running busses between specific modules. For connections between CPUs, memory and top level I/O bus controllers, this may be the only way to achieve the required bandwidth.
It is also not inconceivable that future processor or I/O device chip carriers or packages might end up with built in connectors to support such cable connections directly.
The current market growth in 1 GHz class CPUs is clearly pushing the limits of established packaging and interconnection technology. It will be most interesting to observe developments in coming years.
Last Updated: Sun Apr 24 11:22:45 GMT 2005
Artwork and text © 2005 Carlo Kopp