<1999<   ^up^

Computer Controlled Car

ECS4397 BCSE Thesis
(16 credit points)

by D K L TUNG

BCSE Honours 2000,
Department of Computer Science and Software Engineering,
Monash University(Clayton),
Australia.

Supervised by

Dr. L Allison

Second Reader

Dr. R Lisner

Submitted in October 2000
This is a HTML Document

http://www.csse.monash.edu.au/hons/projects/2000/Daniel.Tung/





ABSTRACT This paper presents an overview of the computer controlled car project. It describes the process involved in bridging the existing system to a PC and details a vision-based track extraction method. A software-driven navigation unit is implemented which enabled a RF car to drive itself autonomously by following a track. The use of Autonodes is motivated.

The software-driven navigation unit generated a maximum 25.01 moves per second. It was found that this speed was dominated by the time required to capture one frame. The overall system integration was completed on a Pentium III 500Mhz computer running Linux (RedHat 6.2)




Acknowledgments

I would like to thank my supervisor Dr. L Allison for his advice and support throughout the year. Thank you also to Jamie Scgulia, Gary Evans and David Arnold for their technical support and prompt responses.

I also would like to thank my sister Karen who always cheered me up when I was down. Thank you for being there.

And last of all, I reserve my deepest thanks for my parents for their continuous encouragement and support throughout these five years of my university life. Therefore I would like to dedicate this thesis to them.

Table of Contents

1. Introduction

2. Project Descriptions

2.1 Previous Work
2.2 Assumptions
2.3 Project Goals
2.4 Achievements

3. Research and Development

3.1 The Existing Architecture of CCC
3.2 PCI Frame-grabber Choices
3.3 Track Detection Methodologies

4. The PC Re-Implementation

4.1 DIN8 - RS232
4.2 PCI Frame-grabber Configuration
4.2.1 Configuring the Linux Kernel
4.2.2 Linux Module Initialization
4.2.3 Video For Linux (V4L)
4.3 C++ Class Design
4.4 Testing Strategies

5. Vision Based Software-Driven Navigation Design

5.1 Track Detection
5.1.1 Image Analysis - How to speed it up?
5.1.2 Autonodes - idea and design
5.1.3 Autolines
5.2 A Threaded Design Model
5.2.1 Low Level Controller (Consumer-Producer Topology)
5.3 Controlling the Car
5.3.1 Presence of Noise

6. Results and Discussions

6.1 Speed of Capture
6.2 Speed of Autolines
6.3 Speed of drawing one frame to screen
6.4 Speed of Navigation

7. Conclusion

8. References

9. Glossary

10. Appendices

Appendix A: Instructions for compiling and using the program
Appendix B : More capture images
Appendix C : Capture speed plots
Appendix D : Autoline speed plots




Chapter 1. Introduction

The idea of utilizing a computer to achieve a fully automated vehicle is not new. It was the subject of various research projects during the past few decades [1,2]. This project, Computer Controlled Car (CCC), is a continuing project. There has been on-going development since 1998. In previous years [5], a basic prototype was developed with sufficient functionality that a computer was able to control the movement of a RF remote controlled car at low speed.

This year, the focus is on bridging the existing implementation to work with a PC. The reasons behind this is two-fold: Firstly, in the light of rapid explosion in CPU power for PCs, their speed-cost ratio is more likely to increase in the future. Thus, a PC can provide a more cost-effective platform for future extension to CCC. Secondly, in previous years, the difficulties in obtaining in-depth documentations for video capture provided by proprietary software were apparent [5]. By utilizing an open source video capture library, it is hoped that a PC re-implementation can eliminate the performance bottleneck in video capture. Consequently, it may provide a more accessible platform for future research and testing.

Another goal this year is to produce a better vision based software navigation unit that may utilize some "smarter" method to perform track extraction. It is hopeful that this navigation unit can be served as a basis for further research in track extraction and collusion detection.

Research was conducted into how track extraction can be sped up. Different ways to perform track extraction were examined. Eventually, a method that takes advantage of several aspect of image analysis was implemented.

The technologies that enable CCC are:

  • low cost frame-grabbing devices,
  • low cost CCD camera, UHF/VHF modulator and transmitter,
  • RF controlled devices,
  • reasonable computation power.

This report is organized into the following chapters. Chapter 2 summarizes the background information of this project. Chapter 3 presents the research and development carried out for a PC re-implementation. Overviews on the existing system's architecture and the current track extraction method are provided. Chapter 4 addresses the process carried out for the PC re-implementation, with detailed illustrations on how to configure a frame-grabber card. This is followed by a brief summary on the design of various C++ classes. Chapter 5 discusses the navigation control unit and the properties of Autonodes and Autolines. A detail investigation into Autonode is presented. Chapter 6 presents the results and discussions of measuring the software navigation unit. Finally, Chapter 7 summarizes the achievements and completed goals.

Chapter 2. Project Descriptions

2.1 Previous Work

This project is an extension of the project completed by Tim Bruton [5]. The existing system was built during 1999. It included a prototype car, a modified hand-held controller with a DIN-8 interface and a video transmission unit.

A Silicon Graphics Indy Computer was utilized to interpret the captured images. The VL library API, also provided by Silicon Graphics, was used to program the capturing routines. No X user interface was implemented. All user interactions were provided on the console window.

A simple navigation sub-system was implemented. The recognition of the track was accomplished in 2.8ms by analyzing 188 pixels on every frame received. A parent/child architecture was utilized (by forking) to provide a more efficient model so that the system can process the image signal and control the car concurrently.

By the end of 1999, the car was capable of driving itself at a speed of 0.4m/s [5]. For more information on the existing prototype, refer to Section 3.1 Existing Architecture of CCC.

2.2 Assumptions

Assumptions are made in this project to overcome some important issues. These assumptions are outlined in the following paragraphs.

  • Assume lighting conditions is sufficient.
    Lighting condition is important. Since the navigation unit completely relies on visual clues, the correct operation of the car will fail mysteriously under insufficient lighting environment.

    Ultimately, the view that the car sees is provided by the camera. Thus, the sensitivity of the camera readily suggests what level of lighting condition is sufficient. For cameras that are equipped with automatic gain control, such as the one being utilized in this project, the image is stabilized automatically to an acceptable level of contrast and brightness.

    Consequently, the assumed condition is that if there is sufficient light for a human to drive safely, there should be sufficient light for the navigation unit to operate correctly.

  • Assume high speed machines are used
    The software navigation unit requires a reasonable amount of CPU time in order to drive the car effectively. The PCI video capture card also consumes a reasonable amount of bandwidth on the memory bus, considering it was capturing constantly at 24fps. The system implemented runs on a Pentium III 550Mhz machine. The speed of these machine greatly helps in giving the car a "smooth drive".

    2.3 Project Goals

    The goals of this project are :

    1. to bridge the existing system to work with PC
    2. to implement a navigation unit that make use of some “smarter” algorithms.

    2.4 Achievements

    The existing system implementation was successfully bridged to work on a PC running Linux (RedHat6.2). A completely new navigation unit was built.

    A bttv video capture card was purchased after a survey of six frame-grabber cards on the market. A multi-threaded X window user interface was constructed which also provided fast access for video capture. Autonode was devised to simplify and localize the tracking logic. Autoline was derived to enable Autonodes to work in a more cooperative manner.

    The time to acquire one frame was verified to vary under different reception signal strength. Profiling was completed on the inner loop of the navigation unit.

  • Chapter 3. Research and Development

    3.1 Existing Architecture of CCC

    As mentioned in Chapter 2, the existing prototype had three main components: a RF car, a video transmission unit and a modified hand-held controller. This section further describes the existing prototype architecture and the flow of information through different parts of the system.

    The prototype car itself had a camera and a video modulator mounted on board. When the video modulator began transmitting, the signal would be picked up by a VCR. This signal was then fed into a Silicon Graphics INDY computer.

    By analyzing the real-time images provided by the on board camera, a simple software navigation system was implemented. A sequence of moves were forwarded to the RF controller which in turn controlled the movement of the car.

    The basic architecture of the overall design remains the same this year. The following diagram illustrates how different parts of the system integrate together.

    Diagram 1 : Existing System

    3.2 PCI Frame-grabber Choices

    A frame-grabber is a device that enables a computer to interpret a video signal. Since normal PCs do not readily support video input, a PCI based frame-grabber had to be purchased so that a video link could be established between a PC and the camera mounted on the car. This section describes the choices made in choosing a frame-grabber card.

    3.2.1 Frame-grabber Testing

    A PCI frame-grabber card had to be purchased to allow a PC implementation of the visual navigation unit. In choosing a capture card, the main considerations are performance, cost and availability of documentations.

    Throughout the first six months of CCC, there is a on-going search for a suitable capture card. Ideally, the capture card should be affordable with a reasonable performance rating. Several capture cards were tested under Linux and Windows 98. Refer to the table below for suitability ratings.

    The capture cards considered are :

    • Xtreme98 cost AUS$245 from Harvey Norman.
    • Huapuange cost AUS$450 from Harvey Norman.
    • FlyVideo II EZ cost AUS$302 from Dicksmith (includes a CCD camera).
    • Winnov VO
    • FlyVideo 98 cost AUS$98.70 from Myer.

    Card ID Price
    Suitability
    ratg*

    best-10
    Comment
    Winnov VO (borrowed from ECSE department - 7 May) $500+ not really suitable
    Matrox Meteor II 
    (Dongong - 6 June)
    $600+ -1  too expensive
    Xtreme98 
    (Harvey Norman - 10 July)
    $300+ N/A  did not test its performance.
    FlyVideo EZ 
    (Dicksmith - 10 August)
    $302 comes with a low quality color CCD camera, two composite connectors only. Not same as FlyVideo98
    Haupuange TV 
    (Harvey Norman - 21 Aug)
    $200+ N/A  did not test its performance
    FlyVideo98
    (Myer - 22 August)
    $98.70 9 with TV tuner as bonus!
    * Suitability rating refers to whether the capture card fits the application of CCC, taken into account its cost and performance. All tests were performed on a pentium133 with 48mb ram running Linux 2.2.16 and Windows 98. No on board hardware compression were utilized. The goal here was to let the capture card to fill up a given buffer as quick as possible and immediately blitz that to the screen. XShm was used under X for Linux.
    Table 1 : A Survey of several frame-grabbers in the market

    Winnov VO
    The first capture card tested was the "Winnov VO" from
    Winnov. This card had no sound support and it cost around AUS$500. The performance of this card was reasonable. It used more than 75% of CPU time giving a ~20fps frame rate. At this rate, the CPU utilization was 75% (which was quite high). Nevertheless, the heavy use of system resources during capture was quite apparent in most of the frame-grabber encountered. With this particular card, only 25% CPU time were available for track extraction and navigation control.

    Considering the CPU power required for track extraction and navigation control, a 25% CPU time was not very practical. Frame skipping was then tested along with a few performance estimations. Obviously, if there was a better solution, this card was not ideal.

    FlyVideo EZ
    The second card tested was the FlyVideo EZ from Lifeview. This card utilized a bt878 chip, which claimed to have no CPU overhead [10] for projecting captured images onto a screen buffer. This card provided of two composite inputs only. It also came with a low-end CCD camera.

    The performance of this card was quit impressive considering its price, AUS$302. However, the bonus CCD camera was definitely not suitable for CCC. This is because it was not equipped with automatic gain control. (if lighting condition changed, i.e: entering a shadow area, the received image became very dark and impossible to interpret intelligently)

    FlyVideo 98
    During August, it was decided to purchase the Flyvideo 98 capture card from LifeView. This card was also equipped with a bt878 chip on-board. Its design was, however, quit different from the FlyVideo EZ mentioned above. It accepted three inputs and one output which were TV Tuner (PAL), Composite1, S-video and sound respectively. This card was equipped with a software adjustable TV tuner. This suggested that the video signal pick up could be done by the card directly. Therefore, the VCR stage as in Diagram 1 was eliminated. Images were captured by directly tuning into the transmitting frequency.

    Apart from that, the bt878 chip on-board provided a fast image capturing capability. As mentioned before, the distinct advantage of utilizing a bt878 is that it has a minimal (or none) CPU overhead when capturing [10]. When initially tested, this board gave a ~20fps with only 15% CPU utilization. An excellent 85% CPU time was spared, which should be sufficient for navigation control related task.

    Finally, this board cost only AUS$98.70 from Myer.

    3.3 Track Detection Methodologies

    An extensive list of track detection methodologies was compiled in previous years [5]. This list was supported by detail comparisons and supportive explanations. In summary, because a number of assumptions were made in CCC, track detection can utilize the attributes inherited from these assumed conditions (such as black colored tracks). Investigation into these assumptions resulted in vast improvements in speed for performing track detection, in comparison to the generalized methods presented on that list.

    Ultimately, the reduced use of CPU time is beneficial for real-time applications. Furthermore, the spared CPU cycles could be utilized for performing other real-time image analysis task such as collusion avoidance.

    A detailed investigation in track detection will not be repeated here. For a detailed summary, refer to [5].

    Chapter 4. The PC Re-Implementation

    This chapter details the implementation of components that have been introduced to CCC for the sake of a PC implementation. The term "frame-grabber" and "video capture card" are used interchangeably.

    Bridging the existing system to work with a PC is a challenging task. The most difficult part is in providing video capture. This is so because it is necessary to have a good understanding in many areas, such as OS design, hardware architectures and streaming process, before an efficient capture could be implemented. Thus, the vast amount of information poses a time consuming job for the implementor. The difficulties are rendered even harder when documentations provided are insufficient, in which case a trial and error method have to be carried out.

    The steps taken in order to provide video capture on a PC are: purchasing a suitable frame-grabber card, re-configuring the OS (Linux kernel), and programming with a set of APIs on the capture device. These are discussed in the following sections. But first, let's consider how the connection between the existing hand-held controller and a computer was re-established via a com port.

    4.1 DIN-8 to RS-232 Cable

    DIN-8 is a set of connection specifications. A DIN-8 port is commonly found in Macs and INDYs. In contrast, a PC is usually equipped with two RS-232, commonly known as a com port. A DIN-8 to RS-232 cable was sought after because the existing hand-held controller was only equipped with a DIN-8 connector output.

    In order to interface this existing hand-held controller with a PC, a DIN-8 to RS-232 connection must be established. Unfortunately, there were no such cables at the stores. A cable was constructed manually by breaking apart a DIN-8 to DIN-8 wire. One end of this wire was soldered onto a female type RS-232 connector. An extensive test on the controller command-set was performed and mapped. The command set available for navigation is summarized in the following table. The port parameters used must be: 9600, none, 8, 1, representing baud, parity, data and stop respectively.

    Value*
    Symb
    Action
    0000 (0)
    HALT NIL
    0001 (1)
    Backward
    0010 (2)
    Forward
    0011 (3)
    Forward (fast mode)
    0100 (4)
    Right turn
    0101 (5)
    Right & Back
    0110 (6)
    Right & Forward
    0111 (7)
    Right & Forward (fast mode)
    1000 (8)
    Left
    1001 (9)
    Left & Back
    1010 (10)
    Left & Forward
    1011 (11)
    Left & Forward (fast mode)
    1100 (12)
    HALT NIL
    1101 (13)
      Same as 1
    1110 (14)
      Same as 2
    *value represents the char value being sent to the serial port. (i.e: value 3 means ’\003’)

    Table 2 : A Command set for controlling the movement of the car

    4.2 PCI Frame-grabber Configuration

    This section describes the process involved in configuring a frame-grabber to work with a PC. An introduction on how to setup a video capture card on Linux is presented. This is followed by a summary on the kernel level API function set - Video for Linux (Version 1) [4]. The main interest here is to present the user sufficient information so that the results of this project can be replicated.

    4.2.1 Configuring the Linux Kernel
    The frame-grabber purchased belongs to a family of capture cards commonly known as bttv cards(with a bt-878 chip on board). Linux (Kernel 2.2.12) comes with a standard bttv.o module (driver). It enables a user to interface with a bttv capture device. However, for some newer capture devices, such as the one purchased, the standard driver fails to work (in both Slackware 7.1 and RedHat 6.2).

    Most of the capture card packages sighted did not come with a Linux driver. It was soon discovered that some patches were available for a few newer bttv cards. This patch was applied on the Linux kernel source. In particular, the bttv.c file had to be edited manually in order to specify the capability of a bttv card and its relevant input channel addresses (i.e: tuner, composite1, s-video). A new bttv.o module was compiled. The following section describes how this was done.

    Compiling The Kernel

    The kernel that came with both Slackware and Redhat Linux distributions was capable of supporting a bttv interface. Both distributions provided a bttv.o module that was compiled in as default. A re-compilation of the entire kernel was not necessary. Only a specific module (bttv.o) had to be re-compiled, in order to specify the hardware address of the inputs.

    Compiling The Module (bttv.o)

    Note that it was necessary to re-compile the bttv.o module because the video capture card purchased was recently new in comparison to the kernel build date. It is expected that in future release of Linux, applying the patch manually is not required. Thus, re-compiling the bttv.o module might not be necessary. Refer to your updated Linux documentation for whether the patch is required. The Linux version utilized in this project was Redhat 6.2 (Linux 2.2.12-20 #5).

    The steps in applying the patch are:

    The new specification is outlined in the following code segment in red.

    // line #527. static struct tvcard tvcards[] = { /* default */ { 3, 1, 0, 2, 0, { 2, 3, 1, 1}, { 0, 0, 0, 0, 0}}, ... snipped... /* Note: only 1 line needs to change, for specifying the input address of the bttv card, i.e: TV-tuner, SVHS, Composite1 */ /* LifeView FlyKit with a Tuner, replace an existing line with this */ { 3, 1, 0, 2, 0x8dff00, {2, 3, 1, 1}, \ {0, 0x8dff00, 0x8df700, 0x8de700, 0x8dff00, 0 }}, ... snipped... };

    Segment 1: Patching bttv.c

    After patching, the module was re-compiled. Do a make module under the kernel base directory which is usually /usr/src/linux/. Note that the default makefile would generate both bttv.o and tuner.o. If this is not the case, you can do a make config or, if under X, make xconfig to include them. When make is complete, copy the new bttv.o and tuner.o module from /usr/src/linux/modules to the default module library directory (/lib/modules/2.2.12-20/misc).

    4.2.2 Linux Module Initialization

    At this point, the new modules are ready for use. Once both modules are initialized, all the necessary hardware address space are readily detectable by the kernel. This section details the initialization settings.

    The easiest way to initialize both modules is to perform the initialization automatically when the system boots up. This can be done by editing the system init file, rc.local, under /etc/rc.d/. Append the following two lines to rc.local.

    	modprobe tuner type=1
    	modprobe bttv card=19 pll=1
    
    All parameters are necessary. Both modules must be initialized as root or via etc/rc.d/rc.local.
    • The parameter passed into tuner.o is type=1. It specifies that it should use PAL instead of NTSC or any other.
    • The two parameters passed into bttv.o are card=19 and pll=1. The first parameter card=19 is directly related to the manual patch applied earlier. It specifies that card number 19 should be initialized, as that particular array position was replaced by the patch with a new set of specs. The second parameter specifies that a 24 Mhz crystal clock is on board.

    Note that these parameters are absolutely necessary for the correct operation for the bttv card. If the above two modules are initialized incorrectly, they can still be re-initialized. But first, both modules must be removed (with root privilege) from the kernel space. Then a re-initialization can be initiated along with the correct parameters. An example is shown below :

    	rmmod -r tuner		// remove the tuner.o module
    	rmmod -r bttv		// remove the bttv.o module
    
    	insmod i2c			// required for bttv.o
    	insmod videodev		// required for bttv.o
    
    	insmod tuner type=1
    	insmod bttv card =10 pll=1
    

    After initialization, video capture should be working straight away.

    4.2.3 Video For Linux (V4L)
    V4L is an
    API layer that interacts directly to the bttv.o module. It provides a standardized set of functions for a programmer to interface with video capture devices, not limited to bttv. In previous years, all frame capture operations were handled by the VL library (provided by Silicon Graphics). This year, an open source video capture library, Video For Linux (version 1) [4] (V4L) is utilized instead. A brief introduction to V4L API is presented in this section.

    The V4L API is very simple. Whenever an operation is to be performed on a bttv device, either sending an instruction or acquiring some data, a ioctl(...) function call is issued. The function parameters specify what operation is to be done. ioctl(...) accepts three parameters.

    • the first parameter represents a unique bttv device ID,
    • the second parameter (a pre-defined macro) instructs V4L what to do, and,
    • the third parameter specifies a data address for a get/set operation.
    The following code segment illustrates a few correct calling syntaxes:

    #include <linux/videodev.h> struct video_capability vCap; struct video_window vWin; // open a bttv device, fd --> a unique ID number int fd =open(devID.c_str(), O_RDWR); // Query device for capability if ( ioctl(fd,VIDIOCGCAP, &vCap) < 0) { cout << "ERROR : VIDIOGCAP error, not a v4l device?" << endl; exit(-1); } // Query Capture window info if ( ioctl(fd,VIDIOCGWIN, &vWin) < 0) { cout << "ERROR : VIDIOCGWIN error" << endl; exit(-1); }

    Segment 2: V4L sample

    A simple overview in using V4L for streaming video is shown below. The detail on how to use V4L is not presented here. Refer to [11] or the CCC source code for more information on how to use the V4L API.

    • open video capture device
    • a sequence of ioctl(...) function calls to probe the capability of a capture card
    • reserved memory space for capturing and different inputs.
    • start the capture loop until exit
    • clean up allocated memory
    • close video capture device

    All relevant frame-grabbing routines were separated into a class (class BttvCapture) encapsulating all the necessary initializations, handshaking and terminations procedures required by V4L. A modulized design was utilized. This facilitated further extensions to CCC. In extracting all machine dependant operations into a particular class and enforcing standard interface, code reuse was being promoted.

    Reason of Choice: Why V4L, not V4L2 ?
    Video for Linux version 1 (V4L) was released during 1997. Since then, it had gone through several revision version. Video for Linux version 2 (V4L2) was released during April 2000. V4L2 implements a new set of API. It has better support on many newer devices. Yet, it was decided not to incorporate it into the implementation due to two reasons: Firstly, it is recently new. During May 2000, V4L2 is still in a beta stage. The wide spread use of this API is not apparent thus only limited articles and documentations are available. Secondly, V4L comes as a standard package on RedHat (Kernel 2.2.12). It is not necessary to re-configurate or re-compile anything.

    4.3 C++ Class Design

    This section presents the software reusable objects that have been implemented. We begin by briefly describing the design of various classes and the reason why they are implemented the way they are.

    The CCC project required a software-driven navigation control. This was the main software component that was aimed from the very beginning. In an attempt to closely follow an object-orientated design, various physical components and their corresponding interfaces were encapsulated into particular classes. For example: in comport.h, class ComPort encapsulated all the com port related initialization and termination routines. An instance of class ComPort provided all the necessary com port functionality including opening, resetting, accessing and closing a com port.

    These classes are reusable objects. An efficient use of these classes can greatly reduce the coding time. For example, class Timer had been used widely in the CCC to perform profiling. The creation of a timer object is very robust and its start-stop behaviour is provided by its function members. An example on how to use class Timer is shown in code segment 3.

    #include "timer.h" main() { Timer t; // create a timer object t.start(); // start timer ... ... some operations to time . .. ... t.stop(); // stop timer cout << "time elapsed : " << t << endl; }

    Segment 3: A Timer example

    Many different classes were generated. They encapsulated different physical objects and provided different functionality. They include ComPort, BttvCapture, Xwindow, FileReader, Timer, Streamer, FastBuffer, Thread, Autonodes, Autolines, etc. These classes are briefly described in Table 3. A discussion on several classes of interest follows. The portability of these classes had been tested under different machines, including SunOS, Linux and OSF1 Alpha.

    Class Name
    Class Usage
    class Timer Initializes either as a "count-down" timer or a normal "start-stop" timer
    class Fastbuf Represents a single 384x288 captured frame. A XShm buffer, could blitz 30 of these to screen in 0.01 seconds. (PIII)
    class Thread<> A pthread wrapper. Provides a robust multi-thread interface
    class Streamer Streams raw captured data from a bttv device
    class FileReader Reads in a 384x288 file in ppm format and forwards that to a Fastbuf object for simulation purposes
    class Xwin An Xwindows class specially designs for CCC. Encapsulates all of the necessary user interactions functions. Multi-threaded on class Thread<>
    class Consumer
    class Producer
    A consumer/producer class. In a Consumer-Producer setting. Sharing a variable with a producer
    class BttvCapture Provides the core bttv interface implementation
    class Car Encapsulates the prototype car
    class Autoline
    class Autonode
    Estimates center of track, for navigation purposes
    Table 3 : C++ Classes

    class Timer
    A Timer object provides the functionality of a timer. It operates in two distinct modes:

    • 1. "start-stop" mode
    • 2. "count-down" mode

    The default behaviour is determined at the point of initialization. If a value is given during its constructor call (representing the seconds to count), a timer object will behave as a count-down timer. Otherwise, if initialized with no parameter, it falls back to the default a start-stop watch behaviour as illustrated in Segment 3.

    The interface design of class Timer is quit direct and self-explanatory. When a timer is operating in count-down mode, it is desirable for the counter to block until a certain period has passed, thus, replicating a delay function.

    This is done by the nanosleep() function provided by <time.h>. It puts a program (or a thread inside a program) to sleep until certain time had elapsed. No "check time now and compare" (spin check) is utilized because it will consume CPU time unnecessarily. The advantage of this implementation is that a timer makes very little use of CPU time when operating in counting down mode.

    class Fastbuf
    This class is particularly important. It provides the buffer space to store the image captured from frame-grabber and manages a list
    Autonodes that are responsible for track extraction. A standard X extension (MIT-SHM) [12] is utilized to provide a fast buffer implementation. During the initial testing, an impressive ~100 frames per second was obtained. (repeatedly blitzing same set of 15 images to screen)

    When a frame is captured into a Fastbuf object, this Fastbuf object is also responsible for calling its link-listed organized Autonodes to perform track extraction. Hence, class Fastbuf also encapsulates several necessary functions for the correct operation of Autonodes.

    class Autonode
    Class Autonode symbolizes our attempt to simplify the track extraction logic. Instances of Autonodes serve to extract the track. They work in a coordinated manner and in cooperation with other instances. All Autonodes are link-listed together inside a (statically declared) nodeHolder variable inside class Fastbuf. For every image received, a group of autonodes are applied on that particular image and a center of track estimate is produced. This estimate is then used for navigation purposes.

    Refer to the section 5.1.2 for more information.

    class Streamer
    Class Stream is an envelop class [13]. It forwards all the relevant function calls to its encapsulated variable of type class BttvCapture.

    Class Streamer provides a public interface for video capture. It has one member variable of type BttvCapture, which implements the actual V4L API call to access a bttv device. Thus, class Streamer itself does not provide video capture. It delivers video capture via its member variable. Any video request on a streamer object is forwarded to this BttvCapture variable.

    Implementing video capture this way (by employing two different classes) maximizes the reusability of the source code. This is so because all device dependant code are bounded inside a single class, namely class BttvCapture. If a new video capturing device was introduced, only this class would have to be re-implemented. Thus, a re-implementation of a single class is sufficient for the overall system to adapt to a new hardware. The rest of the source code could remain unchanged. Consequently, through careful class design, code re-use is being promoted.

    class BttvCapture
    As mention before, this class contains all the hardware dependant operations to interface with a bttv device. Its main task is to capture an image to a designated buffer space as fast as possible. BttvCapture has two frame buffer spaces which are for streaming purposes. Predictably, all the initialization and termination routines are also encapsulated.

    class NavBrain
    This class merges all the different parts together emulating a "brain" function for driving the car. Basically, this class implements a compact loop that performs a set of procedures repeatedly.

    class Thread
    Class Thread is templated on a particular class function. It is responsible for the creation and destruction of new threads inside a program. Class Thread is implemented on top of the standard p-thread function family. However, this implementation of threads has one distinct advantage: It is capable of calling any member function of a particular class which is usually prohibitive in C++ for p-threads. This particular aspect would be discussed in more detail in the sections to follow.

    4.4 Testing Strategies

    Testing was done throughout the implementation stage. Because there were numerous parts that the program was interfacing with (i.e: capture card, com port, threads, navigation sub-system, user interactions), it was decided to build smaller sub-applications for testing a particular class and its functionality. The object-oriented approach taken further facilitated this method of testing, since all the necessary functions for the correct operation of a class were all encapsulated.

    This method of testing had proven to be very useful. Once certain part of a sub-system was working correctly, all relevant files were merged back into the main program tree. This testing method enabled a programmer to probe into any particular aspect of the code more deeply and directly. This is so because the sub-applications written were very short and simple. It became possible to pinpoint and address different aspect of a class. Furthermore, this method of testing provided an alternative way to add new (and possibly unstable) functions into the core program. New functions were added to the main program by firstly carrying out extensive testing in the sub-applications. It was not until a new function was working correctly (and stable), only then would its associated files be merged back into the main program tree. Thus, new functionality can be introduced to the core program in a safer and more thoroughly tested fashion.

    Chapter 5. Vision Based Software-Driven Navigation Design

    This chapter describes the software-driven navigation unit that was built. An overview on how to speed up image analysis is presented. This summarizes the key ideas behind Autonodes. Next, the implementation of Autonode is considered. This is followed by a detail investigation into track extraction using Autonodes.

    5.1 Track Extraction

    In order to navigate a car within a bounded area, a navigation unit must be able to determine the track location and estimate the car's orientation relative to the track. Track extraction refers to the process for estimating a track position.

    5.1.1 Image Analysis - How to speed it up?
    An image usually contains several objects of interest (i.e: track, desk and chair). In order to compute the properties of these objects (i.e: their size, orientation or boundaries), individual objects must first be identified as separate objects. This process in image analysis usually consumes a lot of CPU time. Thus, for any track extraction algorithm to be fast and accurate, it is necessary to investigate the reasons why image analysis take so long to perform. The goal here is to find a method to eliminate any time consuming operations identified.

    The fact that image analysis consumes a lot of CPU time is due to several reasons. Firstly, there is usually a high number of calculations associated with image analysis algorithms. This is further related to the underlying operator(s) that is utilize to extract an edge or to obtain an object from. For example, if we are to differentiate a sub-image region, a subtraction operation is required on every consecutive pixel. If a negative of an image is to be obtained, an inversion on every pixel is required. If an image is to be fourier transformed into its frequency domain, a sequence of operations is applied on a pixel and its neighbors in order to transform it. Naturally, a user expects fourier transform take longer to process compare to an invert operation. This is because it is understood that the underlying operator (and eventually the number of operations performed) associated with fourier transform has a higher number of calculations associated with it.

    Secondly, the area of an image represents a large set of data for computations. If a brute-force approach is taken on applying any image analysis algorithms (i.e: apply it on all pixels regardlessly), this will incur a longer processing time than if such an operation is performed on a sub-area. Furthermore, relating the brute-force approach to track identification, there is an obvious disadvantage : after processing an image, only a fraction of the end result would be useful (i.e: a particular track position) and most other calculated information would be discarded. To illustrate this point, consider the method for converting a full color (RBG) image to grayscale for track identification purposes. Is it absolutely necessary to convert the full image to greyscale before track identification can be done? Or is it sufficient to apply it within a certain sub-region? An efficient and selective use of the data set could spare a reasonable amount of processing time.

    Finally, understanding the nature of video in comparison with static pictures is important. Objects between consecutive frames inside a video sequence usually moves in small increments. In other words, the change in the exact position of an object inside first frame and the second frame is usually small. This information can be utilized to provide a more efficient scheme to perform track searching.

    In CCC, the capture window size is 384x288 pixels per frame (RGB32). What can be done in the design to speed up track search? How can the required computation power be reduced? As for the time-critical navigation control sub-system, the need to implement a fast track identification algorithm is apparent. Can the ideal 30fps scenario be handled?

    Notice when the exact track position is being estimated, a significant speed up can occur if only as few pixels as possible are utilized. Also, in certain circumstances the availability of computation time may be reduced (due to swapping or a higher priority task waiting). Does the intrinsic property of the algorithm efficiently handles such scenarios and adjust its computational load appropriately. In implementing a track extraction methodology, one must keep in mind these possible rules and tricks.

    An Autonode class was designed to provide the basic track extraction logic. Its design is focused on improving and providing workarounds to the above-list issues and requirements. Class Autonode encapsulates all the necessary functionality for performing track extraction and image analysis. Its core design and operations would be discussed in the next section.

    5.1.2 Autonodes - idea and design
    The design of Autonodes and their ideal behaviour are detailed in this section. An in-depth summary on the implementation of Autonode is presented.

    Class Autonode is the result of our attempt to simplify and modulize the track extraction logic. An Autonode encapsulates all the necessary edge detection operators and relevant functions which are essential for the correct detection and eventual tracking of a roadpath.

    Ideally, an instance of Autonode would automatically attach itself to the closest horizontal edge found in range. Once initiated by the user, it is capable of detecting an edge and attaching itself to it. Thus, the exact location of an Autonode represents the edge location found. The navigation unit can utilize this information to steer the car.

    The following diagram illustrates a few Autonodes in action.

    Diagram 1 : Applying Autonodes to an Image file captured during 1999 by Tim Bruton.

    This image illustrates the first version of Autonode implemented. Instances of Autonodes always snap to the side of a track (where the edges are). Multiple instances of Autonodes provide an estimate in which the center of track position can be the located from an image.

    An Autonode is consisted of three by ten pixels, as illustrated in the diagram below. The operations of Autonode are quite simple. Initially, a 1-D edge operator is applied to the first row. If an edge is not found inside this ten by one pixel line, the Autonode is then allowed to search (and spin) left and right incrementally until a possible horizontal edge is detected in range. When an edge is detected by the first edge operator (on the first line), another two optional line detection operators can be applied to the remaining two rows. Note: we assume a light background and dark tracks.

    Diagram 2 : An Autonode.

    By applying one or more line detection operators to the neighbouring rows, a higher probability in correctly identifying an edge can be secured. Furthermore, the decision on whether to apply the remaining two line detection operators can be made at run time. If the CPU time available is suddenly reduced (due to swapping, or higher priority task waiting), the remaining two operators can be by-passed. Thus, by employing this method on track extraction, Autonode may adapt to the current load condition. Consequently, providing a tracking logic that is sensitive to the load condition.

    The roadpath utilized in CCC are thick dark lines. Ideally, the roadpath should be in high contrast to the floor so that Autonodes can detect the edges of a roadpath correctly. Note that Autonode relies on a sufficiently large gradient change for track extraction. This is intrinsic to the underlying edge detection operators. Because only line detection operators are utilized for performing track extraction, the usefulness of Autonode is limited to certain conditions. Its effectiveness is dependent on whether a high contrast between the roadpath and floor can be guaranteed. In designing Autonodes, it was assumed this condition can be satisfied. Thus, providing an authentic and valid tracking behaviour.

    Once an Autonode identifies an edge, this particular instance of Autonode would attempt to track the movements of this edge (in temporal domain). It will remember its current exact location. When next frame arrives, this node would again apply the same set of edge detection operator(s) onto the exact same location. If the edge has shifted, the operator(s) would fail. Subsequently, this Autonode would start searching (and spinning) again incrementally. Thus, searches begin on the location where a track was found previously. In other words, temporal locality [21] is utilized (on the X,Y image plane) in an attempt to speed up the track search. This method is ideal if all objects inside two consecutive frames move in small increments. Nevertheless, this condition is difficult to realize as soon as the car is sped up and when it is going through a corner. Since capturing is done at 24fps, and the car is travelling at relatively low speed (2m/s max), applying this method still provide a handsome saving in some situations. For example, when the car was travelling in straight line, this method worked very well.

    A center of track estimate is generated by initiating a few Autonodes to the side of a road path. Since Autonodes automatically attempt to follow it (in time domain), the exact location of Autonodes essentially indicates the track boundaries. Evaluating them reflects the track's relative position to the car, thus, a navigation unit can make use of this information promptly for navigation purposes.

    The frame format utilized for capturing is RBG32. This format provided full color characteristics in the images received. Hopefully, further extension to this project can make better use of this color information. For example, in area of object segmentations and collusion avoidance. This is one of reasons why RBG32 was chosen. A good programming trick, when utilizing RGB32, is to type-cast a single pixel data being 32bits wide into an array of ints, instead of handling it as four separate char value. In doing so, each exact pixel location can be addressed by the usual array notation (i.e: pixel[100][99]), which provides all three RGB values at that pixel location. Note that this method works on all machines that use 32bits for int.

    Before performing track extraction, the captured image is firstly converted to grayscale. As addressed earlier, blindly applying the conversion to the whole image is unnecessary. The conversion is done at the same time as the edge detection operator(s) are applied. Performing conversion this way eliminates one of the in-efficient issues in image analysis addressed earlier(applying same set of operation on a large data-set, and discard most calculated results).

    In next section, we will derive two edge detection operators. They were utilized by the current Autonode implementation.

    Extensions to Autonode : AutoLine AutoMesh

    Edge Detection Operators

    As mentioned above, the ideal behaviour of Autonode is to automatically locate an edge. Autonode only utilize edge detection operators to locate edges and appends itself to it. This section examines two 1-D edge detection operators implemented. The outline of their design were arrived at after an investigation were performed on the assumed track conditions made in CCC.

    Other edge detection operators are available [15 20 23 24]

    Gradient jump edge detection

    Due to the nature of the track setup, a very fast and simple method for detecting an edge can be utilized. A two-pixel comparing method was implemented as the first operator to apply on Autonodes. This operator is capable of estimating the probability of having an edge between two end-points of an Autonode very quickly.

    When the intensities between two end points on a line vary largely, it readily suggests that a possible edge is enclosed inside this line. To illustrate, let's have a look at the intensity of pixels around the track edges.

    Diagram 4 : investigating the track edge.

    Eight nodes are pinpointed for the sake of investigation. A graphical plot on the pixel value for the above eight nodes follows. This is followed by a magnified (x1600 times) version of point number two.

    Graph 1 : An edge plot between track and floor .

    From the above eight plots, it is observed that the difference between the first pixel and the last pixel in all cases differ by more than one hundred intensity value. Thus, a large dis-continuity is present. By examining the value on the first and the last pixel, an estimate that indicates the presence of an edge, can be produced. Instances of Autonodes can readily make use of this information for edge detection.

    Note, however, that this method of edge detection is extremely sensitive to changes in lighting condition. However, if sufficient light can be guaranteed, and that the camera is equipped with automatic gain control, this method still produces satisfactory results. Diagram 5 shown below is a magnified version of point two from Diagram 4. It had been converted to grayscale. A closer look at the pixel intensity value supports this line detection method.

    Diagram 5 : Point 2 magnified 1600 times.

    Note also that this method of edge detection is also very sensitive to changes in the capture resolution and the position of the camera. In Diagram 5, one can observe that an edge between the floor and the track is readily captured by less than four pixels area. If an extremely high resolution camera is utilized, or if the distance between the camera and the ground changes, this method of edge detection could produce incorrect result. In CCC, the camera was fixed on the car in a permanently position and the image resolution remained at 384x288 pixel. This method of edge detection had proven to work extremely well. Note that it is important for the maintainer of CCC to understand how and why this method works and where its strengths and weaknesses are, so that this edge detection operator can be applied in a correct manner.

    DCT edge detection

    A 1-D Fast Discrete Cosine Transform (DCT) operator was implemented as a cross checker for the gradient jump operator. This method required a lot more CPU time to process and was quite inefficient from the software point of view. The advantage of this method is that it could readily be extended to two dimensions, hence, providing edge detection in all direction not limited to horizontal. Note also that various hardware implementation of DCT are readily available due to the popularity of Jpeg, H.261 and H.263. In many cases, a DCT operator is built into a video capture device. Thus, a DCT operation is readily available.

    DCT can be viewed as an energy packing operation [21]. Initially, eight consecutive pixels are passed to a DCT operator. The DCT operator would try to "push and compact all the energies (intensity value)" within these eight pixel positions. The underlying nature of DCT is discussed in [21].

    If all energies (intensity value) within this eight pixel array are the same, all of their energy (intensity value) can be compressed and packed into one single value. This is indicated on graph 2 below (top left plot). In this particular plot, all eight pixels are white with a value of 255. If an edge is enclosed inside this eight pixel array, after applying the DCT operator, the second term would always inhabits the highest absolute value (not including the first term).

    Graph 2 : DCT plot (x-axis: pixel position, y-axis: value after DCT operator applied). The black/white box below x-axis indicates the pixel value before applying DCT (black=0, white=255).

    The presence of an edge is estimated by comparing the second term produced by a DCT operator against every other term (apart from the first). The relative value of the second term is utilized for detecting the presence of an edge.

    The following diagram illustrates how DCT can be applied in 2D to extract edges. Notice how the presence of edges produces more "white area" in Diagram 6c. This information can be utilized to locate edges.

    Diagram 6a : original sub-image

    Diagram 6b, 6c : Applying DCT in 2D, (left(6b) - after filtering Diagram 6a, right(6c) - after applying 2D DCT to Diagram 6b.)
    5.1.3 Autolines

    Class Autoline is an extension to class Autonode. The main idea behind Autoline is to allow instances of Autonodes to operate in a more coordinated and cooperative manner. As stated in the previous section, an Autonode would snap to the closest edge found. An Autoline is an Autonode pair. It provides further functionality than Autonodes in a more efficient manner.

    Before we examine the detail of Autoline, let's have a look at how Autoline can simplify track extraction further. Autoline encapsulates two instances of Autonodes, a left node and a right node. As explained in section 5.1.2, when an Autonode is initialized, it would attach itself to an edge immediately. After attaching to an edge, the exact location of this Autonode represents the edge location found. This particular node has immediate access to the next ten pixels value. At this point, several questions arise: is it possible to extract further information from an Autonode so that a more efficient track extraction method can be implemented? Can the pixel value provide any useful information? How can the information between instances of Autonodes be synchronized and be used in a more collaborated manner? To find the answers to these question, let's have a look at Diagram 7a below.

    Diagram 7a : investigating the track edge, using Autoline.

    When the orientation of the car is "correct" (looking straight ahead), the physical tracks represented by two thick dark lines are clearly visible. They represent the left and right boundaries where the car should travel on. Again, if the orientation is "correct", both boundaries usually come in pairs. Instances of Autonodes would attach to them indiscriminately. Since they usually come in pairs, two instances of Autonodes can be utilized to locate the center position between two black tracks as illustrated in Diagram 7b. Using an Autonode pair to locate both boundaries of a track is the idea behind class Autoline.

    Diagram 7b : An Autoline.

    In order to support the pairwise behaviour, the existing implementation of Autonode was extended to provide a more stringent support for Autoline. In particular, the "edge type" (gradient direction for an edge) was implemented so that the Autonode pair can specifically assign to search in different and opposing direction.

    Initially during initialization, an Autoline constructs two Autonode objects, being the left node and the right node. Their expected "edge type" is initialized. Each of them attempt to search towards their own respective direction for an edge. If both searches are successful, the center of track position can be extracted quite efficiently. This is illustrated in Diagram 8. The red cross in the center represents the estimated center of track position, which the car always adjusts itself to.

    Diagram 8: Heading straight.

    Operations of Autoline

    An Autoline consists of two Autonodes, a left node and a right node. This section describes the possible state of Autolines and how they enable automatic navigation.

    Autoline provides a more coordinated automatic tracking logic. It follows the movement of two particular edges of opposing direction between consecutive frames. An Autoline has five possible states as below:

    • UNTESTED
    • LOCK
    • LEFT_OUT_BOUND
    • RIGHT_OUT_BOUND
    • BOTH_OUT_BOUND

    When initialized, an Autoline is in an UNTESTED state. It means that this Autoline has no previous history available.

    Once initialized, an Autoline immediately creates two Autonodes. If both Autonodes manage to attach themselves to an edge of opposing direction, LOCK state is entered. This means that both left and right nodes found an appropriate edge. Thus, center of track can be evaluated readily.

    If an edge is not found in either node or both nodes, an out of bound condition is indicated. This situation can easily arises as illustrated in the diagram below. Note the top Autonode pair is incomplete. The left node cannot attach itself to an edge, so it went out of bound. How can the center of track be extracted if such condition arises?

    Diagram 9 : Top left node is missing - out of bound (turn maximum left).

    In order to find a resolution to this problem, let's first consider how Autolines enable navigation. As mentioned above, each Autoline supplies a center of track estimate. An overall center of track estimate is generated by a summation on all instances of Autolines. This value is then utilized by the navigation unit for steering the car.

    When an out of bound condition is indicated, the orientation of the car is incorrect. If either side of an Autoline is out of bound, it is understood that a particular track boundary has gone too far outside the picture. Thus, the navigation unit should steer the car towards that out of bound direction with maximum torque. This suggests that the center of track position is at negative/positive infinity. However, other Autolines could still be active, an infinity value could corrupt the summation value which is needed for steering. Eventually, the border limit was utilized for an out of bound condition. In CCC, the capture size is 384x288, thus the limits are : left out of bound (1), right out of bound (384).

    When both side are out of bound, the center of track position is invalid. This situation can arise when the car is perpendicular to the track. This is illustrated in Diagram 10 below. In this case, the car should move backward and re-adjust its heading until either side of the Autonode pair is attached.

    Diagram 10 : Car position is perpendicular to track, Autolines become invalid.

    5.2 A Threaded Design Model

    This section outlines what a thread is, the advantage of using threads and why they were chosen. The design of a thread model is detailed.

    A thread is a single sequential flow of control within a program [16]. Programs that utilize threads are commonly known as a multi-threaded program. A multi-threaded program is capable of executing different portion of code in a time-shared (if single CPU) concurrent fashion.

    Oftentimes, the handling (and eventual processing) of many asynchronous events could be done concurrently. This means that the execution time of these events can overlap each other. Since the software component for this project must be capable of handling different time-critical events, such as frame grabbing, communication channel control and X interactions etc, the overall system design should readily facilitate an efficient handling of these events.

    This unique requirement naturally leads to a multi-threaded design model. In providing better support to asynchronized events, one can observe that there is usually a long wait (in terms of CPU time) until a certain (external) event is ready to be processed. Thus, in employing a multi-threaded design, events that vary differently in time (and possibly time-consuming) are effectively being handled in a more robust manner. It also improves the application's interaction with users [8]. A threaded application design model is beneficial in the following areas [8]:

    • X user interface interactions
    • Frame capture operations
    • Interfacing with com port and files
    Reason of choice: why are threads used?
    The reason for choosing threads instead of utilizing a parent/child architecture (by forking) is because a thread is a Light Weight Process(LWP) [9]. As distinctively highlighted by various sources [7-9], "The cost of switching between multiple (forked) processes is relatively high."[7] "Threads reduce overhead by sharing fundamental parts. By sharing these parts, switching happens much more frequently and efficiently. Also, sharing information is not so 'difficult' anymore: everything can be shared." [9] Since threads share a common address space, and are often scheduled internally in a process, a lot of inefficiencies in forking can be avoided by employing a thread model.

    Encapsulating Thread into a Class
    A templated thread class was implemented to enable CCC to spawn new task with ease. This class was implemented on top of the existing P-thread function family that came with standard Linux. The Thread class implementation is similar to a wrapper. Several private members variables were incorporated into class Thread to guarantee that some hazardous conditions could never occur.

    To initiate a new thread, a class Thread object is first created. Its constructor takes two parameters. They are an object of a certain type and one of its member functions. Code segment 4 illustrates how to use class Thread correctly. After a class Thread object is constructed, a call to its overloaded function operator would generate a new sequential flow of control, where two function procedures can execute concurrently (similar to parent and child).

    #include "thread.h" ///////////////////////// class Base { ///////////////////////// Base():i(1) {;} Show() { cout >> i; } int i; }; //------------------ main() { //------------------ Base a; // create a Base object Thread<Base> newThread (&a, &a.Show); // create a Thread object newThread(); // execute a.Show() concurrently .... .... }

    Segment 4: Creating a new Thread

    Implementation of Threads

    Notice in segment 4, how incredibly simple it is to create a new thread. Using C++ to encapsulate a Thread is ideal. Observe the similarities between the creation and termination of a Thread and a normal object. The construction and destruction of an object are handled automatically in C++ by an appropriate constructor and destructor call. This automated behaviour of C++ closely resembles the setup and termination procedures required for a Thread. Thus, the constructor initiates all setup procedures required for a thread, and the destructor automatically performs all necessary clean up operations.

    Eventually, Threads can be visualized as a particular type of objects in which they always "break away" from the main program and execute concurrently. A thread can therefore be treated as normal objects and this make programming with Threads very easy. A programmer can create a new thread by using similar syntax as he/she does for creating a normal object.

    The implementation of class Thread is intuitive. It is capable of executing class member functions which are normally prohibitive in a p-thread implementation for C++. In brief, this was accomplished by utilizing operator->*(). Refer to thread.h for more information. A semaphore was utilized to synchronize between the destructor and constructor for class Thread. This guarantees a correct exit condition.

    5.2.1 Low Level Controller (Consumer-Producer Topology)

    The control of the car is divided into two layers. A low level controller that serves as the interface to the com port, and a navigation unit which essentially drives the car.

    The primary task for the low level controller is to communicate between the com port and to provide a standardized interface for the navigation unit to control the car. The low level controller is abstracted inside a class function, which is called upon by a Thread object after initialization. Thus, the low level controller executes concurrently along side the core program. The low level controller consists of a very compact loop. Its main job is to (1) open and close the Com Port for communication, (2) write an assigned byte to the Com Port and (3) block when there is nothing to do.

    The low level controller is executing in its own thread space. There is a need to provide inter thread communication so that this low level controller can operate correctly and concurrently. Approaching this from a parallel model point of view, it can be generalized into a well known consumer-producer problem. Semaphore is employed as a synchronization tool between producer and consumer. The basic design for the low level controller is outlined in the following diagram. For more information on Semaphores, refer to [18].

    Diagram 11: Consumer-Producer topology.

    Note that only one consumer and one producer are illustrated in the diagram above. In practice, however, more than one producers are possible for producing data. For example, in an emergency situation, such as approaching possible obstacle, a higher priority task can signal the consumer and an appropriate action will be forwarded to the com port straight away without going through the normal channel.

    Semaphores are utilized throughout to synchronize different threads.


    5.3 Controlling the Car

    The visual tracking logic in CCC is based on a continuous recognition and evaluation of the dynamically changing environment in which the track is being extracted continuously. An important prerequisite for such an approach is a timely and comprehensive perception of the car's dynamically changing environment. This can be realized by interpreting a visual scene as fast as possible. In do so, a restriction is imposed on the navigation unit. Subsequently, the operations performed by the navigation unit must be minimal so that the dynamically changing environment could be interpreted as fast as possible.

    The control of the car is generated by continuously evaluating the center of track position. This is estimated by instances of Autolines. At 24 frames per second, this process is being carried out at a rate of once every 0.04seconds. The pseudo code in segment 5 illustrates how the navigation loop was implemented. The actual implementation of this pseudo code is located in navbrain.h.

    Activate the camera LOOP: acquire a frame extract left and right track boundaries (using Autoline) obtain a center of track value move towards center of track position signal a move command is ready do X window stuff, if any. goto LOOP

    Segment 5: Pseudo code for driving the car automatically

    5.3.1 Presence of noise

    Experiments were conducted to measure the average frame rate. This led to the discovery that the achievable frame rate is dependant on the quality of image captured. In other words, when a received image is noisy as shown Diagram 12, it directly implies that the frame rate achieved is not optimal. Thus, the full impact of the noise is not limited to a possible incorrect center of track estimate, but also in reduced frame rates. Eventually, the presence of noise also suggests an increased execution time for the navigation unit (due to slower frame capture and longer Autonode execution time). Refer to Section 6.1 for the experimental results obtained.

    In previous year, a method was proposed to reduce the impact of noise when estimating a center of track value[5]. This method had been incorporated into the current implementation of CCC. In short, a noisy image is divided into blocks. The intensity value inside every block is averaged. This block-divided image is then used for performing track extraction.

    Diagram 12 : Image distortion due to RF interference.
    Chapter 6. Results and Discussions

    This chapter contains the experimental results obtained from implementing the design methods and solutions strategies put forth in Chapter 4 and 5. It presents the results obtained in profiling the navigation unit. This includes the measurements made in (1) the speed of capture, (2) the speed of Autolines (to produce a center of track estimate) and (3) the speed of drawing a single frame onto the screen. A worst case time limit is estimated for generating a move.

    6.1 Speed of Capture

    As mentioned in Section 5.3.1, the frame rate of capture is dependant on the quality of image received. The following table shows the difference in achievable frame rate under "good" and "bad" picture quality condition (Refer to Appendix B for screenshots on "good" and "bad" quality pictures). Plots on the profiling of capture are shown in Appendix C. The frame size and format used in this experiment were 384x288 and RGB32 respectively.

    Picture Quality
    Average frame rate
    Good - ø 25.01 fps - (0.0399sec per frame)
    Bad - ø 12.76 fps - (0.0784sec per frame)
    Table 4 : Average frame rate

    A large deduction in achievable frame rate occurred when the picture quality became noisy. This reduction in capture speed was inherited by the hardware. Thus, the only way to eliminate this deduction in speed was to individually address the issues that caused the noisy image. Eventually, a stronger transmission signal strength could be utilized so that the resultant signal pickup could be improved and that a "bad" situation could never occur.

    A "bad" picture could result from the following situations.

    • car is too far away from antenna,
    • RF interference,
    • line of straight between car and antenna is obstructed.

    6.2 Speed of Autolines

    The time required for an Autoline to produce a center of track estimate varied under different track arrangements. For example, an Autoline required a longer time to compute a center of track estimate when it was going through a corner. This property of Autoline was directly inherited from the image analysis issues that were identified in section 5.1.1. In particular, it was the result of the assumptions made on video sequence. It was assumed that objects between consecutive frames moved in small increments. By exploiting this property, searches for the track position between consecutive frames resumed at the same position where it was found previously. In doing so, this process took longer to perform when the position of the track shifted significantly. Graph 3 shows the time required for four instances of Autolines to generate a center of track estimate under different track arrangements. A screenshot is attached in Appendix D.

    Graph 3 : Execution time for four instances of Autolines, during straight section of tracks, followed by a corner.

    This non-constant execution time behaviour of Autoline is well justified. During the course of the straight section, Autoline was exploiting the incremental property of objects between consecutive frames; therefore, the time in producing a center of track estimate was minimal. In going through corners, this incremental property was less evident, and therefore the track search took longer and the savings were reduced. In order words, the current implementation of Autolines does not always provide a substantial saving in track extraction (such as going through corners).

    Eventually, the worst case occurred when the operations of Autoline degenerated into the scanline track extraction method [5] implemented last year, in which a 188 pixel line was utilized constantly every frame to achieve track extraction.

    A deeper understanding of this behaviour of Autolines led to new insights on how more efficient implementation of Autolines could be produced. Methods that predict the future track center position could be introduced to Autoline. The prediction procedures could further be calibrated with the wheel turning position to produce a more educated estimation for a future center of track position. Unfortunately, due to the time constraint, this idea had not been implemented.

    The average time for four Autolines to compute a center of track estimate are presented in Table 5. Two plots for the sample data set can be found in Appendix D.

    Immediate Track Arrangement
    Average Execution Time
    Straight 0.0000287 sec
    Through corner 0.0003880 sec
    Table 5 : Average execution time for four instances Autolines

    6.3 Speed of drawing one frame to screen

    The average execution time for drawing one frame to screen was 5.699e-05. This typical value represented the time to blitz a 384x288 buffer space onto a screen. This value was derived from calculating the average of 250 samples as shown in Graph 4. Table 6 shows the average time for blitzing twenty four frames onto a screen.

    Graph 4 : Execution time for drawing one frame to screen.

    Execution Time for 24 frames
    0.00138 sec
    Table 6 : Time for drawing 24 frames to screen

    6.4 Speed of Navigation

    The speed of navigation refers to the time required to execute the navigation inner loop once. The speed in executing this inner loop and the maximum achievable car speed are proportional to each other. The pseudo code for this inner loop is shown in section 5.3.

    The time required for the navigation unit to execute its inner loop once is a summation of time in (1) capturing a frame; (2) producing a center of track estimate; (3) drawing one frame to screen; and (4) forwarding that generated move to the hand-held controller via the com port. The calculation for this summation is shown at Figure 1 below:

      0.0784   sec (capture -worst case average)
      0.000387 sec (turning corner -worst case average)
      0.000057 sec (draw to screen -typical)
    + 0.00000  sec (neglectable com port writing time)
    ---------------------------------------------------
      0.078844 sec (worst case average - using 4 instances of Autolines)
    Figure 1 : Time required for generating a move (worst case average)

    At worst case in average, the navigation unit generated 12.68 moves per second. This estimate provided a lower bound on which the navigation unit could generate a valid move to control the car. A calculation on the typical case is shown at Figure 2 below.

      0.0399   sec (capture -typical average)
      0.000028 sec (straight -typical average)
      0.000057 sec (draw to screen -typical average)
    + 0.00000  sec (neglectable com port writing time)
    ---------------------------------------------------
      0.039985 sec (typical case - using 4 instances of Autolines)
    Figure 2 : Time required for generating a move (typical case)

    Typically, the navigation unit could generate 25.01 moves per second.

    In summary, it was observed that the speed of navigation was dominated by the speed to capture one frame. It is, therefore, desirable to maintain a maximum achievable frame rate at all time so that the inner loop of the navigation unit can execute as fast as possible. Thus, the car can be controlled continuously at optimal speed. Methods to guarantee maximal achievable frame rate were suggested earlier in section 6.1.

    Ultimately, the latency in the navigation loop can be kept as small as possible by maintaining the highest frame rate.

    Chapter 7. Conclusion

    The process of re-implementing the existing system to a PC was successfully completed. This re-implementation led to a very noticeable speed improvement in maintainable frame rate, increased from previous year 8.95 fps (typical case) [5] to 25.01 fps (typical case) and 12.68 fps (worst case). The goal of bridging the existing system to a PC was achieved.

    A new track extraction method was designed and implemented, namely Autoline. This was the result of investigating the intrinsic property of videos, various redundant image analysis methodologies and the assumptions made in CCC. This research has shown that through careful investigation to these fields, a more computation efficient track extraction method could be derived. Unfortunately, the speed up produced by Autoline was not maintainable when the car was going through corners. Eventually, it degenerated to the worst case scenario which was equivalent to the scanline track extraction method implemented in previous year. Further improvements to Autolines were suggested.

    A new multi-threaded software-driven navigation unit was constructed. Communications with the frame-grabber card were implemented via V4L. Track extraction logic was localized and simplified via Autonodes. Its operations were sensible on the system load. Finally, a complete integration of different parts enabled the car to drive itself autonomously.

    8. References

    [1]  Parag Batavia, “The Navlab Group at CMU”, http://www.cs.cmu.edu/afs/cs/project/alv/member/www/navlab_home_page.html ,last visited:14 July 2000.

    [2]  John McCarthy, “Computer Controlled Car”, http://www-formal.stanford.edu/jmc/progress/cars/cars.html , Last Visited:13th July 2000.

    [3] John McCarthy, “Computer Controlled Cars (April 1999)”, http://www-formal.stanford.edu/jmc/future/cars.html , Last Visited:13th July 2000.

    [4] Alan Cox, “Building Number Three –Video for Linux Group”, http://roadrunner.swansea.uk.linux.org/v4l.shtml , Last Visited:17th July 2000.

    [5] Tim Bruton, “A Vision Based Navigation System for a Computer Driven Model Car”, http://www.csse.monash.edu.au/hons/projects/1999/Tim.Bruton/report.html , Last Visited: 16th July 2000.

    [6] B. Dirks, “Winnov Videum Linux Page”, http://millennium.thedirks.org/ , Last Visited: 25 August 2000.

    [7] Andrae Muys, “Pthread tutorial”, http://www.cs.nmsu.edu/~jcook/Tools/pthreads/pthreads.html , Last Visited: 18 Sept 2000.

    [8] S. Balakrishna, “Pthreads”, http://www.coe.neu.edu/~signoril/op_sys/pthreads1.html , Last Visited: 16 Sept 2000.

    [9] FAQ Contributors, “Linux Threads Home Page: (FAQ)What are threads?”, http://www.arnor.com/redhat/FAQ/Types.html, Last Visited: 19 Sept 2000.

    [10] Moritz Wenk, “kwintv”, http://www.mathematik.uni-kl.de/~wenk/kwintv/links.html, Last Visited: 19 Sept 2000.

    [11] Bill Dirks, “Video4Linux Kernel API Reference v0.1:19980516”, http://roadrunner.swansea.uk.linux.org/v4lapi.shtml, Last Visited: 19 Oct 2000.

    [12] Jonathan Corbet, “MIT-SHM-The MIT Shared Memory Extension : How the shared memory extension works”, http://www.cc.iastate.edu/olc_answers/packages/graphics/MIT-SHM.extension.html, Last Visited: 10 Oct 2000.

    [13] Damian Conway, “ Topic 9: Envelopes and Letters”, http://www.csse.monash.edu.au/~damian/Idioms/Topics/05.1.EnvelopeLetters/htm l/text.html, Last Visited: 30 Sept 2000.

    [14] Tim Bruton, “Compiling and running the car”,http://www.csse.monash.edu.au/hons/projects/1999/Tim.Bruton/Instn.html , Last Visited: 10 Oct 2000.

    [15] M. Nitzberg, D. Mumford, T.Shiota, Lecture Notes in Computer Science 622 : Filtering, Segmentation and Depth, Berlin : Springer, 1993.

    [16] M. Campione, K. Walrath,“The Java Tutorial Online: What is a Thread?”, http://recife.u-bourgogne.fr:8081/JAVA-tutorial/java/threads/definition.html , Last Visited: 19 Oct 2000

    [17] Damian Conway, “Topic 8: Wrappers”, http://www.csse.monash.edu.au/~damian/Idioms/Topics/04.2.Wrappers/html/text. html, Last Visited: 27 Sept 2000.

    [18] Dave Marshall , “ICP : Semaphores”, http://www.cm.cf.ac.uk/Dave/C/node26.html, Last Visited: 18 Oct 2000.

    [19] Daniel A. Menascé , “Producer/Consumer Java Workbench”, http://www.cne.gmu.edu/workbenches/pcsem/pcsem.html, Last Visited: 8 Oct 2000.

    [20] R. Jain, R. Kasturi, B G Schunck, Machine Vision, Singapore: McGRAW Hill International Edition, 1995.

    [21] D. A Patterson, J. L Hennesy, Computer Organization & Design: The Hardware and Software Interface,, California: Morgan Kaufmann Inc, Second Edition, Pp540-544, 1998.

    [22] K.R. Rao, P.Yip, Discrete Cosine Transform: Algorithms, Advantages, Applications, San Diego: Academic Press, 1990.

    [23] A. Blake and M. Isard, Active contours : the application of techniques from graphics, vision, control theory and statistics to visual tracking of shapes in motion , New York : Springer, 1998.

    [24] M. Petrou, P. Bosdogianni, Image Processing -The Fundamentals , Brisbane : John Wiley & Sons Ltd, 1999.

    9. Glossary


    API
    Application Programming Interface.

    Autoline
    class Autoline is an extension to Autonodes. It encapsulates two instances of Autonodes. Refer to Chapter 5 for more information.

    Autonode
    class Autonode encapsulates all the tracking logic in CCC. Refer to Chapter 5 for more information.

    CCC
    Computer Controlled Car (CCC) supervised by Dr. Allison during 2000.

    Consumer-Producer problem
    A well known synchronization problem in parallel programming. Refer to [19].

    DCT
    Discrete Cosine Transform, a variant of Fourier Transform.

    DIN-8
    A RS-232 compatible interface commonly found in Macs

    Forking
    Fork is a unix system-call API function. It creates a new process. The new process (child process) is an exact copy of the calling process (parent process). The child process inherits several attributes from the parent process. Refers to the Unix manual for more information.

    FPS
    Frame Per Second.

    Semaphore
    Semaphores provide a mechanism to regulate access to a shared resources. "Semaphores let processes (and threads) query or alter status information. They are often used to monitor and control the availability of system resources such as shared memory segments" [18].

    Thread
    A thread is a single sequential flow of control within a program [16].

    V4L
    Video For Linux [4]. Version 1 was utilized in this

    Wrapper
    Wrapper is a C++ programming technique. "The "Wrapper" technique provides a simple means of implementing classes in terms of existing classes" [17].

    XSHM-MIT
    X SHared Memory. This is an X extension that provides shared memory between kernel and user space. It enables extremely fast screen updates when under X. Refer to [12].

    10. Appendices

    Appendix A : Instructions for compiling and using the program

    A standard makefile is included in the source directory. Do a make in the source directory, an executable file ccc would be generated. Type ccc in the command prompt to execute the program.

    Using the program

    When the program fires up, two X windows would pop out. The user should see the captured camera image straight away on the "capture window". A "com port command window" would also appear, showing the commands being forwarded to the com port.

    How to initiate an Autoline

    Activate the "capture window" by pointing the mouse cursor to it. press 'q' to enter the Q LOCK mode. Once in Q LOCK mode, Autoline can be generated by clicking on the "capture window" image area. Initiate a few autolines in the middle of the track. Hit 'q' again to leave Q LOCK mode. Once exited from Q LOCK mode, the car would immediately begin to drive by itself. Press 'ESC' or right click on a window to exit from the program.

    Other Considerations

    To enable the car to drives itself, a bttv device must be installed in the local machine and the user must have read/write access to it. During 2000, this was done in honspc14.csse.monash.edu.au inside honours lab in building 26.

    Before starting the program, make sure the camera and transmission unit is turned on, with the protective cap removed from the camera and an antenna is attached to the bttv card correctly.

    For more information on how to handle the car, refer to instructions in [14],

    Appendix B : More captured images

    The frame size and format used in this experiment were 384x288 and RGB32 respectively. The following two pictures illustrate a "good" and a "bad" capture image.

    Diagram 13 : Good reception, 24fps maintainable. Single-frame capture speed averaged to 0.039861122 second per frame or 25.01 fps.

    Diagram 14 : Bad reception, frame rate reduced. Single-frame capture speed averaged to 0.078355987 second per frame or 12.76 fps.

    Appendix C : Capture speed plots

    The frame size and format used in this experiment were 384x288 and RGB32 respectively. These two graphical plots contains a 69 sample data-set. A average value were derived. Note that the time scale on these two graphs is different.

    Graph 5 : Capture time of 69 samples during good reception. Single-frame capture speed averaged to 0.03986 second per frame or 25.01 fps.

    Graph 6 : Capture time of 69 samples during bad reception.Single-frame capture speed averaged to 0.07836 second per frame or 12.76 fps.

    Appendix D : Autoline speed plots

    Four instances of Autolines.

    Diagram 15: Four instances of Autolines, car about to go into a corner.

    Graph 7 : Execution time for four instances of Autoline, when car is idle.

    Graph 8 : Execution time for four instances of Autolines, when track is laid out as a full circle. note: time scale is 10 times larger than Graph 7









    © 2000 D K L TUNG
    version 1.1

    << CCC | Final Thesis | >>