Introduction: Modes of Debugging

The debugger has three basic modes of operation. The first, and arguably the most common, is to analyse a core dump. The second, and almost as common, is to run a program in a controlled manner, stopping at certain points and stepping gradually through instructions. The third, which is not common but can be surprisingly useful, is to attach to a process which is already running, and take control of it.

Analysing a core dump

A core dump is a copy, in a file, of everything a process had in memory at the time the dump was generated. Generation of the dump typically happens if a process crashes with a segmentation fault or bus error. When a crash does occur, one of the first things you usually want to know is where the crash happened, and the command for this is, unsurprisingly, `where'. It gives what is called a stacktrace, which is a display of all the function calls that are on the stack. If the stack has been corrupted, it will not be of much help, but fortunately this doesn't happen often. Also, the stacktrace cannot give line numbers and other information if the program has not been compiled with debugging information.

To see how a stacktrace can be used, download this compressed tarfile and extract it (if you need it, there is a page on tar here). If you type make, you should end up with an executable called slen, a program that uses a recursive algorithm to calculate the length of a string. The program has a bug in it, and will crash. Run the program, and then type something like
gdb slen core
to begin analysis of the core dump (the core file may have a different name). Display a stacktrace, and make sure you understand what it shows. You will notice that it does not show much, because the program has not been compiled with debugging information. The easiest way to fix this is to add -g onto the CFLAGS variable in the Makefile, type make clean and then make again. Run the program again, and use gdb to analyse the core dump again. This time, the stacktrace will show you the function arguments, line numbers, and even the source code.

Notice that the program crashed on a very simple line, and that the error seems to be with the arguments that were passed to this call of the function. To see how this happened, you will need to inspect what was done before the function was called, which is in the function call one level up on the stack. Move into this function call with the command `up' (the direction refers to an increase in frame number, even though the display of the stacktrace is upside down). The result of the command will be to show you exactly what was happening at the time the last function call was made, and the bug is here. To see some of the surrounding code, type `list'. Don't worry if you can't see the bug yet.

A small diversion: ctags

The next step is to edit the source file where the bug is, either because we are ready to fix it or because we need to see more of the surrounding lines. The stacktrace always allows us to identify the function that we want to look at, and with debugging information we have a filename and a line number, but often while developing programs we don't have such specific information. When we want to look at a particular function in a very large project, we may not know which file it is in, or where it is in that file. The solution is an index of functions called a tags file, and the utility ctags is the usual way of generating this.

Type make tags now, and a file called `tags' will be generated. This is the index, and it is worth looking at although you don't need to understand it. It will not be very big since this is a very small program. Once the tags file is generated, most editors can use it to jump straight to the spot in the correct file for a particular function. Exactly how you do this depends on the editor, but using vi it is achieved with
vi -t functionname
Try this now to get to the function we want to debug.

Fixing the bug

Try to identify and fix the bug. A very large hint will be provided by the compiler if you remove the -w option from the CFLAGS variable, type make clean, and then rebuild. The -w option turns off all of GCC's warnings, including the ones it gives by default, and these are very useful. Many other warnings can be very helpful too, and you should look through the warning options that GCC supports. Once the bug is fixed, the program will not crash, but it will probably always give 0 as the length of the string. Fixing this takes us to the next mode of debugging.
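
As an illustration of the kind of mistake that compiler warnings catch, here is a small sketch. It is not taken from the slen source, and it is not meant to reveal the bug there; the helper starts_with is hypothetical. The commented-out call swaps the two arguments, and GCC normally reports this with a diagnostic about making a pointer from an integer without a cast, unless -w silences it. Recent GCC releases may reject such a call as an error rather than a warning.

```c
#include <assert.h>

/* Hypothetical helper, not part of slen: returns 1 if s begins with c. */
int starts_with(const char *s, char c)
{
    assert(s != 0);          /* the caller must pass a real string */
    return s[0] == c;
}

/* Correct call: the string comes first, the character second. */
int abc_starts_with_a(void)
{
    return starts_with("abc", 'a');
}

/*
 * Mistaken call, with the arguments swapped:
 *
 *     starts_with('a', "abc");
 *
 * The types no longer match the prototype. With warnings visible the
 * compiler points at this line; with warnings hidden the program may
 * build anyway and crash when the bogus pointer is dereferenced.
 */
```

The point of the sketch is only that a type mismatch in a call is something the compiler can see and you may not, which is why it pays to read the warnings before reaching for the debugger.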

Running a program

Start up gdb again giving only the program as an argument. We are not interested in the core dump now since the crash has been fixed. You can run the program using the command `run', but it will go to the end, only stopping for input as it normally does. What we need to do instead is stop it before the error and continue step by step. Since we may not know where the error is for the moment, it is best to stop it at the beginning. This is done with the `break' command, which puts a marker on a particular line in the source file for the debugger to stop at. The line can be given as either a line number or a function name, and since we want to stop at the very beginning the easiest is to type
break main

Run the program again, and it will stop at the first line of main(). To proceed line by line, use the command `next', but remember that when the gdb prompt is not there you will be typing input to the program instead. As you go, try using `print' to show the value of a variable, particularly the input string, to make sure it is correct. It should only take a few nexts to finish the program, as main() is not very long, and you should find that the error is not in any of those lines.

What we need is to go through the program in such a way that we move into the function calls when they happen. For this, we use the command `step' instead of `next'. Run the program again, and this time use `step'. Make sure your string is at least 3 characters long, and pay attention to the arguments of the function calls as you step. You will see the recursion reducing the length of the string each time. Also, you will see something wrong with the other argument. This is the bug that you need to fix, and the program should work completely after that.
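
The pattern to watch for while stepping can be sketched as follows. This is a hypothetical stand-in, not the slen source: it assumes a recursive function that takes the remaining string and a running count, which may differ from what slen actually does. The names count_from and string_length are invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a recursive length function (not the slen source).
   s is the part of the string still to be counted; so_far is the number of
   characters already counted. */
size_t count_from(const char *s, size_t so_far)
{
    assert(s != NULL);
    if (*s == '\0')
        return so_far;   /* end of string: the running count is the answer */

    /* Both arguments change on every call: the string gets one character
       shorter and the count gets one larger. */
    return count_from(s + 1, so_far + 1);
}

/* Entry point: start counting at the beginning with a count of zero. */
size_t string_length(const char *s)
{
    return count_from(s, 0);
}
```

Stepping through string_length("abc") in the debugger, each new call of count_from should show s one character shorter and so_far one larger: ("abc", 0), then ("bc", 1), then ("c", 2), then ("", 3). An argument that fails to change from one call to the next is the kind of anomaly to look for.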

Attaching to a process

As a quick demonstration of this ability, run the program normally, not from the debugger, but do not type any input. This will make the process wait. In a different window type ps, note the number of the process that is running your program, and start the debugger with
gdb slen PID
where PID is the process number (Process ID) that you noted. The debugger will load the program, and attach to the process running it, and will then behave as if you had run the program inside the debugger to begin with.
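
If looking the process up with ps is inconvenient, a program can report its own process ID when it starts. The sketch below is one way this might be done, not something slen does; announce_pid is a hypothetical helper, and it relies on the POSIX getpid() call.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not in slen): print this process's ID to the given
   stream and return it, so the number can be given to gdb for attaching. */
pid_t announce_pid(FILE *out)
{
    assert(out != NULL);
    pid_t pid = getpid();
    fprintf(out, "PID %ld: waiting for input\n", (long)pid);
    fflush(out);   /* make sure the line appears before the program blocks */
    return pid;
}
```

Calling announce_pid(stderr) at the top of main(), before the first read from standard input, prints the number to use in place of PID.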