
CSE1303 Computer Science
Semester 3 (summer), 2003
Part B
Prac B3

This prac covers material from lectures B01 to B09 and relates to material in Tutorial B3.

Hyperlinks in this document are underlined; the files themselves can be found on the HTML version of this document on the CSE1303 courseware page.

Background

Occasionally you will find yourself needing to debug a program for which you don't have the source code. This is usually quite difficult, because all you have is the executable machine language program, which is notoriously difficult for humans to read.

In situations like this, it is desirable to convert the machine code back into something more legible. Converting machine language into a high-level language like C is not practical, but machine code can usually be disassembled back into assembly language without too much difficulty. Such assembly language is not as easy to read as hand-written assembly, because comments, spacing and meaningful labels have been lost, but it is far easier to read than raw machine code.

A disassembler is a program which reads a machine language program and outputs an equivalent program in assembly language. It does this by breaking up each instruction into its components (opcode and operands) and rendering them in a human-readable form.

In this prac, you will write a functional disassembler for MIPS programs.

Assessment

For this prac your answers should address the following criteria. Your demonstrator will consider these items when giving your answers a mark.

Preparation (3 marks)


The marks for Preparation will be awarded only if the preparation is completed before the start of the class.

Even if you do not get the marks for preparation, you will need to complete it before going on with the regular questions, because they will refer to its answers.


  1. The following experiment shows how C, assembly language and executable programs are related. Try it out on a Unix machine.
    1. Type the following C program into the file temp.c:
      #include <stdio.h>
      
      int celsius;
      int fahrenheit;
      
      int convert(int C)
      {
          return (C * 9 / 5) + 32;
      }
      
      int main()
      {
          printf("Enter a temperature in Celsius: ");
          scanf("%d", &celsius);
          printf("That comes to %d in Fahrenheit.\n", convert(celsius));
          return 0;
      }
      

      Note that C programs must have the file suffix .c (lowercase); other extensions such as .C (uppercase) and .cpp indicate the language C++.

    2. Ask gcc to translate the program to assembly language, but not to compile it all the way to an executable program:
        gcc -S -o temp.s temp.c
      

      Note that the -S option is in uppercase.

    3. View the contents of the file temp.s. Take note of the following:
      • If you compiled this on an Intel machine, you are seeing Intel assembly language. It is not too different from MIPS assembly language. Note how registers are identified with a % prefix, and that the instructions have different opcodes, though you can probably still guess what some of them mean.
      • Look for the variables celsius, fahrenheit and C in the code. Are they all there?
      • Look for the functions main, convert, printf and scanf. Which ones have labels, and which ones are only referred to in instructions?
      • Look for the constant numbers 9, 5 and 32 in the code.
    4. Now, using gcc, turn your assembly language program into an object code file:
        gcc -c -o temp.o temp.s
      
    5. Object files are not human-readable - view temp.o with a program like less to see what it contains - but there are programs which can read object files and produce human-readable data. Use the objdump program to examine the object file temp.o. Try the following:
      • objdump -t temp.o

        (This shows the symbol table of the program. Amid all the messy names you should see the functions and global variables that are contained in your program.)

      • objdump -d temp.o

        (This shows a disassembly of the program's code, like you'll be doing in this prac. Find the sections corresponding to your program's functions and compare them to the file temp.s.)

    6. Now make an executable (machine code) version of the program with gcc:
        gcc -o temp temp.o
      

      Try running temp.


      You can view the contents of a binary file using od or xxd, which dump a file's contents in hexadecimal:
      xxd -g1 temp
      od -Ax -tx1 temp
      

      Read the manual pages for xxd and od for a complete description of options.


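  2. MIPS instructions come in three general formats: R-format, I-format and J-format. Each instruction is packed into a 32-bit word. In the standard MIPS encoding the fields are laid out as follows (rt, in bits 20 to 16, is the "destination" register of an I-format instruction):

    format    bits 31-26  bits 25-21  bits 20-16  bits 15-11  bits 10-6  bits 5-0
    R-format  opcode      rs          rt          rd          shamt      function
    I-format  opcode      rs          rt          immediate (bits 15-0)
    J-format  opcode      target address (bits 25-0)

    Write a function which extracts the "opcode" field from bits 31 to 26 of an unsigned long parameter (the instruction), returning the value as an integer from 0 to 63.

    Write similar functions to extract each of the other fields shown in the layout above.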
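As a rough sketch of the kind of function the preparation asks for, the following assumes the standard MIPS field positions: rs in bits 25 to 21, rt in bits 20 to 16, rd in bits 15 to 11, shift amount in bits 10 to 6, function in bits 5 to 0, immediate in bits 15 to 0 and jump target in bits 25 to 0. The function names are invented for illustration, and returning the immediate already sign-extended is a design choice rather than a requirement.

```c
/* Field extraction sketch. All function names here are illustrative. */

/* Bits 31-26: opcode, 0 to 63. */
int get_opcode(unsigned long instr)
{
    return (int)((instr >> 26) & 0x3F);
}

/* Bits 25-21: rs register number, 0 to 31. */
int get_rs(unsigned long instr)
{
    return (int)((instr >> 21) & 0x1F);
}

/* Bits 20-16: rt register number, 0 to 31. */
int get_rt(unsigned long instr)
{
    return (int)((instr >> 16) & 0x1F);
}

/* Bits 15-11: rd register number (R-format only), 0 to 31. */
int get_rd(unsigned long instr)
{
    return (int)((instr >> 11) & 0x1F);
}

/* Bits 10-6: shift amount (R-format only), 0 to 31. */
int get_shamt(unsigned long instr)
{
    return (int)((instr >> 6) & 0x1F);
}

/* Bits 5-0: function code (R-format only), 0 to 63. */
int get_function(unsigned long instr)
{
    return (int)(instr & 0x3F);
}

/* Bits 15-0: I-format immediate, sign-extended to -32768..32767. */
int get_immediate(unsigned long instr)
{
    int imm = (int)(instr & 0xFFFF);
    return (imm ^ 0x8000) - 0x8000;
}

/* Bits 25-0: J-format target, counted in words. */
unsigned long get_target(unsigned long instr)
{
    return instr & 0x03FFFFFFUL;
}
```

Masking after the shift means the functions give the same answer whether unsigned long is 32 or 64 bits wide on your machine.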

Question 1 (3 marks)

The files b3-short-sample.bin and b3-long-sample.bin contain machine language executable MIPS programs. Fetch them and store them locally.


Once you have saved them, verify that their length is a multiple of four bytes. If not, the transfer has been corrupted and you will have to fetch them again. You may have to select "save as binary" to save the files correctly.

Write a program which repeatedly reads four bytes from the file, storing them into an unsigned long variable and printing out the contents of that variable in hexadecimal. (The printf format string for printing an 8-digit hex value stored in a long is "%08lx".) It should print each word on its own line.

Note that when your C program opens the file for reading, it should indicate "binary mode", by using a mode of "rb":

handle = fopen("b3-short-sample.bin", "rb");

You will need to be aware of byte order here; the file was created in big-endian format, so if you are using a little-endian machine (such as a PC-compatible computer) you cannot simply read an unsigned long straight from the file into a variable. Instead, you will have to read the value into four unsigned char variables, and construct the 32-bit value using appropriate shift and bitwise-logical operations.
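One way to assemble the word from its bytes is sketched below. It reads the four bytes into a small array rather than four separate variables, and the names pack_big_endian and read_word are invented for this example.

```c
#include <stdio.h>

/* Combine four bytes, most significant first, into one 32-bit value. */
unsigned long pack_big_endian(const unsigned char b[4])
{
    return ((unsigned long)b[0] << 24)
         | ((unsigned long)b[1] << 16)
         | ((unsigned long)b[2] << 8)
         |  (unsigned long)b[3];
}

/* Read one big-endian word from fp into *word.
   Returns 1 on success, 0 at end of file or on a short read. */
int read_word(FILE *fp, unsigned long *word)
{
    unsigned char b[4];

    if (fread(b, 1, 4, fp) != 4)
        return 0;

    *word = pack_big_endian(b);
    return 1;
}
```

Because the value is built with shifts rather than copied from memory, the result does not depend on the byte order of the machine running your program.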

These values are the 32-bit words that make up the instructions of the program.

Modify the program above so that, before each instruction word, it prints the address at which that instruction would be loaded. Recall that the MIPS text segment begins at address 0x00400000, and that instructions are all four bytes long.
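The address calculation can be sketched as follows, assuming the first word in the file is loaded at the start of the text segment. TEXT_BASE, instruction_address and format_line are names made up for this example, and the two-column layout is only one possible output format.

```c
#include <stdio.h>

/* Start of the MIPS text segment. */
#define TEXT_BASE 0x00400000UL

/* Address of the instruction at position index (0, 1, 2, ...) in the file. */
unsigned long instruction_address(unsigned long index)
{
    return TEXT_BASE + 4UL * index;
}

/* Write "address  word" in hexadecimal into out, which holds size bytes. */
void format_line(char *out, size_t size,
                 unsigned long index, unsigned long word)
{
    snprintf(out, size, "%08lx  %08lx", instruction_address(index), word);
}
```

Formatting into a buffer keeps the example easy to check; in your own program a direct printf call in the reading loop does the same job.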

Question 2 (2 marks)

Modify your program so that it also prints out the opcode that each instruction encodes. Do not worry about the operands yet.


You may find the variables defined in mips-format.c helpful here. These variables are arrays of strings, ordered so that an array access will produce the correct instruction:
printf("%s", opcodeTable[8]);    /* print instruction that has opcode field 8 */

If you use these variables, the strings you print will contain some extra cruft involving % symbols; just ignore it for now, as it is dealt with in Question 3.


Remember that if the opcode field is zero, then the instruction is an R-format instruction and the operation is defined in the "function" field in bits 5 to 0. Your program will need to deal with this special case. An array in mips-format.c contains the necessary data.
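The special case might be handled along these lines. The two tables below are tiny stand-ins written for this sketch and hold only a handful of well-known instructions; they are not the arrays from mips-format.c, whose entries also carry operand formats. The names demoOpcodeTable, demoFunctionTable and mnemonic are hypothetical.

```c
#include <stddef.h>

/* Stand-in tables for illustration only; unlisted entries are NULL. */
static const char *const demoOpcodeTable[64] = {
    [2] = "j",     [3] = "jal", [4] = "beq", [5] = "bne",
    [8] = "addi",  [35] = "lw", [43] = "sw"
};

static const char *const demoFunctionTable[64] = {
    [0] = "sll",   [8] = "jr",  [12] = "syscall",
    [32] = "add",  [34] = "sub"
};

/* Return the mnemonic for an instruction word, or "???" if the
   stand-in tables have no entry for it. */
const char *mnemonic(unsigned long instr)
{
    int opcode = (int)((instr >> 26) & 0x3F);
    const char *name;

    if (opcode == 0)
        name = demoFunctionTable[instr & 0x3F];  /* R-format: use bits 5-0 */
    else
        name = demoOpcodeTable[opcode];

    return name != NULL ? name : "???";
}
```

The sparse initialisers use C99 designated-initialiser syntax; with the full tables from mips-format.c only the opcode test in mnemonic would carry over.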

Question 3 (2 marks)

Now it is time to deal with the instruction operands. Modify your program so that it also prints each instruction's operands in a human-readable form. That is, registers should be displayed as $a0, $t3, $sp, etc., and immediate numbers should be printed in decimal.


Again, the file mips-format.c contains much useful information, including the names of registers and the formats for each instruction.

To use these formats, print out the format strings character by character. When you encounter a % character, do not print it, but examine the following character. Depending on what it is, output the appropriate field using the functions that you wrote in the preparation.
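The character-by-character walk could look something like the sketch below. Only the directives %s and %o appear in this handout; %t, %d and %i are guesses made for the example, so check mips-format.c for the letters it really uses. The register names are the standard MIPS ones, and the function render, which writes into a buffer instead of printing, is invented here.

```c
#include <stdio.h>
#include <string.h>

/* Standard MIPS register names, indexed by register number. */
static const char *const regName[32] = {
    "$zero", "$at", "$v0", "$v1", "$a0", "$a1", "$a2", "$a3",
    "$t0",   "$t1", "$t2", "$t3", "$t4", "$t5", "$t6", "$t7",
    "$s0",   "$s1", "$s2", "$s3", "$s4", "$s5", "$s6", "$s7",
    "$t8",   "$t9", "$k0", "$k1", "$gp", "$sp", "$fp", "$ra"
};

/* Expand fmt for the instruction word instr into out (size bytes).
   Assumed directives: %s = rs, %t = rt, %d = rd,
   %i and %o = signed 16-bit immediate printed in decimal. */
void render(const char *fmt, unsigned long instr, char *out, size_t size)
{
    size_t used = 0;

    if (size == 0)
        return;
    out[0] = '\0';

    for (; *fmt != '\0' && used + 1 < size; fmt++) {
        if (*fmt != '%' || fmt[1] == '\0') {
            out[used++] = *fmt;          /* ordinary character: copy it */
            out[used] = '\0';
            continue;
        }

        fmt++;                           /* examine the character after '%' */
        switch (*fmt) {
        case 's':
            snprintf(out + used, size - used, "%s", regName[(instr >> 21) & 0x1F]);
            break;
        case 't':
            snprintf(out + used, size - used, "%s", regName[(instr >> 16) & 0x1F]);
            break;
        case 'd':
            snprintf(out + used, size - used, "%s", regName[(instr >> 11) & 0x1F]);
            break;
        case 'i':
        case 'o': {
            int imm = (int)(instr & 0xFFFF);
            imm = (imm ^ 0x8000) - 0x8000;   /* sign-extend 16 bits */
            snprintf(out + used, size - used, "%d", imm);
            break;
        }
        default:                         /* unknown directive: keep it visible */
            snprintf(out + used, size - used, "%%%c", *fmt);
            break;
        }
        used += strlen(out + used);
    }
}
```

For example, calling render with the format "addi %t, %s, %i" and the word 0x23bdfffc fills the buffer with "addi $sp, $sp, -4".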


You can treat branch offsets and jump target addresses as literal integer values; simply print them like other immediate numbers.

Bonus question (2 bonus marks)

Your disassembler is now able to handle most instructions, but it still does not display some instructions correctly.

  1. Modify your program so that it correctly prints the target addresses of branch and jump instructions. Branches are I-format instructions; they use the signed immediate field as the number of words (not bytes) to jump forward (back if the value is negative) from the branch instruction. Jumps encode their 26-bit target address as an absolute number of words starting from address 0.
  2. Modify your program so that it correctly prints the following instructions, which use a variant of I-format:
    opcode (bits 31-26)    "destination" register (bits 20-16)    format
    000001 (1 decimal)     00000 (0 decimal)                      bltz %s, %o
    000001 (1 decimal)     00001 (1 decimal)                      bgez %s, %o
    000001 (1 decimal)     10000 (16 decimal)                     bltzal %s, %o
    000001 (1 decimal)     10001 (17 decimal)                     bgezal %s, %o
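Both bonus parts can be sketched as below. The branch calculation follows the description in part 1 and counts words from the branch instruction itself; MIPS hardware counts from the instruction after the branch, so if every target you print is off by four bytes, that convention is worth checking. In the standard MIPS encoding, opcode 1 with 00000 in bits 20 to 16 is bltz. The function names are invented for this example.

```c
#include <stddef.h>

/* Target of a branch located at address: the signed immediate is a
   word count relative to the branch instruction (as described above). */
unsigned long branch_target(unsigned long address, unsigned long instr)
{
    long offset = (long)(instr & 0xFFFF);

    offset = (offset ^ 0x8000L) - 0x8000L;      /* sign-extend 16 bits */
    return (address + (unsigned long)(offset * 4)) & 0xFFFFFFFFUL;
}

/* Target of a jump: the 26-bit field is a word count from address 0. */
unsigned long jump_target(unsigned long instr)
{
    return (instr & 0x03FFFFFFUL) << 2;
}

/* Format string for the opcode-1 variants, selected by bits 20-16.
   Returns NULL if instr is not one of the four instructions listed. */
const char *regimm_format(unsigned long instr)
{
    if (((instr >> 26) & 0x3F) != 1)
        return NULL;

    switch ((instr >> 16) & 0x1F) {
    case 0:  return "bltz %s, %o";
    case 1:  return "bgez %s, %o";
    case 16: return "bltzal %s, %o";
    case 17: return "bgezal %s, %o";
    default: return NULL;
    }
}
```

As an example, a branch at address 0x00400010 whose immediate field is -2 gives a target of 0x00400008 under the convention used here.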


Last modified 2002-07-03