Monash University, School of Computer Science and Software Engineering: CSE1303 Part B, Prac B3
This prac covers material from lectures B01 to B09 and relates to material in Tutorial B3.
Hyperlinks in this document are underlined; the files they refer to can be found in the HTML version of this document on the CSE1303 courseware page.
In situations like this, it is desirable to convert the machine code back into something more legible. Converting machine language into a high-level language like C is not practical, but it can usually be disassembled back to the assembly language stage without too much difficulty. Such assembly language is not as easy to read as human-written assembly, because comments, spacing and labels have been removed from it, but it is far easier to read than raw machine code.
A disassembler is a program which reads a machine language program and outputs an equivalent program in assembly language. It does this by breaking up each instruction into its components (opcode and operands) and rendering them in a human-readable form.
In this prac, you will write a functional disassembler for MIPS programs.
For this prac your answers should address the following criteria. Your demonstrator will consider these items when giving your answers a mark.
The marks for Preparation will be awarded only if the preparation is completed before the start of the class.
Even if you do not get the marks for preparation, you will need to complete it before going on with the regular questions, because they will refer to its answers.
#include <stdio.h>
int celsius;
int fahrenheit;
int convert(int C)
{
    return (C * 9 / 5) + 32;
}
int main()
{
    printf("Enter a temperature in Celsius: ");
    scanf("%d", &celsius);
    printf("That comes to %d in Fahrenheit.\n", convert(celsius));
    return 0;
}
Note that C programs must have the file suffix .c (lowercase); other extensions such as .C (uppercase) and .cpp indicate the language C++.
gcc -S -o temp.s temp.c
Note that the -S option is in uppercase.
gcc -c -o temp.o temp.s
(This shows the symbol table of the program. Amid all the messy names you should see the functions and global variables that are contained in your program.)
(This shows a disassembly of the program's code, like you'll be doing in this prac. Find the sections corresponding to your program's functions and compare them to the file temp.s.)
gcc -o temp temp.o
Try running temp.
You can view the contents of a binary file using od or xxd, which dump a file's contents in hexadecimal:
xxd -g1 temp
od -Ax -tx1 temp
Read the manual pages for xxd and od for a complete description of options.
Write a function which extracts the "opcode" field from bits 31 to 26 of an unsigned long parameter (the instruction), returning the value as an integer from 0 to 63.
Write similar functions to extract all of the other possible fields shown in the above diagram:
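As a rough starting point, the extraction functions might look like the sketch below. It assumes the standard MIPS32 field layout (opcode in bits 31-26, rs in 25-21, rt in 20-16, rd in 15-11, shift amount in 10-6, function in 5-0, a 16-bit immediate in 15-0 and a 26-bit jump target in 25-0). The function names are illustrative only and are not prescribed by the prac.

```c
/* Field extractors for a 32-bit MIPS instruction word.
   Names such as get_opcode are hypothetical; choose your own. */

unsigned int get_opcode(unsigned long insn)    /* bits 31-26 */
{
    return (unsigned int)((insn >> 26) & 0x3FUL);
}

unsigned int get_rs(unsigned long insn)        /* bits 25-21 */
{
    return (unsigned int)((insn >> 21) & 0x1FUL);
}

unsigned int get_rt(unsigned long insn)        /* bits 20-16 */
{
    return (unsigned int)((insn >> 16) & 0x1FUL);
}

unsigned int get_rd(unsigned long insn)        /* bits 15-11 */
{
    return (unsigned int)((insn >> 11) & 0x1FUL);
}

unsigned int get_shamt(unsigned long insn)     /* bits 10-6 */
{
    return (unsigned int)((insn >> 6) & 0x1FUL);
}

unsigned int get_funct(unsigned long insn)     /* bits 5-0 */
{
    return (unsigned int)(insn & 0x3FUL);
}

unsigned int get_immediate(unsigned long insn) /* bits 15-0, unsigned */
{
    return (unsigned int)(insn & 0xFFFFUL);
}

unsigned long get_target(unsigned long insn)   /* bits 25-0 */
{
    return insn & 0x03FFFFFFUL;
}
```

Each function shifts the wanted field down to bit 0 and then masks off everything above it, so the result is always in the range the field width allows (0 to 63 for a 6-bit field, 0 to 31 for a 5-bit field).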
The files b3-short-sample.bin and b3-long-sample.bin contain machine language executable MIPS programs. Fetch them and store them locally.
Once you have saved them, verify that their length is a multiple of four bytes. If not, the transfer has been corrupted and you will have to fetch them again. You may have to select "save as binary" to save the files correctly.
Write a program which repeatedly reads four bytes from the file, storing them into an unsigned long variable and printing out the contents of that variable in hexadecimal. (The printf format string for printing an 8-digit hex value stored in a long is "%08lx".) It should print each word on its own line.
Note that when your C program opens the file for reading, it should indicate "binary mode", by using a mode of "rb":
handle = fopen("b3-short-sample.bin", "rb");
You will need to be aware of byte order here; the file was created in big-endian format, so if you are using a little-endian machine (such as a PC-compatible computer) you cannot simply read an unsigned long straight from the file into a variable. Instead, you will have to read the value into four unsigned char variables, and construct the 32-bit value using appropriate shift and bitwise-logical operations.
These values are the 32-bit words that make up the instructions of the program.
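The byte-by-byte approach described above could be sketched as follows. This is only one possible arrangement; the function names be_bytes_to_word, read_word and dump_words are made up for illustration, and the file is assumed to have been opened with mode "rb" already.

```c
#include <stdio.h>

/* Assemble four bytes, most significant first, into one 32-bit value.
   This gives the same result on big-endian and little-endian hosts. */
unsigned long be_bytes_to_word(const unsigned char b[4])
{
    return ((unsigned long)b[0] << 24) |
           ((unsigned long)b[1] << 16) |
           ((unsigned long)b[2] << 8)  |
            (unsigned long)b[3];
}

/* Read the next word from a binary-mode file.
   Returns 1 on success, 0 at end of file or on a short read. */
int read_word(FILE *in, unsigned long *word)
{
    unsigned char b[4];

    if (fread(b, 1, 4, in) != 4)
        return 0;
    *word = be_bytes_to_word(b);
    return 1;
}

/* Print every word in the file on its own line as 8 hex digits. */
void dump_words(FILE *in)
{
    unsigned long word;

    while (read_word(in, &word))
        printf("%08lx\n", word);
}
```

Calling dump_words(handle) after a successful fopen would print one word per line. Casting each byte to unsigned long before shifting avoids shifting a promoted int into its sign bit.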
Modify your program from the previous step so that, before each instruction word, it prints the address at which that instruction would be loaded. Recall that the MIPS text segment begins at address 0x00400000, and that instructions are all four bytes long.
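Since every instruction is four bytes long, the address of the n-th word (counting from zero) is simply the text segment base plus 4n. A small sketch of that calculation follows; the names TEXT_BASE, instruction_address and format_line are hypothetical, and the two-column output layout is just one reasonable choice.

```c
#include <stdio.h>

#define TEXT_BASE 0x00400000UL  /* start of the MIPS text segment */

/* Address of the instruction at position index (0, 1, 2, ...) in the file. */
unsigned long instruction_address(unsigned long index)
{
    return TEXT_BASE + 4UL * index;
}

/* Format one output line, "address  word", into buf.
   Returns the value returned by snprintf. */
int format_line(char *buf, size_t size, unsigned long index, unsigned long word)
{
    return snprintf(buf, size, "%08lx  %08lx",
                    instruction_address(index), word);
}
```

In the reading loop, a counter incremented once per word can be passed as index.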
Modify your program so that it also prints out the opcode that each instruction encodes. Do not worry about the operands yet.
You may find the variables defined in mips-format.c helpful here. These variables are arrays of strings, ordered so that an array access will produce the correct instruction:
printf("%s", opcodeTable[8]); /* print instruction that has opcode field 8 */
If you use these variables, the strings you print will contain some other cruft with % symbols; just ignore this, it will be solved in Question 3.
Remember that if the opcode field is zero, then the instruction is an R-format instruction and the operation is defined in the "function" field in bits 5 to 0. Your program will need to deal with this special case. An array in mips-format.c contains the necessary data.
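The two-level lookup can be illustrated with the sketch below. It does not use the tables from mips-format.c, whose contents are not reproduced here; instead it hard-codes a handful of well-known MIPS encodings in a hypothetical function called mnemonic, purely to show the control flow.

```c
#include <stddef.h>

/* Return the mnemonic for an instruction word, or NULL if this small
   sketch does not cover it. A real solution would index the arrays
   supplied in mips-format.c instead of using switch statements. */
const char *mnemonic(unsigned long insn)
{
    unsigned int opcode = (unsigned int)((insn >> 26) & 0x3FUL);

    if (opcode == 0) {
        /* R-format: the operation is selected by the function field. */
        switch (insn & 0x3FUL) {
        case 0:  return "sll";
        case 8:  return "jr";
        case 32: return "add";
        case 34: return "sub";
        case 42: return "slt";
        default: return NULL;
        }
    }

    /* All other instructions are selected by the opcode field alone. */
    switch (opcode) {
    case 2:  return "j";
    case 3:  return "jal";
    case 4:  return "beq";
    case 5:  return "bne";
    case 8:  return "addi";
    case 35: return "lw";
    case 43: return "sw";
    default: return NULL;
    }
}
```

The important point is the order of the tests: check for opcode zero first, and only consult the function field in that case.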
Now it is time to deal with the instruction operands. Modify your program so that it also prints each instruction's operands in a human-readable form. That is, registers should be displayed as $a0, $t3, $sp, etc., and immediate numbers should be printed in decimal.
Again, the file mips-format.c contains much useful information, including the names of registers and the formats for each instruction.
To use these formats, print out the format strings character by character. When you encounter a % character, do not print it, but examine the following character. Depending on what it is, output the appropriate field using the functions that you wrote in the preparation.
You can treat branch offsets and jump target addresses as literal integer values; simply print them like other immediate numbers.
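The character-by-character expansion might be sketched as below. Only the %s (source register) and %o (offset) codes appear in this document; the codes %t, %d and %i used here for the rt register, rd register and immediate are assumptions, so check mips-format.c for the actual set. The names reg_names, signed_imm and render are also hypothetical, and the sketch writes into a buffer rather than printing directly so that the result is easy to inspect.

```c
#include <stdio.h>

/* Conventional MIPS register names, indexed by register number. */
static const char *const reg_names[32] = {
    "$zero", "$at", "$v0", "$v1", "$a0", "$a1", "$a2", "$a3",
    "$t0",   "$t1", "$t2", "$t3", "$t4", "$t5", "$t6", "$t7",
    "$s0",   "$s1", "$s2", "$s3", "$s4", "$s5", "$s6", "$s7",
    "$t8",   "$t9", "$k0", "$k1", "$gp", "$sp", "$fp", "$ra"
};

/* Sign-extend the low 16 bits of the instruction word. */
static long signed_imm(unsigned long insn)
{
    long v = (long)(insn & 0xFFFFUL);
    return (v & 0x8000L) ? v - 0x10000L : v;
}

/* Expand fmt into out, replacing each %-code with the matching field.
   The codes t, d and i are assumed; s and o come from the prac text. */
void render(char *out, size_t size, const char *fmt, unsigned long insn)
{
    size_t used = 0;

    if (size == 0)
        return;
    out[0] = '\0';

    for (; *fmt != '\0' && used + 1 < size; fmt++) {
        int n;

        if (*fmt != '%' || fmt[1] == '\0') {
            out[used++] = *fmt;          /* ordinary character: copy it */
            out[used] = '\0';
            continue;
        }
        fmt++;                           /* skip the % and look at the code */
        switch (*fmt) {
        case 's': n = snprintf(out + used, size - used, "%s",
                               reg_names[(insn >> 21) & 0x1FUL]); break;
        case 't': n = snprintf(out + used, size - used, "%s",
                               reg_names[(insn >> 16) & 0x1FUL]); break;
        case 'd': n = snprintf(out + used, size - used, "%s",
                               reg_names[(insn >> 11) & 0x1FUL]); break;
        case 'i':
        case 'o': n = snprintf(out + used, size - used, "%ld",
                               signed_imm(insn)); break;
        default:  n = snprintf(out + used, size - used, "%%%c", *fmt); break;
        }
        if (n < 0)
            break;
        used += (size_t)n;
        if (used >= size)                /* output was truncated */
            used = size - 1;
    }
}
```

For example, rendering the format "addi %t, %s, %i" against the word 0x20080005 yields the text addi $t0, $zero, 5. Treating the 16-bit field as signed is a design choice here, made so that negative immediates and backward branch offsets print as negative decimal numbers.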
Your disassembler is now able to handle most instructions, but it still does not display some instructions correctly.
| opcode (bits 31-26) | "destination" register (bits 20-16) | format |
|---|---|---|
| 000001 (1 decimal) | 00000 (0 decimal) | bltz %s, %o |
| 000001 (1 decimal) | 00001 (1 decimal) | bgez %s, %o |
| 000001 (1 decimal) | 10000 (16 decimal) | bltzal %s, %o |
| 000001 (1 decimal) | 10001 (17 decimal) | bgezal %s, %o |
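One way to handle this family is a second special case alongside the R-format one: when the opcode field is 1, select the format using bits 20-16 instead. The sketch below follows the standard MIPS encoding, in which the value 0 in bits 20-16 selects bltz. The function name regimm_format is hypothetical.

```c
#include <stddef.h>

/* For instructions with opcode 1, choose the format string from the
   register field in bits 20-16. Returns NULL if the word does not
   belong to this family or uses a value not listed in the table. */
const char *regimm_format(unsigned long insn)
{
    if (((insn >> 26) & 0x3FUL) != 1)
        return NULL;

    switch ((insn >> 16) & 0x1FUL) {
    case 0:  return "bltz %s, %o";
    case 1:  return "bgez %s, %o";
    case 16: return "bltzal %s, %o";
    case 17: return "bgezal %s, %o";
    default: return NULL;
    }
}
```

A disassembler could call this before the ordinary opcode lookup and fall back to that lookup whenever NULL is returned.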
Last modified 2002-07-03