Introduction

The purpose of this prac is to give you a working knowledge of processes and programs, and an appreciation of the differences and relations between them. You will be asked to write a basic shell in UNIX as an example of how an operating system makes programs and processes available to programmers. At the end of the prac you will have a working shell which is based on the same ideas as any other shell, and so having written your own you will have an understanding in the use of any others.

The prac is divided into sections, and in each section you will be asked to develop your program a little further. It is important you keep copies of the state of your code from the end of each section, as your demonstrator will mark every stage, not just the final version.

You must complete this prac to pass it.

Since many of the functions introduced in this prac will probably be new to you, this document explains in general terms what they do. Neverthless, it will be necessary for you to look them up in the man pages to find the appropriate #include's that you need to use, along with the types and order of the function arguments. There are instructions for printing man pages, along with other information, here.

execve() - A program is not a function

To begin with, what we'll aim to do is write a program that will load in a different program and execute it as many times as we like. For this, write a loop, and at the start of the loop give the user the option of exiting so the user can choose, each time, whether they want to let the loop go again. Make sure that you print a message when presenting the user with the choice. After that choice, we will execute the new program.

The standard UNIX call to execute a program is execve(), and it is best if you refer to the man page on it. For the moment, make the arguments to the program just the name of the program itself, and make the environment empty. It is wrong to use the arguments from main() since the arguments to the programs you run in your shell will be completely different. Specify an absolute path to the program, and hardcode this into your shell. Do not read the program name from input yet. An easy program to begin with is /bin/ls. You may find when looking through the man pages that there are variations to execve() that do extra things; in this prac you are not allowed to use them, as you must learn how to do some of the things those variants do for you.

Once you get this working, you may be surprised to find that your program doesn't loop. This is due to the fact that what execve() really does is it completely replaces your program with the one that is being loaded in. The process which was running your program is still the same process, but it is now running a different program, and when that program finishes, the process ends also. In this way, running the new program is nothing like a function call since it doesn't return to your original program.

(For any seriously curious hackers, there is a way to call a program like a function, but muck around with it only once you've finished the prac. The are no marks for this. The hints are here.)

fork() - Creating a new process

So how do shells overcome the problem of running a new program without wiping themselves out? The answer is to run the new program in a new process.

The way to create a new process in UNIX with fork(). The name refers to the kind of fork which is a `fork in the road', because before the fork there is one flow of execution (the original process), whereas after it there are two (the original process and the new one). The analogy doesn't entirely work, however, because there is a difference in the two flows - one of them is new - and this is a difference we want to know, because it is the new process which will load in and execute the new program.

At this point you might wonder what program the new process would be running to start with, since there is no such thing as a process which is not running a program at all. fork() does not take any arguments, so there is now way of specifying a program, but on the other hand we need the new process to do exactly what we want, so it can't be running just anything. The only solution is that the new process must be running the same program that the original process was, so that we have control over it, and this is exactly what happens.

When the new process appears, however, it doesn't start the old program from the beginning again. This would make no sense because we would not be able to control new processes; every new one would just repeat the beginning. Instead, the new process starts executing immediately after the fork() which created it returns. In other words, that function call really is the place where the flow of execution forks, and the new process is (nearly) an exact copy of the original one at that point.

Finally, to be able to properly control the two processes we now have, the program that they run needs to be able to tell the new process, which in UNIX is called the child, from the original one. which is called the parent. Since fork() was responsible for creating the new process, it is reasonable to expect fork() to provide this information, and it does that with the return value. When fork() returns to the child, the return value is 0, and so is very easy to test for to direct the child to behave differently to the parent. The parent, on the other hand, gets a return value which is the process ID of the child (a positive number), but you don't need to worry about this yet. The only other possibility is a negative return value, which happens if the process creation fails, and so of course is only returned to what would have been the parent process. The man page for fork() has more detail.

Modify your program so that each time through the loop it calls fork() after asking whether the user wants to exit, and does the same execve() as before, but only in the newly created child.

When you have your program working, run it and go through the loop a couple of times (watch out for the exit prompt possibly mixing with the output of your child program), but when you're done, do not exit - just leave your program sitting there waiting for input. In a different window type
ps jx
This will list all the processes you have running and give, amongst other things, the parent process ID (PPID) and the process ID (PID) for each of them. You should see that your program has a number of children listed and that they're just sitting there, even though they've finished. If you exit your program now they will vanish, but they shouldn't have been sitting there in the first place.

waitpid() - Waiting for the children

If you took the suggestion made earlier, the program you've been using to execve() has been /bin/ls, which probably executes very quickly. Imagine, however, that the children were executing a very slow program, or one that asked for input when it was run (you can try this if you like). It is possible, if not likely, that your shell program would loop and be ready to create another child before the previous child had finished (this might even have happened with /bin/ls), and in that case the prompt for the next loop would be mixed with the output of the previous one. This is not how shells behave - they usually only return you to the prompt once the current program has finished running.

This is what waitpid() is for. When called by the parent, it will cause the parent to pause until the particular process being waited for has finished. If we did this, however, there is another possible problem. Imagine that the child was very quick and ended before the parent got up to the waitpid(). What happens once the parent actually gets to wait? Will it wait forever since the child is already gone? To solve this, UNIX forces a finishing child to wait until the parent waits for it, as bizarre as that sounds. That is why you got a list of children at the end of the previous section - they were waiting for their parent to wait for them. Such child processes that have finished but are being forced to wait like that are called `zombies'.

Modify your program so that it waits for its children straight after creating them. Check the man page for the meaning of the other arguments for waitpid(). You will see that waitpid() has options, but you shouldn't use any of them. You must also specify the particular PID of the child you are waiting for, even though waitpid() can accept special values to wait for all children.

Choosing what to execute

What you have so far is, of course, not much like a real shell yet because it always executes the same thing. Modify your program now so that each time through the loop it either offers to exit, or asks for the name of a program to execute. This name is then what is passed to execve(). At this point it is important to add error handling to the execve() call because it is very easy to make a mistake typing in a program name. If you look in the execve() man page you will see that if an error occurs its code is placed in errno, and you should use strerror() to convert that to a readable error for output. You should also exit if the execve() call fails, since this is what the child process should do if it can't execute the chosen program.

Since you will still have to type absolute paths, it is useful to find where some standard UNIX commands are by using which.

Command line arguments

It is the convention with shells that spaces separate arguments. Modify your program so that after reading the input, it splits it up into parts using spaces as the separator. The function strtok() is very good for this kind of thing. The first part will be the program to execute as well as being argument 0 (actually argument 0 is what the program is called in the process table, which can be different), but the remaining parts will only be arguments to the program. Pass the program arguments to execve() correctly and try running some familiar UNIX programs with command line arguments.

Program environment

If you haven't tried running a graphical program inside your shell yet, try it now. The reason it doesn't work is that it doesn't know the address of the graphic display. In a normal shell, if you type
printenv DISPLAY
you will see what it is that the program needed. This isn't the only information in the environment that programs rely on, and usually shells hand on everything about their own environment to any other programs that they run. This is found in the environ variable (which has a man page), and is passed as an argument to execve().

Modify your program to correctly pass the environment, and make sure it works by running some graphical applications.

Path searching

The finishing touch to make your program a basic working shell is to allow commands without absolute paths, i.e. typing just ls instead of /bin/ls. Nothing can change the fact, however, that you still need to know where ls is to run it, and so the only solution is to search for it. So where do you search? The answer, by convention, is in another environment variable called PATH, which you can access with the function getenv().

In a normal shell print the PATH variable like you did for DISPLAY earlier and see what it looks like; it is a list of directories separated by colons. When you do the search you have to go through each of these directories and see if it contains the program that has been asked for. The easiest way to test for the program is with the stat() function, but you can use any file I/O routine you like that will work. The first directory that has the program is the one that you use. If none of them have it, you output an error.

Modify your program to do this path search, but only if the program name does not have a '/' character in it. If that character is present, then at least some part of the path has been given and path search should be skipped. That way you can still run programs by specifying absolute paths, or even with relatives paths such as `./prog' which you probably use for your own programs.

For some final polishing you might like to replace the exit option of the loop with a special case if the normal command line just says the word `exit'. Now your shell should look, and work, like the real thing.