| MUSO | About FIT2022 | Assessment | Contacts | Laboratories | Lectures | Resources | Timetables | Tutorials |
| Last modified: 20070820:160827/fixed missing '//' in svnse2 URL and double ats | FIT2022 AJH-2007-16 |
Objectives and Outcomes | Introduction | The Disk Subsystem | The File Subsystem | Contiguous Allocation | Indexed vs. Contiguous Allocation
IMPORTANT NOTE: THE SVN SERVER HAS CHANGED TO https://svnse2.infotech.monash.edu.au/svn/FIT2022-labs
At the end of this lab session, you should:
Your demonstrator may ask you questions about these points, and will only give you a "satisfactory" for the lab if you can respond correctly.
You will doubtless be aware of modern file systems, as realised in typical operating systems. File systems provide resource management of storage space, using a model that is directed at the user, rather than the underlying system characteristics. A user can store information in files, which are managed by the operating system into read and write requests onto the disk subsystem.
In this laboratory, we build a model file system, including the underlying disk subsystem: a disk with n blocks (or sectors) of k bytes (here n = 1024 and k = 256).
In the second part, the file system is modelled after a flat single-level directory system, which includes directories and data files, using index linked allocation.
The user interface will include routines to perform file open, close, read, write, as well as file creation and deletion. You can download the two files generated by this documents: disk_sys.py and file_sys.py to use in the following exercises.
We shall build our disk subsystem as a homogeneous array of blocks, each of k bytes in length. Get a copy of disk_sys.py, and follow the code as you read the following sections.
"disk_sys.py" 3.1 =There are two main parts to the disk subsystem: a class that describes the disk data block structure, and a class that describes the disk operation itself.
First we define the block class. Each instance of a block (a block object) represents one disk block.
<block class definition 3.2> =Compile in the block size as a fixed constant, 256 bytes per block.
<block: class definitions 3.3> =Initialization function for block instantiation. Set the number of bytes, and initialize the array to contain exactly this number of zeros. Note the notation n*[0], which in Python creates a list of n copies of 0 (NB, you should be careful to distinguish this from n copies of the single element list [0]).
<block: class definitions 3.4> =Function to return the byte at the given address n. The address is checked to ensure that it is within the range of byte addresses within a block. Return a negative value if adr is not in range. (In general, we use negative return values to indicate error. In practice, we might encode more meaning into this negative value, but here we keep it simple and just use -1.)
<block: class definitions 3.5> =Function to set the byte at the given address n to the value b. Again, the address is checked to ensure that it is within the range of byte addresses within a block.
<block: class definitions 3.6> =See below.
<block: class definitions 3.7> =Two routines to get and set 16 bit quantities from a block. n is assumed to be a byte address, and does not point to the last byte in a block (or an address check error is forced).
Warning:This exercise should be completed only after all other basic sections have been completed, and are working satisfactorily. Revise getword/setword to return an error if n>=nbytes-1.
Now read on:
Generate a printable representation of a block. There are three parts to this representation:
00: 0000 0000 0000 0000 0000 0000 0000 0000 ................10: 0000 0000 0000 002a 0000 0000 0000 0000 .......*........20: 0000 0000 0000 0000 0000 0048 656c 6c6f ...........Hello30: 0000 0000 0000 0000 0000 0000 0000 0000 ................40: 0000 0000 0000 0000 0000 0000 0000 0000 ................50: 0000 0000 0000 0000 0000 0000 0000 0000 ................60: 0000 0000 0000 0000 0000 0000 0000 0000 ................70: 0000 0000 0000 0000 0000 0000 0000 0000 ................80: 0000 0000 0000 0000 0000 0000 0000 0000 ................90: 0000 0000 0000 0000 0000 0000 0000 0000 ................a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
Run the test program test1.py that imports the disk subsystem, creates a block, and writes a various sequence of bytes to the block, then prints the block. Read through the code to make sure you follow what is happening.
"test1.py" 3.9 =Try modifying the code to generate your own short string (the variable printable might help you, particularly if you know that there are 32 (or 0x20) characters per line).
Define our disk system to contain 1024 disk blocks, as indicated by the constant DiskSize.
<disk: class definitions 3.11> =Create a simulated disk with DiskSize blocks in it, initialized to zero. You need a loop to initialize all the blocks, as each block must be separately instantiated.
<disk: class definitions 3.12> =Address a block on the disk at specified block number and return (a reference to the) block as result. Note that because we return a reference to the block, there is no need to provide an equivalent setblockref routine.
<disk: class definitions 3.13> =Create a printable representation of a specified block number.
Use the test program test2.py to create and test a disk object. Make sure you understand what is happening. Hint: python uses reference semantics. What does this mean? Your tutor can explain if you are not sure
"test2.py" 3.14 =What sort of interface to the file system do we want? That is, what routines are we going to define that provide file system functionality? Here is a first pass at defining those routines:
A directory defines what files are accessible within the file system. Generally, the directory stores information - such as name, location, size and type - for all files on a particular disk partition. Some of the operations that are to be performed on a directory include searching for a file, creating a file, deleting a file, listing a directory, renaming a file and traversing the file system.
For this prac, we will assume that a directory is a sequence of entries, each (conceptually) containing two key pieces of information: the identity of a file (synonymous for the moment with file name), and the location of the file. These need to be mapped into byte sequences for storing in disk blocks.
The flexibility in the implementation of files is enabled by the direct-access nature of disks. With so many files stored on the same disk, the main issue to be concerned with is how to allocate space to these files so that disk space is utilized effectively and files can be accessed quickly.
We propose several models (the reader may like to think of others) for the file and disk information:
File Name Length
Directory Structures
Allocation Method
The first part of this section will be individual work, we will implement a file system with fixed length names, flat single level directory structure, with indexed allocation. In the next part, as a group, you will be required to implement a file system also with fixed length names with the same flat single level directory structure, but with the contiguous allocation method.
Since there are 1024 disk blocks on a disk, we require 10 bits to represent a block pointer. We will be generous and allocate 16! This means that we can store 128 block addresses in a single block. (A block address will need two bytes anyway. We could squash block pointers up, and get 256*8/10 = 204.8 adrs per block, but it is hardly worth the trouble.)
We must also decide on the format of a directory entry. Given that there are a fixed number of characters to represent the directory name, we can choose say 12, with 4 bytes left over to make a 16 byte entry, one line of the block dump format. Such choices are arbitrary, and demonstrate the variability possible in file system design.
But beware! Choices like this often come back to haunt the designer, and are usually the reason why file systems must subsequently be modified as technology advances. It is the reason why, for example, Windows FAT16 file systems are limited to 2Gb in total size (see for example, Converting your hard disk to the FAT32 file systems and Scalability and Performance in Modern File Systems).
So our directory structure will look like this, and we (somewhat arbitrarily) choose to store this at block 0 on the disk. In practice, block 0 would be used for bootstrap purposes, and the root of the file system would be located at block 1 (or higher).
00: rrrr bbbb NNNN NNNN NNNN NNNN NNNN NNNN ....File Name 00 10: rrrr bbbb NNNN NNNN NNNN NNNN NNNN NNNN ....File Name 01 20: rrrr bbbb NNNN NNNN NNNN NNNN NNNN NNNN ....File Name 02 30: rrrr bbbb NNNN NNNN NNNN NNNN NNNN NNNN ....File Name 03
where rrrr is reserved for future use, bbbb is a two byte block address, and NN...NN is a 12 character file name (with blank fill). With a block size of 256 bytes, this restricts us to just 16 directory entries!!
Since block 0 is reserved for the directory table, we use a block address of 0 in the directory entry to indicate that the corresponding file does not exist, and that the directory entry is free for use. If the block address is zero, the value of the filename field is irrelevant. We assume nothing about the ordering of allocation of files to directory entries, using a first available strategy for allocation.
Now we must tackle another issue. How do we allocate disk blocks to particular files? We will need to keep track of which blocks are in use, and which ones are available or free. To do this, we implement a free block bitmap, which has a single bit allocated for each block in the disk, which is 0 if the block is free, and a 1 if in use.
<file: housekeeping routines 4.2> =allocateblock reserves block numbered n to be in use. We read the appropriate byte from the bitmap, and "or" in the mask, which has a 1 shifted into the appropriate position. Note that the shift has to shift the mask into the high position (7 bit shift) when the bit number is zero, counting from the msb of the byte.
On a sheet of paper, write the values (in binary and hexadecimal) of the key variables nbyte, nbit, mask, and clearmask for the following calls to allocateblock:
allocateblock(3) allocateblock(7) allocateblock(25)
freeblock frees block numbered n to be no longer in use. Note how it performs the complementary operation to allocateblock, using an and instead of an or (Why?).
<file: housekeeping routines 4.4> =getfreeblock returns the block adr of a free block in the disk subsystem.
Note that as block 1 is used to store the disk free block bitmap, it must itself be reserved in that bitmap.
Now we can write the file initialization routines. There are two routines. The first is for object initialization, which creates the underlying disk structure. The second routine models the nkfs routine of Unix, setting up the initial directory block and free block bitmap. It must ensure that at least every block address in the directory is zeroed, and that the first two blocks in the disk subsystem are marked as allocated.
<file: initialization routine 4.5> =Creating the file system requires that we have a disk subsystem upon which it is built.
<file: initialization routine 4.6> =Note that dirblock is a reference to the directory block (block 0), and that freebitmap is a reference to the free block bitmap (block 1). We just use the references as a convenient way of referring permanently to these blocks. Note that these two blocks are themselves reserved in the free bitmap.
Creating a file requires firstly finding a free directory entry, saving the file name, and allocating space for the file (index table).
<file: find a free directory entry 4.8> =Scan through the directory, looking for zero index block pointers. If none exist, we are in trouble, and the file directory is full. so no more files can be created.
<file: record the file name 4.9> =Write in the file name at the free location.
<file: create an index block and record it 4.10> =We choose to implement fixed length names, flat single level directory, with indexed allocation. This means that the file location is defined by a pointer to a single disk block containing the addresses of all the constituent blocks. We must grab a free block for this index table before any file operations can occur.
We now have developed enough of the system to try creating a few files. Download the file test3.py and run it. Extend it by creating additional files. Make sure you understand what happens to the directory block and free space block.
"test3.py" 4.11 =What does it mean to open a file? It is a signal to the operating system that you wish to interact with a particular file, and that data structures for operating on the file should be created, ready to use the file. The main data structure is called a file descriptor, which records information such as the address of the index table (called the inode in Unix terminology), and the current access position in the file.
We will follow the Unix model, and regard a file as a contiguous stream or sequence of bytes. This is of course, the logical model: the physical reality is that the bytes are scattered over whatever blocks have been allocated to the file.
We model the file descriptor as a Python list of [indexblock,fileposition], returned by the open routine:
<file: open routine 4.12> =Opening a file requires that we search the directory to find the corresponding file name. We call a convenient housekeeping routine to do the work, which returns a pointer to the index block or inode if the file is present in the directory, or -1 if it isn't.
<file: housekeeping routines 4.13> =scan through the directory block, this time looking at non-zero index pointers. If we find none, or no match, then report failure.
<file: check if dir entry valid and match: break out if so 4.14> =Collect filename from this entry, and compare to target. Break out of enclosing search loop if there is a match.
Try opening one of the files you created in exercise 6, and also a file that you haven't created. Is this a sensible response? What might a more elaborate design do?
How big is a file? One of the questions we haven't addressed yet is how much space to allocate to a file, and to do this requires some knowledge of how big the file may get.
Rather than reserve space a priori (Latin: meaning "before the fact or event"), we allocate no space initially to the file, other than its directory entry and index table. Then, as things are written to the file, we expand it dynamically to grow as necessary. This has the advantage (with the indexed organisation) that areas never written never use up space. A sparse file will only require a few disk blocks, even if it is conceptually megabytes in size.
Nevertheless, we still need to keep track of the logical size of a file, so that attempts to access beyond its size can be trapped. To this end, we record the maximum address ever written, known as the end of file. Where is this recorded? The most appropriate place is in the inode or index table. Hence we reserve the first two bytes of the inode to store the file size or end of file pointer. Two bytes limits us to files that are at most 65535 bytes in length, but since that represents a quarter of our disk system, we won't worry too much about that!
<file: write routine 4.15> =We work out where the data is to be written by computing a block number bn and offset within that block. This formed from the position field (from the file descriptor) by dividing it by the block size to get the block number, and using the remainder as the offset (see lecture slide 11.9). We update the block with the data (remember the reference semantics!.
<file: get data block, allocate if necessary 4.16> =As described above, the index table is extended as necessary. If the block number pointer is zero, the required block has never been allocated, so we must do that first and update the index table. Then we can grab the necessary data block.
<file: update file size and file descriptor 4.17> =We keep track of the current position in the file and update the file descriptor accordingly. In addition, if this write extends the size of the file, we must update the end of file pointer. Since this is restricted to two bytes, there is an upper bound on the file size.
<file: write routine 4.18> =This routine simply extends the "user friendliness" of the interface. It takes a string of data and pumps it into the file at the current position. One might write a more efficient way of doing this in a real system!
Download test4.py and run it. Add some more writestrs and experiment to see what data gets written to disk and where.
"test4.py" 4.19 =Code up the routine readbyte, and test it using the block2str routine provided in disk.py.
You can also modify the routine or extend it.
Write the readstr routine yourself and test it. In modifying and writing the readstr routine yourself, you will need to address the design questions of what happens on End-Of-File, and how long a string should be read. (How long is a piece of string?)
In practice, closing a file means releasing any resources used by the requesting program. These usually the file descriptors used for accessing the file: here the file descriptor is represented by a single Python list, and we do not need to do anything specific to release that. Nevertheless, we define a close routine for future compatibility!
<file: close routine 4.22> =Write a delete file interface. Using a test file similar to test4.py, delete one or two files, and look particularly at what happens to the directory block and free space block.
The above code fragments are based on the index allocation method. Your group will now be required to re-implement the file subsystem file_sys.py based on an fixed file length name with the same directory structure but with the contiguous allocation method for the same underlying disk subsystem.
Since you have already gone through the first part of the development for a simulated file system with fixed length file names and index allocation method, now, you need to apply what you have learned to implement a file system with the contiguous allocation method.
Remember, as part of your group work, you will need to:
Once you have set up the SVN repository and determined the work distribution, you can get started on your work.
Discuss within your group and run these experiments.
Explore what happens when you try and fill the simulated disk: do you hit the file size limit first, or the limit on the number of index blocks? Can you work this out from the code first? Then run an experiment to check your answer. You may need to run the experiment for your tutor as well if asked.
In the preceding sections, we have examined the directory management options and block-allocation, specifically the single-level indexed directory structure with both the indexed and contiguous allocation methods. Now, considering the two allocation methods implemented, we evaluate the disk performance and efficiency using the two differing allocation methods used.
For this evaluation, we consider two issues:
For the purpose of this evaluation and simplicity (and since we have not specified the type of disk used), we will opt to ignore the seek time (the time required to position the read-write head to a location)
Given that there are 1024 blocks to a cylinder and the rotational latency (the time for the proper block to rotate under the head) is 0.0000167 seconds per block with a transfer time (the time required to transfer data to or from the disk) for one byte of 0.00000083 seconds, we can evaluate the performances based on the number of disk accesses and the time it takes to read a file.
Now, your group task is to implement a simple routine that will calculate the total time taken to read a file, using both the indexed and contiguous allocation methods. You will need to use the readbyte() routine to obtain the number of disk accesses and the number of bytes read. Based on this number, you can then calculate the total time taken to read a file. Remember, you will need to consider the distances between the blocks as well as the number of bytes read within a block.
| 20070820:160827 | 1.0.2 | ajh | fixed missing '//' in svnse2 URL and double ats |
| 20070816:123840 | 1.0.1 | ajh | added advanced exercise and corrected typos |
| 20070809:154843 | 1.0.0 | ajh | initial version |
| This page maintained by John Hurst. Copyright Monash University Copyright Policy |
| ||
Generated at 20090705:0417 from an XML file modified on 20071117:1905 | |||