Data used in computer programs needs to be organized for efficient storage and retrieval. Thus, a critical design decision in the development of any program is the selection of an appropriate data structure and its implementation. This chapter describes arrays, queues, linked lists, binary trees, and hash tables and their implementations on the PC. For further study of these data structures as well as more complex structures, consult the references.
A data structure is a collection of data records along with a mechanism for insertion, retrieval, and deletion of records. Each record consists of several fields, including the information fields that contain the actual data. Other fields of a record may contain links, which hold addresses of other records.
The array is a fundamental data structure. Most programming languages provide for arrays as primitive structures. In a linear array, each record is associated with a single integer called its subscript or index. The records in a linear array X of n records are customarily denoted:
For example, the array Names given below contains 6 records: Alice, Bobby, Cindy, David, Ellen, and Frank.
In a d-dimensional array, each record is associated with a vector of d integer subscripts. For example, the 2-dimensional array Alias given below has the same set of records as Names, but they are organized differently:
Alias is called a 2 x 3 array, for it has 2 rows and 3 columns.
On the PC, the records of an array are stored in contiguous words to facilitate the computation of the address corresponding to the subscripts. First consider a linear array Y with n records, each of which contains b bytes of information. We reserve n*b bytes of memory space for Y with the pseudo-op resb:
Y resb n*b
To access this array, an address calculation has to be performed. Let y be the offset of the first byte of Y. With b bytes per record in Y, the address of the first byte of Y(i) is:
Suppose that each record of array Y comprises one word (b=2). Then to copy Y(i) into AX, we use the following:
mov bx, <value of i>
sal bx, 1 ; bx = 2*i
mov ax, [Y+bx] ; fetch entry
A similar, but more complex calculation must be made for d-dimensional arrays. First one decides how to allocate memory for the records. The Fortran convention is to iterate the leftmost subscript first. Thus the elements of the array Alias above would be stored in successive memory words in the following order (memory addresses increase downwards):
Alias(0,0)
Alias(1,0)
Alias(0,1)
Alias(1,1)
Alias(0,2)
Alias(1,2)
For example, assume that each record of an r x s array Z with r rows and s columns has b bytes and that Z is defined by:
Z resb r*s*b
If z is the offset of Z, then the offset of the first byte of record Z(i,j) is given by: