4.2 Variations on Loops

A loop is a programming construct that repeats a group of instructions until some predefined condition holds (or, equivalently, while a condition continues to be met). Many loops simply repeat for a fixed number of iterations, but others are more complicated. Processor architectures generally provide instructions designed to facilitate loop control. Here we treat various methods for writing loops on the x86 processor family.

The writing of loop code is most easily shown by example; here we use a simple task of clearing a block of memory. The C version of this would be the following for loop:

for(i=0; i<100; i++)
    list[i] = 0;

Assume in the following that the memory block has been defined elsewhere, with ListBegin as the address of its first byte and ListEnd as the address one past its last byte (so the last location to be cleared is the one just before ListEnd), e.g.:

ListBegin       resb    100     ; reserve 100 bytes
ListEnd         equ     $       ; define as last-byte-address+1

In these examples, BX is used as a pointer into the memory block.

4.2.1 Standard Loops

Here is an example of the standard version of a loop, similar to the C version:

clrmem1:
        mov     bx, ListBegin   ; loop setup
.loop:
        mov     byte [bx], 0    ; loop action
        inc     bx              ; advance
        cmp     bx, ListEnd     ; termination test
        jb      .loop           ; recycle in loop

This short and fast version illustrates the four elements of a loop: 1) setup; 2) loop action; 3) loop advance; and 4) termination test. As written, this version has the disadvantage that it always executes the loop action at least once. This comes about because the end test is performed after the loop action; hence one byte is written even for an empty list (ListBegin = ListEnd).
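
The at-least-once behavior can be seen in a minimal C sketch of the same test-at-bottom structure. The function name clear_test_last and the write counter are illustrative only and do not appear in the assembly version; the pointer p plays the role of BX.

```c
#include <stddef.h>
#include <stdint.h>

/* Test-at-bottom clear, mirroring the structure of clrmem1: the loop
   action runs before the first comparison. Returns the number of
   bytes written (hypothetical counter, for illustration only). */
size_t clear_test_last(uint8_t *begin, const uint8_t *end)
{
    uint8_t *p = begin;            /* loop setup */
    size_t writes = 0;
    do {
        *p = 0;                    /* loop action */
        p++;                       /* advance */
        writes++;
    } while (p < end);             /* termination test */
    return writes;
}
```

Called with begin equal to end, this sketch still writes one byte, which is the hazard described above.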

A safer version is:

clrmem2:
        mov     bx, ListBegin   ; loop setup
.loop:
        cmp     bx, ListEnd     ; termination test
        jae     .next           ; exit when BX reaches ListEnd
        mov     byte [bx], 0    ; loop action
        inc     bx              ; advance
        jmp     short .loop     ; recycle
.next:  ...

Here, at the cost of one more instruction inside the loop, the loop works properly when zero iterations are called for. To speed up the loop itself, one can keep the structure of the first example but enter the loop at the termination test, i.e.,

clrmem3:
        mov     bx, ListBegin   ; loop setup
        jmp     .lptest         ; check for termination first
.loop:
        mov     byte [bx], 0    ; loop action
        inc     bx              ; advance
.lptest:
        cmp     bx, ListEnd     ; termination test
        jb      .loop           ; recycle in loop
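
The two empty-safe forms above can be compared in a short C sketch. This is only an approximation of the control flow under assumed names (clear_test_first, clear_enter_at_test, and the write counters are hypothetical); the goto version is written to follow the label layout of clrmem3 rather than idiomatic C.

```c
#include <stddef.h>
#include <stdint.h>

/* Test at the top, as in clrmem2: compare first, then act. */
size_t clear_test_first(uint8_t *begin, const uint8_t *end)
{
    size_t writes = 0;
    for (uint8_t *p = begin; p < end; p++) {
        *p = 0;                    /* loop action */
        writes++;
    }
    return writes;
}

/* Test at the bottom, but entered at the test, as in clrmem3. */
size_t clear_enter_at_test(uint8_t *begin, const uint8_t *end)
{
    uint8_t *p = begin;            /* loop setup */
    size_t writes = 0;
    goto lptest;                   /* check for termination first */
loop:
    *p = 0;                        /* loop action */
    p++;                           /* advance */
    writes++;
lptest:
    if (p < end)                   /* termination test */
        goto loop;                 /* recycle in loop */
    return writes;
}
```

Both functions perform zero writes when begin equals end. The second keeps only one branch per iteration inside the loop, paying for the extra jump once at entry.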

4.2.2 Indexed Loops

Use of indexed addressing creates a shorter loop sequence:

clrmem4:
        mov     bx, ListEnd-ListBegin-1 ; BX = # bytes - 1
.loop:
        mov     byte [ListBegin+bx], 0  ; loop action
        dec     bx                      ; advance (dec here)
        jge     .loop                   ; (arithmetic) termination test

Note that now the block is cleared in backwards order, i.e., so that ListBegin is cleared last. The arithmetic termination test works here so long as the memory block to be cleared is less than 2^15 (32,768) bytes long, i.e., so long as (ListEnd-ListBegin) is positive when interpreted as a signed 16-bit value.
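
The size limit can be illustrated with a small C sketch that treats the index as a signed 16-bit quantity, the way a signed conditional jump treats BX. The names index_is_nonnegative and clear_indexed_down are hypothetical, and the sketch assumes a block length between 1 and 32,768 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if a 16-bit value has bit 15 clear, i.e. a signed test
   would see it as non-negative. */
int index_is_nonnegative(uint16_t v)
{
    return v < 0x8000u;
}

/* Count-down clear with a signed 16-bit index. The loop continues
   while the index is non-negative, so offset 0 is cleared last.
   Assumes 1 <= len <= 32768 (illustrative precondition). */
size_t clear_indexed_down(uint8_t *list, uint16_t len)
{
    int16_t i = (int16_t)(len - 1);   /* index = # bytes - 1 */
    size_t writes = 0;
    do {
        list[i] = 0;                  /* loop action */
        i--;                          /* advance (downward) */
        writes++;
    } while (i >= 0);                 /* signed termination test */
    return writes;
}
```

An index of 32,768 (0x8000) or more already looks negative to a signed test, so a loop of this form would stop after its first pass on such a block.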

4.2.3 The LOOP instruction

The LOOP label instruction is useful when the number of iterations can be determined before the execution of the loop begins. The LOOP instruction decrements CX by 1 and, if the result is not zero, jumps to label. This results in the following form for our example task:

clrmem5:
        mov     cx, ListEnd-ListBegin   ; CX = # bytes
        xor     bx, bx                  ; index counts up in BX (from 0)
.loop:
        mov     byte [ListBegin+bx], 0  ; loop action
        inc     bx                      ; advance index
        loop    .loop                   ; dec cx and jump if cx not 0

Note: On many modern processors, the two-instruction sequence

        dec     cx
        jnz     .loop

is faster than loop .loop. Unlike LOOP, however, DEC modifies the flags.

The clrmem5 loop could be even shorter if it were possible to index through CX rather than BX, but the 16-bit addressing modes do not allow this (in the 32-bit instruction set, ECX can be used as an index). Note that because the index advances by only 1 per iteration, the MOV instruction must be a byte move. There are also variations of the LOOP instruction that test for a zero result from the loop action in addition to counting in CX: see Section B.4.98 for further information on LOOPZ and LOOPNZ.
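
Because LOOP decrements CX before testing it, the starting count matters. The following C sketch models only the counting behavior described above; loop_iterations is a hypothetical helper, not an instruction or library routine.

```c
#include <stdint.h>

/* Models the counting done by LOOP: decrement the 16-bit counter,
   then branch back if the result is nonzero. Returns how many times
   the loop body would run for a given starting CX. */
uint32_t loop_iterations(uint16_t cx)
{
    uint32_t count = 0;
    do {
        count++;           /* loop body executes */
        cx--;              /* LOOP: CX = CX - 1 (0 wraps to 0xFFFF) */
    } while (cx != 0);     /* LOOP: jump if CX is not zero */
    return count;
}
```

In this model a starting count of 100 gives 100 iterations, while a starting count of 0 gives 65,536, since the counter wraps around before it is tested. Code that may be handed a zero count therefore needs a check before entering the loop, for example with JCXZ.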

In addition to the examples shown, there are many other address stepping and testing forms, the usefulness of which depends on special operand situations. The string instructions (see Section 4.6) also provide specialized operations (move, compare, scan, load, and store) on memory blocks of words or bytes.