A loop is a programming building block that repeats certain instructions until some predefined condition holds (or, equivalently, while some condition is still met). Many loops simply repeat for a fixed number of iterations, but others are more complicated. Most processor architectures provide instructions designed to facilitate loop control. Here we treat various methods for writing loops on the x86 processor family.
Writing loop code is most easily shown by example. Here we use the simple task of clearing a block of memory. In C, this would be the following for loop:
for(i=0; i<100; i++)
list[i] = 0;
In the following, assume that the memory block has been defined elsewhere, with ListBegin as the address of its first byte and ListEnd as the address one past its last byte (so the last location to be cleared is the one just before ListEnd), e.g.:
ListBegin resb 100 ; reserve 100 bytes
ListEnd equ $ ; define as last-byte-address+1
In these examples, BX is used as a pointer into the memory block.
Here is an example of the standard version of a loop, similar to the C version:
clrmem1:
mov bx, ListBegin ; loop setup
.loop:
mov byte [bx], 0 ; loop action
inc bx ; advance
cmp bx, ListEnd ; termination test
jb .loop ; recycle in loop
This short, fast version illustrates the four elements of a loop: 1) setup; 2) loop action; 3) loop advance; and 4) termination test. As written, however, it always executes the loop action at least once, because the termination test is performed after the action. For an empty list (ListBegin = ListEnd), it therefore clears one byte, the one at ListEnd, which lies outside the block.
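As a rough analogy, clrmem1 behaves like a C do-while loop. This is an illustrative sketch, not part of the assembly examples. The function name clrmem1 and the begin/end parameters are hypothetical stand-ins for the label and for ListBegin/ListEnd. The sketch assumes both pointers lie within the same array.

```c
/* Hypothetical C model of clrmem1: a post-test (do-while) loop.
   begin/end stand in for ListBegin/ListEnd (end = last byte + 1).
   Assumes begin and end point into the same array; if begin == end is
   one past the end of that array, the first write is out of bounds. */
void clrmem1(unsigned char *begin, unsigned char *end)
{
    unsigned char *p = begin;   /* loop setup */
    do {
        *p = 0;                 /* loop action: runs before any test */
        p++;                    /* advance */
    } while (p < end);          /* termination test, after the action */
}
```

Calling it with begin == end still writes *begin. This corresponds to the stray byte cleared at ListEnd.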
A safer version is:
clrmem2:
mov bx, ListBegin ; loop setup
.loop:
cmp bx, ListEnd ; termination test
jae .next
mov byte [bx], 0 ; loop action
inc bx ; advance
jmp short .loop ; recycle
.next: ...
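Here, at the cost of one more instruction, the loop works correctly when zero iterations are called for. That extra instruction sits inside the loop, however, so every pass now executes both the JAE test and the JMP back. To speed up the loop itself, one can keep the structure of the first example but enter the loop at the termination test: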
clrmem3:
mov bx, ListBegin ; loop setup
jmp .lptest ; check for termination first
.loop:
mov byte [bx], 0 ; loop action
inc bx ; advance
.lptest:
cmp bx, ListEnd ; termination test
jb .loop ; recycle in loop
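For comparison, here is a hypothetical C sketch of clrmem2 and clrmem3. The C function names simply reuse the assembly labels. Both versions perform the termination test before the first action, so an empty block is left untouched. The second mirrors the initial jump to the bottom test with a goto.

```c
/* Hypothetical model of clrmem2: test at the top, unconditional jump back. */
void clrmem2(unsigned char *begin, unsigned char *end)
{
    unsigned char *p = begin;      /* loop setup */
    for (;;) {
        if (p >= end)              /* termination test (jae .next) */
            break;
        *p = 0;                    /* loop action */
        p++;                       /* advance, then jump back (jmp .loop) */
    }
}

/* Hypothetical model of clrmem3: bottom-tested loop entered at its test. */
void clrmem3(unsigned char *begin, unsigned char *end)
{
    unsigned char *p = begin;      /* loop setup */
    goto lptest;                   /* check for termination first */
loop:
    *p = 0;                        /* loop action */
    p++;                           /* advance */
lptest:
    if (p < end)                   /* termination test (jb .loop) */
        goto loop;
}
```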
Use of indexed addressing creates a shorter loop sequence:
clrmem4:
mov bx, ListEnd-ListBegin-1 ; BX = # bytes - 1
.loop:
mov byte [ListBegin+bx], 0 ; loop action
dec bx ; advance (dec here)
jge .loop ; (arithmetic) termination test
Note that the block is now cleared in reverse order, so ListBegin is cleared last. The conditional jump treats the result of DEC as a signed number. This arithmetic termination test therefore works only while the block is shorter than 2^15 (32768) bytes, i.e., as long as ListEnd-ListBegin is positive when interpreted as a signed 16-bit value.
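The following is a hypothetical C model of clrmem4. It assumes 1 <= n < 2^15 and uses int16_t to mimic the signed 16-bit BX register. The index counts down from n - 1, and the loop repeats while the index is still non-negative, so list[0] (ListBegin) is cleared last.

```c
#include <stdint.h>

/* Hypothetical model of clrmem4: backward indexed loop with a signed test.
   Assumes 1 <= n < 32768, so n - 1 fits in a non-negative int16_t. */
void clrmem4(unsigned char *list, uint16_t n)
{
    int16_t bx = (int16_t)(n - 1);   /* BX = # bytes - 1 */
    do {
        list[bx] = 0;                /* loop action */
        bx--;                        /* advance (decrement) */
    } while (bx >= 0);               /* signed test: stop once bx reaches -1 */
}
```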
The instruction LOOP label is useful when the number of iterations can be determined before the loop begins. LOOP decrements CX by 1 and, if the result is not zero, jumps to label. It does not modify the flags. This gives the following form for our example task:
clrmem5:
mov cx, ListEnd-ListBegin ; CX = # bytes
xor bx, bx ; index counts up in BX (from 0)
.loop:
mov byte [ListBegin+bx], 0 ; loop action
inc bx ; advance index
loop .loop ; dec cx and jump if cx not 0
Note: on many modern processors, the two-instruction sequence
dec cx
jnz .loop
is faster than
loop .loop
although, unlike LOOP, DEC alters the flags.
This loop could be even shorter if it were also possible to index through CX rather than BX. Unfortunately, this is not possible in the 16-bit instruction set (in the 32-bit instruction set, ECX can be used as an index). Because the index advances by only 1 per iteration, the MOV instruction must be a byte move. If CX might be zero on entry, guard the loop with JCXZ. Otherwise LOOP decrements CX to 0FFFFh and runs 65536 times. There are also variations on the LOOP instruction that test the zero flag left by the loop action in addition to counting in CX: see Section B.4.98 for further information on LOOPZ and LOOPNZ.
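The following is a hypothetical C model of clrmem5, with 16-bit unsigned variables standing in for CX and BX. Because LOOP decrements CX before testing it, the body always runs at least once. The sketch assumes count >= 1, as in the example task.

```c
#include <stdint.h>

/* Hypothetical model of clrmem5: CX counts iterations down while BX
   indexes up from 0. Assumes count >= 1. */
void clrmem5(unsigned char *list, uint16_t count)
{
    uint16_t cx = count;             /* CX = # bytes */
    uint16_t bx = 0;                 /* index counts up from 0 */
    do {
        list[bx] = 0;                /* loop action */
        bx++;                        /* advance index */
        cx--;                        /* LOOP: decrement CX first ... */
    } while (cx != 0);               /* ... then repeat while CX != 0 */
}
```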
In addition to the examples shown, there are many other address-stepping and testing forms, whose usefulness depends on the particular operands involved. The string instructions (see Section 4.6) also provide specialized operations (move, compare, scan, load, and store) on memory blocks of words or bytes.