Like most assemblers, each NASM source line contains (unless it is a macro, a preprocessor directive or an assembler directive: see chapter 4 and chapter 5) some combination of the four fields
label: instruction operands ; comment
As usual, most of these fields are optional; the presence or absence of any combination of a label, an instruction and a comment is allowed. Of course, the operand field is either required or forbidden by the presence and nature of the instruction field.
NASM places no restrictions on white space within a line: labels may have
white space before them, or instructions may have no space before them, or
anything. The colon after a label is also optional. (Note that this means that
if you intend to code lodsb alone on a line, and type lodab by accident, then
that's still a valid source line which does nothing but define a label. Running
NASM with the command-line option -w+orphan-labels will cause it to warn you if
you define a label alone on a line without a trailing colon.)
Valid characters in labels are letters, numbers, _, $, #, @, ~, . (period),
and ?. The only characters which may be used as the first character of an
identifier are letters, . (with special meaning: see section 3.8), _ and ?. An
identifier may also be prefixed with a $ to indicate that it is intended to be
read as an identifier and not a reserved word; thus, if some other module you
are linking with defines a symbol called eax, you can refer to $eax in NASM
code to distinguish the symbol from the register.
The instruction field may contain any machine instruction: Pentium and P6
instructions, FPU instructions, MMX instructions and even undocumented
instructions are all supported. The instruction may be prefixed by LOCK, REP,
REPE/REPZ or REPNE/REPNZ, in the usual way. Explicit address-size and
operand-size prefixes A16, A32, O16 and O32 are provided - one example of their
use is given in chapter 9. You can also use the name of a segment register as
an instruction prefix: coding es mov [bx],ax is equivalent to coding
mov [es:bx],ax. We recommend the latter syntax, since it is consistent with
other syntactic features of the language, but for instructions such as LODSB,
which has no operands and yet can require a segment override, there is no clean
syntactic way to proceed apart from es lodsb.
An instruction is not required to use a prefix: prefixes such as CS, A32, LOCK
or REPE can appear on a line by themselves, and NASM will just generate the
prefix bytes.
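As a rough illustration of the layout and prefix rules above, the fragment below combines the four fields in several ways. The label names (start, copy) are invented for the example, and the BITS 16 directive is assumed only so that the 16-bit addressing forms are valid.

```nasm
        bits 16

start:  mov     ax,[es:bx]      ; label, instruction, operands and comment
        es lodsb                ; segment register used as a prefix on LODSB
copy    rep movsb               ; label without a colon, then a REP prefix
        cs                      ; a prefix alone on a line emits only the prefix byte
        lodsb                   ; ...so this LODSB carries the CS override
        ret                     ; instruction and comment only
```

Only the first line uses every field; the others show that a label, an operand list and a comment can each be left out.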
In addition to actual machine instructions, NASM also supports a number of pseudo-instructions, described in section 3.2.
Instruction operands may take a number of forms: they can be registers,
described simply by the register name (e.g. ax, bp, ebx, cr0: NASM does not use
the gas-style syntax in which register names must be prefixed by a % sign), or
they can be effective addresses (see section 3.3), constants (section 3.4) or
expressions (section 3.5).
For floating-point instructions, NASM accepts a wide range of syntaxes: you can use two-operand forms like MASM supports, or you can use NASM's native single-operand forms in most cases. Details of all forms of each supported instruction are given in appendix A. For example, you can code:
fadd st1 ; this sets st0 := st0 + st1
fadd st0,st1 ; so does this
fadd st1,st0 ; this sets st1 := st1 + st0
fadd to st1 ; so does this
Almost any floating-point instruction that references memory must use one of
the prefixes DWORD, QWORD or TWORD to indicate what size of memory operand it
refers to.
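A minimal sketch of the size prefixes on floating-point memory operands follows. The labels f32val, f64val and f80val are hypothetical names chosen for this example.

```nasm
        bits 32

        fld     dword [f32val]  ; load a 4-byte single-precision value
        fadd    qword [f64val]  ; add an 8-byte double-precision value
        fstp    tword [f80val]  ; store a 10-byte extended-precision result
        ret

f32val: dd      1.5             ; single precision
f64val: dq      2.25            ; double precision
f80val: dt      0.0             ; extended precision
```

Without the prefix, the assembler has no way to tell from [f32val] alone which of the three operand sizes is meant.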
Pseudo-instructions are things which, though not real x86 machine
instructions, are used in the instruction field anyway because that's the most
convenient place to put them. The current pseudo-instructions are DB, DW, DD,
DQ and DT, their uninitialised counterparts RESB, RESW, RESD, RESQ and REST,
the INCBIN command, the EQU command, and the TIMES prefix.
DB and friends: Declaring Initialised Data
DB, DW, DD, DQ and DT are used, much as in MASM, to declare initialised
data in the output file. They can be invoked in a wide range of ways:
db 0x55 ; just the byte 0x55
db 0x55,0x56,0x57 ; three bytes in succession
db 'a',0x55 ; character constants are OK
db 'hello',13,10,'$' ; so are string constants
dw 0x1234 ; 0x34 0x12
dw 'a' ; 0x61 0x00 (it's just a number)
dw 'ab' ; 0x61 0x62 (character constant)
dw 'abc' ; 0x61 0x62 0x63 0x00 (string)
dd 0x12345678 ; 0x78 0x56 0x34 0x12
dd 1.234567e20 ; floating-point constant
dq 1.234567e20 ; double-precision float
dt 1.234567e20 ; extended-precision float
DQ and DT do not accept numeric constants or string constants as operands.
RESB and friends: Declaring Uninitialised Data
RESB, RESW, RESD, RESQ and REST are designed to be used in the BSS section of a
module: they declare uninitialised storage space. Each takes a single
operand, which is the number of bytes, words, doublewords or whatever to
reserve. As stated in section 2.2.7, NASM does not support the MASM/TASM syntax
of reserving uninitialised space by writing DW ? or similar things: this is
what it does instead. The operand to a RESB-type pseudo-instruction is a
critical expression: see section 3.7.
For example:
buffer: resb 64 ; reserve 64 bytes
wordvar: resw 1 ; reserve a word
realarray resq 10 ; array of ten reals
INCBIN: Including External Binary Files
INCBIN is borrowed from the old Amiga assembler DevPac: it includes a binary
file verbatim into the output file. This can be handy for (for example)
including graphics and sound data directly into a game executable file. It can
be called in one of these three ways:
incbin "file.dat" ; include the whole file
incbin "file.dat",1024 ; skip the first 1024 bytes
incbin "file.dat",1024,512 ; skip the first 1024, and
; actually include at most 512
EQU: Defining Constants
EQU defines a symbol to a given constant value: when EQU is used, the source
line must contain a label. The action of EQU is to define the given label
name to the value of its (only) operand. This definition is absolute, and cannot
change later. So, for example,
message db 'hello, world'
msglen equ $-message
defines msglen to be the constant 12. msglen may not then be redefined later.
This is not a preprocessor definition either: the value of msglen is evaluated
once, using the value of $ (see section 3.5 for an explanation of $) at the
point of definition, rather than being evaluated wherever it is referenced and
using the value of $ at the point of reference. Note that the operand to an EQU
is also a critical expression (section 3.7).
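The difference between an EQU value and an expression evaluated where it is used can be sketched as follows. The labels copylen and herelen are hypothetical additions to the example above.

```nasm
message db      'hello, world'
msglen  equ     $-message       ; evaluated once, here: 12
        db      '!!!'           ; $ moves on by three bytes
copylen dw      msglen          ; still assembles the value 12
herelen dw      $-message       ; evaluated at this line: 17 (12 + 3 + 2)
```

The reference to msglen picks up the value fixed at its definition, while the last line uses the assembly position of the line it appears on.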
TIMES: Repeating Instructions or Data
The TIMES prefix causes the instruction to be assembled multiple times. This is
partly present as NASM's equivalent of the DUP syntax supported by
MASM-compatible assemblers, in that you can code
zerobuf: times 64 db 0
or similar things; but TIMES is more versatile than that. The argument to TIMES
is not just a numeric constant, but a numeric expression, so you can do things
like
buffer: db 'hello, world'
times 64-$+buffer db ' '
which will store exactly enough spaces to make the total length of buffer up to
64. Finally, TIMES can be applied to ordinary instructions, so you can code
trivial unrolled loops in it:
times 100 movsb
Note that there is no effective difference between times 100 resb 1 and
resb 100, except that the latter will be assembled about 100 times faster due
to the internal structure of the assembler.
The operand to TIMES, like that of EQU and those of RESB and friends, is a
critical expression (section 3.7).
Note also that TIMES can't be applied to macros: the reason for this is that
TIMES is processed after the macro phase, which allows the argument to TIMES to
contain expressions such as 64-$+buffer as above. To repeat more than one line
of code, or a complex macro, use the preprocessor %rep directive.
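As a small sketch of the preprocessor alternative, the block below repeats a two-instruction sequence four times, which TIMES cannot do on its own. The choice of instructions is only an illustration.

```nasm
        bits 16

%rep 4
        shl     ax,1            ; shift the low word left by one bit
        rcl     dx,1            ; and carry the top bit into the high word
%endrep
```

After preprocessing, the assembler sees eight instructions, which together shift the 32-bit value in DX:AX left by four bits.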
An effective address is any operand to an instruction which references memory. Effective addresses, in NASM, have a very simple syntax: they consist of an expression evaluating to the desired address, enclosed in square brackets. For example:
wordvar dw 123
mov ax,[wordvar]
mov ax,[wordvar+1]
mov ax,[es:wordvar+bx]
Anything not conforming to this simple system is not a valid memory reference
in NASM, for example es:wordvar[bx].
More complicated effective addresses, such as those involving more than one register, work in exactly the same way:
mov eax,[ebx*2+ecx+offset]
mov ax,[bp+di+8]
NASM is capable of doing algebra on these effective addresses, so that things which don't necessarily look legal are perfectly all right:
mov eax,[ebx*5] ; assembles as [ebx*4+ebx]
mov eax,[label1*2-label2] ; ie [label1+(label1-label2)]
Some forms of effective address have more than one assembled form; in most
such cases NASM will generate the smallest form it can. For example, there are
distinct assembled forms for the 32-bit effective addresses [eax*2+0] and
[eax+eax], and NASM will generally generate the latter on the grounds that the
former requires four bytes to store a zero offset.
NASM has a hinting mechanism which will cause [eax+ebx] and [ebx+eax] to
generate different opcodes; this is occasionally useful because [esi+ebp] and
[ebp+esi] have different default segment registers.
However, you can force NASM to generate an effective address in a particular
form by the use of the keywords BYTE, WORD, DWORD and NOSPLIT. If you need
[eax+3] to be assembled using a double-word offset field instead of the one
byte NASM will normally generate, you can code [dword eax+3]. Similarly, you
can force NASM to use a byte offset for a small value which it hasn't seen on
the first pass (see section 3.7 for an example of such a code fragment) by
using [byte eax+offset]. As special cases, [byte eax] will code [eax+0] with a
byte offset of zero, and [dword eax] will code it with a double-word offset of
zero. The normal form, [eax], will be coded with no offset field.
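The offset-size keywords can be compared side by side. The byte sequences in the comments are the standard 32-bit ModRM encodings of each form, written out here as an aid; they are not taken from the text above.

```nasm
        bits 32

        mov     eax,[eax]           ; no offset field:            8B 00
        mov     eax,[byte eax]      ; byte offset of zero:        8B 40 00
        mov     eax,[dword eax]     ; double-word offset of zero: 8B 80 00 00 00 00
        mov     eax,[eax+3]         ; byte offset chosen by NASM: 8B 40 03
        mov     eax,[dword eax+3]   ; forced double-word offset:  8B 80 03 00 00 00
```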
Similarly, NASM will split [eax*2] into [eax+eax] because that allows the
offset field to be absent and space to be saved; in fact, it will also split
[eax*2+offset] into [eax+eax+offset]. You can combat this behaviour by the use
of the NOSPLIT keyword: [nosplit eax*2] will force [eax*2+0] to be generated
literally.
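A short sketch of the two forms, assuming 32-bit code:

```nasm
        bits 32

        mov     eax,[eax*2]             ; split: assembled as [eax+eax], no offset field
        mov     eax,[nosplit eax*2]     ; literal: [eax*2+0], with a four-byte zero offset
```

Both instructions read from the same address; only the encoding, and therefore the instruction length, differs.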
NASM understands four different types of constant: numeric, character, string and floating-point.
A numeric constant is simply a number. NASM allows you to specify numbers in
a variety of number bases, in a variety of ways: you can suffix H, Q and B for
hex, octal and binary, or you can prefix 0x for hex in the style of C, or you
can prefix $ for hex in the style of Borland Pascal. Note, though, that the $
prefix does double duty as a prefix on identifiers (see section 3.1), so a hex
number prefixed with a $ sign must have a digit after the $ rather than a
letter.
Some examples:
mov ax,100 ; decimal
mov ax,0a2h ; hex
mov ax,$0a2 ; hex again: the 0 is required
mov ax,0xa2 ; hex yet again
mov ax,777q ; octal
mov ax,10010011b ; binary
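To make the equivalence between the notations concrete, each line in the sketch below assembles the same byte, 0xA2 (162 decimal).

```nasm
        db      162             ; decimal
        db      0a2h            ; hex with the H suffix
        db      $0a2            ; hex with the $ prefix: a digit must follow the $
        db      0xa2            ; hex with the 0x prefix
        db      242q            ; octal: 2*64 + 4*8 + 2
        db      10100010b       ; binary: 128 + 32 + 2
```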
A character constant consists of up to four characters enclosed in either single or double quotes. The type of quote makes no difference to NASM, except of course that surrounding the constant with single quotes allows double quotes to appear within it and vice versa.
A character constant with more than one character will be arranged with little-endian order in mind: if you code
mov eax,'abcd'
then the constant generated is not 0x61626364, but 0x64636261, so that if you
were then to store the value into memory, it would read abcd rather than dcba.
This is also the sense of character constants understood by the Pentium's CPUID
instruction (see section A.22).
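The byte ordering can be checked with a small sketch. The first two instructions load the same immediate, and the two data lines emit the same four bytes.

```nasm
        bits 32

        mov     eax,'abcd'      ; character constant
        mov     eax,0x64636261  ; the same immediate written as a number
        dd      'abcd'          ; bytes in memory: 0x61 0x62 0x63 0x64
        db      'a','b','c','d' ; the same four bytes, one at a time
```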
String constants are only acceptable to some pseudo-instructions, namely the
DB family and INCBIN.
A string constant looks like a character constant, only longer. It is treated as a concatenation of character constants of the maximum size for the pseudo-instruction it appears in. So the following are equivalent:
db 'hello' ; string constant
db 'h','e','l','l','o' ; equivalent character constants
And the following are also equivalent:
dd 'ninechars' ; doubleword string constant
dd 'nine','char','s' ; becomes three doublewords
db 'ninechars',0,0,0 ; and really looks like this
Note that when used as an operand to db, a constant like 'ab' is treated as a
string constant despite being short enough to be a character constant, because
otherwise db 'ab' would have the same effect as db 'a', which would be silly.
Similarly, three-character or four-character constants are treated as strings
when they are operands to dw.
Floating-point constants are acceptable only as arguments to DD, DQ and DT.
They are expressed in the traditional form: digits, then a period, then
optionally more digits, then optionally an E followed by an exponent. The
period is mandatory, so that NASM can distinguish between dd 1, which declares
an integer constant, and dd 1.0 which declares a floating-point constant.
Some examples:
dd 1.2 ; an easy one
dq 1.e10 ; 10,000,000,000
dq 1.e+10 ; synonymous with 1.e10
dq 1.e-10 ; 0.000 000 000 1
dt 3.141592653589793238462 ; pi
NASM cannot do compile-time arithmetic on floating-point constants. This is because NASM is designed to be portable - although it always generates code to run on x86 processors, the assembler itself can run on any system with an ANSI C compiler. Therefore, the assembler cannot guarantee the presence of a floating-point unit capable of handling the Intel number formats, and so for NASM to be able to do floating arithmetic it would have to include its own complete set of floating-point routines, which would significantly increase the size of the assembler for very little benefit.
Expressions in NASM are similar in syntax to those in C.
NASM does not guarantee the size of the integers used to evaluate expressions at compile time: since NASM can compile and run on 64-bit systems quite happily, don't assume that expressions are evaluated in 32-bit registers and so try to make deliberate use of integer overflow. It might not always work. The only thing NASM will guarantee is what's guaranteed by ANSI C: you always have at least 32 bits to work in.
NASM supports two special tokens in expressions, allowing calculations to
involve the current assembly position: the $ and $$ tokens. $ evaluates to the
assembly position at the beginning of the line containing the expression; so
you can code an infinite loop using JMP $. $$ evaluates to the beginning of the
current section; so you can tell how far into the section you are by using
($-$$).
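A minimal sketch of both tokens follows. The names start, here and hang are invented for the example.

```nasm
        section .text

start:  nop
        nop
here    equ     $-$$            ; 2: two one-byte NOPs precede this line
        dw      $-$$            ; also 2, stored as a word
hang:   jmp     $               ; jumps to its own first byte, forever
```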
The arithmetic operators provided by NASM are listed here, in increasing order of precedence.
|: Bitwise OR Operator
The | operator gives a bitwise OR, exactly as performed by the OR machine
instruction. Bitwise OR is the lowest-priority arithmetic operator supported by
NASM.
^: Bitwise XOR Operator
^ provides the bitwise XOR operation.
&: Bitwise AND Operator
& provides the bitwise AND operation.
<< and >>: Bit Shift Operators
<< gives a bit-shift to the left, just as it does in C. So 5<<3 evaluates to 5
times 8, or 40. >> gives a bit-shift to the right; in NASM, such a shift is
always unsigned, so that the bits shifted in from the left-hand end are filled
with zero rather than a sign-extension of the previous highest bit.
+ and -: Addition and Subtraction Operators
The + and - operators do perfectly ordinary addition and subtraction.
*, /, //, % and %%: Multiplication and Division
* is the multiplication operator. / and // are both division operators: / is
unsigned division and // is signed division. Similarly, % and %% provide
unsigned and signed modulo operators respectively.
NASM, like ANSI C, provides no guarantees about the sensible operation of the signed modulo operator.
Since the % character is used extensively by the macro preprocessor, you should
ensure that both the signed and unsigned modulo operators are followed by white
space wherever they appear.
+, -, ~ and SEG: Unary Operators
The highest-priority operators in NASM's expression grammar are those which
only apply to one argument. - negates its operand, + does nothing (it's
provided for symmetry with -), ~ computes the one's complement of its operand,
and SEG provides the segment address of its operand (explained in more detail
in section 3.6).
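The precedence order listed above can be sketched with a few constant expressions. The values in the comments follow from the stated precedence; the operands are arbitrary.

```nasm
        dw      1+2*3           ; 7: * binds tighter than +
        dw      1<<2+1          ; 8: + binds tighter than <<, so this is 1<<3
        dw      1|2&4           ; 1: & binds tighter than |, so this is 1|(2&4)
        dw      5<<3            ; 40
        dw      17 % 5          ; 2: unsigned modulo, with white space after the %
        dw      -8 // 2         ; -4: unary minus first, then signed division
        dw      ~0 & 0xff       ; 255: unary ~ is applied before &
```

Parentheses can be used wherever the default grouping is not the one intended.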
SEG and WRT
When writing large 16-bit programs, which must be split into multiple
segments, it is often necessary to be able to refer to the segment part of the
address of a symbol. NASM supports the SEG operator to perform this function.
The SEG operator returns the preferred segment base of a symbol, defined as the
segment base relative to which the offset of the symbol makes sense. So the
code
mov ax,seg symbol
mov es,ax
mov bx,symbol
will load ES:BX with a valid pointer to the symbol symbol.
Things can be more complex than this: since 16-bit segments and groups may
overlap, you might occasionally want to refer to some symbol using a different
segment base from the preferred one. NASM lets you do this, by the use of the
WRT (With Reference To) keyword. So you can do things like
mov ax,weird_seg ; weird_seg is a segment base
mov es,ax
mov bx,symbol wrt weird_seg
to load ES:BX with a different, but functionally equivalent, pointer to the
symbol symbol.
NASM supports far (inter-segment) calls and jumps by means of the syntax
call segment:offset, where segment and offset both represent immediate values.
So to call a far procedure, you could code either of
call (seg procedure):procedure
call weird_seg:(procedure wrt weird_seg)
(The parentheses are included for clarity, to show the intended parsing of the above instructions. They are not necessary in practice.)
NASM supports the syntax call far procedure as a synonym for the first of the
above usages. JMP works identically to CALL in these examples.
To declare a far pointer to a data item in a data segment, you must code
dw symbol, seg symbol
NASM supports no convenient synonym for this, though you can always invent one using the macro processor.
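One way such a synonym might be written with the macro processor is sketched below. The macro name farptr, the segment name data and the labels msg and msgptr are all hypothetical, and the fragment assumes an output format that supports segment references, such as obj.

```nasm
%macro  farptr 1                ; hypothetical macro taking one symbol name
        dw      %1, seg %1      ; offset first, then segment
%endmacro

        segment data

msg:    db      'hello'
msgptr:
        farptr  msg             ; expands to: dw msg, seg msg
```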
A limitation of NASM is that it is a two-pass assembler; unlike TASM and others, it will always do exactly two assembly passes. Therefore it is unable to cope with source files that are complex enough to require three or more passes.
The first pass is used to determine the size of all the assembled code and data, so that the second pass, when generating all the code, knows all the symbol addresses the code refers to. So one thing NASM can't handle is code whose size depends on the value of a symbol declared after the code in question. For example,
times (label-$) db 0
label: db 'Where am I?'
The argument to TIMES in this case could equally legally evaluate to anything
at all; NASM will reject this example because it cannot tell the size of the
TIMES line when it first sees it. It will just as firmly reject the slightly
paradoxical code
times (label-$+1) db 0
label: db 'NOW where am I?'
in which any value for the TIMES argument is by definition wrong!
NASM rejects these examples by means of a concept called a critical
expression, which is defined to be an expression whose value is required to
be computable in the first pass, and which must therefore depend only on symbols
defined before it. The argument to the TIMES prefix is a critical expression;
for the same reason, the arguments to the RESB family of pseudo-instructions
are also critical expressions.
Critical expressions can crop up in other contexts as well: consider the following code.
mov ax,symbol1
symbol1 equ symbol2
symbol2:
On the first pass, NASM cannot determine the value of symbol1, because symbol1
is defined to be equal to symbol2 which NASM hasn't seen yet. On the second
pass, therefore, when it encounters the line mov ax,symbol1, it is unable to
generate the code for it because it still doesn't know the value of symbol1. On
the next line, it would see the EQU again and be able to determine the value of
symbol1, but by then it would be too late.
NASM avoids this problem by defining the right-hand side of an EQU statement to
be a critical expression, so the definition of symbol1 would be rejected in the
first pass.
There is a related issue involving forward references: consider this code fragment.
mov eax,[ebx+offset]
offset equ 10
NASM, on pass one, must calculate the size of the instruction
mov eax,[ebx+offset] without knowing the value of offset. It has no way of
knowing that offset is small enough to fit into a one-byte offset field and
that it could therefore get away with generating a shorter form of the
effective-address encoding; for all it knows, in pass one, offset could be a
symbol in the code segment, and it might need the full four-byte form. So it is
forced to compute the size of the instruction to accommodate a four-byte
address part. In pass two, having made this decision, it is now forced to
honour it and keep the instruction large, so the code generated in this case is
not as small as it could have been. This problem can be solved by defining
offset before using it, or by forcing byte size in the effective address by
coding [byte ebx+offset].
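Both remedies can be sketched together, under the two-pass behaviour described above. The symbol name disp is a hypothetical stand-in for the offset in the text.

```nasm
        bits 32

        mov     eax,[ebx+disp]          ; forward reference: size chosen before disp is known
        mov     eax,[byte ebx+disp]     ; byte offset forced despite the forward reference
disp    equ     10
        mov     eax,[ebx+disp]          ; disp already defined: byte offset can be chosen
```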
NASM gives special treatment to symbols beginning with a period. A label beginning with a single period is treated as a local label, which means that it is associated with the previous non-local label. So, for example:
label1 ; some code
.loop ; some more code
jne .loop
ret
label2 ; some code
.loop ; some more code
jne .loop
ret
In the above code fragment, each JNE instruction jumps to the line immediately
before it, because the two definitions of .loop are kept separate by virtue of
each being associated with the previous non-local label.
This form of local label handling is borrowed from the old Amiga assembler
DevPac; however, NASM goes one step further, in allowing access to local labels
from other parts of the code. This is achieved by means of defining a
local label in terms of the previous non-local label: the first definition of
.loop above is really defining a symbol called label1.loop, and the second
defines a symbol called label2.loop. So, if you really needed to, you could
write
label3 ; some more code
; and some more
jmp label1.loop
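A more concrete sketch of the same mechanism, with real instructions in place of the placeholder comments, might look like this. The routine names clear_words, copy_bytes and restart are invented for the example.

```nasm
        bits 16

clear_words:                    ; non-local label
        xor     ax,ax
.loop:  stosw                   ; really clear_words.loop
        loop    .loop
        ret

copy_bytes:                     ; another non-local label
.loop:  movsb                   ; really copy_bytes.loop
        loop    .loop
        ret

restart:
        jmp     clear_words.loop ; the full name reaches a local label from elsewhere
```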
Sometimes it is useful - in a macro, for instance - to be able to define a
label which can be referenced from anywhere but which doesn't interfere with the
normal local-label mechanism. Such a label can't be non-local because it would
interfere with subsequent definitions of, and references to, local labels; and
it can't be local because the macro that defined it wouldn't know the label's
full name. NASM therefore introduces a third type of label, which is probably
only useful in macro definitions: if a label begins with the special prefix
..@, then it does nothing to the local label mechanism. So you could code
label1: ; a non-local label
.local: ; this is really label1.local
..@foo: ; this is a special symbol
label2: ; another non-local label
.local: ; this is really label2.local
jmp ..@foo ; this will jump three lines up
NASM has the capacity to define other special symbols beginning with a double
period: for example, ..start is used to specify the entry point in the obj
output format (see section 6.2.6).