47342e8f08
(1) smooth language somewhat (2) M68K version (3) begin to sketch out scheme for dynamic config by arch[s] desired, via @c comments. (Non-68K stuff *NOT* deleted!)
\input texinfo
|
|
@c @tex
|
|
@c \special{twoside}
|
|
@c @end tex
|
|
@setfilename as
|
|
@synindex ky cp
|
|
@ifinfo
|
|
This file documents the GNU Assembler "as".
|
|
|
|
Copyright (C) 1991 Free Software Foundation, Inc.
|
|
|
|
Permission is granted to make and distribute verbatim copies of
|
|
this manual provided the copyright notice and this permission notice
|
|
are preserved on all copies.
|
|
|
|
@ignore
|
|
Permission is granted to process this file through Tex and print the
|
|
results, provided the printed document carries copying permission
|
|
notice identical to this one except for the removal of this paragraph
|
|
(this paragraph not being relevant to the printed manual).
|
|
|
|
@end ignore
|
|
Permission is granted to copy and distribute modified versions of this
|
|
manual under the conditions for verbatim copying, provided also that the
|
|
section entitled ``GNU General Public License'' is included exactly as
|
|
in the original, and provided that the entire resulting derived work is
|
|
distributed under the terms of a permission notice identical to this
|
|
one.
|
|
|
|
Permission is granted to copy and distribute translations of this manual
|
|
into another language, under the above conditions for modified versions,
|
|
except that the section entitled ``GNU General Public License'' may be
|
|
included in a translation approved by the author instead of in the
|
|
original English.
|
|
@end ifinfo
|
|
|
|
@setchapternewpage odd
|
|
@settitle as (680x0)
|
|
@titlepage
|
|
@title{as}
|
|
@subtitle{The GNU Assembler}
|
|
@c if m680x0
|
|
@subtitle{(Motorola 680x0 version)}
|
|
@c fi m680x0
|
|
@sp 1
|
|
@subtitle January 1991
|
|
@sp 13
|
|
The Free Software Foundation Inc. thanks The Nice Computer
|
|
Company of Australia for loaning Dean Elsner to write the
|
|
first (Vax) version of @code{as} for Project GNU.
|
|
The proprietors, management and staff of TNCCA thank FSF for
|
|
distracting the boss while they got some work
|
|
done.
|
|
@sp 3
|
|
@author{Dean Elsner, Jay Fenlason & friends}
|
|
@author{revised by Roland Pesch for Cygnus Support}
|
|
@c pesch@cygnus.com
|
|
@page
|
|
@tex
|
|
\def\$#1${{#1}} % Kluge: collect RCS revision info without $...$
|
|
\xdef\manvers{\$Revision$} % For use in headers, footers too
|
|
{\parskip=0pt
|
|
\hfill Cygnus Support\par
|
|
\hfill \manvers\par
|
|
\hfill \TeX{}info \texinfoversion\par
|
|
}
|
|
@end tex
|
|
|
|
@vskip 0pt plus 1filll
|
|
Copyright @copyright{} 1991 Free Software Foundation, Inc.
|
|
|
|
Permission is granted to make and distribute verbatim copies of
|
|
this manual provided the copyright notice and this permission notice
|
|
are preserved on all copies.
|
|
|
|
Permission is granted to copy and distribute modified versions of this
|
|
manual under the conditions for verbatim copying, provided also that the
|
|
section entitled ``GNU General Public License'' is included exactly as
|
|
in the original, and provided that the entire resulting derived work is
|
|
distributed under the terms of a permission notice identical to this
|
|
one.
|
|
|
|
Permission is granted to copy and distribute translations of this manual
|
|
into another language, under the above conditions for modified versions,
|
|
except that the section entitled ``GNU General Public License'' may be
|
|
included in a translation approved by the author instead of in the
|
|
original English.
|
|
@end titlepage
|
|
@page
|
|
|
|
@node top, Syntax, top, top
|
|
@chapter Overview
|
|
|
|
@menu
|
|
* Syntax:: The (machine independent) syntax that assembly language
|
|
files must follow. The machine dependent syntax
|
|
can be found in the machine dependent section of
|
|
the manual for the machine that you are using.
|
|
* Segments:: How to use segments and subsegments, and how the
|
|
assembler and linker will relocate things.
|
|
* Symbols:: How to set up and manipulate symbols.
|
|
* Expressions:: And how the assembler deals with them.
|
|
* Pseudo Ops:: The assorted machine directives that tell the
|
|
assembler exactly what to do with its input.
|
|
* Machine Dependent:: Information specific to each machine.
|
|
@ignore @c pesch@cygnus.com---see comments at nodes ignored
|
|
* Maintenance:: Keeping the assembler running.
|
|
* Retargeting:: Teaching the assembler about new machines.
|
|
@end ignore
|
|
* License:: The GNU General Public License gives you permission
|
|
to redistribute GNU "as" on certain terms; and also
|
|
explains that there is no warranty.
|
|
@end menu
|
|
|
|
This manual is a user guide to the GNU assembler @code{as}.
|
|
@c pesch@cygnus.com:
|
|
@c The following should be conditional on machine config
|
|
@c if 680x0
|
|
This version of the manual describes @code{as} configured to generate
|
|
code for Motorola 680x0 architectures.
|
|
@c fi 680x0
|
|
|
|
@section Command-line Synopsis
|
|
|
|
@example
|
|
as [ -f ] [ -k ] [ -L ] [ -o @var{objfile} ] [ -R ] [ -v ] [ -W ]
|
|
@c if 680x0
|
|
[ -l ] [ -mc68000 | -mc68010 | -mc68020 ]
|
|
@c fi 680x0
|
|
[ -- | @var{files} @dots{} ]
|
|
@end example
|
|
|
|
@table @code
|
|
@item -f
|
|
``fast''---skip preprocessing (assume source is compiler output)
|
|
|
|
@item -k
|
|
Issue warnings when difference tables are altered for long displacements
|
|
|
|
@item -L
|
|
Keep (in symbol table) local symbols, starting with @samp{L}
|
|
|
|
@item -o @var{objfile}
|
|
Name the object-file output from @code{as}
|
|
|
|
@item -R
|
|
Fold data segment into text segment
|
|
|
|
@item -W
|
|
Suppress warning messages
|
|
|
|
@c if 680x0
|
|
@item -l
|
|
Shorten references to undefined symbols, to one word instead of two
|
|
|
|
@item -mc68000 | -mc68010 | -mc68020
|
|
Specify what processor in the 68000 family is the target (default 68020)
|
|
@c fi 680x0
|
|
|
|
@item -- | @var{files} @dots{}
|
|
Source files to assemble, or standard input
|
|
@end table
|
|
|
|
@section Structure of this Manual
|
|
This document is intended to describe what you need to know to use GNU
|
|
@code{as}. We cover the syntax expected in source files, including
|
|
notation for symbols, constants, and expressions; the directives that
|
|
@code{as} understands; and of course how to invoke @code{as}.
|
|
|
|
@c if 680x0
|
|
We also cover special features in the 68000 configuration of @code{as},
|
|
including pseudo-operations.
|
|
@c fi 680x0
|
|
|
|
@ignore
|
|
This document also describes some of the
|
|
machine-dependent features of various flavors of the assembler.
|
|
This document also describes how the assembler works internally, and
|
|
provides some information that may be useful to people attempting to
|
|
port the assembler to another machine.
|
|
@end ignore
|
|
|
|
On the other hand, this manual is @emph{not} intended as an introduction
|
|
to assembly language programming---let alone programming in general! In
|
|
a similar vein, we make no attempt to introduce the machine
|
|
architecture; we do @emph{not} describe the instruction set, standard
|
|
mnemonics, registers or addressing modes that are standard to a
|
|
particular architecture. You may want to consult the manufacturer's
|
|
machine-architecture manual for this information.
|
|
|
|
@c I think this is premature---pesch@cygnus.com, 17jan1991
|
|
@ignore
|
|
Throughout this document, we assume that you are running @dfn{GNU},
|
|
the portable operating system from the @dfn{Free Software
|
|
Foundation, Inc.}. This restricts our attention to certain kinds of
|
|
computer (in particular, the kinds of computers that GNU can run on);
|
|
once this assumption is granted examples and definitions need less
|
|
qualification.
|
|
|
|
@code{as} is part of a team of programs that turn a high-level
|
|
human-readable series of instructions into a low-level
|
|
computer-readable series of instructions. Different versions of
|
|
@code{as} are used for different kinds of computer. In particular,
|
|
at the moment, @code{as} only works for the DEC Vax, the Motorola
|
|
680x0, the Intel 80386, the Sparc, and the National Semiconductor
|
|
32032/32532.
|
|
@end ignore
|
|
|
|
@section Terminology
|
|
@ignore
|
|
@c if all-architectures
|
|
GNU and @code{as} assume the computer that will run the programs it
|
|
assembles will obey these rules.
|
|
|
|
A (memory) @dfn{address} is 32 bits. The lowest address is zero.
|
|
@c fi all-architectures
|
|
@end ignore
|
|
|
|
Certain terms used in computing vary slightly in meaning according to
|
|
context. This is how we use some of them in this manual:
|
|
|
|
The @dfn{contents} of any memory address is one @dfn{byte} of
|
|
exactly 8 bits.
|
|
|
|
A @dfn{word} is 16 bits stored in two bytes of memory. The addresses
|
|
of these bytes differ by exactly 1.
|
|
@ignore
|
|
@c if all-architectures
|
|
Notice that the interpretation of
|
|
the bits in a word and of how to address a word depends on which
|
|
particular computer you are assembling for.
|
|
@c fi all-architectures
|
|
@end ignore
|
|
|
|
A @dfn{long word}, or @dfn{long}, is 32 bits stored in four contiguous
|
|
bytes of memory.
|
|
@ignore
|
|
@c if all-architectures
|
|
Again the interpretation and addressing of those bits is
|
|
machine dependent. For example, National Semiconductor 32x32 computers say
|
|
@emph{double word} where we say @emph{long}.
|
|
@c fi all-architectures
|
|
@end ignore
|
|
|
|
@ignore
|
|
@c if all-architectures
|
|
Numeric quantities are usually @emph{unsigned} or @emph{2's complement}.
|
|
@c fi all-architectures
|
|
@end ignore
|
|
Bytes, words and longs may store numbers. @code{as} manipulates
|
|
integer expressions as 32-bit numbers in 2's complement format.
|
|
When asked to store an integer in a byte or word, the lowest order
|
|
bits are stored.
|
|
@ignore
|
|
@c if all-architectures
|
|
The order of bytes in a word or long in memory is
|
|
determined by what kind of computer will run the assembled program.
|
|
We won't mention this important caveat again.
|
|
@c fi all-architectures
|
|
@end ignore
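
As a sketch of the truncation rule above, the directives below ask for
values too large for the space reserved; going by that rule, only the
low-order bits are kept.  The values are chosen purely for illustration.

@example
 .byte 0x1234       /* stores 0x34 */
 .word 0x12345678   /* stores 0x5678 */
 .long 0x12345678   /* fits: all 32 bits are stored */
@end example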
|
|
|
|
The meaning of these terms has changed over time. Although ``byte''
|
|
used to mean any length of contiguous bits, ``byte'' now pervasively
|
|
means exactly 8 contiguous bits. A ``word'' of 16 bits made sense
|
|
for 16-bit computers. Even on 32-bit computers, ``word'' still
|
|
means 16 bits---to machine language programmers. To many other
|
|
programmers ``word'' means 32 bits; if your habits differ from our
|
|
convention, you may need to pay special attention to this usage.
|
|
@ignore
|
|
@c if 32x32
|
|
Similarly ``long'' means 32 bits: from ``long word''. National
|
|
Semiconductor 32x32 machine language calls a 32-bit number a ``double
|
|
word''.
|
|
@c fi 32x32
|
|
@end ignore
|
|
|
|
The following table shows the terms used with GNU @code{as} for units of
|
|
memory, and contrasts them with normal usage in some other contexts.
|
|
|
|
@iftex
|
|
@sp 1
|
|
@end iftex
|
|
@center @emph{Names for integers of different sizes: some conventions}
|
|
@ifinfo
|
|
@example
|
|
|
|
|
|
length   as      GNU C         680x0         vax           32x32
(bits)

   8     byte    char          byte          byte          byte
  16     word    short (int)   word          word          word
  32     long    long (int)    long(-word)   long(-word)   double-word
  64     quad                                quad(-word)
 128     octa                                octa-word
|
|
|
|
@end example
|
|
@end ifinfo
|
|
@tex
|
|
\halign{\tt\hfil #\quad&\rm #\hfil\quad&\rm #\hfil\quad&\rm
|
|
#\hfil\quad&\rm #\hfil\quad&\rm #\hfil\quad\cr
|
|
{\it length}\cr
|
|
{\it (bits)}&{\bf as}&{\bf GNU C}&{\bf 680x0}&{\bf vax}&{\bf 32x32}\cr
|
|
\noalign{\hrule}
|
|
8 &byte &char &byte &byte &byte \cr
|
|
16 &word &short (int)&word &word &word \cr
|
|
32 &long &long (int) &long(-word)&long(-word)&double-word\cr
|
|
64 &quad & & &quad(-word)\cr
|
|
128 &octa & & &octa-word\cr
|
|
}
|
|
@end tex
|
|
|
|
@section as, the GNU Assembler
|
|
@code{as} is primarily intended to assemble the output of the GNU C
|
|
compiler @code{gcc} for use by the linker @code{ld}. Nevertheless,
|
|
@code{as} tries to assemble correctly everything that the native
|
|
assembler would; any exceptions are documented explicitly
|
|
(@pxref{Machine Dependent}). This doesn't necessarily mean @code{as}
|
|
will use the same syntax as another assembler for the same architecture;
|
|
for example, we know of several incompatible versions of 680x0 assembly
|
|
language syntax.
|
|
|
|
GNU @code{as} is really a family of assemblers. If you use (or have
|
|
used) GNU @code{as} on another architecture, you should find a fairly
|
|
similar environment. Each version has much in common with the others,
|
|
including object file formats, most assembler directives (often called
|
|
@dfn{pseudo-ops}) and assembler syntax.
|
|
|
|
Unlike older assemblers, @code{as} tries to assemble a source program in
|
|
one pass of the source file. This has a subtle impact on the @kbd{.org}
|
|
directive (@pxref{Org}).
|
|
|
|
@section Command Line Options
|
|
@example
|
|
as [ options @dots{} ] [ file1 @dots{} ]
|
|
@end example
|
|
|
|
After the program name @code{as}, the command line may contain
|
|
options and file names. Options may be in any order, and may be
|
|
before, after, or between file names. The order of file names is
|
|
significant.
|
|
|
|
@subsection Options
|
|
|
|
@file{--} (two hyphens) by itself names the standard input file
|
|
explicitly, as one of the files for @code{as} to assemble.
|
|
|
|
Except for @samp{--} any command line argument that begins with a
|
|
hyphen (@samp{-}) is an option. Each option changes the behavior of
|
|
@code{as}. No option changes the way another option works. An
|
|
option is a @samp{-} followed by one or more letters; the case of
|
|
the letter is important. No option (letter) should be used twice on
|
|
the same command line. (Nobody has decided what two copies of the
|
|
same option should mean.) All options are optional.
|
|
|
|
Some options expect exactly one file name to follow them. The file
|
|
name may either immediately follow the option's letter (compatible
|
|
with older assemblers) or it may be the next command argument (GNU
|
|
standard). These two command lines are equivalent:
|
|
|
|
@example
|
|
as -o my-object-file.o mumble
|
|
as -omy-object-file.o mumble
|
|
@end example
|
|
|
|
@section Input Files
|
|
|
|
We use the phrase @dfn{source program}, abbreviated @dfn{source}, to
|
|
describe the program input to one run of @code{as}. The program may
|
|
be in one or more files; how the source is partitioned into files
|
|
doesn't change the meaning of the source.
|
|
|
|
The source program is a catenation of the text in all the files, in the
|
|
order specified.
|
|
|
|
Each time you run @code{as} it assembles exactly one source
|
|
program. The source program is made up of one or more files.
|
|
(The standard input is also a file.)
|
|
|
|
You give @code{as} a command line that has zero or more input file
|
|
names. The input files are read (from left file name to right). A
|
|
command line argument (in any position) that has no special meaning
|
|
is taken to be an input file name.
|
|
|
|
If @code{as} is given no file names it attempts to read one input file
|
|
from @code{as}'s standard input, which is normally your terminal. You
|
|
may have to type @key{ctl-D} to tell @code{as} there is no more program
|
|
to assemble.
|
|
|
|
Use @samp{--} if you need to explicitly name the standard input file
|
|
in your command line.
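
For instance, assuming two hypothetical source files @file{head.s} and
@file{tail.s}, the following command line assembles @file{head.s}, then
whatever arrives on the standard input, then @file{tail.s}, as a single
source program:

@example
as -o prog.o head.s -- tail.s
@end example

Because the order of file names is significant, moving @samp{--} changes
where the standard input text falls in the catenated source.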
|
|
|
|
If the source is empty, @code{as} will produce a small, empty object
|
|
file.
|
|
|
|
@subsection Input Filenames and Line-numbers
|
|
There are two ways of locating a line in the input file (or files) and both
|
|
are used in reporting error messages. One way refers to a line
|
|
number in a physical file; the other refers to a line number in a
|
|
``logical'' file.
|
|
|
|
@dfn{Physical files} are those files named in the command line given
|
|
to @code{as}.
|
|
|
|
@dfn{Logical files} are simply names declared explicitly by assembler
|
|
directives; they bear no relation to physical files. Logical file names
|
|
help error messages reflect the original source file, when @code{as}
|
|
source is itself synthesized from other files. @xref{File}.
|
|
|
|
@section Output (Object) File
|
|
Every time you run @code{as} it produces an output file, which is
|
|
your assembly language program translated into numbers. This file
|
|
is the object file, named @file{a.out} unless you tell @code{as} to
|
|
give it another name by using the @code{-o} option. Conventionally,
|
|
object file names end with @file{.o}. The default name of
|
|
@file{a.out} is used for historical reasons: older assemblers were
|
|
capable of assembling self-contained programs directly into a
|
|
runnable program.
|
|
@c This may still work, but hasn't been tested.
|
|
|
|
The object file is meant for input to the linker @code{ld}. It contains
|
|
assembled program code, information to help @code{ld} to integrate
|
|
the assembled program into a runnable file and (optionally) symbolic
|
|
information for the debugger.
|
|
|
|
@comment link above to some info file(s) like the description of a.out.
|
|
@comment don't forget to describe GNU info as well as Unix lossage.
|
|
|
|
@section Error and Warning Messages
|
|
|
|
@code{as} may write warnings and error messages to the standard
|
|
error file (usually your terminal). This should not happen when
|
|
@code{as} is run automatically by a compiler. Error messages are
|
|
meant for those few people who still write in assembly language.
|
|
|
|
Warnings report an assumption made so that @code{as} could keep
|
|
assembling a flawed program.
|
|
|
|
Errors report a grave problem that stops the assembly.
|
|
|
|
Warning messages have the format
|
|
@example
|
|
file_name:line_number:Warning Message Text
|
|
@end example
|
|
If a logical file name has been given (@pxref{File}), it is used for
|
|
the filename, otherwise the name of the current input file is used.
|
|
If a logical line number was given (@pxref{Line}), then it is used to
|
|
calculate the number printed, otherwise the actual line in the
|
|
current source file is printed. The message text is intended to be
|
|
self-explanatory (in the grand Unix tradition).
|
|
|
|
Error messages have the format
|
|
@example
|
|
file_name:line_number:FATAL:Error Message Text
|
|
@end example
|
|
The file name and line number are derived as for warning
|
|
messages.  The text of an error message may be rather less explanatory,
because many of these errors aren't supposed to happen.
|
|
|
|
@section Options
|
|
@subsection Work Faster: -f
|
|
@samp{-f} should only be used when assembling programs written by a
|
|
(trusted) compiler. @samp{-f} stops the assembler from pre-processing
|
|
the input file(s) before assembling them. @emph{Warning:} if the files
|
|
actually need to be pre-processed (if they contain comments, for
|
|
example), @code{as} will not work correctly if @samp{-f} is used.
|
|
|
|
@subsection Warn if difference tables altered: -k
|
|
@code{as} sometimes alters the code emitted for directives of the form
|
|
@samp{.word @var{sym1}-@var{sym2}}; @pxref{Word}.
|
|
You can use the @samp{-k} option if you want a warning issued when this
|
|
is done.
|
|
|
|
@subsection Include Local Labels: -L
|
|
For historical reasons, labels beginning with @samp{L} (upper case only)
|
|
are called @dfn{local labels}.  They are intended for the use of programs
(like compilers) that compose assembler programs, not for your notice.
Normally both @code{as} and @code{ld} discard such labels, so you do not
see them when debugging.
|
|
|
|
This option tells @code{as} to retain those @samp{L@dots{}} symbols
|
|
in the object file. Usually if you do this you also tell the linker
|
|
@code{ld} to preserve symbols whose names begin with @samp{L}.
|
|
|
|
@subsection Name the Object File: -o
|
|
There is always one object file output when you run @code{as}. By
|
|
default it has the name @file{a.out}. You use this option (which
|
|
takes exactly one filename) to give the object file a different name.
|
|
|
|
Whatever the object file is called, @code{as} will overwrite any
|
|
existing file of the same name.
|
|
|
|
@subsection Fold Data Segment into Text Segment: -R
|
|
@code{-R} tells @code{as} to write the object file as if all
|
|
data-segment data lives in the text segment. This is only done at
|
|
the very last moment: your binary data are the same, but data
|
|
segment parts are relocated differently. The data segment part of
|
|
your object file is zero bytes long because all its bytes are
|
|
appended to the text segment. (@xref{Segments}.)
|
|
|
|
When you specify @code{-R} it would be possible to generate shorter
|
|
address displacements (because we don't have to cross between text and
|
|
data segment). We don't do this simply for compatibility with older
|
|
versions of @code{as}. @code{-R} may work this way in future.
|
|
|
|
@subsection Suppress Warnings: -W
|
|
@code{as} should never give a warning or error message when
|
|
assembling compiler output. But programs written by people often
|
|
cause @code{as} to give a warning that a particular assumption was
|
|
made. All such warnings are directed to the standard error file.
|
|
If you use this option, no warnings are issued. This option only
|
|
affects the warning messages: it does not change any detail of how
|
|
@code{as} assembles your file. Errors, which stop the assembly, are
|
|
still reported.
|
|
|
|
@node Syntax, Segments, top, top
|
|
@chapter Syntax
|
|
This chapter describes the machine-independent syntax allowed in a
|
|
source file. @code{as} syntax is similar to what many other assemblers
|
|
use; it is inspired by the BSD 4.2 assembler, except that @code{as} does not
|
|
assemble Vax bit-fields.
|
|
|
|
@section The Pre-processor
|
|
The pre-processor adjusts and removes extra whitespace. It leaves
|
|
one space or tab before the keywords on a line, and turns any other
|
|
whitespace on the line into a single space.
|
|
|
|
The pre-processor removes all comments, replacing them with a single
|
|
space (for /* @dots{} */ comments), or an appropriate number of
|
|
newlines.
|
|
|
|
The pre-processor converts character constants into the appropriate
|
|
numeric values.
|
|
|
|
This means that excess whitespace, comments, and character constants
|
|
cannot be used in the portions of the input text that are not
|
|
pre-processed.
|
|
|
|
If the first line of an input file is @code{#NO_APP} or the
|
|
@samp{-f} option is given, the input file will not be
|
|
pre-processed. Within such an input file, parts of the file can be
|
|
pre-processed by putting a line that says @code{#APP} before the
|
|
text that should be pre-processed, and putting a line that says
|
|
@code{#NO_APP} after them.  This feature is mainly intended to support
|
|
asm statements in compilers whose output normally does not need to
|
|
be pre-processed.
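
A sketch of such a file follows; the label and the values are invented
for illustration.  Only the lines between @code{#APP} and the second
@code{#NO_APP} are pre-processed, so only there is it safe to use
comments, extra whitespace, or character constants.

@example
#NO_APP
 .text
table:
 .long 1
#APP
        .long   'A      /* extra blanks, a comment, a character constant */
#NO_APP
 .long 3
@end example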
|
|
|
|
@section Whitespace
|
|
@dfn{Whitespace} is one or more blanks or tabs, in any order.
|
|
Whitespace is used to separate symbols, and to make programs neater
|
|
for people to read. Unless within character constants
|
|
(@pxref{Characters}), any whitespace means the same as exactly one
|
|
space.
|
|
|
|
@section Comments
|
|
There are two ways of rendering comments to @code{as}. In both
|
|
cases the comment is equivalent to one space.
|
|
|
|
Anything from @samp{/*} through the next @samp{*/} is a comment.
|
|
This means you may not nest these comments.
|
|
|
|
@example
|
|
/*
|
|
The only way to include a newline ('\n') in a comment
|
|
is to use this sort of comment.
|
|
*/
|
|
|
|
/* This sort of comment does not nest. */
|
|
@end example
|
|
|
|
Anything from the @dfn{line comment} character to the next newline
|
|
is considered a comment and is ignored. The line comment character is
|
|
@c if vax
|
|
@c @samp{#} on the Vax
|
|
@c @fi vax
|
|
@c if 680x0
|
|
@samp{|} on the 680x0. @xref{Machine Dependent}.
|
|
@c fi 680x0
|
|
@ignore
|
|
@if all-arch
|
|
On some machines there are two different
|
|
line comment characters. One will only begin a comment if it is the
|
|
first non-whitespace character on a line, while the other will
|
|
always begin a comment.
|
|
@fi all-arch
|
|
@end ignore
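
For example, in the 680x0 configuration both of the following lines
carry a comment; the directive and its operands are only placeholders.

@example
 .long 1        | ignored from the bar to the end of the line
 .long 2        /* the delimited style works here too */
@end example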
|
|
|
|
To be compatible with past assemblers a special interpretation is
|
|
given to lines that begin with @samp{#}. Following the @samp{#} an
|
|
absolute expression (@pxref{Expressions}) is expected: this will be
|
|
the logical line number of the @b{next} line. Then a string
|
|
(@pxref{Strings}) is allowed: if present it is a new logical file
|
|
name. The rest of the line, if any, should be whitespace.
|
|
|
|
If the first non-whitespace characters on the line are not numeric,
|
|
the line is ignored. (Just like a comment.)
|
|
@example
|
|
# This is an ordinary comment.
|
|
# 42-6 "new_file_name" # New logical file name
|
|
# This is logical line # 36.
|
|
@end example
|
|
This feature is deprecated, and may disappear from future versions
|
|
of @code{as}.
|
|
|
|
@section Symbols
|
|
A @dfn{symbol} is one or more characters chosen from the set of all
|
|
letters (both upper and lower case), digits and the three characters
|
|
@samp{_.$}. No symbol may begin with a digit. Case is
|
|
significant. There is no length limit: all characters are
|
|
significant. Symbols are delimited by characters not in that set,
|
|
or by begin/end-of-file. (@xref{Symbols}.)
|
|
|
|
@section Statements
|
|
A @dfn{statement} ends at a newline character (@samp{\n}) or at a
|
|
semicolon (@samp{;}). The newline or semicolon is considered part
|
|
of the preceding statement. Newlines and semicolons within
|
|
character constants are an exception: they don't end statements.
|
|
It is an error to end any statement with end-of-file: the last
|
|
character of any input file should be a newline.
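
As an illustration (the values are arbitrary), the single line

@example
 .byte 1 ; .byte 2 ; .byte 3
@end example

@noindent
holds three statements, and should assemble the same as the three lines

@example
 .byte 1
 .byte 2
 .byte 3
@end example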
|
|
|
|
You may write a statement on more than one line if you put a
|
|
backslash (@kbd{\}) immediately in front of any newlines within the
|
|
statement. When @code{as} reads a backslashed newline both
|
|
characters are ignored. You can even put backslashed newlines in
|
|
the middle of symbol names without changing the meaning of your
|
|
source program.
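
For instance, going by the rule above, each pair of lines below is read
as a single statement.  The symbol @code{table_end} is hypothetical.
Note that each continuation line starts in column one: any whitespace
there would not be discarded along with the backslash and newline.

@example
 .long 1, 2, \
3, 4
 .long table_\
end
@end example

The first statement is read as @samp{.long 1, 2, 3, 4} and the second as
@samp{.long table_end}.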
|
|
|
|
An empty statement is allowed, and may include whitespace. It is ignored.
|
|
|
|
A statement begins with zero or more labels, optionally followed by a
|
|
@dfn{key symbol} which determines what kind of statement it is. The key
|
|
symbol determines the syntax of the rest of the statement. If the
|
|
symbol begins with a dot (@t{.}) then the statement is an assembler
|
|
directive: typically valid for any computer. If the symbol begins with
|
|
a letter the statement is an assembly language @dfn{instruction}: it
|
|
will assemble into a machine language instruction. Different versions
|
|
of @code{as} for different computers will recognize different
|
|
instructions. In fact, the same symbol may represent a different
|
|
instruction in a different computer's assembly language.
|
|
|
|
A label is a symbol immediately followed by a colon (@code{:}).
|
|
Whitespace before a label or after a colon is permitted, but you may not
|
|
have whitespace between a label's symbol and its colon. @xref{Labels}.
|
|
|
|
@example
|
|
label: .directive followed by something
|
|
another$label: # This is an empty statement.
|
|
instruction operand_1, operand_2, @dots{}
|
|
@end example
|
|
|
|
@section Constants
|
|
A constant is a number, written so that its value is known by
|
|
inspection, without knowing any context. Like this:
|
|
@example
|
|
.byte 74, 0112, 092, 0x4A, 0X4a, 'J, '\J # All the same value.
|
|
.ascii "Ring the bell\7" # A string constant.
|
|
.octa 0x123456789abcdef0123456789ABCDEF0 # A bignum.
|
|
.float 0f-314159265358979323846264338327\
|
|
95028841971.693993751E-40 # - pi, a flonum.
|
|
@end example
|
|
|
|
@node Characters, Strings, , Syntax
|
|
@subsection Character Constants
|
|
There are two kinds of character constants. A @dfn{character} stands
|
|
for one character in one byte and its value may be used in
|
|
numeric expressions. String constants (properly called string
|
|
@emph{literals}) are potentially many bytes and their values may not be
|
|
used in arithmetic expressions.
|
|
|
|
@node Strings, , Characters, Syntax
|
|
@subsubsection Strings
|
|
A @dfn{string} is written between double-quotes. It may contain
|
|
double-quotes or null characters. The way to get special characters
|
|
into a string is to @dfn{escape} these characters: precede them with
|
|
a backslash (@code{\}) character. For example @samp{\\} represents
|
|
one backslash: the first @code{\} is an escape which tells
|
|
@code{as} to interpret the second character literally as a backslash
|
|
(which prevents @code{as} from recognizing the second @code{\} as an
|
|
escape character). The complete list of escapes follows.
|
|
|
|
@table @kbd
|
|
@item \EOF
|
|
A @kbd{\} followed by end-of-file: erroneous. It is treated just
|
|
like an end-of-file without a preceding backslash.
|
|
@c @item \a
|
|
@c Mnemonic for ACKnowledge; for ASCII this is octal code 007.
|
|
@item \b
|
|
Mnemonic for backspace; for ASCII this is octal code 010.
|
|
@c @item \e
|
|
@c Mnemonic for EOText; for ASCII this is octal code 004.
|
|
@item \f
|
|
Mnemonic for FormFeed; for ASCII this is octal code 014.
|
|
@item \n
|
|
Mnemonic for newline; for ASCII this is octal code 012.
|
|
@c @item \p
|
|
@c Mnemonic for prefix; for ASCII this is octal code 033, usually known as @code{escape}.
|
|
@item \r
|
|
Mnemonic for carriage-Return; for ASCII this is octal code 015.
|
|
@c @item \s
|
|
@c Mnemonic for space; for ASCII this is octal code 040. Included for compliance with
|
|
@c other assemblers.
|
|
@item \t
|
|
Mnemonic for horizontal Tab; for ASCII this is octal code 011.
|
|
@c @item \v
|
|
@c Mnemonic for Vertical tab; for ASCII this is octal code 013.
|
|
@c @item \x @var{digit} @var{digit} @var{digit}
|
|
@c A hexadecimal character code. The numeric code is 3 hexadecimal digits.
|
|
@item \ @var{digit} @var{digit} @var{digit}
|
|
An octal character code. The numeric code is 3 octal digits.
|
|
For compatibility with other Unix systems, 8 and 9 are accepted as digits:
|
|
for example, @code{\008} has the value 010, and @code{\009} the value 011.
|
|
@item \\
|
|
Represents one @samp{\} character.
|
|
@c @item \'
|
|
@c Represents one @samp{'} (accent acute) character.
|
|
@c This is needed in single character literals
|
|
@c (@xref{Characters}.) to represent
|
|
@c a @samp{'}.
|
|
@item \"
|
|
Represents one @samp{"} character. Needed in strings to represent
|
|
this character, because an unescaped @samp{"} would end the string.
|
|
@item \ @var{anything-else}
|
|
Any other character when escaped by @kbd{\} will give a warning, but
|
|
assemble as if the @samp{\} was not present. The idea is that if
|
|
you used an escape sequence you clearly didn't want the literal
|
|
interpretation of the following character. However @code{as} has no
|
|
other interpretation, so @code{as} knows it is giving you the wrong
|
|
code and warns you of the fact.
|
|
@end table
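
The following lines sketch a few of these escapes in use with the
@code{.ascii} directive; the text of the strings is arbitrary.

@example
 .ascii "col1\tcol2\n"        /* a tab and a newline */
 .ascii "say \"hello\""       /* embedded double-quotes */
 .ascii "C:\\tmp"             /* one backslash */
 .ascii "\101\102"            /* octal codes for A and B */
@end example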
|
|
|
|
Which characters are escapable, and what those escapes represent,
|
|
varies widely among assemblers. The current set is what we think
|
|
BSD 4.2 @code{as} recognizes, and is a subset of what most C
|
|
compilers recognize. If you are in doubt, don't use an escape
|
|
sequence.
|
|
|
|
@subsubsection Characters
|
|
A single character may be written as a single quote immediately
|
|
followed by that character. The same escapes apply to characters as
|
|
to strings. So if you want to write the character backslash, you
|
|
must write @kbd{'\\} where the first @code{\} escapes the second
|
|
@code{\}.  As you can see, the quote is an acute accent, not a
|
|
grave accent. A newline (or semicolon @samp{;}) immediately
|
|
following an accent acute is taken as a literal character and does
|
|
not count as the end of a statement. The value of a character
|
|
constant in a numeric expression is the machine's byte-wide code for
|
|
that character. @code{as} assumes your character code is ASCII: @kbd{'A}
|
|
means 65, @kbd{'B} means 66, and so on.
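
For instance, using the @code{.byte} directive described later in this
manual, each statement below should assemble one byte. This is only a
sketch; the values in the comments follow from the ASCII assumption
stated above.

@example
.byte 'A     /* 65 */
.byte 'B     /* 66 */
.byte '\\    /* 92: one backslash */
@end example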
|
|
|
|
@subsection Number Constants
|
|
@code{as} distinguishes 3 flavors of numbers according to how they
|
|
are stored in the target machine. @emph{Integers} are numbers that
|
|
would fit into an @code{int} in the C language. @emph{Bignums} are
|
|
integers, but they are stored in more than 32 bits. @emph{Flonums}
|
|
are floating point numbers, described below.
|
|
|
|
@subsubsection Integers
|
|
An octal integer is @samp{0} followed by zero or more of the octal
|
|
digits (@samp{01234567}).
|
|
|
|
A decimal integer starts with a non-zero digit followed by zero or
|
|
more digits (@samp{0123456789}).
|
|
|
|
A hexadecimal integer is @samp{0x} or @samp{0X} followed by one or
|
|
more hexadecimal digits chosen from @samp{0123456789abcdefABCDEF}.
|
|
|
|
Integers have the usual values. To denote a negative integer, use
|
|
the unary operator @samp{-} discussed under expressions
|
|
(@pxref{Unops}).
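
The three notations can be compared in a short sketch. The first three
statements below should assemble the same 32-bit value; the last
applies unary minus to the integer 15.

@example
.long 017     /* octal */
.long 15      /* decimal */
.long 0xf     /* hexadecimal */
.long -15     /* unary minus applied to 15 */
@end example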
|
|
|
|
@subsubsection Bignums
|
|
A @dfn{bignum} has the same syntax and semantics as an integer
|
|
except that the number (or its negative) takes more than 32 bits to
|
|
represent in binary. The distinction is made because in some places
|
|
integers are permitted while bignums are not.
|
|
|
|
@subsubsection Flonums
|
|
A @dfn{flonum} represents a floating point number. The translation
|
|
is complex: a decimal floating point number from the text is
|
|
converted by @code{as} to a generic binary floating point number of
|
|
more than sufficient precision. This generic floating point number
|
|
is converted to the particular computer's floating point format(s)
|
|
by a portion of @code{as} specialized to that computer.
|
|
|
|
A flonum is written by writing (in order)
|
|
@itemize @bullet
|
|
@item
|
|
The digit @samp{0}.
|
|
@item
|
|
A letter, to tell @code{as} the rest of the number is a flonum.
|
|
@kbd{e}
|
|
is recommended. Case is not important.
|
|
(Any otherwise illegal letter will work here,
|
|
but that might be changed. Vax BSD 4.2 assembler
|
|
seems to allow any of @samp{defghDEFGH}.)
|
|
@item
|
|
An optional sign: either @samp{+} or @samp{-}.
|
|
@item
|
|
An optional @dfn{integer part}: zero or more decimal digits.
|
|
@item
|
|
An optional @dfn{fraction part}: @samp{.} followed by zero
|
|
or more decimal digits.
|
|
@item
|
|
An optional exponent, consisting of:
|
|
@itemize @bullet
|
|
@item
|
|
A letter; the exact significance varies according to
|
|
the computer that executes the program. @code{as}
|
|
accepts any letter for now. Case is not important.
|
|
@item
|
|
Optional sign: either @samp{+} or @samp{-}.
|
|
@item
|
|
One or more decimal digits.
|
|
@end itemize
|
|
@end itemize
|
|
|
|
At least one of @var{integer part} or @var{fraction part} must be
|
|
present. The floating point number has the usual base-10 value.
|
|
|
|
@code{as} does all processing using integers. Flonums are computed
|
|
independently of any floating point hardware in the computer running
|
|
@code{as}.
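
A few flonums written according to the rules above are sketched here.
The @code{.float} and @code{.double} directives are described later in
this manual; the format actually emitted depends on how @code{as} is
configured, so treat the comments as the intended decimal values only.

@example
.float 0e1.5          /* 1.5 */
.float 0e-2.5         /* -2.5: sign, integer part, fraction part */
.double 0e+6.02e23    /* 6.02 times 10 to the power 23 */
@end example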
|
|
|
|
@node Segments, Symbols, Syntax, top
|
|
@chapter Segments and Relocation
|
|
Roughly, a segment is a range of addresses, with no gaps; all data
|
|
``in'' those addresses is treated the same for some particular purpose.
|
|
For example there may be a ``read only'' segment.
|
|
|
|
The linker @code{ld} reads many object files (partial programs) and
|
|
combines their contents to form a runnable program. When @code{as}
|
|
emits an object file, the partial program is assumed to start at address
|
|
0. @code{ld} will assign the final addresses the partial program
|
|
occupies, so that different partial programs don't overlap. This is
|
|
actually an over-simplification, but it will suffice to explain how
|
|
@code{as} uses segments.
|
|
|
|
@code{ld} moves blocks of bytes of your program to their run-time
|
|
addresses. These blocks slide to their run-time addresses as rigid
|
|
units; their length does not change and neither does the order of bytes
|
|
within them. Such a rigid unit is called a @emph{segment}. Assigning
|
|
run-time addresses to segments is called @dfn{relocation}. It includes
|
|
the task of adjusting mentions of object-file addresses so they refer to
|
|
the proper run-time addresses.
|
|
|
|
An object file written by @code{as} has three segments, any of which
|
|
may be empty. These are named @emph{text}, @emph{data} and @emph{bss}
|
|
segments. Within the object file, the text segment starts at
|
|
address 0, the data segment follows, and the bss segment follows the
|
|
data segment.
|
|
|
|
To let @code{ld} know which data will change when the segments are
|
|
relocated, and how to change that data, @code{as} also writes to the
|
|
object file details of the relocation needed. To perform relocation
|
|
@code{ld} must know, each time an address in the object
|
|
file is mentioned:
|
|
@itemize @bullet
|
|
@item
|
|
Where in the object file is the beginning of this reference to
|
|
an address?
|
|
@item
|
|
How long (in bytes) is this reference?
|
|
@item
|
|
Which segment does the address refer to?
|
|
What is the numeric value of (@var{address} @t{-}
|
|
@var{start-address of segment})?
|
|
@item
|
|
Is the reference to an address ``Program-counter relative''?
|
|
@end itemize
|
|
|
|
In fact, every address @code{as} ever uses is expressed as
|
|
(@var{segment} @t{+} @var{offset into segment}). Further, every
|
|
expression @code{as} computes is of this segmented nature.
|
|
@dfn{Absolute expression} means an expression with segment ``absolute''
|
|
(@pxref{ld Segments}). A @dfn{pass1 expression} means an expression with
|
|
segment ``pass1'' (@pxref{as Segments}). In this manual we use the
|
|
notation @{@var{segname} @var{N}@} to mean ``offset @var{N} into segment
|
|
@var{segname}''.
|
|
|
|
Apart from text, data and bss segments you need to know about the
|
|
@dfn{absolute} segment. When @code{ld} mixes partial programs,
|
|
addresses in the absolute segment remain unchanged. That is, address
|
|
@{absolute 0@} is ``relocated'' to run-time address 0 by @code{ld}.
|
|
Although two partial programs' data segments will not overlap addresses
|
|
after linking, @b{by definition} their absolute segments will overlap.
|
|
Address @{absolute 239@} in one partial program will always be the same
|
|
address when the program is running as address @{absolute 239@} in any
|
|
other partial program.
|
|
|
|
The idea of segments is extended to the @dfn{undefined} segment. Any
|
|
address whose segment is unknown at assembly time is by definition
|
|
rendered @{undefined @var{U}@}---where @var{U} will be filled in later.
|
|
Since numbers are always defined, the only way to generate an undefined
|
|
address is to mention an undefined symbol. A reference to a named
|
|
common block would be such a symbol: its value is unknown at assembly
|
|
time so it has segment @emph{undefined}.
|
|
|
|
By analogy the word @emph{segment} is used to describe groups of segments in
|
|
the linked program. @code{ld} puts all partial programs' text
|
|
segments in contiguous addresses in the linked program. It is
|
|
customary to refer to the @emph{text segment} of a program, meaning all
|
|
the addresses of all partial programs' text segments. Likewise for
|
|
data and bss segments.
|
|
|
|
@section Segments
|
|
Some segments are manipulated by @code{ld}; others are invented for
|
|
use of @code{as} and have no meaning except during assembly.
|
|
|
|
@node ld Segments, , ,
|
|
@subsection ld Segments
|
|
@code{ld} deals with just 5 kinds of segments, summarized below.
|
|
|
|
@table @b
|
|
|
|
@item text segment
|
|
@itemx data segment
|
|
These segments hold your program. @code{as} and @code{ld} treat them as
|
|
separate but equal segments. Anything you can say of one segment is
|
|
true of the other. When the program is running however it is customary
|
|
for the text segment to be unalterable, and often shared among
|
|
processes: it will contain instructions, constants and the like. The
|
|
data segment of a running program is usually alterable: for example, C
|
|
variables would be stored in the data segment.
|
|
|
|
@item bss segment
|
|
This segment contains zeroed bytes when your program begins running. It
|
|
is used to hold uninitialized variables or common storage. The length of
|
|
each partial program's bss segment is important, but because it starts
|
|
out containing zeroed bytes there is no need to store explicit zero
|
|
bytes in the object file. The bss segment was invented to eliminate
|
|
those explicit zeros from object files.
|
|
|
|
@item absolute segment
|
|
Address 0 of this segment is always ``relocated'' to runtime address 0.
|
|
This is useful if you want to refer to an address that @code{ld} must
|
|
not change when relocating. In this sense we speak of absolute
|
|
addresses being ``unrelocatable'': they don't change during relocation.
|
|
|
|
@item undefined segment
|
|
This ``segment'' is a catch-all for address references to objects not in
|
|
the preceding segments.
|
|
@c FIXME: ref to some other doc on obj-file formats could go here.
|
|
|
|
@end table
|
|
|
|
An idealized example of the 3 relocatable segments follows. Memory
|
|
addresses are on the horizontal axis.
|
|
|
|
@example
|
|
+-----+----+--+
|
|
partial program # 1: |ttttt|dddd|00|
|
|
+-----+----+--+
|
|
|
|
text data bss
|
|
seg. seg. seg.
|
|
|
|
+---+---+---+
|
|
partial program # 2: |TTT|DDD|000|
|
|
+---+---+---+
|
|
|
|
+--+---+-----+--+----+---+-----+~~
|
|
linked program: | |TTT|ttttt| |dddd|DDD|00000|
|
|
+--+---+-----+--+----+---+-----+~~
|
|
|
|
addresses: 0 @dots{}
|
|
@end example
|
|
|
|
@node as Segments, , ,
|
|
@subsection as Internal Segments
|
|
These segments are invented for the internal use of @code{as}. They
|
|
have no meaning at run-time. You don't need to know about these
|
|
segments except that they might be mentioned in @code{as}' warning
|
|
messages. These segments are invented to permit the value of every
|
|
expression in your assembly language program to be a segmented
|
|
address.
|
|
|
|
@table @b
|
|
@item absent segment
|
|
An expression was expected and none was
|
|
found.
|
|
|
|
@item goof segment
|
|
An internal assembler logic error has been
|
|
found. This means there is a bug in the assembler.
|
|
|
|
@item grand segment
|
|
A @dfn{grand number} is a bignum or a flonum, but not an integer. If a
|
|
number can't be written as a C @code{int} constant, it is a grand
|
|
number. @code{as} has to remember that a flonum or a bignum does not
|
|
fit into 32 bits, and cannot be an argument (@pxref{Argument}) in an
|
|
expression: this is done by making a flonum or bignum be in segment
|
|
``grand''. This is purely for internal @code{as} convenience; grand
|
|
segment behaves similarly to absolute segment.
|
|
|
|
@item pass1 segment
|
|
The expression was impossible to evaluate in the first pass. The
|
|
assembler will attempt a second pass (second reading of the source) to
|
|
evaluate the expression. Your expression mentioned an undefined symbol
|
|
in a way that defies the one-pass (segment + offset in segment) assembly
|
|
process. No compiler need emit such an expression.
|
|
|
|
The second pass is currently not implemented. @code{as} will abort with
|
|
an error message if one is required.
|
|
|
|
@item difference segment
|
|
As an assist to the C compiler, expressions of the forms
|
|
@example
|
|
@var{(undefined symbol)} - @var{(expression)}
|
|
@var{(something)} - @var{(undefined symbol)}
|
|
@var{(undefined symbol)} - @var{(undefined symbol)}
|
|
@end example
|
|
are permitted, and belong to the ``difference'' segment. @code{as}
|
|
re-evaluates such expressions after the source file has been read and
|
|
the symbol table built. If by that time there are no undefined symbols
|
|
in the expression then the expression assumes a new segment. The
|
|
intention is to permit statements like
|
|
@samp{.word label - base_of_table}
|
|
to be assembled in one pass where both @code{label} and
|
|
@code{base_of_table} are undefined. This is useful for compiling C and
|
|
Algol switch statements, Pascal case statements, FORTRAN computed goto
|
|
statements and the like.
|
|
@end table
|
|
|
|
@section Sub-Segments
|
|
Assembled bytes fall into two segments: text and data. Because you
|
|
may have groups of text or data that you want to end up near to each
|
|
other in the object file, @code{as} allows you to use
|
|
@dfn{subsegments}. Within each segment, there can be numbered
|
|
subsegments with values from 0 to 8192. Objects assembled into the
|
|
same subsegment will be grouped with other objects in the same
|
|
subsegment when they are all put into the object file. For example,
|
|
a compiler might want to store constants in the text segment, but
|
|
might not want to have them interspersed with the program being
|
|
assembled. In this case, the compiler could issue a @code{.text 0}
before each section of code being output, and a @code{.text 1} before
|
|
each group of constants being output.
|
|
|
|
Subsegments are optional. If you don't use subsegments, everything
|
|
will be stored in subsegment number zero.
|
|
|
|
Each subsegment is zero-padded up to a multiple of four bytes.
|
|
(Subsegments may be padded a different amount on different flavors
|
|
of @code{as}.) Subsegments appear in your object file in numeric
|
|
order, lowest numbered to highest. (All this to be compatible with
|
|
other people's assemblers.) The object file, @code{ld} @emph{etc.}
|
|
have no concept of subsegments. They just see all your text
|
|
subsegments as a text segment, and all your data subsegments as a
|
|
data segment.
|
|
|
|
To specify which subsegment you want subsequent statements assembled
|
|
into, use a @samp{.text @var{expression}} or a @samp{.data
|
|
@var{expression}} statement. @var{Expression} should be an absolute
|
|
expression. (@xref{Expressions}.) If you just say @samp{.text}
|
|
then @samp{.text 0} is assumed. Likewise @samp{.data} means
|
|
@samp{.data 0}. Assembly begins in @code{text 0}.
|
|
For instance:
|
|
@example
|
|
.text 0 # The default subsegment is text 0 anyway.
|
|
.ascii "This lives in the first text subsegment. *"
|
|
.text 1
|
|
.ascii "But this lives in the second text subsegment."
|
|
.data 0
|
|
.ascii "This lives in the data segment,"
|
|
.ascii "in the first data subsegment."
|
|
.text 0
|
|
.ascii "This lives in the first text subsegment,"
|
|
.ascii "immediately following the asterisk (*)."
|
|
@end example
|
|
|
|
Each segment has a @dfn{location counter} incremented by one for
|
|
every byte assembled into that segment. Because subsegments are
|
|
merely a convenience restricted to @code{as} there is no concept of
|
|
a subsegment location counter. There is no way to directly
|
|
manipulate a location counter. The location counter of the segment
|
|
that statements are being assembled into is said to be the
|
|
@dfn{active} location counter.
|
|
|
|
@section Bss Segment
|
|
The @code{bss} segment is used for local common variable storage.
|
|
You may allocate address space in the @code{bss} segment, but you may
|
|
not dictate data to load into it before your program executes. When
|
|
your program starts running, all the contents of the @code{bss}
|
|
segment are zeroed bytes.
|
|
|
|
Addresses in the bss segment are allocated with special directives;
|
|
you may not assemble anything directly into the bss segment. Hence
|
|
there are no bss subsegments. @xref{Comm}; @pxref{Lcomm}.
|
|
|
|
@node Symbols, Expressions, Segments, top
|
|
@chapter Symbols
|
|
Symbols are a central concept: the programmer uses symbols to name
|
|
things, the linker uses symbols to link, and the debugger uses symbols
|
|
to debug.
|
|
|
|
@code{as} does not place symbols in the object file in the same order
|
|
they were declared. This may break some debuggers.
|
|
|
|
@node Labels, , , Symbols
|
|
@section Labels
|
|
A @dfn{label} is written as a symbol immediately followed by a colon
|
|
(@samp{:}). The symbol then represents the current value of the
|
|
active location counter, and is, for example, a suitable instruction
|
|
operand. You are warned if you use the same symbol to represent two
|
|
different locations: the first definition overrides any other
|
|
definitions.
|
|
|
|
@section Giving Symbols Other Values
|
|
A symbol can be given an arbitrary value by writing a symbol followed
|
|
by an equals sign (@samp{=}) followed by an expression
|
|
(@pxref{Expressions}). This is equivalent to using the @code{.set}
|
|
directive. (@xref{Set}.)
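
The two forms can be sketched side by side. The symbol names
@code{bufsize} and @code{linesize} are invented for this illustration.

@example
bufsize = 512          /* symbol, equals sign, expression */
.set linesize, 80      /* the equivalent directive form */
@end example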
|
|
|
|
@section Symbol Names
|
|
Symbol names begin with a letter or with one of @samp{$._}. That
|
|
character may be followed by any string of digits, letters,
|
|
underscores and dollar signs. Case of letters is significant:
|
|
@code{foo} is a different symbol name than @code{Foo}.
|
|
|
|
Each symbol has exactly one name. Each name in an assembly language
|
|
program refers to exactly one symbol. You may use that symbol name any
|
|
number of times in a program.
|
|
|
|
@subsection Local Symbol Names
|
|
|
|
Local symbols help compilers and programmers use names temporarily.
|
|
There are ten @dfn{local} symbol names, which are re-used throughout
|
|
the program. Their names are @samp{0} @samp{1} @dots{} @samp{9}.
|
|
To define a local symbol, write a label of the form
|
|
@var{digit}@t{:}. To refer to the most recent previous definition
|
|
of that symbol write @var{digit}@t{b}, using the same digit as when
|
|
you defined the label. To refer to the next definition of a local
|
|
label, write @var{digit}@t{f} where @var{digit} gives you a choice
|
|
of 10 forward references. The @samp{b} stands for ``backwards'' and
|
|
the @samp{f} stands for ``forwards''.
|
|
|
|
Local symbols are not used by the current GNU C compiler.
|
|
|
|
There is no restriction on how you can use these labels, but
|
|
remember that at any point in the assembly you can refer to at most
|
|
10 prior local labels and to at most 10 forward local labels.
|
|
|
|
Local symbol names are only a notation device. They are immediately
|
|
transformed into more conventional symbol names before the assembler
|
|
uses them. The symbol names stored in the symbol table, appearing in
|
|
error messages and optionally emitted to the object file have these
|
|
parts:
|
|
|
|
@table @code
|
|
@item L
|
|
All local labels begin with @samp{L}. Normally both @code{as} and
|
|
@code{ld} forget symbols that start with @samp{L}. These labels are
|
|
used for symbols you are never intended to see. If you give the
|
|
@samp{-L} option then @code{as} will retain these symbols in the
|
|
object file. By instructing @code{ld} to also retain these symbols,
|
|
you may use them in debugging.
|
|
|
|
@item @var{digit}
|
|
If the label is written @samp{0:} then the digit is @samp{0}.
|
|
If the label is written @samp{1:} then the digit is @samp{1}.
|
|
And so on up through @samp{9:}.
|
|
|
|
@item @ctrl{A}
|
|
This unusual character is included so you don't accidentally invent
|
|
a symbol of the same name. The character has ASCII value
|
|
@samp{\001}.
|
|
|
|
@item @emph{ordinal number}
|
|
This is a serial number to keep the labels distinct. The first
|
|
@samp{0:} gets the number @samp{1}; the 15th @samp{0:} gets the
number @samp{15}; @emph{etc.} Likewise for the other labels @samp{1:}
|
|
through @samp{9:}.
|
|
@end table
|
|
|
|
For instance, the first @code{1:} is named @code{L1@ctrl{A}1}, the 44th
|
|
@code{3:} is named @code{L3@ctrl{A}44}.
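
A minimal sketch of forward and backward references follows. It uses
only the @code{.long} directive, so it does not depend on any
particular machine's instructions.

@example
1:      .long 1f     /* address of the next 1:, two lines below */
        .long 1b     /* address of the previous 1:, one line above */
1:      .long 0
@end example

By the naming scheme described above, the first @code{1:} here would be
stored as @code{L1@ctrl{A}1} and the second as @code{L1@ctrl{A}2},
assuming no earlier @code{1:} appears in the source file.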
|
|
|
|
@section The Special Dot Symbol
|
|
|
|
The special symbol @code{.} refers to the current address that
|
|
@code{as} is assembling into. Thus, the expression @samp{melvin:
|
|
.long .} will cause @code{melvin} to contain its own address.
|
|
Assigning a value to @code{.} is treated the same as a @code{.org}
|
|
directive. Thus, the expression @samp{.=.+4} is the same as saying
|
|
@samp{.space 4}.
|
|
|
|
@section Symbol Attributes
|
|
Every symbol has these attributes: Value, Type, Descriptor, and ``Other''.
|
|
@c if internals
|
|
@c The detailed definitions are in <a.out.h>.
|
|
@c fi internals
|
|
|
|
If you use a symbol without defining it, @code{as} assumes zero for
|
|
all these attributes, and probably won't warn you. This makes the
|
|
symbol an externally defined symbol, which is generally what you
|
|
would want.
|
|
|
|
@subsection Value
|
|
The value of a symbol is (usually) 32 bits, the size of one GNU C
|
|
@code{int}. For a symbol which labels a location in the
|
|
@code{text}, @code{data}, @code{bss} or @code{absolute} segments the
|
|
value is the number of addresses from the start of that segment to
|
|
the label. Naturally for @code{text}, @code{data} and @code{bss}
|
|
segments the value of a symbol changes as @code{ld} changes segment
|
|
base addresses during linking. @code{absolute} symbols' values do
|
|
not change during linking: that is why they are called absolute.
|
|
|
|
The value of an undefined symbol is treated in a special way. If it
|
|
is 0 then the symbol is not defined in this assembler source
|
|
program, and @code{ld} will try to determine its value from other
|
|
programs it is linked with. You make this kind of symbol simply by
|
|
mentioning a symbol name without defining it. A non-zero value
|
|
represents a @code{.comm} common declaration. The value is how much
|
|
common storage to reserve, in bytes (@emph{i.e.} addresses). The
|
|
symbol refers to the first address of the allocated storage.
|
|
|
|
@subsection Type
|
|
The type attribute of a symbol is 8 bits encoded in a devious way.
|
|
We kept this coding standard for compatibility with older operating
|
|
systems.
|
|
|
|
@example
|
|
|
|
7 6 5 4 3 2 1 0 bit numbers
|
|
+-----+-----+-----+-----+-----+-----+-----+-----+
|
|
| | | |
|
|
| N_STAB bits | N_TYPE bits |N_EXT|
|
|
| | | bit |
|
|
+-----+-----+-----+-----+-----+-----+-----+-----+
|
|
|
|
n_type byte
|
|
@end example
|
|
|
|
@subsubsection N_EXT bit
|
|
This bit is set if @code{ld} might need to use the symbol's type bits
|
|
and value. If this bit is off, then @code{ld} can ignore the
|
|
symbol while linking. It is set in two cases. If the symbol is
|
|
undefined, then @code{ld} is expected to find the symbol's value
|
|
elsewhere in another program module. Otherwise the symbol has the
|
|
value given, but this symbol name and value are revealed to any other
|
|
programs linked in the same executable program. This second use of
|
|
the @code{N_EXT} bit is most often done by a @code{.globl} statement.
|
|
|
|
@subsubsection N_TYPE bits
|
|
These establish the symbol's ``type'', which is mainly a relocation
|
|
concept. Common values are detailed in the manual describing the
|
|
executable file format.
|
|
|
|
@subsubsection N_STAB bits
|
|
Common values for these bits are described in the manual on the
|
|
executable file format.
|
|
|
|
@subsection Descriptor
|
|
This is an arbitrary 16-bit value. You may establish a symbol's
|
|
descriptor value by using a @code{.desc} statement (@pxref{Desc}).
|
|
A descriptor value means nothing to @code{as}.
|
|
|
|
@subsection Other
|
|
This is an arbitrary 8-bit value. It means nothing to @code{as}.
|
|
|
|
@node Expressions, Pseudo Ops, Symbols, top
|
|
@chapter Expressions
|
|
An @dfn{expression} specifies an address or numeric value.
|
|
Whitespace may precede and/or follow an expression.
|
|
|
|
@section Empty Expressions
|
|
An empty expression has no value: it is just whitespace or null.
|
|
Wherever an absolute expression is required, you may omit the
|
|
expression and @code{as} will assume a value of (absolute) 0. This
|
|
is compatible with other assemblers.
|
|
|
|
@section Integer Expressions
|
|
An @dfn{integer expression} is one or more @emph{arguments} delimited
|
|
by @emph{operators}.
|
|
|
|
@node Argument, Unops, , Expressions
|
|
@subsection Arguments
|
|
|
|
@dfn{Arguments} are symbols, numbers or subexpressions. In other
|
|
contexts arguments are sometimes called ``arithmetic operands''. In
|
|
this manual, to avoid confusing them with the ``instruction operands'' of
|
|
the machine language, we use the term ``argument'' to refer to parts of
|
|
expressions only, and the word ``operand'' to refer only to machine
|
|
instruction operands.
|
|
|
|
Symbols are evaluated to yield @{@var{segment} @var{value}@} where
|
|
@var{segment} is one of @b{text}, @b{data}, @b{bss}, @b{absolute},
|
|
or @b{undefined}. @var{value} is a signed, 2's complement 32 bit
|
|
integer.
|
|
|
|
Numbers are usually integers.
|
|
|
|
A number can be a flonum or bignum. In this case, you are warned
|
|
that only the low order 32 bits are used, and @code{as} pretends
|
|
these 32 bits are an integer. You may write integer-manipulating
|
|
instructions that act on exotic constants, compatible with other
|
|
assemblers.
|
|
|
|
Subexpressions are a left parenthesis (@t{(}) followed by an integer
|
|
expression followed by a right parenthesis (@t{)}), or a unary
|
|
operator followed by an argument.
|
|
|
|
@subsection Operators
|
|
@dfn{Operators} are arithmetic functions, like @t{+} or @t{%}. Unary
|
|
operators are followed by an argument. Binary operators appear
|
|
between their arguments. Operators may be preceded and/or followed by
|
|
whitespace.
|
|
|
|
@node Unops, , Argument, Expressions
@subsection Unary Operators
|
|
@code{as} has the following @dfn{unary operators}. They each take
|
|
one argument, which must be absolute.
|
|
@table @t
|
|
@item -
|
|
Hyphen. @dfn{Negation}. Two's complement negation.
|
|
@item ~
|
|
Tilde. @dfn{Complementation}. Bitwise not.
|
|
@end table
|
|
|
|
@subsection Binary Operators
|
|
|
|
@dfn{Binary operators} are infix. Operators have precedence, but
|
|
operators with equal precedence are performed left to right.
|
|
Apart from @code{+} or @code{-}, both arguments must be absolute, and
|
|
the result is absolute.
|
|
|
|
@enumerate
|
|
|
|
@item
|
|
Highest Precedence
|
|
@table @code
|
|
@item *
|
|
@dfn{Multiplication}.
|
|
@item /
|
|
@dfn{Division}. Truncation is the same as the C operator @samp{/}.
|
|
@item %
|
|
@dfn{Remainder}.
|
|
@item <
|
|
@itemx <<
|
|
@dfn{Shift Left}. Same as the C operator @samp{<<}.
|
|
@item >
|
|
@itemx >>
|
|
@dfn{Shift Right}. Same as the C operator @samp{>>}.
|
|
@end table
|
|
|
|
@item
|
|
Intermediate precedence
|
|
@table @code
|
|
@item |
|
|
@dfn{Bitwise Inclusive Or}.
|
|
@item &
|
|
@dfn{Bitwise And}.
|
|
@item ^
|
|
@dfn{Bitwise Exclusive Or}.
|
|
@item !
|
|
@dfn{Bitwise Or Not}.
|
|
@end table
|
|
|
|
@item
|
|
Lowest Precedence
|
|
@table @code
|
|
@item +
|
|
@dfn{Addition}. If either argument is absolute, the result
|
|
has the segment of the other argument.
|
|
If either argument is pass1 or undefined, the result is pass1.
|
|
Otherwise @code{+} is illegal.
|
|
@item -
|
|
@dfn{Subtraction}. If the right argument is absolute, the
|
|
result has the segment of the left argument.
|
|
If either argument is pass1 the result is pass1.
|
|
If either argument is undefined the result is difference segment.
|
|
If both arguments are in the same segment, the result is absolute---provided
|
|
that segment is one of @b{text}, @b{data} or @b{bss}.
|
|
Otherwise @code{-} is illegal.
|
|
@end table
|
|
@end enumerate
|
|
|
|
The sense of the rule for @code{+} is that it's only meaningful to add
|
|
the @emph{offsets} in an address; you can only have a defined segment in
|
|
one of the two arguments.
|
|
|
|
Similarly, you can't subtract quantities from two different segments.
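
The segment rules for @code{+} and @code{-} can be illustrated with a
small sketch. The symbol names are invented for this example, and the
byte counts assume that @code{.long} emits 32-bit integers as listed in
the directive menu.

@example
        .data
first:  .long 10, 20, 30
last:
len = last - first        /* data minus data: absolute, here 12 */
        .long first + 8   /* data plus absolute: data, the address of the 30 */
@end example

An expression such as @code{first + last} would be rejected, because
both arguments carry a defined segment.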
|
|
|
|
@node Pseudo Ops, Machine Dependent, Expressions, top
|
|
@chapter Assembler Directives
|
|
@menu
|
|
* Abort:: The Abort directive causes as to abort
|
|
* Align:: Pad the location counter to a power of 2
|
|
* Ascii:: Fill memory with bytes of ASCII characters
|
|
* Asciz:: Fill memory with bytes of ASCII characters followed
|
|
by a null.
|
|
* Byte:: Fill memory with 8-bit integers
|
|
* Comm:: Reserve public space in the BSS segment
|
|
* Data:: Change to the data segment
|
|
* Desc:: Set the n_desc of a symbol
|
|
* Double:: Fill memory with double-precision floating-point numbers
|
|
* File:: Set the logical file name
|
|
* Fill:: Fill memory with repeated values
|
|
* Float:: Fill memory with single-precision floating-point numbers
|
|
* Global:: Make a symbol visible to the linker
|
|
* Int:: Fill memory with 32-bit integers
|
|
* Lcomm:: Reserve private space in the BSS segment
|
|
* Line:: Set the logical line number
|
|
* Long:: Fill memory with 32-bit integers
|
|
* Lsym:: Create a local symbol
|
|
* Octa:: Fill memory with 128-bit integers
|
|
* Org:: Change the location counter
|
|
* Quad:: Fill memory with 64-bit integers
|
|
* Set:: Set the value of a symbol
|
|
* Short:: Fill memory with 16-bit integers
|
|
* Space:: Fill memory with a repeated value
|
|
* Stab:: Store debugging information
|
|
* Text:: Change to the text segment
|
|
* Word:: Fill memory with 16-bit integers
|
|
@end menu
|
|
|
|
All assembler directives have names that begin with a period (@samp{.}).
|
|
The rest of the name is letters: their case does not matter.
|
|
|
|
@node Abort, Align, Pseudo Ops, Pseudo Ops
|
|
@section .abort
|
|
This directive stops the assembly immediately. It is for
|
|
compatibility with other assemblers. The original idea was that the
|
|
assembler program would be piped into the assembler. If the sender
|
|
of a program quit, it could use this directive to tell @code{as} to
|
|
quit also. One day @code{.abort} will not be supported.
|
|
|
|
@node Align, Ascii, Abort, Pseudo Ops
|
|
@section .align @var{absolute-expression} , @var{absolute-expression}
|
|
Pad the location counter (in the current subsegment) to a word,
|
|
longword or whatever boundary. The first expression is the number
|
|
of low-order zero bits the location counter will have after
|
|
advancement. For example @samp{.align 3} will advance the location
|
|
counter until it is a multiple of 8. If the location counter is
|
|
already a multiple of 8, no change is needed.
|
|
|
|
The second expression gives the value to be stored in the padding
|
|
bytes. It (and the comma) may be omitted. If it is omitted, the
|
|
padding bytes are zero.
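
As a sketch of both forms, assume the statements below start at a
location that is already a multiple of 8. The offsets in the comments
follow from that assumption and from the rules just given.

@example
        .byte 1
        .align 2          /* three zero bytes: the next offset is 4 */
        .long 0x12345678
        .byte 2
        .align 3, 0xff    /* seven bytes of 0xff: the next offset is 16 */
@end example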
|
|
|
|
@node Ascii, Asciz, Align, Pseudo Ops
|
|
@section .ascii @var{strings}
|
|
@code{.ascii} expects zero or more string literals (@pxref{Strings})
|
|
separated by commas. It assembles each string (with no automatic
|
|
trailing zero byte) into consecutive addresses.
|
|
|
|
@node Asciz, Byte, Ascii, Pseudo Ops
|
|
@section .asciz @var{strings}
|
|
@code{.asciz} is just like @code{.ascii}, but each string is followed by a zero byte.
|
|
The ``z'' in @samp{.asciz} stands for ``zero''.
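
The difference between the two directives can be seen in a short
sketch; the byte counts follow from the descriptions above.

@example
.ascii "abc", "def"    /* six bytes, no trailing zero */
.asciz "abc", "def"    /* eight bytes: each string gets a trailing zero */
@end example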
|
|
|
|
@node Byte, Comm, Asciz, Pseudo Ops
|
|
@section .byte @var{expressions}
|
|
|
|
@code{.byte} expects zero or more expressions, separated by commas.
|
|
Each expression is assembled into the next byte.
|
|
|
|
@node Comm, Data, Byte, Pseudo Ops
|
|
@section .comm @var{symbol} , @var{length}
|
|
@code{.comm} declares a named common area in the bss segment. Normally
|
|
@code{ld} reserves memory addresses for it during linking, so no partial
|
|
program defines the location of the symbol. Use @code{.comm} to tell
|
|
@code{ld} that it must be at least @var{length} bytes long. @code{ld}
|
|
will allocate space for each @code{.comm} symbol that is at least as
|
|
long as the longest @code{.comm} request in any of the partial programs
|
|
linked. @var{length} is an absolute expression.
|
|
|
|
@node Data, Desc, Comm, Pseudo Ops
|
|
@section .data @var{subsegment}
|
|
@code{.data} tells @code{as} to assemble the following statements onto the
|
|
end of the data subsegment numbered @var{subsegment} (which is an
|
|
absolute expression). If @var{subsegment} is omitted, it defaults
|
|
to zero.
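
A brief sketch of switching between data subsegments (the values are
arbitrary):
@example
        .data   0
        .long   1               /* appended to data subsegment 0 */
        .data   1
        .asciz  "msg"           /* appended to data subsegment 1 */
        .data                   /* no operand: back to subsegment 0 */
        .long   2               /* follows the first .long */
@end example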
|
|
|
|
@node Desc, Double, Data, Pseudo Ops
|
|
@section .desc @var{symbol}, @var{absolute-expression}
|
|
This directive sets @code{n_desc} of the symbol to the low 16 bits of
|
|
@var{absolute-expression}.
|
|
|
|
@node Double, File, Desc, Pseudo Ops
|
|
@section .double @var{flonums}
|
|
@code{.double} expects zero or more flonums, separated by commas. It assembles
|
|
floating point numbers. The exact kind of floating point numbers
|
|
emitted depends on how @code{as} is configured. @xref{Machine Dependent}.
|
|
|
|
@node File, Fill, Double, Pseudo Ops
|
|
@section .file @var{string}
|
|
@code{.file} tells @code{as} that we are about to start a new logical
|
|
file. @var{String} is the new file name. An empty file name
|
|
is permitted, but you must still give the quotes: @code{""}. This
|
|
statement may go away in the future: it is only recognized to
|
|
be compatible with old @code{as} programs.
|
|
|
|
@node Fill, Float, File, Pseudo Ops
|
|
@section .fill @var{repeat} , @var{size} , @var{value}
|
|
@var{repeat}, @var{size} and @var{value} are absolute expressions.
|
|
This emits @var{repeat} copies of @var{size} bytes. @var{Repeat}
|
|
may be zero or more. @var{Size} may be zero or more, but if it is
|
|
more than 8, then it is deemed to have the value 8, compatible with
|
|
other people's assemblers. The content of each repetition
|
|
is taken from an 8-byte number. The highest order 4 bytes are
|
|
zero. The lowest order 4 bytes are @var{value} rendered in the
|
|
byte-order of an integer on the computer @code{as} is assembling for.
|
|
Each @var{size} bytes in a repetition is taken from the lowest order
|
|
@var{size} bytes of this number. Again, this bizarre behavior is
|
|
compatible with other people's assemblers.
|
|
|
|
@var{Size} and @var{value} are optional.
|
|
If the second comma and @var{value} are absent, @var{value} is
|
|
assumed zero. If the first comma and following tokens are absent,
|
|
@var{size} is assumed to be 1.
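
The simplest cases use a @var{size} of 1, where byte order does not
matter. The following is a minimal sketch with arbitrary values:
@example
        .fill   4, 1, 0xff      /* four bytes, each 0xff */
        .fill   8               /* size 1, value 0: eight zero bytes */
@end example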
|
|
|
|
@node Float, Global, Fill, Pseudo Ops
|
|
@section .float @var{flonums}
|
|
This directive assembles zero or more flonums, separated by commas.
|
|
The exact kind of floating point numbers emitted depends on how
|
|
@code{as} is configured. @xref{Machine Dependent}.
|
|
|
|
@node Global, Int, Float, Pseudo Ops
|
|
@section .global @var{symbol}
|
|
@code{.global} makes the symbol visible to @code{ld}. If you define
|
|
@var{symbol} in your partial program, its value is made available to
|
|
other partial programs that are linked with it. Otherwise,
|
|
@var{symbol} will take its attributes from a symbol of the same name
|
|
from another partial program it is linked with.
|
|
|
|
This is done by setting the @code{N_EXT} bit
|
|
of that symbol's @code{n_type} to 1.
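
A minimal sketch of both uses follows; the symbol names are
hypothetical and the instructions are 680x0 ones.
@example
        .global _entry          /* defined here: visible to other programs */
        .global _helper         /* not defined here: resolved by ld */
_entry:
        jbsr    _helper
        rts
@end example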
|
|
|
|
@node Int, Lcomm, Global, Pseudo Ops
|
|
@section .int @var{expressions}
|
|
Expect zero or more @var{expressions}, of any segment, separated by
|
|
commas. For each expression, emit a 32-bit number that will, at run
|
|
time, be the value of that expression. The byte order of the
|
|
expression depends on what kind of computer will run the program.
|
|
|
|
@node Lcomm, Line, Int, Pseudo Ops
|
|
@section .lcomm @var{symbol} , @var{length}
|
|
Reserve @var{length} (an absolute expression) bytes for a local
|
|
common denoted by @var{symbol}. The segment and value of @var{symbol} are
|
|
those of the new local common. The addresses are allocated in the
|
|
@code{bss} segment, so at run-time the bytes will start off zeroed.
|
|
@var{Symbol} is not declared global (@pxref{Global}), so is normally
|
|
not visible to @code{ld}.
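
For example, with a hypothetical symbol name:
@example
        .lcomm  scratch, 64     /* 64 zeroed bytes in bss, local to this file */
@end example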
|
|
|
|
@node Line, Long, Lcomm, Pseudo Ops
|
|
@section .line @var{logical line number}
|
|
@code{.line} tells @code{as} to change the logical line number.
|
|
@var{logical line number} is an absolute expression. The next line
|
|
will have that logical line number. So any other statements on the
|
|
current line (after a @code{;}) will be reported as on logical line
|
|
number @var{logical line number} - 1. One day this directive will
|
|
be unsupported: it is used only for compatibility with existing
|
|
assembler programs.
|
|
|
|
@node Long, Lsym, Line, Pseudo Ops
|
|
@section .long @var{expressions}
|
|
@code{.long} is the same as @samp{.int}. @xref{Int}.
|
|
|
|
@node Lsym, Octa, Long, Pseudo Ops
|
|
@section .lsym @var{symbol}, @var{expression}
|
|
@code{.lsym} creates a new symbol named @var{symbol}, but does not put it in
|
|
the hash table, ensuring it cannot be referenced by name during the
|
|
rest of the assembly. This sets the attributes of the symbol to be
|
|
the same as the expression value:
|
|
@table @code
|
|
@item n_other = n_desc = 0
|
|
@itemx n_type = @r{(segment of @var{expression})}
|
|
@itemx N_EXT = 0
|
|
@itemx n_value = @var{expression}
|
|
@end table
|
|
|
|
@node Octa, Org, Lsym, Pseudo Ops
|
|
@section .octa @var{bignums}
|
|
This directive expects zero or more bignums, separated by commas. For each
|
|
bignum, it emits a 16-byte (@b{octa}-word) integer.
|
|
|
|
@node Org, Quad, Octa, Pseudo Ops
|
|
@section .org @var{new-lc} , @var{fill}
|
|
|
|
@code{.org} will advance the location counter of the current segment to
|
|
@var{new-lc}. @var{new-lc} is either an absolute expression or an
|
|
expression with the same segment as the current subsegment. That is,
|
|
you can't use @code{.org} to cross segments: if @var{new-lc} has the
|
|
wrong segment, the @code{.org} directive is ignored. To be compatible
|
|
with former assemblers, if the segment of @var{new-lc} is absolute,
|
|
@code{as} will issue a warning, then pretend the segment of @var{new-lc}
|
|
is the same as the current subsegment.
|
|
|
|
@code{.org} may only increase the location counter, or leave it
|
|
unchanged; you cannot use @code{.org} to move the location counter
|
|
backwards.
|
|
|
|
Because @code{as} tries to assemble programs in one pass, @var{new-lc}
|
|
must be defined. If you really detest this restriction we eagerly await
|
|
a chance to share your improved assembler.
|
|
|
|
Beware that the origin is relative to the start of the segment, not
|
|
to the start of the subsegment. This is compatible with other
|
|
people's assemblers.
|
|
|
|
When the location counter (of the current subsegment) is advanced, the
|
|
intervening bytes are filled with @var{fill} which should be an
|
|
absolute expression. If the comma and @var{fill} are omitted,
|
|
@var{fill} defaults to zero.
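
As a sketch of a forward @code{.org} with a fill value, the fragment
below uses an expression in the current segment for @var{new-lc}. The
label names are hypothetical.
@example
        .text
table:
        .byte   1, 2, 3
        .org    table+16, 0xff  /* thirteen 0xff bytes fill the gap */
next:                           /* next has the value table+16 */
@end example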
|
|
|
|
@node Quad, Set, Org, Pseudo Ops
|
|
@section .quad @var{bignums}
|
|
@code{.quad} expects zero or more bignums, separated by commas. For each
|
|
bignum, it emits an 8-byte (@b{quad}-word) integer. If the bignum
|
|
won't fit in a quad-word, it prints a warning message; and just
|
|
takes the lowest order 8 bytes of the bignum.
|
|
|
|
@node Set, Short, Quad, Pseudo Ops
|
|
@section .set @var{symbol}, @var{expression}
|
|
|
|
This directive sets the value of @var{symbol} to @var{expression}. This
|
|
will change @code{n_value} and @code{n_type} to conform to
|
|
@var{expression}. If @code{N_EXT} is set, it remains set.
|
|
|
|
You may @code{.set} a symbol many times in the same assembly.
|
|
If the expression's segment is unknowable during pass 1, a second
|
|
pass over the source program will be forced. The second pass is
|
|
currently not implemented. @code{as} will abort with an error
|
|
message if one is required.
|
|
|
|
If you @code{.set} a global symbol, the value stored in the object
|
|
file is the last value stored into it.
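
For instance, a symbol may be set more than once in one assembly. The
symbol name @code{size} is only an illustration:
@example
        .set    size, 16
        .space  size            /* 16 zero bytes */
        .set    size, size*2    /* size is now 32 */
        .space  size            /* 32 zero bytes */
@end example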
|
|
|
|
@node Short, Space, Set, Pseudo Ops
|
|
@section .short @var{expressions}
|
|
@c if not sparc
|
|
@code{.short} is the same as @samp{.word}. @xref{Word}.
|
|
@c fi not sparc
|
|
@c if sparc
|
|
@c On the sparc, this expects zero or more @var{expressions}, and emits
|
|
@c a 16 bit number for each.
|
|
@c fi sparc
|
|
|
|
@node Space, Stab, Short, Pseudo Ops
|
|
@section .space @var{size} , @var{fill}
|
|
This directive emits @var{size} bytes, each of value @var{fill}. Both
|
|
@var{size} and @var{fill} are absolute expressions. If the comma
|
|
and @var{fill} are omitted, @var{fill} is assumed to be zero.
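
For example:
@example
        .space  16, 0x20        /* sixteen bytes, each 0x20 */
        .space  4               /* four zero bytes */
@end example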
|
|
|
|
@node Stab, Text, Space, Pseudo Ops
|
|
@section .stabd, .stabn, .stabs
|
|
There are three directives that begin @samp{.stab}.
|
|
All emit symbols, for use by symbolic debuggers.
|
|
The symbols are not entered in @code{as}' hash table: they
|
|
cannot be referenced elsewhere in the source file.
|
|
Up to five fields are required:
|
|
@table @var
|
|
@item string
|
|
This is the symbol's name. It may contain any character except @samp{\000},
|
|
so is more general than ordinary symbol names. Some debuggers used to
|
|
code arbitrarily complex structures into symbol names using this field.
|
|
@item type
|
|
An absolute expression. The symbol's @code{n_type} is set to the low 8
|
|
bits of this expression.
|
|
Any bit pattern is permitted, but @code{ld} and debuggers will choke on
|
|
silly bit patterns.
|
|
@item other
|
|
An absolute expression.
|
|
The symbol's @code{n_other} is set to the low 8 bits of this expression.
|
|
@item desc
|
|
An absolute expression.
|
|
The symbol's @code{n_desc} is set to the low 16 bits of this expression.
|
|
@item value
|
|
An absolute expression which becomes the symbol's @code{n_value}.
|
|
@end table
|
|
|
|
If a warning is detected while reading a @code{.stab@var{X}}
|
|
statement, the symbol has probably already been created and you will
|
|
get a half-formed symbol in your object file. This is compatible
|
|
with earlier assemblers!
|
|
|
|
@table @code
|
|
@item .stabd @var{type} , @var{other} , @var{desc}
|
|
|
|
The ``name'' of the symbol generated is not even an empty string.
|
|
It is a null pointer, for compatibility. Older assemblers used a
|
|
null pointer so they didn't waste space in object files with empty
|
|
strings.
|
|
|
|
The symbol's @code{n_value} is set to the location counter,
|
|
relocatably. When your program is linked, the value of this symbol
|
|
will be where the location counter was when the @code{.stabd} was
|
|
assembled.
|
|
|
|
@item .stabn @var{type} , @var{other} , @var{desc} , @var{value}
|
|
|
|
The name of the symbol is set to the empty string @code{""}.
|
|
|
|
@item .stabs @var{string} , @var{type} , @var{other} , @var{desc} , @var{value}
|
|
|
|
All five fields are specified.
|
|
@end table
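
The following sketch shows one statement of each form. The type
numbers are an assumption taken from conventional @code{a.out}
debugging-symbol definitions (100 for a source file name, 68 for a
source line); they are not defined by @code{as} itself, and the file
name and line numbers are invented.
@example
        .stabs  "main.c",100,0,0,0      /* name, type, other, desc, value */
        .stabd  68,0,12                 /* n_value is the location counter */
        .stabn  68,0,13,0               /* like .stabs with an empty name */
@end example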
|
|
|
|
@node Text, Word, Stab, Pseudo Ops
|
|
@section .text @var{subsegment}
|
|
Tells @code{as} to assemble the following statements onto the end of
|
|
the text subsegment numbered @var{subsegment}, which is an absolute
|
|
expression. If @var{subsegment} is omitted, subsegment number zero
|
|
is used.
|
|
|
|
@node Word, , Text, Pseudo Ops
|
|
@section .word @var{expressions}
|
|
@c if sparc
|
|
@c On the Sparc, this produces 32-bit numbers instead of 16-bit ones.
|
|
@c fi sparc
|
|
This directive expects zero or more @var{expressions}, of any segment,
|
|
separated by commas. For each expression, @code{as} emits a 16-bit number.
|
|
@ignore
|
|
@c if all-arch
|
|
The byte order
|
|
of the expression depends on what kind of computer will run the
|
|
program.
|
|
@c fi all-arch
|
|
@end ignore
|
|
|
|
@subsection Special Treatment to support Compilers
|
|
|
|
In order to assemble compiler output into something that will work,
|
|
@code{as} will occasionally do strange things to @samp{.word} directives.
|
|
Directives of the form @samp{.word sym1-sym2} are often emitted by
|
|
compilers as part of jump tables. Therefore, when @code{as} assembles a
|
|
directive of the form @samp{.word sym1-sym2}, and the difference between
|
|
@code{sym1} and @code{sym2} does not fit in 16 bits, @code{as} will
|
|
create a @dfn{secondary jump table}, immediately before the next label.
|
|
This secondary jump table will be preceded by a short-jump to the
|
|
first byte after the secondary table. This short-jump prevents the flow
|
|
of control from accidentally falling into the new table. Inside the
|
|
table will be a long-jump to @code{sym2}. The original @samp{.word}
|
|
will contain @code{sym1} minus the address of the long-jump to
|
|
@code{sym2}.
|
|
|
|
If there were several occurrences of @samp{.word sym1-sym2} before the
|
|
secondary jump table, all of them will be adjusted. If there was a
|
|
@samp{.word sym3-sym4}, that also did not fit in sixteen bits, a
|
|
long-jump to @code{sym4} will be included in the secondary jump table,
|
|
and the @code{.word} directives will be adjusted to contain @code{sym3}
|
|
minus the address of the long-jump to @code{sym4}; and so on, for as many
|
|
entries in the original jump table as necessary.
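
The rewriting described above can be pictured as follows. This is only
a sketch: the labels @code{L1} and @code{L2} are hypothetical names for
locations that @code{as} manages internally, and the 680x0 mnemonics
are used purely for illustration. Given the source
@example
        .word   sym1-sym2       /* difference does not fit in 16 bits */
next:
@end example
the effect is roughly as if you had written
@example
        .word   sym1-L1         /* adjusted to refer to the long-jump */
        bras    L2              /* short-jump over the secondary table */
L1:     jmp     sym2            /* secondary jump table entry */
L2:
next:
@end example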
|
|
|
|
@ignore
|
|
@c if internals
|
|
@emph{This feature may be disabled by compiling @code{as} with the
|
|
@samp{-DWORKING_DOT_WORD} option.} This feature is likely to confuse
|
|
assembly language programmers.
|
|
@c fi internals
|
|
@end ignore
|
|
|
|
|
|
@section Deprecated Directives
|
|
One day these directives won't work.
|
|
They are included for compatibility with older assemblers.
|
|
@table @t
|
|
@item .abort
|
|
@item .file
|
|
@item .line
|
|
@end table
|
|
|
|
@node Machine Dependent, License, Pseudo Ops, top
|
|
@chapter Machine Dependent Features:
|
|
@c if 680x0
|
|
Motorola 680x0 @refill
|
|
@c fi 680x0
|
|
@c pesch@cygnus.com: This version of the manual is specifically hacked
|
|
@c for 68K gas. We should have a config method of
|
|
@c automating this; in the meantime, use ignore
|
|
@c for the other architectures (or for their stubs)
|
|
@ignore
|
|
@section Vax
|
|
@subsection Options
|
|
|
|
The Vax version of @code{as} accepts any of the following options,
|
|
gives a warning message that the option was ignored and proceeds.
|
|
These options are for compatibility with scripts designed for other
|
|
people's assemblers.
|
|
|
|
@table @asis
|
|
@item @kbd{-D} (Debug)
|
|
@itemx @kbd{-S} (Symbol Table)
|
|
@itemx @kbd{-T} (Token Trace)
|
|
These are obsolete options used to debug old assemblers.
|
|
|
|
@item @kbd{-d} (Displacement size for JUMPs)
|
|
This option expects a number following the @kbd{-d}. Like options
|
|
that expect filenames, the number may immediately follow the
|
|
@kbd{-d} (old standard) or constitute the whole of the command line
|
|
argument that follows @kbd{-d} (GNU standard).
|
|
|
|
@item @kbd{-V} (Virtualize Interpass Temporary File)
|
|
Some other assemblers use a temporary file. This option
|
|
commanded them to keep the information in active memory rather
|
|
than in a disk file. @code{as} always does this, so this
|
|
option is redundant.
|
|
|
|
@item @kbd{-J} (JUMPify Longer Branches)
|
|
Many 32-bit computers permit a variety of branch instructions
|
|
to do the same job. Some of these instructions are short (and
|
|
fast) but have a limited range; others are long (and slow) but
|
|
can branch anywhere in virtual memory. Often there are 3
|
|
flavors of branch: short, medium and long. Some other
|
|
assemblers would emit short and medium branches, unless told by
|
|
this option to emit short and long branches.
|
|
|
|
@item @kbd{-t} (Temporary File Directory)
|
|
Some other assemblers may use a temporary file, and this option
|
|
takes a filename naming the directory in which to put the temporary
|
|
file. @code{as} does not use a temporary disk file, so this
|
|
option makes no difference. @kbd{-t} needs exactly one
|
|
filename.
|
|
@end table
|
|
|
|
The Vax version of the assembler accepts two options when
|
|
compiled for VMS. They are @kbd{-h}, and @kbd{-+}. The
|
|
@kbd{-h} option prevents @code{as} from modifying the
|
|
symbol-table entries for symbols that contain lowercase
|
|
characters (I think). The @kbd{-+} option causes @code{as} to
|
|
print warning messages if the FILENAME part of the object file,
|
|
or any symbol name is larger than 31 characters. The @kbd{-+}
|
|
option also inserts some code following the @samp{_main}
|
|
symbol so that the object file will be compatible with Vax-11
|
|
"C".
|
|
|
|
@subsection Floating Point
|
|
Conversion of flonums to floating point is correct, and
|
|
compatible with previous assemblers. Rounding is
|
|
towards zero if the remainder is exactly half the least significant bit.
|
|
|
|
@code{D}, @code{F}, @code{G} and @code{H} floating point formats
|
|
are understood.
|
|
|
|
Immediate floating literals (@emph{e.g.} @samp{S`$6.9})
|
|
are rendered correctly. Again, rounding is towards zero in the
|
|
boundary case.
|
|
|
|
The @code{.float} directive produces @code{f} format numbers.
|
|
The @code{.double} directive produces @code{d} format numbers.
|
|
|
|
@subsection Machine Directives
|
|
The Vax version of the assembler supports four directives for
|
|
generating Vax floating point constants. They are described in the
|
|
table below.
|
|
|
|
@table @code
|
|
@item .dfloat
|
|
This expects zero or more flonums, separated by commas, and
|
|
assembles Vax @code{d} format 64-bit floating point constants.
|
|
|
|
@item .ffloat
|
|
This expects zero or more flonums, separated by commas, and
|
|
assembles Vax @code{f} format 32-bit floating point constants.
|
|
|
|
@item .gfloat
|
|
This expects zero or more flonums, separated by commas, and
|
|
assembles Vax @code{g} format 64-bit floating point constants.
|
|
|
|
@item .hfloat
|
|
This expects zero or more flonums, separated by commas, and
|
|
assembles Vax @code{h} format 128-bit floating point constants.
|
|
|
|
@end table
|
|
|
|
@subsection Opcodes
|
|
All DEC mnemonics are supported. Beware that @code{case@dots{}}
|
|
instructions have exactly 3 operands. The dispatch table that
|
|
follows the @code{case@dots{}} instruction should be made with
|
|
@code{.word} statements. This is compatible with all unix
|
|
assemblers we know of.
|
|
|
|
@subsection Branch Improvement
|
|
Certain pseudo opcodes are permitted. They are for branch
|
|
instructions. They expand to the shortest branch instruction that
|
|
will reach the target. Generally these mnemonics are made by
|
|
substituting @samp{j} for @samp{b} at the start of a DEC mnemonic.
|
|
This feature is included both for compatibility and to help
|
|
compilers. If you don't need this feature, don't use these
|
|
opcodes. Here are the mnemonics, and the code they can expand into.
|
|
|
|
@table @code
|
|
@item jbsb
|
|
@samp{Jsb} is already an instruction mnemonic, so we chose @samp{jbsb}.
|
|
@table @asis
|
|
@item (byte displacement)
|
|
@kbd{bsbb @dots{}}
|
|
@item (word displacement)
|
|
@kbd{bsbw @dots{}}
|
|
@item (long displacement)
|
|
@kbd{jsb @dots{}}
|
|
@end table
|
|
@item jbr
|
|
@itemx jr
|
|
Unconditional branch.
|
|
@table @asis
|
|
@item (byte displacement)
|
|
@kbd{brb @dots{}}
|
|
@item (word displacement)
|
|
@kbd{brw @dots{}}
|
|
@item (long displacement)
|
|
@kbd{jmp @dots{}}
|
|
@end table
|
|
@item j@var{COND}
|
|
@var{COND} may be any one of the conditional branches
|
|
@code{neq nequ eql eqlu gtr geq lss gtru lequ vc vs gequ cc lssu cs}.
|
|
@var{COND} may also be one of the bit tests
|
|
@code{bs bc bss bcs bsc bcc bssi bcci lbs lbc}.
|
|
@var{NOTCOND} is the opposite condition to @var{COND}.
|
|
@table @asis
|
|
@item (byte displacement)
|
|
@kbd{b@var{COND} @dots{}}
|
|
@item (word displacement)
|
|
@kbd{b@var{NOTCOND} foo ; brw @dots{} ; foo:}
|
|
@item (long displacement)
|
|
@kbd{b@var{NOTCOND} foo ; jmp @dots{} ; foo:}
|
|
@end table
|
|
@item jacb@var{X}
|
|
@var{X} may be one of @code{b d f g h l w}.
|
|
@table @asis
|
|
@item (word displacement)
|
|
@kbd{@var{OPCODE} @dots{}}
|
|
@item (long displacement)
|
|
@kbd{@var{OPCODE} @dots{}, foo ; brb bar ; foo: jmp @dots{} ; bar:}
|
|
@end table
|
|
@item jaob@var{YYY}
|
|
@var{YYY} may be one of @code{lss leq}.
|
|
@item jsob@var{ZZZ}
|
|
@var{ZZZ} may be one of @code{geq gtr}.
|
|
@table @asis
|
|
@item (byte displacement)
|
|
@kbd{@var{OPCODE} @dots{}}
|
|
@item (word displacement)
|
|
@kbd{@var{OPCODE} @dots{}, foo ; brb bar ; foo: brw @var{destination} ; bar:}
|
|
@item (long displacement)
|
|
@kbd{@var{OPCODE} @dots{}, foo ; brb bar ; foo: jmp @var{destination} ; bar: }
|
|
@end table
|
|
@item aobleq
|
|
@itemx aoblss
|
|
@itemx sobgeq
|
|
@itemx sobgtr
|
|
@table @asis
|
|
@item (byte displacement)
|
|
@kbd{@var{OPCODE} @dots{}}
|
|
@item (word displacement)
|
|
@kbd{@var{OPCODE} @dots{}, foo ; brb bar ; foo: brw @var{destination} ; bar:}
|
|
@item (long displacement)
|
|
@kbd{@var{OPCODE} @dots{}, foo ; brb bar ; foo: jmp @var{destination} ; bar:}
|
|
@end table
|
|
@end table
|
|
|
|
@subsection Operands
|
|
The immediate character is @samp{$} for Unix compatibility, not
|
|
@samp{#} as DEC writes it.
|
|
|
|
The indirect character is @samp{*} for Unix compatibility, not
|
|
@samp{@@} as DEC writes it.
|
|
|
|
The displacement sizing character is @samp{`} (an accent grave) for
|
|
Unix compatibility, not @samp{^} as DEC writes it. The letter
|
|
preceding @samp{`} may have either case. @samp{G} is not
|
|
understood, but all other letters (@code{b i l s w}) are understood.
|
|
|
|
Register names understood are @code{r0 r1 r2 @dots{} r15 ap fp sp
|
|
pc}. Any case of letters will do.
|
|
|
|
For instance
|
|
@example
|
|
tstb *w`$4(r5)
|
|
@end example
|
|
|
|
Any expression is permitted in an operand. Operands are comma
|
|
separated.
|
|
|
|
@c There is some bug to do with recognizing expressions
|
|
@c in operands, but I forget what it is. It is
|
|
@c a syntax clash because () is used as an address mode
|
|
@c and to encapsulate sub-expressions.
|
|
@subsection Not Supported
|
|
Vax bit fields cannot be assembled with @code{as}. Someone
|
|
can add the required code if they really need it.
|
|
@end ignore
|
|
|
|
@c if 680x0
|
|
@section Options
|
|
The 680x0 version of @code{as} has two machine dependent options.
|
|
One shortens undefined references from 32 to 16 bits, while the
|
|
other is used to tell @code{as} what kind of machine it is
|
|
assembling for.
|
|
|
|
You can use the @kbd{-l} option to shorten the size of references to
|
|
undefined symbols. If the @kbd{-l} option is not given, references to
|
|
undefined symbols will be a full long (32 bits) wide. (Since @code{as}
|
|
cannot know where these symbols will end up, @code{as} can only allocate
|
|
space for the linker to fill in later. Since @code{as} doesn't know how
|
|
far away these symbols will be, it allocates as much space as it can.)
|
|
If this option is given, the references will only be one word wide (16
|
|
bits). This may be useful if you want the object file to be as small as
|
|
possible, and you know that the relevant symbols will be less than 17
|
|
bits away.
|
|
|
|
The 680x0 version of @code{as} is most frequently used to assemble
|
|
programs for the Motorola MC68020 microprocessor. Occasionally it is
|
|
used to assemble programs for the mostly similar, but slightly different
|
|
MC68000 or MC68010 microprocessors. You can give @code{as} the options
|
|
@samp{-m68000}, @samp{-mc68000}, @samp{-m68010}, @samp{-mc68010},
|
|
@samp{-m68020}, and @samp{-mc68020} to tell it what processor is the
|
|
target.
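
As an illustration, a command line combining both options might look
like the one below. The file names are hypothetical, and @samp{-o} is
the ordinary option naming the object file.
@example
as -m68000 -l -o prog.o prog.s
@end example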
|
|
|
|
@section Syntax
|
|
|
|
The 680x0 version of @code{as} uses syntax similar to the Sun assembler.
|
|
Size modifiers are appended directly to the end of the opcode without an
|
|
intervening period. For example, write @samp{movl} rather than
|
|
@samp{move.l}.
|
|
|
|
@c pesch@cygnus.com: Vintage Release c1.37 isn't compiled with
|
|
@c SUN_ASM_SYNTAX.
|
|
@ignore
|
|
If @code{as} is compiled with SUN_ASM_SYNTAX defined, it will also allow
|
|
Sun-style local labels of the form @samp{1$} through @samp{9$}.
|
|
@end ignore
|
|
|
|
In the following table @dfn{apc} stands for any of the address
|
|
registers (@samp{a0} through @samp{a7}), nothing (@samp{}), the
|
|
Program Counter (@samp{pc}), or the zero-address relative to the
|
|
program counter (@samp{zpc}).
|
|
|
|
The following addressing modes are understood:
|
|
@table @dfn
|
|
@item Immediate
|
|
@samp{#@var{digits}}
|
|
|
|
@item Data Register
|
|
@samp{d0} through @samp{d7}
|
|
|
|
@item Address Register
|
|
@samp{a0} through @samp{a7}
|
|
|
|
@item Address Register Indirect
|
|
@samp{a0@@} through @samp{a7@@}
|
|
|
|
@item Address Register Postincrement
|
|
@samp{a0@@+} through @samp{a7@@+}
|
|
|
|
@item Address Register Predecrement
|
|
@samp{a0@@-} through @samp{a7@@-}
|
|
|
|
@item Indirect Plus Offset
|
|
@samp{@var{apc}@@(@var{digits})}
|
|
|
|
@item Index
|
|
@samp{@var{apc}@@(@var{digits},@var{register}:@var{size}:@var{scale})}
|
|
or @samp{@var{apc}@@(@var{register}:@var{size}:@var{scale})}
|
|
|
|
@item Postindex
|
|
@samp{@var{apc}@@(@var{digits})@@(@var{digits},@var{register}:@var{size}:@var{scale})}
|
|
or @samp{@var{apc}@@(@var{digits})@@(@var{register}:@var{size}:@var{scale})}
|
|
|
|
@item Preindex
|
|
@samp{@var{apc}@@(@var{digits},@var{register}:@var{size}:@var{scale})@@(@var{digits})}
|
|
or @samp{@var{apc}@@(@var{register}:@var{size}:@var{scale})@@(@var{digits})}
|
|
|
|
@item Memory Indirect
|
|
@samp{@var{apc}@@(@var{digits})@@(@var{digits})}
|
|
|
|
@item Absolute
|
|
@samp{@var{symbol}}, or @samp{@var{digits}}
|
|
@ignore
|
|
@c pesch@cygnus.com: gnu, rich concur the following needs careful
|
|
@c research before documenting.
|
|
, or either of the above followed
|
|
by @samp{:b}, @samp{:w}, or @samp{:l}.
|
|
@end ignore
|
|
@end table
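
A few of these modes are sketched below as operands of @samp{movl}.
The register choices and offsets are arbitrary. The @samp{l} size and
the scale of 2 in the last line are assumptions about what the
@var{size} and @var{scale} fields accept, and scaled indexing needs a
68020.
@example
        movl    #10,d0          /* Immediate */
        movl    d0,d1           /* Data Register */
        movl    a0,d1           /* Address Register */
        movl    a0@@,d1         /* Address Register Indirect */
        movl    a0@@+,d1        /* Address Register Postincrement */
        movl    d1,a1@@-        /* Address Register Predecrement */
        movl    a6@@(8),d0      /* Indirect Plus Offset */
        movl    a0@@(4,d1:l:2),d0       /* Index */
@end example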
|
|
|
|
@section Floating Point
|
|
The floating point code is not too well tested, and may have
|
|
subtle bugs in it.
|
|
|
|
Packed decimal (P) format floating literals are not supported.
|
|
Feel free to add the code!
|
|
|
|
The floating point formats generated by directives are these.
|
|
@table @code
|
|
@item .float
|
|
@code{Single} precision floating point constants.
|
|
@item .double
|
|
@code{Double} precision floating point constants.
|
|
@end table
|
|
|
|
There is no directive to produce regions of memory holding
|
|
extended precision numbers, however they can be used as
|
|
immediate operands to floating-point instructions. Adding a
|
|
directive to create extended precision numbers would not be
|
|
hard, but it has not yet seemed necessary.
|
|
|
|
@section Machine Directives
|
|
In order to be compatible with the Sun assembler the 680x0 assembler
|
|
understands the following directives.
|
|
@table @code
|
|
@item .data1
|
|
This directive is identical to a @code{.data 1} directive.
|
|
@item .data2
|
|
This directive is identical to a @code{.data 2} directive.
|
|
@item .even
|
|
This directive is identical to a @code{.align 1} directive.
|
|
@c Is this true? does it work???
|
|
@item .skip
|
|
This directive is identical to a @code{.space} directive.
|
|
@end table
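
A short sketch using these directives, assuming they behave exactly as
the equivalences above state:
@example
        .data1                  /* same as .data 1 */
        .byte   1
        .even                   /* same as .align 1: pad to an even address */
        .word   2
        .skip   4               /* same as .space 4: four zero bytes */
@end example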
|
|
|
|
@section Opcodes
|
|
@c pesch@cygnus.com: I don't see any point in the following
|
|
@c paragraph. Bugs are bugs; how does saying this
|
|
@c help anyone?
|
|
@ignore
|
|
Danger: Several bugs have been found in the opcode table (and
|
|
fixed). More bugs may exist. Be careful when using obscure
|
|
instructions.
|
|
@end ignore
|
|
|
|
@subsection Branch Improvement
|
|
|
|
Certain pseudo opcodes are permitted for branch instructions.
|
|
They expand to the shortest branch instruction that will reach the
|
|
target. Generally these mnemonics are made by substituting @samp{j} for
|
|
@samp{b} at the start of a Motorola mnemonic.
|
|
|
|
The following table summarizes the pseudo-operations. A @code{*} flags
|
|
cases that are more fully described after the table:
|
|
|
|
@example
|
|
Displacement
|
|
+---------------------------------------------------------
|
|
| 68020 68000/10
|
|
Pseudo-Op |BYTE WORD LONG LONG non-PC relative
|
|
+---------------------------------------------------------
|
|
jbsr |bsrs bsr bsrl jsr jsr
|
|
jra |bras bra bral jmp jmp
|
|
* jXX |bXXs bXX bXXl bNXs;jmpl bNXs;jmp
|
|
* dbXX |dbXX dbXX dbXX; bra; jmpl
|
|
* fjXX |fbXXw fbXXw fbXXl fbNXw;jmp
|
|
|
|
XX: condition
|
|
NX: negative of condition XX
|
|
|
|
@end example
|
|
@center{@code{*}---see full description below}
|
|
|
|
@table @code
|
|
@item jbsr
|
|
@itemx jra
|
|
These are the simplest jump pseudo-operations; they always map to one
|
|
particular machine instruction, depending on the displacement to the
|
|
branch target.
|
|
|
|
@item j@var{XX}
|
|
Here, @samp{j@var{XX}} stands for an entire family of pseudo-operations,
|
|
where @var{XX} is a conditional branch or condition-code test. The full
|
|
list of pseudo-ops in this family is:
|
|
@example
|
|
jhi jls jcc jcs jne jeq jvc
|
|
jvs jpl jmi jge jlt jgt jle
|
|
@end example
|
|
|
|
For the cases of non-PC relative displacements and long displacements on
|
|
the 68000 or 68010, @code{as} will issue a longer code fragment in terms of
|
|
@var{NX}, the opposite condition to @var{XX}:
|
|
@example
|
|
j@var{XX} foo
|
|
@end example
|
|
gives
|
|
@example
|
|
b@var{NX}s oof
|
|
jmp foo
|
|
oof:
|
|
@end example
|
|
|
|
@item db@var{XX}
|
|
The full family of pseudo-operations covered here is
|
|
@example
|
|
dbhi dbls dbcc dbcs dbne dbeq dbvc
|
|
dbvs dbpl dbmi dbge dblt dbgt dble
|
|
dbf dbra dbt
|
|
@end example
|
|
|
|
Other than for word and byte displacements, when the source reads
|
|
@samp{db@var{XX} foo}, @code{as} will emit
|
|
@example
|
|
db@var{XX} oo1
|
|
bra oo2
|
|
oo1:jmpl foo
|
|
oo2:
|
|
@end example
|
|
|
|
@item fj@var{XX}
|
|
This family includes
|
|
@example
|
|
fjne fjeq fjge fjlt fjgt fjle fjf
|
|
fjt fjgl fjgle fjnge fjngl fjngle fjngt
|
|
fjnle fjnlt fjoge fjogl fjogt fjole fjolt
|
|
fjor fjseq fjsf fjsne fjst fjueq fjuge
|
|
fjugt fjule fjult fjun
|
|
@end example
|
|
|
|
For branch targets that are not PC relative, @code{as} emits
|
|
@example
|
|
fb@var{NX} oof
|
|
jmp foo
|
|
oof:
|
|
@end example
|
|
when it encounters @samp{fj@var{XX} foo}.
|
|
|
|
@end table
|
|
|
|
@subsection Special Characters
|
|
The immediate character is @samp{#} for Sun compatibility. The
|
|
line-comment character is @samp{|}. If a @samp{#} appears at the
|
|
beginning of a line, it is treated as a comment unless it looks like
|
|
@samp{# line file}, in which case it is treated normally.
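
For example, in the fragment below the first line is a comment because
@samp{#} begins the line, while the @samp{#} on the second line
introduces an immediate operand:
@example
# This entire line is a comment.
        movl    #4,d0           | the rest of this line is a comment
@end example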
|
|
@c fi 680x0
|
|
|
|
@c pesch@cygnus.com: see remarks at ignore for vax.
|
|
@ignore
|
|
@section 32x32
|
|
@subsection Options
|
|
The 32x32 version of @code{as} accepts a @kbd{-m32032} option to
|
|
specify that it is compiling for a 32032 processor, or a
|
|
@kbd{-m32532} to specify that it is compiling for a 32532 processor.
|
|
The default (if neither is specified) is chosen when the assembler
|
|
is compiled.
|
|
|
|
@subsection Syntax
|
|
I don't know anything about the 32x32 syntax assembled by
|
|
@code{as}. Someone who understands the processor (I've never seen
|
|
one) and the possible syntaxes should write this section.
|
|
|
|
@subsection Floating Point
|
|
The 32x32 uses IEEE floating point numbers, but @code{as} will only
|
|
create single or double precision values. I don't know if the 32x32
|
|
understands extended precision numbers.
|
|
|
|
@subsection Machine Directives
|
|
The 32x32 has no machine dependent directives.
|
|
|
|
@section Sparc
|
|
@subsection Options
|
|
The sparc has no machine dependent options.
|
|
|
|
@subsection Syntax
|
|
I don't know anything about Sparc syntax. Someone who does
|
|
will have to write this section.
|
|
|
|
@subsection Floating Point
|
|
The Sparc uses IEEE floating-point numbers.
|
|
|
|
@subsection Machine Directives
|
|
The Sparc version of @code{as} supports the following additional
|
|
machine directives:
|
|
|
|
@table @code
|
|
@item .common
|
|
This must be followed by a symbol name, a positive number, and
|
|
@code{"bss"}. This behaves somewhat like @code{.comm}, but the
|
|
syntax is different.
|
|
|
|
@item .global
|
|
This is functionally identical to @code{.globl}.
|
|
|
|
@item .half
|
|
This is functionally identical to @code{.short}.
|
|
|
|
@item .proc
|
|
This directive is ignored. Any text following it on the same
|
|
line is also ignored.
|
|
|
|
@item .reserve
|
|
This must be followed by a symbol name, a positive number, and
|
|
@code{"bss"}. This behaves somewhat like @code{.lcomm}, but the
|
|
syntax is different.
|
|
|
|
@item .seg
|
|
This must be followed by @code{"text"}, @code{"data"}, or
|
|
@code{"data1"}. It behaves like @code{.text}, @code{.data}, or
|
|
@code{.data 1}.
|
|
|
|
@item .skip
|
|
This is functionally identical to the @code{.space} directive.
|
|
|
|
@item .word
|
|
On the Sparc, the @code{.word} directive produces 32-bit values,
|
|
instead of the 16-bit values it produces on every other machine.
|
|
|
|
@end table
|
|
|
|
@section Intel 80386
|
|
@subsection Options
|
|
The 80386 has no machine dependent options.
|
|
|
|
@subsection AT&T Syntax versus Intel Syntax
|
|
In order to maintain compatibility with the output of @code{GCC},
|
|
@code{as} supports AT&T System V/386 assembler syntax. This is quite
|
|
different from Intel syntax. We mention these differences because
|
|
almost all 80386 documents use only Intel syntax. Notable differences
|
|
between the two syntaxes are:
|
|
@itemize @bullet
|
|
@item
|
|
AT&T immediate operands are preceded by @samp{$}; Intel immediate
|
|
operands are undelimited (Intel @samp{push 4} is AT&T @samp{pushl $4}).
|
|
AT&T register operands are preceded by @samp{%}; Intel register operands
|
|
are undelimited. AT&T absolute (as opposed to PC relative) jump/call
|
|
operands are prefixed by @samp{*}; they are undelimited in Intel syntax.
|
|
|
|
@item
|
|
AT&T and Intel syntax use the opposite order for source and destination
|
|
operands. Intel @samp{add eax, 4} is AT&T @samp{addl $4, %eax}. The
|
|
@samp{source, dest} convention is maintained for compatibility with
|
|
previous Unix assemblers.
|
|
|
|
@item
|
|
In AT&T syntax the size of memory operands is determined from the last
|
|
character of the opcode name. Opcode suffixes of @samp{b}, @samp{w},
|
|
and @samp{l} specify byte (8-bit), word (16-bit), and long (32-bit)
|
|
memory references. Intel syntax accomplishes this by prefixing memory
|
|
operands (@emph{not} the opcodes themselves) with @samp{byte ptr},
|
|
@samp{word ptr}, and @samp{dword ptr}. Thus, Intel @samp{mov al, byte
|
|
ptr @var{foo}} is @samp{movb @var{foo}, %al} in AT&T syntax.
|
|
|
|
@item
|
|
Immediate form long jumps and calls are
|
|
@samp{lcall/ljmp $@var{segment}, $@var{offset}} in AT&T syntax; the
|
|
Intel syntax is
|
|
@samp{call/jmp far @var{segment}:@var{offset}}. Also, the far return
|
|
instruction
|
|
is @samp{lret $@var{stack-adjust}} in AT&T syntax; Intel syntax is
|
|
@samp{ret far @var{stack-adjust}}.
|
|
|
|
@item
|
|
The AT&T assembler does not provide support for multiple segment
|
|
programs. Unix style systems expect all programs to be single segments.
|
|
@end itemize
|
|
|
|
@subsection Opcode Naming
|
|
Opcode names are suffixed with one character modifiers which specify the
|
|
size of operands. The letters @samp{b}, @samp{w}, and @samp{l} specify
|
|
byte, word, and long operands. If no suffix is specified by an
|
|
instruction and it contains no memory operands then @code{as} tries to
|
|
fill in the missing suffix based on the destination register operand
|
|
(the last one by convention). Thus, @samp{mov %ax, %bx} is equivalent
|
|
to @samp{movw %ax, %bx}; also, @samp{mov $1, %bx} is equivalent to
|
|
@samp{movw $1, %bx}. Note that this is incompatible with the AT&T Unix
|
|
assembler which assumes that a missing opcode suffix implies long
|
|
operand size. (This incompatibility does not affect compiler output
|
|
since compilers always explicitly specify the opcode suffix.)
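
The suffix-defaulting rule described above can be sketched in C. This is
only an illustration, not the code in @file{i386.c}: the function name
@code{infer_suffix} and its register tables are assumptions made for this
example, and register names are passed without the leading @samp{%}.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: guess the operand-size suffix ('b', 'w' or 'l')
   from the name of the destination register, written without '%'.
   Returns 0 if the name is not one of the general registers.  */
char
infer_suffix (const char *reg)
{
  static const char *const byte_regs[] =
    { "al", "ah", "bl", "bh", "cl", "ch", "dl", "dh" };
  static const char *const word_regs[] =
    { "ax", "bx", "cx", "dx", "di", "si", "bp", "sp" };
  size_t i;

  assert (reg != NULL);

  for (i = 0; i < sizeof byte_regs / sizeof byte_regs[0]; i++)
    if (strcmp (reg, byte_regs[i]) == 0)
      return 'b';

  for (i = 0; i < sizeof word_regs / sizeof word_regs[0]; i++)
    {
      if (strcmp (reg, word_regs[i]) == 0)
        return 'w';
      /* Each 32-bit register is a 16-bit name prefixed by 'e'.  */
      if (reg[0] == 'e' && strcmp (reg + 1, word_regs[i]) == 0)
        return 'l';
    }
  return 0;
}
```

With this sketch, @code{infer_suffix ("bx")} yields @samp{w}, which matches
the treatment of @samp{mov %ax, %bx} as @samp{movw %ax, %bx}.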
|
|
|
|
Almost all opcodes have the same names in AT&T and Intel format. There
|
|
are a few exceptions. The sign extend and zero extend instructions need
|
|
two sizes to specify them. They need a size to sign/zero extend
|
|
@emph{from} and a size to sign/zero extend @emph{to}. This is accomplished
|
|
by using two opcode suffixes in AT&T syntax. Base names for sign extend
|
|
and zero extend are @samp{movs@dots{}} and @samp{movz@dots{}} in AT&T
|
|
syntax (@samp{movsx} and @samp{movzx} in Intel syntax). The opcode
|
|
suffixes are tacked on to this base name, the @emph{from} suffix before
|
|
the @emph{to} suffix. Thus, @samp{movsbl %al, %edx} is AT&T syntax for
|
|
``move sign extend @emph{from} %al @emph{to} %edx.'' Possible suffixes,
|
|
thus, are @samp{bl} (from byte to long), @samp{bw} (from byte to word),
|
|
and @samp{wl} (from word to long).
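
What the two-suffix opcodes compute can be modelled in C. The sketch
assumes nothing about how @code{as} encodes these instructions; the
helper @code{sign_extend} is hypothetical, and the other functions are
simply named after the AT&T opcodes they imitate.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low FROM_BITS bits of VALUE to 32 bits.
   FROM_BITS is 8 (suffix 'b') or 16 (suffix 'w').  */
uint32_t
sign_extend (uint32_t value, int from_bits)
{
  uint32_t sign, mask;

  assert (from_bits == 8 || from_bits == 16);
  sign = (uint32_t) 1 << (from_bits - 1);   /* the source sign bit */
  mask = (sign << 1) - 1;                   /* the low FROM_BITS bits */
  value &= mask;
  /* Flipping the sign bit and then subtracting it copies it upward.  */
  return (value ^ sign) - sign;
}

/* Models of the AT&T opcodes: "from" suffix first, then "to" suffix.  */
uint32_t movsbl (uint8_t src)  { return sign_extend (src, 8); }
uint16_t movsbw (uint8_t src)  { return (uint16_t) sign_extend (src, 8); }
uint32_t movswl (uint16_t src) { return sign_extend (src, 16); }
uint32_t movzbl (uint8_t src)  { return src; }   /* zero extension */
```

For a source byte of 0x80, the sign-extending @code{movsbl} produces
0xFFFFFF80 while the zero-extending @code{movzbl} produces 0x00000080.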
|
|
|
|
The Intel syntax conversion instructions
|
|
@itemize @bullet
|
|
@item
|
|
@samp{cbw} --- sign-extend byte in @samp{%al} to word in @samp{%ax},
|
|
@item
|
|
@samp{cwde} --- sign-extend word in @samp{%ax} to long in @samp{%eax},
|
|
@item
|
|
@samp{cwd} --- sign-extend word in @samp{%ax} to long in @samp{%dx:%ax},
|
|
@item
|
|
@samp{cdq} --- sign-extend dword in @samp{%eax} to quad in @samp{%edx:%eax},
|
|
@end itemize
|
|
are called @samp{cbtw}, @samp{cwtl}, @samp{cwtd}, and @samp{cltd} in
|
|
AT&T naming. @code{as} accepts either naming for these instructions.
|
|
|
|
Far call/jump instructions are @samp{lcall} and @samp{ljmp} in
|
|
AT&T syntax, but are @samp{call far} and @samp{jmp far} in Intel
|
|
convention.
|
|
|
|
@subsection Register Naming
|
|
Register operands are always prefixed with @samp{%}. The 80386 registers
|
|
consist of
|
|
@itemize @bullet
|
|
@item
|
|
the 8 32-bit registers @samp{%eax} (the accumulator), @samp{%ebx},
|
|
@samp{%ecx}, @samp{%edx}, @samp{%edi}, @samp{%esi}, @samp{%ebp} (the
|
|
frame pointer), and @samp{%esp} (the stack pointer).
|
|
|
|
@item
|
|
the 8 16-bit low-ends of these: @samp{%ax}, @samp{%bx}, @samp{%cx},
|
|
@samp{%dx}, @samp{%di}, @samp{%si}, @samp{%bp}, and @samp{%sp}.
|
|
|
|
@item
|
|
the 8 8-bit registers: @samp{%ah}, @samp{%al}, @samp{%bh},
|
|
@samp{%bl}, @samp{%ch}, @samp{%cl}, @samp{%dh}, and @samp{%dl} (These
|
|
are the high-bytes and low-bytes of @samp{%ax}, @samp{%bx},
|
|
@samp{%cx}, and @samp{%dx})
|
|
|
|
@item
|
|
the 6 segment registers @samp{%cs} (code segment), @samp{%ds}
|
|
(data segment), @samp{%ss} (stack segment), @samp{%es}, @samp{%fs},
|
|
and @samp{%gs}.
|
|
|
|
@item
|
|
the 3 processor control registers @samp{%cr0}, @samp{%cr2}, and
|
|
@samp{%cr3}.
|
|
|
|
@item
|
|
the 6 debug registers @samp{%db0}, @samp{%db1}, @samp{%db2},
|
|
@samp{%db3}, @samp{%db6}, and @samp{%db7}.
|
|
|
|
@item
|
|
the 2 test registers @samp{%tr6} and @samp{%tr7}.
|
|
|
|
@item
|
|
the 8 registers of the floating point stack: @samp{%st} or equivalently
|
|
@samp{%st(0)}, @samp{%st(1)}, @samp{%st(2)}, @samp{%st(3)},
|
|
@samp{%st(4)}, @samp{%st(5)}, @samp{%st(6)}, and @samp{%st(7)}.
|
|
@end itemize
|
|
|
|
@subsection Opcode Prefixes
|
|
Opcode prefixes are used to modify the following opcode. They are used
|
|
to repeat string instructions, to provide segment overrides, to perform
|
|
bus lock operations, and to give operand and address size (16-bit
|
|
operands are specified in an instruction by prefixing what would
|
|
normally be 32-bit operands with an ``operand size'' opcode prefix).
|
|
Opcode prefixes are usually given as single-line instructions with no
|
|
operands, and must directly precede the instruction they act upon. For
|
|
example, the @samp{scas} (scan string) instruction is repeated with:
|
|
@example
|
|
repne
|
|
scas
|
|
@end example
|
|
|
|
Here is a list of opcode prefixes:
|
|
@itemize @bullet
|
|
@item
|
|
Segment override prefixes @samp{cs}, @samp{ds}, @samp{ss}, @samp{es},
|
|
@samp{fs}, @samp{gs}. These are added automatically when you use
the @var{segment}:@var{memory-operand} form for memory references.
|
|
|
|
@item
|
|
Operand/Address size prefixes @samp{data16} and @samp{addr16}
|
|
change 32-bit operands/addresses into 16-bit operands/addresses. Note
|
|
that 16-bit addressing modes (i.e. 8086 and 80286 addressing modes)
|
|
are not supported (yet).
|
|
|
|
@item
|
|
The bus lock prefix @samp{lock} inhibits interrupts during
|
|
execution of the instruction it precedes. (This is only valid with
|
|
certain instructions; see an 80386 manual for details).
|
|
|
|
@item
|
|
The wait for coprocessor prefix @samp{wait} waits for the
|
|
coprocessor to complete the current instruction. This should never be
|
|
needed for the 80386/80387 combination.
|
|
|
|
@item
|
|
The @samp{rep}, @samp{repe}, and @samp{repne} prefixes are added
|
|
to string instructions to make them repeat @samp{%ecx} times.
|
|
@end itemize
|
|
|
|
@subsection Memory References
|
|
An Intel syntax indirect memory reference of the form
|
|
@example
|
|
@var{segment}:[@var{base} + @var{index}*@var{scale} + @var{disp}]
|
|
@end example
|
|
is translated into the AT&T syntax
|
|
@example
|
|
@var{segment}:@var{disp}(@var{base}, @var{index}, @var{scale})
|
|
@end example
|
|
where @var{base} and @var{index} are the optional 32-bit base and
|
|
index registers, @var{disp} is the optional displacement, and
|
|
@var{scale}, taking the values 1, 2, 4, and 8, multiplies @var{index}
|
|
to calculate the address of the operand. If no @var{scale} is
|
|
specified, @var{scale} is taken to be 1. @var{segment} specifies the
|
|
optional segment register for the memory operand, and may override the
|
|
default segment register (see an 80386 manual for segment register
|
|
defaults). Note that segment overrides in AT&T syntax @emph{must}
|
|
be preceded by a @samp{%}. If you specify a segment override which
|
|
coincides with the default segment register, @code{as} will @emph{not}
|
|
output any segment register override prefixes to assemble the given
|
|
instruction. Thus, segment overrides can be specified to emphasize which
|
|
segment register is used for a given memory operand.
|
|
|
|
Here are some examples of Intel and AT&T style memory references:
|
|
@table @asis
|
|
|
|
@item AT&T: @samp{-4(%ebp)}, Intel: @samp{[ebp - 4]}
|
|
@var{base} is @samp{%ebp}; @var{disp} is @samp{-4}. @var{segment} is
|
|
missing, and the default segment is used (@samp{%ss} for addressing with
|
|
@samp{%ebp} as the base register). @var{index}, @var{scale} are both missing.
|
|
|
|
@item AT&T: @samp{foo(,%eax,4)}, Intel: @samp{[foo + eax*4]}
|
|
@var{index} is @samp{%eax} (scaled by a @var{scale} of 4); @var{disp} is
|
|
@samp{foo}. All other fields are missing. The segment register here
|
|
defaults to @samp{%ds}.
|
|
|
|
@item AT&T: @samp{foo(,1)}, Intel: @samp{[foo]}
|
|
This uses the value pointed to by @samp{foo} as a memory operand.
|
|
Note that @var{base} and @var{index} are both missing, but there is only
|
|
@emph{one} @samp{,}. This is a syntactic exception.
|
|
|
|
@item AT&T: @samp{%gs:foo}, Intel: @samp{gs:foo}
|
|
This selects the contents of the variable @samp{foo} with segment
|
|
register @var{segment} being @samp{%gs}.
|
|
|
|
@end table
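
The address arithmetic behind
@var{disp}(@var{base}, @var{index}, @var{scale}) can be written out in C.
This is a sketch of what the processor computes, not of anything inside
@code{as}. The function name @code{effective_address} is an assumption for
this example, a missing @var{base} or @var{index} is passed as zero, and
the segment base is ignored (as if every segment started at address 0).

```c
#include <assert.h>
#include <stdint.h>

/* Offset selected by disp(base, index, scale), modulo 2^32.
   Pass 0 for a missing base or index register.  */
uint32_t
effective_address (uint32_t disp, uint32_t base, uint32_t index,
                   unsigned scale)
{
  /* Only these four scale factors can be encoded.  */
  assert (scale == 1 || scale == 2 || scale == 4 || scale == 8);
  return disp + base + index * scale;
}
```

For example, with @samp{%ebp} holding 0x1000, the operand
@samp{-4(%ebp)} corresponds to
@code{effective_address ((uint32_t) -4, 0x1000, 0, 1)}, which is 0xFFC.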
|
|
|
|
Absolute (as opposed to PC relative) call and jump operands must be
|
|
prefixed with @samp{*}. If no @samp{*} is specified, @code{as} will
|
|
always choose PC relative addressing for jump/call labels.
|
|
|
|
Any instruction that has a memory operand @emph{must} specify its size (byte,
|
|
word, or long) with an opcode suffix (@samp{b}, @samp{w}, or @samp{l},
|
|
respectively).
|
|
|
|
@subsection Handling of Jump Instructions
|
|
Jump instructions are always optimized to use the smallest possible
|
|
displacements. This is accomplished by using byte (8-bit) displacement
|
|
jumps whenever the target is sufficiently close. If a byte displacement
|
|
is insufficient, a long (32-bit) displacement is used. We do not support
|
|
word (16-bit) displacement jumps (i.e. prefixing the jump instruction
|
|
with the @samp{addr16} opcode prefix), since the 80386 insists upon masking
|
|
@samp{%eip} to 16 bits after the word displacement is added.
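
The choice between the two displacement sizes can be sketched in C for
an unconditional @samp{jmp}. The function name @code{encode_jmp} is
hypothetical and this is not the relaxation code in @file{write.c}; it
assumes the addresses of the instruction and of its target are already
known, whereas the real assembler must also allow for other jumps between
them changing size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode "jmp target" for an instruction located at INSN_ADDR into BUF
   (room for 5 bytes).  Returns the number of bytes written.  */
size_t
encode_jmp (uint8_t *buf, uint32_t insn_addr, uint32_t target_addr)
{
  int64_t disp;
  uint32_t d32;

  assert (buf != NULL);

  /* Short form: opcode 0xEB plus a signed byte, measured from the
     end of the 2-byte instruction.  */
  disp = (int64_t) target_addr - ((int64_t) insn_addr + 2);
  if (disp >= -128 && disp <= 127)
    {
      buf[0] = 0xEB;
      buf[1] = (uint8_t) disp;
      return 2;
    }

  /* Long form: opcode 0xE9 plus a 32-bit little-endian displacement,
     measured from the end of the 5-byte instruction.  */
  d32 = target_addr - (insn_addr + 5);
  buf[0] = 0xE9;
  buf[1] = (uint8_t) d32;
  buf[2] = (uint8_t) (d32 >> 8);
  buf[3] = (uint8_t) (d32 >> 16);
  buf[4] = (uint8_t) (d32 >> 24);
  return 5;
}
```

A jump whose target is its own address gets the two bytes 0xEB 0xFE; a
jump from 0x100 to 0x1000 is out of byte range and gets the five-byte form.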
|
|
|
|
Note that the @samp{jcxz}, @samp{jecxz}, @samp{loop}, @samp{loopz},
|
|
@samp{loope}, @samp{loopnz} and @samp{loopne} instructions only come in
|
|
byte displacements, so that it is possible that use of these
|
|
instructions (@code{GCC} does not use them) will cause the assembler to
|
|
print an error message (and generate incorrect code). The AT&T 80386
|
|
assembler tries to get around this problem by expanding @samp{jcxz foo} to
|
|
@example
|
|
jcxz cx_zero
|
|
jmp cx_nonzero
|
|
cx_zero: jmp foo
|
|
cx_nonzero:
|
|
@end example
|
|
|
|
@subsection Floating Point
|
|
All 80387 floating point types except packed BCD are supported.
|
|
(BCD support may be added without much difficulty). These data
|
|
types are 16-, 32-, and 64-bit integers, and single (32-bit),
|
|
double (64-bit), and extended (80-bit) precision floating point.
|
|
Each supported type has an opcode suffix and a constructor
|
|
associated with it. Opcode suffixes specify operand data
|
|
types. Constructors build these data types into memory.
|
|
|
|
@itemize @bullet
|
|
@item
|
|
Floating point constructors are @samp{.float} or @samp{.single},
|
|
@samp{.double}, and @samp{.tfloat} for 32-, 64-, and 80-bit formats.
|
|
These correspond to opcode suffixes @samp{s}, @samp{l}, and @samp{t}.
|
|
@samp{t} stands for temporary real; note that the 80387 only supports
|
|
this format via the @samp{fldt} (load temporary real to stack top) and
|
|
@samp{fstpt} (store temporary real and pop stack) instructions.
|
|
|
|
@item
|
|
Integer constructors are @samp{.word}, @samp{.long} or @samp{.int}, and
|
|
@samp{.quad} for the 16-, 32-, and 64-bit integer formats. The corresponding
|
|
opcode suffixes are @samp{s} (single), @samp{l} (long), and @samp{q}
|
|
(quad). As with the temporary real format the 64-bit @samp{q} format is
|
|
only present in the @samp{fildq} (load quad integer to stack top) and
|
|
@samp{fistpq} (store quad integer and pop stack) instructions.
|
|
@end itemize
|
|
|
|
Register to register operations do not require opcode suffixes,
|
|
so that @samp{fst %st, %st(1)} is equivalent to @samp{fstl %st, %st(1)}.
|
|
|
|
Since the 80387 automatically synchronizes with the 80386, @samp{fwait}
|
|
instructions are almost never needed (this is not the case for the
|
|
80286/80287 and 8086/8087 combinations). Therefore, @code{as} suppresses
|
|
the @samp{fwait} instruction whenever it is implicitly selected by one
|
|
of the @samp{fn@dots{}} instructions. For example, @samp{fsave} and
|
|
@samp{fnsave} are treated identically. In general, all the @samp{fn@dots{}}
|
|
instructions are made equivalent to @samp{f@dots{}} instructions. If
|
|
@samp{fwait} is desired it must be explicitly coded.
|
|
|
|
@subsection Notes
|
|
There is some trickery concerning the @samp{mul} and @samp{imul}
|
|
instructions that deserves mention. The 16-, 32-, and 64-bit expanding
|
|
multiplies (base opcode @samp{0xf6}; extension 4 for @samp{mul} and 5
|
|
for @samp{imul}) can be output only in the one operand form. Thus,
|
|
@samp{imul %ebx, %eax} does @emph{not} select the expanding multiply;
|
|
the expanding multiply would clobber the @samp{%edx} register, and this
|
|
would confuse @code{GCC} output. Use @samp{imul %ebx} to get the
|
|
64-bit product in @samp{%edx:%eax}.
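
The difference between the two forms can be modelled in C. This sketch
only imitates the arithmetic of the 32-bit instructions; the function
names @code{imul_one_operand} and @code{imul_two_operand} are
assumptions for this example and do not appear in the assembler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of "imul src" (one operand form): the signed 64-bit product of
   %eax and SRC is split across %edx (high half) and %eax (low half).  */
void
imul_one_operand (int32_t eax, int32_t src,
                  uint32_t *edx_out, uint32_t *eax_out)
{
  uint64_t product;

  assert (edx_out != NULL && eax_out != NULL);
  product = (uint64_t) ((int64_t) eax * (int64_t) src);
  *eax_out = (uint32_t) product;
  *edx_out = (uint32_t) (product >> 32);
}

/* Model of "imul src, dst" (two operand form): only the low 32 bits of
   the product are kept, and %edx is left alone.  */
uint32_t
imul_two_operand (int32_t dst, int32_t src)
{
  return (uint32_t) ((int64_t) dst * (int64_t) src);
}
```

Multiplying 0x10000 by 0x10000 shows the difference: the one operand
model leaves 1 in @samp{%edx} and 0 in @samp{%eax}, while the two
operand model returns 0 because the high half is discarded.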
|
|
|
|
We have added a two operand form of @samp{imul} when the first operand
|
|
is an immediate mode expression and the second operand is a register.
|
|
This is just a shorthand, so that multiplying @samp{%eax} by 69, for
|
|
example, can be done with @samp{imul $69, %eax} rather than @samp{imul
|
|
$69, %eax, %eax}.
|
|
@end ignore
|
|
@c pesch@cygnus.com: we also ignore the following chapters, but for
|
|
@c a different reason---internals are changing
|
|
@c rapidly. These may need to be moved to another
|
|
@c book anyhow, if we adopt the model of user/modifier
|
|
@c books.
|
|
@ignore
|
|
@node Maintenance, Retargeting, Machine Dependent, top
|
|
@chapter Maintaining the Assembler
|
|
[[this chapter is still being built]]
|
|
|
|
@section Design
|
|
We had these goals, in descending priority:
|
|
@table @b
|
|
@item Accuracy.
|
|
For every program composed by a compiler, @code{as} should emit
|
|
``correct'' code. This leaves some latitude in choosing addressing
|
|
modes, order of @code{relocation_info} structures in the object
|
|
file, @emph{etc}.
|
|
|
|
@item Speed, for usual case.
|
|
By far the most common use of @code{as} will be assembling compiler
|
|
emissions.
|
|
|
|
@item Upward compatibility for existing assembler code.
|
|
Well @dots{} we don't support Vax bit fields but everything else
|
|
seems to be upward compatible.
|
|
|
|
@item Readability.
|
|
The code should be maintainable with few surprises. (JF: ha!)
|
|
|
|
@end table
|
|
|
|
We assumed that disk I/O was slow and expensive while memory was
|
|
fast and access to memory was cheap. We expect the in-memory data
|
|
structures to be less than 10 times the size of the emitted object
|
|
file. (Contrast this with the C compiler where in-memory structures
|
|
might be 100 times object file size!)
|
|
This suggests:
|
|
@itemize @bullet
|
|
@item
|
|
Try to read the source file from disk only one time. For other
|
|
reasons, we keep large chunks of the source file in memory during
|
|
assembly so this is not a problem. Also the assembly algorithm
|
|
should only scan the source text once if the compiler composed the
|
|
text according to a few simple rules.
|
|
@item
|
|
Emit the object code bytes only once. Don't store values and then
|
|
backpatch later.
|
|
@item
|
|
Build the object file in memory and do direct writes to disk of
|
|
large buffers.
|
|
@end itemize
|
|
|
|
RMS suggested a one-pass algorithm which seems to work well. By not
|
|
parsing text during a second pass considerable time is saved on
|
|
large programs (@emph{e.g.} the sort of C program @code{yacc} would
|
|
emit).
|
|
|
|
It happened that the data structures needed to emit relocation
|
|
information to the object file were neatly subsumed into the data
|
|
structures that do backpatching of addresses after pass 1.
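
The way one record can serve both purposes can be sketched in C. Every
name here (@code{struct symbol}, @code{struct fixup},
@code{apply_fixups}) is hypothetical and far simpler than the real
structures in @code{as}. The sketch assumes 32-bit little-endian
address fields: a fixup whose symbol is defined after pass 1 is patched
in place, and one whose symbol is still undefined is counted as needing
a relocation entry in the object file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct symbol
{
  int defined;              /* nonzero once the value is known */
  uint32_t value;
};

struct fixup
{
  size_t where;             /* offset of a 4-byte field in the code */
  const struct symbol *sym; /* symbol whose value belongs there */
};

/* Patch every fixup whose symbol is defined.  Returns the number of
   fixups left over, which would have to be emitted as relocations.  */
size_t
apply_fixups (uint8_t *code, size_t code_len,
              const struct fixup *fix, size_t n)
{
  size_t i, unresolved = 0;

  assert (code != NULL && (fix != NULL || n == 0));
  for (i = 0; i < n; i++)
    {
      uint32_t v;
      size_t w = fix[i].where;

      assert (fix[i].sym != NULL && w + 4 <= code_len);
      if (!fix[i].sym->defined)
        {
          unresolved++;
          continue;
        }
      v = fix[i].sym->value;
      code[w] = (uint8_t) v;
      code[w + 1] = (uint8_t) (v >> 8);
      code[w + 2] = (uint8_t) (v >> 16);
      code[w + 3] = (uint8_t) (v >> 24);
    }
  return unresolved;
}
```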
|
|
|
|
Many of the functions began life as re-usable modules, loosely
|
|
connected. RMS changed this to gain speed. For example, input
|
|
parsing routines which used to work on pre-sanitized strings now
|
|
must parse raw data. Hence they have to import knowledge of the
|
|
assembler's comment conventions @emph{etc}.
|
|
|
|
@section Deprecated Feature(?)s
|
|
We have stopped supporting some features:
|
|
@itemize @bullet
|
|
@item
|
|
@code{.org} statements must have @b{defined} expressions.
|
|
@item
|
|
Vax Bit fields (@kbd{:} operator) are entirely unsupported.
|
|
@end itemize
|
|
|
|
It might be a good idea to not support these features in a future release:
|
|
@itemize @bullet
|
|
@item
|
|
@kbd{#} should begin a comment, even in column 1.
|
|
@item
|
|
Why support the logical line & file concept any more?
|
|
@item
|
|
Subsegments are a good candidate for flushing.
|
|
Depends on which compilers need them I guess.
|
|
@end itemize
|
|
|
|
@section Bugs, Ideas, Further Work
|
|
Clearly the major improvement is DON'T USE A TEXT-READING
|
|
ASSEMBLER for the back end of a compiler. It is much faster to
|
|
interpret binary gobbledygook from a compiler's tables than to
|
|
ask the compiler to write out human-readable code just so the
|
|
assembler can parse it back to binary.
|
|
|
|
Assuming you use @code{as} for human written programs: here are
|
|
some ideas:
|
|
@itemize @bullet
|
|
@item
|
|
Document (here) @code{APP}.
|
|
@item
|
|
Take advantage of knowing no spaces except after opcode
|
|
to speed up @code{as}. (Modify @code{app.c} to flush useless spaces:
|
|
only keep space/tabs at begin of line or between 2
|
|
symbols.)
|
|
@item
|
|
Put pointers in this documentation to @file{a.out} documentation.
|
|
@item
|
|
Split the assembler into parts so it can gobble direct binary
|
|
from @emph{e.g.} @code{cc}. It is silly for @code{cc} to compose text
|
|
just so @code{as} can parse it back to binary.
|
|
@item
|
|
Rewrite hash functions: I want a more modular, faster library.
|
|
@item
|
|
Clean up LOTS of code.
|
|
@item
|
|
Include all the non-@file{.c} files in the maintenance chapter.
|
|
@item
|
|
Document flonums.
|
|
@item
|
|
Implement flonum short literals.
|
|
@item
|
|
Change all talk of expression operands to expression quantities,
|
|
or perhaps to expression arguments.
|
|
@item
|
|
Implement pass 2.
|
|
@item
|
|
Whenever a @code{.text} or @code{.data} statement is seen, we close
|
|
off the current frag with an imaginary @code{.fill 0}. This is
|
|
because we only have one obstack for frags, and we can't grow new
|
|
frags for a new subsegment, then go back to the old subsegment and
|
|
append bytes to the old frag. All this nonsense goes away if we
|
|
give each subsegment its own obstack. It makes code simpler in
|
|
about 10 places, but nobody has bothered to do it because C compiler
|
|
output rarely changes subsegments (compared to ending frags with
|
|
relaxable addresses, which is common).
|
|
@end itemize
|
|
|
|
@section Sources
|
|
@c The following files in the @file{as} directory
|
|
@c are symbolic links to other files, of
|
|
@c the same name, in a different directory.
|
|
@c @itemize @bullet
|
|
@c @item
|
|
@c @file{atof_generic.c}
|
|
@c @item
|
|
@c @file{atof_vax.c}
|
|
@c @item
|
|
@c @file{flonum_const.c}
|
|
@c @item
|
|
@c @file{flonum_copy.c}
|
|
@c @item
|
|
@c @file{flonum_get.c}
|
|
@c @item
|
|
@c @file{flonum_multip.c}
|
|
@c @item
|
|
@c @file{flonum_normal.c}
|
|
@c @item
|
|
@c @file{flonum_print.c}
|
|
@c @end itemize
|
|
|
|
Here is a list of the source files in the @file{as} directory.
|
|
|
|
@table @file
|
|
@item app.c
|
|
This contains the pre-processing phase, which deletes comments,
|
|
handles whitespace, etc. This was recently re-written, since app
|
|
used to be a separate program, but RMS wanted it to be inline.
|
|
|
|
@item append.c
|
|
This is a subroutine to append a string to another string returning a
|
|
pointer just after the last @code{char} appended. (JF: All these
|
|
little routines should probably all be put in one file.)
|
|
|
|
@item as.c
|
|
Here you will find the main program of the assembler @code{as}.
|
|
|
|
@item expr.c
|
|
This is a branch office of @file{read.c}. This understands
|
|
expressions, arguments. Inside @code{as}, arguments are called
|
|
(expression) @emph{operands}. This is confusing, because we also talk
|
|
(elsewhere) about instruction @emph{operands}. Also, expression
|
|
operands are called @emph{quantities} explicitly to avoid confusion
|
|
with instruction operands. What a mess.
|
|
|
|
@item frags.c
|
|
This implements the @b{frag} concept. Without frags, finding the
|
|
right size for branch instructions would be a lot harder.
|
|
|
|
@item hash.c
|
|
This contains the symbol table, opcode table @emph{etc.} hashing
|
|
functions.
|
|
|
|
@item hex_value.c
|
|
This is a table of values of digits, for use in @code{atoi()} type
|
|
functions. Could probably be flushed by using calls to @code{strtol()}, or
|
|
something similar.
|
|
|
|
@item input-file.c
|
|
This contains operating system dependent source file reading
|
|
routines. Since error messages often say where we are in reading
|
|
the source file, they live here too. Since @code{as} is intended to
|
|
run under GNU and Unix only, this might be worth flushing. Anyway,
|
|
almost all C compilers support stdio.
|
|
|
|
@item input-scrub.c
|
|
This deals with calling the pre-processor (if needed) and feeding the
|
|
chunks back to the rest of the assembler the right way.
|
|
|
|
@item messages.c
|
|
This contains operating system independent parts of fatal and
|
|
warning message reporting. See @file{append.c} above.
|
|
|
|
@item output-file.c
|
|
This contains operating system dependent functions that write an
|
|
object file for @code{as}. See @file{input-file.c} above.
|
|
|
|
@item read.c
|
|
This implements all the directives of @code{as}. This also deals
|
|
with passing input lines to the machine dependent part of the
|
|
assembler.
|
|
|
|
@item strstr.c
|
|
This is a C library function that isn't in most C libraries yet.
|
|
See @file{append.c} above.
|
|
|
|
@item subsegs.c
|
|
This implements subsegments.
|
|
|
|
@item symbols.c
|
|
This implements symbols.
|
|
|
|
@item write.c
|
|
This contains the code to perform relaxation, and to write out
|
|
the object file. It is mostly operating system independent, but
|
|
different OSes have different object file formats in any case.
|
|
|
|
@item xmalloc.c
|
|
This implements @code{malloc()} or bust. See @file{append.c} above.
|
|
|
|
@item xrealloc.c
|
|
This implements @code{realloc()} or bust. See @file{append.c} above.
|
|
|
|
@item atof-generic.c
|
|
The following files were taken from a machine-independent subroutine
|
|
library for manipulating floating point numbers and very large
|
|
integers.
|
|
|
|
@file{atof-generic.c} turns a string into a flonum internal format
|
|
floating-point number.
|
|
|
|
@item flonum-const.c
|
|
This contains some potentially useful floating point numbers in
|
|
flonum format.
|
|
|
|
@item flonum-copy.c
|
|
This copies a flonum.
|
|
|
|
@item flonum-multip.c
|
|
This multiplies two flonums together.
|
|
|
|
@item bignum-copy.c
|
|
This copies a bignum.
|
|
|
|
@end table
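
Two of the small utility routines listed above are simple enough to
sketch. The versions here follow only the descriptions in the table: the
names @code{xmalloc_sketch} and @code{append_sketch}, their signatures,
and the error message are assumptions, and may differ from what
@file{xmalloc.c} and @file{append.c} actually contain.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* "malloc() or bust": return N bytes of storage, or do not return.  */
void *
xmalloc_sketch (size_t n)
{
  void *p = malloc (n ? n : 1);   /* never ask for zero bytes */

  if (p == NULL)
    {
      fprintf (stderr, "virtual memory exhausted\n");
      exit (EXIT_FAILURE);
    }
  return p;
}

/* Copy LENGTH chars from FROM to TO and return a pointer just after
   the last char appended.  No terminating NUL is written.  */
char *
append_sketch (char *to, const char *from, size_t length)
{
  assert (to != NULL && from != NULL);
  memcpy (to, from, length);
  return to + length;
}
```

Returning the end pointer lets successive calls be chained without
rescanning the destination: appending @samp{mov} and then @samp{l} to
an empty buffer leaves the pointer four chars past its start.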
|
|
|
|
Here is a table of all the machine-specific files (this includes
|
|
both source and header files). Typically, there is a
|
|
@var{machine}.c file, a @var{machine}-opcode.h file, and an
|
|
atof-@var{machine}.c file. The @var{machine}-opcode.h file should
|
|
be identical to the one used by GDB (which uses it for disassembly.)
|
|
|
|
@table @file
|
|
|
|
@item atof-ieee.c
|
|
This contains code to turn a flonum into an IEEE literal constant.
|
|
This is used by the 680x0, 32x32, Sparc, and i386 versions of @code{as}.
|
|
|
|
@item i386-opcode.h
|
|
This is the opcode-table for the i386 version of the assembler.
|
|
|
|
@item i386.c
|
|
This contains all the code for the i386 version of the assembler.
|
|
|
|
@item i386.h
|
|
This defines constants and macros used by the i386 version of the assembler.
|
|
|
|
@item m-generic.h
|
|
Generic 68020 header file. To be linked to m68k.h on a
|
|
non-sun3, non-hpux system.
|
|
|
|
@item m-sun2.h
|
|
68010 header file for Sun2 workstations. Not well tested. To be linked
|
|
to m68k.h on a sun2. (See also @samp{-DSUN_ASM_SYNTAX} in the
|
|
@file{Makefile}.)
|
|
|
|
@item m-sun3.h
|
|
68020 header file for Sun3 workstations. To be linked to m68k.h before
|
|
compiling on a Sun3 system. (See also @samp{-DSUN_ASM_SYNTAX} in the
|
|
@file{Makefile}.)
|
|
|
|
@item m-hpux.h
|
|
68020 header file for an HPUX (system 5?) box. Which box, which
|
|
version of HPUX, etc? I don't know.
|
|
|
|
@item m68k.h
|
|
A hard- or symbolic- link to one of @file{m-generic.h},
|
|
@file{m-hpux.h} or @file{m-sun3.h} depending on which kind of
|
|
680x0 you are assembling for. (See also @samp{-DSUN_ASM_SYNTAX} in the
|
|
@file{Makefile}.)
|
|
|
|
@item m68k-opcode.h
|
|
Opcode table for 68020. This is now a link to the opcode table
|
|
in the @code{GDB} source directory.
|
|
|
|
@item m68k.c
|
|
All the mc680x0 code, in one huge, slow-to-compile file.
|
|
|
|
@item ns32k.c
|
|
This contains the code for the ns32032/ns32532 version of the
|
|
assembler.
|
|
|
|
@item ns32k-opcode.h
|
|
This contains the opcode table for the ns32032/ns32532 version
|
|
of the assembler.
|
|
|
|
@item vax-inst.h
|
|
Vax specific file for describing Vax operands and other Vax-ish things.
|
|
|
|
@item vax-opcode.h
|
|
Vax opcode table.
|
|
|
|
@item vax.c
|
|
Vax specific parts of @code{as}. Also includes the former files
|
|
@file{vax-ins-parse.c}, @file{vax-reg-parse.c} and @file{vip-op.c}.
|
|
|
|
@item atof-vax.c
|
|
Turns a flonum into a Vax constant.
|
|
|
|
@item vms.c
|
|
This file contains the special code needed to put out a VMS
|
|
style object file for the Vax.
|
|
|
|
@end table
|
|
|
|
Here is a list of the header files in the source directory.
|
|
(Warning: This section may not be very accurate. I didn't
|
|
write the header files; I just report them.) Also note that I
|
|
think many of these header files could be cleaned up or
|
|
eliminated.
|
|
|
|
@table @file
|
|
|
|
@item a.out.h
|
|
This describes the structures used to create the binary header data
|
|
inside the object file. Perhaps we should use the one in
|
|
@file{/usr/include}?
|
|
|
|
@item as.h
|
|
This defines all the globally useful things, and pulls in <stdio.h>
|
|
and <assert.h>.
|
|
|
|
@item bignum.h
|
|
This defines macros useful for dealing with bignums.
|
|
|
|
@item expr.h
|
|
Structure and macros for dealing with @code{expression()}.
|
|
|
|
@item flonum.h
|
|
This defines the structure for dealing with floating point
|
|
numbers. It #includes @file{bignum.h}.
|
|
|
|
@item frags.h
|
|
This contains a macro for appending a byte to the current frag.
|
|
|
|
@item hash.h
|
|
Structures and function definitions for the hashing functions.
|
|
|
|
@item input-file.h
|
|
Function headers for the @file{input-file.c} functions.
|
|
|
|
@item md.h
|
|
Structures and function headers for things defined in the
|
|
machine dependent part of the assembler.
|
|
|
|
@item obstack.h
|
|
This is the GNU systemwide include file for manipulating obstacks.
|
|
Since nobody is running under real GNU yet, we include this file.
|
|
|
|
@item read.h
|
|
Macros and function headers for reading in source files.
|
|
|
|
@item struct-symbol.h
|
|
Structure definition and macros for dealing with the gas
|
|
internal form of a symbol.
|
|
|
|
@item subsegs.h
|
|
Structure definition for dealing with the numbered subsegments
|
|
of the text and data segments.
|
|
|
|
@item symbols.h
|
|
Macros and function headers for dealing with symbols.
|
|
|
|
@item write.h
|
|
Structure for doing segment fixups.
|
|
@end table
|
|
|
|
@comment ~subsection Test Directory
|
|
@comment (Note: The test directory seems to have disappeared somewhere
|
|
@comment along the line. If you want it, you'll probably have to find a
|
|
@comment REALLY OLD dump tape~dots{})
|
|
@comment
|
|
@comment The ~file{test/} directory is used for regression testing.
|
|
@comment After you modify ~code{as}, you can get a quick go/nogo
|
|
@comment confidence test by running the new ~code{as} over the source
|
|
@comment files in this directory. You use a shell script ~file{test/do}.
|
|
@comment
|
|
@comment The tests in this suite are evolving. They are not comprehensive.
|
|
@comment They have, however, caught hundreds of bugs early in the debugging
|
|
@comment cycle of ~code{as}. Most test statements in this suite were naturally
|
|
@comment selected: they were used to demonstrate actual ~code{as} bugs rather
|
|
@comment than being written ~i{a priori}.
|
|
@comment
|
|
@comment Another testing suggestion: over 30 bugs have been found simply by
|
|
@comment running examples from this manual through ~code{as}.
|
|
@comment Some examples in this manual are selected
|
|
@comment to distinguish boundary conditions; they are good for testing ~code{as}.
|
|
@comment
|
|
@comment ~subsubsection Regression Testing
|
|
@comment Each regression test involves assembling a file and comparing the
|
|
@comment actual output of ~code{as} to ``known good'' output files. Both
|
|
@comment the object file and the error/warning message file (stderr) are
|
|
@comment inspected. Optionally ~code{as}' exit status may be checked.
|
|
@comment Discrepancies are reported. Each discrepancy means either that
|
|
@comment you broke some part of ~code{as} or that the ``known good'' files
|
|
@comment are now out of date and should be changed to reflect the new
|
|
@comment definition of ``good''.
|
|
@comment
|
|
@comment Each regression test lives in its own directory, in a tree
|
|
@comment rooted in the directory ~file{test/}. Each such directory
|
|
@comment has a name ending in ~file{.ret}, where `ret' stands for
|
|
@comment REgression Test. The ~file{.ret} ending allows ~code{find
|
|
@comment (1)} to find all regression tests in the tree, without
|
|
@comment needing to list them explicitly.
|
|
@comment
|
|
@comment Any ~file{.ret} directory must contain a file called
|
|
@comment ~file{input} which is the source file to assemble. During
|
|
@comment testing an object file ~file{output} is created, as well as
|
|
@comment a file ~file{stdouterr} which contains the output to both
|
|
@comment stdout and stderr. If there is a file ~file{output.good} in
|
|
@comment the directory, and if ~file{output} contains exactly the
|
|
@comment same data as ~file{output.good}, the file ~file{output} is
|
|
@comment deleted. Likewise ~file{stdouterr} is removed if it exactly
|
|
@comment matches a file ~file{stdouterr.good}. If file
|
|
@comment ~file{status.good} is present, containing a decimal number
|
|
@comment before a newline, the exit status of ~code{as} is compared
|
|
@comment to this number. If the status numbers are not equal, a file
|
|
@comment ~file{status} is written to the directory, containing the
|
|
@comment actual status as a decimal number followed by newline.
|
|
@comment
|
|
@comment Should any of the ~file{*.good} files fail to match their corresponding
|
|
@comment actual files, this is noted by a 1-line message on the screen during
|
|
@comment the regression test, and you can use ~code{find (1)} to find any
|
|
@comment files named ~file{status}, ~file{output} or ~file{stdouterr}.
|
|
@comment
|
|
@node Retargeting, License, Maintenance, top
|
|
@chapter Teaching the Assembler about a New Machine
|
|
|
|
This chapter describes the steps required in order to make the
|
|
assembler work with another machine's assembly language. This
|
|
chapter is not complete, and only describes the steps in the
|
|
broadest terms. You should look at the source for the
|
|
currently supported machine in order to discover some of the
|
|
details that aren't mentioned here.
|
|
|
|
You should create a new file called @file{@var{machine}.c}, and
|
|
add the appropriate lines to the file @file{Makefile} so that
|
|
you can compile your new version of the assembler. This should
|
|
be straightforward; simply add lines similar to the ones there
|
|
for the four current versions of the assembler.
|
|
|
|
If you want to be compatible with GDB (and the current
|
|
machine-dependent versions of the assembler), you should create
|
|
a file called @file{@var{machine}-opcode.h} which should
|
|
contain all the information about the names of the machine
|
|
instructions, their opcodes, and what addressing modes they
|
|
support. If you do this right, the assembler and GDB can share
|
|
this file, and you'll only have to write it once. Note that
|
|
while you're writing @code{as}, you may want to use an
|
|
independent program (if you have access to one), to make sure
|
|
that @code{as} is emitting the correct bytes. Since @code{as}
|
|
and @code{GDB} share the opcode table, an incorrect opcode
|
|
table entry may make invalid bytes look OK when you disassemble
|
|
them with @code{GDB}.
|
|
|
|
@section Functions You will Have to Write
|
|
|
|
Your file @file{@var{machine}.c} should contain definitions for
|
|
the following functions and variables. It will need to include
|
|
some header files in order to use some of the structures
|
|
defined in the machine-independent part of the assembler. The
|
|
needed header files are mentioned in the descriptions of the
|
|
functions that will need them.
|
|
|
|
@table @code
|
|
|
|
@item long omagic;
|
|
This long integer holds the value to place at the beginning of
|
|
the @file{a.out} file. It is usually @samp{OMAGIC}, except on
|
|
machines that store additional information in the magic-number.
|
|
|
|
@item char comment_chars[];
|
|
This character array holds the values of the characters that
|
|
start a comment anywhere in a line. Comments are stripped off
|
|
automatically by the machine independent part of the
|
|
assembler. Note that @samp{/*} will always start a
|
|
comment, and that only @samp{*/} will end a comment started by
|
|
@samp{/*}.
|
|
|
|
@item char line_comment_chars[];
|
|
This character array holds the values of the chars that start a
|
|
comment only if they are the first (non-whitespace) character
|
|
on a line. If the character @samp{#} does not appear in this
|
|
list, you may get unexpected results. (Various
|
|
machine-independent parts of the assembler treat the comments
|
|
@samp{#APP} and @samp{#NO_APP} specially, and assume that lines
|
|
that start with @samp{#} are comments.)
|
|
|
|
@item char EXP_CHARS[];
|
|
This character array holds the letters that can separate the
|
|
mantissa and the exponent of a floating point number. Typical
|
|
values are @samp{e} and @samp{E}.
|
|
|
|
@item char FLT_CHARS[];
|
|
This character array holds the letters that--when they appear
|
|
immediately after a leading zero--indicate that a number is a
|
|
floating-point number. (Sort of how 0x indicates that a
|
|
hexadecimal number follows.)
|
|
|
|
@item pseudo_typeS md_pseudo_table[];
|
|
(@var{pseudo_typeS} is defined in @file{md.h})
|
|
This array contains a list of the machine-dependent directives
|
|
the assembler must support. It contains the name of each
|
|
pseudo-op (without the leading @samp{.}), a pointer to a
|
|
function to be called when that directive is encountered, and
|
|
an integer argument to be passed to that function.
|
|
|
|
@item void md_begin(void)
|
|
This function is called as part of the assembler's
|
|
initialization. It should do any initialization required by
|
|
any of your other routines.
|
|
|
|
@item int md_parse_option(char **optionPTR, int *argcPTR, char ***argvPTR)
|
|
This routine is called once for each option on the command line
|
|
that the machine-independent part of @code{as} does not
|
|
understand. This function should return non-zero if the option
|
|
pointed to by @var{optionPTR} is a valid option. If it is not
|
|
a valid option, this routine should return zero. The variables
|
|
@var{argcPTR} and @var{argvPTR} are provided in case the option
|
|
requires a filename or something similar as an argument. If
|
|
the option is multi-character, @var{optionPTR} should be
|
|
advanced past the end of the option, otherwise every letter in
|
|
the option will be treated as a separate single-character
|
|
option.
|
|
|
|
@item void md_assemble(char *string)
|
|
This routine is called for every machine-dependent
|
|
non-directive line in the source file. It does all the real
|
|
work involved in reading the opcode, parsing the operands,
|
|
etc. @var{string} is a pointer to a null-terminated string,
|
|
that comprises the input line, with all excess whitespace and
|
|
comments removed.
|
|
|
|
@item void md_number_to_chars(char *outputPTR,long value,int nbytes)
|
|
This routine is called to turn a C long int, short int, or char
|
|
into the series of bytes that represents that number on the
|
|
target machine. @var{outputPTR} points to an array where the
|
|
result should be stored; @var{value} is the value to store; and
|
|
@var{nbytes} is the number of bytes in @var{value} that should be
|
|
stored.
|
|
|
|
@item void md_number_to_imm(char *outputPTR,long value,int nbytes)
|
|
This routine is called to turn a C long int, short int, or char
|
|
into the series of bytes that represent an immediate value on
|
|
the target machine. It is identical to the function @code{md_number_to_chars},
|
|
except on NS32K machines.@refill
|
|
|
|
@item void md_number_to_disp(char *outputPTR,long value,int nbytes)
|
|
This routine is called to turn a C long int, short int, or char
|
|
into the series of bytes that represent a displacement value on
|
|
the target machine. It is identical to the function @code{md_number_to_chars},
|
|
except on NS32K machines.@refill
|
|
|
|
@item void md_number_to_field(char *outputPTR,long value,int nbytes)
|
|
This routine is identical to @code{md_number_to_chars},
|
|
except on NS32K machines.
|
|
|
|
@item void md_ri_to_chars(struct relocation_info *riPTR,ri)
|
|
(@code{struct relocation_info} is defined in @file{a.out.h})
|
|
This routine emits the relocation info in @var{ri}
|
|
in the appropriate bit-pattern for the target machine.
|
|
The result should be stored in the location pointed
|
|
to by @var{riPTR}. This routine may be a no-op unless you are
|
|
attempting to do cross-assembly.
|
|
|
|
@item char *md_atof(char type,char *outputPTR,int *sizePTR)
|
|
This routine turns a series of digits into the appropriate
|
|
internal representation for a floating-point number.
|
|
@var{type} is a character from @var{FLT_CHARS[]} that describes
|
|
what kind of floating point number is wanted; @var{outputPTR}
|
|
is a pointer to an array that the result should be stored in;
|
|
and @var{sizePTR} is a pointer to an integer where the size (in
|
|
bytes) of the result should be stored. This routine should
|
|
return an error message, or an empty string (not (char *)0) for
|
|
success.
|
|
|
|
@item int md_short_jump_size;
|
|
This variable holds the (maximum) size in bytes of a short (16
|
|
bit or so) jump created by @code{md_create_short_jump()}. This
|
|
variable is used as part of the broken-word feature, and isn't
|
|
needed if the assembler is compiled with
|
|
@samp{-DWORKING_DOT_WORD}.
|
|
|
|
@item int md_long_jump_size;
|
|
This variable holds the (maximum) size in bytes of a long (32
|
|
bit or so) jump created by @code{md_create_long_jump()}. This
|
|
variable is used as part of the broken-word feature, and isn't
|
|
needed if the assembler is compiled with
|
|
@samp{-DWORKING_DOT_WORD}.
|
|
|
|
@item void md_create_short_jump(char *resultPTR,long from_addr,
|
|
@code{long to_addr,fragS *frag,symbolS *to_symbol)}
|
|
This function emits a jump from @var{from_addr} to @var{to_addr} in
|
|
the array of bytes pointed to by @var{resultPTR}. If this creates a
|
|
type of jump that must be relocated, this function should call
|
|
@code{fix_new()} with @var{frag} and @var{to_symbol}. The jump
|
|
emitted by this function may be smaller than @var{md_short_jump_size},
|
|
but it must never create a larger one.
|
|
(If it creates a smaller jump, the extra bytes of memory will not be
|
|
used.) This function is used as part of the broken-word feature,
|
|
and isn't needed if the assembler is compiled with
|
|
@samp{-DWORKING_DOT_WORD}.@refill
|
|
|
|
@item void md_create_long_jump(char *ptr,long from_addr,
|
|
@code{long to_addr,fragS *frag,symbolS *to_symbol)}
|
|
This function is similar to the previous function,
|
|
@code{md_create_short_jump()}, except that it creates a long
|
|
jump instead of a short one. This function is used as part of
|
|
the broken-word feature, and isn't needed if the assembler is
|
|
compiled with @samp{-DWORKING_DOT_WORD}.
|
|
|
|
@item int md_estimate_size_before_relax(fragS *fragPTR,int segment_type)
|
|
This function does the initial setting up for relaxation. This
|
|
includes forcing references to still-undefined symbols to the
|
|
appropriate addressing modes.
|
|
|
|
@item relax_typeS md_relax_table[];
|
|
(@var{relax_typeS} is defined in @file{md.h})
|
|
This array describes the various machine dependent states a
|
|
frag may be in before relaxation. You will need one group of
|
|
entries for each type of addressing mode you intend to relax.
|
|
|
|
@item void md_convert_frag(fragS *fragPTR)
|
|
(@var{fragS} is defined in @file{as.h})
|
|
This routine does the required cleanup after relaxation.
|
|
Relaxation has changed the type of the frag to a type that can
|
|
reach its destination. This function should adjust the opcode
|
|
of the frag to use the appropriate addressing mode.
|
|
@var{fragPTR} points to the frag to clean up.
|
|
|
|
@item void md_end(void)
|
|
This function is called just before the assembler exits. It
|
|
need not free up memory unless the operating system doesn't do
|
|
it automatically on exit. (In which case you'll also have to
|
|
track down all the other places where the assembler allocates
|
|
space but never frees it.)
|
|
|
|
@end table
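
As an illustration of the simplest routine in the table above, here is a
minimal sketch of @code{md_number_to_chars} for a target that stores the
most significant byte first. The byte order is an assumption made for
this example, not a property of every machine; a little-endian port
would fill the array in the opposite direction. The @code{assert} is
only a guard added for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch only: store the low-order nbytes of value at outputPTR,
   most significant byte first (a big-endian target is assumed). */
void
md_number_to_chars(char *outputPTR, long value, int nbytes)
{
    unsigned long bits = (unsigned long) value;
    int i;

    /* Guard added for this sketch: never ask for more bytes than a long holds. */
    assert(nbytes >= 0 && (size_t) nbytes <= sizeof bits);

    /* Fill from the last byte backwards, peeling off eight bits at a time. */
    for (i = nbytes - 1; i >= 0; i--) {
        outputPTR[i] = (char) (bits & 0xff);
        bits >>= 8;
    }
}
```

Working on an unsigned copy of @var{value} keeps the shift well defined
for negative numbers, which are simply truncated to their low-order
@var{nbytes} bytes.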
|
|
|
|
@section External Variables You will Need to Use
|
|
|
|
You will need to refer to or change the following external variables
|
|
from within the machine-dependent part of the assembler.
|
|
|
|
@table @code
|
|
@item extern char flagseen[];
|
|
This array holds non-zero values in locations corresponding to
|
|
the options that were on the command line. Thus, if the
|
|
assembler was called with @samp{-W}, @var{flagseen['W']} would
|
|
be non-zero.
|
|
|
|
@item extern fragS *frag_now;
|
|
This pointer points to the current frag--the frag that bytes
|
|
are currently being added to. If nothing else, you will need
|
|
to pass it as an argument to various machine-independent
|
|
functions. It is maintained automatically by the
|
|
frag-manipulating functions; you should never have to change it
|
|
yourself.
|
|
|
|
@item extern LITTLENUM_TYPE generic_bignum[];
|
|
(@var{LITTLENUM_TYPE} is defined in @file{bignum.h}.)
|
|
This is where @dfn{bignums}--numbers larger than 32 bits--are
|
|
returned when they are encountered in an expression. You will
|
|
need to use this if you need to implement directives (or
|
|
anything else) that must deal with these large numbers.
|
|
@code{Bignums} are of @code{segT} @code{SEG_BIG} (defined in
|
|
@file{as.h}), and have a positive @code{X_add_number}. The
|
|
@code{X_add_number} of a @code{bignum} is the number of
|
|
@code{LITTLENUMS} in @var{generic_bignum} that the number takes
|
|
up.
|
|
|
|
@item extern FLONUM_TYPE generic_floating_point_number;
|
|
(@var{FLONUM_TYPE} is defined in @file{flonum.h}.)
|
|
This is where @dfn{flonums}--floating-point numbers within
|
|
expressions--are returned. @code{Flonums} are of @code{segT}
|
|
@code{SEG_BIG}, and have a negative @code{X_add_number}.
|
|
@code{Flonums} are returned in a generic format. You will have
|
|
to write a routine to turn this generic format into the
|
|
appropriate floating-point format for your machine.
|
|
|
|
@item extern int need_pass_2;
|
|
If this variable is non-zero, the assembler has encountered an
|
|
expression that cannot be assembled in a single pass. Since
|
|
the second pass isn't implemented, this flag means that the
|
|
assembler is punting, and is only looking for additional syntax
|
|
errors. (Or something like that.)
|
|
|
|
@item extern segT now_seg;
|
|
This variable holds the value of the segment the assembler is
|
|
currently assembling into.
|
|
|
|
@end table
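
Bignums and flonums share @code{SEG_BIG}, so code that receives such an
expression has to look at the sign of @code{X_add_number} to tell them
apart. The sketch below restates that convention as code. The names
@code{big_kind} and @code{classify_big} are hypothetical and do not
appear in @code{as}; the zero case is not described in the text above,
so it is reported separately rather than guessed at.

```c
/* Hypothetical names, used only for this sketch. */
enum big_kind { BIG_BIGNUM, BIG_FLONUM, BIG_UNSPECIFIED };

/* Classify a SEG_BIG expression from its X_add_number field. */
enum big_kind
classify_big(long x_add_number)
{
    if (x_add_number > 0)
        return BIG_BIGNUM;      /* value is the count of littlenums in use */
    if (x_add_number < 0)
        return BIG_FLONUM;      /* result is in generic_floating_point_number */
    return BIG_UNSPECIFIED;     /* zero is not covered by the description */
}
```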
|
|
|
|
@section External Functions You will Need
|
|
|
|
You will find the following external functions useful (or
|
|
indispensable) when you're writing the machine-dependent part
|
|
of the assembler.
|
|
|
|
@table @code
|
|
|
|
@item char *frag_more(int bytes)
|
|
This function allocates @var{bytes} more bytes in the current
|
|
frag (or starts a new frag, if it can't expand the current frag
|
|
any more) for you to store some object-file bytes in. It
|
|
returns a pointer to the bytes, ready for you to store data in.
|
|
|
|
@item void fix_new(fragS *frag, int where, short size, symbolS *add_symbol, symbolS *sub_symbol, long offset, int pcrel)
|
|
This function stores a relocation fixup to be acted on later.
|
|
@var{frag} points to the frag the relocation belongs in;
|
|
@var{where} is the location within the frag where the relocation begins;
|
|
@var{size} is the size of the relocation, and is usually 1 (a single byte),
|
|
2 (sixteen bits), or 4 (a longword).
|
|
The value @var{add_symbol} @minus{} @var{sub_symbol} + @var{offset} is added to the byte(s)
|
|
at @var{frag->literal[where]}. If @var{pcrel} is non-zero, the address of the
|
|
location is subtracted from the result. A relocation entry is also added
|
|
to the @file{a.out} file. @var{add_symbol}, @var{sub_symbol}, and/or
|
|
@var{offset} may be NULL.@refill
|
|
|
|
@item char *frag_var(relax_stateT type, int max_chars, int var,
|
|
@code{relax_substateT subtype, symbolS *symbol, char *opcode)}
|
|
This function creates a machine-dependent frag of type @var{type}
|
|
(usually @code{rs_machine_dependent}).
|
|
@var{max_chars} is the maximum size in bytes that the frag may grow by;
|
|
@var{var} is the current size of the variable end of the frag;
|
|
@var{subtype} is the sub-type of the frag. The sub-type is used to index into
|
|
@var{md_relax_table[]} during @code{relaxation}.
|
|
@var{symbol} is the symbol whose value should be used when relaxing this frag.
|
|
@var{opcode} points into a byte whose value may have to be modified if the
|
|
addressing mode used by this frag changes. It typically points into the
|
|
@var{fr_literal[]} of the previous frag, and is used to point to a location
|
|
that @code{md_convert_frag()} may have to change.@refill
|
|
|
|
@item void frag_wane(fragS *fragPTR)
|
|
This function is useful from within @code{md_convert_frag}. It
|
|
changes a frag to type @code{rs_fill}, and sets the variable-sized
|
|
piece of the frag to zero. The frag will never change in size
|
|
again.
|
|
|
|
@item segT expression(expressionS *retval)
|
|
(@var{segT} is defined in @file{as.h}; @var{expressionS} is defined in @file{expr.h})
|
|
This function parses the string pointed to by the external char
|
|
pointer @var{input_line_pointer}, and returns the segment-type
|
|
of the expression. It also stores the results in the
|
|
@var{expressionS} pointed to by @var{retval}.
|
|
@var{input_line_pointer} is advanced to point past the end of
|
|
the expression. (@var{input_line_pointer} is used by other
|
|
parts of the assembler. If you modify it, be sure to restore
|
|
it to its original value.)
|
|
|
|
@item as_warn(char *message,@dots{})
|
|
If warning messages are disabled, this function does nothing.
|
|
Otherwise, it prints out the current file name, and the current
|
|
line number, then uses @code{fprintf} to print the
|
|
@var{message} and any arguments it was passed.
|
|
|
|
@item as_bad(char *message,@dots{})
|
|
This function should be called when @code{as} encounters
|
|
conditions that are bad enough that @code{as} should not
|
|
produce an object file, but should continue reading input and
|
|
printing warning and bad error messages.
|
|
|
|
@item as_fatal(char *message,@dots{})
|
|
This function prints out the current file name and line number,
|
|
prints the word @samp{FATAL:}, then uses @code{fprintf} to
|
|
print the @var{message} and any arguments it was passed. Then
|
|
the assembler exits. This function should only be used for
|
|
serious, unrecoverable errors.
|
|
|
|
@item void float_const(int float_type)
|
|
This function reads floating-point constants from the current
|
|
input line, and calls @code{md_atof} to assemble them. It is
|
|
useful as the function to call for the directives
|
|
@samp{.single}, @samp{.double}, @samp{.float}, etc.
|
|
@var{float_type} must be a character from @var{FLT_CHARS}.
|
|
|
|
@item void demand_empty_rest_of_line(void);
|
|
This function can be used by machine-dependent directives to
|
|
make sure the rest of the input line is empty. It prints a
|
|
warning message if there are additional characters on the line.
|
|
|
|
@item long int get_absolute_expression(void)
|
|
This function can be used by machine-dependent directives to
|
|
read an absolute number from the current input line. It
|
|
returns the result. If it isn't given an absolute expression,
|
|
it prints a warning message and returns zero.
|
|
|
|
@end table
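
The arithmetic that @code{fix_new} arranges can be sketched in
isolation. The helper below is hypothetical (@code{fixup_value} is not a
function in @code{as}); it takes the symbol values as plain numbers,
with zero standing in for an absent symbol, and shows only how the value
described under @code{fix_new} is formed. It does not touch frags or
emit relocation entries.

```c
/* Hypothetical helper for illustration: compute the value that a fixup
   adds to the bytes at the fixup site.  add_value and sub_value are the
   values of the two symbols (zero if the symbol is absent); location is
   the address of the fixup site itself. */
long
fixup_value(long add_value, long sub_value, long offset,
            int pcrel, long location)
{
    long value = add_value - sub_value + offset;

    if (pcrel)
        value -= location;      /* make the result relative to the site */
    return value;
}
```

For example, a reference to a symbol at 0x1000 with an offset of 12
yields 0x100c when absolute, and 0xe0c when it is PC-relative from a
site at 0x200.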
|
|
|
|
|
|
@section The concept of Frags
|
|
|
|
This assembler works to optimize the size of certain addressing
|
|
modes (e.g., branch instructions). This means the size of many
|
|
pieces of object code cannot be determined until after assembly
|
|
is finished. (This means that the addresses of symbols cannot be
|
|
determined until assembly is finished.) In order to do this,
|
|
@code{as} stores the output bytes as @dfn{frags}.
|
|
|
|
Here is the definition of a frag (from @file{as.h}):
|
|
@example
|
|
struct frag
|
|
@{
|
|
long int fr_fix;
|
|
long int fr_var;
|
|
relax_stateT fr_type;
|
|
relax_substateT fr_substate;
|
|
unsigned long fr_address;
|
|
long int fr_offset;
|
|
struct symbol *fr_symbol;
|
|
char *fr_opcode;
|
|
struct frag *fr_next;
|
|
char fr_literal[];
|
|
@};
|
|
@end example
|
|
|
|
@table @var
|
|
@item fr_fix
|
|
is the size of the fixed-size piece of the frag.
|
|
|
|
@item fr_var
|
|
is the maximum (?) size of the variable-sized piece of the frag.
|
|
|
|
@item fr_type
|
|
is the type of the frag.
|
|
Current types are:
|
|
rs_fill
|
|
rs_align
|
|
rs_org
|
|
rs_machine_dependent
|
|
|
|
@item fr_substate
|
|
This stores the type of machine-dependent frag this is. (what
|
|
kind of addressing mode is being used, and what size is being
|
|
tried/will fit/etc.)
|
|
|
|
@item fr_address
|
|
@var{fr_address} is only valid after relaxation is finished.
|
|
Before relaxation, the only way to store an address is (pointer
|
|
to frag containing the address) plus (offset into the frag).
|
|
|
|
@item fr_offset
|
|
This contains a number, whose meaning depends on the type of
|
|
the frag.
|
|
For machine-dependent frags, this contains the offset from
|
|
@var{fr_symbol} that the frag wants to go to. Thus, for branch
|
|
instructions it is usually zero. (unless the instruction was
|
|
@samp{jba foo+12} or something like that.)
|
|
|
|
@item fr_symbol
|
|
For machine-dependent frags, this points to the symbol the frag
|
|
needs to reach.
|
|
|
|
@item fr_opcode
|
|
This points to the location in the frag (or in a previous frag)
|
|
of the opcode for the instruction that caused this to be a frag.
|
|
@var{fr_opcode} is needed if the actual opcode must be changed
|
|
in order to use a different form of the addressing mode.
|
|
(For example, if a conditional branch only comes in size tiny,
|
|
a large-size branch could be implemented by reversing the sense
|
|
of the test, and turning it into a tiny branch over a large jump.
|
|
This would require changing the opcode.)
|
|
|
|
@var{fr_literal} is a variable-size array that contains the
|
|
actual object bytes. A frag consists of a fixed size piece of
|
|
object data, (which may be zero bytes long), followed by a
|
|
piece of object data whose size may not have been determined
|
|
yet. Other information includes the type of the frag (which
|
|
controls how it is relaxed).
|
|
|
|
@item fr_next
|
|
This is the next frag in the singly-linked list. This is
|
|
usually only needed by the machine-independent part of
|
|
@code{as}.
|
|
|
|
@end table
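
The relationship between @var{fr_address} and the (frag, offset) form of
an address can be sketched with a cut-down structure. Everything named
@code{mini_frag}, @code{assign_addresses} and @code{resolve_address}
below is hypothetical and exists only for this example. The sketch also
assumes that, once relaxation is finished, each frag occupies exactly
@var{fr_fix} plus @var{fr_var} bytes; the real assembler may size some
frag types differently.

```c
#include <stddef.h>

/* Simplified stand-in for struct frag: only the fields this sketch uses. */
struct mini_frag {
    long fr_fix;                 /* size of the fixed piece */
    long fr_var;                 /* size of the variable piece (assumed final) */
    unsigned long fr_address;    /* valid only after assign_addresses() */
    struct mini_frag *fr_next;   /* next frag in the singly-linked list */
};

/* Walk the chain and give each frag the address that follows its
   predecessor.  Returns the total number of bytes in the chain. */
unsigned long
assign_addresses(struct mini_frag *first, unsigned long start)
{
    unsigned long address = start;
    struct mini_frag *f;

    for (f = first; f != NULL; f = f->fr_next) {
        f->fr_address = address;
        address += (unsigned long) (f->fr_fix + f->fr_var);
    }
    return address - start;
}

/* Turn a (frag, offset) pair into an absolute address.  Only meaningful
   once assign_addresses() has run. */
unsigned long
resolve_address(const struct mini_frag *frag, long offset)
{
    return frag->fr_address + (unsigned long) offset;
}
```

Keeping addresses as (frag, offset) pairs until this final walk is what
lets earlier frags change size without invalidating every address that
was recorded after them.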
|
|
@end ignore
|
|
|
|
@node License, , Machine Dependent, Top
|
|
@unnumbered GNU GENERAL PUBLIC LICENSE
|
|
@center Version 1, February 1989
|
|
|
|
@display
|
|
Copyright @copyright{} 1989 Free Software Foundation, Inc.
|
|
675 Mass Ave, Cambridge, MA 02139, USA
|
|
|
|
Everyone is permitted to copy and distribute verbatim copies
|
|
of this license document, but changing it is not allowed.
|
|
@end display
|
|
|
|
@unnumberedsec Preamble
|
|
|
|
The license agreements of most software companies try to keep users
|
|
at the mercy of those companies. By contrast, our General Public
|
|
License is intended to guarantee your freedom to share and change free
|
|
software---to make sure the software is free for all its users. The
|
|
General Public License applies to the Free Software Foundation's
|
|
software and to any other program whose authors commit to using it.
|
|
You can use it for your programs, too.
|
|
|
|
When we speak of free software, we are referring to freedom, not
|
|
price. Specifically, the General Public License is designed to make
|
|
sure that you have the freedom to give away or sell copies of free
|
|
software, that you receive source code or can get it if you want it,
|
|
that you can change the software or use pieces of it in new free
|
|
programs; and that you know you can do these things.
|
|
|
|
To protect your rights, we need to make restrictions that forbid
|
|
anyone to deny you these rights or to ask you to surrender the rights.
|
|
These restrictions translate to certain responsibilities for you if you
|
|
distribute copies of the software, or if you modify it.
|
|
|
|
For example, if you distribute copies of a such a program, whether
|
|
gratis or for a fee, you must give the recipients all the rights that
|
|
you have. You must make sure that they, too, receive or can get the
|
|
source code. And you must tell them their rights.
|
|
|
|
We protect your rights with two steps: (1) copyright the software, and
|
|
(2) offer you this license which gives you legal permission to copy,
|
|
distribute and/or modify the software.
|
|
|
|
Also, for each author's protection and ours, we want to make certain
|
|
that everyone understands that there is no warranty for this free
|
|
software. If the software is modified by someone else and passed on, we
|
|
want its recipients to know that what they have is not the original, so
|
|
that any problems introduced by others will not reflect on the original
|
|
authors' reputations.
|
|
|
|
The precise terms and conditions for copying, distribution and
|
|
modification follow.
|
|
|
|
@iftex
|
|
@unnumberedsec TERMS AND CONDITIONS
|
|
@end iftex
|
|
@ifinfo
|
|
@center TERMS AND CONDITIONS
|
|
@end ifinfo
|
|
|
|
@enumerate
|
|
@item
|
|
This License Agreement applies to any program or other work which
|
|
contains a notice placed by the copyright holder saying it may be
|
|
distributed under the terms of this General Public License. The
|
|
``Program'', below, refers to any such program or work, and a ``work based
|
|
on the Program'' means either the Program or any work containing the
|
|
Program or a portion of it, either verbatim or with modifications. Each
|
|
licensee is addressed as ``you''.
|
|
|
|
@item
|
|
You may copy and distribute verbatim copies of the Program's source
|
|
code as you receive it, in any medium, provided that you conspicuously and
|
|
appropriately publish on each copy an appropriate copyright notice and
|
|
disclaimer of warranty; keep intact all the notices that refer to this
|
|
General Public License and to the absence of any warranty; and give any
|
|
other recipients of the Program a copy of this General Public License
|
|
along with the Program. You may charge a fee for the physical act of
|
|
transferring a copy.
|
|
|
|
@item
|
|
You may modify your copy or copies of the Program or any portion of
|
|
it, and copy and distribute such modifications under the terms of Paragraph
|
|
1 above, provided that you also do the following:
|
|
|
|
@itemize @bullet
|
|
@item
|
|
cause the modified files to carry prominent notices stating that
|
|
you changed the files and the date of any change; and
|
|
|
|
@item
|
|
cause the whole of any work that you distribute or publish, that
|
|
in whole or in part contains the Program or any part thereof, either
|
|
with or without modifications, to be licensed at no charge to all
|
|
third parties under the terms of this General Public License (except
|
|
that you may choose to grant warranty protection to some or all
|
|
third parties, at your option).
|
|
|
|
@item
|
|
If the modified program normally reads commands interactively when
|
|
run, you must cause it, when started running for such interactive use
|
|
in the simplest and most usual way, to print or display an
|
|
announcement including an appropriate copyright notice and a notice
|
|
that there is no warranty (or else, saying that you provide a
|
|
warranty) and that users may redistribute the program under these
|
|
conditions, and telling the user how to view a copy of this General
|
|
Public License.
|
|
|
|
@item
|
|
You may charge a fee for the physical act of transferring a
|
|
copy, and you may at your option offer warranty protection in
|
|
exchange for a fee.
|
|
@end itemize
|
|
|
|
Mere aggregation of another independent work with the Program (or its
|
|
derivative) on a volume of a storage or distribution medium does not bring
|
|
the other work under the scope of these terms.
|
|
|
|
@item
|
|
You may copy and distribute the Program (or a portion or derivative of
|
|
it, under Paragraph 2) in object code or executable form under the terms of
|
|
Paragraphs 1 and 2 above provided that you also do one of the following:
|
|
|
|
@itemize @bullet
|
|
@item
|
|
accompany it with the complete corresponding machine-readable
|
|
source code, which must be distributed under the terms of
|
|
Paragraphs 1 and 2 above; or,
|
|
|
|
@item
|
|
accompany it with a written offer, valid for at least three
|
|
years, to give any third party free (except for a nominal charge
|
|
for the cost of distribution) a complete machine-readable copy of the
|
|
corresponding source code, to be distributed under the terms of
|
|
Paragraphs 1 and 2 above; or,
|
|
|
|
@item
|
|
accompany it with the information you received as to where the
|
|
corresponding source code may be obtained. (This alternative is
|
|
allowed only for noncommercial distribution and only if you
|
|
received the program in object code or executable form alone.)
|
|
@end itemize
|
|
|
|
Source code for a work means the preferred form of the work for making
|
|
modifications to it. For an executable file, complete source code means
|
|
all the source code for all modules it contains; but, as a special
|
|
exception, it need not include source code for modules which are standard
|
|
libraries that accompany the operating system on which the executable
|
|
file runs, or for standard header files or definitions files that
|
|
accompany that operating system.
|
|
|
|
@item
|
|
You may not copy, modify, sublicense, distribute or transfer the
|
|
Program except as expressly provided under this General Public License.
|
|
Any attempt otherwise to copy, modify, sublicense, distribute or transfer
|
|
the Program is void, and will automatically terminate your rights to use
|
|
the Program under this License. However, parties who have received
|
|
copies, or rights to use copies, from you under this General Public
|
|
License will not have their licenses terminated so long as such parties
|
|
remain in full compliance.
|
|
|
|
@item
|
|
By copying, distributing or modifying the Program (or any work based
|
|
on the Program) you indicate your acceptance of this license to do so,
|
|
and all its terms and conditions.
|
|
|
|
@item
|
|
Each time you redistribute the Program (or any work based on the
|
|
Program), the recipient automatically receives a license from the original
|
|
licensor to copy, distribute or modify the Program subject to these
|
|
terms and conditions. You may not impose any further restrictions on the
|
|
recipients' exercise of the rights granted herein.
|
|
|
|
@item
|
|
The Free Software Foundation may publish revised and/or new versions
|
|
of the General Public License from time to time. Such new versions will
|
|
be similar in spirit to the present version, but may differ in detail to
|
|
address new problems or concerns.
|
|
|
|
Each version is given a distinguishing version number. If the Program
|
|
specifies a version number of the license which applies to it and ``any
|
|
later version'', you have the option of following the terms and conditions
|
|
either of that version or of any later version published by the Free
|
|
Software Foundation. If the Program does not specify a version number of
|
|
the license, you may choose any version ever published by the Free Software
|
|
Foundation.
|
|
|
|
@item
|
|
If you wish to incorporate parts of the Program into other free
|
|
programs whose distribution conditions are different, write to the author
|
|
to ask for permission. For software which is copyrighted by the Free
|
|
Software Foundation, write to the Free Software Foundation; we sometimes
|
|
make exceptions for this. Our decision will be guided by the two goals
|
|
of preserving the free status of all derivatives of our free software and
|
|
of promoting the sharing and reuse of software generally.
|
|
|
|
@iftex
|
|
@heading NO WARRANTY
|
|
@end iftex
|
|
@ifinfo
|
|
@center NO WARRANTY
|
|
@end ifinfo
|
|
|
|
@item
|
|
BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM ``AS IS'' WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.

@item
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL
ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES
ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT
LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES
SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE
WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN
ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
@end enumerate

@iftex
@heading END OF TERMS AND CONDITIONS
@end iftex
@ifinfo
@center END OF TERMS AND CONDITIONS
@end ifinfo

@page
@unnumberedsec Appendix: How to Apply These Terms to Your New Programs

If you develop a new program, and you want it to be of the greatest
possible use to humanity, the best way to achieve this is to make it
free software which everyone can redistribute and change under these
terms.

To do so, attach the following notices to the program. It is safest to
attach them to the start of each source file to most effectively convey
the exclusion of warranty; and each file should have at least the
``copyright'' line and a pointer to where the full notice is found.

@smallexample
@var{one line to give the program's name and a brief idea of what it does.}
Copyright (C) 19@var{yy} @var{name of author}

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 1, or (at your option)
any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
@end smallexample

Also add information on how to contact you by electronic and paper mail.

If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:

@smallexample
Gnomovision version 69, Copyright (C) 19@var{yy} @var{name of author}
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.
@end smallexample

The hypothetical commands `show w' and `show c' should show the
appropriate parts of the General Public License. Of course, the
commands you use may be called something other than `show w' and `show
c'; they could even be mouse-clicks or menu items---whatever suits your
program.

You should also get your employer (if you work as a programmer) or your
school, if any, to sign a ``copyright disclaimer'' for the program, if
necessary. Here a sample; alter the names:

@example
Yoyodyne, Inc., hereby disclaims all copyright interest in the
program `Gnomovision' (a program to direct compilers to make passes
at assemblers) written by James Hacker.

@var{signature of Ty Coon}, 1 April 1989
Ty Coon, President of Vice
@end example

That's all there is to it!


@summarycontents
@contents
@bye