
\input texinfo
@setfilename bfdint.info

@settitle BFD Internals
@iftex
@title{BFD Internals}
@author{Ian Lance Taylor}
@author{Cygnus Solutions}
@end iftex

@node Top
@top BFD Internals
@raisesections
@cindex bfd internals

This document describes some BFD internal information which may be
helpful when working on BFD.  It is very incomplete.

This document is not updated regularly, and may be out of date.  It was
last modified on $Date$.

The initial version of this document was written by Ian Lance Taylor
@email{ian@@cygnus.com}.

@menu
* BFD glossary::		BFD glossary
* BFD guidelines::		BFD programming guidelines
* BFD generated files::		BFD generated files
* BFD multiple compilations::	Files compiled multiple times in BFD
* BFD relocation handling::	BFD relocation handling
* Index::			Index
@end menu

@node BFD glossary
@section BFD glossary
@cindex glossary for bfd
@cindex bfd glossary

This is a short glossary of some BFD terms.

@table @asis
@item a.out
The a.out object file format.  The original Unix object file format.
Still used on SunOS, though not Solaris.  Supports only three sections.

@item archive
A collection of object files produced and manipulated by the @samp{ar}
program.

@item BFD
The BFD library itself.  Also, each object file, archive, or executable
opened by the BFD library has the type @samp{bfd *}, and is sometimes
referred to as a bfd.

@item COFF
The Common Object File Format.  Used on Unix SVR3.  Used by some
embedded targets, although ELF is normally better.

@item DLL
A shared library on Windows.

@item dynamic linker
When a program linked against a shared library is run, the dynamic
linker will locate the appropriate shared library and arrange to somehow
include it in the running image.

@item dynamic object
Another name for an ELF shared library.

@item ECOFF
The Extended Common Object File Format.  Used on Alpha Digital Unix
(formerly OSF/1), as well as Ultrix and Irix 4.  A variant of COFF.

@item ELF
The Executable and Linking Format.  The object file format used on most
modern Unix systems, including GNU/Linux, Solaris, Irix, and SVR4.  Also
used on many embedded systems.

@item executable
A program, with instructions and symbols, and perhaps dynamic linking
information.  Normally produced by a linker.

@item NLM
NetWare Loadable Module.  Used to describe the format of an object which
may be loaded into NetWare, which is some kind of PC based network server
program.

@item object file
A binary file including machine instructions, symbols, and relocation
information.  Normally produced by an assembler.

@item object file format
The format of an object file.  Typically object files and executables
for a particular system are in the same format, although executables
will not contain any relocation information.

@item PE
The Portable Executable format.  This is the object file format used for
Windows (specifically, Win32) object files.  It is based closely on
COFF, but has a few significant differences.

@item PEI
The Portable Executable Image format.  This is the object file format
used for Windows (specifically, Win32) executables.  It is very similar
to PE, but includes some additional header information.

@item relocations
Information used by the linker to adjust section contents.  Also called
relocs.

@item section
Object files and executables are composed of sections.  Sections have
optional data and optional relocation information.

@item shared library
A library of functions which may be used by many executables without
actually being linked into each executable.  There are several different
implementations of shared libraries, each having slightly different
features.

@item symbol
Each object file and executable may have a list of symbols, often
referred to as the symbol table.  A symbol is basically a name and an
address.  There may also be some additional information like the type of
symbol, although the type of a symbol is normally something simple like
function or object, and should not be confused with the more complex C
notion of type.  Typically every global function and variable in a C
program will have an associated symbol.

@item Win32
The current Windows API, implemented by Windows 95 and later and Windows
NT 3.51 and later, but not by Windows 3.1.

@item XCOFF
The eXtended Common Object File Format.  Used on AIX.  A variant of
COFF, with a completely different symbol table implementation.
@end table

@node BFD guidelines
@section BFD programming guidelines
@cindex bfd programming guidelines
@cindex programming guidelines for bfd
@cindex guidelines, bfd programming

There is a lot of poorly written and confusing code in BFD.  New BFD
code should be written to a higher standard.  Merely because some BFD
code is written in a particular manner does not mean that you should
emulate it.

Here are some general BFD programming guidelines:

@itemize @bullet
@item
Follow the GNU coding standards.

@item
Avoid global variables.  We ideally want BFD to be fully reentrant, so
that it can be used in multiple threads.  All uses of global or static
variables interfere with that.  Initialized constant variables are OK,
and they should be explicitly marked with const.  Instead of global
variables, use data attached to a BFD or to a linker hash table.

@item
All externally visible functions should have names which start with
@samp{bfd_}.  All such functions should be declared in some header file,
typically @file{bfd.h}.  See, for example, the various declarations near
the end of @file{bfd-in.h}, which mostly declare functions required by
specific linker emulations.

@item
All functions which need to be visible from one file to another within
BFD, but should not be visible outside of BFD, should start with
@samp{_bfd_}.  Although external names beginning with @samp{_} are
prohibited by the ANSI standard, in practice this usage will always
work, and it is required by the GNU coding standards.

@item
Always remember that people can compile using @samp{--enable-targets} to
build several, or all, targets at once.  It must be possible to link together
the files for all targets.

@item
BFD code should compile with few or no warnings using @samp{gcc -Wall}.
Some warnings are OK, like the absence of certain function declarations
which may or may not be declared in system header files.  Warnings about
ambiguous expressions and the like should always be fixed.
@end itemize
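
As a rough sketch of the guideline about global variables, the following
contrasts a counter kept in a static variable with the same counter
stored in data attached to a handle.  The names @samp{toy_bfd},
@samp{toy_count_symbol_global}, and @samp{toy_bfd_count_symbol} are
invented for this illustration and are not part of BFD; real code would
attach such data to a @samp{bfd *} or to a linker hash table.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a BFD handle; not the real bfd type.  */
struct toy_bfd
{
  const char *filename;
  unsigned long symbol_count;   /* Per-handle state, no globals needed.  */
};

/* Initialized constant data is fine, and is marked const.  */
static const unsigned long toy_max_symbols = 1000;

/* Non-reentrant style, shown only for contrast: every caller shares
   this one counter.  */
static unsigned long toy_global_symbol_count;

unsigned long
toy_count_symbol_global (void)
{
  return ++toy_global_symbol_count;
}

/* Reentrant style: the state lives in the handle passed by the caller.  */
unsigned long
toy_bfd_count_symbol (struct toy_bfd *abfd)
{
  assert (abfd != NULL);
  if (abfd->symbol_count < toy_max_symbols)
    abfd->symbol_count++;
  return abfd->symbol_count;
}
```
@end verbatim

Two handles counted this way do not disturb each other, whereas every
caller of the static version shares a single counter.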

@node BFD generated files
@section BFD generated files
@cindex generated files in bfd
@cindex bfd generated files

BFD contains several automatically generated files.  This section
describes them.  Some files are created at configure time, when you
configure BFD.  Some files are created at make time, when you build
BFD.  Some files are automatically rebuilt at make time, but only if
you configure with the @samp{--enable-maintainer-mode} option.  Some
files live in the object directory---the directory from which you run
configure---and some live in the source directory.  All files that live
in the source directory are checked into the CVS repository.

@table @file
@item bfd.h
@cindex @file{bfd.h}
@cindex @file{bfd-in3.h}
Lives in the object directory.  Created at make time from
@file{bfd-in2.h} via @file{bfd-in3.h}.  @file{bfd-in3.h} is created at
configure time from @file{bfd-in2.h}.  There are automatic dependencies
to rebuild @file{bfd-in3.h} and hence @file{bfd.h} if @file{bfd-in2.h}
changes, so you can normally ignore @file{bfd-in3.h}, and just think
about @file{bfd-in2.h} and @file{bfd.h}.

@file{bfd.h} is built by replacing a few strings in @file{bfd-in2.h}.
To see them, search for @samp{@@} in @file{bfd-in2.h}.  They mainly
control whether BFD is built for a 32 bit target or a 64 bit target.

@item bfd-in2.h
@cindex @file{bfd-in2.h}
Lives in the source directory.  Created from @file{bfd-in.h} and several
other BFD source files.  If you configure with the
@samp{--enable-maintainer-mode} option, @file{bfd-in2.h} is rebuilt
automatically when a source file changes.

@item elf32-target.h
@itemx elf64-target.h
@cindex @file{elf32-target.h}
@cindex @file{elf64-target.h}
Live in the object directory.  Created from @file{elfxx-target.h}.
These files are versions of @file{elfxx-target.h} customized for either
a 32 bit ELF target or a 64 bit ELF target.

@item libbfd.h
@cindex @file{libbfd.h}
Lives in the source directory.  Created from @file{libbfd-in.h} and
several other BFD source files.  If you configure with the
@samp{--enable-maintainer-mode} option, @file{libbfd.h} is rebuilt
automatically when a source file changes.

@item libcoff.h
@cindex @file{libcoff.h}
Lives in the source directory.  Created from @file{libcoff-in.h} and
@file{coffcode.h}.  If you configure with the
@samp{--enable-maintainer-mode} option, @file{libcoff.h} is rebuilt
automatically when a source file changes.

@item targmatch.h
@cindex @file{targmatch.h}
Lives in the object directory.  Created at make time from
@file{config.bfd}.  This file is used to map configuration triplets into
BFD target vector variable names at run time.
@end table

@node BFD multiple compilations
@section Files compiled multiple times in BFD
Several files in BFD are compiled multiple times.  By this I mean that
there are header files which contain function definitions.  These header
files are included by other files, and thus the functions are compiled
once per file which includes them.

Preprocessor macros are used to control the compilation, so that each
time the files are compiled the resulting functions are slightly
different.  Naturally, if they weren't different, there would be no
reason to compile them multiple times.
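
The effect can be sketched in a single file.  BFD gets it by including a
header several times with different macro settings; the sketch below
substitutes a generator macro for the repeated @samp{#include} so that
it stays self-contained.  The names @samp{TOY_DEFINE_GET},
@samp{toy32_get}, and @samp{toy64_get} are invented for illustration and
do not appear in BFD.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a header that is compiled once per architecture size.
   Each expansion produces a function with a size-specific name.  */
#define TOY_DEFINE_GET(BITS)                                        \
  uint64_t                                                          \
  toy##BITS##_get (const unsigned char *p)                          \
  {                                                                 \
    uint64_t v = 0;                                                 \
    size_t i;                                                       \
    assert (p != NULL);                                             \
    /* Read BITS / 8 bytes, least significant byte first.  */       \
    for (i = 0; i < (BITS) / 8; i++)                                \
      v |= (uint64_t) p[i] << (8 * i);                              \
    return v;                                                       \
  }

TOY_DEFINE_GET (32)   /* Defines toy32_get.  */
TOY_DEFINE_GET (64)   /* Defines toy64_get.  */
```
@end verbatim

The two resulting functions differ only in the number of bytes they
read, yet both end up in the program under different names.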

This is not a particularly good programming technique, and future BFD
work should avoid it, for the following reasons.

@itemize @bullet
@item
Since this technique is rarely used, even experienced C programmers find
it confusing.

@item
It is difficult to debug programs which use BFD, since there is no way
to describe which version of a particular function you are looking at.

@item
Programs which use BFD wind up incorporating two or more slightly
different versions of the same function, which wastes space in the
executable.

@item
This technique is never required nor is it especially efficient.  It is
always possible to use statically initialized structures holding
function pointers and magic constants instead.
@end itemize
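
The alternative mentioned in the last item might look roughly like the
following.  This is only a sketch under assumed names
(@samp{toy_format}, @samp{toy_read_word}, and so on are hypothetical):
each variant is described by a statically initialized structure holding
its constants and function pointers, and the generic code is compiled
once.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-format description: constants plus hooks.  */
struct toy_format
{
  const char *name;
  unsigned int arch_size;                        /* 32 or 64.  */
  uint64_t (*get_word) (const unsigned char *);  /* Format-specific hook.  */
};

/* Read four bytes, least significant byte first.  */
static uint64_t
toy_get_32 (const unsigned char *p)
{
  return ((uint64_t) p[0] | (uint64_t) p[1] << 8
          | (uint64_t) p[2] << 16 | (uint64_t) p[3] << 24);
}

/* Read eight bytes, least significant byte first.  */
static uint64_t
toy_get_64 (const unsigned char *p)
{
  return toy_get_32 (p) | toy_get_32 (p + 4) << 32;
}

/* Statically initialized, const tables: one per variant.  */
const struct toy_format toy_format_32 = { "toy32", 32, toy_get_32 };
const struct toy_format toy_format_64 = { "toy64", 64, toy_get_64 };

/* Generic code is compiled once and consults the table at run time.  */
uint64_t
toy_read_word (const struct toy_format *fmt, const unsigned char *p)
{
  assert (fmt != NULL && p != NULL);
  return fmt->get_word (p);
}
```
@end verbatim

With this arrangement a debugger shows exactly one
@samp{toy_read_word}, and the variant in use is visible in the
@samp{fmt} argument.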

The following is a list of the files which are compiled multiple times.

@table @file
@item aout-target.h
@cindex @file{aout-target.h}
Describes a few functions and the target vector for a.out targets.  This
is used by individual a.out targets with different definitions of
@samp{N_TXTADDR} and similar a.out macros.

@item aoutf1.h
@cindex @file{aoutf1.h}
Implements standard SunOS a.out files.  In principle it supports 64 bit
a.out targets based on the preprocessor macro @samp{ARCH_SIZE}, but
since all known a.out targets are 32 bits, this code may or may not
work.  This file is only included by a few other files, and it is
difficult to justify its existence.

@item aoutx.h
@cindex @file{aoutx.h}
Implements basic a.out support routines.  This file can be compiled for
either 32 or 64 bit support.  Since all known a.out targets are 32 bits,
the 64 bit support may or may not work.  I believe the original
intention was that this file would only be included by @file{aout32.c}
and @file{aout64.c}, and that other a.out targets would simply refer to
the functions it defined.  Unfortunately, some other a.out targets
started including it directly, leading to a somewhat confused state of
affairs.

@item coffcode.h
@cindex @file{coffcode.h}
Implements basic COFF support routines.  This file is included by every
COFF target.  It implements code which handles COFF magic numbers as
well as various hook functions called by the generic COFF functions in
@file{coffgen.c}.  This file is controlled by a number of different
macros, and more are added regularly.

@item coffswap.h
@cindex @file{coffswap.h}
Implements COFF swapping routines.  This file is included by
@file{coffcode.h}, and thus by every COFF target.  It implements the
routines which swap COFF structures between internal and external
format.  The main control for this file is the external structure
definitions in the files in the @file{include/coff} directory.  A COFF
target file will include one of those files before including
@file{coffcode.h} and thus @file{coffswap.h}.  There are a few other
macros which affect @file{coffswap.h} as well, mostly describing whether
certain fields are present in the external structures.

@item ecoffswap.h
@cindex @file{ecoffswap.h}
Implements ECOFF swapping routines.  This is like @file{coffswap.h}, but
for ECOFF.  It is included by the ECOFF target files (of which there are
only two).  The control is the preprocessor macro @samp{ECOFF_32} or
@samp{ECOFF_64}.

@item elfcode.h
@cindex @file{elfcode.h}
Implements ELF functions that use external structure definitions.  This
file is included by two other files: @file{elf32.c} and @file{elf64.c}.
It is controlled by the @samp{ARCH_SIZE} macro which is defined to be
@samp{32} or @samp{64} before including it.  The @samp{NAME} macro is
used internally to give the functions different names for the two target
sizes.

@item elfcore.h
@cindex @file{elfcore.h}
Like @file{elfcode.h}, but for functions that are specific to ELF core
files.  This is included only by @file{elfcode.h}.

@item elflink.h
@cindex @file{elflink.h}
Like @file{elfcode.h}, but for functions used by the ELF linker.  This
is included only by @file{elfcode.h}.

@item elfxx-target.h
@cindex @file{elfxx-target.h}
This file is the source for the generated files @file{elf32-target.h}
and @file{elf64-target.h}, one of which is included by every ELF target.
It defines the ELF target vector.

@item freebsd.h
@cindex @file{freebsd.h}
Presumably intended to be included by all FreeBSD targets, but in fact
there is only one such target, @samp{i386-freebsd}.  This defines a
function used to set the right magic number for FreeBSD, as well as
various macros, and includes @file{aout-target.h}.

@item netbsd.h
@cindex @file{netbsd.h}
Like @file{freebsd.h}, except that there are several files which include
it.

@item nlm-target.h
@cindex @file{nlm-target.h}
Defines the target vector for a standard NLM target.

@item nlmcode.h
@cindex @file{nlmcode.h}
Like @file{elfcode.h}, but for NLM targets.  This is only included by
@file{nlm32.c} and @file{nlm64.c}, both of which define the macro
@samp{ARCH_SIZE} to an appropriate value.  There are no 64 bit NLM
targets anyhow, so this is sort of useless.

@item nlmswap.h
@cindex @file{nlmswap.h}
Like @file{coffswap.h}, but for NLM targets.  This is included by each
NLM target, but I think it winds up compiling to the exact same code for
every target, and as such is fairly useless.

@item peicode.h
@cindex @file{peicode.h}
Provides swapping routines and other hooks for PE targets.
@file{coffcode.h} will include this rather than @file{coffswap.h} for a
PE target.  This defines PE specific versions of the COFF swapping
routines, and also defines some macros which control @file{coffcode.h}
itself.
@end table

@node BFD relocation handling
@section BFD relocation handling
@cindex bfd relocation handling
@cindex relocations in bfd

The handling of relocations is one of the more confusing aspects of BFD.
Relocation handling has been implemented in various different ways, all
somewhat incompatible, none perfect.

@menu
* BFD relocation concepts::	BFD relocation concepts
* BFD relocation functions::	BFD relocation functions
* BFD relocation future::	BFD relocation future
@end menu

@node BFD relocation concepts
@subsection BFD relocation concepts

A relocation is an action which the linker must take when linking.  It
describes a change to the contents of a section.  The change is normally
based on the final value of one or more symbols.  Relocations are
created by the assembler when it creates an object file.

Most relocations are simple.  A typical simple relocation is to set 32
bits at a given offset in a section to the value of a symbol.  This type
of relocation would be generated for code like @code{int *p = &i;} where
@samp{p} and @samp{i} are global variables.  A relocation for the symbol
@samp{i} would be generated such that the linker would initialize the
area of memory which holds the value of @samp{p} to the value of the
symbol @samp{i}.

Slightly more complex relocations may include an addend, which is a
constant to add to the symbol value before using it.  In some cases a
relocation will require adding the symbol value to the existing contents
of the section in the object file.  In others the relocation will simply
replace the contents of the section with the symbol value.  Some
relocations are PC relative, so that the value to be stored in the
section is the difference between the value of a symbol and the final
address of the section contents.
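
The arithmetic behind these simple cases can be sketched as follows.
The structure and function names are hypothetical and much simpler than
anything in BFD; the sketch assumes 32 bit values and ignores overflow
checking.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified description of one relocation.  */
struct toy_reloc
{
  uint32_t symbol_value;   /* Final value of the symbol.  */
  uint32_t addend;         /* Constant added to the symbol value.  */
  uint32_t place;          /* Final address of the bytes being patched.  */
  int pc_relative;         /* Nonzero: store the distance from PLACE.  */
  int add_existing;        /* Nonzero: add to what the section holds.  */
};

/* Compute the value to store, given the section's EXISTING contents
   at the relocated location.  */
uint32_t
toy_reloc_value (const struct toy_reloc *r, uint32_t existing)
{
  uint32_t value;

  assert (r != NULL);
  value = r->symbol_value + r->addend;
  if (r->pc_relative)
    value -= r->place;
  if (r->add_existing)
    value += existing;
  return value;
}
```
@end verbatim

For the @code{int *p = &i;} case, all the flags are zero and the addend
is zero, so the stored value is simply the value of @samp{i}.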

In general, relocations can be arbitrarily complex.  For
example, relocations used in dynamic linking systems often require the
linker to allocate space in a different section and use the offset
within that section as the value to store.  In the IEEE object file
format, relocations may involve arbitrary expressions.

When doing a relocateable link, the linker may or may not have to do
anything with a relocation, depending upon the definition of the
relocation.  Simple relocations generally do not require any special
action.

@node BFD relocation functions
@subsection BFD relocation functions

In BFD, each section has an array of @samp{arelent} structures.  Each
structure has a pointer to a symbol, an address within the section, an
addend, and a pointer to a @samp{reloc_howto_struct} structure.  The
howto structure has a bunch of fields describing the reloc, including a
type field.  The type field is specific to the object file format
backend; none of the generic code in BFD examines it.

Originally, the function @samp{bfd_perform_relocation} was supposed to
handle all relocations.  In theory, many relocations would be simple
enough to be described by the fields in the howto structure.  For those
that weren't, the howto structure included a @samp{special_function}
field to use as an escape.
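
The intended division of labour can be sketched with toy types.  These
are loosely modelled on the description above and are not the real
@samp{arelent} or @samp{reloc_howto_struct} definitions; every name
beginning with @samp{toy_} is hypothetical, and the real structures have
many more fields.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_symbol { const char *name; uint32_t value; };
struct toy_reloc_entry;

/* Simplified howto: a few descriptive fields plus an escape hook.  */
struct toy_howto
{
  unsigned int type;       /* Meaningful only to the format backend.  */
  int pc_relative;
  /* If non-NULL, called instead of the generic computation.  */
  uint32_t (*special_function) (const struct toy_reloc_entry *, uint32_t);
};

/* Simplified relocation entry.  */
struct toy_reloc_entry
{
  const struct toy_symbol *sym;
  uint32_t address;        /* Offset within the section.  */
  uint32_t addend;
  const struct toy_howto *howto;
};

/* Generic driver: use the howto fields when they suffice, otherwise
   defer to the special function.  */
uint32_t
toy_perform_relocation (const struct toy_reloc_entry *rel,
                        uint32_t section_vma)
{
  uint32_t value;

  assert (rel != NULL && rel->sym != NULL && rel->howto != NULL);
  if (rel->howto->special_function != NULL)
    return rel->howto->special_function (rel, section_vma);
  value = rel->sym->value + rel->addend;
  if (rel->howto->pc_relative)
    value -= section_vma + rel->address;
  return value;
}

/* Example escape: a made-up relocation that stores the value shifted
   right by two, as a word-addressed target might want.  */
uint32_t
toy_special_shift2 (const struct toy_reloc_entry *rel, uint32_t section_vma)
{
  (void) section_vma;
  return (rel->sym->value + rel->addend) >> 2;
}
```
@end verbatim

Note that the generic driver never looks at the @samp{type} field; only
backend code such as the special function would interpret it.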

While this seems plausible, a look at @samp{bfd_perform_relocation}
shows that it failed.  The function has odd special cases.  Some of the
fields in the howto structure, such as @samp{pcrel_offset}, were not
adequately documented.

The linker uses @samp{bfd_perform_relocation} to do all relocations when
the input and output file have different formats (e.g., when generating
S-records).  The generic linker code, which is used by all targets which
do not define their own special purpose linker, uses
@samp{bfd_get_relocated_section_contents}, which for most targets turns
into a call to @samp{bfd_generic_get_relocated_section_contents}, which
calls @samp{bfd_perform_relocation}.  So @samp{bfd_perform_relocation}
is still widely used, which makes it difficult to change, since it is
difficult to test all possible cases.

The assembler used @samp{bfd_perform_relocation} for a while.  This
turned out to be the wrong thing to do, since
@samp{bfd_perform_relocation} was written to handle relocations on an
existing object file, while the assembler needed to create relocations
in a new object file.  The assembler was changed to use the new function
@samp{bfd_install_relocation} instead, and @samp{bfd_install_relocation}
was created as a copy of @samp{bfd_perform_relocation}.

Unfortunately, the work did not progress any farther, so
@samp{bfd_install_relocation} remains a simple copy of
@samp{bfd_perform_relocation}, with all the odd special cases and
confusing code.  This again is difficult to change, because again any
change can affect any assembler target, and so is difficult to test.

The new linker, when using the same object file format for all input
files and the output file, does not convert relocations into
@samp{arelent} structures, so it can not use
@samp{bfd_perform_relocation} at all.  Instead, users of the new linker
are expected to write a @samp{relocate_section} function which will
handle relocations in a target specific fashion.

There are two helper functions for target specific relocation:
@samp{_bfd_final_link_relocate} and @samp{_bfd_relocate_contents}.
These functions use a howto structure, but they @emph{do not} use the
@samp{special_function} field.  Since the functions are normally called
from target specific code, the @samp{special_function} field adds
little; any relocations which require special handling can be handled
without calling those functions.

So, if you want to add a new target, or add a new relocation to an
existing target, you need to do the following:
@itemize @bullet
@item
Make sure you clearly understand what the contents of the section should
look like after assembly, after a relocateable link, and after a final
link.  Make sure you clearly understand the operations the linker must
perform during a relocateable link and during a final link.

@item
Write a howto structure for the relocation.  The howto structure is
flexible enough to represent any relocation which should be handled by
setting a contiguous bitfield in the destination to the value of a
symbol, possibly with an addend, possibly adding the symbol value to the
value already present in the destination.

@item
Change the assembler to generate your relocation.  The assembler will
call @samp{bfd_install_relocation}, so your howto structure has to be
able to handle that.  You may need to set the @samp{special_function}
field to handle assembly correctly.  Be careful to ensure that any code
you write to handle the assembler will also work correctly when doing a
relocateable link.  For example, see @samp{bfd_elf_generic_reloc}.

@item
Test the assembler.  Consider the cases of relocation against an
undefined symbol, a common symbol, a symbol defined in the object file
in the same section, and a symbol defined in the object file in a
different section.  These cases may not all be applicable for your
reloc.

@item
If your target uses the new linker, which is recommended, add any
required handling to the target specific relocation function.  In simple
cases this will just involve a call to @samp{_bfd_final_link_relocate}
or @samp{_bfd_relocate_contents}, depending upon the definition of the
relocation and whether the link is relocateable or not.

@item
Test the linker.  Test the case of a final link.  If the relocation can
overflow, use a linker script to force an overflow and make sure the
error is reported correctly.  Test a relocateable link, whether the
symbol is defined or undefined in the relocateable output.  For both the
final and relocateable link, test the case when the symbol is a common
symbol, when the symbol looked like a common symbol but became a defined
symbol, when the symbol is defined in a different object file, and when
the symbol is defined in the same object file.

@item
In order for linking to another object file format, such as S-records,
to work correctly, @samp{bfd_perform_relocation} has to do the right
thing for the relocation.  You may need to set the
@samp{special_function} field to handle this correctly.  Test this by
doing a link in which the output object file format is S-records.

@item
Using the linker to generate relocateable output in a different object
file format is impossible in the general case, so you generally don't
have to worry about that.  Linking input files of different object file
formats together is quite unusual, but if you're really dedicated you
may want to consider testing this case, both when the output object file
format is the same as your format, and when it is different.
@end itemize
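
The kind of relocation that a howto structure can describe, and the
overflow case worth testing, can be sketched as below.  This is a
minimal illustration and not BFD code: @samp{toy_relocate_field} and
@samp{toy_status} are hypothetical names, the field is assumed to lie
within one 32 bit word, and the overflow check treats the value as
unsigned.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum toy_status { TOY_OK, TOY_OVERFLOW };

/* Store VALUE into the BITSIZE-bit field starting at bit BITPOS of
   *WORD.  If ADD_EXISTING is nonzero, first add the field's current
   contents.  On overflow, *WORD is left unchanged.  */
enum toy_status
toy_relocate_field (uint32_t *word, unsigned int bitpos,
                    unsigned int bitsize, uint32_t value, int add_existing)
{
  uint32_t mask;
  uint64_t sum = value;

  assert (word != NULL);
  assert (bitsize >= 1 && bitsize <= 32 && bitpos <= 32 - bitsize);
  mask = bitsize == 32 ? UINT32_MAX : (UINT32_C (1) << bitsize) - 1;
  if (add_existing)
    sum += (*word >> bitpos) & mask;
  /* Any bit set above the field width means the result does not fit.  */
  if ((sum & ~(uint64_t) mask) != 0)
    return TOY_OVERFLOW;
  *word = (*word & ~(mask << bitpos)) | ((uint32_t) sum << bitpos);
  return TOY_OK;
}
```
@end verbatim

A linker test that forces an overflow corresponds to the
@samp{TOY_OVERFLOW} path here: the destination must be left alone and
the error reported to the caller.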

@node BFD relocation future
@subsection BFD relocation future

Clearly the current BFD relocation support is in bad shape.  A
wholesale rewrite would be very difficult, because it would require
thorough testing of every BFD target.  So some sort of incremental
change is required.

My vague thoughts on this would involve introducing a new, clearly
defined howto structure.  Some mechanism would be used to determine which type
of howto structure was being used by a particular format.

The new howto structure would clearly define the relocation behaviour in
the case of an assembly, a relocateable link, and a final link.  At
least one special function would be defined as an escape, and it might
make sense to define more.

One or more generic functions similar to @samp{bfd_perform_relocation}
would be written to handle the new howto structure.

This should make it possible to write a generic version of the relocate
section functions used by the new linker.  The target specific code
would provide some mechanism (a function pointer or an initial
conversion) to convert target specific relocations into howto
structures.

Ideally it would be possible to use this generic relocate section
function for the generic linker as well.  That is, it would replace the
@samp{bfd_generic_get_relocated_section_contents} function which is
currently normally used.

For the special case of ELF dynamic linking, more consideration needs to
be given to writing ELF specific but ELF target generic code to handle
special relocation types such as GOT and PLT.

@node Index
@unnumberedsec Index
@printindex cp

@contents
@bye