933786524e
* README.txt: New. * config.h (CYCLE_ACCURATE, CYCLE_STATS): New. * configure.in (--enable-cycle-accurate, --enable-cycle-stats): New. Default to enabled. * configure: Regenerate. * cpu.h (regs_type): Add cycle tracking info. (reset_pipeline_stats): Declare. (halt_pipeline_stats): Declare. (pipeline_stats): Declare. * main.c (done): Call pipeline_stats(). * mem.h (rx_mem_ptr): Moved to here ... * mem.c (mem_ptr): ... from here. Rename throughout. (mem_put_byte): Move LEDs to Port A. Add Port B to control cycle statistics. Move UART to SCI4. (mem_put_hi): Add TPU 1-2. TPU 1 and 2 count CPU cycles. * reg.c (init_regs): Set Rt reg to -1 (no reg). * rx.c: Add cycle counting and statistics throughout. (rx_get_byte): Optimize for speed. (decode_opcode): Likewise. (reset_pipeline_stats): New. (halt_pipeline_stats): New. (pipeline_stats): New. * trace.c (sim_disasm_one): Print cycle count. [include/opcode] * rx.h (RX_Opcode_ID): Add nop2 and nop3 for statistics.
121 lines
4.6 KiB
Text
121 lines
4.6 KiB
Text
The RX simulator offers two rx-specific configure options:
|
|
|
|
--enable-cycle-accurate (default)
|
|
--disable-cycle-accurate
|
|
|
|
If enabled, the simulator will keep track of how many cycles each
|
|
instruction takes. While not 100% accurate, it is very close,
|
|
including modelling fetch stalls and register latency.
|
|
|
|
--enable-cycle-stats (default)
|
|
--disable-cycle-stats
|
|
|
|
If enabled, specifying "-v" twice on the simulator command line causes
|
|
the simulator to print statistics on how much time was used by each
|
|
type of opcode, and what pairs of opcodes tend to happen most
|
|
frequently, as well as how many times various pipeline stalls
|
|
happened.
|
|
|
|
|
|
|
|
The RX simulator offers many command line options:
|
|
|
|
-v - verbose output. This prints some information about where the
|
|
program is being loaded and its starting address, as well as
|
|
information about how much memory was used and how many instructions
|
|
were executed during the run. If specified twice, pipeline and cycle
|
|
information are added to the report.
|
|
|
|
-d - disassemble output. Each instruction executed is printed.
|
|
|
|
-t - trace output. Causes a *lot* of printed information about what
|
|
every instruction is doing, from math results down to register
|
|
changes.
|
|
|
|
--ignore-*
|
|
--warn-*
|
|
--error-*
|
|
|
|
The RX simulator can detect certain types of memory corruption, and
|
|
either ignore them, warn the user about them, or error and exit.
|
|
Note that valid GCC code may trigger some of these, for example,
|
|
writing a bitfield involves reading the existing value, which may
|
|
not have been set yet. The options for * are:
|
|
|
|
null-deref - memory access to address zero. You must modify your
|
|
linker script to avoid putting anything at location zero, of
|
|
course.
|
|
|
|
unwritten-pages - attempts to read a page of memory (see below)
|
|
before it is written. This is much faster than the next option.
|
|
|
|
unwritten-bytes - attempts to read individual bytes before they're
|
|
written.
|
|
|
|
corrupt-stack - On return from a subroutine, the memory location
|
|
where $pc was stored is checked to see if anything other than
|
|
$pc had been written to it most recently.
|
|
|
|
-i -w -e - these three options change the settings for all of the
|
|
above. For example, "-i" tells the simulator to ignore all memory
|
|
corruption.
|
|
|
|
-E - end of options. Any remaining options (after the program name)
|
|
are considered to be options for the simulated program, although
|
|
such functionality is not supported.
|
|
|
|
|
|
|
|
The RX simulator simulates a small number of peripherals, mostly in
|
|
order to provide I/O capabilities for testing and such. The supported
|
|
peripherals, and their limitations, are documented here.
|
|
|
|
Memory
|
|
|
|
Memory for the simulator is stored in a hierarchical tree, much like
|
|
the i386's page directory and page tables. The simulator can allocate
|
|
memory to individual pages as needed, allowing the simulated program
|
|
to act as if it had a full 4 Gb of RAM at its disposal, without
|
|
actually allocating more memory from the host operating system than
|
|
the simulated program actually uses. Note that for each page of
|
|
memory, there's a corresponding page of memory *types* (for tracking
|
|
memory corruption). Memory is initially filled with all zeros.
|
|
|
|
GPIO Port A
|
|
|
|
PA.DR is configured as an output-only port (regardless of PA.DDR).
|
|
When written to, a row of colored @ and * symbols are printed,
|
|
reflecting a row of eight LEDs being either on or off.
|
|
|
|
GPIO Port B
|
|
|
|
PB.DR controls the pipeline statistics. Writing a 0 to PB.DR disables
|
|
statistics gathering. Writing a non-0 to PB.DR resets all counters
|
|
and enables (even if already enabled) statistics gathering. The
|
|
simulator starts with statistics enabled, so writing to PB.DR is not
|
|
needed if you want statistics on the entire program's run.
|
|
|
|
SCI4
|
|
|
|
SCI4.TDR is connected to the simulator's stdout. Any byte written to
|
|
SCI4.TDR is written to stdout. If the simulated program writes the
|
|
bytes 3, 3, and N in sequence, the simulator exits with an exit value
|
|
of N.
|
|
|
|
SCI4.SSR always returns "transmitter empty".
|
|
|
|
|
|
TPU1.TCNT
|
|
TPU2.TCNT
|
|
|
|
TPU1 and TPU2 are configured as a chained 32-bit counter which counts
|
|
machine cycles. It always runs at "ICLK speed", regardless of the
|
|
clock control settings. Writing to either of these 16-bit registers
|
|
zeros the counter, regardless of the value written. Reading from
|
|
these registers returns the elapsed cycle count, with TPU1 holding the
|
|
most significant word and TPU2 holding the least significant word.
|
|
|
|
Note that, much like the hardware, these values may (TPU2.CNT *will*)
|
|
change between reads, so you must read TPU1.CNT, then TPU2.CNT, and
|
|
then TPU1.CNT again, and only trust the values if both reads of
|
|
TPU1.CNT were the same.
|