Semantic Verification
=====================

The purpose of these tools is to verify to the extent possible that
the instruction semantics layer in ROSE accurately reflects how the
instructions execute on real hardware.

The basic idea is that we run ROSE in parallel with the executable
we're analyzing. The executable is run under a very simple
network-based debugger which is driven by ROSE. This allows us to run
the executable on an architecture different than ROSE.

The Debugger
============

The DEBUGGER is responsible for launching the program being analyzed,
controlling its execution, and querying the values of registers and
memory.  ROSE and the debugger communicate via TCP sockets.  The
underlying system used by the debugger could vary:

   * A true, full-fledged debugger such as GDB
   * A simple ptrace() mechanism
   * A modified virtual machine (such as Bochs)
   * A Pin-instrumented program

(The DebuggerPtrace.c file contains the ptrace debugger.)

Regardless of the underlying method, the debugger interface will be
consistent and simple.  The client (usually ROSE) will send a command
to the debugger which will return a status and optional data.
Commands are either ASCII for human readability and ease of debugging,
or binary for efficiency.

ASCII Commands
--------------

Each command is a single letter followed by one or two space separated
numbers on a single line. The numbers can be decimal, octal (leading
zero), or hexadecimal (leading "0x").  A human can interactively
control the debugger by starting the debugger and then connecting to
it with "telnet" or "nc". For example:

   machineA$ DebuggerPtrace --port=32002 /bin/ls
   machineB$ nc machineA 32002

The commands:

    s
	Single step. Execute one instruction.

    r
	Show values of all known non-floating-point registers.

    q
	Quit, killing the subordinate program.

    b ADDR [SIZE]
	Set a break point at all addresses between ADDR (inclusive)
        and ADDR+SIZE (exclusive). A SIZE of zero (or not specified)
        is a special case, treated as if the SIZE were one byte
        instead.

    c [ADDR]
        Continue execution until a break point is reached.  If ADDR is
        present and non-zero then execution continues until ADDR is
        reached and all other break points are temporarily ignored.

    m ADDR [SIZE]
	Returns SIZE bytes (defaults to 1) of memory values beginning
        at the specified address.

The debugger will print a status message after each command is
executed. For single-step and continue commands the status is printed
when the subordinate program reaches the next stopping point.  The
status is the word "OK" or "FAIL" optionally followed by additional
text on the same line.  Any command that produces other results will
output the results before the OK/FAIL status is reported.

Binary Commands
---------------

The debugger also understands binary commands, which produce binary
output and status reports.  Binary and ASCII commands can be mixed
during a debugging session as long as there are no intervening bytes
between the line-feed of an ASCII command and the first byte of the
following binary command.

A binary command is a one-byte command followed by one or two 8-byte
arguments each in little-endian order.  The number of arguments
depends on the command--the optional arguments shown above for the
ASCII commands are required arguments in binary commands.

For each command, the server will return an eight-byte, little-endian
value. If a result is more than eight bytes then the first eight bytes
will indicate the length of the result, which immediately follows. The
decision whether the first eight bytes is the return value or length
of following data is based on the command that produced the
output. (In other words, see the code ;-)

Note: The debuggers (e.g., DebuggerPtrace.c) are very simple C files
that need no special support. You should be able to compile the
debuggers with "gcc DebuggerPtrace.c".  The verifySemantics.C defines a
class for communicating with a debugger and will probably be moved
to its own header eventually.

Verification Algorithm
======================

A copy of the executable must be present on the machine where ROSE is
running and the machine where the debugger is running.  Start the
executable under the debugger:

     $ gcc -O3 -o debugger DebuggerPtrace.c
     $ ./debugger --port=32002 a.out

In another window (on possibly a different machine), run the verifier:

     $ verifySemantics --debugger=localhost:32002 a.out

ROSE parses the executable and obtains addresses of all executable
segments. It then sets breakpoints for all addresses in those
segments and asks the debugger to continue execution.

When a break point is reached ROSE looks up the SgAsmInstruction at
that address. ROSE queries the debugger to compare the instruction,
and submits the instruction to the semantic analyzer.

The Policy for the semantic analyzer will query the debugger for every
register and memory read operation.  Register and memory write
operations will be remembered for later.

ROSE will ask the debugger to single step and then compare writes
(from previous step) with values queried from the debugger.

ROSE and the debugger then advance to the next breakpoint and repeat
the process.

Shortcomings
============

* The verifier does not handle interrupts, so instructions like "INT"
  will be executed in the natively-running version (under the
  debugger) but not in the instruction semantics layer (the
  verifier). Therefore, one should expect the state after instructions
  like these to be different.

* The RDTSC (read timestamp counter) will return a different value
  natively than in the verifier.
