x86sim is a simulator for 32-bit Linux programs, and runs on either
32- or 64-bit linux.

== Definitions ==

* The "specimen" is the 32-bit Linux program being executed inside the
  simulator.

* The "host OS" is the system for which ROSE was compiled and on which
  the x86sim is running.

* The "guest OS" is the Linux environment (system calls, etc) provided
  by x86sim to the specimen.

== Usage ==

Usage: x86sim [SWITCHES] SPECIMEN [SPECIMEN_ARGS...]

Switches are:

* "--debug=WORDS" where WORDS is a comma-separated list of one or more
  of the following:

** "insn" shows disassembled instructions as they are executed.  If
   stderr is a terminal then ANSI escape sequences are used to prevent
   other debugging output from immediately scrolling off the screen.
   If you need to see every individual instruction then redirect
   stderr to a non-tty.

** "state" shows register contents after each instruction is executed.

** "mem" shows memory read and write operations.

** "mmap" shows the specimen memory map (similar to /proc/self/maps)
   every time the map changes.

** "syscall" shows system calls, arguments, and return values (similar
   to the strace utility).

** "loader" shows a variety of loading, linking, mapping, and
   relocation operations.

** "progress" emits a progress report every 10 seconds.

** "signal" shows information about signal handling.

** "all" turns on all possible debugging switches.

* "--debug" by itself is the same as "--debug=syscall,insn"

* "--log=FILE" sends diagnostics to the specified file rather than
  standard error.  May include a printf-like integer format which
  will be replaced by the PID, which is useful when the specimen
  forks.

* "--core=STYLE" where STYLE is a comma-separated list of core dump
  types. Currently supported are "elf", which produces a standard ELF
  Core Dump, and "rose" which produces a multi-file memory dump.  The
  default is "elf".  If multiple styles are specified then multiple
  core dumps are created.

* "--dump=ADDRESS[,NAME]" produces a core dump the first time x86sim
  executes an instruction at the specified address.  The core file
  name is optional, but it's often useful to avoid a subsequent dump
  from overwriting this one.  Once the core dump(s) is produced x86sim
  will continue to execute the specimen.

* "--interp=NAME" overrides the interpreter name from the PT_INTERP
  section (normally named ".interp"). The interpreter is usually
  the linux dynamic linker, ld-linux.so.

* "--vdso=PATHS" overrides the compiled in names for finding the
  virtual dynamic shared object normally provided by the kernel. The
  PATHS are a colon-separate list of directories and/or files. When
  directories are specified the simulator looks for files called
  "x86vdso" in those directories.

* "--showauxv" causes the simulator to print its own ELF auxiliary
  vector.  (The auxiliary vector for the specimen is printed by
  "--debug=loader".)

* "--trace=FILENAME" generates a binary trace that can be read
  with the traceAnalysis project.  That project reads a trace file
  and produces ELF and PE binary executables.

Positional arguments:

* "SPECIMEN" is the name of the program to simulate.  x86sim will
  locate this program using the same rules as the shell: using the
  name directly if it contains a slash, or consulting the PATH
  environment variable otherwise.  NOTE: For temporary backward
  compatibility, x86sim also searches the CWD, but does so last (do
  not depend on this feature).

* "SPECIMEN_ARGS" are all the arguments for the specimen.

== Design ==

x86sim uses ROSE's binary support to simulate a process:

* The memory address space of the specimen is represented by an
  instance of ROSE's MemoryMap class.  Keeping the specimen's address
  space separated from x86sim's address space allows the specimen to
  use its entire address space however it chooses, and it prevents the
  specimen from being able to access x86sim's address space.

  The main drawback of this design is that every specimen memory
  access must go through the MemoryMap, and x86sim must marshal data
  between the specimen's and x86sim's address spaces when performing a
  system call on behalf of the specimen.  This has an impact on
  performance.

* The specimen's instructions are simulated rather than executed
  directly by the CPU.  The simulation is performed by the
  X86InstructionSemantics class defined in ROSE and an EmulationPolicy
  class defined in x86sim.  The X86InstructionSemantics class defines
  what basic operations must be performed by each instruction, while
  the EmulationPolicy class defines the operations themselves.

  Simulating each instruction has a number of advantages:

     * the specimen can be for a different architecture than that on
       which x86sim is running.

     * x86sim handles priviledged instructions the same way as all
       other instructions.

     * it's easy to modify x86sim to do something special for
       certain instructions.

     * it allows x86sim to keep the specimen in a separate address
       space that doesn't overlap with x86sim's address space.

     * it's a good way to gain confidence that ROSE's instruction
       semantics are working properly.  A bug in the implementation
       would likely cause the specimen to fail.

  The main disadvantage of simulating each instruction is that it
  is much slower than executing instructions directly.

* The ROSE Disassembler is used to build an intermediate
  representation of each instruction before the instruction can be
  simulated.  Disassembly at some level is necessary in order to
  simulate instructions, and the level of disassembly that x86sim uses
  is the same as that which ROSE uses for all other binary
  analyses. Besides being necessary for simulation, this also gives us
  confidence that the disassembler is functioning properly.

* Dynamic linking. x86sim loads the specimen's main executable into an
  address space (MemoryMap) like the OS would do, and then begins
  simulation.  If the main executable has an interpreter, then the
  interpreter is also loaded and simulation starts in the
  interpreter. In this way, x86sim is able to simulate dynamically
  linked executables -- it simulates the linker itself (i.e., it
  simulates the ld-linux.so interpreter).

  ROSE also has it's own dynamic linking capability which is under
  development. The simulator could use this capability instead, which
  would give the user more control over the linking process.

* System calls. x86sim supports either the "INT 0x80" interface or
  they "SYSENTER" interface.  In either case, the instruction is
  intercepted and x86sim processes the system call. It does one of two
  things: (1) x86sim may invoke the system call on behalf of the
  specimen, or (2) x86sim may update the simulated state of the
  specimen directly.

  Handling of system calls actually makes up the bulk of the
  EmulationPolicy class in x86sim.

== System calls ==

If you see a message like "syscall_131(....) is not implemented yet"
then you may implement it.  Look up the call number in
<asm/unistd_32.h> (not <asm/unistd_64.h> since 32- and 64-bit syscall
numbers differ).

Add a new case to the big switch statement in
EmulationPolicy::emulate_syscall().  System call arguments can be
obtained with arg(N) methods and are always register values.  If an
argument is a pointer to a string then the string can be obtained with
code like "read_string(arg(N))".  Other data structures passed into a
system call must first be copied from specimen memory to x86sim
memory, converted to a 64-bit layout if the host OS is 64-bit, and
then passed to the real system call.

The real system call can be invoked with syscall(), but a hard-coded
system call number must be supplied since the __SYS_* constants are
for the host OS, which might be either 32- or 64-bit.  Note that
direct system calls don't always follow the same semantics nor use the
same data structures as the corresponding C library call.  For
example,  the brk() library function returns a different value than
the SYS_brk system call.  Likewise, the "stat" family of functions
have a different data structure in the C library than they do in the
kernel.  But in most cases it's probably safe to call the C library
function rather than the direct system call.

Some system calls (like the "stat" family of calls) return a different
data structure than what is eventually returned by the C library.  In
these cases it's best if x86sim uses syscall() since this prevents the
C library from converting the kernel data structure to the C library
data structure, an action that must be undone in x86sim so that the
specimen's emulated C library can re-do it.

System calls return an exit status in the EAX register, with certain
negative values indicating an error.  Sometimes the C library will
make additional checks and return a failure even if the underlying
system call was successful. It's best to invoke the system call with
syscall() in these cases and then let the specimen's emulated C
library detect the error itself.

The syscall() function returns -1 on error with the actual error code
in 'errno'.  Thus, the typical way to end system call emulation in the
x86sim big switch statement is like:

    int retval = syscall(....);
    if (-1==retval) retval = -errno;
    ...
    writeGPR(x86_gpr_ax, retval);

== Purely Abstract Interpretation ==

Instructions are simulated by processing them through the instruction
semantics layer, X86InstructionSemantics, using the semantic policy
EmulationPolicy. The EmulationPolicy class is derived from
VirtualMachineSemantics::Policy and related classes. The
VirtualMachineSemantics tracks registers and memory, the values of
which can be:
  * A known integer
  * An unknown, named value.
  * An unknown, named value with a signed offset.
  * The negation of an unknown, named value with a signed offset.

A concrete interpretation uses only values that are known
integers. The VirtualMachineSemantics policy was chosen as the base
class because of its ability to represent known integers and its
ability to perform constant folding for operations on those integers.

The EmulationPolicy modifies and extends the VirtualMachineSemantics
policy in three main ways:

  1. Memory access is simplified. The VirtualMachineSemantics accounts
     for memory I/O operations where the address is not known. The
     EmulationPolicy does not need this feature, and therefore
     overrides the memory I/O methods to make them faster.

  2. The EmulationPolicy extends the VirtualMachineSemantics policy by
     adding methods for loading the executable and its optional
     interpreter (for dynamically linked executables) into a simulated
     address space.

  3. The EmulationPolicy adds an emulate_syscall() method. This is
     actually the real meat of the class in terms of number of lines
     of code.

Since EmulationPolicy is derived from VirtualMachineSemantics, it
should be possible, in theory, to perform abstract analysis with
unknown values.  VirtualMachineSemantics was designed to be a fast
approximation of fully abstract symbolic semantics, and one could
replace VirtualMachineSemantics with SymbolicSemantics. One should
note that SymbolicSemantics doesn't do any kind of constant folding at
this time, so the symbolic expressions could become very large.

However, there's one important aspect of EmulationPolicy that makes
abstract interpretation problematic: the system call emulation.
Syscall emulation depends on having known, concrete values to pass to
the underlying system call.

== Disassembly Issues ==

Currently, x86sim disassembles machine instructions only when that
memory is about to be executed.  The disassembly is cached and if that
address is executed again, x86sim will use the cached instruction if
the memory contains the same bytes as when the instruction was
originally disassembled.

It is also possible to disassemble the entire executable at once
simply by choosing an appropriate disassembler (see
Disassembler::lookup()), and then calling the disassembler, giving it
the MemoryMap for the specimen. Alternatively, one can use the
Partitioner::partition() method, passing a disassembler and memory
map.

== Concolic Analysis ==

Certain combinations of abstract and concrete analysis are
possible. For instance, it's possible to obtain, through static
analysis and disassembly, an AST containing functions of basic blocks
for all the executable memory.  This AST contains a control flow graph
and a function call graph that could be compared with the concrete
interpretation and augmented by the concrete interpretation.

For statically linked executables, one could also obtain an AST for
the ELF container. But since shared objects for dynamically linked
executables are resolved by simulating the dynamic loader, ROSE does
not have any direct knowledge of the shared objects.  It would be
possible to obtain the list of shared object files by observing the
system calls related to file I/O since the simulated loader must open,
and map the shared objects.

== Debugging Tricks and Tips ==

It's often difficult to determine why a specimen runs differently
under x86sim than natively. It could be due to differences in the
environment provided by x86sim (like the fact that x86sim runs the
specimen with slightly different AUXV (one of the pieces of
information that gets pushed onto the specimen's initial stack).  You
can see values in x86sim with "--debug=loader" and natively by setting
the LD_SHOW_AUXV environment variable.

It could be due to differences in the way memory is layed out.  A
simple way to test this is to somehow get the specimen to dump its
memory map and compare that to the map printed with
"--debug=mmap".  If the specimen is the standard "cat" command for
instance, you could just run "cat /proc/self/maps".

Another way to look at memory maps is by looking at the ELF Segment
Table in a core dump, either a core dump produced by x86sim or by
running the command natively.  For native core dumps you might need to
first execute "ulimit -Sc unlimited".  See the "--dump" switch for
x86sim.  An easy way to get a core dump at an arbitrary process for a
natively run specimen is to fire it up under GDB, set a break point at
the address of interest (e.g., "break *0x08040444"), run to the break
point, and then send a core producing signal to the specimen. You can
see the ELF Segment Table with a variety of tools: readelf, objdump,
or from ROSE itself using tests/roseTests/binaryTests/execFormatsTest:

    $ execFormatsTests -rose:read_executable_file_format_only core
    $ less core.dump

The GDB "set disassembly-flavor intel" will cause GDB to use the Intel
format which is used by most tools (other than GCC) including ROSE
itself.  Same can be done with objdump's "-Mintel" switch.

An easy way to view memory contents (other than GDB) is to run x86sim
with "--core=rose --dump=0x12345678,mem" and then look at the
resulting mem.index file. Find the address of interest in the index,
then look in the appropriate mem-*.dat file. (0x12345678 is the
address where the dump is taken.)

Here's another way (besides GDB) to get an assembly listing of memory
contents based on what's been executed. Run with "--core=rose
--dump=0x12345678,mem --debug=insn" and capture stderr to find all the
executed "call" instructions that are followed by a constant target
address, sort the call targets addresses into a unique list, separate
the targets from one another with only a comma, and store the result
in a file named "targets".  Then run the "disassemble" tool found in
tests/roseTests/binaryTests:

    $ disassemble -rose:partitioner_search -leftovers \
                  --raw=$(cat targets) mem.index

See also "disassemble --help" because it takes a lot of different,
useful switches and improvements continue to be added (it's the
test bed for the latest binary capabilities).

The behavior of the dynamic linker (ld.so) in the specimen can be
controlled by setting environment variables documented in its
man page. However, to prevent those variables from affecting the
loading and linking of ROSE itself, prefix the variable's name
with "X86SIM_". For example, to adjust the library load path
do this:
    
    $ X86SIM_LD_LIBRARY_PATH=somedir ./x86sim a.out


To get a specimen core dump upon return from any signal handler, use
the "--dump=0xdeceaced" switch (as in a deceaced (sic) signal handler).


== CROSS PLATFORM ==

Here's how to run a 32-bit specimen on a 64-bit machine that doesn't
have 32-bit compatibility. The goal is to get the specimen to follow
the same code path in the 64-bit x86sim as it would when run natively
on the 32-bit platform.

(1) On the 32-bit machine, run "ldd $SPECIMEN" to get a list of
    libraries, including the dynamic linker itself. Copy all of these
    files to a directory, $LIBDIR, on the 64-bit machine.  The library
    with no file name (if any, often called "linux-gate") is a virtual
    dynamic shared object provided by the kernel--don't worry about
    copying it.  The "ld-linux" shared object is the dynamic linker
    itself, which we'll refer to as $LDLINUX.

(2) Copy the virtual dynamic shared object file to the test
    directory.  Name of file is $VDSO.

      $ VDSO_BASE=0x$(setarch i386 -LRB3 dd if=/proc/self/maps |\
	            grep vdso |\
                    cut -d- -f1)
 
    Run a couple of times to make sure the $VDSO_BASE is consistent.

      $ setarch i386 -LRB3 dd bs=1024 skip=$[VDSO_BASE/1024] count=4 \
          if=/proc/self/mem of=$VDSO

(3) Copy the specimen to the 64-bit machine. This is $SPECIMEN.

(4) Run the simulator as:

      setarch i386 -LRB3 \
          env \
	      X86SIM_LD_PRELOAD= \
              X86SIM_LD_LIBRARY_PATH=$LIBDIR \
          x86sim \
              --interp=$LDLINUX --vdso=$VDSO \
              $SPECIMEN

== SIGNAL HANDLING ==

Signals delivered to x86sim will be treated as if they were intended
for the specimen (x86sim never gets a SIGSEGV anyway ;-).  After the
OS kernel delivers the signal to x86sim, x86sim will deliver it to the
specimen. This final delivery always happens between instructions.

We don't yet handle signal stacks, although adding one wouldn't be
that hard.

Pressing Control-C won't do what usually happens--it won't immediately
kill x86sim.  Instead, the SIGINT is handled just like the other
signals in that x86sim will deliver it to the specimen.  Usually the
specimen will not have a handler registered for SIGINT and x86sim will
cause the specimen to exit, thus exiting x86sim as well.  If you're
unable to kill x86sim with a Control-C, try Control-\ or, from another
terminal, run "kill -SIGKILL PID" where PID is the process ID of
x86sim (and the specimen).

If x86sim fails an assertion (assert() or ROSE_ASSERT()) it will look
like the specimen failed the assertion since x86sim traps the signal
and delivers it to the specimen. This is true of all signals produced
by x86sim for x86sim.
