2.2
Recording Traces
Traces are recorded by manipulating the threaded code branch tar-
get stored in the intermediate representation of each instruction as it
is executed by JamVM’s direct threaded interpreter [4]. The entire
interpreter of JamVM is located in a single C method. Each Java
Virtual Machine Language (JVML) instruction is implemented as
a code block that is preceded by a label. The address of this la-
bel is stored in the intermediate representation, and JamVM uses
GCC’s [12] computed goto statement to jump to the corresponding
label of each instruction as it is executed.
To record a trace, the monitor manipulates the threaded code
branch target in the instruction immediately following the first in-
struction of the trace to redirect execution to a recording code
block. This special code block can be understood as a meta instruc-
tion that is (virtually) inserted after each executed instruction. The
recording block records the current value of the program counter,
the opcode of the executed instruction, and the value on top of the
stack in a trace structure. Every time the recording block is invoked
it reverses the change to the previously executed instruction that
caused the recording block to be invoked and patches the next in-
struction label instead. Using this technique the recording block
essentially chases the execution, always making sure that it is one
step ahead of the program counter.
While our approach triples the number of hard to predict (and
thus costly) indirect dispatches during trace recording, the overhead
is essentially zero when the trace recorder is disconnected and is
not recording traces. The only noteworthy overhead is incurred
for JVML branch instructions, because we have to increment the
invocation frequency counter and compare it to a threshold value.
To eliminate this overhead we redirect branch instructions to code
blocks that do no longer perform profiling once a second stop-loss
threshold is exceeded. In this case we assume that no suitable traces
could be extracted from the bytecode in question.
While recording traces, the direction (taken or not taken) of
every conditional branch is recorded. These decision points in
the original program become guard instructions in the recorded
trace. At runtime, these guard instructions ensure that the control
flow still follows the previously encountered path. Depending on
whether the original branch was taken or not, a guard instruction
is emitted that either checks for the branch condition itself or its
complement.
A taken branch instruction that branches if the value is equal
zero (jeq), for example, becomes a guard if not equal zero in-
struction (gne), because the compiled trace has to be aborted if the
original assumption (value is equal zero) does not hold at the time
the trace is executed. The same branch instruction (jeq) would be
converted into a guard if equal zero instruction (geq) if it was not
taken in the initial code, because now the compiled trace has to be
guarded against the opposite condition.
Lookup table dispatches (JVML instructions lookupswitch
and tableswitch) are used in Java to efficiently implement the
switch
/case construct of the Java high-level language. When
encountering a lookup table dispatch instruction, the virtual ma-
chine determines the bytecode address where to continue execution
based on a table of address and value pairs. We treat lookup table
dispatches as if they were simple conditional branches that com-
pare a value with a constant. When the trace is executed, we expect
the same value to be encountered by the guard instruction, or the
trace is terminated through a side exit.
1
Optimizing virtual method invocations is a non-trivial task for
traditional compilers. Since our compiler always follows the ac-
tually executed instruction stream, we only have to insert a guard
1
We will discuss in the next section how to merge several traces so we can
compile lookup table dispatches that have several hot paths.
instruction that checks that the predicted target method is still valid.
For this, the guard instruction compares the actual type of the object
the virtual method is invoked on with the type that was encountered
when the trace was recorded. If the two types still match, execution
of the trace continues with the already recorded and compiled in-
struction stream.
If any guard instruction fails at runtime, control is transfered
back to the VM to resume execution at the bytecode level. For this,
the mapping between machine code registers and Java stack and
local variables has to be preserved. Our intermediate representation
embeds this information in every guard instruction and the code
generator creates side exit stubs that write back the values into the
appropriate local variable locations.
Our initial implementation maintained mapping information for
both, the Java operand stack as well as Java local variables. Pro-
filing of our first prototype showed that the code generator never
actually had to generate write-back code for stack locations. While
in principle legal in JVML, the Java compiler does not generate ba-
sic blocks that have more than one predecessor and a non-empty
stack at the entry point. Instead, the stack is always empty after
branch instructions, except for some special constructs such as the
“? :” operator. However, the latter can never be a loop header and
thus will never trigger the recording of a trace and is thus irrelevant
in this context. Based on this observation we removed support for
traces that start with a non-empty stack or contain side-exits with a
non-empty stack.
The last map is recorded at the very end of the trace where
execution returns to the loop header (trace entry point). This map is
special as it indicates which Java local variables are altered during
each loop iteration. To ensure proper semantics, all values altered
during a loop iteration have to be written back at all side exists,
because the value could still be pending from a previous loop
iteration (Figure 2).
2.3
Stop Conditions
Recording ends in two cases—either we decide to abort the record-
ing due to certain circumstances, or we successfully finish the
recording.
As we are only interested in the loop body itself, we define
a number of abort conditions which terminate the recording of a
trace. If an exception is thrown, for example, we immediately ter-
minate recording, because exceptions are by definition rare events
and by its very nature JIT compilation only focuses on frequently
executed code areas. Similarly, we also abort tracing when we en-
counter a native method invocation. The rationale of this design
decision is that native method invocations are already fairly expen-
sive, no matter if compiled or interpreted, and we expect them to
happen infrequently. Also, by not having to generate code for native
code invocation, our backend compiler is simplified significantly.
When execution returns to the original loop header where a trace
started, the recording of the trace ends and the newly recorded trace
is sent to the JIT compiler for translation to optimized machine
code.
3.
Compiling Traces
Once a trace has been recorded, our trace compiler translates it to
directly executable machine code. As mentioned previously, our
compiler is specialized in compiling traces only and expects its
input to be free of branch instructions except for a single, final,
unconditional branch at the end of the trace, which jumps back
to the loop header. The code generated for the trace is optimized
with the assumption that the trace will be executed repeatedly
(loop), until it is left through a side exit (guard instructions). In
our system, traces do not have a generic exit point, as they always