Academic
Publications
The Structure and Performance of Efficient Interpreters

The Structure and Performance of Efficient Interpreters,Journal of Instruction-level Parallelism,M. Anton Ertl,David Gregg

The Structure and Performance of Efficient Interpreters   (Citations: 19)
BibTex | RIS | RefWorks Download
Interpreters designed for high general-purpose performance typically perform a large number of indirect branches (3.2%{13% of all executed instructions in our benchmarks). These branches consume more than half of the run-time in a number of congurations we simulated. We evaluate how accurate various existing and proposed branch prediction schemes are on a number of interpreters, how the mispredictions aect the performance of the interpreters and how two dieren t interpreter implementation techniques perform with various branch predictors. We also suggest various ways in which hardware designers, C compiler writers, and interpreter writers can improve the performance of interpreters.
Cumulative Annual
View Publication
The following links allow you to view full publications. These links are maintained by other sources not affiliated with Microsoft Academic Search.
    • ...In 2001, Ertl and Gregg [8] found that there are certain optimization techniques for interpreters, e.g., threaded code 1 , [1,7] that cause them to perform...
    • ...While interpreters using the threaded code optimization, such as the OCaml and Forth interpreters performed within a slowdown factor of up to 10 when compared with an optimizing native code compiler, other interpreters, such as the Perl and Xlisp interpreters, which were not using similar optimization techniques performed considerably worse: a slowdown of an additional factor of 100 when compared with efficient interpreters [8]...
    • ...In low abstraction-level virtual machines the overhead in instruction dispatch is big, therefore using threaded code is particularly effective, resulting in reported speedups of up to a factor of 2.02 [8]...
    • ...In the spectralnorm benchmark the application of both techniques results in a speedup of up to a factor of 1.92—only slightly lower than the maximum reported speedup of up to a factor of 2.02 achieved by efficient interpreters using threaded code alone [8]...

    Stefan Brunthaler. Inline Caching Meets Quickening

    • ...The optimization of interpreters is interesting because often simple techniques have a huge impact—for example changing the instruction dispatch from the common switch-based dispatch technique to the more advanced threaded-code 1 [Bel73] dispatch technique results in re- ported speedups of up to 2.02 [EG03b]...
    • ...In 2003, Ertl and Gregg identified a set of optimization techniques that achieve significant speedup for virtual machines [EG03b]...

    Stefan Brunthaler. Efficient interpretation using quickening

    • ...The virtual machine is implemented in C++, has a standard Cheney garbage collector [17, page 117] and uses a direct threaded byte code interpreter [4, 12] as its main execution loop...

    Anders Bach Nielsenet al. Virtual class support at the virtual machine level

    • ...Mispredicted branches are expensive on modern processors (e.g., they cost about 10 cycles on the Pentium III and Athlon and 20–30 cycles on the Pentium 4). As a result, interpreters can spend more than half of their execution time recovering from indirect branch mispredictions [Ertl and Gregg 2003b]...
    • ...BTBs mispredict 50%–63% of the executed indirect branches in threaded-code interpreters and 81%–98% in switch-based interpreters [Ertl and Gregg 2003b]...
    • ...In previous work on a RISC architecture we have measured indirect branches accounting for up to 13% of executed instructions for the Gforth interpreter and 11% for the Ocaml interpreter [Ertl and Gregg 2003b] As we show in Section 7.2.2, on the CISC Intel x86 architecture where more work can be accomplished with fewer real machine instructions, the average for Gforth can be as high as 16.5% of executed instructions...
    • ...Switch dispatch can be implemented in ANSI C (see Figure 1), but is not very efficient [Ertl and Gregg 2003b] (see also Section 3)...
    • ...Better indirect branch predictors have been proposed [Driesen and H¨olzle 1998; 1999; Kalamatianos and Kaeli 1999], and they would improve the prediction accuracy in interpreters substantially [Ertl and Gregg 2003b]...
    • ...Ertl and Gregg [2003b] investigated the performance of several virtual machine interpreters on several branch predictors and found that BTBs mispredict 81%–98% of the indirect branches in switch-dispatch interpreters, and 57%–63% of the indirect branches in threaded-code interpreters (a variation, the so-called BTB with two-bit counters, produces slightly better results for threaded code: 50%–61% mispredictions)...
    • ...In contrast, Ertl and Gregg [2003b] investigated the performance efficient interpreters, and found that their running time was often dominated by indirect branch mispredictions...
    • ...BTBs, Ertl and Gregg also showed that a processor with a two-level indirect branch predictor can correctly predict most indirect branches in VM interpreters [Ertl and Gregg 2003b]...

    Kevin Caseyet al. Optimizing indirect branch prediction accuracy in virtual machine inte...

    • ...However, switch dispatch is considerably slower than the start-of-the-art, direct-threaded dispatch [7]...

    Marc Berndlet al. Context Threading: A Flexible and Efficient Dispatch Technique for Vir...

Sort by: