10 |
|
|
11 |
<!-- |
<!-- |
12 |
|
|
13 |
$Id: intro.html,v 1.107 2007/03/08 19:04:09 debug Exp $ |
$Id: intro.html,v 1.108 2007/04/12 16:57:22 debug Exp $ |
14 |
|
|
15 |
Copyright (C) 2003-2007 Anders Gavare. All rights reserved. |
Copyright (C) 2003-2007 Anders Gavare. All rights reserved. |
16 |
|
|
53 |
<li><a href="#run">How to run the emulator</a> |
<li><a href="#run">How to run the emulator</a> |
54 |
<li><a href="#cpus">Which processor architectures does GXemul emulate?</a> |
<li><a href="#cpus">Which processor architectures does GXemul emulate?</a> |
55 |
<li><a href="#hosts">Which host architectures are supported?</a> |
<li><a href="#hosts">Which host architectures are supported?</a> |
|
<li><a href="#translation">What kind of translation does GXemul use?</a> |
|
56 |
<li><a href="#accuracy">Emulation accuracy</a> |
<li><a href="#accuracy">Emulation accuracy</a> |
57 |
<li><a href="#emulmodes">Which machines does GXemul emulate?</a> |
<li><a href="#emulmodes">Which machines does GXemul emulate?</a> |
58 |
</ul> |
</ul> |
235 |
GXemul should compile and run on any modern host architecture (64-bit or |
GXemul should compile and run on any modern host architecture (64-bit or |
236 |
32-bit word-length). |
32-bit word-length). |
237 |
|
|
238 |
<p>Note: The dynamic translation engine does <i>not</i> require backends |
<p>Note: The <a href="translation.html">dynamic translation</a> engine |
239 |
for native code generation to be written for each individual host |
does <i>not</i> require backends for native code generation to be written |
240 |
architecture; the "intermediate representation" that the dyntrans system |
for each individual host architecture; the intermediate representation |
241 |
uses can be executed on any host architecture. |
that the dyntrans system uses can be executed on any host architecture. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<p><br> |
|
|
<a name="translation"></a> |
|
|
<h3>What kind of translation does GXemul use?</h3> |
|
|
|
|
|
<b>Static vs. dynamic:</b> |
|
|
|
|
|
<p>In order to support guest operating systems, which can overwrite old |
|
|
code pages in memory with new code, it is necessary to translate code |
|
|
dynamically. It is not possible to do a "one-pass" (static) translation. |
|
|
Self-modifying code and Just-in-Time compilers running inside |
|
|
the emulator are other things that would not work with a static |
|
|
translator. GXemul is a dynamic translator. However, it does not |
|
|
necessarily translate into native code, as many other emulators do. |
|
|
|
|
|
<p><b>"Runnable" Intermediate Representation:</b> |
|
|
|
|
|
<p>Dynamic translators usually translate from the emulated architecture |
|
|
(e.g. MIPS) into a kind of <i>intermediate representation</i> (IR), and then |
|
|
to native code (e.g. AMD64 or x86 code). Since one of my main goals for |
|
|
GXemul is to keep everything as portable as possible, I have tried to make |
|
|
sure that the IR is something which can be executed regardless of whether |
|
|
the final step (translation from IR to native code) has been implemented |
|
|
or not. |
|
|
|
|
|
<p>The IR in GXemul consists of arrays of pointers to functions, and a few |
|
|
arguments which are passed along to those functions. The functions are |
|
|
implemented in either hand-coded C or automatically generated C. |
|
|
In any case, this is all statically linked into the GXemul binary at link |
|
|
time. |
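<p>As an illustration of the idea described above, here is a minimal sketch
of an array of function pointers with arguments, and a loop that executes it.
All names (<tt>sketch_ic</tt>, <tt>sketch_cpu</tt>, <tt>sketch_run</tt>, and
so on) are hypothetical and chosen for this example; they are not GXemul's
actual declarations, and the halt handler is only a stand-in for leaving the
translated code.

```c
#include <stdint.h>

/* All names below are hypothetical; they are not GXemul's declarations. */
struct sketch_cpu;

/* One IR slot: a pointer to a precompiled handler plus its arguments. */
struct sketch_ic {
    void (*f)(struct sketch_cpu *cpu, struct sketch_ic *ic);
    uintptr_t arg[3];
};

struct sketch_cpu {
    int64_t reg[4];             /* a few emulated registers */
    struct sketch_ic *next_ic;  /* next slot to execute */
    int running;                /* cleared by the halt handler */
};

/* Handler: *arg[1] = *arg[0] + immediate stored in arg[2]. */
static void sketch_add_imm(struct sketch_cpu *cpu, struct sketch_ic *ic)
{
    (void)cpu;
    *(int64_t *)ic->arg[1] =
        *(int64_t *)ic->arg[0] + (int64_t)(intptr_t)ic->arg[2];
}

/* Handler: stop the dispatch loop. */
static void sketch_halt(struct sketch_cpu *cpu, struct sketch_ic *ic)
{
    (void)ic;
    cpu->running = 0;
}

/* Dispatch loop: no decoding here, only indirect calls through the array. */
static void sketch_run(struct sketch_cpu *cpu, struct sketch_ic *first)
{
    cpu->next_ic = first;
    cpu->running = 1;
    while (cpu->running) {
        struct sketch_ic *ic = cpu->next_ic++;
        ic->f(cpu, ic);
    }
}

/* Builds a three-slot array and runs it: r1 = r0 + 5; r2 = r1 - 2. */
int64_t sketch_demo(void)
{
    struct sketch_cpu cpu = { {0, 0, 0, 0}, 0, 0 };
    struct sketch_ic page[3] = {
        { sketch_add_imm,
          { (uintptr_t)&cpu.reg[0], (uintptr_t)&cpu.reg[1], 5 } },
        { sketch_add_imm,
          { (uintptr_t)&cpu.reg[1], (uintptr_t)&cpu.reg[2],
            (uintptr_t)(intptr_t)-2 } },
        { sketch_halt, { 0, 0, 0 } },
    };
    sketch_run(&cpu, page);
    return cpu.reg[2];
}
```

<p>The handlers are ordinary C functions compiled ahead of time; "translating"
a guest instruction in this sketch only means filling in one slot of the array.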
|
|
|
|
|
<p>Here is a simplified diagram of how these arrays work. |
|
|
|
|
|
<p><center><img src="simplified_dyntrans.png"></center> |
|
|
|
|
|
<p>There is one instruction call slot for every possible program counter |
|
|
location within a page. In the MIPS case, instruction words are 32 bits in length, |
|
|
and pages are (usually) 4 KB in size, resulting in 1024 instruction call |
|
|
slots. After the last of these instruction calls, there is an additional |
|
|
call to a special "end of page" function (which doesn't count as an executed |
|
|
instruction). This function switches to the first instruction |
|
|
on the next virtual page (which might cause exceptions, etc). |
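<p>The arithmetic in the paragraph above can be sketched as follows. The
constant and function names are assumptions made for this example (they are
not taken from the GXemul source), and the page and instruction sizes are the
"usual" MIPS values mentioned above.

```c
#include <stdint.h>

/* Hypothetical constants: 4 KB pages, 4-byte (32-bit) instruction words. */
#define SKETCH_PAGE_SIZE   4096u
#define SKETCH_INSTR_SIZE  4u
#define SKETCH_SLOTS       (SKETCH_PAGE_SIZE / SKETCH_INSTR_SIZE)  /* 1024 */
#define SKETCH_END_OF_PAGE SKETCH_SLOTS   /* index of the extra slot */

/* Index of the instruction call slot for a given program counter. */
unsigned sketch_pc_to_slot(uint64_t pc)
{
    return (unsigned)((pc & (SKETCH_PAGE_SIZE - 1)) / SKETCH_INSTR_SIZE);
}

/* Address of the first instruction on the next virtual page, i.e. where
 * an "end of page" handler would continue execution. */
uint64_t sketch_next_page(uint64_t pc)
{
    return (pc & ~(uint64_t)(SKETCH_PAGE_SIZE - 1)) + SKETCH_PAGE_SIZE;
}
```

<p>With these values, the last real instruction on a page uses slot 1023, and
slot 1024 is the one reserved for the "end of page" call.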
|
|
|
|
|
<p>The complexity of individual instructions varies. A simple example of |
|
|
what an instruction can look like is the MIPS <tt>addiu</tt> instruction: |
|
|
<pre> |
|
|
X(addiu) |
|
|
{ |
|
|
reg(ic->arg[1]) = (int32_t) |
|
|
((int32_t)reg(ic->arg[0]) + (int32_t)ic->arg[2]); |
|
|
} |
|
|
</pre> |
|
|
|
|
|
<p>It stores the result of a 32-bit addition of the register at arg[0] |
|
|
with the immediate value arg[2] (treating both as signed 32-bit |
|
|
integers) into register arg[1]. If the emulated CPU is a 64-bit CPU, |
|
|
then this will store a correctly sign-extended value into arg[1]. |
|
|
If it is a 32-bit CPU, then only the lowest 32 bits will be stored, |
|
|
and the high part ignored. <tt>X(addiu)</tt> is expanded to |
|
|
<tt>mips_instr_addiu</tt> in the 64-bit case, and <tt>mips32_instr_addiu</tt> |
|
|
in the 32-bit case. Both are compiled into the GXemul executable; no code |
|
|
is created during run-time. |
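<p>To make the 64-bit/32-bit distinction concrete, here is a small
self-contained sketch. It does not use GXemul's <tt>X()</tt> macro; instead a
hypothetical <tt>SKETCH_DEFINE_ADDIU</tt> macro generates two functions from
the same body, differing only in the register type. The addition is done on
unsigned values so that wrap-around is well defined in C; converting the
result back to <tt>int32_t</tt> relies on the usual two's complement behaviour
of common compilers.

```c
#include <stdint.h>

/* Hypothetical macro: one body, instantiated once per register width. */
#define SKETCH_DEFINE_ADDIU(name, reg_t)                          \
    void name(reg_t *rd, const reg_t *rs, int32_t imm)            \
    {                                                             \
        /* 32-bit add with wrap-around, done on unsigned values */\
        uint32_t sum = (uint32_t)*rs + (uint32_t)imm;             \
        /* reinterpret as signed 32-bit, then widen to reg_t */   \
        *rd = (reg_t)(int32_t)sum;                                \
    }

SKETCH_DEFINE_ADDIU(sketch_mips_instr_addiu, int64_t)    /* 64-bit CPU */
SKETCH_DEFINE_ADDIU(sketch_mips32_instr_addiu, int32_t)  /* 32-bit CPU */

/* Small wrappers that return the destination register's new value. */
int64_t sketch_addiu64(int64_t rs, int32_t imm)
{
    int64_t rd = 0;
    sketch_mips_instr_addiu(&rd, &rs, imm);
    return rd;
}

int32_t sketch_addiu32(int32_t rs, int32_t imm)
{
    int32_t rd = 0;
    sketch_mips32_instr_addiu(&rd, &rs, imm);
    return rd;
}
```

<p>In the 64-bit variant, adding 1 to <tt>0x7fffffff</tt> gives the
sign-extended value <tt>0xffffffff80000000</tt>; the 32-bit variant stores only
<tt>0x80000000</tt>.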
|
|
|
|
|
<p>Here are examples of what the <tt>addiu</tt> instruction actually |
|
|
looks like when it is compiled, on various host architectures: |
|
|
|
|
|
<p><center><table border="0"> |
|
|
<tr><td><b>GCC 4.0.1 on Alpha:</b></td> |
|
|
<td width="35"></td><td></td> |
|
|
<tr> |
|
|
<td valign="top"> |
|
|
<pre>mips_instr_addiu: |
|
|
ldq t1,8(a1) |
|
|
ldq t2,24(a1) |
|
|
ldq t3,16(a1) |
|
|
ldq t0,0(t1) |
|
|
addl t0,t2,t0 |
|
|
stq t0,0(t3) |
|
|
ret</pre> |
|
|
</td> |
|
|
<td></td> |
|
|
<td valign="top"> |
|
|
<pre>mips32_instr_addiu: |
|
|
ldq t2,8(a1) |
|
|
ldq t0,24(a1) |
|
|
ldq t3,16(a1) |
|
|
ldl t1,0(t2) |
|
|
addq t0,t1,t0 |
|
|
stl t0,0(t3) |
|
|
ret</pre> |
|
|
</td> |
|
|
</tr> |
|
|
|
|
|
<tr><td><b><br>GCC 3.4.4 on AMD64:</b></td> |
|
|
<tr> |
|
|
<td valign="top"> |
|
|
<pre>mips_instr_addiu: |
|
|
mov 0x8(%rsi),%rdx |
|
|
mov 0x18(%rsi),%rax |
|
|
mov 0x10(%rsi),%rcx |
|
|
add (%rdx),%eax |
|
|
cltq |
|
|
mov %rax,(%rcx) |
|
|
retq</pre> |
|
|
</td> |
|
|
<td></td> |
|
|
<td valign="top"> |
|
|
<pre>mips32_instr_addiu: |
|
|
mov 0x8(%rsi),%rcx |
|
|
mov 0x10(%rsi),%rdx |
|
|
mov (%rcx),%eax |
|
|
add 0x18(%rsi),%eax |
|
|
mov %eax,(%rdx) |
|
|
retq</pre> |
|
|
</td> |
|
|
</tr> |
|
|
|
|
|
<tr><td><b><br>GCC 4.0.1 on i386:</b></td> |
|
|
<tr> |
|
|
<td valign="top"> |
|
|
<pre>mips_instr_addiu: |
|
|
mov 0x8(%esp),%eax |
|
|
mov 0x8(%eax),%ecx |
|
|
mov 0x4(%eax),%edx |
|
|
mov 0xc(%eax),%eax |
|
|
add (%edx),%eax |
|
|
mov %eax,(%ecx) |
|
|
cltd |
|
|
mov %edx,0x4(%ecx) |
|
|
ret</pre> |
|
|
</td> |
|
|
<td></td> |
|
|
<td valign="top"> |
|
|
<pre>mips32_instr_addiu: |
|
|
mov 0x8(%esp),%eax |
|
|
mov 0x8(%eax),%ecx |
|
|
mov 0x4(%eax),%edx |
|
|
mov 0xc(%eax),%eax |
|
|
add (%edx),%eax |
|
|
mov %eax,(%ecx) |
|
|
ret</pre> |
|
|
</td> |
|
|
</tr> |
|
|
</table></center> |
|
|
|
|
|
<p>On 64-bit hosts, there is not much difference, but on 32-bit hosts (and |
|
|
to some extent on AMD64), the difference is enough to make a separate 32-bit variant worthwhile. |
|
|
|
|
|
|
|
|
<p><b>Performance:</b> |
|
|
|
|
|
<p>The performance of using this kind of runnable IR is obviously lower |
|
|
than what can be achieved by emulators using native code generation, but |
|
|
can be significantly higher than using a naive fetch-decode-execute |
|
|
interpretation loop. In my opinion, using a runnable IR is an interesting |
|
|
compromise. |
|
|
|
|
|
<p>The overhead per emulated instruction is usually around or below |
|
|
10 host instructions. This is very much dependent on your |
|
|
host architecture and what compiler and compiler switches you are using. |
|
|
Added to this instruction count is (of course) also the C code used to |
|
|
implement each specific instruction. |
|
|
|
|
|
<p><b>Instruction Combinations:</b> |
|
|
|
|
|
<p>Short, common instruction sequences can sometimes be replaced by a |
|
|
"compound" instruction. An example could be a compare instruction followed |
|
|
by a conditional branch instruction. The advantages of instruction |
|
|
combinations are that |
|
|
<ul> |
|
|
<li>the amortized overhead per instruction is slightly reduced, and |
|
|
<p> |
|
|
<li>the host's compiler can do a good job of optimizing the common |
|
|
instruction sequence. |
|
|
</ul> |
|
|
|
|
|
<p>The special cases where instruction combinations give the most gain |
|
|
are in the cores of string/memory manipulation functions such as |
|
|
<tt>memset()</tt> or <tt>strlen()</tt>. The core loop can then (at least |
|
|
to some extent) be replaced by a native call to the equivalent function. |
|
|
|
|
|
<p>The implementations of compound instructions still keep track of the |
|
|
number of executed instructions, etc. When single-stepping, these |
|
|
translations are invalidated, and replaced by normal instruction calls |
|
|
(one per emulated instruction). |
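<p>As a rough illustration of such a compound instruction, the sketch below
replaces a hypothetical three-instruction guest fill loop (store byte,
increment pointer, branch if not done) with one native <tt>memset()</tt> call,
while still accounting for the guest instructions that would have been
executed. The structure, the field names, and the loop shape are assumptions
for this example only, and the sketch assumes <tt>ptr</tt> is below
<tt>end</tt> on entry.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical CPU state for the sketch; not GXemul's structures. */
struct sketch_fill_cpu {
    uint8_t *mem;       /* host pointer to emulated RAM */
    uint64_t ptr;       /* guest register: current store offset */
    uint64_t end;       /* guest register: offset one past the last byte */
    uint8_t val;        /* guest register: byte value to store */
    uint64_t n_instrs;  /* count of executed guest instructions */
};

/* Compound handler for the whole loop. Each iteration of the guest loop
 * would have executed 3 instructions, so the counter advances by 3 per
 * byte even though the host performs a single memset() call. */
void sketch_combined_fill(struct sketch_fill_cpu *cpu)
{
    uint64_t len = cpu->end - cpu->ptr;

    memset(cpu->mem + cpu->ptr, cpu->val, (size_t)len);
    cpu->ptr = cpu->end;
    cpu->n_instrs += 3 * len;
}

/* Fills offsets 4..11 of a 16-byte buffer; returns the instruction count,
 * or 0 if the buffer contents are not what was expected. */
uint64_t sketch_fill_demo(void)
{
    uint8_t ram[16] = {0};
    struct sketch_fill_cpu cpu = { ram, 4, 12, 0xAA, 0 };

    sketch_combined_fill(&cpu);
    if (ram[3] != 0 || ram[4] != 0xAA || ram[11] != 0xAA || ram[12] != 0)
        return 0;
    return cpu.n_instrs;
}
```

<p>Keeping the instruction count exact is what allows the compound form to be
swapped back for one call per emulated instruction when single-stepping.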
|
|
|
|
|
<p><b>Native Code Back-ends:</b> |
|
|
|
|
|
<p>In theory, it would be possible to implement native code generation, |
|
|
similar to what is used in high-performance emulators such as QEMU, |
|
|
as long as the generated code abides by the C ABI on the host. |
|
|
|
|
|
<p>However, since I wanted to make sure that GXemul works without such |
|
|
native code back-ends, there are no implemented backends in this release. |
|
|
|
|
|
<p>(There is a place-holder in the source code for native code generation, |
|
|
which can be used for experiments, but it does not contain any working |
|
|
code at the moment.) |
|
|
|
|
242 |
|
|
243 |
|
|
244 |
|
|