
Diff of /trunk/doc/intro.html


revision 37 by dpavlin, Mon Oct 8 16:21:34 2007 UTC revision 38 by dpavlin, Mon Oct 8 16:21:53 2007 UTC
# Line 10  Line 10 
10    
11  <!--  <!--
12    
13  $Id: intro.html,v 1.107 2007/03/08 19:04:09 debug Exp $  $Id: intro.html,v 1.108 2007/04/12 16:57:22 debug Exp $
14    
15  Copyright (C) 2003-2007  Anders Gavare.  All rights reserved.  Copyright (C) 2003-2007  Anders Gavare.  All rights reserved.
16    
# Line 53  SUCH DAMAGE. Line 53  SUCH DAMAGE.
53    <li><a href="#run">How to run the emulator</a>    <li><a href="#run">How to run the emulator</a>
54    <li><a href="#cpus">Which processor architectures does GXemul emulate?</a>    <li><a href="#cpus">Which processor architectures does GXemul emulate?</a>
55    <li><a href="#hosts">Which host architectures are supported?</a>    <li><a href="#hosts">Which host architectures are supported?</a>
   <li><a href="#translation">What kind of translation does GXemul use?</a>  
56    <li><a href="#accuracy">Emulation accuracy</a>    <li><a href="#accuracy">Emulation accuracy</a>
57    <li><a href="#emulmodes">Which machines does GXemul emulate?</a>    <li><a href="#emulmodes">Which machines does GXemul emulate?</a>
58  </ul>  </ul>
# Line 236  that can be considered "working" in the Line 235  that can be considered "working" in the
235  GXemul should compile and run on any modern host architecture (64-bit or  GXemul should compile and run on any modern host architecture (64-bit or
236  32-bit word-length).  32-bit word-length).
237    
238  <p>Note: The dynamic translation engine does <i>not</i> require backends  <p>Note: The <a href="translation.html">dynamic translation</a> engine
239  for native code generation to be written for each individual host  does <i>not</i> require backends for native code generation to be written
240  architecture; the "intermediate representation" that the dyntrans system  for each individual host architecture; the intermediate representation
241  uses can be executed on any host architecture.  that the dyntrans system uses can be executed on any host architecture.
   
   
   
   
   
 <p><br>  
 <a name="translation"></a>  
 <h3>What kind of translation does GXemul use?</h3>  
   
 <b>Static vs. dynamic:</b>  
   
 <p>In order to support guest operating systems, which can overwrite old  
 code pages in memory with new code, it is necessary to translate code  
 dynamically. It is not possible to do a "one-pass" (static) translation.  
 Self-modifying code and Just-in-Time compilers running inside  
 the emulator are other things that would not work with a static  
 translator. GXemul is a dynamic translator. However, it does not  
 necessarily translate into native code, like many other emulators.  
   
 <p><b>"Runnable" Intermediate Representation:</b>  
   
 <p>Dynamic translators usually translate from the emulated architecture  
 (e.g. MIPS) into a kind of <i>intermediate representation</i> (IR), and then  
 to native code (e.g. AMD64 or x86 code). Since one of my main goals for  
 GXemul is to keep everything as portable as possible, I have tried to make  
 sure that the IR is something which can be executed regardless of whether  
 the final step (translation from IR to native code) has been implemented  
 or not.  
   
 <p>The IR in GXemul consists of arrays of pointers to functions, and a few  
 arguments which are passed along to those functions. The functions are  
 implemented in either manually hand-coded C, or automatically generated C.  
 In any case, this is all statically linked into the GXemul binary at link  
 time.  
   
 <p>Here is a simplified diagram of how these arrays work.  
   
 <p><center><img src="simplified_dyntrans.png"></center>  
   
 <p>There is one instruction call slot for every possible program counter  
 location. In the MIPS case, instruction words are 32 bits in length,  
 and pages are (usually) 4 KB large, resulting in 1024 instruction call  
 slots. After the last of these instruction calls, there is an additional  
 call to a special "end of page" function (which doesn't count as an executed  
 instruction). This function switches to the first instruction  
 on the next virtual page (which might cause exceptions, etc).  
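 <p>The layout described above can be sketched in C roughly as follows. This is only an illustration, not GXemul's actual source: the struct layouts, the register-index arguments, and the names <tt>run_page</tt> and <tt>demo_run_one_page</tt> are assumptions made for the example, and the "end of page" function here simply stops the loop instead of switching to the next virtual page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 4 KB page / 4-byte MIPS instruction words = 1024 slots per page. */
#define IC_ENTRIES_PER_PAGE 1024

struct cpu;
struct instr_call;
typedef void (*instr_fn)(struct cpu *, struct instr_call *);

/* One instruction call slot: a function pointer plus a few arguments. */
struct instr_call {
    instr_fn f;
    size_t   arg[3];   /* hypothetical: register indices and an immediate */
};

/* Hypothetical, heavily reduced CPU state. */
struct cpu {
    uint64_t gpr[32];
    struct instr_call *next_ic;   /* next slot to execute */
    uint64_t n_executed;          /* emulated instructions executed so far */
    int running;
};

static void instr_nop(struct cpu *cpu, struct instr_call *ic)
{
    (void)cpu; (void)ic;
}

/* gpr[arg[1]] = sign-extended 32-bit sum of gpr[arg[0]] and arg[2]. */
static void instr_addiu(struct cpu *cpu, struct instr_call *ic)
{
    uint32_t sum = (uint32_t)cpu->gpr[ic->arg[0]] + (uint32_t)ic->arg[2];
    cpu->gpr[ic->arg[1]] = (uint64_t)(int64_t)(int32_t)sum;
}

/* Extra slot after the last instruction; not counted as an instruction. */
static void instr_end_of_page(struct cpu *cpu, struct instr_call *ic)
{
    (void)ic;
    cpu->n_executed--;   /* undo the count added by the run loop */
    cpu->running = 0;    /* a real emulator would move to the next page */
}

/* Core loop: fetch the next slot and call through its function pointer. */
void run_page(struct cpu *cpu, struct instr_call *page)
{
    assert(page != NULL);
    cpu->next_ic = page;
    cpu->running = 1;
    while (cpu->running) {
        struct instr_call *ic = cpu->next_ic++;
        cpu->n_executed++;
        ic->f(cpu, ic);
    }
}

/* Builds one page (addiu in slot 0, nops elsewhere) and runs it. */
uint64_t demo_run_one_page(struct cpu *cpu)
{
    static struct instr_call page[IC_ENTRIES_PER_PAGE + 1];
    for (size_t i = 0; i < IC_ENTRIES_PER_PAGE; i++)
        page[i].f = instr_nop;
    page[0].f = instr_addiu;
    page[0].arg[0] = 4;   /* source register index */
    page[0].arg[1] = 2;   /* destination register index */
    page[0].arg[2] = 5;   /* immediate */
    page[IC_ENTRIES_PER_PAGE].f = instr_end_of_page;
    run_page(cpu, page);
    return cpu->n_executed;
}
```

 <p>Note that the run loop contains no decoding at all: translation fills in the slots once, and execution afterwards is one indirect call per emulated instruction.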
   
 <p>The complexity of individual instructions varies. A simple example of  
 what an instruction can look like is the MIPS <tt>addiu</tt> instruction:  
 <pre>  
         X(addiu)  
         {  
                 reg(ic->arg[1]) = (int32_t)  
                     ((int32_t)reg(ic->arg[0]) + (int32_t)ic->arg[2]);  
         }  
 </pre>  
   
 <p>It stores the result of a 32-bit addition of the register at arg[0]  
 with the immediate value arg[2] (treating both as signed 32-bit  
 integers) into register arg[1]. If the emulated CPU is a 64-bit CPU,  
 then this will store a correctly sign-extended value into arg[1].  
 If it is a 32-bit CPU, then only the lowest 32 bits will be stored,  
 and the high part ignored. <tt>X(addiu)</tt> is expanded to  
 <tt>mips_instr_addiu</tt> in the 64-bit case, and <tt>mips32_instr_addiu</tt>  
 in the 32-bit case. Both are compiled into the GXemul executable; no code  
 is created during run-time.  
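 <p>The difference between the two variants can be illustrated with a small stand-alone sketch. The macro <tt>DEFINE_ADDIU</tt> and the <tt>_sketch</tt> function names are hypothetical; they only mimic the idea of generating a 64-bit and a 32-bit variant from one piece of source, and are not how <tt>X()</tt> is actually defined in GXemul.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical template: emits one addiu variant per register width.
 * The addition is done on unsigned 32-bit values to avoid signed
 * overflow; converting the result back to int32_t is
 * implementation-defined in C, but wraps as expected on common hosts.
 */
#define DEFINE_ADDIU(name, reg_t)                          \
    void name(const reg_t *rs, reg_t *rt, int32_t imm)     \
    {                                                      \
        assert(rs != 0 && rt != 0);                        \
        uint32_t sum = (uint32_t)*rs + (uint32_t)imm;      \
        *rt = (reg_t)(int32_t)sum;                         \
    }

/* 64-bit registers: the 32-bit result is sign-extended on store. */
DEFINE_ADDIU(mips_instr_addiu_sketch, int64_t)

/* 32-bit registers: only the low 32 bits exist, so nothing to extend. */
DEFINE_ADDIU(mips32_instr_addiu_sketch, int32_t)
```

 <p>For example, adding 1 to 0x7fffffff leaves 0xffffffff80000000 in a 64-bit register and 0x80000000 in a 32-bit one.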
   
 <p>Here are examples of what the <tt>addiu</tt> instruction actually  
 looks like when it is compiled, on various host architectures:  
   
 <p><center><table border="0">  
     <tr><td><b>GCC 4.0.1 on Alpha:</b></td>  
         <td width="35"></td><td></td>  
     <tr>  
         <td valign="top">  
 <pre>mips_instr_addiu:  
      ldq     t1,8(a1)  
      ldq     t2,24(a1)  
      ldq     t3,16(a1)  
      ldq     t0,0(t1)  
      addl    t0,t2,t0  
      stq     t0,0(t3)  
      ret</pre>  
         </td>  
         <td></td>  
         <td valign="top">  
 <pre>mips32_instr_addiu:  
      ldq     t2,8(a1)  
      ldq     t0,24(a1)  
      ldq     t3,16(a1)  
      ldl     t1,0(t2)  
      addq    t0,t1,t0  
      stl     t0,0(t3)  
      ret</pre>  
         </td>  
     </tr>  
   
     <tr><td><b><br>GCC 3.4.4 on AMD64:</b></td>  
     <tr>  
         <td valign="top">  
 <pre>mips_instr_addiu:  
      mov    0x8(%rsi),%rdx  
      mov    0x18(%rsi),%rax  
      mov    0x10(%rsi),%rcx  
      add    (%rdx),%eax  
      cltq  
      mov    %rax,(%rcx)  
      retq</pre>  
         </td>  
         <td></td>  
         <td valign="top">  
 <pre>mips32_instr_addiu:  
      mov    0x8(%rsi),%rcx  
      mov    0x10(%rsi),%rdx  
      mov    (%rcx),%eax  
      add    0x18(%rsi),%eax  
      mov    %eax,(%rdx)  
      retq</pre>  
         </td>  
     </tr>  
   
     <tr><td><b><br>GCC 4.0.1 on i386:</b></td>  
     <tr>  
         <td valign="top">  
 <pre>mips_instr_addiu:  
      mov    0x8(%esp),%eax  
      mov    0x8(%eax),%ecx  
      mov    0x4(%eax),%edx  
      mov    0xc(%eax),%eax  
      add    (%edx),%eax  
      mov    %eax,(%ecx)  
      cltd  
      mov    %edx,0x4(%ecx)  
      ret</pre>  
         </td>  
         <td></td>  
         <td valign="top">  
 <pre>mips32_instr_addiu:  
      mov    0x8(%esp),%eax  
      mov    0x8(%eax),%ecx  
      mov    0x4(%eax),%edx  
      mov    0xc(%eax),%eax  
      add    (%edx),%eax  
      mov    %eax,(%ecx)  
      ret</pre>  
         </td>  
     </tr>  
 </table></center>  
   
 <p>On 64-bit hosts, the two variants do not differ much, but on 32-bit hosts  
 (and to some extent on AMD64), the 32-bit variant is enough shorter to make  
 a separate 32-bit implementation worthwhile.  
   
   
 <p><b>Performance:</b>  
   
 <p>The performance of using this kind of runnable IR is obviously lower  
 than what can be achieved by emulators using native code generation, but  
 can be significantly higher than using a naive fetch-decode-execute  
 interpretation loop. In my opinion, using a runnable IR is an interesting  
 compromise.  
   
 <p>The overhead per emulated instruction is usually about 10 host  
 instructions or fewer. This depends heavily on your  
 host architecture and what compiler and compiler switches you are using.  
 Added to this instruction count is (of course) also the C code used to  
 implement each specific instruction.  
   
 <p><b>Instruction Combinations:</b>  
   
 <p>Short, common instruction sequences can sometimes be replaced by a  
 "compound" instruction. An example could be a compare instruction followed  
 by a conditional branch instruction. The advantages of instruction  
 combinations are that  
 <ul>  
   <li>the amortized overhead per instruction is slightly reduced, and  
   <p>  
   <li>the host's compiler can do a good job of optimizing the common  
         instruction sequence.  
 </ul>  
   
 <p>The special cases where instruction combinations give the most gain  
 are in the cores of string/memory manipulation functions such as  
 <tt>memset()</tt> or <tt>strlen()</tt>. The core loop can then (at least  
 to some extent) be replaced by a native call to the equivalent function.  
   
 <p>The implementations of compound instructions still keep track of the  
 number of executed instructions, etc. When single-stepping, these  
 translations are invalidated, and replaced by normal instruction calls  
 (one per emulated instruction).  
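 <p>A minimal sketch of the idea, assuming a made-up compound of two consecutive <tt>addiu</tt> instructions. All names here (<tt>try_combine</tt>, <tt>uncombine</tt>, <tt>run_slots</tt>, the return-value convention) are hypothetical and chosen for the example only; GXemul's real combination code differs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct cpu { uint32_t gpr[32]; uint64_t n_executed; };
struct instr_call;
/* In this sketch each function returns the number of slots it consumed. */
typedef int (*instr_fn)(struct cpu *, struct instr_call *);
struct instr_call { instr_fn f; size_t arg[3]; };

/* Plain instruction: gpr[arg[1]] = gpr[arg[0]] + arg[2]. */
static int instr_addiu(struct cpu *cpu, struct instr_call *ic)
{
    cpu->gpr[ic->arg[1]] = cpu->gpr[ic->arg[0]] + (uint32_t)ic->arg[2];
    cpu->n_executed++;
    return 1;
}

/* Compound: performs this slot's addiu and the next slot's in one call,
   while still counting two executed instructions. */
static int instr_addiu_addiu(struct cpu *cpu, struct instr_call *ic)
{
    cpu->gpr[ic[0].arg[1]] = cpu->gpr[ic[0].arg[0]] + (uint32_t)ic[0].arg[2];
    cpu->gpr[ic[1].arg[1]] = cpu->gpr[ic[1].arg[0]] + (uint32_t)ic[1].arg[2];
    cpu->n_executed += 2;
    return 2;
}

/* Replace slot 0 by the compound if slots 0 and 1 form the pattern.
   The caller must make sure ic[1] exists. Returns 1 if combined. */
int try_combine(struct instr_call *ic)
{
    if (ic[0].f == instr_addiu && ic[1].f == instr_addiu) {
        ic[0].f = instr_addiu_addiu;
        return 1;
    }
    return 0;
}

/* For single-stepping: go back to one call per emulated instruction. */
void uncombine(struct instr_call *ic)
{
    if (ic[0].f == instr_addiu_addiu)
        ic[0].f = instr_addiu;
}

void run_slots(struct cpu *cpu, struct instr_call *ic, size_t n_slots)
{
    assert(ic != NULL);
    size_t i = 0;
    while (i < n_slots)
        i += (size_t)ic[i].f(cpu, &ic[i]);
}

/* Runs "r1 += 3; r1 += 4", either combined or single-stepped. */
uint64_t demo_compound(struct cpu *cpu, int single_step)
{
    struct instr_call code[2] = {
        { instr_addiu, { 1, 1, 3 } },
        { instr_addiu, { 1, 1, 4 } },
    };
    try_combine(&code[0]);
    if (single_step)
        uncombine(&code[0]);
    run_slots(cpu, code, 2);
    return cpu->n_executed;
}
```

 <p>Both paths leave the same register contents and the same instruction count; the combined path just gets there with one indirect call instead of two.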
   
 <p><b>Native Code Back-ends:</b>  
   
 <p>In theory, it would be possible to implement native code generation,  
 similar to what is used in high-performance emulators such as QEMU,  
 as long as the generated code abides by the C ABI on the host.  
   
 <p>However, since I wanted to make sure that GXemul works without such  
 native code back-ends, there are no implemented backends in this release.  
   
 <p>(There is a place-holder in the source code for native code generation,  
 which can be used for experiments, but it does not contain any working  
 code at the moment.)  
   
242    
243    
244    
