10 |
|
|
11 |
<!-- |
<!-- |
12 |
|
|
13 |
$Id: intro.html,v 1.73 2006/02/18 14:02:19 debug Exp $ |
$Id: intro.html,v 1.87 2006/06/23 10:00:41 debug Exp $ |
14 |
|
|
15 |
Copyright (C) 2003-2006 Anders Gavare. All rights reserved. |
Copyright (C) 2003-2006 Anders Gavare. All rights reserved. |
16 |
|
|
52 |
<li><a href="#build">How to compile/build the emulator</a> |
<li><a href="#build">How to compile/build the emulator</a> |
53 |
<li><a href="#run">How to run the emulator</a> |
<li><a href="#run">How to run the emulator</a> |
54 |
<li><a href="#cpus">Which processor architectures does GXemul emulate?</a> |
<li><a href="#cpus">Which processor architectures does GXemul emulate?</a> |
55 |
|
<li><a href="#hosts">Which host architectures are supported?</a> |
56 |
|
<li><a href="#translation">What kind of translation does GXemul use?</a> |
57 |
<li><a href="#accuracy">Emulation accuracy</a> |
<li><a href="#accuracy">Emulation accuracy</a> |
58 |
<li><a href="#emulmodes">Which machines does GXemul emulate?</a> |
<li><a href="#emulmodes">Which machines does GXemul emulate?</a> |
59 |
</ul> |
</ul> |
73 |
hardware components are emulated well enough to let unmodified operating |
hardware components are emulated well enough to let unmodified operating |
74 |
systems (e.g. NetBSD) run as if they were running on a real machine. |
systems (e.g. NetBSD) run as if they were running on a real machine. |
75 |
|
|
76 |
<p>The processor architecture best emulated by GXemul is MIPS, but other |
<p>Devices and processors (ARM, MIPS, PowerPC) are not simulated with 100% |
77 |
architectures such as ARM and PowerPC are also partially emulated. |
accuracy. They are only ``faked'' well enough to allow guest operating |
78 |
|
systems to run without complaining too much. Still, the emulator could be of |
79 |
<p>Devices and CPUs are not simulated with 100% accuracy. They are only |
interest for academic research and experiments, such as when learning how |
80 |
``faked'' well enough to allow guest operating systems to run without |
to write operating system code. |
|
complaining too much. Still, the emulator could be of interest for |
|
|
academic research and experiments, such as when learning how to write |
|
|
operating system code. |
|
81 |
|
|
82 |
<p>The emulator is written in C, does not depend on third-party libraries, |
<p>The emulator is written in C, does not depend on third-party libraries, |
83 |
and should compile and run on most 64-bit and 32-bit Unix-like systems. |
and should compile and run on most 64-bit and 32-bit Unix-like systems. |
158 |
<p>The emulator's performance is highly dependent on both runtime settings |
<p>The emulator's performance is highly dependent on both runtime settings |
159 |
and on compiler settings, so you might want to experiment with different |
and on compiler settings, so you might want to experiment with different |
160 |
CC and CFLAGS environment variable values. For example, on an AMD Athlon |
CC and CFLAGS environment variable values. For example, on an AMD Athlon |
161 |
host, you might want to try setting <tt>CFLAGS</tt> to <tt>-march=athlon |
host, you might want to try setting <tt>CFLAGS</tt> to <tt>-march=athlon</tt> |
162 |
-O3</tt> before running <tt>configure</tt>. |
before running <tt>configure</tt>. |
163 |
|
|
164 |
|
|
165 |
|
|
212 |
<a name="cpus"></a> |
<a name="cpus"></a> |
213 |
<h3>Which processor architectures does GXemul emulate?</h3> |
<h3>Which processor architectures does GXemul emulate?</h3> |
214 |
|
|
215 |
<h4>MIPS:</h4> |
The architectures that are emulated well enough to let at least one |
216 |
|
guest operating system run (per architecture) are ARM, MIPS, and |
217 |
|
PowerPC. |
218 |
|
|
219 |
|
|
220 |
|
|
221 |
|
|
222 |
|
|
223 |
|
<p><br> |
224 |
|
<a name="hosts"></a> |
225 |
|
<h3>Which host architectures are supported?</h3> |
226 |
|
|
227 |
|
As of release 0.4.0 of GXemul, the old binary translation subsystem, which |
228 |
|
was used for emulation of MIPS processors on Alpha and i386 hosts, has |
229 |
|
been removed. The current dynamic translation subsystem should work on any |
230 |
|
host. |
231 |
|
|
232 |
|
|
233 |
|
|
234 |
|
|
|
Emulation of R4000, which is a 64-bit CPU, was my initial goal. |
|
|
R2000/R3000-like CPUs (32-bit), R1x000, and generic MIPS32/MIPS64-style |
|
|
CPUs are also emulated, and are hopefully almost as stable as the R4000 |
|
|
emulation. Several guest operating systems for MIPS can run inside |
|
|
the emulator. |
|
235 |
|
|
236 |
<p>(For MIPS emulation, I have written an experimental dynamic binary |
<p><br> |
237 |
translation subsystem, for Alpha and i386 hosts. This gives higher total |
<a name="translation"></a> |
238 |
performance than interpreting one instruction at a time and executing it. |
<h3>What kind of translation does GXemul use?</h3> |
|
If you wish to disable bintrans, add <b>-B</b> to the command line.) |
|
239 |
|
|
240 |
<h4>ARM:</h4> |
<b>Static vs. dynamic:</b> |
241 |
|
|
242 |
ARM emulation is good enough to run NetBSD/cats, OpenBSD/cats, and |
<p>In order to support guest operating systems, which can overwrite old |
243 |
NetBSD/evbarm, but it is not as tested or fine-tuned as the MIPS emulation |
code pages in memory with new code, it is necessary to translate code |
244 |
mode. |
dynamically. It is not possible to do a "one-pass" (static) translation. |
245 |
|
Self-modifying code and Just-in-Time compilers running inside |
246 |
|
the emulator are other things that would not work with a static |
247 |
|
translator. GXemul is a dynamic translator. However, it does not |
248 |
|
necessarily translate into native code, as many other emulators do. |
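<p>To illustrate why a one-pass translation is not enough, here is a minimal sketch in C of a translation cache that is filled lazily and dropped again when the guest overwrites code. The names (<tt>struct code_page</tt>, <tt>fetch_translated</tt>, <tt>guest_write</tt>) and the per-word invalidation granularity are assumptions made for this example, not GXemul's actual data structures.

```c
#include <stdint.h>

#define N_WORDS 4   /* tiny page, for illustration only */

/* Hypothetical page of guest code together with its cached translation. */
struct code_page {
    uint32_t mem[N_WORDS];      /* guest memory holding instruction words */
    uint32_t decoded[N_WORDS];  /* cached "translation" of each word */
    uint8_t  valid[N_WORDS];    /* 1 if decoded[i] matches mem[i] */
    unsigned n_translations;    /* how many times translation has run */
};

/* Translate on demand. The "translation" here is only a copy of the
 * instruction word; a real translator would decode it. */
static uint32_t fetch_translated(struct code_page *p, unsigned i)
{
    if (!p->valid[i]) {
        p->decoded[i] = p->mem[i];
        p->valid[i] = 1;
        p->n_translations++;
    }
    return p->decoded[i];
}

/* A guest store into the page: the cached translation becomes stale
 * and has to be dropped, so the next fetch translates again. */
static void guest_write(struct code_page *p, unsigned i, uint32_t word)
{
    p->mem[i] = word;
    p->valid[i] = 0;
}
```

Fetching the same slot twice translates it once; after <tt>guest_write()</tt>, the next fetch translates the new word. A static translator has no point at which it could do that second translation.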
249 |
|
|
250 |
|
<p><b>"Runnable" Intermediate Representation:</b> |
251 |
|
|
252 |
|
<p>Dynamic translators usually translate from the emulated architecture |
253 |
|
(e.g. MIPS) into a kind of <i>intermediate representation</i> (IR), and then |
254 |
|
to native code (e.g. AMD64 or x86 code). Since one of my main goals for |
255 |
|
GXemul is to keep everything as portable as possible, I have tried to make |
256 |
|
sure that the IR is something which can be executed regardless of whether |
257 |
|
the final step (translation from IR to native code) has been implemented |
258 |
|
or not. |
259 |
|
|
260 |
|
<p>The IR in GXemul consists of arrays of pointers to functions, and a few |
261 |
|
arguments which are passed along to those functions. The functions are |
262 |
|
implemented in either hand-written C or automatically generated C. |
263 |
|
In any case, this is all statically linked into the GXemul binary at link |
264 |
|
time. |
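<p>A minimal sketch in C of what such a runnable IR could look like. The names (<tt>struct instr_call</tt>, <tt>struct cpu_sketch</tt>, <tt>instr_add_imm</tt>, <tt>run_calls</tt>) and the use of register indices as arguments are assumptions for illustration only; they are not taken from the GXemul source.

```c
#include <stddef.h>
#include <stdint.h>

struct cpu_sketch;   /* hypothetical CPU state, defined below */

/* One IR entry: a pointer to a C function plus a few arguments. */
struct instr_call {
    void (*f)(struct cpu_sketch *cpu, struct instr_call *ic);
    size_t arg[3];   /* meaning depends on f (assumed layout) */
};

struct cpu_sketch {
    int64_t  gpr[32];      /* emulated general-purpose registers */
    uint64_t n_executed;   /* number of emulated instructions run */
};

/* Example instruction: gpr[arg[1]] = gpr[arg[0]] + arg[2]. */
static void instr_add_imm(struct cpu_sketch *cpu, struct instr_call *ic)
{
    cpu->gpr[ic->arg[1]] = cpu->gpr[ic->arg[0]] + (int64_t)ic->arg[2];
}

/* Execute n consecutive entries by calling through the pointers.
 * No code is generated at run time; every f is an ordinary C
 * function that was compiled into the binary. */
static void run_calls(struct cpu_sketch *cpu, struct instr_call *ic, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        ic[i].f(cpu, &ic[i]);
        cpu->n_executed++;
    }
}
```

"Translating" an emulated instruction then means filling in one <tt>struct instr_call</tt>: choosing the function and storing its arguments.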
265 |
|
|
266 |
|
<p>Here is a simplified diagram of how these arrays work. |
267 |
|
|
268 |
|
<p><center><img src="simplified_dyntrans.png"></center> |
269 |
|
|
270 |
|
<p>There is one instruction call slot for every possible program counter |
271 |
|
location. In the MIPS case, instruction words are 32 bits in length, |
272 |
|
and pages are (usually) 4 KB in size, resulting in 1024 instruction call |
273 |
|
slots. After the last of these instruction calls, there is an additional |
274 |
|
call to a special "end of page" function (which doesn't count as an executed |
275 |
|
instruction). This function switches to the first instruction |
276 |
|
on the next virtual page (which might cause exceptions, etc). |
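<p>The slot arithmetic above can be sketched in C as follows. All names here (<tt>struct translated_page</tt>, <tt>slot_index</tt>, <tt>run_page</tt>, and so on) are hypothetical, and the end-of-page handler in this sketch simply stops instead of switching to the next virtual page.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE  4096   /* assumed 4 KB page */
#define SKETCH_INSTR_SIZE 4      /* 32-bit MIPS instruction words */
#define SLOTS_PER_PAGE    (SKETCH_PAGE_SIZE / SKETCH_INSTR_SIZE)   /* 1024 */

struct page_cpu;

struct slot {
    void (*f)(struct page_cpu *cpu, struct slot *s);
};

/* One translated page: 1024 instruction slots plus one end-of-page slot. */
struct translated_page {
    struct slot slots[SLOTS_PER_PAGE + 1];
};

struct page_cpu {
    uint64_t pc;           /* emulated program counter */
    uint64_t n_executed;   /* emulated instructions executed so far */
    int running;
    struct slot *next;     /* next slot to call */
};

/* Placeholder instruction: advances the pc and counts as executed. */
static void slot_nop(struct page_cpu *cpu, struct slot *s)
{
    (void)s;
    cpu->pc += SKETCH_INSTR_SIZE;
    cpu->n_executed++;
}

/* End of page: not counted as an executed instruction. */
static void slot_end_of_page(struct page_cpu *cpu, struct slot *s)
{
    (void)s;
    cpu->running = 0;
}

static void init_page(struct translated_page *p)
{
    for (int i = 0; i < SLOTS_PER_PAGE; i++)
        p->slots[i].f = slot_nop;
    p->slots[SLOTS_PER_PAGE].f = slot_end_of_page;
}

/* Map a program counter to its slot index within the page. */
static unsigned slot_index(uint64_t pc)
{
    return (unsigned)((pc & (SKETCH_PAGE_SIZE - 1)) / SKETCH_INSTR_SIZE);
}

/* Start at the slot for the current pc and run until the page ends. */
static void run_page(struct page_cpu *cpu, struct translated_page *p)
{
    cpu->next = &p->slots[slot_index(cpu->pc)];
    cpu->running = 1;
    while (cpu->running) {
        struct slot *s = cpu->next++;
        s->f(cpu, s);
    }
}
```

Because the extra slot always follows the last instruction slot, the inner loop does not need a separate "am I at the end of the page?" check after every instruction.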
277 |
|
|
278 |
<h4>PowerPC:</h4> |
<p>The complexity of individual instructions varies. A simple example of |
279 |
|
what an instruction can look like is the MIPS <tt>addiu</tt> instruction: |
280 |
|
<pre> |
281 |
|
X(addiu) |
282 |
|
{ |
283 |
|
reg(ic->arg[1]) = (int32_t) |
284 |
|
((int32_t)reg(ic->arg[0]) + (int32_t)ic->arg[2]); |
285 |
|
} |
286 |
|
</pre> |
287 |
|
|
288 |
PowerPC emulation is still in its beginning stages, but good enough |
<p>It stores the result of a 32-bit addition of the register at arg[0] |
289 |
to run NetBSD/prep 2.1. |
with the immediate value arg[2] (treating both as signed 32-bit |
290 |
|
integers) into register arg[1]. If the emulated CPU is a 64-bit CPU, |
291 |
|
then this will store a correctly sign-extended value into arg[1]. |
292 |
|
If it is a 32-bit CPU, then only the lowest 32 bits will be stored, |
293 |
|
and the high part ignored. <tt>X(addiu)</tt> is expanded to |
294 |
|
<tt>mips_instr_addiu</tt> in the 64-bit case, and <tt>mips32_instr_addiu</tt> |
295 |
|
in the 32-bit case. Both are compiled into the GXemul executable; no code |
296 |
|
is created during run-time. |
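<p>The difference between the two variants can be sketched in standalone C. The function names <tt>mips_instr_addiu</tt> and <tt>mips32_instr_addiu</tt> come from the text above; everything else (<tt>struct addiu_args</tt>, the <tt>X64</tt>/<tt>X32</tt> macros standing in for the two expansions of <tt>X()</tt>, and the <tt>add32</tt> helper) is an assumption made for this example. The addition is done in unsigned arithmetic so that wraparound is well defined in C.

```c
#include <stdint.h>

/* Hypothetical argument record; the real layout is not shown in the text. */
struct addiu_args {
    void   *rs;    /* points to the source register */
    void   *rt;    /* points to the destination register */
    int32_t imm;   /* immediate operand */
};

/* Stand-ins for the two expansions of X(): one name per register width. */
#define X64(n) mips_instr_##n
#define X32(n) mips32_instr_##n

/* 32-bit add with wraparound, computed in unsigned arithmetic. */
static int32_t add32(int32_t a, int32_t b)
{
    return (int32_t)((uint32_t)a + (uint32_t)b);
}

/* 64-bit CPU: the 32-bit result is sign-extended into a 64-bit register. */
static void X64(addiu)(struct addiu_args *a)
{
    int64_t *rs = a->rs, *rt = a->rt;
    *rt = add32((int32_t)*rs, a->imm);
}

/* 32-bit CPU: registers are 32 bits wide, so the result is stored as is. */
static void X32(addiu)(struct addiu_args *a)
{
    int32_t *rs = a->rs, *rt = a->rt;
    *rt = add32(*rs, a->imm);
}
```

The 32-bit variant touches half as much register state per instruction, which is where the shorter i386 code in a 32-bit build would come from.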
297 |
|
|
298 |
|
<p>Here are examples of what the <tt>addiu</tt> instruction actually |
299 |
|
looks like when it is compiled, on various host architectures: |
300 |
|
|
301 |
|
<p><center><table border="0"> |
302 |
|
<tr><td><b>GCC 4.0.1 on Alpha:</b></td> |
303 |
|
<td width="35"></td><td></td> |
304 |
|
<tr> |
305 |
|
<td valign="top"> |
306 |
|
<pre>mips_instr_addiu: |
307 |
|
ldq t1,8(a1) |
308 |
|
ldq t2,24(a1) |
309 |
|
ldq t3,16(a1) |
310 |
|
ldq t0,0(t1) |
311 |
|
addl t0,t2,t0 |
312 |
|
stq t0,0(t3) |
313 |
|
ret</pre> |
314 |
|
</td> |
315 |
|
<td></td> |
316 |
|
<td valign="top"> |
317 |
|
<pre>mips32_instr_addiu: |
318 |
|
ldq t2,8(a1) |
319 |
|
ldq t0,24(a1) |
320 |
|
ldq t3,16(a1) |
321 |
|
ldl t1,0(t2) |
322 |
|
addq t0,t1,t0 |
323 |
|
stl t0,0(t3) |
324 |
|
ret</pre> |
325 |
|
</td> |
326 |
|
</tr> |
327 |
|
|
328 |
|
<tr><td><b><br>GCC 3.4.4 on AMD64:</b></td> |
329 |
|
<tr> |
330 |
|
<td valign="top"> |
331 |
|
<pre>mips_instr_addiu: |
332 |
|
mov 0x8(%rsi),%rdx |
333 |
|
mov 0x18(%rsi),%rax |
334 |
|
mov 0x10(%rsi),%rcx |
335 |
|
add (%rdx),%eax |
336 |
|
cltq |
337 |
|
mov %rax,(%rcx) |
338 |
|
retq</pre> |
339 |
|
</td> |
340 |
|
<td></td> |
341 |
|
<td valign="top"> |
342 |
|
<pre>mips32_instr_addiu: |
343 |
|
mov 0x8(%rsi),%rcx |
344 |
|
mov 0x10(%rsi),%rdx |
345 |
|
mov (%rcx),%eax |
346 |
|
add 0x18(%rsi),%eax |
347 |
|
mov %eax,(%rdx) |
348 |
|
retq</pre> |
349 |
|
</td> |
350 |
|
</tr> |
351 |
|
|
352 |
|
<tr><td><b><br>GCC 4.0.1 on i386:</b></td> |
353 |
|
<tr> |
354 |
|
<td valign="top"> |
355 |
|
<pre>mips_instr_addiu: |
356 |
|
mov 0x8(%esp),%eax |
357 |
|
mov 0x8(%eax),%ecx |
358 |
|
mov 0x4(%eax),%edx |
359 |
|
mov 0xc(%eax),%eax |
360 |
|
add (%edx),%eax |
361 |
|
mov %eax,(%ecx) |
362 |
|
cltd |
363 |
|
mov %edx,0x4(%ecx) |
364 |
|
ret</pre> |
365 |
|
</td> |
366 |
|
<td></td> |
367 |
|
<td valign="top"> |
368 |
|
<pre>mips32_instr_addiu: |
369 |
|
mov 0x8(%esp),%eax |
370 |
|
mov 0x8(%eax),%ecx |
371 |
|
mov 0x4(%eax),%edx |
372 |
|
mov 0xc(%eax),%eax |
373 |
|
add (%edx),%eax |
374 |
|
mov %eax,(%ecx) |
375 |
|
ret</pre> |
376 |
|
</td> |
377 |
|
</tr> |
378 |
|
</table></center> |
379 |
|
|
380 |
|
<p>On 64-bit hosts, there is not much difference, but on 32-bit hosts (and |
381 |
|
to some extent on AMD64), the difference is enough to make a separate 32-bit variant worthwhile. |
382 |
|
|
383 |
|
|
384 |
|
<p><b>Performance:</b> |
385 |
|
|
386 |
|
<p>The performance of using this kind of runnable IR is obviously lower |
387 |
|
than what can be achieved by emulators using native code generation, but |
388 |
|
can be significantly higher than using a naive fetch-decode-execute |
389 |
|
interpretation loop. In my opinion, using a runnable IR is an interesting |
390 |
|
compromise. |
391 |
|
|
392 |
|
<p>The overhead per emulated instruction is usually at or below |
393 |
|
approximately 10 host instructions. This is very much dependent on your |
394 |
|
host architecture and what compiler and compiler switches you are using. |
395 |
|
Added to this instruction count is (of course) also the C code used to |
396 |
|
implement each specific instruction. |
397 |
|
|
398 |
|
<p><b>Instruction Combinations:</b> |
399 |
|
|
400 |
|
<p>Short, common instruction sequences can sometimes be replaced by a |
401 |
|
"compound" instruction. An example could be a compare instruction followed |
402 |
|
by a conditional branch instruction. The advantages of instruction |
403 |
|
combinations are that |
404 |
|
<ul> |
405 |
|
<li>the amortized overhead per instruction is slightly reduced, and |
406 |
|
<p> |
407 |
|
<li>the host's compiler can do a good job of optimizing the common |
408 |
|
instruction sequence. |
409 |
|
</ul> |
410 |
|
|
411 |
|
<p>The special cases where instruction combinations give the most gain |
412 |
|
are in the cores of string/memory manipulation functions such as |
413 |
|
<tt>memset()</tt> or <tt>strlen()</tt>. The core loop can then (at least |
414 |
|
to some extent) be replaced by a native call to the equivalent function. |
415 |
|
|
416 |
|
<p>The implementations of compound instructions still keep track of the |
417 |
|
number of executed instructions, etc. When single-stepping, these |
418 |
|
translations are invalidated, and replaced by normal instruction calls |
419 |
|
(one per emulated instruction). |
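<p>A minimal sketch in C of a compound instruction for the compare-then-branch case mentioned above. The structure and function names (<tt>struct combo_cpu</tt>, <tt>instr_slt</tt>, <tt>instr_bne_zero</tt>, <tt>instr_slt_bne_zero</tt>) are hypothetical, and a real branch would also update the program counter.

```c
#include <stdint.h>

struct combo_cpu {
    int32_t  gpr[32];        /* emulated registers */
    uint64_t n_executed;     /* emulated instructions executed so far */
    int      branch_taken;   /* outcome of the most recent branch */
};

/* Separate instruction 1: set rd to 1 if rs < rt, else 0. */
static void instr_slt(struct combo_cpu *cpu, int rd, int rs, int rt)
{
    cpu->gpr[rd] = cpu->gpr[rs] < cpu->gpr[rt];
    cpu->n_executed++;
}

/* Separate instruction 2: branch if rs is non-zero. */
static void instr_bne_zero(struct combo_cpu *cpu, int rs)
{
    cpu->branch_taken = cpu->gpr[rs] != 0;
    cpu->n_executed++;
}

/* Compound instruction: one call does the work of both, but still
 * accounts for two emulated instructions. */
static void instr_slt_bne_zero(struct combo_cpu *cpu, int rd, int rs, int rt)
{
    int less = cpu->gpr[rs] < cpu->gpr[rt];
    cpu->gpr[rd] = less;
    cpu->branch_taken = less;
    cpu->n_executed += 2;
}
```

Both paths leave the CPU state identical, including the instruction count, so the compound form can be swapped back to the two normal calls (for example when single-stepping) without any visible difference.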
420 |
|
|
421 |
|
<p><b>Native Code Back-ends: (not in this release)</b> |
422 |
|
|
423 |
|
<p>In theory, it will be possible to implement native code generation |
424 |
|
(similar to what is used in high-performance emulators such as QEMU), |
425 |
|
as long as that generated code abides by the C ABI on the host, but |
426 |
|
for now I wanted to make sure that GXemul works without such native |
427 |
|
code back-ends. For this reason, as of release 0.4.0, GXemul is |
428 |
|
completely free of native code back-ends. |
429 |
|
|
|
<p>Non-MIPS emulation modes use dynamic translation, but not recompilation |
|
|
into native code. This makes it possible to run on any host platform. |
|
430 |
|
|
431 |
|
|
432 |
|
|
437 |
<h3>Emulation accuracy:</h3> |
<h3>Emulation accuracy:</h3> |
438 |
|
|
439 |
GXemul is an instruction-level emulator; things that would happen in |
GXemul is an instruction-level emulator; things that would happen in |
440 |
several steps within a real CPU are not taken into account (eg. pipe-line |
several steps within a real CPU are not taken into account (e.g. pipeline |
441 |
stalls or out-of-order execution). Still, instruction-level accuracy seems |
stalls or out-of-order execution). Still, instruction-level accuracy seems |
442 |
to be enough to be able to run complete guest operating systems inside the |
to be enough to be able to run complete guest operating systems inside the |
443 |
emulator. |
emulator. |
444 |
|
|
445 |
<p>Caches are by default not emulated. In some cases, the existence of |
<p>The existence of instruction and data caches is "faked" to let |
446 |
caches is "faked" to let operating systems think that they are there. |
operating systems think that they are there, but for all practical |
447 |
(There is some old code for R2000/R3000 caches, but it has probably |
purposes, these caches are non-working. |
|
suffered from bitrot by now.) |
|
448 |
|
|
449 |
<p>The emulator is <i>not</i> timing-accurate. It can be run in a |
<p>The emulator is <i>not</i> timing-accurate. It can be run in a |
450 |
"deterministic" mode, <tt><b>-D</b></tt>. The meaning of deterministic is |
"deterministic" mode, <tt><b>-D</b></tt>. The meaning of deterministic is |
454 |
option. (Deterministic in this case does <i>not</i> mean that the |
option. (Deterministic in this case does <i>not</i> mean that the |
455 |
emulation will be identical to some actual real-world machine.) |
emulation will be identical to some actual real-world machine.) |
456 |
|
|
457 |
<p><font color="#ff0000">(Oops/TODO: User interaction means <i>both</i> |
<p>(Note that user interaction means <i>both</i> input to the emulated |
458 |
input to the emulated program/OS, and interacting with the emulator |
program/OS, and interaction with the emulator's debugger. Breaking into the |
459 |
itself. Breaking into the debugger and then continuing execution may |
debugger and then continuing execution may affect when/how interrupts |
460 |
affect when/how interrupts occur.)</font> |
occur.) |
461 |
|
|
462 |
|
|
463 |
|
|
473 |
|
|
474 |
<p> |
<p> |
475 |
<ul> |
<ul> |
476 |
<li><b><u>MIPS</u></b> |
<li><b><u>ARM</u></b> |
477 |
<ul> |
<ul> |
478 |
<li><b>DECstation 5000/200</b> ("3max") |
<li><b>CATS</b> (NetBSD/cats, OpenBSD/cats) |
479 |
<li><b>Acer Pica-61</b> (an ARC machine) |
<li><b>IQ80321</b> (NetBSD/evbarm) |
|
<li><b>NEC MobilePro 770, 780, 800, and 880</b> (HPCmips machines) |
|
|
<li><b>Cobalt</b> |
|
|
<li><b>Malta</b> (evbmips) |
|
|
<li><b>SGI O2 ("IP32")</b> <font color="#0000e0">(<super>*</super>)</font> |
|
480 |
</ul> |
</ul> |
481 |
<p> |
<p> |
482 |
<li><b><u>ARM</u></b> |
<li><b><u>MIPS</u></b> |
483 |
<ul> |
<ul> |
484 |
<li><b>CATS</b> |
<li><b>DECstation 5000/200</b> (NetBSD/pmax, OpenBSD/pmax, Ultrix, |
485 |
<li><b>IQ80321</b> (evbarm) |
Linux/DECstation, Sprite) |
486 |
|
<li><b>Acer Pica-61</b> (NetBSD/arc) |
487 |
|
<li><b>NEC MobilePro 770, 780, 800, and 880</b> (NetBSD/hpcmips) |
488 |
|
<li><b>Cobalt</b> (NetBSD/cobalt) |
489 |
|
<li><b>Malta</b> (NetBSD/evbmips) |
490 |
|
<li><b>SGI O2 (aka IP32)</b> <font color="#0000e0">(<sup>*</sup>)</font> |
491 |
|
(NetBSD/sgi) |
492 |
</ul> |
</ul> |
493 |
<p> |
<p> |
494 |
<li><b><u>PowerPC</u></b> |
<li><b><u>PowerPC</u></b> |
495 |
<ul> |
<ul> |
496 |
<li><b>PReP (PowerPC Reference Platform)</b> |
<li><b>IBM 6050/6070 (PReP, PowerPC Reference Platform)</b> (NetBSD/prep) |
497 |
</ul> |
</ul> |
498 |
</ul> |
</ul> |
499 |
|
|