1 |
<html><head><title>GXemul documentation: Technical details</title> |
<html><head><title>Gavare's eXperimental Emulator: Technical details</title> |
2 |
<meta name="robots" content="noarchive,nofollow,noindex"></head> |
<meta name="robots" content="noarchive,nofollow,noindex"></head> |
3 |
<body bgcolor="#f8f8f8" text="#000000" link="#4040f0" vlink="#404040" alink="#ff0000"> |
<body bgcolor="#f8f8f8" text="#000000" link="#4040f0" vlink="#404040" alink="#ff0000"> |
4 |
<table border=0 width=100% bgcolor="#d0d0d0"><tr> |
<table border=0 width=100% bgcolor="#d0d0d0"><tr> |
5 |
<td width=100% align=center valign=center><table border=0 width=100%><tr> |
<td width=100% align=center valign=center><table border=0 width=100%><tr> |
6 |
<td align="left" valign=center bgcolor="#d0efff"><font color="#6060e0" size="6"> |
<td align="left" valign=center bgcolor="#d0efff"><font color="#6060e0" size="6"> |
7 |
<b>GXemul documentation:</b></font> |
<b>Gavare's eXperimental Emulator: </b></font> |
8 |
<font color="#000000" size="6"><b>Technical details</b> |
<font color="#000000" size="6"><b>Technical details</b> |
9 |
</font></td></tr></table></td></tr></table><p> |
</font></td></tr></table></td></tr></table><p> |
10 |
|
|
11 |
<!-- |
<!-- |
12 |
|
|
13 |
$Id: technical.html,v 1.51 2005/06/04 22:47:49 debug Exp $ |
$Id: technical.html,v 1.63 2005/10/07 15:10:00 debug Exp $ |
14 |
|
|
15 |
Copyright (C) 2004-2005 Anders Gavare. All rights reserved. |
Copyright (C) 2004-2005 Anders Gavare. All rights reserved. |
16 |
|
|
40 |
--> |
--> |
41 |
|
|
42 |
|
|
43 |
|
|
44 |
<a href="./">Back to the index</a> |
<a href="./">Back to the index</a> |
45 |
|
|
46 |
<p><br> |
<p><br> |
47 |
<h2>Technical details</h2> |
<h2>Technical details</h2> |
48 |
|
|
49 |
<p> |
<p>This page describes some of the internals of GXemul. |
|
This page describes some of the internals of GXemul. |
|
|
|
|
|
<p> |
|
|
<font color="#e00000"><b>NOTE: This page is probably not |
|
|
very up-to-date by now.</b></font> |
|
50 |
|
|
51 |
<p> |
<p> |
52 |
<ul> |
<ul> |
53 |
<li><a href="#overview">Overview</a> |
<li><a href="#speed">Speed and emulation modes</a> |
|
<li><a href="#speed">Speed</a> |
|
54 |
<li><a href="#net">Networking</a> |
<li><a href="#net">Networking</a> |
55 |
<li><a href="#devices">Emulation of hardware devices</a> |
<li><a href="#devices">Emulation of hardware devices</a> |
|
<li><a href="#regtest">Regression tests</a> |
|
56 |
</ul> |
</ul> |
57 |
|
|
58 |
|
|
59 |
|
|
60 |
|
|
|
<p><br> |
|
|
<a name="overview"></a> |
|
|
<h3>Overview</h3> |
|
|
|
|
|
In simple terms, GXemul is just a simple fetch-and-execute |
|
|
loop; an instruction is fetched from memory, and executed. |
|
|
|
|
|
<p> |
|
|
In reality, a lot of things need to be handled. Before each instruction is |
|
|
executed, the emulator checks to see if any interrupts are asserted which |
|
|
are not masked away. If so, then an INT exception is generated. Exceptions |
|
|
cause the program counter to be set to a specific value, and some of the |
|
|
system coprocessor's registers to be set to values signifying what kind of |
|
|
exception it was (an interrupt exception in this case). |
|
|
|
|
|
<p> |
|
|
Reading instructions from memory is done through a TLB, a translation |
|
|
lookaside buffer. The TLB on MIPS is software controlled, which means that |
|
|
the program running inside the emulator (for example an operating system |
|
|
kernel) has to take care of manually updating the TLB. Some memory |
|
|
addresses are translated into physical addresses directly, some are |
|
|
translated into valid physical addresses via the TLB, and some memory |
|
|
references are not valid. Invalid memory references cause exceptions. |
|
|
|
|
|
<p> |
|
|
After an instruction has been read from memory, the emulator checks which |
|
|
opcode it contains and executes the instruction. Executing an instruction |
|
|
usually involves reading some register and writing some register, or perhaps a |
|
|
load from memory (or a store to memory). The program counter is increased |
|
|
for every instruction. |
|
|
|
|
|
<p> |
|
|
Some memory references point to physical addresses which are not in the |
|
|
normal RAM address space. They may point to hardware devices. If that is |
|
|
the case, then loads and stores are converted into calls to a device |
|
|
access function. The device access function is then responsible for |
|
|
handling these reads and writes. For example, a graphical framebuffer |
|
|
device may put a pixel on the screen when a value is written to it, or a |
|
|
serial controller device may output a character to stdout when written to. |
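<p>
The steps above can be sketched in C. This is a minimal illustration under
stated assumptions, not GXemul's actual code: the opcodes, the
<tt>struct cpu</tt> layout, <tt>DEV_BASE</tt>, <tt>EXC_VECTOR</tt> and all
function names are made up for the example, and TLB translation is left out
entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define RAM_SIZE   256u     /* bytes of emulated RAM (hypothetical) */
#define DEV_BASE   0x1000u  /* start of a made-up device address window */
#define EXC_VECTOR 0x80u    /* made-up exception entry point */

enum { OP_NOP = 0, OP_INC = 1, OP_STORE = 2 };  /* toy opcodes, not MIPS */

struct cpu {
    uint32_t pc;
    uint32_t reg;            /* a single general-purpose register */
    uint32_t cause;          /* stand-in for a system coprocessor register */
    uint32_t irq_pending;    /* asserted interrupt lines */
    uint32_t irq_mask;       /* 1 bits = interrupt line enabled */
    uint8_t  ram[RAM_SIZE];
    uint32_t dev_last_write; /* what the toy device last received */
};

/* Device access function: called instead of touching RAM. */
static void dev_write(struct cpu *c, uint32_t offset, uint32_t value)
{
    (void)offset;            /* the toy device has a single register */
    c->dev_last_write = value;
}

/* Stores outside the RAM range are turned into device calls. */
static void mem_write(struct cpu *c, uint32_t addr, uint32_t value)
{
    if (addr >= DEV_BASE)
        dev_write(c, addr - DEV_BASE, value);
    else if (addr < RAM_SIZE)
        c->ram[addr] = (uint8_t)value;
}

/* Run one iteration of the loop. Returns 1 if an interrupt was taken. */
static int cpu_step(struct cpu *c)
{
    /* 1. Check for asserted interrupts that are not masked away. */
    if (c->irq_pending & c->irq_mask) {
        c->cause = 1;        /* record "interrupt" as the exception kind */
        c->irq_mask = 0;     /* crude stand-in for disabling interrupts */
        c->pc = EXC_VECTOR;
        return 1;
    }

    /* 2. Fetch (two bytes per toy instruction: opcode, operand). */
    assert(c->pc + 1 < RAM_SIZE);
    uint8_t opcode  = c->ram[c->pc];
    uint8_t operand = c->ram[c->pc + 1];
    c->pc += 2;              /* the program counter advances every step */

    /* 3. Execute. */
    switch (opcode) {
    case OP_INC:   c->reg += operand;                       break;
    case OP_STORE: mem_write(c, DEV_BASE + operand, c->reg); break;
    default:       /* OP_NOP and unknown opcodes do nothing */ break;
    }
    return 0;
}
```

A caller would zero-initialize a <tt>struct cpu</tt>, place toy instructions
in <tt>ram</tt>, and call <tt>cpu_step()</tt> repeatedly.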
|
|
|
|
|
|
|
61 |
|
|
62 |
|
|
63 |
<p><br> |
<p><br> |
64 |
<a name="speed"></a> |
<a name="speed"></a> |
65 |
<h3>Speed</h3> |
<h3>Speed and emulation modes</h3> |
|
|
|
|
There are two modes in which the emulator can run, <b>a</b>) a straightforward |
|
|
loop which fetches one instruction from emulated RAM and executes it |
|
|
(described in the previous section), and <b>b</b>) |
|
|
using dynamic binary translation. |
|
|
|
|
|
<p> |
|
|
Mode <b>a</b> is very slow. On a 2.8 GHz Intel Xeon host the resulting |
|
|
emulated machine is roughly equal to a 7 MHz R3000 (or a 3.5 MHz R4000). |
|
|
The actual performance varies a lot, maybe between 5 and 10 million |
|
|
instructions per second, depending on workload. |
|
66 |
|
|
67 |
<p> |
So, how fast is GXemul? There is no short answer to this. There is |
68 |
Mode <b>b</b> ("bintrans") is still to be considered experimental, but |
especially no answer to the question <b>What is the slowdown factor?</b>, |
69 |
gives higher performance than mode <b>a</b>. It translates MIPS machine |
because the host architecture and emulated architecture can usually not be |
70 |
code into machine code that can be executed on the host machine |
compared just like that. |
71 |
on-the-fly. The translation itself obviously takes some time, but this is |
|
72 |
usually made up for by the fact that the translated code chunks are |
<p>Performance depends on several factors, including (but not limited to) |
73 |
executed multiple times. |
host architecture, host clock speed, which compiler and compiler flags |
74 |
To run the emulator with binary translation enabled, just add |
were used to build the emulator, what the workload is, and so on. For |
75 |
<tt><b>-b</b></tt> to the command line. |
example, if an emulated operating system tries to read a block from disk, |
76 |
|
from its point of view the read was instantaneous (no waiting). So 1 MIPS |
77 |
<p> |
in an emulated OS might have taken more than one million instructions on a |
78 |
Only small pieces of MIPS machine code are translated, usually the size of |
real machine. |
79 |
a function, or less. There is no "intermediate representation" code, so |
|
80 |
all translations are done directly from MIPS to host machine code. |
<p>Also, if the emulator says it has executed 1 million instructions, and |
81 |
|
the CPU family in question was capable of scalar execution (i.e. one cycle |
82 |
<p> |
per instruction), it might still have taken more than 1 million cycles on |
83 |
The default bintrans cache size is 16 MB, but you can change this by adding |
a real machine because of cache misses and similar micro-architectural |
84 |
<tt>-DDEFAULT_BINTRANS_SIZE_IN_MB=<i>xx</i></tt> to your CFLAGS environment |
penalties that are not simulated by GXemul. |
85 |
variable before running the configure script, or by using the |
|
86 |
<tt>bintrans_size()</tt> configuration file option when running the emulator. |
<p>Because of these issues, it is in my opinion best to measure |
87 |
|
performance as the actual (real-world) time it takes to perform a task |
88 |
<p> |
with the emulator. Typical examples would be "How long does it take to |
89 |
By default, an emulated OS running under DECstation emulation which listens to |
install NetBSD?", or "How long does it take to compile XYZ inside NetBSD |
90 |
interrupts from the mc146818 clock will get interrupts that are close to the |
in the emulator?". |
91 |
host's clock. That is, if the emulated OS says it wants 100 interrupts per |
|
92 |
second, it will get approximately 100 interrupts per real second. |
<p>So, how fast is it? :-) Answer: it varies. |
93 |
|
|
94 |
|
<p>The emulation technique used varies depending on which processor type |
95 |
|
is being emulated. (One of my main goals with GXemul is to experiment with |
96 |
|
different kinds of emulation, so these might change in the future.) |
97 |
|
|
98 |
<p> |
<ul> |
99 |
There is however a <tt><b>-I</b></tt> option, which sets the number of |
<li><b>MIPS:</b><br> |
100 |
emulated cycles per second to a fixed value. Let's say you wish to make the |
There are two emulation modes. The most important one is an |
101 |
emulated OS think it is running on a 40 MHz DECstation, and not a 7 MHz one, |
implementation of a <i>dynamic binary translator</i>. |
102 |
then you can add <tt><b>-I 40000000</b></tt> to the command line. This will not |
(Compared to real binary translators, though, GXemul's bintrans |
103 |
make the emulation faster, of course. It might even make it seem slower; for |
subsystem is very simple and does not perform very well.) |
104 |
example, if NetBSD/pmax waits 2 seconds for SCSI devices to settle during |
This mode can be used on Alpha and i386 hosts. The other emulation |
105 |
bootup, those 2 seconds will take 2*40000000 cycles (which will take more |
mode is simple interpretation, where an instruction is read from |
106 |
time than 2*7000000). |
emulated memory, and interpreted one-at-a-time. (Slow, but it |
107 |
|
works. It can be forced with the <tt>-B</tt> command |
108 |
|
line option.) |
109 |
|
<p> |
110 |
|
<li><b>All other modes:</b><br> |
111 |
|
These use a kind of dynamic translation system. (This system does |
112 |
|
not use host-specific backends, so it is not "recompilation" or |
113 |
|
anything like that.) Speed is slower than real binary translation, |
114 |
|
but faster than traditional interpretation, and with some tricks |
115 |
|
it will hopefully still give reasonable speed. ARM emulation uses |
116 |
|
this kind of translation, for example. |
117 |
|
</ul> |
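<p>
The general idea behind translating once and executing many times can be
sketched as follows. This is only an illustration of the caching principle
under stated assumptions, not GXemul's bintrans or its other translation
system: the toy instruction encoding, <tt>CACHE_SLOTS</tt>, and every name
below are hypothetical, and invalidation on writes to translated code is
not handled.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_SLOTS 64u   /* hypothetical size; direct-mapped cache */

struct cpu_state { uint32_t reg[4]; };

typedef void (*handler_fn)(struct cpu_state *s, uint8_t arg);

struct cache_entry {
    uint32_t   addr;   /* emulated address this entry was built from */
    int        valid;
    handler_fn run;    /* pre-decoded handler for the instruction */
    uint8_t    arg;    /* pre-extracted operand */
};

static struct cache_entry cache[CACHE_SLOTS];
static unsigned translations;  /* how many times decoding was needed */

static void do_inc(struct cpu_state *s, uint8_t arg) { s->reg[0] += arg; }
static void do_nop(struct cpu_state *s, uint8_t arg) { (void)s; (void)arg; }

/* "Translate": decode a 16-bit toy instruction into handler + operand. */
static void translate(struct cache_entry *e, uint32_t addr, uint16_t insn)
{
    e->addr  = addr;
    e->valid = 1;
    e->arg   = (uint8_t)(insn & 0xff);
    e->run   = ((insn >> 8) == 1) ? do_inc : do_nop;
    translations++;
}

/* Execute the instruction at addr, translating only on a cache miss. */
static void execute_at(struct cpu_state *s, uint32_t addr, uint16_t insn)
{
    struct cache_entry *e = &cache[(addr / 2) % CACHE_SLOTS];

    if (!e->valid || e->addr != addr)
        translate(e, addr, insn);
    e->run(s, e->arg);
}
```

Running the same address in a loop decodes it once and then only pays for
the indirect call, which is where the time spent translating is recovered.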
118 |
|
|
|
<p> |
|
|
The <b><tt>-I</tt></b> option is also necessary if you want to run |
|
|
deterministic experiments, if a mc146818 (or similar) device is present. |
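<p>
The arithmetic behind the <tt>-I</tt> example can be written out as a small
C sketch. It assumes one emulated cycle per instruction and takes the host's
emulation throughput as an input; both function names are made up for the
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Emulated cycles consumed by a guest delay when -I fixes the cycle rate. */
static uint64_t cycles_for_delay(uint64_t emulated_hz, uint64_t seconds)
{
    return emulated_hz * seconds;
}

/*
 * Real (host) seconds needed to emulate that many cycles, given how many
 * emulated instructions per second the host manages (assumed: one cycle
 * per instruction).
 */
static double host_seconds(uint64_t cycles, double host_ips)
{
    assert(host_ips > 0.0);
    return (double)cycles / host_ips;
}
```

With a host that manages about 7 million instructions per second, a 2-second
guest delay costs 2*7000000 cycles (2 real seconds) at the matching rate, but
2*40000000 cycles (roughly 11.4 real seconds) with <tt>-I 40000000</tt>.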
|
119 |
|
|
|
<p> |
|
|
Some emulators make claims such as "x times slowdown," but in the case of |
|
|
GXemul, the host is often not a MIPS-based machine, and hence comparing |
|
|
one MIPS instruction to a host instruction doesn't work. Performance depends on |
|
|
a lot of factors, including (but not limited to) host architecture, host speed, |
|
|
which compiler and compiler flags were used to build GXemul, what the |
|
|
workload is, and so on. For example, if an emulated operating system tries |
|
|
to read a block from disk, from its point of view the read was instantaneous |
|
|
(no waiting). So 1 MIPS in an emulated OS might have taken more than one |
|
|
million instructions on a real machine. Because of this, in my opinion it is best |
|
|
to measure performance as the actual (real-world) time it takes to perform |
|
|
a task with the emulator. |
|
120 |
|
|
121 |
|
|
122 |
|
|
125 |
<a name="net"></a> |
<a name="net"></a> |
126 |
<h3>Networking</h3> |
<h3>Networking</h3> |
127 |
|
|
128 |
Running an entire operating system under emulation is very interesting in |
<font color="#ff0000">NOTE/TODO: This section is very old and a bit |
129 |
itself, but for several reasons, running a modern OS without access to |
out of date.</font> |
130 |
TCP/IP networking is a bit awkward. Hence, I feel the need to implement TCP/IP |
|
131 |
(networking) support in the emulator. |
<p>Running an entire operating system under emulation is very interesting |
132 |
|
in itself, but for several reasons, running a modern OS without access to |
133 |
|
TCP/IP networking is a bit awkward. Hence, I feel the need to implement |
134 |
|
TCP/IP (networking) support in the emulator. |
135 |
|
|
136 |
<p> |
<p> |
137 |
As far as I have understood it, there seem to be two different ways to go: |
As far as I have understood it, there seem to be two different ways to go: |
318 |
files in both directions, but then you should be aware of the |
files in both directions, but then you should be aware of the |
319 |
fragmentation issue mentioned above. |
fragmentation issue mentioned above. |
320 |
|
|
321 |
|
<p>TODO: Write a section on how to connect multiple emulator instances. |
322 |
|
(Using the <tt>local_port</tt> and <tt>add_remote</tt> configuration file |
323 |
|
commands.) |
324 |
|
|
325 |
|
|
326 |
|
|
327 |
|
|
328 |
|
|
331 |
<a name="devices"></a> |
<a name="devices"></a> |
332 |
<h3>Emulation of hardware devices</h3> |
<h3>Emulation of hardware devices</h3> |
333 |
|
|
334 |
Each file in the device/ directory is responsible for one hardware device. |
Each file in the <tt>src/devices/</tt> directory is responsible for one |
335 |
These are used from src/machine.c, when initializing which hardware a |
hardware device. These are used from <tt>src/machine.c</tt>, when |
336 |
particular machine model will be using, or when adding devices to a |
initializing which hardware a particular machine model will be using, or |
337 |
machine using the <b>device()</b> command in configuration files. |
when adding devices to a machine using the <tt>device()</tt> command in |
338 |
|
configuration files. |
339 |
|
|
340 |
<p> |
<p><font color="#ff0000">NOTE: The device registry subsystem is currently |
341 |
<font color="#ff0000">NOTE: 2005-02-26: I'm currently rewriting the |
in a state of flux, as it is being redesigned.</font> |
|
device registry subsystem.</font> |
|
342 |
|
|
343 |
<p> |
<p>(I'll be using the name "<tt>foo</tt>" as the name of the device in all |
344 |
(I'll be using the name 'foo' as the name of the device in all these |
these examples. This is pseudo code; it might need some modification to |
|
examples. This is pseudo code; it might need some modification to |
|
345 |
actually compile and run.) |
actually compile and run.) |
346 |
|
|
347 |
<p> |
<p>Each device should have the following: |
|
Each device should have the following: |
|
348 |
|
|
349 |
<p> |
<p> |
350 |
<ul> |
<ul> |
351 |
<li>A devinit function in dev_foo.c. It would typically look |
<li>A <tt>devinit</tt> function in <tt>src/devices/dev_foo.c</tt>. It |
352 |
something like this: |
would typically look something like this: |
353 |
<pre> |
<pre> |
354 |
/* |
/* |
355 |
* devinit_foo(): |
* devinit_foo(): |
386 |
} |
} |
387 |
</pre><br> |
</pre><br> |
388 |
|
|
389 |
<li>At the top of dev_foo.c, the foo_data struct should be defined. |
<li>At the top of <tt>dev_foo.c</tt>, the <tt>foo_data</tt> struct |
390 |
|
should be defined. |
391 |
<pre> |
<pre> |
392 |
struct foo_data { |
struct foo_data { |
393 |
int irq_nr; |
int irq_nr; |
395 |
}; |
}; |
396 |
</pre><br> |
</pre><br> |
397 |
|
|
398 |
<li>If foo has a tick function (that is, something that needs to be |
<li>If <tt>foo</tt> has a tick function (that is, something that needs to be |
399 |
run at regular intervals) then FOO_TICKSHIFT and a tick function |
run at regular intervals) then <tt>FOO_TICKSHIFT</tt> and a tick |
400 |
need to be defined as well: |
function need to be defined as well: |
401 |
<pre> |
<pre> |
402 |
#define FOO_TICKSHIFT 10 |
#define FOO_TICKSHIFT 10 |
403 |
|
|
462 |
|
|
463 |
|
|
464 |
|
|
|
<p><br> |
|
|
<a name="regtest"></a> |
|
|
<h3>Regression tests</h3> |
|
|
|
|
|
In order to make sure that the emulator actually works like it is supposed |
|
|
to, it must be tested. For this purpose, there is a simple regression |
|
|
testing framework in the <tt>tests/</tt> directory. |
|
|
|
|
|
<p> |
|
|
<i>NOTE: The regression testing framework is basically just a skeleton so far. |
|
|
Regression tests are very good to have. However, the fact that complete |
|
|
operating systems can run in the emulator indicates that the emulation is |
|
|
probably not too incorrect. This makes it less of a priority to write |
|
|
regression tests.</i> |
|
|
|
|
|
<p> |
|
|
To run all the regression tests, type <tt>make regtest</tt>. Each assembly |
|
|
language file matching the pattern <tt>test_*.S</tt> will be compiled and |
|
|
linked into a 64-bit MIPS ELF (using a gcc cross compiler), and run in the |
|
|
emulator. If everything goes well, you should see something like this: |
|
|
|
|
|
<pre> |
|
|
$ make regtest |
|
|
cd tests; make run_tests; cd .. |
|
|
gcc33 -Wall -fomit-frame-pointer -fmove-all-movables -fpeephole -O2 |
|
|
-mcpu=ev5 -I/usr/X11R6/include -lm -L/usr/X11R6/lib -lX11 do_tests.c |
|
|
-o do_tests |
|
|
do_tests.c: In function `main': |
|
|
do_tests.c:173: warning: unused variable `s' |
|
|
/var/tmp//ccFOupvD.o: In function `do_tests': |
|
|
/var/tmp//ccFOupvD.o(.text+0x3a8): warning: tmpnam() possibly used |
|
|
unsafely; consider using mkstemp() |
|
|
mips64-unknown-elf-gcc -g -O3 -fno-builtin -fschedule-insns -mips64 |
|
|
-mabi=64 test_common.c -c -o test_common.o |
|
|
./do_tests "mips64-unknown-elf-gcc -g -O3 -fno-builtin -fschedule-insns |
|
|
-mips64 -mabi=64" "mips64-unknown-elf-as -mabi=64 -mips64" |
|
|
"mips64-unknown-elf-ld -Ttext 0xa800000000030000 -e main |
|
|
--oformat=elf64-bigmips" "../gxemul" |
|
|
|
|
|
Starting tests: |
|
|
test_addu.S (-a) |
|
|
test_addu.S (-a -b) |
|
|
test_clo_clz.S (-a) |
|
|
test_clo_clz.S (-a -b) |
|
|
.. |
|
|
test_unaligned.S (-a) |
|
|
test_unaligned.S (-a -b) |
|
|
|
|
|
Done. (12 tests done) |
|
|
PASS: 12 |
|
|
FAIL: 0 |
|
|
|
|
|
---------------- |
|
|
|
|
|
All tests OK |
|
|
|
|
|
---------------- |
|
|
</pre> |
|
|
|
|
|
<p> |
|
|
Each test writes output to stdout, and there is a <tt>test_*.good</tt> for |
|
|
each <tt>.S</tt> file which contains the wanted output. If the actual |
|
|
output matches the <tt>.good</tt> file, then the test passes, otherwise it |
|
|
fails. |
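<p>
The pass/fail rule amounts to a byte-for-byte comparison of two streams. A
minimal sketch in C, assuming the test output and the <tt>.good</tt> file are
both available as open <tt>FILE</tt> pointers (the function name is made up,
and this is not the code in <tt>do_tests.c</tt>):

```c
#include <assert.h>
#include <stdio.h>

/*
 * Returns 1 if the remaining contents of both streams are identical
 * (the test passes), 0 at the first difference or length mismatch.
 */
static int streams_match(FILE *actual, FILE *good)
{
    int a, b;

    assert(actual != NULL && good != NULL);
    do {
        a = fgetc(actual);
        b = fgetc(good);
        if (a != b)
            return 0;   /* differing byte, or one stream ended early */
    } while (a != EOF);
    return 1;
}
```

Comparing raw bytes means that even a trailing newline difference counts as a
failure, which keeps the rule simple and strict.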
|
|
|
|
|
<p> |
|
|
Read <tt>tests/README</tt> for more information. |
|
|
|
|
|
|
|
|
|
|
465 |
|
|
466 |
</body> |
</body> |
467 |
</html> |
</html> |