10 |
|
|
11 |
<!-- |
<!-- |
12 |
|
|
13 |
$Id: technical.html,v 1.51 2005/06/04 22:47:49 debug Exp $ |
$Id: technical.html,v 1.53 2005/06/27 17:31:50 debug Exp $ |
14 |
|
|
15 |
Copyright (C) 2004-2005 Anders Gavare. All rights reserved. |
Copyright (C) 2004-2005 Anders Gavare. All rights reserved. |
16 |
|
|
45 |
<p><br> |
<p><br> |
46 |
<h2>Technical details</h2> |
<h2>Technical details</h2> |
47 |
|
|
48 |
<p> |
<p>This page describes some of the internals of GXemul. |
|
This page describes some of the internals of GXemul. |
|
|
|
|
|
<p> |
|
|
<font color="#e00000"><b>NOTE: This page is probably not |
|
|
very up-to-date by now.</b></font> |
|
49 |
|
|
50 |
<p> |
<p> |
51 |
<ul> |
<ul> |
52 |
<li><a href="#overview">Overview</a> |
<li><a href="#speed">Speed and emulation modes</a> |
|
<li><a href="#speed">Speed</a> |
|
53 |
<li><a href="#net">Networking</a> |
<li><a href="#net">Networking</a> |
54 |
<li><a href="#devices">Emulation of hardware devices</a> |
<li><a href="#devices">Emulation of hardware devices</a> |
55 |
<li><a href="#regtest">Regression tests</a> |
<li><a href="#regtest">Regression tests</a> |
58 |
|
|
59 |
|
|
60 |
|
|
|
<p><br> |
|
|
<a name="overview"></a> |
|
|
<h3>Overview</h3> |
|
|
|
|
|
In simple terms, GXemul is just a fetch-and-execute |
|
|
loop; an instruction is fetched from memory, and executed. |
|
|
|
|
|
<p> |
|
|
In reality, a lot of things need to be handled. Before each instruction is |
|
|
executed, the emulator checks to see if any interrupts are asserted which |
|
|
are not masked away. If so, then an INT exception is generated. Exceptions |
|
|
cause the program counter to be set to a specific value, and some of the |
|
|
system coprocessor's registers to be set to values signifying what kind of |
|
|
exception it was (an interrupt exception in this case). |
|
|
|
|
|
<p> |
|
|
Reading instructions from memory is done through a TLB, a translation |
|
|
lookaside buffer. The TLB on MIPS is software controlled, which means that |
|
|
the program running inside the emulator (for example an operating system |
|
|
kernel) has to take care of manually updating the TLB. Some memory |
|
|
addresses are translated into physical addresses directly, some are |
|
|
translated into valid physical addresses via the TLB, and some memory |
|
|
references are not valid. Invalid memory references cause exceptions. |
|
|
|
|
|
<p> |
|
|
After an instruction has been read from memory, the emulator checks which |
|
|
opcode it contains and executes the instruction. Executing an instruction |
|
|
usually involves reading some register and writing some register, or perhaps a |
|
|
load from memory (or a store to memory). The program counter is increased |
|
|
for every instruction. |
|
|
|
|
|
<p> |
|
|
Some memory references point to physical addresses which are not in the |
|
|
normal RAM address space. They may point to hardware devices. If that is |
|
|
the case, then loads and stores are converted into calls to a device |
|
|
access function. The device access function is then responsible for |
|
|
handling these reads and writes. For example, a graphical framebuffer |
|
|
device may put a pixel on the screen when a value is written to it, or a |
|
|
serial controller device may output a character to stdout when written to. |
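<p>
The loop described above can be sketched in C. This is a minimal illustration, not GXemul's actual code: the two-byte toy instruction format, the opcodes, <tt>CONSOLE_ADDR</tt>, <tt>EXC_VECTOR</tt>, <tt>struct cpu</tt> and all function names are assumptions made up for this example, and the TLB step is left out entirely.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define RAM_SIZE      4096u        /* toy RAM size (assumption) */
#define CONSOLE_ADDR  0x10000000u  /* hypothetical serial data register */
#define EXC_VECTOR    0x80u        /* hypothetical exception handler address */

enum { OP_HALT = 0, OP_LOADI = 1, OP_OUT = 2 };  /* toy opcodes, not MIPS */

struct cpu {
    uint32_t pc;
    uint8_t  acc;
    int      irq_asserted, irq_masked;
    unsigned exceptions;           /* number of exceptions taken */
};

static uint8_t ram[RAM_SIZE];
static char    console_out[64];    /* copy of the output, for inspection */
static size_t  console_len;

/* Hypothetical device access function: a write outputs one character. */
static void console_access(uint8_t value)
{
    if (console_len < sizeof(console_out) - 1)
        console_out[console_len++] = (char)value;
    putchar(value);
}

/* A store goes to RAM or to a device, depending on the physical address. */
static int store_byte(uint32_t paddr, uint8_t value)
{
    if (paddr == CONSOLE_ADDR) { console_access(value); return 1; }
    if (paddr < RAM_SIZE)      { ram[paddr] = value;    return 1; }
    return 0;  /* invalid reference; a real emulator raises an exception */
}

/* Fetch-and-execute loop; returns the number of instructions executed. */
static uint64_t run(struct cpu *cpu, uint64_t max_instr)
{
    uint64_t n = 0;
    while (n < max_instr) {
        /* Before each instruction: take any asserted, unmasked interrupt. */
        if (cpu->irq_asserted && !cpu->irq_masked) {
            cpu->irq_asserted = 0;
            cpu->exceptions++;
            cpu->pc = EXC_VECTOR;
        }
        if (cpu->pc > RAM_SIZE - 2)
            break;                              /* fetch outside RAM */
        uint8_t op  = ram[cpu->pc];
        uint8_t arg = ram[cpu->pc + 1];
        cpu->pc += 2;                           /* pc advances every time */
        n++;
        switch (op) {
        case OP_LOADI: cpu->acc = arg; break;
        case OP_OUT:   store_byte(CONSOLE_ADDR, cpu->acc); break;
        default:       return n;                /* OP_HALT or unknown */
        }
    }
    return n;
}
```

Loading the ten bytes <tt>LOADI 'H', OUT, LOADI 'i', OUT, HALT</tt> at address 0 and calling <tt>run()</tt> on a zeroed <tt>struct cpu</tt> executes five instructions and sends "Hi" through the device access function.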
|
|
|
|
|
|
|
61 |
|
|
62 |
|
|
63 |
<p><br> |
<p><br> |
64 |
<a name="speed"></a> |
<a name="speed"></a> |
65 |
<h3>Speed</h3> |
<h3>Speed and emulation modes</h3> |
66 |
|
|
67 |
There are two modes in which the emulator can run, <b>a</b>) a straightforward |
So, how fast is GXemul? There is no good answer to this. There is |
68 |
loop which fetches one instruction from emulated RAM and executes it |
especially no answer to the question <b>What is the slowdown factor?</b>, |
69 |
(described in the previous section), and <b>b</b>) |
because the host architecture and emulated architecture can usually not be |
70 |
using dynamic binary translation. |
compared just like that. |
71 |
|
|
72 |
<p> |
<p>Performance depends on several factors, including (but not limited to) |
73 |
Mode <b>a</b> is very slow. On a 2.8 GHz Intel Xeon host the resulting |
host architecture, host clock speed, which compiler and compiler flags |
74 |
emulated machine is roughly equal to a 7 MHz R3000 (or a 3.5 MHz R4000). |
were used to build the emulator, what the workload is, and so on. For |
75 |
The actual performance varies a lot, maybe between 5 and 10 million |
example, if an emulated operating system tries to read a block from disk, |
76 |
instructions per second, depending on workload. |
from its point of view the read was instantaneous (no waiting). So 1 MIPS |
77 |
|
in an emulated OS might have taken more than one million instructions on a |
78 |
<p> |
real machine. |
79 |
Mode <b>b</b> ("bintrans") is still to be considered experimental, but |
|
80 |
gives higher performance than mode <b>a</b>. It translates MIPS machine |
<p>Also, if the emulator says it has executed 1 million instructions, and |
81 |
code into machine code that can be executed on the host machine |
the CPU family in question was capable of scalar execution (i.e. one cycle |
82 |
on-the-fly. The translation itself obviously takes some time, but this is |
per instruction), it might still have taken more than 1 million cycles on |
83 |
usually made up for by the fact that the translated code chunks are |
a real machine because of cache misses and similar micro-architectural |
84 |
executed multiple times. |
penalties that are not simulated by GXemul. |
85 |
To run the emulator with binary translation enabled, just add |
|
86 |
<tt><b>-b</b></tt> to the command line. |
<p>Because of these issues, it is in my opinion best to measure |
87 |
|
performance as the actual (real-world) time it takes to perform a task |
88 |
<p> |
with the emulator. Typical examples would be "How long does it take to |
89 |
Only small pieces of MIPS machine code are translated, usually the size of |
install NetBSD?", or "How long does it take to compile XYZ inside NetBSD |
90 |
a function, or less. There is no "intermediate representation" code, so |
in the emulator?". |
91 |
all translations are done directly from MIPS to host machine code. |
|
92 |
|
<p>The emulation technique used varies depending on which processor type |
93 |
<p> |
is being emulated. (One of my main goals with GXemul is to experiment with |
94 |
The default bintrans cache size is 16 MB, but you can change this by adding |
different kinds of emulation, so these might change in the future.) |
|
<tt>-DDEFAULT_BINTRANS_SIZE_IN_MB=<i>xx</i></tt> to your CFLAGS environment |
|
|
variable before running the configure script, or by using the |
|
|
<tt>bintrans_size()</tt> configuration file option when running the emulator. |
|
95 |
|
|
96 |
<p> |
<ul> |
97 |
By default, an emulated OS running under DECstation emulation which listens to |
<li><b>MIPS</b><br> |
98 |
interrupts from the mc146818 clock will get interrupts that are close to the |
There are two emulation modes. The most important one is an |
99 |
host's clock. That is, if the emulated OS says it wants 100 interrupts per |
implementation of a <i>dynamic binary translator</i>. |
100 |
second, it will get approximately 100 interrupts per real second. |
(Compared to real binary translators, though, GXemul's bintrans |
101 |
|
subsystem is very simple and does not perform very well.) |
102 |
|
This mode can be used on Alpha and i386 host. The other emulation |
103 |
|
mode is simple interpretation, where an instruction is read from |
104 |
|
emulated memory, and interpreted one-at-a-time. (Slow, but it |
105 |
|
works. It can be forced by using the <tt>-B</tt> command |
106 |
|
line option.) |
107 |
|
<p> |
108 |
|
<li><b>ARM</b><br> |
109 |
|
This mode does not really work yet. It will use |
110 |
|
dynamic translation, though not binary translation. Stay tuned. :-) |
111 |
|
<p> |
112 |
|
<li><b>URISC</b><br> |
113 |
|
Simple interpretation, one instruction at a time. There is probably |
114 |
|
no other way to emulate URISC, because it relies too heavily |
115 |
|
on self-modifying code. |
116 |
|
<p> |
117 |
|
<li><b>POWER/PowerPC</b><br> |
118 |
|
This emulation mode is very much unfinished, but still enabled by |
119 |
|
default. So far it uses plain interpretation, where an instruction |
120 |
|
is read from emulated memory, and interpreted one at a time. |
121 |
|
Slow. Not very interesting. |
122 |
|
<p> |
123 |
|
<li><b>x86</b><br> |
124 |
|
Although too unstable and non-working to be enabled by default, |
125 |
|
there is some code for emulating x86 machines. It simply reads |
126 |
|
one instruction at a time from emulated memory, and executes it. |
127 |
|
This is as slow as it gets. Not very interesting. |
128 |
|
</ul> |
129 |
|
|
|
<p> |
|
|
There is however a <tt><b>-I</b></tt> option, which sets the number of |
|
|
emulated cycles per second to a fixed value. Let's say you wish to make the |
|
|
emulated OS think it is running on a 40 MHz DECstation, and not a 7 MHz one, |
|
|
then you can add <tt><b>-I 40000000</b></tt> to the command line. This will not |
|
|
make the emulation faster, of course. It might even make it seem slower; for |
|
|
example, if NetBSD/pmax waits 2 seconds for SCSI devices to settle during |
|
|
bootup, those 2 seconds will take 2*40000000 cycles (which will take more |
|
|
time than 2*7000000). |
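<p>
The arithmetic behind this can be written out as a small C helper. It is only a sketch under two assumptions that are not stated as such in the text: one emulated cycle per instruction, and a host that sustains roughly 7 million emulated instructions per real second (the mode <b>a</b> figure quoted earlier). The function name <tt>host_seconds</tt> is hypothetical.

```c
/*
 * Real (host) seconds needed to emulate guest_seconds of guest time.
 *
 * emulated_hz: the value given to -I (cycles per emulated second)
 * host_ips:    emulated instructions the host executes per real second
 *              (assumed: one cycle per instruction)
 */
static double host_seconds(double guest_seconds, double emulated_hz,
                           double host_ips)
{
    return guest_seconds * emulated_hz / host_ips;
}
```

With <tt>host_ips</tt> = 7000000, a 2 second guest delay costs 2 real seconds at <tt>-I 7000000</tt>, but 2*40000000/7000000, about 11.4 real seconds, at <tt>-I 40000000</tt>.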
|
130 |
|
|
|
<p> |
|
|
The <b><tt>-I</tt></b> option is also necessary if you want to run |
|
|
deterministic experiments, if a mc146818 (or similar) device is present. |
|
131 |
|
|
|
<p> |
|
|
Some emulators make claims such as "x times slowdown," but in the case of |
|
|
GXemul, the host is often not a MIPS-based machine, and hence comparing |
|
|
one MIPS instruction to a host instruction doesn't work. Performance depends on |
|
|
a lot of factors, including (but not limited to) host architecture, host speed, |
|
|
which compiler and compiler flags were used to build GXemul, what the |
|
|
workload is, and so on. For example, if an emulated operating system tries |
|
|
to read a block from disk, from its point of view the read was instantaneous |
|
|
(no waiting). So 1 MIPS in an emulated OS might have taken more than one |
|
|
million instructions on a real machine. Because of this, imho it is best |
|
|
to measure performance as the actual (real-world) time it takes to perform |
|
|
a task with the emulator. |
|
132 |
|
|
133 |
|
|
134 |
|
|
137 |
<a name="net"></a> |
<a name="net"></a> |
138 |
<h3>Networking</h3> |
<h3>Networking</h3> |
139 |
|
|
140 |
Running an entire operating system under emulation is very interesting in |
<font color="#ff0000">NOTE/TODO: This section is very old and a bit |
141 |
itself, but for several reasons, running a modern OS without access to |
out of date.</font> |
142 |
TCP/IP networking is a bit awkward. Hence, I feel the need to implement TCP/IP |
|
143 |
(networking) support in the emulator. |
<p>Running an entire operating system under emulation is very interesting |
144 |
|
in itself, but for several reasons, running a modern OS without access to |
145 |
|
TCP/IP networking is a bit awkward. Hence, I feel the need to implement |
146 |
|
TCP/IP (networking) support in the emulator. |
147 |
|
|
148 |
<p> |
<p> |
149 |
As far as I have understood it, there seem to be two different ways to go: |
As far as I have understood it, there seem to be two different ways to go: |
330 |
files in both directions, but then you should be aware of the |
files in both directions, but then you should be aware of the |
331 |
fragmentation issue mentioned above. |
fragmentation issue mentioned above. |
332 |
|
|
333 |
|
<p>TODO: Write a section on how to connect multiple emulator instances. |
334 |
|
(Using the <tt>local_port</tt> and <tt>add_remote</tt> configuration file |
335 |
|
commands.) |
336 |
|
|
337 |
|
|
338 |
|
|
339 |
|
|
340 |
|
|
343 |
<a name="devices"></a> |
<a name="devices"></a> |
344 |
<h3>Emulation of hardware devices</h3> |
<h3>Emulation of hardware devices</h3> |
345 |
|
|
346 |
Each file in the device/ directory is responsible for one hardware device. |
Each file in the <tt>device/</tt> directory is responsible for one |
347 |
These are used from src/machine.c, when initializing which hardware a |
hardware device. These are used from <tt>src/machine.c</tt>, when |
348 |
particular machine model will be using, or when adding devices to a |
initializing which hardware a particular machine model will be using, or |
349 |
machine using the <b>device()</b> command in configuration files. |
when adding devices to a machine using the <tt>device()</tt> command in |
350 |
|
configuration files. |
351 |
|
|
352 |
<p> |
<p><font color="#ff0000">NOTE: The device registry subsystem is currently |
353 |
<font color="#ff0000">NOTE: 2005-02-26: I'm currently rewriting the |
in a state of flux, as it is being redesigned.</font> |
|
device registry subsystem.</font> |
|
354 |
|
|
355 |
<p> |
<p>(I'll be using the name "<tt>foo</tt>" as the name of the device in all |
356 |
(I'll be using the name 'foo' as the name of the device in all these |
these examples. This is pseudocode; it might need some modification to |
|
examples. This is pseudocode; it might need some modification to |
|
357 |
actually compile and run.) |
actually compile and run.) |
358 |
|
|
359 |
<p> |
<p>Each device should have the following: |
|
Each device should have the following: |
|
360 |
|
|
361 |
<p> |
<p> |
362 |
<ul> |
<ul> |
363 |
<li>A devinit function in dev_foo.c. It would typically look |
<li>A <tt>devinit</tt> function in <tt>src/devices/dev_foo.c</tt>. It |
364 |
something like this: |
would typically look something like this: |
365 |
<pre> |
<pre> |
366 |
/* |
/* |
367 |
* devinit_foo(): |
* devinit_foo(): |
398 |
} |
} |
399 |
</pre><br> |
</pre><br> |
400 |
|
|
401 |
<li>At the top of dev_foo.c, the foo_data struct should be defined. |
<li>At the top of <tt>dev_foo.c</tt>, the <tt>foo_data</tt> struct |
402 |
|
should be defined. |
403 |
<pre> |
<pre> |
404 |
struct foo_data { |
struct foo_data { |
405 |
int irq_nr; |
int irq_nr; |
407 |
}; |
}; |
408 |
</pre><br> |
</pre><br> |
409 |
|
|
410 |
<li>If foo has a tick function (that is, something that needs to be |
<li>If <tt>foo</tt> has a tick function (that is, something that needs to be |
411 |
run at regular intervals) then FOO_TICKSHIFT and a tick function |
run at regular intervals) then <tt>FOO_TICKSHIFT</tt> and a tick |
412 |
need to be defined as well: |
function need to be defined as well: |
413 |
<pre> |
<pre> |
414 |
#define FOO_TICKSHIFT 10 |
#define FOO_TICKSHIFT 10 |
415 |
|
|