1 |
<html><head><title>GXemul documentation: Technical details</title> |
<html><head><title>Gavare's eXperimental Emulator: Technical details</title> |
2 |
<meta name="robots" content="noarchive,nofollow,noindex"></head> |
<meta name="robots" content="noarchive,nofollow,noindex"></head> |
3 |
<body bgcolor="#f8f8f8" text="#000000" link="#4040f0" vlink="#404040" alink="#ff0000"> |
<body bgcolor="#f8f8f8" text="#000000" link="#4040f0" vlink="#404040" alink="#ff0000"> |
4 |
<table border=0 width=100% bgcolor="#d0d0d0"><tr> |
<table border=0 width=100% bgcolor="#d0d0d0"><tr> |
5 |
<td width=100% align=center valign=center><table border=0 width=100%><tr> |
<td width=100% align=center valign=center><table border=0 width=100%><tr> |
6 |
<td align="left" valign=center bgcolor="#d0efff"><font color="#6060e0" size="6"> |
<td align="left" valign=center bgcolor="#d0efff"><font color="#6060e0" size="6"> |
7 |
<b>GXemul documentation:</b></font> |
<b>Gavare's eXperimental Emulator:</b></font><br> |
8 |
<font color="#000000" size="6"><b>Technical details</b> |
<font color="#000000" size="6"><b>Technical details</b> |
9 |
</font></td></tr></table></td></tr></table><p> |
</font></td></tr></table></td></tr></table><p> |
10 |
|
|
11 |
<!-- |
<!-- |
12 |
|
|
13 |
$Id: technical.html,v 1.51 2005/06/04 22:47:49 debug Exp $ |
$Id: technical.html,v 1.76 2007/06/15 18:07:08 debug Exp $ |
14 |
|
|
15 |
Copyright (C) 2004-2005 Anders Gavare. All rights reserved. |
Copyright (C) 2004-2007 Anders Gavare. All rights reserved. |
16 |
|
|
17 |
Redistribution and use in source and binary forms, with or without |
Redistribution and use in source and binary forms, with or without |
18 |
modification, are permitted provided that the following conditions are met: |
modification, are permitted provided that the following conditions are met: |
40 |
--> |
--> |
41 |
|
|
42 |
|
|
43 |
|
|
44 |
<a href="./">Back to the index</a> |
<a href="./">Back to the index</a> |
45 |
|
|
46 |
<p><br> |
<p><br> |
47 |
<h2>Technical details</h2> |
<h2>Technical details</h2> |
48 |
|
|
49 |
<p> |
<p>This page describes some of the internals of GXemul. |
|
This page describes some of the internals of GXemul. |
|
|
|
|
|
<p> |
|
|
<font color="#e00000"><b>NOTE: This page is probably not |
|
|
very up-to-date by now.</b></font> |
|
50 |
|
|
51 |
<p> |
<p> |
52 |
<ul> |
<ul> |
53 |
<li><a href="#overview">Overview</a> |
<li><a href="#speed">Speed and emulation modes</a> |
|
<li><a href="#speed">Speed</a> |
|
54 |
<li><a href="#net">Networking</a> |
<li><a href="#net">Networking</a> |
55 |
<li><a href="#devices">Emulation of hardware devices</a> |
<li><a href="#devices">Emulation of hardware devices</a> |
|
<li><a href="#regtest">Regression tests</a> |
|
56 |
</ul> |
</ul> |
57 |
|
|
58 |
|
|
59 |
|
|
60 |
|
|
|
<p><br> |
|
|
<a name="overview"></a> |
|
|
<h3>Overview</h3> |
|
|
|
|
|
In simple terms, GXemul is just a fetch-and-execute |
|
|
loop; an instruction is fetched from memory, and executed. |
|
|
|
|
|
<p> |
|
|
In reality, a lot of things need to be handled. Before each instruction is |
|
|
executed, the emulator checks to see if any interrupts are asserted which |
|
|
are not masked away. If so, then an INT exception is generated. Exceptions |
|
|
cause the program counter to be set to a specific value, and some of the |
|
|
system coprocessor's registers to be set to values signifying what kind of |
|
|
exception it was (an interrupt exception in this case). |
|
|
|
|
|
<p> |
|
|
Reading instructions from memory is done through a TLB, a translation |
|
|
lookaside buffer. The TLB on MIPS is software controlled, which means that |
|
|
the program running inside the emulator (for example an operating system |
|
|
kernel) has to take care of manually updating the TLB. Some memory |
|
|
addresses are translated into physical addresses directly, some are |
|
|
translated into valid physical addresses via the TLB, and some memory |
|
|
references are not valid. Invalid memory references cause exceptions. |
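<p>
The three cases above (direct translation, translation via the TLB, and
invalid references) can be sketched roughly as follows. This is a simplified
illustration, not GXemul's actual code: the names <tt>sk_tlb_entry</tt> and
<tt>sk_translate</tt> are made up, only 32-bit addresses with 4 KB pages are
handled, and details such as ASIDs, paired pages and the mapped kernel
segments are ignored.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_PAGE_SHIFT	12	/* 4 KB pages (assumption for this sketch) */
#define SK_TLB_ENTRIES	4	/* tiny TLB, just for illustration */

/* Hypothetical, heavily simplified TLB entry. */
struct sk_tlb_entry {
	uint32_t vpn;		/* virtual page number */
	uint32_t pfn;		/* physical frame number */
	int valid;
};

/*
 * Returns 1 and stores the physical address in *paddr on success.
 * Returns 0 if the reference is invalid; a real emulator would
 * generate an exception in that case.
 */
static int sk_translate(const struct sk_tlb_entry *tlb, uint32_t vaddr,
	uint32_t *paddr)
{
	size_t i;

	/* kseg0 and kseg1 are translated directly, without the TLB. */
	if (vaddr >= 0x80000000u && vaddr < 0xc0000000u) {
		*paddr = vaddr & 0x1fffffffu;
		return 1;
	}

	/* Everything else has to be found in the software-managed TLB. */
	for (i = 0; i < SK_TLB_ENTRIES; i++) {
		if (tlb[i].valid && tlb[i].vpn == (vaddr >> SK_PAGE_SHIFT)) {
			*paddr = (tlb[i].pfn << SK_PAGE_SHIFT) |
			    (vaddr & ((1u << SK_PAGE_SHIFT) - 1));
			return 1;
		}
	}

	return 0;
}
```

<p>
In this sketch a miss simply returns 0. On real MIPS hardware the miss raises
an exception, and the guest kernel's handler then refills the TLB.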
|
|
|
|
|
<p> |
|
|
After an instruction has been read from memory, the emulator checks which |
|
|
opcode it contains and executes the instruction. Executing an instruction |
|
|
usually involves reading some register and writing some register, or perhaps a |
|
|
load from memory (or a store to memory). The program counter is increased |
|
|
for every instruction. |
|
|
|
|
|
<p> |
|
|
Some memory references point to physical addresses which are not in the |
|
|
normal RAM address space. They may point to hardware devices. If that is |
|
|
the case, then loads and stores are converted into calls to a device |
|
|
access function. The device access function is then responsible for |
|
|
handling these reads and writes. For example, a graphical framebuffer |
|
|
device may put a pixel on the screen when a value is written to it, or a |
|
|
serial controller device may output a character to stdout when written to. |
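<p>
To make the description above a bit more concrete, here is a minimal sketch
of such a loop for a made-up toy machine. It is not MIPS and not GXemul's
code: the opcodes, the 4-byte instruction format, the device address
<tt>TOY_DEV_BASE</tt> and the exception vector <tt>TOY_EXC_VECTOR</tt> are
all invented for this example. It shows the interrupt check before each
instruction, the fetch, the program counter update, and stores being
redirected to a device access function.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_RAM_SIZE	256	/* bytes of emulated RAM (arbitrary) */
#define TOY_DEV_BASE	0x1000	/* hypothetical device address */
#define TOY_DEV_LENGTH	0x10
#define TOY_EXC_VECTOR	0x80	/* hypothetical exception vector */

/* Toy opcodes invented for this sketch. Each instruction is 4 bytes:
   opcode, register number, 16-bit little-endian operand. */
enum { TOY_HALT = 0, TOY_LOADI = 1, TOY_STORE = 2 };

struct toy_machine {
	uint32_t pc;
	uint32_t reg[4];
	int irq_asserted, irq_masked;
	unsigned exceptions;		/* number of exceptions taken */
	uint8_t ram[TOY_RAM_SIZE];
	char serial_out[32];		/* bytes "printed" by the device */
	size_t serial_len;
};

/* Device access function: handles stores inside the device's range. */
static void toy_dev_write(struct toy_machine *m, uint32_t relative_addr,
	uint8_t value)
{
	if (relative_addr == 0 && m->serial_len < sizeof(m->serial_out))
		m->serial_out[m->serial_len++] = (char)value;
}

/* A store goes to RAM, to a device, or is invalid (returns 0). */
static int toy_store(struct toy_machine *m, uint32_t addr, uint8_t value)
{
	if (addr < TOY_RAM_SIZE) {
		m->ram[addr] = value;
		return 1;
	}
	if (addr >= TOY_DEV_BASE && addr < TOY_DEV_BASE + TOY_DEV_LENGTH) {
		toy_dev_write(m, addr - TOY_DEV_BASE, value);
		return 1;
	}
	return 0;
}

/* Runs until TOY_HALT, an invalid reference, or max_steps instructions.
   Returns the number of instructions executed. */
static int toy_run(struct toy_machine *m, int max_steps)
{
	int n = 0;

	while (n < max_steps) {
		const uint8_t *insn;
		uint32_t operand;

		/* Unmasked interrupt: record it and jump to the vector.
		   (Clearing the request here is a simplification; a real
		   device keeps it asserted until acknowledged.) */
		if (m->irq_asserted && !m->irq_masked) {
			m->exceptions++;
			m->irq_asserted = 0;
			m->pc = TOY_EXC_VECTOR;
		}

		if (m->pc > TOY_RAM_SIZE - 4)
			break;		/* invalid instruction fetch */
		insn = &m->ram[m->pc];
		operand = insn[2] | ((uint32_t)insn[3] << 8);
		m->pc += 4;		/* pc advances for every instruction */
		n++;

		switch (insn[0]) {
		case TOY_LOADI:
			m->reg[insn[1] & 3] = operand;
			break;
		case TOY_STORE:
			if (!toy_store(m, operand, (uint8_t)m->reg[insn[1] & 3]))
				return n;	/* invalid store ends the sketch */
			break;
		default:		/* TOY_HALT or unknown opcode */
			return n;
		}
	}

	return n;
}
```

<p>
A three-instruction program (load 'A' into a register, store it to
<tt>TOY_DEV_BASE</tt>, halt) leaves a single 'A' in <tt>serial_out</tt>, which
corresponds to the serial controller example above.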
|
|
|
|
|
|
|
61 |
|
|
62 |
|
|
63 |
<p><br> |
<p><br> |
64 |
<a name="speed"></a> |
<a name="speed"></a> |
65 |
<h3>Speed</h3> |
<h3>Speed and emulation modes</h3> |
|
|
|
|
There are two modes in which the emulator can run, <b>a</b>) a straightforward |
|
|
loop which fetches one instruction from emulated RAM and executes it |
|
|
(described in the previous section), and <b>b</b>) |
|
|
using dynamic binary translation. |
|
|
|
|
|
<p> |
|
|
Mode <b>a</b> is very slow. On a 2.8 GHz Intel Xeon host the resulting |
|
|
emulated machine is roughly equal to a 7 MHz R3000 (or a 3.5 MHz R4000). |
|
|
The actual performance varies a lot, maybe between 5 and 10 million |
|
|
instructions per second, depending on workload. |
|
66 |
|
|
67 |
<p> |
So, how fast is GXemul? There is no short answer to this. There is |
68 |
Mode <b>b</b> ("bintrans") is still to be considered experimental, but |
especially no answer to the question <b>What is the slowdown factor?</b>, |
69 |
gives higher performance than mode <b>a</b>. It translates MIPS machine |
because the host architecture and emulated architecture can usually not be |
70 |
code into machine code that can be executed on the host machine |
compared just like that. |
71 |
on-the-fly. The translation itself obviously takes some time, but this is |
|
72 |
usually made up for by the fact that the translated code chunks are |
<p>Performance depends on several factors, including (but not limited to) |
73 |
executed multiple times. |
host architecture, target architecture, host clock speed, which compiler |
74 |
To run the emulator with binary translation enabled, just add |
and compiler flags were used to build the emulator, what the workload is, |
75 |
<tt><b>-b</b></tt> to the command line. |
what additional runtime flags are given to the emulator, and so on. |
76 |
|
|
77 |
<p> |
<p>Devices are generally not timing-accurate: for example, if an emulated |
78 |
Only small pieces of MIPS machine code are translated, usually the size of |
operating system tries to read a block from disk, from its point of view |
79 |
a function, or less. There is no "intermediate representation" code, so |
the read was instantaneous (no waiting). So 1 MIPS in an emulated OS might |
80 |
all translations are done directly from MIPS to host machine code. |
have taken more than one million instructions on a real machine. |
81 |
|
|
82 |
|
<p>Also, if the emulator says it has executed 1 million instructions, and |
83 |
|
the CPU family in question was capable of scalar execution (i.e. one cycle |
84 |
|
per instruction), it might still have taken more than 1 million cycles on |
85 |
|
a real machine because of cache misses and similar micro-architectural |
86 |
|
penalties that are not simulated by GXemul. |
87 |
|
|
88 |
|
<p>Because of these issues, it is in my opinion best to measure |
89 |
|
performance as the actual (real-world) time it takes to perform a task |
90 |
|
with the emulator, e.g.: |
91 |
|
|
92 |
<p> |
<ul> |
93 |
The default bintrans cache size is 16 MB, but you can change this by adding |
<li>"How long does it take to install NetBSD onto a disk image?" |
94 |
<tt>-DDEFAULT_BINTRANS_SIZE_IN_MB=<i>xx</i></tt> to your CFLAGS environment |
<li>"How long does it take to compile XYZ inside NetBSD |
95 |
variable before running the configure script, or by using the |
in the emulator?". |
96 |
<tt>bintrans_size()</tt> configuration file option when running the emulator. |
</ul> |
97 |
|
|
98 |
<p> |
<p>So, how fast is it? :-) Answer: it varies. |
|
By default, an emulated OS running under DECstation emulation which listens to |
|
|
interrupts from the mc146818 clock will get interrupts that are close to the |
|
|
host's clock. That is, if the emulated OS says it wants 100 interrupts per |
|
|
second, it will get approximately 100 interrupts per real second. |
|
99 |
|
|
|
<p> |
|
|
There is however a <tt><b>-I</b></tt> option, which sets the number of |
|
|
emulated cycles per second to a fixed value. Let's say you wish to make the |
|
|
emulated OS think it is running on a 40 MHz DECstation, and not a 7 MHz one, |
|
|
then you can add <tt><b>-I 40000000</b></tt> to the command line. This will not |
|
|
make the emulation faster, of course. It might even make it seem slower; for |
|
|
example, if NetBSD/pmax waits 2 seconds for SCSI devices to settle during |
|
|
bootup, those 2 seconds will take 2*40000000 cycles (which will take more |
|
|
time than 2*7000000). |
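<p>
The arithmetic in the NetBSD/pmax example can be written out as a small
sketch. The helper names are made up, and the model is deliberately crude:
it assumes the host emulates a steady number of cycles per real second,
which is not true in practice.

```c
#include <stdint.h>

/* Emulated cycles needed for a delay, given the clock rate the guest
   believes it is running at (the value passed with -I). */
static uint64_t delay_cycles(uint64_t seconds, uint64_t emulated_hz)
{
	return seconds * emulated_hz;
}

/* Host wall-clock seconds needed to emulate that many cycles, assuming
   (hypothetically) a steady rate of host_cycles_per_sec emulated cycles
   per real second. */
static double host_seconds(uint64_t cycles, double host_cycles_per_sec)
{
	return (double)cycles / host_cycles_per_sec;
}
```

<p>
If the host manages about 7 million emulated cycles per second, a 2-second
guest delay at <tt>-I 40000000</tt> is 80000000 cycles, or roughly 11.4 real
seconds, whereas at 7 MHz it is 14000000 cycles, or 2 real seconds.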
|
100 |
|
|
|
<p> |
|
|
The <b><tt>-I</tt></b> option is also necessary if you want to run |
|
|
deterministic experiments, if a mc146818 (or similar) device is present. |
|
101 |
|
|
|
<p> |
|
|
Some emulators make claims such as "x times slowdown," but in the case of |
|
|
GXemul, the host is often not a MIPS-based machine, and hence comparing |
|
|
one MIPS instruction to a host instruction doesn't work. Performance depends on |
|
|
a lot of factors, including (but not limited to) host architecture, host speed, |
|
|
which compiler and compiler flags were used to build GXemul, what the |
|
|
workload is, and so on. For example, if an emulated operating system tries |
|
|
to read a block from disk, from its point of view the read was instantaneous |
|
|
(no waiting). So 1 MIPS in an emulated OS might have taken more than one |
|
|
million instructions on a real machine. Because of this, imho it is best |
|
|
to measure performance as the actual (real-world) time it takes to perform |
|
|
a task with the emulator. |
|
102 |
|
|
103 |
|
|
104 |
|
|
107 |
<a name="net"></a> |
<a name="net"></a> |
108 |
<h3>Networking</h3> |
<h3>Networking</h3> |
109 |
|
|
110 |
Running an entire operating system under emulation is very interesting in |
<font color="#ff0000">NOTE/TODO: This section is very old.</font> |
111 |
itself, but for several reasons, running a modern OS without access to |
|
112 |
TCP/IP networking is a bit awkward. Hence, I feel the need to implement TCP/IP |
<p>Running an entire operating system under emulation is very interesting |
113 |
(networking) support in the emulator. |
in itself, but for several reasons, running a modern OS without access to |
114 |
|
TCP/IP networking is a bit awkward. Hence, I feel the need to implement |
115 |
|
TCP/IP (networking) support in the emulator. |
116 |
|
|
117 |
<p> |
<p> |
118 |
As far as I have understood it, there seem to be two different ways to go: |
As far as I have understood it, there seem to be two different ways to go: |
303 |
|
|
304 |
|
|
305 |
|
|
306 |
|
|
307 |
|
|
308 |
<p><br> |
<p><br> |
309 |
<a name="devices"></a> |
<a name="devices"></a> |
310 |
<h3>Emulation of hardware devices</h3> |
<h3>Emulation of hardware devices</h3> |
311 |
|
|
312 |
Each file in the device/ directory is responsible for one hardware device. |
Each file called <tt>dev_*.c</tt> in the |
313 |
These are used from src/machine.c, when initializing which hardware a |
<a href="../src/devices/"><tt>src/devices/</tt></a> directory is |
314 |
particular machine model will be using, or when adding devices to a |
responsible for one hardware device. These are used from |
315 |
machine using the <b>device()</b> command in configuration files. |
<a href="../src/machines/"><tt>src/machines</tt></a><tt>/machine_*.c</tt>, |
316 |
|
when initializing which hardware a particular machine model will be using, |
317 |
<p> |
or when adding devices to a machine using the <tt>device()</tt> command in |
318 |
<font color="#ff0000">NOTE: 2005-02-26: I'm currently rewriting the |
<a href="configfiles.html">configuration files</a>. |
|
device registry subsystem.</font> |
|
319 |
|
|
320 |
<p> |
<p>(I'll be using the name "<tt>foo</tt>" as the name of the device in all |
321 |
(I'll be using the name 'foo' as the name of the device in all these |
these examples. This is pseudo code, it might need some modification to |
|
examples. This is pseudo code, it might need some modification to |
|
322 |
actually compile and run.) |
actually compile and run.) |
323 |
|
|
324 |
<p> |
<p>Each device should have the following: |
|
Each device should have the following: |
|
325 |
|
|
326 |
<p> |
<p> |
327 |
<ul> |
<ul> |
328 |
<li>A devinit function in dev_foo.c. It would typically look |
<li>A <tt>devinit</tt> function in <tt>src/devices/dev_foo.c</tt>. It |
329 |
something like this: |
would typically look something like this: |
330 |
<pre> |
<pre> |
331 |
/* |
DEVINIT(foo) |
|
* devinit_foo(): |
|
|
*/ |
|
|
int devinit_foo(struct devinit *devinit) |
|
332 |
{ |
{ |
333 |
struct foo_data *d = malloc(sizeof(struct foo_data)); |
struct foo_data *d; |
334 |
|
|
335 |
if (d == NULL) { |
CHECK_ALLOCATION(d = malloc(sizeof(struct foo_data))); |
336 |
fprintf(stderr, "out of memory\n"); |
memset(d, 0, sizeof(struct foo_data)); |
|
exit(1); |
|
|
} |
|
|
memset(d, 0, sizeof(struct foo_data)); |
|
337 |
|
|
338 |
/* |
/* |
339 |
* Set up stuff here, for example fill d with useful |
* Set up stuff here, for example fill d with useful |
340 |
* data. devinit contains settings like address, irq_nr, |
* data. devinit contains settings like address, irq path, |
341 |
* and other things. |
* and other things. |
342 |
* |
* |
343 |
* ... |
* ... |
344 |
*/ |
*/ |
345 |
|
|
346 |
|
INTERRUPT_CONNECT(devinit->interrupt_path, d->irq); |
347 |
|
|
348 |
memory_device_register(devinit->machine->memory, devinit->name, |
memory_device_register(devinit->machine->memory, devinit->name, |
349 |
devinit->addr, DEV_FOO_LENGTH, |
devinit->addr, DEV_FOO_LENGTH, |
350 |
dev_foo_access, (void *)d, MEM_DEFAULT, NULL); |
dev_foo_access, (void *)d, DM_DEFAULT, NULL); |
351 |
|
|
352 |
/* This should only be here if the device |
/* This should only be here if the device |
353 |
has a tick function: */ |
has a tick function: */ |
359 |
} |
} |
360 |
</pre><br> |
</pre><br> |
361 |
|
|
362 |
<li>At the top of dev_foo.c, the foo_data struct should be defined. |
<p><tt>DEVINIT(foo)</tt> is defined as <tt>int devinit_foo(struct devinit *devinit)</tt>, |
363 |
|
and the <tt>devinit</tt> argument contains everything that the device driver's |
364 |
|
initialization function needs. |
365 |
|
|
366 |
|
<p> |
367 |
|
<li>At the top of <tt>dev_foo.c</tt>, the <tt>foo_data</tt> struct |
368 |
|
should be defined. |
369 |
<pre> |
<pre> |
370 |
struct foo_data { |
struct foo_data { |
371 |
int irq_nr; |
struct interrupt irq; |
372 |
/* ... */ |
/* ... */ |
373 |
}; |
}; |
374 |
</pre><br> |
</pre><br> |
375 |
|
(There is an exception to this rule; some legacy code and other |
376 |
<li>If foo has a tick function (that is, something that needs to be |
ugly hacks have their device structs defined in |
377 |
run at regular intervals) then FOO_TICKSHIFT and a tick function |
<tt>src/include/devices.h</tt> instead of <tt>dev_foo.c</tt>. |
378 |
need to be defined as well: |
New code should not add stuff to <tt>devices.h</tt>.) |
379 |
|
<p> |
380 |
|
<li>If <tt>foo</tt> has a tick function (that is, something that needs to be |
381 |
|
run at regular intervals) then <tt>FOO_TICKSHIFT</tt> and a tick |
382 |
|
function need to be defined as well: |
383 |
<pre> |
<pre> |
384 |
#define FOO_TICKSHIFT 10 |
#define FOO_TICKSHIFT 14 |
385 |
|
|
386 |
void dev_foo_tick(struct cpu *cpu, void *extra) |
DEVICE_TICK(foo) |
387 |
{ |
{ |
388 |
struct foo_data *d = (struct foo_data *) extra; |
struct foo_data *d = extra; |
389 |
|
|
390 |
if (.....) |
if (.....) |
391 |
cpu_interrupt(cpu, d->irq_nr); |
INTERRUPT_ASSERT(d->irq); |
392 |
else |
else |
393 |
cpu_interrupt_ack(cpu, d->irq_nr); |
INTERRUPT_DEASSERT(d->irq); |
394 |
} |
} |
395 |
</pre><br> |
</pre><br> |
396 |
|
|
397 |
|
<li>Does this device belong to a standard bus? |
398 |
|
<ul> |
399 |
|
<li>If this device should be detectable as a PCI device, then |
400 |
|
glue code should be added to |
401 |
|
<tt>src/devices/bus_pci.c</tt>. |
402 |
|
<li>If this is a legacy ISA device which should be usable by |
403 |
|
any machine which has an ISA bus, then the device should |
404 |
|
be added to <tt>src/devices/bus_isa.c</tt>. |
405 |
|
</ul> |
406 |
|
<p> |
407 |
<li>And last but not least, the device should have an access function. |
<li>And last but not least, the device should have an access function. |
408 |
The access function is called whenever there is a load or store |
The access function is called whenever there is a load or store |
409 |
to an address which is in the device's memory mapped region. |
to an address which is in the device's memory mapped region. To |
410 |
<pre> |
simplify things a little, a macro <tt>DEVICE_ACCESS(x)</tt> |
411 |
int dev_foo_access(struct cpu *cpu, struct memory *mem, |
is expanded into<pre> |
412 |
|
int dev_x_access(struct cpu *cpu, struct memory *mem, |
413 |
uint64_t relative_addr, unsigned char *data, size_t len, |
uint64_t relative_addr, unsigned char *data, size_t len, |
414 |
int writeflag, void *extra) |
int writeflag, void *extra) |
415 |
|
</pre> The access function can look like this: |
416 |
|
<pre> |
417 |
|
DEVICE_ACCESS(foo) |
418 |
{ |
{ |
419 |
struct foo_data *d = extra; |
struct foo_data *d = extra; |
420 |
uint64_t idata = 0, odata = 0; |
uint64_t idata = 0, odata = 0; |
421 |
|
|
422 |
idata = memory_readmax64(cpu, data, len); |
if (writeflag == MEM_WRITE) |
423 |
|
idata = memory_readmax64(cpu, data, len); |
424 |
|
|
425 |
switch (relative_addr) { |
switch (relative_addr) { |
426 |
/* .... */ |
|
427 |
|
/* Handle accesses to individual addresses within |
428 |
|
the device here. */ |
429 |
|
|
430 |
|
/* ... */ |
431 |
|
|
432 |
} |
} |
433 |
|
|
434 |
if (writeflag == MEM_READ) |
if (writeflag == MEM_READ) |
465 |
|
|
466 |
|
|
467 |
|
|
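<p>
As a self-contained illustration of the access function pattern described in
the device section, here is a sketch that compiles on its own. It is not the
real GXemul interface: the <tt>cpu</tt> and <tt>mem</tt> arguments are left
out, and the <tt>SK_MEM_READ</tt>/<tt>SK_MEM_WRITE</tt> constants and the
little-endian <tt>sk_readmax64</tt>/<tt>sk_writemax64</tt> helpers are
stand-ins invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_MEM_READ	0	/* stand-in values, not GXemul's definitions */
#define SK_MEM_WRITE	1

struct foo_data {
	uint64_t control;	/* read/write register at offset 0x00 */
	uint64_t write_count;	/* read-only register at offset 0x08 */
};

/* Little-endian helpers standing in for the emulator's own routines. */
static uint64_t sk_readmax64(const unsigned char *data, size_t len)
{
	uint64_t x = 0;
	size_t i;

	for (i = 0; i < len && i < 8; i++)
		x |= (uint64_t)data[i] << (8 * i);
	return x;
}

static void sk_writemax64(unsigned char *data, size_t len, uint64_t x)
{
	size_t i;

	for (i = 0; i < len && i < 8; i++)
		data[i] = (unsigned char)(x >> (8 * i));
}

/* Returns 1 if the access was handled, 0 for an unknown offset. */
static int sk_foo_access(uint64_t relative_addr, unsigned char *data,
	size_t len, int writeflag, void *extra)
{
	struct foo_data *d = extra;
	uint64_t idata = 0, odata = 0;

	if (writeflag == SK_MEM_WRITE)
		idata = sk_readmax64(data, len);

	switch (relative_addr) {
	case 0x00:
		if (writeflag == SK_MEM_WRITE) {
			d->control = idata;
			d->write_count++;
		} else
			odata = d->control;
		break;
	case 0x08:
		/* Writes to the read-only counter are ignored. */
		if (writeflag == SK_MEM_READ)
			odata = d->write_count;
		break;
	default:
		return 0;
	}

	if (writeflag == SK_MEM_READ)
		sk_writemax64(data, len, odata);

	return 1;
}
```

<p>
The caller passes the bytes being stored (or a buffer to fill, for loads)
together with the offset relative to the start of the device's region, so
the function never needs to know where the device is mapped.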
|
<p><br> |
|
|
<a name="regtest"></a> |
|
|
<h3>Regression tests</h3> |
|
|
|
|
|
In order to make sure that the emulator actually works like it is supposed |
|
|
to, it must be tested. For this purpose, there is a simple regression |
|
|
testing framework in the <tt>tests/</tt> directory. |
|
|
|
|
|
<p> |
|
|
<i>NOTE: The regression testing framework is basically just a skeleton so far. |
|
|
Regression tests are very good to have. However, the fact that complete |
|
|
operating systems can run in the emulator indicates that the emulation is |
|
|
probably not too incorrect. This makes it less of a priority to write |
|
|
regression tests.</i> |
|
|
|
|
|
<p> |
|
|
To run all the regression tests, type <tt>make regtest</tt>. Each assembly |
|
|
language file matching the pattern <tt>test_*.S</tt> will be compiled and |
|
|
linked into a 64-bit MIPS ELF (using a gcc cross compiler), and run in the |
|
|
emulator. If everything goes well, you should see something like this: |
|
|
|
|
|
<pre> |
|
|
$ make regtest |
|
|
cd tests; make run_tests; cd .. |
|
|
gcc33 -Wall -fomit-frame-pointer -fmove-all-movables -fpeephole -O2 |
|
|
-mcpu=ev5 -I/usr/X11R6/include -lm -L/usr/X11R6/lib -lX11 do_tests.c |
|
|
-o do_tests |
|
|
do_tests.c: In function `main': |
|
|
do_tests.c:173: warning: unused variable `s' |
|
|
/var/tmp//ccFOupvD.o: In function `do_tests': |
|
|
/var/tmp//ccFOupvD.o(.text+0x3a8): warning: tmpnam() possibly used |
|
|
unsafely; consider using mkstemp() |
|
|
mips64-unknown-elf-gcc -g -O3 -fno-builtin -fschedule-insns -mips64 |
|
|
-mabi=64 test_common.c -c -o test_common.o |
|
|
./do_tests "mips64-unknown-elf-gcc -g -O3 -fno-builtin -fschedule-insns |
|
|
-mips64 -mabi=64" "mips64-unknown-elf-as -mabi=64 -mips64" |
|
|
"mips64-unknown-elf-ld -Ttext 0xa800000000030000 -e main |
|
|
--oformat=elf64-bigmips" "../gxemul" |
|
|
|
|
|
Starting tests: |
|
|
test_addu.S (-a) |
|
|
test_addu.S (-a -b) |
|
|
test_clo_clz.S (-a) |
|
|
test_clo_clz.S (-a -b) |
|
|
.. |
|
|
test_unaligned.S (-a) |
|
|
test_unaligned.S (-a -b) |
|
|
|
|
|
Done. (12 tests done) |
|
|
PASS: 12 |
|
|
FAIL: 0 |
|
|
|
|
|
---------------- |
|
|
|
|
|
All tests OK |
|
|
|
|
|
---------------- |
|
|
</pre> |
|
|
|
|
|
<p> |
|
|
Each test writes output to stdout, and there is a <tt>test_*.good</tt> for |
|
|
each <tt>.S</tt> file which contains the wanted output. If the actual |
|
|
output matches the <tt>.good</tt> file, then the test passes, otherwise it |
|
|
fails. |
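<p>
The pass/fail rule above amounts to a byte-for-byte comparison. The following
is only a sketch of that idea, with a made-up function name; it is not taken
from <tt>do_tests.c</tt>, which may do the comparison differently.

```c
#include <stdio.h>

/* Returns 1 if both streams hold exactly the same bytes, 0 otherwise.
   "actual" would be the captured output of a test, and "good" the
   contents of the corresponding .good file. */
static int streams_match(FILE *actual, FILE *good)
{
	int a, g;

	do {
		a = fgetc(actual);
		g = fgetc(good);
		if (a != g)
			return 0;	/* differing byte, or one stream ended early */
	} while (a != EOF);

	return 1;
}
```

<p>
A test would then count as PASS when <tt>streams_match()</tt> returns 1 and
as FAIL otherwise.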
|
|
|
|
|
<p> |
|
|
Read <tt>tests/README</tt> for more information. |
|
|
|
|
|
|
|
|
|
|
468 |
|
|
469 |
</body> |
</body> |
470 |
</html> |
</html> |