--- trunk/doc/technical.html 2007/10/08 16:18:19 8
+++ trunk/doc/technical.html 2007/10/08 16:19:23 20
@@ -1,16 +1,16 @@
-
-This page describes some of the internals of GXemul.
-
-
-NOTE: This page is probably not
-very up-to-date by now.
+
+This page describes some of the internals of GXemul.
-In reality, a lot of things need to be handled. Before each instruction is
-executed, the emulator checks to see if any interrupts are asserted which
-are not masked away. If so, then an INT exception is generated. Exceptions
-cause the program counter to be set to a specific value, and some of the
-system coprocessor's registers to be set to values signifying what kind of
-exception it was (an interrupt exception in this case).
-
-
-Reading instructions from memory is done through a TLB, a translation
-lookaside buffer. The TLB on MIPS is software controlled, which means that
-the program running inside the emulator (for example an operating system
-kernel) has to take care of manually updating the TLB. Some memory
-addresses are translated into physical addresses directly, some are
-translated into valid physical addresses via the TLB, and some memory
-references are not valid. Invalid memory references cause exceptions.
-
-
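The address translation described above can be sketched in C. This is a simplified model under stated assumptions, not GXemul's implementation: it covers only the 32-bit MIPS segments, uses 4 KB pages, and ignores ASIDs and the valid/dirty bits. The names `translate()`, `tlb_write()` and `struct tlb_entry` are hypothetical. kseg0 and kseg1 map directly onto the low 512 MB of physical memory, kuseg and kseg2 go through the TLB, and a user-mode access to the upper half of the address space is an address error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model: 4 KB pages, no ASIDs,
 * no valid/dirty bits, 32-bit MIPS segments only. */
#define PAGE_SHIFT  12
#define TLB_ENTRIES 8

struct tlb_entry {
    bool     used;
    uint32_t vpn;   /* virtual page number */
    uint32_t pfn;   /* physical frame number */
};

static struct tlb_entry tlb[TLB_ENTRIES];

enum xlate_result { XLATE_OK, XLATE_TLB_MISS, XLATE_ADDR_ERROR };

/* Translate vaddr into *paddr. On real hardware, a TLB miss or an
 * address error raises an exception instead of returning a code. */
static enum xlate_result translate(uint32_t vaddr, bool kernel_mode,
                                   uint32_t *paddr)
{
    assert(paddr != NULL);
    if (vaddr >= 0x80000000u && !kernel_mode)
        return XLATE_ADDR_ERROR;            /* user access to kernel space */
    if (vaddr >= 0x80000000u && vaddr < 0xc0000000u) {
        *paddr = vaddr & 0x1fffffffu;       /* kseg0/kseg1: direct-mapped */
        return XLATE_OK;
    }
    /* kuseg and kseg2 are mapped through the software-managed TLB. */
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].used && tlb[i].vpn == vpn) {
            *paddr = (tlb[i].pfn << PAGE_SHIFT) |
                     (vaddr & ((1u << PAGE_SHIFT) - 1));
            return XLATE_OK;
        }
    }
    return XLATE_TLB_MISS;
}

/* What the guest kernel's TLB refill handler does, in software. */
static void tlb_write(int index, uint32_t vpn, uint32_t pfn)
{
    assert(index >= 0 && index < TLB_ENTRIES);
    tlb[index] = (struct tlb_entry){ true, vpn, pfn };
}
```

On a miss, a real CPU raises a TLB refill exception. The guest kernel's handler then writes the missing entry (the role of `tlb_write()` here), and the faulting instruction is retried.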
-After an instruction has been read from memory, the emulator checks which
-opcode it contains and executes the instruction. Executing an instruction
-usually involves reading some register and writing some register, or perhaps a
-load from memory (or a store to memory). The program counter is increased
-for every instruction.
-
-
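Combining the interrupt check with the fetch/decode/execute cycle described above gives roughly the following loop body. It is a hypothetical sketch, not GXemul code, and it makes several simplifications:

- `struct cpu`, `cpu_step()` and the flat `mem` array are assumptions.
- Only ADDU is decoded.
- Branch delay slots are ignored.
- When an interrupt is taken, it just clears Status.IE rather than modelling the R3000/R4000 mode stacks.

The Status/Cause bit positions follow the MIPS architecture: IE is bit 0, IM and IP are bits 8 to 15, and ExcCode is bits 2 to 6. The vector 0x80000080 is the R3000 general exception vector. R4000-style CPUs use 0x80000180.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical minimal CPU state; field names are illustrative. */
struct cpu {
    uint32_t pc;
    uint32_t gpr[32];
    uint32_t status;   /* CP0 Status: IE = bit 0, IM = bits 8..15 */
    uint32_t cause;    /* CP0 Cause: IP = bits 8..15, ExcCode = bits 2..6 */
    uint32_t epc;
};

#define STATUS_IE        0x00000001u
#define INT_BITS         0x0000ff00u
#define EXCCODE_INT      0u
#define EXCEPTION_VECTOR 0x80000080u  /* R3000-style general vector */

/* Returns 1 if an interrupt exception was taken, 0 if an instruction ran. */
static int cpu_step(struct cpu *cpu, const uint32_t *mem)
{
    /* 1. Is any interrupt asserted that is not masked away? */
    if ((cpu->status & STATUS_IE) &&
        (cpu->cause & cpu->status & INT_BITS)) {
        cpu->epc = cpu->pc;
        cpu->cause = (cpu->cause & ~0x7cu) | (EXCCODE_INT << 2);
        cpu->status &= ~STATUS_IE;    /* simplified mode change */
        cpu->pc = EXCEPTION_VECTOR;
        return 1;
    }

    /* 2. Fetch. The PC is used directly as a byte offset into mem;
     *    a real emulator would translate it through the TLB first. */
    assert(cpu->pc % 4 == 0);
    uint32_t instr = mem[cpu->pc / 4];

    /* 3. Decode and execute: only ADDU (SPECIAL, funct 0x21) here. */
    uint32_t op = instr >> 26, funct = instr & 0x3f;
    uint32_t rs = (instr >> 21) & 31, rt = (instr >> 16) & 31;
    uint32_t rd = (instr >> 11) & 31;
    if (op == 0 && funct == 0x21 && rd != 0)   /* $zero stays 0 */
        cpu->gpr[rd] = cpu->gpr[rs] + cpu->gpr[rt];

    /* 4. Advance the program counter. */
    cpu->pc += 4;
    return 0;
}
```

For example, the word 0x00221821 encodes `addu $3, $1, $2`. With `$1 = 40` and `$2 = 2`, one call to `cpu_step()` leaves 42 in `$3` and the PC at 4.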
-Some memory references point to physical addresses which are not in the
-normal RAM address space. They may point to hardware devices. If that is
-the case, then loads and stores are converted into calls to a device
-access function. The device access function is then responsible for
-handling these reads and writes. For example, a graphical framebuffer
-device may put a pixel on the screen when a value is written to it, or a
-serial controller device may output a character to stdout when written to.
-
-
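Converting loads and stores into device access calls can be sketched as an address-range lookup. Everything named here is hypothetical: `struct device`, `memory_rw()`, `dev_serial_access()`, the 64 KB RAM and the device base address. GXemul's own access functions, such as the `dev_foo_access` registered through `memory_device_register()` in the device example, use a different signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical access-function type: offset is relative to the device base. */
typedef int (*dev_access_fn)(uint64_t offset, uint8_t *data,
                             int writeflag, void *extra);

struct device {
    uint64_t      base, length;
    dev_access_fn access;
    void         *extra;
};

#define RAM_SIZE 0x10000u
static uint8_t ram[RAM_SIZE];

/* A "serial controller": every byte written to it goes to stdout. */
static int dev_serial_access(uint64_t offset, uint8_t *data,
                             int writeflag, void *extra)
{
    int *bytes_written = extra;
    (void)offset;
    if (writeflag) {
        putchar(*data);
        (*bytes_written)++;
    } else {
        *data = 0;              /* reads return 0 in this sketch */
    }
    return 1;
}

/* Route a one-byte load or store either to RAM or to a device. */
static int memory_rw(struct device *devs, size_t ndevs, uint64_t paddr,
                     uint8_t *data, int writeflag)
{
    assert(data != NULL);
    if (paddr < RAM_SIZE) {
        if (writeflag)
            ram[paddr] = *data;
        else
            *data = ram[paddr];
        return 1;
    }
    for (size_t i = 0; i < ndevs; i++)
        if (paddr >= devs[i].base && paddr - devs[i].base < devs[i].length)
            return devs[i].access(paddr - devs[i].base, data,
                                  writeflag, devs[i].extra);
    return 0;   /* nothing mapped here: the caller should raise a bus error */
}
```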
-Mode a is very slow. On a 2.8 GHz Intel Xeon host the resulting
-emulated machine is rougly equal to a 7 MHz R3000 (or a 3.5 MHz R4000).
-The actual performance varies a lot, maybe between 5 and 10 million
-instructions per second, depending on workload.
+So, how fast is GXemul? There is no short answer to this. There is
+especially no answer to the question "What is the slowdown factor?",
+because the host architecture and the emulated architecture usually cannot
+be compared directly.
+
+
+Performance depends on several factors, including (but not limited to)
+host architecture, host clock speed, which compiler and compiler flags
+were used to build the emulator, what the workload is, and so on. For
+example, if an emulated operating system tries to read a block from disk,
+from its point of view the read was instantaneous (no waiting). So 1 MIPS
+in an emulated OS might have taken more than one million instructions on a
+real machine.
+
+
+Also, if the emulator says it has executed 1 million instructions, and
+the CPU family in question was capable of scalar execution (i.e. one cycle
+per instruction), it might still have taken more than 1 million cycles on
+a real machine because of cache misses and similar micro-architectural
+penalties that are not simulated by GXemul.
+
+
+Because of these issues, it is in my opinion best to measure
+performance as the actual (real-world) time it takes to perform a task
+with the emulator. Typical examples would be "How long does it take to
+install NetBSD?", or "How long does it take to compile XYZ inside NetBSD
+in the emulator?".
+
+
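Measuring the wall-clock time of a task, as suggested above, needs only a monotonic clock. The helper below is a minimal sketch built on POSIX `clock_gettime(CLOCK_MONOTONIC, ...)`. `time_task()` and the `spin()` workload are hypothetical placeholders for whatever task (an install, a compile) is being measured.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Hypothetical workload: busy-loop for *(unsigned long *)arg iterations. */
static void spin(void *arg)
{
    volatile unsigned long sink = 0;
    unsigned long n = *(unsigned long *)arg;
    for (unsigned long i = 0; i < n; i++)
        sink += i;
}

/* Return the wall-clock seconds spent running task(arg). A monotonic
 * clock is used so that host clock adjustments do not skew the result. */
static double time_task(void (*task)(void *), void *arg)
{
    struct timespec t0, t1;
    assert(task != NULL);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    task(arg);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

In practice the task would start the emulator, for example through `system()`. The exact GXemul command line depends on the machine being emulated, so it is not shown here. A shell's `time` builtin gives the same measurement without any code.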
+So, how fast is it? :-) Answer: it varies.
+
+
+The emulation technique used varies depending on which processor type
+is being emulated. (One of my main goals with GXemul is to experiment with
+different kinds of emulation, so these might change in the future.)
-
-Mode b ("bintrans") is still to be considered experimental, but
-gives higher performance than mode a. It translates MIPS machine
-code into machine code that can be executed on the host machine
-on-the-fly. The translation itself obviously takes some time, but this is
-usually made up for by the fact that the translated code chunks are
-executed multiple times.
-To run the emulator with binary translation enabled, just add
--b to the command line.
-
-
-Only small pieces of MIPS machine code are translated, usually the size of
-a function, or less. There is no "intermediate representation" code, so
-all translations are done directly from MIPS to host machine code.
-
-
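The trade-off described above (translate once, then run many times) is usually implemented with a cache keyed by guest address. The sketch below is hypothetical. `struct chunk`, `lookup_or_translate()` and the 1024-bucket hash table are assumptions. `translate_chunk()` only records the address, where a real translator would emit host machine code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TC_BUCKETS 1024   /* hypothetical table size, a power of two */

/* A translated chunk; a real one would also point at host code. */
struct chunk {
    uint64_t      guest_pc;
    struct chunk *next;        /* hash-bucket chain */
    unsigned long times_run;
};

static struct chunk *cache[TC_BUCKETS];
static unsigned long translations;   /* how often the slow path ran */

/* Slow path: "translate" the chunk starting at pc. */
static struct chunk *translate_chunk(uint64_t pc)
{
    struct chunk *c = calloc(1, sizeof *c);
    assert(c != NULL);
    c->guest_pc = pc;
    translations++;
    return c;
}

/* Fast path: reuse an existing translation, translating only on a miss. */
static struct chunk *lookup_or_translate(uint64_t pc)
{
    size_t b = (size_t)(pc >> 2) & (TC_BUCKETS - 1);
    for (struct chunk *c = cache[b]; c != NULL; c = c->next)
        if (c->guest_pc == pc)
            return c;
    struct chunk *c = translate_chunk(pc);
    c->next = cache[b];
    cache[b] = c;
    return c;
}
```

Running the same chunk a thousand times costs one translation and 999 cache hits. That is why the translation time is usually made up for. A real cache also needs a size limit and a flush policy. This sketch never frees anything.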
-The default bintrans cache size is 16 MB, but you can change this by adding
--DDEFAULT_BINTRANS_SIZE_IN_MB=xx to your CFLAGS environment
-variable before running the configure script, or by using the
-bintrans_size() configuration file option when running the emulator.
-
-
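The `-DDEFAULT_BINTRANS_SIZE_IN_MB=xx` mechanism relies on the usual C preprocessor idiom for an overridable default. The fragment below sketches that idiom and is not GXemul's source. `bintrans_cache_bytes()` is hypothetical. It assumes that a nonzero size given through the `bintrans_size()` configuration option takes precedence over the compiled-in value.

```c
#include <assert.h>
#include <stddef.h>

/* Compiled-in default; CFLAGS containing
 * -DDEFAULT_BINTRANS_SIZE_IN_MB=32 replaces it before this point. */
#ifndef DEFAULT_BINTRANS_SIZE_IN_MB
#define DEFAULT_BINTRANS_SIZE_IN_MB 16
#endif

/* Cache size in bytes. runtime_mb == 0 means "not set at runtime". */
static size_t bintrans_cache_bytes(size_t runtime_mb)
{
    size_t mb = runtime_mb != 0 ? runtime_mb
                                : (size_t)DEFAULT_BINTRANS_SIZE_IN_MB;
    assert(mb > 0);
    return mb * 1024 * 1024;
}
```

Compiling with `-DDEFAULT_BINTRANS_SIZE_IN_MB=32` in CFLAGS makes `bintrans_cache_bytes(0)` return 32 MB instead of 16 MB. This is why the variable has to be set before running the configure script.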
-By default, an emulated OS running under DECstation emulation which listens to
-interrupts from the mc146818 clock will get interrupts that are close to the
-host's clock. That is, if the emulated OS says it wants 100 interrupts per
-second, it will get approximately 100 interrupts per real second.
-
-
-There is however a -I option, which sets the number of
-emulated cycles per seconds to a fixed value. Let's say you wish to make the
-emulated OS think it is running on a 40 MHz DECstation, and not a 7 MHz one,
-then you can add -I 40000000 to the command line. This will not
-make the emulation faster, of course. It might even make it seem slower; for
-example, if NetBSD/pmax waits 2 seconds for SCSI devices to settle during
-bootup, those 2 seconds will take 2*40000000 cycles (which will take more
-time than 2*7000000).
+
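Because -I fixes the number of emulated cycles per second, timer intervals and delays become fixed cycle counts instead of depending on host speed. The helpers below (hypothetical names) restate the arithmetic from the example above:

- At -I 40000000, a guest asking for 100 clock interrupts per second gets one every 400000 emulated cycles.
- A 2-second SCSI settle delay costs 80000000 cycles, compared with 14000000 at 7 MHz.

If the host manages roughly 7 million emulated cycles per second (the mode a figure quoted earlier), that delay would take about 11 real seconds instead of 2.

```c
#include <assert.h>
#include <stdint.h>

/* Emulated cycles between timer interrupts when the guest asks for hz
 * interrupts per second and the emulated clock is fixed by -I. */
static uint64_t cycles_per_interrupt(uint64_t cycles_per_sec, unsigned hz)
{
    assert(hz > 0);
    return cycles_per_sec / hz;
}

/* Emulated cycles consumed by a guest delay of the given length. */
static uint64_t cycles_for_delay(uint64_t cycles_per_sec, unsigned seconds)
{
    return cycles_per_sec * seconds;
}
```

Since the interrupt schedule then depends only on the cycle count and not on the host clock, this is presumably also why -I is needed for deterministic experiments.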
+
-The -I option is also necessary if you want to run
-deterministic experiments, if a mc146818 (or similar) device is present.
-Some emulators make claims such as "x times slowdown," but in the case of
-GXemul, the host is often not a MIPS-based machine, and hence comparing
-one MIPS instruction to a host instruction doesn't work. Performance depends on
-a lot of factors, including (but not limited to) host architecture, host speed,
-which compiler and compiler flags were used to build GXemul, what the
-workload is, and so on. For example, if an emulated operating system tries
-to read a block from disk, from its point of view the read was instantaneous
-(no waiting). So 1 MIPS in an emulated OS might have taken more than one
-million instructions on a real machine. Because of this, imho it is best
-to measure performance as the actual (real-world) time it takes to perform
-a task with the emulator.
@@ -183,10 +125,13 @@
Running an entire operating system under emulation is very interesting
+in itself, but for several reasons, running a modern OS without access to
+TCP/IP networking is a bit awkward. Hence, I feel the need to implement
+TCP/IP (networking) support in the emulator.
As far as I have understood it, there seem to be two different ways to go:
@@ -377,31 +322,28 @@
+
-NOTE: 2005-02-26: I'm currently rewriting the
-device registry subsystem.
-
-
-(I'll be using the name 'foo' as the name of the device in all these
-examples. This is pseudo code, it might need some modification to
+
+(I'll be using the name "foo" as the name of the device in all
+these examples. This is pseudo code; it might need some modification to
 actually compile and run.)
-
-Each device should have the following:
+
+Each device should have the following:
 /*
  *  devinit_foo():
@@ -426,7 +368,7 @@
     memory_device_register(devinit->machine->memory, devinit->name,
         devinit->addr, DEV_FOO_LENGTH,
-        dev_foo_access, (void *)d, MEM_DEFAULT, NULL);
+        dev_foo_access, (void *)d, DM_DEFAULT, NULL);
     /*  This should only be here if the device has a tick function:  */
@@ -438,19 +380,25 @@
 }
 struct foo_data {
     int     irq_nr;
     /*  ...  */
 };
+
- #define FOO_TICKSHIFT 10
+ #define FOO_TICKSHIFT 14
 void dev_foo_tick(struct cpu *cpu, void *extra)
 {
@@ -463,6 +411,16 @@
 }
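Changing the tick shift from 10 to 14 makes the tick function run 16 times less often. The sketch below shows the effect. It assumes, as the name suggests, that a tick function registered with shift `s` is called once every 2^s emulated cycles. `emulate()` is a hypothetical stand-in for the emulator's main loop. Its `dev_foo_tick()` takes only the `extra` pointer, not the `(struct cpu *, void *)` signature used in the device example.

```c
#include <assert.h>
#include <stdint.h>

#define FOO_TICKSHIFT 14

/* Hypothetical per-device state (compare struct foo_data above). */
struct foo_data {
    int           irq_nr;
    unsigned long ticks;
};

/* Simplified tick function: a real device might poll for input
 * or assert irq_nr here. */
static void dev_foo_tick(void *extra)
{
    struct foo_data *d = extra;
    d->ticks++;
}

/* Stand-in for the main loop: run ncycles cycles and call tick()
 * whenever the cycle count is a multiple of 2^shift. */
static void emulate(uint64_t ncycles, int shift,
                    void (*tick)(void *), void *extra)
{
    assert(shift >= 0 && shift < 64);
    const uint64_t mask = (UINT64_C(1) << shift) - 1;
    for (uint64_t cycle = 1; cycle <= ncycles; cycle++) {
        /* ... one emulated instruction would execute here ... */
        if ((cycle & mask) == 0)
            tick(extra);
    }
}
```

Over 2^20 cycles, a shift of 14 gives 64 calls and a shift of 10 gives 1024. A larger shift reduces tick overhead, but the device reacts more slowly.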
-NOTE: The regression testing framework is basically just a skeleton so far.
-Regression tests are very good to have. However, the fact that complete
-operating systems can run in the emulator indicate that the emulation is
-probably not too incorrect. This makes it less of a priority to write
-regression tests.
-
-
-To run all the regression tests, type make regtest. Each assembly
-language file matching the pattern test_*.S will be compiled and
-linked into a 64-bit MIPS ELF (using a gcc cross compiler), and run in the
-emulator. If everything goes well, you should see something like this:
-
-
-  $ make regtest
-  cd tests; make run_tests; cd ..
-  gcc33 -Wall -fomit-frame-pointer -fmove-all-movables -fpeephole -O2
-    -mcpu=ev5 -I/usr/X11R6/include -lm -L/usr/X11R6/lib -lX11 do_tests.c
-    -o do_tests
-  do_tests.c: In function `main':
-  do_tests.c:173: warning: unused variable `s'
-  /var/tmp//ccFOupvD.o: In function `do_tests':
-  /var/tmp//ccFOupvD.o(.text+0x3a8): warning: tmpnam() possibly used
-    unsafely; consider using mkstemp()
-  mips64-unknown-elf-gcc -g -O3 -fno-builtin -fschedule-insns -mips64
-    -mabi=64 test_common.c -c -o test_common.o
-  ./do_tests "mips64-unknown-elf-gcc -g -O3 -fno-builtin -fschedule-insns
-    -mips64 -mabi=64" "mips64-unknown-elf-as -mabi=64 -mips64"
-    "mips64-unknown-elf-ld -Ttext 0xa800000000030000 -e main
-    --oformat=elf64-bigmips" "../gxemul"
-
-  Starting tests:
-  test_addu.S (-a)
-  test_addu.S (-a -b)
-  test_clo_clz.S (-a)
-  test_clo_clz.S (-a -b)
-  ..
-  test_unaligned.S (-a)
-  test_unaligned.S (-a -b)
-
-  Done. (12 tests done)
-      PASS:   12
-      FAIL:    0
-
-  ----------------
-
-    All tests OK
-
-  ----------------
-
-Each test writes output to stdout, and there is a test_*.good for
-each .S file which contains the wanted output. If the actual
-output matches the .good file, then the test passes, otherwise it
-fails.
-
-
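The pass/fail rule above amounts to comparing a test's captured output with its .good file. The sketch below assumes an exact byte-for-byte comparison; the page does not say whether do_tests ignores, for example, trailing whitespace. `files_match()` is a hypothetical helper, not part of do_tests.c.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if both files exist and have identical contents, else 0. */
static int files_match(const char *actual_path, const char *good_path)
{
    FILE *a = fopen(actual_path, "rb");
    FILE *g = fopen(good_path, "rb");
    int same = (a != NULL && g != NULL);

    while (same) {
        int ca = fgetc(a), cg = fgetc(g);
        if (ca != cg)
            same = 0;           /* differing byte, or one file is shorter */
        else if (ca == EOF)
            break;              /* both ended together: identical */
    }
    if (a != NULL)
        fclose(a);
    if (g != NULL)
        fclose(g);
    return same;
}
```

For test_addu.S, the captured output would be compared with test_addu.good, and a return value of 1 counts as PASS.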
-Read tests/README for more information.
-
-