--- trunk/doc/technical.html 2007/10/08 16:18:19 8 +++ trunk/doc/technical.html 2007/10/08 16:22:32 42 @@ -1,18 +1,18 @@ -
|
-This page describes some of the internals of GXemul. - -
-NOTE: This page is probably not -very up-to-date by now. +
This page describes some of the internals of GXemul.
-In reality, a lot of things need to be handled. Before each instruction is -executed, the emulator checks to see if any interrupts are asserted which -are not masked away. If so, then an INT exception is generated. Exceptions -cause the program counter to be set to a specific value, and some of the -system coprocessor's registers to be set to values signifying what kind of -exception it was (an interrupt exception in this case). - -
-Reading instructions from memory is done through a TLB, a translation -lookaside buffer. The TLB on MIPS is software controlled, which means that -the program running inside the emulator (for example an operating system -kernel) has to take care of manually updating the TLB. Some memory -addresses are translated into physical addresses directly, some are -translated into valid physical addresses via the TLB, and some memory -references are not valid. Invalid memory references cause exceptions. - -
-After an instruction has been read from memory, the emulator checks which -opcode it contains and executes the instruction. Executing an instruction -usually involves reading some register and writing some register, or perhaps a -load from memory (or a store to memory). The program counter is increased -for every instruction. - -
-Some memory references point to physical addresses which are not in the -normal RAM address space. They may point to hardware devices. If that is -the case, then loads and stores are converted into calls to a device -access function. The device access function is then responsible for -handling these reads and writes. For example, a graphical framebuffer -device may put a pixel on the screen when a value is written to it, or a -serial controller device may output a character to stdout when written to. - -
-Mode a is very slow. On a 2.8 GHz Intel Xeon host the resulting -emulated machine is rougly equal to a 7 MHz R3000 (or a 3.5 MHz R4000). -The actual performance varies a lot, maybe between 5 and 10 million -instructions per second, depending on workload. +
-Mode b ("bintrans") is still to be considered experimental, but -gives higher performance than mode a. It translates MIPS machine -code into machine code that can be executed on the host machine -on-the-fly. The translation itself obviously takes some time, but this is -usually made up for by the fact that the translated code chunks are -executed multiple times. -To run the emulator with binary translation enabled, just add --b to the command line. - -
-Only small pieces of MIPS machine code are translated, usually the size of -a function, or less. There is no "intermediate representation" code, so -all translations are done directly from MIPS to host machine code. +So, how fast is GXemul? There is no short answer to this. There is +especially no answer to the question What is the slowdown factor?, +because the host architecture and emulated architecture can usually not be +compared just like that. + +
Performance depends on several factors, including (but not limited to) +host architecture, target architecture, host clock speed, which compiler +and compiler flags were used to build the emulator, what the workload is, +what additional runtime flags are given to the emulator, and so on. + +
Devices are generally not timing-accurate: for example, if an emulated +operating system tries to read a block from disk, from its point of view +the read was instantaneous (no waiting). So 1 MIPS in an emulated OS might +have taken more than one million instructions on a real machine. + +
Also, if the emulator says it has executed 1 million instructions, and +the CPU family in question was capable of scalar execution (i.e. one cycle +per instruction), it might still have taken more than 1 million cycles on +a real machine because of cache misses and similar micro-architectural +penalties that are not simulated by GXemul. + +
Because of these issues, it is in my opinion best to measure +performance as the actual (real-world) time it takes to perform a task +with the emulator, e.g.: -
-The default bintrans cache size is 16 MB, but you can change this by adding --DDEFAULT_BINTRANS_SIZE_IN_MB=xx to your CFLAGS environment -variable before running the configure script, or by using the -bintrans_size() configuration file option when running the emulator. +
-By default, an emulated OS running under DECstation emulation which listens to -interrupts from the mc146818 clock will get interrupts that are close to the -host's clock. That is, if the emulated OS says it wants 100 interrupts per -second, it will get approximately 100 interrupts per real second. +
So, how fast is it? :-) Answer: it varies. -
-There is however a -I option, which sets the number of -emulated cycles per seconds to a fixed value. Let's say you wish to make the -emulated OS think it is running on a 40 MHz DECstation, and not a 7 MHz one, -then you can add -I 40000000 to the command line. This will not -make the emulation faster, of course. It might even make it seem slower; for -example, if NetBSD/pmax waits 2 seconds for SCSI devices to settle during -bootup, those 2 seconds will take 2*40000000 cycles (which will take more -time than 2*7000000). -
-The -I option is also necessary if you want to run -deterministic experiments, if a mc146818 (or similar) device is present. -
-Some emulators make claims such as "x times slowdown," but in the case of -GXemul, the host is often not a MIPS-based machine, and hence comparing -one MIPS instruction to a host instruction doesn't work. Performance depends on -a lot of factors, including (but not limited to) host architecture, host speed, -which compiler and compiler flags were used to build GXemul, what the -workload is, and so on. For example, if an emulated operating system tries -to read a block from disk, from its point of view the read was instantaneous -(no waiting). So 1 MIPS in an emulated OS might have taken more than one -million instructions on a real machine. Because of this, imho it is best -to measure performance as the actual (real-world) time it takes to perform -a task with the emulator. @@ -183,10 +107,12 @@
Running an entire operating system under emulation is very interesting +in itself, but for several reasons, running a modern OS without access to +TCP/IP networking is a bit awkward. Hence, I feel the need to implement +TCP/IP (networking) support in the emulator.
As far as I have understood it, there seem to be two different ways to go: @@ -377,56 +303,51 @@ + +
-NOTE: 2005-02-26: I'm currently rewriting the -device registry subsystem. +Each file called dev_*.c in the +src/devices/ directory is +responsible for one hardware device. These are used from +src/machines/machine_*.c, +when initializing which hardware a particular machine model will be using, +or when adding devices to a machine using the device() command in +configuration files. -
-(I'll be using the name 'foo' as the name of the device in all these -examples. This is pseudo code, it might need some modification to +
(I'll be using the name "foo" as the name of the device in all +these examples. This is pseudo code, it might need some modification to actually compile and run.) -
-Each device should have the following: +
Each device should have the following:
- /* - * devinit_foo(): - */ - int devinit_foo(struct devinit *devinit) + DEVINIT(foo) { - struct foo_data *d = malloc(sizeof(struct foo_data)); + struct foo_data *d; - if (d == NULL) { - fprintf(stderr, "out of memory\n"); - exit(1); - } - memset(d, 0, sizeof(struct foon_data)); + CHECK_ALLOCATION(d = malloc(sizeof(struct foo_data))); + memset(d, 0, sizeof(struct foo_data)); /* * Set up stuff here, for example fill d with useful - * data. devinit contains settings like address, irq_nr, + * data. devinit contains settings like address, irq path, * and other things. * * ... */ + + INTERRUPT_CONNECT(devinit->interrupt_path, d->irq); memory_device_register(devinit->machine->memory, devinit->name, devinit->addr, DEV_FOO_LENGTH, - dev_foo_access, (void *)d, MEM_DEFAULT, NULL); + dev_foo_access, (void *)d, DM_DEFAULT, NULL); /* This should only be here if the device has a tick function: */ @@ -438,45 +359,76 @@ }
DEVINIT(foo) is defined as int devinit_foo(struct devinit *devinit), + and the devinit argument contains everything that the device driver's + initialization function needs. + +
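The expansion of DEVINIT(foo) described above can be sketched in isolation. This is only an illustration: the struct devinit here is a cut-down, hypothetical stand-in with a single name field, the macro definition is an assumption reconstructed from the sentence above (not copied from the GXemul headers), and returning 1 for success is likewise assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, cut-down stand-in for the real struct devinit. */
struct devinit {
	const char *name;
};

/* Assumed definition, reconstructed from the description above. */
#define DEVINIT(x)	int devinit_##x(struct devinit *devinit)

/* Expands to: int devinit_foo(struct devinit *devinit) */
DEVINIT(foo)
{
	assert(devinit != NULL);
	printf("initializing device '%s'\n", devinit->name);

	return 1;	/* assumed "success" value in this sketch */
}
```

Calling devinit_foo() with a pointer to a filled-in struct devinit runs the body once; in this sketch it just prints the device name and returns 1.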
+
struct foo_data { - int irq_nr; + struct interrupt irq; /* ... */ };
+
- #define FOO_TICKSHIFT 10 + #define FOO_TICKSHIFT 14 - void dev_foo_tick(struct cpu *cpu, void *extra) + DEVICE_TICK(foo) { - struct foo_data *d = (struct foo_data *) extra; + struct foo_data *d = extra; if (.....) - cpu_interrupt(cpu, d->irq_nr); + INTERRUPT_ASSERT(d->irq); else - cpu_interrupt_ack(cpu, d->irq_nr); + INTERRUPT_DEASSERT(d->irq); }
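To see how the tick function and the interrupt macros fit together, here is a minimal self-contained sketch. Everything outside the DEVICE_TICK(foo) body is an assumption made for illustration: the struct interrupt layout, the two macro definitions, the pending field and the line_raise()/line_lower() helpers are hypothetical stand-ins, not the emulator's real definitions. Only the expansion of DEVICE_TICK(foo) into void dev_foo_tick(struct cpu *cpu, void *extra) is taken from the code above.

```c
#include <assert.h>
#include <stddef.h>

struct cpu;	/* opaque in this sketch */

/* Hypothetical stand-in: two callbacks plus opaque per-line data. */
struct interrupt {
	void	(*interrupt_assert)(struct interrupt *);
	void	(*interrupt_deassert)(struct interrupt *);
	void	*extra;
};

/* Assumed macro definitions for this sketch. */
#define INTERRUPT_ASSERT(i)	(i).interrupt_assert(&(i))
#define INTERRUPT_DEASSERT(i)	(i).interrupt_deassert(&(i))

#define DEVICE_TICK(x)	void dev_##x##_tick(struct cpu *cpu, void *extra)

struct foo_data {
	struct interrupt irq;
	int		pending;	/* hypothetical: nonzero if the device wants service */
};

/* Toy interrupt line: the level is stored in an int pointed to by extra. */
void line_raise(struct interrupt *i) { *(int *)i->extra = 1; }
void line_lower(struct interrupt *i) { *(int *)i->extra = 0; }

DEVICE_TICK(foo)
{
	struct foo_data *d = extra;

	(void)cpu;	/* unused in this sketch */

	if (d->pending)
		INTERRUPT_ASSERT(d->irq);
	else
		INTERRUPT_DEASSERT(d->irq);
}
```

The point of the indirection is that the device never needs to know which CPU or interrupt controller it is wired to; it only raises or lowers its own line, and whatever the line is connected to decides what that means.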
- int dev_foo_access(struct cpu *cpu, struct memory *mem, + to an address which is in the device's memory-mapped region. To + simplify things a little, a macro DEVICE_ACCESS(x) + is expanded into+ int dev_x_access(struct cpu *cpu, struct memory *mem, uint64_t relative_addr, unsigned char *data, size_t len, int writeflag, void *extra) +The access function can look like this: ++ DEVICE_ACCESS(foo) { struct foo_data *d = extra; uint64_t idata = 0, odata = 0; - idata = memory_readmax64(cpu, data, len); + if (writeflag == MEM_WRITE) + idata = memory_readmax64(cpu, data, len); + switch (relative_addr) { - /* .... */ + + /* Handle accesses to individual addresses within + the device here. */ + + /* ... */ + } if (writeflag == MEM_READ) @@ -513,76 +465,6 @@ -Regression tests
- -In order to make sure that the emulator actually works like it is supposed -to, it must be tested. For this purpose, there is a simple regression -testing framework in the tests/ directory. - --NOTE: The regression testing framework is basically just a skeleton so far. -Regression tests are very good to have. However, the fact that complete -operating systems can run in the emulator indicate that the emulation is -probably not too incorrect. This makes it less of a priority to write -regression tests. - -
-To run all the regression tests, type make regtest. Each assembly -language file matching the pattern test_*.S will be compiled and -linked into a 64-bit MIPS ELF (using a gcc cross compiler), and run in the -emulator. If everything goes well, you should see something like this: - -
- $ make regtest - cd tests; make run_tests; cd .. - gcc33 -Wall -fomit-frame-pointer -fmove-all-movables -fpeephole -O2 - -mcpu=ev5 -I/usr/X11R6/include -lm -L/usr/X11R6/lib -lX11 do_tests.c - -o do_tests - do_tests.c: In function `main': - do_tests.c:173: warning: unused variable `s' - /var/tmp//ccFOupvD.o: In function `do_tests': - /var/tmp//ccFOupvD.o(.text+0x3a8): warning: tmpnam() possibly used - unsafely; consider using mkstemp() - mips64-unknown-elf-gcc -g -O3 -fno-builtin -fschedule-insns -mips64 - -mabi=64 test_common.c -c -o test_common.o - ./do_tests "mips64-unknown-elf-gcc -g -O3 -fno-builtin -fschedule-insns - -mips64 -mabi=64" "mips64-unknown-elf-as -mabi=64 -mips64" - "mips64-unknown-elf-ld -Ttext 0xa800000000030000 -e main - --oformat=elf64-bigmips" "../gxemul" - - Starting tests: - test_addu.S (-a) - test_addu.S (-a -b) - test_clo_clz.S (-a) - test_clo_clz.S (-a -b) - .. - test_unaligned.S (-a) - test_unaligned.S (-a -b) - - Done. (12 tests done) - PASS: 12 - FAIL: 0 - - ---------------- - - All tests OK - - ---------------- -- --Each test writes output to stdout, and there is a test_*.good for -each .S file which contains the wanted output. If the actual -output matches the .good file, then the test passes, otherwise it -fails. - -
-Read tests/README for more information. - - -