GXemul documentation: Technical details

--- trunk/doc/technical.html 2007/10/08 16:18:19 8
+++ trunk/doc/technical.html 2007/10/08 16:22:32 42
@@ -1,18 +1,18 @@
-GXemul documentation: Technical details
+Gavare's eXperimental Emulator: Technical details

-GXemul documentation: +Gavare's eXperimental Emulator:
Technical details

+ Back to the index

Technical details

-This page describes some of the internals of GXemul. - -

-NOTE: This page is probably not -very up-to-date by now. +

This page describes some of the internals of GXemul.

Overview -
Speed +
Speed and emulation modes
Networking
Emulation of hardware devices -
Regression tests

- -

Overview

- -In simple terms, GXemul is just a simple fetch-and-execute -loop; an instruction is fetched from memory, and executed. - -

-In reality, a lot of things need to be handled. Before each instruction is -executed, the emulator checks to see if any interrupts are asserted which -are not masked away. If so, then an INT exception is generated. Exceptions -cause the program counter to be set to a specific value, and some of the -system coprocessor's registers to be set to values signifying what kind of -exception it was (an interrupt exception in this case). - -

-Reading instructions from memory is done through a TLB, a translation -lookaside buffer. The TLB on MIPS is software controlled, which means that -the program running inside the emulator (for example an operating system -kernel) has to take care of manually updating the TLB. Some memory -addresses are translated into physical addresses directly, some are -translated into valid physical addresses via the TLB, and some memory -references are not valid. Invalid memory references cause exceptions. - -

-After an instruction has been read from memory, the emulator checks which -opcode it contains and executes the instruction. Executing an instruction -usually involves reading some register and writing some register, or perhaps a -load from memory (or a store to memory). The program counter is increased -for every instruction. - -

-Some memory references point to physical addresses which are not in the -normal RAM address space. They may point to hardware devices. If that is -the case, then loads and stores are converted into calls to a device -access function. The device access function is then responsible for -handling these reads and writes. For example, a graphical framebuffer -device may put a pixel on the screen when a value is written to it, or a -serial controller device may output a character to stdout when written to. - -

Speed

- -There are two modes in which the emulator can run, a) a straight forward -loop which fetches one instruction from emulated RAM and executes it -(described in the previous section), and b) -using dynamic binary translation. - -

-Mode a is very slow. On a 2.8 GHz Intel Xeon host the resulting -emulated machine is rougly equal to a 7 MHz R3000 (or a 3.5 MHz R4000). -The actual performance varies a lot, maybe between 5 and 10 million -instructions per second, depending on workload. +

Speed and emulation modes

-Mode b ("bintrans") is still to be considered experimental, but -gives higher performance than mode a. It translates MIPS machine -code into machine code that can be executed on the host machine -on-the-fly. The translation itself obviously takes some time, but this is -usually made up for by the fact that the translated code chunks are -executed multiple times. -To run the emulator with binary translation enabled, just add --b to the command line. - -

-Only small pieces of MIPS machine code are translated, usually the size of -a function, or less. There is no "intermediate representation" code, so -all translations are done directly from MIPS to host machine code. +So, how fast is GXemul? There is no short answer to this. There is +especially no answer to the question What is the slowdown factor?, +because the host architecture and emulated architecture can usually not be +compared just like that. + +

Performance depends on several factors, including (but not limited to) +host architecture, target architecture, host clock speed, which compiler +and compiler flags were used to build the emulator, what the workload is, +what additional runtime flags are given to the emulator, and so on. + +

Devices are generally not timing-accurate: for example, if an emulated +operating system tries to read a block from disk, from its point of view +the read was instantaneous (no waiting). So 1 MIPS in an emulated OS might +have taken more than one million instructions on a real machine. + +

Also, if the emulator says it has executed 1 million instructions, and +the CPU family in question was capable of scalar execution (i.e. one cycle +per instruction), it might still have taken more than 1 million cycles on +a real machine because of cache misses and similar micro-architectural +penalties that are not simulated by GXemul. + +

Because of these issues, it is in my opinion best to measure +performance as the actual (real-world) time it takes to perform a task +with the emulator, e.g.: -

-The default bintrans cache size is 16 MB, but you can change this by adding --DDEFAULT_BINTRANS_SIZE_IN_MB=xx to your CFLAGS environment -variable before running the configure script, or by using the -bintrans_size() configuration file option when running the emulator. +

"How long does it take to install NetBSD onto a disk image?" +
"How long does it take to compile XYZ inside NetBSD + in the emulator?". +

-By default, an emulated OS running under DECstation emulation which listens to -interrupts from the mc146818 clock will get interrupts that are close to the -host's clock. That is, if the emulated OS says it wants 100 interrupts per -second, it will get approximately 100 interrupts per real second. +

So, how fast is it? :-) Answer: it varies. -

-There is however a -I option, which sets the number of -emulated cycles per seconds to a fixed value. Let's say you wish to make the -emulated OS think it is running on a 40 MHz DECstation, and not a 7 MHz one, -then you can add -I 40000000 to the command line. This will not -make the emulation faster, of course. It might even make it seem slower; for -example, if NetBSD/pmax waits 2 seconds for SCSI devices to settle during -bootup, those 2 seconds will take 2*40000000 cycles (which will take more -time than 2*7000000). -

-The -I option is also necessary if you want to run -deterministic experiments, if a mc146818 (or similar) device is present. -

-Some emulators make claims such as "x times slowdown," but in the case of -GXemul, the host is often not a MIPS-based machine, and hence comparing -one MIPS instruction to a host instruction doesn't work. Performance depends on -a lot of factors, including (but not limited to) host architecture, host speed, -which compiler and compiler flags were used to build GXemul, what the -workload is, and so on. For example, if an emulated operating system tries -to read a block from disk, from its point of view the read was instantaneous -(no waiting). So 1 MIPS in an emulated OS might have taken more than one -million instructions on a real machine. Because of this, imho it is best -to measure performance as the actual (real-world) time it takes to perform -a task with the emulator.
@@ -183,10 +107,12 @@

Networking

-Running an entire operating system under emulation is very interesting in -itself, but for several reasons, running a modern OS without access to -TCP/IP networking is a bit akward. Hence, I feel the need to implement TCP/IP -(networking) support in the emulator. +NOTE/TODO: This section is very old. + +

Running an entire operating system under emulation is very interesting +in itself, but for several reasons, running a modern OS without access to +TCP/IP networking is a bit awkward. Hence, I feel the need to implement +TCP/IP (networking) support in the emulator.

As far as I have understood it, there seem to be two different ways to go:
@@ -377,56 +303,51 @@
+
+

Emulation of hardware devices

-Each file in the device/ directory is responsible for one hardware device. -These are used from src/machine.c, when initializing which hardware a -particular machine model will be using, or when adding devices to a -machine using the device() command in configuration files. - -

-NOTE: 2005-02-26: I'm currently rewriting the -device registry subsystem. +Each file called dev_*.c in the +src/devices/ directory is +responsible for one hardware device. These are used from +src/machines/machine_*.c, +when initializing which hardware a particular machine model will be using, +or when adding devices to a machine using the device() command in +configuration files. -

-(I'll be using the name 'foo' as the name of the device in all these -examples. This is pseudo code, it might need some modification to +

(I'll be using the name "foo" as the name of the device in all +these examples. This is pseudocode; it might need some modification to actually compile and run.) -

-Each device should have the following: +

Each device should have the following:

A devinit function in dev_foo.c. It would typically look - something like this: +

A devinit function in src/devices/dev_foo.c. It + would typically look something like this:

-	/*
-	 *  devinit_foo():
-	 */
-	int devinit_foo(struct devinit *devinit)
+	DEVINIT(foo)
 	{
-	        struct foo_data *d = malloc(sizeof(struct foo_data));
+	        struct foo_data *d;
 
-	        if (d == NULL) {
-	                fprintf(stderr, "out of memory\n");
-	                exit(1);
-	        }
-	        memset(d, 0, sizeof(struct foon_data));
+		CHECK_ALLOCATION(d = malloc(sizeof(struct foo_data)));
+	        memset(d, 0, sizeof(struct foo_data));
 
 		/*
 		 *  Set up stuff here, for example fill d with useful
-		 *  data. devinit contains settings like address, irq_nr,
+		 *  data. devinit contains settings like address, irq path,
 		 *  and other things.
 		 *
 		 *  ...
 		 */
+
+		INTERRUPT_CONNECT(devinit->interrupt_path, d->irq);
         
 	        memory_device_register(devinit->machine->memory, devinit->name,
 	            devinit->addr, DEV_FOO_LENGTH,
-	            dev_foo_access, (void *)d, MEM_DEFAULT, NULL);
+	            dev_foo_access, (void *)d, DM_DEFAULT, NULL);
         
 		/*  This should only be here if the device
 		    has a tick function:  */
@@ -438,45 +359,76 @@
 	}

At the top of dev_foo.c, the foo_data struct should be defined. +
DEVINIT(foo) is defined as int devinit_foo(struct devinit *devinit), + and the devinit argument contains everything that the device driver's + initialization function needs. + +
+
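To make the macro expansion concrete, here is a minimal, self-contained sketch. The DEVINIT definition below is written from the description above rather than copied from GXemul's headers, struct devinit is a cut-down stand-in with only the name and addr fields, and the base field, the foo_instance pointer and the return value of 1 for success are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Cut-down stand-in for GXemul's struct devinit (assumption: the real
   struct carries more settings, such as the machine and interrupt path). */
struct devinit {
	const char *name;
	uint64_t    addr;
};

/* Assumed definition, matching the expansion described in the text. */
#define DEVINIT(x)	int devinit_ ## x(struct devinit *devinit)

struct foo_data {
	uint64_t base;		/* hypothetical field, for illustration only */
};

/* Hypothetical: lets a caller inspect what the init function built. */
static struct foo_data *foo_instance;

/* Expands to: int devinit_foo(struct devinit *devinit) */
DEVINIT(foo)
{
	struct foo_data *d;

	assert(devinit != NULL);

	d = malloc(sizeof(struct foo_data));
	if (d == NULL)
		return 0;	/* assumed failure value */
	memset(d, 0, sizeof(struct foo_data));

	/* Fill d with data taken from the devinit settings. */
	d->base = devinit->addr;
	foo_instance = d;

	return 1;		/* assumed success value */
}
```

Calling devinit_foo() with a struct devinit whose addr is 0x10000000 leaves foo_instance pointing at a zeroed foo_data whose base is that address.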

At the top of dev_foo.c, the foo_data struct + should be defined.

 	struct foo_data {
-		int	irq_nr;
+		struct interrupt	irq;
 		/*  ...  */
 	};

- -

If foo has a tick function (that is, something that needs to be - run at regular intervals) then FOO_TICKSHIFT and a tick function - need to be defined as well: + (There is an exception to this rule; some legacy code and other + ugly hacks have their device structs defined in + src/include/devices.h instead of dev_foo.c. + New code should not add stuff to devices.h.) +
+

If foo has a tick function (that is, something that needs to be + run at regular intervals) then FOO_TICKSHIFT and a tick + function need to be defined as well:

-	#define FOO_TICKSHIFT		10
+	#define FOO_TICKSHIFT		14
 
-	void dev_foo_tick(struct cpu *cpu, void *extra)
+	DEVICE_TICK(foo)
 	{
-		struct foo_data *d = (struct foo_data *) extra;
+		struct foo_data *d = extra;
 
 		if (.....)
-			cpu_interrupt(cpu, d->irq_nr);
+			INTERRUPT_ASSERT(d->irq);
 		else
-			cpu_interrupt_ack(cpu, d->irq_nr);
+			INTERRUPT_DEASSERT(d->irq);
 	}
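The tick mechanism can be tried outside the emulator with a small sketch. Everything here is a stand-in: struct interrupt is reduced to a single flag, the INTERRUPT_ASSERT and INTERRUPT_DEASSERT macros only set that flag, the pending and tick_count fields and the run_cycles() loop are hypothetical, and the reading of FOO_TICKSHIFT as "call the tick function once every 2^FOO_TICKSHIFT emulated cycles" is an assumption, not something this page states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FOO_TICKSHIFT		14

/* Stand-in for GXemul's struct interrupt: just a flag. */
struct interrupt {
	int asserted;
};
#define INTERRUPT_ASSERT(istruct)	((istruct).asserted = 1)
#define INTERRUPT_DEASSERT(istruct)	((istruct).asserted = 0)

struct cpu;	/* opaque in this sketch */

/* Assumed definition, matching the old explicit prototype in the diff. */
#define DEVICE_TICK(x)	void dev_ ## x ## _tick(struct cpu *cpu, void *extra)

struct foo_data {
	struct interrupt irq;
	int pending;		/* hypothetical: device wants attention */
	int tick_count;		/* hypothetical: counts tick calls */
};

DEVICE_TICK(foo)
{
	struct foo_data *d = extra;

	(void)cpu;
	assert(d != NULL);
	d->tick_count++;

	if (d->pending)
		INTERRUPT_ASSERT(d->irq);
	else
		INTERRUPT_DEASSERT(d->irq);
}

/* Hypothetical driver loop: one tick per 2^FOO_TICKSHIFT cycles. */
static void run_cycles(struct foo_data *d, uint64_t ncycles)
{
	uint64_t interval = (uint64_t)1 << FOO_TICKSHIFT;
	uint64_t c;

	for (c = 1; c <= ncycles; c++)
		if (c % interval == 0)
			dev_foo_tick(NULL, d);
}
```

Under that assumption, a tick shift of 14 means the device is polled once every 16384 cycles, so a larger shift lowers the polling overhead at the cost of coarser interrupt timing.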

Does this device belong to a standard bus? +
- If this device should be detectable as a PCI device, then + glue code should be added to + src/devices/bus_pci.c. +
- If this is a legacy ISA device which should be usable by + any machine which has an ISA bus, then the device should + be added to src/devices/bus_isa.c. +
+

And last but not least, the device should have an access function. The access function is called whenever there is a load or store - to an address which is in the device' memory mapped region. -

-	int dev_foo_access(struct cpu *cpu, struct memory *mem,
+	to an address which is in the device's memory-mapped region. To
+	simplify things a little, a macro DEVICE_ACCESS(x)
+	is expanded into
+	int dev_x_access(struct cpu *cpu, struct memory *mem,
 	    uint64_t relative_addr, unsigned char *data, size_t len,
 	    int writeflag, void *extra)
+	The access function can look like this:
+
+	DEVICE_ACCESS(foo)
 	{
 		struct foo_data *d = extra;
 		uint64_t idata = 0, odata = 0;
 
-		idata = memory_readmax64(cpu, data, len);
+		if (writeflag == MEM_WRITE)
+			idata = memory_readmax64(cpu, data, len);
+
 		switch (relative_addr) {
-		/* .... */
+
+		/*  Handle accesses to individual addresses within
+		    the device here.  */
+
+		/*  ...  */
+
 		}
 
 		if (writeflag == MEM_READ)
@@ -513,76 +465,6 @@
 
 
 
-
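As an illustration of the access function pattern, the following self-contained sketch fills in the switch statement for a device with two registers. The register offsets, the foo_data fields, the numeric values of MEM_READ and MEM_WRITE, and the return value of 1 are all hypothetical. The memory_readmax64() and memory_writemax64() helpers are simplified little-endian stand-ins written for this example, not GXemul's real implementations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_READ	0	/* values assumed for this sketch */
#define MEM_WRITE	1

struct cpu;			/* opaque in this sketch */
struct memory;

/* Simplified stand-in: assemble up to 8 little-endian bytes. */
static uint64_t memory_readmax64(struct cpu *cpu, unsigned char *buf,
	size_t len)
{
	uint64_t x = 0;
	size_t i;

	(void)cpu;
	assert(len <= 8);
	for (i = 0; i < len; i++)
		x |= (uint64_t)buf[i] << (8 * i);
	return x;
}

/* Simplified stand-in: store up to 8 little-endian bytes. */
static void memory_writemax64(struct cpu *cpu, unsigned char *buf,
	size_t len, uint64_t data)
{
	size_t i;

	(void)cpu;
	assert(len <= 8);
	for (i = 0; i < len; i++)
		buf[i] = (unsigned char)(data >> (8 * i));
}

#define DEV_FOO_STATUS	0x00	/* hypothetical register offsets */
#define DEV_FOO_DATA	0x04

struct foo_data {
	uint64_t status;	/* hypothetical fields */
	uint64_t data_reg;
};

#define DEVICE_ACCESS(x)	int dev_ ## x ## _access(struct cpu *cpu, \
	struct memory *mem, uint64_t relative_addr, unsigned char *data, \
	size_t len, int writeflag, void *extra)

DEVICE_ACCESS(foo)
{
	struct foo_data *d = extra;
	uint64_t idata = 0, odata = 0;

	(void)mem;

	if (writeflag == MEM_WRITE)
		idata = memory_readmax64(cpu, data, len);

	switch (relative_addr) {

	case DEV_FOO_STATUS:
		/*  Read-only register: writes are silently ignored.  */
		odata = d->status;
		break;

	case DEV_FOO_DATA:
		if (writeflag == MEM_WRITE) {
			d->data_reg = idata;
			d->status |= 1;		/*  mark data as present  */
		} else
			odata = d->data_reg;
		break;

	default:
		/*  Unimplemented offsets read as zero.  */
		break;
	}

	if (writeflag == MEM_READ)
		memory_writemax64(cpu, data, len, odata);

	return 1;	/* assumed success value */
}
```

Keeping the conversion between the byte buffer and idata/odata at the top and bottom of the function leaves each case in the switch free to deal only with register semantics.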

-
-
Regression tests
-
-In order to make sure that the emulator actually works like it is supposed 
-to, it must be tested. For this purpose, there is a simple regression 
-testing framework in the tests/ directory.
-
-
-NOTE:  The regression testing framework is basically just a skeleton so far.
-Regression tests are very good to have. However, the fact that complete
-operating systems can run in the emulator indicate that the emulation is
-probably not too incorrect. This makes it less of a priority to write 
-regression tests.
-
-

-To run all the regression tests, type make regtest. Each assembly 
-language file matching the pattern test_*.S will be compiled and 
-linked into a 64-bit MIPS ELF (using a gcc cross compiler), and run in the 
-emulator. If everything goes well, you should see something like this:
-
-
-	$ make regtest
-	cd tests; make run_tests; cd ..
-	gcc33 -Wall -fomit-frame-pointer -fmove-all-movables -fpeephole -O2 
-		-mcpu=ev5 -I/usr/X11R6/include -lm -L/usr/X11R6/lib -lX11  do_tests.c
-		-o do_tests
-	do_tests.c: In function `main':
-	do_tests.c:173: warning: unused variable `s'
-	/var/tmp//ccFOupvD.o: In function `do_tests':
-	/var/tmp//ccFOupvD.o(.text+0x3a8): warning: tmpnam() possibly used
-		unsafely; consider using mkstemp()
-	mips64-unknown-elf-gcc -g -O3 -fno-builtin -fschedule-insns -mips64 
-		-mabi=64 test_common.c -c -o test_common.o
-	./do_tests "mips64-unknown-elf-gcc -g -O3 -fno-builtin -fschedule-insns 
-		-mips64 -mabi=64" "mips64-unknown-elf-as -mabi=64 -mips64" 
-		"mips64-unknown-elf-ld -Ttext 0xa800000000030000 -e main 
-		--oformat=elf64-bigmips" "../gxemul"
-
-	Starting tests:
-	  test_addu.S (-a)
-	  test_addu.S (-a -b)
-	  test_clo_clz.S (-a)
-	  test_clo_clz.S (-a -b)
-	  ..
-	  test_unaligned.S (-a)
-	  test_unaligned.S (-a -b)
-
-	Done. (12 tests done)
-	    PASS:     12
-	    FAIL:      0
-
-	----------------
-
-	  All tests OK
-
-	----------------
-
-
-
-Each test writes output to stdout, and there is a test_*.good for 
-each .S file which contains the wanted output. If the actual 
-output matches the .good file, then the test passes, otherwise it 
-fails.
-
-

-Read tests/README for more information.
-
-
-