--- trunk/TODO	2007/10/08 16:22:20	41
+++ trunk/TODO	2007/10/08 16:22:32	42
@@ -1,120 +1,33 @@
-$Id: TODO,v 1.489 2007/05/01 04:05:06 debug Exp $
+$Id: TODO,v 1.536 2007/06/15 22:30:17 debug Exp $

---------------------------------------------
+Some things, in no specific order, that I'd like to fix:
+(Some items in this list are perhaps already fixed.)

-Testing for the next release (0.4.5.1):
-
-TEST DISK OVERLAY IMAGES
-TEST LANDISK/SUPERH EMULATION MODES
-REGRESSION TESTS FOR ALL OTHER SUPPORTED GUEST OSES
-
-#  NetBSD/pmax 3.1 or 1.6.2       OK
-#  NetBSD/arc 1.6.2               OK
-#  NetBSD/hpcmips 3.1             OK
-#  NetBSD/cobalt 3.1              OK
-#  NetBSD/evbmips 3.1             OK
-#  NetBSD/algor 3.1               OK
-#  NetBSD/sgimips 3.1             OK
-#  NetBSD/cats 3.1                OK
-#  NetBSD/evbarm 2.1              OK
-#  NetBSD/netwinder 3.1           OK
-#  NetBSD/prep 2.1                OK
-#  NetBSD/macppc 3.1              OK
-#  NetBSD/dreamcast 3.1 MD        OK
-#  NetBSD/dreamcast 3.1 LiveCD    OK
-#  Linux/dreamcast Live CD        OK
-#  OpenBSD/pmax 2.8-BETA          not tested because of lack of time
-#  OpenBSD/cats 4.0               OK
-#  OpenBSD/landisk 4.1            OK
-#  Ultrix/RISC 4.5                OK
-#  Sprite for DECstation          OK
-#  Debian GNU/Linux for pmax      not tested because of lack of time
-
-Optional:
-#  OpenBSD/sgi                    FAILED to boot after setup (as expected)
-
---------------------------------------------- 
-
-Some things, in totally random order, that I'd like to fix:
-(Some items in this list are possibly out-of-date by now.)
-
-Dyntrans:
-    x) Instruction combination collisions? How to avoid easily...
-    x) Think about how to do both SHmedia and SHcompact in a reasonable
-       way! (Or AMD64 long/protected/real, for that matter.)
-    x) 68K emulation; think about how to do variable instruction
-       lengths across page boundaries.
-    x) Dyntrans with valgrind-inspired memory checker. (In memory_rw,
-       it would be reasonably simple to add; in each individual fast
-       load/store routine = a lot more work, and it would become
-       kludgy very fast.)
-    x) Dyntrans with SMP... lots of work to be done here.
-    x) Dyntrans with cache emulation... lots of work here as well.
-    x) Remove the concept of base RAM completely; it would be more
-       generic to allow RAM devices to be used "anywhere".
-    o) dev_mp doesn't work well with dyntrans yet
-    o) In general, IPIs, CAS, LL/SC etc must be made to work with dyntrans
-    x) Redesign/rethink the delay slot mechanism used for e.g. MIPS,
-       so that it caches a translation (that is, an instruction
-       word and the instr_call it was translated to the last
-       time), so that it doesn't need to do slow
-       to_be_translated for each end of page?
-    x) Program Counter statistics:
-       Per machine? What about SMP? All data to the same file?
-       A debugger command should be possible to use to enable/
-       disable statistics gathering.
-       Configuration file option!
-    x) Breakpoints:
-       o) Physical vs virtual addresses!
-       o) 32-bit vs 64-bit sign extension for MIPS, and others?
-    x) INVALIDATION should cause translations in _all_ cpus to be
-       invalidated, e.g. on a write to a write-protected page
-       (containing code)
-    x) 16-bit encodings? (MIPS16, ARM Thumb, 32-bit SH on SH64)
-    x) Lots of other stuff: see src/cpus/README_DYNTRANS
-    x) Native code generation backends:
-       o) think carefully about this.
-       o) simple syntax for emitting opcodes; backend implementation
-          must be optional, so I don't have to write more code
-          than necessary. after all, the non-native (C) code should
-          always work.
-       o) convert into native code only after an entire
-          block has been translated? probably best.
-       o) the "almost native" opcodes may be rearranged,
-          "peep-hole optimized", etc. and then as a separate step
-          this list of almost native opcodes is written out
-          as native code.
-       o) think about delay slots at the end of a block!
-       o) x86/amd64 code generator can be very similar... perhaps
-       o) NOTE that generation is per _ABI_, not per host arch!
-          the configure script must detect ABI!!!
-       o) branches to already translated code blocks can
-          link the blocks together
-       o) load/store are the most important to optimize
-
-Simple Valgrind-like checks?
-    o) Mark every address with bits which tell whether or not the address
-       has been written to.
-    o) What should happen when programs are loaded? Text/data, bss (zero
-       filled). But stack space and heap is uninitialized.
-    o) Uninitialized local variables:
-       A load from a place on the stack which has not previously
-       been stored to => warning. Increasing the stack pointer using
-       any available means should reset the memory to uninitialized.
-    o) If calls to malloc() and free() can be intercepted:
-       o) Access to a memory area after free() => warning.
-       o) Memory returned by malloc() is marked as not-initialized.
-       o) Non-passive, but good to have: Change the argument
-          given to malloc, to return a slightly larger memory
-          area, i.e. margin_before + size + margin_after,
-          and return the pointer + margin_before.
-          Any access to the margin_before or _after space results
-          in warnings. (free() must be modified to free the
-          actually allocated address.)
+M88K:
+    o) Neither NIP nor FIP valid in rte?
+    o) FIP != NIP + 4, in rte! (Simulate delayed branch stuff.)
+    o) cpu_dyntrans.c: MEMORY_USER_ACCESS implementation for M88K!
+    o) xmem: Set transaction registers!
+    o) CMMUs:
+       o) Translation invalidations, could be optimized.
+       o) Move initialization from dev_mvme187 to somewhere
+          more reasonable?
+    o) Instruction trace by using bits of ??IP control regs.
+    o) Interrupts (these are machine dependent, though).
+    o) Implement devices etc. for one or more machine modes,
+       to get some guest OS running. OpenBSD/mvme88k on MVME187
+       seems to be the smartest path to follow for now.
+       o) VME bus device
+       o) PCC2
+       o) Cirrus Logic serial port controller
+    o) Instruction disassembly, and implementation:
+       o) See http://www.panggih.staff.ugm.ac.id/download/GCC/info/gcc.i5
+          for some strange cases of when "div" can fail (?)
+       o) Floating point stuff
+       o) "Graphics" instructions (M88110-specific)

 MIPS:
     o) Nicer MIPS status bits in register dumps.
-    o) Alignment exceptions.
     o) Floating point exception correctness.
     o) Fix this? Triggered by NetBSD/sgimips? Hm:
        to_be_translated(): TODO: unimplemented instruction:
@@ -122,19 +35,21 @@
     o) Some more work on opcodes.
        x) MIPS64 revision 2.
           o) Find out which actual CPUs implement the rev2 ISA!
+          o) DINS, DINSM, DINSU etc
           o) DROTR32 and similar MIPS64 rev 2 instructions,
              which have a rotation bit which differs from
              previous ISAs.
-          o) EI and DI instructions for MIPS64/32 rev 2.
-             NOTE: These are _NOT_ the same as for R5900!
        x) _MAYBE_ TX79 and R5900 actually differ in their
           opcodes? Check this carefully!
     o) Dyntrans: Count register updates are probably not 100% correct yet.
     o) Refactor code for performance and readability/maintainability.
     o) (Re)implement 128-bit loads/stores for R5900.
+    o) Coprocessor 1x (i.e. 3) should cause cp1 exceptions, not 3?
+       (See http://lists.gnu.org/archive/html/qemu-devel/2007-05/msg00005.html)
     o) R4000 and others:
        x) watchhi/watchlo exceptions, and other exception
           handling details
+    o) MIPS 5K* have 42 physical address bits, not 40/44?
     o) R10000 and others: (R12000, R14000 ?)
        x) The code before the line
           /* reg[COP0_PAGEMASK] = cpu->cd.mips.coproc[0]->tlbs[0].mask & PAGEMASK_MASK; */
@@ -149,10 +64,10 @@
           (http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi/hdwr/bks/SGI_Developer/books/R10K_UM/sgi_html/t5.Ver.2.0.book_284.html)

 SuperH:
-    x) SH4 performance is VERY low when running user-space instructions,
-       because I actually simulate the 4-entry ITLB as being separate
-       from the 64-entry DTLB. This is correct, but VERY slow. I need
-       to experiment with _not_ simulating it in too much detail.
+    x) Auto-generation of loads/stores! This should get rid of at least
+       the endianness check in each load/store.
+    x) Experiment with whether or not correct ITLB emulation is
+       actually needed. (20070522: I'm turning it off today.)
     x) SH4 interrupt controller:
        x) MASKING should be possible!
     x) SH4 DMA (0xffa00000)
@@ -163,9 +78,7 @@
     x) Instruction tracing should include symbols for branch targets,
        and so on, to make the output more human readable.
     x) SH3-specific devices: Pretty much everything!
-    x) NetBSD/evbsh3, mmeye, hpcsh! Linux?
-    x) Replace pc-relative loads with immediate load, if within the
-       same page. (Similar to the same optimization for ARM.)
+    x) NetBSD/evbsh3, hpcsh! Linux?
     x) Floating point speed!
     x) Floating point exception correctness.
     x) NetBSD HEAD (as of April 2007) hangs during bootup, because it
@@ -222,23 +135,6 @@
     o) SPARC v8, v7 etc?
     o) More machine modes and devices.

-Debugger:
-    o) How does SMP debugging work? Does it simply use "threads"?
-       What if the guest OS (running on an emulated SMP machine)
-       has a usertask running, with userland threads?
-    o) Try to make the debugger more modular and, if possible, reentrant!
-    o) Remove the emul command? (But show network info if showing
-       machines?)
-    o) Evaluate expressions within []? That would allow stuff like
-       cpu[x] where x is an expression.
-    o) Settings:
-       x) Special handlers for Write!
-          +) MIPS coproc regs
-          +) Alpha/MIPS/SPARC zero registers
-          +) x86 64/32/16-bit registers
-       x) Value formatter for resulting output.
-    o) see src/debugger.c for more
-
 POWER/PowerPC:
     x) Fix DECR timer speed, so it matches the host.
     x) NetBSD/prep 3.x triggers a possible bug in the emulator:
@@ -280,31 +176,13 @@
     x) Mouse/pad support! :)
     x) A NIC? (As a PCMCIA device?)

-M88K:
-    o) Everything. :)
-    o) More instruction disassembly!
-    o) Implement more instructions.
-    o) has-delay-slot (for debugging)
-    o) Find manuals!
-    o) MMU stuff
-    o) Exceptions
-    o) FPU
-    o) Control registers
-
-AVR:
-    o) Everything.
-
 ARM:
     o) See netwinder_reset() in NetBSD; the current "an internal error
        occured" message after reboot/halt is too ugly.
-    o) ARM "wait"-like instruction?
+    o) Generic ARM "wait"-like instruction?
     o) try to get netbsd/evbarm 3.x or 4.x running (iq80321)
     o) make the xscale counter registers (ccnt) work
     o) make the ata controller usable for FreeBSD!
-    o) Zaurus emulation:
-       x) OpenBSD/zaurus
-       x) NetBSD/zaurus? See the following URL:
-          http://mail-index.netbsd.org/port-arm/2006/11/19/0000.html
     o) Debian/cats crashes because of unimplemented coproc stuff.
        fix this?

@@ -319,6 +197,139 @@
         halt();
     }

+Debugger:
+    o) How does SMP debugging work? Does it simply use "threads"?
+       What if the guest OS (running on an emulated SMP machine)
+       has a usertask running, with userland threads?
+    o) Try to make the debugger more modular and, if possible, reentrant!
+    o) Remove the emul command? (But show network info if showing
+       machines?)
+    o) Memory dumps should be able to dump both physical and
+       virtual emulated memory.
+    o) Evaluate expressions within []? That would allow stuff like
+       cpu[x] where x is an expression.
+    o) "pc = pc + 4" doesn't work! Bug. Should work. ("pc=pc+4" works.)
+    o) Settings:
+       x) Special handlers for Write!
+          +) MIPS coproc regs
+          +) Alpha/MIPS/SPARC zero registers
+          +) x86 64/32/16-bit registers
+       x) Value formatter for resulting output.
+    o) Call stack display (back-trace) of emulated programs.
+    o) Nicer looking output of register dumps, floating point registers,
+       etc. Warn about weird/invalid register contents.
+    o) Ctrl-C doesn't enter the debugger on some OSes (HP-UX?)...
+
+Dyntrans:
+    x) For 32-bit emulation modes, that have emulated TLBs: tlbindex
+       arrays of mapped pages? Things to think about:
+       x) Only 32-bit mode! (64-bit => too much code)
+       x) One array for global pages, and one array _PER ASID_,
+          for those archs that support that. On M88K, there should
+          be one array for userspace, and one for supervisor, etc.
+       x) Larger-than-4K-pages must fill several bits in the array.
+       x) No TLB search will be necessary.
+       x) Total host space used, for 4 KB pages: 1 MB per table,
+          i.e. 65 MB for 32-bit MIPS, 2 MB for M88K, if one byte
+          is used as the tlb index.
+       x) (The index is actually +1, so that 0 means no hit.)
+    x) "Merge" the cur_physpage and cur_ic_page variables/pointers to
+       one? I.e. change cur_ic_page to cur_physpage.ic_page or something.
+    x) Instruction combination collisions? How to avoid easily...
+    x) Think about how to do both SHmedia and SHcompact in a reasonable
+       way! (Or AMD64 long/protected/real, for that matter.)
+    x) 68K emulation; think about how to do variable instruction
+       lengths across page boundaries.
+    x) Dyntrans with valgrind-inspired memory checker. (In memory_rw,
+       it would be reasonably simple to add; in each individual fast
+       load/store routine = a lot more work, and it would become
+       kludgy very fast.)
+    x) Dyntrans with SMP... lots of work to be done here.
+    x) Dyntrans with cache emulation... lots of work here as well.
+    x) Remove the concept of base RAM completely; it would be more
+       generic to allow RAM devices to be used "anywhere".
+    o) dev_mp doesn't work well with dyntrans yet
+    o) In general, IPIs, CAS, LL/SC etc must be made to work with dyntrans
+    x) Redesign/rethink the delay slot mechanism used for e.g. MIPS,
+       so that it caches a translation (that is, an instruction
+       word and the instr_call it was translated to the last
+       time), so that it doesn't need to do slow
+       to_be_translated for each end of page?
+    x) Program Counter statistics:
+       Per machine? What about SMP? All data to the same file?
+       A debugger command should be possible to use to enable/
+       disable statistics gathering.
+       Configuration file option!
+    x) Breakpoints:
+       o) Physical vs virtual addresses!
+       o) 32-bit vs 64-bit sign extension for MIPS, and others?
+    x) INVALIDATION should cause translations in _all_ cpus to be
+       invalidated, e.g. on a write to a write-protected page
+       (containing code)
+    x) 16-bit encodings? (MIPS16, ARM Thumb, 32-bit SH on SH64)
+    x) Lots of other stuff: see src/cpus/README_DYNTRANS
+    x) Native code generation backends:
+       o) calculate at runtime whether or not chunks of emulated
+          (physical) memory are worth translating to native code
+          (it is assumed that it has high overhead)
+       o) experiment with calling the host's cc and ld externally;
+          extremely high overhead, but could be interesting none-
+          theless.
+       o) experiment with using LLVM, or GNU Lightning?
+       o) Important cases to think about:
+          x) loads/stores
+          x) delay branches
+          x) other kinds of calls, branches
+       o) branches to already translated code blocks can
+          link the blocks together (block-chaining), although
+          I'll probably want to wait with this until other
+          things work.
+       o) The first tests should be done with "testm88k", because
+          that does not affect other modes.
+
+-------------------------------------------------------------------------------
+
+Performance comparison when emulating the QEMU_MIPS machine (QEMU's default
+MIPS machine mode):
+
+mips-test-0.2:
+--------------
+
+1. while true; do ls -l > /dev/null; echo -n .; done, 80x36 dots
+2. while true; do /usr/bin/md5sum /usr/bin/* > /dev/null; echo -n .; done, 80 dots
+3. while true; do grep hej lib/libc.so.6 > /dev/null; echo -n .; done, 80 dots
+
+                    Test 1         Test 2         Test 3
+                    ------         ------         ------
+QEMU 0.9.0:         2 min 20 sec   45 sec         4 min 41 seconds
+GXemul-20070608:    1 min 59 sec   3 min 18 sec   18 min 10 seconds  [A]
+
+
+[A] = Normal portable dyntrans, no native code generation.
+
+-------------------------------------------------------------------------------
+
+
+Simple Valgrind-like checks?
+    o) Mark every address with bits which tell whether or not the address
+       has been written to.
+    o) What should happen when programs are loaded? Text/data, bss (zero
+       filled). But stack space and heap is uninitialized.
+    o) Uninitialized local variables:
+       A load from a place on the stack which has not previously
+       been stored to => warning. Increasing the stack pointer using
+       any available means should reset the memory to uninitialized.
+    o) If calls to malloc() and free() can be intercepted:
+       o) Access to a memory area after free() => warning.
+       o) Memory returned by malloc() is marked as not-initialized.
+       o) Non-passive, but good to have: Change the argument
+          given to malloc, to return a slightly larger memory
+          area, i.e. margin_before + size + margin_after,
+          and return the pointer + margin_before.
+          Any access to the margin_before or _after space results
+          in warnings. (free() must be modified to free the
+          actually allocated address.)
+
 Better CD Image file support:
     x) Support CD formats that contain more than 1 track, e.g.
        CDI files (?). These can then contain a mixture of e.g. sound
@@ -369,6 +380,7 @@
        is another option (easier to implement, but very very slow).

 Documentation:
+    x) Update the documentation regarding the testmachine interrupts.
     x) Note about sandboxing/security:
        Not all emulated instructions fail in the way they would
        do on real hardware (e.g. a userspace program writing to
@@ -402,14 +414,6 @@
        that use 3MAX into using CATS or hpcmips? (To remove the need to
        use a raw ffs partition, using up all of the disk image.)

-More generic out_of_memory error reporting, and check everywhere!
-    Causes: OpenBSD has low default limits for normal users.
-            Host is 32-bit? (32-bit hosts are limited to 4 GB or less
-            of userspace memory.)
-            You are actually low on RAM. (As trivial as this might sound,
-            Unix systems usually allow processes to allocate virtual
-            memory beyond the amount of RAM in the machine.)
-
 The Device subsystem:
     x) allow devices to be moved and/or changed in size (down to a
        minimum size, etc, or up to a max size); if there is a collision,
@@ -443,9 +447,8 @@
 Clocks and timers:
     x) Fix the PowerPC DECR interrupt speed! (MacPPC and PReP speed, etc.)
     x) DON'T HARDCODE 100 HZ IN cpu_mips_coproc.c!
-    x) Test the 8253? Right now it doesn't seem to be used?
-    x) NetWinder timeofday is incorrect! It seems to be exactly
-       1 day ahead of actual time?
+    x) NetWinder timeofday is incorrect! Huh? grep -R for ta_rtc_read in
+       NetBSD sources; it doesn't seem to be initialized _AT ALL_?!
     x) Cobalt TOD is incorrect!
     x) Go through all other machines, one by one, and fix them.

@@ -464,13 +467,18 @@
     o) non-IEEE modes (i.e. x86)?

 Userland emulation:
+    x) Try to prefix "/emul/mips/" or similar to all filenames,
+       and only if that fails, try the given filename.
+       Read this setting from an environment variable, and only
+       if there is none, fall back to hardcoded string.
+    x) File descriptor (0,1,2) assumptions? Find and fix these?
     x) Dynamic linking!
     x) Lots of stuff; freebsd, netbsd, linux, ... syscalls.
     x) Initial register/stack contents (environment, command line args).
     x) Return value (from main).
     x) mmap emulation layer
     x) errno emulation layer
-    x) struct conversions for may syscalls
+    x) struct conversions for many syscalls

 Sound:
     x) generic sound framework
@@ -542,3 +550,5 @@
     o) Generalize the framebuffer stuff by moving _ALL_ X11
        specific code to src/x11.c!

+-------------------------------------------------------------------------------
+
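Note on the tlbindex item added under Dyntrans in this revision: the sizes it quotes (1 MB per table, 65 MB for 32-bit MIPS, 2 MB for M88K) can be checked with a little arithmetic. The sketch below is only an illustration and is not GXemul code. The names table_bytes, total_mb, map_page and lookup are hypothetical, and the breakdown of the MIPS figure into 64 per-ASID tables plus one global table is an assumption that happens to match the 65 MB number; the TODO does not spell it out.

```python
# Hypothetical sketch of the "tlbindex array" idea; not taken from GXemul.
PAGE_SHIFT = 12                  # 4 KB pages, as in the TODO item
ADDRESS_SPACE = 1 << 32          # 32-bit emulation modes only
INDEX_BYTES = 1                  # one byte per page: TLB index + 1, 0 = no hit

# Assumption: 64 per-ASID tables plus one table for global pages.
MIPS_TABLES = 64 + 1
# From the TODO: one table for userspace and one for supervisor.
M88K_TABLES = 2


def table_bytes():
    """Host bytes needed by one table covering the whole 32-bit space."""
    return (ADDRESS_SPACE >> PAGE_SHIFT) * INDEX_BYTES


def total_mb(num_tables):
    """Total host memory, in MB, for the given number of tables."""
    return num_tables * table_bytes() // (1 << 20)


def map_page(table, vaddr, tlb_index):
    """Record that the page containing vaddr is mapped by TLB entry tlb_index."""
    table[vaddr >> PAGE_SHIFT] = tlb_index + 1   # stored as index + 1


def lookup(table, vaddr):
    """Return the TLB index for vaddr, or None if the page is not mapped."""
    entry = table[vaddr >> PAGE_SHIFT]
    return None if entry == 0 else entry - 1


if __name__ == "__main__":
    print(table_bytes() // (1 << 20), "MB per table")
    print(total_mb(MIPS_TABLES), "MB for 32-bit MIPS (assumed 64 ASIDs + global)")
    print(total_mb(M88K_TABLES), "MB for M88K")
```

A lookup is a single array access with no TLB search, which is the point of the item; the cost is the host memory computed above, and a one-byte entry limits the scheme to TLBs with at most 255 entries.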