--- trunk/TODO 2007/10/08 16:21:53 38
+++ trunk/TODO 2007/10/08 16:22:32 42
@@ -1,85 +1,33 @@
-$Id: TODO,v 1.476 2007/04/14 05:39:47 debug Exp $
+$Id: TODO,v 1.536 2007/06/15 22:30:17 debug Exp $
 
-Some things, in totally random order, that I'd like to fix:
-(Some items in this list are probably out-to-date by now.)
+Some things, in no specific order, that I'd like to fix:
+(Some items in this list are perhaps already fixed.)
 
-Dyntrans:
-    x) Instruction combination collisions? How to avoid easily...
-    x) Think about how to do both SHmedia and SHcompact in a reasonable
-       way! (Or AMD64 long/protected/real, for that matter.)
-    x) 68K emulation; think about how to do variable instruction
-       lengths across page boundaries.
-    x) Dyntrans with valgrind-inspired memory checker. (In memory_rw,
-       it would be reasonably simple to add; in each individual fast
-       load/store routine = a lot more work, and it would become
-       kludgy very fast.)
-    x) Dyntrans with SMP... lots of work to be done here.
-    x) Dyntrans with cache emulation... lots of work here as well.
-    x) Remove the concept of base RAM completely; it would be more
-       generic to allow RAM devices to be used "anywhere".
-        o) dev_mp doesn't work well with dyntrans yet
-        o) In general, IPIs, CAS, LL/SC etc must be made to work with dyntrans
-    x) Redesign/rethink the delay slot mechanism used for e.g. MIPS,
-       so that it caches a translation (that is, an instruction
-       word and the instr_call it was translated to the last
-       time), so that it doesn't need to do slow
-       to_be_translated for each end of page?
-    x) Program Counter statistics:
-       Per machine? What about SMP? All data to the same file?
-       A debugger command should be possible to use to enable/
-       disable statistics gathering.
-       Configuration file option!
-    x) Breakpoints:
-        o) Physical vs virtual addresses!
-        o) 32-bit vs 64-bit sign extension for MIPS, and others?
-    x) INVALIDATION should cause translations in _all_ cpus to be
-       invalidated, e.g. on a write to a write-protected page
-       (containing code)
-    x) 16-bit encodings? (MIPS16, ARM Thumb, 32-bit SH on SH64)
-    x) Lots of other stuff: see src/cpus/README_DYNTRANS
-    x) Native code generation backends:
-        o) think carefully about this.
-        o) simple syntax for emitting opcodes; backend implementation
-           must be optional, so I don't have to write more code
-           than necessary. after all, the non-native (C) code should
-           always work.
-        o) convert into native code only after an entire
-           block has been translated? probably best.
-        o) the "almost native" opcodes may be rearranged,
-           "peep-hole optimized", etc. and then as a separate step
-           this list of almost native opcodes is written out
-           as native code.
-        o) think about delay slots at the end of a block!
-        o) x86/amd64 code generator can be very similar... perhaps
-        o) NOTE that generation is per _ABI_, not per host arch!
-           the configure script must detect ABI!!!
-        o) branches to already translated code blocks can
-           link the blocks together
-        o) load/store are the most important to optimize
-
-Simple Valgrind-like checks?
-    o) Mark every address with bits which tell whether or not the address
-       has been written to.
-    o) What should happen when programs are loaded? Text/data, bss (zero
-       filled). But stack space and heap is uninitialized.
-    o) Uninitialized local variables:
-       A load from a place on the stack which has not previously
-       been stored to => warning. Increasing the stack pointer using
-       any available means should reset the memory to uninitialized.
-    o) If calls to malloc() and free() can be intercepted:
-        o) Access to a memory area after free() => warning.
-        o) Memory returned by malloc() is marked as not-initialized.
-        o) Non-passive, but good to have: Change the argument
-           given to malloc, to return a slightly larger memory
-           area, i.e. margin_before + size + margin_after,
-           and return the pointer + margin_before.
-           Any access to the margin_before or _after space results
-           in warnings. (free() must be modified to free the
-           actually allocated address.)
+M88K:
+    o) Neither NIP nor FIP valid in rte?
+    o) FIP != NIP + 4, in rte! (Simulate delayed branch stuff.)
+    o) cpu_dyntrans.c: MEMORY_USER_ACCESS implementation for M88K!
+    o) xmem: Set transaction registers!
+    o) CMMUs:
+        o) Translation invalidations, could be optimized.
+        o) Move initialization from dev_mvme187 to somewhere
+           more reasonable?
+    o) Instruction trace by using bits of ??IP control regs.
+    o) Interrupts (these are machine dependent, though).
+    o) Implement devices etc. for one or more machine modes,
+       to get some guest OS running. OpenBSD/mvme88k on MVME187
+       seems to be the smartest path to follow for now.
+        o) VME bus device
+        o) PCC2
+        o) Cirrus Logic serial port controller
+    o) Instruction disassembly, and implementation:
+        o) See http://www.panggih.staff.ugm.ac.id/download/GCC/info/gcc.i5
+           for some strange cases of when "div" can fail (?)
+        o) Floating point stuff
+        o) "Graphics" instructions (M88110-specific)
 
 MIPS:
     o) Nicer MIPS status bits in register dumps.
-    o) Alignment exceptions.
     o) Floating point exception correctness.
     o) Fix this? Triggered by NetBSD/sgimips? Hm:
        to_be_translated(): TODO: unimplemented instruction:
@@ -87,19 +35,21 @@
     o) Some more work on opcodes.
         x) MIPS64 revision 2.
             o) Find out which actual CPUs implement the rev2 ISA!
+            o) DINS, DINSM, DINSU etc
             o) DROTR32 and similar MIPS64 rev 2 instructions, which
                have a rotation bit which differs from previous ISAs.
-            o) EI and DI instructions for MIPS64/32 rev 2.
-               NOTE: These are _NOT_ the same as for R5900!
         x) _MAYBE_ TX79 and R5900 actually differ in their opcodes?
            Check this carefully!
     o) Dyntrans: Count register updates are probably not 100% correct yet.
     o) Refactor code for performance and readability/maintainability.
     o) (Re)implement 128-bit loads/stores for R5900.
+    o) Coprocessor 1x (i.e. 3) should cause cp1 exceptions, not 3?
+       (See http://lists.gnu.org/archive/html/qemu-devel/2007-05/msg00005.html)
     o) R4000 and others:
         x) watchhi/watchlo exceptions, and other exception handling
            details
+    o) MIPS 5K* have 42 physical address bits, not 40/44?
     o) R10000 and others: (R12000, R14000 ?)
         x) The code before the line
            /* reg[COP0_PAGEMASK] = cpu->cd.mips.coproc[0]->tlbs[0].mask
               & PAGEMASK_MASK; */
@@ -114,28 +64,33 @@
            (http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi/hdwr/bks/SGI_Developer/books/R10K_UM/sgi_html/t5.Ver.2.0.book_284.html)
 
 SuperH:
+    x) Auto-generation of loads/stores! This should get rid of at least
+       the endianness check in each load/store.
+    x) Experiment with whether or not correct ITLB emulation is
+       actually needed. (20070522: I'm turning it off today.)
     x) SH4 interrupt controller:
         x) MASKING should be possible!
     x) SH4 DMA (0xffa00000)
     x) SH4 UBC (0xff200000)
-    x) SH4 timers are going too fast!
     x) Store queues can copy 32 bytes at a time, there's no need to
        copy individual 32-bit words. (Performance improvement.)
     x) SH4 BSC (Bus State Controller)
     x) Instruction tracing should include symbols for branch targets,
        and so on, to make the output more human readable.
     x) SH3-specific devices: Pretty much everything!
-    x) NetBSD/evbsh3, mmeye, hpcsh! Linux?
-    x) Replace pc-relative loads with immediate load, if within the
-       same page. (Similar to the same optimization for ARM.)
+    x) NetBSD/evbsh3, hpcsh! Linux?
     x) Floating point speed!
     x) Floating point exception correctness.
+    x) NetBSD HEAD (as of April 2007) hangs during bootup, because it
+       turns on/off interrupts in an unfortunately synchronized way
+       with dyntrans. This needs to be fixed.
+    x) Exceptions for unaligned load/stores. OpenBSD/landisk uses
+       this mechanism for its reboot code (machine_reset).
     x) Think carefully about how to implement SH5/SH64 (for evbsh5).
 
 Landisk SH4:
-    x) When NetBSD/landisk 4.0 and OpenBSD/landisk 4.1 have been
-       released, test to see if they work. (If so, update documentation,
-       guestos + index, and set stable=1 in machine_landisk.c.)
+    x) When NetBSD/landisk 4.0 has been released, make sure it works
+       in the emulator. (Update documentation, etc.)
 
 Dreamcast:
     x) G2 DMA
@@ -180,23 +135,6 @@
     o) SPARC v8, v7 etc?
     o) More machine modes and devices.
 
-Debugger:
-    o) How does SMP debugging work? Does it simply use "threads"?
-       What if the guest OS (running on an emulated SMP machine)
-       has a usertask running, with userland threads?
-    o) Try to make the debugger more modular and, if possible, reentrant!
-    o) Remove the emul command? (But show network info if showing
-       machines?)
-    o) Evaluate expressions within []? That would allow stuff like
-       cpu[x] where x is an expression.
-    o) Settings:
-        x) Special handlers for Write!
-            +) MIPS coproc regs
-            +) Alpha/MIPS/SPARC zero registers
-            +) x86 64/32/16-bit registers
-        x) Value formatter for resulting output.
-    o) see src/debugger.c for more
-
 POWER/PowerPC:
     x) Fix DECR timer speed, so it matches the host.
     x) NetBSD/prep 3.x triggers a possible bug in the emulator:
@@ -238,20 +176,13 @@
     x) Mouse/pad support! :)
     x) A NIC? (As a PCMCIA device?)
 
-AVR:
-    o) Everything.
-
 ARM:
     o) See netwinder_reset() in NetBSD; the current
        "an internal error occured" message after reboot/halt is too ugly.
-    o) ARM "wait"-like instruction?
+    o) Generic ARM "wait"-like instruction?
     o) try to get netbsd/evbarm 3.x or 4.x running (iq80321)
     o) make the xscale counter registers (ccnt) work
     o) make the ata controller usable for FreeBSD!
-    o) Zaurus emulation:
-        x) OpenBSD/zaurus
-        x) NetBSD/zaurus? See the following URL:
-           http://mail-index.netbsd.org/port-arm/2006/11/19/0000.html
     o) Debian/cats crashes because of unimplemented coproc stuff.
        fix this?
 
@@ -266,6 +197,139 @@
            halt();
        }
 
+Debugger:
+    o) How does SMP debugging work? Does it simply use "threads"?
+       What if the guest OS (running on an emulated SMP machine)
+       has a usertask running, with userland threads?
+    o) Try to make the debugger more modular and, if possible, reentrant!
+    o) Remove the emul command? (But show network info if showing
+       machines?)
+    o) Memory dumps should be able to dump both physical and
+       virtual emulated memory.
+    o) Evaluate expressions within []? That would allow stuff like
+       cpu[x] where x is an expression.
+    o) "pc = pc + 4" doesn't work! Bug. Should work. ("pc=pc+4" works.)
+    o) Settings:
+        x) Special handlers for Write!
+            +) MIPS coproc regs
+            +) Alpha/MIPS/SPARC zero registers
+            +) x86 64/32/16-bit registers
+        x) Value formatter for resulting output.
+    o) Call stack display (back-trace) of emulated programs.
+    o) Nicer looking output of register dumps, floating point registers,
+       etc. Warn about weird/invalid register contents.
+    o) Ctrl-C doesn't enter the debugger on some OSes (HP-UX?)...
+
+Dyntrans:
+    x) For 32-bit emulation modes, that have emulated TLBs: tlbindex
+       arrays of mapped pages? Things to think about:
+        x) Only 32-bit mode! (64-bit => too much code)
+        x) One array for global pages, and one array _PER ASID_,
+           for those archs that support that. On M88K, there should
+           be one array for userspace, and one for supervisor, etc.
+        x) Larger-than-4K-pages must fill several bits in the array.
+        x) No TLB search will be necessary.
+        x) Total host space used, for 4 KB pages: 1 MB per table,
+           i.e. 65 MB for 32-bit MIPS, 2 MB for M88K, if one byte
+           is used as the tlb index.
+        x) (The index is actually +1, so that 0 means no hit.)
+    x) "Merge" the cur_physpage and cur_ic_page variables/pointers to
+       one? I.e. change cur_ic_page to cur_physpage.ic_page or something.
+    x) Instruction combination collisions? How to avoid easily...
+    x) Think about how to do both SHmedia and SHcompact in a reasonable
+       way! (Or AMD64 long/protected/real, for that matter.)
+    x) 68K emulation; think about how to do variable instruction
+       lengths across page boundaries.
+    x) Dyntrans with valgrind-inspired memory checker. (In memory_rw,
+       it would be reasonably simple to add; in each individual fast
+       load/store routine = a lot more work, and it would become
+       kludgy very fast.)
+    x) Dyntrans with SMP... lots of work to be done here.
+    x) Dyntrans with cache emulation... lots of work here as well.
+    x) Remove the concept of base RAM completely; it would be more
+       generic to allow RAM devices to be used "anywhere".
+        o) dev_mp doesn't work well with dyntrans yet
+        o) In general, IPIs, CAS, LL/SC etc must be made to work with dyntrans
+    x) Redesign/rethink the delay slot mechanism used for e.g. MIPS,
+       so that it caches a translation (that is, an instruction
+       word and the instr_call it was translated to the last
+       time), so that it doesn't need to do slow
+       to_be_translated for each end of page?
+    x) Program Counter statistics:
+       Per machine? What about SMP? All data to the same file?
+       A debugger command should be possible to use to enable/
+       disable statistics gathering.
+       Configuration file option!
+    x) Breakpoints:
+        o) Physical vs virtual addresses!
+        o) 32-bit vs 64-bit sign extension for MIPS, and others?
+    x) INVALIDATION should cause translations in _all_ cpus to be
+       invalidated, e.g. on a write to a write-protected page
+       (containing code)
+    x) 16-bit encodings? (MIPS16, ARM Thumb, 32-bit SH on SH64)
+    x) Lots of other stuff: see src/cpus/README_DYNTRANS
+    x) Native code generation backends:
+        o) calculate at runtime whether or not chunks of emulated
+           (physical) memory are worth translating to native code
+           (it is assumed that it has high overhead)
+        o) experiment with calling the host's cc and ld externally;
+           extremely high overhead, but could be interesting none-
+           theless.
+        o) experiment with using LLVM, or GNU Lightning?
+        o) Important cases to think about:
+            x) loads/stores
+            x) delay branches
+            x) other kinds of calls, branches
+        o) branches to already translated code blocks can
+           link the blocks together (block-chaining), although
+           I'll probably want to wait with this until other
+           things work.
+        o) The first tests should be done with "testm88k", because
+           that does not affect other modes.
+-------------------------------------------------------------------------------
+
+Performance comparison when emulating the QEMU_MIPS machine (QEMU's default
+MIPS machine mode):
+
+mips-test-0.2:
+--------------
+
+1. while true; do ls -l > /dev/null; echo -n .; done, 80x36 dots
+2. while true; do /usr/bin/md5sum /usr/bin/* > /dev/null; echo -n .; done, 80 dots
+3. while true; do grep hej lib/libc.so.6 > /dev/null; echo -n .; done, 80 dots
+
+                    Test 1          Test 2          Test 3
+                    ------          ------          ------
+QEMU 0.9.0:         2 min 20 sec    45 sec          4 min 41 seconds
+GXemul-20070608:    1 min 59 sec    3 min 18 sec    18 min 10 seconds  [A]
+
+
+[A] = Normal portable dyntrans, no native code generation.
+
+-------------------------------------------------------------------------------
+
+
+Simple Valgrind-like checks?
+    o) Mark every address with bits which tell whether or not the address
+       has been written to.
+    o) What should happen when programs are loaded? Text/data, bss (zero
+       filled). But stack space and heap is uninitialized.
+    o) Uninitialized local variables:
+       A load from a place on the stack which has not previously
+       been stored to => warning. Increasing the stack pointer using
+       any available means should reset the memory to uninitialized.
+    o) If calls to malloc() and free() can be intercepted:
+        o) Access to a memory area after free() => warning.
+        o) Memory returned by malloc() is marked as not-initialized.
+        o) Non-passive, but good to have: Change the argument
+           given to malloc, to return a slightly larger memory
+           area, i.e. margin_before + size + margin_after,
+           and return the pointer + margin_before.
+           Any access to the margin_before or _after space results
+           in warnings. (free() must be modified to free the
+           actually allocated address.)
+
 Better CD Image file support:
     x) Support CD formats that contain more than 1 track, e.g.
        CDI files (?). These can then contain a mixture of e.g. sound
@@ -316,6 +380,7 @@
        is another option (easier to implement, but very very slow).
 
 Documentation:
+    x) Update the documentation regarding the testmachine interrupts.
     x) Note about sandboxing/security: Not all emulated
        instructions fail in the way they would do on real
        hardware (e.g. a userspace program writing to
@@ -349,14 +414,6 @@
        that use 3MAX into using CATS or hpcmips? (To remove the need to
        use a raw ffs partition, using up all of the disk image.)
 
-More generic out_of_memory error reporting, and check everywhere!
-    Causes: OpenBSD has low default limits for normal users.
-            Host is 32-bit? (32-bit hosts are limited to 4 GB or less
-            of userspace memory.)
-            You are actually low on RAM. (As trivial as this might sound,
-            Unix systems usually allow processes to allocate virtual
-            memory beyond the amount of RAM in the machine.)
-
 The Device subsystem:
     x) allow devices to be moved and/or changed in size
        (down to a minimum size, etc, or up to a max size); if there is a collision,
@@ -390,8 +447,8 @@
 Clocks and timers:
     x) Fix the PowerPC DECR interrupt speed! (MacPPC and PReP speed, etc.)
     x) DON'T HARDCODE 100 HZ IN cpu_mips_coproc.c!
-    x) Test the 8253? Right now it doesn't seem to be used?
-    x) NetWinder timeofday is incorrect!
+    x) NetWinder timeofday is incorrect! Huh? grep -R for ta_rtc_read in
+       NetBSD sources; it doesn't seem to be initialized _AT ALL_?!
     x) Cobalt TOD is incorrect!
     x) Go through all other machines, one by one, and fix them.
 
@@ -410,13 +467,18 @@
     o) non-IEEE modes (i.e. x86)?
 
 Userland emulation:
+    x) Try to prefix "/emul/mips/" or similar to all filenames,
+       and only if that fails, try the given filename.
+       Read this setting from an environment variable, and only
+       if there is none, fall back to hardcoded string.
+    x) File descriptor (0,1,2) assumptions? Find and fix these?
     x) Dynamic linking!
     x) Lots of stuff; freebsd, netbsd, linux, ... syscalls.
     x) Initial register/stack contents (environment, command line args).
     x) Return value (from main).
     x) mmap emulation layer
     x) errno emulation layer
-    x) struct conversions for may syscalls
+    x) struct conversions for many syscalls
 
 Sound:
     x) generic sound framework
@@ -488,3 +550,5 @@
     o) Generalize the framebuffer stuff by moving _ALL_ X11
        specific code to src/x11.c!
 
+-------------------------------------------------------------------------------
+
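
The "tlbindex arrays" item added to the Dyntrans list describes a per-address-space lookup table. A minimal sketch of that idea in C follows. Every name in it (tlbindex_table, tlbindex_map, and so on) is hypothetical and not taken from the GXemul sources. It assumes 4 KB pages, a 32-bit virtual address space, and one byte per page holding the TLB index plus one, so that 0 means no hit. Under that assumption each table is 2^20 bytes = 1 MB, which is consistent with the TODO's figures if 32-bit MIPS uses 65 tables (plausibly one global table plus one per ASID) and M88K uses 2 (userspace and supervisor).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* All names below are hypothetical; this is a sketch, not GXemul code. */
#define PAGE_SHIFT  12                          /* 4 KB pages */
#define N_VPAGES    (1u << (32 - PAGE_SHIFT))   /* 2^20 virtual pages */

/* One table per address space: one byte per 4 KB page = 1 MB. */
struct tlbindex_table {
    uint8_t entry[N_VPAGES];    /* 0 = no hit, otherwise TLB index + 1 */
};

/* Allocate a table with every page marked "no hit". */
static struct tlbindex_table *tlbindex_new(void)
{
    return calloc(1, sizeof(struct tlbindex_table));
}

/*
 * Record that TLB entry tlb_i maps [vaddr, vaddr + size). A page
 * larger than 4 KB simply fills several consecutive table entries.
 */
static void tlbindex_map(struct tlbindex_table *t, uint32_t vaddr,
    uint32_t size, unsigned tlb_i)
{
    uint32_t first = vaddr >> PAGE_SHIFT;
    uint32_t n = size >> PAGE_SHIFT;
    uint32_t i;

    assert(tlb_i < 255);    /* index + 1 must fit in one byte */
    for (i = 0; i < n && first + i < N_VPAGES; i++)
        t->entry[first + i] = (uint8_t)(tlb_i + 1);
}

/* Forget the mapping for [vaddr, vaddr + size). */
static void tlbindex_unmap(struct tlbindex_table *t, uint32_t vaddr,
    uint32_t size)
{
    uint32_t first = vaddr >> PAGE_SHIFT;
    uint32_t n = size >> PAGE_SHIFT;
    uint32_t i;

    for (i = 0; i < n && first + i < N_VPAGES; i++)
        t->entry[first + i] = 0;
}

/* Return the TLB index for vaddr, or -1 on a miss. No TLB search. */
static int tlbindex_lookup(const struct tlbindex_table *t, uint32_t vaddr)
{
    return (int)t->entry[vaddr >> PAGE_SHIFT] - 1;
}
```

The lookup is a single array read, which is the point of the "No TLB search will be necessary" item. The cost is the memory per table and the need to refill or clear several entries whenever a large page is mapped or invalidated.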