--- trunk/TODO 2007/10/08 16:21:17 34
+++ trunk/TODO 2007/10/08 16:22:32 42
@@ -1,112 +1,33 @@
-$Id: TODO,v 1.453 2007/02/19 01:34:42 debug Exp $
+$Id: TODO,v 1.536 2007/06/15 22:30:17 debug Exp $
 
--------------------------------------------------------------------------------
-
-Fix after the 0.4.4 release:
-  Fix the PowerPC DECR interrupt speed!
-
--------------------------------------------------------------------------------
-
-Possible (relatively large) work packages to concentrate on in the future:
-
-  x) SMP:
-     Get SMP working again. It is pretty much broken since I started
-     the conversion from the old bintrans system to the new dyntrans system.
-     Add better Test machine demos for SMP in the demos directory.
-
-  x) Network:
-     Redesign of the networking subsystem, at least the NAT translation
-     part. The current way of allowing raw ethernet frames to be
-     transfered to/from the emulator via UDP should probably be extended
-     to allow the frames to be transmitted other ways as well.
-     Also adding support for connecting ttys (either to xterms, or to
-     pipes/sockets etc, or even to PPP->NAT or SLIP->NAT :-).
-
-  x) PCI:
-     Pretty much everything related to runtime configuration, device
-     slots, interrupts, whatever. The current code is very hardcoded
-     and ugly.
-
-  x) Debugging:
-     Think more about SMP debugging, etc. Right now, the
-     debugger is a mess. Also, a better connection to GDB would be
-     very nice to have.
+Some things, in no specific order, that I'd like to fix:
+(Some items in this list are perhaps already fixed.)
 
-  x) Userland emulation:
-     Primary goals would be NetBSD and Linux syscall emulation.
-
-And of course, there are _LOTS_ of minor TODOs spread out throughout
-the source code, which must be fixed sooner or later.
-
--------------------------------------------------------------------------------
-
-Some other things, in random order, that I'd like to fix: (Some items in
-this list are probably out-to-date by now.)
-
-Dyntrans:
-  x) Instruction combination collisions? How to avoid easily...
-  x) Think about how to do both SHmedia and SHcompact in a reasonable
-     way! (Or AMD64 long/protected/real, for that matter.)
-  x) 68K emulation; think about how to do variable instruction
-     lengths across page boundaries.
-  x) Dyntrans with valgrind-inspired memory checker. (In memory_rw,
-     it would be reasonably simple to add; in each individual fast
-     load/store routine = a lot more work, and it would become
-     kludgy very fast.)
-  x) Dyntrans with SMP... lots of work to be done here.
-  x) Dyntrans with cache emulation... lots of work here as well.
-  o) dev_mp doesn't work well with dyntrans yet
-  o) In general, IPIs, CAS, LL/SC etc must be made to work with dyntrans
-  x) Redesign/rethink the delay slot mechanism used for e.g. MIPS,
-     so that it caches a translation (that is, an instruction
-     word and the instr_call it was translated to the last
-     time), so that it doesn't need to do slow
-     to_be_translated for each end of page?
-  x) Program Counter statistics:
-     Per machine? What about SMP? All data to the same file?
-     A debugger command should be possible to use to enable/
-     disable statistics gathering.
-     Configuration file option!
-  x) Breakpoints:
-     o) Physical vs virtual addresses!
-     o) 32-bit vs 64-bit sign extension for MIPS, and others?
-  x) INVALIDATION should cause translations in _all_ cpus to be
-     invalidated, e.g. on a write to a write-protected page
-     (containing code)
-  x) 16-bit encodings? (MIPS16, ARM Thumb, 32-bit SH on SH64)
-  x) Lots of other stuff: see src/cpus/README_DYNTRANS
-  x) true recompilation backend? think carefully about this.
-     o) abstract syntax for emitting opcopdes
-     o) convert into native code only after an entire
-        block has been translated? probably best.
-     o) x86/amd64 code generator can be very similar... perhaps
-     o) branches to already translated code blocks can
-        link the blocks together
-     o) load/store are the most important.
-
-Simple Valgrind-like checks?
-  o) Mark every address with bits which tell whether or not the address
-     has been written to.
-  o) What should happen when programs are loaded? Text/data, bss (zero
-     filled). But stack space and heap is uninitialized.
-  o) Uninitialized local variables:
-     A load from a place on the stack which has not previously
-     been stored to => warning. Increasing the stack pointer using
-     any available means should reset the memory to uninitialized.
-  o) If calls to malloc() and free() can be intercepted:
-     o) Access to a memory area after free() => warning.
-     o) Memory returned by malloc() is marked as not-initialized.
-     o) Non-passive, but good to have: Change the argument
-        given to malloc, to return a slightly larger memory
-        area, i.e. margin_before + size + margin_after,
-        and return the pointer + margin_before.
-        Any access to the margin_before or _after space results
-        in warnings. (free() must be modified to free the
-        actually allocated address.)
+M88K:
+  o) Neither NIP nor FIP valid in rte?
+  o) FIP != NIP + 4, in rte! (Simulate delayed branch stuff.)
+  o) cpu_dyntrans.c: MEMORY_USER_ACCESS implementation for M88K!
+  o) xmem: Set transaction registers!
+  o) CMMUs:
+     o) Translation invalidations, could be optimized.
+     o) Move initialization from dev_mvme187 to somewhere
+        more reasonable?
+  o) Instruction trace by using bits of ??IP control regs.
+  o) Interrupts (these are machine dependent, though).
+  o) Implement devices etc. for one or more machine modes,
+     to get some guest OS running. OpenBSD/mvme88k on MVME187
+     seems to be the smartest path to follow for now.
+     o) VME bus device
+     o) PCC2
+     o) Cirrus Logic serial port controller
+  o) Instruction disassembly, and implementation:
+     o) See http://www.panggih.staff.ugm.ac.id/download/GCC/info/gcc.i5
+        for some strange cases of when "div" can fail (?)
+     o) Floating point stuff
+     o) "Graphics" instructions (M88110-specific)
 
 MIPS:
   o) Nicer MIPS status bits in register dumps.
-  o) Alignment exceptions.
   o) Floating point exception correctness.
   o) Fix this? Triggered by NetBSD/sgimips? Hm:
      to_be_translated(): TODO: unimplemented instruction:
@@ -114,19 +35,21 @@
   o) Some more work on opcodes.
      x) MIPS64 revision 2.
         o) Find out which actual CPUs implement the rev2 ISA!
+        o) DINS, DINSM, DINSU etc
         o) DROTR32 and similar MIPS64 rev 2 instructions,
            which have a rotation bit which differs from
            previous ISAs.
-        o) EI and DI instructions for MIPS64/32 rev 2.
-           NOTE: These are _NOT_ the same as for R5900!
      x) _MAYBE_ TX79 and R5900 actually differ in their
         opcodes? Check this carefully!
   o) Dyntrans: Count register updates are probably not 100% correct yet.
   o) Refactor code for performance and readability/maintainability.
   o) (Re)implement 128-bit loads/stores for R5900.
+  o) Coprocessor 1x (i.e. 3) should cause cp1 exceptions, not 3?
+     (See http://lists.gnu.org/archive/html/qemu-devel/2007-05/msg00005.html)
   o) R4000 and others:
      x) watchhi/watchlo exceptions, and other exception
         handling details
+  o) MIPS 5K* have 42 physical address bits, not 40/44?
   o) R10000 and others: (R12000, R14000 ?)
      x) The code before the line
         /* reg[COP0_PAGEMASK] = cpu->cd.mips.coproc[0]->tlbs[0].mask & PAGEMASK_MASK; */
@@ -135,34 +58,45 @@
        register definitions according to
        http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi/hdwr/bks/SGI_Developer/books/R10K_UM/sgi_html/t5.Ver.2.0.book_263.html#HEADING334
        and make sure everything works with R10000. Then test with OpenBSD/sgi?
+     x) Entry LO mask (as above).
      x) memory space, exceptions, ...
      x) use cop0 framemask for tlb lookups
         (http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi/hdwr/bks/SGI_Developer/books/R10K_UM/sgi_html/t5.Ver.2.0.book_284.html)
 
 SuperH:
+  x) Auto-generation of loads/stores! This should get rid of at least
+     the endianness check in each load/store.
+  x) Experiment with whether or not correct ITLB emulation is
+     actually needed. (20070522: I'm turning it off today.)
   x) SH4 interrupt controller:
-     x) Implement correct priorities of interrupts
+     x) MASKING should be possible!
   x) SH4 DMA (0xffa00000)
   x) SH4 UBC (0xff200000)
   x) Store queues can copy 32 bytes at a time, there's no need to
      copy individual 32-bit words. (Performance improvement.)
   x) SH4 BSC (Bus State Controller)
-  x) SH4 RTC: Read the host's clock.
-  x) SH4 SCIF: Serial _interrupts_
   x) Instruction tracing should include symbols for branch targets,
      and so on, to make the output more human readable.
-  x) NetBSD/evbsh3, dreamcast, mmeye, hpcsh! Linux?
-  x) Replace pc-relative loads with immediate load, if within the
-     same page. (Similar to the same optimization for ARM.)
+  x) SH3-specific devices: Pretty much everything!
+  x) NetBSD/evbsh3, hpcsh! Linux?
   x) Floating point speed!
   x) Floating point exception correctness.
+  x) NetBSD HEAD (as of April 2007) hangs during bootup, because it
+     turns on/off interrupts in an unfortunately synchronized way
+     with dyntrans. This needs to be fixed.
+  x) Exceptions for unaligned load/stores. OpenBSD/landisk uses
+     this mechanism for its reboot code (machine_reset).
   x) Think carefully about how to implement SH5/SH64 (for evbsh5).
 
+Landisk SH4:
+  x) When NetBSD/landisk 4.0 has been released, make sure it works
+     in the emulator. (Update documentation, etc.)
+
 Dreamcast:
   x) G2 DMA
   x) LAN adapter (dev_mb8696x.c). NetBSD root-on-nfs.
   x) PVR: Lots of stuff. See dev_pvr.c.
-  x) GDROM
+  x) Better GDROM support
   x) Modem
   x) PCI bridge/bus?
   x) Maple bus:
@@ -172,7 +106,6 @@
      x) GD-ROM emulation: Use the GDROM device.
      x) Use the VGA font as a fake ROM font. (Better than
         nothing.)
-  x) Linux/dreamcast? (The gentoo kernel currently crashes.)
   x) Make as many as possible of the KOS examples run!
   x) More homebrew demos/games.
   x) SPU: Sound emulation (ARM cpu).
@@ -181,37 +114,6 @@
        http://www.maushammer.com/vmu.html for a good description of the
        differences between LC86104C and the one used in the VME.
 
-Transputer:
-  x) Implement support for Helios binaries.
-  x) Stack and register contents at startup?
-  x) Figure out how to boot an entire Helios distribution.
-  x) Implement all instructions. :)
-
-RCA1802/RCA1805, CHIP8:
-  x) CHIP8 -> RCA180x conversion
-  x) Think about how to do dual-mode, variable-instr-length
-     ISAs, and switch between modes.
-  x) 1805 "extended" opcode -> trigger CHIP8 emulation?
-     That is, all calls 0NNN could point to 0x68 opcodes,
-     which, if running on a 1802 in CHIP8-emulation-mode,
-     would be manually interpreted.
-  x) Better solution:
-     CHIP8 calls to 00xx => handle at high level,
-     calls to 0xxx in general = call 180X machine code
-     (0000 = reboot?)
-  x) 1802 info: http://www.nyx.net/~lturner/public_html/Cosmac.html
-     and: http://www.elf-emulation.com/1802.html
-  x) 1805 extended opcodes: Implement at least disassembly support!
-  x) Keyboard input.
-  x) Sound (beep only).
-  x) Slow-down to correct speed? Wikipedia: "it was usually operated
-     at 3.58 MHz/2 to suit the requirements of the 1861 chip which
-     gave a speed of a little over 100,000 instructions per second"
-     (Note that _CHIP8_ emulation would then be even slower.)
-  x) SCHIP48 (Super) emulation:
-     Some more opcodes, 128x64 framebuffer, larger
-     sprites and fonts.
-
 Alpha:
   x) OSF1 PALcode, Virtual memory support.
   x) PALcode replacement! PAL1E etc opcodes...?
@@ -221,34 +123,17 @@
   x) More Alpha machine types, so it could work with
      OpenBSD, FreeBSD, and Linux too?
 
-SPARC:
+SPARC (both the ISA and the machines):
   o) Implement Adress space identifiers; load/stores etc.
+  o) Exception/trap/interrupt handling.
   o) Save/restore register windows etc! Both v9 and pre-v9!
   o) Finish the subcc and addcc flag computation code.
   o) Add more registers (floating point, control regs etc)
-  o) Exception/trap handling.
   o) Disassemly of some more instructions?
   o) Are sll etc 32-bit sign-extending or zero-extending?
-  o) Finish the GDB register stuff.
-  x) Floating point exception correctness.
+  o) Floating point exception correctness.
   o) SPARC v8, v7 etc?
-
-Debugger:
-  o) How does SMP debugging work? Does it simply use "threads"?
-     What if the guest OS (running on an emulated SMP machine)
-     has a usertask running, with userland threads?
-  o) Try to make the debugger more modular and, if possible, reentrant!
-  o) Remove the emul command? (But show network info if showing
-     machines?)
-  o) Evaluate expressions within []? That would allow stuff like
-     cpu[x] where x is an expression.
-  o) Settings:
-     x) Special handlers for Write!
-        +) MIPS coproc regs
-        +) Alpha/MIPS/SPARC zero registers
-        +) x86 64/32/16-bit registers
-     x) Value formatter for resulting output.
-  o) see src/debugger.c for more
+  o) More machine modes and devices.
 
 POWER/PowerPC:
   x) Fix DECR timer speed, so it matches the host.
@@ -276,7 +161,7 @@
   x) Alignment exceptions.
 
 PReP:
-  Clock time! ("Bad battery blah blah")
+  x) Clock time! ("Bad battery blah blah")
 
 Algor:
   o) Other models than the P5064?
@@ -291,17 +176,13 @@
   x) Mouse/pad support! :)
   x) A NIC? (As a PCMCIA device?)
 
-AVR:
-  o) Everything.
-
 ARM:
   o) See netwinder_reset() in NetBSD; the current "an internal error
      occured" message after reboot/halt is too ugly.
-  o) ARM "wait"-like instruction?
+  o) Generic ARM "wait"-like instruction?
   o) try to get netbsd/evbarm 3.x or 4.x running (iq80321)
   o) make the xscale counter registers (ccnt) work
   o) make the ata controller usable for FreeBSD!
-  o) Zaurus emulation, for e.g. OpenBSD/zaurus
   o) Debian/cats crashes because of unimplemented coproc stuff.
      fix this?
 
@@ -316,6 +197,139 @@
     halt();
   }
 
+Debugger:
+  o) How does SMP debugging work? Does it simply use "threads"?
+     What if the guest OS (running on an emulated SMP machine)
+     has a usertask running, with userland threads?
+  o) Try to make the debugger more modular and, if possible, reentrant!
+  o) Remove the emul command? (But show network info if showing
+     machines?)
+  o) Memory dumps should be able to dump both physical and
+     virtual emulated memory.
+  o) Evaluate expressions within []? That would allow stuff like
+     cpu[x] where x is an expression.
+  o) "pc = pc + 4" doesn't work! Bug. Should work. ("pc=pc+4" works.)
+  o) Settings:
+     x) Special handlers for Write!
+        +) MIPS coproc regs
+        +) Alpha/MIPS/SPARC zero registers
+        +) x86 64/32/16-bit registers
+     x) Value formatter for resulting output.
+  o) Call stack display (back-trace) of emulated programs.
+  o) Nicer looking output of register dumps, floating point registers,
+     etc. Warn about weird/invalid register contents.
+  o) Ctrl-C doesn't enter the debugger on some OSes (HP-UX?)...
+
+Dyntrans:
+  x) For 32-bit emulation modes, that have emulated TLBs: tlbindex
+     arrays of mapped pages? Things to think about:
+     x) Only 32-bit mode! (64-bit => too much code)
+     x) One array for global pages, and one array _PER ASID_,
+        for those archs that support that. On M88K, there should
+        be one array for userspace, and one for supervisor, etc.
+     x) Larger-than-4K-pages must fill several bits in the array.
+     x) No TLB search will be necessary.
+     x) Total host space used, for 4 KB pages: 1 MB per table,
+        i.e. 65 MB for 32-bit MIPS, 2 MB for M88K, if one byte
+        is used as the tlb index.
+     x) (The index is actually +1, so that 0 means no hit.)
+  x) "Merge" the cur_physpage and cur_ic_page variables/pointers to
+     one? I.e. change cur_ic_page to cur_physpage.ic_page or something.
+  x) Instruction combination collisions? How to avoid easily...
+  x) Think about how to do both SHmedia and SHcompact in a reasonable
+     way! (Or AMD64 long/protected/real, for that matter.)
+  x) 68K emulation; think about how to do variable instruction
+     lengths across page boundaries.
+  x) Dyntrans with valgrind-inspired memory checker. (In memory_rw,
+     it would be reasonably simple to add; in each individual fast
+     load/store routine = a lot more work, and it would become
+     kludgy very fast.)
+  x) Dyntrans with SMP... lots of work to be done here.
+  x) Dyntrans with cache emulation... lots of work here as well.
+  x) Remove the concept of base RAM completely; it would be more
+     generic to allow RAM devices to be used "anywhere".
+  o) dev_mp doesn't work well with dyntrans yet
+  o) In general, IPIs, CAS, LL/SC etc must be made to work with dyntrans
+  x) Redesign/rethink the delay slot mechanism used for e.g. MIPS,
+     so that it caches a translation (that is, an instruction
+     word and the instr_call it was translated to the last
+     time), so that it doesn't need to do slow
+     to_be_translated for each end of page?
+  x) Program Counter statistics:
+     Per machine? What about SMP? All data to the same file?
+     A debugger command should be possible to use to enable/
+     disable statistics gathering.
+     Configuration file option!
+  x) Breakpoints:
+     o) Physical vs virtual addresses!
+     o) 32-bit vs 64-bit sign extension for MIPS, and others?
+  x) INVALIDATION should cause translations in _all_ cpus to be
+     invalidated, e.g. on a write to a write-protected page
+     (containing code)
+  x) 16-bit encodings? (MIPS16, ARM Thumb, 32-bit SH on SH64)
+  x) Lots of other stuff: see src/cpus/README_DYNTRANS
+  x) Native code generation backends:
+     o) calculate at runtime whether or not chunks of emulated
+        (physical) memory are worth translating to native code
+        (it is assumed that it has high overhead)
+     o) experiment with calling the host's cc and ld externally;
+        extremely high overhead, but could be interesting none-
+        theless.
+     o) experiment with using LLVM, or GNU Lightning?
+     o) Important cases to think about:
+        x) loads/stores
+        x) delay branches
+        x) other kinds of calls, branches
+     o) branches to already translated code blocks can
+        link the blocks together (block-chaining), although
+        I'll probably want to wait with this until other
+        things work.
+     o) The first tests should be done with "testm88k", because
+        that does not affect other modes.
+
+-------------------------------------------------------------------------------
+
+Performance comparison when emulating the QEMU_MIPS machine (QEMU's default
+MIPS machine mode):
+
+mips-test-0.2:
+--------------
+
+1. while true; do ls -l > /dev/null; echo -n .; done, 80x36 dots
+2. while true; do /usr/bin/md5sum /usr/bin/* > /dev/null; echo -n .; done, 80 dots
+3. while true; do grep hej lib/libc.so.6 > /dev/null; echo -n .; done, 80 dots
+
+                    Test 1          Test 2          Test 3
+                    ------          ------          ------
+QEMU 0.9.0:         2 min 20 sec    45 sec          4 min 41 seconds
+GXemul-20070608:    1 min 59 sec    3 min 18 sec    18 min 10 seconds   [A]
+
+
+[A] = Normal portable dyntrans, no native code generation.
+
+-------------------------------------------------------------------------------
+
+
+Simple Valgrind-like checks?
+  o) Mark every address with bits which tell whether or not the address
+     has been written to.
+  o) What should happen when programs are loaded? Text/data, bss (zero
+     filled). But stack space and heap is uninitialized.
+  o) Uninitialized local variables:
+     A load from a place on the stack which has not previously
+     been stored to => warning. Increasing the stack pointer using
+     any available means should reset the memory to uninitialized.
+  o) If calls to malloc() and free() can be intercepted:
+     o) Access to a memory area after free() => warning.
+     o) Memory returned by malloc() is marked as not-initialized.
+     o) Non-passive, but good to have: Change the argument
+        given to malloc, to return a slightly larger memory
+        area, i.e. margin_before + size + margin_after,
+        and return the pointer + margin_before.
+        Any access to the margin_before or _after space results
+        in warnings. (free() must be modified to free the
+        actually allocated address.)
+
 Better CD Image file support:
   x) Support CD formats that contain more than 1 track, e.g.
      CDI files (?). These can then contain a mixture of e.g. sound
@@ -325,6 +339,15 @@
      possibly other live-CD formats.)
 
 Networking:
+  x) Redesign of the networking subsystem, at least the NAT translation
+     part. The current way of allowing raw ethernet frames to be
+     transfered to/from the emulator via UDP should probably be
+     extended to allow the frames to be transmitted other ways as
+     well.
+  x) Also adding support for connecting ttys (either to xterms, or to
+     pipes/sockets etc, or even to PPP->NAT or SLIP->NAT :-).
+  x) Documentation updates (!) are very important, making it easier to
+     use the (already existing) network emulation features.
   x) Fix performance problems caused by only allowing a single
      TCP packet to be unacked.
   x) Don't hardcode offsets into packets!
@@ -357,6 +380,7 @@
      is another option (easier to implement, but very very slow).
 
 Documentation:
+  x) Update the documentation regarding the testmachine interrupts.
   x) Note about sandboxing/security:
      Not all emulated instructions fail in the way they would
      do on real hardware (e.g. a userspace program writing to
@@ -390,14 +414,6 @@
     that use 3MAX into using CATS or hpcmips? (To remove the need to
     use a raw ffs partition, using up all of the disk image.)
 
-More generic out_of_memory error reporting, and check everywhere!
-  Causes: OpenBSD has low default limits for normal users.
-          Host is 32-bit? (32-bit hosts are limited to 4 GB or less
-          of userspace memory.)
-          You are actually low on RAM. (As trivial as this might sound,
-          Unix systems usually allow processes to allocate virtual
-          memory beyond the amount of RAM in the machine.)
-
 The Device subsystem:
   x) allow devices to be moved and/or changed in size (down to a
      minimum size, etc, or up to a max size); if there is a collision,
@@ -412,6 +428,16 @@
   x) refactor various clocks/nvram/cmos into one device?
 
 PCI:
+  x) Pretty much everything related to runtime configuration, device
+     slots, interrupts, etc must be redesigned/cleaned up. The current
+     code is very hardcoded and ugly.
+     o) Allow cards to be added/removed during runtime more easily.
+     o) Allow cards to be enabled/disabled (i/o ports, etc, like
+        NetBSD needs for disk controller detection).
+     o) Allow devices to be moved in memory during runtime.
+     o) Interrupts per PCI slot, etc. (A-D).
+     o) PCI interrupt controller logic... very hard to get right,
+        because these differ a lot from one machine to the next.
   x) last write was ffffffff ==> fix this, it should be used
      together with a mask to get the correct bits. also, not ALL
      bits are size bits! (lowest 4 vs lowest 2?)
@@ -419,9 +445,10 @@
   x) generalize the interrupt routing stuff (lines etc)
 
 Clocks and timers:
+  x) Fix the PowerPC DECR interrupt speed! (MacPPC and PReP speed, etc.)
   x) DON'T HARDCODE 100 HZ IN cpu_mips_coproc.c!
-  x) Test the 8253? Right now it doesn't seem to be used?
-  x) NetWinder timeofday is incorrect!
+  x) NetWinder timeofday is incorrect! Huh? grep -R for ta_rtc_read in
+     NetBSD sources; it doesn't seem to be initialized _AT ALL_?!
   x) Cobalt TOD is incorrect!
   x) Go through all other machines, one by one, and fix them.
 
@@ -440,8 +467,18 @@
   o) non-IEEE modes (i.e. x86)?
 
 Userland emulation:
-  x) Lots of stuff; freebsd and netbsd (and linux?) syscalls.
-  x) Dynamic linking? Hm.
+  x) Try to prefix "/emul/mips/" or similar to all filenames,
+     and only if that fails, try the given filename.
+     Read this setting from an environment variable, and only
+     if there is none, fall back to hardcoded string.
+  x) File descriptor (0,1,2) assumptions? Find and fix these?
+  x) Dynamic linking!
+  x) Lots of stuff; freebsd, netbsd, linux, ... syscalls.
+  x) Initial register/stack contents (environment, command line args).
+  x) Return value (from main).
+  x) mmap emulation layer
+  x) errno emulation layer
+  x) struct conversions for many syscalls
 
 Sound:
   x) generic sound framework
@@ -513,3 +550,5 @@
   o) Generalize the framebuffer stuff by moving _ALL_ X11
      specific code to src/x11.c!
 
+-------------------------------------------------------------------------------
+
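
Note on the tlbindex item in the added Dyntrans section: the sizes quoted there follow from 2^32 / 4 KB = 2^20 pages, so one byte per page gives 1 MB per table, and two tables (userspace and supervisor) give the 2 MB quoted for M88K. The 65 MB figure for 32-bit MIPS would be consistent with one global table plus 64 per-ASID tables, but the TODO does not spell that out, so that reading is an assumption. The following is only a minimal sketch of what one such table might look like; every name in it (tlbindex_table, tlbindex_new, tlbindex_map, tlbindex_lookup) is hypothetical and not taken from the GXemul sources.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch of the "tlbindex array" idea; not GXemul code. */
#define TLBINDEX_PAGE_SHIFT 12                                  /* 4 KB pages */
#define TLBINDEX_N_PAGES    (1u << (32 - TLBINDEX_PAGE_SHIFT))  /* 2^20 pages */

/* One byte per 4 KB virtual page: 0 means "no hit", otherwise the
 * stored value is the TLB entry index plus 1.  Size: exactly 1 MB. */
struct tlbindex_table {
	uint8_t entry_plus_1[TLBINDEX_N_PAGES];
};

/* Allocate a zeroed table (all pages unmapped).  Returns NULL on failure. */
struct tlbindex_table *tlbindex_new(void)
{
	return calloc(1, sizeof(struct tlbindex_table));
}

/* Record that TLB entry 'tlb_entry' maps 'size' bytes starting at 'vaddr'.
 * Pages larger than 4 KB fill several consecutive slots in the array.
 * Both 'vaddr' and 'size' are assumed to be multiples of 4 KB. */
void tlbindex_map(struct tlbindex_table *t, uint32_t vaddr,
    uint32_t size, unsigned tlb_entry)
{
	uint32_t first = vaddr >> TLBINDEX_PAGE_SHIFT;
	uint32_t n = size >> TLBINDEX_PAGE_SHIFT;
	uint32_t i;

	assert(tlb_entry < 255);                 /* index + 1 must fit in a byte */
	assert(n <= TLBINDEX_N_PAGES - first);   /* stay inside the table */
	for (i = 0; i < n; i++)
		t->entry_plus_1[first + i] = (uint8_t)(tlb_entry + 1);
}

/* Return the TLB entry index that maps 'vaddr', or -1 if there is none.
 * This is a single array read; no search through the TLB is needed. */
int tlbindex_lookup(const struct tlbindex_table *t, uint32_t vaddr)
{
	return (int)t->entry_plus_1[vaddr >> TLBINDEX_PAGE_SHIFT] - 1;
}
```

With this layout, mapping a 16 KB page at 0x00400000 to entry 5 fills four consecutive slots; a lookup anywhere inside that range yields 5, and a lookup just past it yields -1. A real implementation would also have to clear slots when a TLB entry is overwritten, which this sketch leaves out.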