Using gprof, I found that:
- The image translation from RISC OS Sprite to screen mode was unoptimised (i.e. it made two unoptimised function calls per pixel). I improved this, but it only made a small overall difference, since it happens after the rendering work is done in the emulator.
- gprof is useless for following what's going on in the emulator-generated code.
With that six- or seven-line code change, the BeagleBoard now renders the celtic_knot3 file in a little over 37s, down from 85s.
x86 PC: 14s, BeagleBoard: 37s
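I haven't described how the per-pixel function calls were removed, so the following is only a sketch of one common approach, not the actual change: build a lookup table once per sprite and keep the inner loop free of calls. The names palette_lookup, build_table and convert_row are made up for illustration, and the 8-bit-index-to-32-bit-pixel mapping (a grey ramp) is an assumption standing in for the real sprite palette.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slow path: one function call per pixel to map an 8-bit
 * palette index to a 32-bit screen pixel (a grey ramp is used as a stand-in
 * for a real sprite palette). */
static uint32_t palette_lookup(uint8_t index)
{
    return 0xff000000u | ((uint32_t)index << 16) | ((uint32_t)index << 8) | index;
}

/* Build the table once per sprite instead of calling per pixel. */
static void build_table(uint32_t table[256])
{
    for (unsigned i = 0; i < 256; i++)
        table[i] = palette_lookup((uint8_t)i);
}

/* Fast path: the inner loop is a plain array lookup with no calls. */
static void convert_row(const uint8_t *src, uint32_t *dst, size_t n,
                        const uint32_t table[256])
{
    for (size_t x = 0; x < n; x++)
        dst[x] = table[src[x]];
}
```

As noted above, this kind of change only helps the final copy to the screen; the time spent in the emulator itself is unaffected.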
The branch fixup code currently avoids having to clear the code cache: the generated code stores the code address in a scratch register and then calls the fixup routine using LDR pc, [pc, #...]. That way, only the word loaded by that instruction has to be changed to point to the generated code instead of the fixup routine. Because that word is read as data, no instruction is modified. The next time the instruction is reached, the fixup routine is bypassed (although the setup for the call remains).
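The same idea can be modelled in portable C with a function pointer standing in for the literal word that LDR pc, [pc, #...] loads. This is only an analogy, not the emulator's code: call_site, fixup_routine, find_translation and translated_block are all hypothetical names, and the "translation" is a dummy.

```c
#include <stdint.h>

/* Hypothetical model of one branch site in the generated code. */
typedef struct call_site call_site;
typedef uint32_t (*block_fn)(call_site *site);

struct call_site {
    block_fn target;       /* plays the literal word read by LDR pc, [pc, #...] */
    uint32_t code_address; /* plays the value left in the scratch register */
};

static unsigned fixup_calls; /* counts how often the slow path runs */

/* Stand-in for a translated block of generated code. */
static uint32_t translated_block(call_site *site)
{
    return site->code_address + 4;
}

/* Stand-in for the translator's lookup; always returns the same block here. */
static block_fn find_translation(uint32_t code_address)
{
    (void)code_address;
    return translated_block;
}

/* Fixup routine: resolve the target, patch the data word, then continue. */
static uint32_t fixup_routine(call_site *site)
{
    fixup_calls++;
    site->target = find_translation(site->code_address);
    return site->target(site);
}

/* What the branch site does every time it is reached. */
static uint32_t take_branch(call_site *site)
{
    return site->target(site);
}
```

A site initialised as { fixup_routine, 0x8000 } goes through fixup_routine on the first take_branch call only; later calls jump straight to the translated block, yet the site still loads code_address each time, which corresponds to the leftover setup mentioned above.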
Further possible improvements to be tried:
- Modify the first instruction of the fixed-up code to be a proper branch to the address, as well as making the current change; if the old instruction falls out of the code cache by itself, the code will be quicker the next time it is run.
- Clear the ARM code cache explicitly, so that the faster code will be called straight away. This may be slower, due to the overhead of a system call the first time the branch occurs.
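Both ideas come down to writing an ARM branch instruction over existing code. A rough sketch follows, assuming GCC or Clang (for __builtin___clear_cache) and 32-bit ARM (A32) code; arm_encode_branch and patch_branch are hypothetical helpers, not functions from the emulator.

```c
#include <assert.h>
#include <stdint.h>

/* Encode an unconditional ARM (A32) "B target" placed at instr_addr.
 * The offset is relative to instr_addr + 8, because the ARM pc reads two
 * instructions ahead, and is stored as a signed 24-bit count of words. */
static uint32_t arm_encode_branch(uint32_t instr_addr, uint32_t target)
{
    int32_t offset = (int32_t)(target - (instr_addr + 8));
    assert((offset & 3) == 0);                          /* word aligned */
    assert(offset >= -(1 << 25) && offset < (1 << 25)); /* about +/-32MB */
    return 0xEA000000u | (((uint32_t)offset >> 2) & 0x00FFFFFFu);
}

/* Overwrite one instruction and ask the compiler runtime to make the
 * instruction cache coherent for that word. instr_addr is passed separately
 * so the sketch can also be exercised on a non-ARM host. */
static void patch_branch(uint32_t *instr, uint32_t instr_addr, uint32_t target)
{
    *instr = arm_encode_branch(instr_addr, target);
    __builtin___clear_cache((char *)instr, (char *)(instr + 1));
}
```

Leaving out the __builtin___clear_cache call corresponds to the first option (wait for the old instruction to be evicted); keeping it corresponds to the second, where on Linux/ARM the call typically ends in the system call whose cost is the concern.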
If you compile with -DSTANDALONE, it will create an executable that takes the name of a file containing ARM instructions and runs them (only really useful with gdb, to see what's going on).
The handling of unaligned memory accesses is still incorrect (except on my custom kernel, which fixes up the accesses in the old-fashioned way).
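For reference, and assuming that "old fashioned" here means the pre-ARMv6 behaviour, an LDR from an unaligned address loads the aligned word and rotates it right by 8 bits for each byte of misalignment. A minimal C model of that, with hypothetical names (ror32, legacy_ldr) and emulated memory held as little-endian bytes:

```c
#include <stdint.h>

/* Rotate right, avoiding the undefined 32-bit shift when n is 0. */
static uint32_t ror32(uint32_t v, unsigned n)
{
    n &= 31;
    return n ? (v >> n) | (v << (32 - n)) : v;
}

/* Model of LDR with the legacy unaligned behaviour. mem is the emulated
 * memory as little-endian bytes; addr is an offset into it. */
static uint32_t legacy_ldr(const uint8_t *mem, uint32_t addr)
{
    const uint8_t *p = mem + (addr & ~3u); /* round down to the word */
    uint32_t word = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
                    (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    return ror32(word, 8 * (addr & 3));    /* rotate by the misalignment */
}
```

With the bytes 0x11 0x22 0x33 0x44 at offset 0, a load from offset 0 gives 0x44332211 and a load from offset 1 gives 0x11443322, rather than the four bytes starting at offset 1.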
Update: http://ro-lf.svn.sourceforge.net/viewvc/ro-lf/ROLF/rolf/Libs/Compatibility?view=tar downloads the whole ROLF compatibility library, including the include files and disassembly code. (I can't check this at the moment, but...) The following should generate an executable on an ARM system:
    tar xf ro-lf-Compatibility.tar.gz
    cd Libs/Compatibility/ # Possibly other subdirectory
    touch config.h # Usually generated by the ROLF configure routine
    gcc -o standalone_emulator arm_arm_emulator.c arm_d*.c -DARCH_ARM -DSTANDALONE -DDISASSEMBLE -Iincludes -I.