Tuesday, July 17, 2012

Better timings

Since my last post, I've had a look at the ARM-on-ARM emulator with a view to improving the speed.

Using gprof, I found that:
  1. The image translation from RISC OS Sprite to screen mode was unoptimised (it made two unoptimised function calls per pixel).  I improved this, but it only made a small overall difference, since it happens after the rendering work is done in the emulator.
  2. gprof is useless for following what's going on in the emulator-generated code.

Building the emulator to dump its state after every emulated instruction, then fiddling a bit with grep, sed, sort and uniq, let me find the instructions most used by the renderer (about 10000 times each when rendering the ACORN file).  I noticed that one branch case (a conditional branch instruction that is not taken) always did a lot of work (hash table lookup, etc.), and that the existing branch fixup code could be used to improve it.

With that six- or seven-line code change, the BeagleBoard now renders the celtic_knot3 file in a little over 37s, down from 85s.
x86 PC: 14s, BeagleBoard: 37s
My RISC PC is not cooperating at the moment, but the same file takes about 55s to render on its 200MHz StrongARM.

The branch fixup code currently avoids having to clear the code cache by calling the fixup code using LDR pc, [pc, #...], having stored the code address in a scratch register.  That way, only the word loaded by that instruction has to be changed to point to the generated code instead of the fixup routine, and the next time the instruction is reached, the fixup routine will be bypassed (although the setup for the call remains).
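
As a rough illustration of that patching idea, here is a minimal portable C sketch. It is not taken from arm_arm_emulator.c: a function pointer stands in for the word loaded by LDR pc, [pc, #...], a struct field stands in for the scratch register, and all of the names (branch_site, fixup_routine, lookup_translation, and so on) are made up for the example. I have also assumed that the address kept in the scratch register is the emulated code address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one translated branch site.  "target" plays the
   role of the literal word that LDR pc, [pc, #...] loads; in real
   generated code it would be a word in the code buffer. */
typedef struct branch_site branch_site;
typedef void (*target_fn)(branch_site *site);

struct branch_site {
    target_fn target;       /* the patchable word                         */
    uint32_t guest_address; /* stands in for the scratch register (assumed
                               to hold the emulated code address)         */
};

static int fixup_calls = 0;      /* how often the slow path ran           */
static int translated_calls = 0; /* how often the generated code ran      */

/* Stand-in for a block of generated code. */
static void translated_block(branch_site *site)
{
    (void)site;
    translated_calls++;
}

/* Stand-in for the expensive work (hash table lookup, etc.). */
static target_fn lookup_translation(uint32_t guest_address)
{
    (void)guest_address;
    return translated_block;
}

/* Slow path: find the generated code, patch the word so later visits
   bypass this routine, then continue into the generated code. */
static void fixup_routine(branch_site *site)
{
    fixup_calls++;
    site->target = lookup_translation(site->guest_address);
    site->target(site);
}

/* What the branch site does each time it is reached: jump through the
   word.  Only data changes when it is patched, never an instruction. */
static void take_branch(branch_site *site)
{
    site->target(site);
}
```

A site initialised as { fixup_routine, 0x8000 } goes through fixup_routine on its first take_branch call only; every later call lands directly in translated_block. Because only a data word is rewritten, nothing in this model corresponds to a code cache clear, which is the point of the trick.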

Further possible improvements to be tried:
  1. Modify the first instruction of the fixed up code to be a proper branch to the address, as well as the current change; if the code falls out of the code cache by itself, the next time the code is run it will be quicker.
  2. Clear the ARM code cache explicitly, so that the faster code will be called straight away.  This may be slower, due to the overhead of a system call the first time the branch occurs.
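
For the second option, one possible shape (a sketch only, not what the emulator currently does) is to overwrite the instruction word and then ask the compiler's runtime to synchronise the caches. __builtin___clear_cache is a real GCC/Clang builtin; on ARM Linux it normally ends up making the cache flush system call, which is the overhead mentioned above. The helper name patch_code_word is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: replace one word of generated code, then make the
   change visible to instruction fetch.  On targets with coherent
   instruction caches (e.g. x86) the builtin compiles to nothing. */
static void patch_code_word(uint32_t *word, uint32_t new_instruction)
{
    *word = new_instruction;
    __builtin___clear_cache((char *)word, (char *)(word + 1));
}
```

For example, patch_code_word(&code[0], 0xEA000000u) would store an unconditional ARM B instruction with a zero offset field into the first word of a buffer called code; a real fixup would compute the offset from the branch site to the generated code.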

The ARM-on-ARM emulator code is available on the SourceForge ROLF project site, at: http://ro-lf.svn.sourceforge.net/viewvc/ro-lf/ROLF/rolf/Libs/Compatibility/arm_arm_emulator.c?view=log

If you compile with -DSTANDALONE, it will create an executable that takes the name of a file that should contain ARM instructions, and run them (only really useful with gdb, to see what's going on).

The handling of unaligned memory accesses is still incorrect (except on my custom kernel, which fixes up the accesses in the old-fashioned way).

Update: http://ro-lf.svn.sourceforge.net/viewvc/ro-lf/ROLF/rolf/Libs/Compatibility?view=tar downloads the whole ROLF compatibility library, including the include files and disassembly code.  (I can't check this at the moment, but...) The following should generate an executable on an ARM system:

tar xf ro-lf-Compatibility.tar.gz
cd Libs/Compatibility/  # Possibly other subdirectory
touch config.h  # Usually generated by the ROLF configure routine
gcc -o standalone_emulator arm_arm_emulator.c arm_d*.c -DARCH_ARM -DSTANDALONE -DDISASSEMBLE -Iincludes -I.

2 Comments:

Anonymous Andy said...

Hi Simon!
I've just found Rolf on S/F - looks very nice!

I was just wondering...
I'm a real "public-domain weenie" (fan of PD software) so just thought I'd ask - would you consider releasing Rolf as "public domain" (or CC0)? (Hope you're not offended by my asking - I mean no offense).

It's just that there are *so many* GPL GUI apps "out there", and the "public-domain world" could **really** do with a beautiful app like this!

It would also make Rolf **really stand out** from the rest of the pack of apps like this.

I'm not much of a coder myself, but I like to support PD software however I can (and I bought a book about a recently-released PD subset-of-C compiler to show my support for the app author).

Thanks for your time.
I imagine the answer will probably be "no", but just thought I'd ask.... :)
Keep up the great work! Bye for now -
- Andy
(andy dot elvey at paradise dot net dot nz)

1:48 am, August 26, 2012  
Blogger Simon said...

Hi Andy,

Thanks for the nice comments about ROLF. (Have you actually tried to use it?)

I must admit I don't understand the question (I'm working on finding out about the CC license and PD). Is there something about the GPL that stops you (or anyone else) from building a system and releasing it as PD? The source is less than a megabyte, and you could (as far as I know) just include a link to SourceForge where the source is? (e.g. http://ro-lf.svn.sourceforge.net/viewvc/ro-lf/ROLF/rolf/?view=tar&pathrev=236). What would you like to include in it?

tl;dr: Not no, just why?

11:33 am, August 28, 2012  
