Monday, May 31, 2010

ArtWorks Renderer

The ARM emulator now works with 26-bit code as well as 32-bit code (although it would be a bad idea to mix them in the same process) and can execute the free renderer for ArtWorks files. Unfortunately, it seems to be a little slower than the pure 32-bit version was, but there's plenty of room for optimisation.

The Viewer window title bar shows the time taken to render the file (actually, the whole ArtWorks file is rendered and copied into a bitmap that the Viewer application displays). The computer is a dual-core machine with about 5000 BogoMIPs per core (the emulator only uses one core, of course). No modules from RISC OS are needed to run Viewer with AWRender.

Saturday, May 29, 2010

Building on and for Knoppix

The Knoppix Live CD is an excellent way to try out Linux on any PC with a CD-ROM drive. It won't store anything on your PC (unless you ask it to) and doesn't mess up any existing installations of Linux or Windows.

In this article, I'm going to describe how to build ROLF to run on Linux assuming a PC booted from an unmodified Knoppix 6.2 CD.

When done, you should have a working ROLF desktop being displayed in a window, able to run a few native ROLF programs and even some RISC OS programs, in a limited way. Simply by copying the directory into which you've installed ROLF to some permanent storage (a hard disc or a memory stick, for example), you can use the build again without having to go through the steps I describe.

Set up a build environment

Knoppix, unfortunately, doesn't include all the tools needed to build ROLF, so we have to get them from Debian first. (If your Linux system already includes these tools, you can obviously skip this step.)

First, get the full list of packages that are available:

sudo apt-get update

Then install the C++ compiler, needed to build the Server part of ROLF, along with Subversion, libtool and pkg-config (the libraries are all plain C, but building them needs libtool):

sudo apt-get install build-essential
sudo apt-get install subversion
sudo apt-get install libtool
sudo apt-get install pkg-config

After that has been done, the compiler will still refuse to compile C++ programs because, for some reason, it can't find cc1plus. The simplest fix is to add the directory containing the newly installed cc1plus to the PATH environment variable:

export PATH=$PATH:$(dirname $(find /ramdisk/ -name cc1plus 2>/dev/null | head -n 1))

Now, to get the source code:

svn co ro-lf

If you don't expect to make any changes to the code, you can use svn export instead of svn co:
svn export

Knoppix (like Debian) has all its shared libraries tagged with their version numbers, and the unversioned names that the linker looks for aren't installed; to make the libraries linkable, we need to do this:

mkdir lib
ln -s /usr/lib/ lib/
ln -s /usr/lib/ lib/
ln -s /usr/lib/ lib/
ln -s /usr/lib/ lib/
ln -s /usr/lib/ lib/
ln -s /usr/lib/ lib/
sed -i 's|CFLAGS.*$|& -L$(CFGDIR)/lib|' config.mak

Similarly, the build requires a set of include files [This bit needs an update!]

Specifically, the files needed are:

From zlib:
zlib.h, zconf.h

From libpng:
png.h, pngconf.h

From file:

From libjpeg:
jpeglib.h, jmorecfg.h, jconfig.h

From freetype 2:

sed -i 's|CFLAGS.*$|& -I$(CFGDIR)/my_includes|' config.mak

Change directory into the downloaded code and run configure (not as complicated as a normal configure script) and build ROLF:

cd ro-lf/ROLF/rolf
./configure --appsdir=$HOME/Apps --prefix=$HOME --tarballs=`pwd`/tarballs/

Assuming all goes well with the build, the next step is to collect the resources needed in one place.

You will need:

  • Fonts (at least FreeSerif.ttf, the default font)
  • Tool and Icon sprites (I use Chris Wraight's Steel theme)
  • A mimemap.txt file (if you're going to use RISC OS software)
  • A !Boot file to load them all (like the one below)

Create a resources directory with subdirectories:

mkdir -p Resources/fonts/{FreeSerif,FreeSans,FreeMono}
ln -s /usr/share/file/ Resources/
cd Resources/fonts
for i in *; do for f in $( find /KNOPPIX/ -name "$i*.ttf" ); do ln -s "$f" "$i"/ ; done ; done 2>/dev/null
# This one is needed for the Terminal application:
cp `find /KNOPPIX/ -name 'default8x16.psf*' 2> /dev/null` .
cd ..
cd ..
cat > \!Boot <<"EOF"
IconSprites $ROLF_RESOURCES/Steel/Tools
IconSprites $ROLF_RESOURCES/Steel/Icons

LoadPointer $ROLF_RESOURCES/Steel/Icons ptr_default 0 21
LoadPointer $ROLF_RESOURCES/Steel/Icons ptr_double 0 21

# Match up icons with the appropriate mime types
if [ -f $ROLF_RESOURCES/mimemap.txt ] ; then ( grep -v ^# $ROLF_RESOURCES/mimemap.txt | tr -s '\t' | cut -f 1,3 | sort -k 2 | sed 's/\(.*\)\t\(.*\)$/IconSprite file_\2 "file_\1"/' | sh ) ; fi

# Load applications
Filer Icon $HOME/Apps 50 romapps Apps
Filer Icon /tmp 40 ramfs ramfs
Filer Icon $HOME 45 homedisc Home
EOF
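The icon-matching line in that !Boot file packs a lot into one pipeline. As a rough illustration of what it does to each line, here is a C sketch. It assumes mimemap.txt uses the usual RISC OS MimeMap layout (MIME type, RISC OS type name, hexadecimal filetype, extensions, separated by tabs); the function name mimemap_line_to_command is hypothetical, and the final sort by filetype is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch (hypothetical helper, not part of ROLF): convert one line of
 * mimemap.txt into the IconSprite command the !Boot pipeline builds.
 * Returns 1 and fills 'out' on success; returns 0 for comment lines and
 * for lines with fewer than three tab-separated fields. */
int mimemap_line_to_command(const char *line, char *out, size_t outlen)
{
    char buf[256];
    const char *fields[3] = { NULL, NULL, NULL };
    int n = 0;

    assert(line != NULL && out != NULL && outlen > 0);

    if (line[0] == '#')                      /* grep -v ^# */
        return 0;

    strncpy(buf, line, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    buf[strcspn(buf, "\r\n")] = '\0';        /* drop the line ending */

    /* strtok treats a run of tabs as one separator, much like tr -s '\t'
     * (unlike the pipeline, it also skips leading tabs). */
    for (char *tok = strtok(buf, "\t"); tok != NULL && n < 3;
         tok = strtok(NULL, "\t"))
        fields[n++] = tok;
    if (n < 3)
        return 0;

    /* cut -f 1,3 keeps the MIME type (field 1) and the filetype (field 3);
     * the sed expression then rearranges them into a command. */
    snprintf(out, outlen, "IconSprite file_%s \"file_%s\"", fields[2], fields[0]);
    return 1;
}
```

For example, the line application/pdf<TAB>PDF<TAB>adf<TAB>.pdf becomes IconSprite file_adf "file_application/pdf".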

Now, we just have to run it!

export LD_LIBRARY_PATH=`pwd`/lib
export PATH=$PATH:`pwd`/bin

# Uncomment the next line if you need a log of what's happened (useful for debugging)
# ROLF_WIMP_LOG=/tmp/rolf_wimp_log \
ROLF_RESOURCES=`pwd`/Resources \

That should have the effect of printing:
VNC Server waiting for connections on port 2008.
New file descriptor 3
New Opaque bitmap VNCFrameBuffer, 1024x768

In another terminal window, type:

vncviewer localhost:2008

... and a colourful window with an icon bar and a couple of drive icons should appear!

I will follow this up with some more postings about running RISC OS software on ROLF, as well as implementations of NetSurf and MPlayer.

Saturday, May 22, 2010

Emulator speedups

The other day I noticed that it took AWRender nearly fifty seconds to render the file celtic_knot3 from here, and I thought that was too long, so I decided to get around to speeding up the emulator. (Had I first tried it on my SARPC, I'd have found that it took 55 seconds on there, so it wasn't really too slow.)

The first step was to disable all the debug output from the compatibility library; that halved the rendering time to a shade over 24s. I found that rather disappointing; I was expecting the debug output to take up at least two-thirds of the time.

Next, I turned on optimisation when compiling the library: -O4 reduced the render time to under 18s.

Two optimisation suggestions came from Jake Waskett: ensure that jump targets are on 16-byte boundaries, and fix up the calls to scan that are made from a fixed location and then jump to the returned code, so that the second time through there's no relatively expensive scan call. The former shaved about 0.1s off the render time, but the latter gave a significant improvement, taking the render time down to just over 15s.
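The second suggestion amounts to patching a call site the first time it is resolved. The C sketch below models the idea with a pointer slot rather than by rewriting x86 call instructions, which is what generated code would actually do; exit_slot, take_exit and scan_arm_code_stub are hypothetical names, not ROLF's. It also includes the round-up that puts a jump target on a 16-byte boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A translated block, modelled as a C function returning the next ARM address. */
typedef uint32_t (*native_block)(void);

/* One block exit whose ARM target is fixed (hypothetical structure). */
struct exit_slot {
    uint32_t arm_target;   /* ARM address this exit always branches to */
    native_block code;     /* NULL until the first scan patches it */
};

static int scan_calls;     /* counts the expensive scans, for illustration */

static uint32_t block_at_8000(void)
{
    return 0x8004;
}

/* Stand-in for scan_arm_code: find or translate code for an ARM address.
 * This toy model only knows about one block. */
static native_block scan_arm_code_stub(uint32_t arm_addr)
{
    scan_calls++;
    assert(arm_addr == 0x8000);
    return block_at_8000;
}

/* What an exit does: scan on first use, then reuse the patched pointer. */
uint32_t take_exit(struct exit_slot *slot)
{
    if (slot->code == NULL)
        slot->code = scan_arm_code_stub(slot->arm_target);
    return slot->code();
}

int get_scan_calls(void)
{
    return scan_calls;
}

/* Round a code-buffer offset up to the next 16-byte boundary. */
size_t align16(size_t offset)
{
    return (offset + 15) & ~(size_t)15;
}
```

Taking the same exit twice calls the scan only once; after that, the slot goes straight to the translated code.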

During all this, I noticed the SETcc operations and realised that I could use SETO %al ; LAHF to get all the necessary x86 flags into %ax (previously, I'd been using pushf/popf). Of course, when I googled for that combination, I found a description of someone writing an ARM JIT compiler using pushf, with a comment recommending the seto/lahf combination; it's all about knowing what to look for! Anyway, once the flags are in %ax, it's fairly easy to get the four flags we want into the bottom nibble of %al by rotating %ah, masking and or'ing with %al: ( rol $3, %ah ; and $0xe, %ah ; or %ah, %al ). The flags aren't in the same order used by ARM, but a 16-byte lookup table can translate between the two in one instruction.
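To check the bit shuffling: LAHF loads %ah with SF:ZF:0:AF:0:PF:1:CF (bits 7 to 0). Rotating %ah left by three puts ZF, SF and CF in bits 1, 2 and 3, so masking with 0xe and or'ing in the SETO result leaves CF:SF:ZF:OF in the bottom nibble of %al. The C model below (function and table names are hypothetical) reproduces that and builds a 16-entry table mapping the nibble to ARM's N:Z:C:V order. It ignores the fact that x86 sets CF as a borrow after a subtraction while ARM sets C to its inverse, which a real translator has to handle separately.

```c
#include <assert.h>
#include <stdint.h>

/* ARM NZCV nibble (PSR bits 31-28 shifted down to bits 3-0). */
#define ARM_N 0x8
#define ARM_Z 0x4
#define ARM_C 0x2
#define ARM_V 0x1

/* C model of: seto %al ; lahf ; rol $3, %ah ; and $0xe, %ah ; or %ah, %al
 * 'ah' is the LAHF result, 'of' the SETO result (0 or 1).
 * The result has CF:SF:ZF:OF in bits 3..0. */
uint8_t x86_flag_nibble(uint8_t ah, uint8_t of)
{
    uint8_t rotated = (uint8_t)((ah << 3) | (ah >> 5));   /* rol $3, %ah */
    assert(of <= 1);
    return (uint8_t)((rotated & 0x0e) | of);              /* and ; or */
}

/* 16-entry table translating CF:SF:ZF:OF into ARM's N:Z:C:V order. */
static uint8_t x86_to_arm[16];

void init_flag_table(void)
{
    for (unsigned i = 0; i < 16; i++) {
        uint8_t nzcv = 0;
        if (i & 0x8) nzcv |= ARM_C;   /* bit 3 holds CF */
        if (i & 0x4) nzcv |= ARM_N;   /* bit 2 holds SF */
        if (i & 0x2) nzcv |= ARM_Z;   /* bit 1 holds ZF */
        if (i & 0x1) nzcv |= ARM_V;   /* bit 0 holds OF */
        x86_to_arm[i] = nzcv;
    }
}

/* Full translation: one nibble computation plus one table lookup. */
uint8_t arm_nzcv_from_x86(uint8_t ah, uint8_t of)
{
    return x86_to_arm[x86_flag_nibble(ah, of)];
}
```

With only ZF set (LAHF gives 0x42, since bit 1 always reads as 1), the nibble is 0x2 and the table yields ARM's Z bit; AF and PF are discarded by the mask.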

Obviously, there are more flag reads than flag writes (there's no point in setting the flags if they aren't going to be read at least once), so I added a new global variable, set at the same time as the flags, containing a 16-entry bitmap with one bit for each condition code (EQ, NE, GT, etc.). A conditional instruction then just has to test a known bit in a known variable and use the resulting ZF to decide whether to execute.
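A sketch of that bitmap in C, assuming the standard ARM condition encoding (EQ = 0 through NV = 15, with NV treated as "never" as on older ARMs) and an N:Z:C:V nibble with N in bit 3; condition_bitmap and condition_passes are hypothetical names. Since the bitmap depends only on the four flags, it could itself be precomputed into a 16-entry table indexed by the nibble.

```c
#include <assert.h>
#include <stdint.h>

/* ARM condition codes in encoding order (instruction bits 31-28). */
enum arm_cond {
    COND_EQ, COND_NE, COND_CS, COND_CC, COND_MI, COND_PL, COND_VS, COND_VC,
    COND_HI, COND_LS, COND_GE, COND_LT, COND_GT, COND_LE, COND_AL, COND_NV
};

/* Build the "which conditions pass" bitmap for an NZCV nibble
 * (N = bit 3, Z = bit 2, C = bit 1, V = bit 0). Bit 'cond' is set
 * when an instruction with that condition would execute. */
uint16_t condition_bitmap(uint8_t nzcv)
{
    int n = (nzcv >> 3) & 1, z = (nzcv >> 2) & 1;
    int c = (nzcv >> 1) & 1, v = nzcv & 1;
    uint16_t map = 0;

    assert(nzcv < 16);
    if (z)            map |= 1u << COND_EQ;
    if (!z)           map |= 1u << COND_NE;
    if (c)            map |= 1u << COND_CS;
    if (!c)           map |= 1u << COND_CC;
    if (n)            map |= 1u << COND_MI;
    if (!n)           map |= 1u << COND_PL;
    if (v)            map |= 1u << COND_VS;
    if (!v)           map |= 1u << COND_VC;
    if (c && !z)      map |= 1u << COND_HI;
    if (!c || z)      map |= 1u << COND_LS;
    if (n == v)       map |= 1u << COND_GE;
    if (n != v)       map |= 1u << COND_LT;
    if (!z && n == v) map |= 1u << COND_GT;
    if (z || n != v)  map |= 1u << COND_LE;
    map |= 1u << COND_AL;              /* COND_NV ("never") stays clear */
    return map;
}

/* A conditional instruction only needs to test one known bit. */
int condition_passes(uint16_t map, unsigned cond)
{
    assert(cond < 16);
    return (map >> cond) & 1;
}
```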

That change was fairly major (though it only affected three files) and took the render time down to 13 seconds.

The next thing to try was eliminating the extra code, on each load or store, that checks for non-aligned accesses. The idea is to set the x86 alignment-check flag (AC), which causes a SIGBUS signal to be generated for unaligned accesses, and have the signal handler load the registers as necessary before moving on to the next instruction. Unaligned accesses in ARM code will probably be relatively rare, and the speedup in the normal memory accesses should more than make up for the slower signal handling. Since the only routines called from emulated ARM code are scan_arm_code and (when debugging is enabled) dump_regs, those routines would clear the flag on entry and restore its state on exit.
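Whichever way unaligned accesses are caught, the emulator still has to produce the result an ARM would. On processors before ARMv6 (including the StrongARM in a RiscPC), an unaligned LDR fetches the aligned word containing the address and rotates it right by eight bits per byte of misalignment. The C sketch below shows those load semantics for a little-endian emulated memory; mem and arm_ldr_word are hypothetical, and stores, which behave differently, are not covered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit word one byte at a time, so the host
 * never performs an unaligned access itself. */
static uint32_t read_le32(const uint8_t *mem, uint32_t addr)
{
    return (uint32_t)mem[addr] | ((uint32_t)mem[addr + 1] << 8) |
           ((uint32_t)mem[addr + 2] << 16) | ((uint32_t)mem[addr + 3] << 24);
}

/* Sketch of pre-ARMv6 LDR: load the aligned word containing 'addr',
 * then rotate right by 8 bits per byte of misalignment.
 * 'mem' is a hypothetical flat array standing in for emulated memory. */
uint32_t arm_ldr_word(const uint8_t *mem, size_t mem_size, uint32_t addr)
{
    uint32_t aligned = addr & ~3u;
    unsigned rot = (addr & 3u) * 8;
    uint32_t word;

    assert((size_t)aligned + 4 <= mem_size);
    word = read_le32(mem, aligned);
    return rot ? (word >> rot) | (word << (32 - rot)) : word;
}
```

So with the bytes 11 22 33 44 at address 0, a load from address 1 gives 0x11443322 rather than the 0x??443322 a byte-wise unaligned read would.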

That "optimisation" slowed the render time down to 15s again.

Since the only unaligned access from scan_arm_code is likely to be when setting a 32-bit constant in an instruction, I stopped manipulating the flag in scan_arm_code and tried modifying cache_32bit to write its four bytes one at a time, instead, and the time improved again to a little over 12s. However, the code I was testing didn't include any unaligned accesses, and since I hadn't written the signal handler anyway, I've decided to call it a day for the time being and leave that optimisation out.
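Writing a constant a byte at a time is straightforward because x86 immediates are stored little-endian. A minimal sketch, assuming a plain byte-array code buffer; cache_32bit_bytes is a hypothetical name, and the real cache_32bit's signature isn't shown in this post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for cache_32bit: append a 32-bit constant to the
 * code buffer as four single-byte stores, lowest byte first, so no
 * unaligned 32-bit store is made. Returns the new write offset. */
size_t cache_32bit_bytes(uint8_t *code, size_t size, size_t pos, uint32_t value)
{
    assert(pos + 4 <= size);
    code[pos]     = (uint8_t)(value);
    code[pos + 1] = (uint8_t)(value >> 8);
    code[pos + 2] = (uint8_t)(value >> 16);
    code[pos + 3] = (uint8_t)(value >> 24);
    return pos + 4;
}
```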

Future optimisation possibilities:
  • Finish the SIGBUS solution
  • Improve the hash table lookup
  • Use mov $constant,arm_emulator_regs[n] for constant loads into registers
  • Combine consecutive ARM instructions that load a constant into a register
  • Remember if the flags (or a register's contents) are already held in an x86 register from last time
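As an illustration of the constant-combining idea, here is a C sketch that recognises an unconditional MOV Rd,#imm followed by ORR or ADD Rd,Rd,#imm and computes the resulting constant. It relies only on the standard ARM data-processing immediate encoding (an 8-bit value rotated right by twice a 4-bit field); the function names are hypothetical, and a real translator would also need to check that nothing branches to the second instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a data-processing immediate: imm8 rotated right by 2 * rotate. */
static uint32_t dp_immediate(uint32_t insn)
{
    uint32_t imm8 = insn & 0xffu;
    unsigned rot = ((insn >> 8) & 0xfu) * 2;
    return rot ? (imm8 >> rot) | (imm8 << (32 - rot)) : imm8;
}

/* If 'first' is MOV Rd,#imm and 'second' is ORR/ADD Rd,Rd,#imm (both
 * unconditional, no S bit, Rd not the PC), store the register and final
 * constant and return 1; otherwise return 0. */
int fold_constant_pair(uint32_t first, uint32_t second,
                       unsigned *rd, uint32_t *value)
{
    const uint32_t top12 = 0xfff00000u;   /* cond, 00, I, opcode, S */
    const uint32_t mov_imm = 0xe3a00000u; /* MOV, AL, immediate, no S */
    const uint32_t orr_imm = 0xe3800000u; /* ORR, AL, immediate, no S */
    const uint32_t add_imm = 0xe2800000u; /* ADD, AL, immediate, no S */
    unsigned d = (first >> 12) & 0xfu;
    uint32_t op2 = second & top12;

    assert(rd != NULL && value != NULL);
    if ((first & top12) != mov_imm || d == 15)   /* skip MOV to the PC */
        return 0;
    if (op2 != orr_imm && op2 != add_imm)
        return 0;
    if (((second >> 16) & 0xfu) != d || ((second >> 12) & 0xfu) != d)
        return 0;

    *rd = d;
    *value = (op2 == orr_imm) ? dp_immediate(first) | dp_immediate(second)
                              : dp_immediate(first) + dp_immediate(second);
    return 1;
}
```

For example, MOV r0,#&FF00 (0xE3A00CFF) followed by ORR r0,r0,#&FF (0xE38000FF) folds to the single constant 0xFFFF in r0.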

All of these things have a chance of making the scan_arm_code routine slower and negating their speed improvements, but they're probably worth a try.

The other thing to do is to profile the ARM code somewhat by generating code to increment counters when, for example, flags are set, flags are read, scan_arm_code and get_hash_entry are called, etc. At the moment, I notice that a sequence of ARM instructions leading up to a decision point (conditional jump, swi, etc.) is rarely much more than ten instructions.
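The counters could be as simple as the following C sketch, where generated code (or the C routines themselves) bumps an entry for each event of interest and records how many ARM instructions each translated run contained, so that observations like "rarely much more than ten instructions" can be measured. The event names and functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Events worth counting (hypothetical names). */
enum profile_event {
    EV_FLAGS_SET, EV_FLAGS_READ, EV_SCAN_ARM_CODE, EV_GET_HASH_ENTRY, EV_COUNT
};

#define MAX_RUN 32   /* runs longer than this share the last bucket */

static uint64_t event_count[EV_COUNT];
static uint64_t run_length[MAX_RUN + 1];   /* histogram of run lengths */

void profile_hit(enum profile_event ev)
{
    assert(ev < EV_COUNT);
    event_count[ev]++;
}

/* Record the number of ARM instructions before a decision point. */
void profile_block(unsigned instructions)
{
    run_length[instructions < MAX_RUN ? instructions : MAX_RUN]++;
}

uint64_t profile_count(enum profile_event ev)
{
    return event_count[ev];
}

/* Fraction of recorded runs with at most 'limit' instructions. */
double profile_fraction_up_to(unsigned limit)
{
    uint64_t total = 0, within = 0;
    for (unsigned i = 0; i <= MAX_RUN; i++) {
        total += run_length[i];
        if (i <= limit)
            within += run_length[i];
    }
    return total ? (double)within / (double)total : 0.0;
}
```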

The instruction emulator file stands at 1761 lines (55158 bytes).