Thursday, October 06, 2011

ARM Emulation on ARM

ROLF has a compatibility layer intended to allow execution of RISC OS applications. On x86 platforms, this includes an emulator for the ARM instruction set. A recent attempt to simply use the ARM processor directly on ARM platforms ended in failure. I'm trying a different approach to work around that problem, which should also remove the need for Linux kernel changes and allow the execution of both 32-bit and 26-bit programs.

The emulator interface will be the same as that of the emulator that generates x86 code, but code generation takes a different approach because the emulated and host processors are so similar.

Basically, code generated by the emulator is isolated from the Linux code by a code fragment that saves the Linux register values and replaces them with the emulator's register values on entry, and does the opposite on exit. Most instructions can then simply be copied verbatim from the emulated code to the generated code. Since the generated code needs some workspace, four registers have been reserved; instructions that use the reserved registers will be modified and supplemented with code to load the emulated registers' stored values into reserved registers and/or store the results back.

For example:

ADD r1, r4, #17

Would be copied, but

ADD r8, r9, r10, LSL #2

needs to be replaced with something like

LDR r8, [r11, #40]      ; Load arm_emulator_regs[10] into r8
LDR r9, [r11, #36]      ; Load arm_emulator_regs[9] into r9
ADD r8, r9, r8, LSL #2  ; The original instruction with the register fields replaced
STR r8, [r11, #32]      ; Store r8 into arm_emulator_regs[8]
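
The rewrite above can be sketched in C. This is only an illustration under several assumptions, not the emulator's actual code: the function names are made up, only unconditional three-operand data-processing instructions with an immediate shift are handled, r15 is ignored, and the scratch assignment (Rm into r8, Rn into r9, result through r8) is simply the one used in the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names: r11 holds the pointer to arm_emulator_regs,
 * r8 and r9 are used as scratch for the Rm and Rn operands. */
enum { REGS_BASE = 11, TMP_RM = 8, TMP_RN = 9 };

/* r8-r11 are exactly the register numbers of the form 0b10xx. */
static int is_reserved(unsigned reg) { return (reg & 0xCu) == 0x8u; }

/* LDR rt, [r11, #index*4] with condition AL. */
static uint32_t ldr_slot(unsigned rt, unsigned index) {
    return 0xE5900000u | ((uint32_t)REGS_BASE << 16) | ((uint32_t)rt << 12) | (index * 4u);
}

/* STR rt, [r11, #index*4] with condition AL. */
static uint32_t str_slot(unsigned rt, unsigned index) {
    return 0xE5800000u | ((uint32_t)REGS_BASE << 16) | ((uint32_t)rt << 12) | (index * 4u);
}

/* Translate one instruction of the form OP Rd, Rn, Rm, <shift> #imm.
 * Writes up to four words to out and returns how many were written. */
size_t translate_dp_imm_shift(uint32_t insn, uint32_t out[4]) {
    unsigned rn = (insn >> 16) & 0xFu;
    unsigned rd = (insn >> 12) & 0xFu;
    unsigned rm = insn & 0xFu;
    uint32_t patched = insn;
    size_t n = 0;

    if (is_reserved(rm)) {              /* load the emulated Rm into r8 */
        out[n++] = ldr_slot(TMP_RM, rm);
        patched = (patched & ~0xFu) | TMP_RM;
    }
    if (is_reserved(rn)) {              /* load the emulated Rn into r9 */
        out[n++] = ldr_slot(TMP_RN, rn);
        patched = (patched & ~(0xFu << 16)) | ((uint32_t)TMP_RN << 16);
    }
    if (is_reserved(rd)) {              /* compute the result in r8 */
        patched = (patched & ~(0xFu << 12)) | ((uint32_t)TMP_RM << 12);
    }
    out[n++] = patched;                 /* the original, with fields replaced */
    if (is_reserved(rd)) {              /* write the result back to memory */
        out[n++] = str_slot(TMP_RM, rd);
    }
    return n;
}
```

Feeding it 0xE089810A (ADD r8, r9, r10, LSL #2) should produce the four-instruction sequence shown above, while an instruction that touches no reserved register comes back as a single unmodified word.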

Use of r15, the program counter, will also be recognised by the emulator and code generated to emulate either the 26-bit or 32-bit PC, depending on the current mode.
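
As a rough sketch of what emulating the two PC flavours involves: in 26-bit mode r15 carries the flags (bits 28-31), the interrupt disable bits (26-27) and the mode (bits 0-1) alongside the program counter (bits 2-25), whereas in 32-bit mode it is just the address. The function and parameter names here are hypothetical, the PSR is assumed to be kept already laid out in its 26-bit positions, and the special cases (instructions that see PC+12, or only the PC bits) are ignored.

```c
#include <stdint.h>

/* Bits of a 26-bit-mode r15 that hold the PC and the PSR respectively. */
#define R15_PC_MASK  0x03FFFFFCu
#define R15_PSR_MASK 0xFC000003u

/* Value an emulated instruction at insn_addr sees when it reads r15.
 * psr26 is a hypothetical emulator PSR stored in 26-bit r15 layout. */
uint32_t read_r15(uint32_t insn_addr, uint32_t psr26, int is_26bit) {
    uint32_t pc = insn_addr + 8u;       /* ARM reads the PC two instructions ahead */
    if (!is_26bit) {
        return pc;                      /* 32-bit mode: r15 is only the address */
    }
    return (pc & R15_PC_MASK) | (psr26 & R15_PSR_MASK);
}
```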

The reserved registers chosen are r8-r11, since:
  1. They appear to be the least used registers in the RISC OS 4.02 ROM
  2. They are in a block of four, aligned on a four-register boundary, which allows instructions using them to be identified by checking the top two bits of the register number (0b10xx)
  3. Four is the smallest number of registers that this can work with, since instructions like ADD r11, r10, r9, LSL r8 need three registers to hold the "input" values (we can re-use one of them as the destination register) and one more register (r11) to store a pointer to the emulator's registers in memory (plus the stored caller's registers and the emulator's PSR).
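
The check described in point 2 might look something like this in C. It is a sketch only: the function names are invented, and it looks just at the register fields of data-processing instructions, so loads, stores, multiplies and the rest would need their own field lists.

```c
#include <stdint.h>

/* r8-r11 are the only register numbers whose top two bits are 0b10. */
static int is_reserved_reg(unsigned reg) { return (reg & 0xCu) == 0x8u; }

/* Does a data-processing instruction name any of r8-r11? */
int dp_uses_reserved(uint32_t insn) {
    if (is_reserved_reg((insn >> 16) & 0xFu)) return 1;     /* Rn */
    if (is_reserved_reg((insn >> 12) & 0xFu)) return 1;     /* Rd */
    if ((insn & (1u << 25)) == 0) {                         /* register operand 2 */
        if (is_reserved_reg(insn & 0xFu)) return 1;         /* Rm */
        if ((insn & (1u << 4)) != 0 &&                      /* shift by register */
            is_reserved_reg((insn >> 8) & 0xFu)) return 1;  /* Rs */
    }
    return 0;
}
```

The test on bit 25 matters: with an immediate operand the low bits are a constant, so ADD r1, r4, #8 must not be mistaken for a use of r8.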

Condition flags are also a breeze compared with emulating them in x86 machine code, and conditional execution of instructions pretty much takes care of itself, except when the instruction sets the flags and the emulator still has to emit code to store the value of a reserved register.

For example:
ADDEQS r8, #1

Can't simply do this:
LDREQ  r8, [r11, #32]
ADDEQS r8, #1
STREQ  r8, [r11, #32]

because the final instruction will execute on the conditions set by the preceding instruction.

So, in cases where the instruction condition is not AL, and a reserved register may be modified by the instruction, the generated code will have to be more like this:

ADDNE  pc, pc, #offset
LDR    r8, [r11, #32]
ADDS   r8, #1
STR    r8, [r11, #32]
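
Generating the hop could be sketched as below. The names are hypothetical and it assumes the original condition is one of the fourteen real ones (EQ to LE), which pair up so that flipping the lowest bit of the condition field gives the opposite test. Because the PC reads as the hop's address plus 8, skipping n instructions needs an offset of 4*(n-1); the sketch limits itself to offsets that fit an unrotated 8-bit immediate.

```c
#include <stdint.h>

/* EQ/NE, CS/CC, MI/PL, ... differ only in bit 0 of the condition field. */
static unsigned inverse_cond(unsigned cond) { return cond ^ 1u; }

/* Encode ADD<inverse cond> pc, pc, #offset so that the next
 * `skipped` instructions (1..64) are jumped over when cond fails. */
uint32_t encode_hop(unsigned cond, unsigned skipped) {
    uint32_t offset = (skipped - 1u) * 4u;  /* pc already reads as hop + 8 */
    return ((uint32_t)inverse_cond(cond) << 28) | 0x028FF000u | offset;
}

/* Replace an instruction's condition field with AL (always). */
uint32_t make_unconditional(uint32_t insn) {
    return (insn & 0x0FFFFFFFu) | 0xE0000000u;
}
```

For the example above, encode_hop(0, 3) gives ADDNE pc, pc, #8, which lands just past the STR.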

I'll have to experiment to see if it's more efficient to implement all conditional instructions with an opposite-condition hop over unconditional instructions, or to copy the emulated instruction's condition to each of the generated instructions. I suspect that the hop approach will be easier to implement and also faster at generating code, so that's what I'll try initially.

Two details that have cropped up in the implementation so far are that:
  1. The generated code needs to be written to a writable, executable area of memory, so not a global variable but an anonymous mmap'd area (or possibly a global variable declared specially, perhaps in an assembler file?)
  2. The ARM instruction cache needs clearing after generating code.  On Linux, GCC provides a call for this: __clear_cache( void *begin, void *end ) - begin is inclusive, end is exclusive.
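
Put together, those two details might look like the following sketch. The function name is made up, error handling is minimal, and it uses the compiler builtin spelling __builtin___clear_cache (available in GCC and Clang) rather than calling __clear_cache directly; the region is mapped readable, writable and executable in one go, which some hardened systems may refuse.

```c
#define _DEFAULT_SOURCE             /* for MAP_ANONYMOUS under strict -std modes */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Copy `count` generated instruction words into a fresh anonymous
 * mapping and make them safe to execute.  Returns NULL on failure. */
uint32_t *install_code(const uint32_t *words, size_t count) {
    size_t bytes = count * sizeof *words;
    void *mem = mmap(NULL, bytes, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) {
        return NULL;
    }
    memcpy(mem, words, bytes);
    /* begin is inclusive, end is exclusive */
    __builtin___clear_cache((char *)mem, (char *)mem + bytes);
    return mem;
}
```

On an ARM host the returned pointer could then be cast to a function pointer and called; on other hosts the cache call is typically a no-op and the words are just data.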