ARM Emulation on ARM
The emulator interface will be the same as that of the emulator that generates x86 code, but it takes a different approach to code generation because the emulated and host processors are so similar.
Basically, code generated by the emulator is isolated from the Linux code by a code fragment that saves the Linux register values and replaces them with the emulator's register values on entry, and does the opposite on exit. Most instructions can then simply be copied verbatim from the emulated code to the generated code. Since the generated code needs some workspace, four registers have been reserved. Instructions that use the reserved registers will be modified and supplemented with code to load the stored values of the emulated registers into reserved registers and/or store the results back.
For example:
ADD r1, r4, #17
Would be copied, but
ADD r8, r9, r10, LSL #2
needs to be replaced with something like
LDR r8, [r11, #40] ; Load arm_emulator_regs[10] into r8
LDR r9, [r11, #36] ; Load arm_emulator_regs[9] into r9
ADD r8, r9, r8, LSL #2 ; The original instruction with the register fields replaced
STR r8, [r11, #32] ; Store r8 into arm_emulator_regs[8]
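The replacement sequence above can be built from a few encoding helpers. The following is only a sketch: the emulator_context layout, the helper names and translate_example are hypothetical, chosen so that the offsets match the #40, #36 and #32 used in the example. It assumes each emulated register is a 32-bit word at r11 + 4 * n.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of the block r11 points to (an assumption, not the
   emulator's real structure). Only the position of arm_emulator_regs
   matters for the offsets used below. */
typedef struct {
    uint32_t arm_emulator_regs[16]; /* emulated r0-r15, offsets 0..60 */
    uint32_t caller_regs[16];       /* saved Linux-side registers (assumed) */
    uint32_t psr;                   /* emulated PSR (assumed) */
} emulator_context;

enum { CONTEXT_REG = 11 };          /* r11 holds the context pointer */

/* Byte offset of emulated register n from the context pointer. */
uint32_t reg_offset(unsigned n)
{
    return (uint32_t)(offsetof(emulator_context, arm_emulator_regs) + 4u * n);
}

/* Encode LDR rt, [r11, #offset of emulated register n]. */
uint32_t encode_load_reg(unsigned rt, unsigned n)
{
    return 0xE5900000u | ((uint32_t)CONTEXT_REG << 16)
         | ((uint32_t)rt << 12) | reg_offset(n);
}

/* Encode STR rt, [r11, #offset of emulated register n]. */
uint32_t encode_store_reg(unsigned rt, unsigned n)
{
    return 0xE5800000u | ((uint32_t)CONTEXT_REG << 16)
         | ((uint32_t)rt << 12) | reg_offset(n);
}

/* Replace the 4-bit register field starting at bit `shift`. */
uint32_t set_reg_field(uint32_t instr, unsigned shift, unsigned reg)
{
    return (instr & ~(0xFu << shift)) | ((uint32_t)reg << shift);
}

/* Build the four-instruction replacement for ADD r8, r9, r10, LSL #2.
   The register assignment is hard-wired to match the worked example;
   a real translator would choose it per instruction. */
size_t translate_example(uint32_t instr, uint32_t out[4])
{
    size_t n = 0;
    out[n++] = encode_load_reg(8, 10);     /* LDR r8, [r11, #40] */
    out[n++] = encode_load_reg(9, 9);      /* LDR r9, [r11, #36] */
    out[n++] = set_reg_field(instr, 0, 8); /* Rm field: r10 becomes r8 */
    out[n++] = encode_store_reg(8, 8);     /* STR r8, [r11, #32] */
    return n;
}
```

Passing the encoding of ADD r8, r9, r10, LSL #2 (0xE089810A) to translate_example yields the four words of the sequence shown above.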
Use of r15, the program counter, will also be recognised by the emulator and code generated to emulate either the 26-bit or 32-bit PC, depending on the current mode.
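As a rough illustration of the difference between the two PC styles, the value an instruction sees when it reads r15 could be computed as below. This is a sketch under assumptions: the function names and masks are hypothetical, the usual 8-byte pipeline offset is assumed, and the 26-bit case ignores the fact that the PSR bits are only visible in some operand positions.

```c
#include <stdint.h>

/* In 26-bit modes r15 holds the flags (bits 31-26), the word-aligned
   program counter (bits 25-2) and the processor mode (bits 1-0). */
#define PSR26_MASK 0xFC000003u
#define PC26_MASK  0x03FFFFFCu

/* 32-bit PC: r15 reads as the instruction's address plus 8. */
uint32_t r15_read_32bit(uint32_t instr_address)
{
    return instr_address + 8u;
}

/* 26-bit PC: the same address, combined with the PSR bits held in r15. */
uint32_t r15_read_26bit(uint32_t instr_address, uint32_t psr)
{
    return (psr & PSR26_MASK) | ((instr_address + 8u) & PC26_MASK);
}
```

Since the address of the emulated instruction is known when the code is generated, one plausible approach is to work out the PC part at generation time and only fetch the PSR bits at run time.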
The reserved registers chosen are r8-r11, since:
- They appear to be the least used registers in the RISC OS 4.02 ROM
- They are in a block of four, aligned on a four register boundary, which allows instructions using them to be identified by checking the top two bits of the register number (0b10xx)
- Four is the smallest number of registers that this can work with, considering instructions like ADD r11, r10, r9, LSL r8 need three registers to hold the "input" values (we can re-use one of them as the destination register) and one more register (r11) to store a pointer to the emulator's registers in memory (plus the stored caller's registers and the emulator's PSR).
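The 0b10xx check in the second point can be sketched in a few lines. The function names here are hypothetical, and data_processing_uses_reserved assumes the caller has already established that the word is a data-processing instruction (so not a multiply, load/store or branch).

```c
#include <stdbool.h>
#include <stdint.h>

/* r8-r11 are exactly the register numbers of the form 0b10xx. */
bool is_reserved_register(unsigned reg)
{
    return (reg & 0xCu) == 0x8u;
}

/* Does a data-processing instruction mention a reserved register? */
bool data_processing_uses_reserved(uint32_t instr)
{
    unsigned rn = (instr >> 16) & 0xFu;          /* first operand */
    unsigned rd = (instr >> 12) & 0xFu;          /* destination */
    bool immediate_operand = ((instr >> 25) & 1u) != 0;

    if (is_reserved_register(rn) || is_reserved_register(rd))
        return true;
    if (immediate_operand)
        return false;                 /* bits 11-0 hold a rotated immediate */
    if (is_reserved_register(instr & 0xFu))      /* Rm */
        return true;

    /* Bit 4 set means the shift amount comes from register Rs. */
    bool shift_by_register = ((instr >> 4) & 1u) != 0;
    return shift_by_register && is_reserved_register((instr >> 8) & 0xFu);
}
```

With this check, ADD r1, r4, #17 (0xE2841011) would be copied unchanged, while ADD r11, r10, r9, LSL r8 (0xE08AB819) would be flagged for rewriting.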
Conditional instructions that also set the flags need more care. For example:
ADDEQS r8, #1
can't simply be translated to this:
LDREQ r8, [r11, #32]
ADDEQS r8, #1
STREQ r8, [r11, #32]
because the final instruction will be executed according to the flags set by the preceding ADDEQS instruction, not the flags that the original condition was tested against.
So, in cases where the instruction condition is not AL, and a reserved register may be modified by the instruction, the generated code will have to be more like this:
ADDNE pc, pc, #offset
LDR r8, [r11, #32]
ADDS r8, #1
STR r8, [r11, #32]
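The hop and the unconditional copy of the instruction could be encoded roughly as follows. This is a sketch with hypothetical helper names. It relies on the ARM condition codes pairing up with their opposites in the lowest bit (EQ/NE, CS/CC and so on), and on pc reading as the hop's own address plus 8, which makes the offset 4 * count - 4 when skipping count instructions.

```c
#include <stdint.h>

enum { COND_AL = 0xE };   /* "always" condition code */

/* Opposite of a condition code. Not meaningful for AL (0xE) or NV (0xF). */
unsigned invert_condition(unsigned cond)
{
    return cond ^ 1u;
}

/* Encode ADD<cond> pc, pc, #offset so that the next `count` instructions
   are skipped when cond holds. count must be between 1 and 64 so that the
   offset fits in an unrotated 8-bit immediate. */
uint32_t encode_hop(unsigned cond, unsigned count)
{
    uint32_t offset = 4u * count - 4u;
    return ((uint32_t)cond << 28) | 0x028FF000u | offset;
}

/* Copy of an instruction with its condition field forced to AL. */
uint32_t make_unconditional(uint32_t instr)
{
    return (instr & 0x0FFFFFFFu) | ((uint32_t)COND_AL << 28);
}
```

For the example above, encode_hop(invert_condition(0), 3) gives ADDNE pc, pc, #8, which skips the three instructions that follow it, so the #offset placeholder would be 8 in that case.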
I'll have to experiment to see whether it's more efficient to implement all conditional instructions with an opposite-condition hop over unconditional instructions, or to copy the emulated instruction's condition to each of the generated instructions. I suspect that the hop approach will be easier to implement and also faster at generating code, so that's what I'll try initially.
Two details that have cropped up in the implementation so far are that:
- The generated code needs to be written to a writable, executable area of memory, so not a global variable but an anonymous mmap'd area (or possibly a global variable declared specially, perhaps in an assembler file?)
- The ARM instruction cache needs clearing after generating code. GCC provides a call for this: __clear_cache( void *begin, void *end ), also available as the built-in __builtin___clear_cache - begin is inclusive, end is exclusive.
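Both details can be sketched together. The function names below are hypothetical, and the sketch assumes the system allows a mapping that is writable and executable at the same time. It uses the GCC built-in __builtin___clear_cache, with char * casts so that it also compiles with Clang.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS under strict -std= modes */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Allocate an anonymous, writable and executable area for generated code.
   Returns NULL on failure. */
uint32_t *code_buffer_alloc(size_t bytes)
{
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (uint32_t *)p;
}

/* Copy generated instructions into the buffer, then clear the instruction
   cache for exactly that range (begin inclusive, end exclusive). */
void code_buffer_write(uint32_t *dest, const uint32_t *words, size_t count)
{
    memcpy(dest, words, count * sizeof *words);
    __builtin___clear_cache((char *)dest, (char *)(dest + count));
}

/* Release the area again. Returns 0 on success, as munmap does. */
int code_buffer_free(uint32_t *buffer, size_t bytes)
{
    return munmap(buffer, bytes);
}
```

Clearing only the range that was just written keeps the cost proportional to the amount of code generated rather than to the size of the whole area.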