ARM Emulation on ARM
The emulator interface will be the same as that of the emulator that generates x86 code, but it takes a different approach to code generation because the emulated and host processors are so similar.
Basically, code generated by the emulator is isolated from the Linux code by a code fragment that saves the Linux register values and replaces them with the emulator's register values on entry, and does the opposite on exit. Most instructions can then simply be copied verbatim from the emulated code to the generated code. Since the generated code needs some workspace, four registers have been reserved. Instructions that use the reserved registers will be modified and supplemented with code to load the emulated registers' stored values into reserved registers and/or store the results back.
For example:
ADD r1, r4, #17
Would be copied, but
ADD r8, r9, r10, LSL #2
needs to be replaced with something like
LDR r8, [r11, #40] ; Load arm_emulator_regs[10] into r8
LDR r9, [r11, #36] ; Load arm_emulator_regs[9] into r9
ADD r8, r9, r8, LSL #2 ; The original instruction with the register fields replaced
STR r8, [r11, #32] ; Store r8 into arm_emulator_regs[8]
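As a rough sketch of how the load and store instructions above could be produced, the functions below build the two encodings. The function names are made up for illustration, and the layout (r11 pointing at an array of 32-bit emulated registers, so register n sits at byte offset n * 4) is an assumption taken from the offsets in the example, not necessarily what the emulator itself does.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: r11 holds the address of the emulated register array. */
enum { REGS_BASE = 11 };

/* Encode LDR rd, [r11, #n*4] with condition AL.
   0xE5900000 is LDR with an immediate offset, pre-indexed, added to the base. */
uint32_t encode_load_emulated(unsigned rd, unsigned n)
{
    assert(rd < 16 && n < 16);
    return 0xE5900000u | ((uint32_t)REGS_BASE << 16) | ((uint32_t)rd << 12) | (n * 4u);
}

/* Encode STR rd, [r11, #n*4] with condition AL (same as LDR with the L bit clear). */
uint32_t encode_store_emulated(unsigned rd, unsigned n)
{
    assert(rd < 16 && n < 16);
    return 0xE5800000u | ((uint32_t)REGS_BASE << 16) | ((uint32_t)rd << 12) | (n * 4u);
}
```

With these, encode_load_emulated(8, 10) gives the word for LDR r8, [r11, #40] and encode_store_emulated(8, 8) the word for STR r8, [r11, #32].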
Use of r15, the program counter, will also be recognised by the emulator and code generated to emulate either the 26-bit or 32-bit PC, depending on the current mode.
The reserved registers chosen are r8-r11, since:
- They appear to be the least used registers in the RISC OS 4.02 ROM
- They are in a block of four, aligned on a four register boundary, which allows instructions using them to be identified by checking the top two bits of the register number (0b10xx)
- Four is the smallest number of registers that this can work with: instructions like ADD r11, r10, r9, LSL r8 need three registers to hold the "input" values (one of them can be re-used as the destination register), and one more register (r11) holds a pointer to the emulator's registers in memory (plus the stored caller's registers and the emulator's PSR).
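The top-two-bits check mentioned in the list can be sketched as follows. This is only an illustration for data-processing instructions; the function names are hypothetical, and a real scanner would need similar checks for loads, stores, multiplies and so on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* r8-r11 are exactly the register numbers of the form 0b10xx. */
bool is_reserved_register(unsigned reg)
{
    assert(reg < 16);
    return (reg & 0xCu) == 0x8u;
}

/* Assumption: `instruction` has already been identified as a
   data-processing instruction (ADD, SUB, MOV, CMP, ...). */
bool data_processing_uses_reserved(uint32_t instruction)
{
    unsigned rn = (instruction >> 16) & 0xF;   /* first operand */
    unsigned rd = (instruction >> 12) & 0xF;   /* destination */
    if (is_reserved_register(rn) || is_reserved_register(rd))
        return true;

    /* Bit 25 set: the second operand is an immediate, no more registers. */
    if ((instruction >> 25) & 1)
        return false;

    unsigned rm = instruction & 0xF;           /* second operand register */
    if (is_reserved_register(rm))
        return true;

    /* Bit 4 set: the shift amount comes from register Rs (bits 8-11). */
    bool shift_by_register = (instruction >> 4) & 1;
    unsigned rs = (instruction >> 8) & 0xF;
    return shift_by_register && is_reserved_register(rs);
}
```

Note that bits 8-11 are only treated as a register number when bit 4 says the shift amount is in a register; otherwise they are part of an immediate shift amount and must not be mistaken for r8-r11.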
Conditional instructions need extra care. For example:
ADDEQS r8, #1
can't simply be turned into this:
LDREQ r8, [r11, #32]
ADDEQS r8, #1
STREQ r8, [r11, #32]
because the final instruction will execute on the conditions set by the preceding instruction.
So, in cases where the instruction condition is not AL, and a reserved register may be modified by the instruction, the generated code will have to be more like this:
ADDNE pc, pc, #offset
LDR r8, [r11, #32]
ADDS r8, #1
STR r8, [r11, #32]
I'll have to experiment to see whether it's more efficient to implement all conditional instructions with an opposite-condition hop over unconditional instructions, or to copy the emulated instruction's condition to each of the generated instructions. I suspect that the hop approach will be easier to implement and also faster at generating code, so that's what I'll try initially.
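A minimal sketch of generating the hop instruction, assuming the hop is always an ADD to pc with a small immediate. The names are illustrative only. It relies on two ARM facts: condition codes come in complementary pairs that differ only in their lowest bit (EQ/NE, CS/CC, ...), and reading pc yields the address of the current instruction plus 8, so skipping N following instructions needs an offset of 4N - 4.

```c
#include <assert.h>
#include <stdint.h>

enum { COND_EQ = 0x0, COND_NE = 0x1, COND_AL = 0xE };

/* Flip the lowest bit to get the complementary condition.
   AL (and the unused 0xF) have no useful opposite, so they are rejected. */
unsigned opposite_condition(unsigned cond)
{
    assert(cond < COND_AL);
    return cond ^ 1u;
}

/* Encode ADD<opposite of cond> pc, pc, #offset, which skips the next
   `count` instructions when the original condition is NOT met. */
uint32_t encode_hop(unsigned cond, unsigned count)
{
    assert(count >= 1 && count <= 64);  /* keeps the offset within 8 bits */
    uint32_t offset = 4u * count - 4u;  /* pc already reads as this address + 8 */
    return ((uint32_t)opposite_condition(cond) << 28) | 0x028FF000u | offset;
}
```

For the ADDEQS example, encode_hop(COND_EQ, 3) produces ADDNE pc, pc, #8, which lands on the instruction after the STR.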
Two details that have cropped up in the implementation so far are that:
- The generated code needs to be written to a writable, executable area of memory, so not a global variable but an anonymous mmap'd area (or possibly a global variable declared specially, perhaps in an assembler file?)
- The ARM instruction cache needs clearing after generating code. GCC provides a call for this on Linux: __clear_cache( void *begin, void *end ) - begin is inclusive, end is exclusive.
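Both details can be sketched together as below. This is an outline under assumptions rather than the emulator's actual code: the function names are invented, the area is mapped readable, writable and executable in one go (some hardened systems refuse that), and the cache is cleared through GCC's __builtin___clear_cache.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Map an anonymous area that generated code can be written to and run from.
   Returns NULL if the mapping fails. */
uint32_t *allocate_code_area(size_t bytes)
{
    void *area = mmap(NULL, bytes, PROT_READ | PROT_WRITE | PROT_EXEC,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return area == MAP_FAILED ? NULL : (uint32_t *)area;
}

/* Copy `words` instructions into the area, then clear the instruction
   cache for exactly that range (begin inclusive, end exclusive). */
void install_code(uint32_t *area, const uint32_t *code, size_t words)
{
    assert(area != NULL && code != NULL);
    memcpy(area, code, words * sizeof *code);
    __builtin___clear_cache((char *)area, (char *)(area + words));
}
```

On an ARM host the installed words could then be called through a function pointer; on any other host they are just data, so the sketch stops short of executing them.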
3 Comments:
thank you for this
what's the new information about ARM emulator
Good question! I hadn't touched it since last November, and it seems I forgot to upload it.
I'll probably have another play, and write a new post on the subject, but as a quick answer, you can download today's versions of the two files from here and here (the latter file should be stored in a directory rolf/compatibility).
(You will also need a config.h file, but that can be empty, or used to set flags.)
I used the following command line to compile the standalone version (on an ARM platform):
gcc arm_arm_emulator.c -o standalone -std=gnu99 -Wall -pedantic-errors -DSTANDALONE
There are various preprocessor flags that can be set to modify the emulator.
The standalone program accepts the name of a binary file containing ARM code, loads it into memory, scans from the first instruction and runs until the first instruction it can't cope with.
Unaligned memory accesses are dealt with however the host code deals with them (i.e. I didn't add code to emulate the read-word-aligned-and-rotate behaviour of earlier ARM processors. Instead I looked at the Linux kernel, to see if I could make it process-dependent behaviour, got frustrated, and went off at a tangent, writing a microkernel for ARM.)
You need ARCH_ARM defined, either in config.h or the gcc command line, too.