Sunday, September 08, 2013

LiveSD card nearly ready

I've been working on getting a bootable SD card working for the last couple of weeks, and it is getting close to release.

It is based on Linux From Scratch, specifically the automated version jhalfs-2.3.2, and the latest LFS book on SVN.  LFS is a book describing how to generate a minimal Linux distribution from source code and goes through several stages to avoid dependencies on the build operating system (in my case Knoppix 7.2).  JHALFS automates that process by reading the book for you and generating scripts and makefiles to perform all of the work described in the book.  It's a really good idea to go through the book manually a couple of times first, if you want to learn what it's all about, but eventually you'll want to automate it as I did in the construct-live-cd part of the ROLF repository on SourceForge.  JHALFS, however, moves with the times (my scripts generate linux-2.6.30.7, but LFS is on linux-3.10.10).

Moving with the times is not without its own problems, however.

Somewhere between binutils-2.22/linux-2.6.30.7 and binutils-2.23.2/linux-3.10.10, there seems to have come a change which meant that all my ROLF programs would fail on startup.  More strangely still, ldd and gdb would report the problem as follows:

ldd: exited with unknown exit code (139)
and
(gdb) start
Temporary breakpoint 1 at 0x8000383
Starting program: /tmp/x/x1

Program received signal SIGSEGV, Segmentation fault.
0xb7fdfa24 in dl_main () from /lib/ld-linux.so.2
I had the same problem many years ago, but this time, I'm writing it down!

The culprit is a link option I have to use to avoid the loader putting native code or data where RISC OS programs expect to have ROM or RAM mapped:
-Wl,--section-start,.interp=0x10000100
I have never come across an option that tells the linker to avoid a certain area of memory for the program to use, so this is the best I could come up with.  ".interp" seemed to be the first section located in memory, and all other sections would follow on, so, initially, I tried
-Wl,--section-start,.interp=0x10000000
...with the result that I got similar problems as above.  Adding 256 bytes to the address, however, made it go away, and for a long time I was happy....

It turns out that adding another 256 bytes makes the problem go away again.  If anyone can tell me why this happens, I'd be very interested to know!
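A quick way to see whether a given link address really keeps native mappings out of a reserved region is to scan /proc/self/maps at start-up. The following is a minimal sketch only: the function names are made up for illustration, and the range to check (for instance, everything below the 0x10000000 used in the link option) is an assumption about what needs to stay free.

```c
#include <stdio.h>

/* Returns 1 if the mapping described by one line of /proc/<pid>/maps
 * ("start-end perms ...", addresses in hex) overlaps [lo, hi),
 * 0 if it does not, and -1 if the line cannot be parsed. */
int mapping_overlaps(const char *maps_line,
                     unsigned long long lo, unsigned long long hi)
{
    unsigned long long start, end;

    if (sscanf(maps_line, "%llx-%llx", &start, &end) != 2)
        return -1;
    return start < hi && end > lo;
}

/* Counts the mappings of the current process that overlap [lo, hi);
 * returns -1 if /proc/self/maps cannot be opened. */
int count_overlapping_mappings(unsigned long long lo, unsigned long long hi)
{
    char line[4352];   /* room for the fixed fields plus a long path */
    int count = 0;
    FILE *maps = fopen("/proc/self/maps", "r");

    if (maps == NULL)
        return -1;
    while (fgets(line, sizeof line, maps) != NULL)
        if (mapping_overlaps(line, lo, hi) == 1)
            count++;
    fclose(maps);
    return count;
}
```

Calling count_overlapping_mappings(0x8000, 0x10000000) early in main() and printing the result would show whether the loader has placed anything in that range. It does not explain the 256-byte effect, but it makes the outcome of each link address visible.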


Wednesday, October 31, 2012

bind: Address family not supported by protocol

Trying to get ROLF running on a Raspberry Pi, I installed Debian Wheezy, but then found that my vnc server version of ROLF doesn't start up.

I've boiled the problem down to a minimal program that works fine on 2.6.32.6, but not on "Linux raspberrypi 3.2.27+ #250 PREEMPT Thu Oct 18 19:03:02 BST 2012 armv6l GNU/Linux" or "Linux Microknoppix 3.4.9 #34 SMP PREEMPT Fri Aug 17 06:30:04 CEST 2012 i686 GNU/Linux", so presumably there's a change in a major version.  All I have to do is find out what it is...

Update: It seems someone turned on the "check for idiot programmers" flag in the kernel; fix: addr.sin_family = AF_INET;

#include <stdio.h>        /* perror */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

int main()
{
    int vnc_server;
    int port = 2008;

    struct sockaddr_in addr = { 0 };
    addr.sin_family = htons( AF_INET );  /* the bug: sin_family must not be byte-swapped */
    addr.sin_port = htons( port );
    
    if (-1 == (vnc_server = socket( AF_INET, SOCK_STREAM, 0 )))
        perror( "socket" );

    if (0 != bind( vnc_server, (struct sockaddr*) &addr, sizeof( addr ) ))
        perror( "bind" );

    return 0;
}
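For reference, here is a sketch of the address set-up with the fix from the update applied. The sin_family field is kept in host byte order, while sin_port and sin_addr are in network byte order. On a little-endian machine htons( AF_INET ) turns 2 into 0x0200 (512), which is not a known address family, hence the "Address family not supported by protocol" error from bind. The helper name make_inet_addr is made up for this example.

```c
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Builds an IPv4 wildcard address for the given TCP port.
 * make_inet_addr is a hypothetical helper, not part of ROLF. */
struct sockaddr_in make_inet_addr(unsigned short port)
{
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;                 /* host byte order: no htons() */
    addr.sin_port = htons(port);               /* network byte order */
    addr.sin_addr.s_addr = htonl(INADDR_ANY);  /* network byte order */
    return addr;
}
```

The result can be passed to bind() exactly as in the test program, cast to (struct sockaddr*) with sizeof( addr ) as the length.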

Tuesday, July 17, 2012

Better timings

Since my last post, I've had a look at the ARM-on-ARM emulator with a view to improving the speed.

Using gprof, I found that:
  1. The image translation from RISC OS Sprite to screen mode was unoptimised (i.e. making two unoptimised function calls per pixel).  This, I improved, but it only made a small overall difference since it occurs after the rendering work is done in the emulator.
  2. gprof is useless for following what's going on in the emulator generated code.
Building the emulator to dump the state after every emulated instruction, then with a bit of fiddling with grep, sed, sort and uniq, meant that I could find the instructions most used by the renderer (about 10000 times each while rendering the ACORN file).  I noticed that one branch condition (actually, the case where a conditional branch instruction is not taken) always did a lot of work (hash table lookup, etc.) but that the existing branch fixup code could be used to improve it.

With that six or seven line code change, the situation is now that the BeagleBoard renders the celtic_knot3 file in a little over 37s, down from 85s.
x86 PC: 14s, BeagleBoard: 37s
My RISC PC is not cooperating at the moment, but the same file takes about 55s to render on its 200MHz StrongARM.

The branch fixup code currently avoids having to clear the code cache by calling the fixup code using LDR pc, [pc, #...], having stored the code address in a scratch register.  That way, only the word loaded by that instruction has to be changed to point to the generated code instead of the fixup routine, and the next time the instruction is reached, the fixup routine will be bypassed (although the setup for the call remains).
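The patch-the-literal idea can be modelled in portable C with a function pointer standing in for the word loaded by LDR pc, [pc, #...]. This is only an analogy under stated assumptions: all names are invented for the sketch, and the real emulator works on generated ARM code rather than C functions.

```c
/* One branch site in the generated code.  "target" plays the role of the
 * literal word: it starts out pointing at the fixup routine and is later
 * overwritten with the address of the translated code. */
struct call_site;
typedef int (*code_fn)(struct call_site *site, int arg);

struct call_site {
    code_fn target;        /* the patchable literal word */
    unsigned fixups_run;   /* counts slow-path entries, for illustration */
};

/* Stand-in for a block of generated code. */
int translated_block(struct call_site *site, int arg)
{
    (void)site;
    return arg + 1;
}

/* Slow path: locate the translated block (hard-coded here), patch the
 * literal so that later executions bypass this routine, then continue. */
int branch_fixup(struct call_site *site, int arg)
{
    site->fixups_run++;
    site->target = translated_block;
    return site->target(site, arg);
}

/* Every execution of the branch goes through the literal word. */
int take_branch(struct call_site *site, int arg)
{
    return site->target(site, arg);
}
```

Only data is modified, never an instruction, which is why no code cache flush is needed. The cost is that the indirect load stays in place on every later execution.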

Further possible improvements to be tried:
  1. Modify the first instruction of the fixed up code to be a proper branch to the address, as well as the current change; if the code falls out of the code cache by itself, the next time the code is run it will be quicker.
  2. Clear the ARM code cache explicitly, so that the faster code will be called straight away.  This may be slower, due to the overhead of a system call the first time the branch occurs.
The ARM-on-ARM emulator code is available on the SourceForge ROLF project site, at: http://ro-lf.svn.sourceforge.net/viewvc/ro-lf/ROLF/rolf/Libs/Compatibility/arm_arm_emulator.c?view=log

If you compile with -DSTANDALONE, it will create an executable that takes the name of a file that should contain ARM instructions, and run them (only really useful with gdb, to see what's going on).

The handling of unaligned memory accesses is still incorrect (except on my custom kernel, which fixes up the accesses in the old fashioned way).

Update: http://ro-lf.svn.sourceforge.net/viewvc/ro-lf/ROLF/rolf/Libs/Compatibility?view=tar downloads the whole ROLF compatibility library, including the include files and disassembly code.  (I can't check this at the moment, but...) The following should generate an executable on an ARM system:

tar xf ro-lf-Compatibility.tar.gz
cd Libs/Compatibility/  # Possibly other subdirectory
touch config.h  # Usually generated by the ROLF configure routine
gcc -o standalone_emulator arm_arm_emulator.c arm_d*.c -DARCH_ARM -DSTANDALONE -DDISASSEMBLE -Iincludes -I.

Thursday, July 05, 2012

AWRender on BeagleBoard, with timings

The AWRender module is now working under ROLF on the BeagleBoard.

It's a long story, but it looks like the problem wasn't with unaligned accesses after all (the newest module version doesn't do any), as I'd thought.

The symptoms were twofold: Viewer crashed as soon as it tried to open an ArtWorks image with a problem in the dynamic linker, and the !AWRender BASIC program displayed just a small part of the image (and showed unaligned accesses occurring).

It wasn't until I built from svn sources on the x86 platform again that I noticed that the BASIC program generated exactly the same (wrong) output.  The implication was that the problem was (a) independent of the emulator and (b) previously fixed, but lost (the program had previously been working, as can be seen from screenshots on this blog).  This started me looking at the Viewer problem, and I eventually noticed that ARMLinux locates executables at 0x8000 (same as RISC OS), rather than up above the 128MB boundary.  A build explicitly locating the executable in the same area resulted in properly displayed images (although, strangely, the celtic knot 3 image appears lighter on the BB than on the x86 PC, when both viewed via tightvnc on the same monitor).

This is what Linux's 'uname -a' and /proc/cpuinfo tell me about both systems:

BeagleBoard:

Linux (none) 2.6.39.1 #10 SMP Wed Jun 13 21:00:36 CEST 2012 armv7l GNU/Linux

Processor : ARMv7 Processor rev 3 (v7l)
processor : 0
BogoMIPS : 490.52

Features : swp half thumb fastmult vfp edsp thumbee neon vfpv3
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x1
CPU part : 0xc08
CPU revision : 3

Hardware : OMAP3 Beagle Board
Revision : 0020
Serial : 0000000000000000


PC:
Linux Microknoppix 2.6.32.6 #8 SMP PREEMPT Thu Jan 28 10:51:16 CET 2010 i686 GNU/Linux

processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 107
model name : AMD Athlon(tm) Dual Core Processor 5050e
stepping : 2
cpu MHz : 2593.613
cache size : 512 KB
bogomips : 5189.36

The same for core 1 (the emulator is single threaded, so that the second core will make minimal difference to rendering speed).

The AMD processor appears to be over ten times faster (BogoMIPS) than the BB's ARM.

The test I'm using is to display celtic_knot3, a file of 355812 bytes. The rendering time is in the tens of seconds, displayed in the titlebar of the Viewer window, so the file access time (in the milliseconds) is of no consequence.

The initial times I'm getting are:

BeagleBoard 85s, PC 15s (to the nearest second).

This is disappointing; the (200MHz) RISC PC manages it in 55s (see here).

Friday, June 01, 2012

Mounting an NFS filesystem requires the loopback interface to be up

Over the last few weeks, I've been using flash drives connected to my BeagleBoard to store files for compilation; the MMC card I use for the root filesystem is too small for compiling MPlayer.  Unfortunately, the flash drive stopped working, and I can't seem to recover a single byte of what was on it.

So, I decided to mount an NFS partition instead, but most of the commands I tried after running rpcbind (the portmapper) hung until they timed out.  What I finally worked out is that the loopback interface has to be up ("ifconfig lo up") for them to work.

If I was trying to mount an NFS partition as my root partition, I would probably take the approach I used for the x86 live CD, which was to have a compressed minimal root filesystem as part of the kernel image (look into CONFIG_BLK_DEV_INITRD, CONFIG_INITRAMFS_SOURCE, etc.), which was able to bring up the appropriate interfaces (reading environment variables set in the uEnv.txt file for the NFS server, etc.).

Tuesday, February 07, 2012

Programming ROLF windows

In changing the Viewer application over from the old (furniture.h) Wimp interface to the new (composite_window.h) version, it occurred to me that I hadn't done any documentation for the way it works.  I'll try to remedy that here.

Background

The ROLF Wimp has always treated a window as a rectangle controlled by the applications with a set of (usually smaller) rectangles, called hotspots, which respond to user input events according to a set of flags (report drags, clicks, activations, etc.).  Windows may be opaque (obscuring any windows lower than them on the stack) or translucent (not completely obscuring them).

Initially, the hotspots were defined to be non-overlapping, but that can add a lot of complexity to the application, so the definition was changed so that the last hotspot in the set (now a list, since the order of definition is important) matching a given coordinate is the one that may report the event.
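The "last match wins" rule amounts to searching the list from the end. A minimal sketch, with a made-up hotspot structure (the real ROLF definitions may well differ):

```c
#include <stddef.h>

/* Hypothetical layout for this sketch only. */
struct hotspot {
    int x0, y0;       /* inclusive lower corner */
    int x1, y1;       /* exclusive upper corner */
    unsigned flags;   /* which events this hotspot reports */
};

/* Returns the index of the last hotspot in the list that contains (x, y),
 * or -1 if none does.  Searching backwards means later definitions take
 * priority over earlier ones that they overlap. */
int find_hotspot(const struct hotspot *list, size_t count, int x, int y)
{
    size_t i = count;

    while (i-- > 0) {
        const struct hotspot *h = &list[i];
        if (x >= h->x0 && x < h->x1 && y >= h->y0 && y < h->y1)
            return (int)i;
    }
    return -1;
}
```

With this rule an application can define a large background hotspot first and then place smaller ones on top of it, without having to cut the background into non-overlapping pieces.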

The Wimp behaviour has been fixed for some years, now.  I need to add one more function, to return the current keyboard modifier flags (shift key states, etc.) for the filer to use, and to fix a bug in translucent window redrawing.

A window may also be marked as "transient", indicating that it should be informed before an active user input event (i.e. a click, not simple pointer movement) is sent to another process.  This enables the implementation of pop-up menus.

Original API (furniture.h)

The original API, defined in furniture.h, wasn't very flexible, and shouldn't be used any more.  It allowed an application to request window sizes according to the required size of the application drawn area of a window, which was good, but failed to provide the flexibility to do more "interesting" things, like sharing the horizontal scroll bar area with a status bar.  Also, the user input events went through a single set of application-provided routines, which had to distinguish between window types and made it difficult to provide standard window types (like a save window).


New Window API (window.h)

The new (and probably final) window API provides an object-oriented interface in C: the application gives the API a pointer to a structure containing function pointers, and the API passes that same pointer as the first parameter whenever it calls those functions.

I spent some considerable time considering various approaches to defining how your application's windows should be laid out by the library, either with North, South, East, West, or a spiral of rectangles out from a central rectangle, etc., until I realised that it would be more work to define any but the most simple windows in that sort of way than to simply write a couple of routines per window type that would fill in the hotspot information for the library, given the window dimensions.

The API now invisibly routes user events to the appropriate window's routines, which allows standard window types to be implemented without affecting the application code past the initial window creation.
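The general shape of such an interface can be sketched as follows. None of these names come from window.h; they are placeholders to show a structure of function pointers being passed back as the first parameter, and how a window type can carry its own state behind it.

```c
/* Hypothetical names throughout; this is not the real window.h. */
struct window {
    void (*layout)(struct window *self, int width, int height);
    void (*click)(struct window *self, int x, int y);
};

/* An application window type "derives" from struct window by embedding
 * it as its first member, so a struct window * can be cast back. */
struct counter_window {
    struct window base;
    int clicks;
};

void counter_layout(struct window *self, int width, int height)
{
    (void)self; (void)width; (void)height;   /* nothing to lay out here */
}

void counter_click(struct window *self, int x, int y)
{
    struct counter_window *cw = (struct counter_window *)self;

    (void)x; (void)y;
    cw->clicks++;
}

/* What the library side might do when the Wimp reports a click: it only
 * knows about struct window, never about the concrete window type. */
void deliver_click(struct window *w, int x, int y)
{
    w->click(w, x, y);
}
```

Because the library dispatches through the pointer it was given, a standard window type (a save window, say) can supply its own routines and the application never sees its events.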

Composite Windows

The composite window API is a layer above the window.h interface which makes the definition of non-overlapping rectangular areas, frames, of the window easier.  This, in turn, allows libraries to provide standard window furniture features such as scroll bars and sizer icons.

Each frame provides a similar set of routines to those used in the window.h interface and a composite_window creating application will provide similar routines that the library can ask to provide information about the whereabouts of each frame.

There are some assumptions that can be made by application (and library) authors regarding these routines, namely:
  • Only one user event will occur at a time (there's only one pointer)
  • Only one window (per process) will be being redrawn at a time
This means that, say, the titlebar implementation can provide a single shared frame structure pointer, plus a routine to set up what it should display when it is drawn.

Drawing Windows

ROLF applications should be written with the assumption that the Wimp will handle acceleration of window redraws compatible with the system the application is running on.  In general, applications should aim to use as little memory as possible, and let the Wimp worry about buffering.

As an example, on a small-memory device with shared framebuffer memory, the Wimp will probably request the application draws directly to the framebuffer; in a large-memory device, the Wimp may allocate memory for the content of each window and combine them automatically, perhaps with DMA to a separate graphics card, so that the application will never be asked to perform a redraw because of window repositioning.  The aim is to converge on an intermediate solution that notices details like how frequently a window's content changes, how quickly redraws complete, etc. and buffers the most processor intensive windows, automatically.  (This doesn't happen yet, though.)

ROLF's drawing mechanism is based around an Image library, with a few essential types/interfaces:
  • SourceImage (things you can render to a DestinationImage)
  • DestinationImage (things you can render SourceImage things to)
  • Bitmap (providing both of the above interfaces)
  • Rectangle
  • ScreenRectangle (Rectangles with short (16-bit) values - it's sufficient for a 3m square display at 300dpi)
  • rgba16 (The standard colour format, with 16-bits for each of RGB and Alpha)
One of the main aims of the library is to minimise the amount an application has to know about how images are encoded; conversion and optimisation will be provided within the library.

All Images (Source or Destination) represent a rectangular area, not necessarily with a (0, 0) origin.
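To make the sizes concrete, here is a rough sketch of how the two smallest types might be declared, plus the arithmetic behind the display-size remark. The field names and layouts are guesses for illustration, not the library's actual definitions.

```c
#include <stdint.h>

/* Hypothetical layouts; the real ROLF Image library may differ. */
typedef struct {
    uint16_t r, g, b, a;      /* 16 bits for each of RGB and alpha */
} rgba16;

typedef struct {
    int16_t x0, y0, x1, y1;   /* 16-bit coordinates */
} ScreenRectangle;

/* Number of pixels along a display edge of the given length and density,
 * rounded to the nearest pixel. */
long pixels_across(double metres, double dots_per_inch)
{
    const double inches_per_metre = 1.0 / 0.0254;

    return (long)(metres * inches_per_metre * dots_per_inch + 0.5);
}
```

pixels_across(3.0, 300.0) gives 35433, which fits within the 65536 distinct values of a 16-bit coordinate. With signed values, as assumed in this sketch, that only works if the origin is not at one corner of the display, since 35433 is larger than 32767.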

[ To be continued ]

Friday, February 03, 2012

NetSurf on ROLF

I thought it was time to update NetSurf on ROLF, so I have.

Revision 191 should build on a clean system.

Most annoying features:
  • Can't type a URL into the URL bar
  • Menu in a window exits the application
  • Double-clicking an html file doesn't start the program [Fixed in rev. 192]
  • Scroll bars don't update properly
Still, it should give an idea of what's possible.

I've done something to the filer that means the NetSurf icon shows up OK, but none of the others do, any more. [Update: That was just a libpng version mismatch, which also showed up as lots of NoMemory messages from NetSurf.]