Sunday, 18 September 2011

OMF support for binutils

The setting

Some years back I was working on an exciting project: at Prism Payment Technologies we were building self-service terminals destined for fuel stations in Kuala Lumpur.  These terminals were miniaturized 90s-era PC-compatible computers with a PC/104 bus: to add peripherals, one simply stacked the boards onto the previous board's million-pin header.  Since I was responsible for pretty much all the firmware running on these critters, it felt like I had been transported back to 1992, when I was trying to master all the PC's standard(ish) peripherals.  I even got to tie up the one loose end I never got to in the 90s: programming the VGA registers!

The code was almost all in C, and we used TopSpeed C to compile the project.  It was a mixture of little bits of assembly for the ISRs, a few third-party components (a TCP/IP stack and the uCOS real-time operating system), and our own IFSF protocol stack and application.  When I inherited the project, 640K was just barely enough for us - it was a constant battle for bytes to make everything fit into memory.  Some days I would need to add a feature but be unable to run the firmware, because the last few of the 640K bytes had already been consumed.  Then I would spend the next day or so painfully inspecting each of the usual memory-hog suspects, searching for arrays to shrink.

Enter GNU binutils

I knew objdump could, in principle, tell me exactly which object modules were most likely to harbour hogs.  Yet it had no support for OMF, the FOO.OBJ object file format familiar to DOS programmers.  (TopSpeed used OMF too, and luckily its flavour was close enough to the standard.  More in [1].)

Rather than spend frustrating hours manually searching for unreasonably large variables, I decided to teach libbfd how to read OMF object files.  Pretty soon I was able to answer my questions efficiently - there were indeed several large buffers that no code was using.
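Finding large buffers mostly comes down to reading the SEGDEF records, which declare each segment's length. Below is a minimal C sketch of pulling that length out of a 16-bit SEGDEF (record type 0x98), assuming the field layout from the TIS OMF 1.1 specification; `segdef_length` is a hypothetical name, not a libbfd function, and the 32-bit variant (0x99, with a 4-byte length field) is not handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Parse the contents of a 16-bit SEGDEF record (type 0x98), i.e. the
 * bytes after the 3-byte record header and before the checksum byte.
 * Returns the segment length in bytes, or -1 if the record is truncated.
 * Hypothetical helper for illustration only. */
long segdef_length(const uint8_t *p, size_t len)
{
    size_t pos = 0;
    uint8_t acbp;
    unsigned align;
    long seglen;

    if (len < 1)
        return -1;
    acbp = p[pos++];
    align = (acbp >> 5) & 7;          /* A field: bits 7..5 of ACBP */

    /* An absolute segment (A == 0) carries a frame number (2 bytes)
     * and an offset (1 byte) before the length field. */
    if (align == 0)
        pos += 3;

    if (pos + 2 > len)
        return -1;
    seglen = p[pos] | (p[pos + 1] << 8);   /* little-endian */

    /* The B ("big") bit, bit 1 of ACBP, means the segment is exactly
     * 64K long; the length field is then zero. */
    if (acbp & 0x02)
        seglen = 0x10000L;

    return seglen;
}
```

Summing these lengths per module is one way to rank modules by how much memory they claim, which is where the hogs tend to hide.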

Over time I extended the OMF port to support most of the common features of these object files.  Wrapping my head around relocations was the hardest part, especially the BFD concept of relocations, which is (necessarily) complex because it has to accommodate the quirks of so many ports.

Dinosaur mating season

But before I could get the GNU paperwork through with the comparatively Open Source-friendly [2] Prism management, mating season arrived and the company had new owners.  A much bigger company, whose HR handbook seemed to consist of variations on the theme, "Lift the drawbridges - keep the barbarians out".  It became a sufficiently unpleasant environment for many of us Prismers (say that fast) that about one third of us quit - including me.  I simply didn't have the energy to convince a likely-to-be impersuasible and anecdotally abusive Kaiser to sign the GNU paperwork: redoing it all from scratch seemed the better deal.

Second time lucky

In the two weeks between jobs, I hacked and hacked and hacked deep into the night, sweating to implement enough OMF support to be useful while my memory of the file format was still fresh.  It was a real reimplementation: I didn't simply copy and paste my earlier code, since I knew it was now off-limits and would taint anything I derived from it.  I'm pretty sure the result was better in some respects than what I had had at Prism - the code certainly smelled cleaner.

But eventually my day job kept me busy enough that I lacked the mental bandwidth to finish my BFD port, so my efforts faded and then stopped altogether for a few years.  At that point objdump could answer most of my questions, except those involving external symbols and relocations.

At some point I asked my boss Rob Love to sign the employer disclaimer of rights (a necessary evil of the GNU paper trail); he did so with enthusiasm (thanks!), and I no longer had an excuse to keep procrastinating.  I spent a while rebasing my between-jobs patches onto the current binutils code and reacquainting myself with OMF.

Desert Wandering

Since then I've left Rocketseed (email marketing ≠ I'm a spammer, but it's fun to tell dance class girls that antifact) and am now retired / freelancing / unemployed [3], so I've had time to plug a few holes: my port now understands external symbols, relocations, and a few other minor features.
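External symbols arrive in EXTDEF records (type 0x8C), and relocations later refer to them by their 1-based position across the module's external definitions. Here is a minimal sketch of decoding one EXTDEF, assuming the layout in the OMF specification (each entry is a length-prefixed name followed by a type index, where an index field is one byte below 0x80 and two bytes otherwise); the function names and the fixed-size name buffer are illustrative assumptions, not the port's code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXT_NAME_MAX 256  /* names are at most 255 bytes, plus a NUL */

/* Read an OMF "index" field: one byte if below 0x80, otherwise two
 * bytes with the high bit of the first byte cleared. Returns the number
 * of bytes consumed, or 0 if the buffer is too short. */
static size_t read_index(const uint8_t *p, size_t avail, unsigned *out)
{
    if (avail < 1)
        return 0;
    if (p[0] < 0x80) {
        *out = p[0];
        return 1;
    }
    if (avail < 2)
        return 0;
    *out = ((unsigned)(p[0] & 0x7F) << 8) | p[1];
    return 2;
}

/* Parse the contents of an EXTDEF record: a sequence of
 * (length-prefixed name, type index) pairs. Copies up to max names into
 * names[] and returns how many entries were found, or -1 on truncation.
 * Hypothetical helper for illustration. */
int extdef_names(const uint8_t *p, size_t len,
                 char names[][EXT_NAME_MAX], size_t max)
{
    size_t pos = 0;
    int count = 0;

    while (pos < len) {
        size_t n = p[pos++];          /* name length byte */
        unsigned type_index;
        size_t used;

        if (pos + n > len)
            return -1;
        if ((size_t)count < max) {
            memcpy(names[count], p + pos, n);
            names[count][n] = '\0';
        }
        pos += n;

        used = read_index(p + pos, len - pos, &type_index);
        if (used == 0)
            return -1;
        pos += used;
        count++;
    }
    return count;
}
```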

The result

Every project has a foo.ext.  Here's mine:

segment text

extern bar
extern baz
global foo, reloc_kitty, reloc_foo, reloc_bar1, reloc_bar2

foo: call bar
call baz
call baz + 10
call baz
call foo
call seg bar:bar
lea ax, [foo wrt seg bar]

reloc_kitty: call text:kitty
reloc_foo: dw foo
reloc_bar1: dw bar + 10 wrt seg baz
reloc_bar2: dw seg bar

segment trampoline
kitty: ret
dw trampoline wrt text

NASM assembles this not-useful-at-all code, and objdump -D -r -p foo.obj dumps it as:

foo.obj:     file format i386omf

Module name: foo.asm
Translator: The Netwide Assembler 2.10rc4
  2 text
  3 trampoline
  text (2)
  trampoline (3)

BFD: Found 7 symbols

Disassembly of section text:

00000000 <foo>:
   0: e8 00 00             call   3
1: OFFPC16 bar+0xfffffffe
   3: e8 00 00             call   6
4: OFFPC16 baz+0xfffffffe
   6: e8 0a 00             call   13
7: OFFPC16 baz+0xfffffffe
   9: e8 00 00             call   c
a: OFFPC16 baz+0xfffffffe
   c: e8 f1 ff             call   0
   f: 9a 00 00 00 00       lcall  $0x0,$0x0
10: OFF16 bar
12: SEG bar
  14: 8d 06 00 00           lea    0x0,%ax
16: OFF16 text
16: WRTSEG bar

00000018 <reloc_kitty>:
  18: 9a 00 00 00 00       lcall  $0x0,$0x0
19: OFF16 trampoline
1b: SEG text

0000001d <reloc_foo>:
1d: OFF16 text

0000001f <reloc_bar1>:
  1f: 0a 00                 or     (%bx,%si),%al
1f: OFF16 bar
1f: WRTSEG baz

00000021 <reloc_bar2>:
21: SEG bar

Disassembly of section trampoline:

00000000 :
   0: c3                   ret    
1: SEG trampoline
1: WRTSEG text
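The 0xfffffffe addends on the OFFPC16 relocations are -2 as a 32-bit two's complement value. Assuming the conventional S + A - P rule for PC-relative relocations (S the symbol's address, A the addend, P the address of the 2-byte field being patched), the stored displacement comes out relative to the end of the 3-byte CALL, which is what the 8086 expects. A small C sketch of that arithmetic; the helper names are hypothetical and this is not taken from the port:

```c
#include <stdint.h>

/* Target of a near CALL rel16 (opcode E8) at address insn, given its
 * little-endian displacement bytes. The displacement is relative to the
 * next instruction (insn + 3), and the sum wraps within 64K. */
uint16_t call_rel16_target(uint16_t insn, uint8_t lo, uint8_t hi)
{
    uint16_t disp = (uint16_t)(lo | (hi << 8));
    return (uint16_t)(insn + 3 + disp);
}

/* Value stored for a PC-relative 16-bit relocation under the S + A - P
 * rule, where field is the address of the 2-byte field. With A = -2
 * this equals S - (field + 2): the distance from the end of the field,
 * which for an E8 call is also the end of the instruction. */
uint16_t pc16_value(uint16_t sym, int32_t addend, uint16_t field)
{
    return (uint16_t)(sym + addend - field);
}
```

For example, `call_rel16_target(0x0c, 0xf1, 0xff)` reproduces the listing's `c: e8 f1 ff  call 0`, since 0x0f + 0xfff1 wraps to 0 in 16 bits.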

I'm not quite satisfied with the relocation type names I chose, especially that WRTSEG business.  It's necessary though, because in OMF, a relocation can ask for the offset of a symbol from the base of any segment, not only the segment in which its definition resides.
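Concretely, an OMF fixup names a target and a frame, and an offset fixup resolves to the target's address minus the frame's base, whichever segment the target happens to live in. In real mode a frame starting at paragraph p has base p * 16, so the arithmetic is tiny. A minimal C sketch, assuming real-mode addressing and 16-bit offsets; the function names are hypothetical:

```c
#include <stdint.h>

/* Real-mode 8086 addressing: a segment at paragraph number p starts at
 * linear address p * 16. */
static uint32_t seg_base(uint16_t paragraph)
{
    return (uint32_t)paragraph << 4;
}

/* Offset an OMF offset fixup resolves to when the target is measured
 * from an arbitrary frame: target linear address minus the frame base.
 * Returns -1 if the result does not fit in 16 bits, i.e. the target is
 * not addressable from that frame. */
long wrt_offset16(uint32_t target_linear, uint16_t frame_paragraph)
{
    uint32_t base = seg_base(frame_paragraph);

    if (target_linear < base || target_linear - base > 0xFFFF)
        return -1;
    return (long)(target_linear - base);
}
```

For `dw bar + 10 wrt seg baz`, the target is bar's linear address plus 10 and the frame is baz's segment; when the result does not fit in 16 bits, a linker would typically report a fixup overflow rather than store a truncated value.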

Show me the code!

Your wish is my command.  Behold, OMF support in binutils!  Also, a Perl script that dumps OMF records: omfdump.
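The outermost layer that a record dumper like omfdump has to peel is simple: a type byte, a little-endian length, the contents, and a checksum. A minimal C sketch of walking those records, assuming the framing described in the TIS OMF 1.1 specification; `omf_walk` and its output format are made up for illustration and are not a translation of omfdump:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Walk the records of an OMF object file held in memory. Each record is:
 *   1 byte   record type (e.g. 0x80 THEADR, 0x98 SEGDEF, 0xA0 LEDATA)
 *   2 bytes  record length, little-endian, counting contents + checksum
 *   n bytes  contents
 *   1 byte   checksum: all bytes of the record sum to 0 modulo 256
 *            (some translators write 0 here instead; accepted below)
 * Prints one line per record and returns the number of records seen,
 * or -1 if the data is truncated. */
int omf_walk(const uint8_t *buf, size_t size, FILE *out)
{
    size_t pos = 0;
    int count = 0;

    while (pos < size) {
        uint8_t type, sum = 0;
        size_t reclen, i;

        if (size - pos < 3)
            return -1;
        type = buf[pos];
        reclen = buf[pos + 1] | (buf[pos + 2] << 8);
        if (reclen == 0 || size - pos - 3 < reclen)
            return -1;

        /* Sum every byte of the record, checksum included. */
        for (i = 0; i < 3 + reclen; i++)
            sum = (uint8_t)(sum + buf[pos + i]);

        fprintf(out, "%02X len=%u checksum %s\n", type, (unsigned)reclen,
                (sum == 0 || buf[pos + 2 + reclen] == 0) ? "ok" : "BAD");

        pos += 3 + reclen;
        count++;
        if (type == 0x8A || type == 0x8B)   /* MODEND ends the module */
            break;
    }
    return count;
}
```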


[1] TopSpeed C was one of the 80s-era compilers; one of the better ones IMHO, but not very well known.  The way I heard it (William Hayes seemed to know more of the back story), TopSpeed C was going to be Borland's C compiler, but when it took too long to ship, Borland acquired Wizard Systems and used their C compiler instead.  (The Wikipedia article on Borland doesn't specifically mention this shipping delay - this part might be apocryphal.)  The C compiler folk at Borland eventually spun off into TopSpeed.  TopSpeed C was the better compiler, though; it understood volatile better than Microsoft C, Microsoft QuickC (I cut my C teeth on this one), and Turbo C.  Some of these got volatile partly right; others seemed simply to ignore the keyword.  TopSpeed C, on the other hand, knew to load or store a volatile variable from or to memory each time the C code referenced it, which was close enough to completely right for my purposes.  (I needed the uCOS synchronization primitives to work right, and the async serial port handler not to lose characters.)  I was rather amused, though, when I found a compiler bug in TopSpeed C: it discarded comments before it processed backslash-newline line continuations!  Granted, this was before // comments had been standardized.  As for the object file format that the TopSpeed compilers generated, it was OMF, but I have a sneaky suspicion they did at least one thing that is a little... unconventional: LEDATA records sometimes overflow the corresponding SEGDEF!  It's probably not a mortal sin, as a sufficiently smart decoder could realize that the segment was in fact larger than just the first SEGDEF with that segment's name: subsequent SEGDEFs seem to "extend" the first.  I don't have solid evidence to back all of this up, but it showed up while I was testing my port with an old object file I had lying around.

[2] I usually prefer to use the term "Free Software", but in this case I wanted the superset-meaning of the phrase "Open Source".  The Prism company culture was by no means welcoming of only Free Software!

[3] Honestly, I'm a bit bored with writing only software.  I really miss the hardware / software mix I was part of at Prism (that world is now forever lost; I have no illusions about wanting to go back there).  I'm currently using the occasional software work that comes in, combined with my intense frugality, as a runway to find fun again.  I'm hoping for something with a little bit of bricks and mortar.  In fact, I should be learning to weld, not blogging about code.
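The LEDATA/SEGDEF overflow described in footnote [1] is easy to detect mechanically: decode where each LEDATA record's data ends and compare that with the length declared by the SEGDEF it names. A minimal C sketch, assuming the 16-bit LEDATA layout (type 0xA0) from the OMF specification; both helpers are hypothetical, and growing the segment is only one lenient policy a decoder might adopt, not necessarily what the port does:

```c
#include <stddef.h>
#include <stdint.h>

/* Decode the contents of a 16-bit LEDATA record (type 0xA0):
 *   segment index  (OMF index field: 1 byte, or 2 if the high bit is set)
 *   data offset    (2 bytes, little-endian)
 *   data bytes     (the rest of the contents)
 * On success stores the segment index and the end offset of the data
 * (offset + number of data bytes) and returns 0; returns -1 if the
 * contents are too short. */
int ledata_span(const uint8_t *p, size_t len, unsigned *seg_index,
                unsigned long *end)
{
    size_t pos;
    unsigned long offset;

    if (len >= 1 && p[0] < 0x80) {
        *seg_index = p[0];
        pos = 1;
    } else if (len >= 2) {
        *seg_index = ((unsigned)(p[0] & 0x7F) << 8) | p[1];
        pos = 2;
    } else {
        return -1;
    }
    if (len - pos < 2)
        return -1;
    offset = p[pos] | ((unsigned long)p[pos + 1] << 8);
    *end = offset + (len - pos - 2);
    return 0;
}

/* Lenient policy: rather than rejecting data that overflows the SEGDEF,
 * treat the segment as extending far enough to cover it. Returns the
 * (possibly grown) segment length. */
unsigned long grow_segment(unsigned long seglen, unsigned long data_end)
{
    return data_end > seglen ? data_end : seglen;
}
```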