Aggressive Assembly Language Programming in SPITBOL

The Preface to “Macro SPITBOL: The High-Performance SNOBOL4 Language,” by Mark B. Emmer and Edward K. Quillen, includes the following:

Robert Dewar created Macro SPITBOL in the mid 1970’s, and showed the world that a high-performance version of SNOBOL4 was possible. His Macro SPITBOL compiler forms the kernel of this implementation. As an example of “aggressive” assembly-language-programming, it remains a programs tour de force.

I was reminded of this when I came upon a comment by Mark in the source for Macro SPITBOL:

note: as an experiment, we tried aligning all ent and prc’s on a dword boundary, in an attempt to minimize instruction stall time on the first opcode of a procedure. the resultant exe file waslarger by 592 bytes, and actually ran 0.1% slower. we will continue to use odd alignment of block entry routines so that the .cepp conditional may be used.

592 bytes? 0.1% difference in performance?

Can you think of another programmer who pays this level of attention to performance?

Here is another:


* move chars from xl (esi) to xr (edi), count in wa (ecx)
*
* the following sequence "old method" is shorter than the "new method"
* shown below, but is much slower because of the conditional jumps that
* cause the instruction cache to be flushed for 3 out of 4 count values.
*
* old method:
* shr ecx,1
* jnc tmp1
* movsb ; move odd byte
* tmp1 shr ecx,1
* jnc tmp2
* movsw ; move odd word
* tmp2 rep movsd ; move string as double words
*
* genop('shr','ecx','1')
* genop('jnc','short ' (t1 = genlab()))
* genop('movsb')
* genopl(t1 label.delim,'shr','ecx','1')
* genop('jnc','short ' (t1 = genlab()))
* genop('movsw')
* genopl(t1 label.delim,'rep','movsd') :(opdone)
*
* new method:
* shrd eax,ecx,1 ; preserve ecx[0] in eax[31]
* shr ecx,2 ; preserve ecx[1] in cy, divide by 4
* rep movsd ; move dwords, leaves ecx=0
* adc ecx,ecx ; copy cy to ecx[0]
* rep movsw ; copy 1 or 0 words, leaves ecx=0
* shld ecx,eax,1 ; copy eax[31] to ecx[0]
* rep movsb ; copy 1 or 0 bytes

elel

Some knowledge — indeed more than I have — of programming for Intel’s x86 architecture is needed to fully appreciate the above, but the attention to detail is obvious.

Dewar’s MACRO SPITBOL was preceded by SPITBOL/360, written in IBM360 Assembly Language in the late 1960’s by Dewar and Ken Belcher. [1]

It must be the most efficient 360 assembly language program ever written. Robert once remarked that it was the only program known to him where the authors kept a copy of the instruction timing/cycle details for each model of the 360 in front of them as they wrote the code, and they frequently consulted these details to decide on best code sequence.

SPITBOL/360 demonstrated an astounding level of performance. SPITBOL/360 routinely compiled hundreds of thousands of lines of source code per second.

I once heard, perhaps from Robert himself, of one of the great tricks in SPITBOL/360.

SNOBOL has an integer variable &STLIMIT. A counter is initially set to this value. Whenever a new statement is executed, the counter is decremented, and if the value reaches zero then an exception is raised. This is used to avoid runaway loops and other code that would result in execution going on forever.

The obvious method is to use an integer counter, decrement it when a new statement is executed, and raise the exception when the value zero is reached.

Dewar and Belcher’s device was to maintain the counter as a floating-point (real) variable. It was set by computing a “magic” value and initializing the counter to that value. When a new statement was executed, the value 1.0 was added to the floating point counter. The magic was in that a floating-point exception, due to overflow of the value, would be raised at the appropriate time.

Andy Koenig — a fellow contributor to Macro SPITBOL — made mention of this device in his column Some Programs Are Poorly Designed On Purpose. The post includes several other examples from SPITBOL of what Andy calls “nefarious purposes” in the use of floating point arithmetic.

Another device, one of much greater generality, was the use of indirect threaded code (ITC), described in Dewar’s paper Indirect Threaded Code.

Most interpreters use a “switch on opcode” statement when a new statement is to be execute. However, this involves testing the value, and doing a switch instruction to the appropriate code, which tends to slow things down. Indeed this is one of the main costs of writing an interpreter instead of generating actual code for the target machine. After all, the hardware is designed to extract opcodes and take the appropriate execution as fast as it is possible.

When using ITC, the transition from one abstract instruction to the next is made not by a switch, but by doing an “indirect branch” to the address contained in the the code block for the next statement.

By the way, this is the reason you cannot write code in C to execute the Minimal assembly language level code in which SPITBOL is written. C has no such contruct (what is needed is something like goto *p).

By the way, Dewar is equally aggressive when it comes to documenting his code. See for example the source code for SPITBOL in file spitbol.min available at Github Hardbol SPITBOL. I consider it a tour de force in program documentation.

For example, while working together on the port of Macro SPITBOL to the IBM PC in the early 80’s, I remember particularly an instance where Dewar saved a file, and then immediately went back to edit the source when he noticed there was an extraneous space..

Notes:

1. SPITBOL/360 is available in open source form under the GPL license. See SPITBOL 360.

Dewar and Belcher applied their aggressive programing techniques in the early 80’s by writing a COBOL compiler in COBOL! The wikipedia entry for Robert Dewar includes:

In the 1970s Dewar was a principal author of the Realia COBOL compiler, widely used in commercial environments to this day (marketed by Computer Associates).

Written in COBOL, a remarkable feat in itself, I learned later from colleagues at IBM that Realia COBOL generated code that was better than that produced by IBM’s product COBOL compiler

Advertisements

6 Comments

  1. Mark Emmer
    Posted August 31, 2012 at 09:56 | Permalink | Reply

    Yeah Dave, jumps are painful. I’d forgotten the character copy code, but it reminds me why I like the ARM architecture so much — the ability to conditionally execute individual instructions without having to resort to jumps around them. An ARM SPITBOL would really scream. And the .CEPP example is a reminder that thinking about a possible improvement is not as good as actually implementing it and measuring the results. One of Mike Radow’s favorite quotes is apt:
    “In theory, there is no difference between theory and practice. But, in practice, there is.” — Jan L. A. van de Snepscheut
    Too bad Robert’s Indirect Threaded Code and his other SP&E article on SPITBOL internals are locked away behind for-pay publishers. I once asked Wiley (SP&E) for permission to republish the internals article and got the run-around. Of course, as you point out, the source doc is a pretty good article all by itself.
    Finally, as for the possibility of translating MINIMAL to C, gcc does provide computed goto’s as an extension (goto *ptr;). Also, gcc’s __builtin_return_address offers the tantalizing possibility of implementing Robert’s trick of a multi-way indirect jump on function return. Might be worth a little experiment in the spirit of van de Snepscheut’s quote.
    Mark

  2. Posted August 31, 2012 at 11:31 | Permalink | Reply

    Mark,

    Thanks for the comment.

    ARM is next on my list. As I blogged recently, only two architectures are of interest, at least in my work on SPITBOL: x86 (32 and 64 bit), and ARM.

    An ARM port is needed to port SPITBOL to iOS (iPhone + iPad) and Android (phone + their new tablet).

    I agree it should run *really* fast.

    dave

  3. paul
    Posted November 28, 2012 at 03:28 | Permalink | Reply

    I’m trying to understand, did Spitbol 360 convert Spitbol source to actual 360 machine code, while Macro Spitbol uses ITC? Is there a huge efficiency hit from the ITC, comparable to ITC Forth vs. assembly code? Thanks.

    • Posted December 11, 2012 at 20:18 | Permalink | Reply

      SPITBOL/360 generated machine code on-the-fly. The CACM paper on TIC wasn’t published until a few years later, so I don’t know if SPITBOL/360 used it.

      As for efficiency, ITC is I think about the best you can do for an interpreter-like system. My guess would be it’s better than FORTH, unless FORTH also uses it.

      • paul
        Posted December 12, 2012 at 16:00 | Permalink

        Thanks. Yes, ITC is the classic and probably best-known Forth implementation technique. However, some Forths generate machine code on the fly, and they are faster than ITC systems. So I wonder if Macro Spitbol took an efficiency hit on going to ITC.

        Another Forth implementation tradition was “meta-compilation” which basically meant Forths were often completely self-hosting. The implementation would contain an assembler for the target machine, that was written in Forth. Then the low level primitives would be defined as machine code sequences using the assembler. Then the higher level routines would be defined in terms of the low level primitives. All of this was completely in Forth, so you’d just need a working Forth (possibly on another system) to get started. Once you had something runnable you would no longer need to cross-compile to rebuild or extend the system.

        Spitbol actually seems like a good language to write assemblers in, so maybe it could be an alternative to NASM.

      • Posted December 14, 2012 at 12:33 | Permalink

        Paul,

        Indeed, SPITBOL is a great language to write assemblers. For examplex, the SPITBOL translator translates the abstract MINIMAL language to assembler.

        As for doing an x32/x64 assembler, I have thought of it, but the architecture is too complex and tedious.

        ARM, on the other hand, would be an interesting — and probably quite fun — exercise.

        thanks,dave

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

  • Pages

  • August 2012
    M T W T F S S
    « Jul   Sep »
     12345
    6789101112
    13141516171819
    20212223242526
    2728293031  
  • RSS The Wayward Word Press

  • Recent Comments

    mrrdev on On being the maintainer, sole…
    daveshields on On being the maintainer, sole…
    James Murray on On being the maintainer, sole…
    russurquhart1 on SPITBOL for OSX is now av…
    dave porter on On being the maintainer, sole…
  • Archives

  • Blog Stats

  • Top Posts

  • Top Rated

  • Recent Posts

  • Archives

  • Top Rated

  • %d bloggers like this: