[First published as XO-Python: On Von Neumann’s First Program on December 17, 2007.]
While doing some web research preparing the previous post in this series on Python and the XO, Python-XO: Programming as applied mathematics, I came across a fascinating article by Don Knuth about John von Neumann’s first program, Von Neumann’s First Program (PDF).
John von Neumann was one of the greatest physicists and mathematicians of the 20th century. He was legendary for his ability to do complex mathematical computations in his head, and set out much of the foundations of the structure of computers as we use them today. Back in my days in grad school several decades ago it was common to hear his name associated with computers, though as Wikipedia notes this is now rarely the case.
Though I had read some of von Neumanns’ early work in the late 1940’s in typescript form in the Courant library while in grad school, I was not familiar with this work, which was done in 1945 It was kept by Herman Goldstine, another of the pioneers of computing, in manuscript form. Don Knuth, the best computer scientist of his generation, begins his paper as follows:
A handwritten document now in the possession of Dr. Herman H. Goldstine contains what is probably the earliest extant program for a stored program digital computer. Its author, the remarkably talented mathematician John von Neumann (1903-1957), was in the process of refining the stored program concept as he was writing this code; so his program represents a significant step in the evolution of computer organization as well as of programming techniques. In this paper we will therefore investigate the contents of yon Neumann’s manuscript in some detail, attempting to relate its ideas to other developments in the early history of high speed computing.
The program we will study is not what we might expect an “ordinary” mathematician to have written; it does not solve a partial differential equation! Instead, it deals with what was considered at that time to be the principal example of a nonnumeric application for computers, namely, the problem of sorting data into nondecreasing order.
Shortly thereafter we learn that:
The key question was whether or not the proposed instruction set provided a satisfactory means of logical control for complex processes, and so he felt that a sorting program would be a most instructive test case. Furthermore, the existence of IBM’s special purpose machines for sorting gave him a standard against which he could measure the proposed computer’s speed.
By way of background, the “IBM machines” mentioned were not computers, but specially-built mechanical devices that could sort punched cards. I go back far enough that I can recall using such machines, and they were a joy to use, though the occasional jam could ruin a day. They gave a real-time demonstration of what is known as a “radix sort.”
It is worth noting that von Neumann was not only working out fundamental principles of operation of computers as we know them today but was also from the start interested in measuring the performance of his program. He was not programming in the abstract world of Turing machines (about which we will learn more in forthcoming posts) but in the real world, a world where cost and performance mattered, especially given the great cost needed to build computers in those days, even though von Neumann was designing for a machine with only three 32- bit registers and 8192 words of memory. The current state of the art at the time, as expressed by Knuth, was “The ENIAC was a highly parallel computer; weighing over 30 tons, it involved over 19,000 vacuum tubes, 1500 relays, etc. Because of its electronic circuitry, it was considerably faster than any computing machine previously built. But it had only 20 words of internal memory, and it required complicated manual operations for setting up a program on plugboards.”
Hardware afficionados can appreciate the following comment:
The EDVAC memory was to be divided into 256 “tanks” of 32 words each, operating in a cyclic fashion. Word 0 of each tank would pass a reading station one bit at a time, then (32 bit-times later) word 1 would be available,…, finally word 31, then word 0 again, etc. Thus the accessing of information from tanks is essentially the same as we now have from drums or head-per-track disks.
This means of accessing information is not only that used in disks today but was also used to access memory in the earliest versions of what is now the x86 architecture. Indeed, the requirement to account for a similar delay in accessing memory led to the infamous “wrong” byte order in the 8080 architecture that has plagued generations of programmers. 
Key to von Neumann’s means of expressing programs is the notion of sequencing, what is called “control flow”:
Instructions were to be executed from consecutive locations, unless the sequence of control was modified by a JMP order. If the control sequence would come across a number (not an instruction word), the effect would be as if an LOD instruction were performed referring to this number.
Von Neumann sketched out only part of the complete algorithm, in a form of notation similar to mathematics. Knuth translates that into “pseudo” code that is is recognizable as code to any programmer. “Pseudo code” is code not for a particular machine, but code for a hypothetical machine to illustrate how a computation can be carried out.
A few sections are of particular interest to those first learning a programming language such as Python:
After having written the program, he assigned actual addresses to the subscripted ones. In order to make the code relocatable, for use as a general open subroutine, he assigned the addresses relative to an unspecified starting location e. His address assignments are shown in Figure 2 at the right of the instructions.
Von Neumann discussed the relocatability of this routine by enumerating the nine instructions which are variable (those whose codes depend on p, EXIT, or the relocation factor e). He didn’t say exactly how these instructions were to be changed after they have been read in from tape; he apparently did not yet realize that the limited EDVAC code he had proposed (with no shift instructions, for example) made it difficult to insert p into the “PIK” and “PUT” instructions, since the machine could only store into the address field of instruction words. It is perhaps significant that he thought of this program as an open subroutine, not closed, since he did not regard EXIT as a parameter on a par with n, m, location(x0), etc.
Von Neumann here first sketched out, or defined, what are now known as subroutines or procedures. These are expressed in Python using the
Not a great deal of technical expertise is needed to read Knuth’s full paper. Much of it is quite accessible; it provides a fascinating insight to the first expression of key concepts that every programmer takes for granted. As such, it serves as a reminder that what seems commonplace now was once unknown.
1. I once heard a talk by the designer (whose name I can’t recall now) of the Intel 8086 architecture. He said that during the design process he had been given the option of putting the bytes in the “right” order, but decided to leave things as they were. This comment was met by a loud groan from all the programmers in the audience, and he said he had come to realize the error of his ways.
Programmers sometimes have to live with their mistakes for decades. For example, if you ever happen to meet a Google employee named Stu Feldman, give him a hard stare while silently saying the phrase “tab character” to yourself. You don’t have to say anything out loud, just don the appropriate look of despair as you recall the hours that have been wasted putting tabs at the right place in “make” files. He’ll get the message. You will make his day.