A Programmer’s Programmer: Philippe Charles

Fran Allen, IBM’s first woman Fellow, learned in mid-February that IBM was about to lay off Philippe Charles. She asked me to write an appreciation of his work, since she was unfamiliar with the details of his contributions to the Jikes project. She wanted it as background for a meeting she had requested with another IBM Fellow.

Fran so valued Philippe that she made a run at saving his job. She was unsuccessful. Philippe joined me and thousands of others in our Long March out of IBM at the end of February.

As I started to write what Fran expected would be a short note, I decided that I should provide enough background for the document to stand on its own. Not far into it, I realized that I was writing a history of Jikes from a new perspective. I wrote it intensely over a couple of hours. I didn’t have time to revise it, so I just sent it on its way.

Here, then, is my appreciation.

This is a memoir about the four years I spent working with Philippe Charles on Jikes, and an appreciation of his contributions to that work. Jikes is a Java source-to-bytecode compiler that became IBM’s first open-source project.

I first learned about Java late in November of 1995, when Fran Allen suggested I read James Gosling’s Java whitepaper. I did so, and was very impressed by the technology. It was based on the premise that microprocessor performance had reached a level that supported the use of virtual machine technology for constructing serious, industrial-strength programs.

At that time I was just finishing work on the High Performance Fortran project, and so was looking for something new to work on. I soon learned that my manager, Edith Schonberg, would be leading a small Research project to investigate Java technology, and I joined that project in early 1996.

The initial team consisted of four people: Edith, Derek Lieber, Philippe Charles, and myself. Derek had decided to implement a Java virtual machine from scratch, and was already at work on that when I came on board.

Though I had known Philippe since the late 1970s, we hadn’t worked together in a serious way since then. I had enjoyed our previous work, so I offered to help him with his work on Java. I learned that he had started writing a Java compiler and had brought it to the point where it could parse programs and do semantic analysis. However, he had no back-end for the compiler, and so had no way to execute programs.

So I offered to join him in his effort by writing a back-end. Thus began almost four years of intense collaboration that, looking back, was the most satisfying work of my career.

Brief History of the Jikes Project

This is a brief history of the Jikes project. More can be found at my Jikes Archives.

Philippe and I started working together on what became the Jikes compiler sometime in April of 1996, and by mid-July were able to compile and execute the “hello world” program.

We then sought out a few trial users, and by mid-August had learned two important lessons. First, as I will discuss in more detail later, Jikes was fast … damned fast. Jikes was written in C++, while Sun’s compiler, Javac, was written in Java. Our initial tests showed that Jikes was 10 to 50 times faster than Javac.

We also found that Jikes correctly rejected erroneous programs that Javac wrongly accepted, and correctly accepted valid programs that Javac rejected. Jikes was not only faster; it was also more correct. We always attached more importance to getting it right than to speed alone, so this was especially rewarding.

We concluded that Jikes offered real promise, and so committed to working on the project 24×7 as long as we could, to see just how far we could go. We organized our work on two principles. First, I told Philippe, “You just keep on coding. I’ll do all the other stuff, such as builds and testing, and I’ll code when time permits.” Second, since we knew we would be working on nothing else for the foreseeable future, we decided to make sure we were always having fun doing the work.

IBM launched alphaWorks, a showcase for new software technology, in early September of 1996. We learned about it within a few weeks and set as our goal to bring Jikes to a form suitable for release on alphaWorks. We were done, or so we thought, by early 1997.

When we sought clearance to release Jikes, we learned that Gina Poole from Software Group, who then worked on IBM’s relationship with Sun, approved of the release, but that we first had to pass Sun’s compatibility tests. We welcomed this challenge, as finding test cases and users had been a problem, and now we had a chance to use Sun’s test suite. Sun then, and for several years thereafter, while open about allowing anyone to implement Java, did not make the test suite openly available, thus making it difficult for competitors to test their efforts.

It took about a month to work our way through the tests. We found about as many errors in Jikes as we did in Sun’s compiler. Though we failed some tests, we also found test programs that Sun’s compiler was wrongly accepting.

Jikes was released on alphaWorks in mid-April of 1997. We put out versions (binary only, no source) for Windows and AIX. (We added a Solaris version a few months later, after learning that IBM’s Toronto lab had produced a version of IBM’s xlC compiler that supported the SPARC architecture.) The response was enthusiastic, with lots of downloads. Within a day or so of the release, someone sent me an email saying there was an article in PC Week about Jikes. Jikes was mentioned in the main story on the front page, which said in part: “IBM’s alphaWorks division today released a new Java compiler. It’s called Jikes and it sets new standards for conforming to the Java language specification.”

It was Jikes’s success on alphaWorks that put Jikes on the map. Though I knew Paul Horn was the director of Research, I had never dealt with him personally. Within a few days I was asked to give him a briefing on Jikes. (For more on the alphaWorks release, see my blog post Happy Birthday alphaWorks!)

It was soon after Jikes was released on alphaWorks that Philippe informed me we had a problem. Sun had announced early in 1997 that they were adding a feature called “inner classes” to Java. We hadn’t paid it much attention while we were busy getting Jikes ready for release, but once Philippe had read the specification, he realized it would require major changes to the compiler; indeed, a substantial part of the compiler would have to be rewritten. We thus decided to shut the project down in mid-July, since we didn’t have the resources to maintain the compiler while revising it.

Almost eight months elapsed before the next release of Jikes in early 1998. Within a month we got the first requests for a Linux version. While we had a complete implementation of the language, we still had bugs, and so had concentrated on finding and fixing them. Since we had observed that we got about one useful bug report for every thousand or so downloads, and so wanted more downloads, in May I decided to see if we could put out a version of Jikes for Linux. I spent the first week in June reading, and re-reading, the GPL and LGPL licenses, and wrote a memo to management about Jikes on Linux that led to approval for releasing a binary version for Linux. (It can be found at Jikes Archives.)

Jikes for Linux was released in mid-July of 1998 and received an even more enthusiastic response than our first appearance on alphaWorks. We had been getting about two hundred downloads a day, and within a few days of the release were getting over a thousand a day. The technical press reception was also positive.

Soon after the release of the binary version for Linux, we started getting requests to release the source for Jikes. At that time we communicated with our users via an externally visible Notes application, one that allowed users to post comments. Early in August, after getting several more requests for the Jikes source, we made a pitch to management and got approval to release the source code.

Jikes was released as IBM’s first open-source project in early December of 1998. Again we set new records for downloads at alphaWorks. (I think we got over 3000 the first day.) There was also extensive press coverage. We also received our first meaningful patch to the program within twelve hours of its publication.

Since part of the argument for releasing Jikes was not just to make the code available, but to show that IBM could maintain and develop it as an open-source project, Philippe and I ran Jikes as such a project starting that December. In August of 1999 Red Hat decided to include Jikes in its distribution, making Jikes the first IBM open-source project to have its code incorporated in a major Linux distribution. (Jikes is now found in all the major Linux distros.)

We continued work on Jikes until the end of 1999, when IBM management told us it was time to move on to other things. Fortunately we had been able to build a community capable of taking over the project, and there were four different leaders of the project until its last release in October 2004. Jikes can now be found at Jikes SourceForge. As of mid-February 2009, Jikes had been downloaded over 165,000 times. Even though the project has been dormant for over four years, it still ranks among the top five percent of projects at SourceForge, measured by the number of downloads.

IBM began serious involvement in open source following a review by Lou Gerstner in February 1999 of a study produced by a team led by Robert LeBlanc. As a result, IBM was a major presence at LinuxWorld in San Jose in March of 1999. I was there as part of the IBM team, to present Jikes. It was there that I first encountered real Jikes users, and they were all appreciative of the value of Jikes and of IBM’s opening up of the code.

We received several awards for our work on Jikes. The citation for one (I think it was a Research Outstanding Technical Achievement Award) included a letter from Robert LeBlanc, in which he said that, “Jikes was featured in virtually every public presentation by IBM on open-source and Linux throughout 1999.”

Technical Accomplishments

Here are some thoughts on Philippe’s technical accomplishments in Jikes. Though I am a co-author, I am responsible for at most 20 percent or so of the code, primarily in the back-end and support utilities. I did the easy part. Philippe was responsible for the overall design, and he wrote all the difficult code. He worked full-time on Jikes for close to four years. I consider it a heroic achievement, and one of the fondest memories of my career is that I had the good fortune to watch him at work during that time.

Jikes is a Java compiler. It reads source code and generates Java bytecode for use by a Java Virtual Machine. It is plug-compatible with Sun’s Java compiler, Javac. Jikes is written in C++ and I recall there were about 85,000 lines of code when we released the source code.

I think Jikes is notable in that it was built from scratch. Though most compilers accept plain ASCII text as input, Java supports Unicode, so even reading the source text requires processing it byte by byte. Jikes reads byte strings and produces byte strings in the form of Java bytecode. I recall only one bit of code that we didn’t write ourselves: we used some code from the Info-Zip project to read Java “jar” files.
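One concrete part of reading Java source is the first step of lexical translation: Unicode escapes of the form \uXXXX may appear anywhere in a source file and must be replaced before tokenizing. The following is a minimal Java sketch of that step as described in the Java Language Specification (§3.3, Unicode Escapes). It is not Jikes code (Jikes was written in C++), and the class and method names are hypothetical.

```java
// Hypothetical sketch, not Jikes code: JLS-style translation of \\uXXXX escapes.
public class UnicodeEscapes {

    // Returns the value of an ASCII hex digit, or -1 if ch is not one.
    private static int hexValue(char ch) {
        if (ch >= '0' && ch <= '9') return ch - '0';
        if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
        if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
        return -1;
    }

    // Replaces each eligible \\uXXXX (one or more 'u's) with the UTF-16 code unit it denotes.
    static String translate(String src) {
        StringBuilder out = new StringBuilder(src.length());
        int rawBackslashes = 0;  // contiguous raw backslashes immediately before position i
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            // A backslash may start an escape only if an even number of raw backslashes precede it.
            boolean eligible = c == '\\' && rawBackslashes % 2 == 0;
            if (eligible && i + 1 < src.length() && src.charAt(i + 1) == 'u') {
                int j = i + 1;
                while (j < src.length() && src.charAt(j) == 'u') j++;  // skip all the u's
                if (j + 4 > src.length()) {
                    throw new IllegalArgumentException("truncated Unicode escape at index " + i);
                }
                int code = 0;
                for (int k = j; k < j + 4; k++) {
                    int d = hexValue(src.charAt(k));
                    if (d < 0) throw new IllegalArgumentException("bad hex digit at index " + k);
                    code = code * 16 + d;
                }
                out.append((char) code);
                rawBackslashes = 0;  // a backslash produced by an escape does not count
                i = j + 4;
                continue;
            }
            rawBackslashes = (c == '\\') ? rawBackslashes + 1 : 0;
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("char c = '\\u0041';"));  // prints char c = 'A';
        System.out.println(translate("\\\\u0041"));             // prints \\u0041 (not an escape)
    }
}
```

The parity check is what keeps a doubled backslash, as in a string literal, from being mistaken for the start of an escape.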

Performance: Speed Kills

Philippe’s first priority was always to get it right — to accept only valid programs, and to reject only invalid programs.

He also felt that performance was important. When I asked him about this he said, “Dave, getting it right is my first goal. But I know that this alone won’t impress management. So I set out from the start to make it as fast as I can while maintaining the integrity of the design.”

When Jikes first appeared we found it up to fifty times faster than Javac on AIX, and from ten to twenty times faster on Windows/x86. Though Javac improved somewhat over time, we were always at least five times faster, and often more. My standard demo at trade shows was to have two windows up, one showing Jikes compiling a suite of programs, the other showing Javac compiling the same suite. The difference in compilation rates was always immediately obvious.

One of my favorite Jikes memories is of a comment on Slashdot just after we posted the code. It went something like this:

I downloaded Jikes and tried it. The prompt came up so quick I thought it had done nothing. Then I listed my directory contents and I found all the class files had been created. Holy s—! Jikes is fast!

I used Jikes when I worked on the Stellation project in 2001-2002. I routinely found it could compile twenty thousand lines of code on a stock x86 workstation in a few seconds, with compilation rates of 300,000-400,000 lines per minute. It was so fast I never bothered with Make or other build tools — there was no need for them.
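As a quick consistency check (simple arithmetic on the figures above, not an additional measurement), the quoted rates put twenty thousand lines at three to four seconds, which matches “a few seconds”:

```latex
\[
\frac{20{,}000\ \text{lines}}{400{,}000\ \text{lines/min}} = 0.05\ \text{min} = 3\ \text{s},
\qquad
\frac{20{,}000\ \text{lines}}{300{,}000\ \text{lines/min}} \approx 0.067\ \text{min} = 4\ \text{s}.
\]
```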

It was also striking to compare the Jikes compilation rates with those of gcc. Even without optimization gcc could take many minutes to compile the 80,000 or so lines of Jikes, while Jikes could have compiled that amount of Java code in just a few seconds.

I once did some comparison of Jikes with the Linux file copy program cp. I found that for larger programs Jikes approached cp in performance, in that the cost of compilation was within twenty percent or so of that of just copying the file.

Error detection, correction, and reporting

I have used many compilers in my decades of programming, but none has come close to the quality of the error messages produced by Jikes. One of the key contributions of Philippe’s doctoral work concerned the automatic detection and correction of errors in LALR parsers, and he applied all that work, and more, to Jikes. In order to provide accurate messages, Philippe kept track of the source coordinates of every token in the program, including comments, while still maintaining the high compilation rates.

Here is an example: the well-known “Hello world” program, with the wrong name for a method:

Found 1 semantic error compiling "Hi.java":

     3.    System.out.printl("hello brave new world\n");
           ^------------------------------------------^
*** Semantic Error: No method named "printl" was found in type "java.io.PrintStream".
 However, there is an accessible method "println" whose name closely matches the name "printl".


See the Jikes FAQ for additional examples.
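The essay does not say how Jikes decided that “println” closely matches “printl”. One common way to produce such suggestions, shown here as a minimal Java sketch under that assumption, is to rank the candidate names by edit distance normalized by length and to suggest the best one above a threshold. The class and method names and the 0.7 threshold are hypothetical, and the sketch ignores accessibility, which the real message also checks.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: suggest a near-miss name using normalized edit distance.
// The essay does not describe Jikes's actual matching heuristic.
public class NameSuggester {

    // Levenshtein distance: minimum number of insertions, deletions, and substitutions.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity in [0, 1], where 1 means identical. Normalizing by the longer name
    // lets "println" (1 edit over 7 chars) beat "print" (1 edit over 6 chars).
    static double similarity(String a, String b) {
        int longer = Math.max(a.length(), b.length());
        return longer == 0 ? 1.0 : 1.0 - (double) editDistance(a, b) / longer;
    }

    // Returns the most similar candidate at or above minSimilarity, or null if none qualifies.
    // On a tie, the earlier candidate wins.
    static String closestMatch(String name, List<String> candidates, double minSimilarity) {
        String best = null;
        double bestScore = -1.0;
        for (String candidate : candidates) {
            double score = similarity(name, candidate);
            if (score >= minSimilarity && score > bestScore) {
                best = candidate;
                bestScore = score;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // A few of java.io.PrintStream's method names, used as candidates.
        List<String> methods = Arrays.asList("print", "println", "printf", "flush", "close");
        System.out.println(closestMatch("printl", methods, 0.7));  // prints println
    }
}
```

Raw edit distance alone would tie “print”, “println”, and “printf” at one edit each; the length normalization is one simple way to break that tie.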

We received many favorable comments on the quality of the error messages. I expect that many folks had never seen a compiler with such high-quality diagnostics.

Incremental Compilation

Since the Java specification allows neither subsets nor supersets of the language, there is no room for innovation in language features when writing a compiler. However, Jikes did include a major innovation: support for incremental compilation that goes far beyond the simple timestamp-based recompilation of class files found in Javac. This reflected Philippe’s interest in full incremental compilation, which grew out of his prior work on the Montana project, an ambitious attempt to build an IDE for C++ that supported incremental compilation.

Though Jikes was so fast that, as best we can tell, this feature was little used, Philippe persisted in supporting it. He said it was because it was a good stress test of the compiler, though I suspect he did so because he knew how hard it was to do and was proud of his work.

No other compiler at that time attempted this level of incremental compilation. I understand that Eclipse now provides a form of incremental compilation. I don’t know whether this reflects any of Philippe’s work.
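The contrast between timestamp-based and dependency-based recompilation can be sketched as follows. If A.java changes, a check based only on timestamps recompiles A.java but not B.java or C.java, whose sources have not changed, even though they refer to A. This minimal Java sketch assumes the compiler has recorded which files refer to which, and conservatively recompiles everything that transitively depends on a changed file. It is not Jikes’s algorithm; a real implementation would typically stop propagating when a change does not affect a class’s externally visible declarations. The class, method, and file names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not the Jikes algorithm: dependency-driven recompilation.
public class RecompileSet {

    // dependents maps each source file to the files that refer to it.
    // Returns the changed files plus every file that transitively depends on them.
    static Set<String> filesToRecompile(Set<String> changed, Map<String, Set<String>> dependents) {
        Set<String> result = new TreeSet<>(changed);  // sorted for stable output
        Deque<String> work = new ArrayDeque<>(changed);
        while (!work.isEmpty()) {
            String file = work.pop();
            for (String user : dependents.getOrDefault(file, Collections.emptySet())) {
                if (result.add(user)) {
                    work.push(user);  // newly affected: its own dependents must be checked too
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> dependents = new HashMap<>();
        dependents.put("A.java", Collections.singleton("B.java"));  // B.java refers to A
        dependents.put("B.java", Collections.singleton("C.java"));  // C.java refers to B
        // D.java refers to nothing that changed, so it is left alone.
        Set<String> changed = Collections.singleton("A.java");
        System.out.println(filesToRecompile(changed, dependents));  // prints [A.java, B.java, C.java]
    }
}
```

The `result.add` check doubles as a visited set, so cycles in the dependency graph (which Java permits between classes) do not cause an infinite loop.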

Inner classes

Sun’s addition of “inner classes” to the language in early 1997 changed the game of compiling Java. Until that time anyone with some knowledge of the Dragon book and a reasonable amount of skill could put together a Java compiler, or at least something that would have been acceptable to many users. Though Java had introduced some features not commonly found in earlier languages, such as verifying that variables had been assigned an initial value, there were a number of Java compilers available at that time.

However, when we released a version of Jikes supporting inner classes in early 1998, we found that the men had been separated from the boys; more precisely, the only compilers still standing were Sun’s Javac, Symantec’s Java compiler (Symantec then had a Java IDE on the market), and … Jikes from Philippe.

The main problem in compiling Java is name resolution. When you see an instance of a name such as “I”, does it stand for a local variable, a method, a field, a class, a subclass, and so forth? Until the addition of inner classes, name resolution for Java was a harder problem than in compilers for many languages such as C, but it was tractable, in that many of the issues related to classes, and Java classes were then closely tied to file names. With inner classes, however, there were many more possibilities, and sorting them out was much more difficult.
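A small Java example illustrates the point. The same simple name can denote a local variable, a field of the enclosing instance, or an inherited field, depending on where it appears inside nested and inner classes. The class and member names are hypothetical, and the comments describe how the name resolves under the current Java Language Specification, not necessarily under the compilers of the late 1990s.

```java
// Hypothetical example: one simple name, i, with several possible meanings.
public class NameResolution {

    static class Base {
        int i = 10;                        // field that InnerSub inherits
    }

    int i = 1;                             // field of the enclosing instance

    class Inner {
        int lookup() {
            return i;                      // no local or inherited i: means NameResolution.this.i
        }
    }

    class InnerSub extends Base {
        int lookup() {
            return i;                      // inherited Base.i shadows the enclosing field
        }

        int lookupEnclosing() {
            return NameResolution.this.i;  // qualified this selects the enclosing instance's field
        }
    }

    int lookupLocal() {
        int i = 100;                       // local variable shadows the field
        return i;
    }

    public static void main(String[] args) {
        NameResolution outer = new NameResolution();
        System.out.println(outer.lookupLocal());                     // 100
        System.out.println(outer.new Inner().lookup());              // 1
        System.out.println(outer.new InnerSub().lookup());           // 10
        System.out.println(outer.new InnerSub().lookupEnclosing());  // 1
    }
}
```

Each `return i;` is textually identical, yet a compiler must search local scopes, the class and its superclasses, and then each enclosing class in turn to decide which variable is meant.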

As best as we could tell, Jikes’s implementation of inner classes was at least as good as, and perhaps better than, that found in Javac. This was hard to confirm from the results of running Sun’s test suite, since Sun added few new tests related to inner classes, and did not attempt to add the thousands of tests that would have been needed to create variants of many of their pre-existing tests that exercised the same features in the context of inner classes.

Parsing Technology

Jikes was built from scratch. Almost all of it was written by Philippe, with my contribution being limited to the bytecode generator and various utility support code around the core of the compiler. Jikes is thus one of only a very small number of compilers that can make that claim. This is because virtually every compiler written in the last couple of decades has made use of parser generator technology to produce the compiler’s parser.

Jikes also made use of a parser generator, then called LPG. The thing is, Philippe wrote it. It was based on his doctoral work at NYU, in which he made major advances in the automatic generation and compression of parser tables, automated error detection and correction, and so forth. By that time (1998) LPG had been widely used in IBM for several years; for example, in product compilers such as xlF and xlC, and in DB2, the predecessor to UDB.

Because of this wide use, IBM perceived LPG as a valuable technology. This was of some concern to me, since I knew that we could release Jikes as open source only if we could also get permission to release LPG as open source: one of the basic requirements for code to be deemed “open-source” is that it not make use of tables or code generated by a program that is not itself open source. Fortunately management eventually came around to releasing LPG, which became known as JikesPG (Jikes Parser Generator).

The Eclipse team made use of LPG to write their own compiler. They also consulted with Philippe often during those early days on Java compilation issues.

LPG was able to compile the Java grammar as specified in the first version of the Java Language Specification (JLS) without changing a single character, a testament both to Guy Steele, who wrote the grammar, and Philippe. It’s worth noting that when the second version of the JLS came out there was a minor change to the grammar. Philippe investigated, and concluded that Sun had not changed the language, but had just made the change since their own compiler generator was incapable of recognizing the (correct) language grammar in the first version.

JikesPG is a great piece of technology. I consider it Philippe’s masterpiece. It is not as well known as Jikes. This is, I think, mainly because it wasn’t released publicly until the late 1990s, by which time bison and other parser generators, though not as good, were widely used. Philippe’s modesty also worked against him here.

Microsoft’s Appreciation of Jikes

One day back in 2000 or 2001 I got a phone call asking me to drive up to Somers to meet with a senior attorney at Software Group for a conference call. I learned that Rational, then an IBM partner (this was before IBM acquired the company), had approached IBM on behalf of one of its own partners, who wanted to obtain a license to the Jikes source code. IBM had told them the code was freely available, but the mysterious partner wanted to license the Jikes code as it existed before it became open source, so they could adapt it to their own needs without having to meet the terms of the Jikes license. (The Jikes License is viral in that changes to code under the license must be disclosed.)

We soon guessed that the mysterious partner was Microsoft, as the folks from Redmond had made recent announcements about supporting Java technology, and we could think of no one else who would have any interest in the code under these terms.

There followed a hilarious series of calls. The Rational folks refused to identify the partner by name, and we kept poking them, until they finally let slip the magic phrase, “Microsoft.”

We pointed out that if they took the code in the form in which it was first released, then they could not make use of the many bug fixes that had since been incorporated into it. They would have to recreate well over a year’s worth of contributions by us and our contributors. Nevertheless, they persisted in trying to acquire the rights, saying they were willing to fix the bugs on their own.

IBM’s last offer was to make the code available at no charge under a commercial license, provided that the mysterious partner agreed to pass Sun’s compatibility test suite. We never heard from them again.

I considered this high praise indeed. Microsoft considered Philippe’s work of such quality that they were willing to include it in their products in preference to all the other Java compiler technology they could have acquired. Given that this was Microsoft, money was not an issue. They were looking for the best.

eBay’s Appreciation of Jikes

One day back in 2003 or 2004 I got a call from an IBMer on the eBay team. eBay’s code base in Java had grown so large that it was taking over an hour to compile a new release of their system using Javac. They were aware of Jikes’s performance and had expressed an interest in trying it out.

Though I hadn’t worked on Jikes in a couple of years, I offered to take a look, and within a day or two found myself in possession of several million lines of eBay’s Java code. I spent a little time working on it, until I mentioned to a colleague that I had eBay’s code base on my trusty little laptop. He said, “Good luck, Dave. Just make sure you don’t lose that sucker, because if you do then your IBM career is toast.” I then erased the code and told the folks at eBay that the project was beyond my skill level. I think they wound up throwing more and more hardware at the problem.

I consider eBay’s interest in Jikes a strong endorsement of Philippe’s accomplishment.

IBM Impact

It took the two of us three years of work before the code was released, and what turned out to be an additional year of work on Jikes as an open-source project. (It was understood that IBM needed to do more than just publish the code; IBM also had to demonstrate that it could run a project under open-source rules.)

That works out to eight programmer-years, with both programmers holding Ph.D.’s, so the cost to IBM of creating the work that was given away at no charge was well over a million dollars.

What did IBM get in return?

  • Demonstration that IBM was committed to Java technology based on open standards;
  • Demonstration that IBM was willing to donate a substantial body of code, providing proof that IBM would be more than just a consumer of open-source code;
  • The good will of all those Java programmers, including many of them IBM employees, who didn’t have to wait endless hours for Javac to finish compilation;
  • Widely accepted, field-tested code that was incorporated in part into the Eclipse Java compiler (Note also that the Eclipse compiler relies on the Jikes Parser Generator).

All this was possible only because of what Philippe had accomplished.

Lessons Learned

The main lesson I learned from the Jikes experience is that it is difficult for a large research establishment to evaluate its own new technology. Reputations tend to become more important than an honest evaluation of the work itself.

I think one of Irving Wladawsky-Berger’s (IWB’s) major contributions to IBM was his sponsorship of alphaWorks, which made new IBM software technologies available at no charge so IBM could learn whether the world found the technology useful.

It was alphaWorks that made Jikes’s reputation. Users could try out the code and make their own judgment. Those users didn’t know of, or care about, PBC (Personal Business Commitments) ratings and such. They just looked at the work itself.

This lesson was best summarized by R. P. Feynman in the concluding sentence of his Appendix to the Challenger Report:

For a successful technology, reality must take precedence over public relations, for nature cannot be fooled.

There is another lesson to be drawn.

If it is so difficult to evaluate new technology, is it any less difficult to evaluate the people who create that technology?


2 Comments

  1. laidoff IBMer
    Posted April 4, 2009 at 10:31

    Wow – even Fran Allen’s input couldn’t save someone. That is REALLY saying something extraordinary about the state of IBM Research.

    IBM is not the company it used to be. It’s all downhill from here.

    Sell your stock. And if you still work at IBM US, hone up that resume and get it out there.

    And write your senators and representatives to ensure that IBM doesn’t receive a PENNY of stimulus money after laying off over 9000 US workers after announcing a stellar 4th quarter of 2008

    • Posted May 1, 2009 at 12:20

      I was teaching our intro CS course in Java when jikes came out. We were using emacs as our “IDE” and when I switched settings so the compiler would be jikes instead of javac, the compiler ran so fast the students didn’t believe it did anything.

      I know I’ve seen studies that examine what happens to your brain when it has to wait for something more than a certain amount of time, and I can say that students stay “in the zone” when using jikes as their compiler. It’s so fast, it doesn’t give your mind time to stray.

      Dave, you and Philippe did an incredible job!

2 Trackbacks

  1. […] As proof of this, consider that Microsoft could have taken the Apache code at any time during the past decade, forked it, and created their own httpd server, yet they have never done so, since they know their programmers cannot match the skills of the Apache team. It’s also worth noting that Microsoft could not have forked Jikes in the same way, due to its viral nature, and so tried to acquire the rights to the Jikes code under a commercial license, as described near the end of my post A Programmer’s Programmer: Philippe Charles. […]

  2. […] Philippe Charles and I worked full-time on Jikes for almost four years. We had time to discuss other issues, especially while waiting as our chief competition, Sun’s javac compiler, slowly made its way through Sun’s large test suite. […]
