Father Of Decompilation

Program-Transformation.Org: The Program Transformation Wiki

The Father of Decompilation

maury.jpg maurysig.jpg
Permission Requested. IEEE Transactions of Software Engineering. Figure courtesy of Bill Caudle.

This page is dedicated to the memory of Professor Maurice Halstead, the father of decompilation. The page contains information of decompilation projects directed by Professor Halstead during the 1960-1976 period. Thanks to Bill Caudle for all the information and anecdotes shared. Bill worked on decompilation projects directed by Professor Halstead between 1965 and 1976. Throughout this page, unless otherwise noted, all text in quotes are Bill's words. sort
Professor Halstead is best known for his work in Software Science, an experimental and theoretical discipline concerned with measuring properties of computer programs. In the 1960s he wrote one of the first books on compiler implementation.

Maurice Halstead, referred to fondly as Maury, received a BA degree in physics-meteorology from the University of California, Berkeley in 1940, an MS degree in aerological engineering from the Naval Academy Postgraduate School, Annapolis, MD in 1943, and a PhD degree in physical science from Johns Hopkins University, Baltimore, MD in 1951. His resume describes his professional career.

"Maury was a true experimental scientist, pioneer and visionary in the field of computer software. At a time when software developers were still arguing about the viability of higher level languages, Maury committed part of the resources of a Navy development project to create the first self-compiling compiler (Neliac) which was then used to complete the project on time and within budget. At another time, with a team of 30 people working on the Global Weather conversion project apparently in schedule trouble, he took the gamble to cut the project team staff in half (rather than increasing the staff). The project was completed on time due to releasing the productive members' time to productive work rather than assisting junior members with their problems.

"Maury's background in physics, meteorology and the physical sciences provided him with insights and interests that ultimately led to the development of software metrics which he initially called 'Software Thermodynamics.'

"As a boss, Maury could be the epitome of a father figure: he was supportive, interested, and always made a person feel like the ideas that he fed to them were their own (at least this was the case for me). He could also be an aggressive politician and created enemies as well as friends as a player in the development of computing capabilities of the U.S. Navy."

The following sections describe three decompilation projects for a variety of machines. The aim of these decompiler projects was to aid in the migration of software to newer/successor machines, hence eliminating part of the time consuming task of rewriting code for the new machines. "A primary requirement was to be able to decompile from binary (or more specifically using binary card images as input)."

The D-Neliac Decompiler (1960, U. S. Navy Electronic Labs)

The Neliac language was developed at the U. S. Navy Electronic Labs (NEL) in 1958 to support a Navy systems project. The compiler was developed to reduce the increasing costs of these systems which previously used assembler/machine language. Prior to the first decompiler, the first Neliac compiler was written in assembler by Maury and Lieutenant John White, and perhaps others. The first self-compiling Neliac compiler was written by Ensign A. E. Lemay. Neliac was a dialect of Algol 58. Neliac supported low-level code as well as high-level language constructs. These first compilers for the language were developed in 1958-9 under the direction of Maury Halstead.

In 1960, Maury directed a project on decompilation mainly to show the usefulness of the Neliac language as a Universal Computer Oriented Language, as well as a problem-oriented language. Joel Donnelly and Herman Englander implemented the D-Neliac decompiler for the Univac M-460 Countess Computer while on Maury's staff. "Each time Joel would bring an 'impossibility' to Maury, they would sit down and figure a way around it." As Herm noted, "the difficult we do immediately -- the impossible takes a little longer." The D-Neliac decompiler was an operational decompiler that decompiled Univac M-460 binary code into Neliac source code. There was also a CDC 1604 version of D-Neliac.

In the summer of 1962, Bill joined NEL and worked for Maury developing an earth atmosphere simulation program. After returning to NMSU for one more year of graduate studies, Bill returned to NEL in 1963.

The IBM 7094 to Univac 1108 Neliac Decompiler (1963-7, Lockheed Missiles and Space)

With Maury's move to Lockheed in the fall of 1963, and with the introduction of newer and faster machines, the IBM 360 and the Univac 1108, Lockheed planned to move its scientific applications to the Univac 1108 and its business applications to the IBM 360. The conversion to 1108 was done via decompilation to Neliac of assembler programs and LIFT/SIFT of Fortran programs. The conversion to IBM 360 was by manual conversion and 360 emulation of IBM 7094/1401/1410 programs.

"The first version of the IBM 7094 to Neliac Decompiler was from IBM 7094 binary card images to IBM 7094 Neliac. Then, after the Univac 1107-8 compiler was bootstrapped from the IBM 7094 version, the decompiler was modified to translate to Univac 1108 Neliac. Since the resulting programs were running on a different system and instruction set, we did not have the luxury of inserting crutch code (i.e., machine code), even though Neliac allowed it. What we did do, however, was to modify the Neliac compiler, which we then called the Neliac "recompiler", to compile in ways to assist the correct operation of the decompiled code. One of these was the order of evaluation of arithmetic. That was one of the first tasks I accomplished after enhancing the Neliac compiler to include compilation of algebraic expressions and exponentiation. The original compiler allowed only very simple arithmetic. The recompiler used this simpler, and predictable, order of evaluation. This allowed such idioms as the conversion of integer to floating point to work as they had on the IBM 7094 (where the conversion was performed by a load of the integer, ORing the result with a floating point exponent, followed by a floating point add of zero). Of course, maintaining the order of arithmetic is sometimes important, and that was the primary emphasis of the recompiler."

Original team members of the advanced software development group working for Maury on the Neliac compiler and decompiler projects included: Bill Caudle, Lambert Lui and Gordon Pelton on the decompiler; Bob Stelloh and Alan Howard on the compiler. Al Fuhrman developed the Neliac compressor/reformatter. Others were involved in the later stages of the group.

The Lockheed decompiler was used in several demonstrations that included Phillips Petroleum, Ontario Hydro and the US Air Force. "One of the more humorous decompilations I ever did was the decompilation of a Monte Carlo application that included a random number generator. The converted random number generator was required to produce the same sequence of random numbers, and the Monte Carlo application was required to generate exactly the same results on the same inputs."

The Lockheed decompiler was used on a sizable project with Lockheed as the subcontractor to Univac. This contract was won after Bill put on a demonstration to the Air Force in Washington DC. Lambert Lui supported Bill in this effort, and Maury attended the actual demo (to assure our success, ashtrays were placed in the computer room for the Air Force Colonel. The team was warned not to ask the Colonel to extinguish his cigar.) The other vendors in the competition were IBM (using strictly conversion) and CDC (using code simulation).

Starting in the fall of 1967 and completed in about May 1968, the US Air Force Global Weather Center's weather forecasting programs (125 programs written in assembler and FORTRAN) were converted to run on the Univac 1108 at the Offutt AFB in Omaha. Bill recalls having encountered a bug in one of the decompiled programs -- the largest assembler program in the set. This program was 24K lines of code and 6K lines of data (corresponding approximately to a 180K byte program). Using a set of input data picked by the Air Force as part of an acceptance test, the program generated by the decompiler would crash. After exhaustive manual checking of the decompiled code, a parallel run of the original program on the IBM 7094 was requested. The result? The parallel run showed that the original program running against the same test data would crash too -- the bug was in the original program, and was faithfully reproduced in the decompiled program.

"We were very successful in decompiling programs that had resulted from FORTRAN programs. These would run with little manual change. The IBM 7094 to Neliac Decompiler recognized FORTRAN calling sequences and replaced them with a more compact version. For pure assembler programs, the decompiler was estimated to remove 80-90% of the effort over manual conversion. At that time, we estimated the cost of conversion to be less than U$1.00/instruction decompiled. This was substantially less than the estimate at that time for manual conversion of assembler programs."

Attempts at Marketing Decompilation Services (1969-72)

Bill and Maury were part of a company in Lafayette, IN, formed in 1968-69 called Teknatronic Applications, Inc. (TAI). Company members included three full-time staff. A group of professors (Maury Halstead, Jay Nunamaker, Samuel Conte and Andrew Winston all of Purdue University, Daniel Teichroew at University of Michigan and Glenn Graves of UCLA) were board members and acted as TAI consultants. As one of the business ventures at TAI, Maury had purchased a license from Lockheed to market the IBM 7094 to Neliac decompiler as a conversion service. Marketing was attempted through the Univac Information Systems Division (ISD) in Chicago. A demo at the Chicago ISD was made. (As an amusing aside, someone came in with a photocopy of 4 cards of binary code and asked Bill to decompile this code. The decompiled Neliac code showed that the cards were those of a simple numeric routine, whereas they had been thought to be a complicated system interface.) This brief attempt to market the decompilation service was a failure in that it produced no business for TAI.

When the primary investors in TAI failed to produce sustaining capital for the TAI, it was abandoned. In 1970, the company staff and university associates established another company called Combinatorics (Also in Lafayette, IN). In 1973, Combinatorics was bought by Mathematica, Inc., of Princeton, NJ.

Throughout this period, Maury supervised a PhD thesis on decompilation at Purdue University, that of Barry Housel (awarded 1973). At the same time, this was the period in which Maury began his formal investigation of the complexity of programs, resulting in what are now known as the "Halstead Metrics".

The Sperry*Univac 494 to 1100 Inverse Compiler (1972-6)

The success of the Global Weather project with Univac in 1968 resulted in a contract in 1972 between Combinatorics and Univac. Maury and Bill were contracted to lead the effort for the Univac 494 to Series 1100 decompiler, which Univac preferred to call "Inverse Compiler". Maury was a consultant to Bill on a one day/month basis. Bill was the technical leader of the project for the two years of the contract with Univac. During this contract, a feasibility study, functional specification, design and initial implementation of the decompiler were performed.

The Sperry*Univac Inverse Compiler was made up of the following modules: input processing, basic block generation, local data usage analysis, control and data flow analysis, data mapping, reporting, intermediate form preparation, level transformations, and target language production. The model for decompilation is distinguished from prior models by the extensive use of analysis techniques in an attempt to derive data structures. The latter was in recognition of the fact that data conversion is of primary importance. Additional details on the analysis techniques are given in the previously unpublished paper On the Inverse of Compiling by Bill Caudle, 1980.

The minimum input to the inverse compiler was the binary code in the absolute or relocatable load modules. Additional input in the form of the associated assembly and map listings for the program (produced by the ASM or SPURT assemblers) were optional; these listings assisted in the conversion process. The reporting module flagged missing libraries and parts of the program that would be hard to decompile. Using the input processor, the user could provide additional information to completely define the instruction blocks, data blocks, and subroutine definitions. Any information that the decompiler obtained by analysis could be modified or enhanced by the conversion specialist. However, no changes were allowed to be made to the binary programs, which remained unmodified. The enhanced input files were then input to the decompiler which produced COBOL code. The optimizer was optionally used to improve the readability of the generated COBOL code. The 1100 series sales memo from 1976 announces the 494 to 1100 inverse compiler.

The following figures were reported in internal documents:

  • Figure of merit: 67 to 87%. The figure of merit is the removed cost of resources required to successfully convert a program with the decompiler in ratio with the cost to convert the program manually.
  • Code expansion: 1.79 (inhouse evaluation: 1.6).
  • Translation cost: $0.94 per line of assembler.
  • Decompiler cost: the cost of a decompiler with a high figure of merit was similar to the cost of a modern high-level language compiler.
  • Computer cost: the above per line costs included computer cost at approximately U$160/hour.

Subsequent to the 2 year contract, Bill went to work with Sperry*Univac in Roseville, MN, and continued on the decompilation project for another 2 years. About 10 people worked on the decompilation project in Roseville, with as many at times in Salt Lake City, UT. Work on decompilation in Roseville was in the area of loop analysis, interval analysis, global data analysis, and dictionary techniques. The Salt Lake City group provided input processing, conversion to internal forms, level transformations, and conversion to target language. Effectively, this separated the project into that related to decompiler techniques and that more directly related to compiler techniques.

In 1975, there was a version of the decompiler that generated PLUS (a PL/1-like language for systems programming) code. Univac did a benchmark test on decompilation in Zurich using the PLUS version. This was the first conversion done outside their own environment. Bill led the team in Zurich that translated 8 programs of 5600 lines of assembly code. These programs were picked from the most difficult type of programs to be converted and resulted in many improvements in the process. Part of the results of this field test were the recommendation to use COBOL as the target language.

As a result of this field effort, in late 1975/76 a version of the inverse compiler was developed that generated COBOL code. Both PLUS and COBOL target languages were generated from the same internal representation used in the decompiler.

Copyright 1998 Cristina Cifuentes and Bill Caudle, All Rights Reserved.
CategoryDecompilation Wiki:CategoryHistory Wiki:CategoryPeople