Compiler Design - Terror in the classroom
Paw Prints: Writings of the maddog
I was terrified! It was my second term of teaching at Hartford State Technical College, and I had to teach the course on compiler design!
Compiler Design was the course that I almost flunked in undergraduate study, and when I took it for my Master's Degree I almost flunked it again!
For a long time I did not understand why so many people struggle with this course, but finally I realized that it is typically the first time coders do some programming where you are not just adding numbers or scanning a string of characters, but instead you are transforming a set of input characters into another completely different set of output characters. Sometimes there are just a few input characters that generate a few output characters, but often there are reams of output for a little input, or reams of input characters, and little output. In any case, it is a transformation, and this seems to be what throws people off.
Of course the first time I took the course it was not “compiler design” or even “compiler theory”, it was “compiler BLACK MAGIC!”. I still remember sitting at the keypunch machine, with my boxes of cards holding my “compiler” for this tiny, stupid little language our professor had created. I think my program consisted of about 15000 “conditional IF” statements in a row (oh, I did not tell you that I was coding in FORTRAN...great character handling in FORTRAN...said very sarcastically). Why was I not coding in “C”? Very simply, “C” had not been invented yet.
I still remember the day that I was sitting at the keypunch and my roommate came in waving a few sheets of paper and said “Look, there is this concept of a state-table-driven-compiler!” After reading the paper I remember crying as I threw away 14000 of my “IF” statements.
Another reason why a lot of people struggled were the texts on compilers. I will not mention the title or the authors of the worst of these, as the authors might still be alive and I do not want to embarrass them. I remember that the two books that I had to buy on compilers cost about 100 dollars for the pair, and this was back in 1969. After I had finished my undergraduate work the two books were combined into one that cost about 50 dollars....but after combining the two books, neither of the authors could understand what the other had written. Nor could anyone else. It took a few more editions of the book until it was readable.
So there I was, setting out to teach the course that (as far as I was concerned) still a black hole and a struggle for me to comprehend.
Fortunately I found a good compiler book, “Compiler Design Theory” by Richard E. Lewis, Philip M., Rosenkrantz and Daniel J. Stearns, which immediately became “Lewis, Rosenkrantz and Stearns” in my jargon. It was a great book back in 1976 (the year before I started teaching) and even though an initial search of Amazon could not find the book, Amazon still contained two outstanding reviews
[Ed. Note: through a fluke of fate I found a BRAND NEW COPY on Amazon, and purchased it.]
What I really liked about the book was the pseudo-code listing of a compiler for a simple language. Turn the pseudo-code into real code, and your compiler probably worked. Great for first year students.
So I waded into the middle of compiler design, and I found that the way you really learn something is to force yourself to teach it. As a student you can skip over the parts you do not understand and hope that it will not be on the test. As a teacher you have to understand EVERYTHING well enough to explain it in a simple way to some student (who in turn hopes that information will not be on the test...).
I have taught compiler design five times. The last time was in 1986. About twenty years after that I challenged myself to give a simple and short explanation of how a compiler worked. I explained the workings to a group of students in fifteen minutes. Did I give them all the “bells and whistles” and in-depth knowledge? No, but they had a fairly good idea of what goes on inside a compiler. Even after 20 years I had not forgotten.
Time has moved on, however, and now I am looking into compiler issues again. My project with Linaro and ARM to make all of the GNU/Linux code “64-bit safe” is now morphing into a larger project of “make all the code run faster”.
I should explain that I am not looking at issues in the compilers themselves. I have the utmost faith in the engineers that write those compilers. What I am looking at is whether or not the software programmers who use those compilers are aware of all the new optimizations which have gone into the compilers over the years, and if they are taking advantage of them.
I have pulled out new documentation and seen compiler optimization switches and issues that did not even exist in 1986, when I last looked at them, and this is true of any compiler, not just the GNU ones.
Some of these new techniques are things that have been researched since 1986. Some of these techniques (such as out-of-order instruction scheduling) have come about due to new features in the execution of the hardware.
Therefore over several future blog entries I will be looking at optimization issues. I will probably start out with optimizations that are more or less independent of the hardware architecture (such as moving fixed assignment statements outside of a loop) to things that are very hardware oriented.
Along the way I will interject some other blog entries based on design and algorithms so non-developers will not become bored.
Finally, it would not be a “maddog blog” if I did not include a programming story from my past, only this one is not from my past, but the past of a friend of mine, Mike Gancarz (Hi Mike!).
It was many years ago and Mike was a young engineer at Digital Equipment Corporation. Mike was chartered to come up with one of the first window managers for the upcoming commercial release of the X Window System on DEC's Ultrix Operating System. Mike's project was calle “uwm” for U*x Window Manager.
Mike was trying to figure out the best way of reading the start-up file where the user could specify the attributes of the window system and their initial values, and as the window manager started up, these attributes would be read from the file and the values applied. Mike thought about writing a parser in “C” code, but instead used two Unix tools “lex(1)” and “yacc(1)” to do the lexical analysis of the data and the syntactical analysis. From there, instead of generating binary code, Mike would simply plug in the values into the X Window Manager environment and he was “done” with that stage. The code produced by lex(1) and yacc(1) would actually scan the file and produce the desired result.
Mike was not sure this would work, since in his mind lex(1) and yacc(1) would not generate as efficient code as a hand-crafted “C” program, for instance, but he felt that Lex and Yacc could generate fairly good code and then he could go in and “clean it up”.
What Mike found out was that lex(1) and yacc(1) generated very efficient code, and code that was easy to update and change as the start-up “language” changed, unlike hand-written “C” code that might be more problematic to change.
Mike never did change the code to a “hand-crafted” solution. lex(1) and yacc(1) were fine. These days lex(1) and yacc(1) have been replaced by flex(1) and bison(1).
If you would like to investigate this interesting field, there are lots of texts on the Internet, and not the least is the FREE (as in free beer) text called “Compiler Construction using Flex and Bison”comments powered by Disqus
Linux Foundation's big event celebrates the 25th anniversary of Linux
Competitors get in the game with RHEL without Red Hat
Security researchers have already notified Microsoft; some fixes are available
The company is collaborating with Google and Intel to use Kubernetes as an engine for Fuel
Customers can take a free test drive of SLES for HPC on the Azure Cloud
San Francisco-based chip company announces their first fully open source chip platform.
The whole distro gets rebuilt on glibc 2.3
Ubuntu Vendor tries to solve app packaging and distribution problem across distributions.