C++ Considered Harmful


Every now and then somebody takes exception to my very negative remarks concerning the popular programming language C and its derivatives, claiming that it's really nothing more than a matter of personal preference. This essay offers objective criteria in support of my position. If after reading this you still disagree with me, please take the time to tell me the objective basis for your position.

Some 25 years ago I made my income -- and a good income at that -- writing embedded systems for the nascent security market. At that time I was finishing up a fairly large program, involving a virtual machine and multitasking operating system that I designed for the product, and some critical timing loops in which the software directly generated tones with FCC-specified amplitudes and frequency components. Think of all those whistles and buzzes you hear from a modem, only created entirely in software on the 2 MHz main CPU with little support hardware, interleaved with numerous other processes, all in one giant program. That kind of stuff is my stock in trade, and I do it well. Last month I walked into a business that is still using the code I wrote back then.

This program was the largest I had written up to that time, all in assembly language enhanced with macros (think: just like C), and it took a lot of time to debug. After I got it working to my satisfaction ("no known bugs"), I turned it over to the client to exercise with real-world data. Figuring they would probably find some things that needed fixing, and also looking forward to product extensions, I set about building a "lint" tool (enhanced macros) to warn me at compile time when I committed some of the more common coding errors in this program, then ran my debugged code through it to test the tool on known good code. It turned up the usual easily fixed mistakes in the tool itself, but there were some 15 lines in my supposedly clean code that it kept reporting as errors. I struggled for some time to find what was wrong with the lint code before I looked closely at what it was reporting -- and they really were bugs in my code, ones I simply had not yet encountered.

C is essentially a high-level assembler for the PDP-11, slightly more strongly typed than pure assembly language, but not much. Anything you can do in assembler, you can pretty much do in C without the compiler complaining much. At the other end of the spectrum, Pascal (and its derivatives, Modula-2 and Ada) is very strongly typed. You can still write pretty much any program in Pascal that you can write in C, but you must pay a lot more attention to data types, because Pascal won't let you do address (pointer) arithmetic, add characters, implicitly discard function results, or do bit manipulation on numbers. Modern programmers are beginning to recognize that strong data types are their friends: C++ is (ever so slightly) more strongly typed than pure C, and Java substantially more so. Lint adds some of that type checking to pure C, but in a haphazard way.

The insight I gained 25 years ago was a profound paradigm shift for me. Like all programmers, I like the freedom to code what I want the way I want to do it. No wimpy C for me; I wrote macro-assembly. And I paid for it dearly.

The next program I wrote for that company came after I did my PhD dissertation on compiler optimization, so I wrote a Modula-2 (M2) compiler for the single-chip microprocessor we were using. Knowing how long such programs take, they scheduled their test team to start one year after I began coding, which was realistic. I finished in less than three months, because I was using strongly typed Modula-2 instead of untyped assembler macros. That included debugging the compiler.

A few years later I went on to write another compiler in a different language. One day I was feeling particularly frustrated at the slow progress, wondering why it was going so much slower than coding in M2 -- then I suddenly realized that it was because I wasn't coding in M2. I was using HyperTalk, and while HyperTalk doesn't look anything like C, this implementation was weakly typed the way C is. I made better progress in Visual Basic (surprise! It's more strongly typed than C), and an astounding 100+ lines per day during the past year while coding mostly in Turkish Demitasse, a very strongly typed language.

It's not only my own experience that supports this observation. About a decade ago, when M2 was still a commercially viable programming language, one of the papers presented at an international Modula-2 conference reported on a study of programming bug frequency. I wish I had kept my copy of the proceedings, but the major finding of this research was that C programmers made on average six times as many errors as M2 programmers, when all participants were using their own preferred language. The researchers gave a variety of typical programming tasks to each participant, then analyzed the coding bugs introduced in the process. A significant number of the C errors were simply not possible in M2, and many more were caught by the M2 compiler but not by the C compiler; many of the rest simply occurred much more frequently in C code than in M2 code. When you consider that debugging is typically 70% or more of development time, this makes a C program take anywhere from two to four times as long to develop as the same job in a better language.

Another research project from about the same time, I believe at the University of Tasmania, tracked the emerging Modula-2 standard with a compiler, which the researchers used to test the language ideas being proposed to the standards committee. The implementors reported the curious observation that code written in M2 compiled smaller and ran faster than comparable code originally written in C, even though the M2 compiler merely translated M2 source code into C source code and used the same C compiler as the students writing directly in C. Here is a clear case where the impact of the language on the programmer is the only significant factor explaining the difference; normally, source-to-source compilers introduce (rather than remove) inefficiencies. The researcher attributed the discrepancy to the fact that C is a low-level language which encourages bad programming habits in its users. The C compiler of course doesn't care, but it can produce better code when these low-level "optimizations" are not used, a fact well known to those of us familiar with code optimization in compilers.

Over the years I have carefully watched every comparison of the C programming language to anything else. Especially while there was significant competition among languages, comparisons were published from time to time in the magazines programmers like to read, such as Dr. Dobb's Journal and Byte. Without exception, in every comparison over every metric (except programmer popularity), C fared worse than the competition. Usually this was accompanied by some lame but hopeful explanation by the author, who obviously preferred C, about how they expected C to show better as compilers improved -- and then they quit publishing comparisons. I saw a similarly poor showing for Apple's version of Unix (OS X) compared to the classic Mac OS (the published comparisons stopped with Unix still losing the contest), but that's another story; I suspect it may be partly because Unix systems are written in C, while the original Mac OS was written in Pascal.

Why is C so bad? I have a theory that it's the lack of strong data types. You can do just about anything you want in C, without stopping to think if it makes sense. You can do the same things in a strongly typed language, but only after thinking through how to assign types to the data. The longer you think about how to solve the programming problem, the better its chance of running with fewer bugs. For a while I suspected that was what gave Object-Oriented programming (OOPS) its advantage over pure C: it's so hard to force some programming jobs into the OOPS model (it can be done, but with great difficulty) that it requires a lot more thinking up-front to achieve it. It is this extra thinking, not the OOPS model, that confers the advantage.

Thinking is hard work. Nobody -- not even programmers -- likes to do it if there's an easier way. C is simply easier. You can write code faster. And then take very much longer finding the bugs. Or failing to find bugs the compiler of a better language would catch for you.

In the early days of the Macintosh, most programs were written in Pascal, because that was all Apple made available. Programmers prefer C, so third-party C compilers soon became available. Within a few hours of using a new program, I could tell you whether it was written in C or Pascal, just from the frequency and types of its bugs: bugs that C encourages and Pascal inhibits, bugs like buffer overruns, which today are responsible for a great many of the security holes in Windows and Linux. Because the subroutine calling protocols were different for the two languages, I could verify my guess by looking at some disassembled code (as far as I could tell, I was wrong only once).

I used to tell people "I think C is a wonderful language, and I hope all my competitors make full use of it." I don't say that any more, because I (and the people I need to explain computers to) need to use those C programs. Sigh.

What about C++? Is it any better? Not really, because it still has everything that makes C a bad language. If you restrict your code to pure OOPS, the compiler will give you a little bit of help, but not much -- and it won't tell you if you drop back into any of the bad habits of C, because C++ is (almost entirely) an extension of C. You are on much firmer ground writing in Java, because that language has eliminated most (but not all) of the really dangerous language features in C. Unfortunately, Java is a proprietary product of Sun Microsystems, and Sun keeps a firm grip on it, preventing it from becoming a usable system programming language that might have a chance of replacing C. Oh well.

Is there any hope for redemption? Is there any hope that America will give up its love affair with gas-guzzling large cars and SUVs? Not much chance, and for the same reason in both cases: individual autonomy. When gas prices reach $10/gal (they will; the supply is limited and will be gone in a few short decades), SUVs will get 20 mpg instead of 10, but people will still buy them (and maybe scrimp on bigger-screen TVs). When software development costs become astronomical, companies will just send their programming jobs to India (where programmer time is cheaper) or Russia (where they know better than to use C). Oh wait, that's already happening. Do you want your programming job back? Learn a more productive language than C, so you can compete in the global economy.
 

Tom Pittman, 2005