C++ for Java Programmers

by Tom Pittman

"C is the grade we give students who have not earned an A or B; C++ is a little better, but still not good enough for a B."

The C++ language is aptly named.

As Java programmers you are skilled in the best, most robust and productive programming language commercially available today. It unfortunately is both incomplete (you cannot write an operating system in Java) and slow in the sense that there are no certified Java compilers that generate efficient native code. We hope eventually to develop Turkish Demitasse, a dialect of Java that solves those two problems, but until that happens, C++ is the language you must use for "real" programs. C is like a surgeon's scalpel, very sharp, and suitable for doing brain surgery or slashing your wrists. Unfortunately, there is no safe handle to hold it by, so most people cut themselves up pretty badly. Java is C++ with most of the problems fixed, but they fixed too much. It's more like a butter knife, hard to cut yourself with it, but you also cannot cut your steak. In this course we will try to help you learn how to hold the razor blade that is C so that you can keep the self-inflicted wounds down to a few nicks and scratches.

The important thing is, Try to think in Java. You will write better C++ with fewer problems and bugs. If you use all the "cool" gadgets that C++ offers you, your programs will become exceedingly difficult to read and maintain. Don't do that. Small student programs will work fine, but the big stuff you get paid to write out there in the Real World will quickly get out of hand.

In the not-too-distant past there were a number of programming languages better in every way than C/C++ (more robust in the sense of more strongly typed, and also generated faster and more efficient code, as well as being easier to learn, plus fewer programming errors when writing in it), but those languages have all died; C/C++ and Java are the sole survivors in the commercial world today. Gresham's Law ("Bad money drives out good") strikes again. sigh

C++ is a very big language, and there is no way that you will master all of it. I have looked at a lot of books and references, and all of them have substantive errors. 14 years after the standard was published, the C++ compilers themselves are still full of bugs -- probably because they are implemented in C++ (it usually gives a compiler a good workout and reduces bugs when it's implemented in its own language; that is probably true of C++ too). If you limit yourself to a Java-like subset of C++, you have a much better chance of avoiding the pitfalls and the compiler bugs. Plus, your programs are less likely to break when they update the compilers, replacing old bugs with new and different bugs.

This document was written before 64-bit processors became popular, so everywhere I refer to pointers as "32 bits" or "4 bytes" you should think "64 bits" and "8 bytes". Execution speed is about the same, but 64-bit pointers take up twice as much space as 32-bit, which makes your data (and some code) bigger and possibly slower due to larger cache requirements (more likely to fault and take extra time to reload, but not much).

Topics:

What Is the Same
What Is Subtly Different
Bigger Differences
Radical changes
   Preprocessor
   Pointers and Arrays
   Operator Overloading
   Templates and STL
Other Resources

What Is the Same

All the normal statement types you use in Java work the same in C/C++.

Assignment in C/C++ looks and works the same as assignment in Java. Like Java, assignment statements have a value, so you can do multiple assignments in one statement the same in both languages, although I do not recommend it in either language. With two exceptions, all the expression operators you know in Java work the same in C/C++. The two exceptions are the two Java operators longer than two characters (>>> and instanceof), which have been added to Java; they do not exist in C/C++.

The program control structures in C/C++ and Java are the same. Simple statements are terminated by semicolons. Curly braces around a sequence of statements make it count as a single statement. Traditional C (before the 1999 revision of the standard) requires all variable declarations to precede executable statements, but C++ is like Java in allowing them to be mixed. Your programs will be more readable if you collect all variable declarations at the front of their methods.

Selection (if, switch) and loops are the same in C/C++ and Java. Java preserved all the strangeness of the C for loop, so you can use multiple control variables and/or leave out any or all of the three sections, and it has the same default behavior as in C. Try not to do that; it makes the compiler work harder to generate efficient code (or else it just gives up and generates inefficient code) and (more importantly) it makes your program much harder to read.

Method declarations and calls inside of classes look and work pretty much the same. C++ has some fancy ways to pass parameters (see below), but the Java standard pass-by-value is still the default. Polymorphism by method overloading (using the same method name with a different parameter type signature) works the same in C++ as in Java.

The Java type-casting syntax (new type name in parentheses in front of the expression whose type is being changed) was taken straight from C and still works the same. True casting (reinterpreting the same bits as a different, but compatible type) still works the same when casting a subclass expression as an ancestor class and for converting an integer to a different storage width, and pretty much the same going the other way. Type conversion (where the compiler adds implicit code to move bits around) such as between integers and floating point, also works the same.

Exceptions work pretty much the same, except that C++ is slightly more general in what kinds of types you can throw, and unlike Java, you do not declare which exceptions a method throws. That extra generality is not useful, and you can safely ignore it.

At the statement level, if you write code that would compile and run correctly in Java, it will pretty much do what you expect in C++.

What Is Subtly Different

C is weakly typed. C++ is slightly stronger, but not as strong as Java. Strong types are your friend: they enable the compiler to catch coding errors that might otherwise lurk unseen until the customer tries to do something you didn't think of during testing. That both makes you look foolish and is very expensive to fix.

Java has a proper boolean data type with two values, true and false, which are incompatible with any other data type. C++ has a bool (note the spelling) data type, but it converts silently to and from int, so in practice it behaves like little more than another name for int, with true/false as other names for 1 and 0. In C/C++ 0 is understood to mean false, and any non-zero numeric value is taken as true. You should pretend and code as if bool is a real data type, and your programs will be more readable and easier to debug, but the language and compilers offer you almost no protection from foolish mistakes. You will see a lot of commercial C code that checks for numeric or pointer nonzero implicitly without any inequality operator (as in if(val)...); I do not recommend the practice. Similarly, using the assignment operator (=) instead of the equality operator (==) inside an if expression is legal in both Java and C, but Java's strong boolean type will usually catch the error, while many C programmers consider it a laudable programming trick. Don't go there. Modern compilers offer no advantage to such tricks; good compilers never did.

Java and C/C++ both have several different sizes of integer; in Java they are carefully defined to be 8, 16, 32, and 64 bits in size, but in C/C++ their sizes are merely relative: short int is usually smaller than int, which is probably smaller than long int. Yes, you can't be sure of that; the language only promises that short int is no larger than int, which in turn is no larger than long int. How big those are depends on the implementor and the hardware it runs on. Usually they are the same size you are used to in Java. Characters (char) in Java are not quite a separate type, but compatible in every way with 16-bit integers; in C/C++ char is the smallest integer type, 8 bits on virtually every machine you will meet (there is no byte, but you can make your own using #define), and the character constants are (like in Java) just numbers assignable to any integer variable, and on which you can do any arithmetic operations (their value is the ASCII character code, as in Java).

C++ compilers attempt to resolve overloaded methods substantially the same way Java does, but the implicit type conversions available in C++ can lead to more opportunities for ambiguity, which the compilers will rightly reject as errors. A program that depends on a lot of method overloading is probably badly designed and will be difficult to debug and maintain.

In Java every class member is individually declared public or private, while in C++ there are public: and private: sections that apply to all declarations in that section. If you write it the way you would in Java, with a modifier on each declaration line, then insert a colon on each visibility word, it works in C++. Think of it as just a spelling change.

Subclasses may override parent class member methods in both C++ and Java, but in C++ you have to ask for that by declaring the method in the parent class with the keyword virtual. Otherwise the C++ compilers will assume that all methods are statically bound, which is slightly faster for method calls. Overriding subclassed methods is usually considered a virtue in OOPS; it's unfortunate that C++ makes it dependent on the presence of a keyword, the accidental omission of which can lead to subtle and hard-to-find bugs. I do not know of any good way to protect yourself from such errors except to deprecate overriding in general. Its benefits are somewhat overrated, and there are better and more obvious ways to get that kind of flexibility when you need it.

A class in C++ is about the same as a class in Java, except that Java classes are always dynamic (requiring the keyword new to allocate memory for their objects), while C++ classes never are, unless you explicitly use pointer syntax in the declaration (see below). Before OOPS, you could make class-like structures in C using the keyword struct, except you could not declare methods inside a struct, and it requires a semicolon after the closing brace. C++ extends the struct to allow methods and to specify other visibility besides the default public, and then added the class keyword to mean exactly the same thing as struct, except the default visibility is private. Apart from the default visibility, a class and a struct in C++ are exactly the same. A union in C is the same as a struct, except that the member fields all occupy the same memory space. This is used for specialized (and dangerous) purposes like redefining the bits of an intrinsic data type, such as to pick out the exponent and significand of a floating-point variable, or packing multiple characters into a single integer, and sometimes just to save space. C also permits you to define structs and unions that have no type names, only variables declared between the closing right brace and the following (required) semicolon. This is not valid Java syntax, and you do not need to use it.

There are some places where C++ uses a "scope resolution operator" (::) where Java would just use a dot (.), but these mostly occur in connection with the fact that C++ attaches slightly different semantics to methods defined outside the class body where they are declared. Method bodies inside the class body are typically understood by the compiler to be inline, while methods with only a header inside the class body need the scope resolution operator where their code is elaborated, to indicate which class they belong to; these out-of-body experiences tend to be compiled as subroutines. Of course a smart compiler can ignore this advice from the programmer and do something better, but most C++ compilers are lucky to run at all, let alone be intelligent about the code they generate.

In Java programs you must designate one class to be the main program, and it must have a main method with a particular signature. C/C++ also starts your program in main, but it must not be inside a class, and it should return an integer value (some, but not all, compilers actually enforce this), which is 0 if your program terminated correctly, and an error code number otherwise -- a vestige from its unix heritage, where that error code is used consistently to stop a shell script.

Bigger Differences

C is not object-oriented and has no classes nor objects. C++ is (very nearly) a proper superset of C, so while it has classes and objects and inheritance and (upon request) dynamic binding, you can also write functions and declare variables that are not in any class, just like C, and it will compile and run. In fact "functions" and "methods" are just two different names for the same thing, except that methods are inside classes and functions are not. I think it's an artificial distinction invented by object-oriented proponents to make OOPS look more different from its predecessors than it really is. Functions are exactly the same as Java static methods, in that you don't call them with an object reference, and this has no meaning in them.

All Java classes (except Object) are subclasses of some other superclass; if none is specified with the keyword extends, then the superclass is by default Object. In C++, if no superclass is specified (using just a colon where Java would use the keyword extends), then there is no superclass. To access an overridden method from a superclass in Java you use the keyword super; in C++ you use the class name with the scope resolution operator (::). These can mostly be thought of as different spelling.

In Java every public class must be in its own file, and system library classes are imported; the declaration order within each file is somewhat more relaxed than C. C++ (and to a lesser degree, C, because it has defaults for undeclared functions) requires every name to be declared before it is used; by convention this is done by means of header files, which are incorporated (using #include) into every compilation file that needs those declarations. Library functions are accessed by including their headers. Strictly speaking, you can put anything you want in the header files, or omit them entirely, just so long as the declarations occur before use and there are no conflicts; however, if you flout convention, people will sneer at your code.

Variable declarations can occur (in C++) anywhere in your program, so long as they are before their first use, as in Java. In both languages a variable declared inside curly braces is visible only within those braces, but C++ also lets an inner block declare a new variable with the same name as one in an enclosing block (hiding the outer one), which Java forbids for local variables. Don't depend on this smaller scope; it will make your program hard to read and debug. Put your variables at the front of the method, where anybody can find them.

Methods (and functions) should (and often must) also be declared before used. If the method declaration is fully defined within the class body (as in Java), you are done; otherwise you need a "prototype" for the method in the class body, and the rest of it later, outside. Functions (not in any class) need a prototype whenever they are called before the point where they are fully defined with code, or from a different file. A prototype is everything in the function specification up to but not including the first brace, followed by a semicolon. In the following example, the first line is a prototype (declaration), the second line is the definition:

int MyMax(int a, int b);
int MyMax(int a, int b) {if (a>b) return a; else return b;}

Although in this example the prototype is right before the definition (which is perfectly legal C/C++), normally you would put the prototype in the header file, and the definition in the main file. Then all the other program files that need to call this function can #include the same header file and have the declaration available. The linker will subsequently connect up all the calls in all the code files to the definition code in this file. If the program file you are putting this in is "MyProg.cpp" then its corresponding header file would by convention be "MyProg.h". The suffix tells the compiler what your intentions are: ".c" is C (not C++, which has slightly different compilation rules), ".cc" and ".cpp" are both widely accepted suffixes for C++, and ".h" is typically used for headers in both C and C++. Unlike Java, the rest of the file name has no significance to the C/C++ compiler; you can use any name you like, but it's a good practice to use the Java convention of naming the file for its class anyway, just to keep the confusion factor down.

You should be careful not to put into a header file any declarations that would cause compilation errors if they occurred twice in the same code, because it is perfectly legal (and often the case) for headers to #include other headers, which sometimes leads to duplications. Duplicate function prototypes are not considered errors; duplicate variables (except extern) are errors, and so is a second copy of a class or struct specification within the same compilation. Therefore you should put type specifications (including classes) and function prototypes in header files, and the code for the methods and functions, as well as the non-instance variables in the ".cpp" files. Instance variables are declared within the class specification; that does not allocate memory for them until the class is used as the type of a global or local variable, so the same class specification can safely appear in many separately compiled files, just not twice in one. See #ifdef below for a way to eliminate duplications.

Radical changes

The rest of this discussion deals with the language features where Java has no counterpart in C/C++, or else it is so different that you cannot think of the two languages as being "the same, only different." We treat these differences in separate sections, first on the so-called "preprocessor commands", then pointers, and finally operator overloading and templates.

Preprocessor

The original C compiler was designed to run in a PDP-11 minicomputer with only a small amount of memory (maximum 64K bytes, often much less); as a result the language is pretty minimal and depended on the programmer to tell the compiler how to make the code (for example by declaring frequently used variables to be register, which is preserved in modern C but considered only advisory). One of the features that gave C programmers considerable expressive flexibility without burdening the compiler was a "macro preprocessor" which essentially defined string substitutions that were made before the compiler saw the code. All of these preprocessor substitutions are specified by one-line commands that begin with "#". Note that regular C attaches no significance to line breaks (apart from the //-to-line-end style of comments, which was not in the original C standard, though many C compilers accepted it and the 1999 revision of the standard added it); otherwise only the preprocessor commands are line oriented. I believe most C/C++ compilers take out comments before the preprocessor sees the file, but you should nonetheless avoid "commenting-out" preprocessor directives except on a line-by-line basis; use #ifdef instead to surround blocks of unwanted preprocessor commands.

C++ preserves all the C preprocessor commands, but it is often not a separate pass through the program, just an early step in the compilation process. Using these commands makes a very large source file for the compiler; some compilers have an option that lets you look at the result, which is useful for debugging your preprocessor commands, especially if you do a lot of macro substitution. However, if you need to look at the intermediate file, you are over-using the preprocessor, and your code will be exceedingly hard to debug and maintain. Avoid that.

The following commands are still in widespread use; you should know and understand them:

#include <filename.h>
#include "filename.h"

Either of these two lines is completely replaced by the entire file so referenced. The angle brackets version looks for the file in the compiler's built-in library, while the quoted version looks in the same directory as the source file in which it is referenced. You are not limited to ".h" files except by convention -- indeed the ISO C++ standard has dropped the suffix for the standard libraries. Included files may further #include their own list of headers; this process is applied recursively until there are no more #include declarations in the file. You can have header files that #include each other, but you must be careful to stop an unending recursion loop by using #define and #ifdef (see below).

#define name some-string
#define macroname(p1,p2,p3) some-string using p1 and p2 and p3
#undef name

The #define command is the basic string-substitution command. After a #define like the first line above, every occurrence of name in the rest of the program (including inside other macro definitions, but not after the corresponding #undef command) will be replaced by some-string, which can be any sequence of characters, spaces, operators, etc., up to the end of the line. If the string is empty (that is, the line ends with the name being defined) then the name is simply deleted from the file everywhere it occurs. It is still defined in the macro processor, so it can be used in #ifdef commands to delete or select code (see below).

Unlike Java, C has no final keyword for defining constants; it is customary to use #define for that purpose. C++ has added the keyword const, which can be used like final in Java to declare constants, but old habits die hard. Besides, #define constants can be used in other preprocessor commands (see #if below), while const constants will not work there, because the preprocessor does its substitutions before the compiler has looked at any declarations.

The most general form of the #define command allows one or more parameters (so its use resembles a function or method call), the actual strings of which are substituted for every occurrence of those names in the rest of the line. For example, if you write the macro:

#define max(a,b) ((a>b)?(a):(b))

then somewhere else in the program write this line:

x = max(x+1,3)*2;

the compiler will instead see the line:

x = ((x+1>3)?(x+1):(3))*2;

Note that the parentheses are often necessary to prevent composite parameter values from interacting with adjacent code. Extra parentheses in an expression do not change the value and are sometimes required. If they had been omitted in this example, the compiler would try to multiply 3*2 before testing the result of the comparison, and if x+1 is greater than 3, it would not be multiplied. Even this definition does not have enough parentheses; notice what happens to the relational expression in the following macro invocation:

x = max(y&7,5);

which gets compiled as

x = ((y&7>5)?(y&7):(5));

Because comparison has a higher precedence than the bit operator &, this will compare 7>5 (which is true, which is equal to 1 because there is no honest boolean type) then bit-and that to y, effectively testing only its low bit. If y is odd then the low three bits of y are assigned to x, otherwise x gets 5. Before the macro was expanded, it looked as if the low three bits of y were to be compared to 5 (which would give a different result if for example y==9 or y==6). If you had instead coded this as an inline function, then the comma separating the two parameters would force the separate evaluation of the two values before making the comparison, and you would have the same effect as adding a couple more pairs of parentheses to the macro.

The #undef command terminates the substitution so that the name can be redefined for another purpose.

#if constexpn
#ifdef name
#ifndef name
#else
#endif

These lines control conditional compilation in C/C++. If the constexpn evaluates to zero (usually by consisting of constants specified in #define), then all the code between that line and the matching #endif (or #else) is simply deleted; if nonzero, then that code is preserved and the code between the #else and the #endif is deleted instead. The other two forms (#ifdef and #ifndef) can be used to test if the name was defined in a prior #define line (or not defined). The #ifndef preprocessor command is commonly used with #define in header files to ensure that declarations and #include lines are not duplicated, thus:

#ifndef __MyPackage__
#define __MyPackage__
... declarations...
#endif

The first time this sequence is encountered in your program, the compiler will determine that the name "__MyPackage__" is undefined and compile the declarations; all subsequent times the name will be already defined and the duplicate declarations will be deleted from the compiled code. Notice the double underscore characters in the name being defined. It is a common practice in C/C++ headers to use such double-underscore names for preprocessor #define names, but strictly speaking the C++ standard reserves every name containing a double underscore for the compiler and its libraries, so in your own headers a plain all-capitals name such as MYPACKAGE_H is the safer choice. All kinds of confusion and compiler errors can result if you start using names like these in the body of your program; the chances of using a name already used in somebody's header are higher.

Conditional compilation directives can be nested and often are. Another use for them is to select between different sequences of code for different platforms or situations. Many library routines are shared between C and C++ and conditionals are used to select the correct syntax in the special cases where the calling protocol is different. In other cases a common code base can be shared between unix/Linux, Windows and Macintosh platforms, with #define and #ifdef selecting between the different function and system call protocols. For a long time the baroque non-linear architecture of the x86 PC processors led programmers to define different calling sequences ("near" and "far") depending on whether the system supported 32-bit pointers or only 16-bit pointers; those days are largely gone, but the code has been retained. You will see it in the header files you happen to stumble into when your code breaks.

There are a lot more preprocessor commands, but these are the ones you need to know about to read and write professional C/C++ code.

Pointers and Arrays

It is said that pointers are the single most difficult C/C++ feature for programmers to understand and get right. That may be true, and if so, it is largely the fault of the language designer, who badly confused pointers and arrays. Neither C nor C++ has an honest array type; Java fixed most (but not all) of the problem, and mostly got rid of pointers at the same time. Actually Java still has a fairly tame form of pointer, called a reference. More on that later.

C/C++ has array syntax, but no array type. Instead, array elements are accessed by pointers. As a result, the pointer type in C has been corrupted to support array access, vestiges of which corruption linger on in the Standard Template Library (STL). C++ preserves this corruption, and tries to fix some of the problem with vectors; that may be better than what C offers, but not by much. The best advice I can offer is, use Java array syntax (which mostly works) and avoid pointer arithmetic like the plague that it is.

What is a Pointer? A pointer is a reference, exactly like the object references in Java. In fact, Java object references are pointers, pure and simple. A pointer is a small variable (typically four bytes in most of today's 32-bit architectures) which holds the address of some location in memory, where the real data is stored. It points to that location. Some pointers don't point to any memory location, they are NULL, which by convention is zero. There really is a memory location 0000, but we define pointers not to be able to point to it, mostly because zero tests are cheap in computer hardware, and therefore also in C (recall that zero, including null pointers, is considered equivalent to false in C).

In Java, because all objects are accessed through an object reference, when you want to access the data a pointer refers to, you use a dot (.) between the object and member names. C++ classes do not need to be at the end of a pointer, so the dot is reserved for the historical function of member selection, not pointer dereference. Instead, a C pointer is explicitly dereferenced by a star to the left of the variable name. This is a silly place to put a pointer dereference -- all other languages with pointers put the dereference operator to the right of the name, where it can be intermixed with member selection operators and array subscripting brackets while preserving explicit order, but in C that would confuse it with the multiply operator (other languages distinguish dereference by a different symbol, like "@" or "^") -- so C also has another combination operator which is equivalent to a star followed by a dot, but both on the right side of the name (which would be syntactically incorrect in C); this combo dereference+selection operator looks like an arrow (->) and is exactly equivalent to the simple dot in Java. Use the arrow instead of the star-dot combination when you need both.

Because C++ classes do not imply dynamic memory, it follows that any variable type (including integers, floats, and arrays) can sit at the end of a pointer -- or not. Java arrays are all dynamically allocated, so an array reference (the square bracket syntax) in Java also implicitly includes a pointer dereference, but the strong-typed array bounds checking means that there is really a more complex data structure out there, the details of which we are not told. In C/C++ everything is open and visible; there are no secrets. That's why pointer dereferencing is also explicit, and why you must learn it.

Consider the following fragment of C++ code:

void sample() {
class pair {public: int a; int b;}; /* 1 */
int x;     /* 2 */
int * y;   /* 3 */
pair p;    /* 4 */
pair * q; /* 5 */
x = 99;    /* 6 */
y = &x;    /* 7 */
*y = 88;   /* 8 */
p.a = 3; p.b = 5; /* 9 */
q = &p;           /* 10 */
pair r = *q;      /* 11 */
r.a = q->b;       /* 12a */
r.a = (*q).b;     /* 12b */
} // end sample

In this example, pair (line 1) is a data type; that line creates no variables, it only tells the compiler that a pair consists of two integers, named a and b. The next four lines declare one variable each: on line 2 x is a simple integer, exactly like in Java. Line 3 declares y to be a pointer to an integer, sort of (but not exactly) like the Java wrapper class Integer. Line 6 gives a value 99 to x, again like Java, but line 7 sets the value of y to be the address of x (the & operator is the "address of" whatever follows it). Now y points to x, so line 8 sets the value of whatever y points to (which we know is x) to be 88; y was dereferenced by the star and the result was the destination of the assignment, so x now has the value 88. You just need to keep in mind that *y is an alias for whatever y points to. If y is uninitialized, then you might be changing anything in memory, depending on what bits accidentally happened to be there. If it's code, your program (or maybe the operating system) just crashed (or soon will). Kaboom! If it's some other data, you may not discover it for a long time; then your program (or some other program in memory) will do strange and incomprehensible things. Be very careful to properly initialize all pointers before you use them.

Line 4 looks the same as Java, but it isn't. The variable p is not a reference to a pair, it is the pair itself. C has a sizeof operator (it looks like a function call) that you can use in expressions to calculate the size of data; sizeof(x) is probably 4 (the size of one int), and sizeof(y) is also 4 (the usual size of a pointer), but sizeof(p) is the size of two integers, or 8. On the other hand, sizeof(q) is again 4, because it's a pointer. All pointers are the same size. Line 9 sets the values of the two member fields of p, and line 10 sets q to point to p. Now we can dereference q to access p as in line 11, where we declare another pair variable r and immediately copy p into it. Note that when you assign a variable, the whole variable is copied; in this case the whole variable r is of type pair, so both integers are copied to it. Both lines 12 get just one of the member fields from p (which q still points to) and put it into member variable a of the r variable. Unlike Java, r was only a copy of p, not merely a pointer to the same data, so the change we made to r is not reflected in p; they are separate variables with separate values. Note that the dereference+selection expression q->b in line 12a can equivalently be written (*q).b as in line 12b, where the dereference is explicit (with the star on the left), then parentheses added to force that to happen first, followed by the member selection dot operator. A C/C++ compiler produces identical code for both lines, but it's a hassle to write the extra parentheses; that's why C has the arrow form. C programmers are consummately proud of their sparse syntax.

Is all of that confusing? Go back and follow it carefully. C/C++ has very precise (albeit sometimes misleading) syntax. The computer will do just exactly what you tell it to do, even if you didn't want to tell it to do that. Just keep the symbols clear in your mind (and watch out for overloading ;-)

The easy way to understand how to use pointers and non-pointers in C/C++ is to remember: if a variable is declared with a star (that is, it's a pointer), you need to use a star to dereference it and access the data. If it's declared with names inside of braces (it's a class or struct, which is just another name for a class, or a union, which is the same only different), then you need to use a dot to get at the member names as in Java. If it's both a pointer and a class/struct, then use the arrow to get at the member names. The address operator is the opposite of the star operator: if you used an ampersand to get the address of a variable, then you need to use a star to get the value back, and vice-versa. Thus *(&x) is just x for any variable x, and &(*q) is just q for any valid pointer q.
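The rules in the paragraph above can be sketched in a few lines. This is only an illustration under my own assumptions: the function name pointer_rules_demo is made up for this page, the variables reuse the names from the earlier example, and the assert calls simply state what each step is expected to leave behind.

```cpp
#include <cassert>

// Same shape as the pair type discussed above: two integer members.
struct pair { int a, b; };

// Hypothetical walk-through of the star / dot / arrow rules.
// Returns the final value of r.a.
int pointer_rules_demo() {
    int x = 99;
    int* y = &x;            // declared with a star, so y is a pointer
    *y = 88;                // the star dereferences y: this changes x
    assert(x == 88);

    pair p;
    p.a = 1;                // plain struct: use the dot
    p.b = 2;
    pair* q = &p;           // pointer to a struct
    pair r = *q;            // copies the whole struct, both members
    r.a = q->b;             // arrow: dereference plus member selection
    assert(r.a == (*q).b);  // (*q).b is the long spelling of q->b
    assert(p.a == 1);       // p is untouched; r is a separate copy

    assert(*(&x) == x);     // the star undoes the ampersand
    assert(&(*q) == q);     // the ampersand undoes the star
    return r.a;
}
```

Calling pointer_rules_demo() from main should give back 2, the value that was copied out of p.b by way of q.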

Dynamic Memory. In Java all objects are allocated with new and a constructor call, and are garbage-collected after the last reference variable gets a new value or goes out of scope. In C++ new works with a constructor much the same as Java (except that what you get back is an explicit pointer), or you can leave off the parameters and parentheses to implicitly call the default (parameterless) constructor, or you can use the C library routine malloc(s), where s is the number of bytes you wish to allocate, and then cast the resulting pointer into the appropriate pointer type. There is no garbage collector in C++; you must explicitly use the delete operator on the variables allocated with new, or call free(p) on any pointer p allocated with malloc. The books all say not to cross over and use new with free(p) or delete with malloc. It's probably best not to use malloc at all, unless you are doing really tricky memory management things like overloading new. Yes, you can do that, but you deserve what you get. I warned you.

One of the most insidious pointer problems in C/C++ is unmatched new/delete. If you fail to call delete on a pointer when you are done with it, the memory allocated for that pointer will become inaccessible; do this long enough and the whole system will grow slower and slower, or else simply crash "out of memory" as your program keeps requesting (but not releasing) memory; this is called a "memory leak". Java eliminated this problem by garbage collection: every pointer is automatically deleted when it can no longer be reached. The opposite problem is called a "dangling pointer" and is also not possible in Java; it consists in continuing to use a pointer after the memory it points to has been deleted. This might happen if you have two or more pointers pointing to the same block of memory, and you delete one of them, but continue using the other, not realizing it's the same block you deleted. There are several programming practices, none of them cheap, that can minimize these kinds of bugs going uncaught. The best policy is to be very careful with dynamic memory allocation.
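Here is a minimal sketch of matched allocation and release, assuming nothing fancier than the pair struct from earlier. The function name allocation_demo is my own invention for illustration; the point is only that every new has exactly one delete, every malloc has exactly one free, and the two families are never mixed.

```cpp
#include <cassert>
#include <cstdlib>

struct pair { int a, b; };

// Hypothetical demo: one new/delete pair and one malloc/free pair.
// Returns the value that made the round trip through both blocks.
int allocation_demo() {
    pair* p = new pair;      // default construction, no parentheses needed
    p->a = 3;
    p->b = 4;
    int sum = p->a + p->b;
    delete p;                // matches the new above
    p = nullptr;             // forget the stale address so it cannot dangle
    assert(sum == 7);

    // The C way: ask for raw bytes, then cast to the pointer type you want.
    int* block = static_cast<int*>(std::malloc(10 * sizeof(int)));
    if (block == nullptr) return -1;   // malloc reports failure with null
    block[0] = sum;
    int result = block[0];
    std::free(block);        // matches the malloc above
    return result;
}
```

If either release line is left out, the program still runs and still returns 7; that is exactly why leaks are so easy to miss.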

Reference Variables. In Java as in C, all method/function parameters are "passed by value", that is, a copy of the argument value is made as the parameter. Large structures in Java are effectively "passed by reference", as the only way to access these structures is by an object reference, which is passed by value even though the object it points to is not copied for the method call. In C, however, you can have large structures locally declared (like the pair p; above), and if you pass a structure, the whole structure is copied as the parameter. You can get the same effect as Java by passing the address of a structure (using the address operator in the call) and receive it in the called function as a pointer parameter (using a star), but it's hard to remember to code these pointers and addresses in every call. Most other programming languages either optionally or by default allow large parameters to be declared as by reference, and I guess the C gurus felt deprived; C++ now has additional syntax to specify that a parameter is passed by reference. They do this by using the ampersand address operator (&) in the parameter declaration where C would have used a star to declare it a pointer; when calling a function or method with reference parameters, you simply supply the name of the variable (it must be a variable: you can't pass the address of an expression that does not resolve to a place in memory), and the compiler automatically applies the address operator in building the argument list. C++ thus now works the same as Pascal and Ada and Fortran and Basic with respect to reference parameters.
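The three ways of passing a structure described above can be compared side by side. This is a sketch with made-up function names (by_value, by_pointer, by_reference, parameter_demo); only the pair struct comes from the earlier example.

```cpp
#include <cassert>

struct pair { int a, b; };

void by_value(pair p)      { p.a = 100; }   // changes only the local copy
void by_pointer(pair* p)   { p->a = 200; }  // caller must pass &variable
void by_reference(pair& p) { p.a = 300; }   // caller passes the variable itself

// Hypothetical driver; returns the final value of v.a.
int parameter_demo() {
    pair v = {1, 2};
    by_value(v);            // the whole struct is copied in
    assert(v.a == 1);       // so the caller's v is unchanged
    by_pointer(&v);         // explicit address operator at the call
    assert(v.a == 200);
    by_reference(v);        // compiler supplies the address for us
    assert(v.a == 300);
    return v.a;
}
```

Notice that the calls to by_value and by_reference look identical at the call site; you have to read the function declaration to know whether your variable can be changed.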

C++ did not stop with reference parameters, they made the type a first-class citizen by allowing programmers to declare any variable to be a reference. Consider the following example:

void refer(int & p) {
int x = 99;
int & y = x;
y = 88;
p = x;
} // end refer
... (in main)...
int a;
refer(a);

In this example the function refer is declared to have a single reference integer parameter p, and it is called from the main program with the address of variable a. Local variable x is a simple integer as before and initialized to the value 99, but y is now a reference variable which is initialized to point to x (you must do this in an initializer, because there is no way to set its value in ordinary code). The next line sets x (because y points to it!) to a new value 88, and the final line stores that 88 also in the caller's variable a, using the reference parameter p. It's easy to see how this code can quickly become exceedingly obscure. Obscure code is a popular game among C programmers, but it is not a virtue; other than as parameters you should use reference variables sparingly or not at all.

Function Pointers. Very few people understand the idea of storing in a variable the address of a function, then using that variable to call the function. Java does not have honest pointers, so you cannot do that at all, except implicitly by overriding a method in a subclass (which under the covers is implemented as a function pointer). In C/C++ you can do these things out in the open, where people can look at it and say "Huh??" The syntax is a little strange:

int funca(int x) {return -x;}
int funcb(int a) {return a+3;}
void fun() {
int n, m;
int (*p) (int);
p = funca;
n = (*p)(3);
p = funcb;
m = (*p)(5);
} // end fun

In this example we have defined two trivial functions funca() and funcb(), both with the same signature (takes one integer parameter and returns an integer result), and a function pointer variable p with the same signature. Then we call first one of those two functions through the function pointer, followed by the other one. At the end, variable n has -3 in it and variable m has 8. Obviously there is not much sense in using function pointers here where the function itself is accessible, but you could for example write a mathematical graphing program without knowing which function it needs to graph, then pass it pointers to whatever appropriate functions you want graphed and have it call back to that function to obtain the values to graph. Function pointers make wonderful iterators for sequentially walking through complex data structures, never mind that the STL has its own somewhat misnamed notion of what an iterator is.
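The graphing idea can be sketched with a small stand-in. The helper tabulate below is hypothetical (it is not part of any library): it knows nothing about which function it is evaluating, it just calls back through the pointer it was given. The two functions funca and funcb are the same ones used above.

```cpp
#include <cassert>

int funca(int x) { return -x; }
int funcb(int a) { return a + 3; }

// Hypothetical helper: evaluates f at 0, 1, ..., count-1, stores each result
// in out[], and returns the sum of the stored values.
int tabulate(int (*f)(int), int count, int out[]) {
    int sum = 0;
    for (int ix = 0; ix < count; ix++) {
        out[ix] = (*f)(ix);   // call back through the function pointer
        sum += out[ix];
    }
    return sum;
}
```

A caller would write something like tabulate(funca, 4, values) to fill values with 0, -1, -2, -3, then call it again with funcb to get 3, 4, 5, 6 from the very same loop.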

Like reference variables, function pointers should not be used unless they are needed; because they are so little understood, your program using them will be essentially unmaintainable. That is a Bad Thing.

Arrays. So far we have looked at good uses for pointers, valuable social functions that make the world a better place, if somewhat obscurely. C also does one very Bad Thing with pointers: it encourages programmers to use them in place of array subscripts. You need to know this, because you will see a lot of it in code written by other people. Don't do it yourself, it's a bad habit. The problem is that C/C++ does not have honest arrays.

If you declare an array of 100 integers and a pointer to an integer like this in C:

int ary[100];
int * ptr;

C treats these two declarations almost exactly alike, except that it allocates 400 bytes for the array and only 4 for the pointer (which you can test using sizeof). Here are some of the strange things you can do with these two variables:

ary[0] = 3; // reasonable
ptr = ary; // note: no address operator!
ptr[4] = *ary; // !!
ptr = ptr+3; // actually adds 12
*ptr = 5; // same as: ary[3] = 5;

Notice the third line: we are subscripting the pointer variable and dereferencing the array variable. The compiler actually accepts this! Similarly, adding 3 to the pointer variable actually adds 12, which is 3*sizeof(int). You should not be allowed to do arithmetic at all on pointers (they are not numbers), but it's especially disconcerting to see that the arithmetic doesn't do what you said to do. It all makes sense if you realize that in nearly every expression array names are not treated as arrays at all, but merely as pointers to the first element of a block in memory (sizeof is one of the few places where the compiler remembers the difference); this is why the second line above works despite the lack of an address operator. The subscripting syntax that we know and love from all the sane and sensible programming languages that ever existed is in C nothing more than a macro for the pointer arithmetic -- or maybe it's the other way around. Try not to let this go to your head, or you will become hopelessly confused and have all kinds of problems with pointers -- just like all the other deranged C/C++ programmers.
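If you want to convince yourself that subscripts and pointer arithmetic really are two spellings of the same thing, a sketch like the following will do it. The function name subscript_demo is made up; the assert lines state the equivalences rather than printing them.

```cpp
#include <cassert>

// Hypothetical demo of the subscript / pointer-arithmetic equivalence.
// Returns the element that ptr ends up pointing at.
int subscript_demo() {
    int ary[100];
    for (int ix = 0; ix < 100; ix++) ary[ix] = ix * 2;

    int* ptr = ary;                    // array name turns into &ary[0]
    assert(ptr == &ary[0]);
    assert(ary[7] == *(ary + 7));      // a subscript is defined as *(ary + 7)
    assert(ptr[7] == *(ptr + 7));      // and it works on pointers too

    ptr = ptr + 3;                     // moves forward by 3 ints, not 3 bytes
    assert(ptr == &ary[3]);
    long bytes = reinterpret_cast<char*>(ptr) - reinterpret_cast<char*>(&ary[0]);
    assert(bytes == static_cast<long>(3 * sizeof(int)));

    // sizeof is one place where the array is still seen as a whole array.
    assert(sizeof(ary) == 100 * sizeof(int));
    return *ptr;
}
```

The returned value is ary[3], which the loop set to 6.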

There are some things you can do to make C arrays better behaved. Do not use pointer arithmetic for array access; any compiler worth its salt will generate the same or better code for subscripts than for pointer dereferencing. On the C++ compiler where I tried it, array brackets are nearly twice as fast as pointer dereferencing. With some of the more feeble C compilers in the past the performance might have been the other way around, but it's still a bad programming practice. In any case, pointer access is very difficult to generate optimized code for; the compiler knows more about arrays and can invoke much better optimizations. I know, I did my dissertation on compiler optimization.

Java array access is checked for index in range; in C/C++ nothing is checked unless you program it to be checked. If you want array bounds checking, you have to write it into your code. If you want null (or dangling) pointer checking, you have to write it into your code. If you forget, or if you mistakenly assume that your pointers and subscripts are correct, Kaboom!
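Writing the check yourself might look something like this. The helper checked_store is a hypothetical name, and the caller has to pass the array size along, because a bare pointer carries no length with it. Note that this sketch only catches a null pointer and an out-of-range index; it does nothing about dangling pointers.

```cpp
#include <cassert>

// Hypothetical helper: stores value at ary[ix] only if the pointer is not
// null and the index is in range. Returns true if the store happened.
bool checked_store(int* ary, int size, int ix, int value) {
    if (ary == nullptr) return false;          // null pointer check
    if (ix < 0 || ix >= size) return false;    // bounds check
    ary[ix] = value;
    return true;
}
```

A call such as checked_store(ary, 100, 100, 1) comes back false instead of quietly scribbling on whatever follows the array in memory.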

In most programming languages, when you assign the name of a variable to the name of another variable of the same type, it copies the whole variable. Because arrays in Java, like objects, are implicit reference variables, all you get is a copy of the reference (pointer). C behaves somewhat like that: assigning an array name to a pointer variable copies only the address, and the compiler will not let you assign one declared array to another at all. If you are comfortable with that kind of behavior, you won't make too many mistakes assigning arrays in C/C++. However, if you want arrays to work like every other C/C++ variable (obviously except reference variables, because of the hidden dereference), you should enclose all arrays within a wrapper struct, then use the struct name for copying the entire variable, thus:

struct IntAry100 {int i[100];}; // declares the array type
IntAry100 ary, bry; // declares two arrays of that type
int ix;
...
for (ix=0; ix<100; ix++) ary.i[ix] = ix;
bry = ary; // copies the whole array

It costs a couple of extra characters in the source code for each array access (and no runtime: the compiler generates exactly the same object code with or without the wrapper struct), but the array is much better behaved. You can still pass these wrapped arrays by reference, using the address operator, so you have not lost that advantage.
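To see that the copy really is a separate array, and that passing by reference still works, consider this sketch. IntAry100 is the wrapper from the listing above; the names fill and wrapper_demo are my own for illustration.

```cpp
#include <cassert>

struct IntAry100 { int i[100]; };   // the wrapper type from the listing above

// Reference parameter: no 100-element copy is made for the call.
void fill(IntAry100& a, int value) {
    for (int ix = 0; ix < 100; ix++) a.i[ix] = value;
}

// Hypothetical demo; returns the last element of the copied array.
int wrapper_demo() {
    IntAry100 ary, bry;
    for (int ix = 0; ix < 100; ix++) ary.i[ix] = ix;
    bry = ary;                   // copies all 100 integers
    fill(ary, 0);                // changes the caller's ary through the reference
    assert(ary.i[42] == 0);
    assert(bry.i[42] == 42);     // bry kept its own values
    return bry.i[99];
}
```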

In C++ you can also create a class or template to accomplish the same thing, hiding all the class and member field syntax behind overloaded operators (including the subscripting brackets). You can also code bounds checking into these overloaded array-indexing bracket operators, but it will cost you on every access. Unlike the better languages with built-in array bounds checking, C++ has no easy way for the compiler to notice that the index is provably already in range and eliminate the extra check in those cases only.
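A bare-bones version of such a class might look like the following. The class name CheckedInts and the helper rejects are hypothetical, and throwing std::out_of_range is just one assumption about what to do with a bad index; your shop may prefer to abort or log instead.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical fixed-size array class with a bounds-checked subscript.
class CheckedInts {
    int data[100];
public:
    CheckedInts() { for (int ix = 0; ix < 100; ix++) data[ix] = 0; }
    // Overloaded brackets: every access pays for the range test.
    int& operator[](int ix) {
        if (ix < 0 || ix >= 100) throw std::out_of_range("CheckedInts index");
        return data[ix];
    }
};

// Hypothetical helper: reports whether storing at index ix was refused.
bool rejects(CheckedInts& a, int ix) {
    try {
        a[ix] = 1;
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

Client code writes a[5] = 55 exactly as it would for a plain array, which is the attraction; the hidden test on every access is the cost.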

Operator Overloading

Most OOPS languages (including Java) encourage you to overload methods -- especially constructors -- by defining multiple versions of the same method name, but with different numbers and types of parameters, and the compiler must then try to sort out which method call gets connected up with which actual method code. That may be a small amount of extra work for the compiler, but it's a huge cognitive burden on anybody trying to read the code with all these different signatures -- especially for the same number of parameters of differing types. However, given that this is an accepted "benefit" of OOPS, C++ does it in spades. Not only can you overload ordinary method names, you can also overload all the operators! Used judiciously, you can get programs that behave in reasonably intuitive ways -- that is, after they are fully debugged, which is itself a gargantuan task in C/C++; you can also get programs that do bizarre or (worse!) subtly strange things without obvious cause. Consider the following short program, which actually compiles in C++ and runs:

#include "stdint.h"
...
Int i = 2, j = 4, k;
k = i+j;
cout << "i=" << i << ", j=" << (int)j << ", i+j=" << k << endl;

That strange third line is essentially the C++ version of Java's System.out.println. Somebody got the goofy notion that the bit-shift operator could be overloaded to represent a sort of arrow showing the flow of data to an output stream (yes, input is the same, except the arrows come out of the stream). So everything to the right of these "arrows" are strings or other data types that somebody has written code to overload the bit-shift operator to accept as an output operand pretending to be a bit-shift operand, but actually passes the data to the ostream that is its left operand. Think of this line as the C++ version of the Java line:

System.out.println("i=" + i + ", j=" + (int)j + ", i+j=" + k);

Thus it's just slightly different spelling: "System.out.println(" is spelled "cout<<", and the string concatenation operator "+" is spelled "<<". Java has its own problems with an overloaded "+" operator: if a has a 3 in it, the assignment a=a+5; could put either of two different results -- either 8 or 35 -- back into a, depending on what type a is! That's the kind of problems operator overloading brings to the table. Look at the rest of this snippet of code, and try to guess what this program prints. Let's see, it kind of looks like there are three integers, i, j, and k, with i initialized to 2, j to 4, and k subsequently assigned to the sum of i+j; would you believe it prints out "i=2, j=4, i+j=6"? You would be badly mistaken. It actually prints "i=6, j=44, i+j=7". Really.

There is a subtle clue to this pathological behavior in the name of the type Int. Like Java, C/C++ is case-sensitive, so Int is not the same as int. Somewhat less obvious is the #include line, which looks like it imports some kind of standard library header for integers, but is actually a custom header in the program's own directory (quoted, not angle brackets). If you open that file you would see the class definition for class Int, as follows:

/* This header implements a pathological class Int */
#include <iostream>
using namespace std;
class Int {
private: int i;
public:
Int() {i=9;}                             // #1
Int(int v) {i=v;}                        // #2
operator int() {return (i%10)*11;}       // #3
Int operator+(Int n) {                   // #4
    Int res = i<n.i ? n.i-i : i-n.i;
    return res;}
Int operator=(Int n) {                   // #7
    i = n.i-1;
    return *this;}
int actualvalue() {return i;}            // #10
};
ostream & operator<<(ostream& os, Int i) { // #12
os << 8-i.actualvalue();
return os;}

Let's look carefully at all the misanthropic code here. Line #1 defines the default (parameterless) constructor for this class, which is used when variables of this type (like k) are declared without an initializer. The single instance variable i is set to 9. Line #2 is the constructor used when variables of this class are declared with an initializer that is an integer; that value is used directly. I could have been even sneakier and made it v-1 or some other bizarre value. Line #3 overloads the type-casting operator (Int to int), which when it is used looks like it should just give you back the value of the wrapped instance variable i, but actually picks off and duplicates its last digit (44 instead of the 4 that is actually there). The three-line method beginning with #4 overloads the "+" operator so it returns the absolute value of the difference of its two operands instead of their sum. The three-line method beginning with #7 overloads the assignment operator, so that the value assigned is one less than the value given. The three-line method beginning with #12 overloads the output stream operator "<<" to give yet another strange result. Line #10 is there so that the "<<" operator can get the actual value, unencumbered by those other more obvious (but wrong) methods. Therein lies a tale of its own: OOPS class methods in C++ as in Java have a hidden first parameter "this" which points to the object that invoked the method. Since the first parameter in an output streaming expression is an ostream, and we cannot add our overload declaration to the ostream class, C++ also lets you use a plain C-style function (one that is not a member of any class) to overload most operators; a few, such as assignment, must be class members. Notice that this operator function is not inside any class declaration. If this header file were to get included twice, there would be a compile error from the duplicate definition of operator<< with that signature. Go figure.

This example gives you a little taste of the power to "slash your wrists" in C/C++. Operator overloading may make certain types of class variables easier to write into simple expressions, but with that flexibility comes a heavy price to pay in program readability and perspicuity. Overloading operators generates exactly the same code as using ordinary methods with meaningful names, but meaningful names make it far more evident what is going on to other people who may need to read the code after you've moved on to other projects -- or even to yourself six months or a year later when your employer wants you to add a tiny new feature to your already perfect(ly unreadable) code. Using every tricky feature in the language is not a virtue.
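For contrast, here is a sketch of a well-behaved operator next to the equivalent named method. The Money type and the function total_cents are hypothetical, invented only for this comparison; the operator simply forwards to the named method, so both do the same work.

```cpp
#include <cassert>

// Hypothetical example type, not taken from the pathological header above.
struct Money {
    long cents;
    // Named method: the call site says in words what is happening.
    Money plus(Money other) const {
        Money sum = {cents + other.cents};
        return sum;
    }
    // Operator form: same work, shorter at the call site.
    Money operator+(Money other) const { return plus(other); }
};

// Hypothetical demo; returns the total in cents.
long total_cents() {
    Money lunch = {1250};
    Money tip = {250};
    Money named = lunch.plus(tip);
    Money symbolic = lunch + tip;
    assert(named.cents == symbolic.cents);   // identical results either way
    return symbolic.cents;
}
```

The operator is only tolerable here because it does exactly what a reader would guess; the moment it does anything else, the named method is the honest choice.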

Templates and STL

Abstraction is an extremely important skill in programming, and one of the goals of OOPS is to facilitate abstraction in the programming language. Unfortunately, the way it is accomplished in Java and C++ falls far short of what programmers need to be doing. Nevertheless, what these languages do in support of abstraction is better than nothing. In Java you can declare an abstract data type by creating a class with one or more disembodied methods, which you now can recognize as C/C++ prototypes. You can also define interfaces which abstract the required method signatures for particular groups of classes. C++ achieves by multiple inheritance (a class may be a subclass of multiple superclasses) what Java does with interface, and like Java permits an abstract class to leave only prototypes for abstract methods.

But far be it from C++ to solve any problem only in the simple and direct method used by other languages (even its own derivative, like Java). So the most recent C++ standard has also defined templates, which are abstract functions and classes where the same source code gets re-used for any number of signature types. The template name includes angle brackets around parameter class names which can be replaced by any actual class when you use the template. Here for example is a line from the specification for the standard vector template in the Standard Template Library, which is required of all conforming C++ implementations (but reportedly not necessarily correctly implemented by any of them yet):

template <class T, class Allocator = allocator<T> > class vector

You then create new classes for each type you substitute for T, thus:

vector<int> ary(100); // declares a 100-element vector of integers
vector<char> chv(5); // declares a 5-element vector of characters

The template code is instantiated for each new (separate) type you throw at it; if you use it with the same type for multiple declarations in the same file, it will re-use the same code for each of them, but different types get a new copy of the code. This is useful if you really need "same only different" classes, differing for example only in the type of contained data. Another way to accomplish the same result is to use a superclass pointer for the contained data in an ordinary class declaration -- which is the way you would solve the same problem in Java -- but a small amount of extra processing might need to be spent on type-checking the contained data, unless the driver code is robust enough to prevent such type faults.
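Writing a small template of your own shows the same mechanism at work. The function template largest below is a made-up example, not part of the STL; it assumes only that the element type T can be copied and compared with the less-than operator.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical template: one source text, and the compiler builds a separate
// copy of the code for each element type T it is used with.
// Returns the largest element, or fallback if the vector is empty.
template <class T>
T largest(const std::vector<T>& items, T fallback) {
    T best = fallback;
    for (std::size_t ix = 0; ix < items.size(); ix++) {
        if (ix == 0 || best < items[ix]) best = items[ix];
    }
    return best;
}
```

Calling largest on a vector<int> and again on a vector<char> gives you two instantiations of the same source; calling it a second time on another vector<int> re-uses the first one.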

The problem with templates is that they are very tricky to write and get correct. If and when the library vendors actually succeed, they might save some coding time on standard operations like sorting and searching. As a long-time programmer who's been burned more than once on vendor-supplied code that stopped working (or never actually started working correctly), I tend to prefer to keep more in-house control on mission-critical code; other policies may apply at other locations. Another consideration is the time it takes to learn all the intricacies of using code that generality forces to be far more complex than corresponding ad-hoc code. In any case, you do not need to understand templates to effectively program in C++, but you might need to know them to effectively collaborate in a shop where they are widely used.

Other Resources

There are a large number of resources for introducing the Java programmer to C++ -- even a textbook or two. Almost all of them have substantive errors (probably including this page, but if you find any mistakes here on my page, please let me know). Here are a few links (all old, some may be broken) you may find helpful, many or most of them by people with a higher opinion of the language than I have:

C++ For Java Programmers, by Scott Sigman (for CIS-3333, 2001). The same idea by our own Dr. Sigman two years ago.

C++ For Java Programmers, by Timothy A. Budd (Addison-Wesley, 1999, $43.20). I have not actually read this book, but the publisher's review on this web page admits that it is of limited scope.

C++ for Java Programmers, by Barbara Staudt Lerner (1998, or Modified by Bill Lenhart, 2002). This is a brief introduction similar in purpose to my own page here, but IMO less complete. Lerner's page was obviously targeted for a particular software product written in Java; Lenhart revised it mostly to remove the product references (and perhaps also to correct some technical errors, but some errors remain).

C++ for Java Programmers, by Uwe Rastofer (PDF Lecture slides, 2000). Although the course these 77 slides were prepared for is taught in Germany, the slides are (except for the copyright notice) all in English.

Learning a New Programming Language: C++ for Java Programmers, by Rebecca Hasti (undated). This is a more structured, self-paced tutorial you might find helpful.

All but the first of these (and many more) were found by a Google search on "C++ for Java Programmers".

Revised 2003 October 8
Links fixed 2004 July 29
Typos fixed 2008 July 30
Bug & more typos fixed 11 Dec 26 & 13 Dec 28
64-bit paragraph added 2019 August 31