Late-Breaking News (Trees)

(This Is Not In the Video)

A very important principle in software development is:

Design first, then
Code to the design.

Design First

A large part of our design focus is what we call "Top-Down Design" (TDD) which starts with a single-line description (name) of what the program (or this sub-program) is all about, then under it, three (not less than two, not more than four) components that can be understood as composing the whole of that program, but restricted to one-line names. Then each of those component lines becomes a new block with the name on top followed by approximately three component names under it, repeating for every component that does not immediately and obviously resolve into code (our "Five Ideas" and nothing else). You can write programs without doing this, but not large programs, and certainly not programs involving more than one programmer.

You could conceivably think of the smallest block of what needs to be done, then wrap a label around it, then compose a few of those labels into a larger block, until you are finished with a top-level block that is the whole program. This is called "Bottom-Up" but it is rather harder than TDD, because you don't have the benefit of seeing the larger plan first. You also can end up with redundant code, which a better understanding of what you are really trying to do might prevent. Or even useless code that must be discarded unused.

Example: Seaman

Let's consider a word game between two players, where the first player thinks of a word, then the second player tries to guess that word, one letter at a time. We want to write a computer program to be scorekeeper. The obvious partition of this game into components is those two phases:

"See Word Game"
First Player
Second Player

This division would not work for a "keep-away" game like basketball or (American) football, when a more natural division would be Offense and Defense, or a turns-based board game like chess, where the two players play essentially the same game and a better division would be Opening, MidGame and EndGame. Anyway, however you divide up the first layer, those two (or three) components each become their own box (subroutine descriptor) at the next level:

"First Player"
...

"Second Player"
...

Now we consider the components of that first subprogram. This is from the computer's perspective, so thinking of a word for the other player to guess is not one of the things the program needs to do, but rather the input of that word is relevant. Before that you might want the program to tell the players what is going on, then after the first player has told the computer what the word is, the computer needs to hide any residue of that input process, so the second player doesn't see it, so:

"First Player"
Print Instructions
Input the Word
Hide the word

"Second Player"
...

Each of those three lines might become its own subroutine, but the process stops there because the code for each of those three subroutines is a few lines of "Five Ideas" code, nothing more.

Now we consider the components of the "Second Player" subprogram. That player is going to be making guesses -- at a minimum as many guesses as there are unique letters in the word, or worst case 26 letters, the number is unimportant -- but they are making multiple guesses, which suggests that this second part should be repeated. There is space in the top box to add a Repeat around the "Second Player" line, and you might say it really belongs there. No matter, here or there, one of the two boxes needs a Repeat.

Back to the "Second Player" box. That person doesn't know how many letters remain to be guessed -- initially they don't know anything at all! -- so it's customary in this game to print out dashes for every unguessed letter, and the correctly guessed letters filled in where they belong. Remember, this part is repeated, so the same part of the program prints all dashes at the beginning, then more letters as they are guessed. Wrong guesses contribute body parts to a stick figure, so we want to print out the partially constructed stick figure also, but that is an unnecessary complication at this point in the game design, it's just part of printing out the game state. It's messy, so the first draft of the program might just print the number of missed guesses.

Then the second player types in a letter -- from the computer's perspective, that's an Input -- and the computer program decides if that letter is in the word (and if so, inserts it everywhere it occurs) or else decides it's an error that adds another body part. We could call this part "Scoring" like so:

"First Player"
Print Instructions
Input the Word
Hide the word

"Second Player"
Print Game State
Input a letter
Scoring

Then of course each of these new lines becomes another subroutine. Printing the game state has two parts, printing the stick figure (or a number as a stand-in) and printing the partial word. Inputting the letter is trivially one of the "Five", but scoring is more complicated (the details do not concern us here, it is discussed in the text on Seaman). So our total TDD (shrivelled up a little) looks something like this:

Computer people call this a "tree" because everything branches out from the "root" -- except computer trees are always drawn upside down with the root at the top. Here, look at it this way, so it looks more like a tree:

After a while you'll get used to seeing trees upside down.

Now, if you start at the root ("Game") and crawl (we say "walk") the tree, always taking the left branch not yet taken, and write down the name of each "node" (box) in a list, but only the first time you see it, then you get "See Word Game" on the first line, then "First Player" on the second line, and "Print Instructions" in the third line. Whenever you get to a leaf (a box with no outgoing lines), after you write it down, you go back to the previous node and take the next branch to the right, if there is one, or else go back again. If you do this consistently, you will visit every node in the whole tree exactly once going forward. Here is the list you get for this particular tree:

See Word Game
First Player
Print Instructions
Input the Word
Hide the word
Second Player
Print Game State
Input a letter
Scoring

Notice that I indented one more level each time I went further into the tree, so it looks like an outline the way they tell you to write compositions in English class -- at least they did when I was in school. Now I can ignore the non-leaf nodes, and I get a simplified list:

Print Instructions
Input the Word
Hide the word
Print Game State
Input a letter
Scoring

Which is the whole computer program in the order it runs (except the repetitions are not shown).

The point is, the tree representation of the program is identical to the program the computer runs, except it has some annotations (names of non-leaf boxes) that help us to understand what the program is doing at that instant.

Code to the Design

The tree structure (boxes with connecting links) is your TDD. The names of the boxes are the names of your subroutines, and the leaf nodes are the subroutines you turn directly into ("Five Ideas") code. The branches are subroutine calls ("Do" in the English computer). If you get your TDD correct, you don't need to think very hard about the code you are writing, because you (or the team leader who did the design, if you are in a group) already did all the hard thinking when you/they did the design. All you need to do is make the obvious conversions of these leaf node items into their corresponding ("Five Ideas") code.

It's very important that you do not think about all the rest of the program, because if you do, you will become confused and your program won't work properly. THE MOST IMPORTANT THING is to focus on how to convert this leaf node into correct code. Some not-very famous guy in Germany a hundred years ago made the useful observation that

The most important thing is to keep the most important thing the most important thing.

When you are coding to the design, the most important thing is to ignore all the rest of the program while you are coding this (each) part.

Thinking Linearly (Don't)

Recall I said that the tree-walk gives you the code in execution order. This is the order that the computer actually runs the program. The most obvious thing to try to do is think through the program in that order, and then write the program in that order. Everybody does that their first time. Then they get confused and all bollixed up. Make (or get from your team leader) a good design (including TDD) then code to the design.

Sometimes, especially after you are making your own designs for your own programs, you might make a mistake in the design that nobody noticed until after you got there. FIX THE DESIGN. Most often the fix will affect several parts of the program, and if you don't get the design right, you cannot get the program to work correctly.

Most programmers spend 80% or more of their program development time finding (or failing to find) errors in their code, mostly because they did not have a clear design to code to, or else they deviated from the design. Don't be that guy.

Big software firms usually plan on throwing their first design away after they get it built and discover what they didn't know enough about how it should work. Then they design it right. Maybe that happens several times. Managers and bean counters often ask if there is anything in it that can be saved. Don't be fooled by their penny-pinching ideas, it costs much more to try to recycle badly written code, than it does to just throw it away and do it right. Well, if you have a trade show where you must demonstrate your product at a particular date, take your buggy code and carefully step around the potholes. Then go back and make it right. It always takes longer than you thought.

Your Work In This Class

In this course, you and a Mentor must agree on a design for every program before you start coding it. And then you must follow the design. That way you will get your code correct -- except for small errors in getting the code to match the design, but those are easy to find and fix.

Tom Pittman
Revised 2022 November 1