Programming by Design

Design has become a major focus of this course in programming, and is scattered about, mostly at the front of every programming project. No software of any consequence can be written without a good design, and we hope our students, at the completion of our course, will be able to produce a good up-front design for any program they set out to write, or at least be able to code to a design prepared by somebody else, and if so, they will be able to make a good living in the software industry.

The people in the educational community who try to teach software Design typically give it some other name, like "Thinking Computationally" and identify four components, but not always the same four components. Here are the most common of them:

Code Abstraction, also called "Successive Decomposition" (we call it "Top-Down Design" or TDD), where the program is structured logically as a tree (with the root at the top), and each layer adds more detail, until the bottom layer is the actual lines of code.
Pattern Recognition, also called "Code Re-Use", where the program is examined for sequences of similar code, which is then merged into iteration or (more often) subroutines. This somewhat overlaps the Abstraction item above.
Algorithm Design, choosing the right sequence of operations to do the job. Rarely do programmers design their own algorithms. Mostly we know about the most important algorithms for doing a variety of things, and go search for whatever we don't know about. The internet has made this search very much easer.
Data Types and Structures, choosing appropriate data types (and classes, if appropriate) to do the job. If you have one or more groups of very similar (or identical) data elements, you will use arrays; if you have composite data clumps of very dissimilar elements, you will use classes. If you have a vast amount of data, or if you need persistence across runs, you will use files. If you have numbers, it helps to think about what kinds of numbers you have, thereby to choose integers or floating-point.

TDD

We spend most of our discussion in this paper on that first category (TDD), because it seems to be the hardest for people to understand and the biggest hurdle to knowing what to do next when writing a program.

When I design a new program, I get up and walk away from the computer, pacing up and down across the floor, telling myself a story in rough abstract terms (in English), what the computer must do to run this program. Michael Schuff considers it helpful to think about the known inputs: Are they single items, or is there a sequence of inputs? What needs to be set up before you can do anything with these inputs? What operations do you perform on them after they are in? Stuff like that.

I think it is helpful to write down (in English) first a one-line title describing what this program does, then in three lines under that, the major parts of what it does.

For example, if you are playing a single round of Rock-Paper-Scissors (that would be the title), the first obvious thing that happens is the two players wave their fist up and down to get into synchronization. The purpose when two people play is to make sure both players expose their plays at the same time (otherwise whoever shows their play first is at a disadvantage to the other player who can see what they are going against). Then on the third down-beat, both players show their play at the same time, then finally the two plays are compared to see who won that hand. Later on you can add an iteration around the round to repeat it.

If you are doing a calculator, the three parts might be to get the inputs (two numbers and an operator), then do that operation on those inputs, then display the result. Later on you can add an iteration around this basic form to combine several operations into a session.

Most programs, there is no one way to decompose it into three component lines. If you are doing a card game, you might combine shuffling and dealing into the first step, then iterate the individual plays (one line in the top level) then do your scoring. Or you could separate out the shuffling and dealing as separate steps, and then combine the plays and betting into a single iteration. The important thing is to ask yourself is this a part of the game or program that I can give a short name to, and explain in a single sentence?

Even more important, Do you understand (in your head) what this program must do? Can you write a short narrative (in English) telling that operation as a story? Can you break that story into chapters and paragraphs and sentences? That's what your program is: the chapters are classes, the paragraphs are subroutines, and the sentences are lines of code. Write your chapter titles first, then ignore all but the one you are looking at, and write its parts. You cannot keep the whole program in mind at once, so you must ignore (forget) all the parts you are not currently working on. Do them later, or earlier, but not now.

This is the same skill they call "outline" when they teach it in English composition. Learn it there, use it here. Or else learn it here, use it there. It's a terribly important life skill, it will also help you sell your product to the customer (if you become a salesman), or pitch a project to the Vice-President of Engineering in a corporate environment, or give a campaign speech if you run for office. If you are a soccer coach or a scout leader, it will help you tell the kids in your troop or team how to cooperate for this hike or game. If you decide to write a novel, it can make the difference between a best seller and the dust bin. If you become a painter, it will help you organize your image so it is pleasing to the eye. As a computer programmer, you cannot even imagine a complete program without breaking it into component parts, which you fill out one at a time.

The most important skill in programming is hiding what you are not working on right now. I write big humongous subroutines, but they are impossible to understand. Break your code and data into smaller chunks, you will be a better programmer.

Code Re-Use

You want to be able to recognize similarities. You go to the grocery store, they have all these shelves in aisles, and they all look the same, except what's on the shelves is all different in every aisle. The shelves and the stands that hold them up, and the walkway between then, that part is the same, make a subroutine to do that part. Stocking the shelves, the goods and produce is different, but putting it on the shelves is the same. If you are washing windows, the curtains are different, the size may be different, but the frame and the glass is the same. Driving a car down the street is the same as driving it down the highway, the highway is wider and there are fewer or no cross streets, the city street has cars parked on the sides, but you still steer in a more or less straight line and stay in your lane.

Some programs do a lot of multiply-and-add, multiply-and-add, multiply-and-add, you can make an iteration to do that. If you are writing three card games, they all shuffle the deck, so write a single subroutine to shuffle, then use the same subroutine everywhere. Look for the similarities.

Algorithm

Algorithms are hard to figure out. I don't think I invented more than a half dozen in my whole life. Go on the internet and see how other people did it. Think through what they did, then write your own code that does it the same way. You probably can't copy what they did word-for-word -- maybe they used a different programming language, or got their data differently, or operated on a different number of variables -- but try to understand what they did, and write it down in English, then do the TDD thing on it.

Data Types

We start everybody off in English. There are no data types in English, you can put numbers or text or true/false values in any variable and trade off at random. But you cannot write big programs that way. In Java, you need to be much more intentional about your data. Decide what the variable is used for, then choose a data type appropriate to that usage, and the compiler will help you avoid mistakes. If you have a lot of data, group it into chunks that make sense, and call the chunks by name (classes), then only look inside when you need to.

Every programming language has primitive data -- numbers, booleans (True or False), characters and sometimes whole words (strings of characters treated as a unit). All modern programming languages (usually not including Assembly, which is a human-readable version of machine language) also let you specify aggregate data, either collections of named data fields where you can refer to the whole collection by name -- older languages call these "records" but newer languages like Java they are called class data -- or else a linear array of the same type, accessed by a numerical index. A file is a (possibly permanent) array of records (or any kind of data) usually stored in a non-volatile mass storage like disk or tape. The file can be much larger than you have memory space for, so this is the classic way to process vast amounts of data, or data that must survive to another day.

Relational database files started the practice of structuring the data to minimize duplications, and it's a good idea when you write complex Java programs also. For example a business might have a file of all their customers, names and addresses and phone numbers, and another file of invoices with some of the same customer data printed on the invoices, as well as the particular transactions for the current billing cycle. Rather than duplicate the customer data in both files, they put just a reference to the customer (typically a unique customer number) into the invoice record in the file, then the database can be told to build the invoice from both files, where the customer number is called a relation between the files (or "tables").

You can do the same thing in your program. For example, if you are making a card game, there is one deck of cards with (for example) 52 unique cards, which you can refer to by a simple number (perhaps the order the new deck came in its box), and then have a string array, the display name for each card, and another numeric array, each card's rank when scoring. Most of your processing then works with simple integers, perhaps small arrays for each player's hands, and another for the undealt deck and another for the discard pile. Dealing and shuffling and discarding then deal with simple integers, the fastest of all computer operations.

Another example, the game Tetris has a rectangular grid for its game board, so the natural data structure is a 2-dimensional array, where each cell could be one of the tetronimo colors, or else blank, but a better cell content would be a reference number for the particular tetronimo, which in turn would be a record of a 4-element array (the cells it currently occupies), plus color and shape. You still need to update all four cells when you rotate or move a tetronimo, but all the information you need is in that one record.

More TBA

Tom Pittman
Rev. 2022 November 3

Programming by Design

Contents

TDD

Code Re-Use

Algorithm

Data Types