User:Average/Hacking

How to Code, v.20150317 DRAFT @Xer0Dynamite

Audience

Programmers and their code can be divided into four camps:

  1. Participating in the Programmer/Data Ecosystem (GPL and OOPv2),
  2. Scientific Computing (Performance-based),
  3. Business Use (Persistence and Security of Data),
  4. Fun and learning (toy problems and user-specific domains)

This guide is mainly for those in camp #1. The rest of you, in order of precedence, will have to wait until you gravitate to the top camp and figure out that that's where you wanted to be.

Foundations

I'm going to tell you, the way to mastery is long. If you just want to be a mediocre programmer, go get your certification from Microsoft, display it on your resume for everyone, and be done with it.

Otherwise, plan on growing your neckbeard (or female equivalent) and let's go.

Challengers

...On the way to becoming a master, there are two main challengers alongside you in the programmer's chair. They are like Death, always ready to take you.

Challenger #1: RushingToExecutable. You'll be inclined to get a program to work as soon as possible. Everybody likes to see immediate results, but down the road, these short-term savings won't be extensible -- if the code is even comprehensible. Most code should get re-used or appreciated, otherwise why write it? We're in the Internet age here. I'll appreciate your slim code for your ultra domain-specific application later -- after you write it so I can use it for *my* applications.

The best counter-weapon for this battle is TestDrivenDevelopment.

Challenger #2: Cruftiness. You wrote some code a while back. You're not sure how it works anymore, so you don't want to change it, right? You pussy-foot around it like a black-box. That's called cruft. I'm assuming brevity is a shared value in this guide. You're a programmer, not a shill working for LOC, right? So, bite the bullet.

The best weapon for this battle is RefactorMercilessly which you can draw out of your bag of ExtremeProgramming.

Got it? It's not good to enter battle unprepared and without knowing the challengers.

Procedures

PREAMBLE: The simplicity of the computer program.

Let's get this straight: you're on a von Neumann machine, not a Symbolics machine. That means computer programming consists of loading a programming language statement and any data that goes along with it, executing said statement (raising any errors or exceptions that occur upon such execution), continuing forward to the next statement unless some condition says to go elsewhere, and repeating this sequence. It's like a little Turing machine, the basis for much of Computer Science, that. It may or may not finish. All that's to say: let's not go on a joy ride upon the language-du-jour, 'k? We will, at some point, be worried about memory and CPU performance as all good programmers should; we're just saving it for last -- just as Master Bentley taught us.
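The load-execute-continue cycle described above can be sketched as a toy machine. This is only an illustration, not a model of any real CPU: the instruction names (`inc`, `dec`, `jump_if_nonzero`, `halt`) and the `max_steps` guard are assumptions invented for the example.

```python
def run(program, max_steps=1000):
    """Run a list of (opcode, argument) statements on one accumulator.

    Returns the accumulator on `halt`. Raises RuntimeError when the step
    budget runs out, because a program may never finish on its own.
    """
    acc = 0   # the only data register
    pc = 0    # program counter: which statement to load next
    for _ in range(max_steps):
        opcode, arg = program[pc]      # load the statement and its data
        if opcode == "inc":
            acc += arg
        elif opcode == "dec":
            acc -= arg
        elif opcode == "jump_if_nonzero":
            if acc != 0:
                pc = arg               # the condition that breaks the sequence
                continue
        elif opcode == "halt":
            return acc
        else:
            raise ValueError(f"unknown opcode: {opcode!r}")
        pc += 1                        # default: continue forward
    raise RuntimeError("step budget exhausted; program may not halt")


# Count down from 3 to 0, then stop.
countdown = [
    ("inc", 3),
    ("dec", 1),
    ("jump_if_nonzero", 1),
    ("halt", None),
]
```

Calling `run(countdown)` walks the loop until the accumulator reaches zero. A program that jumps back forever hits the step budget instead, which is the "may or may not finish" part.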

So, the basic tools for forging your program are Divide and Conquer ;^). The basic tools for forging YOU are below this section under "Tools". Its contents should be applied along the way and will be referenced within this section.

Now, consider that the process of dividing and conquering has already started in two ways. One, you've broken off a chunk of human curiosity from the "master program", and perhaps have a clear idea of what you want to do. Two, the history of computing has already begun this division by separating I/O (generally named "stdin/stdout") from processing. This is a giant help and saves you buttloads of time wiring I/O junctions to your panels.
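A minimal sketch of that I/O-versus-processing split, assuming a trivial processing step (upper-casing lines). The names `shout` and `main` are made up for illustration.

```python
import io
import sys


def shout(lines):
    """Pure processing: upper-case each line. No I/O happens here."""
    for line in lines:
        yield line.upper()


def main(instream=sys.stdin, outstream=sys.stdout):
    """Wire the processing step to whatever streams the caller supplies."""
    for line in shout(instream):
        outstream.write(line)


# Exercise the wiring with in-memory streams standing in for stdin/stdout.
fake_in = io.StringIO("hello\nworld\n")
fake_out = io.StringIO()
main(fake_in, fake_out)
# A real script would simply call main() and use the system streams.
```

Because `shout` never touches a stream itself, it can be re-used or tested without any terminal attached.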

Beyond that, there can be several different "breakdowns" of how you go about structuring your program, but if you're constraining yourself to a particular programming language, choose the axis along which your language was designed (file-based vs. run-time based, procedural/functional vs. imperative, parallel vs. serial dimensions, etc.). If you're not constraining yourself, choose Python/C and make it easy on yourself.

Let's begin.

1. So, you've already sliced off a big piece of the problem domain and narrowed it down to some task. You've started using CRUCIBLE #2. That is the first insight. Give it a name (CRUCIBLE #1); perhaps it's "Adventure". Open a new file and write some boilerplate code: author contact info, a comment or docstring on the purpose of the program, and a header for your main function (or something that says where your program will start) (CRUCIBLE #3). Save the file with your (functional) name. Eventually, you're going to divide downwards until you get to the level at which your programming language stops providing natural mechanisms to control the machine. BUT instead of making a single monolithic program, you want to train yourself for modularity and re-usability (CRUCIBLE #1, #2), because you're making constructs that, if generalized, could almost certainly be useful to others.

In any event, in TDD (TestDrivenDevelopment), the next step would be to ensure that it compiles correctly, run it, and let it fail to accomplish its task.

2. Divide your remaining problem into multiple conceptual [sub]units (CRUCIBLE #2) and give them simple but meaningful names (CRUCIBLE #1). If you don't have a meaningful division, then you've found your first function (which is "main" if you're at the top). Each subunit is now like a mini-program, likely with its own I/O needs.

3. Consider the tightest vertical structure you'll need to hold everything you want to accomplish for the conceptual unit you're working on, how you're going to traverse that structure (while and for loops, for example), and what I/O you're going to have that will communicate between your other conceptual units (including the top-level ones the system has provided: stdin/stdout).

There are two main idioms to communicate between your different conceptual units: MessagePassing or VariableStorage. VariableStorage is preferred up to the point that you are constrained by your language mechanisms (like within a function definition). MessagePassing is generally how you send data in and out of your machine (collecting or sending characters/bytes in and out). A language like Assembly doesn't have lexical structure, so variable storage is about all you use within a program. Otherwise, MessagePassing is suggested as more modular and scalable.
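The two idioms can be contrasted in a few lines. This is a sketch under assumed names (`state`, `award_points_shared`, `award_points`), none of which come from the guide.

```python
# VariableStorage: units communicate through a shared name.
state = {"score": 0}


def award_points_shared(points):
    """Update the shared storage in place; nothing is returned."""
    state["score"] += points


# MessagePassing: units communicate only through arguments and return values.
def award_points(current_score, points):
    """Take the old score as a message in, send the new score out."""
    return current_score + points


award_points_shared(10)        # the caller must know that `state` exists
total = award_points(0, 10)    # the caller sees everything that flows in and out
```

The message-passing version can be moved to another file or program unchanged, which is the modularity argument made above.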

By way of definition: vertical data structures go towards greater meaning (like a Person being composed of name, department, eat functions, etc.). Horizontal data structures are more like collection data types, holding many items that each carry only the slightest bit of vertical meaning (like a numerical quantity).
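One way to picture the distinction, using the Person example from the definition. The field values and the `head_counts` list are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Person:
    """Vertical: separate fields composed into one more meaningful unit."""
    name: str
    department: str

    def eat(self, food):
        return f"{self.name} eats {food}"


# Horizontal: a collection of like items, each carrying little meaning alone.
head_counts = [12, 7, 31]

# The two combine naturally: a horizontal collection of vertical units.
staff = [Person("Ada", "Research"), Person("Linus", "Kernel")]
departments = [person.department for person in staff]
```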

4. If and when the conceptual unit you are working with is too big, repeat. You can divide and conquer upwards or downwards. Divide upwards when you want SeparationOfDomains, to create higher level objects or separate files. Divide downwards when you need to conquer closer to the machine, to get the greater detail needed in order to make a compilable program. Return to step #2.
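The steps above can be sketched for the "Adventure" example. The division into `parse_command`, `move`, and `describe` is just one plausible breakdown, and the one-exit map is hypothetical.

```python
def parse_command(line):
    """Sub-unit 1: turn raw input text into a (verb, noun) pair."""
    words = line.strip().lower().split()
    verb = words[0] if words else ""
    noun = words[1] if len(words) > 1 else ""
    return verb, noun


def move(room, direction, exits):
    """Sub-unit 2: follow an exit if one exists, otherwise stay put."""
    return exits.get((room, direction), room)


def describe(room):
    """Sub-unit 3: produce the text shown to the player."""
    return f"You are in the {room}."


def play_turn(room, line, exits):
    """The unit one level up: it only composes the sub-units."""
    verb, noun = parse_command(line)
    if verb == "go":
        room = move(room, noun, exits)
    return room, describe(room)


exits = {("cave", "north"): "forest"}   # hypothetical one-exit map
room, text = play_turn("cave", "go north", exits)
```

Each sub-unit has its own small input and output, so any of them can be divided further (downwards) or lifted into its own file (upwards) without touching the others.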

Now, how to harness these challengers to hone yourself into a MASTER....

Tools

Keep the following three rules close to your chest. They are the basic tools for forging YOU, the apprentice who wants to master the machine. Don't reveal them to anyone except other journeyers. They are the result of million-year efforts of the community. The value of the crucible rules exists and can be used independent of my or any other commentary, and should you stray, they just won't shine anymore.

Onward. The following are called "crucibles" because they exist in a complex contradiction that YOU as the programmer must resolve. They are koans. There is no set of rules or guidebook in this terrain because you'll be forging the path, otherwise you'd earn nothing. Meditate on them until you understand them.

GOLDEN CRUCIBLE #1: ClotheYourData and LeaveItNaked.

This is the drill-down and build-up rule.

Modular and object-oriented programming is about "clothing your data": putting a name or conceptual unit around your otherwise naked code or data bundles. This maxim applies to variable names, object names, and the file names holding your source. Apart from proper formatting, this is *the* step that allows your code to be readable "from the inside".

However, when your problem isn't broken down well, it can lead to bulky, poorly-fitting structures that no one wants to re-use (and can often indicate that you're working on the wrong problem; see "folksonomy"). So it has a companion rule: LeaveItNaked (a.k.a. DontContainTheUnknown).

The first example of this tension comes when naming index variables. What should you call one? Well, if you know nothing about it, LeaveItNaked would say to make it as generic as possible, like "i". But unless it's coming from the void, tie the variable name to where it originates (from the mouse: "click"; from the keyboard: "keypress"). If you know it's going to be a number or a character, call it "n" or "c" unless you're writing for beginners (in which case, you can use "num" or "ch" ;^).

Keeping your variable name as *small* and as *meaningful* as possible (another tension to optimize!) will prevent annoying variable re-naming later. The LeaveItNaked rule will help you not to put too many clothes on your data.
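A small sketch of these naming choices. The three functions are invented for the example; only the variable names inside them matter.

```python
def index_of_max(values):
    """Return the position of the largest value."""
    best = 0
    for i in range(1, len(values)):   # a pure index: "i" says enough
        if values[i] > values[best]:
            best = i
    return best


def count_vowels(text):
    """Count the lower-case vowels in `text`."""
    n = 0                             # known to be a number, nothing more
    for c in text:                    # known to be a character, nothing more
        if c in "aeiou":
            n += 1
    return n


def first_click_inside(clicks, width, height):
    """Return the first (x, y) click inside the window, or None."""
    for click in clicks:              # named for its origin: the mouse
        x, y = click
        if 0 <= x < width and 0 <= y < height:
            return click
    return None
```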

So, when you have a meaningful collection of data, put it in a struct or other linguistic force to group data and give it a name. It now becomes a unit. When you have this AND methods or operators to go with it, you have an object: make it so. Else, without a meaningful category or uniting force for some data, a grouping will only confuse everyone else and prematurely constrain your data.
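The progression from naked data to a named struct to an object might look like this in Python. `Point` and `Vector` are assumed names, and `typing.NamedTuple` is just one of several grouping mechanisms the language offers.

```python
from typing import NamedTuple


# Naked: two numbers with no declared relationship. Fine while they stay
# local, confusing once they travel together through several functions.
x, y = 3, 4


class Point(NamedTuple):
    """Clothed: the pair has a name, so it can be passed as one unit."""
    x: float
    y: float


class Vector(NamedTuple):
    """Data plus the operations that belong to it: an object."""
    x: float
    y: float

    def length(self):
        return (self.x ** 2 + self.y ** 2) ** 0.5

    def scaled(self, factor):
        return Vector(self.x * factor, self.y * factor)


p = Point(x, y)
v = Vector(x, y)
```

If no meaningful name like "point" or "vector" suggests itself for a pair of values, that is a hint to leave them naked for now.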

Your programming environment, to make it enormously easier for you, has created keywords and variables: keywords are like machine instructions, variables are names to associate with data. To make things simple for everybody, you're not going to name any of your variables with a name from the list of keywords, ok? And, remember at some point, code is data, too.
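In Python, the list of keywords is available from the standard `keyword` module, so a candidate name can be checked mechanically. The helper name `is_safe_name` is made up for this sketch.

```python
import keyword


def is_safe_name(name):
    """True if `name` is a valid identifier that is not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)
```

For instance, `is_safe_name("room")` is true while `is_safe_name("class")` is false. Built-in names such as `list` are not keywords, so this check will not flag them, although shadowing them causes similar confusion.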

These little programmatic things provided by your language and computing environment allow your concepts to be flattened out into individual working expressions or *sentences* and composed together in a way that is grammatically correct according to your programming language. A symbol (+,?,|,^, etc.), by the way, should be seen as an extra-short keyword.

Working together, the two directives for this maxim produce a virtuous and powerful tension with each other. End result: your code becomes beautiful.

GOLDEN CRUCIBLE #2: DomainsOfUseability and ZeroSuperfluousNames.

Unless you're well along your path already, this one is probably the toughest one to crack, but once you do, your code will become a technical masterpiece. This rule is the abstract-upwards and winnow-the-chaff rule, and is the opposite of the first crucible. Think of Separability of Domains to mean the creation of multiple domains of usability.

You want to maximize lexical separability of program function so that the pieces make the most meaningful units of modular code (i.e. you don't want everything lumped under one main function). BUT you also don't want a lot of excessive names polluting your namespace without good reason. What's the good reason? If there are more than two legitimate uses for your code (even if you don't use it yet), separate it into its own module/object/function with its own name. Done right, there should be little to no code duplication NOR function harnesses or Objects that only get used once.
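A before-and-after sketch of that extraction rule. The functions (`set_volume`, `set_brightness`, `scroll_to`) and their ranges are hypothetical; the point is that `clamp` earns its own name because it has several legitimate uses.

```python
# Before: range-limiting logic written out inline. Imagine the same five
# lines repeated in every other function that needs a bounded value.
def set_volume_before(level):
    if level < 0:
        level = 0
    elif level > 100:
        level = 100
    return level


# After: one named unit, shared by every caller that needs it.
def clamp(value, low, high):
    """Limit `value` to the closed range [low, high]."""
    return max(low, min(value, high))


def set_volume(level):
    return clamp(level, 0, 100)


def set_brightness(level):
    return clamp(level, 0, 100)


def scroll_to(line, last_line):
    return clamp(line, 0, last_line)
```

Had only one function needed the limit, the inline version would have been the better choice: one fewer name in the namespace.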

Apart from LISP machines, all computers use variables (names on data). Names for one purpose should be distinct from names of a different purpose (SeparabilityOfDomains). Constants, for example, shouldn't look like variables. Syntax highlighting can't do this for you, so you have to perform this in the source text itself. Names act as an anchor. They are the gravitational attractors for other eyeballs which may be circulating around your code, so make them count. Languages are defined by how well they allow you to do this (having classes for objects, for example).

Besides careful name-choice, you have three techniques available for making names distinct and more meaningful: capitalization, parts of speech, and using special characters. These are minor, compared to how well your language assists you in separating out different domains, but are included here for completeness.

  1. Capitalization. Ideally, you'd use case to indicate "Parent", but this isn't practical in a real data ecosystem where objects are being examined and re-used, so use it in your source text to indicate something you intend to be passed around. That communicates something valuable to anyone who reads your code also -- it's like saying "ready for work". It's like the act of giving birth for your object (or module) to the outside world. And rather than ALLCAPS on a constant, use _underscores_ which also makes constants easier to find -- keeping them lowercase if they're for "internal use only".
  2. Use verb-words for methods/functions, nouns for classes/variables (unless they're a pointer *to* a function, in which case call it a "pointerToX" or "pFunction"). This is a general rule, but for Objects that are designed to recede to the background and act as a go-between (like a network socket, for example), a verb might be more appropriate. Of course, all this is prior to satori. After enlightenment, you know that you only need << (in), >> (out), ? (query-name), % (clone) operators and that ObjectComposition will do the rest. The clone operation may even be provided by the interpreter environment.
  3. If your language allows it, "?" and "!" characters at the end of function names can signify that they're a boolean query or will write-in-place, respectively.
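A loose sketch of these conventions in Python. Python does not allow "?" or "!" in names (Ruby and Scheme do), so the `is_` prefix and `_inplace` suffix below are stand-ins assumed for this example, not part of the rules above. The `Inventory` class is hypothetical.

```python
class Inventory:                  # noun, capitalized: meant to be passed around
    def __init__(self, items):
        self.items = list(items)  # noun for the data it holds

    def is_empty(self):           # would be "empty?" where "?" is allowed
        return not self.items

    def sort_inplace(self):       # would be "sort!" where "!" is allowed
        self.items.sort()


def count_items(inventory):       # verb-first name for a function
    return len(inventory.items)


bag = Inventory(["rope", "lamp"])
```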

These maxims will allow your code to be readable from the outside (the above view). Here endeth the lesson on separability.

GOLDEN CRUCIBLE #3: TightCoupling (code <--> documentation <--> tests) and Brevity.

The first rule of documentation is ProperNaming, so be sure to follow the first two golden ways.

Let your ReadyForWork objects have comments or DocStrings underneath their definition header to form a self-encapsulated and documented, re-usable object.

By encapsulating your documentation with the code, you help ensure it stays up-to-date. Languages like Python define this into the language with DocStrings. Python DocTests go further and facilitate testing by allowing test code in your documentation. Read the Python DocTest module source by Tim Peters for some very good reasons why it's good.
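A minimal sketch of code, documentation, and tests living in one place, using the standard `doctest` module. The `describe` function is a made-up example, and driving the parser and runner by hand is only one way to execute the examples.

```python
import doctest


def describe(room):
    """Return the sentence shown to a player standing in `room`.

    >>> describe("cave")
    'You are in the cave.'
    >>> describe("")
    'You are nowhere.'
    """
    if not room:
        return "You are nowhere."
    return f"You are in the {room}."


# Pull the examples out of the docstring and run them as tests.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    describe.__doc__, {"describe": describe}, "describe", None, 0
)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)   # a (failed, attempted) pair
```

In everyday use, calling `doctest.testmod()` at the bottom of the file or running `python -m doctest yourfile.py` does the same job with less ceremony. If the code changes and the docstring does not, the test fails, which is what keeps the three coupled.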

This MAXIM will make a data ecosystem fun and complete. Good tests are instructive to everyone who uses your object. Good object/module documentation makes it easier and feels safer for others to re-use your code.

End result: You become a bona fide steward of the programmer and data ecosystem and can take part in the management and value-generation of petabytes of knowledge amassed by humanity. Sweet!

Conclusions

  • The right outcome of the battle between You and RushingToExecutable is Maintainability (of your code) and Reserve (of yourself).
  • The right outcome of the battle between You and Cruftiness is Parsimony (read: simplicity + harmony) and Mastery.
  • Together the end result is Elegance and you become the Victor! Now go get your Eye of the Tiger and roll!

Attributions

Special thanks to the denizens of RefactorMercilessly, Bjarne Stroustrup for the concept of encapsulation, and Niklaus Wirth for the concept of modular programming, and many other old-school programmers who forged the way. Also, WikiWords are documented at the wikiwikiweb: [[1]]. If beginners come your way, direct them to the OneTruePath.