Hacking with the Tao
<rant>This guide is for those who want to tap the power of modern computing and the Internet and perhaps have been getting the feeling that it's all been co-opted by jabbascript, login portals, and e-commerce (perhaps even by... systemd?). What is this untapped power, you might ask? Of a million nodes of interconnectedness, unused computer cycles, and giga-storage? That power is the same power of the original internet -- a concept in computer science called the graph: of inter-linked content and of peer-to-peer value generation. These two give you polynomial, O(n^2 * n^2), value generation. Add the ability to form arbitrary groups (mailing lists, social networking, etc.), and you get the power of Reed's Law and super-exponential value generation, O(2^n) -- the power of the original internet. Add transactions and a new economy and you can get O(n^n) -- you've re-made the revolution. But where can you get that now? Nowhere. In this century, the internet has been reduced to about O(n^2) -- if you're internet-savvy. That's the lower bound we should be seeing now, and we're not, thanks to monolithic sites locking you in to compete for your attention and the fact that hardly anyone uses the power of hypertext anymore. It's all text blogs and commerce, "glam and sham" -- repeating, as Marshall McLuhan said: "The content of a new medium is always the old one." Most users aren't savvy enough to tap even Metcalfe's value out of the net; they're more like O(n log n). But hey, surely more than one of us kept the memory of the true power before it got bogged down into "browser wars", right? Hence, this guide! </rant>
Computers and their programming generally fall into one of these rough categories:
- scientific computing (with a focus on performance and verisimilitude of simulation),
- business use (whose essential need is data stability and security),
- learning and entertainment,
- making and participating in a shared programmer "ecosystem" (Free software, ObjectOrientedProgramming).
This guide is mainly for those in the last group, check it. We're in the Internet Age here, not mainframe computing. Gosh! If you have built Operating Systems or socket network applications, you are already halfway down the Path. The hope of this guide is to be so abstract as to force you, the computer programmer, to think of programming language design itself as a part of the programmer's challenge -- that the program is only part of the equation, and the programmer is a type of crafter, with the language as the tool and the machine as the "uncarved block". The reason: data and their relationships are growing faster than any programmer's code. There are exabytes of data out there now, mostly feral. As such, this is for those who want to go all the way and forge it into something insanely great. Something beyond what any of your leaders have even thought about. This is what I'm talking about. We can do it.
Oh, if you're one of the JS flybois, this guide is for programmers, not brogrammers (pronounced broewwgrammers); this might be too much for you, please get back on the OneTruePath. Neophytes, feel free to use that link also and upgrade your connection to the machine.
Most of what passes for "software engineering" doesn't integrate all three major specializations of the field: smart architecture, sound engineering, and dependable construction -- just like erecting a large building. No, most programming teams are in continual stress that runs between order and chaos, pushed down by IQ-deprived managers who have little mastery of the problem domain itself. You can't do your data architecture if you don't know the domain in question. And you can't do good engineering if none of your coders has a clue about machine-level instructions and architecture. Making good software is just as intensive and specialized as erecting a building -- a building you have to live in, too. If you push bad software, you'll generally find the roof collapsing over your proverbial head. This document teaches the three specializations, plus one more if you know how to look: project management.
The way to mastery is a long path if you don't have a guide. It's not something that the whole canon of certifications can give you. Wrestling with the allies will make that long path shorter and straighter. Here, the computer is the primary feedback on your success. It can hone your critical-thinking skills like no teacher can, as it allows no sloppiness and gives no favor. If you're content being a mediocre programmer, go get your certification, display it on your resume, and be done with it. Most people succumb to sucking it for the anti-Christ, generating gigabyte silos of data and code that no one else can use, or they content themselves with an academic understanding and spend their life writing theoretical papers without really knowing the craft of the true programmer. Those are hardly noble enough destinations for the most powerful resource of knowledge-sharing and data processing possessed right NOW.
For those who'd like to jump ahead, start up your abacus and study the Gospel of Object Oriented Programming. If you don't know why you want Object Oriented Programming, you're probably coding micro-controllers (or LISP). The former arena can simply be termed "programming" rather than "software engineering". As for LISP, use it to learn to think like a master, and then drop it -- it doesn't have the object architecture or type system for what we need. Implementing TightCoupling (with those silly parentheses), for example, is like wearing a bikini to a formal. LISP is the bikini: efficient in one dimension only. Your data will get shuffled around in a large data ecosystem, and your tight lisp data structure will get violated in every way. ClotheYourData: non-existent.
Like any worthy path, there are many counterfeits. Some of them speak as if they have the Way, but when the glamour fades, you'll be abandoned just at the point where the road forks into a thousand different paths. And then what are you going to do? For all the programmers out there, there are very few masters. You don't have to believe me, but the four allies are all you need. Just bring this little guide with you and test my words with your own experience...
...But, hey, if towering monoliths of proprietary data are good enough for you, you can click your browser's "back" button now. Otherwise, plan on growing your neck-beard (or female equivalent) and let's go.
You shouldn't go into battle without knowing your opponents. On the way to becoming a mistress/master of the machine, you'll see there are only two real adversaries, but they are always alongside you, often coming from unexpected corners. Like Death, they are invisible and ready to consume you.
ADVERSARY #1: RushingToTheFinish. You're excited to get a program to work as soon as possible! But, hey. While everybody likes to see immediate results, down the road those short-term savings won't be extensible -- if they're even comprehensible. Most code should get re-used or appreciated, otherwise why write it? We're in the Internet Age here. Most highly-specialized, highly-worked routines are brittle and break when used in someone else's code. Meditate on this instead, grasshopper. Hidden blow: prematurely optimizing your code.
The best counter-weapon for this battle is TestDrivenDevelopment.
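Here's what that counter-weapon looks like in miniature -- a hedged sketch in Python, with an invented `slugify` routine standing in for whatever you're building. The test exists before the code does; the code only earns its keep by passing.

```python
# Test-first sketch (the routine and its spec are invented for illustration):
# write the test before the code exists, watch it fail, then make it pass.

def test_slugify():
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  spaces  ") == "spaces"

# Only now write the simplest code that satisfies the test -- no premature
# optimization, because the test defines "done".
def slugify(title):
    words = "".join(c if c.isalnum() else " " for c in title.lower()).split()
    return "-".join(words)

test_slugify()  # silence means passing
```

Run it again after every change; the moment you rush, the test catches you.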
ADVERSARY #2: CruftMonster. You made yourself a nice groove; it works ("AFAIK", right?). You don't want to do the work to re-write it. That, friend, is the scent of the CruftMonster. OR maybe this: You wrote some code a while back. You're not sure how it works anymore, so you don't want to change it, eh? You pussyfoot around it as if treading on the ice atop a deep lake that may crack at any moment. That's called cruft. I'm assuming long-term time-savings is a shared value in this guide, yes? You're a software engineer, not just a brogrammer working for LOC? So, you bite the bullet. ==>Guided meditation<==. Hidden gotcha here? Believing in all your working code. Nothing spells "cruft" like True Believers. They will pass on their problems and go out for a beer while the tech support crew gets hammered. That's why there's The Gospel.
The best weapon for this battle is RefactorMercilessly which you can draw out of your bag of ExtremeProgramming.
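A minimal sketch of RefactorMercilessly in Python (the report routines are invented for illustration): two near-duplicate routines get merged into one named abstraction, and the old behavior is guarded by an assertion so the CruftMonster can't sneak in during the cut.

```python
# Before: a near-duplicated report routine -- cruft waiting to happen,
# because every fix must be made twice.
def report_users(users):
    lines = ["USERS", "=" * 5]
    for u in users:
        lines.append(" - " + u)
    return "\n".join(lines)

# After refactoring mercilessly: the shared shape gets one name.
def report(title, items):
    lines = [title, "=" * len(title)]
    lines += [" - " + item for item in items]
    return "\n".join(lines)

# The existing tests guard the behavior while you cut the duplication.
assert report("USERS", ["ada", "bob"]) == report_users(["ada", "bob"])
```

The assertion is the ice-axe: it lets you walk on that frozen lake without guessing where it cracks.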
Don't enter a battle arena unprepared. Without exaggeration, these two have captured many souls, now lost. Those who succumb to the first, hardly make it out of Scriptland -- never even knowing they've become victims. Ask them if they're good programmers, and they say "Yes, sure." Those who succumb to the second -- help me Thor -- rarely, if ever, come back. They curl over and merge into their cubicle like fat pupae. Got it?
After many decades of wrestling with these enemies in the field, some powerful allies have been found. Keep them close to you at all times. They should work on you even when your desk-work is done. Despite their name, you shouldn't assume they're friendly. They are primal forces for forging YOU, the apprentice, who wants to be a MASTER. The value of the allies exists independently of this commentary. Treat them well and they'll stay dependable guides. Used properly, the allies harness the forces surrounding your task and forge you.
Being somewhat independent of your personal thoughts, the allies are "crucibles" -- they exist in a complex contradiction that YOU as the programmer must resolve. They are koans. There is no set of rules or guidebook in this terrain because you'll be forging the path yourself, otherwise you'd earn nothing. Meditate on them.
- NameFu :: ClotheYourData <==> LeaveItNaked.
- LogicNinja :: TightCoupling <==> Compactness.
- ModularityDojo :: ZeroSuperfluousNames <==> SeparabilityOfDomains.
- AikidoGate :: ReduceMemoryFootprint <==> MinimizeProcessing.
Now these, plus some knowledge of digital logic, are wholly sufficient for the adventurous programmer to master the craft and call themselves an engineer. The last crucible should be read in the voice of Blade Runner "watching C-beams glitter in the dark near the Tannhauser Gate"; it decides who's in and who's not. Read on, though, so you can see the path a little clearer.
Those who put faith in abstractions from academia have confused the path. So let's get this straight: you're on a vonNeumann machine, not a Symbolics. That means computer programming consists of loading a simple programming language statement and any data that goes along with it, executing said statement (raising any errors or exceptions that occur upon such execution), continuing forward -- or branching on some condition -- and repeating this sequence. It's like a little Turing Machine, the basis for much of Computer Science, that. It may or may not finish. All that's to say: let's not go on a joy ride upon the language-du-jour, 'k? We will, at some point, worry about memory and CPU performance as all good programmers should; we're just saving it for last -- just as Master Bentley taught us.
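To make that concrete, here's a toy fetch-decode-execute loop in Python. The instruction set is invented for illustration; a real vonNeumann machine does essentially this in silicon, billions of times per second.

```python
# A toy fetch-decode-execute loop (instruction set invented for illustration):
# each step loads a statement, executes it, and continues forward --
# or branches on some condition.
def run(program):
    acc, pc = 0, 0                      # accumulator and program counter
    while pc < len(program):
        op, arg = program[pc]           # fetch and decode
        if op == "LOAD":   acc = arg
        elif op == "ADD":  acc += arg
        elif op == "JNZ":  pc = arg - 1 if acc != 0 else pc   # conditional branch
        elif op == "HALT": break
        pc += 1                         # continue forward
    return acc

# Count down from 3 to 0: ADD -1, jumping back to the ADD while nonzero.
print(run([("LOAD", 3), ("ADD", -1), ("JNZ", 1), ("HALT", 0)]))  # -> 0
```

It may or may not finish -- change the JNZ target and you've built yourself a halting-problem demo.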
Also, don't lose sight of the fact that there aren't really any 1s and 0s running around in the machine. Those are just helpful abstractions to develop consistent behaviors that can be pushed into computation. They could have been made with red and blue lights -- whatever qualities pass the tests: disting0ishab1e, transf(x)rmable, pass<->able, and STORable -- the same things needed for defining a Turing Machine. For our general-purpose computer, we need one more thing: repeeeatable -- like a billion times.
The major tools for writing your program are Divide, Conquer, Encapsulate, and Optimize. The word "conquer" should be considered in the sense of "ratiocination": the application of exact logic. Apart from your programming environment (which should be a reflection of your relationship with these), there is nothing more. Take the cross-product of each combination, and you have the complete toolkit (e.g. Divide x Optimize = Profiling).
The Outer Process
The full stack for true coding consists of various layers, from the highly abstract at the top down to the very concrete, where your actual physical computer is sitting and where your code needs to run. From the top, you have the Capt. Dynamite --> Interpreter --> Compiler --> Hardware. Web programming (once again) conflates layers two and three and adds another layer between the compiler and the hardware called a "virtual machine". So the word "compiler" is a bit ambiguous, much like "interpreter", which in this context is not to be equated with "interpreted languages", but is more like "parser". This document is strictly for non-web programming, but capable of creating an application that can create a web programming environment (note to JABBAscript flybois: YOU won't be doing that anytime soon).
For the four terms listed previously, each encloses something, like this (respectively): (concepts)--> (expressions)--> (imperative statements and simple boolean expressions)--> (digital logic). That is, the Captain (YOU) processes concepts, the Interpreter processes expressions, the Compiler converts these into imperative statements and simple boolean expressions, and the Hardware takes and runs the digital stream. The force that links the Captain to the Interpreter is called programming, from Interpreter to Compiler is the parser/lexer, and from the Compiler to the Hardware you have the loader at program execution which is generally determined by your Operating System (or machine room ;^) environment. Beyond that, should be the unwavering execution of physical law.
An assembler is partly a low-level compiler in this definition, because it generally takes imperative instructions (most are not expressions) like MOVE, JUMP, XOR and puts them into machine code (MOV ax, 1 on an 8-bit machine might become 0A01). This (immediately-transformed) binary machine code operates through the logic gates on the CPU in whatever complex ways the manufacturer has stamped upon it and specified: generally relying entirely on the resolution of the voltage gradients your instructions have set up within. You could then almost say that the universe actually does the computation (there's still something that has to drive the clock). That whole endeavor is an art in itself, called "computer engineering". Additionally, an interpreted language like Apple BASIC has to be compiled at the point you type RUN. This is true regardless of whether the language is called an "interpreted language", because the CPU can't process and never sees ASCII characters (like "GOTO 100"); it only deals with 1s and 0s (or *ultimately* voltage gradients).
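Here's a toy sketch of that translation step in Python -- the opcode numbers are invented, not any real CPU's, but the shape is right: mnemonics become opcode bytes, and operands become the bytes that follow.

```python
# A toy assembler sketch (opcode numbers invented for illustration):
# the assembler's whole job is a mechanical translation from mnemonics
# the human can read to bytes the logic gates can resolve.
OPCODES = {"LOAD": 0x0A, "ADD": 0x1B, "JMP": 0x2C, "HALT": 0xFF}

def assemble(lines):
    code = bytearray()
    for line in lines:
        mnemonic, *operands = line.split()
        code.append(OPCODES[mnemonic])             # opcode byte
        code.extend(int(o) for o in operands)      # operand bytes
    return bytes(code)

print(assemble(["LOAD 1", "ADD 2", "HALT"]).hex())  # -> 0a011b02ff
```

Real assemblers add labels, addressing modes, and a symbol table, but the heart of it is this lookup-and-emit loop.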
So that's the stack. Now what do you do?
Gathering the Four Forces
Divide and Conquer you know by now; they are the basics of every computer science course. Basically, instead of tackling the large problem, you break it down into smaller problems that you can solve, and conquer those. So, to understand the other two.... What does OOP get you, for example? Modularity. This is what I'm calling "encapsulate". This was previously gained by named functions (and files). Include data with your functions and you have encapsulation. Put a name on it and you have an object. OOP begins there. The idea is to have everything relevant to the task travel around with some handle by which it can be referenced.
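A small Python sketch of that progression (the account example is invented): first the data lies naked next to loose functions, then the data and the routines that understand it travel together under one handle.

```python
# Before: naked data, and any code anywhere can mangle it.
balance = 100
def withdraw(amount): ...   # must trust every caller to respect the rules

# After: data + functions under one name = an object. The invariant
# ("never go negative") now travels with the data itself.
class Account:
    def __init__(self, balance):
        self._balance = balance       # clothed: outsiders go through methods
    def withdraw(self, amount):
        if amount > self._balance:
            raise ValueError("insufficient funds")
        self._balance -= amount
        return self._balance

acct = Account(100)
print(acct.withdraw(30))  # -> 70
```

The handle `acct` is the point: everything relevant to the task can now be passed around, referenced, and trusted as one unit.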
With Encapsulate, you're synthesizing lower-level commands and expressions into higher-level abstractions to make a referable concept. Each function, routine, class, and object presumably represents a solution and higher-order abstraction compared to what came before. Most domains are complicated, and in order to architect your code well, you need to learn how to see it from 100mi high. This arena has rather peaked with the ObjectOrientedProgramming paradigm, but the covert purpose of this manual is to get beyond their plateau. The basic idea is to eliminate duplication of code by moving it to a single named object for its common purpose -- unless it is semantically separate.
Optimize was always there too, but as a generally-unspoken value among the masters. With high-level languages and web programming, optimization seems to have gone extinct, but it remains the sign of a genuine master. In isolation, optimization is a false peak. It is seductive because it's so concrete, so measurable. But before you hone your code, hone the right code. Are you conceiving of the problem correctly? Learn Big O notation.
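A hedged illustration in Python of honing the *right* code: before shaving cycles off a lookup loop, ask whether the loop's shape is wrong. Same answer, different O.

```python
# Conceiving the problem correctly beats micro-optimizing the wrong loop.
# Membership tests against a list scan it every time; a set hashes once.
import time

needles, haystack = list(range(2000)), list(range(100000))

t0 = time.perf_counter()
hits = sum(1 for n in needles if n in haystack)       # O(n*m): scan per lookup
t1 = time.perf_counter()

haystack_set = set(haystack)                          # O(m) build, once...
hits2 = sum(1 for n in needles if n in haystack_set)  # ...then O(n) total
t2 = time.perf_counter()

assert hits == hits2                                  # same answer
print(f"scan: {t1 - t0:.4f}s  set: {t2 - t1:.4f}s")   # different O
```

No amount of tightening the first loop will catch the second; the win came from re-conceiving the data structure, not from polishing cycles.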
But these four tools should be seen as mutual aids, paired on two separate axes. Your program should be the circle that encloses them. The four allies are related but stay on the outside, "exerting the pressure". Bonus credit if you find that the four allies and these four tools are related.
The task of mastering the machine has already started in several ways:
- Your language designer already divided the problem of getting computers to do your bidding and you have learned how to command it.
- Secondly, the history of computing has already conquered the whole domain by separating I/O from Processing in order to make a General Purpose Computer. This is a giant help and saves you buttloads of time wiring I/O junctions to your panels.
- The document you have here is your path to optimize the process of utilizing the machine to its full potential.
- Lastly, you could be the encapsulation who's going to make it happen!
Test-driven development with doctests is the catalyst that glues these forces together to create a data/code ecosystem. It is both a security blanket and documentation for others and yourself. That blanket is helpful because of the large disconnect between your code and its results. By getting rid of any ambiguity, you can move on to other tasks, allowing a faster cycle of: coding -> running -> feedback -> re/architecting. That is the essence of Agile and eXtreme Programming, by the way. It will accelerate your progress towards the "insanely great".
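For instance, here's an invented `median` function whose docstring examples are both the documentation and the test -- run them with `python -m doctest yourfile.py`:

```python
# DocTests in practice: the examples in the docstring are documentation
# other programmers can read AND tests the machine can run.
def median(values):
    """Return the middle value of a sorted copy of `values`.

    >>> median([3, 1, 2])
    2
    >>> median([4, 1, 3, 2])
    2.5
    """
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

if __name__ == "__main__":
    import doctest
    doctest.testmod()   # silence means every example still tells the truth
```

Another programmer reading this never has to _wonder_ what the function does with an even-length list; the example shows it, and the test runner keeps the example honest.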
Practically, there can be several different "breakdowns" of how you go about structuring your program, but if you're constraining yourself to a particular programming language, choose the axis in which your language was designed: concurrent vs. sequential, procedural vs. functional, iterative vs. recursive dimensions, et cetera. If you're really going "hard-core", choose a hardware architecture that fits your application. Otherwise, if you're not constraining yourself, choose Python/C and make it easy on yourself.
Understand how to construct expressions in your language. Understand how to join these "sentences" together to tell the logical story you wish to inform the computer. Learn how to save that, compile it, and run it. Know that, just like in literature, there can be several ways to fashion a sentence and say the same thing, but pick the best way: accurate, precise, clever, and brief. For Object Oriented Programmers, you should check out the Gospel, so you can be sure of salvation.
Basically, YOU are going to constrain yourself at the top by keeping the purpose of your program in mind, while your language will constrain you at the bottom, forcing syntactically-valid expressions. That leaves the middle with the most degrees of freedom, so keep modular programming in mind. The first thing to do is to start there, writing boilerplate. If your language is as easy as Python, you can write your function stubs and tests first (showing and documenting all the behavior you want to be considered passable) and save the first program that will run (passing the test). If it doesn't give you the right result, you've gained some knowledge and continue the cycle -> go back and correct it. Continue fixing your program until it passes all tests. The general idea is that you keep constraining yourself by dividing the problem with your function stubs and tests, while the design of the language keeps constraining you at the bottom, forcing you to conquer your logic. In the middle, you encapsulate -- forming abstract, categorical groupings (named functions, classes, modules) -- while simultaneously honing your code towards greater perfection until it works and becomes a work of art that other programmers can appreciate and re-use. Add contracts and you've put the polish on your code and can ship it out to the wild. You're mastering coding, memory use, and object communication, and sharpening your code towards the perfect amount of terseness (truly vorpal code). This scales from the CPU to larger and larger units (modules, files) like a fractal until you're at the top and can just send the data, that is: "run", and like a perfect chain of cause-and-effect, it goes. That's the whole process in a paragraph. Click on each link for the 3-hour tour, Skipper.
If you're not coding alone, you do it like this. It's called PairProgramming. One of you writes all the tests and the other writes the code. Let the one who conceived of your initial DataModel rest while the others start writing the tests. As soon as you have the first tests written, the resting person starts writing code.
Started off with the wrong data model or got a whole team? Whiteboard out your main architectural blocks as an outline until you get to a fine-grained enough level that you can start writing tests. Put each block into a file (for your revision management system) and let programmers decompose different spots until you can combine files and have a workable app.
TestDrivenDevelopment without tools like Python's DocTests becomes its own bureaucracy. The goal of the GPL and a data ecosystem is community. That means re-useable code/objects, which means trustable and understandable with a minimum of effort. Readable examples, like DocTests, are other programmers' best path to understanding and trusting your code. When you've got an ecosystem as big as the Internet, you don't want to have to _wonder_ about the logic in other objects. DocTests give you that, and you get code-testing as a bonus -- truly great.
Note that I haven't said anything about revision management (where I'd recommend svn, since it has the best partnership with the allies) or code editors (two other hidden "attractors" of programming), because your relationship to the allies determines these. Remember, they are what's going to hone you into a master -- not the language and not your tools. These essential tools help you with ClotheYourData, SeparabilityOfDomains, TightCoupling, and someof(MinimizeProcessing|ReduceMemoryFootprint), but this only covers half of the crucibles, at best. So, you have to figure out whether that's your path or not -- nothing else.
There is another set of tools, commonly filed under code analysis, but I haven't seen anyone implement them well. Most programmers simply aren't good enough to architect large code bases for complex problem domains as might be encountered in big business, and most tools mislead them because their makers don't understand that either (hence Agile's ascendance). Nonetheless, like code profilers, they could offer the programmer important lenses into their semi-linear (2-d, really) codebase. One tool, for example, would be a code visualizer that graphs all objects and their function calls into a 3-d visualization, with edges to indicate where calls were made. You could even step through the code and watch the action move around between screen objects, clicking the objects to inspect them, etc. Another would add a heat map to your profiler (or even the aforementioned visualizer) with degrees of color, so you can see the degree of usage of every function or object call in a way that the visual cortex can process immediately.
Anyway, stamp a version number on your program when you get something interesting (<1.0 for non-working apps). Use major numbers (before the decimal point) for incompatible version changes (old input files aren't readable, for example) and the minor numbers (on the right of the decimal point) for minor, compatible changes. Stick with this mechanism and there's less explaining to do. You can have any number of digits to the right of the decimal point, so you won't run out of minor versions.
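A small Python sketch of that discipline (helper names invented): compare versions as integer tuples, and let the major number alone decide compatibility.

```python
# Major.minor versioning as described above: the major number signals
# incompatible changes; the minor number signals compatible ones.
def parse_version(v):
    major, minor = v.split(".", 1)
    return int(major), int(minor)

def compatible(old, new):
    # Old input files survive as long as the major number hasn't moved.
    return parse_version(old)[0] == parse_version(new)[0]

print(parse_version("1.12") > parse_version("1.9"))  # -> True (1.12 is newer than 1.9)
print(compatible("1.9", "2.0"))                      # -> False (old inputs may break)
```

Note the tuple comparison: as integers, 1.12 correctly outranks 1.9, which a naive float comparison would get backwards -- and there's no running out of minor versions.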
When you correctly apply the concept of using and understanding the allies while you code, you'll see that the four allies are perfectly correlated, respectively, to the four processes and have always been working on the programmer's soul. Any stagnation you have is because you've let one of the enemies into your practice. If you find yourself at a programming or personal impasse, explore the meditations under "adversaries". But, when you're looking for an environment with other like-minded coders, to master your techniques and work on some awesomeness, try joining the Church of Free Software. Or, perhaps drop it all for awhile and distract yourself, work on an engine, or Hack the Law. Ultimately, programming is just a bridge, a means to an end where computers may not even be necessary anymore. It's at that point (and that point only) that you have to decide whether you've reached the end of your path.
Until then: lather, rinse, repeat. Chill out with Lonny Zone when you're not doing that. You program and know it because there's nothing better to do with such awesome tools. There is indeed something else to convince you, but I'll have to save it for when you reach the same conclusion that I did. ;^D
The Chill-out Lounge
You have working code. It's documented and includes tests, so others know it will work as they expect. It has reusable functions, objects, and/or modules, demarcated from internal functions and objects by some syntactical flourish, and it does something interesting or re-useful. So...
The right outcome of the battle between You and RushingToTheFinish is Maintainability of your code and personal Reserve.
The right outcome of the battle between You and the CruftMonster is Parsimony (read: simplicity + harmony) and personal Mastery.
I'm going to give you another lens through which to view the value of the tools you have at your disposal: the GeneralPurposeComputer and the Network. Earlier I suggested that these, with transactions and the creative economy, will give you O(n^n) value. Since this was not an algorithm analysis, we should be using big-omega (Ω), as the claim was a view from the outside, from the top, but I don't have time to update this document. Here's the analysis from the data scientist's perspective (I'll use the term m to distinguish from the usage above):
The basic programming environment gives you O(m) value out of the computer by turning repetitive tasks into simple computer routines that, thereafter, make them constant complexity (done in the mind of the coder). The other complexity being managed is your data. There, OOP and named functions will simplify things further to get you an additional O(m) value. The value from these two tools I'll call looping and grouping. (What is m, then, you might ask: m represents the complexity of the world, and you are condensing this complexity into something simpler and more manageable -- that is what all these electronic gadgets are doing, generally. Unless you're going to simulate the whole universe, I'd estimate m on the scale of millions.) Combine them and you don't get m + m or even m * m. No, OOP allows you to create arbitrary groupings, so you get 2^m value -- that's about enough to simulate a universe. Why is it geometric? Reed's Law again. Their value is like having two axes/dimensions rather than one. O(2^m): that's the general value of the programming enterprise, and you can see how Steve Jobs and IBM made the personal computing revolution.
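A quick back-of-the-envelope in Python for small m (toy numbers, not a proof) shows why the grouping term dominates: pairwise links grow like m^2, while arbitrary groupings grow like 2^m.

```python
# Metcalfe-style pairwise value (m^2) vs Reed-style group value (2^m):
# the exponential overtakes the polynomial almost immediately.
for m in (4, 8, 16, 32):
    print(f"m={m:>2}  m^2={m * m:>6}  2^m={2 ** m:>12}")
```

By m = 32 the group term is already four billion against a thousand; scale m to the millions and the comparison stops being printable.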
There's more, though. We're not in the 70's. Again, I'll change the term used (p) and now add the social scientist's perspective. We've got p programmers, and so get p as a multiplier, each one having access to this 2^m value of the general purpose computer. Adding the network contributes another p factor via the time savings of sharing data + code. This is linear because the computer just sees both of these as the same (a la VonNeumann); it's simply transferring them across without processing them. The data ecosystem combines these two into O(p * p), or O(p^2) -- Metcalfe's basic value of a fully-connected network, as you should expect, but p here is the people.
Now, how do you leverage the value of the Internet revolution for software engineers? How do you combine these two separate tools offering O(p^2) and O(2^m) value to express their awesome, theoretical power? Again, they are orthogonal to one another. The former is like two axes on the plane; the other is like the objects created thereupon. So, cancel out the common exponential in each one and you get O(p^m). (That's a neat trick, eh?) With the vastness of the internet and the world at large, as people continue generating data over time, m and p approach parity. In other words, the needs of a functioning world government and its complexity of data will tend to make m (the quantified or ordered complexity of the world) rise with p (the population). Therefore, I'll argue that the value of our project here is O(n^n) (distilling back to our original term: n). Radical. And that's not even all, because we can be free agents in a thriving macrocosm (offering another cross-product with a separate O(n^n) system) with our technology. That's O(n^n). Tetration. Divine Plans. Grok it.
Whatever the case, once you make it to the end of this road, you'll have found that the enemies and techniques of programming are applicable to life itself: don't rush, but beware of your own cruft. You will have achieved Elegance and have become the Victor!
Perfection is attained not when there is nothing more to add, but when there is nothing left to take away. -- Antoine de Saint Exupéry
Now sacrifice your Amulet of Yendor and let's go!
Special thanks to the denizens of RefactorMercilessly and the wikiwikiweb (creator of WikiWords and home of eXtreme Programming), TimPeters for inculcating Python with some powerful Zen via DocTests, Bjarne Stroustrup for providing the concept of encapsulation to me, Niklaus Wirth and PLU for the concept of modular programming and Big O notation, Bell Labs for inventing C and many other old-school programmers who lit the Way. Inspiration for this document and format from don Juan (Carlos Castaneda) and the Tao te Ching. See also the C FAQ for some powerful wizardry. Folks from wikiwikiweb, feel free to add your name to this document below -- you helped make it happen. Sunlight Labs: Get in touch! Xer0Dynamite (talk) 15:00, 17 September 2015 (UTC)
Other credits: .add.your.name.here.