Hacking with the Tao
Zen Way to Code, v.20161219
<rant>This guide is for those who recognize the untapped power of modern computing and the Internet and perhaps have been getting the feeling that it's all been co-opted by jabbascript, login portals, and e-commerce (perhaps even by... systemd?). What is this untapped power, you might ask? Of a million nodes of mostly-unused computer cycles, storage, and interconnectedness? That power is the same power of the original internet -- a concept in computer science called the graph: of inter-linked content and of peer-to-peer value generation. These two give you polynomial, O(n^2 * n^2), value generation. Add the ability to form arbitrary groups (mailing lists, social networking, etc.), and you get the power of Reed's Law and exponential value generation, O(2^n) -- the power of the original internet. Add transactions and new economy and you can get O(n^n) -- you've re-made the revolution. But where can you get that now? Nowhere. In this century, the internet has been reduced to about O(n^2) -- if you're internet-savvy. That's the level we had with the telephone and television, and it's due to monolithic sites locking you in to compete for your attention and the fact that hardly anyone uses the power of hypertext anymore. It's all [2-d] text blogs and commerce, "glam and sham" -- repeating, as Marshall McLuhan said: "The content of a new medium is always the old one." Most users aren't savvy enough to tap even Metcalfe's value out of the net; they're more like O(n log n). But hey, surely more than one of us kept the memory of the true power before it got bogged down into "browser wars", right? Hence, this guide! </rant>
Computers and their programming generally fall into one of these rough categories:
- scientific computing (with a focus on performance and verisimilitude of simulation),
- business use (whose essential need is data stability and security),
- learning and entertainment,
- making and participating in a shared programmer "ecosystem" (Free software, ObjectOrientedProgramming).
This guide is mainly for those in the last group. If you have made Operating Systems or socket network applications, you are already halfway down the Path. The hope of this guide is to be so abstract as to force you, the computer programmer, to think of programming language design itself as a part of the programmer's challenge -- that the program is only part of the equation and the programmer is a type of crafter with the language as the tool and the machine as the "uncarved block". The reason: data and their relationships are growing faster than any programmer's code. There are exabytes of data out there now, mostly feral. As such, this is for those who want to go all the way and forge it into something insanely great. Something beyond what any of your leaders have even thought about. This is what I'm talking about. We can do it.
Oh, if you're one of the JS flybois, this guide is for programmers, not brogrammers (pronounced broewwgrammers); this might be too much for you, please get back on the OneTruePath. You traded your passions for glory. Survivors of that bullshit <*ahem*> or other neophytes may want to use that link also.
For those who'd like to jump ahead, please start up your abacus and study the Gospel of Object Oriented Programming. If you don't know why you want Object Oriented Programming, you're probably coding micro-controllers (or LISP).
Making software is like erecting a building. You have to have architecture, engineering, and construction -- and then you have to live in it. This document encodes all three, plus one more if you know how to look: project management.
Like any worthy path, there are many counterfeits who claim to be a master. Some of them speak as if they have the Way, but when the glamour fades, you'll be left with just trinkets. For all the programmers out there, there are very few real masters. You don't have to believe me either, and should test my words with your own experience. Just bring this little scribble with you along the way and see for yourself...
The way to mastery is a long path. It's not something that the whole canon of certifications can give you. This guide is here to make that long path straighter, less windy. Here, the computer is your primary teacher. It can hone your critical-thinking skills like no teacher can, as it allows no sloppiness and gives no favor. If you're content being a mediocre programmer, go get your certification, display it on your resume, and be done with it. Most people succumb to sucking it for the anti-Christ, generating gigabyte silos of data that no one else can use, or they content themselves with an academic understanding and spend their life writing theoretical papers without really knowing the work of a true programmer. Those are hardly noble enough goals worthy of the most powerful resource of knowledge-sharing and data processing you have in your hands, right now.
...But, hey, if towering monoliths of proprietary data are good enough for you, you can click your browser's "back" button now. Otherwise, plan on growing your neck-beard (or female equivalent) and let's go.
You shouldn't go into battle without knowing your opponents. On the way to becoming a master of the machine, you'll see there are only two real adversaries, but they are always alongside you, often coming from unexpected corners. Like Death, they are invisible and ready to consume you.
ADVERSARY #1: RushingToTheFinish. You'll be inclined to get a program to work as soon as possible. Everybody likes to see immediate results, but down the road, this short-term savings won't be extendible--if it's even comprehensible. Most code should get re-used or appreciated, otherwise you've (once again) reduced the whole value of the enterprise. We're in the Internet Age here. And I'll appreciate your slim code for your ultra-elite application after you re-write it modularly so I can use it for my applications. Meditate on this instead, grasshopper. Hidden blow: Prematurely optimizing your code.
The best counter-weapon for this battle is TestDrivenDevelopment.
ADVERSARY #2: CruftMonster. You made yourself a nice groove; it works--enough. You don't want to do the work to re-write it. That, friend, is the scent of the CruftMonster. OR maybe this: You wrote some code awhile back. You're not sure how it works anymore, so you don't want to change it, right? You pussy-foot around it as if treading on the ice atop a deep lake that could crack at any moment. That's called cruft. I'm assuming brevity is a shared value in this guide. You're a programmer, not a shill working for LOC, right? So, bite the bullet. ==>Guided meditation<==. Hidden gotcha here? Believing in all your working code. Nothing spells "cruft" like True Believers.
The best weapon for this battle is RefactorMercilessly which you can draw out of your bag of ExtremeProgramming.
BTW, the false prophets mentioned at the start are like something incomplete. They are as demons enslaved by these enemies and now feed off of new meat -- best keep your distance.
Don't enter an arena unprepared. Without exaggeration, these two have captured many lost souls. Those who succumb to the first, hardly make it out of script-land -- never even knowing they've become victims. Those who succumb to the second -- help me Thor -- rarely, if ever, come back. They curl over and merge into their cubicle, like pupae. Got it?
After many decades of wrestling with these enemies in the field, some powerful allies have grown. Keep them close to you at all times. They should work on you even when your work seems done. Despite their name, you shouldn't assume they're friendly. They are primal forces for forging YOU, the apprentice who wants to make something great. The value of the allies exists (and can be used) independently of my commentary. Treat them well and they'll stay lustrous. Used properly, the allies harness the opposing forces and hone you into a MASTER....
Being somewhat independent of your personal thoughts, the allies are "crucibles" -- they exist in a complex contradiction that YOU as the programmer must resolve. They are koans. There is no set of rules or guidebook in this terrain because you'll be forging the path, otherwise you'd earn nothing. Meditate on them.
CRUCIBLE #1: NameFu :: ClotheYourData <==> LeaveItNaked.
CRUCIBLE #2: LogicNinja :: ZeroSuperfluousNames <==> SeparabilityOfDomains.
CRUCIBLE #3: ModularityDojo :: TightCoupling <==> Compactness.
CRUCIBLE #4: TheOptimizer :: ReduceMemoryFootprint <==> MinimizeProcessing.
Now these, plus some knowledge of digital logic, are wholly sufficient for the adventurous programmer to master the craft. But if you want a little more guidance, read on.
Those who put faith in abstractions have confused the path. So let's get this straight: you're on a vonNeumann machine, not a Symbolics. That means computer programming consists of loading a simple programming language statement and any data that goes along with it, executing said statement (raising any errors or exceptions that occur upon such execution), continuing forward unless some condition says otherwise, and repeating this sequence. It's like a little Turing Machine, the basis for much of Computer Science, that. It may or may not finish. All that's to say: Let's not go on a joy ride upon the language-du-jour, 'k? We will, at some point, be worried about memory and CPU performance as all good programmers should; we're just saving it for last -- just as Master Bentley taught us.
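To make that load-execute-continue cycle concrete, here is a minimal sketch of a toy stored-program machine in Python. Everything in it is an assumption made up for illustration: the opcodes (LOAD, ADD, STORE, JNZ, HALT), the single accumulator, and the step limit that stands in for "it may or may not finish". Real hardware is far richer, but the loop has the same shape.

```python
# A toy stored-program machine: instructions and data share one memory list.
# The opcodes are invented for this sketch, not taken from any real CPU.

def run(memory, max_steps=1000):
    """Fetch, execute, advance, repeat; stop on HALT or when steps run out."""
    pc = 0    # program counter: address of the next instruction
    acc = 0   # a single accumulator register
    for _ in range(max_steps):
        op, arg = memory[pc]          # load the statement and its operand
        pc += 1                       # default: continue forward
        if op == "LOAD":
            acc = memory[arg]
        elif op == "ADD":
            acc += memory[arg]
        elif op == "STORE":
            memory[arg] = acc
        elif op == "JNZ":             # the condition that says otherwise
            if acc != 0:
                pc = arg
        elif op == "HALT":
            return acc
        else:
            raise ValueError(f"unknown opcode {op!r} at address {pc - 1}")
    raise RuntimeError("step limit reached; the program may never finish")

# Sum 3 + 2 + 1 by counting down; code sits at 0-8, data at 9-11.
program = [
    ("LOAD", 9),    # 0: acc = total
    ("ADD", 10),    # 1: acc += counter
    ("STORE", 9),   # 2: total = acc
    ("LOAD", 10),   # 3: acc = counter
    ("ADD", 11),    # 4: acc += -1
    ("STORE", 10),  # 5: counter = acc
    ("JNZ", 0),     # 6: loop while counter != 0
    ("LOAD", 9),    # 7: acc = total
    ("HALT", 0),    # 8: stop and hand back acc
    0,              # 9: total
    3,              # 10: counter
    -1,             # 11: the constant -1
]
result = run(list(program))
print(result)  # 6
```

Note that nothing stops the program from overwriting its own instructions with STORE; that is the stored-program idea, and also a good reason to test.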
Also, don't lose sight of the fact that there aren't really any 1s and 0s running around in the machine. Those are just helpful abstractions to develop consistent behaviors that can be pushed into computation. They could have been made by red and blue lights -- whatever qualities pass the tests: disting0ishab1e, transf(x)rmable, pass<->able, and STORable -- the same things needed for defining a Turing Machine. For our general-purpose computer, we need one more thing: repeeeatable -- like a billion times.
The major tools for forging your program are Divide, Conquer, Encapsulate, and Optimize. The word "conquer" should be considered in the sense of "ratiocination": the application of exact logic. Apart from your programming environment (which should be a reflection of your relationship with these), there is nothing more. Take the cross-product of each combination, and you have the complete toolkit (Divide X Optimize = Profiling).
The Outer Process
The full stack for true coding consists of various layers from the highly abstract at the top to the very concrete where your actual physical computer is sitting and where your code needs to run. From the top, you have the Biff, the Programmer--> Interpreter--> Compiler--> Hardware. Web programming conflates two and three and adds another layer between the compiler and the hardware called a "virtual machine". So the word "compiler" is a bit ambiguous, much like "interpreter", which in this context is not to be equated with "interpreted languages", but more like "parser". This document is strictly for non-web programming, but capable of creating an application that can create a web programming environment (note to JABBAscript flybois: YOU won't be doing that anytime soon).
For the four terms listed previously, each encloses something, like this (respectively): (concepts)--> (expressions)--> (imperative statements)--> (digital logic). That is, Biff<YOU> processes concepts, the Interpreter processes expressions, the Compiler processes these into imperative statements, and the Hardware takes and processes the digital stream. The force that links Biff to Interpreter is called programming, from Interpreter to Compiler is the lexer, and from the Compiler to the Hardware you have the loader at program execution which is generally determined by your Operating System or (machine room ;^) environment. Beyond that, should be the unwavering execution of physical law.
An assembler is a low-level compiler in this definition, taking imperative instructions (not expressions, note) like MOVE, JUMP, ADD and putting them into machine code (MOV ax, 1 on an 8-bit machine might become 0A01). This (immediately-transformed) binary machine code operates through the logic gates on the CPU in whatever complex ways the manufacturer has stamped upon it and specified: generally relying entirely on the resolution of the voltage gradients your instructions have set up within. You could then almost say that the universe actually does the computation (there's still something that has to drive the clock). That whole endeavor is an art in itself, called "computer engineering". Additionally, an interpreted language like Apple BASIC has to be compiled at the point you type RUN. This is true regardless of whether the language is called an "interpreted language" because the CPU can't process and never sees ASCII characters (like "GOTO 100"); it only deals with 1s and 0s (or *ultimately* voltage gradients).
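The middle of that stack, expressions going in and imperative statements coming out, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the standard `ast` module plays the "Interpreter" (parser) role, while `compile_expr`, `execute`, and the PUSH/ADD/SUB/MUL instruction names are hypothetical and exist only in this sketch.

```python
import ast

def compile_expr(source):
    """Turn an arithmetic expression into a flat list of stack instructions."""
    ops = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}

    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return [("PUSH", node.value)]
        if isinstance(node, ast.BinOp) and type(node.op) in ops:
            # Operands first, operator last: the nesting becomes a sequence.
            return emit(node.left) + emit(node.right) + [(ops[type(node.op)], None)]
        raise SyntaxError("this sketch only handles numbers, +, - and *")

    return emit(ast.parse(source, mode="eval").body)

def execute(instructions):
    """Run the instruction list on a stack, one imperative step at a time."""
    stack = []
    for op, arg in instructions:
        if op == "PUSH":
            stack.append(arg)
        else:
            right, left = stack.pop(), stack.pop()
            stack.append({"ADD": left + right,
                          "SUB": left - right,
                          "MUL": left * right}[op])
    return stack.pop()

code = compile_expr("1 + 2 * 3")
# code is PUSH 1, PUSH 2, PUSH 3, MUL, ADD
print(execute(code))  # 7
```

The expression has structure (a tree); the output has only order. That flattening is the step this section calls compiling.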
So that's the stack. Now what do you do?
Gathering the Four Forces
Divide and Conquer you know by now; they are the basics of every computer science course. Basically, instead of tackling the large problem, you break it down into smaller problems that you can solve, and conquer those. So, to understand the other two.... What does OOP get you, for example? Modularity. This is what I'm calling "encapsulate". This was previously gained by named functions (and files). Include data with your functions and you have encapsulation. Put a name on it and you have an object. OOP begins there. The idea is to have everything relevant to the task travel around with some handle by which it can be referenced.
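The progression from named function to object might look like the following sketch. The `Readings` class and its sensor-data theme are hypothetical, chosen only to show the three steps in order.

```python
# Step 1: a named function gives you modularity, but no data of its own.
def mean(values):
    return sum(values) / len(values)

# Step 2: include data with your functions and you have encapsulation.
class Readings:
    """A hypothetical bundle of sensor readings plus operations on them."""

    def __init__(self, values):
        self._values = list(values)   # leading underscore marks an internal detail

    def add(self, value):
        self._values.append(value)

    def mean(self):
        return mean(self._values)

# Step 3: put a name on it and you have an object, a handle that
# carries everything relevant to the task.
temps = Readings([20.0, 22.0])
temps.add(24.0)
print(temps.mean())  # 22.0
```

Callers of `temps` never touch the list directly, so its representation can change later without breaking them.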
With Encapsulate, you're synthesizing lower-level commands and expressions into higher-level abstractions to make a referable concept. Each function, routine, class, and object presumably represents a solution and higher-order abstraction compared to what came before. Most domains are complicated and in order to architect your code well, you need to learn how to see it from 100mi high. This arena has rather peaked with the ObjectOrientedProgramming paradigm, but the covert purpose of this manual is to get beyond its plateau. Basically, duplicate code is put in a single object for its common purpose and named -- unless it is semantically separate.
Optimize was also always there, but it went as a generally-unspoken value among the masters. With high-level languages and web programming, optimization seems to have gone extinct, but it remains the sign of a genuine master. In isolation, optimization is a false peak. It is seductive because it's so concrete, so measurable. But before you hone your code, hone the right code. Are you using the best algorithm? Learn Big-O notation.
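A small sketch of "hone the right code": two searches over the same sorted list, each counting how many elements it has to look at. The function names and the counting are assumptions added for illustration; the point is only that the choice of algorithm dwarfs any line-level tuning.

```python
def linear_search(items, target):
    """O(n): return (index, comparisons made); index is -1 if absent."""
    comparisons = 0
    for index, item in enumerate(items):
        comparisons += 1
        if item == target:
            return index, comparisons
    return -1, comparisons

def binary_search(sorted_items, target):
    """O(log n) on sorted input: return (index, probes made)."""
    low, high, probes = 0, len(sorted_items) - 1, 0
    while low <= high:
        probes += 1
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid, probes
        if sorted_items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1, probes

data = list(range(1_000_000))
_, linear_cost = linear_search(data, 999_999)   # worst case: the last element
_, binary_cost = binary_search(data, 999_999)
print(linear_cost, binary_cost)
```

On a million sorted items the linear scan makes a million comparisons in the worst case, while the binary search needs at most twenty probes. No amount of polishing the loop body closes that gap.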
But these four tools should be seen as mutual aids and paired on two separate axes. Your program should be the circle that encloses them. The four allies are related but stay on the outside "exerting the pressure". Bonus credit if you find that the four allies and these four tools are related.
The task of mastering the machine has already started in several ways:
- Your language designer already divided the problem of getting computers to do your bidding and you have learned how to command it.
- Secondly, the history of computing has already conquered the whole domain by separating I/O from Processing in order to make a General Purpose Computer. This is a giant help and saves you buttloads of time wiring I/O junctions to your panels.
- The document you have here is your path to optimize the process of utilizing the machine to its full potential.
- Lastly, you could be the encapsulation who's going to make it happen!
Test-driven development with doctests is going to be the catalyst to glue these forces together to create a data/code ecosystem. It will accelerate your mastery. It is both a security blanket and documentation for others and yourself. That blanket is helpful because of the large disconnect between your code and its results. By getting rid of any ambiguity, you can move on with other tasks, allowing a faster cycle of: coding -> running -> feedback -> re/architecting. That is the essence of Agile and eXtreme Programming, by the way.
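For anyone who hasn't met doctests yet, here is a minimal sketch using Python's standard `doctest` module. The function `word_count` is a made-up example; the examples in its docstring are at once the documentation and the tests.

```python
def word_count(text):
    """Count whitespace-separated words.

    >>> word_count("the uncarved block")
    3
    >>> word_count("")
    0
    """
    return len(text.split())

if __name__ == "__main__":
    import doctest
    doctest.testmod()   # prints nothing when every example passes
```

Write the docstring examples first, run the file, watch them fail, then fill in the body until the run goes quiet. That is the coding -> running -> feedback cycle in its smallest form.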
Practically, there can be several different "breakdowns" of how you go about structuring your program, but if you're constraining yourself to a particular programming language, choose the axis along which your language was designed: concurrent vs. sequential, procedural vs. functional, iterative vs. recursive dimensions, et cetera. If you're really going "hard-core", choose a hardware architecture that fits your application. Otherwise, if you're not constraining yourself, choose Python/C and make it easy on yourself.
Understand how to construct expressions in your language. Understand how to join these "sentences" together to tell the logical story you wish to inform the computer. Learn how to save that, compile it, and run it. Know that, just like in literature, there can be several ways to fashion a sentence and say the same thing, but pick the best way: clever, precise, brief, and accurate. For Object Oriented Programmers, you should check out the Gospel, so you can be sure of salvation.
Basically, YOU are going to constrain yourself at the top by keeping the purpose for your program in mind while your language will constrain you at the bottom, forcing syntactically-valid expressions. That leaves the middle with the most degrees of freedom. So the first thing to do is to start there, writing boilerplate. If your language is as easy as Python, you can write your tests first (all the behavior you want to happen for it to be considered passable) and save the first program that will run (otherwise, you'll have to content yourself with comments). If it doesn't give you the right result, you've gained some knowledge and must go back and correct it. Continue fixing your program until it passes all tests. The general idea is that you're going to keep constraining yourself by dividing the problem with your tests, while the design of the language keeps constraining you at the bottom, forcing you to conquer your logic. In the middle, you encapsulate -- forming abstract, categorical groupings (named functions, classes, modules), while simultaneously honing your code towards greater perfection until it works and becomes a work of art that other programmers can appreciate and re-use. Add contracts and you've put the polish on your code and can ship it out to the wild. You're mastering coding, memory use, and object communication (passing data back and forth from other functions, etc.), and sharpening your code towards the perfect amount of terseness (truly vorpal code). This scales from the CPU to larger and larger units (modules, files) like a fractal until you're at the top and can just send the data, that is: "run", and like a chain of dominoes, it goes. That's the whole process in a paragraph. Click on each link for the 3-hour tour.
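As a rough sketch of "tests first, contracts last", here is one small function carrying both. Python has no built-in contract syntax, so plain `assert` statements stand in for preconditions and postconditions here; that substitution, and the `split_bill` example itself, are assumptions of this sketch rather than anything the process prescribes.

```python
def split_bill(total_cents, people):
    """Split a bill into whole-cent shares that differ by at most one cent.

    >>> split_bill(1000, 3)
    [334, 333, 333]
    """
    # Preconditions: what the caller owes us.
    assert people > 0, "need at least one person"
    assert total_cents >= 0, "total cannot be negative"

    base, extra = divmod(total_cents, people)
    shares = [base + 1] * extra + [base] * (people - extra)

    # Postcondition: what we owe the caller. No cent is lost or invented.
    assert sum(shares) == total_cents
    return shares
```

The doctest says what should happen for one input; the contract says what must hold for every input. Keep in mind that `python -O` strips `assert` statements, so treat these as development-time checks.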
TestDrivenDevelopment without tools like Python's DocTests becomes its own bureaucracy. The goal of the GPL and a data ecosystem is community. That means re-usable code/objects, which means trustable and understandable with a minimum of effort. Readable examples, like DocTests, are other programmers' best path to understand and trust your code. When you've got an ecosystem as big as the Internet, you don't want to have to _wonder_ about the logic in other objects. DocTests give you that and you get code-testing as a bonus -- truly great.
Note that I haven't said anything about revision management (where I'd recommend svn since it has the best partnership with the allies) or code editors (two other hidden "attractors" of programming) because your relationship to the allies determines these. Remember they are what's going to hone you into a master -- not the language and not your tools. These help you with ClotheYourData, SeparabilityOfDomains, TightCoupling, and someof(MinimizeWork|ReduceMemoryFootprint), but this only covers half of them, at best. So, you have to figure out whether that's your path or not -- not anything else.
There is another set of tools, commonly filed under code analysis, but I haven't seen anyone implement them well. Most programmers simply aren't good enough to architect large code bases for complex problem domains as might be encountered in big business, and most tools mislead them because they don't understand that either (cf. Agile's ascendance). But, like code profilers, they could offer the programmer important lenses into their semi-linear (2-d really) codebase. One tool would be a code visualizer that graphs all objects and their function calls into a 3-d visualization with edges to indicate where calls were made. You could even step through the code and watch the variables get moved around between screen objects, clicking the objects to inspect them, etc. The other would add a heat map to your profiler (or aforementioned visualizer) that adds degrees of color so you can see the usage of every function and object call in a way that the visual cortex can process immediately.
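The raw material for both tools is cheap to collect. Below is a rough sketch, nowhere near the 3-d visualizer imagined above, that uses Python's standard `sys.setprofile` hook to record caller -> callee edges and how often each was used. The helper names (`count_calls`, `heat_bar`) and the text bars standing in for color are assumptions of this sketch.

```python
import sys

def count_calls(func, *args):
    """Run func(*args); return {(caller, callee): count} for Python-level calls."""
    edges = {}

    def profiler(frame, event, arg):
        # "call" fires when a Python function is entered; f_back is its caller.
        if event == "call" and frame.f_back is not None:
            key = (frame.f_back.f_code.co_name, frame.f_code.co_name)
            edges[key] = edges.get(key, 0) + 1

    sys.setprofile(profiler)
    try:
        func(*args)
    finally:
        sys.setprofile(None)   # always switch the hook off again
    return edges

def heat_bar(count, peak, width=20):
    """Render a count as a text bar, a stand-in for degrees of color."""
    return "#" * max(1, round(width * count / peak))

# Two hypothetical functions to observe.
def leaf(x):
    return x * 2

def branch(n):
    total = 0
    for i in range(n):
        total += leaf(i)
    return total

edges = count_calls(branch, 5)
peak = max(edges.values())
for (caller, callee), count in sorted(edges.items()):
    print(f"{caller} -> {callee}: {heat_bar(count, peak)} ({count})")
```

The edge dictionary is already a graph: feed it to any graph-drawing tool and you have the first layer of the visualizer, with the counts ready to be mapped to color.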
Anyway, stamp a version number on your program when you get something interesting (<1.0 for non-working apps). Use major numbers (before the decimal point) for incompatible version changes (old input files aren't readable, for example) and the minor numbers (on the right of the decimal point) for minor, compatible changes. Stick with this mechanism and there's less explaining to do. You can have any number of digits to the right of the decimal point, so you won't run out of minor versions.
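The compatibility rule can be sketched as code. Two assumptions are baked in and worth stating loudly: the minor part is compared as an integer (so 1.10 comes after 1.9, which differs from reading it as a decimal fraction), and a program is taken to read files from its own minor version or older. Neither is dictated by the scheme above; pick your own reading and stick with it.

```python
def parse_version(text):
    """Split 'major.minor' into a pair of integers."""
    major, minor = text.split(".")
    return int(major), int(minor)

def is_working(version):
    """Versions below 1.0 are reserved for apps that don't work yet."""
    return parse_version(version)[0] >= 1

def can_read(program_version, file_version):
    """Same major number means compatible; a different one means it is not."""
    p_major, p_minor = parse_version(program_version)
    f_major, f_minor = parse_version(file_version)
    return p_major == f_major and f_minor <= p_minor

print(can_read("2.3", "2.1"))   # True: minor change, still compatible
print(can_read("2.3", "1.9"))   # False: the major number changed
print(is_working("0.4"))        # False: not at 1.0 yet
```

Comparing the integer pairs rather than the raw strings also avoids the trap where "1.10" sorts before "1.9" alphabetically.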
When you correctly apply the concept of using and understanding the allies while you code, you'll see that the four allies are perfectly correlated, respectively, to the four processes and have always been working on the programmer's soul. Any stagnation you have is because you've let one of the enemies into your practice. If you find yourself, at a programming or personal impasse, explore the meditations under "adversaries". But, when you're looking for an environment for other like-minded coders, to master your techniques and work on some awesomeness, look into hackerspaces. Or, perhaps drop it all for awhile and distract yourself, work on an engine, or Hack the Law. Ultimately, programming is just a bridge, a means to an end where computers may not even be necessary anymore. It's at that point (and that point only) that you have to decide whether you've reached the end of your path.
Until then: lather, rinse, repeat. Chill out with Lonny Zone when you're not doing that. You program and know it because there's nothing better to do with such awesome tools. There is indeed something else to convince you, but I'll have to save it for when you reach the same conclusion that I did. ;^D
You have working code. It's documented; includes tests so others know it will work as they expect. It has reusable functions, objects, and/or modules, demarcated from internal functions and objects by some syntactical flourish, and it does something interesting or re-useful. So...
The right outcome of the battle between You and RushingToTheFinish is Maintainability of your code (equated to longevity) and personal Reserve.
The right outcome of the battle between You and the CruftMonster is Parsimony (read: simplicity + harmony) and personal Mastery.
I'm going to give you another lens through which to view the value of the tools you have at your disposal: the GeneralPurposeComputer and the Network. Earlier I suggested that these, with transactions and the creative economy, will give you O(n^n) value. That big claim is a view from the outside, from the top. Here's the analysis from the data scientist's perspective (I'll use the term m to distinguish from the usage above):
The basic programming environment gives you O(m) value out of the computer by turning repetitive tasks into simple computer routines that, thereafter, make them constant complexity (within the mind of the coder). The other complexity being managed is your data. There, OOP and named functions will simplify things further to get you an additional O(m) value. The value from these two tools, I'll call looping and grouping. (What is m, then, you might ask? m represents the complexity of the world, and you are condensing this complexity into something simpler and more manageable -- that is what all these electronic gadgets are doing generally. Unless you're going to simulate the whole universe, I'd estimate m on the scale of millions.) Combine them and you don't get m + m or even m * m. No, OOP allows you to create arbitrary groupings so you get 2^m value -- that's enough to simulate a universe. Why is it 2^m? Because these two are orthogonal to each other. Their value is like having two axes/dimensions rather than one. O(2^m): that's the general value of programming enterprise and you can see how Steve Jobs and IBM made the personal computing revolution.
There's more, though. We're not in the '70s. Again, I'll change the term used (p) and now add the social scientist's perspective. We've got p programmers and so get p as a multiplier, each one having access to this 2^m value. Adding the network adds another p factor via the time savings of sharing data + code. This is linear because the computer just sees both of these as the same (a la VonNeumann) and simply transfers them across without processing them. The data ecosystem combines these two into O(p x p) or O(p^2) -- Metcalfe's basic value of a fully-connected network, as you should expect.
Now, how do you leverage the value of the Internet revolution for programmers? How do you combine these two separate tools offering O(2^m) and O(p^2) value to express their awesome, theoretical power? Again, they are orthogonal to one another. If the first (m) two (looping and grouping) are like two axes, the latter (p) are like the objects on the plane created thereby. So, combine THESE and you get O(m^p). With the vastness of the internet and the world at large as people continue generating data over time, m and p become roughly equal. In other words, the complexity and needs of a functioning world government and its daily production of data will tend to make p approach m (the quantified complexity of the world). Therefore, I'll argue that the value of our project here is O(n^n) (distilling back to original terms). Radical. And that's not even all, because we can be free agents in a thriving macrocosm (offering another cross with an O(n^n) system) with our computers and network. That's O(n^^n). Tetration. Grok it.
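To get a feel for how far apart these orders sit, here is a bit of plain arithmetic in Python. It makes no claim about the value model itself; it only evaluates each expression at one small size, using a single n for all of them as a simplification.

```python
import math

def growth_table(n):
    """Return the value each order of growth takes at size n."""
    return {
        "n log n": n * math.log2(n),
        "n^2": n ** 2,
        "2^n": 2 ** n,
        "n^n": n ** n,
    }

table = growth_table(10)
for name, value in table.items():
    print(f"{name:8} {value:,.0f}")
```

At n = 10 the table already runs from about 33 (n log n) through 100 and 1,024 up to ten billion (n^n). At the scale of millions the gaps stop being printable.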
Whatever the case, once you make it to the end of this road, you'll have found that the enemies and techniques of programming are applicable to life itself: don't rush, but beware of your own cruft. You will have achieved Elegance and have become the Victor!
Obligatory quote: Perfection is attained not when there is nothing more to add, but when there is nothing left to take away. -- Antoine de Saint-Exupéry
Now sacrifice your Amulet of Yendor and let's go!
Special thanks to the denizens of RefactorMercilessly, TimPeters for imbuing Python with some powerful Zen via DocTests, Bjarne Stroustrup for providing the concept of encapsulation to me, Niklaus Wirth and PLU for the concept of modular programming and Big O notation, Bell Labs for inventing C, and many other old-school programmers who lit the Way. And, of course, the Tao te Ching. See also the C FAQ for some powerful wizardry. WikiWords and eXtreme Programming are documented at the wikiwikiweb. Folks from there and the denizens mentioned above, feel free to add your name to this document below -- you helped make it happen. Oh, and Sunlight Labs: I've got my eye on you! Xer0Dynamite (talk) 15:00, 17 September 2015 (UTC)
Other credits: .add.your.name.here.