Monday, March 30, 2015


1. Distributed means "more than one".
Like the Buddha going to the hot dog stand and asking the vendor to "make me one with everything," let us contemplate upon the meaning of this title.
Now, more than ever, we live in a binary world.  Almost all digital logic can be expressed as a seemingly endless string of ones and zeros.  Computers can perform any operation and calculation imaginable in base-2.  People can be divided into the "haves" and the "have-nots."  Has the need for anything more become a relic of the past?  A historical footnote of a simpler culture, like the aboriginal language that only had words for the concepts of none, one, and "more than one"?  Of course not.

But let us first consider the special circumstances conferred by 0 and 1.

$e^{i\pi} + 1 = 0$
Euler's equation.  Well, one of them, anyway.  Notable for including most of the important numbers used in math.  Who would need anything more?

Is there any advantage to considering a third option for "many"?  It could add extra complexity, overhead, and waste.  And some things are impossible to duplicate.  After all, you only live once.

How can we justify investing extra energy and resources in redundancy?  Well, maybe it's not always worthwhile.  That's the first decision you need to be prepared to make right after deciding to build a product or capability (going from none to one).  Should you have gone from one to many at the outset?

I would argue yes.  Inherently, you will always be faced with a multiplicity of things that you need to maintain throughout their life cycle, so you might as well plan for handling the many from the beginning.  It can be extremely difficult to adapt or migrate a system that was designed to work only as a single instance.  You will hurt yourself more in the long run by taking the short view.  It isn't terribly difficult to plan for handling the many from the outset if you have an organizational framework.  That's what this blog is all about.

But first, let us take a moment to consider what kinds of situations actually make sense for there to be only one.

We mentioned "one life".  Marley would add "one love", but the animal kingdom offers several examples where there is an evolutionary advantage to being... flexible with that rule.

What else would be better if you only had one?  One car?  Sure, if it breaks down you can telecommute for a while or take public transit, but eventually that might take a toll on your job performance or personal time.  You probably end up renting a car while your only means of transportation is in the shop.  But then you effectively have more than one car available to you.

One house?  Yes, it doesn't make sense to purchase a second house just in case the first one burns down, needs to be fumigated, or simply loses power or some other utility for a week or so after a storm.  But chances are, if that happens, you have friends or family nearby you can crash with, or you can at least stay in a hotel or shower at the gym on occasion.  We share what we own in times of need; that's distributed.

One phone?  We could certainly disappear into the mountains, out of contact, for a while, but eventually our answering machine fills up and we miss bills and lose friends.  There's only so long you can let things buffer up.  But you can certainly let many things buffer up for a few days while you replace a broken handset.  No financial harm done, unless you missed a job interview during that time or couldn't help a family member in need of emergency assistance.

Which brings us to computers.  Perhaps you only use your computer for entertainment, and you can only play one game or watch one movie at a time, so it doesn't make sense to have more than one, and if it breaks you just find some other pastime to keep your fun-meter pegged.  If you actually use your computer to do anything important, though, you may have found that losing it due to a disk failure can range from annoying to debilitating depending upon how much time it takes to set up a replacement.

So the common thread in all of this is that equipment loss or failure is, in most cases, a recoverable interruption in service that just means you need to spend some extra time, money, and apologies while you tinker with getting the replacement up and running.  If you have time, money, and reputation to spare, then by all means, don't worry too much about ever having more than one of a capability on hand.  Chances are, the failure won't happen at the worst possible moment.  As the old pilots' saying goes, "I'd rather be lucky than good."

There are several reasons why going the distributed route may not be necessary for your situation.  First, if you can convince your boss or customer to buy your excuse for a delay or failure, that's great!  Second, if you are the only person inconvenienced by an equipment failure, and you're willing to just grin and bear it, then you're not hurting anyone but yourself.  Third, perhaps you've poured substantial resources into this one house or commercial vehicle, and you simply have to risk your livelihood on its continued reliable functioning.  If it does need maintenance, you drop everything and focus on getting that critical component up and running again.  Finally, perhaps you are blessed with a monopoly on some product or service.  Then if your ability to deliver is interrupted, your clients will just buffer up and be forced to wait, because there is no competition.  In that case, sure, you can cut costs by neglecting fault tolerance or even preventive maintenance if you're not going to lose any money in the long term.  That could work fine if you're some sort of government bureaucracy or sought-after artist, but it probably won't make you popular.
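To put rough numbers on the trade-off, here is a minimal sketch that assumes each copy of a capability is independently up some fixed fraction of the time.  That independence is a big assumption: a storm that knocks out power to both of your machines is exactly the correlated failure this math ignores.  The function names `availability` and `downtime_hours_per_year` are hypothetical, made up for illustration.  Under those assumptions, a single unit that is up 99% of the time is down roughly 87.6 hours a year, while two such units are both down for less than an hour a year.

```python
def availability(single: float, copies: int) -> float:
    """Hypothetical helper: probability that at least one of `copies` identical
    units is working, ASSUMING each is up with probability `single` and that
    failures are independent of one another."""
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be a probability between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one copy")
    # The capability is lost only when every copy is down at the same time.
    return 1.0 - (1.0 - single) ** copies


def downtime_hours_per_year(avail: float) -> float:
    """Hypothetical helper: expected hours per year with no working copy."""
    return (1.0 - avail) * 365 * 24


if __name__ == "__main__":
    # Compare one, two, and three copies of a unit that is up 99% of the time.
    for n in (1, 2, 3):
        a = availability(0.99, n)
        label = "copy" if n == 1 else "copies"
        print(f"{n} {label}: availability {a:.6f}, "
              f"about {downtime_hours_per_year(a):.2f} hours down per year")
```

Each extra copy multiplies the chance of a total outage by the failure rate of a single copy, which is why the second copy buys so much more than the third.  Whether that is worth the price of the second car, house, or server is the judgment call described above.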

So far, we've only been talking about accounting for failure modes, which is likely interesting only to insurance actuaries.  There are plenty of other, more interesting benefits to the distributed model.  Let's engage in a few thought exercises now in order to save time and resources during future crises.  Then we'll be better equipped to decide whether it's worth the extra complexity to design for distributed operation up front.