Thursday, May 28, 2015

Logically Distributed

"Logically distributed" simply means that nodes are capable of sharing their work.  No node is ever the only node that can perform a specific task -- other nodes must be able to come in and perform the same function.  They can either help (preferred) or take over completely.  A more practical and familiar way of looking at this is that the system should have no single point of failure (SPoF).

While this sounds kinda silly, this quality of logical distribution pretty much enables everything else we need for distributed operations.
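
To make the "no single point of failure" idea a little more concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration: the node names, the `make_node` helper, and the use of `ConnectionError` to represent a node being down. It shows the "take over completely" case only: any node can perform the task, so the caller simply moves on to the next one when a node is unavailable.

```python
from typing import Callable, Iterable, List

Node = Callable[[str], str]  # hypothetical: a node is anything that can run a task


def make_node(name: str, up: bool = True) -> Node:
    """Build a toy node that either handles a task or reports that it is down."""
    def handle(task: str) -> str:
        if not up:
            raise ConnectionError(f"{name} is down")
        return f"{name} handled {task}"
    return handle


def run_on_any(nodes: Iterable[Node], task: str) -> str:
    """Try each node in turn; the first node that is up performs the task."""
    errors: List[Exception] = []
    for node in nodes:
        try:
            return node(task)
        except ConnectionError as exc:
            errors.append(exc)  # this node failed, let another take over
    raise RuntimeError(f"all nodes failed for task {task!r}: {errors}")


# node-a is down, so node-b takes over the same task.
nodes = [make_node("node-a", up=False), make_node("node-b"), make_node("node-c")]
result = run_on_any(nodes, "resize-image")
print(result)
```

The task only fails when every node is down at once, which is the point of having more than one node that can do the job.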

Thursday, May 21, 2015

Geographically Distributed

Assuming we've made the step to a distributed model, it's necessary to consider where we're distributing our resources to.  There isn't much that can occupy the same space at the same time, so if we're going to have more than one of something, where should we put them?

Sometimes we want things as close to each other as possible.  With a distributed model, we often want to try to push them as far apart as practical.

Colocated Things

Performance and convenience are the main drivers that push the nodes of your system close together or colocated.  Proximity makes communication faster, cheaper, and lower-latency.  Plus, it's easier to maintain everything if it's all in one place.  But likely the primary reason to consolidate and centralize resources is to help minimize overhead.  Things that designers tend to centralize without really thinking about it too much:
  • Backups - yes, it's faster to make a local copy for disaster recovery, but you probably want to ship your backups as far away as practical if they're going to survive whatever kills your primary working copy.
  • Command HQ - everyone wants to hobnob with the big cheese, to the detriment of satellite offices.  But when the goal is to have clear and authoritative leadership and top-down communications, people haven't really figured out how to do it better yet.
  • Inventory and maintenance - bus and rail systems tend to have a single station to consolidate spares and specialists for repairs.
  • Databases and storage - network-attached storage is placed into large banks for centralized management and provisioning.  Stateful databases also tend to be the most performance- and security-critical element of an information system, so we try to keep them locked away in a secure central facility for compliance with various laws.
  • Network switches - ironically the major piece of equipment that makes distributed operations possible also tends to get itself consolidated into huge backplanes in a central switching room.  But this ensures that high network performance is always available to throw at problems with ill-defined or emerging requirements.  Throwing more bandwidth at a problem can often be a decent substitute for planning.

Dispersed Things

For distributed systems, you will often find yourself pushing nodes out as far as practical.

The network enables decentralization.  From a philosophical standpoint, the DoD has plenty of insight (yes, the DoD pays people to wax philosophical about the uses of ARPANET).  DoD white papers on Effects-Based Operations talk about how data networks can push the power to the edge, allowing the decision-making to occur where it is most needed.  Instead of long feedback and control loops where all sensors must report data to a central command for analysis and synthesis of a response to be transmitted back to the effectors, the network allows the effectors themselves to understand the situation and take the appropriate action on the spot.

With this in mind, let's consider some of the ways in which geographical dispersion of system components is beneficial.
  • Backups - disasters take place on different scales.  For business continuity, the farther your backups live from your computer systems, the better they'll be able to survive progressively more catastrophic events:
    • Backup disk in your computer:  a virus or a simple power surge could corrupt both your system and your backup volume in one fell swoop
    • Backup disk next to your computer: a thief or fire sprinkler could wipe out your electronic equipment
    • Backup disk in another room: an actual fire could destroy your building
    • Backup disk in another building: a flood or earthquake could wipe out your city
    • Backup disk in another city: it will probably take a cyberattack or government legal action to shut you down at this point
    • Backup disk in another country: good luck complying with all applicable export laws
    • Backup disk on another planet: what we all aspire to
  • Web servers:  sure they're on the internet in the "cloud", so it shouldn't matter.  But studies by Amazon and other retailers have shown that server responsiveness does increase sales, and tenths of a second count.  The speed of light may be fast at 300,000 km per second, but a coast-to-coast round trip across the US still costs a few hundredths of a second per exchange, and that is before counting slower propagation through fiber and indirect routing.  That may not sound like much, but add in all of the data transfer times and encryption and backend API and database calls, each with its own set of handshaking delays, and you're easily counting web interaction response time in seconds.  For reference, video gaming lag becomes painfully noticeable above about 0.2 seconds, and 2-way human voice conversation becomes tortured above just 0.5 seconds with people mistakenly talking over each other.  So with internet services, the extra gains in responsiveness from locating your servers geographically close to your customers are measurable and significant.
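
As a rough check on the latency arithmetic in the web servers item, here is a back-of-the-envelope sketch in Python. Only the 300,000 km per second figure comes from the text above; the 4,000 km coast-to-coast distance, the 200,000 km per second fiber speed, the 100 km "nearby" distance, and the count of 10 sequential round trips per page interaction are all assumptions chosen for illustration, not measurements.

```python
SPEED_OF_LIGHT_KM_S = 300_000   # from the text: speed of light in a vacuum
FIBER_SPEED_KM_S = 200_000      # assumption: light in optical fiber is roughly 2/3 as fast
COAST_TO_COAST_KM = 4_000       # assumption: rough straight-line width of the US
NEARBY_KM = 100                 # assumption: a server in the customer's own region
SEQUENTIAL_ROUND_TRIPS = 10     # assumption: handshakes, requests, API and database calls


def round_trip_seconds(distance_km: float, speed_km_s: float) -> float:
    """Propagation delay for one there-and-back exchange over the given distance."""
    return 2 * distance_km / speed_km_s


def interaction_seconds(distance_km: float, round_trips: int, speed_km_s: float) -> float:
    """Propagation delay when each exchange must wait for the previous one to finish."""
    return round_trips * round_trip_seconds(distance_km, speed_km_s)


vacuum_rtt = round_trip_seconds(COAST_TO_COAST_KM, SPEED_OF_LIGHT_KM_S)
fiber_rtt = round_trip_seconds(COAST_TO_COAST_KM, FIBER_SPEED_KM_S)
far_page = interaction_seconds(COAST_TO_COAST_KM, SEQUENTIAL_ROUND_TRIPS, FIBER_SPEED_KM_S)
near_page = interaction_seconds(NEARBY_KM, SEQUENTIAL_ROUND_TRIPS, FIBER_SPEED_KM_S)

print(f"one round trip, vacuum: {vacuum_rtt:.3f} s")
print(f"one round trip, fiber:  {fiber_rtt:.3f} s")
print(f"{SEQUENTIAL_ROUND_TRIPS} round trips, far server:  {far_page:.3f} s")
print(f"{SEQUENTIAL_ROUND_TRIPS} round trips, near server: {near_page:.3f} s")
```

Under these assumptions a single round trip is only a few hundredths of a second, but the sequential exchanges stack up to tenths of a second for the far server, while the nearby server stays around a hundredth of a second.  This counts propagation delay only; server processing and transfer time come on top of it.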