Bulkheads are used in ships to create separate watertight compartments which serve to limit the effect of a failure, ideally preventing the ship from sinking. The bold vertical lines in Samuel Halpern’s diagram illustrate them:

If water breaks through the hull in one compartment, the bulkheads prevent it from flowing into other compartments, limiting the scope of the failure. This same concept is useful in the architecture of large systems for the same reason: limiting the scope of failure.

Consider a very simple system, say something that easily partitions by user, like a wish list of some kind. We can put bulkheads in between sets of app servers talking to distinct databases. In this system a given app server only talks to the database in its partition. Given this setup, if a single app server goes berserk and starts lashing out with a TCP hatchet at everything it talks to, no matter how angry it gets it only takes out a vertical slice of the system; the rest goes about its business happily.

![]()

If we take a slightly fancier (i.e., slightly more realistic) system, we can see that we develop (mostly) identical vertical slices. On a ship we’d call the groups compartments, but we’ll call them clusters because each vertical bunch of stuff forms a logical unit which can be thought of as one thing (say, a cluster!). In this setup, if one of the caches started blackholing requests, the damage done (hopefully just a small latency bump, up to a reasonable timeout) would stop at the bulkheads around the cluster.

If we look closely at the slightly fancier system, we note that a cluster consists of:

Typically, you can use clusters as units by which to add capacity. The exact contents of the cluster are determined by finding the limiting element (usually the one which needs to maintain lots of state) on the most constrained axis of scale embodied in the cluster, then sizing the rest of the elements based on their capacity relative to the limiting element. Add to this enough capacity to handle spikes, provide acceptable redundancy, and voila, you have a cluster.

In practice, some things simply do not work well with hard boundaries like this. In this example, note that the load balancers are not part of a cluster but span clusters. They need to, as they are responsible for determining which cluster can handle a given request!

It gets worse: notice that we have two log servers per cluster. Given a reasonable number of clusters, say 25, that amounts to 50 log servers. A single log server (in this case) is capable of servicing about 1000 app servers, but logs are really important so we need to run them redundantly, hence two per cluster.

![]() ![]()
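The log server arithmetic in this post can be checked with a short sketch. The capacity (about 1000 app servers per log server), the redundant pair, and the 25 clusters come from the text. The figure of 40 app servers per cluster is purely an assumption for illustration, as is the helper name `log_servers_needed`.

```python
import math

LOG_SERVER_CAPACITY = 1000     # app servers one log server can service (from the text)
REDUNDANCY = 2                 # logs matter, so never run fewer than two (from the text)
CLUSTERS = 25                  # the "reasonable number of clusters" (from the text)
APP_SERVERS_PER_CLUSTER = 40   # ASSUMPTION: the post does not state this number


def log_servers_needed(app_servers: int) -> int:
    """Log servers for a group of app servers, never fewer than REDUNDANCY."""
    by_capacity = math.ceil(app_servers / LOG_SERVER_CAPACITY)
    return max(REDUNDANCY, by_capacity)


# Hard bulkheads: every cluster carries its own redundant pair.
per_cluster_total = CLUSTERS * log_servers_needed(APP_SERVERS_PER_CLUSTER)

# No bulkheads: one shared pool sized for the whole fleet.
shared_total = log_servers_needed(CLUSTERS * APP_SERVERS_PER_CLUSTER)

print(per_cluster_total)  # 50
print(shared_total)       # 2
```

Under these assumed numbers the bulkheaded layout runs 50 log servers where a shared pool would need only 2. The shared pool is cheaper, but a misbehaving log server could then affect every cluster, which is the trade-off that makes hard boundaries awkward for this kind of element.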