Showing posts with label networks. Show all posts
Showing posts with label networks. Show all posts

SLA

SLA (Service Layer Agreement) for large distributed software is very important. It plays the role of contract between the application and it's clients. It's also not very straightforward to achieve, especially if many components participate in the process. Each component in this case should have even more strict SLA, because the result of each component SLA would sum up. For example, if single call to some service A is resulted in multiple calls to other services, it's important that other services had better SLA in order to keep service A SLA promise.

There might be different types of SLA: resources, availability, durability and performance.
Many systems provides contract on how many resources are available for the client: memory, CPU, disk space etc.
Some websites and services say that their availability SLA is 99.9%, which means that 99.9% of time they going to be available. Actually, that is not very much at all. There is a nice table on Wikipedia with conversion between availability percentage and actual time.

Some services, especially the storage services like S3, have also durabitlity SLA. This to say how rarely the service might lost the data.

Performance SLA is common for running services that need to not only be available, but return response on request within specified period of time. For performance SLA, it is always common to use some percentile of requests that would be handled within SLA. For instance, it might be said that SLA is return response in 10 ms or less for 99.9% of requests.

Leader Election: Gallager-Humblet-Spira (GHS) algorithm

GHS is an algorithm to elect a leader in the arbitrary network. It is based on building a minimal spanning tree (MST) of the network, and then based on it electing a leader.

Algorithm for building MST is very similar to the Kruskal's algorithm, although it has some specifics for the nodes distributed in the arbitrary networks. For example, the nodes have to communicate with each other in order to detect connected components, merge etc. After the MST is built, one of the nodes is chosen to be a leader. The notification is sent to all other nodes in the MST.

Basically, the goal of this algorithm is to identify the nodes in the network, and the elect a leader in the known field.

Leader Election: LeLann, Chang and Roberts algorithm (LCR)

This algorithm is for ring networks. Each message goes in the network from one process to another, ie no broadcasting. This means that each process knows exactly about only one other process - it's neighbor. This could be imagined as linked list.

Algorithm complexity is better than in bully algorithm, because its worst case is O(N^2), which happens only if hosts are organized into the network using by increasing (or decreasing, depends on the what extreme is used) order of their UIDs.

Leader Election: Bully Algorithm

Only the process with largest (smallest) UID becomes the leader. If some process sees that it's UID is extremal to the current leader UID, it will initiate leader election.
The are 3 types of messages:
  • election message - sent to announce election
  • answer message - response to the election message
  • coordinator message - sent to announce the identity of the elected leader
Algorithm is following:
  1. Some process P finds that leader is not available (or coordinator is down).
  2. P initiates a new leader election by broadcasting election message.
  3. Each process replies with own UID
  4. If P finds a process with higher UID, it waits a bit and if there is no reply with higher UID, it will promote that host to leader.
  5. However, if P receives a coordinator message with new leader, which UID is lower than P's UID, it will initiate another election.
Algorithm requires there should be a coordinator. It also requires N^2 + N messages, as at any moment any process can try to start election, thus send election message, handle N answer messages and send coordinator message. As an optimization, the process can send messages to the processes with larger UID only, this will give us N^2 / 2 + N messages, which is better, but not ideal.

Onion Routing

I've being interested in how Tor is working for a long time already. So just recently got to the reading more about it and especially about the technique that is a heart of Tor. This technique is called Onion Routing and this random-random-note (RRN) is about it.

Before, I quite couldn't understand how anonymity could be reached in the modern networks. However, onion routing is actually quite simple but powerful way to do that. All genius is simple!

Gossip Protocol

Gossip protocol is a style of inter-process communication in the network, inspired by gossips that happen in a human social groups. Basically, nodes (peers) exchange some information. This information may contain the knowladge that node gain by itself (eg some events), received from user (eg search queries) or received from other nodes during conversation with them.

Peers are connecting to each other in a random manner to ensure that information is shared pretty quickly. This could be like 10 times per second that one peer connects to another and exchanges the information.

It's also possible to organize the groups of hosts where each node share the information only with nodes within this group. This simplifies finding next random node to communicate, and limits the number of random useless connections between nodes. Some nodes are shared between multiple groups, to ensure that information is spread over the whole network. As an alternative, nodes could communicate only with it's near neighbors. Another alternative is to define the hierarchical levels, thus minimize the need to send message to everyone, but instead only some hosts in the specified level.

Gossip protocol unfortunately does not not guarantee that knowledge would be shared between all nodes. Another issue is that the received information might be not correct or valid anymore (well, that's in the nature of gossips, right?). Also it is said that gossip protocol maintains relaxed consistency.

Another good usage of gossips is to multicast information over the network without known structure. For example, it's not always possible to multicast information using multicast IP, because hosts might be organized into different restricted networks.

Gossip protocols are in the core of the internet.

When using gossip protocols it's better to share information about the fact than the fact itself, ie configuration was changed instead of new value for property A is B. This is also true in many distributed systems.

Usages

There are multiple usages for gossip protocol, some of them are:

  • propagate events between multiple hosts, so each node could react if need
  • could be used to route data using different paths (get the data, split on packages, deliver to the target using different routes)
  • could be used to aggregate values among different hosts (each node receive the data, check if it was seen, if not - aggregate with own data and send forward to next hosts)
  • search among large amount of nodes (used in unstructured P2P networks)
  • finding the list of nodes or just counting nodes in the network of unknown size and structure
  • electing leaders
  • sorting nodes in the network, detecting failures and errors, recovering the network structure
  • used for monitoring and gathering data, for example for load balancing
  • can help with detecting network partitions and dynamic insertions of new services
  • heartbeat implementation
  • implementing epidemic algorithms in the network
  • gather information about other nodes to make local decision, like load balancing, accessing data located in specific nodes etc.; do this in scalable way
  • detecting new nodes and additing them to the cluster or load balancer, updating DNS records etc.

But usually gossip protocols are not the most efficient one. It's pretty often possible to find some specific protocol that does the job better. However, some other protocols may not work for large networks (ie paxos), while others are centralized thus might have failure issues (ie based on centralized db).

Another downside of gossip protocols are:
same information can be delivered same host twice
message latency is usually high especially for large networks