How I lost 60 pounds in 3 months

On November 30th, 2013, I went to the gym for the first time in years. The goal was simple - get rid of the extra weight. And I was way overweight: 230 lbs at a height of 5'10". Exactly 3 months later my weight was 174 lbs, i.e. 56 lbs less, and another 20 days after that it was 167 lbs.

Some days I was losing 0.5 pounds per day, other days only a little, but most days I woke up lighter and lighter. As the weight went down, my energy went up. Although I was eating less, I was moving more and feeling more energized. But not always. Sometimes I felt really exhausted and couldn't work more than 7-8 hours a day. I couldn't allow myself to skip lunch, otherwise I became aggravated and felt bad and angry.

One way or another, a year later my weight is 172 lbs and I feel better than ever.

Facebook Posters

I just love those motivational posters you can find in Facebook's offices:






All those posters and more are from http://www.designforfun.com.

Simple Multiple BloomFilters data structure implementation on Java

As a follow-up to my previous post about using the Multiple Bloom Filters data structure to identify hot values, I decided to write a simple, naive implementation in Java. And open source it.

The project can be found on GitHub and is proudly named multi-bloom-filter.

It comes with a small class called MultiBloomFilter which does most of the job. It accepts the number of enclosed bloom filters, the capacity of each BF, and the duration before the head BF is reset. One can also specify which hash function is used and how many times the hash function should be applied to each value.

Simple example:
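
The original embedded snippet seems to be missing from this page, so here is a rough sketch of what usage might look like. The constructor arguments follow the description above (number of internal filters, capacity of each, reset period), but the put/contains method names and the exact parameter types are my assumptions, not necessarily the project's actual API:

    import java.time.Duration;

    public class MultiBloomFilterExample {
        public static void main(String[] args) throws InterruptedException {
            // 3 internal bloom filters, 10_000 elements each, head filter reset every second
            MultiBloomFilter<String> mbf = new MultiBloomFilter<>(3, 10_000, Duration.ofSeconds(1));

            mbf.put("hot-key");      // the key is added every time it is accessed
            Thread.sleep(1_500);     // enough time passes for the head filter to be reset
            mbf.put("hot-key");      // the key shows up again after the reset

            // only one of the internal filters was cleared, so the key is still recognized
            System.out.println(mbf.contains("hot-key"));   // most likely true
            System.out.println(mbf.contains("cold-key"));  // most likely false
        }
    }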

This short example shows that the MBF resets only one of the internal BFs at a time. This means that whenever a reset happens, it removes only part of the data, and whenever a hot key is added again, it is identified as such.

Once again, MBF is a great solution if you need to find the set of hot values over some period of time. In particular, it helps to put only hot values into the cache. If many hosts use a single distributed cache service, then using MBF can save the redundant traffic of putting cold data into the cache, where it would be evicted pretty fast anyway. Also, since hot keys are in the MBF, there is a high chance they are in the distributed cache as well. Thus the application has a kind of "bloom filter" to estimate the chance that a value can be found in the cache for a given key.

There are many more use cases for the MBF data structure. Being able to work in a concurrent, scalable environment is another "feature" that I love about Bloom filters, and MultiBloomFilter in particular. To me, a good BloomFilter implementation that can grow and scale correctly, and that has different mechanisms to evict data and fight false positives, sounds like a very useful service.

SLA

An SLA (Service Level Agreement) is very important for large distributed software. It plays the role of a contract between the application and its clients. It's also not straightforward to achieve, especially if many components participate in the process. Each component in this case needs an even stricter SLA, because the effects of the component SLAs add up. For example, if a single call to some service A results in multiple calls to other services, it's important that those other services have a tighter SLA in order to keep service A's SLA promise.
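
To get a feeling for how the numbers compose, here is a tiny back-of-the-envelope sketch (the availability figures are made up):

    public class SlaComposition {
        public static void main(String[] args) {
            double b = 0.9995;           // availability of dependency B
            double c = 0.9995;           // availability of dependency C
            // A needs both B and C to answer, so ignoring A's own failures,
            // retries and fallbacks, its availability is at best the product
            double aUpperBound = b * c;
            System.out.printf("A can promise at most %.4f%% availability%n", aUpperBound * 100);
            // prints ~99.9000% -> the dependencies must offer a stricter SLA than the one A exposes
        }
    }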

There are different types of SLAs: resources, availability, durability and performance.
Many systems provide a contract on how many resources are available to the client: memory, CPU, disk space, etc.
Some websites and services say their availability SLA is 99.9%, which means they are going to be available 99.9% of the time. Actually, that is not very much at all: it still allows almost 9 hours of downtime per year. There is a nice table on Wikipedia that converts availability percentages into actual time.

Some services, especially storage services like S3, also have a durability SLA. It describes how rarely the service might lose data.

A performance SLA is common for services that need not only to be available, but also to return a response within a specified period of time. For a performance SLA, it is common to use some percentile of requests that must be handled within the limit. For instance, the SLA might be to return a response in 10 ms or less for 99.9% of requests.
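
As a quick illustration of the percentile idea, here is a small sketch that checks a made-up set of response times against a 10 ms limit:

    import java.util.Arrays;

    public class LatencySla {
        public static void main(String[] args) {
            double[] latenciesMs = {2.1, 3.4, 1.8, 9.7, 4.2, 3.3, 12.5, 2.9, 5.0, 3.1};
            double percentile = 99.9;
            double slaMs = 10.0;

            // sort the observations and take the value at the requested percentile
            Arrays.sort(latenciesMs);
            int index = (int) Math.ceil(percentile / 100.0 * latenciesMs.length) - 1;
            double observed = latenciesMs[Math.min(index, latenciesMs.length - 1)];

            System.out.printf("p%.1f = %.1f ms, SLA %s%n", percentile, observed,
                    observed <= slaMs ? "met" : "violated");  // violated: the 12.5 ms outlier is above the limit
        }
    }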

Twister: A Simple Way to Manage Your Scripts

Imagine an average project with many scripts, each written using different practices, using different argument names and naming conventions, doing something similar to another script but slightly differently, and so on. Sometimes there are so many scripts that it's hard to find the one you really need at this very moment. Moreover, scripts have no standard location and are often put in semi-random directories, so it's really hard to find them. On top of that, many developers have similar scripts for different projects. Some scripts are on the PATH, others are relative to the project directory. The ones on the PATH are named in odd ways, because different versions are used for different projects. And some scripts are written in bash, some in ruby, some in python, etc.

Oh, so many troubles just because of scripts. That's why Twister was created. Twister is a simple framework/tool that lets you create and manage project scripts. I use it in a few of my home projects and find it very helpful. About a year ago, I open sourced Twister. It's a Python project, which makes it simple to create scripts and execute them on any popular operating system.

Reverse Hash Tree

A reverse hash tree (RHT) is a tree data structure where nodes contain a hash (i.e. not only a value). I call this tree "reverse" because the way a node's hash is computed is the reverse of the Merkle tree way. In a Merkle tree, parent nodes contain a hash of their child nodes. This gives a quick way to find the difference between 2 parent nodes in 2 trees - they would simply have different hashes. The next step is to find the child node that actually differs.

In an RHT, the parent node hash is not built from the child node hashes; it's the opposite: a child node hash is built from its own hash/value and the parent node hash. For leaf nodes, the value could be something useful; for intermediate nodes it could be just some version or category, etc.
 
When a child node is accessed, its hash is validated based on the current value/hash and the parent node hash. If the value is valid, we keep validating the parent node hash against its own parent node hash, and so on. This could also be done on the way down from the root node to the child node. If any node with an inconsistent hash is found, it means that node and its children require a hash/data recalculation.
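
Here is a minimal sketch of how I read this idea (my own illustration, not a reference implementation): each node stores a hash derived from its own value and its parent's hash, and validation walks the path towards the root.

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.util.Base64;

    public class ReverseHashTree {

        static String sha256(String s) throws Exception {
            byte[] d = MessageDigest.getInstance("SHA-256").digest(s.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(d);
        }

        static class Node {
            final String value;
            final Node parent;
            String hash;

            Node(String value, Node parent) throws Exception {
                this.value = value;
                this.parent = parent;
                this.hash = computeHash();
            }

            // the child hash depends on its own value and the parent hash (the "reverse" direction)
            String computeHash() throws Exception {
                String parentHash = (parent == null) ? "" : parent.hash;
                return sha256(value + ":" + parentHash);
            }

            // walk from this node up towards the root, checking every hash on the way;
            // the same check can equally be done while descending from the root
            boolean validate() throws Exception {
                for (Node n = this; n != null; n = n.parent) {
                    if (!n.hash.equals(n.computeHash())) return false; // this subtree needs recalculation
                }
                return true;
            }
        }

        public static void main(String[] args) throws Exception {
            Node root = new Node("v1", null);
            Node child = new Node("config-a", root);
            Node leaf = new Node("value-x", child);
            System.out.println(leaf.validate());  // true

            root.hash = "tampered";               // parent changed without recomputing its children
            System.out.println(leaf.validate());  // false -> hashes below the changed node must be recalculated
        }
    }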

Leader Election: Gallager-Humblet-Spira (GHS) algorithm

GHS is an algorithm for electing a leader in an arbitrary network. It is based on building a minimum spanning tree (MST) of the network, and then electing a leader based on it.

The algorithm for building the MST is essentially a distributed version of Borůvka's algorithm, although it has some specifics for nodes distributed across an arbitrary network. For example, the nodes have to communicate with each other in order to detect connected components, merge them, etc. After the MST is built, one of the nodes is chosen to be the leader, and a notification is sent to all other nodes in the MST.

Basically, the goal of this algorithm is to identify the nodes in the network, and then elect a leader among them.

Identify hot data to cache using Multiple Bloom Filters

Recently I was thinking about how to support caching of hot data in a highly loaded application with huge throughput. Here is a note about the idea, based on bloom filters, that I ended up with. It's far from ideal, and I haven't tested it yet, but at this moment it seems optimal in terms of CPU and memory usage. Also, after some research on the internet, I found that the idea is not new (see the link at the end of the post).

A Bloom Filter (BF) gives a nice way to check whether some value has been seen before. Of course, it is a probabilistic data structure, so the result always comes with some level of error.

Using multiple hash functions within a single BF makes it better at avoiding collisions, but it also requires the bloom filter to be larger. As an alternative to having one large BF with multiple hash functions, it is possible to use multiple BFs, each relying on a single hash function, and each potentially having a different size.

Multiple BFs can also be used to figure out whether a given value has been seen some number of times. As an example, imagine an application that serves reads of data from storage. There is a need to keep some data in an LRU cache, but because the number of distinct values is so large, it is impossible to just add every value to the cache. The cache limit plus the LRU policy would make sure that even hot data could easily be removed from the cache. For example, assume we have a cache with N elements, and we read M elements from storage, such that M is much larger than N. If we started adding every read element to the cache, we would easily fill it with values read only once, and the hot data would be evicted. A sketch of the idea is below.
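
Here is a self-contained toy sketch of that idea (my illustration, not the multi-bloom-filter project itself): keep several small bloom-filter-like bit sets, periodically reset the oldest one, and treat a key as "hot" - and therefore cache-worthy - only if it appears in enough of them.

    import java.util.BitSet;

    public class HotKeyDetector {
        private final BitSet[] filters;
        private final int bits;
        private int head = 0;                            // index of the current (freshest) window

        HotKeyDetector(int numFilters, int bits) {
            this.bits = bits;
            this.filters = new BitSet[numFilters];
            for (int i = 0; i < numFilters; i++) filters[i] = new BitSet(bits);
        }

        private int position(String key) {
            return Math.floorMod(key.hashCode(), bits);  // a single hash function per filter, for simplicity
        }

        void record(String key) {
            filters[head].set(position(key));            // remember the access in the current window
        }

        boolean isHot(String key, int threshold) {
            int seen = 0;
            for (BitSet f : filters) if (f.get(position(key))) seen++;
            return seen >= threshold;                    // seen in enough recent windows -> hot
        }

        void rotate() {
            head = (head + 1) % filters.length;
            filters[head].clear();                       // drop only the oldest window of observations
        }

        public static void main(String[] args) {
            HotKeyDetector d = new HotKeyDetector(3, 1 << 16);
            d.record("user:42");
            d.rotate();                                  // time passes, the oldest window is recycled
            d.record("user:42");
            System.out.println(d.isHot("user:42", 2));   // true: seen in two recent windows
            System.out.println(d.isHot("user:7", 2));    // false (modulo bloom-filter false positives)
        }
    }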

Leader Election: LeLann, Chang and Roberts algorithm (LCR)

This algorithm is for ring networks. Each message goes from one process to another, i.e. there is no broadcasting. This means that each process knows about exactly one other process - its neighbor. This can be imagined as a linked list.

The algorithm's message complexity compares favorably with the bully algorithm: its worst case is O(N^2), but this happens only if the hosts are arranged in the ring in increasing (or decreasing, depending on which extreme is used) order of their UIDs.
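
A tiny single-process simulation of the idea (the UIDs below are made up): every process announces its UID clockwise, forwards only UIDs larger than its own, and the process that receives its own UID back becomes the leader.

    import java.util.ArrayList;
    import java.util.List;

    public class LcrRing {
        public static void main(String[] args) {
            int[] uids = {17, 3, 42, 8, 25};    // ring order, messages go from i to (i + 1) % n
            int n = uids.length;
            int leader = -1;

            // messages.get(i) = UIDs that process i receives in the current round
            List<List<Integer>> messages = new ArrayList<>();
            for (int i = 0; i < n; i++) messages.add(new ArrayList<>());
            for (int i = 0; i < n; i++) messages.get((i + 1) % n).add(uids[i]); // everyone announces itself

            while (leader == -1) {
                List<List<Integer>> next = new ArrayList<>();
                for (int i = 0; i < n; i++) next.add(new ArrayList<>());
                for (int i = 0; i < n; i++) {
                    for (int uid : messages.get(i)) {
                        if (uid == uids[i]) leader = uids[i];                    // own UID made it all the way around
                        else if (uid > uids[i]) next.get((i + 1) % n).add(uid);  // forward larger UIDs, drop smaller ones
                    }
                }
                messages = next;
            }
            System.out.println("Leader UID: " + leader);  // 42
        }
    }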

Leader Election: Bully Algorithm

Only the process with the largest (or smallest) UID becomes the leader. If some process sees that its UID is more extreme than the current leader's UID, it will initiate a leader election.
There are 3 types of messages:
  • election message - sent to announce election
  • answer message - response to the election message
  • coordinator message - sent to announce the identity of the elected leader
The algorithm is as follows:
  1. Some process P finds that the leader (coordinator) is not available.
  2. P initiates a new leader election by broadcasting an election message.
  3. Each process replies with an answer message containing its own UID.
  4. If P receives a reply from a process with a higher UID, it waits a bit, and if no reply with an even higher UID arrives, that host is promoted to leader.
  5. However, if P receives a coordinator message announcing a new leader whose UID is lower than P's UID, it will initiate another election.
The algorithm assumes there must always be a coordinator. It requires on the order of N^2 + N messages, since at any moment any process can try to start an election: send an election message, handle N answer messages, and send a coordinator message. As an optimization, a process can send election messages only to processes with larger UIDs; this gives about N^2 / 2 + N messages, which is better, but still not ideal. A toy simulation of the happy path is sketched below.
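
In the sketch below (process UIDs and the failed-leader scenario are made up), real network messages and timeouts are replaced by simply marking some processes as down, so "no answer" just means the process is not in the alive set.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Set;

    public class BullySimulation {

        static final int[] UIDS = {1, 3, 5, 7, 9};             // all known processes
        static final Set<Integer> ALIVE = Set.of(1, 3, 5, 9);  // 7 (the old leader) is down

        // process `uid` starts an election by asking every process with a higher UID
        static int elect(int uid) {
            List<Integer> higherAlive = new ArrayList<>();
            for (int other : UIDS) {
                if (other > uid && ALIVE.contains(other)) {
                    higherAlive.add(other);                     // such a process would send an answer message
                }
            }
            if (higherAlive.isEmpty()) {
                return uid;                                     // nobody higher answered -> uid becomes the leader
            }
            // a higher process takes over the election; in a real system P would just wait
            // for the coordinator message instead of calling anything directly
            return elect(Collections.max(higherAlive));
        }

        public static void main(String[] args) {
            int initiator = 1;                                  // some process noticed the leader is gone
            System.out.println("Elected leader: " + elect(initiator)); // prints 9
        }
    }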

Leader Election in Distributed System

One of the important problems in distributed computing is electing a leader process. A leader process is needed for multiple reasons: it can be used to make a final decision, to run any task that has to be run by only a single process at a time (like finding the next task to execute and spreading the execution across other hosts, acquiring a lock, etc.), or to hold authoritative data and provide it to other nodes.

There are multiple leader election algorithms. At the same time, for most tasks it's possible to find a solution that doesn't need a single leader:
  • for scheduling a list of tasks, the tasks can be partitioned and, with locking, spread over multiple hosts
  • for making a final decision, it's possible to reach a consensus (with an algorithm like Paxos)
  • distributed locking can help other systems avoid having a leader (again, it might be based on a consensus algorithm like Paxos or Raft)

Amazon Dynamo paper notes

Notes based on the Amazon Dynamo paper. Note that Dynamo and AWS DynamoDB are different systems (at least judging by the name), but that difference is not discussed in this document.

Dynamo is designed to be an eventually consistent storage. At the same time, it allows clients to specify the number of hosts in the read (R) and write (W) quorums. The number of replicas N for each key is also configurable. A good rule of thumb is to keep N >= 3 and R + W > N; in this case the read and write quorums overlap, so a read always sees the latest successful write (for example, N = 3, R = 2, W = 2).

Many systems use synchronous data replication to keep consistency strong (e.g. MySQL). This causes availability issues, because if a replica fails, the system becomes unavailable to its clients. Only a single master host can accept writes in such systems. In contrast, Dynamo uses asynchronous data replication. This helps to build partition-tolerant systems, as Dynamo accepts reads and writes even during a network partition, and resolves update conflicts using different algorithms.

Dynamo is a P2P system that uses a gossip-based protocol to share membership and failure detection information with peers. The membership information contains the list of key ranges that each node is responsible for. Nodes also use the gossip protocol to announce themselves to other hosts. Basically, when a new Dynamo host starts, it already knows a list of seed hosts, which it connects to in order to tell them about its own availability. Those seed hosts help to deliver this information to other peers via gossiping.

Dynamo uses consistent hashing for partitioning. This allows adding new nodes to the hash space without re-hashing all existing values. Instead, only some portion of the values is moved to the new node, after which it becomes available for reads and writes.
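
A minimal sketch of the technique (the general idea, not Dynamo's actual implementation): nodes are placed on a ring of hash values, a key belongs to the first node clockwise from the key's hash, and adding a node only moves the keys between it and its predecessor.

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.util.Map;
    import java.util.TreeMap;

    public class ConsistentHashRing {
        private final TreeMap<Long, String> ring = new TreeMap<>();

        private long hash(String s) {
            try {
                byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
                long h = 0;
                for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xff); // first 8 bytes of the digest as a ring position
                return h;
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }

        public void addNode(String node)    { ring.put(hash(node), node); }
        public void removeNode(String node) { ring.remove(hash(node)); }

        public String nodeFor(String key) {
            Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
            return (e != null ? e : ring.firstEntry()).getValue();       // wrap around the ring
        }

        public static void main(String[] args) {
            ConsistentHashRing r = new ConsistentHashRing();
            r.addNode("node-a"); r.addNode("node-b"); r.addNode("node-c");
            System.out.println(r.nodeFor("user:42"));
            r.addNode("node-d");                                          // only the keys near node-d move
            System.out.println(r.nodeFor("user:42"));
        }
    }

Dynamo additionally maps each physical host to several positions on the ring ("virtual nodes") to even out the load; the sketch above leaves that part out.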

Merkle trees

A Merkle tree is a hash tree where the leaves are hashes (of the data blocks), and each intermediate node is a hash of its children, so the root node covers the whole tree. Thus, to know whether two hash trees differ, it's enough to compare their root nodes. It's also easy to find the actual difference between two trees: just traverse them top-to-bottom and find the nodes with different hashes. This is how a Merkle tree helps to reduce the amount of data passed between two sources.
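
A small sketch of the idea: build a root hash from a list of data blocks and compare two trees by their roots (this sketch keeps only the root; real systems also store the intermediate hashes so the differing subtree can be located by walking down).

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    public class MerkleSketch {

        static byte[] sha256(byte[] data) throws Exception {
            return MessageDigest.getInstance("SHA-256").digest(data);
        }

        static byte[] concat(byte[] a, byte[] b) {
            byte[] out = Arrays.copyOf(a, a.length + b.length);
            System.arraycopy(b, 0, out, a.length, b.length);
            return out;
        }

        static byte[] merkleRoot(List<byte[]> blocks) throws Exception {
            // leaves are hashes of the data blocks
            List<byte[]> level = new ArrayList<>();
            for (byte[] block : blocks) level.add(sha256(block));
            // each parent is a hash of its children, all the way up to the single root
            while (level.size() > 1) {
                List<byte[]> next = new ArrayList<>();
                for (int i = 0; i < level.size(); i += 2) {
                    byte[] left = level.get(i);
                    byte[] right = (i + 1 < level.size()) ? level.get(i + 1) : left; // duplicate an odd leaf
                    next.add(sha256(concat(left, right)));
                }
                level = next;
            }
            return level.get(0);
        }

        public static void main(String[] args) throws Exception {
            List<byte[]> a = List.of("block1".getBytes(StandardCharsets.UTF_8),
                                     "block2".getBytes(StandardCharsets.UTF_8));
            List<byte[]> b = List.of("block1".getBytes(StandardCharsets.UTF_8),
                                     "block2-changed".getBytes(StandardCharsets.UTF_8));
            // equal roots -> the copies are in sync; different roots -> walk down to find the diff
            System.out.println(Arrays.equals(merkleRoot(a), merkleRoot(b)));  // false
        }
    }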

Merkle trees are used as an anti-entropy mechanism that helps to detect divergence between data stored in multiple locations. For example, Dynamo and Cassandra use Merkle trees to find cases where replicas have different versions of the data. Merkle trees are also an awesome way to detect changes before syncing data, which is one of the primary tasks for Dropbox, Chrome Backup, Google Drive, BitTorrent Sync, etc.

Merkle trees do not rely on timestamps, but on actual data differences. This makes hash trees a wonderful structure for synchronizing data between a source and a target with minimal effort spent on detecting the actual changes.

For example, assume there is a backend B and N clients, where each client maintains a copy of the data (like Dropbox, BitTorrent Sync, Google Drive, etc.), and some of the clients change the data locally and upload it to backend B. Then each client can determine the actual change and exchange only the necessary data with the backend.

Hash trees are used by Git and Bitcoin. As I mentioned before, Dynamo, Cassandra and Riak use Merkle trees for anti-entropy as well. Merkle trees are also used in cryptography for message signatures (search for Lamport signature).


5 WHYs

Problem solving is a whole science. To solve a problem, you need to understand its root cause. It's not always simple to figure out the real root cause, so you need to speculate, search, suggest and fail until you get enough information to define it. Just retrying the same thing again and again until it's fixed won't help.
There are different techniques to find the root cause. One of them is "5 WHYs".

Onion Routing

I've been interested in how Tor works for a long time. Just recently I got to reading more about it, and especially about the technique at the heart of Tor. This technique is called Onion Routing, and this random-random-note (RRN) is about it.

Before, I couldn't quite understand how anonymity could be achieved in modern networks. However, onion routing is actually a quite simple but powerful way to do it. Everything genius is simple!
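
To make the "simple but powerful" part concrete, here is a toy illustration of the onion idea (not Tor's real protocol; the keys are just generated on the spot rather than negotiated with relays): the sender wraps the message in one encryption layer per relay, and each relay can peel off exactly one layer, learning nothing about the rest.

    import javax.crypto.Cipher;
    import javax.crypto.KeyGenerator;
    import javax.crypto.SecretKey;
    import java.nio.charset.StandardCharsets;
    import java.util.List;

    public class OnionSketch {
        public static void main(String[] args) throws Exception {
            KeyGenerator kg = KeyGenerator.getInstance("AES");
            kg.init(128);
            // one symmetric key per relay on the chosen path
            List<SecretKey> relayKeys = List.of(kg.generateKey(), kg.generateKey(), kg.generateKey());

            byte[] onion = "hello, hidden world".getBytes(StandardCharsets.UTF_8);
            // the sender wraps layers starting from the last relay and ending with the first one
            for (int i = relayKeys.size() - 1; i >= 0; i--) {
                Cipher c = Cipher.getInstance("AES");
                c.init(Cipher.ENCRYPT_MODE, relayKeys.get(i));
                onion = c.doFinal(onion);
            }

            // each relay on the path peels off exactly one layer and forwards the rest
            for (SecretKey key : relayKeys) {
                Cipher c = Cipher.getInstance("AES");
                c.init(Cipher.DECRYPT_MODE, key);
                onion = c.doFinal(onion);
            }
            System.out.println(new String(onion, StandardCharsets.UTF_8)); // "hello, hidden world"
        }
    }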

Psychology of Everyday Things

These are the final notes from my reading of "The Design of Everyday Things" by Donald Norman, which is a really awesome book. The notes are kind of unstructured and maybe not even worth publishing here, but I'll give it a try.

Conceptual Model

A conceptual model is our way of seeing how a thing might work and how to operate it. For example, when we see a bike, we can understand how it works in general by modeling it in our mind. A conceptual model is actually defined by 3 parts - affordances, constraints and mappings.
Affordances help us understand how to approach the thing in the first place. Constraints help us see how best to operate it.

Affordances and constraints help to make the conceptual model obvious. Take scissors, for example. They have 2 holes, so we understand that something is supposed to go inside them, like fingers. The constraints then help us understand which fingers go where.
Affordances and constraints help to build visibility, which is required to understand how a thing works and what it is for. For example, scissors have clear visibility, while digital watches with multiple buttons are not as clear. An obvious conceptual model is required for a product if you want its design to be transparent to customers.
 
The design model is how the product is seen by its designer. The user model is how the product is seen by its user.

Gossip Protocol

A gossip protocol is a style of inter-process communication in a network, inspired by the gossip that happens in human social groups. Basically, nodes (peers) exchange some information. This information may contain knowledge the node gained by itself (e.g. some events), received from a user (e.g. search queries), or received from other nodes during conversations with them.

Peers connect to each other in a random manner to ensure that information spreads quickly. For example, a peer might connect to another peer and exchange information something like 10 times per second.
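
Here is a toy, single-process simulation of the exchange (the node names and the "fact" are made up): every node keeps a map of what it has heard, periodically picks a random peer, and the two merge their maps; after a few rounds the fact reaches everybody.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Random;

    public class GossipSketch {
        static class Node {
            final String name;
            final Map<String, String> facts = new HashMap<>();
            Node(String name) { this.name = name; }

            void gossipWith(Node peer) {
                // the exchange is symmetric: both sides end up with the union of what they knew
                peer.facts.putAll(this.facts);
                this.facts.putAll(peer.facts);
            }
        }

        public static void main(String[] args) {
            List<Node> cluster = new ArrayList<>();
            for (int i = 0; i < 8; i++) cluster.add(new Node("node-" + i));
            cluster.get(0).facts.put("config-version", "42");   // only one node knows the fact initially

            Random random = new Random();
            for (int round = 0; round < 10; round++) {
                for (Node node : cluster) {
                    Node peer = cluster.get(random.nextInt(cluster.size()));
                    if (peer != node) node.gossipWith(peer);
                }
            }

            long informed = cluster.stream().filter(n -> n.facts.containsKey("config-version")).count();
            System.out.println(informed + "/" + cluster.size() + " nodes know the fact"); // almost certainly 8/8
        }
    }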

It's also possible to organize hosts into groups where each node shares information only with nodes within its group. This simplifies finding the next random node to talk to, and limits the number of random, useless connections between nodes. Some nodes are shared between multiple groups to ensure that information spreads over the whole network. As an alternative, nodes could communicate only with their near neighbors. Another alternative is to define hierarchical levels, which minimizes the need to send a message to everyone; instead it goes only to some hosts at the specified level.

Unfortunately, a gossip protocol does not guarantee that knowledge will reach all nodes. Another issue is that the received information might not be correct or valid anymore (well, that's the nature of gossip, right?). It is also said that gossip protocols maintain relaxed consistency.

Another good use of gossip is to multicast information over a network with an unknown structure. For example, it's not always possible to multicast information using IP multicast, because hosts might be organized into different restricted networks.

Gossip protocols are at the core of the internet.

When using gossip protocols, it's often better to share information about the fact rather than the fact itself, i.e. "the configuration was changed" instead of "the new value for property A is B". This is also true in many other distributed systems.

Usages

There are multiple uses for gossip protocols; some of them are:

  • propagate events between multiple hosts, so each node can react if needed
  • route data using different paths (get the data, split it into packets, deliver them to the target using different routes)
  • aggregate values across different hosts (each node receives the data, checks whether it has been seen already, and if not, aggregates it with its own data and forwards it to the next hosts)
  • search among a large number of nodes (used in unstructured P2P networks)
  • find the list of nodes, or just count them, in a network of unknown size and structure
  • elect leaders
  • sort nodes in the network, detect failures and errors, recover the network structure
  • monitor and gather data, for example for load balancing
  • detect network partitions and the dynamic insertion of new services
  • implement heartbeats
  • implement epidemic algorithms in the network
  • gather information about other nodes in order to make local decisions (like load balancing, or accessing data located on specific nodes) in a scalable way
  • detect new nodes and add them to the cluster or load balancer, update DNS records, etc.

But usually gossip protocols are not the most efficient option. It's often possible to find a specialized protocol that does the job better. However, some other protocols may not work for large networks (e.g. Paxos), while others are centralized and thus may have failure issues (e.g. anything based on a centralized DB).

Other downsides of gossip protocols are:
  • the same information can be delivered to the same host twice
  • message latency is usually high, especially for large networks

My Setup

A few weeks ago I rediscovered a nice website, usethis.com, with interviews of different people describing their setups. I found a lot of interesting things there and decided to share my own setup, although I believe it is pretty standard these days.

So, I have a desktop and 2 laptops. My desktop is a late 2012 Mac Mini where I replaced the 4 GB of RAM with 16 GB. I'm using OS X 10.8 in order to support the hack that lets me use my Seiki 4K TV as a monitor. I also have a Dell 24" monitor, but I'm not using it anymore. My personal laptop is a MacBook Pro w/ Retina '13. The other is my work laptop, a Lenovo ThinkPad T420 with Ubuntu on it. It has 6 GB of RAM and a 120 GB SSD, so it's pretty OK.

I have a Nexus 4 as my phone and a Nexus 7 as my tablet.

I also use BitTorrent Sync to sync projects and personal files (including books, videos and photos) between my desktop, laptop and mobile devices. Very handy, as I always have my files available everywhere. Highly recommended!

I use a Kindle 3 and a Paperwhite as my readers. Actually, I prefer to use the Kindle app to read ebooks on my mobile devices. I'm actively using Pocket for reading articles, both on mobile gadgets and on the desktop/laptop; I find it very helpful for keeping up with reading more and more. For sharing links between devices and browsers, I'm using SyncTab. With the new plugin for FF it works even better for me.

I recently switched from Chrome back to Firefox. In FF I'm a fan of plugins like Tree Style Tab, Evernote Clearly, SyncTab and GreaseMonkey.

IntelliJ IDEA and PyCharm are what I use for developing in Java and Python. I also use Sublime Text as a text editor for scripts, HTML and notes. These Sublime plugins are invaluable for me nowadays: MarkdownEditing, Markdown Preview, Git and GitGutter. I'm actively using Markdown for my notes, and I keep my notes and my private and public projects on GitHub.

I also have operational notes and todos, which I keep in Evernote. Again, the Evernote apps are installed just about everywhere - desktop/laptop and mobile devices. For my bookmarks, however, I went back to Delicious. They are not in the best period of their life, but it works well enough for me so far. As for an RSS reader, I finally got used to Feedly, although I still miss Google Reader a lot!

As for music, I use Spotify to listen on my devices, but I also have an iPod shuffle, which is always with me.

On OS X I'm also using iTerm2 with zsh and brew.

My preferred games on mobile devices are Andoku and 2048.

For backing up data and sharing, besides BTSync and Evernote, I also use Google Drive and Dropbox. I upload my photos to Flickr as well.

I think I'm done now...

And yeah, a picture:

[picture]

Design is Hard

Another post based on my notes from the book The Design of Everyday Things by Donald Norman. This one is from the section explaining why design is so hard and what can be done to improve the situation.

Designers pretty often can't design things correctly, because:
  1. they don't know the end user of the product
  2. they need to comply with many rules to satisfy their direct clients, who are not the users of the product
  3. they are limited by business rules: make it beautiful, not necessarily usable
  4. they know too much about the product, so they can't look at it as a new user would. Designers are not typical users, just as programmers are not typical users of their programs.
  5. the designer is not always the one responsible for the design; it could be done by a non-professional, like a programmer
  6. there is always a set of "special" users with differences, like left-handed, deaf or blind users, etc.
One of the issues with many modern products is creeping featurism. Creators try to put too many features into the product. Each feature brings more complexity than it should, because it usually comes with another control, another window, a couple of use cases and some options. Instead, the product should come with a minimal viable set of features.
Funny, but there is also a list of rules for how to do things wrong:
  • make things invisible: widen the gulf of execution and the gulf of evaluation, so the user won't know what is happening and where they are
  • be arbitrary: use non-obvious commands and names
  • be inconsistent: change the rules, use different rules in different modes, have multiple different modes
  • make operations unintelligible: use unknown abbreviations, unclear messages and errors
  • be impolite: insult users
  • make operations dangerous: allow a single erroneous action to destroy invaluable work, make it easy for disastrous things to happen
In contrast to all these "bad" rules, a system or product should invite the user to explore it, play with it, and learn it without the need to study huge boring manuals. The system should be safe for the user to use; it should give the user hints, suggestions and advice so the user can explore it even more. Visibility and mapping should play an important role.
Design should:
  • make it easy to determine which actions are available and possible at any moment
  • make things visible
  • make it easy to evaluate the current state of the system
  • follow natural mappings between intentions and actions, between actions and results, and between the available information about the system and the actual system state
Seven principles for transforming difficult tasks into simple:
  1. use both knowledge in the head and knowledge in the world: this is where manuals can help, but one should not rely on the user studying the manual before starting to work with the product; usually the user turns to the manual only when they have issues
  2. simplify the structure of tasks
  3. make things visible
  4. get the mappings right
  5. exploit the power of constraints, both natural and artificial
  6. design for error
  7. standardize

Google Glasses, AI and shadow copy

Last week I had a road trip and passed a few interesting places. I wanted to add a note to visit those places in the future, but couldn't, as I was driving. And I started dreaming about how awesome it would be if you could just add a note using your voice. But I'm lazy. It would be even better if it could be added to the backlog of places to visit just by making a thought command. But I'm too lazy even for that. Thinking about visiting some place is fine, but having to form a specific command in my thoughts - go and visit this place later - is still not ideal. It would be better if there were some invisible assistant that could do that for me. One that would know me as well as I do. Who would know what I might be thinking about, who would understand my desires even when they are not yet clear to me. Some AI assistant that would know I might be thinking how nice it would be to visit that shop with fresh seafood, because I love seafood so much.

But how can any AI be smart enough? How could any AI understand me so well? Reading thoughts? Yes, but that is such a hard thing today (well, perhaps possible someday). But if there were a device that could see what I see, hear what I say and what others say, then it could learn to understand me someday. And then I realized that I had just described Google Glasses.

The more I thought about this AI assistant, the more I understood that it wouldn't be an assistant anymore. It would be my digital shadow copy. Able to think like me. It would know what I know, because it has vision and hearing - more than 90% of the information sources a human uses. And there would be a company that owned my mental copy. A company that could ask my copy whether I would prefer to buy product A or B, or what I am going to search for in the near future. You know, a company like Google could even have search results ready for me well before I decide to search.

And what about ads? Are you kidding me? There are no ads anymore! No one needs ads. The company already knows what I want to buy. It just shows me what I already want and need to buy.

How good could it be? Well, you don't know this, but I eat 1 banana every day. My digital copy would know that for sure. I have only 2 bananas left in my fridge, so my shadow copy knows I need to buy another bunch of bananas soon. And it buys them for me. My bananas are delivered within a day. No shortage, no rush, no effort from me. I don't even need to think or worry about it anymore.

This is something that would change basically everything in our lives. And the changes would be good.

Err is human...

This is my second post based on notes from the book The Design of Everyday Things by Donald Norman. As I mentioned in the last post, I loved the chapter about human errors and mistakes and how to design things to avoid those mistakes. So, here are my notes...

There are different types of errors:
  1. data-driven errors
  2. capture errors
  3. description errors
  4. associative activation errors
  5. loss-of-activation errors (forgetting what you wanted to do)
  6. mode errors (forgetting which mode you are in)
It's not always easy to detect errors because:
  1. feedback is not always available (lack of visibility)
  2. the error may be at a different level than where you are looking (you're looking for it at a lower level while it's a level or a few above)

Human Memory

I've read an awesome book recently - The Design of Everyday Things by Donald Norman. I must say that I didn't expect much from this book. My thought was "just another book for designers on how to create usable things". There was, however, something pushing me to read it, maybe because I had read about it in In the Plex, where it was referenced as a book that influenced Google's founders. Well, now I can understand why. The book is mostly about trivial things that everyone supposedly understands and knows - in fact, that's not true. Not everyone understands and knows them. I didn't. So many discoveries about everyday things, views from different perspectives, inspirational rules, etc.

There are many topics that I liked in this book. But the chapter named "To Err is Human" is maybe my favorite. Not only because I make so many mistakes and errors all the time myself, and it's nice to understand how this works (and how I work), but also because the author gives a very good explanation of how human memory and the brain work.

I made some notes while reading this book, and decided to share some of them related to how human memory works. I was also thinking about how this applies to AI, and made some interesting discoveries for myself there too.

So here are my notes...