tag:blogger.com,1999:blog-1319476651054891992024-02-19T23:14:49.674-08:00Random(Notes)no structure, no sense, just random notesRuslan Khttp://www.blogger.com/profile/08483230152299597189noreply@blogger.comBlogger129125tag:blogger.com,1999:blog-131947665105489199.post-11046184497873025772024-01-10T00:47:00.000-08:002024-01-10T00:51:06.729-08:00ListIterator<p>I'm quite ashamed right now (after 15+ years of using Java pretty extensively), but I only recently figured out that <code><a href="https://docs.oracle.com/javase/8/docs/api/java/util/ListIterator.html">ListIterator</a></code> exists. Maybe I knew about it and just forgot? Many times I avoided modifying a list while iterating over it, just because that would result in <code>ConcurrentModificationException</code>, and I had to come up with tricky workarounds.</p>
ListIterator allows adding and removing elements of a list while traversing it. It also supports moving backwards, making it a great tool for some algorithms, especially those based on linked lists. And ListIterator works just fine for <code>ArrayList</code>s too.
Here is a simple example. Let's write code that, if it finds a value above 100, removes it along with the previous and next values, and if it finds a value below 10, duplicates it.
<pre>
List<Integer> numbers = new LinkedList<>(); // could just as easily be an ArrayList
numbers.add(10);
numbers.add(20);
numbers.add(30);
numbers.add(110);
numbers.add(45);
numbers.add(3);
numbers.add(35);
ListIterator<Integer> iterator = numbers.listIterator();
while (iterator.hasNext()) {
    Integer number = iterator.next();
    if (number > 100) {
        iterator.previous(); // step back before 110
        if (iterator.hasPrevious()) {
            iterator.previous(); // step back before 30
            iterator.remove();   // remove 30
            iterator.next();     // step forward past 110 again
        }
        iterator.remove(); // remove 110
        if (iterator.hasNext()) {
            iterator.next();   // step forward past 45
            iterator.remove(); // remove 45
        }
    } else if (number < 10) {
        iterator.add(number); // insert a duplicate of the value
    }
}
System.out.println(numbers);
System.out.println(numbers);
</pre>
prints:
<pre>
[10, 20, 3, 3, 35]
</pre>
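ListIterator's backward traversal, mentioned above, can be sketched as well. Here is a minimal example of my own (not from the scenario above) that walks a list from the end and updates each element in place with <code>set()</code>:

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

public class ReverseTraversal {
    public static void main(String[] args) {
        List<Integer> numbers = new LinkedList<>(Arrays.asList(1, 2, 3, 4));

        // Start the iterator positioned at the end of the list...
        ListIterator<Integer> it = numbers.listIterator(numbers.size());

        // ...and walk backwards, doubling each element in place.
        while (it.hasPrevious()) {
            Integer number = it.previous();
            it.set(number * 2); // set() replaces the element last returned
        }

        System.out.println(numbers); // [2, 4, 6, 8]
    }
}
```

The same loop works for <code>ArrayList</code> too, though for an <code>ArrayList</code> a plain index loop is just as easy; the iterator form shines for linked lists, where indexed access is O(n).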
I love the moments when I learn something new, especially if it is going to help me do my work better.Ruslan Khttp://www.blogger.com/profile/08483230152299597189noreply@blogger.com0tag:blogger.com,1999:blog-131947665105489199.post-31518089041538737572023-12-31T15:00:00.000-08:002024-01-01T02:02:31.677-08:00Part 2: Python is slooow.. Rust is fast. <p>This is a follow-up to the <a href="http://blog.khmelyuk.com/2023/12/python-is-slooow-rust-is-fast.html">previous post</a>.</p>
<p>Now the processing time dropped from 20+ seconds to only 2-3 seconds. A ten-fold improvement. And it is not because I coded everything in assembler, moved to the GPU, or aggressively parallelized. Nope. I just switched to a different algorithm.</p>
<p>Funny enough, even with this algorithm, all the same heavy computation still happens. I just no longer need to add data into a HashMap or make another two full passes over the data. And the algorithm is simpler to understand.</p>
<p>It is also simple to split the work into parts and run them concurrently, which gives me another opportunity to parallelize and make it slightly faster.</p>
<p>Lesson learned: using the right algorithms and data structures is very important.</p>
<p>I also made the app available on the internet as my first, very basic React web application. But I still have a few features and performance optimizations to add before sharing it on my blog.</p>Ruslan Khttp://www.blogger.com/profile/08483230152299597189noreply@blogger.com0tag:blogger.com,1999:blog-131947665105489199.post-90467327061776022032023-12-19T15:00:00.000-08:002023-12-19T15:02:47.305-08:00Python is slooow.. Rust is fast.<p>
I love using Python to play with data, sketch solutions, or just prototype. If I need to come up with some tricky algorithm, I often prototype it in Python. Python is great for that, especially with Jupyter added: no compilation time, easy scripting, and lots of libraries, especially those backed by native code written in C/C++. Using numpy and similar libraries makes things pretty fast compared to raw Python.
</p>
<p>
But any time you need to do lots of processing in Python itself, especially looping through large amounts of data, you get hit by performance issues that make it inefficient to use Python code in production. Just recently, I needed to do some math and processing over 50MM-100MM elements in a 2D array, and without numpy, that would take many hours if not days. Numpy helped get it down to 10-20 minutes. A significant reduction, but still too slow if I want to run similar processing tens of thousands of times.
</p>
<p>
I tried to re-implement this in Rust. It took me some time given I'm pretty new to Rust, but it was hugely satisfying to see the processing time drop to 3-4 minutes, and after a few basic optimizations, to 2-2.5 minutes. That sounded much better. Then I realized I was running this in debug mode. I switched to release mode, which enables a bunch of compiler optimizations, and the time dropped to 20-25 seconds. Wow!
</p>
<p>
But I think I can still do better. Can I use CUDA?..
</p>Ruslan Khttp://www.blogger.com/profile/08483230152299597189noreply@blogger.com0tag:blogger.com,1999:blog-131947665105489199.post-2661214301913044032018-06-03T16:16:00.000-07:002018-06-03T16:22:33.255-07:00Projects directory structure<br />
<h2>Background</h2><div ><span >For a very long time, I followed a directory structure very similar to what many others use: there was a directory <code >projects</code> in my home directory that contained all the projects I was or am working on. Some projects weren’t trivial and contained multiple modules, but the structure of the <code >projects</code> directory was very simple. Each project directory held the project’s source code inside.</span></div><div ><span >However, the further I go, the more issues I have with this structuring approach:</span></div><ul><li><span >Where should I put documentation? Should it be next to the source code? Should it be in git?</span></li>
</ul><ul><li><span >What about various media that is related to the project but has nothing to do with source code?</span></li>
</ul><ul><li><span >What about the data, which is also unrelated to the source code, but important for data analysis?</span></li>
</ul><ul><li><span >and many other questions.</span></li>
</ul><span >Although the questions are different, they are actually all about the same topic:</span> <em >where do I put files that are not related to the source code in any way?</em><br />
<br />
<h2 ><span >Workspaces</span></h2><div ><span >And today I realized that the problem is due to my outdated approach to structuring files. It came from the days when my responsibility was mostly coding, and source code was central to my work. And while my responsibilities changed, the approach stayed the same.</span></div><div ><span ><br />
</span> <span >Nowadays, I need more than just source code for most of my projects. Besides source code, I need to collect data, notes, documents, articles, and posts. And instead of placing documents into <code >$HOME/docs/$PROJECT</code>, data into <code >$HOME/data/$PROJECT</code>, and source code into <code >$HOME/projects/$PROJECT</code>, I’d better put everything into the same folder. And I didn’t find a better name for this type of folder than <span >workspace</span>.</span></div>
</span> <span >And the goal is to switch from this structure</span></div><pre ><code ><span >$HOME/projects
project1/
.git/
module1/
module2/
pom.xml
todo.txt
project2/
</span></code></pre><div ><span >to<span > </span></span></div><pre ><code ><span >$HOME/workspaces
workspace1/
docs/
data/
dev/
todo.txt
source/
.git/
module1/
module2/
pom.xml
charts/
workspace2
docs/
ideas/
source/
</span></code></pre><br />
<h2 ><span >Versioning</span></h2><div ><span >And then I faced a new question – should I put the whole workspace under version control? If docs, data, and other resources are important for the project, why shouldn’t they be versioned and kept safe and readily available using a version control system like Git?</span></div><div ><span ><br />
</span> <span >“That is actually a good idea!” I thought. “But it would be better to have separate git repositories for each part of the workspace, since documents and code change at different times and for different reasons.”</span><br />
<span ><br />
</span></div><div ><span >And that’s the decision I’ve made –<span > </span><em >it is a good practice to use version control for all parts of the workspace, but do not put a whole workspace into a single repository</em>.<span > </span></span><br />
<span ><span ><br />
</span></span></div><div ><span >A workspace is just a collection of various resources coupled by the same topic. So it is natural to want to keep a workspace organized and reproducible – it should be easy to create a new workspace that someone else can use if needed. This also makes the idea of separate repositories a good one – not everyone cares about data or docs.</span></div><div ><span ><br />
</span> <span >The desire to put the whole workspace into a single repository can be strong at times, mostly because of the reasons described in the previous paragraph. And that’s where <a href="https://git-scm.com/book/en/v2/Git-Tools-Submodules" >Git submodules</a> can be helpful.</span></div><br />
<h2 ><span >Summary</span></h2><div ><span >What was good before might not be good anymore. We should revisit our approaches when we change or our responsibilities change.<span > </span></span></div><div ><span ><br />
</span> <span >The previous structure that I used for project directories was good for my needs. But my needs changed, and the structure no longer fits them.</span></div><div ><span ><br />
</span> <span >I need more than just a structure for the source code of a single project. I need a new way to organize files that covers documents, source code, and data for one or more related projects.</span></div><div ><span ><br />
</span> <span >And my solution to it is the “workspace”.</span></div>Ruslan Khttp://www.blogger.com/profile/08483230152299597189noreply@blogger.com0tag:blogger.com,1999:blog-131947665105489199.post-25936348064384493592017-09-04T20:41:00.000-07:002017-09-04T23:31:24.599-07:00Programming is like WritingEveryone has heard that programming is very similar to writing books. “It has many similar traits,” programmers think. There is even a thing called <a href="https://en.wikipedia.org/wiki/Literate_programming">“Literate programming”</a>, invented by the famous Donald Knuth.<br />
<br />
But are there really many similarities between these two activities? Well, except that both require staring at a screen and typing a lot of text.<br />
<br />
One of my goals for 2017 is to get better at writing. I found there is an increasing need for me to write good texts both at work and for personal needs (like this blog). At the same time, writing does not come as easily to me as coding. To work toward my goal, I started by picking and reading a few books that give advice on how to write well. <br />
<br />
One of those books is <a href="https://www.amazon.com/Writing-10th-Anniversary-Memoir-Craft/dp/1439156816">“On Writing” by Stephen King</a>. My friend was reading a fiction book by Stephen King, so I decided to check which of his books might interest me. I must say that the only one I found truly interesting at that moment was “On Writing”. I could do two things at once: finally read a Stephen King book and make a step toward my yearly goal.<br />
<br />
I must confess the book was an easy read. It read like fiction, but wasn’t. A great example of a helpful book that is easy to read, thanks to two important ingredients:<br />
<br />
<ol><li>The author shares background stories from his life. The first third of the book is more like a biography of Stephen before he became a famous author. Maybe it’s because I love reading biographies of famous people, but I finished that part in a single breath.</li>
<li>Every piece of advice comes with a background story and an intuitive explanation. You not only learn the advice but gain context that helps you remember it once you’ve closed the book.</li>
</ol><br />
Among the advice the author gives, I highlighted a few points that felt most important to me. I’ll share them with you in a moment. I apply some of them every time I write, like simplification, drafting, and, to some degree, the second-draft rule.<br />
<br />
Somewhere in the middle of the book, I realized that Stephen King’s general best practices for writing are very similar to those we follow in programming.<br />
<br />
<h2>Simplify and remove the clutter</h2>Although some junior engineers love to create overly complicated (they say beautiful, flexible, and extendable) designs, over-engineering both architecture and code, experienced software engineers know that simplicity is the only true thing we all must aim for.<br />
<br />
Software engineers should focus on keeping things simple, both in code and in design. And that’s what the author of the book also recommends. Remove unimportant parts that don’t add to the story, remove overly complicated and unnecessary descriptions, and leave only as much as the reader needs to feel the story. Otherwise, 90% of readers will just give up on the boring book. <br />
<br />
Not everyone is ready to read 10 pages about the colors of a sunset or 20 pages about the architecture of a city. Same in code: not everyone is ready to get through five-page methods, and not everyone is ready to dig deep into layer upon layer of your code.<br />
<br />
<h2>Avoid passive verbs</h2>The analogy here is very simple:<br />
<br />
<ol><li>use verbs for functions/methods names,</li>
<li>put methods into the objects they relate to.</li>
</ol><br />
“Pizza was delivered to my doors.” is not the best sentence. Once you stop using passive verbs, you get “The courier delivered pizza to my doors.” Way better!<br />
<br />
Same in the code: not “pizza.deliveredTo(myDoors)” but “courier.deliverPizza(myDoors)”. That makes modeling object-oriented relationships easier.<br />
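To make the analogy concrete, here is a tiny Java sketch; the Courier, Pizza, and Address classes are invented for illustration:

```java
// Hypothetical classes illustrating the active-voice naming advice:
// the courier performs the delivery, so the method lives on Courier.
class Pizza { }

class Address {
    final String street;
    Address(String street) { this.street = street; }
}

class Courier {
    // Active: "courier delivers pizza", not "pizza was delivered".
    String deliverPizza(Pizza pizza, Address destination) {
        return "Delivered pizza to " + destination.street;
    }
}

public class Delivery {
    public static void main(String[] args) {
        Courier courier = new Courier();
        String receipt = courier.deliverPizza(new Pizza(), new Address("My Doors St."));
        System.out.println(receipt); // Delivered pizza to My Doors St.
    }
}
```

The active form also answers the design question of where the method belongs: the actor that performs the action owns the behavior.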
<br />
<h2>Practice Continuously</h2>To become a good author, one should practice continuously, and write something almost every day. For example, Stephen King writes every day. He starts in the morning, and then writes until he reaches his goal.<br />
<br />
The same is true of coding. You just can’t become a great programmer if you don’t practice. <br />
<br />
<hr /><br />
<h3>Story from my life </h3><br />
Many years ago, I was a kid who wanted to learn programming. The only issue: I didn’t have a computer. But I didn’t give up that easily; I bought a book about programming in Turbo Pascal 7. It was a book with a green-and-white cover, published by a Saint Petersburg publisher, that covered TP7 from the basics all the way to writing code that draws 3D objects and generates audio. I loved that book. It was my first book on programming. <br />
<br />
I spent a couple of months going through this book: I learned about data types, arrays, pointers, files, and many other things. I followed the author’s recommendations and wrote many programs. As I mentioned, I didn’t have a computer yet, so all my programs were on pieces of paper. <br />
<br />
But one day, my parents bought a computer. I started porting my programs from paper into the Borland Turbo Pascal 7 environment. Only to find that, lo and behold, none of them worked. They didn’t even compile. Boy, was I upset!<br />
<br />
I had to spend a few more days fixing some of the programs to get them working. Since then, there have not been many periods of my life when I didn’t code for a long time. Because I quickly realized that practice is the most important part. <br />
<br />
For years after that, I still thought the problem was with me: I’m just a “practice” person, not a “theory” man. I learn better and faster from practicing, not from reading books. That was wrong thinking. It’s not <em>my</em> problem. We are all like that!<br />
<br />
<hr /><br />
If you want to be good at something, practice at it continuously.<br />
<br />
<h2>Have a place for writing</h2>Authors should have a place where they can hide from everyone and focus on the book. A place where nobody and nothing distracts you, and where the environment motivates you to do your best work.<br />
<br />
But, to be honest, not only writers need such a place. Artists need a place where they can focus on painting. Designers need such a place too. Software engineers also want one in their lives; and, ideally, it shouldn’t be in an open office space.<br />
<br />
Sometimes I imagine my ideal place for programming. It is a large room with a high ceiling; two walls are completely covered by bookshelves, and another wall holds a large glass whiteboard. The room has a big desk with large displays on it, and a comfortable, super-expensive chair next to it. There is also a comfy wingback chair for reading, with a standing lamp close by. Right, my ideal “office” is both a programming office and a small library.<br />
<br />
<h2>Have a toolbox</h2>This one is simple: authors have dictionaries and thesauruses, favorite software for writing books, or the exact type of paper and pen they can’t live without.<br />
<br />
Programmers have their IDEs, programming languages, version control tools, programs for reading documentation and many many more.<br />
<br />
<h2>Always do 1 or 2 drafts before the final version</h2>Review the results of your own work, no matter if it’s code, design, or documentation. See if you can improve it, and whether it is bug-free and covered with tests. And once you are happy with the version you have, hand it over for a peer review.<br />
<br />
It’s similar with books: the author writes a first draft, reviews and revises it a few times, and once ready, passes the book to an editor.<br />
<br />
<h2>Write about something you like</h2>I don’t think any reader would be happy to read what an author wrote about a topic the author doesn’t like. The author would make it either extremely boring or obviously incorrect, and the reader would inevitably feel that discomfort.<br />
<br />
Code that a programmer hated to write ends up looking like… code written in hatred.<br />
<br />
If you want to be successful at what you do, you need to love it. Whether it is writing, programming, painting, designing, crafting, or counting money.<br />
<br />
<h2>Read Continuously</h2>Stephen King reads a lot of books. He loves it, and he recommends it to other would-be authors as well. Read many different books, see what works well and what doesn’t, learn elements of style from others, and improve your own.<br />
<br />
And that is exactly the advice programmers receive from their mentors all around the world: read code written by others, study design approaches created by more experienced software engineers, etc.<br />
<br />
<h2>Summary</h2>Writing well is hard. Coding well is hard. <br />
<br />
But there are a few best practices we all can use. There is no magic behind them. They are universal, as they work for writing, coding, and almost anything else. <br />
<br />
These best practices are:<br />
<br />
<ul><li>keep it simple and remove the clutter,</li>
<li>practice continuously,</li>
<li>learn from more experienced peers,</li>
<li>make your work more comfortable to do,</li>
<li>love what you do.</li>
</ul>Ruslan Khttp://www.blogger.com/profile/08483230152299597189noreply@blogger.com0tag:blogger.com,1999:blog-131947665105489199.post-57136387512188500342017-08-26T19:07:00.002-07:002017-08-26T19:07:14.747-07:00Using Mind MapsA mind map is a visual way to organize information in a tree-like structure. Such maps are a convenient way to structure knowledge, ideas, plans, and tasks. <br />
<br />
Usually, a mind map starts with a <em>root node</em>, which has references to child nodes. References do not have names, but color can be used to encode specific subjects. Every node except the root has a parent node. A child represents information specific to its parent node. This builds a tree of information that a human can easily consume.<br />
<br />
I am a big fan of visualization. <em>Good visualization</em> represents complicated things in a simple way that is easy to understand. Mind maps are one of the simplest yet most effective visualization techniques. They help me in many different areas: from taking notes to managing projects.<br />
<br />
<h2>MindNode</h2>I spent a few days looking for a mind map tool that I would love to use. In the end, I picked MindNode, a beautiful tool available for <a href="https://mindnode.com/mindnode/mac">macOS</a> and <a href="https://mindnode.com/mindnode/ios">iOS</a>. It has a few very interesting features that I just love and won’t be able to live without anymore:<br />
<ol> <li>There is an app for iOS, so I can work with mind maps on my iPhone and iPad. The experience is very similar to the desktop application. The mobile app works smoothly; I have never had issues creating or updating maps from my phone.</li>
<li>iCloud integration lets me share maps between my devices and my computer.</li>
<li>Multiple themes are available by default. Moreover, I can modify an existing theme to create my own. I usually use different themes to distinguish different types of maps.</li>
<li>A node can be marked as a task. Marking a parent node as a task converts its child nodes into tasks, and the parent task can be used to track progress. The task mechanism is intuitive.</li>
<li>MindNode allows attaching a URL and a description to a node. It’s possible to create documentation as a mind map; MindNode can easily convert it to a Markdown document.</li>
<li>Additional relationships between nodes can be created using links. I do that quite often to connect two nodes that live in different parts of the map.</li>
<li>Awesome keyboard shortcut support: I can do most of the work without touching the touchpad.</li>
</ol>MindNode has many more useful features, like: sharing mind maps with others using web interface, attaching media resources to node, exporting map into different formats etc.<br />
<br />
<h2>Usage</h2>As I’ve mentioned above, I use mind maps for different use cases. Most common are:<br />
<ol> <li>building knowledge</li>
<li>generating ideas</li>
<li>planning and managing projects</li>
</ol>Next, I describe each of those in more detail.<br />
<br />
<h3>Building knowledge</h3>I found that mind maps work better for me than text notes. A mind map is easier to read: it is easy to spot the most important parts, faster to scan in large batches, and references add extra dimensions to the data. It is possible to collapse the graph to see only the most important parts and, if needed, dig deeper for more insights. Branches have different colors to separate them visually.<br />
<br />
It is often hard to structure a new mind map in the best way. I usually start with something that works right now and refactor once I’ve gathered enough information and there is a need for a new structure.<br />
<br />
For me, a map is a better way to collect and hold my notes. Before, I collected notes per source of information (i.e., different notes for different books), so it was hard to get through them and find useful information. Things changed with mind maps - I now collect knowledge per subject. For example, I have mind maps for subjects such as “Software Engineering”, “Writing”, and “Machine Learning”, and I update them with facts, ideas, and best practices.<br />
<br />
Here is an example of my mind map for Software Engineering (all branches are collapsed for better visibility):<br />
<br />
<img src="https://lh3.googleusercontent.com/9LbGjsfm6iC3of_QEcADTQeNvehlsfsdBe-EWxv7Ns0Z-eox7GzA6lxpqFfYU6Nwnbyvv_dDNtXzqunas8OfNkRCOglrHAMd1N9dVYtpDyZHgwB9qRNjxvobRqa1HslOXclo7nQIgs7HAHtKPvrSXEKrmNVRbMEGJcU5C6ZW0yTkBSTW61vD9773kPan2UdYuFpeCzTvtmPskkpAFPGtSsOtJVQj9FatRrjv53jWEyGpjxZYfESiBaN3zpu487-1LrgSRnJvkQq7RctyJKHrX94t0cON2_oIq7YzK0h5u9ng1tRGnlRxGqYA9IJAlIlT8LiuLXfJ44a5Zi0Wpe2mPZ4S8Jtf7gnsg8-rSfCpH6OrAxN1lMSnbZk26Enaw40x7dbX6ZWyLhGiHchQRMPTo59GiWAczTaX8A_qD-7sHmlD6K9hN13bDQMSInWuCoXKGBa5WjuYHbU4DBogYIgQJ8FjcH-ug4_JLg0-DQk17nWFRR9uAgAd-GMCcIsip_ymHZBAqrDNcHhlpFZDwnlww-sXSAn8zws0SVgQZF23cVE5p23TklJckRpvyr0jAe9Yc-QRf6GUPYqxindeUFj0jXvavlTYQPi9Kd8Oymy46iPaz8aZeZ8wPg=w1720-h438-no" width="750px"/><br />
<br />
One of its sections focuses on making projects the right way:<br />
<br />
<img src="https://lh3.googleusercontent.com/F391fgETzjR2y-h31cni3qBqqQ5uCiN-Z4HlbSnQpewXWJYNXc9UF1x9W9Z_izsGidn0gsgwcsO9Mz3rjuB2NMN5mMRaNIMTDtcSR8_tUsAuOpOHCQ8GvlOvXKtfBcjZkZninhfPU1MzxYF1i-wlikNzJrQnmkiZHUPBCMKxsDLaJWLIH3IX4WCr9KqAeGm9-iIF1_X3cUNsgBknLaNmfhX0c9DvjurgRuakh6wJE8NTUmUny0JYbGkvxXjLSooEyOX5Rwrq5d7Y6FXXektVYcfBEVt8OHiqs5ybsU3CVgXqWZ9OlDVe1jWcs5_2a1mjhOqEL5nAGHw6INBfumdXGVK5XiB0w70tV_Mn-ZjXdRx2-yO_lUHYG8YzwkHLsKPznOL-gg-GYbaRFQ4JBVC08RrKVdkQgDqXM3QovY1EhSzTNMYpFc2DGP7kFRANh8KN8alLbxqywLZVRMcrfd7IdXk7v9PD1NVrBfPLOjtrCdki5K5AexGN98nMTAyD09dSVJMTBdERr6uCpMQMArsOrskgz4W1SLm4tp7FBGKOTJySnTv2p5hy2DZOI3K_ajwyJCbOfySxfvSd5FiFGTA2vjLzPKb0XMLO7QfzjG3nsTeKyvq7aILW6g=w1932-h1243-no" width="750px"/><br />
<br />
Such maps usually contain a branch for subjects that I want to research next. Those are represented as tasks and help me track my work on improving my knowledge base, follow up on missing parts, etc.<br />
<br />
<img src="https://lh3.googleusercontent.com/fOiTQHrxq5FuTrAG5JbqUdtEmwdnzSTh0MWJF0d8N7sZQh1yHonVOolj7jWwf8ZesDbTfBbeLLniJVTzGcIXz9zJ0780M97HAWY6IwWcRqfu6aWTtJimDXxNl-b77W08KTvpj01-M5KRrc5KfSVnsX_oTZf_0pj0chhtXpTdmNS5D2SqJrcCk2s3PR9T3_PI_YeRlM1LVIyezYr98Ii8L9k1UrqbPCjGwuG9MzPV4aDdaQkvXPq5prB7BiWiQKcOR8nmWAtyMfR_hBKcOkWWCmCy9Idav_KEQPmwhprLaqLoJ4Mq7PnHp2gXLSovBbY9RvevIxhcBqFFE2nN9Lu-8sg5g6r715HBDbjx0glJRocDlT58gC1oHoOWdoytYcc4MJIl9gLTuQ7J6TjoIb1qmBOXs0zz2qvanT4JNkQDS2pHfIWpiElTM4-WarlUS_Cf273OxOjCY5uxFzXXwhyqye6CfhsahijdAjDjMwL9jF5KgdbYZAlUbqHfwqGYOWlGZCVI8LNtuBnBJKC0mbcZZR72g1uL8SzTnPxpqMvj1lfQgVKPVSnp-3G1qm0nwYi0r_BiT8fZ3RyhqOh2eP2HaX0x48GNsePSWF1jP-OLZvREaWmcp-365g=w1748-h964-no" width="750px"/><br />
<br />
<h3>Generating ideas</h3>When working on new projects, an important step is to generate ideas for features, approaches, solutions, etc. <br />
<br />
I have also found mind maps quite helpful in this realm. When I need to brainstorm features, I start with a node “Features” and add a child node for each high-level feature. Afterwards, I extend each feature node with more details and clarifications using its child nodes, and so on.<br />
<br />
Here is an example of feature ideas for a service I was scoping a while ago. I start with a high-level feature description, like “Integration with other tools”, and then move on to clarify and add details.<br />
<br />
<img src="https://lh3.googleusercontent.com/gWL-XDcqD93bYpaj95mbF3HgMTI_92SkFKGnR2BmUaZc0di8vTnmfc1TAeKCAhc7FRfRNxIvO6FWXafMWB1SzOHzyMbE3J9RmvNB0IscBC2gaaOw3PH97e1xkK-xeI_p6ibIWtgisq-aNxgtkJdo7VnOhKKyAora1-IAGCA7GCKwgNA2mhPL9jC7Ri__y0YkTWizt-1o4NNpnrHeztJchrqdF229ynPwsqWFpxpZ9Kj-t58SsKlxU2shfjsyV3-87CfjNX2GrEN6mmXsTly9oW6stTOXO9t7jKx5qTate6FU3SQCpfZupSOakrgX_qCbo2_W3fT_xRdEfAFl42OGw7GxbEPlq_JDcUWWQrvdtFYMpLAfk7NymPtGv1bXljf_G2WoAt2EMIFiLJKZrOtUZ90IawtXV246w1BvV7hE0P2L57kCjsmtPVa_aL4fw1YbQ4vr3z--euDtZC6dhYgMwJVkVnjJEDupJCZSOQGlGzzBc2MgjwqSdXzjnCEsErlT0f8ed8HE24No-OPnwEA2VlQOkHSJwDki24KMEe5hAwAIFv222KptpzyZ9ubGzZf-8MRrXVIJjoBm5M5Mc7VwALIg9RFA3ESM7sQ2lvTbCVJ2ML2dsYe1Yg=w1559-h1243-no" width="750px"/><br />
<br />
And again, a branch to track further research is a must:<br />
<br />
<img src="https://lh3.googleusercontent.com/vJQk-KP7vclXfwAD7gw-ZsjZkiqu90tS3jmfI47ZPlo_60g2iyoL0y9PynKogUnronnVTcvqZeJrNVWG7NCpoKQJiRnSAQLHFwiuUNhQoKjUJqIrY_Z760wlQpD51zaCw2VDRBz5j20GsgmTou59fk18HwTAhnBM7HdwBjm4eFAvFW7w8RRTfYEimXqiGwOWODdNgst8B-mc3kDSUSKvhyWnBFX_qXg9Ru4tR4wUxdb_qT_0QKrlxXBSSu6pSC_2nsQDF8hPrUOpgsKfvEUKoQR2DwCQUlGTqYEbkuU7JFV_-ePHFY0lPt2uPycDMECf9PWRg8emmcAdWBdHDeNgyK6mSc5sXLQVN7qimJjxFQ1GHdE3kMIZZksZgF-vKhjGLYXmNe0eagX-qnvWWPh417sC6iuzIaXMia6l_maRsfAr9qwqqnbfRnvd0G2dYi_P2O9fu4Lil5d99Ujup_sU0V1-mR1BLsYfc-96Jr7Wt4k1wc1FbZjrC29c3lJWt15W5pkt2t0lsbZHJcwezQeZrdWzZmR8dFSXW5f2qJkzJi7WOILLWMvXooxnWg2OMur2BaOM6N6xlkdPawuiAQvE5-2lIQaLT1b26cXd3Nvb-8HseqRAQanUog=w1220-h364-no" width="750px" title="ideas for research"/><br />
<br />
<h3>Planning and managing projects</h3>Most of the projects I start nowadays begin with a new map in MindNode. The map consists of the main parts: design, risks, scope of work, and plan of work. <br />
<br />
I usually don’t go into much detail in the map; I use other tools for documentation and sharing. The map stays private to me, as it holds many details and thoughts that aren’t necessarily useful to others.<br />
<br />
I leverage tasks support a lot in such mind maps. They help me to track work progress and also make sure there are no forgotten nuances and risks.<br />
<br />
Because mind maps are so easy to read, I was often able to find forgotten or missed work for various projects. This helped me a lot; otherwise we would either have hit a bug during testing or failed to launch on time. I don’t believe text notes alone could have helped me in such cases.<br />
<br />
However, I do not use mind maps for anything that requires lots of text. Documentation and design notes aren’t part of my interaction with MindNode. I also do not use mind maps for actual task tracking; there are better tools for those use cases. But mind maps are great at capturing design ideas, pros and cons, risks, and vision; they are also great for defining a list of milestones and high-level stories. <br />
<br />
<h2>Summary</h2>If you’ve never tried mind maps before and are looking for a way to manage your life and work, give mind maps a shot. <br />
<br />
MindNode is the tool I’d recommend the most, assuming you use macOS/iOS and are OK paying around $30.Ruslan Khttp://www.blogger.com/profile/08483230152299597189noreply@blogger.com0tag:blogger.com,1999:blog-131947665105489199.post-46211619333536873232017-08-20T21:47:00.001-07:002017-08-20T21:50:40.165-07:00Message Locker<h2>Problem</h2>Extreme requirements call for interesting solutions. Sometimes there is a need to come up with a hybrid solution that doesn’t always look beautiful at first sight. One example is the Message Locker solution.<br />
<br />
In service-oriented architecture, an application consists of many services that interact with each other. <br />
<br />
Inevitably comes a need to fight <em>high latencies</em> caused by remote calls and the necessary serialization/deserialization: in a long chain of dependent service calls, each call results in a network hop with unavoidable costs like data marshaling and moving data through the network, adding at least a few extra milliseconds per call.<br />
<br />
A service that needs to gather output from multiple dependencies to do its job is an <strong>aggregating service</strong>. <br />
<br />
Such a service needs to be smart about how it calls its dependencies. If they are called one by one, their latencies accumulate. The most obvious way to prevent this is to call each dependency in parallel. Then the service’s own latency is defined mostly by its slowest dependency. In this case, we say that the slowest dependency is in the <em>critical path</em>. <br />
<br />
An aggregating service isn’t complex merely because it calls multiple services in parallel. And usually there is a simple way to avoid creating another service if the only business value it adds is aggregating the output of multiple dependencies.<br />
<br />
But an aggregating service becomes complex when:<br />
<ol><li>It adds complex logic on top of the data returned by dependencies</li>
<li>It has to be sophisticated at orchestrating calls to many dependencies.</li>
</ol>The need to orchestrate comes from the nature of SOA: sometimes a service needs to call one or more dependencies first to gather the data necessary to call another dependency. Often it’s not possible to call all dependencies in parallel and just aggregate the replies once all are available. In many cases, a service needs to call dependency A to get the data necessary to call dependency B, whose results are required to decide whether the service needs to call dependency C or D, and so on.<br />
<br />
Calling dependencies optimally is often the most important part of fighting high latencies. And thus eventually comes the need for an <em>aggregating service</em> that can call multiple dependencies in a savvy way.<br />
<br />
But even when an aggregating service is already in use, the need to fight high latencies inevitably returns. And there are only so many ways this can be done:<br />
<ol><li>decrease the latency of each dependency in the critical path (often by pulling out a dependency’s own dependencies and calling them first)</li>
<li>call dependencies in an even smarter way.</li>
</ol><br />
This post focuses on the second way. If the aggregating service already parallelizes calls to dependencies as much as possible and there is no way to do better, then, to be honest, not much more can be done.<br />
<br />
Seriously, when service A needs to call dependency B so it can call dependency C later, what else can be done to save the extra 10 ms you need so badly?<br />
<br />
That’s where Message Locker comes in useful. It ventures into slightly nasty territory to save additional milliseconds in an aggregating service.<br />
<br />
<h2>Message Locker</h2>"Message Locker" means a locker for a message. The service stores a message in a kind of locker so that only a specific client can grab it. If the message is not received within a certain period, it becomes unavailable.<br />
<br />
Message Locker is a distributed service that stores all its data in memory. The client that <em>sends</em> a message into the locker is called the <strong>sender</strong>. The client that <em>receives</em> a message from the locker is called the <strong>receiver</strong>.<br />
<br />
Each message is stored in the locker under a unique random key. When the sender puts a message into the locker, it also provides additional attributes, like:<br />
<ol><li><em>TTL</em> - time to store the message in the locker,</li>
<li><em>Reads</em> - number of times the message can be received.</li>
</ol><br />
A message is removed from the locker once it has been received the defined number of times or its TTL has expired. These rules prevent the Message Locker from bloating with obsolete messages.<br />
<br />
Even after a message has been removed, the Message Locker is still aware of its previous presence. Whenever a <em>receiver</em> tries to get an evicted message, it gets an error immediately. <br />
<br />
If a <em>receiver</em> tries to get a message that has not been evicted yet, the message is returned and its read count is incremented. This approach doesn’t handle retries properly, though.<br />
<br />
If a <em>receiver</em> tries to get a message that has not been sent yet, the Message Locker holds the request until the message becomes available or a timeout happens. <br />
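These semantics can be sketched as a single-JVM toy implementation (the real thing would be a distributed service; the class shape and error handling here are purely illustrative):

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// Illustrative in-memory sketch of the locker semantics described above:
// a message is stored under a key with a TTL and a max read count;
// receivers block until the message arrives or a timeout expires.
public class MessageLocker {
    private static final class Entry {
        final Object message;
        final long expiresAt;
        int readsLeft;
        Entry(Object message, long ttlMillis, int maxReads) {
            this.message = message;
            this.expiresAt = System.currentTimeMillis() + ttlMillis;
            this.readsLeft = maxReads;
        }
    }

    private final Map<String, CompletableFuture<Entry>> slots = new ConcurrentHashMap<>();

    public void send(String key, Object message, long ttlMillis, int maxReads) {
        slots.computeIfAbsent(key, k -> new CompletableFuture<>())
             .complete(new Entry(message, ttlMillis, maxReads));
    }

    /** Blocks until the message is available; fails fast once it is evicted. */
    public Object receive(String key, long timeoutMillis) throws Exception {
        CompletableFuture<Entry> slot =
                slots.computeIfAbsent(key, k -> new CompletableFuture<>());
        Entry entry = slot.get(timeoutMillis, TimeUnit.MILLISECONDS);
        synchronized (entry) {
            // The entry is kept around after eviction so that late
            // receivers get an immediate error, as described above.
            if (entry.readsLeft <= 0 || System.currentTimeMillis() > entry.expiresAt) {
                throw new IllegalStateException("message evicted: " + key);
            }
            entry.readsLeft--;
            return entry.message;
        }
    }
}
```

A background sweep to purge long-expired entries, plus replication and real error types, are omitted for brevity.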
<br />
<h2>How to use Message Locker?</h2>Given 3 services A, B, and C: service A is an aggregating service that calls multiple other services, among them B and C. Service B has to be called before service C, as its output is part of the input for service C. Service A also uses the output of service B for its own needs later.<br />
<br />
Normally, service A would call service B, wait for the reply, and then call service C. During this workflow, service A needs to do the following work before it can call C. This extra work becomes part of the critical path:<br />
<ol><li>wait for reply from service B</li>
<li>read reply from service B</li>
<li>construct and call service C.</li>
</ol>Network transfer and serialization/deserialization are often expensive operations, and with large amounts of data could take 5-10 ms. In this case, constructing the request and making the remote call to service C can add another 5-10 ms.<br />
<br />
<img src="https://lh3.googleusercontent.com/n42vLUUSf39mCqdfWz7sQNFXg6b5wvmrDTux4PBFoF9BDnGBMs4aS2hzggLS5B6Sl07avjintSap2Tk3CnfQLpdkfSK-slE7ZDE1cof19TmN-CpfPMA_X3Oik4nBoJsROrt-qttH6_CiQwTCg9A5owbdpr0BVqXLgzk0le07LfAjk1G7F8Pmwbz5vpkiRlngHqgPEkb9ro5drBVfUivwXqRMe4grjPzC9YdtfCQRQguivH4vxu5jyW8_iCWWAQbfFl0AeCraqH50VJKpAj-GGgB76nLojst7JL1gBngsZPvLzxJjOa93QR9Dm7JV0K5JkQMezUrieQQ3nmYE5Iki7aoUbDQH7jAIJsT7hFsnyK6jXqmLMKTSL3QuXpfo06PCDBFdsBqHSw8eBvQVXMkR7TiDdvDOvH7_MBpoMD-XprKXJKF9ex6pWfY-T0GbmgDrJNAlNQKr664TFPaqBPYv3UrEvHlIXmkdv1TXQLFoT5BlJ9VZtJSf52JcVVYFu3UYQb23d002y4VdTSBvr-YcqK2CcwddsCD2FJHwRiSz_0vtfVZ0RuuX_Gcl4xaqUPCMzhLPM4MauUU5wbITf4BBDEHZQ0iYfX5HFmQPbO2Gi0x4TkTchOf4Tg" alt="Without Message Locker"/><br />
<br />
This is where Message Locker becomes helpful. The workflow is now changed: service A calls service B with key K and, in parallel, calls service C with the same key K. B puts its reply into the Message Locker under key K, and service C receives this reply using key K. Service A also receives service B’s reply from the locker using key K, and does this in parallel with the call to service C.<br />
<br />
<img src="https://lh3.googleusercontent.com/IQdZAECEaWWUhAt4SmEsWwTvHTuwGVvXZ_l2tP5J681Y-oG4QiHBnCXktktuN0MXDEp4AML1agmPReDbQtvXW-ZYxBkyrUr8Z4QEkrDakNPUxF2gVKQXR_cpWoTUvliMMnjMKfwmzxuXZPX4nmBeyf9CFuvW-aXAC6TiZAfnCwe8_tQ2lvd_sf9XJYBM9DNnJm8V4fNE7Jz6vqqjjX_IjPrAydRsWqzDdeiO2YXm7R3HisIsjk8jfpt_Ln0XIoj9N5AXWF8QSMjVBfxmW04dwS0DdaxTB2RO7Zdwlv9yjwm72a1htDNGMzPm_ArBLiwOCm10HR23-z2IFDQhC6JmljDtBLTm2sBeTdeMwHGOAMz0QBatfhyiOUIrLWPwngkGoV66FqD0VVmIazUYtgftAWYfQ3PqKPPuY3Ey0W2e3BqVPhHEoadkjYWiJOdc5bOVdomNM5clnBvcgR0yCmaWXCfnMwO2JAjhlsiVqd3hvtIo_SrMwS6GRsdwrAiXqBJWgYAxrhLDsEzy4Ten5XUGvHRqt-RBv82gGhSG8_EObredGY8Q7TERGBhgzam51e4qBgDJmDPhOb5TXsVLvy9pj7WSWOlqAmc_o2whpevUo56jXKoxDU6idQ" alt="With Message Locker"/><br />
<br />
In this case, there are the following notable changes:<br />
<ol><li>the time to construct the request and call service C overlaps with the call to service B, and as such is removed from the critical path</li>
<li>the time for service C to deserialize the request and do the necessary initial work also overlaps with the call to service B, and as such is removed from the critical path</li>
<li>the time for service A to deserialize the reply from service B also overlaps with the call to service C, and as such is removed from the critical path</li>
<li>the time for service C to call the Message Locker, receive, and deserialize the data is added to the critical path. This offsets the savings from #2.</li>
</ol><br />
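The reworked workflow can be sketched with a single `CompletableFuture` standing in for the locker slot under key K (service names and reply values are illustrative):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Sketch of the reworked workflow: B and C are called in parallel,
// and B's reply flows to both A and C through the "locker" slot.
public class LockerWorkflow {
    public static String run() throws Exception {
        // Stand-in for the Message Locker entry under key K.
        CompletableFuture<String> lockerSlotK = new CompletableFuture<>();

        // Service B: does its work, then puts the reply into the locker.
        CompletableFuture.runAsync(() -> lockerSlotK.complete("replyFromB"));

        // Service C: called in parallel with B; it blocks on the locker
        // only at the point where it actually needs B's reply.
        CompletableFuture<String> c =
                CompletableFuture.supplyAsync(() -> "C(" + lockerSlotK.join() + ")");

        // Service A reads B's reply from the locker too, in parallel
        // with the call to service C.
        String replyFromB = lockerSlotK.get(1, TimeUnit.SECONDS);
        return replyFromB + " / " + c.get(1, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());
    }
}
```

The design point is that neither A nor C waits for B any longer than it strictly has to: each blocks only at the exact step where B's reply is consumed.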
Using Message Locker also adds complexities:<br />
<ol><li>Services A, B, and C need to be integrated with the Message Locker</li>
<li>Service A or B needs to know how many times a message will be received from the locker, and what timeout to use, so as not to overload the Message Locker with unneeded messages and not cause issues with messages being removed too fast.</li>
</ol><br />
<h2>Why not use existing solutions like... </h2>Message Locker is very similar to well-known existing solutions: the Message Broker and the Distributed Cache. Although the similarities are strong, there are a few differences that make Message Locker stand out for its own very specific use case.<br />
<br />
<h4>Message Broker?</h4>A Message Broker usually has a predefined list of queues. Producers send messages to a queue and Consumers consume them. It is possible to create a temporary queue, but that is usually an expensive operation. A Message Broker also usually assumes that processing latency is less important than other traits, like persistence, durability, or transactionality.<br />
<br />
Because of this, a Message Broker can’t be a good replacement for the Message Locker.<br />
<br />
<h4>Distributed Cache?</h4>Message Locker is more like a distributed cache with additional limitations. A message is stored only for one or a few reads, or for a very limited amount of time, and is removed from the locker as soon as it has been “received”. <br />
<br />
In ordinary caching, content is expected to stay available for a much longer period than it is in the Message Locker.<br />
<br />
<h2>Summary</h2>Message Locker is a way to decrease latencies in aggregating services by enabling additional parallelization. This is possible because the dependency between services is organized through a proxy - the Message Locker. It holds the replies from dependencies and provides them to the receiver as soon as they are available. This allows further hiding of expensive operations: network calls and serialization/deserialization.<br />
<br />
This comes with additional complexities: <br />
<ol><li>The right values for the timeout and the number of reads before eviction can be error prone to define,</li>
<li>Message Locker development and support can be cumbersome as well,</li>
<li>Services need to be restructured to benefit from the Message Locker.</li>
</ol><br />
But when there is no other choice and latencies must be decreased, Message Locker could be the solution.<br />
Ruslan Khttp://www.blogger.com/profile/08483230152299597189noreply@blogger.com0tag:blogger.com,1999:blog-131947665105489199.post-54253157133300740572017-07-22T18:25:00.001-07:002018-06-03T16:25:08.638-07:00Ukrainian Tech Startups<div style="width:100%"><div style="float: left;"><a href="https://www.goodreads.com/book/show/35287012"><br />
<img src="https://lh3.googleusercontent.com/yZcJ3thxTuiIYIV--Jw17dfsm_xI7Dh4oYeJb6hfTA_R57mMOUvnRENIyonQuR8XjAY5xBuQelAWRKzpcvXwZ-9KWnQkv6Vdc10GjQ321aoSnuTrIDTqHmEuVH6YXRwOWmMMUYssEBpdUH0tAO-GzQ_C7jqAc1VNryEurC1KHoMfWFS9Azqb-klsPFbGpyC6imDPSsesYiy44Nr2v_pxcMZSEbn30KyanG7FUKFHIIHSDkgtSrf5wCbP90AXwlSAm0eTzXtVAAuMPKw_vwMQCisNsLuG_PXUH0f6FH2oP5-baESWnmF7Su3twtnfMdk_GBct7ZqKWsthp9aX-_pWTmc2haxyUrvPPU0n74uhJKb--L3gX7S5koJY-VVv7VcOArYwjHKuD_dSodj-svGMqT9PihGz5CGmFdkQ06-Sk3gzfmT-RL18PQ-gr_GWRtVckv-fauToJcG1bedXrty_LKeAIWREZbL9B0ANHGKug9rydHec2aFvajIq__JuSVIMAeFCaoTxVos0lnXzvW0GFKhO81h9KyQkyiGfAaL786JB1lPvcSpbq91Q6f4JP1YUZY3pVfEZyIT4KDch_7_s2yf_U92yHPfHfyGBo9cBKJk61nP1z9uy6frDnA=w269-h370-no" /></a><br />
<br />
</div><br />
<div style="float: left; width: 50%; padding-left: 30px;">I always return from my vacation in Ukraine with a few new books to read. There are many good book publishers nowadays who provide a continuous stream of interesting books in Ukrainian. <br />
<br />
This time, among the books I picked, there was one special. Not only did it stand out from the others with its orange cover, it was also about technology companies founded and now operating in Ukraine.<br />
<br />
The book is <a href="https://www.goodreads.com/book/show/35287012">“Стартап на мільйон. Як українці заробляють статки на технологіях”</a>, which can be translated as “Startup for a million. How Ukrainians earn capital with technologies”. <br />
<br />
</div><br />
<div style="clear: both"></div></div><br />
<h3>Companies</h3><br />
It is about 14 Ukrainian companies that became successful through technology.<br />
<ul><li>Depositophotos</li>
<li>Genesis</li>
<li>Na’vi</li>
<li>Grammarly</li>
<li>Macpaw</li>
<li>Prom.ua</li>
<li>Rozetka.ua</li>
<li>Jooble</li>
<li>Viewdle</li>
<li>Kodisoft</li>
<li>Приватбанк</li>
<li>Ecoisme</li>
<li>Petcube</li>
<li>Preply</li>
</ul><br />
Most of those companies are startups.<br />
<br />
<h3>Findings</h3><br />
Before I started reading this book, I knew only 6 companies from this list, and had no idea that 2 of them were from Ukraine.<br />
<br />
I decided to read the book during my flight back to Seattle. I soon realized it was the right choice: the book was so interesting to me that I finished it by the time I reached my destination. I always find flights a good place to read, but this time it was quite productive too. I left lots of notes and highlights on the pages of the book and in my notepad, mostly related to 2 main topics: doing business in Ukraine and creating a tech company.<br />
<br />
Many years ago, I heard that it is hard, if at all possible, to create a successful IT business in Ukraine oriented at the local market only: the market is too small, local companies can’t offer competitive salaries to software engineers or pay for software, and the law is often <em>too flexible</em> to rely on. <br />
The companies described in the book prove this: with the exception of a couple of them, all have to work in external markets, which usually provide the major share of income. Companies like Genesis and Jooble tried to create businesses oriented at the local market, but that wasn’t enough for them. The only exceptions to this rule are a couple of e-commerce companies: Rozetka.ua and Prom.ua (I don’t count Приватбанк, as it is a bank.) <br />
It would be easy to conclude that a profitable startup oriented at the Ukrainian market should be related to e-commerce. I’m not sure this conclusion is correct, though.<br />
<br />
Another finding is related to investments. Most of the companies accepted them; only a few were financed solely by the founders. The reason is obvious: not everyone had enough savings to invest in their own company. Companies that accepted external investments also received advice from investors on how to build a business.<br />
<br />
There are 2 types of founders: serial entrepreneurs, who already had 1 or more companies behind them, and professionals for whom this was their first company. The former benefited from existing connections, experience, and capital. The latter benefited from experience gained at previous jobs and from investors’ suggestions.<br />
<br />
The value of existing experience cannot be overstated. <em>Founders who have already created a profitable business have higher chances of building a successful one again</em>.<br />
<br />
Many companies are registered abroad (e.g., in the USA) with a sister company in Ukraine that does all the development. This helps manage risks related to IP and the sometimes unstable situation in Ukraine. HQs and development offices are mostly in Kyiv. Many founders emphasized Kyiv’s numerous advantages over other capitals in Europe and the world: a good environment, low prices, lots of software engineers, reasonable salaries, proximity to the EU, etc.<br />
<br />
There are always co-founders. The example of Rozetka.ua shows that a spouse can be a co-founder as well.<br />
<br />
Connections are very important:<br />
<ul><li>it is useful to know suppliers, as it was for the founder of Rozetka.ua</li>
<li>it is useful to have existing connections, as it was for Grammarly</li>
<li>it is useful to have friends who can introduce you to useful people, as with Genesis</li>
<li>it is useful to know the best people around, as it was with Na’vi</li>
</ul><br />
Each of these companies had a bad time at least once: either there was no financing, or the idea didn’t work, or profits were too small. But all of them withstood the rainy days and keep moving toward their next goal.<br />
<br />
Lo and behold, <em>companies with a focus on profit/success are the ones that earn the highest profits</em>.<br />
<br />
<h3>Highlights</h3><br />
I’ve made a lot of highlights, and want to share only some of them here:<br />
<ul><li>“They prepared in advance and, before quitting, set money aside so they could live for a while without a salary. According to Palienko, by the time Prom.ua started, the partners had enough savings for a year to a year and a half of frugal living.”</li>
<li>“Stories where a person becomes successful after leaving their culture as an adult are extremely rare. They exist, but overall they are statistically insignificant.”</li>
<li>“I don’t really understand people who leave. Kyiv is one of the best places to live.”</li>
<li>“Prom.ua earned its first money just a few months after launch - $5 ‘landed’ in the company’s account thanks to the AdWords referral program.”</li>
<li>“At the time, their project was earning a rather modest $500 a month, but even such income gave plenty of reason for optimism… $15-20K was spent on advertising alone in the first year.”</li>
<li>“The first year is a time of a great many experiments, the essence of which is adapting the product to the market. The main thing is not to lose heart and to keep experimenting, fitting the product to the market based on testing, common sense, and what competitors are doing…”</li>
<li>“a data analysis system can suggest what to order with a certain dish and show a list of the most popular dishes at the venue based on other customers’ reviews.”</li>
<li>“the backbone that holds up the body of a company is strategy.”</li>
<li>“Startup founders often forget why it all began. They deviate from the necessary strategy and, running into obstacles, turn in whichever direction is easier to move. As a result, the goal they originally chose may end up not in front of them, but behind their back.”</li>
<li>“Over the next two years, Chechotkin reinvested up to 90% of profits into growing the business.”</li>
</ul><br />
<h3>Conclusion</h3><br />
It is possible to create a tech startup in Ukraine. However, at the moment you may need to focus largely on external markets. You may also benefit from registering the company in the US or EU.<br />
<br />
Kyiv is a great place for an HQ. Cities like Kyiv, Lviv, Kharkiv, and Odesa are great places for development and R&D offices. <br />
<br />
The first years will be the hardest, but founders should not lose focus and should work hard to achieve success.Ruslan Khttp://www.blogger.com/profile/08483230152299597189noreply@blogger.com0tag:blogger.com,1999:blog-131947665105489199.post-62978712953329644332017-05-13T23:19:00.000-07:002017-08-20T21:56:43.215-07:00The most important SDE skill: Ownership<blockquote><p>“Without ownership, there can be no leadership.” - <a href="http://georgecouros.ca/blog/archives/3791">http://georgecouros.ca/blog/archives/3791</a></p></blockquote><p>This post is about one of the most important leadership skills you can find: <strong>Ownership</strong>. I see it as the root of all other leadership skills. If you don't feel like an owner, you often feel neither responsibility nor the interest and empowerment to bring about change.</p><br />
<h2>What is Ownership</h2><p>Ownership is “the act, state, or right of possessing something.” You can own a computer, a car, or a house. And as such, you feel ownership of those things: you can install new software on your computer, clean it up, fight viruses, upgrade it, etc. The same goes for a car or a house: you are empowered to make changes, and it is in your interest to change things for the better.</p><blockquote><p>“Ownership is about getting something done no matter what.” -<a href="http://www.tandemlaunch.com/ownership-versus-leadership/">http://www.tandemlaunch.com/ownership-versus-leadership/</a></p></blockquote><p>There is also another definition of ownership. You take ownership when you feel responsible for the results of your work: finishing the project, taking care of found issues, making future improvements. As an owner, I will get this project done, I will make sure it is tested and working properly, I will make sure it delivers what my customers expect, and I will make sure that found issues get resolved.</p><br />
<h2>Ownership in Software Development</h2><p>Actually, there is nothing special about being an owner in software development. The development process is quite well defined: there are phases for gathering requirements, designing and planning, coding and testing, bug fixing and launching. The list of artifacts is also well defined: source code, documentation, tests, metrics, lists of issues, etc.</p><br />
<h2>Group Ownership</h2><p>Group ownership is shared ownership. Multiple software engineers own the codebase, projects and features, specific modules or components. Usually, engineers in a team will have different levels of ownership skills: </p><ul><li>some just do the work and don’t care much about the result,</li>
<li>some focus on finishing a single task successfully and don’t care much about the whole picture,</li>
<li>others care mostly about their own feature/component and don’t care about the whole project,</li>
<li>others go beyond that and focus their attention on the whole project. </li>
</ul><p>The higher the ownership skill, the larger the area it normally covers. I would call it an “<em>ownership scale</em>”.</p><div style="margin: 30px; padding: 10px; font-size: 0.9em; background-color: #efefef;"><p><strong>Off topic: Ownership and Career Growth</strong></p><p>It is interesting that there is a clear match between the ownership scale and an engineer’s position: junior software engineers barely understand what is happening and focus on finishing a single task; mid-level engineers focus on the codebase, the features they work on, and maybe a few more aspects; senior engineers go beyond that and focus on the whole product or a large part of it.</p><p>Your ownership level tells what you focus on as part of your everyday job and how successfully you will be able to deliver results.</p><p>Based on this, I would argue that the Ownership skill should be an important part of promotion evaluation, the same as technical skills are.</p></div><p>Working in a team is always harder than working on your own. Dealing with more people who are owners of the product (just like you) is even harder. </p><p>Group ownership requires taking into consideration the thoughts and ideas of other owners. There is no simple way to make a decision now. But luckily you get more than just a headache. Discussion with other owners can help find both consensus and a better way, because all owners are focused on improving things and delivering successfully, not on satisfying their own egos.</p><p>The strategy of “my ownership is better than your ownership” is a lose-lose strategy. Such behavior kills or suppresses the ownership attitude in others. As an owner who cares about the product and searches for a path to success, you lose like-minded people who could help you. Demotivated ownership might even cause behavior opposite to ownership. <em>A real owner grows other owners around them, because in team work this is the most effective way to deliver a successful product.</em></p><br />
<h2>Examples</h2><p>Here I’m going to list different behaviors and give my opinion on the ownership level in each.</p><ul><li><em>Mark finally finished the project he didn’t like much. And now he is not eager to work on a related bug that was found recently.</em> I think you would agree with me if I said that Mark’s ownership level is not very high. First of all, Mark didn’t like the project, and now he tries to avoid working on issues that are the result of his previous work.</li>
<li><em>Hanna finished working on the project she was assigned. She didn’t want to work on this project, but the business really wanted it. Still, Hanna took the project, worked it through, and delivered successfully. Now the business loves Hanna and has a new project for her. </em> This is a good example of ownership. Hanna took a project she didn’t want at first, but worked it through and delivered it successfully. </li>
<li><em>John was asked to help with a task for project X. John was really helpful and finished the task successfully. He also fixed a few issues he found along the way and helped Sarah develop the analytics module faster. Project X was released on time with John’s help.</em> This is a great example of ownership. John was just asked to help with a task. Instead, he took responsibility for fixing issues and helped Sarah with her work on the project. <strong>Don’t be like Mark, be like John.</strong></li>
<li><em>Lisa works only on the tasks she is assigned. She doesn’t spend much time learning to do her work better, nor does she spend time looking at how she could improve the project. Most of her time goes to tasks added by others. Whenever she finishes a task, she moves on to the next without the necessary testing.</em> I’m not sure I see any ownership here. Lisa is not an owner: she doesn’t care about the results or the quality of her work.</li>
</ul><br />
<h2>Summary</h2><p>In today’s world, great technical skills are only one important part of what makes a successful software engineer. But there is more to it. The Ownership skill is another important part. It is what lets an engineer get projects done successfully and improve the team, the processes, and the software.</p><p>If, after so many years of improving technically, you still find you need to get better, then switch your focus to improving your Ownership skills.</p>Ruslan Khttp://www.blogger.com/profile/08483230152299597189noreply@blogger.com0tag:blogger.com,1999:blog-131947665105489199.post-68126010701223152122017-04-23T21:52:00.001-07:002017-04-23T21:52:57.737-07:00Iterate Quickly<h2>That's what others recommend</h2>Recently I've been reading 2 different texts, and they both mentioned how high-velocity iterations are important and better than high-quality iterations.<br />
<br />
The first is an article by Jeff Atwood about <a href="https://blog.codinghorror.com/boyds-law-of-iteration/">“Boyd’s Law of Iteration”</a>. The article is quite interesting: it introduces the history behind Boyd's law of iteration and emphasizes the law itself: <strong>speed of iteration beats quality of iteration.</strong><br />
<br />
The second is the <a href="https://drive.google.com/open?id=0B8_2UXOaJXJ4SlFMTEtoYy16Uk0">latest annual letter by Jeff Bezos to shareholders</a>. In the section "High-Velocity Decision Making", the author also emphasizes the superiority of high-velocity decisions over high-quality decisions. Interestingly, Jeff Bezos finds that about 70% of the useful information is usually enough to make a decision. Waiting for another 20% or 30% of the data might take too much time, and might not be that beneficial.<br />
<br />
In both cases, the authors note that making quick decisions and iterating fast can only work well if enough information is available up front, and feedback is used to continuously correct the course during quick iterations.<br />
<br />
In the book <a href="https://www.amazon.com/5-Elements-Effective-Thinking/dp/0691156662">The 5 Elements of Effective Thinking</a>, the authors also emphasize the importance of starting fast. In the section dedicated to making mistakes, they encourage you not to be afraid of starting with a mistake (if you don't know where to start), but to iterate continuously, incorporating received feedback and new knowledge.<br />
<br />
<h2>Iterate Quickly in Software Development</h2>What does it mean to iterate quickly in software development? I'm going to throw out a few ideas, even though they are quite obvious:<br />
<ol><li>Plan smaller, with deliverables for each milestone:<br />
<ul><li>define list of milestones</li>
<li>know deliverables for each milestone</li>
<li>plan time to receive and incorporate received feedback</li>
<li>follow the feedback, not the original plan: <em>course-correcting is the right way to go</em>.</li>
</ul></li>
<li>Start testing faster:<br />
<ul><li>write unit tests</li>
<li>write functional and integration tests</li>
<li>use staging for deploying latest versions</li>
</ul></li>
<li>Deliver features faster:<br />
<ul><li>if you can't deliver a complete feature, deliver part of it</li>
<li>if you can't deliver to all customers, deliver to some only, like internal or beta users</li>
</ul></li>
<li>Sprints should take less time:<br />
<ul><li>a 1-month sprint is too slow in most cases</li>
<li>don't forget to hold sprint retrospectives</li>
</ul></li>
<li>Collect feedback from users continuously.</li>
</ol>Ruslan Khttp://www.blogger.com/profile/08483230152299597189noreply@blogger.com0tag:blogger.com,1999:blog-131947665105489199.post-87530735591283894082017-04-09T16:08:00.003-07:002017-04-09T16:08:36.657-07:00Why I am Not Sure that TDD is the Only Right ThingLet's say you have a project to deliver feature XYZ. After analysis, design, and planning, you came up with a timeline of 2 months: 2 weeks for scoping the work, 4 weeks for coding, and 2 weeks for testing. Sounds great.<br />
<br />
Like anything else, testing has multiple perspectives. The most obvious one is verifying the results of your work and keeping quality under control. Another, less obvious, is minimizing the risks caused by numerous bugs. A third is related to the previous two: minimizing the cost of software development and support. The earlier you test, the easier bugs are to fix and the fewer you have later.<br />
<br />
Let's look at two possible scenarios:<br />
<br />
<h2>Scenario 1: No unit tests</h2>So you don't believe in unit tests, or you just don't have time to write them. You'd rather test everything manually during the 2 weeks of the test phase.<br />
<br />
You have finished your code, and now it is time to wire everything up and start testing. Oops. You found a small bug: you had just forgotten to add "!" in your if statement. Quick fix. Easy peasy. You make a change. Your favorite build tool picks up your last commit and makes a new build. After 10 minutes you have a binary. Another 5-10 minutes and you have it deployed and ready for testing. <br />
<br />
You test it again, and... Oops. You found a small bug. Looks like you actually need to call that method first, otherwise you get an NPE. Damn. What a stupid mistake! Quick fix. Easy peasy. You make a change. Your favorite build tool picks up your last commit and makes a new build. After 10 minutes you have a binary.<br />
<br />
You test it again, and... Oops. You found a small bug. Well, I guess you see where I'm heading. Two small, stupid bugs, and you've already spent an hour. You have only 2 weeks, but you can't even get the happy case working, never mind the dozens of other test cases.<br />
<br />
And nothing prevents you from regressions during bug fixing.<br />
<br />
You probably see now what is wrong with this scenario. You think you saved time during coding, but actually <strong>you didn't</strong>.<br />
<br />
<center><img src="https://lh3.googleusercontent.com/Wyzd87j5lrkAjl3ZxHgjqyyEFadNv1DpTt-QJ4-kBKRZ3Spqw-QoE2KVxQnEHSZTXMlXn3q7BuFXsiX5LWVkgXCPISHE-qDctx2prOXx_-jb05PbrlGecnn8gbL14O4GuF1VxyKYi510dJ37eRIfmHKoulVwTrvP9rqmXgkxPxRtr6jIPg-zL7dub24FgfaNhcpxkeRcPu7AeVmFcKsa4zxXCpvHVfv14rF7XA1KGCArRDvQblmQpc3UJ9imiCSlbZxXfVv3kGpoGUtW7cr5DbLsxoEcRr6XT9cbkaCSggJjKm4TsWNdE7X7GljxdDpGgHJeo5qJnX6BF2z71nuHg53BhHOCAJwsF2IQtj6omBhA8StTgX9FFwcXDvR7g3w0kRgK0g8U_xJP1JvYZGwaRzG8pYwoKlUXCsr6O_0_Uh47GCcfF57q0oe1_a5fdxg173vDafBciRSgCTKCB5oDRfhteOOEVUWRJIgosf4sdAP9TlAJLVlO8YtMjPUr8PPMaGNhywo8DSGBSdqFYjz0c0kOuNmj4K2uUoL9S72kzNmXtmUj1jtpduHwNtk_jJWailxA7-CmmDNjZ3zk0fbCjykUHcCsLc4F6c3SvBi5brqgnGkU7v8dzg=w1036-h239-no" alt="Scenario1" width="80%" /><br />
</center><br />
Unit tests are a convenient way to spread testing work throughout the development phase, so you need to do less during the testing phase, where fixing a bug is already expensive.<br />
<br />
I rarely find bugs when I write unit tests. Maybe 1 in 20 tests actually finds a bug for me to fix. But, man, it is so fast to fix a problem at this point. Everything is local, on my machine, in my IDE. I don't need to build and deploy the change, and no one is blocked by me. Found, fixed, done! A bit of time here saves hours later!<br />
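For illustration, here is a hypothetical validator prone to exactly the "forgotten !" bug from Scenario 1, with plain-assertion checks instead of a test framework to keep the sketch self-contained (run with `java -ea`):

```java
// Hypothetical example: a tiny validator and the unit-style checks
// that catch a "forgotten !" bug instantly, without a build/deploy cycle.
public class NameValidatorTest {
    // Method under test. Had the "!" before isEmpty() been forgotten,
    // the very first assertion below would fail right in the IDE.
    static boolean isValidName(String name) {
        return name != null && !name.trim().isEmpty();
    }

    public static void main(String[] args) {
        assert isValidName("Alice");
        assert !isValidName("   ");
        assert !isValidName(null);
        System.out.println("all checks pass");
    }
}
```

The same checks in a real project would live in a JUnit test class, but the economics are identical: a few seconds in the IDE instead of a 20-minute build-deploy-verify loop.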
<br />
<h2>Scenario 2: Write unit tests during coding</h2>So by now you've decided to try writing unit tests. Maybe this will help avoid the hell you had in Scenario 1. You start writing unit tests for most of the code you produce. It was really hard at first, but it became easier later. You use mocking a lot, and thus unit testing becomes almost painless. Actually, you can't imagine now how to write new code without tests, though from time to time you skip tests for less important parts.<br />
<br />
You've finished your coding by now and are ready to start testing. And you find that there are not that many bugs now. And when you find a bug, you know how to fix it quickly and verify the fix with unit tests. Thus, you don't need to go through the tedious fix-build-deploy-verify cycle as you did in Scenario 1.<br />
<br />
<center><img src="https://lh3.googleusercontent.com/DAIQoWhCzUBSw2JN2OGVCHAoQzjQnVF0YL0Z862x-UoxD523wyXYsSHgN_1ZdV6lTaNofOb-gHr9HEkAE1e8sJfru0IisOoWll0cGgcHyXZjWBAeaZnhnJflg6xzrzZw49D_c0rNdl9ReTRWJi64deQuAqvxoYBUULS21hj-pzZrtUPHBr7A_xNzUAr_wZfEVnyukTOhBlikqg65bgt0ZzBzBeRdpzT7bsHMQE1xml5AtB3GgzFtls7Uw_IC-93N1dLR1Y9iX5qYpZZXpD4UAnSeorBB1zxcivPwSd_JSiWWgzxBhUcRCyQbQYCkwZD1FrNKPIcaxr-diVL6vbu9ndKeLUbFlHgp9e5nfdH4iCFpNxvbZ5nJFIwAqwlRCYPuKBStjdOhSUgLtOFj91-n3seXM3jBDIxzsef_DRC82Dy1dH4dmxpq0rPMqdL0pwTp2zxRgxH3dmzhc6BvhbYb69kXppNDpZJk64p6zbjj5mfazcDgWNm0GIuH59Voa5ko76qEQxzra6yvaC7YQRd5Io3ZZXng7se-NLcyljA00tLkKxtZM7-6RtDIItnmrouYeyeJD_44Ct6GSLB_Fa_IiUz_c3Tp8CPdgg-f-qJ4BSzI66HEp9PlSw=w909-h239-no" alt="Scenario2" width="80%" /><br />
</center><br />
As a result, you spend more time during the coding phase, as you need to write unit tests. However, you spend less time during the testing phase, as you can iterate on bug fixes faster.<br />
<br />
Of course, you still spend a bit more time than planned: but that is because you need to learn to estimate better!<br />
<br />
<center><img src="https://lh3.googleusercontent.com/sj0F346KJNFAzrM3eF4O11DaNs7laAVNgNGsdqbzRbtbQGJCCUyIw0Wj0x2YXBMUfrqgZ0qm-iX3RsqTB59PdiVD4nZrXjjEJR_H8GfskoiIEIiRYXfw7M_jlbJySTMqd0SjMAtNsAYIWAqqU6o7xZNWNn3QoBYfx_h8CEGVpWJtba6tyqn8qDJbGRYQ_bIOz3kqlIstPskb-goApc-QBKMjac0kxmWT9rDNjReB9-_H0GXyQMb-x1r0E_FxgxFnjDn8qoagJOgo0OAzTNAe81FPzpFneKEFMsrB-V4cCaxIAaCPFCrbx6PrmSvBq_uqmgzkFl8lOzgH0NfqmKycxLvNQOc2yGrLfEpDToBaexMccUx4jK_FFh69HHyTRWWyvRhlOdD01D8ZxWTz_DeVn6URS96FOhgUJlbsUHtG373JvE64HrOhokGisGA-61WGjssuZrQ8-T4jwYWYpxD0Ly4R8UXATN6l_mj7W6tyGYyJkX2IxVDq4plMjwFWpAC0QFbeT6uV32kJp4J-BK1EPOyFOpsgJEGBHz7Bdpm73W0dhPibVB84h7N2F3C6P6f5NEd0xy_M9M4HzrPwmJbdDL5xPhKIuSC6j8iTcZ-jDK8eCcbFlNnx3Q=w640-h592-no" alt="EstimationSkills" width="50%" /><br />
</center><br />
<h2>TDD vs Plan Ahead</h2>Assume we have 3 approaches:<br />
<ol><li>you <strong>follow TDD</strong> and write tests first, and then write code to make your tests pass eventually</li>
<li>you take a paper or text file, and you <strong>plan ahead</strong> by writing a list of test cases that need to be converted to automated test suites; then you code and write test cases during coding or afterwards</li>
<li>you <strong>don't plan</strong> what test cases you have, you just write a code and <strong>cover it sporadically with unit tests</strong></li>
</ol>It is probably clear that the 3rd approach is the weakest one here. You start writing code without defining your expectations. You don't have a plan for what needs to be tested. As a result, it is easy to miss many important test cases.<br />
<br />
The 1st and 2nd approaches sound very similar, as they both have the same goal: know and plan what you're going to do before you start coding.<br />
<br />
However, the 1st approach forces you to set constraints for yourself and your code before you start coding. Of course, you can refactor those later, again and again. The 2nd approach gives you the flexibility to write the code and refactor it until it fits your vision. You can then create tests during coding or after you've finished a piece of code.<br />
<br />
Both the 1st and 2nd approaches require self-discipline, but in different ways. TDD needs the discipline to start and keep using it. The 2nd approach needs the discipline to convert the test cases on paper into unit tests in code.<br />
<br />
A less obvious difference between the 1st and 2nd approaches is completeness: it is easy to miss something while writing code. If you start by defining a list of test cases on paper, it is much easier to tell when the list is complete.<br />
<br />
<h2>Summary</h2>Here are some bullet points as a summary from this post:<br />
<ul><li>writing unit and functional tests is important to keep quality under control, save time and money</li>
<li>it is never too late to write tests for the code, but now is better than later</li>
<li>thus, write tests during coding and before commit and code review</li>
<li>writing tests before code does not by itself make things cheaper or better, unless you lack the discipline to cover functionality with tests otherwise</li>
<li>we need to work on improving our estimation skills.</li>
</ul>Ruslan Khttp://www.blogger.com/profile/08483230152299597189noreply@blogger.com0tag:blogger.com,1999:blog-131947665105489199.post-86963815779242223752017-02-20T21:49:00.003-08:002017-04-09T15:56:17.225-07:00Structured Aggregated Log<h2><a id="Introduction_2"></a>Introduction</h2><p>Structured logging is an approach to logging where information is written in a structured form that can later be read and parsed by a machine with minimal effort. In many cases it means the immediate readability of the log is reduced, but with a bit of effort it can have the opposite effect (we'll see that a bit later).</p><h2><a id="Such_different_Structured_Logging_6"></a>Such different Structured Logging</h2><p>This structure can be anything: from a comma-separated list of values to complex JSON/XML documents.</p><p>Another important trait of structured logging is its focus on logging larger amounts of information in batches.</p><p>For example, say you have an application that reads data and passes it through a chain of rules. Each rule can either vote to execute some specific action over that data later, or not.</p><p>There are a few approaches to logging data here. Let's review them.</p><p><strong>Unstructured log line</strong>. Each class writes a log line with the information it currently contains. There is no common structure for the log printed by each rule, so it is easy to spot different ways of saying the same thing. In the logs below, one rule says 'Voting for action' while another says 'action is picked'. Some rules print important information and others don't. Such logs can be parsed by a human, but not always by a computer.</p><script src="https://gist.github.com/rkhmelyuk/8d13952a97d67dd6207b45c0706b9900.js"></script> <p><strong>Structured log line</strong>. This is better. At least the same approach is used for logging all information.
Yes, it is not always easy for a human to read, but it is in a state where both human and computer can read and interpret the logs.</p><script src="https://gist.github.com/rkhmelyuk/bdf6235404cee17e1572168b71005cf8.js"></script> <p><strong>Structured aggregated log line</strong>. Simple: you gather all the "execution" data and then output it at the end in a single structured log line. In this case it is preferable to use JSON or XML to structure the information. CSV is probably not the best option here, especially because it doesn't support nested structures. Pros: you have a single line with all the data. Cons: you need to keep the data around until the final log line is generated, and it is hard to read without practice or a formatting tool.</p><script src="https://gist.github.com/rkhmelyuk/243ba1f75e68054d211da038394138fa.js"></script> <p>Which one is my favorite? – The last one!</p><p>Why? – Because it holds all the important information in a single line and saves me from grepping and parsing multiple log lines.</p><h2><a id="Structured_Aggregated_Log_Line_50"></a>Structured Aggregated Log Line</h2><p>For a structured aggregated log, there is little help from most logging frameworks.
So this functionality needs to be built manually.</p><p>Main components of structured aggregated log are aggregated log collector, data formatter and logging framework to output.</p><img src="https://lh3.googleusercontent.com/MSCSgHIlWv_AReAwbsJYD7T5RNlDfZDQVsuJcxgGyORyvMScrdTVlEs29qA6RkpT9hxje_Z7WZHYtVKMj3s8v01718M8fQ_UgUSIfxkeyE6YcYKQw51g_EI3tfPrYvKOE50QExH2QEKJJQ3fqF5GkxmAAqldyqOW-mRYiZhUrvGFXFTgwFrq57cW_HwEK3SAtkYGvUeuIb8X6LKwh49Y5lWf21bq1wwMdpP9UOG5Za8Fbd2-WSFXHxzmLbYbelsH1T1-2lR_eiTe25WUJQuTHMtvFmrzaZ4emUUzybD2j5GQRnTft5FxYC3PBNQaVYiT4KeaW3GQenyBpH5ur3hCwuM-FdeNGCjiHZxC97byEymsoAEBPp6jsHPk84q42T9Il5v30P7E4nW3qngQINOZJ8KtOMA3jBAaU2Y4Dc0VvmBJunfcaXXFZTMdEhXimLQreUXm6qGpo3YxVA8SeTCp09_P7UgHhfHxtcdm0pAQJZLdMC7E808BwbHDMcXkDawCQ0cO4DddD0151xz3GRVvBSyc81NIta0VJkXTJkbQRDw8GgfIi1_3DnfHwmYeWzdU7gT__C91hQsNMlhuMk7PjfKd7veLPhs9ckcQQTtgxc-PVYx7DZj4Fw=w534-h327-no" alt="AggregatedLogDataComponents.png"><br />
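A minimal sketch of such a collector in Go (all names here are illustrative, not from a real logging library): it aggregates fields during processing and emits one JSON log line at the end.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LogCollector accumulates data during processing and emits a
// single structured log line at the end.
type LogCollector struct {
	fields map[string]interface{}
}

func NewLogCollector() *LogCollector {
	return &LogCollector{fields: map[string]interface{}{}}
}

// Add records a value under a key; writing the same key twice keeps
// only the last value, which de-duplicates repeated facts.
func (c *LogCollector) Add(key string, value interface{}) {
	c.fields[key] = value
}

// Flush formats all aggregated data as one JSON log line.
func (c *LogCollector) Flush() string {
	b, _ := json.Marshal(c.fields)
	return string(b)
}

func main() {
	log := NewLogCollector()
	log.Add("requestId", "req-42")
	log.Add("trafficLight", "green") // set by one rule
	log.Add("trafficLight", "green") // set again by another rule, printed once
	log.Add("votedAction", "go")
	fmt.Println(log.Flush())
}
```

In a real application, Flush would hand the line to the logging framework instead of printing it directly.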
<p>Let's dig deeper into the pros and cons of the structured aggregated log line.</p><p>Let's start with the good things:</p><ol><li>You have all information aggregated into a single log line</li>
<li>When using JSON or XML, you could benefit from auto-formatting tools</li>
<li>It is even easier for a program to parse and read data from such a log line</li>
<li>Better control on output data:<br />
<ol><li>Customize output strategies</li>
<li>De-duplicate data (printing same thing once)</li>
<li>Structure data in usable way that simplifies understanding</li>
</ol></li>
<li>De-duplicating log metadata: skip printing duplicate date/time, level and source of logs etc.</li>
</ol><p>There are, of course, bad things:</p><ol><li>Probably not supported by "your favorite" logging library, so it needs to be implemented manually</li>
<li>You need to store the aggregated data, and also build a JSON/XML document to print</li>
<li>Average log lines become huge and hard for a human to parse (however, auto-formatting tools help a lot)</li>
<li>Logger should support printing partially populated aggregated data (more on this below.)</li>
</ol><p>I’m going to deep dive into some items mentioned above to explain in details and, if possible, with examples.</p><h3><a id="Autoformatting_80"></a>Auto-formatting</h3><p>This is simple. You found a log line, copied it into auto-formatting tool, pressed the button and you have a well formatted and easier to understand log:</p><img src="https://lh3.googleusercontent.com/W7AzqkOksj6cbNU5hAv1jogRTXQBWdCXwP5e-0l2hImBGy1aJKae2STgZxw0b1u_i1AO1vErWFNF5qJAGZt5GfURrPPrFVALVJLKjlz5jA8s53Qg5IoZ9okHr-esg_HKLqSqMByOfUTlITxMcMV1MuTiIHFHTYm6JpIhCh5qGAnrB4UQ0CVylfEEDls4AGs5heXRh5fy5ntUnKyA3pT7Z4Wyw-dYTWs5FLnvN5gZZf4miMwnPqmKVTJRqtrTlSClUU1N4nYEfUwRGNsAfo11n975_23cBB_MduLgrRn1PzZSqhAGAuxcF4Pw7a82WfUfj5GfoSrhble974sOtpRTR4Axed3OmSfCb-wXIKPE3icyvjuFoY4SCc3yJXnJ6oVplcBCAy8WHaiZmIK_SiqDCxvUCRal9q228cU6fV1EI1ppf0oP66Avme7nmKLgTlk1TqEDUg00zpUxneM5B3mRn1-YluBjN884tmAAAcRSyWEdde46Q7lo9ywEprE6rkH3F_00mbKEL8DbxWIXE-PkBNLTZ8y2wb9CP-wqYfywOJBIVj-SFF3S93XbhY5PC0vmzkxnMs-CmtbKxmKoksRDacyU9uijuHfcJz7vGwVIstXRDLI_fem-iw=w388-h627-no" alt="formatted_structured_log.png"><br />
<h3><a id="Customized_output_strategies_86"></a>Customized output strategies</h3><p>This one is quite important. Since you have aggregated data from which you need to generate the log line, it becomes similar to generating a "view" from a "model" in the MVC pattern.</p><p>You can do many useful things now:</p><ul><li>generate a JSON log line, or XML</li>
<li>generate short log line with the most important information for successful use</li>
<li>or generate a long log line full of all aggregated data in case error or exception happened</li>
<li>filter out unimportant data and don't print it into the output log</li>
<li>etc.</li>
</ul><h3><a id="Deduplicating_data_97"></a>De-duplicating data</h3><p>Let's compare examples of unstructured/structured log lines:</p><script src="https://gist.github.com/rkhmelyuk/311d208e105cadda4ac7950ece4413d8.js"></script> <p>They both have a date/time, a log level, and a request id on every log line. We get rid of this duplicate data if we print all data in a single line.</p><p>But that's not all. In both use cases, <code>TrafficLightIsGreenRule</code> and <code>TrafficLightIsRedRule</code> print the traffic light status. They are independent rules, and the traffic light is important to both of them, so both print it. However, in the structured aggregated log line, we print the traffic light status only once.</p><h3><a id="Handling_errors_113"></a>Handling errors</h3><p>Errors happen. One might happen before you've aggregated all the data and printed it to the log. How to deal with this? The most important thing here is not to lose the data aggregated for logging: it needs to be printed in any case, even if incomplete. It is an invaluable source of information during error investigation.</p><p>Often it is possible, and recommended, to print exhaustive data into an aggregated log.</p><script src="https://gist.github.com/rkhmelyuk/8f9a4ce2eef9b06670c584566355abbc.js"></script> <p>An alternative is to collect the error message and stack trace into the aggregated log data:</p><script src="https://gist.github.com/rkhmelyuk/4c35eb9577d95e044b65b781ab87815e.js"></script> <h3><a id="Presentation_152"></a>Presentation</h3><p>As mentioned before, one of the strong benefits of the structured aggregated log line is the presence of all the important data in a single line. And this line is in a format easily understood by a machine.</p><p>This allows you to read data out of the log and do various manipulations.
For example, generating a better way to present and work with the data.</p><p>I tried to imagine how I could take a JSON log line, parse the data out of it, and convert it into a readable HTML document that gives a better view of the data. Sure, this is not very helpful for my trivial example, but for larger logs it is more beneficial.</p><br />
<img src="https://lh3.googleusercontent.com/bZZQuX1YGhulJT_gBRKi7PEc8C343gPnQGBiqsM0RTFgyt3f3lrzCv148jx485lS-ujA0LUhaOWgerZT6lSvwO1OaB6vb3OLF7aZj8LH-4l0j_mbPb_yW0ll1xMNuNb81syr3nh-xu5Twj1HNI1RWHa2OX8IjjzCOm1trxP2bk7B6lOFiObmvZKHzyAwNEi49ElUf4WdZBy-tkQxWHKHiOhX9Nil8U6li5tXYlzueNwBsQURrnpZyPChrvV9mDCCA5NuNLIOM_iqO7_nSrjRzK4lUJzd7lO7TcKuVN_BABnNX7QhBFWR_FYTpcb5Jnu5YX8tW2tVIWKnpOvVIXtVGXl0462i-_mENNhzrx4m-qNMY1NkkZAj5FSY71zRUznG477CMFHGmxOIo1Ud5FD1ThA--_XIJwFNqz8XoRq20x-Kvj_w0ZN4dQg75zzlOweckWkEjrKJgn6TNoG_hh6qtRE9CJ5uP-xNqQu-hMf50TwtYq1yltgiMpVHIPXuoiV7qm2vPhaEEBLX-CQ5U83bQiMzP5gaLeeJfyM2c8dTutDBPRbxE9BFtoX2ykK6Vi1SSWck_pe_dy_Uxq--Oni0E_oVQxjiMqzLR4SWmtQEDnRSgTNtFxNF9g=w630-h591-no" alt="HtmlPresentation.png"><br />
<p>Creating a simple tool where you can paste a log line and get such a presentation of the data wouldn't be a big task, but it would help decrease the operational load.</p>Ruslan Khttp://www.blogger.com/profile/08483230152299597189noreply@blogger.com0tag:blogger.com,1999:blog-131947665105489199.post-24985115116634447122017-01-03T02:35:00.001-08:002017-01-03T02:35:15.226-08:00Jenkins Pipeline<h3>Introduction</h3><br />
I believe in simplicity. It means I think things should be simple to work with.<br />
It doesn't mean things should always be simple inside. They could be complex, sometimes very complex. <br />
<br />
At the same time, I distinguish two types of complexity:<br />
<ol><li>complexity caused by indirect and non-obvious relationships between components,</li>
<li>complexity caused by mess within components itself and messy relationships between them</li>
</ol>The first one means the product is implemented in a very smart way, built on a large number of implicit, often non-obvious assumptions.<br />
<br />
The second one means the product's complexity is exaggerated by confusing, illogical, messed-up relationships between components. Unlike the first, the relationships are there, but they make no sense, not because they are over-smartly designed but because they are spaghetti-like.<br />
<br />
Often these two types can be found together in the same product. I hope you never have to deal with such a product.<br />
<br />
The way to fix complexity type #1 is to remove implicit assumptions by adding smaller components with visible relationships. <br />
<br />
However, to fix complexity type #2, you need to understand it deeply. What if a thing seems complicated only because you don't understand it yet? To answer this question, I always start with research, and my research usually involves some kind of diagram. I believe visualization is the best way to tackle complexity. At least, it always works well for me.<br />
<br />
Simply put, visualization is one of the best ways to simplify hard and complex things. Even a very basic diagram is better than nothing. Sometimes it takes time to find the right form of visualization at the right level of abstraction, but once you do, you are half done.<br />
<br />
<h3>Continuous Deployment</h3><br />
This post is about visualization and how it can help combine parts into a single simple picture. More specifically, it is about how <strong>visualization</strong> can help simplify <strong>continuous delivery</strong>.<br />
<br />
Continuous delivery is the process of getting your product from source code into production. It usually happens through a <em>pipeline</em> of jobs: the first job compiles the source code and builds artifacts, then come integration testing, deployment to a staging environment, and eventually deployment to production. In Jenkins, for example, a job is usually created for each of these steps, where each job triggers the next one once it finishes successfully. There would be jobs like 'Build XYZ', 'Test XYZ', 'Deploy XYZ to Staging', 'Deploy XYZ to 1-box' and 'Deploy XYZ to Production'. <br />
<br />
The default Jenkins <em>View</em> presents those jobs as a list, with no visible relationships between them. But <strong>there are relationships</strong> between them. All those jobs exist for the single most important goal: getting the new version into production for the customers! So the relationships play an important role, yet they are hidden from us as users.<br />
<br />
You might not even feel it immediately, but this plain list of jobs brings complexity. There is a logic behind those jobs, but it is hidden from the viewer, unless one is ready to spend time understanding how things work.<br />
<br />
<h3>Pipelines in Jenkins</h3><br />
The good thing is that Jenkins already allows you to remove this complexity, via visualization of the pipeline of jobs you've created. <br />
<br />
This support is brought by <a href="https://wiki.jenkins-ci.org/display/JENKINS/Build+Pipeline+Plugin">"Build Pipeline Plugin"</a>. This plugin adds a new "View" type called "Build Pipeline View".<br />
<figure><br />
<img alt="" src="https://lh3.googleusercontent.com/EE3xdKUoq4iVLgxLPfBpkzSw7Fi00Vu2SV8hhDH7-LjTul0cm0MFcMGjtXnNPOHIol5KZbOpQFCeXv35MZjb4E8CwYUtgs_bVj8F9Lvd9c0-RsAtkeeiS4LpNSKMskyVeOhvKqp4utTTvxaq8JQzqeD8K_L_OWx50I1I9DFLYVi-TsIVLa7lnmvWk0-QjKvn5raa7RbeXb6jjBOWL3S4vYV6Y6i5Qy_md58wUzU9JmyRkjOSEv1bbqhIQmidGT_x6iIu8deo3BU0Lo6fjXx8ZoomTzUSozbfRMCnACmAwVR7XH49vQxYFlLCCEaJuw26-yekZiYNEM1ZKJ2VNuSXanAyLMR5vp4ZPATq6UgipFto9ZXl-we9LWI0fU0KklCWeJY_0cHgMMUEjh5PnbJ3tjZdvrogPijjoSnuCGwhDaNxnRo6NasKvs6SlRJiyOaF4gsLie-iTO6OHUbAjeSLAt9eKh12Ile3cVWquV8JZ-kfDP7cFCVO9ANAIugpHA9jPcO9ZxtwCuAgeOwPeposLO0vbm6DDtxiiQaPKDYSpPogw1l-KAEBcF_uijkr1JCW1D9Morsbxc7ck_IPdwYxQ_J0BmZRoE4A-2DnssCWQeQEzS4b6mzFkg=w778-h276-no" width="600" /><br />
</figure><br />
In the next step, you pick the first job in the pipeline. <br />
<figure><br />
<img alt="" src="https://lh3.googleusercontent.com/Vor5WSv4CDLUy--BlGf3lK2ssyDmuCiUjR_TFHnhVq1R_uFsArXXnROlNuuJt7iYi6KNiJIYx8B7j_8K41vfB935GHydV9m0g84jhKONIpU9f-h0m3qJ2jSCwlKEAkQFUp6WivgdXIyv-CL6CzLqWTdyUXL5E-X1xmWHO7NyWrqXNhvPp_h6zihr5mVT0DR-cAjIqwDv3kiLz5TaQz07nzklaPIXTohduei95NgFLagYY_R4FUNfkpGMCWCVRk5KpZTmOGutVrSMigSNkWzKQ00JMLVQRIqJhOuO-8kUSVSqXztvT0TrR5arvLNwMA2vgOYq57Skk8Ho5GkLKQMJUj2DOkpGb7GrsRIWymxPt5ATeEPPgiCD32dTZC0YmuG2EdJws9rCg1VT8F_YXRoo_8BSUrfzQMwdsSU6cUV7x3Uw569rAOHnXQKofkaUAqgn9sDjvw-3pgOkKh76k6_qTdb0aqgf4USrSwvViCQOR9BgEpN6CGxeIqFoBnA1zYGwp6KiCOnCvG1Buasfru06U9vz1g_Pq8eOlGjQ5SLwIreDL1towTbbU28L5wbbA5I3IJ-T1cgjV1MEX0vEKUhb5HE9nY6YdIOxX7ag9g0kXiyGL7kGmhuRoA=w761-h744-no" width="600"/><br />
</figure><br />
The pipeline is then created based on job dependencies: it will contain the jobs triggered by the first job, the jobs triggered by those jobs, and so on.<br />
<br />
Now, once someone makes a change to the XYZ source, a new "Build XYZ" job is triggered. Once this job finishes successfully, it triggers "Test XYZ" and so on. As you can see, no functionality changed here, but with pipelines it is possible to visualize both the dependencies between the jobs and the current state of the CD process.<br />
<figure><br />
<img alt="" src="https://lh3.googleusercontent.com/UFL7mn7k1Cnxzea4YGkEnCYh6-w-UY6kCOBmYqLnYan4AIG0sTFYH3MC2DOPLsLtGmJF43ssgg4o13uop-WlLj3KbuduMGoxkw8b0sXhAO5Y_YCtgkBhgUYr8wStvyBlra_xFYlsxv0qBl0n8hrlRGBf5czP0RRiau_vpY1R8hCypfnv7xT6XgUikx3qwJlDpssoirec4iFv92A5DW1IcXq_one1zfFY0ZC-8cu_vDwAX1Wg0ZK8cUq09X40GoPcGJKl8w_IVEbxRKAtRGbIUHdG_i9wAOJcogrAdsvwIKFOmUp1LBnxS9a56hgnDGoKZAwDm5t0Osk5FzawaIwZLrtTSjWHffpSEW9Phbgsjg4y2sICz3FHzpNJHfGlkt0qdKZwqAcoYaRMZOmd3d5oth1uEsQR3xR5nyr6XKjNyzbzwPEBZtjookLmsJRNGXn8u9mmgc95_KnI-hFrCraqP37G5p8qIIVLfRcsvt15hZbjcPkNzZJiV4lki0VLxhuE4xmy64EXEiaqDibOO8vcRO4cN3139s_H0-OAuu22_GQuzO2fSKDuO3JhEEWBAQJTAq-DxAsIbr62HYHxpFzqwFdzZYmctLgue0ETxQmq5OMRAT54Hq6GOw=w1239-h368-no" width="600"/><br />
</figure><br />
That makes things so simple to work with. You can understand the build structure and its current state at a single glance.<br />
<br />
More than that, <a href="https://jenkins.io/doc/book/pipeline/">Jenkins 2.0 comes with built-in support for pipelines</a>.<br />
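With Jenkins 2.0 pipelines, the whole chain of jobs above can be declared in a single Jenkinsfile. Here is a hypothetical sketch for the XYZ jobs (the shell scripts are placeholders, not real commands):

```groovy
pipeline {
    agent any
    stages {
        stage('Build XYZ') {
            steps { sh './build.sh' } // compile and build artifacts
        }
        stage('Test XYZ') {
            steps { sh './run-tests.sh' } // integration tests
        }
        stage('Deploy XYZ to Staging') {
            steps { sh './deploy.sh staging' }
        }
        stage('Deploy XYZ to Production') {
            steps { sh './deploy.sh production' }
        }
    }
}
```

Each stage runs only if the previous one succeeded, and Jenkins renders the stages as a pipeline out of the box.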
<br />
<h4>Blue Ocean</h4><br />
I'd also like to mention the initiative called <a href="https://jenkins.io/projects/blueocean/">"Blue Ocean"</a>, which aims to build a better visualization of pipelines in Jenkins. I'd be happy to see it live one day.<br />
<br />
<h4>Other Products</h4><br />
There are a bunch of other products that can help you create a build/deployment pipeline:<br />
<ol><li>AWS CodePipeline - <a href="https://aws.amazon.com/codepipeline/">https://aws.amazon.com/codepipeline/</a></li>
<li>Concourse CI - <a href="https://concourse.ci/">https://concourse.ci/</a></li>
<li>Bitbucket Pipelines - <a href="https://bitbucket.org/product/features/pipelines">https://bitbucket.org/product/features/pipelines</a></li>
</ol>Ruslan Khttp://www.blogger.com/profile/08483230152299597189noreply@blogger.comtag:blogger.com,1999:blog-131947665105489199.post-46821592210353128732016-12-16T01:09:00.000-08:002017-04-09T16:01:34.139-07:00Stack trace dump on GoI'm from the Java world, meaning I spend most of my time writing Java code and running it on the Java Virtual Machine. This has its own benefits and disadvantages. For example, I really wish the JVM used less memory and was faster from the start. It is awesome that you benefit from the JIT eventually, but until then, code has to run thousands of times before it is translated into machine code.<br />
<br />
At the same time, Java has a bunch of benefits. I believe the most important one is the set of available tools: from small utilities to powerful IDEs to application servers.<br />
<br />
One such utility is <a href="https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr016.html"><code>jstack</code></a>. It is used to print a stack dump of application threads, including the main one. What a useful thing! It saved me so much time when I tried to investigate why my application got stuck or was really hard on the CPU.<br />
<br />
It is also one of the tools I really needed a few days ago: I had to understand why a service written in Go was so eager for CPU resources. I spent some time searching for a tool that would work like jstack but for a Go application. I couldn't find one, yet I found how to get a stack trace with a small change in my Go application.<br />
<br />
The Go standard library already has everything you need to print a stack dump: <a href="https://golang.org/pkg/os/signal">signal</a> support to notify the application, and the <a href="https://golang.org/pkg/runtime/#Stack">runtime.Stack</a> function to get the stack dump and print it.<br />
<br />
NOTE: signals are primarily supported on UNIX-based systems, which means this approach might not work as expected on Windows.<br />
<br />
<script src="https://gist.github.com/rkhmelyuk/079d57cae767e8e3e61d1f937ab16383.js"></script><br />
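In case the embedded gist doesn't render, here is a minimal self-contained sketch of the same idea; instead of blocking forever, the demo triggers the dump once by signaling itself:

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"runtime"
	"syscall"
	"time"
)

// dumpStacks returns stack traces of all goroutines, growing the
// buffer until the whole dump fits.
func dumpStacks() string {
	buf := make([]byte, 16*1024)
	for {
		n := runtime.Stack(buf, true) // true = include all goroutines
		if n < len(buf) {
			return string(buf[:n])
		}
		buf = make([]byte, 2*len(buf))
	}
}

func main() {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGQUIT)
	go func() {
		for range sigs {
			fmt.Fprintln(os.Stderr, dumpStacks())
		}
	}()

	// A real service would do its work here; for the demo, send
	// SIGQUIT to ourselves and give the handler a moment to print.
	syscall.Kill(syscall.Getpid(), syscall.SIGQUIT)
	time.Sleep(200 * time.Millisecond)
}
```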
Now you can run your program and ask for a stack dump by sending a signal using the <code>killall</code> command:<br />
<pre>$ killall -SIGQUIT accountservice
</pre>and you'd see something like this<br />
<br />
<script src="https://gist.github.com/rkhmelyuk/20fce86c1bc8f5e7109e71db98514d6d.js"></script><br />
The output consists of stack traces for each goroutine. Each stack trace starts with 'goroutine', its number, and its current state. A stack trace, like a Java one, shows the most recent operation on top and contains the source file name (not a class name), the line number, and the function parameters. The stack trace also contains information about the goroutine's creator.<br />
<br />
Now it should be easy to find why the application got stuck, or what it is doing that uses up almost all of the CPU. However, this solution might not always work: if all resources are used up, it might take a while until the SIGQUIT notification is processed by the application.Ruslan Khttp://www.blogger.com/profile/08483230152299597189noreply@blogger.com0tag:blogger.com,1999:blog-131947665105489199.post-85673948413822783972016-12-03T22:59:00.001-08:002017-04-09T16:01:40.589-07:00Simple RESTful API on Go (with Gin)It is very popular to use Go to create microservice applications. And there are actually many benefits to using Go compared to other technologies: it is easy to start, easy to build, and very cheap to run on your servers. The last one is one of my favorites. A Go application has a much smaller memory and CPU footprint than a Java application, and at the same time runs faster.<br />
<br />
In this post, I want to show how to create a simple RESTful web service in Go using the awesome <a href="https://gin-gonic.github.io/gin/">Gin</a> library.<br />
<br />
First, let's create a main function that sets up logging and a Gin router with registered paths and handlers. Gin provides a simple way to create template-like paths and register a handler for each path.<br />
<script src="https://gist.github.com/rkhmelyuk/0ec4fff13ed7dd0c0b40ad4f96754b8a.js"></script><br />
Now let's create DTOs for our business models, and methods to convert from model to DTO:<br />
<script src="https://gist.github.com/rkhmelyuk/cdbd3c2e007a062177efa402b38247fc.js"></script><br />
Finally, we need to create functions to handle requests. Again, super easy with Gin:<br />
<script src="https://gist.github.com/rkhmelyuk/10a9cfe91d52f5c16755df68b1b54da4.js"></script><br />
And we are done. Well, almost, as these examples omit the actual models and the functions that work with storage. Ruslan Khttp://www.blogger.com/profile/08483230152299597189noreply@blogger.com0tag:blogger.com,1999:blog-131947665105489199.post-78384222031983449032016-11-20T22:14:00.000-08:002017-04-09T16:01:45.328-07:00Simple configuration on GoVery often, your new tool/service needs a configuration defined either via parameters or a configuration file.<br />
For example, you want different settings in development and production environments, or you just have separate fleets for different clients.<br />
<br />
It is pretty easy to add configuration support to your Go application using <a href="http://yaml.org/">YAML</a> for the configuration and <a href="http://gopkg.in/yaml.v2">gopkg.in/yaml.v2</a> for deserializing YAML files into a struct value.<br />
<br />
Here is an example YAML configuration file. It contains the service name, its listening endpoint, and the database connection URL. Pretty simple!<br />
<script src="https://gist.github.com/rkhmelyuk/4ec0557e6f46027c9f4389aab924f9be.js"></script><br />
To read the configuration, we first open and read the whole configuration file, and then use the yaml package to unmarshal it into a value of the Configuration type.<br />
<script src="https://gist.github.com/rkhmelyuk/a25b8fd76a34b7e3e10038631948c8bb.js"></script><br />
And finally, here is part of the main function in our application, where we use a parameter to pass the path to the configuration file, and load it for later use:<br />
<script src="https://gist.github.com/rkhmelyuk/3cbaba7825e802efea35fba053f4d1b7.js?ts=2"></script><br />
That's it. Simple and fast!Ruslan Khttp://www.blogger.com/profile/08483230152299597189noreply@blogger.com0tag:blogger.com,1999:blog-131947665105489199.post-59880025214434900962016-11-06T23:42:00.002-08:002017-04-09T16:01:52.480-07:00Reinforcing feedback loopsThere are two different types of feedback loops recognized in systems:<br />
<div><ol><li>Balancing feedback loops</li>
<li>Reinforcing feedback loops</li>
</ol></div><div>A balancing feedback loop controls the inflow and outflow of resources. An example would be the amount of money in a bank account: the amount increases when you deposit money and decreases when you withdraw it. <br />
<br />
</div><div>A reinforcing feedback loop is a different kind of animal. It isn't as straightforward as a balancing feedback loop. An example would be the interest on a bank account: the more money you have, the more interest you get; the more interest you get, the more money you have in your bank account. Another example of a reinforcing feedback loop is the well-known dependency between education and social status: if you have money, you can get a better education; the better education you have, the more you can earn, and so on.<br />
<br />
Many people focus on balancing feedback loops and give little attention to reinforcing feedback loops. Yet the latter are very important in our life.<br />
<br />
Reinforcing feedback loops can drive fast growth or fast decay. They are the magic behind the growth of many successful companies.<br />
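The difference between the two loop types shows up clearly in a tiny simulation (the numbers are arbitrary): a balancing loop changes the stock by a fixed amount each step, while a reinforcing loop feeds the stock back into its own growth.

```go
package main

import "fmt"

// balanceAfter models a balancing loop: a fixed inflow and outflow,
// so the stock changes linearly.
func balanceAfter(start, inflow, outflow float64, steps int) float64 {
	b := start
	for i := 0; i < steps; i++ {
		b += inflow - outflow
	}
	return b
}

// compoundAfter models a reinforcing loop: the interest earned is
// added back to the balance, so growth feeds on itself.
func compoundAfter(start, rate float64, steps int) float64 {
	b := start
	for i := 0; i < steps; i++ {
		b += b * rate
	}
	return b
}

func main() {
	fmt.Printf("balancing:   %.2f\n", balanceAfter(1000, 100, 50, 30))
	fmt.Printf("reinforcing: %.2f\n", compoundAfter(1000, 0.05, 30))
}
```

After 30 steps the reinforcing loop has pulled far ahead, even though both started from the same stock.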
<br />
The better Google's search is, the more people use it. The more people use it, the more data Google has to improve its search.<br />
<br />
The better the apps on your iPhone, the more you use it. The more you use it, the better the apps become. The better the apps, the more you use your iPhone. It is a similar thing with the App Store.<br />
<br />
The more you invest in some company XYZ, the more money it has. The more money it has, the more successful it is. The more successful it is, the more its stock grows. The more the stock grows, the more likely you are to keep investing.<br />
<br />
And there are many, many more examples.<br />
<br />
A reinforcing feedback loop is the one behind the <i>"success for the successful and failure for the failures"</i> trap.<br />
<br />
Thus, creating a successful enterprise is not only about the good old balancing feedback loops. It is always a lot about the reinforcing feedback loops behind it.<br />
<br />
A wonderful book on this:<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://www.amazon.com/Thinking-Systems-Donella-H-Meadows/dp/1603580557" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="200" src="https://images-na.ssl-images-amazon.com/images/I/51NNRZBJ%2BsL._SX330_BO1,204,203,200_.jpg" width="132" /></a><span id="goog_968020281"></span><span id="goog_968020282"></span><a href="https://www.blogger.com/"></a></div><a href="https://www.amazon.com/Thinking-Systems-Donella-H-Meadows/dp/1603580557">Thinking in Systems: A Primer</a><br />
<br />
See it on <a href="https://www.goodreads.com/book/show/3828902-thinking-in-systems">Goodreads</a><br />
<br />
Links:<br />
<ul><li>http://www.systems-thinking.org/theWay/sre/re.htm</li>
<li>https://www.youtube.com/watch?v=hdGxIameiM8</li>
</ul></div>Ruslan Khttp://www.blogger.com/profile/08483230152299597189noreply@blogger.com0tag:blogger.com,1999:blog-131947665105489199.post-57716653705833169122015-05-08T18:40:00.004-07:002017-04-09T16:01:58.653-07:00AI as a new atomic bomb: the weapon of future<br />
Artificial intelligence (AI) is the atomic bomb of the future. The one who owns it rules everything. Companies that work to create AI become rivals to the countries that own everything, including atomic bombs.<br />
<br />
While some countries fight to get their own atomic bombs (Iran?) and others try to threaten others with their bombs (Russia?), they look like outdated countries living in the past. The future is not in atomic bombs; the future is in the intelligence that can control them. Intelligence that can create new technologies, new approaches, new weapons, and a new future is much more important, much stronger than "some" bombs.<br />
<br />
You cannot fight someone who is smarter, and who sees and knows more than you. You cannot win when all your steps are already pre-calculated, and you are playing someone else's game.<br />
<br />
However, there is still a large gap between the current state of technology and real AI. It seems we are not able to scale existing technologies enough to produce superhuman intelligence. It is not clear whether existing algorithms and approaches can be scaled to that level, though.Ruslan Khttp://www.blogger.com/profile/08483230152299597189noreply@blogger.com0tag:blogger.com,1999:blog-131947665105489199.post-17463028661447476972014-11-28T00:52:00.001-08:002014-11-28T00:52:11.712-08:00How I lost 60 pounds in 3 monthsNovember 30th, 2013: for the first time in recent years, I went to the gym. The goal was simple - get rid of extra weight. And I was way overweight: 230 lbs at a height of 5'10". Exactly 3 months later my weight was 174 lbs, i.e. 56 lbs less, and another 20 days after that my weight was 167 lbs.<br />
<br />
Some days I was losing 0.5 pounds per day, other days only a small amount, but most days I woke up lighter and lighter. As my weight decreased, my energy increased. Although I was eating less, I was moving more and feeling more energized. But not always. Sometimes I felt really exhausted and couldn't work more than 7-8 hrs per day. I couldn't allow myself to skip lunch, otherwise I became aggravated and felt bad and angry.<br />
<br />
One way or another, 1 year later my weight is 172 lbs and I feel better than ever.<br />
<br />
<a name='more'></a><br />
A lot of effort went into those first 3 months, and I still keep myself on a diet and visit the gym at least 3 times per week.<br />
<br />
I also have to say that my body has its specifics: I usually gain weight pretty easily, but can lose it fast as well.<br />
<br />
So, here is my regime for the first 3 months. <br />
<br />
For the first 2 weeks I visited the gym randomly, trying to get into the minimal shape where the body does not fall apart, full of pain, after a minimal workout. The first week, I visited the gym only once - on Saturday. The next week I made it twice. The week after that, I switched to 3 times per week: Tuesday, Thursday and Saturday. That's the schedule I keep. I can't always make Saturday, so sometimes I move it to Sunday. If I have to skip a day, it would usually be Thursday (I hate doing legs). <br />
<br />
In my first week, I went to amazon.com and bought a weight scale. I got it the same week and started using it to track my weight. I searched the internet and found that the best time to weigh yourself is in the morning, after visiting the restroom, before the shower, and absolutely nude. That's what I do every day.<br />
<br />
In my second week, I met Ben at the gym, a great guy who gave me lots of good advice. One of the first pieces of advice was to get a journal. The next day, I went to the store and bought a simple exercise book. I filled it with the list of exercises I wanted to do that day, along with the starting weights. So I came to the gym with a plan of what I would do, and with no way to ignore it, because everything was planned and written down. Although I cheated sometimes - you can't always be 100% productive, and some days you're really exhausted even before the gym. In that case, I used the rule: do as much as you can!<br />
<br />
Another section of the journal I used for logging my weight. I have logged my weight every morning since then, except for a few mornings when I was on vacation or traveling. When you see the numbers going down every single day, it motivates more than any support from another person. I believe in measuring everything that is needed to get to the goal: measure your weight, measure how much work you've done, measure how many calories you've burned and how many you get with food.<br />
<br />
So, I was visiting the gym 3 times a week; on another 2-3 days per week I used to walk, at least 1 hour per day. I found it so awesome that I still keep doing it, especially when I have something to think over. Two hours in the gym give you a lot of time to think about many things, while walking (and especially running) helps to clear your mind and look at many things from a different angle. <br />
<br />
For the first 4 months, I was doing my regular workouts: <br />
<br />
- Saturday: hands + chest<br />
- Tuesday: back + shoulders<br />
- Thursday: legs<br />
<br />
Besides this, every day I was doing bench press (horizontal, inclined or declined, depending on the day of the week) and abs. A little bit later, I added dips as an every-day required thing to do. <br />
<br />
After I was done with weights, I did running. At least 30-40 mins. That was killing me. The body was already exhausted, and I spent another 40 mins running (at first very slowly, even half walking; later running at 6-6.5 mph). I believe that good cardio after weight training gave the best results in terms of losing weight. After 4 months, I switched to the same extensive running on non-gym days (instead of walking). And I kept doing it until recently, when I got sick, so I had to stop running and switch back to walking. Soon I'm going to start again. Running is the best activity you could do for your body, brain and spirit.<br />
<br />
At this moment, I'm doing things a bit differently. The goal has changed from losing weight to staying in tone and growing muscle mass. I still go 3 times per week. I also do shoulders on Saturdays now. Ben and some other guy recommended a short warm-up run before training. So that's what I do now: I run for 6-7 mins at a good speed and incline angle to warm up, then stretch and do bench press. Everything else follows. I found that a short warm-up run gives a good boost to muscle growth. I usually end my workout with dips: I do as much as I can, but no more than I have to. I've learned a lot, and now do everything more slowly, trying to feel how my body works. I take longer pauses and combine different exercises. The goal is to do less but better, rather than a lot but somehow.<br />
<br />
A lot of the good advice about food and training I took from Denis Semenikhin's (Денис Семенихин) video blog.<br />
<br />
In my opinion, there are 3 activities that should go together in order to get the result:<br />
<br />
1. physical efforts (work outs, cardio, just walking)<br />
2. eating efforts (normal food, no overeating, etc. etc.)<br />
3. continuous motivation<br />
<br />
I have covered the physical efforts well. Now more about eating.<br />
<br />
First of all, I started eating breakfast. I stopped eating after 7pm. When I was really hungry after 7pm, I ate 1-2 pickled cucumbers. That's it. For dinner, I ate only low-fat fish and vegetables. Everything was dosed: up to 100 grams of fish, only a few broccoli florets, 3 egg whites (no egg yolk - I threw it away). After dinner, I gave myself at most 15-20 mins and started moving. Breakfast was usually oatmeal (50-65 grams raw) with fruits, berries and non-fat Greek yogurt. Lunch was a salad with a minimum of carbs (i.e. no beans, no bread, no bread crumbs; only low-fat cottage cheese, chicken breast etc.). I was eating fewer carbs and more protein. Every morning, I had a protein shake with 50 grams of protein. I had a similar one right after the gym. I had another 2 eating times, at 10am and 4pm, for a bar/apple/non-fat Greek yogurt.<br />
<br />
I think my body found a way to get all the required energy from protein instead of relying on carbs. When I lower the protein and increase the carbs, for the first few days I'm losing weight, until the body switches back.<br />
<br />
I'm still on a similar diet right now, although I have increased the portions for dinner (more fish!) and allow myself to eat more sweets (I love sweets). However, I never buy sweets; I eat them only when they are available as free food at work. I also allow myself to eat more carbs once a week (unless I had too many sweets before). That usually won't be something really bad: it could be nuts (so much fat), sushi (rice), lots of fruit, lots of meat, or cheese. I avoid pizza, pasta, chips etc. I also have much more milk now, and it is 1% milk, while before it was a bit of non-fat milk only for shakes. I take an omega-3 pill twice a week and multivitamins every morning. I drink tea with honey now, but try not to do that too much.<br />
<br />
I also don't use sugar, only sweeteners.<br />
<br />
The last part, just as important as the previous two, is continuous motivation. The first thing one needs to understand is that we can do such wonderful things with our mind that we can't do as much with our body. We can and should create our own motivation and drive it ourselves. In that case, there is no way the motivation could go away, bring us down, or just get lost. It's always with us. That's what I did. I took something small and grew it into a really large motivation, which I kept growing and improving. The original small motivation was not important anymore; it was external, and thus hard to rely on. I created my own - one that didn't let me cheat, that was always with me in tough and good moments.<br />
<br />
I never took anything for granted. If somebody gave me support or said how good I looked now, I took it as if he/she gave me a $100 bill with a smile on the face. If somebody said the opposite, I just ignored it.<br />
<br />
I can talk a lot about the motivation part, as it is the most interesting part of the process. But I won't do it now, as it's a separate topic with lots of openings and lessons.<br />
<br />Ruslan Khttp://www.blogger.com/profile/08483230152299597189noreply@blogger.comtag:blogger.com,1999:blog-131947665105489199.post-41405362334183742912014-11-09T22:15:00.002-08:002014-11-09T22:15:38.630-08:00Facebook PostersJust love those motivational posters you could find in Facebook's offices:<br />
<br />
<center><a href="http://www.designforfun.com/facebookposters/"><br />
<img src="http://www.designforfun.com/facebookposters/JPEG/movefastandbreakthings_6-20a-01.jpg"/> <img src="http://www.designforfun.com/facebookposters/JPEG/doneisbetterthanperfect_6-20a-01.jpg"/><br />
<img src="http://www.designforfun.com/facebookposters/JPEG/whatwouldyoudoifyouwerentafraid_6-20a-01.jpg"/> <img src="http://www.designforfun.com/facebookposters/JPEG/stayfocusedandkeepshipping_1-31b-01.jpg"><br />
</a><br />
</center><br />
All those posters and more are from <a href="http://www.designforfun.com/facebookposters/">http://www.designforfun.com</a>.Ruslan Khttp://www.blogger.com/profile/08483230152299597189noreply@blogger.comtag:blogger.com,1999:blog-131947665105489199.post-55350205914673514562014-08-30T18:34:00.002-07:002017-04-09T16:02:04.582-07:00Simple Multiple BloomFilters data structure implementation on Java<p>As a follow-up to my previous post about using the <a href="http://blog.khmelyuk.com/2014/06/multi-bloom-filters.html">Multiple Bloom Filters data structure</a> to identify hot values, I decided to write a simple, naive implementation in Java. And open source it. <br />
<br />
The project can be found on GitHub and is proudly named <a href="https://github.com/rkhmelyuk/multi-bloom-filter">multi-bloom-filter</a>.<br />
<br />
It comes with a basic little class called <code>MultiBloomFilter</code> which does most of the job. It accepts the number of enclosed bloom filters, the capacity of each BF and the duration before the head BF is reset. One can also specify which hash function is used and how many times the hash function should be applied to each value.<br />
<br />
Simple example:<br />
<script src="https://gist.github.com/rkhmelyuk/e835fc394e3e790ec28c.js"></script><br />
This short example shows that the MBF resets only one of the internal BFs at a time. This means that whenever a reset happens, only part of the data is removed, and whenever a hot key is added again, it is identified as such.<br />
<br />
Once again, MBF is a great solution if you need to find a set of hot values over some period of time. In particular, it helps to put only hot values into the cache. If many hosts use a single distributed cache service, then using MBF might save redundant traffic from putting cold data into the cache, where it would be evicted pretty fast. Also, since the hot keys are in the MBF, there is a high chance they are in the distributed cache as well. Thus the application has a kind of "bloom filter" to check the chance that a value can be found in the cache for a specified key.<br />
<br />
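For illustration, here is a minimal, self-contained sketch of the idea. The class and method names below are my own, not the actual multi-bloom-filter API: adds go into the newest internal filter, membership checks look at every filter, and a reset wipes only the oldest one, so repeatedly added (hot) keys survive resets.

```java
import java.util.BitSet;

// Sketch of the multi-bloom-filter idea; names are illustrative, not the
// actual library API. add() writes to the newest internal filter,
// mightContain() checks all filters, reset() clears only the oldest filter.
class MultiBloomSketch {
    private final BitSet[] filters;
    private final int bits;    // capacity (bit size) of each internal filter
    private final int hashes;  // how many times the hash function is applied
    private int current = 0;   // index of the filter receiving new adds

    MultiBloomSketch(int numFilters, int bitsPerFilter, int numHashes) {
        filters = new BitSet[numFilters];
        for (int i = 0; i < numFilters; i++) filters[i] = new BitSet(bitsPerFilter);
        bits = bitsPerFilter;
        hashes = numHashes;
    }

    void add(String key) {
        for (int i = 0; i < hashes; i++) filters[current].set(index(key, i));
    }

    boolean mightContain(String key) {
        next:
        for (BitSet f : filters) {
            for (int i = 0; i < hashes; i++)
                if (!f.get(index(key, i))) continue next;
            return true; // all bits for this key are set in this filter
        }
        return false;
    }

    // Rotate: the oldest filter becomes the new head and is wiped.
    void reset() {
        current = (current + 1) % filters.length;
        filters[current].clear();
    }

    private int index(String key, int i) {
        // toy double hashing; a real implementation would use a proper hash
        return Math.abs((key.hashCode() * 31 + i) % bits);
    }
}
```

With, say, 3 internal filters, a key written only once disappears after 3 resets, while a key re-added between resets keeps at least one fully-set filter and stays "hot".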
There are many more use cases for the MBF data structure. Being able to work in a concurrent, scalable environment is another "feature" that I love about Bloom filters, and MultiBloomFilter in particular. For me, a good implementation of a Bloom filter that can grow and scale correctly, and has different mechanisms to evict data and fight false positives, sounds like a very useful service.Ruslan Khttp://www.blogger.com/profile/08483230152299597189noreply@blogger.com0tag:blogger.com,1999:blog-131947665105489199.post-83401079797866687802014-08-17T08:33:00.000-07:002017-04-09T16:02:09.226-07:00SLASLA (Service Level Agreement) for large distributed software is very important. It plays the role of a contract between the application and its clients. It's also not very straightforward to achieve, especially when many components participate in the process. Each component in this case should have an even stricter SLA, because the SLAs of the individual components sum up. For example, if a single call to service A results in multiple calls to other services, it's important that those other services have a better SLA in order to keep service A's SLA promise.<br />
<br />
There are different types of SLA: resource, availability, durability and performance. <br />
Many systems provide a contract on how many resources are available to the client: memory, CPU, disk space etc.<br />
Some websites and services say that their <i>availability SLA</i> is 99.9%, which means they are going to be available 99.9% of the time. Actually, that is not much at all. There is a nice table on <a href="http://en.wikipedia.org/wiki/High_availability#Percentage_calculation">Wikipedia</a> with the conversion between availability percentages and actual downtime. <br />
<br />
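The arithmetic behind that table is simple; here is a quick sketch (the class and method names are made up for this example):

```java
// Converts an availability SLA percentage into the allowed downtime per year
// (a 365-day year is assumed) -- the same arithmetic behind the conversion
// table linked above. Names here are made up for illustration.
class AvailabilityMath {
    static double allowedDowntimeHoursPerYear(double availabilityPercent) {
        double downtimeFraction = 1.0 - availabilityPercent / 100.0;
        return downtimeFraction * 365 * 24; // 8760 hours in a year
    }
}
```

For 99.9% ("three nines") this gives about 8.76 hours of allowed downtime per year; for 99.99% only about 53 minutes.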
Some services, especially storage services like S3, also have a <i>durability SLA</i>. It says how rarely the service might lose data.<br />
<br />
<i>Performance SLA</i> is common for running services that need not only to be available, but to return a response to a request within a specified period of time. For a performance SLA, it is common to use a percentile of requests that will be handled within the SLA. For instance, the SLA might be stated as: return a response in 10 ms or less for 99.9% of requests.Ruslan Khttp://www.blogger.com/profile/08483230152299597189noreply@blogger.com0tag:blogger.com,1999:blog-131947665105489199.post-38232209256682525622014-08-09T23:42:00.000-07:002017-04-09T16:02:13.134-07:00Twister: A Simple Way to Manage Your ScriptsImagine an average project that has many scripts, each written using different practices, using different argument names and different naming conventions, doing something similar to another script but a bit differently, etc. Sometimes there are so many scripts that it's hard to find the one you really need at this very moment. Moreover, the scripts have no standard location and are often put in semi-random directories, so it's really hard to find them. Even more, many developers have similar scripts for different projects. Some scripts are in the PATH, others are relative to the project directory. The ones in the PATH are named in an odd manner, because different versions are used for different projects. And some scripts are written in bash, others in ruby, python etc.<br />
<br />
Oh, so many troubles just because of scripts. That's the reason Twister was created. Twister is a simple framework/tool for creating and managing project scripts. I use it in a few of my home projects and find it very helpful. About a year ago, I <a href="https://github.com/rkhmelyuk/twister">open sourced Twister</a>. It's a Python project, which makes it simple to create scripts and execute them on any popular operating system.<br />
<br />
<a name='more'></a> Twister helps to achieve the following goals:<br />
<ol><li>Structure project scripts into groups (known as modules).</li>
<li>Separate scripts of different projects by having different twister setups and aliases.</li>
<li>A simple, single structure for creating scripts. As a result, a single place for all project scripts as well.</li>
<li>A set of built-in commands to enumerate commands, find the command of interest and see its description.</li>
</ol>Each script or, in terms of twister, command belongs to some module. A module consists of one or more scripts. To execute a specific script, one also needs to specify the name of its module. For example, suppose there is a need to run the command <code>build</code> from the module <code>dev</code>. This can be done as simply as:<br />
<pre>twister dev build
</pre>If a command is used often, it's possible to create an alias for it. For example, <code>dev build</code> could get the alias <code>bld</code>, so the previous command shortens to:<br />
<pre>twister bld
</pre>Built-in commands do not require any module name.<br />
<br />
Twister should be set up separately for each project. I usually put the twister directory into the same Git repository, and then add modules and scripts to the project's twister setup. Thus, I keep twister and my scripts under version control, which is very convenient. After that, I use the <code>add-shortcut</code> command to create a shortcut in PATH to the project's twister. As I have multiple twister setups, I use the project name instead of <code>twister</code>. For example, say I have a project named <b>colibri</b>. There is a separate twister setup for it, with a custom build script. My first move is to set up a shortcut to colibri's twister:<br />
<pre>./twister.py add-shortcut --name colibri
</pre>This will create a soft link to the local twister setup, name this link <b>colibri</b> and place it into the <code>/usr/local/bin</code> directory. This makes it possible to execute the command from anywhere in the path; it will still be executed relative to the twister setup:<br />
<pre>colibri dev build
</pre><br />
Each twister command comes with the following parts:<br />
<ol><li>list of arguments</li>
<li>short description</li>
<li>long description</li>
<li>executable logic itself</li>
</ol>This makes it easier not only to define a command, but also to manage a set of commands. Every module is represented as a directory in twister's root directory. Each command is located in the appropriate module directory. Built-in commands are located in the <code>twister</code> directory. Each module contains a file <code>_commands.py</code> which declares what commands are exported by the module.<br />
<br />
Twister comes with a list of built-in commands that allow you to quickly review the available modules and commands, find commands and see their documentation. These commands are:<br />
<ul><li><code>find-command</code> - find a command in any module that matches the search criteria</li>
<li><code>list-commands</code> - list all commands in all modules</li>
<li><code>list-modules</code> - list all modules</li>
<li><code>list-aliases</code> - list all defined aliases</li>
</ul>Twister comes with some useful documentation in the <a href="https://github.com/rkhmelyuk/twister/blob/master/README.md">README</a> file.Ruslan Khttp://www.blogger.com/profile/08483230152299597189noreply@blogger.com0tag:blogger.com,1999:blog-131947665105489199.post-57971564086910451392014-08-01T00:30:00.002-07:002017-04-09T16:02:25.648-07:00Reverse Hash Tree<b>Reverse hash tree (RHT)</b> is a tree data structure where nodes contain a hash (i.e. not only a value). I call this tree a <i>"reverse"</i> tree because the way a node's hash value is computed is the reverse of the <a href="http://blog.khmelyuk.com/2014/06/merkle-trees.html"><i>Merkle tree</i></a> way. In a Merkle tree, parent nodes contain a hash of their child nodes. This gives a quick way to find the difference between 2 parent nodes in 2 trees - they would simply have different hashes. The next step would be to find the child node that actually differs.<br />
<br />
In an RHT, the parent node's hash is not built from the child nodes' hashes; it is the opposite: a child node's hash is built from its own hash/value and the parent node's hash. For leaf nodes, the value could be something useful; for intermediate nodes, it could be just some version or category etc.<br />
<br />
When a child node is accessed, its hash is validated based on its current value/hash and the parent node's hash. If the value is valid, we keep validating the parent node's hash against its own parent's hash, etc. This can actually be done on the way from the root node to the child node. If any node with an inconsistent hash is found, it means that the node and its children require a hash/data recalculation.<br />
<br />
<a name='more'></a> Because of this hash organization, it is possible to change the value/hash of some specific parent node, which will require all child nodes to recalculate their hashes based on the parent node's hash. This gives an easy way to <i>invalidate</i> a large amount of data without actually touching every specific record. Here <i>invalidate</i> doesn't mean the same as <i>remove</i>: the nodes in the tree stay untouched, they just need to recalculate their hashes. Recalculating a hash may include additional responsibilities, like checking whether the node's value is still valid/fresh, or may require recomputation of its own.<br />
<br />
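A tiny illustrative sketch of how such invalidation might look (all names here are invented for this example): a node's hash mixes its own value with its parent's hash, so bumping a parent's version invalidates the whole subtree in O(1), and descendants lazily re-validate on access.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Illustrative reverse-hash-tree node; all names are made up for this sketch.
// A node's hash is derived from its OWN value combined with its PARENT's
// hash -- the reverse of a Merkle tree. Changing a parent's value silently
// invalidates every descendant's stored hash, without touching them.
class RhtNode {
    final RhtNode parent;         // null for the root
    Object value;                 // version/category for inner nodes, data for leaves
    long storedHash;              // hash recorded when the node was last (re)computed
    final List<RhtNode> children = new ArrayList<>();

    RhtNode(RhtNode parent, Object value) {
        this.parent = parent;
        this.value = value;
        this.storedHash = expectedHash();
        if (parent != null) parent.children.add(this);
    }

    // hash(ownValue, parentHash); the root mixes only its own value
    long expectedHash() {
        long parentHash = (parent == null) ? 0L : parent.storedHash;
        return Objects.hash(value) * 31L + parentHash;
    }

    // Valid only if this node AND every ancestor up to the root still match.
    boolean isValid() {
        for (RhtNode n = this; n != null; n = n.parent)
            if (n.storedHash != n.expectedHash()) return false;
        return true;
    }

    // Invalidate the whole subtree in O(1): change the value, refresh own hash.
    void bumpVersion(Object newValue) {
        this.value = newValue;
        this.storedHash = expectedHash();
    }

    // "Refresh" on access: recompute the stored hash (e.g. after re-fetching data).
    void refresh() { this.storedHash = expectedHash(); }
}
```

Note that `bumpVersion` touches only one node, yet every descendant's `isValid()` starts returning false, which is exactly the cheap mass-invalidation property described above.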
This makes reverse hash trees a valuable structure for caches. Instead of removing part of the tree from the cache, it is just marked as <i>invalidated</i>. Old records are still accessible, so they can be used when required (for example, when getting a fresh record takes too much time). Because old records stay in the tree, another process could work in parallel to refresh the data and hashes. <br />
<br />
Keeping records in an RHT when the parent node's version changes saves the system from extra work. For example, assume that a parent node changed because the environment changed: new child records might be added and some old records might need to be removed. Also assume that calculating a child node's value is an expensive operation, but checking whether the value should be removed is not. The RHT allows finding the child nodes that need to be re-evaluated (removed or not?), but after that saves a lot of time, as there is no need to recalculate values for the nodes that remain. Only new nodes will require additional work.Ruslan Khttp://www.blogger.com/profile/08483230152299597189noreply@blogger.com0tag:blogger.com,1999:blog-131947665105489199.post-90364888486358978752014-07-11T14:36:00.000-07:002017-04-09T16:03:15.634-07:00Leader Election: Gallager-Humblet-Spira (GHS) algorithmGHS is an algorithm for electing a leader in an arbitrary network. It is based on building a minimum spanning tree (MST) of the network and then, based on it, electing a leader. <br />
<br />
The algorithm for building the MST is very similar to Kruskal's algorithm, although it has some specifics for nodes distributed across arbitrary networks. For example, the nodes have to communicate with each other in order to detect connected components, merge, etc. After the MST is built, one of the nodes is chosen to be the leader. A notification is sent to all other nodes in the MST.<br />
<br />
Basically, the goal of this algorithm is to identify the nodes in the network, and then elect a leader in the known field.<br />
<br />
<a name='more'></a><br />
The algorithm for building the MST is very interesting:<br />
<ol class="task-list"><li>Nodes are organized into fragments. The initial fragment size is 1, containing only a single node. With every step, fragments are merged, so the fragment size increases. </li>
<li>Each fragment (i.e. each node in the fragment) keeps the following information: its level and name. The level is the number of merges with same-level fragments; the name is usually the name (weight) of the edge that connected the fragments.</li>
<li>When 2 fragments of the same level are found, they are connected into a single fragment, and the level of the resulting fragment is increased by 1. </li>
<li>When 2 fragments of different levels are found, the fragment with the smaller level is merged into the fragment with the larger level. The level information is changed in the merged fragment (i.e. the fragment with the lower level). A lower level means a smaller size, thus a smaller number of nodes needs to be updated with the new level information.</li>
<li>When there is no fragment connected to the existing fragment via some edge, the MST is built.</li>
</ol>The algorithm uses a number of different messages to identify neighbor fragments that could be connected. The messages used by nodes to find the minimum-weight edge are test, reject and accept. Other messages connect fragments, change the root, etc.<br />
<br />
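The merge rules in steps 3 and 4 above are essentially union by rank; here is a simplified, non-distributed sketch of just that level bookkeeping (all GHS message passing is omitted, and the names are mine):

```java
// Non-distributed sketch of the fragment-merge rules (steps 3 and 4):
// same-level fragments merge and the level grows by one; otherwise the
// lower-level fragment is absorbed and adopts the higher level. This is
// the reasoning behind the O(log N) bound on the number of level updates.
class Fragment {
    Fragment parent = this; // union-find style representative pointer
    int level = 0;

    Fragment find() {
        Fragment f = this;
        while (f.parent != f) f = f.parent;
        return f;
    }

    static Fragment merge(Fragment a, Fragment b) {
        Fragment ra = a.find(), rb = b.find();
        if (ra == rb) return ra;        // already in the same fragment
        if (ra.level == rb.level) {     // step 3: equal levels -> level + 1
            rb.parent = ra;
            ra.level++;
            return ra;
        }
        // step 4: lower level absorbed into higher; fewer nodes to update
        Fragment hi = ra.level > rb.level ? ra : rb;
        Fragment lo = (hi == ra) ? rb : ra;
        lo.parent = hi;
        return hi;
    }
}
```

Because a fragment's level only grows when two equal-level fragments merge, the level can increase at most log N times, which is where the `N log N` term in the message complexity comes from.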
The message complexity of the algorithm is <code>2E + 5N log N</code>, i.e. <code>O(E + N log N)</code>.<br />
<br />
There is an alternative algorithm to GHS, which however sends more messages. It is based on waves and extinction. Each node sends a message with its id to each connected node. Each connected node keeps the <i>currently active wave</i>, that is, the minimal id it has received. Whenever such a node updates its <i>currently active wave</i> value, it sends a message with it to its connected nodes. Thus, the wave with the minimal id keeps going through every node, while all other waves are eventually terminated. After all nodes send an update with the currently active wave equal to the minimal id, the algorithm is ready to elect the leader.Ruslan Khttp://www.blogger.com/profile/08483230152299597189noreply@blogger.com0
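The wave-and-extinction election just described can be simulated in a few lines. This is a synchronous-rounds simulation sketch, not a real distributed implementation, and the names are mine:

```java
// Simulation sketch of wave-and-extinction leader election: every node
// repeatedly tells its neighbors the smallest id it has seen; waves carrying
// larger ids die out, and the node whose own id survives everywhere wins.
// Synchronous rounds over a connected graph are an assumption of this sketch.
class WaveElection {
    static int elect(int[] ids, int[][] neighbors) {
        int n = ids.length;
        int[] activeWave = ids.clone(); // each node starts with its own id
        boolean changed = true;
        while (changed) {               // one synchronous message round
            changed = false;
            int[] next = activeWave.clone();
            for (int u = 0; u < n; u++)
                for (int v : neighbors[u])
                    if (activeWave[u] < next[v]) { // smaller wave extinguishes larger
                        next[v] = activeWave[u];
                        changed = true;
                    }
            activeWave = next;
        }
        // all nodes now agree on the minimal id -> that node is the leader
        return activeWave[0];
    }
}
```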