Random(Notes)

ListIterator

I'm quite ashamed right now (after 15+ years of using Java pretty extensively), but I only recently figured out that ListIterator exists. Maybe I knew about it and just forgot? Many times I avoided modifying a list while iterating over it, because that would result in a ConcurrentModificationException, and I had to come up with tricky workarounds.

ListIterator allows adding and removing elements in the list while traversing it. It also supports moving backwards, which makes it a great tool for some algorithms, especially those based on linked lists. And ListIterator works just fine for ArrayList too. Here is a simple example. Let's write code that, if it finds a value above 100, removes it along with the previous and next values, and if it finds a value below 10, duplicates it.

  import java.util.LinkedList;
  import java.util.List;
  import java.util.ListIterator;

  List<Integer> numbers = new LinkedList<>(); // Could just as easily be an ArrayList
  numbers.add(10);
  numbers.add(20);
  numbers.add(30);
  numbers.add(110);
  numbers.add(45);
  numbers.add(3);
  numbers.add(35);

  ListIterator<Integer> iterator = numbers.listIterator();
  while (iterator.hasNext()) {
    Integer number = iterator.next();
    if (number > 100) {
        iterator.previous(); // step back: cursor is now before 110
        if (iterator.hasPrevious()) {
            iterator.previous(); // step back: cursor is now before 30
            iterator.remove(); // remove 30
            iterator.next(); // step forward past 110 again
        }
        iterator.remove(); // remove 110
        if (iterator.hasNext()) {
            iterator.next(); // step forward past 45
            iterator.remove(); // remove 45
        }
    } else if (number < 10) {
        iterator.add(number); // insert a duplicate right after the current element
    }
  }
  
  System.out.println(numbers);

prints..

  [10, 20, 3, 3, 35]
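
For contrast with a plain for-each loop, here is a minimal sketch. The class and method names are made up for illustration. For this input, removing through the list inside a for-each loop fails with ConcurrentModificationException, while removing through the ListIterator works. The last method shows the backwards traversal mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.ListIterator;

public class ListIteratorSketch {

    // Removes through the list itself inside a for-each loop.
    // Returns true if that triggered ConcurrentModificationException.
    static boolean removeInForEachFails(List<Integer> numbers) {
        try {
            for (Integer number : numbers) {
                if (number > 100) {
                    numbers.remove(number); // structural change behind the iterator's back
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Same job done through the ListIterator, which is allowed to modify the list.
    static List<Integer> removeAbove100(List<Integer> source) {
        List<Integer> numbers = new ArrayList<>(source);
        ListIterator<Integer> iterator = numbers.listIterator();
        while (iterator.hasNext()) {
            if (iterator.next() > 100) {
                iterator.remove(); // removes the element just returned by next()
            }
        }
        return numbers;
    }

    // Starts at the end of the list and walks backwards with hasPrevious()/previous().
    static List<Integer> reversed(List<Integer> source) {
        List<Integer> result = new ArrayList<>();
        ListIterator<Integer> iterator = source.listIterator(source.size());
        while (iterator.hasPrevious()) {
            result.add(iterator.previous());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(10, 110, 20, 30);
        System.out.println(removeInForEachFails(new ArrayList<>(input))); // true
        System.out.println(removeAbove100(input)); // [10, 20, 30]
        System.out.println(reversed(input)); // [30, 20, 110, 10]
    }
}
```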

I love the moments when I learn something new, especially if it is going to help me do my work better.

Part 2: Python is slooow.. Rust is fast.

This is a follow-up to the previous post.

Now the processing time has dropped from 20+ seconds to only 2-3 seconds. A tenfold improvement. And it is not because I coded everything in assembler, moved to the GPU, or aggressively parallelized. Nope. I just switched to a different algorithm.

Funny enough, even with this algorithm, all the same heavy computation still happens. I just no longer need to add data into the HashMap and do another 2 full loops. And the algorithm is simpler to understand.

It is also simple to split into parts and run them concurrently, which gives me another opportunity to parallelize and make it slightly faster.

Lesson learned: using the right algorithms and data structures is very important.

I also made the app available on the internet; it is my first, very basic React web application. But I still have a few features and performance optimizations to add before sharing it on my blog.

Python is slooow.. Rust is fast.

I love using Python to play with data, work out solutions, or just prototype. If I need to come up with some tricky algorithm, I often prototype it in Python. Python is great for that, especially with Jupyter added. No compilation time, easy scripting, lots of libraries, especially those backed by native code written in C/C++. Using numpy and similar libraries makes things pretty fast compared to raw Python.

But any time you need to do lots of processing in Python itself, especially looping through large amounts of data, you get hit by performance issues that make it inefficient to use Python code in production. Just recently, I needed to do some math and processing on 50-100 million elements in a 2D array, and without numpy that would take many hours, if not days. Numpy helped get it down to 10-20 minutes. A significant reduction, but still too slow for me if I want to run similar processing tens of thousands of times.

I tried to re-implement this in Rust. It took me some time, given that I'm pretty new to Rust, but it was hugely satisfying to see the processing time drop to 3-4 minutes, and after a few basic optimizations to 2-2.5 minutes. That sounds much better. Then I realized I was running it in debug mode. I switched to release mode, which adds a bunch of its own optimizations, and the time dropped to 20-25 seconds. Wow!

But I think I can still do better. Can I use CUDA?..

Projects directory structure

Background

For a very long time, I followed a directory structure very similar to what many others use: there was a projects directory in my home directory that contained all the projects I had worked on or was working on. Some projects weren’t trivial and contained multiple modules, but the structure of the projects directory was very simple. Each project directory would hold that project’s source code.

However, the further I go with a project, the more issues I have with this structuring approach:

Where should I put documentation? Should it be next to the source code? Should it be in git?

What about various media that are related to the project but have nothing to do with the source code?

What about the data, which is also irrelevant to the source code but important for data analysis?

and many other questions.

Although the questions are different, they are actually about the same topic: where do I put files that are not related to the source code in any way?

Workspaces

And today I’ve realized that the problem is due to my outdated approach to structuring files. It came from the days when my responsibility was mostly coding, and source code was central to my work. And while the responsibilities changed, the approach stayed the same.

Nowadays, I need more than just source code for most projects. Besides source code, I need to collect data, notes, documents, articles and posts. And instead of placing documents into $HOME/docs/$PROJECT, data into $HOME/data/$PROJECT and source code into $HOME/projects/$PROJECT, I would rather put everything into the same folder. And I didn’t find a better name for this type of folder than workspace.

And the goal is to switch from this structure:

$HOME/projects
    project1/
        .git/
        module1/
        module2/
        pom.xml
        todo.txt
    project2/

to this one:

$HOME/workspaces
    workspace1/
        docs/
        data/
        dev/
            todo.txt
        source/
            .git/
            module1/
            module2/
            pom.xml
        charts/
    workspace2/
        docs/
        ideas/
        source/
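
To make a layout like this easy to reproduce, the folders can be created by a small helper. Below is a minimal sketch that scaffolds the sub-folders of workspace1 shown above. The class name, the method name and the use of the temp directory instead of $HOME/workspaces are my assumptions for the sake of a side-effect-free example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class WorkspaceScaffold {

    // Sub-folders taken from the workspace1 layout; adjust per workspace.
    static final List<String> FOLDERS = Arrays.asList("docs", "data", "dev", "source", "charts");

    // Creates <workspacesRoot>/<name>/ with one directory per entry in FOLDERS.
    // Directories that already exist are left untouched.
    static Path createWorkspace(Path workspacesRoot, String name) {
        Path workspace = workspacesRoot.resolve(name);
        try {
            for (String folder : FOLDERS) {
                Files.createDirectories(workspace.resolve(folder));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return workspace;
    }

    public static void main(String[] args) {
        // The temp directory stands in for $HOME/workspaces in this sketch.
        Path root = Paths.get(System.getProperty("java.io.tmpdir"), "workspaces-demo");
        Path workspace = createWorkspace(root, "workspace1");
        System.out.println("Created " + workspace);
    }
}
```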

Versioning

And then I faced a new question: should I put the whole workspace under version control? If docs, data and whatever other resources are important for the project, why shouldn’t those be versioned and kept safe and readily available using a version control system like Git?

“That is actually a good idea!” I thought. “But it would be better to have a separate Git repository for each part of the workspace, as documents and code are changed at different times and for different reasons.”

And that’s the decision I’ve made: it is good practice to use version control for all parts of the workspace, but not to put the whole workspace into a single repository.

A workspace is just a collection of various resources coupled by the same topic. So it is OK to want to keep a workspace organized and reproducible: it should be easy to create a new workspace that someone else can use if needed. This also makes the idea of separate repositories a good one, since not everyone cares about data or docs.

The desire to put the whole workspace into a single repository can be quite strong at times, mostly because of the reasons described in the previous paragraph. And that’s where Git submodules can be helpful.

Summary

What was good before might not be good anymore. We should revisit our approaches when we change or our responsibilities change.

The previous structure that I used for project directories was good for my needs. But my needs changed, and the structure no longer fits them.

I need more than just a structure for the source code or a single project. I need a new way to organize files that covers a set of documents, source code and data for one or multiple related projects.

And my solution to it is “workspace”.

Programming is like Writing

Everyone has heard that programming is very similar to writing books. “It has many similar traits,” programmers think. There is even a thing called “literate programming”, invented by the famous Donald Knuth.

But are there really many similarities between those two activities? Well, other than that both require staring at a screen and typing a lot of text.

One of my goals for 2017 is to improve at writing texts. I found that there is an increasing need for me to write good texts, both at work and for personal needs (like this blog). At the same time, writing does not come to me as easily as coding. To achieve my goal, I started by picking and reading a few books that give advice on how to write well.

One such book is “On Writing” by Stephen King. A friend of mine was reading some fiction by Stephen King, so I also decided to check what interesting books this author has. I must say that the book I found most interesting at that moment was “On Writing”. I could do 2 things at once: finally read a Stephen King book and make a step towards my yearly goal.

I must confess that the book was an easy read. It read like fiction, but wasn’t. A great example of a helpful book that is easy to read. This is due to 2 important things:

The author gave background stories from his life. The first third of the book is more like a biography of Stephen King before he became a famous author. Maybe it’s because I love reading biographies of famous people, but I finished that part in a single breath.
Every piece of advice the author gives also comes with a background story and an intuitive explanation. You not only learn good advice, but also get a background that helps you remember it once you’ve closed the book.

Among the advice the author gave, I highlighted a few points that felt most important to me. I’ll share them with you in a moment. I use some of them every time I write a text, like simplification, drafting and, to some degree, the second-draft rule.

Somewhere in the middle of the book, it dawned on me that Stephen King’s general best practices for writing are very similar to those we follow in programming.

Simplify and remove the clutter

Although some junior engineers love to create overly complicated (they say beautiful, flexible and extendable) designs, over-engineering both architecture and code, experienced software engineers know that simplicity is the one true thing we all must aim for.

Software engineers should focus on keeping things simple, both in code and in design. And that’s what the author of the book also recommends. Remove unimportant parts that don’t add to the story, remove overly complicated and unnecessary descriptions, and leave as little as needed for the reader to feel the story. Otherwise, 90% of readers would just give up on this boring book.

Not everyone is ready to read 10 pages about the colors of a sunset or 20 pages about the architecture of a city. Same in code: not everyone is ready to get through 5-page methods, and not everyone is ready to dig deep into the layers and layers of your code.

Avoid passive verbs

The analogy here is very simple:

use verbs for function/method names,
put methods into the objects they relate to.

“Pizza was delivered to my door” is not the best sentence. Once you stop using passive verbs, you get “Courier delivered pizza to my door.” Way better!

Same in the code: not “pizza.deliveredTo(myDoors)” but “courier.deliverPizza(myDoors)”. That makes modeling object-oriented relationships easier.
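
As a minimal sketch of that idea (the classes and fields below are made up for illustration, and I assume the courier already carries the pizza), the active version puts the verb on the object that performs the action:

```java
public class PizzaDelivery {

    // The thing being delivered; it does not act on its own.
    static class Pizza {
        final String name;

        Pizza(String name) {
            this.name = name;
        }
    }

    // The actor owns the action: the method name is a verb on the object that does the work.
    static class Courier {
        final String name;
        final Pizza pizza; // assumption: the courier is already carrying the pizza

        Courier(String name, Pizza pizza) {
            this.name = name;
            this.pizza = pizza;
        }

        String deliverPizza(String destination) {
            return name + " delivered " + pizza.name + " to " + destination;
        }
    }

    public static void main(String[] args) {
        Courier courier = new Courier("Courier", new Pizza("pizza"));
        System.out.println(courier.deliverPizza("my door"));
        // prints "Courier delivered pizza to my door"
    }
}
```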

Practice Continuously

To become a good author, one should practice continuously and write something almost every day. For example, Stephen King writes every day. He starts in the morning and writes until he reaches his goal.

It is the same with coding. You just can’t become a great programmer if you don’t practice.

Story from my life

Many years ago, I was a kid who wanted to learn programming. The only issue: I didn’t have a computer. But I didn’t get upset that easily; I bought a book about programming in Turbo Pascal 7. It was a book with a green and white cover, published by a Saint Petersburg publisher, and it covered TP7 from the basics to writing code that draws 3D objects and generates audio. I loved that book. It was my first book on programming.

I spent a couple of months going through this book: I learnt about data types, arrays, pointers, files and many other things. I followed the author’s recommendations and wrote many programs. As I mentioned, I didn’t have a computer yet, so all my programs were on pieces of paper.

But one day, my parents bought a computer. I started porting my programs from paper to the Borland Turbo Pascal 7 environment. Only to find that, lo and behold, none of them worked. They didn’t even compile. Boy, was I upset!

I had to spend a few more days fixing some of the programs to get them working. Since then, there have not been many periods of my life when I didn’t code for a long time, because I quickly realized that practice is the most important part.

For years after that, I still thought the problem was with me: I’m just a “practice” person, not a “theory” man. I learn better and faster from practicing, not from reading books. That was wrong thinking. It’s not only my problem. We are all like that!

If you want to be good at something, practice at it continuously.

Have a place for writing

Authors should have a place where they can hide from everyone and focus on the book. A place where nobody and nothing distracts you, and where everything motivates you to do your best work.

But, to be honest, writers are not the only ones who need such a place. Artists also need a place where they can focus on painting. Designers need such a place too. Software engineers also want to have it in their life; and, ideally, this place shouldn’t be in an open office space.

Sometimes I imagine my ideal place for programming. It is a large room with a high ceiling; 2 walls are covered completely by bookshelves, and another wall has a large glass whiteboard on it. The room has a large desk with large displays on top and a comfortable, super expensive chair next to it. There is also a comfy wingback chair for reading, with a standing lamp close to it. Right, my ideal “office” is both a programming office and a small library.

Have a toolbox

This one is simple: authors have dictionaries and vocabularies, favorite software for writing a book, or a type of paper and pen they can’t live without.

Programmers have their IDEs, programming languages, version control tools, programs for reading documentation and many many more.

Always do 1 or 2 drafts before final version

Review the results of your own work, no matter whether it’s code, design or documentation. See if you can improve it, and check whether it is bug-free and covered with tests. And, once you are fine with the version you have, hand it over for a peer review.

It is similar with books: the author writes a first draft, reviews and modifies it a few times, and once the author is ready, the book is passed to an editor.

Write about something you like

I don’t think any reader would be happy to read something written about a topic the author doesn’t like. The author would make it either extremely boring or obviously incorrect. The reader would inevitably feel discomfort reading such work.

Code that a programmer hated to write would look like… code written in hatred.

If you want to be successful at what you do, you need to love it, whether it is writing, programming, painting, designing, crafting or counting money.

Read Continuously

Stephen King reads a lot of books. He loves it. And he recommends it to other would-be authors as well. Read many different books, see what works well and what doesn’t, learn elements of style from others, and improve your own.

But that advice is exactly the same as what programmers receive from their mentors all around the world: read code written by others, study design approaches created by other, more experienced software engineers, etc.

Summary

Writing well is hard. Coding well is hard.

But there are a few best practices we all can use. There is no magic behind them. They are universal, as they work for writing, coding, and almost anything else.

These best practices are:

keep it simple and remove the clutter,
practice continuously,
learn from more experienced peers,
make your work more comfortable for you to do,
love what you do.