Part 2: Python is slooow.. Rust is fast.

This is a follow-up to the previous post.

The processing time has now dropped from 20+ seconds to only 2-3 seconds. Roughly ten-fold. And it is not because I coded everything in assembler, moved to the GPU, or aggressively parallelized. Nope. I just switched to a different algorithm.

Funnily enough, even with this algorithm, all the same heavy computation still happens. I just no longer need to insert the data into a HashMap and make another two full passes over it. And the algorithm is simpler to understand.

It is also easy to split into parts and run them concurrently, which gives me another opportunity to parallelize and shave off a bit more time.
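The post doesn't show the app's actual code, so as a minimal sketch of what "split into parts and run them concurrently" can look like in Rust: chunk the input, process each chunk on its own scoped thread, and stitch the partial results back together. `heavy_compute` here is a made-up stand-in for the real per-element work.

```rust
use std::thread;

// Hypothetical stand-in for the per-element heavy computation the post mentions.
fn heavy_compute(x: u64) -> u64 {
    (0..1_000u64).fold(x, |acc, i| acc.wrapping_mul(31).wrapping_add(i))
}

// Split the input into one chunk per thread, process the chunks concurrently,
// then concatenate the partial results back in order.
fn process_parallel(data: &[u64], n_threads: usize) -> Vec<u64> {
    let chunk_size = ((data.len() + n_threads - 1) / n_threads).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || chunk.iter().map(|&x| heavy_compute(x)).collect::<Vec<_>>())
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().unwrap())
            .collect()
    })
}

fn main() {
    let data: Vec<u64> = (0..1_000).collect();
    let out = process_parallel(&data, 4);
    println!("processed {} elements", out.len());
}
```

`std::thread::scope` (stable since Rust 1.63) lets the threads borrow `data` directly, so no cloning or `Arc` is needed; a crate like rayon would make this a one-liner, but the standard library is enough to show the shape.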

Lesson learned: choosing the right algorithms and data structures matters a lot.
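The post doesn't show the app's algorithm (and in its particular case dropping the HashMap was the win), but as a generic, unrelated illustration of how much the data structure alone can matter: membership tests against a `Vec` cost O(n) each, while a `HashSet` answers them in roughly O(1).

```rust
use std::collections::HashSet;

// Count how many of `queries` appear in `haystack`.

// Linear scan: every `contains` walks the whole Vec, so this is O(q * n).
fn count_hits_linear(haystack: &[u32], queries: &[u32]) -> usize {
    queries.iter().filter(|q| haystack.contains(q)).count()
}

// Hash lookup: build the set once, then each query is ~O(1), so O(n + q) total.
fn count_hits_hashed(haystack: &[u32], queries: &[u32]) -> usize {
    let set: HashSet<u32> = haystack.iter().copied().collect();
    queries.iter().filter(|q| set.contains(q)).count()
}

fn main() {
    let haystack: Vec<u32> = (0..10_000).collect();
    let queries: Vec<u32> = (5_000..15_000).collect();
    println!("{}", count_hits_hashed(&haystack, &queries)); // 5000 hits
}
```

Both functions return the same answer; only the asymptotics differ. Which structure wins always depends on the access pattern, which is exactly why the author's removal of a HashMap could also be the right call.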

I also made the app available on the internet, as my first and very basic React web application. But I still have a few features and performance optimizations to add before sharing it on my blog.

Python is slooow.. Rust is fast.

I love using Python for playing with data, working out solutions, or just prototyping. If I need to come up with some tricky algorithm, I often prototype it in Python. Python is great for that, especially with Jupyter. No compilation time, easy scripting, lots of libraries, especially those backed by native code written in C/C++. Using numpy and similar libraries makes things pretty fast compared to raw Python.

But then, any time you need to do lots of processing in Python itself, especially looping through large amounts of data, you get hit by performance issues that make it inefficient to use Python code in production. Just recently, I needed to do some math and processing over 50-100 million elements in a 2D array, and without numpy that would take many hours if not days. Numpy helped get it down to 10-20 minutes. A significant reduction, but still too slow if I want to run similar processing tens of thousands of times.

I tried to re-implement this in Rust. It took me some time, given I'm pretty new to Rust, but it was hugely satisfying to see the processing time drop to 3-4 minutes, and after a few basic optimizations, to 2-2.5 minutes. That sounded much better. Then I realized I was running this in debug mode. I switched to release mode, which applies a bunch of compiler optimizations, and the time dropped to 20-25 seconds. Wow!
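The post doesn't include the actual Rust code, but as a minimal sketch of the kind of elementwise pass over a large 2D array it describes (the math here is made up; only the shape of the loop is the point):

```rust
// Hypothetical elementwise pass over a 2D array stored as one flat,
// row-major Vec, which keeps the data contiguous and cache-friendly.
fn process(rows: usize, cols: usize, data: &[f64]) -> Vec<f64> {
    assert_eq!(data.len(), rows * cols);
    data.iter()
        .enumerate()
        .map(|(i, &x)| {
            let (r, c) = (i / cols, i % cols);
            // Made-up math standing in for the real per-element work.
            (x * r as f64 + c as f64).sqrt()
        })
        .collect()
}

fn main() {
    let out = process(2, 2, &[0.0, 0.0, 4.0, 5.0]);
    println!("{:?}", out);
}
```

The debug/release gap the post observed is expected: `cargo run` builds with the unoptimized `dev` profile by default, while `cargo run --release` enables the optimized `release` profile, and iterator-heavy code like the above benefits enormously from those optimizations.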

But I think I can still do better. Can I use CUDA?..