More

sereja · on March 16, 2024

The Russian version is more about algorithm design 101 and 102 (similar to cp-algorithms.com). I used to do competitive programming, and I co-founded an educational nonprofit where I also taught it for a few years. I organized my lecture notes and put it on a website, which is now used as a textbook by most CS students in Russian-speaking countries.

I do intend to translate it someday, but it has nothing to do with performance engineering :)

sereja · on March 16, 2024

Author here. I published most of these in one batch two years ago, and this is a relatively short time for compilers and libraries to catch up (when Daniel Lemire publishes something, it also usually takes a few years before it makes its way to the standard libraries, and he is much more involved in the downstreaming process than I am).

In my opinion, the main challenges are:

1. Lack of suitably narrow abstractions. E.g., my "binary search" implementation requires building a small static structure (6-7% of the original array size), and although it is extremely hard to imagine a practical use case where you get a sorted array but can't spend linear time on its preprocessing, it's technically not a drop-in replacement for std::lower_bound.

2. Backwards compatibility. E.g., std::set has a requirement that a node deletion should not invalidate other iterators, which makes it harder to implement it as a B-tree, which has to move a lot of keys around after a node merge/split.

3. Performance regressions. Sometimes a change can make a program 2x faster in most use cases but 1.5x slower in some specific one. If the hardware handling that use case was already at 90% capacity, it will now start to fail after the upgrade, while a 2x improvement on other use cases is just "nice" and doesn't offset it.

4. Vagueness of "better". There are about 10 blog posts on the internet now claiming they have designed the fastest hash table in the world—and every one of them is right because they are using different benchmarks tailored to their specific data set and their specific hardware.

5. Desire to implement things more generically in the middle-end of a compiler instead of the standard library, which is much harder to do. You don't want to hand code the optimal SIMD procedure for calculating the sum of an array for each CPU microarchitecture; you want the compiler to do it automatically for everything that resembles a simple "for" loop. This also leads to a diffusion of responsibility, with compiler people and library maintainers arguing over the appropriate place for an optimization to be implemented.

6. Lack of incentives. Most people who can implement these optimizations work for big tech and would look better in their performance review by contributing to their employer's library (e.g., Abseil for Google, Folly for Meta), or at least to a library with a less Kafkaesque review process like Boost, rather than the standard library.

7. Things still being in the research stage. For example, I recently discovered (but haven't published yet) a new GCD algorithm that seems to yield another ~2x improvement over binary GCD (~4x over std::gcd), and so the guy who recently pushed it in libc++ has in a certain sense wasted work.

I haven't rerun benchmarks myself, but I believe some relatively decoupled parts of the STL have actually since been upgraded in some compilers (std::lower_bound is now branchless, std::gcd now uses binary GCD, std::accumulate and similar reductions now use instruction-level parallelism when they see it) although in all these cases I didn't discover but at most only popularized them.

vlovich123 · on March 16, 2024

For the binary search index speed up, is that documented somewhere? The English version talks about Eytzinger layout but I think you’re referring to something else since that’s not an ancillary array size.

Great job on this book. Lots of great content that can be used as reference.

sereja · on March 16, 2024

https://en.algorithmica.org/hpc/data-structures/s-tree/

https://www.youtube.com/watch?v=1RIPMQQRBWk

sereja · on March 16, 2024

Author here.

For a perfect drop-in replacement of std::lower_bound, the best you can do without breaking anything is to make the search branchless and maybe add some prefetching [1]. Some compilers actually try to implement std::lower_bound this way, but these things can sometimes break from version to version because there are no semantics to reliably make compiler use predication instead of branching.

Your intuitions are very much correct: you can get rid of pointers in a B-tree and make it static and implicit and fast, especially if you use SIMD to search for the lower bound within a node [2], but it would technically not be a replacement to std::lower_bound as we need to build an additional structure (even though it's very hard to imagine a scenario where you obtain a sorted array but can’t afford to spend linear time on preprocessing). C++23 has since added std::flat_set, which seems to be an appropriate place to implement it (in the article I compared against std::lower_bound because neither I nor the vast majority of the readers knew what std::flat_set was).

You can also add pointers back to support insertion and deletion with a moderate penalty to performance [3], but this dynamic B-tree is also technically not a replacement to std::set because of the extra pointer invalidations when a node merges or splits (even though in most cases you don't need pointer stability). You can fix it by, e.g., storing separate pairs of pointers so that each iterator knows where its key is in the tree and vice versa. That would add some overhead (especially in terms of memory) but make it compliant with the standard and still quite a bit faster and lighter than std::set.

The three articles combined are like 50 pages long so for a tl;dr version you might be interested in a talk I did at CppCon [4]. You can also extend the trick for heaps, ropes, segment trees, and other tree-like structures. There is a lot of work to be done here.

[1] https://en.algorithmica.org/hpc/data-structures/binary-searc...

[2] https://en.algorithmica.org/hpc/data-structures/s-tree/

[3] https://en.algorithmica.org/hpc/data-structures/b-tree/

[4] https://www.youtube.com/watch?v=1RIPMQQRBWk

tmyklebu · on March 16, 2024

First, what a great resource you've put together! You're presenting a lot of useful material clearly and concisely, together with the reasoning behind it. I wish I had this when I started doing performance work.

> For a perfect drop-in replacement of std::lower_bound, the best you can do without breaking anything is to make the search branchless and maybe add some prefetching [1]. Some compilers actually try to implement std::lower_bound this way, but these things can sometimes break from version to version because there are no semantics to reliably make compiler use predication instead of branching.

What's imperfect about a radix-4 (or higher) search instead of a binary search for replacing std::lower_bound? For a radix-k search, you'll have k-1 fetches in flight before you decide which subarray to recurse on, and your dep chain will be log_2(k) times shorter.

sereja · on March 16, 2024

When you add prefetching (that is, compare against the middle element and fetch both the middle of the left half and the middle of the right half ahead of time) you are essentially doing radix-4 search, just with fewer actual comparisons.

(You can prefetch k layers ahead for radix-2^k search, but searching in small arrays will become more expensive this way.)

I didn't benchmark it, but I guess on mid-to-large arrays it would actually work somewhat slower than prefetching because it is more instruction-heavy, and, more importantly, prefetches are "cancelable": if the memory bus is too busy with actual requests, they will be skipped, while in explicit radix-k search you would have to wait for all (k-1) elements even if the middle element happened to be already cached and you already know which elements you need next.

That said, it could probably work with small arrays where caching is not a concern and especially if you optimize for latency and not throughput. You can also try to do the comparisons with SIMD (Wojciech Mula tried something similar and got a small boost: http://0x80.pl/articles/simd-search.html).

ribit · on March 16, 2024

Thanks, this is great! I think this conversation is a great illustration for the idea that data structures can be seen as more fundamental than algorithms.

sereja · on May 7, 2023

Actually, during the Medvedev presidency, there were serious efforts to negotiate visa-free travel and reduced tariffs (which is the essence of what everybody wants, not full integration). If these things were granted, the revolution / civil war in Ukraine, primarily motivated by one part of the country mostly trading with Russia and the other with the EU, probably wouldn't have happened.

hgsgm · on May 8, 2023

That doesn't account for how eastern Ukraine has a much larger percent of ethnic Russians than western, nor Russia's invasions andntakeovers of other former Soviet territories.

sereja · on May 7, 2023

The EU is primarily a trade bloc, and it is in natural competition with the Eurasian Economic Union: countries can have tariff-free trade with either one bloc or the other, but not both at the same time. The current scramble over Ukraine is primarily motivated by the EU and the EAEU economically benefiting from the country being in their respective trade zone.

gumby · on May 7, 2023

The EU is in origin and execution a political entity in which trade is one of its tools. The value of adding, say, Romania to the EU was hardly economic, but important for the anti-war mission. The integration of Turkey would be economically quite significant and valuable, but (sadly) lacks impetus on the political side. It is a NATO member.

Likewise, Ukraine’s prewar economy was so small (midway between Hungary and Slovakia) that it would have been economically irrelevant to the EU but, as we have seen, important in the anti war basis.

The economic size of the Eurasian Economic Union is quite small and I doubt it plays into the planning of anyone but its members and some of its immediate neighbors.

I am from a small country with an economy roughly the size of Russia’s (Australia) and it’s clear that the EU has little interest, positive or negative with trade with Australia. Conversely I have literally never seen mention of the Eurasian version in an Australian newspaper.

thriftwy · on May 7, 2023

The elephant in the room is that if you invite everybody in your anti-war club, but one country, that remaining country will get really nervous and you might as well get a war (what we are seeing).

If the situation is that you can't invite everybody in, then the plan is defective from the start.

carlmr · on May 7, 2023

The EU is quite liberal in accepting countries to the anti-war club, but they need to adhere to some standards regarding human rights, trade, and democracy.

If you crash your own party with some people that don't fit in, you will cause the guests you wanted to have to leave.

Any countries not in the EU don't want to be in the EU. And that's fine.

And looking at what the EU does you don't have to be scared if you don't join. It's a defensive union first and foremost.

Switzerland isn't scared about the EU invading despite largely uncontrolled borders.

egao1980 · on May 7, 2023

Latvia and Estonia are allowed to have Nazi parades and repress Russian speaking population i.e. non-citizens, language police and criminal sentences for pro-Russian activists. EU looks some other way all the time. See for example https://www.independent.co.uk/news/world/europe/outrage-as-s... or https://www.haaretz.com/world-news/europe/2019-03-16/ty-arti...

This is not how human rights and democracy should look like.

hgsgm · on May 8, 2023

The US has Confederacy parades, and also has criminal sentences for being an unregisteres agent of a foreign power.

ghostwriter · on May 8, 2023

> The US has Confederacy parades, and also has criminal sentences for being an unregisteres agent of a foreign power.

No wonder the US did everything to save the nazi collaborator responsible for the massacre of ~100k Poles in Eastern Galicia from a post-war tribunal justice.

https://en.wikipedia.org/wiki/Mykola_Lebed#Post-war_activiti...

hgsgm · on May 8, 2023

Would the EU be on good terms with Germany if a Nazi party member became the current dictator?

That's basically the situation in Russia with Putin after the USSR.

gumby · on May 7, 2023

NATO discussed bringing Russia in during the 1990s, and Russia was invited and did join some NATO regular meetings until relatively recently, but wasn't that interested in joining.

I'm not faulting Russia for their position, just pointing out the thing you are claiming should have happened did indeed happen.

But I will point out that Australia is as rich as Russia yet has not started any wars (except against its own aboriginal population, true of Russia as well) so there is no anti-Australia alliance. Two of the three largest war-mongers in Europe (Germany, France, and until recently UK) have been restrained in a web of agreements and alliances, not seeing them as a threat but as a reassurance. If Russia feels encircled, it should start by looking in the mirror.

red-iron-pine · on May 8, 2023

The Russians didn't want to play 2nd fiddle to the US, and had significant domestic arms production -- a non-trivial amount of industry and GDP -- to look after.

STANAG and interoperability standards meant that the Russian gear would have to change, and the Russian weren't willing to give up the control of their national policy, or put their industry at risk.

It was a non-starter for a lot of reasons.

Yoric · on May 7, 2023

Not really?

I mean, I haven't seen any rumor of war between the EU and Switzerland, or Turkey, or the United Kingdom, or really any other of its neighbors other than Russia.

thriftwy · on May 7, 2023

Have you heard of North Cyprus? Also, North Ireland.

Yoric · on May 8, 2023

That seems to be deviating quite a lot from the concept of the existence of EU causing wars, as you suggested in the message I answered. By a few decades for North Cyprus (or 100+ years, depending from your definition of that conflict) a few centuries in the case of North Ireland.

Or did I misunderstand what you wrote?

thriftwy · on May 8, 2023

EU did not cause the war in Ukraine. Soviets did, by appending Donetsk Krivorozhye Republic to Ukraine SSR.

But, the fact is that expansion of EU and EU influence in Ukraine were conductive of that war instead of being a deterrent.

dragonwriter · on May 8, 2023

> EU did not cause the war in Ukraine.

Correct.

> Soviets did, by appending Donetsk Krivorozhye Republic to Ukraine SSR.

No, the Russian Federation did, by launching an unprovoked war of aggression against Ukraine.

thriftwy · on May 8, 2023

Ukraine did not control Crimea, Donetsk and Lugansk for almost a decade. The north cyprus / north ireland situation existed before the war, which does look idiotic to me presently.

dragonwriter · on May 8, 2023

The war started with the 2014 invasion, not the 2022 escalation.

thriftwy · on May 8, 2023

Okay, so? Another war was started in 1974 by Turkey. I rest the case.

sereja · on April 29, 2023

Also available on YouTube: https://www.youtube.com/watch?v=1RIPMQQRBWk

sereja · on April 29, 2023

Author here. There is a newer and significantly expanded version of the article: https://en.algorithmica.org/hpc/data-structures/binary-searc...

snakey · on April 29, 2023

Thank you for your incredible work! I have been reading Algorithms for Modern Hardware as part of my after-work study and I have found it invaluable—I reference it daily.

I can’t wait for Part II :)

ahefner · on April 29, 2023

Thank you for your article. I had difficulty understanding the "Binary search implementation" section of the original article for various reasons, despite knowing what binary search is, being familiar with the 2k/2k+1 encoding (from binary heaps), and generally thinking I understood what you're trying to do. I wondered if was the only one struggling with this.

Among the difficulties:

"restore the index of the resulting element" - it's not clear what this means until after reading the remainder of the section (many times, in my case, due to other difficulties)

"We compare it against 4, 2, and 5..." - An error? You compared 'x' against 4, 2, and 3, or b[1], b[2], and b[5], but never against 5 - this would be to the right of 4 and a binary search would not visit it.

"..and so we just need to find the number of trailing ones in the binary notation and right-shift by exactly that amount." - You need to shift right that amount plus one. In your example for search(4), after the loop and prior to shifting k=11. In binary this is 1011, which has two trailing ones, but must be shifted right by 3 bits (not 2) to yield k=1 such that b[k] == 4.

"eytzinger: 4 2 5 1 6 3 7 8" - it took me surprisingly long to realize this was describing the permutation of values of a[] to indexes of b[] rather than listing the contents of b[] itself with the zero element elided (which coincidentally are the same for the first two elements, despite otherwise not making any sense). Perhaps brevity forces this sort of thing.

The presence of 8 in the example's array appears to be an error. At index 8 it would be the left child of 1, which it can't be. Letting n=7 and a={1,2,3,4,5,6,7} agrees with your example, whereas including 8 and setting n=8 rearranges things and forces the root to be 5 rather than 4.

WinLychee · on April 29, 2023

Thanks for organizing this information into one place! It's super interesting and really relevant to high performance software. A lot of this material is scattered between conference talks, obscure forums, and hacks in closed-source software.

einpoklum · on April 29, 2023

So, in your implementation, you always reduce the search range to n - n/2 to avoid branches. Why can it not be always n/2 - seeing as you've just examined and ruled out your middle element? i.e. seeing as how n/2 >= (n-1)-(n-1)/2 ?

sereja · on April 29, 2023

This is a tricky part. The middle element is still part of the search range if we go "left" (≥). After we compare against it, the search range length becomes either floor(n/2) or ceil(n/2), in the latter case including the middle element (we will never compare against it again, but it still needs to be the first element of the search range).

To avoid additional checks and branching, we can just always make the next search range length ceil(n/2), effectively adding that middle element to the search range in case we go "right" (<).

sereja · on Jan 26, 2023

You don't need to hack Yandex. The entire monorepostory is synced on every intern's laptop, and you can do whatever your want with the files. I always wondered how it hasn't been leaked before to be honest.

wizzwizz4 · on Jan 26, 2023

> The entire monorepostory is synced on every intern's laptop,

If they're still using Subversion, why would that be? Subversion lets you check out just part of a repo. (Then again, interns, I guess.)

COMMENT___ · on Jan 26, 2023

1. Subversion lets you configure granular permissions.

2. Subversion lets you check out portions of your repository.

3. Subversion working copy does not have all regions history. Just a snapshot of the checked out revision.

So at least Subversion would have allowed them to limit the amount of leaked data + no revision history leaked.

Meegoo · on Jan 26, 2023

They don't. SVN is an option, but the vast majority use in-house VCS called Arc.

flas9sd · on Jan 26, 2023

my cards for the source on the leak are more on dissent than outside party infiltrating. If you were brought up on working in open source the last 20-30 years with contributors from all over the world, getting into the friend/foe mindset is much harder.

sereja · on Sept 21, 2022

The mobilization only includes people who formerly served in the military (at least 1-2 years of training). A certain percentage of them are "reservists" who work normal jobs but are summoned for a month-long military training every year to be ready in the case of war, and so they will probably be prioritized (at least according to Putin & Shoigu).

This is not cannon fodder. Russia is running out of willing soldiers, not soldiers in general.

sereja · on Aug 30, 2022

Interestingly, the disdain for democracy in both Russia and China is strongly motivated by "we've already tried giving people freedom and it didn't work".

https://en.wikipedia.org/wiki/Warlord_Era

https://en.wikipedia.org/wiki/1993_Russian_constitutional_cr...

dundarious · on Aug 31, 2022

China also learned a lot from the brutal liberalization in the breakup of USSR. Their own shift to markets was explicitly designed to avoid such a catastrophe. The book "How China Escaped Shock Therapy" on the topic is interesting.

_abox · on Aug 30, 2022

Warlordism has nothing to do with democracy. Each warlord was just their own little dictator fighting for expansion. The people were just cannon fodder.

In fact the Chinese did manage to create a thriving democracy after the warlord era, which is still here today. But it's based in Taipei and the mainland Chinese leaders want to take it down because it undermines their narrative.

And the Russians having the easiest beginnings of democracy for a few months claiming they "tried it and it didn't work"? Never heard them say it but if they did it's just dogma.