Hacker News | varrakesh's comments

China hasn't done anything with Taiwan other than saber-rattling. Hong Kong, Xinjiang, etc. are all part of China.

The US is (mostly) protective of its citizens but (depending on administration) varyingly hostile to outsiders (immigrants, starting wars, etc.).

China is suppressive towards its own citizens, but has been largely peaceful with other countries and immigrants/visitors. (Granted, China has far fewer immigrants than the US, so the two are not directly comparable.)


This isn’t well-specified enough to be a real challenge. They say it can be a server, but not too beefy of a server. What does that even mean? If I put eight NVMe SSDs in a 128-core server, is that too beefy? What about 64 cores, or 16?

Can I know (or bound) the number of orders or products in advance to preallocate? Can I design the dataset myself with certain assumptions (e.g. sorted with respect to time)? Can I bound certain aspects of the dataset (e.g. orders must not contain more than 255 products, orders always contain the prices of everything, etc.)?

Latency apparently isn’t a factor, so if I’m processing 1B records, do we care how quickly it gets done? If not, I’ll just stream the data off to a GPU and get the results later.


Hi Varrakesh, the reason it is not "well specified" is that all of your suggestions are interesting to try out and benchmark. Rather than saying "it has to be exactly like this", we have left it more open-ended by asking "what would it take to get to 1 billion records per second?".

The answer might be different on different types of hardware, with different types of data sets, and with different types of data set sculpting. Yes, it is okay to have one benchmark where there are no more than e.g. 255 products or 255 customers, but then we should probably also benchmark with e.g. up to 65,536 products and 65,536 customers, and up. Part of achieving high-performance data streaming is the ability to make your data small.
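To make the "make your data small" point concrete, here is a minimal sketch using a hypothetical order record (customer ID, product ID, price in cents); the field names and layout are illustrative assumptions, not part of the challenge:

```python
import struct

# Hypothetical order record: (customer_id, product_id, price_in_cents).
# With <= 255 customers/products, each ID fits in one byte ("B");
# with <= 65,536 of each, the IDs need two bytes ("H"). The "<" prefix
# requests standard sizes with no alignment padding.
small = struct.Struct("<BBi")   # 1 + 1 + 4 = 6 bytes per record
large = struct.Struct("<HHi")   # 2 + 2 + 4 = 8 bytes per record

record = (42, 17, 1999)  # customer 42 bought product 17 for $19.99
assert len(small.pack(*record)) == 6
assert len(large.pack(*record)) == 8

# At 1 billion records/second, every byte saved per record is ~1 GB/s
# less bandwidth the streaming engine has to move.
print((large.size - small.size) * 1_000_000_000 / 1e9, "GB/s saved")
```

The same reasoning extends to dictionary-encoding strings, delta-encoding timestamps, and so on; the cardinality bounds in the benchmark directly determine how small the records can get.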

It would also be okay to use a GPU, although we have no plans (yet) to do that. Still, it would be very interesting to see what kind of results you could get with that design.

We just have one requirement: the data streaming engine must not be designed exclusively for this challenge. It must be a reasonably functional, general-purpose data streaming engine.

By the way, we hope to reach the 1 billion records per second (BRS) milestone on a single server: an i7-6700 quad-core Skylake CPU with 2 NVMe SSDs mounted in RAID 1. 1 GB of memory to run the benchmark app should be enough, but the server will probably have 64 GB by default.
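A rough sanity check on that target, assuming roughly 3.5 GB/s sequential read per NVMe drive (an assumed figure; actual throughput depends on the specific drives), shows how little room per record is left if the data must come off disk:

```python
# Back-of-envelope: how many bytes per record can we afford at 1 BRS?
# Assumed, not measured: ~3.5 GB/s sequential read per NVMe SSD.
# RAID 1 mirrors the data; some setups can split reads across both
# mirrors, so the best case is ~2x a single drive's read bandwidth.
drive_read_gbps = 3.5                      # GB/s per drive, assumed
drives = 2
best_case_read = drive_read_gbps * drives  # 7.0 GB/s if reads stripe
worst_case_read = drive_read_gbps          # 3.5 GB/s otherwise

records_per_second = 1_000_000_000
print(best_case_read * 1e9 / records_per_second, "bytes/record, best case")
print(worst_case_read * 1e9 / records_per_second, "bytes/record, worst case")
```

Under these assumptions there are only about 3.5 to 7 bytes of disk bandwidth per record, which is consistent with the emphasis above on making records small (or keeping hot data cached in the 64 GB of RAM).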

