
The true killer feature in my book is the compression: I get ~93% for my largest datasets, which makes disk space requirements far more sane and dramatically speeds big analytical queries. You’re right that if you have a small (<1GB) table, Timescale is overkill, but shrinking a 70GB table to 5GB is quite a big deal if you’re trying to run on a budget VPS.

Continuous aggs and the job daemon are really nice bonuses too.



PG arrays are also compressed on disk, so it may not be all that different from Timescale. So yeah, if you store data in ts|id|value row format it's 70GB, but in an id|values layout the values array is all compressed. I've seen a 40x disk size reduction in practice using this method.
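For illustration, a sketch of the two layouts being compared (table and column names are made up; `vals` is used because `values` is a reserved word in Postgres):

```sql
-- Row-per-sample layout: one row per (timestamp, series) pair.
-- Narrow rows, lots of per-row overhead, no TOAST compression.
CREATE TABLE samples_rows (
    ts    timestamptz      NOT NULL,
    id    bigint           NOT NULL,
    value double precision
);

-- Array layout: one row per series. A large vals array is moved
-- to TOAST storage and compressed once it exceeds roughly 2 kB.
CREATE TABLE samples_arrays (
    id   bigint PRIMARY KEY,
    vals double precision[]
);
```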


I’m surprised by that 40x - I remember benchmarking TOAST compression and being very underwhelmed by the savings (I believe it took me from ~60GB to ~20GB).

What sorta impact does that schema have on insert performance? I would expect the DB to have to rewrite the entire array on every update, though that cost could be mitigated by chunking the arrays.

Are you ditching timestamps and assuming data is evenly spaced?


In the real-life situation where I tried this, the keys were very wide: compound, with text keywords. That's where most of the savings came from, since before the change each row was mostly key with not much actual associated data. Sorry to mislead you about TOAST compression.

It was daily data too, literally 10 or more orders of magnitude less ingestion than Timescale is built for, so the arrays fit inside 1-2 Postgres pages and writing was absolutely not a problem, even for 5-10 years of data.

Timescale may be the right solution when you need, I quote from their docs, "Insert rates of hundreds of thousands of writes per second", or somewhere within a few orders of magnitude of that. Good for the niche, but the niche is uncommon.

Yes, ditching timestamps. The full row configuration used Amazon keywords as keys, so they looked something like start_date|end_date|company_id|keyword_match_type|keyword_text|total_sales_by_day. The match_type and keyword_text were like 40 bytes per row, so that's where the huge savings came from.

The data was assumed to be contiguous, aka there's end_date-start_date+1 entries in the total_sales_by_day array.
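A sketch of that layout (column names approximated from the description above, types guessed). The contiguity assumption is what makes point lookups cheap: Postgres arrays are 1-indexed, so day d lives at element (d - start_date) + 1.

```sql
CREATE TABLE keyword_daily_sales (
    start_date         date      NOT NULL,
    end_date           date      NOT NULL,
    company_id         bigint    NOT NULL,
    keyword_match_type text      NOT NULL,
    keyword_text       text      NOT NULL,
    -- exactly (end_date - start_date + 1) entries, one per day
    total_sales_by_day numeric[] NOT NULL,
    PRIMARY KEY (company_id, keyword_match_type, keyword_text, start_date)
);

-- Fetch a single day's sales by offsetting into the array
-- (date - date yields an integer number of days in Postgres).
SELECT total_sales_by_day[('2023-06-15'::date - start_date) + 1]
FROM keyword_daily_sales
WHERE company_id = 42
  AND keyword_match_type = 'exact'
  AND keyword_text = 'widget';
```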

If this data were in Timescale, the giant keys would be compressed on disk, but I believe (would need to check) that there would be a lot of noise in memory/caches/processing after decompressing while processing the rows.

Anyway, in conclusion, I do think Timescale has its niche uses, but I've seen a lot of people think they have time series data and need Timescale when they really just have data that is pre-aggregated on a daily or even hourly basis. For these situations Timescale is overkill.


Yeah, fully agreed with the overall thesis. Postgres is quite capable straight out of the box.

I really like the idea of TOAST compression as an archival format too for big aggregates - I’ll have to check out the performance of a (grouping_id, date, length_24_array_of_hourly_averages) schema next time I get an excuse.
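One possible shape for that experiment (hypothetical names). One caveat worth checking: a 24-element float8 array is only ~200 bytes, which is below the ~2 kB TOAST threshold, so per-row compression may not kick in unless the arrays are widened (e.g. a month of hourly values per row).

```sql
CREATE TABLE hourly_rollups (
    grouping_id     bigint NOT NULL,
    day             date   NOT NULL,
    -- one average per hour of the day
    hourly_averages double precision[] NOT NULL
        CHECK (array_length(hourly_averages, 1) = 24),
    PRIMARY KEY (grouping_id, day)
);
```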



