
Very cool stuff. Is this some kind of lighter-weight duckdb-wasm? Did I get this right?


Plenty of stuff in common with dbt's philosophy. One big difference though: dbt does not run your compute or manage your lake. It orchestrates your code and pushes it down to a runtime (90% of the time, Snowflake).

This IS a runtime.

You import bauplan, write your functions, and run them straight in the cloud - you don't need anything more. When you want to make a pipeline, you chain the functions together, and the system manages the dependencies, the containerization, the runtime, and gives you git-like abstractions over runs, tables and pipelines.
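To make "chain the functions together and the system manages the dependencies" concrete, here is a toy pure-Python sketch of the idea. The decorator and runner names (`step`, `run`) are invented for illustration; this is not bauplan's actual API:

```python
# Toy illustration of chaining functions into a pipeline whose
# dependencies are resolved for you. `step` and `run` are made-up
# names for this sketch, not bauplan's real API.

_registry = {}

def step(*, depends_on=()):
    """Register a function as a pipeline step with named dependencies."""
    def wrap(fn):
        _registry[fn.__name__] = (fn, tuple(depends_on))
        return fn
    return wrap

def run(target):
    """Resolve dependencies depth-first and run each step exactly once."""
    results = {}
    def resolve(name):
        if name in results:
            return results[name]
        fn, deps = _registry[name]
        results[name] = fn(*(resolve(d) for d in deps))
        return results[name]
    return resolve(target)

@step()
def raw_rows():
    return [{"price": 10}, {"price": 25}, {"price": 40}]

@step(depends_on=["raw_rows"])
def expensive(rows):
    return [r for r in rows if r["price"] > 20]

@step(depends_on=["expensive"])
def total(rows):
    return sum(r["price"] for r in rows)

print(run("total"))  # 65
```

The point of the abstraction: you only write plain functions and declare what each one consumes; ordering, fan-in, and (in the real system) containerization and data movement are the platform's problem.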


I see, this is a great answer. So you don't need any platform or Spark or anything? Just storage and compute?


You technically just need storage (files in a bucket you own and control forever).

We bring you the compute as ephemeral functions, vertically integrated with your S3: table management, containerization, read/write optimizations, permissions, etc. are all done by the platform, plus obvious (at least to us ;-)) stuff like preventing you from running a DAG that is syntactically incorrect.
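The pre-run DAG check mentioned above can be as simple as a cycle-detection pass over the dependency graph before any compute is scheduled. A stdlib-only sketch (the `{step: [dependencies]}` graph shape is an assumption for illustration, not bauplan's internal representation):

```python
# Reject a pipeline whose dependency graph contains a cycle,
# before anything runs. Graph format assumed for illustration:
# {step_name: [names of steps it depends on]}.

def find_cycle(graph):
    """Return a list of steps forming a cycle, or None if the DAG is valid."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / in progress / done
    color = {node: WHITE for node in graph}
    stack = []

    def visit(node):
        color[node] = GRAY
        stack.append(node)
        for dep in graph.get(node, ()):
            if color.get(dep, WHITE) == GRAY:
                # Back edge: the slice of the stack from dep is the cycle.
                return stack[stack.index(dep):] + [dep]
            if color.get(dep, WHITE) == WHITE and dep in graph:
                found = visit(dep)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None

good = {"clean": ["raw"], "raw": [], "report": ["clean"]}
bad = {"a": ["b"], "b": ["a"]}
print(find_cycle(good))  # None
print(find_cycle(bad))   # ['a', 'b', 'a']
```

Failing fast here is cheap insurance: a broken graph is caught in milliseconds instead of after minutes of cloud compute.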

Since we manage your code (compute) and data (lake state, through git-for-data), we can also provide full auditing with one-liners: e.g. "which specific run changed this specific table on this data branch?" -> bauplan commit ...
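Conceptually, that audit query needs little more than an append-only write log keyed by branch and table. A hypothetical sketch of the bookkeeping (field names, run ids, and commit ids are all invented for illustration):

```python
# Minimal sketch of "which run changed this table on this branch?".
# Every write appends a record; the audit query is a reverse scan.
# All names and ids below are invented for illustration.

log = []

def record_write(branch, table, run_id, commit_id):
    log.append({"branch": branch, "table": table,
                "run": run_id, "commit": commit_id})

def last_change(branch, table):
    """Return the most recent (run, commit) that touched a table on a branch."""
    for entry in reversed(log):
        if entry["branch"] == branch and entry["table"] == table:
            return entry["run"], entry["commit"]
    return None

record_write("main", "orders", "run-001", "c0ffee")
record_write("dev", "orders", "run-002", "deadbf")
record_write("main", "orders", "run-003", "f00d42")

print(last_change("main", "orders"))  # ('run-003', 'f00d42')
```

Because the same system owns both the runs and the table commits, the join between "what ran" and "what changed" is always available, which is what makes the one-liner possible.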


- Yes, it is a service, and at least the runner will stay that way for the time being.

- We are not quite live yet, but the pricing model is based on compute capacity and is divided into tiers (e.g. small = 50GB for concurrent scans = $1,500/month; large can go up to a TB). Infinite queries, infinite jobs, infinite users. The idea is to have very clear pricing with no sudden increases due to volume.

- You do not have to swap your storage - our runner comes to your S3 bucket, and your data never has to be anywhere other than your S3.

- You do not have to swap your orchestrator either. Most of our clients actually use it with their existing orchestrator: you call the platform's APIs, including run, from your Airflow/Prefect/Temporal tasks: https://www.prefect.io/blog/prefect-on-the-lakehouse-write-a...
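Wiring this into an existing orchestrator mostly means invoking the platform's client from inside a task. A hedged sketch, where the `bauplan run` command and its flags are assumptions for illustration (not the documented CLI), with a dry-run mode so the command can be inspected without anything installed:

```python
import subprocess

# Sketch of calling the platform from an orchestrator task (Airflow,
# Prefect, Temporal, ...). The `bauplan run` command line below is an
# assumption for illustration, not the documented CLI.

def build_run_command(project, branch):
    return ["bauplan", "run", "--project", project, "--branch", branch]

def run_pipeline(project, branch, dry_run=False):
    """In an orchestrator, this body would be the task's callable."""
    cmd = build_run_command(project, branch)
    if dry_run:
        return " ".join(cmd)
    subprocess.run(cmd, check=True)  # raises if the run fails

# Inside an Airflow PythonOperator / Prefect @task you'd just call:
print(run_pipeline("sales-etl", "main", dry_run=True))
# bauplan run --project sales-etl --branch main
```

The orchestrator keeps owning scheduling and retries; the platform owns the compute, which is why no swap is needed on either side.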

Does it help?

