Hacker News | seism's comments

Every hackathon should use this.

A test that runs for 15 hours on a high-powered rig is going to be hard to reproduce or scale. But I think this addresses a widespread concern that affects all kinds of cloud services: what you ping is not necessarily what you get.

My reading of the article is that the first audience for this test is the vendors themselves. The test is long and comprehensive to give the vendor confidence in its own hosting.

You can run the whole suite once at the start for each vendor, then roll through each part of it over a two- or four-week cycle, mimicking regular use. That keeps the evaluation up to date over time.
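A rough sketch of that rolling cycle, in case it helps. The part names and the `rolling_schedule` helper are made up for illustration; the article does not prescribe any of this. It only spreads the parts of a suite evenly across one cycle:

```python
from datetime import date, timedelta

def rolling_schedule(parts, start, cycle_days=14):
    """Spread the parts of a test suite evenly over one cycle.

    Hypothetical helper: returns a list of (run_date, part) pairs,
    one per part, in the order the parts were given.
    """
    if not parts:
        return []
    step = cycle_days / len(parts)  # days between consecutive parts
    return [
        (start + timedelta(days=int(i * step)), part)
        for i, part in enumerate(parts)
    ]

# Example with invented part names and a two-week cycle.
for run_date, part in rolling_schedule(
    ["cpu", "disk", "network", "memory"], date(2025, 1, 1), cycle_days=14
):
    print(run_date.isoformat(), part)
```

Passing `cycle_days=28` gives the four-week variant; the full run at the start would simply be every part on day one.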

That sly remark at 22:40 on the telephone ringing :)

Check out Apertus, the publicly funded model from a research team that goes to great lengths to remove icky content.


You made my day. Thank you!


Beautiful. I am a little worried that you’ll soon lose interest and stop supporting the app without a business model. Do you plan to at least get sponsored, e.g. by some kind of foundation, like ones that are behind mental health initiatives?


Excellent. Starting with the domain name :)


For 6 minutes in 90 days. Not bad!


Wonderful, thanks for sharing.


The "using with AI" support is really interesting, should help bootstrap some serious vibe coding.

