These guard types are great and I've heavily used them in the past. But why codegen them?
E.g. the jwt auth example has some major problems since the verification rules aren't fully specified in the spec. The jwt-token verified rule only checks that the string isn't empty, but it doesn't actually verify that it is correctly parsed, non-expired, and signed by a trusted key. The authenticated-user rule doesn't check that the user-id actually came from the jwt. If you hand-wrote your constructor, you would ensure these things. Similarly, all the other constructors allow passing in whatever values you like instead of checking the connections of the real objects.
By calling the constructor for these types, you are making an assertion about the relationship of the parameter values. If AI is calling the constructor, then it's able to make it's own assertions and derive whatever result it wants. That seems backwards. AI should use the result of tenant-access to deduce that a user is a member of tenant, but if they can directly call `(tenant-access user-id tenant-id true)`, then they can "prove" tenant-access for anything. In the past, we have named the constructors for these types `TenantAccess.newUnverified`, and then heavily scrutinized all callers (typically just jwt-parsers and specific database lookups). You can then use `TenantAccess.{userId,tenantId}` without scrutiny elsewhere.
I think you're right on the substance. A production-grade spec (or guard type) needs stronger assertions than the toy example in the post — predicates for signature verification, claim-binding, and expiry-from-token, at minimum. The example is only illustrating the proof-chain shape, and isn't a good example of a full-fledged JWT validator.
Your underlying point, that calling the constructor is the assertion so AI passing `true` can "prove" whatever — is true of any smart-constructor pattern, including your own `newUnverified` approach. The trust still has to live somewhere. In your pattern it lives in the small set of audited callers; in shengen's case it lives in the same place — the wrappers (like `CheckTenantAccess`) that actually establish the premise via a DB query or a JWT parse. Structurally the two approaches are doing the same thing. To harden it, you'd keep the raw constructors package-private and export only the wrappers, so the handler code the LLM is writing physically cannot call NewTenantAccess(..., true) — only CheckTenantAccess.
On the deeper question about "why codegen": the short answer is "obviously, you don't have to." But if we assume that we're using AI to write at least some of the code, now you have to either (1) describe the constructor in very precise English and have the LLM generate it, (2) inject yourself into the loop closely with the LLM, or (3) not use an LLM for this part. My proposition is that writing the core invariants as proofs that can be deterministically checked for internal consistency and written declaratively is (1) more efficient, (2) less lossy, and (3) easier for the developer to read and reason about than writing the constructor from scratch. This puts a lot of trust in the codegen, as you point out; but as a practical matter, having a formal representation of what you want plus an English prompt is stronger context to the LLM anyway.
The other reason I started down this path, which I didn't get into in the post because I haven't figured out yet if it's truly practical, comes from a property specific to Shen: it has a very small kernel that has been ported into a lot of runtimes — Lisp, C, JS, Go, Python, Erlang, Scheme, Java, etc (https://shen-language.github.io/#downloads). That opens up the possibility of writing specs whose predicates run as runtime gates from the same Shen expression, no translation step — and even mixing compile-time and runtime assertions into the same spec. I find this very interesting conceptually, but I'm not sure yet whether it's practically useful for anything.
Specifically it decrements the TTL of routed packets, so hotspot traffic will tend to have a TTL of 63 instead of 64. You could theoretically disable this at the risk of creating infinite routing loops, although android probably makes it inaccessible if the kernel has a setting for it at all, so you might have to rewrite packets in user space.
It has been a long time since I've done this, but:
If your Android is rooted, it's pretty easy to get tethering working. There's magisk modules that can fix the TTL problem and/or disable the hidden carrier-installed software that Android will ask for permission before enabling tethering.
Yes. The original Dapper used 64 bit trace ids and collisions were rarely a problem.
If you don't drop any spans from a trace, you can completely disambiguate a collision since the trace will have two distinct root spans. If you are missing spans, you might have a break in the parent-child links.
Even with infinite retention, your analysis will bucket by time somehow, so a collision might have no effect if the collision doesn't happen at a proximate time. If you are manually looking at traces, it will be very obvious there is a collision unless they happen at the same time.
Also, birthday paradox only expresses probability that there is a collision somewhere, but if you are filtering or looking at single spans, then the probabiliy that you actually see a collision is greatly reduced.
I think for basically all systems, an additional 64-bits has insignificant additional cost, so you may as well prevent collisions, but I think it could be a reasonable tradeoff if it mattered.
I thought it was clear and am also surprised by the reaction (en-US speaker). "Is/are expected" is generally used as a passive-voiced form of "we/they predict" (obviously without having to specify a specific pronoun). E.g. "It's expected to rain tomorrow" means a weather forecast says it will rain tomorrow and usually not that people want it to rain tomorrow.
I wonder if this phrase has different connotations among other English readers? A lot of these comments are fairly early for US timezones.
I don't think US vs. non-US has anything to do with it. It's an ambiguous phrase, whose meaning is usually resolved by context.
"It's expected to rain tomorrow" is a prediction, whereas "students are expected to behave themselves" is an expectation (with consequences, presumably).
In the former case we clearly aren't saying we want it to rain, just that we believe it's likely, whereas in the latter example we are clearly expressing that we do want students to behave.
It's ambiguous because "expect" has two different meanings:
A specific process can use mlockall to keep all its mapped memory resident and prevent swapping or page cache eviction. That's what earlyoom does so that it can stay responsive when memory gets low. It's unfortunately underutilized in other infrastructure. It's also all-or-nothing: everything stays resident until it's munlocked regardless of how frequently it's used.
I had hoped that something like Linux Pressure Stall Information (PSI) would become more useful for low-memory scenarios. E.g. you could put critical processes in a cgroup that could rate-limit swap-outs/evictions so that it was always responsive. There are some cgroup knobs that affect reclamation, but you need a really good guess about how much memory something needs, which makes it hard to use.
This has the wrong explanation of the proposed rseq (Restartable Sequences) solution.
> a Linux kernel facility that lets userspace code detect whether it was preempted or migrated during a critical section and restart it if so. PostgreSQL's spinlock paths would use rseq to detect preemption and retry, avoiding the scenario where a preempted lock holder stalls all waiting backends.
The real proposal is about time-slice extension, which is a feature that uses the abi for rseq but otherwise has nothing to do with retrying critical sections. While a process holds a s_lock, it would set a request bit. If the kernel tries to preempt that thread while the request bit is set, it instead extends the time slice once and returns control back to the thread. It's further explained here: https://docs.kernel.org/userspace-api/rseq.html
If a waymo costs $200k (car+sensors+install labor) and drives 200k miles, then amortizing up-front costs alone are about $1/mile. We don't really know what the TCO of a waymo is, and it's possible it could go down with economies of scale. Rideshare drivers can get paid $1-2/mile although it varies a lot.
That's just the current cost. The long term cost structure should be based on cars that come out of the factory with all the right stuff pre-installed. There's a BOM for some extra components; many of which you might already find in some cars. Otherwise it's just another EV. So, long term the extra cost per mile relative to a driver driving the same car should be cents rather than dollars per mile. And of course if there is no driver, some components like manual controls, dashboards, mirrors, etc. actually become redundant as well. So the total BOM might actually be lower long term.
The driver cost is of course the big saving. And they need breaks as well and don't drive 24x7. A robo taxi only has down time when there are no rides, or for charging and maintenance/cleaning.
Mostly what Waymo is doing currently with customized vehicles is not actually super scalable. But it helps at their relatively small current scale. You wouldn't design a custom vehicle + factory for their current growth rate. That becomes more interesting when they start scaling beyond tens of thousands of new vehicles per year. They are probably in the lower thousands currently.
I think they raised close to 20-30B so far. They say they are doing 500K rides per week. At 15$ per ride that adds up to ~390M/year. That's revenue, not profit. But if they could 100x that by rolling out to more and more cities and larger and larger areas, it's going to add up to annual revenues that add up to more than what they raised. That's not going to happen overnight, obviously. But they seem on a path where they are scaling, optimizing, reducing their cost, and growing.
The risks here are mainly that they won't have the market to themselves. Others are doing robo taxi's too and if any of them starts scaling faster and cheaper, Waymo could hit some growth issues. Also, with multiple companies competing, prices per ride would eventually go down. The next five years are going to be interesting.
> A robo taxi only has down time when there are no rides, or for charging and maintenance/cleaning.
It's wierd to see this fantasy of machines on HN, of all places - that they have no downtime, no additional costs - it's only a savings from employing people), and (not said here) they don't make mistakes.
Lots of machines have far more downtime and cost than people. Many have more maintenance hours than operating hours.
Of course they have downtime. They need to charge. Waymo partners with car rental companies for depot operations where they inspect for damage and clean the cars. Damaged cars need to get repaired.
But, a Waymo car has a huge impact compared to gig drivers. Even ones that do it 12 hours a day. The Waymo fleet is maximally available for both rush-hour busy times plus closing time at bars and nightclubs.
I don't know if they've released updated fleet statistics, but in the past I've been shocked at how small it is relative to the visibility and passenger miles. This indicates that downtime isn't a limiting factor.
Gig drivers optimize, or try to, their individual revenue and other preferences the shape their responsiveness to customer demand. In other words it's two sided market among individuals. Waymo isn't. Waymo optimizes across their whole fleet for revenue, presumably, and customer satisfaction.
> a Waymo car has a huge impact compared to gig drivers. Even ones that do it 12 hours a day. The Waymo fleet is maximally available for both rush-hour busy times plus closing time at bars and nightclubs.
What makes you say that there is a difference, the difference is in Waymo's favor, or that it's so large?
Rideshare drivers also know about rush hour and closing time, and rideshare companies adjust availability for those times. There's no reason to think Waymo will handle it's inventory of rides better than Uber.
> Gig drivers optimize, or try to, their individual revenue and other preferences the shape their responsiveness to customer demand. In other words it's two sided market among individuals. Waymo isn't. Waymo optimizes across their whole fleet for revenue, presumably, and customer satisfaction.
That is really a bizarre notion. Why would Waymo's interests be any more aligned with customers than rideshare drivers? Waymo cars are robots operated by a corporation in another state or country. Rideshare drivers are human beings in the same car, who live in the same city or town.
Of course they operate in the same market, and the influences of that market on supply are the same. But the response of gig drivers and autonomous vehicles to these influences and incentives is obviously not the same. It can't be the same.
Gig drivers have range of needs and preferences. Waymo, same algorithm, same car, same user experience all the time. Forums like HN also tends to have a blind spot about consumer preferences of women. These and other reasons are why Waymo has an outsize impact on a market compared to fleet size.
Why would the people who control Waymo be better at these things than the people who control Uber?
Again, the premise seems to be the fantasy that computers are objective, autonomous gods that don't have biases or errors, none of which is true. And again, it's bizarre to encounter that on HN of all places.
It's a very dangerous fantasy that sacrifices individual, rational thought and places all power in the hands of those who control the computers, like a cult.
I'm a big believer in this workflow and generally agree with all the guidelines about when to do and not do this.
Code review has value, but I don't think we are always honest about the costs. At most places I've worked, there has been an informal understanding that code should be reviewed within 24 hours or so, but there is a world of difference between getting a review within 2 hours and 23 hours.
If you have to go back and forth a second time, it's so much more likely that the approval comes much later due to straddling end-of-day in someone's timezone.
Tangentially, if you are designing policies for code review at your org, try to think both about minimum requirements (e.g. PRs should get a first look within 24 hours) and typical requirements (e.g. most PRs should get reviewed within 2-3 hours). What typically happens is what determines overall velocity for your org, so it's much more important than whatever strict SLA policy you have. These are also fixable problems if you have targeted conversations with people. E.g. "I noticed X, Y, Z times, you had unreviewed PRs at the end of the day. Can you set aside some time to review code before EOD? Or try installing PR reminder?"
For Americans, 120 kph is ~75mph and 140kph is ~85mph. I think there is a single road in the US with an 85 speed limit, and only some states use 75-80.
I’ve definitely driven 85mph on the I10 in texas. The roads are flat and straight, so it almost makes sense - but you have basically zero margin for error at those speeds. If I recall, the road fatality numbers for texas aren’t exactly good.
I have noticed LLMs have a propensity to create full single page web applications instead of simpler programs that just print results to the terminal.
I've also struggled with getting LLMs to keep spec.md files succinct. They seem incapable of simplifing documents while doing another task (e.g. "update this doc with xyz and simply the surrounding content") and really need to be specifically tasked at simplifying/summarizing. If you want something human readable, you probably just need to write it yourself. Editing LLM output is so painful, and it also helps to keep yourself in the loop if you actually write and understand something.
E.g. the jwt auth example has some major problems since the verification rules aren't fully specified in the spec. The jwt-token verified rule only checks that the string isn't empty, but it doesn't actually verify that it is correctly parsed, non-expired, and signed by a trusted key. The authenticated-user rule doesn't check that the user-id actually came from the jwt. If you hand-wrote your constructor, you would ensure these things. Similarly, all the other constructors allow passing in whatever values you like instead of checking the connections of the real objects.
By calling the constructor for these types, you are making an assertion about the relationship of the parameter values. If AI is calling the constructor, then it's able to make it's own assertions and derive whatever result it wants. That seems backwards. AI should use the result of tenant-access to deduce that a user is a member of tenant, but if they can directly call `(tenant-access user-id tenant-id true)`, then they can "prove" tenant-access for anything. In the past, we have named the constructors for these types `TenantAccess.newUnverified`, and then heavily scrutinized all callers (typically just jwt-parsers and specific database lookups). You can then use `TenantAccess.{userId,tenantId}` without scrutiny elsewhere.
reply