More

mcmatterson · on Jan 21, 2025

As an HTTP server author, this doesn't surprise me.

We've ceded HTTP specification development to the big guys, and in so doing have made it more or less impossible to implement without resources on their scale. Have you looked at RFC 9000 et al? They're monstrously big, far larger than most independent shops could ever hope to economically pull off. The only way to comprehensively implement something of that scale is to have Google level resources to throw entire teams of engineers and years of focus at it.

I've long said that any protocol worthy of being foundational should be reasonably implementable as a fourth-year term project. It doesn't have to be production ready or ergonomic or even generally useful, but if a group of fourth year CS students can't pull an end-to-end implementation together in a semester, the protocol is just too complex. It's not perfect, but it's as good of a yardstick as I've found.

HTTP/1 passes this test easily; you can make a working version of it in about ninety seconds right in your terminal. HTTP/2 looks intimidating at first glance, but it's so much better specified than HTTP/1 that it's almost easier to get to a reasonable implementation. HTTP/3 on the other hand is...... well, weeks (if not months) of work just to get a QUIC foundation working reasonably well enough that you could hope to start iterating on connections from a 'real' peer, and THEN you have to start on RFC 9114. Not to mention that the way it's structured you end up doing most of that work in the dark, hoping that you line everything up just so so that your first Hello World actually works. It's a way of working that is completely at odds with the hacker ethos that the best foundational protocols have in spades, and ends up looking and acting like what it is: a tool by the big guys, for the big guys. The rest of the internet need not apply.

mcmatterson · on April 4, 2024

I'd just mitigated this exact thing in Bandit last month!

https://github.com/mtrudel/bandit/blob/main/lib/bandit/http2...

TBH, from an implementors perspective this is a super obvious thing to cover off. It had long been on my radar and was something that I'd always figured other implementations had defended against as well.

hinkley · on April 4, 2024

Well you know what happens when we assume. You make a front page headline out of you and me.

eek2121 · on April 4, 2024

As someone who worked for a terrible startup that 'assumed' they would have scalability issues, engineered their entire software stack around solving said issues, and ended up with a worthless codebase that nobody could wrap their head around as a result, I feel this comment.

Later they began a small refactor which easily handled the loads they were "assuming" could not be handled in the way that the refactor handled, and it was wildly successful and the code was much simpler to work on.

To developers: don't over engineer. Most languages/frameworks/libraries can handle scale beyond what you'll ever get in your initial implementation. No, you entire website does NOT need to be asynchronous. It is very possible to have too many background jobs. I know this because I've seen the horror. I've also written an entire jobless/synchronous platform that serves millions of users without issue. If you run into scaling issues, that is a good problem to have. Tackle it as it happens.

Bottom line is focus on secure, quality code above all else. Don't make assumptions.

snailscale · on April 5, 2024

The default way we write applications is actually pretty scalable already.

It always hurts to build something that “won’t scale” because it was framed as a negative.

Realizing that something “scales” if it meets your current needs is pretty important.

Framing scale in terms of how many people can work on it, how fast they can work on it, and how well it meets needs is often a better way of considering the “scale” of what your building.

As you said, when request per second becomes a limiting factor you can adjust your scales but doing it from start rarely makes sense (largely because req / sec already scales pretty well)

hinkley · on April 5, 2024

It’s often a fear or trauma response. Nobody wants to spend 6 months out of the year trying to keep the plates spinning, and they definitely don’t want to spend 60 hours a week when things get ahead of them. Everything takes twice as long as we think it will and we don’t trust that we can keep ahead of user demand. Many of us have experienced it or been adjacent, and for a long time after we overreact to that scenario.

Because we don’t trust that we can keep the wheels on.

Over time the memory fades, and the confidence improves, and we get more comfortable with things being okay instead of unassailable. But it can be a rough road until then.

moqmar · on April 5, 2024

Yeah, and out of that fear, people often use stacks that require vast amounts of knowledge to actually keep things working at all, at any scale. Kubernetes is the best example where I don't trust me to keep the wheels on because it's scalable.

mcmatterson · on Feb 6, 2024

The fact that a critical piece of the evidence was cell phone photos sent between workers coordinating door re-assembly doesn't exactly instill a whole lot of confidence in their permit-to-work process. I didn't like it when it was medical teams doing shift handover via a Google Doc, and I don't like it when it's a matter of flight safety either. Or, as Homer might eruditely say: "guess I forgot to put the bolts back in" [1]

[1] (https://www.youtube.com/watch?v=IiNPLIauEig)

ipython · on Feb 6, 2024

This is a puzzling attitude to me. Every time we technologists see a crappy proprietary solution being used for a problem, the first exclamation is, "why not use <commodity solution X>? That's so dumb, they spent $10k on that tool when they could have spent $100 on X!"

There must be a middle ground here- the paradox is that Google, Apple, etc have this ability to generate user friendly software and hardware at scale. But they aren't considered "battle proven". The expensive proprietary systems that are used instead tend to be hard to use and brittle, so what's the middle ground?

michael1999 · on Feb 6, 2024

The issue here isn't using google chat, the accusation is that this was Spirit and Boeing conspiring to not record these in the proper work order system under the pretence that this work was being done by Spirit as-if-it-were pre-delivery.

Read https://www.airlinepilotforums.com/safety/146074-boeing-inte...

And then this from the doc: "The investigation continues to determine what manufacturing documents were used to authorize the opening and closing of the left MED plug during the rivet rework."

https://s3.documentcloud.org/documents/24410269/report_dca24...

ethbr1 · on Feb 6, 2024

A key point I read somewhere in the accusations is that non-Boeing contractors (e.g. Spirit) by policy cannot have access to CMES.

Consequently, you have a system of record that a major party to work doesn't have access to.

As we've all seen, this leads to "actual coordination" being instead done in a system all involved parties do have access to (SAT).

Which inevitably leads to a desync between CMES (SOR by fiat) and SAT (SOR in practice).

michael1999 · on Feb 7, 2024

But isn't that exactly the underlying fraud? Boeing and Spirit are conspiring to cover up Spirit's deficient delivery by allowing Spirit to work onsite at Renton and do post-delivery re-work and pretend that it is as-if delivered, and _outside_ Boeing's system of record. To reject the delivery and wait for Spirit to fix would ruin their delivery schedule, so they fudge it and muddle along.

https://www.seattletimes.com/business/boeing-aerospace/faa-p....

calebpeterson · on Feb 6, 2024

SOR - system of record SAT - ? CMES - ?

ethbr1 · on Feb 6, 2024

They're described in parent's first link. Boeing systems.

gowings97 · on Feb 6, 2024

The data/photos should be in the ERP/MES.

imoverclocked · on Feb 6, 2024

> The investigation continues to determine what manufacturing documents were used to authorize the opening and closing of the left MED plug during the rivet rework.

I mean, there is already a ton of documentation and process surrounding the construction of an airplane. Adding more process doesn't safety make. Having a safety culture without the fear of retaliation, on the other hand, makes a world of difference.

seo-speedwagon · on Feb 7, 2024

If the door was removed (which the NTSB report and the whistleblower post linked elsewhere around here say must have happened) there should be documentation for, at minimum, the removal and reattachment. If the door was not removed but was opened and closed, there should be documentation for both of those actions instead.

I don't know if this should be considered "adding more process" because it has been standard process for a very long time. All work done on an airplane is authorized, by someone, and after completion is recorded, by someone. Discrepancies and deviations from this standard operating procedure are a big deal.

TillE · on Feb 6, 2024

That line stood out to me, because it implies that no proper "manufacturing document" was used for the work. If that's true, that's very bad; unapproved maintenance procedures have been the cause of multiple crashes.

mcmatterson · on Nov 28, 2023

I have this exact kit hosting my house's RPi server!

A couple of things:

0. The physical quality of this build is out of the world good. PCB, plastics, switches, it's all amazing

1. The software as provided has a pretty old school build process (part of the charm?). I tightened up a bunch of it and dockerized it at https://github.com/mtrudel/pibox/tree/main/pidp11

2. I wish the build would have used something like an MCP23017 for IO instead of claiming so many RPi GPIOs. There's only a few (2-3 IIRC) GPIOs unused by the front panel, and the matrix LED/switch scan setup burns a ton of CPU

sleepytimetea · on Nov 28, 2023

I read his description of the LEDs being powered by brute force looping...even though he uses a buffer IC, the CPU still does the monotonous and continuous looping to keep the LEDs intermittently supplied with current.

I wish he had used a few MAX7219 LED driver ICs - they just need 4 IO pins, can be chained together, and don't need any maintenance CPU cycles unless a change in config is desired.

They also ensure perfect brightness control, and are current controlled LED drivers instead of using 330 Ohm resistors that waste power...all power goes to illuminating the LEDs and none is wasted as heat in resistors.

Maybe I should write him and offer my help in switching for the next spin of the PCBs.

mcmatterson · on April 30, 2023

BMath (CS) 2002 here. This description is spot on the way I think about development, and is a bit of a superpower to be able to do well. I'm not totally sure it's a UW-ism though. I can certainly recall a couple of very formative Tompa courses where he impressed the importance of taking a data-first view of design, and I think we had a stronger bias towards data structures than most other schools whose grads I've worked with. But overall I think that sentiment grew weaker in my upper years, when a more conventional algorithms approach took over.

I will say though, that I've also noticed the contrast before with MIT grads, who tend to have a very strong LISP bent to their styles. It's true that each school has their own unique flavour, and much like accents it may just be that you don't notice your own.

andersentobias · on April 30, 2023

I understand "Lisp-style" as building (stateless) mini-APIs early and everywhere.

Isn't this very data-centric in nature?

mcmatterson · on Dec 31, 2022

Bandit author here. Correct! The byline of Bandit is ‘a web server for Plug applications’ and being able to focus on that narrowed set of requirements is a large part of where the perf boost comes from (less code, easier to reason about, fewer processes, etc).

mcmatterson · on Dec 31, 2022

The PR you linked to adds support for generating new phx apps with the relevant change already incorporated; it’s just a generator change. We’re waiting a bit to incorporate this change to ‘soft launch’ Bandit support.

mcmatterson · on Dec 31, 2022

Bandit author here. We’re fully supported on phx 1.7+; it’s a one line change to your existing app, described on the Bandit readme!

mcmatterson · on Dec 7, 2021

Fascinating talk. The overall approach here strikes me as being extremely influenced by QNX's design. Send/Receive/Reply as a messaging primitive is too often overlooked, and provides incredibly powerful that (as Cliffe mentions) renders an enormous amount of scheduling complexity as moot.

Anyone who's done the UWaterloo trains course will recognize these patterns immediately, and (IIRC) interrupt dispatching is was done in a similar manner there as well.

Finally, the supervision patterns here strike me as being very similar to those within the BEAM, and remind me of the infamous quote from Robert Virding (http://erlang.org/pipermail/erlang-questions/2008-January/03...). Obviously a necessary reimplementation here, but humorous nonetheless.

Great project, great talk.

mcmatterson · on Dec 7, 2021

I couldn't stop thinking about this, so I went back and dug up an ancient (20yo+) copy of my OS implementation for the trains course (mat(t)OS). Sure enough, we did indeed dispatch the upper half of interrupts to processes, albeit via a dedicated blocking syscall rather than send/receive/reply semantics. Bottom half handlers (which were implemented behind task gates and had persistent stacks) just did the standard bottom half stuff: disabling interrupts & managing state so we could properly unroll after the interrupt had been handled.

What a throwback!

Interrupt code is at https://gist.github.com/mtrudel/c29fa60e5b2f3b6fdc46a9e3c65d.... I've been meaning for years to clean this stuff up and resurrect it. Maybe this is the kick in the ass I need to finally do so!

steveklabnik · on Dec 7, 2021

QNX is an explicit influence, yes.

bcantrill · on Dec 7, 2021

And, transitively, Thoth![0]

[0] https://en.wikipedia.org/wiki/Thoth_(operating_system)

mcmatterson · on Dec 7, 2021

https://www.youtube.com/watch?v=031vKBPk5eA

mcmatterson · on Oct 28, 2021

Bandit is an HTTP server, not a client. It performs the same job in the stack as Cowboy does currently (just up to 5x faster as this post's title alludes to).

The Bandit project shares a small amount of code with the Mint project (Andrea & Eric were kind enough to factor out their HTTP/2 header compression library for use in Bandit), but this is mostly due to HTTP being a largely symmetric protocol (ie: there is a lot of functionality used equally by both clients and servers).