These companies fail because vertical integration, and even a monorepo, is needed to make these efforts successful. That is completely at odds with the existing OEM/Tier 1 business model and the engineering process that has grown up around it. Also, neither OEMs nor Tier 1s have software cultures up to the challenge.
This is why the Chinese OEMs, Tesla, and Rivian are able to move fast.
I think NUMA management is high-level enough that in a microkernel it could be managed comfortably in userspace, unlike the things that matter for performance-critical context switches. And seL4 is currently intended only for individual cores anyway.
Under deadline scheduling, every pending job _eventually_ has the highest priority as time elapses (assuming new jobs can't arrive with deadlines in the past), so every job is eventually serviced.
The “pain” experienced in an overload situation is spread among all late jobs. Contrast this with fixed-priority scheduling, where the lowest-priority jobs are starved completely until the overload is resolved.
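A toy discrete-time simulation makes the contrast concrete. The task set, periods, and tick-based scheduler below are all made up for illustration (this is not how any real kernel scheduler is implemented): three periodic tasks together demand 1.5x the CPU, and we compare who gets serviced under fixed priorities versus earliest-deadline-first.

```python
# Toy comparison of EDF vs fixed-priority scheduling under permanent overload.
# Three periodic tasks each need 5 ticks of work every 10 ticks (1.5x capacity).
# All names and parameters are illustrative, not from any real scheduler.

def simulate(policy, ticks=300):
    # (name, period, work_per_job, fixed_priority) -- lower number = higher prio
    tasks = [("A", 10, 5, 0), ("B", 10, 5, 1), ("C", 10, 5, 2)]
    pending = []  # each entry: [deadline, prio, name, remaining_work]
    serviced = {name: 0 for name, *_ in tasks}
    for t in range(ticks):
        for name, period, work, prio in tasks:
            if t % period == 0:
                pending.append([t + period, prio, name, work])
        if pending:
            if policy == "edf":
                job = min(pending, key=lambda j: j[0])   # earliest deadline wins
            else:
                job = min(pending, key=lambda j: j[1])   # highest fixed priority wins
            job[3] -= 1
            serviced[job[2]] += 1
            if job[3] == 0:
                pending.remove(job)
    return serviced

print(simulate("fixed"))  # task C is starved completely
print(simulate("edf"))    # every task gets a share; all of them run late instead
```

Under the fixed-priority policy, A and B consume every tick and C never runs; under EDF, C's deadlines eventually become the earliest in the system, so the lateness is shared among all three tasks.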
That's what I was saying: in the case of insufficient capacity, the available capacity is divided among the priorities in unequal but non-zero portions. The SLO/deadline method would effectively do something similar, since everything would be overdue and the most overdue jobs get the highest priority to run. The only difference is that there's no unequal portioning unless there's additional logic to say that x overdue on job A is more important than x overdue on job B, which amounts to setting priorities in the end.
> The “pain” experienced in an overload situation is spread among all late jobs. Contrast this with fixed-priority scheduling, where the lowest-priority jobs are starved completely until the overload is resolved.
Though this is often not a good way to spread out the pain.
It's probably much worse for the 15s job to be a minute late than for the 8h job to be a minute late, but basic deadline scheduling will treat them the same. So you want sharing, but uneven sharing.
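One way to get that uneven sharing is to rank jobs by lateness relative to their expected duration rather than by absolute lateness. The weighting below is just one possible choice (not something the thread specifies), but it shows why the 15-second job "hurts" more than the 8-hour job at the same absolute lateness.

```python
# Two overdue jobs, each one minute late. Plain deadline scheduling compares
# absolute lateness, so they tie; dividing lateness by expected duration (an
# illustrative weighting, not from the discussion) makes the short job win.

jobs = [
    {"name": "short", "expected_s": 15,       "lateness_s": 60},
    {"name": "long",  "expected_s": 8 * 3600, "lateness_s": 60},
]

# Basic deadline scheduling: rank by absolute lateness -> a tie.
by_lateness = sorted(jobs, key=lambda j: -j["lateness_s"])

# Weighted variant: relative lateness = lateness / expected duration.
by_relative = sorted(jobs, key=lambda j: -j["lateness_s"] / j["expected_s"])

print([j["name"] for j in by_relative])  # the 15s job, 4x overdue, comes first
```

The short job is 400% over its expected duration while the long job is about 0.2% over, so any duration-aware weighting separates them, but choosing the weighting function is exactly the "additional logic" that amounts to setting priorities.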
In terms of processing raw bytes, a more relevant application is running inference directly on raw camera data, bypassing the ISP, which can reduce latency in camera inference pipelines.
SCHED_IDLE is problematic: there is no priority inheritance with SCHED_OTHER threads. A SCHED_OTHER thread can block for a prolonged period when it contends for a kernel mutex with a SCHED_IDLE thread. The SCHED_IDLE thread holds the mutex, but it can't complete its critical section because it keeps getting preempted.
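The inversion can be sketched as a toy model. This is purely illustrative (real SCHED_IDLE/SCHED_OTHER behavior lives in the kernel and is not this simple): an IDLE-class mutex holder only gets the CPU when nothing else is runnable, so a CPU-bound SCHED_OTHER thread preempts it every tick and the SCHED_OTHER waiter never gets the mutex; boosting the holder, as priority inheritance would, lets the critical section finish.

```python
# Toy model of the SCHED_IDLE priority inversion: the holder of the mutex is
# IDLE-class and runs only when no SCHED_OTHER thread wants the CPU. A
# CPU-bound SCHED_OTHER thread is always runnable, so without inheritance the
# holder never finishes its 3-tick critical section. Names and tick counts are
# made up; this is not real kernel code.

def ticks_until_waiter_gets_mutex(inheritance, max_ticks=100):
    holder_cs_remaining = 3            # IDLE thread's critical-section length
    for t in range(max_ticks):
        cpu_hog_runnable = True        # a SCHED_OTHER busy loop, always runnable
        holder_boosted = inheritance   # inheritance lifts the holder to OTHER class
        if cpu_hog_runnable and not holder_boosted:
            continue                   # the hog preempts the IDLE holder every tick
        holder_cs_remaining -= 1       # holder runs and makes progress
        if holder_cs_remaining == 0:
            return t + 1               # waiter can now acquire the mutex
    return None                        # waiter never got the mutex

print(ticks_until_waiter_gets_mutex(inheritance=True))   # 3
print(ticks_until_waiter_gets_mutex(inheritance=False))  # None
```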
That's not strictly true. There are dozens of real-time sleep-based locking protocols that bound worst-case blocking time. Now, even if a suitable protocol is available, your deadlines may be so tight that you can't afford the overhead of context switching; then yes, you may need a lock-free or wait-free synchronization mechanism.
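As a sketch of what "bounded blocking" means, here is the classic bound from the priority ceiling protocol (single-unit resources, no nesting): a task can be blocked at most once, by the longest critical section of any lower-priority task using a lock whose ceiling is at least the task's priority. The task set and section lengths below are invented for illustration.

```python
# Worst-case blocking bound under the priority ceiling protocol (PCP).
# Lower priority number = higher priority. Critical-section lengths (in ms)
# are made-up example values.

tasks = {
    "hi":  {"priority": 1, "cs": {"lock": 1}},   # holds "lock" for 1 ms
    "mid": {"priority": 2, "cs": {"lock": 2}},   # holds "lock" for 2 ms
    "lo":  {"priority": 3, "cs": {"lock": 7}},   # holds "lock" for 7 ms
}

# Ceiling of each lock = highest priority of any task that uses it.
ceilings = {}
for t in tasks.values():
    for lock in t["cs"]:
        ceilings[lock] = min(ceilings.get(lock, 99), t["priority"])

def blocking_bound(name):
    me = tasks[name]["priority"]
    candidates = [
        length
        for t in tasks.values() if t["priority"] > me    # lower-priority tasks
        for lock, length in t["cs"].items()
        if ceilings[lock] <= me                          # ceiling >= my priority
    ]
    # Blocked at most ONCE: take the single longest section, not the sum.
    return max(candidates, default=0)

print(blocking_bound("hi"))  # 7 ms (lo's section), not 2 + 7
print(blocking_bound("lo"))  # 0 ms: nothing of lower priority exists
```

The point is that the bound is a compile-time number you can feed into schedulability analysis, which is what distinguishes these protocols from an ordinary kernel mutex with unbounded wait.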
I'm not sure if we're disagreeing here, but in the context of this article "a lock" is a kernel lock. What that implies will obviously depend on your kernel, but in Linux it means your thread may wait for an unbounded amount of time.