Well, I'm surprised to see this on the front page, let alone as #1. Ask me anyth...

Lerc · on Feb 28, 2022

Since you said anything... This is not strictly related to the article but your expertise seems to be in the right area.

I have a process that executes actions for users. At the moment it runs as root until it receives a token indicating an accepted user; then it fork()s, and the child changes to the UID of that user before executing the action.

Is there a better way? I hadn't actually heard of vfork() before reading this article. I'm guessing maybe you could do a threaded server model where each thread vfork()s. I'm not really aware what happens when threads and forks combine. Does the v/fork() branch get trimmed down to just that one thread? If so what happens to the other thread stacks? It feels like a can of worms.

cryptonector · on Feb 28, 2022

If the parent is threaded, then yes, vfork() will be better. You could also use posix_spawn().
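A minimal sketch of the posix_spawn() route, in Python only because its `os.posix_spawn` is a thin wrapper over the C call. The helper name `spawn_and_wait` and the child command are made up for illustration.

```python
import os
import sys


def spawn_and_wait(argv):
    """Start argv[0] with posix_spawn() and return the child's exit code."""
    # The caller never calls fork() itself: the child is created and the new
    # program is exec'd in a single library call.
    pid = os.posix_spawn(argv[0], argv, dict(os.environ))
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)  # child was killed by a signal


if __name__ == "__main__":
    code = spawn_and_wait([sys.executable, "-c", "raise SystemExit(7)"])
    print("child exited with", code)
```

Because no user code runs between process creation and exec, the threaded parent never executes in a half-copied state.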

As to "becoming a user", that's a tough one. There are no standard tools for this on Unix. The most correct way to do it would be to use PAM in the child. See su(1) and sudo(1), and how they do it.
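For reference, the bare fork-then-change-UID approach from the question might look roughly like the sketch below. It is not a substitute for the PAM work that su(1) and sudo(1) do. The function name `run_as_user` and the exit codes 126 and 128 are assumptions, and `setgroups([gid])` is a simplification (a real service would more likely call `os.initgroups()` with the user's name).

```python
import os


def run_as_user(uid, gid, action):
    """Fork, drop to uid/gid in the child, run action(), return the exit code."""
    pid = os.fork()
    if pid == 0:
        code = 126  # reported if dropping privileges fails
        try:
            # Order matters: groups first, then gid, then uid. After giving
            # up root the process can no longer change its groups.
            os.setgroups([gid])
            os.setgid(gid)
            os.setuid(uid)
            # Verify the drop took effect before running any user code.
            if os.getuid() != uid or os.geteuid() != uid or os.getgid() != gid:
                raise PermissionError("privilege drop did not take effect")
            code = 1  # reported if the action itself fails
            action()
            code = 0
        except Exception:
            pass
        finally:
            os._exit(code)  # never return into the parent's code path
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else 128
```

Run without root, the `setgroups()` call fails and the child reports 126, which is the behaviour you want: refuse to run the action rather than run it with the wrong identity.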

> I'm not really aware what happens when threads and forks combine. Does the v/fork() branch get trimmed down to just that one thread? If so what happens to the other thread stacks? It feels like a can of worms.

Yes, fork() only copies the calling thread. The other threads' stacks also get copied (because, well, you might have pointers into them, who knows), but there will only be one thread in the child process.

vfork() also creates only one thread in the child.
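A small experiment that shows the single-thread child, sketched in Python. Note that `threading.active_count()` is Python's own bookkeeping rather than a kernel count, and the function name is made up for illustration.

```python
import os
import threading
import warnings


def thread_counts_across_fork():
    """Return (threads seen in parent, threads seen in forked child)."""
    stop = threading.Event()
    worker = threading.Thread(target=stop.wait, daemon=True)
    worker.start()
    parent_count = threading.active_count()

    read_end, write_end = os.pipe()
    with warnings.catch_warnings():
        # Python 3.12+ warns when fork() is called while threads are running.
        warnings.simplefilter("ignore", DeprecationWarning)
        pid = os.fork()
    if pid == 0:
        # Child: only the thread that called fork() exists here.
        os.write(write_end, bytes([threading.active_count()]))
        os._exit(0)

    os.close(write_end)
    child_count = os.read(read_end, 1)[0]
    os.close(read_end)
    os.waitpid(pid, 0)
    stop.set()
    worker.join()
    return parent_count, child_count


if __name__ == "__main__":
    print(thread_counts_across_fork())
```

The parent reports at least two threads (the caller plus the worker), while the child reports one.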

There used to be a forkall() on Solaris that created a child with copies of all the threads in the parent. That system call was a spectacularly bad idea that existed only to help daemonize: the parent would do everything to start the service, then it would forkall(), and on the parent side it would exit() (or maybe _exit()). That is, the idea is that the parent would not finish daemonizing (i.e., exit) until the child (or grandchild) was truly ready. However, there's no way to make forkall() remotely safe, and there's a much better way to achieve the same effect of not completing daemonization until the child (or grandchild) is fully ready.

In fact, the daemonization pattern of not exiting the parent until the child (or grandchild) is ready is very important, especially in the SMF / systemd world. I've implemented the correct pattern many times now, starting in 2005 when project Greenline (SMF) delivered into OS/Net. It's this: instead of calling daemon(), you need a function that calls pipe(), then fork() or vfork(). In the fork() case, on the parent side it then calls read() on the read end of the pipe, while on the child side it returns immediately so the child can do the rest of the setup work. Finally, the child writes one byte into the write end of the pipe to tell the parent it's ready, so the parent can exit.
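The fork() variant of that pattern can be sketched as follows. This is a rough illustration, not the SMF code: the names `fork_and_wait_until_ready` and `notify_ready` are made up, and a real daemonizer would also do the usual setsid() and file descriptor housekeeping.

```python
import os


def fork_and_wait_until_ready(child_main):
    """Fork; return (pid, ready) in the parent once the child reports in.

    child_main(notify_ready) runs in the child and should call notify_ready()
    when its setup work is complete.
    """
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(read_end)

        def notify_ready():
            os.write(write_end, b"\x01")  # the one-byte "I'm ready" signal
            os.close(write_end)

        code = 1
        try:
            child_main(notify_ready)
            code = 0
        finally:
            os._exit(code)

    # Parent: close our copy of the write end so read() sees EOF if the
    # child dies before signalling.
    os.close(write_end)
    ready = os.read(read_end, 1) == b"\x01"
    os.close(read_end)
    return pid, ready


if __name__ == "__main__":
    pid, ready = fork_and_wait_until_ready(lambda notify: notify())
    os.waitpid(pid, 0)
    print("child ready:", ready)
```

In a daemon the parent would exit with success when `ready` is true and with failure otherwise, so the service manager learns the real outcome of startup.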

aidenn0 · on March 1, 2022

What about fork(2) for network servers? I've written parallel network servers two ways: open the socket to listen on and call fork() N times for the desired level of parallelism, or just create N processes and use SO_REUSEPORT. I prefer the former. I suppose there is a hidden option C of "have a simple process that opens the listening port and then vfork/execs each worker". I find that a bit strange because the code will be split into "things that happen before listening on the port" (which includes, e.g., reading configuration files) and "things that happen after listening on the port" (which also includes, e.g., reading configuration files).
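For concreteness, the first approach (listen once, then fork N workers that share the socket) might look like this sketch. The name `prefork_server`, the loopback address, and the "reply with my pid" protocol are all assumptions for illustration; each worker handles a fixed number of connections only so the example terminates.

```python
import os
import socket


def prefork_server(num_workers, conns_per_worker):
    """Listen on an ephemeral loopback port, fork workers, return (port, pids)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(16)
    port = listener.getsockname()[1]

    pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:
            # Worker: accept on the listening socket inherited from the parent.
            code = 1
            try:
                for _ in range(conns_per_worker):
                    conn, _addr = listener.accept()
                    with conn:
                        conn.sendall(str(os.getpid()).encode())
                code = 0
            finally:
                os._exit(code)
        pids.append(pid)

    listener.close()  # the workers keep their own copies open
    return port, pids


if __name__ == "__main__":
    port, pids = prefork_server(2, 1)
    for _ in pids:
        with socket.create_connection(("127.0.0.1", port)) as client:
            print("served by pid", client.recv(64).decode())
    for pid in pids:
        os.waitpid(pid, 0)
```

All setup, including reading configuration, happens once before the fork() calls, which is the property that option C gives up.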

ahmedalsudani · on Feb 28, 2022

No questions yet as I am yet to read ... but I can already comment and say grade A title.

cryptonector · on Feb 28, 2022

It's a bit opinionated. It's meant to get a reaction, but also to have meaningful and thought-provoking content, and I think it's correct in the main too. Anyways, hope you and others enjoy it.

ahmedalsudani · on Feb 28, 2022

That was a great read. Thank you for writing it up; I learned quite a few things!

Especially appreciated the OS minutiae and opinionated commentary (... and the doc vs reality observation in Linux's vfork).

The piece lives up to the great title :)

disgruntledphd2 · on Feb 28, 2022

What do you mean by zones/jails and why are they better than containers?

cryptonector · on Feb 28, 2022

Zones -> Solaris/Illumos Zones

Jails -> BSD jails

They're software VMs. They're a lot like containers, yes.

The problem with containers is that the construction toolkit for them is subtractive ("start by cloning my environment, then remove / replace various namespaces"), while the construction toolkit for zones/jails is additive ("start with an empty universe, and add namespaces or share them with the parent").

Constructing containers subtractively means that every time there's a new kind of namespace to virtualize, you have to update all container-creating tools or risk a security vulnerability.

Constructing containers additively from an empty universe means that every time there's a new kind of namespace to virtualize, you have to update all container-creating tools or risk not getting sharing that you want (i.e., breakage).

I'm placing a higher value on security. Maybe that's a bad choice. It's not like breaking is a good thing -- it might be just as bad as creating a security vulnerability.
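A toy model of the difference, with no real kernel APIs involved. The function names and the namespace kind `newkind` are invented purely to show what happens when a tool meets a namespace kind it was never updated for.

```python
def build_subtractive(all_kinds, isolate):
    """Clone everything from the parent, then make the listed kinds private."""
    return {kind: "private" if kind in isolate else "shared" for kind in all_kinds}


def build_additive(all_kinds, share):
    """Start with everything private, then share only the listed kinds."""
    return {kind: "shared" if kind in share else "private" for kind in all_kinds}


if __name__ == "__main__":
    # The tools were written when only these kinds existed...
    known = {"pid", "net", "mount"}
    # ...and then the kernel grows a kind the tools have never heard of.
    kernel_kinds = known | {"newkind"}

    # Subtractive tool: the unknown kind stays shared with the parent.
    print(build_subtractive(kernel_kinds, isolate=known)["newkind"])
    # Additive tool: the unknown kind stays private to the guest.
    print(build_additive(kernel_kinds, share={"net"})["newkind"])
```

The stale subtractive tool silently shares the new kind (a possible security hole), while the stale additive tool silently isolates it (possible breakage).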

ape4 · on Feb 28, 2022

Yes, if we were starting again today, we wouldn't do containers as they are now.