Could you expand on why you think that solves dependency hell?
It sounds like you're describing tree-shaking, which is already commonplace, particularly in the JS ecosystem, or LTO dead-code elimination in compiled languages. Neither of those solves dependency hell.
Dependency hell happens when code is subject to an external dependency. Within an application, you can determine which of your functions calls some other version of a function/dependency, remove code, add code, etc. But you can't do that to other applications; you only control your own application's code and what it depends on / is built against / tested against. So every application is effectively locked into its view of the world, and it just has to hope that never deviates, and that every system it runs on is set up exactly the same. You can play fast and loose with your code by trying to keep the same API/ABI as your underlying code changes, but subtle differences will eventually cause bugs/regressions when expected behaviors change.
The way to solve the problem of dependency hell is to version every function, only call functions based on their versions, and ship every old version of every function. Then the application itself, or a dynamic linker, must find and execute the correct version of the function it needs to call. In this way, dependencies can change at any time, but every line of code will only ever call the functions it was originally built and tested against, because those versions are still knocking around somewhere in the dependency. You can upgrade a dependency 50 times, but your application will still keep calling the old version of the dependency's function, so the behavior of your application remains the same while the dependency receives upgraded functions. You can upgrade any application at any time and it will just pull in the latest dependency, and continue to call the versions of that dependency that it needs.
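To make the idea concrete, here's a minimal sketch in Python (all names are hypothetical; a real implementation would live in the compiler/linker, not a dict):

```python
# Hypothetical sketch: a registry that keeps every version of every
# function, so call sites can pin the exact version they were built against.
REGISTRY = {}

def provide(name, version):
    """Register an implementation under (name, version), keeping old ones."""
    def wrap(fn):
        REGISTRY[(name, version)] = fn
        return fn
    return wrap

def resolve(name, version):
    """What a versioned linker would do: return the pinned implementation."""
    return REGISTRY[(name, version)]

@provide("parse", "1.0")
def parse_v1(s):
    return s.split(",")  # old behavior: naive split

@provide("parse", "2.0")
def parse_v2(s):
    return [t.strip() for t in s.split(",")]  # new behavior: trims whitespace

# An application built against parse 1.0 keeps calling 1.0 forever,
# even after 2.0 ships:
app_parse = resolve("parse", "1.0")
print(app_parse("a, b"))  # ['a', ' b'] -- behavior frozen at build time
```

The point is that nothing is ever deleted, only added; the 1.0 caller's behavior is frozen even though 2.0 exists alongside it.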
The basic concept for this already exists in glibc: symbol versioning lets you ship version-specific functions and link/call a specific version. But almost no one uses it outside libc itself. To get widespread adoption and make the paradigm implicit/automatic, we would need to modify the programming language, the compiler, and the way programs are executed; it would require lots more storage, and probably new paradigms of caching, layering, and shipping code. Package deps would become metadata in a build process, and package management would become "scan all executables for dependent versions & download them all".
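A sketch of what that "scan & download" step could look like, with invented manifest data (an illustration of the scheme described above, not any real package manager):

```python
# Hypothetical: every binary carries a manifest of the exact
# (function, version) pairs it calls. "Installing" is just the union of
# those manifests minus what's already on disk.
MANIFESTS = {
    "app1": {("libfoo.parse", "1.0"), ("libbar.hash", "3.2")},
    "app2": {("libfoo.parse", "2.0"), ("libbar.hash", "3.2")},
}

def needed_downloads(installed, manifests):
    wanted = set().union(*manifests.values())
    return sorted(wanted - installed)

# Both versions of parse stay available; hash 3.2 is shared and fetched once:
print(needed_downloads({("libbar.hash", "3.2")}, MANIFESTS))
# [('libfoo.parse', '1.0'), ('libfoo.parse', '2.0')]
```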
NixOS is an attempt to work around all of this by pretending that the problem is just with the linker or the file tree or PATH or something. But it's simply impossible to resolve dependency hell entirely without versioned functions, due to rare but intractable intra-package dependency conflicts. And it doesn't address data model versions or network service interface changes.
Containers are an attempt to work around all of this by literally having a different copy of the world for every containerized application. It works well enough, except again the boundary between containers is subject to application interface versions and data model versions changing. (The data model and code are basically the same thing since you need code to do anything with data) The only way to have dependency-hell-free containerized applications is if you version their interfaces/data models and have them call the correct version for other containerized apps [or network services], so again you're back to versions of functions.
> The way to solve the problem of dependency hell is to version every function, and only call functions based on their versions, and always ship every old version of every function. Then the application itself, or a dynamic linker, must find and execute the correct version of the function it needs to call.
If this is your vision, why would you dynamically link? If you static link your code to the library functions it calls, the runtime environment can't be changed out from under you (well --- not without a lot of work), and you'd presumably only build with the version of the library functions you like, so you'd be set there too. If you want to update a dependency, pull it in and rebuild.
I don't think this is a popular vision, because people want to believe that they can update to the latest OpenSSL and fix bugs without breaking things and sometimes, they can.
You still have a difficult problem when you share data with code whose linking you didn't fully control. If your application code needs to set up an OpenSSL socket and then pass that socket to a service library, and the service library uses OpenSSL A.B.C-k while you use A.B.C-l, maybe that works, maybe it doesn't; if it doesn't, that's a heck of a problem to debug. Of course, it's even worse if you're not on the same minor version, or across major versions.
While I'm picking on OpenSSL, because it's caused me (and others) a lot of grief, this kind of thing comes up with lots of libraries.
> people want to believe that they can update to the latest OpenSSL and fix bugs without breaking things
Yeah. It's a bug in the culture, really, and culture is much harder to change than software.
> problem when you share data with code you didn't fully control the linking of
Yeah, the data model needs to be versioned too. It's impossible to pass data between applications of different versions without the possibility of a bug. The options I'm aware of are A) provide that loose-abstraction-API and hope for the best, or B) provide versioned drivers that transform the data between versions as needed.
A is what we do today. B would be sort of like how you upgrade between patches, where to go from 6.3.1 to 9.0.0, you upgrade from 6.3.1 -> 6.4.0 -> 7.0.0 -> 8.0.0 -> 9.0.0. For every modified version of the data model you'd write a new driver that just deals with the changes. When OpenSSL 6.3.1 writes data to a file, it would store it with v6.3.1 as metadata. When OpenSSL 9.0.0 reads it, it would first pass it through all the drivers up to 9.0.0. When it writes data, it would pass the data in reverse through the data model drivers and be stored as v6.3.1. To upgrade the data model version permanently, the program could snapshot the old data so you could restore it in case of problems. (Much of this is similar to how database migrations work, although with migrations, going backward usually isn't feasible)
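Option B might look roughly like this in Python, with a made-up record format and a single 1.0 -> 2.0 driver (purely illustrative):

```python
# Toy sketch of versioned data-model drivers: each driver only knows how to
# move a record between two adjacent versions; reads and writes chain them.
UPGRADES = {}    # from_version -> (to_version, upgrade_fn)
DOWNGRADES = {}  # from_version -> (to_version, downgrade_fn)

def driver(old, new, up, down):
    UPGRADES[old] = (new, up)
    DOWNGRADES[new] = (old, down)

# v1 stored a plain string; v2 wraps it in a dict and adds a derived field
driver("1.0", "2.0",
       up=lambda d: {"name": d, "len": len(d)},
       down=lambda d: d["name"])

def read(record, stored_version, wanted_version):
    """Walk the record forward through drivers until it reaches our version."""
    v = stored_version
    while v != wanted_version:
        v, fn = UPGRADES[v]
        record = fn(record)
    return record

def write(record, code_version, stored_version):
    """Walk the record backward so it lands on disk in the old format."""
    v = code_version
    while v != stored_version:
        v, fn = DOWNGRADES[v]
        record = fn(record)
    return record

# Code at 2.0 reading a file stored as 1.0:
rec = read("alice", "1.0", "2.0")      # -> {"name": "alice", "len": 5}
# ...and writing it back in the on-disk 1.0 format:
assert write(rec, "2.0", "1.0") == "alice"
```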
Who's going to write those migration drivers though? Not OpenSSL, because they don't think it's valid to link to multiple versions of their library in the same executable. But also, it will be difficult for it to be anybody else, because the underlying incompatible data structures were supposed to be opaque to the library users. Note that I'm talking about objects that only live in program memory, they're never persisted to disk.
This is the underlying problem: it's the software developers' philosophy and practice that are the limitation, not a technical thing. Doesn't matter if it's program memory or disk or an API or ABI, it's all about what version of X works with what version of Y. If we're explicit about it, we can automatically use the right version of X with the right version of Y. But we can't if the developers decide not to adopt this paradigm. Which is where we are today. :(
Wasteful? Honestly, the docker images I use take up more RAM than they take up disk space. If I had to give up containers I would have to use VMs instead, and those are significantly more wasteful.
Also, nothing stops you from putting your statically linked go app in a container which can then use e.g. kubernetes or nomad for horizontal scaling.
"The way to solve the problem of dependency hell is to version every function... always ship every old version of every function..."
This seems like hell to maintain and, transitively speaking, a mountain of code. For this much work, I'd prefer to invest in obscenely detailed unit tests that allow the team to retire old function-versions and keep everybody on HEAD.
That said, I can imagine cases where old-versions might be helpful for some period of time, and you could call them via their function name and commit hash... then a dependency detector only keeps old versions as needed, a warning tool detects ancient versions to consider retiring, etc. You'd need editor/debugger support so developers can see and interact with old versions, and I'm not sure how this works with raw text editors - perhaps the dependency detector copies the (transitive) code into a new subdirectory?
Yeah, I updated my comment to reflect that: you would only need to keep versions of functions on disk where some code depends on them. That's basically how container layers work.
I think CI/CD practices are really important to get rid of the old code. Automated build systems should be constantly downloading new versions of deps and running tests so devs can fix bugs quickly and release new versions that depend on new deps. The quicker that cycle happens, the quicker old code can disappear, new functionality can be implemented, bugs can be fixed, etc. Because everything would still use pinned versions of deps, it would still work exactly as it was tested, so you wouldn't be sacrificing reliability for up-to-date-ness. Speed and automation of that process are critical.
You might find a video about the Unison language interesting. (I'm not sold on their approach per se, but they do grapple with this problem and its logistics.)
No, this is not the silver bullet for solving dependency hell. Apart from the problem of maintaining all that older-version code, the problem of creating different versions of the same function is already non-trivial. In imperative programming languages, you have preconditions on global state that depend on other functions to work correctly.
Say you have a certain access sequence for locks that has to be obeyed by all functions in this library. Then in some version you want to add another lock, or expose some API that does things with finer granularity. Suddenly the precondition for older code breaks, so there is no guarantee that users can safely call the old version of the API some of the time and a newer version at other times.
I am not saying that the weird example I mentioned is a good way of programming, I just don't think that function versioning is the magic bullet to solving dependency hell. Intra-package dependency conflict is a difficult problem, neither the NixOS approach nor your approach can magically fix that.
Without a more detailed example I'm not sure what you mean (also I'm very tired), but it seems function versions would still solve that problem. A1 and B1 use library C1, and C1 has primitive locking. C2 is released with new locking. A2 is released, which uses C2. B1 still runs against C1. If B1 tries to call A, it will call A1; both still use C1. If, during development, A2 tries to call B, it will call B1, which, during the testing of A2, would fail (if B1 is incompatible with C2), so the developer of A2 would either have to implement a workaround or upgrade to B2, and then A2 would be calling B2, which would work with C2.
If A and B are developed independently, but some day somebody makes A and B try to talk together, and they were never tested together, they could either 1. make a best effort to try to work and then die, 2. compare mutual dependency trees and try to walk the trees to find versions of themselves (A and B) that work with the same version of a dependency (C), or 3. simply not run at all because they don't know what version of each other to use.
In theory you could use CI/CD to build every version of every app against every version of every dep, store them all in a remote registry, so that whenever some weird combo of apps wanted to use something, they could look up a version of that app built against the right dep.
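Option 2 above, finding releases of A and B that share a version of C, could be sketched like this (build metadata invented for the example):

```python
# Hypothetical resolver: given which version of C each release of A and B
# was built/tested against, find pairs that share a C version.
A_BUILDS = {"A1": "C1", "A2": "C2"}  # release -> version of C it was tested with
B_BUILDS = {"B1": "C1", "B2": "C2"}

def compatible_pairs(a_builds, b_builds):
    return [(a, b) for a, ca in a_builds.items()
                   for b, cb in b_builds.items() if ca == cb]

print(compatible_pairs(A_BUILDS, B_BUILDS))  # [('A1', 'B1'), ('A2', 'B2')]
```

If the list comes back empty, you're in case 3: there is no tested combination, and the apps would refuse to talk.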
Yes, developers will need to implement workarounds in the case where A1 and B1 cannot use the same version of C. So how does the function-versions approach solve the problem?
As for using CI/CD to build every version of every app against every version of every dependency: even if we just ignore the problem of combinatorial explosion, some software may be abandonware and some may be entirely new. The sets of dependency versions that work with them may not even overlap, so the 'correct' combination may not exist at all.
I think the function-versioning approach is just one way to maintain backward compatibility; it is neither required nor sufficient for it.
Thank you, this is what I believe as well. At work I solved the previous team's dependency hell in this way. Our app is installed on many computers in the organization, and it has dependencies that fortunately are mostly developed in-house, where I'm the one ultimately controlling the API and the ABI. I wanted to enable those dependencies to be developed without me recompiling everything, and vice versa. Previously everybody just recompiled everything, and this slowed development to a crawl since upgrades took forever.
This isn't a server, it's something that people install on their machines and are free to upgrade in whole, or in parts (upgrade just a single dynamic library to fix a bug), or not at all.
My solution was to use a "COM-inspired" versioning. I defined an interface-based ABI. Not a function-level versioning as suggested here, it's more coarse than that, but it was enough. See my comment here [0].
More specifically, it seems like you're solving a subset of dependency issues (mainly version conflicts) while making the other issues worse (e.g. keeping sub-dependencies up to date, ballooning disk space usage, less easily cacheable requests to package repos).
Well, that's what dependency hell is, a subset of issues: https://en.wikipedia.org/wiki/Dependency_hell The other issues are of course important but require their own solutions. Keeping dependencies up to date [or, in the case of the above method, recompiling applications to use new function versions with security fixes] requires CI/CD, feature flags, auto-upgrading, avoiding drift, etc. Disk space and caching issues require storage with built-in delta layers/CoW/compression (which can also be applied to dependencies, e.g. downloading only the dependent versions of functions rather than keeping all of them on disk at once).
As we get more advanced, the hell will get deeper for sure.
> The way to solve the problem of dependency hell is to version every function, and only call functions based on their versions, and ship every old version of every function. Then the application itself, or a dynamic linker, must find and execute the correct version of the function it needs to call. In this way, dependencies can change at any time, but every line of code will only ever call functions that they were originally built and tested against, because those versions are still knocking around somewhere in the dependency.
That breaks as soon as you have a diamond situation. A depends on B and C, which each depend on D. Often you need that to be the same version of D. If B and C are allowed to depend on different versions of D, you get worse problems. There are languages that have tried this model and it has its advantages, but IME the cure is worse than the disease.
> The basic concept for this already exists in glibc as you can ship version-specific functions and link/call a version-specific function. But literally no one uses it.
It doesn't actually work properly, mainly because Linux package management sucks. No one cares about backwards compatibility on Linux because you're either using a rolling-release distro, or you're using a traditional distro that will do all your compatibility checking and versioning for you and upgrade the global version of everything when they're good and ready. Windows has much better support for this kind of thing, and that's why you don't really hear about "DLL hell" anymore.
This isn't the biggest thing holding back software development, not by a long shot. It's a side issue that we tweaked and can keep tweaking. But being able to version libraries and functions doesn't work, at least not alone; it breaks the idea of having a platform that mostly-independent components can build on.
> That breaks as soon as you have a diamond situation. A depends on B and C, which each depend on D. Often you need that to be the same version of D. If B and C are allowed to depend on different versions of D, you get worse problems.
Yeah :( If B1 and C2 are installed, they might depend on D1 and D2, respectively. But luckily, A can be built against B1 and C1, which can both depend on D1. Or, A can depend on B1 and C2, and B1 will talk to D1, and C2 will talk to D2; when B1 talks to C, it will talk to C1, because that's the version B1 was built/tested against; if C2 talks to B, it will talk to B2. However A was built and tested, the versions of functions that it used are what will be called. (There may need to be some deeper, ugly static mapping of function deps from compile time, but essentially it should be possible for every version of an application to record what dependencies it used and those dependencies, and for function calls to automatically download and call the correct dep, up/down/across the hierarchy)
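Here's a toy model of that diamond with per-caller pinning (hypothetical, and note it only stays safe as long as B and C never hand each other objects owned by D, which is exactly the objection above):

```python
# Toy model of the diamond: two versions of D coexist, and B1/C2 each call
# the version recorded at their own build time.
D = {
    "D1": lambda x: x + 1,  # old D behavior
    "D2": lambda x: x + 2,  # new D behavior
}

def B1(x):
    return D["D1"](x)  # B1's pin to D1, recorded when B1 was built/tested

def C2(x):
    return D["D2"](x)  # C2's pin to D2

# A links both B1 and C2; each keeps its own frozen view of D:
print(B1(10), C2(10))  # 11 12
```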
The data model is the really sticky wicket, but I think migrations provide a path to deal with it, or just a loosely coupled API.
> mainly because linux package management sucks
Yes that too :) But if apps used versioned functions then package management wouldn't suck so bad!
> This isn't the biggest thing holding back software development
It's what keeps us from changing software frequently and easily, which I think is the biggest problem limiting advancement of the discipline. Changes cause bugs, bugs waste time, and fear of bugs makes us jump through hoops and keeps us from making the changes we need to move other things forward. People try to work around it by vendoring deps or statically linking, but it's a bad hack because the app & data interfaces are still volatile.
For example, all Cloud technology is currently mutable. An S3 bucket as a whole concept cannot be snapshotted, changed, and then reverted automatically. That's a limitation of the service and the API. Making it immutable requires redesign, which would probably break existing stuff. Every single cloud service is like this. It will take 10+ years to redesign everything in the Cloud to be immutable. But imagine if we could just make it immutable tomorrow, and nothing would break, because old software dependent on S3 would keep using the old version! Make any change you want at any time and ship it right now and nothing will break. That's the future that nobody can imagine right now because they haven't seen it. Just like they haven't seen an OS with only primary storage.
> But luckily, A can be built against B1 and C1, which can both depend on D1. Or, A can depend on B1 and C2, and B1 will talk to D1, and C2 will talk to D2; when B1 talks to C, it will talk to C1, because that's the version B1 was built/tested against; if C2 talks to B, it will talk to B2.
That only works if B knew about C, or if there's a version of D that there are releases of both B and C that were built against. Neither of these can be relied on, especially when you bring in E and F and all the other dependencies that a real application will have (e.g. maybe B updated E before updating F, whereas C updated F before updating E).
> The data model is the really sticky wicket, but I think migrations provide a path to deal with it, or just a loosely coupled API.
The data model is the root of the problem, if you can't fix that you can't fix anything.
> For example, all Cloud technology is currently mutable. An S3 bucket as a whole concept cannot be snapshotted, changed, and then reverted automatically. That's a limitation of the service and the API. Making it immutable requires redesign, which would probably break existing stuff. Every single cloud service is like this. It will take 10+ years to redesign everything in the Cloud to be immutable. But imagine if we could just make it immutable tomorrow, and nothing would break, because old software dependent on S3 would keep using the old version! Make any change you want at any time and ship it right now and nothing will break.
But as long as there are still applications that mutate them, your buckets aren't immutable! Effectively you're just introducing a new, incompatible cloud storage system that client applications have to migrate to - but that's something you could already have done.
The problem isn't depending on old versions versus new versions. The problem is getting consensus on the semantics between all parts of your system, and being able to depend on an older version of something just moves the problem around.
> But if apps used versioned functions then package management wouldn't suck so bad!
Wishful thinking. The main issue with Linux package management is the fragmentation between distros and the fact that human volunteers can't keep up with upstream.