I dunno, I have 96 GB of RAM and I still get the whole "system dies due to resource exhaustion" thing. Yesterday I somehow managed to crash DWM through handle exhaustion. Man, people really waste resources...
Sad to see you being downvoted, but you're exactly right. Well, almost - if you can afford to invest in a good integration test suite, that can catch many errors without requiring a human to regression-test every time.
At the same time, many quality attributes can't really be tested automatically, so automation shouldn't replace manual testing; it should augment it.
Not really. I actually tried building an "old" game (read: not updated since 2014 or so) on Linux back when I used it. It didn't work: autotools had changed, make threw weird errors, and the library APIs had changed too.
In the end I gave up and just ran the Windows .exe under Proton. Unbelievable. :(
I should clarify that my original comment about stability only applies to glibc itself. Once you go beyond glibc, there are varying degrees of API/ABI stability, simply because at that point it's different groups of people doing the work.
In some cases such libraries are also cross-platform, so the same issues would show up on Windows (e.g. try to build an application that depends on openssl3 against openssl4 and it won't work on either Linux or Windows).
For future reference, if you ever need to do that again, it would be way easier to spin up a container with the build environment the software expects. Track down the software's last release date, do podman run --rm -it ubuntu:$from_that_time, and just build the software as usual.
You can typically link the dependencies statically at build time to create system-independent binaries, so the binary produced inside the container would work on your host as well.
Yeah exactly. High-level people think the low-level stuff is magic, and us from the other side think the high-level stuff is magic (how can you handle all that complexity?...)
The 4-bit stuff is a hangover from Notch doing this (I'd maybe even say a similar-calibre programmer to Chris Sawyer...). The sound has nothing to do with technical limits, that's a post-facto rationalisation.
The game never played MIDI samples; it was always playing "real" audio. The style was an artistic choice; many similar retro-looking games were using chiptune and the like. It's a deliberate juxtaposition...
The C++ variant doesn't really perform better anymore either.
Fair enough, I mostly meant to point out some of those design decisions predate MS, as much as I love to hate on them. The music was just an interesting bit of trivia I read the other day.
Yeah, 100% :) Ironically, the design constraints are one of the big things that made it work so well! If it had been designed in a "traditional" way, it would have been much less ambitious.
This is all true, but IMO it misses the forest for the trees... For example, the compiler basically doesn't do anything useful with your float math unless you enable fast-math. Period. Very few transformations are done automatically there.
For integers the situation is better, but even there it hugely depends on your compiler and how much it cheats. You can't replace trig calls with intrinsics in the general case (they set errno, for example), and inlining is at best an adequate heuristic that completely fails to take into account what the hot path is unless you use PGO and keep it up to date.
I once improved a game's worst-case performance by about 50% just by shrinking a method's code size from 3000 bytes to 1500. Keep in mind I barely even touched the hot path there; it was mostly down to icache usage.
The takeaway from this shouldn't be that "computers are fast and compilers are clever, no point optimising" but more that "you can afford not to optimise in many cases, computers are fast."
My point wasn't "don't optimize" it was "don't optimize the wrong thing".
Trying to replace a division with a bit shift is an example of worrying about the wrong thing, especially since that's a simple optimization the compiler can pick up on.
But as you said, it can be very worth it to optimize around things like the icache. Shrinking and aligning a hot loop can ensure your code isn't spending a bunch of time loading instructions. Cache behavior, in general, is probably the most important thing you can optimize. It's also the thing that can often make it hard to know if you actually optimized something. Changing the size of code can change cache behavior, which might give you the mistaken impression that the code change was what made things faster when in reality it was simply an effect of the code shifting.
I originally got into writing compilers because I was convinced I could write a better code generator. I succeeded for about 10 years in doing very well with code generation. But then all the complexities of the evolving C++ (and D!) took up most of my time, and I haven't been able to work much on the optimizer since.
Fortunately, D compilers gdc and ldc take advantage of the gcc and llvm optimizers to stay even with everyone else.
The thing which would really help IMNSHO is to nail down the IR to eliminate weird ambiguities where OK optimisation A is valid according to one understanding, optimisation B is valid under another but alas if we use both sometimes it breaks stuff.
"The future of software is not open. It is not closed. It is liberated, freed from the constraints of licenses written for a world in which reproduction required effort, maintained by a generation of developers who believed that sharing code was its own reward and have been comprehensively proven right about the sharing and wrong about the reward."
This applies to open-source but also very well to proprietary software too ;) Reversing your competitors' software has never been easier!
If they really believed that their process eliminated any licensing conditions, why would they limit themselves to open source projects?
High quality decompilers have existed for a long time, and there's a lot more value in making a cleanroom implementation of Photoshop or Office than of Redis or Linux. Why go after such a small market?
I suspect the answer is that they don't believe it's legal; they just think they can get away with it because they're less likely to get sued.
(I really suspect that they don't believe that at all, and it's all just a really good satire - after all, they blatantly called the company "EvilCorp" in Latin.)
This is satire but this is where things are heading. The impact on the OSS ecosystem is probably not a net positive overall, but don't forget that this also applies to commercial software as well.
There will be many questions asked, like: why buy some SaaS with way too many features when you can just reimplement the parts you need? Why buy some expensive software package when you can point an LLM at the binary with Ghidra or IDA or whatever, then spend a few weeks reversing it?
I was discussing that very point yesterday with a colleague after telling him of recent events. I pointed out that leaning on copyright/copyleft for software has always been a risky move.
The patent application hasn't been published yet so I can't link it, but it's the integration of a bot management system with a queuing system (think, preventing bots from taking space in the line waiting to buy tickets from Ticketmaster when everyone's in the waiting room)
Yeah, the only big problem with approx. sqrt is that it's not consistent across systems, for example Intel and AMD implement RSQRT differently... Fine for graphics, but if you need consistency, that messes things up.
Newer rsqrt approximations (ARM NEON and SVE, and the AVX512F approximations on x86) make the behavior architectural so this is somewhat less of a problem (it still varies between _architectures_, however).
When Intel specced the rsqrtps/rsqrtss and rcpps/rcpss instructions ~30 years ago, they didn't fully specify their behavior. They just said the relative error is "smaller than 1.5 * 2⁻¹²," which someone thought was very clever because it gave them leeway to use tables or piecewise linear approximations or digit-by-digit computation or whatever was best suited to future processors. Since these are not IEEE 754 correctly-rounded operations, and there was (by definition) no existing software that used them, this was "fine".
And mostly it has been OK, except for some cases like games or simulations that want to get bitwise identical results across HW, which (if they're lucky) just don't use these operations or (if they're unlucky) use them and have to handle mismatches somehow. Compilers never generate these operations implicitly unless you're compiling with some sort of fast-math flag, so you mostly only get to them by explicitly using an intrinsic, and in theory you know what you're signing up for if you do that.
However, this did make them unusable for some scenarios where you would otherwise like to use them, so a bunch of graphics and scientific computing and math library developers said "please fully specify these operations next time" and now NEON/SVE and AVX512 have fully-specified reciprocal estimates,¹ which solves the problem unless you have to interoperate between x86 and ARM.
Take a look at the "rsqrt_rcp" section of reference [6] in the accuracy report by Gladman et al referenced above. I did that work 10 years ago because some people at CERN had reported getting different results from certain programs depending on whether the exact same executables were run on Intel or AMD cpus. The result of the investigation was that the differing results were due to different implementations of the rsqrt instruction on the different cpus.
Microbenchmarks. A LUT will win many of them, but you pessimise the rest of the code. So unless a significant (read: 20+%) portion of your runtime goes into the LUT, there isn't much point bothering. For almost any pure calculation without I/O, it's better to do the arithmetic than to hit memory.
Locality within the LUT matters too: if you know you're looking up identical or nearby-enough values to benefit from caching, an LUT can be more of a win. You only pay the cache cost for the portion you actually touch at runtime.
I could imagine some graphics workloads tend to compute asin() repeatedly with nearby input values. But I'd guess the locality isn't local enough to matter; only eight double-precision floats fit in a cache line.
Cache size and replacement policies can ruin even a well-tuned LUT once your working set grows or other threads spray cache lines, so "just use a LUT" quietly turns into "debug the perf cliff" later. If the perf gain disappears under load or with real input sets, you realise too late that it was just a best-case microbenchmark trick.