How do you learn that solution curve and its tradeoffs? Swap jobs and hope you see them all in your lifetime? Read articles and hope you pick everything up correctly?
OP is giving a problem solving framework, it's not tied to specific solutions or tradeoffs. It's more of a method for thinking through a solution.
In more detail, the steps go like this.
Find as many solutions to a given problem as you can. Bad engineers run with the first solution that comes to mind, letting confirmation bias drive them.
Evaluate each solution for its costs and benefits. Imagine two steps in the future when the solution is implemented. What pains are there?
Search for creative new solutions that create win-win scenarios. That's riding the solution curve.
Given all viable scenarios, compare the costs and benefits against the quality measures for your specific context. Some projects value speed over precision. Some projects value performance over extensibility. Some solutions are easier to change later than others. This is choosing a specific point on the solution curve that best fits your context.
You can apply that method of problem solving to any problem, large or small. You don't need a ton of experience to practice it.
This comment made me realize what my mentor did years ago. He would have the entire team list all possible solutions to a problem, including the obviously bad ones, then have us whittle down the list based on pros and cons of each until we reached consensus on what to do. It was a teaching exercise and I didn’t realize it.
I’ve repeated that exercise with junior engineers to great effect. Some catch on over time and start intuitively considering the trade-offs of a few reasonable solutions to a problem; some don’t.
I never reflected on what he was doing there; thanks.
To add to that, some of the smartest "good" ideas come from "bad" ideas that people rejected out of hand, sometimes from unusual sources. It pays to do brainstorming thoroughly.
The best software engineers know when to solve a problem without using any code at all, like the classic "just do it manually" for a complicated task that is worth more than $x per task.
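That "worth more than $x per task" judgment is just break-even arithmetic. A quick sketch with invented numbers (task frequency, manual time, and build cost are all hypothetical):

```python
# Is automating worth it? Compare the total manual cost over some
# horizon to the cost of building the automation. All figures invented.
runs_per_month = 20
manual_minutes_per_run = 15
automation_build_hours = 40
horizon_months = 6

manual_hours = runs_per_month * manual_minutes_per_run / 60 * horizon_months
automate = automation_build_hours < manual_hours
print(manual_hours)  # → 30.0
print(automate)      # → False: over 6 months, doing it manually is cheaper
```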
Definitely this. Nothing beats watching your solution hit roadblocks (performance, extensibility, etc.), reassessing your earlier approaches, and coming up with a new solution to address the newer challenges.
You can also dig into the problems that existed when you got to the project, and try to work out how they came to be. Project forensics is a skill set unto itself. And listen to and help people on other projects at your company. You can see how their story arc goes and where it surprises you.
An intelligent man learns from his own mistakes. A wise man learns from the mistakes of others.
You can never truly be perfect, you can only approach it. Understanding your limitations is just as important as understanding your capabilities. Depth vs breadth is hard and you only have so much life to go so deep into so many domains.
Pre implementation
- Identify and document possible solutions. Think hard about why each solution is good and why each is bad.
- Within each solution, attempt to find common patterns in that problemspace (when working with sharded databases, this approach brings these results, etc.)
- Try to match your personal and organizational requirements against the various patterns you find and the pros/cons you defined previously
During implementation
- Keep a running list of things that seem weird, or things that seem great, or things that turned out to be untrue
- Don't stop implementing, but for each thing you found that wasn't great in step 1, try to find a better way to approach that thing (while moving forward)
Post implementation
- Compile all the notes you made in pre/during into a manageable list of goods/bads.
- Use this to drive A.) iterations of your solution, and B.) future solutions
If you do this enough times, you'll have a pretty decent list of your experiences in a space, and you'll start to notice patterns as you try different things and become exposed to problems, their solutions and their tradeoffs. Eventually you won't even have to look at your previous notes very often, as you'll have built up a pretty decent amount of experience in various problemspaces such that you can predict what goes well/doesn't go well.
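The pre/during/post note-keeping could be sketched as a small structure like this. The field names, phases, and example notes are entirely my own invention, just to show the shape of the habit:

```python
from dataclasses import dataclass, field

@dataclass
class Observation:
    phase: str   # "pre", "during", or "post" implementation
    kind: str    # "weird", "great", or "untrue" (turned out to be false)
    note: str

@dataclass
class ProjectLog:
    observations: list = field(default_factory=list)

    def record(self, phase, kind, note):
        self.observations.append(Observation(phase, kind, note))

    def retrospective(self):
        # Compile all notes into the goods/bads list that drives
        # iterations of this solution and future solutions.
        goods = [o.note for o in self.observations if o.kind == "great"]
        bads = [o.note for o in self.observations if o.kind != "great"]
        return goods, bads

log = ProjectLog()
log.record("during", "weird", "sharded writes spike p99 latency")
log.record("post", "great", "schema migration tooling paid off")
goods, bads = log.retrospective()
print(len(goods), len(bads))  # → 1 1
```

Whether you use a tool, a text file, or your head, the shape is the same: capture observations as they happen, then compile them afterward.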
Also worth noting, this type of behavior doesn't really have to be applied to software, but will also gain you big points with future employers when interviewing. You'd be surprised how many candidates never stopped to reflect on the work they'd done, why it was sub-optimal, or how it could be corrected moving forward. Responses like "We did it ____ way because that's how we always did it." Which, from a progress standpoint, is basically a non-answer.
Note: this is obviously just my opinion, and I'm no expert in "Becoming a Master of Stuff", but these are the methods that I have used and seen my peers use (in one form or another) over quite a long time. Sometimes not as obvious (doing these things in your head vs. writing them down), but the shape is always pretty similar. Ultimately I think being reflective is one of the best skills a person can possess (with respect to employment and I guess also relationships). Acting without thinking is reckless, I think.
I once was trying to design a validation framework. I thought it should work this way. The other lead thought it should work that way. So I proposed we write some sample code for these APIs and show it to the team. TDD with pseudocode before any of us knew what TDD was.
Only, I didn’t like having two to choose from. Something told me three would be better. So we paired, I wrote mine, then hers, then just made another one up on the spot. Once we agreed they were complete and implementable, I shopped them to the team.
About 2/3rds preferred the made up one. Including me. So that’s what I implemented.
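The "sample code before any implementation" trick is cheap: sketch how each candidate API would read at the call site and let the team compare. A hypothetical illustration of what such throwaway sketches might look like, not the actual APIs from this story:

```python
# Three invented call-site sketches for a validation framework.

# Candidate A: declarative rule names attached to fields (strings only,
# nothing executable yet -- pure pseudocode for discussion).
rules_a = {"age": ["required", "integer", "min:0"]}

# Candidate B: a fluent chained style, also left as a comment:
#   validate(record).field("age").required().integer().min(0)

# Candidate C: plain validator functions composed into a schema.
def required(v):
    return v is not None

def minimum(n):
    return lambda v: v >= n

schema_c = {"age": [required, minimum(0)]}

def check(record, schema):
    # A record passes if every rule for every field holds.
    return all(rule(record.get(f)) for f, rules in schema.items()
               for rule in rules)

print(check({"age": 3}, schema_c))   # → True
print(check({"age": -1}, schema_c))  # → False
```

Even this much is often enough for a team to react to: people can feel which style they'd want to read in six months.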
Why does btrfs have those issues compared to other filesystems? I'd love to use btrfs too. Note that I deeply respect people who can write such complicated code, which I couldn't.
I sincerely hope you are joking, but I realize this mindset is quite common these days, so let me reiterate: Rust does not magically solve problems for you. Btrfs has a lot of issues, and some of them may well be of the "not possible in Rust" sort, but I'm quite sure most of them are not, and there is nothing Rust can fix about them. Rewriting a 13-year-old and very complex thing in another language is a massive effort and a big opportunity for introducing more, possibly worse, bugs along the way.
It's a recurring story: every time someone sells a magical solution to all problems and puts a name on it, people blindly believe it.
These are tools. A screwdriver is not the right tool when you need a power drill, and vice versa. Yes, sometimes you can use either and stick with your accustomed tool.
Anyway. C and C++ and their toolchains are constantly improving like everything else. The modern memory sanitizers in GCC and LLVM are awesome.
Any rewrite must be carefully evaluated, true. And it may not be advisable. I'm trying to understand the issues.
Are they not solvable because the kernel does not give enough guarantees as it gives to userspace, because the c-interfaces of the kernel have to be wrapped in unsafe or because of other reasons (architecture, data model, kernel constraints, ...)?
I think the issue is that you haven’t understood the issue, and just proposed using Rust. Don’t get me wrong, I’m a huge fan of Rust, but seeing people blindly suggest it as a panacea gets frustrating for people. The reality is that when people talk about Rust being fast, it’s because a decently optimised Rust program might be comparable to an equivalent C program. C is generally not the problem when it comes to performance issues in the kernel, which means rewriting that thing in Rust wouldn’t magically make it faster.
The issue will probably be some sort of pathological case in an algorithm being used, or perhaps from a poorly chosen algorithm. The point being, it’s not clear yet, and to solve that requires understanding the problem, not effecting a needless rewrite in a new language.
> C is generally not the problem when it comes to performance issues in the kernel, which means rewriting that thing in Rust wouldn’t magically make it faster.
Agreed.
> The point being, it’s not clear yet, and to solve that requires understanding the problem, not effecting a needless rewrite in a new language.
Which is why the first question is why btrfs has issues others do not have. Some mention its CoW architecture, its system design, and that it has too many features and does not limit storage to 90%, all of which increase complexity. Others mention the usual kernel issues.
Others mention that they had no issues to begin with and that its mixed reputation is unwarranted. I'm not clear who is right, but they are data points.
Thank you for your input in cautioning of rewrites to avoid needless work, I appreciate it.
> Why does btrfs have those issues compared to other filesystems?
As someone that has built infrastructure on BtrFS for years, the scary stories are mostly just hot air and the stability of other filesystems is really not significantly better.
Bugs like this happen; this is why Linus releases many release candidates for every kernel. This one got through as 10 was a rather massive kernel and there were several regressions, including one that caused a new release just hours after the supposedly final one. Distros wait a bit longer before shipping an updated kernel, and none of these hit actual users.
As far as I know there is no reason to abstain from using Btrfs. When Fedora talked about not using it, their stated reason was that they had no in-house expertise.
> As someone that has built infrastructure on BtrFS for years, the scary stories are mostly just hot air and the stability of other filesystems is really not significantly better.
I've lost 2 root filesystems to btrfs, on a laptop with only 1 drive (read: not even using RAID). Have you considered that you're just lucky?
In my experience, the stories are real. Our entire company was offline for a day when our central storage server quit accepting writes despite having over 50% free space. That's when I learned the hard way about the data/metadata split (something I was aware of but wasn't exactly top of mind) and BTRFS balance. You can certainly say it was my fault for not reading ALL the documentation before using BTRFS, and I'd find it hard to disagree, but any other filesystem wouldn't have had this problem.
I can't speak to whether there are other foot-guns waiting or how common problems like this are, because we migrated back to FreeBSD and ZFS shortly after that experience. I do know they have since updated BTRFS to make that scenario less likely (but still not impossible).
Btrfs is the C++ of file systems: it’s powerful and works for a great many people. But the tooling is intimidating to newcomers, and unless you know exactly what you’re doing, it’s a ticking time bomb due to the plethora of foot guns and hidden traps.
This is why some people claim to have success with it while a great many other people, rightfully, claim it’s not yet ready for prime time.
ZFS, on the other hand, has not only protected me against failing hardware but it also has sane defaults and easy to use tooling thus protecting me against my own stupidity.
We expect filesystems to work robustly. We do not expect them to fail after an arbitrary time interval merely by being used. Even terrible filesystems like FAT don't do that. They might get fragmented and slow, but they don't just stop. I find it incredible that this is often minimised by people; it's a complete show-stopper irrespective of the other problems Btrfs has.
I made exactly the same migration you did. ZFS has been solid, and it does exactly what it says on the tin.
My guess is this (compared to ZFS): with a CoW filesystem like btrfs you have the problem that you need new filesystem space even to delete something. This is problematic if the filesystem is full and you want to make it writable again by deleting something. ZFS solved this by simply saying you can only fill a filesystem to 90%. At some point they even decreased this threshold (during an upgrade), and I ran into the issue that I couldn't write to a ZFS filesystem because the limit had been lowered.
Btrfs tries to fully use the space and takes on all the associated complexity. Additionally, because the data/metadata ratio is not fixed, one can get into situations where the filesystem is full and there is no more metadata space.
For every action it needs to carefully check whether there is enough space to actually perform the action, even if the filesystem is nearly full. Improvements in this area caused this regression.
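A toy model of that CoW free-space problem, with everything invented for illustration (block counts, the one-block metadata cost, the reserve size):

```python
# In a copy-on-write filesystem, deleting a file still requires writing
# NEW metadata blocks (the updated tree) before the old ones can be freed.
class ToyCow:
    def __init__(self, total_blocks, reserve=0):
        self.total = total_blocks
        self.used = 0
        self.reserve = reserve  # ZFS-style headroom kept free for writes

    def write(self, blocks):
        if self.used + blocks > self.total - self.reserve:
            raise IOError("no space left")
        self.used += blocks

    def delete(self, blocks):
        # CoW: need 1 free block to write the updated metadata tree
        # before the old data and old metadata can actually be freed.
        if self.used + 1 > self.total:
            raise IOError("cannot delete: no space for new metadata")
        self.used += 1           # write the new metadata block
        self.used -= blocks + 1  # then free old data + old metadata

fs = ToyCow(total_blocks=100, reserve=0)
fs.write(100)        # fill it completely...
try:
    fs.delete(10)    # ...and now even deleting fails
except IOError as e:
    print(e)         # → cannot delete: no space for new metadata

fs2 = ToyCow(total_blocks=100, reserve=10)  # ZFS-style 90% cap
fs2.write(90)
fs2.delete(10)       # the headroom lets the delete proceed
print(fs2.used)      # → 80
```

Real btrfs is vastly more subtle than this (chunk allocation, separate data/metadata pools, global reserve), but the core trap is the same: with zero headroom, even freeing space needs space.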
And no, Rust wouldn't help here. How often do you get a kernel oops, deadlock, or memory leak? Those are the things Rust would help with.
Interesting decision they made, I wonder whether they would decide differently now after seeing all the complexity.
So Rust does not decrease the complexity; it only removes certain kinds of errors which the compiler can detect, and neither logic errors nor speed regressions are among them.
> I wonder whether they would decide differently now after seeing all the complexity.
Decide what? CoW is a fundamental part of how btrfs works, and Rust didn't exist for the majority of Linux's life. (Although if you're into that, have a look at Redox.)
Please understand that Rust's memory and thread safety mainly applies to "normal" applications.
In the kernel, you can run a privileged cache or MMU instruction, or write to some magical memory location, and all of a sudden the "normal" rules don't apply anymore.
(But I think there are other parts of Rust that are nice to have in the kernel, or in any complex software.)
I thought the Rust compiler catches issues that you wouldn't immediately see in pure C, which is why I had the idea.
I didn't know this requires certain features which are not available inside the kernel. I only knew that all existing interfaces may be unsafe because they are in C. Rust does not seem as useful then.
It's not so much that Rust the language requires them as that other, non-Rust parts can quite easily stomp all over Rust's guarantees without there ever being a way of knowing it happened. So Rust alone won't solve many problems, but it would let you say "this code can't do these things itself", which is still a useful distinction. It also doesn't let you deal with misbehaving hardware that changes memory underneath you in ways it said won't happen. Hardware sucks.
Do other parts stomp often? :) But true that can happen. Especially on non-ECC systems.
I didn't think about the hardware issues, hmm. I can't see how to handle that when the compiler's guarantees get invalidated by hardware. Are checks also needed, like in C? (assuming there are checks which do not get compiled out...)
When the hardware can't make the guarantees, software really can't do anything about it. There aren't really any checks you can do, but modern hardware is gaining the capability to prevent those kinds of issues with IOMMU units. Operating system support is still hit or miss for most hardware, though, and it won't prevent everything (just devices stomping on each other with DMA). That's basically how the Thunderbolt attacks have worked, and how the solutions to them work.
Anecdotally, I've encountered issues with both ZFS and BTRFS at about the same rate. A public example of an apparent ZFS performance issue is https://github.com/openzfs/zfs/issues/9375 Both are much more quirky than simpler filesystems like ext4. Data integrity verification from checksumming makes it worth it though.
The ZFS vs. BTRFS choice, I think, depends more on whether you need specific features like offline deduplication or L2ARC / SLOG cache devices. And which one you're more familiar with (can troubleshoot better).
ZFS is also extremely well designed at a system level... which is not the impression I get from BtrFS. (Disclaimer: I have not bothered looking at BtrFS for years because ZFS has handled everything I've thrown at it very admirably. Including complicated setups with RAID-Z, etc.)
Granted, there are some limitations to the design, but it doesn't affect my use cases, so whatever...
> Why does btrfs have those issues compared to other filesystems?
Why? There are several reasons, but if you go right back to the beginning, there's a single reason which caused all the other problems: they started coding before they had finished the design.
All of the other problems are fallout from that. Changing the design and the implementation to fix bugs after the initial implementation was done. Introducing more bugs in the process. And leaving unresolved design flaws after freezing the on-disc format.
When you look at ZFS as a comparison, the design was done and validated before they started implementing it. Not surprisingly, it worked as designed once the implementation was done. Up-front design work is necessary for engineering complex systems; it really goes without saying.
This isn't even unique to Btrfs, but filesystems are one thing you can't hack around with without coming to grief; you have to get it right first time when their sole purpose is to store and retrieve data reliably. Many open source projects are ridden with problems because their developers were more interested in bashing out code than stopping and thinking beforehand. Same with a lot of closed source projects as well for that matter.
In the case of Btrfs, which was aiming from the start to be a "better ZFS", they didn't even take the time to fully understand some of the design choices and compromises made in ZFS, and they ended up making choices which had terrible implications. Examples:
- Using B-trees rather than Merkle hashes; this is at the root of many of its performance problems.
- Not having immutable snapshots; this has performance as well as safety implications, and is rooted in not having pool transaction numbers and deadlists.
- Not separating datasets/subvols from the directory hierarchy; this presents logistical and administration challenges, while ZFS datasets can freely inherit metadata from parents and the mount location is a separate property.
ZFS isn't perfect of course; there are improvements and new features that could be made, but what is there is well designed, well thought out, and a joy to work with.
Can you tell how such an evaluation of a design is done? Is it some kind of formal verification or analysis, or rather experimentation to figure out its properties?
I wasn't involved so can't personally provide details of how this was done at Sun. Most of my knowledge comes from listening to talks and reading books on ZFS.
For the work I'm involved in relating to safety-critical systems, we use the V-model for concepts, requirements, design and implementation, with extensive validation and verification activities at each level. Tools are used to manage all of the requirements, design details and implementation details and to link them together in a manner which aims to enforce self-consistency at all levels. When done correctly, this means that the person writing the code does not need to be particularly creative at that stage: the structure is completely detailed by the formal design. It does require significant up-front effort to carefully consider and nail down the design to this level of detail, but it avoids the need to continually revise and adapt an incomplete or bad design in a never-ending implementation phase.
This approach is definitely not for everyone, and there are many things one can criticise about it. But if you are willing to bear the financial cost and time costs of doing that detailed design work up front, the cost of implementation will be much lower and the product quality will be much greater. There is a lot to be said for not madly mashing keys and churning out code without thinking about the big picture, and Btrfs is a case study in what not to do.
The V-model is interesting. I'm a student and kinda new to the different development models.
How do you decide whether such meticulous design is necessary or not? In hindsight Btrfs may have benefited from it, but how do you decide when to do it and when not to in the future?
I would also be interested to know what tools are used for this. The ones I looked at seemed quite dated.. :-)
Thank you for answering! This is very interesting to learn about
This is just my own personal take on things; I'd definitely recommend reading up on the differences between Waterfall, Agile and the V-model (and Spiral model). Note that you'll see it said that the V-model is based upon Waterfall, which is somewhat true, but it's not necessarily incompatible with Agile. You can combine the two and go all the way down and back up the "V" in sprints or "product increments", but you do need the resources to do all the revalidation and reverification at all levels each time, and this can be costly (this is effectively what the Spiral model is).
In terms of deciding if meticulous up-front design is necessary (again my own take), it depends upon the consequences of failure in the requirements, specifications, design and/or implementation. A random webapp doesn't really have much in the way of consequences other than a bit of annoyance and inconvenience. A safety-critical system can physically harm one or multiple people. Examples: car braking systems, insulin pumps, medical diagnostics, medical instruments, elevator safety controls, avionics etc. It also depends upon how feasible it is to upgrade in the field. A webapp can be updated and reloaded trivially. An embedded application in a hardware device is not trivial to upgrade, especially when it's safety-critical and has to be revalidated for the specific hardware revision.
For filesystems the safety aspect will relate to maintaining the integrity of the data you have entrusted to its care. Computer software and operating systems can have all sorts of silly bugs, but filesystem data integrity is one place where safety is sacrosanct. We set a high bar in our expectation for filesystems, not unreasonably, and after suffering from multiple dataloss incidents with Btrfs, it's clear their work did not meet our expectations. We're not even going into the performance problems here, just the data integrity aspects.
I can't say anything about the tools I use in my company. There are specialist proprietary tools available to help with some of the requirements and specifications management. I will say this: the tools themselves aren't really that important, they are just aids for convenience. The regulatory bodies don't care what tools you use. The important part is the process, of having detailed review at every level before proceeding to the next, and the same again when it comes to validation and verification activities.
Often open source projects limit themselves to some level of unit testing and integration testing, which is fine. But the coverage and quality of that testing may leave some room for improvement. It's clear that Btrfs didn't really test the failure and recovery codepaths properly during its development. Where was the individual unit testing and integration test case coverage for each failure scenario? Where the V-model goes above and beyond this is in the testing of the basic requirements and high-level concepts themselves. You've got to check that the fundamental premises the software implementation is based upon are sound and consistent.
When SwiftUI came out, I learned it by trying to figure out each question people asked about it. That made it easy to get an introduction, because everyone was new to the technology and so was interested in figuring the questions out, and the questions were simple enough because no one had much of a grasp yet.
With an old technology you don't necessarily have enthusiasts trying to "figure it out too" at that moment, so it gets more difficult.
Tonsky, the creator of the popular 'Fira Code' monospace font, has a great blog post about monitor DPI and font rendering. A big section is on just how terrible font rendering and display scaling on a modern Mac really is.
I only use a desktop Mac, so that's probably not much of an issue for me; there it's presumably just 2x scaling. Interesting how Windows can handle anything as long as the hardware supports it!
250 PPI sounds nice compared to the 220 PPI of Retina Macs. If it's only constrained by the display, that would be good. I will love buying a 3080/5950X as my new workstation. Hmm.
I run Arch on a Surface Pro 7, (ironic, I know), which has a HiDPI display and the experience is good for me under GNOME, even fractional scaling is supported.
One thing to keep in mind is that Windows/Mac/Linux do render fonts differently - they may not look identical to macOS out of the box, but there's a lot of configuration options (if you want to play with them) to get them to look the way you want.
I've read that COVID-19 can sometimes be found in the body even after testing negative for some time, and that this is a reason for continued weakness and impairment after infection.
Could this continued infection lead to faster mutation, or is the number of people infected the cause of the rapid mutations, or something else?
It's suspected that this mutation arose from a single patient who had a long case of COVID-19. Here's an interesting pre-print study on an individual (not the one from this mutation) who harboured many mutations over a 101-day period. [https://www.cogconsortium.uk/news_item/persistent-sars-cov-2...]