Reproducing bugs is a luxury and not even close to required for analyzing and fixing issues. Even if the issue is external (hardware, antivirus, etc.), the code can be changed to be more defensive and only ever delete the original when the new data has been successfully written and verified.
The problem is you can never close the bug report if you can't reproduce. I guess, you could, as the other commenter suggests, mathematically prove that it can't happen, but otherwise you're prematurely closing it.
How do you differentiate that you solved the bug and not a similar looking bug?
> the code can be changed to be more defensive and only ever delete the original when the new data has been successfully written and verified.
But this doesn't solve the problem.
- What if it is an upstream issue? They have to be connected, since they are deleting data. Maybe it is completely a bug on their end? Doesn't matter how defensive you are if the bug was "anytime an email has 'man man' and is pulled between 00:00-00:04 everything deletes" then what can you do?
- What if the user was hacked and the hacker just deleted all the data?
- What if the user was just dumb and deleted the data themselves. Either not knowingly or were embarrassed to say anything.
- What if it is another program on the user's computer that is deleting the data because of some weird unexpected collision?
I'm sure you can think of more situations that still won't solve the problem.
How do you close the report if you can not make strong guarantees that it is resolved?
A luxury? Not even close to required? You are not afraid of words! I'm not looking forward to receive a bug report from you!
Yeah, reproducing is not theoretically mathematically necessary. In theory you could prove your code is correct with formal methods¹. Now, nobody does this because it is impractical (borderline impossible), reproducing is in practice so useful as to be almost essential:
- it lets you study how your code behave in the problematic case and identify what's causing the exact issue the user is seeing
- it lets you check that your fix does indeed address the bug
I have indeed already fixed trivial bugs without reproduction cases from a vague description of a bug because I'm intimately familiar with the code and it immediately rings a bell: the cause is immediately obvious. But that's not the usual case.
> the code can be changed to be more defensive and only ever delete the original when the new data has been successfully written and verified.
What if the code is already designed like this (and I sure do hope it is currently written like that, because that's almost common sense, if not the only sensible way of moving something) but somehow fails for some currently inexplicable reason? It smells race condition to me.
In the case of the discussed bug, users have described a reproduction case that's not 100%. But someone will need to find a 100% reproduction case. Users, or devs. It will not be optional. You can't play a guessing game, attempt to fix the code and hope for the best. You might be able to actually fix the bug, but without much confidence. Best case, you'll be able to find a reproduction case after fixing the bug (that you'll probably use as a functional test), to prove you fixed the bug for this specific case you found. You'll not be 100% sure you addressed the user's case.
A bug can hide another one, so you could find and fix a bug, but the issue is still present in the user's case. You can only be sure with their reproduction case.
But I agree that it is hard to reproduce a race condition.
¹ which in practice applies to code of trivial size (static analysis), or consists in checking a model but not the actual implementation (model checking), or does apply useful checks but is not exhaustive and has false positives / negatives (static analysis), or does apply useful exhaustive checks but only on a limited number of executions (runtime verification, and we do have functional tests that serve a similar purpose in practice - and you'll actually need the reproduction case here so you have the right execution to check), or requires you to write your code in a specific language (stuff like coq) and you cross your fingers that this specific language's implementation is itself correct. In short: not applicable here.