I have two kinds of nits with this logic, but I could totally be wrong, and you should feel absolutely free to correct me.
I'm fairly positive that a unit of measure should _never_ be variable, otherwise it's fairly pointless. And if you don't think hashing a 1TB string takes significantly longer than a 100-byte string ...
Furthermore, big O notation is supposed to be an upper bound, and I don't think it works well if it's not actually an upper bound. If you were to tell your bosses something took constant time, and it took an hour for string A (1TB) and 100ms for string B (10B), I'm pretty sure your opinion wouldn't mean much after that.
The hashmap should absolutely be considered in terms of the hash function, because it's the longest-running part of the algorithm. To do otherwise is disingenuous.
Using that logic, I could call any iterative algorithm O(1) since the top level function only gets called once.
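To make that concrete, here's a hypothetical sketch (the function name is mine) of why counting top-level calls tells you nothing about the work done:

```python
def total(xs):
    """One top-level call, but the running time still grows with len(xs)."""
    s = 0
    for x in xs:  # the loop does len(xs) units of work, call count notwithstanding
        s += x
    return s
```

Summing a million-element list is still one call to `total`, but a million additions; by the "it only gets called once" logic it would be O(1), which is clearly wrong.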
I think the problem you would have with your boss is that your boss asked 'how long will this program run?', and if you told them O(1), the question you are really answering is 'what is the time complexity of this algorithm, parameterised by the number of entries, as the number of entries tends towards infinity?'.
If your boss really wanted O(), then they wouldn't care that hashing one key takes a day and another a second, because they're thinking in terms of a hash with infinite entries, so the difference between a day and a second to hash is irrelevant.
If you released software that was exponential, but your QA department only ever tested small inputs, I think it would be negligent to withhold from your boss and/or clients the rate at which run-time could grow.
and I think it would make for a horrible manager not to care about the difference between a day and a second.
To implement the hash table you wouldn't have to hash the whole string... of course this will depend on the data that you are trying to store. Assuming that the data is random, whether it's 100 bytes or 100 terabytes, you only need to figure out which bucket the data is saved in. You could still base it on this concept if slightly modified.
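A rough sketch of that idea (the name, sample size, and bucket count are my own invention, and it assumes roughly random data as described above):

```python
def sampled_bucket(data: bytes, sample_size: int = 64, buckets: int = 100) -> int:
    """Pick a bucket from a fixed-size prefix plus the length, so the cost
    of bucket selection doesn't grow with the total size of the data.
    Only sensible for roughly random data: inputs sharing a long common
    prefix and length would all land in the same bucket."""
    return hash((len(data), data[:sample_size])) % buckets
```

Note this makes the *bucket selection* O(1), at the price of worse collision behaviour on structured data; it doesn't make hashing in general constant-time.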
Yes. A valid hash function certainly can be defined as h: N -> N, h(n) = n % 100.
But that certainly cannot be considered a reasonable hash function. A string is basically an array of bytes (or of code points, once decoded).
To have any decent property (like producing different outputs for minuscule changes in the input), you have to touch every element in the array.
For custom objects, yes, you don't have to hash every property, but for strings, yeah, the hash function will almost always depend on the length of the string.
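For illustration, a standard string hash like FNV-1a (just one common choice, not the only one) really does touch every byte:

```python
def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: one XOR and one multiply per byte, so O(len(data))."""
    h = 0xCBF29CE484222325  # FNV-1a 64-bit offset basis
    for b in data:
        h ^= b
        h = (h * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF  # FNV prime, mod 2**64
    return h
```

Because every byte feeds into the result, a one-byte change anywhere in the string changes the output, and the cost is linear in the string's length, which is exactly the point being made above.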