I have a client whose background process crashes once or twice a month (thank you, Windows). The management dashboard checks a database and includes a warning message if the process hasn't run in the past hour. Since "someone" checks the dashboard every 5 minutes or so anyway, action can be taken to restart the background process.
But what if, for some reason, no one checks the management dashboard for hours? I was recently asked if there was a better way. "You have 2 options, either (a) upgrade to Unix and reboot once a year whether you need to or not, or (b) run a background process to see if the background process is still running." We're not spending any money on (a), so I guess it's (b).
Now I need a PHB dashboard for "someone" to check every 5 minutes to see if the background process that checks the background process is still running.
Doesn't that worry you a bit? I remember Amazon having trouble restarting machines that had been up for several years. I think they ran into several problems. Boot sector demagnetization was one. I think they also had problems with drives that would keep spinning, but had failed in such a way that once stopped, they couldn't spin up again.
Really, you should consider power cycling every 90 days or so, just to make sure you can restart. At the very least, use several different brands of hard drive in your RAID arrays.
Yeah, that is good advice. I'd do it if I cared at all about what was on them. They're just part of the "long tail" of hardware, long since migrated off of. Now we just let them run for the novelty of saying, "look, it's been up for YEARS!" One even hosts a webcam of the server room.
As a side note, every single production server we now use is a VMware instance... how do you count uptime there, as it gets passed around various physical machines?
Nice idea, but a pity they couldn't just buy an intelligent, IP-addressable power switch. That's how I manage the worst-case scenario for my servers.