The site was down for a couple of hours. It appears that the HDD hosting the main database has developed some bad sectors. This caused database errors, which in turn affected the game servers. Unfortunately, it’s the kind of error that my automated scripts are not able to detect, so the games were inaccessible for a number of hours.
The same thing happened last month, though I was quicker to catch the problem then. So today I made some major changes to prevent this from happening again. I have moved the entire database to the other server. This means more load on that particular server, but ultimately, it’s a more stable setup. I’m going to have to replace the server with the failing HDD, which will take some time. I figured it’s probably better to provision an entirely new server instead of just replacing the hard drive (as I would likely have to re-install/re-configure everything anyway). In the meantime, the current setup looks stable enough.
I’ve also coded some new scripts, so I will be notified immediately if the same issue happens again. This should help cut potential downtime from hours to just minutes. Since I am unable to monitor everything 24/7, and since hardware components are bound to have issues eventually, I think this is a good enough solution for now.
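For anyone curious what a check like this could look like, here is a minimal sketch. It is not the actual script running on the servers. It assumes the idea is to do a real write-then-read round trip against the database instead of only checking that the process is alive, and it uses SQLite from the Python standard library as a stand-in for the real database. The names `probe_database`, `run_check`, and the `health_probe` table are made up for illustration.

```python
import sqlite3
import time
from typing import Callable, Optional


def probe_database(connect: Callable[[], sqlite3.Connection]) -> Optional[str]:
    """Return None if a write-then-read round trip works, else an error message.

    `connect` is a hypothetical factory that opens a fresh connection.
    """
    try:
        conn = connect()
        try:
            # Hypothetical scratch table used only by the health check.
            conn.execute("CREATE TABLE IF NOT EXISTS health_probe (ts REAL)")
            now = time.time()
            conn.execute("INSERT INTO health_probe (ts) VALUES (?)", (now,))
            conn.commit()
            # Read the value back so storage errors on reads surface too.
            row = conn.execute("SELECT MAX(ts) FROM health_probe").fetchone()
            if row is None or row[0] is None or row[0] < now:
                return "probe row was not read back"
            return None
        finally:
            conn.close()
    except sqlite3.Error as exc:
        # Any driver-level failure (I/O error, corruption, ...) ends up here.
        return f"{type(exc).__name__}: {exc}"


def run_check(
    connect: Callable[[], sqlite3.Connection],
    notify: Callable[[str], None],
) -> bool:
    """Run one probe and call `notify` with a message if it fails."""
    error = probe_database(connect)
    if error is not None:
        notify(f"Database health check failed: {error}")
        return False
    return True


if __name__ == "__main__":
    # Assumption: print() stands in for whatever sends the real alert.
    healthy = run_check(lambda: sqlite3.connect(":memory:"), print)
    print("database healthy:", healthy)
```

A script along these lines could be run every minute from a scheduler, with `notify` swapped for something that sends an email or a chat message. Because the probe actually writes and reads data, it has a better chance of catching storage-level problems that a simple "is the service up" check would miss.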