Problems arise in the morning
Last night we ran a big software upgrade at one of our larger customers.
I am normally not part of the update team, as 2nd Level Support handles upgrades on its own.
I talked to my department head and suggested that it would probably make sense for me to be on call in case our part of the software (the web part) ran into problems during the upgrade.
He agreed, and my alarm went off at 3:30 am. I didn’t get up at that point, but I turned up my phone’s volume so I would hear it if a call came in. Then I switched the light back off and went back to sleep.
Of course, an hour later the first call came in. Two of the three servers in the cluster were working; the oldest one was causing problems. I gave some quick instructions on what the problem might be and told them to call me again if that didn’t help.
They proceeded with the update and it looked okay.
Fast-forward another half hour.
The update had worked, but on that particular server no one could log in. I was out of ideas, so I got up and got ready for work.
It turned out that the MongoDB database we use as a cache and for session storage had somehow lost its authentication database. Our software couldn’t connect to it anymore because the username and password were no longer valid.
Since we don’t store anything important in that database, I quickly stopped the service and recreated the required users. After a restart, everything was working fine again.
The interesting part is that the MongoDB database wasn’t even touched by this upgrade.
But it works now, and that’s all that matters for the moment. If the same problem doesn’t come up during another customer’s upgrade, I will probably blame it on Windows. If you can’t reproduce it, it’s probably Windows doing something bad to our files.
Node.js and Windows Server just don’t always play well together.