@noc said in The fuck happened yesterday - short recap:
Was the database issue, where it could no longer be found, a MariaDB issue or a network routing issue (routing fuck-ups once it got exposed onto the network), or do we not know?
We do not know. But we had errors at the OS level saying "too many open files". That would explain why you can't open new connections while running applications keep theirs alive. It doesn't match whatever happened with IRC though, because there we had lots of new connections, but they instantly died again.
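For anyone who wants to check for this kind of thing on their own box, here is a minimal sketch, assuming Linux (it reads `/proc/self/fd`) and Python. The names `fd_usage` and `near_limit` are made up for illustration. It only looks at the per-process limit of the process running it, and we don't know whether the per-process or the system-wide limit was the one we hit.

```python
import os
import resource


def fd_usage():
    """Return (open_fds, soft_limit, hard_limit) for the current process.

    open_fds is None when /proc/self/fd is not available (non-Linux).
    The count is approximate: listing the directory briefly opens one
    extra descriptor itself.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        open_fds = len(os.listdir("/proc/self/fd"))
    except FileNotFoundError:
        open_fds = None
    return open_fds, soft, hard


def near_limit(open_fds, soft, threshold=0.8):
    """True if the process uses at least `threshold` of its soft fd limit."""
    if open_fds is None or soft == resource.RLIM_INFINITY:
        return False
    return open_fds >= soft * threshold


if __name__ == "__main__":
    open_fds, soft, hard = fd_usage()
    print(f"open fds: {open_fds}, soft limit: {soft}, hard limit: {hard}")
    if near_limit(open_fds, soft):
        print("warning: close to the open-files limit")
```

Something like this could run as a periodic check, so that the limit shows up in monitoring before new connections start failing.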
@clint089 said in The fuck happened yesterday - short recap:
Thx for the Post Mortem!
What I would suggest:
- Make "critical" changes when you guys have time to monitor/support, even if that means people cannot play (once a week to balance it out?)
-- There is always something that needs attention after an update
The irony here is that both changes independently would have been no-brainers. If you had asked me beforehand I would've said nothing can go wrong here.
- It would be nice if we got some info about the current state (Discord). Some players could start games where others could not, and nobody was saying "we know about issues and we are on it, we will post an update in x hours"
We mostly try to do that. Unfortunately, in this case all status checks said our services were working. And it takes some time to read the feedback aka "shit's on fire yo" and to also confirm what is actually up.
- Give the http-client a lil love, it's creating so many issues if the server is not responding/behaving as expected.
- noticed that when the replay server was not available, it crashed the client bc. of an unhandled exception.
I have no clue what you mean by http client here, but that sounds like something for the client department.
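To illustrate the kind of handling being asked for, here is a rough sketch in Python, not the actual client code. The function name `fetch_replay` and the URL are hypothetical. The idea is just that an unreachable or misbehaving server turns into a "no result" the caller can show as an error message, instead of an exception that takes the whole client down.

```python
import http.client
import urllib.request


def fetch_replay(url, timeout=5.0):
    """Download a replay; return its bytes, or None if the server is unavailable.

    URLError, HTTPError, timeouts and dropped connections are all OSError
    subclasses; HTTPException covers malformed responses.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except (OSError, http.client.HTTPException) as exc:
        # Report the problem and let the caller decide what to show the user.
        print(f"replay server unavailable: {exc}")
        return None


if __name__ == "__main__":
    # Hypothetical URL, for illustration only.
    data = fetch_replay("http://replay.example.invalid/1234", timeout=2.0)
    if data is None:
        print("Could not load the replay, try again later.")
```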
PS: You guys still looking for support to migrate?
The next milestone is getting ZFS volumes running with OpenEBS. Support on that would be nice, but it's a very niche topic, and I doubt we have experts here.