How long should FAF keep old replays?
-
Currently the server disk is slowly filling up (~50 GB left before trouble).
Of the 900 GB of disk space in use, 660 GB is replays: old replays going back to 2014, broken replays, desynced replays, replays of invalid games, replays of single players, replays against AI and so on.
We could try to move them somewhere else, but is anybody really interested in that? I'd rather move on to some sort of cleanup policy, such as "don't keep replays older than 3 years".
What are your thoughts on that?
-
A lot of the games are single-player, so if we deleted all replays that have 1 player and are older than 1 year, we would hopefully reduce disk usage enough without removing the ability to watch older multiplayer games. I do sometimes watch older games, even from 2015-17, because that's what you need to do to actually watch some really high-rated 1vs1 games that were not part of some tourney.
-
Remember that the current replay format is a bit wasteful and we can cut down replay size by 40% or so with better compression. Patches to the Java client and the current replay server that do just that have been ready for a while. Maybe we could merge at least the Java client patches? This way we could switch to the new format once, say, 1.5.0 is out.
Other than that I vote for throwing out all replays (currently) older than 5 years.
And all dualgap / thermo / astro replays while we're at it
-
@Tagada But essentially you are only interested in valid ranked games, right? Out of our 9.2 million games only 3 million are valid ranked games.
-
I disagree on "only valid ranked games" because the 'pros' unrank most of their games and tourney games.
-
Remove if all are true:
- Older than 5 Years
and if one or more of the following is true:
- Involves AI
- Has too many desyncs
- Has only 1 party
If that doesn’t free up enough space, we can apply more stringent requirements.
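A minimal sketch of how that rule could be expressed, in case it helps to make the logic unambiguous. The `Replay` record, its field names and the `max_desyncs` threshold are all hypothetical; they are not taken from the actual FAF database, and "too many desyncs" would still need a real number.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Replay:
    # Hypothetical fields for illustration, not the actual FAF schema.
    played_at: datetime
    has_ai: bool
    desync_count: int
    party_count: int


def should_delete(replay: Replay, now: datetime, max_desyncs: int) -> bool:
    """Delete only if the replay is older than 5 years AND at least one
    of the secondary conditions (AI, desyncs, single party) holds."""
    # 5 years approximated as 5 * 365 days (assumption).
    too_old = now - replay.played_at > timedelta(days=5 * 365)
    low_value = (
        replay.has_ai
        or replay.desync_count > max_desyncs
        or replay.party_count <= 1
    )
    return too_old and low_value
```

Tightening the policy later would then only mean changing the age limit or adding another condition to `low_value`.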
-
I think the old games are important historical sources and should be kept. 5 years is an arbitrary criterion. Maybe clean up all single-player games, all AI games and all games with players under 1000 rating (these players don't care about the game enough to watch old shit anyway) and keep the rest.
-
How easy is it to get more disk space? If it's just a matter of buying a 4TB HDD then I could probably stretch to that
-
I think if we end up deleting replays (which I'm in favor of) we should have a poll on it first, since this is in my opinion a big change.
-
One question: would it make the replay search faster if we deleted replays?
I think it's a nice feature to be able to look at all games ever played, to see how the meta changed and how bad the old gods truly were. Maybe give access to the files to a database query connoisseur like arma to parse some stats, such as the relative storage size of bot games, single-player games, etc., to see if that fixes the issue.
-
@LargeMaleBennis Definitely ALL games should be archived somewhere, even if they're not available through the client.
I do rarely go back and watch some old games. I have my own replays saved on my computer but we shouldn't permanently delete any ladder matches.
Even an AI game might have meaning to some people. So even if a game is not available through the server, ALL games should be saved somewhere.
-
@LargeMaleBennis It wouldn't change search speed, because we would only delete the replay file, not any records in the database.
-
If people wish to watch AI games, it would be recent AI games. I never see anyone talking about watching AI games (even viking) but even if those people exist, it’s why I included the timeframe criteria. I sincerely don’t think the replays I mentioned would upset anyone if they were deleted.
I also don’t see the need for a poll here. We should decide what quantity of free space is “good enough” and chisel down until we meet it.
-
If there's a way to automatically tell if a replay desyncs, then I'd be in favor of getting rid of those and of replays with only a single person in them, as others have said. I don't think we should get rid of any valid replays outside of those criteria if we can help it. I'm also very curious about what it would take to increase the server's disk space; it seems like that should be straightforward, assuming there's money to do so.
-
The only people who watch AI games are AI devs, who often ask for replays when something happened in a game, to confirm the issue or rule it out. Tbf, the same goes for modded games: the only folks I know who care to watch old mod replays are the modders, and even then it's basically the same “send replay” situation (well, I personally search for SCTA a couple of times a week to see if any SCTA games were played, but I’m weird).
-
I would do it differently for different categories. For example: Remove all replays less than 5 min long and older than 2 months (in case something a moderator needs to look at happened and the report is being worked on, I just assumed 2 months max)
Remove single player games after 1 month; if it's interesting the player should just save it themselves. Really no added value in keeping them for long.
Keep all rated and watchable (no desyncs) games forever, maybe put them on a different server just for storage after 5 years or so (so you can still download them if you need to, but they don't take up space on the main server).
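Roughly, the three categories above could be sketched like this. Everything here is an assumption for illustration: the function name and parameters are made up, months are approximated as 30 days and years as 365 days, and the order in which the rules are checked (deletion rules first) is just one possible choice.

```python
from datetime import timedelta


def retention_action(
    duration: timedelta,
    age: timedelta,
    player_count: int,
    rated: bool,
    desynced: bool,
) -> str:
    """Return "delete", "archive", "keep" or "undecided" for one replay."""
    # Very short games: remove once older than 2 months (~60 days).
    if duration < timedelta(minutes=5) and age > timedelta(days=60):
        return "delete"
    # Single-player games: remove once older than 1 month (~30 days).
    if player_count == 1 and age > timedelta(days=30):
        return "delete"
    # Rated and watchable games: keep forever, move to archive storage
    # after about 5 years.
    if rated and not desynced:
        return "archive" if age > timedelta(days=5 * 365) else "keep"
    # Anything else belongs to a category not covered by the rules above.
    return "undecided"
```

Replays that come back as "undecided" are exactly the ones that still need a category of their own.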
There are probably a few more categories that I didn't think of that would make sense.
-
AI games can be things like team survival games. Some of those games are posted to YouTube. People care enough to put the videos online, that means people care about the games. Maybe very few people, other than the players themselves, care about solo games involving AI. But you can't say as a general rule that nobody cares about AI games.
The cost of making a backup of the games before deleting them from the server is minimal. What even is the cost of 300GB of cloud storage? Just save them somewhere. Maybe in 5 years the cost of storage will be so small that we can restore all the replays to the server at basically no cost. Or some other way to make the replays available. There is no good reason to delete history even if most of the history is crap. Once it is deleted we can never get it back.
Do we even have a complete backup of all replay files? Just load them into a single 1TB drive and send it to a FAFer's house. 1-2 copies like this would ensure that history is not lost.
-
Cloud storage for our purposes is around 5€ per month per 100 GB. A previous calculation showed growth of about 20 GB per month, so the costs would keep increasing. Just keeping everything forever is not a viable option.
Also, keeping them on a private copy is essentially the same as deleting for 99% of the users, who just see "ah, replay not available, well, bad luck".
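To put numbers on the cost figures above, here is a small sketch of the arithmetic. It assumes strictly linear pricing (no tiers, no traffic fees) and constant growth, which is a simplification; the function name and parameters are made up for illustration.

```python
def monthly_cost_eur(
    start_gb: float,
    months: int,
    growth_gb_per_month: float = 20.0,
    eur_per_100gb: float = 5.0,
) -> float:
    """Monthly storage bill after `months` months of linear growth."""
    size_gb = start_gb + growth_gb_per_month * months
    return size_gb * eur_per_100gb / 100.0


def cumulative_cost_eur(start_gb: float, months: int) -> float:
    """Total paid over the first `months` months, billed at the size
    reached at the start of each month (assumption)."""
    return sum(monthly_cost_eur(start_gb, m) for m in range(months))
```

If all 660 GB of current replays were moved to such storage, this gives about 33€ per month today and about 45€ per month a year later, with the bill rising by 1€ every month after that.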
-
Would it be possible to look up the storage savings from the various proposed solutions?
-
@Brutus5000 I wouldn't mind volunteering to be personally responsible for responding to people who want really old replay files for some reason. We wouldn't even need cloud storage, I would just keep a single 1TB or 2TB drive with zero ongoing costs to FAF. If the files were transmitted to me electronically it would be close to zero cost to FAF. That situation could then continue essentially indefinitely.
I doubt there is very much interest by players in those old replays, so we're probably talking about 1 request per month. Even if I have to spend 15 minutes a day responding to those old requests, I wouldn't mind.