How long should FAF keep old replays?
-
I disagree on "only valid ranked games" because the 'pros' unrank most of their games, tourney games included.
-
Remove a replay if it is:
- Older than 5 years
and one or more of the following is true:
- Involves AI
- Has too many desyncs
- Has only 1 party
If that isn’t enough cleanup, we can apply more stringent requirements.
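A rough sketch of that rule as a predicate, in case it helps to see the AND/OR structure spelled out. The `ReplayInfo` fields and the `MAX_DESYNCS` cutoff are made up for illustration; they are not FAF's actual database schema, and "too many desyncs" has no agreed number yet.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical thresholds; only the 5-year figure comes from the post.
MAX_AGE = timedelta(days=5 * 365)  # "older than 5 years"
MAX_DESYNCS = 10                   # placeholder for "too many desyncs"


@dataclass
class ReplayInfo:
    """Hypothetical per-replay metadata, not the real FAF schema."""
    played_at: datetime
    involves_ai: bool
    desync_count: int
    party_count: int


def should_remove(replay: ReplayInfo, now: datetime) -> bool:
    """Old AND (AI game OR too many desyncs OR only one party)."""
    is_old = now - replay.played_at > MAX_AGE
    is_low_value = (
        replay.involves_ai
        or replay.desync_count > MAX_DESYNCS
        or replay.party_count <= 1
    )
    return is_old and is_low_value
```

Note that a recent AI game and an old ordinary game both survive; only replays that are old and fall into at least one of the three categories are removed.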
-
I think the old games are important historical sources and should be kept. 5 years is an arbitrary criterion. Maybe clean up all single player games, all AI games, and all games with players under 1000 rating (these players don't care about the game enough to watch old shit anyway) and keep the rest.
-
How easy is it to get more disk space? If it's just a matter of buying a 4TB HDD then I could probably stretch to that
-
I think if we end up deleting replays (which I'm in favor of) we should have a poll on it first, since this is in my opinion a big change.
-
One question: would it make the replay search faster if we deleted replays?
I think it's a nice feature to be able to look at all games played of all time, to see how the meta changed and to see how bad the old gods truly were. Maybe give access to the files to a database query connoisseur like arma to parse some stats, like the relative storage size of bot games, single player games, etc., to see if that fixes the issue.
-
@LargeMaleBennis Definitely ALL games should be archived somewhere, even if they're not available through the client.
I do rarely go back and watch some old games. I have my own replays saved on my computer but we shouldn't permanently delete any ladder matches.
Even an AI game might have meaning to some people. So even if a game is not available through the server, ALL games should be saved somewhere.
-
@LargeMaleBennis It wouldn't change search speed, because we would only delete the replay file, not any records in the database.
-
If people wish to watch AI games, it would be recent AI games. I never see anyone talking about watching AI games (even viking), but even if those people exist, that’s why I included the timeframe criterion. I sincerely don’t think the replays I mentioned would upset anyone if they were deleted.
I also don’t see the need for a poll here. We should decide what quantity of free space is “good enough” and chisel down until we meet it.
-
If there's a way to automatically tell if a replay desyncs then I'd be in favor of getting rid of those, and of replays with only a single person in them, as others have said. I don't think we should get rid of any valid replays outside of those criteria if we can help it. I'm also very curious about what it would take to increase the server's disk space; it seems like that should be straightforward, assuming there's money to do so.
-
The only people who watch AI games are AI devs, who often ask for replays when something happened in the game, to confirm the issue or rule it out. Tbf, with modded games, the only folks I know who care to watch old mod replays are the modders, and even then it's basically the same situation: “send replay”. (Well, I personally search scta a couple of times a week to see if any scta games were played, but I’m weird.)
-
I would do it differently for different categories. For example: Remove all replays less than 5 min long and older than 2 months (in case something a moderator needs to look at happened and the report is being worked on, I just assumed 2 months max)
Remove single player games after 1 month; if it's interesting the player should just save it themselves. Really no added value in keeping them for long.
Keep all rated and watchable (no desyncs) games forever, maybe put them on a different server just for storage after 5 years or so (so you can still download them if you need to, but they don't take up space on the main server).
There are probably a few more categories that I didn't think of that would make sense.
-
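For illustration, the per-category idea above could be written as a small function that returns one of three actions. This is only a sketch: the parameter names are hypothetical, "archive" stands for "move to a separate storage server", and any category the post does not mention is simply kept.

```python
from datetime import timedelta


def retention_action(
    age: timedelta,
    duration: timedelta,
    player_count: int,
    rated: bool,
    desynced: bool,
) -> str:
    """Return 'delete', 'archive' or 'keep' for one replay (hypothetical policy)."""
    # Very short games: keep 2 months in case a moderator report is pending.
    if duration < timedelta(minutes=5) and age > timedelta(days=60):
        return "delete"
    # Single player games: keep 1 month, players can save their own copy.
    if player_count == 1 and age > timedelta(days=30):
        return "delete"
    # Rated, watchable games are never deleted, only moved after ~5 years.
    if rated and not desynced:
        return "archive" if age > timedelta(days=5 * 365) else "keep"
    # Categories the post does not cover stay untouched.
    return "keep"
```

The order of the checks matters: a rated game under 5 minutes would be deleted by the first rule before the "keep forever" rule is reached, which may or may not be what is wanted.
-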
AI games can be things like team survival games. Some of those games are posted to YouTube. People care enough to put the videos online, that means people care about the games. Maybe very few people, other than the players themselves, care about solo games involving AI. But you can't say as a general rule that nobody cares about AI games.
The cost of making a backup of the games before deleting them from the server is minimal. What even is the cost of 300GB of cloud storage? Just save them somewhere. Maybe in 5 years the cost of storage will be so small that we can restore all the replays to the server at basically no cost. Or some other way to make the replays available. There is no good reason to delete history even if most of the history is crap. Once it is deleted we can never get it back.
Do we even have a complete backup of all replay files? Just load them into a single 1TB drive and send it to a FAFer's house. 1-2 copies like this would ensure that history is not lost.
-
Cloud storage for our purposes is around 5€ per month per 100GB. Previous calculation said we have a growth of about 20GB per month, so the costs are constantly increasing. Just keeping everything forever is not a viable option.
Also, keeping them on a private copy is essentially the same as deleting them for 99% of the users, who just see "ah, replay not available, well bad luck".
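To put rough numbers on that, here is a small projection using the two figures above (about 5€ per 100GB per month, about 20GB growth per month). The 300GB starting size is an assumption borrowed from the figure mentioned earlier in the thread, not a measured value, and the flat price ignores any traffic or request fees.

```python
EUR_PER_100GB_MONTH = 5    # "around 5 EUR per month per 100GB"
GROWTH_GB_PER_MONTH = 20   # "growth of about 20GB per month"


def monthly_cost_eur(start_gb: float, months_from_now: int) -> float:
    """Monthly bill once the vault has grown for `months_from_now` months."""
    size_gb = start_gb + GROWTH_GB_PER_MONTH * months_from_now
    return size_gb * EUR_PER_100GB_MONTH / 100


def cumulative_cost_eur(start_gb: float, months: int) -> float:
    """Total paid over `months` months, billing each month at its end-of-month size."""
    return sum(monthly_cost_eur(start_gb, m) for m in range(1, months + 1))


if __name__ == "__main__":
    start = 300  # GB; assumed starting size, see the note above
    for years in (0, 1, 5):
        cost = monthly_cost_eur(start, 12 * years)
        print(f"after {years} y: {cost:.2f} EUR/month")
    print(f"first 5 years in total: {cumulative_cost_eur(start, 60):.2f} EUR")
```

Under those assumptions the bill goes from 15€ per month today to 75€ per month after five years, because the vault would have grown from 300GB to 1500GB.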
-
Would it be possible to look up the storage savings from the various proposed solutions?
-
@Brutus5000 I wouldn't mind volunteering to be personally responsible for responding to people who want really old replay files for some reason. We wouldn't even need cloud storage, I would just keep a single 1TB or 2TB drive with zero ongoing costs to FAF. If the files were transmitted to me electronically it would be close to zero cost to FAF. That situation could then continue essentially indefinitely.
I doubt there is very much interest by players in those old replays, so we're probably talking about 1 request per month. Even if I have to spend 15 minutes a day responding to those old requests, I wouldn't mind.
-
Arma for The Giver 2021
-
arma for Archive Councilor 2021
-
Before deleting any replays, consider the following:
- New format for .fafreplay: a compressed archive of the scfareplay file and the JSON data, without the final b64 encoding.
  Comparison of size:
  - scfareplay: 1222 KB
  - fafreplay-current: 260 KB (21.2% of original size)
  - fafreplay-zip: 196 KB (16.0% of original size)
  - fafreplay-7z: 141 KB (11.5% of original size)
- Publish the replay vault (maybe IPFS?) so people can mirror / clone the replays for historic reasons.
- Create a service that collects local replays from players to be able to repair currently broken / missing ones in the vault.
EDIT:
Did some more testing and the compression numbers were a bit messed up earlier. I've updated the post now with my results.
-
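A minimal sketch of the "compressed archive without base64" idea, for anyone who wants to reproduce that kind of size comparison. It is not the actual .fafreplay format: the entry names `metadata.json` and `game.scfareplay` are made up, and Python's `zipfile` with `ZIP_LZMA` only stands in for 7z here (LZMA is the algorithm family 7z uses by default, but the container differs).

```python
import io
import json
import zipfile


def pack_replay(metadata: dict, replay_bytes: bytes, method: int) -> bytes:
    """Pack JSON metadata and the raw replay stream into one in-memory zip."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=method) as archive:
        archive.writestr("metadata.json", json.dumps(metadata))  # hypothetical name
        archive.writestr("game.scfareplay", replay_bytes)        # hypothetical name
    return buffer.getvalue()


def unpack_replay(blob: bytes) -> tuple[dict, bytes]:
    """Inverse of pack_replay: return (metadata, raw replay bytes)."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        metadata = json.loads(archive.read("metadata.json"))
        return metadata, archive.read("game.scfareplay")


if __name__ == "__main__":
    # Synthetic, highly repetitive stand-in data; real replays compress differently.
    fake_replay = b"unit_move 12 34;" * 50_000
    meta = {"uid": 1, "map": "example"}
    methods = (("deflate", zipfile.ZIP_DEFLATED), ("lzma", zipfile.ZIP_LZMA))
    for name, method in methods:
        size = len(pack_replay(meta, fake_replay, method))
        print(f"{name}: {size} bytes ({100 * size / len(fake_replay):.1f}% of raw)")
```

Running this over a sample of real replay files instead of the synthetic data would give numbers comparable to the table above.
-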
+1 on the service.
You could use some sort of cheaper object storage like AWS Glacier or Azure Archive as a place to dump it + a cron job to just upload xGB every + some sort of proxy API to pull the data when requested (probably almost never).
EDIT: looks like Wasabi could be relatively cheap: https://wasabi.com/