I think if we end up deleting replays (which I'm in favor of) we should have a poll on it first, since this is in my opinion a big change.
One question: would it make the replay search faster if we deleted replays?
I think it's a nice feature to be able to look at all games played all time. To see how the meta changed and to see how bad the old gods truly were. Maybe give access to the files to a database query connoisseur like arma to parse some stats, like the relative storage size of bot games, single player games, etc., to see if that fixes the issue.
@LargeMaleBennis Definitely ALL games should be archived somewhere, even if they're not available through the client.
I do rarely go back and watch some old games. I have my own replays saved on my computer but we shouldn't permanently delete any ladder matches.
Even an AI game might have meaning to some people. So even if a game is not available through the server, ALL games should be saved somewhere.
@LargeMaleBennis It wouldn't change search speed, because we would only delete the replay file, not any records in the database.
If people wish to watch AI games, it would be recent AI games. I never see anyone talking about watching AI games (even viking) but even if those people exist, it’s why I included the timeframe criteria. I sincerely don’t think the replays I mentioned would upset anyone if they were deleted.
I also don’t see the need for a poll here. We should decide what quantity of free space is “good enough” and chisel down until we meet it.
If there's a way to automatically tell if a replay desyncs then I'd be in favor of getting rid of those and replays with only a single person in them, as others have said. I don't think we should get rid of any valid replays outside of those criteria if we can help it. I'm also very curious about what it would take to increase the size of the server's disk space; it seems like that should be straightforward, assuming there's money to do so.
The only people who watch AI games are AI devs, as they often ask for replays when something happened in the game, to confirm the issue or otherwise. To be fair, with modded games, the only folks I know who care to watch old mod replays are the modders, and even then it's basically the same situation: “send replay”. (Well, I personally search scta a couple of times a week to see if any scta games were played, but I’m weird.)
I would do it differently for different categories. For example: Remove all replays less than 5 min long and older than 2 months (in case something a moderator needs to look at happened and the report is being worked on, I just assumed 2 months max)
Remove single player games after 1 month; if it's interesting the player should just save it themselves. Really no added value in keeping them for long.
Keep all rated and watchable (no desyncs) games forever, maybe put them on a different server just for storage after 5 years or so (so you can still download them if you need to, but they don't take up space on the main server).
There are probably a few more categories that I didn't think of that would make sense.
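The categories proposed above could be sketched as a small rule function. This is only an illustration: the `Replay` fields are hypothetical (the real vault schema may look different), and "2 months", "1 month" and "5 years" are approximated as 60, 30 and 5 * 365 days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Replay:
    # Hypothetical fields for illustration; not the real vault schema.
    played_at: datetime
    duration: timedelta
    player_count: int
    rated: bool
    desynced: bool


def retention_action(replay: Replay, now: datetime) -> str:
    """Return "delete", "archive" or "keep" for one replay."""
    age = now - replay.played_at

    # Games under 5 minutes are kept for 2 months so that a moderator
    # can still look at them while a report is being worked on.
    if replay.duration < timedelta(minutes=5) and age > timedelta(days=60):
        return "delete"

    # Single player games are removed after 1 month.
    if replay.player_count == 1 and age > timedelta(days=30):
        return "delete"

    # Rated, watchable (no desync) games are never deleted, but move to
    # separate storage after roughly 5 years.
    if replay.rated and not replay.desynced:
        if age > timedelta(days=5 * 365):
            return "archive"
        return "keep"

    # Categories the proposal does not mention are left alone.
    return "keep"
```

Because the rules are checked in order, a rated game shorter than 5 minutes is still deleted after 2 months, which matches "remove all replays less than 5 min long" in the proposal.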
AI games can be things like team survival games. Some of those games are posted to YouTube. People care enough to put the videos online, that means people care about the games. Maybe very few people, other than the players themselves, care about solo games involving AI. But you can't say as a general rule that nobody cares about AI games.
The cost of making a backup of the games before deleting them from the server is minimal. What even is the cost of 300GB of cloud storage? Just save them somewhere. Maybe in 5 years the cost of storage will be so small that we can restore all the replays to the server at basically no cost. Or some other way to make the replays available. There is no good reason to delete history even if most of the history is crap. Once it is deleted we can never get it back.
Do we even have a complete backup of all replay files? Just load them into a single 1TB drive and send it to a FAFer's house. 1-2 copies like this would ensure that history is not lost.
Cloud storage for our purposes is around 5€ per month per 100GB. Previous calculation said we have a growth of about 20GB per month, so the costs are constantly increasing. Just keeping everything forever is not a viable option.
Also running it on a private copy is essentially the same as deleting for 99% of the users who just see "ah, replay not available, well bad luck".
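The cost argument above can be made concrete with a little arithmetic. The sketch below uses the two figures from this post (5€ per month per 100GB, about 20GB growth per month). The 300GB starting size is an assumption taken from the number floated earlier in the thread, not a measured value.

```python
def monthly_cost_eur(size_gb: float, eur_per_100gb: float = 5.0) -> float:
    """Monthly storage cost in euros for a given archive size."""
    return size_gb * eur_per_100gb / 100


def projected_costs(start_gb: float, growth_gb_per_month: float = 20.0, months: int = 12):
    """Return (month, size_gb, monthly_cost_eur) tuples for each month."""
    rows = []
    for month in range(months + 1):
        size_gb = start_gb + growth_gb_per_month * month
        rows.append((month, size_gb, monthly_cost_eur(size_gb)))
    return rows


if __name__ == "__main__":
    # 300 GB is an assumed starting point, not a measured value.
    for month, size_gb, cost in projected_costs(300, months=24)[::6]:
        print(f"month {month:2d}: {size_gb:6.0f} GB -> {cost:5.2f} EUR/month")
```

Under these assumptions the monthly bill rises by 1€ every month, so the cost never levels off as long as nothing is deleted.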
Would it be possible to look up the storage savings from the various proposed solutions?
@Brutus5000 I wouldn't mind volunteering to be personally responsible for responding to people who want really old replay files for some reason. We wouldn't even need cloud storage, I would just keep a single 1TB or 2TB drive with zero ongoing costs to FAF. If the files were transmitted to me electronically it would be close to zero cost to FAF. That situation could then continue essentially indefinitely.
I doubt there is very much interest by players in those old replays, so we're probably talking about 1 request per month. Even if I have to spend 15 minutes a day responding to those old requests, I wouldn't mind.
Arma for The Giver 2021
arma for Archive Councilor 2021
Before deleting any replays, consider the following:
Comparison of size:
scfafreplay: 1222 KB
fafreplay-current: 260 KB (21.2% of original size)
fafreplay-zip: 196 KB (16.0% of original size)
fafreplay-7z: 141 KB (11.5% of original size)
Publish the replay vault (maybe IPFS?) so people can mirror / clone the replays for historic reasons.
Create a service that collects local replays from players to be able to repair current broken / missing ones in vault
Did some more testing and the compression numbers were a bit messed up earlier. I've updated the post now with my results.
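For anyone who wants to run a similar comparison on their own replay files, here is a minimal sketch using only the Python standard library. It is not the method used for the numbers above: `zlib` (Deflate) and `lzma` are stand-ins for zip and 7z, so the results will only roughly match what those tools produce.

```python
import lzma
import zlib


def compare_compression(raw: bytes) -> dict[str, int]:
    """Return the size in bytes of the input under each compressor."""
    return {
        "raw": len(raw),
        "deflate": len(zlib.compress(raw, 9)),      # roughly what zip does
        "lzma": len(lzma.compress(raw, preset=9)),  # roughly what 7z does
    }


def as_percent_of_raw(sizes: dict[str, int]) -> dict[str, float]:
    """Express each size as a percentage of the uncompressed size."""
    return {name: round(100 * size / sizes["raw"], 1) for name, size in sizes.items()}
```

Usage would be something like `as_percent_of_raw(compare_compression(open(path, "rb").read()))`, where `path` points to an uncompressed replay file.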
+1 on the service.
You could use some sort of cheaper object storage like AWS Glacier or Azure Archive as a place to dump it, plus a cron job to upload x GB every so often, plus some sort of proxy API to pull the data when requested (probably almost never).
EDIT: looks like wasabi could be relatively cheap https://wasabi.com/
I'm in favor of deleting them after 3 - 5 years. I don't value old replays as much because they were played with an older version of the game. Very good replays will most likely be cast. Also, if you want to watch old gpgnet replays: https://www.gamereplays.org/supremecommanderforgedalliance/replays.php?game=44
Hi guys, thanks for your feedback so far.
Fortunately we finalized some work on replay stuff that was mostly done already, so we'll compress replays with Zstandard instead of zip encoded with base64. Support will come in the next client release and we'll need to find a way to convert all the old replays. This will be a big breaking change for some tools out there, but it's the best chance we have.
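Part of the saving comes simply from dropping the base64 layer: base64 writes every 3 bytes as 4 text characters, so the stored data is about a third larger than the compressed bytes. The sketch below only illustrates that overhead. It uses `zlib` as a stand-in for the "zip" step and does not reproduce the actual replay file layout.

```python
import base64
import zlib


def encoded_sizes(raw: bytes) -> dict[str, int]:
    """Sizes in bytes before compression, after it, and after base64."""
    compressed = zlib.compress(raw)
    as_text = base64.b64encode(compressed)
    return {
        "raw": len(raw),
        "compressed": len(compressed),
        "base64": len(as_text),
    }


def base64_overhead(sizes: dict[str, int]) -> float:
    """How much larger the base64 text is than the compressed bytes."""
    return sizes["base64"] / sizes["compressed"]
```

For any input of more than a few bytes, `base64_overhead` comes out close to 4/3.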
Some words on the alternative suggested here:
While renting storage in the cloud from some cheap provider is of course possible, it massively increases development and (more importantly) maintenance effort, which I'm not willing to take on. Storage in our Hetzner datacenter would be possible, though, as it can be natively integrated into the existing server infrastructure. Nevertheless, we still strive for a cost-neutral solution if possible.
Eventually this topic will reappear in a few years again, so the discussion is not off the table forever.
We'll see which storage savings we can achieve with the current solution.
Will this break the replay parser?
@FemtoZetta said in How long should FAF keep old replays?:
Will this break the replay parser?
Perhaps, but it should not be difficult to find a Java library that can decompress the zstd file format and add that to the parser itself. If anyone is actively maintaining the parser, it should be an easy fix.
Even if there's no way to update an existing tool, it should be relatively easy to decompress the zstd files "by hand" and re-compress them in the old fafreplay file format in order to hand them off to a software tool.
Or someone could even make a software tool to convert replay files in the new (zstd) format into the old (zip) format. Automating the process of converting files from a more efficient format to a less efficient format so that old tools can use them is kind of silly, but it would solve the immediate problem of tools being broken.
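A converter like that could be very small. The sketch below is only a rough idea under loud assumptions: it ignores any header the real files carry, it treats the old format as nothing more than "compressed, then base64", and the `decompress` argument is a placeholder for whatever Zstandard decompressor is available as a bytes-to-bytes function (Zstandard is not in the Python standard library before 3.14).

```python
import base64
import zlib
from typing import Callable


def to_legacy_payload(new_payload: bytes, decompress: Callable[[bytes], bytes]) -> bytes:
    """Re-encode a replay payload in a hypothetical legacy layout.

    `decompress` is an assumed bytes-to-bytes function that undoes the
    new compression. The legacy layout here is simplified to zlib plus
    base64 and may not match the real file format.
    """
    raw = decompress(new_payload)
    return base64.b64encode(zlib.compress(raw))


def from_legacy_payload(legacy_payload: bytes) -> bytes:
    """Inverse of the legacy encoding, useful for checking round trips."""
    return zlib.decompress(base64.b64decode(legacy_payload))
```

Passing the decompressor in as an argument keeps the sketch independent of any particular Zstandard binding, so an old tool only needs the output of `to_legacy_payload`.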