FAF Statistics Megathread 2 Statistics Boogaloo
Over the past few months, I’ve been digging into various aspects of FAF data — initially for moderation-related purposes, but the scope quickly grew into a broader analysis of FAF player activity, rating dynamics, and behavioral trends.
This thread can be considered an unofficial continuation of an earlier post by @tatsu from back in 2020, which itself was a continuation of the thread from 2012 on the old forums. See: FAF Statistics Megathread and FAF stats (old forum).
I will start with a general introduction to the data sources, goals, and some of the quirks that come with working with FAF’s historical records. I will also explain what data I have collected. If you have suggestions on further statistics or visualisations based on this data that I have not yet included, comment your suggestions and I'll have a look.
Future posts will dive into specific topics, with accompanying visualizations and summary tables.
Goals
This thread visualizes various statistics based on data pulled from the Forged Alliance Forever database through the API. This data was collected with the aim of getting a clearer understanding of the FAF player base and its interactions with moderation. The goals of this analysis were:
- Track trends in active player counts and rating distribution — who’s playing, how much, and how the landscape changes.
- Identify long-term trends in player retention, smurf behavior, and rating inflation/deflation.
- Explore moderation data to better understand the volume and nature of reports, bans, and appeals.
- Provide data-backed insights for both community discussions and FAF governance (e.g. balance, matchmaking, moderation policy).
- Improve transparency by publishing findings, allowing others to cross-check and build upon them.
This data may guide future efforts by the various FAF teams, including for example the promotion team and the moderation team.
Data description
Overview
This analysis draws on three primary datasets: players, reports, and bans, containing data starting from the beginning of the FAF project in February 2012.
Players data was collected from the API’s /data/players endpoint and contains information such as the account registration date, username, and playerID. This data was combined with select data from /data/gamePlayerStats, which records which players played in which games, and /data/leaderboardRatingJournal, which records player ratings. By combining these, further information was calculated, such as the date of the last played game, the number of games, the lifespan of the account, and the time it took for a player to play 10 games.
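As an illustration of how a derived variable such as time_to_10games could be computed from the combined data, here is a minimal sketch. The function name `nth_game_stats` and the toy timestamps are hypothetical; this is not the script that was actually used.

```python
from datetime import datetime

def nth_game_stats(create_time, game_times, n):
    """Return (time of the n-th game, days from registration to that game).

    Returns (None, None) when the account has played fewer than n games.
    """
    ordered = sorted(game_times)
    if len(ordered) < n:
        return None, None
    nth_time = ordered[n - 1]
    days = (nth_time - create_time).total_seconds() / 86400
    return nth_time, days

# Toy account: registered on 1 January, then one game per day on 2..12 January.
created = datetime(2024, 1, 1)
games = [datetime(2024, 1, day) for day in range(2, 13)]

tenth_game_time, time_to_10games = nth_game_stats(created, games, 10)
hundredth_game_time, time_to_100games = nth_game_stats(created, games, 100)
print(tenth_game_time, time_to_10games, time_to_100games)
```

Sorting the game times first means the input does not have to arrive in chronological order.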
Reports data contains all the metadata related to reports, including report status (whether the report was discarded or completed, for example), the IDs of the reporter and reported players, the report description and game, and which moderator handled the report.
Finally, information from the bans dataset contains all related metadata including the ban reason, duration, and category.
A complete list of the variables for the three datasets will be posted below.
Data acquisition
Data was collected through the FAF API using a number of custom Python scripts. Specifically, the endpoints /data/bans, /data/moderationReports, /data/players, /data/gamePlayerStats, and /data/leaderboardRatingJournal were used to collect all existing data. Together, this resulted in about 50GB of data in JSON format, which was later processed to CSV format.
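The collection scripts themselves are not shown here, but the general pattern of paging through an endpoint and flattening the JSON to CSV can be sketched as follows. Everything specific in this sketch is an assumption: `fetch_page` is a hypothetical callable standing in for the authenticated HTTP request, the response is assumed to follow the JSON:API layout (a `data` list of records with `id` and `attributes`), and the field names `login` and `createTime` are placeholders.

```python
import csv
import io

def iter_records(fetch_page, page_size=100):
    """Yield records page by page until an empty page comes back.

    fetch_page(page_number, page_size) is a hypothetical callable that
    returns the decoded JSON body of one response.
    """
    page_number = 1
    while True:
        body = fetch_page(page_number, page_size)
        records = body.get("data", [])
        if not records:
            return
        yield from records
        page_number += 1

def flatten(record):
    """Turn one {id, attributes} record into a flat dict for a CSV row."""
    row = {"id": record["id"]}
    row.update(record.get("attributes", {}))
    return row

def write_csv(records, fieldnames, out):
    """Write only the listed fields, so unwanted attributes are never stored."""
    writer = csv.DictWriter(out, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        writer.writerow(flatten(record))

# Fake two-page response used instead of a real network call.
fake_pages = {
    1: {"data": [{"id": "1", "attributes": {"login": "alice", "createTime": "2016-03-01T00:00:00Z", "email": "x"}}]},
    2: {"data": [{"id": "2", "attributes": {"login": "bob", "createTime": "2016-03-02T00:00:00Z", "email": "y"}}]},
}

def fake_fetch(page_number, page_size):
    return fake_pages.get(page_number, {"data": []})

buffer = io.StringIO()
write_csv(iter_records(fake_fetch), ["id", "login", "createTime"], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing an explicit list of field names to the writer is one simple way to make sure fields such as emails never end up in the stored files.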
Some of these endpoints cannot be fully accessed without an authorization token with at least moderator privileges, as some endpoints (the /data/players endpoint in particular) produce data that contain private information that falls under GDPR regulations. In line with GDPR guidelines on data minimization, data such as player emails or Steam links were not collected.
Unusual features of the data
Due to a number of reasons, some of the data in these datasets does not follow expectations. Examples include:
- While most playerIDs correlate with account age, this is not the case for all playerIDs. Especially for accounts made in 2012, though also at later dates—at least up to 2016—it may occur that younger accounts have a lower ID than should be expected.
- Some data on user accounts was not properly recorded until February 2016.
- While bans were given out prior to May 2017, these dates have not been recorded. Rather than having been left empty or filled with a placeholder value, these dates were apparently set to 2017-05-10. As the ban expiry date was properly recorded, some bans have a negative 'ban duration' value.
- While the time of creation for a report was recorded, the update time was not. Consequently, for reports that were discarded and thus do not have an associated ban—from which the update time can be derived based on the ban’s creation date—it is impossible to determine when they were processed.
- Because of the various data artifacts that are present in the players dataset, for several graphs the choice was made to exclude all data from before March 2016. This will be mentioned alongside the graph when applicable.
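One way to deal with the placeholder ban dates when calculating ban durations is sketched below. This is an assumption about a reasonable cleaning rule, not a description of the processing that was actually applied; the function name is hypothetical.

```python
from datetime import datetime

# Creation date that was apparently written for all bans issued before May 2017.
PLACEHOLDER_DATE = datetime(2017, 5, 10).date()

def ban_duration_days(ban_create_time, ban_expires_at):
    """Return the ban duration in days, or None when it cannot be trusted.

    None is returned for bans without an expiry date and for bans whose
    creation date equals the placeholder date, since their real start is unknown.
    """
    if ban_expires_at is None:
        return None
    if ban_create_time.date() == PLACEHOLDER_DATE:
        return None
    return (ban_expires_at - ban_create_time).total_seconds() / 86400

# A ban that expired in 2015 but carries the placeholder creation date.
legacy = ban_duration_days(datetime(2017, 5, 10, 12, 0), datetime(2015, 1, 1))
# A regular one-week ban.
regular = ban_duration_days(datetime(2020, 1, 1), datetime(2020, 1, 8))
print(legacy, regular)
```

The trade-off is that any genuine ban issued on 2017-05-10 is discarded along with the placeholders.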
Data accessibility
As it has been longstanding moderation policy to not share the details of reports and associated bans with the wider public, the datasets collected for the analysis of moderation-related statistics will not be released. If you're part of a FAF project (e.g., balance, moderation, events) or just an interested player and would be interested in access to more detailed data, feel free to reach out. I’m happy to provide anonymized raw data segments on request if you want to dig deeper or cross-check any findings.
Data analysis
Data was processed, analyzed, and visualized using a combination of Python and R scripts, with a lot of help from ChatGPT (I suck at programming). I may at some point release the relevant code on GitHub when I can be bothered to clean the code enough to be presentable.
List of collected and calculated variables:
| Dataset | Columns |
| --- | --- |
| players | player_id, create_time, last_login, username, has_accountlink, games_played, first_game_time, last_game_time, tenth_game_time, hundredth_game_time, thousandth_game_time, time_to_10games, time_to_100games, time_to_1000games, create_year |
| players_by_bracket | year_month, rating_bracket, active_players_10games |
| players_all_ratings | player_id, latest_rating |
| bans | ban_id, level, player_id, ban_create_time, ban_expires_at, ban_reason, author, revoke_reason, revoke_time, related_report, ban_revoked, ban_duration_days, units, ban_permanent, ban_category |
| reports | report_id, report_status, reporter_id, reported_user_ids, report_description, game_incident_timecode, game_id, moderator_private_note, moderator_notice, last_moderator_id, create_time, related_ban_id, last_moderator_name, ban_create_time, ban_expires_at, ban_reason, ban_revoked, time_to_ban |
| games | replay_ID, end_time, lobby_name, start_time, rated_validity, victory_condition, host_player_ID, gamePlayerStats_IDs |
| Gamestats | gamePlayerStatsIDs, rating_deviation, rating_mean, player_color, player_faction, score_time, team, replay_ID, player_ID |
Analysis: Playerbase
User registrations
Based on the data from March 2016 onwards, FAF has gained on average 2668 (+/- 678) new users every month. Month-to-month numbers fluctuate significantly, possibly in response to the SupCom:FA game being sold at a discount, or due to the effect of third-party promotions such as game casts and tournament streams.
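For anyone wanting to reproduce a figure like the monthly average, the calculation can be sketched as below. This assumes the +/- value is a sample standard deviation of the monthly counts, which is not stated above; the function name and toy dates are hypothetical.

```python
from collections import Counter
from datetime import datetime
from statistics import mean, stdev

def monthly_registrations(create_times, start=datetime(2016, 3, 1)):
    """Count registrations per (year, month), ignoring accounts created before `start`."""
    return Counter((t.year, t.month) for t in create_times if t >= start)

# Toy data: one account before the cut-off, two in March 2016, four in April 2016.
create_times = [
    datetime(2016, 2, 20),
    datetime(2016, 3, 5), datetime(2016, 3, 28),
    datetime(2016, 4, 1), datetime(2016, 4, 2), datetime(2016, 4, 15), datetime(2016, 4, 30),
]

counts = monthly_registrations(create_times)
monthly_mean = mean(counts.values())
monthly_sd = stdev(counts.values())
print(dict(counts), monthly_mean, monthly_sd)
```

Note that a month without any registrations would not appear in the counter at all, so on sparse data the zero months would need to be added explicitly.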
Graph 1: Count of created accounts by month. The data for February 2016 was not shown as its value was a significant outlier (12666).
Yearly registration numbers have been fairly stable at around 28000 since at least 2017. However, the effect of the corona pandemic is distinctly visible in the significant increase in new users in 2020.
Graph 2: Sum of created accounts by year. Peak of the pandemic lockdowns happened in 2020. Data for February 2016 not included.
Another way of visualizing this data is to show the cumulative registered users.
Graph 3: Cumulative registered users over time. Data for February 2016 not included.
From these graphs we can conclude that FAF continues to have a relatively stable influx of new players.
User activity
However, registration does not equal activity. In fact, a significant portion of the users that register an account never log in, link a Steam account, or play more than a few games. To visualize this, in the graphs below accounts are divided into 7 bins: the accounts with no Steam or GOG link, accounts that are linked but played 0 games, and accounts that played at least 1, 10, 100, 1000, and 10000 games. This data is then plotted as absolute numbers of accounts (first graph) and as the proportion of the total accounts registered that month (second graph), visualizing the shift in total number of games played for the accounts over time.
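The binning described above can be sketched as follows. The sketch assumes that each account is placed only in the highest bin it qualifies for, so that the 7 bins add up to 100% per month; the bin labels and function names are hypothetical.

```python
from collections import Counter, defaultdict

# Thresholds checked from high to low so each account lands in its highest bin.
GAME_THRESHOLDS = [10000, 1000, 100, 10, 1]

def activity_bin(has_accountlink, games_played):
    """Assign an account to exactly one of the 7 activity bins."""
    if not has_accountlink:
        return "no account link"
    for threshold in GAME_THRESHOLDS:
        if games_played >= threshold:
            return f"{threshold}+ games"
    return "linked, 0 games"

def proportions_by_month(accounts):
    """accounts: iterable of (registration_month, has_accountlink, games_played)."""
    counts = defaultdict(Counter)
    for month, linked, games in accounts:
        counts[month][activity_bin(linked, games)] += 1
    return {
        month: {name: n / sum(bins.values()) for name, n in bins.items()}
        for month, bins in counts.items()
    }

# Toy data: four accounts registered in the same month.
accounts = [
    ("2024-01", False, 0),
    ("2024-01", True, 0),
    ("2024-01", True, 3),
    ("2024-01", True, 12),
]
shares = proportions_by_month(accounts)
print(shares)
```

The raw counts inside `proportions_by_month` correspond to the absolute graph, and the returned shares to the proportional graph.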
Graph 4: User activity up to April 2025 by date of registration
Graph 5: Proportional user activity up to April 2025 by date of registration. For each month the accounts registered in that month are divided into categories based on the number of games they have played up to now. These values are plotted as a proportion of all accounts for that month.
A few things immediately jump out. Because newer accounts will have had less time to reach a lot of games, the proportion of accounts with a higher number of games decreases as the accounts get younger. Those rare few accounts with more than 10,000 games, for example, were all registered before 2020, with most of them having been registered in the first 2 years of the FAF project.
Another noteworthy observation is that the proportion of accounts without an account link (that is, accounts that were registered but were never connected to a Steam or GOG account and thus cannot play any games) has significantly increased since 2024. These accounts may not be genuine, and may instead be the result of attempts at breaking FAF services (related to the DDoS attacks FAF has been experiencing since around that time) or attempts to create accounts to flood the forums with spam and advertisements (as has happened in the past).
Users and Players
It is noteworthy that there is a large proportion of users that have played fewer than even 10 games. To look at this further, it makes more sense to only look at ‘players’, which I here define as any account that has played at least one game.
Graph 6: Proportional player activity up to April 2025 by date of registration.
This graph clearly visualizes how almost half of the playerbase has played fewer than 10 games: 44.2% of the total playerbase, and 49.5% of all players registered in the past 5 years. Moreover, this proportion seems to be increasing in more recent years, reaching 60.6% in the last year.
For larger numbers of games this could again be explained as a result of new accounts not having had the chance to play a large number of games, but given that this data only includes accounts up to the 1st of March and thus all accounts are nearly 2 months old at the time of writing, it should not be excessive to expect players to have played 10 games.
Time to reach 10 or 100 games
That is an assumption, however. To verify this, the mean and median time to reach 10 or 100 games was calculated for all accounts that had played at least that many games. Some players represented outliers, such as those that registered accounts in 2012, but did not play more than a handful of games until more than 10 years later. Such outliers were removed by only analyzing the data from players who completed 10 games within one year of registering.
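The outlier filter and the summary statistics can be sketched as below. The quartile method (`method="inclusive"`) and the use of a sample standard deviation are assumptions, since the original scripts are not shown; the function name and toy values are hypothetical.

```python
from statistics import mean, median, quantiles, stdev

def summarize_days(days, max_days=365):
    """Summarize times (in days) after dropping values above max_days."""
    kept = [d for d in days if d <= max_days]
    q1, _, q3 = quantiles(kept, n=4, method="inclusive")
    return {
        "n": len(kept),
        "mean": mean(kept),
        "sd": stdev(kept),
        "median": median(kept),
        "iqr": q3 - q1,
    }

# Toy data: five quick starters and one account that waited about 11 years.
time_to_10games = [1, 2, 3, 4, 5, 4000]
summary = summarize_days(time_to_10games)
print(summary)
```

Without the filter, the single 4000-day account would pull the mean of this toy sample to roughly 669 days, which is why the median and the cut-off matter for this kind of data.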
Table 1: Summary statistics for times to reach 10 or 100 games.
| Year | Time to 10 games (days), mean (SD) | Time to 10 games (days), median (IQR) | Time to 100 games (days), mean (SD) | Time to 100 games (days), median (IQR) |
| --- | --- | --- | --- | --- |
| 2012 | 55.8 (87.5) | 13 (58.1) | 619 (887.9) | 242.1 (599.5) |
| 2013 | 37.2 (68.9) | 7.8 (27) | 645 (925.6) | 191.6 (786.9) |
| 2014 | 42.6 (77.2) | 8.1 (31.3) | 646.7 (850.8) | 254.8 (810.5) |
| 2015 | 45.9 (81.7) | 8 (35.1) | 628 (806) | 267.1 (781.1) |
| 2016 | 33.8 (70.1) | 5.7 (18.8) | 553.1 (701.1) | 223 (732.5) |
| 2017 | 35 (71.5) | 6 (19.9) | 501.8 (637.5) | 202.4 (681) |
| 2018 | 33.7 (70.7) | 5.9 (19.1) | 433.1 (543.5) | 179.1 (587.9) |
| 2019 | 38.3 (75.6) | 6.3 (22.6) | 360.7 (444) | 166.2 (444) |
| 2020 | 31.4 (66) | 6 (18) | 304 (382.2) | 125.1 (375.2) |
| 2021 | 33.3 (68.9) | 6.2 (19.8) | 264.6 (310.7) | 132.9 (314.6) |
| 2022 | 33.4 (68.2) | 6.2 (19.8) | 213.2 (249.6) | 93.3 (276.7) |
| 2023 | 34.3 (68.3) | 7 (20.4) | 174.5 (172.6) | 104.9 (224.3) |
| 2024 | 25.4 (52.6) | 6.5 (16) | 95 (86.7) | 59.9 (92.7) |
| All years | 36.2 (71.6) | 6.8 (22.6) | 445 (646.4) | 167.8 (494.9) |

A quick reminder for those who have forgotten their stats 101 course: the standard deviation (SD) measures the spread of data around the mean (the average). A high SD means there is a lot of variation in the data. When data has a lot of variation, it is often better to look at the median instead.
The median is the value that sits at the middle of the data. In this table, it represents the number of days it takes for half of the players to reach 10 or 100 games. The interquartile range (IQR) measures the spread around this midpoint: it is the distance between the 25th percentile and the 75th percentile. Or, more simply said:
- In 2024, the median time it took for players who played at least 10 games to reach 10 games was 6.5 days, with an IQR of 16 days. This means that half of those players reached 10 games within 6.5 days after registration. Because the IQR is the distance between the 25th and 75th percentiles, the 75th percentile lies 16 days above the 25th, which puts it somewhere between 16 and 22.5 days.
- In 2024, the median time it took for players who played at least 100 games to reach 100 games was 59.9 days, with an IQR of 92.7 days. This means that half of all players that played at least 100 games played them within 59.9 days after registration, and three quarters did so within somewhere between 92.7 and 152.6 days. Incidentally, the fastest player on record to play 100 games after registration played those games within 0.2 days.
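Since the table only lists the median and the IQR, the 75th percentile cannot be read off directly; adding half the IQR to the median only works for symmetric data, which these right-skewed times are not. What can be derived are bounds: the 75th percentile equals the 25th percentile plus the IQR, and the 25th percentile lies between 0 and the median. A small sketch of that arithmetic (the function name is hypothetical):

```python
def q3_bounds(median, iqr):
    """Bounds on the 75th percentile for non-negative data.

    Q3 = Q1 + IQR with 0 <= Q1 <= median, and Q3 can never be below the median.
    """
    lower = max(median, iqr)
    upper = median + iqr
    return lower, upper

# 2024 row of Table 1.
low_10, high_10 = q3_bounds(6.5, 16)
low_100, high_100 = q3_bounds(59.9, 92.7)
print(f"10 games:  75th percentile between {low_10:.1f} and {high_10:.1f} days")
print(f"100 games: 75th percentile between {low_100:.1f} and {high_100:.1f} days")
```

Reporting the 25th and 75th percentiles themselves alongside the IQR would remove the need for this kind of estimate.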
This data shows that there is a lot of variation in how active players are, but also shows that the median player needs about 6.8 days to play 10 games, and 167.8 days to play 100 games. The data from the table was plotted in the graphs below.
Graph 7: Distribution of time it takes for players to play 10 games, by year. Median marked.
Graph 8: Distribution of time it takes for players to play 100 games, by year. Median marked.
From this data we can conclude that it is not unreasonable to expect accounts that are older than 2 months to have played at least 10 games. This means that the increase in the proportion of newly registered players in the last year that have not played at least 10 games to 60.6% is the consequence of other factors. There are several possible reasons why this percentage has increased. One immediately obvious candidate is the connectivity issues that have plagued FAF in recent years.
While it is impossible to definitively determine why players quit early on, at least without having these players fill in a survey or otherwise directly asking them why they stop, we can have a look at some other data to see if the connectivity issues are the cause. The number and proportion of single-player games, coop games, and multi-player games may offer some insight, for example. Similarly, comparing the average duration for which an account stays active between players that near-exclusively play single-player and those that near-exclusively play multi-player, as well as the change in this average over time, may be interesting. This data will be looked at later.
-
-
Great and long-waited work.
To visualize this, in the graph below accounts are divided in 7 bins: the accounts with no steam or GOG link, accounts that are linked but played 0 games, and accounts that played at least 1, 10, 100, 1000, and 10000 games.
Could you make the same graph but count only ranked (rated) games?
-
More to come over the days as I get around to finishing some graphs I already had ready, but the weather is nice this weekend so we'll see if I have the time.
@Sainse said in FAF Statistics Megathread 2 Statistics Boogaloo:
Great and long-waited work.
To visualize this, in the graph below accounts are divided in 7 bins: the accounts with no steam or GOG link, accounts that are linked but played 0 games, and accounts that played at least 1, 10, 100, 1000, and 10000 games.
Could you make the same graph but count only ranked (rated) games?
Yes, I didn't note it in the first post (I'll edit it later to include) but I also have all the end-of-game reports that include things like factions used, game validity, gametime, and whether it was ranked or not. I'll make some graphs related to that info soon. It'll be interesting to see the activity of rated games and compare them to co-op games and solo-with-AI games.