How exactly do we expect low rated players to play the game?

@nex

I apologize, it was a low question.

I agree, I don't think we strayed too far from the main topic.
 
A metric is a quantifiable measure of the system, but in practice does not (and cannot) account for every element of a system. Win/lose, kills:deaths, the number given by the TrueSkill system--none of these account for everything that can happen, yet all are metrics.

Instead of belaboring this point further, I think the way forward is to experiment with either A. introducing additional metrics into the score calculation a la TrueSkill 2 or B. modifying the score calculation function to something other than a linear sum. Let's conclude here until we have something to discuss in that regard.
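To make option B a bit more concrete, here is a minimal sketch, assuming the "linear sum" means adding up the individual ratings on a team to get one team value. The `power_mean_total` function and its exponent `p` are hypothetical and only for illustration; this is not what FAF actually runs, and it assumes non-negative ratings.

```python
from typing import Sequence


def linear_sum(ratings: Sequence[float]) -> float:
    """Team strength as the plain sum of member ratings."""
    return float(sum(ratings))


def power_mean_total(ratings: Sequence[float], p: float = 2.0) -> float:
    """Team strength as team size times the power mean of member ratings.

    p = 1 reproduces the linear sum; p > 1 gives more weight to the
    strongest players on the team. Assumes every rating is >= 0.
    """
    n = len(ratings)
    return n * (sum(r ** p for r in ratings) / n) ** (1.0 / p)


# Two hypothetical teams with the same rating sum.
even_team = [1100.0, 1100.0]
stacked_team = [2000.0, 200.0]
print(linear_sum(even_team), linear_sum(stacked_team))
print(power_mean_total(even_team), power_mean_total(stacked_team))
```

With `p = 2` the stacked team comes out stronger than the even team although both sum to 2200. Whether that matches real game outcomes is what would need testing.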

@clyf said in How exactly do we expect low rated players to play the game?:

A. introducing additional metrics into the score calculation a la TrueSkill 2

I think that's off the table (ftx also often rants about how this would be a bad idea), because players play the rating system: whichever metric you use to calculate their rating, the players will try to maximize it. Any metric aside from win/loss will inevitably warp the game and might even cause toxicity within the team. (Kills, eco, quick wins, score: all of these will lead to certain kinds of abuse.)

@clyf said in How exactly do we expect low rated players to play the game?:

B. modifying the score calculation function to something other than a linear sum

While good in theory (as I already mentioned in my response to sylph_), I don't believe there is a mathematically sound solution to this, as it is very opinion based.

I think the problem is also that in custom games your contribution-to-rating ratio is very "random", since there is no control over how/where you got that rating and how it compares to what you are playing now.
And in ladder the sample sizes are quite low, since there aren't that many games played and almost no high-level players queue ladder/tmm.

Sidenote:
@clyf said in How exactly do we expect low rated players to play the game?:

A metric is a quantifiable measure of the system, but in practice does not (and cannot) account for every element of a system. Win/lose, kills:deaths, the number given by the TrueSkill system--none of these account for everything that can happen, yet all are metrics.

I guess me calling these metrics is wrong🤔
Mathematically the metric we want is the player's ability to play the game and cooperate with their team well, because that directly influences the game's outcome.
But we can't measure that directly, so we approximate it by using other "metrics", and any "metric" where A > B (in the skill metric) and A <= B (in the approximation metric) is just a wrong metric to me.
For win/loss you could probably argue that this is also the case for certain games, but it will average out given enough games. So a player that consistently wins more games than he loses is better than his opponents were, while a player that consistently has a high kda is not necessarily better than his opponents were.
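The "wrong metric" condition above can be written as a simple check. This is only a sketch: the true skill values cannot be observed in practice, so every number below is made up, and `rank_inversions` is a hypothetical helper name.

```python
from itertools import permutations
from typing import Dict, List, Tuple


def rank_inversions(
    skill: Dict[str, float], proxy: Dict[str, float]
) -> List[Tuple[str, str]]:
    """Return ordered pairs (a, b) where a has the higher true skill
    but the proxy metric does not rank a above b."""
    return [
        (a, b)
        for a, b in permutations(skill, 2)
        if skill[a] > skill[b] and proxy[a] <= proxy[b]
    ]


# Made-up values: "skill" is the unobservable quantity we actually care about.
skill = {"A": 3.0, "B": 2.0, "C": 1.0}
kda = {"A": 1.0, "B": 2.5, "C": 0.5}       # B farms kills but is not the best player
win_rate = {"A": 0.6, "B": 0.5, "C": 0.4}  # long-run win rate follows skill here

print(rank_inversions(skill, kda))
print(rank_inversions(skill, win_rate))
```

In this toy data the kda proxy misorders A and B, while the long-run win rate produces no inversions.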
(Just so you understand where I'm coming from. We should probably cut this discussion here and accept we have different definitions/assumptions)

Systems that revolve around in-game metrics are used, but the firms that use them specifically black-box the exact calculations, and people can only guess which metrics are used and how they are weighted. This isn't exactly an option in an open source project, so you're immediately going to have people making maps that bias gameplay toward boosting their rating, or playing in a way that does.

Too much player freedom on FAF to ever allow metric-based ratings.

@nex

I don't know, kind of feels like we're exiting the vampire castle.

A rigorous analytical solution for a better team rating is beyond my mathematical ability without some heavy research, but that doesn't mean there isn't one floating around out there.

Also, given the setup we have, it's possible to silently test alternative systems (TrueSkill was tested in the same way against Elo, with each system graded on the percentage of wins it predicted correctly).
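A rough sketch of that kind of grading, assuming each past match is stored as the two teams' pre-game ratings plus the index of the winning team. `prediction_accuracy` and the sample matches are made up for illustration; a real test would replay each system's rating updates over the match history instead of taking the ratings as given.

```python
from typing import Callable, Iterable, Sequence, Tuple

# (team 0 ratings, team 1 ratings, index of the winning team: 0 or 1)
Match = Tuple[Sequence[float], Sequence[float], int]


def prediction_accuracy(
    matches: Iterable[Match],
    team_strength: Callable[[Sequence[float]], float],
) -> float:
    """Fraction of matches where the team with the higher strength won.

    Matches where both teams get the same strength are skipped, because
    the candidate system makes no prediction for them.
    """
    correct = 0
    counted = 0
    for team_a, team_b, winner in matches:
        strength_a, strength_b = team_strength(team_a), team_strength(team_b)
        if strength_a == strength_b:
            continue
        counted += 1
        predicted = 0 if strength_a > strength_b else 1
        if predicted == winner:
            correct += 1
    return correct / counted if counted else 0.0


# Invented match history, purely to show the call shape.
history = [
    ([1500.0, 1500.0], [2000.0, 800.0], 1),
    ([1300.0, 1200.0], [1000.0, 1250.0], 0),
    ([900.0, 900.0], [1000.0, 1000.0], 1),
]

print(prediction_accuracy(history, sum))  # candidate 1: linear sum of ratings
print(prediction_accuracy(history, max))  # candidate 2: best player only
```

The system with the higher accuracy over a large enough history would be the better predictor; three invented games obviously say nothing either way.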

Mathematically the metric we want is the player's ability to play the game and cooperate with their team well

I would refer to that as a goal/outcome--but now I know where you're coming from. And I agree, that is the outcome I'm looking for.

I can spitball about which metrics would be good for FAF (in-game score? mass efficiency? kills?), but I can't confidently say that any of them wouldn't be abused or go sideways.

@FtXCommando

A meta-level correction would be to incorporate map statistics. Map gen maps have a larger effect, maps that other people play a lot have a larger effect (and vice versa), and playing the same map over and over has a smaller effect*.

*Could grade this based on size, terrain, and distribution of mass spots to catch all the gap clones.

EDIT: After asking my friend Mr. GPT-4, looks like TrueSkill 2 is a good starting point for all the above.