How exactly do we expect low rated players to play the game?

@nex

If the model for team performance doesn't reflect individual performance, you're just averaging out noise. Consider team slayer: your individual performance is far more impactful (with your kills and deaths being reflected directly in the game deciding score) than two ~400 ranked players sitting in the back of an otherwise 1000-1500 ranked match.

it is less efficient at evaluating players that play in teams

What's the basis for you believing this if you don't agree with the above?

@Sylph_

all their stuff gets given to stronger players

Heavily dependent on share mode, yes. And yes, usually it's the stronger player that has the ability to finish the game out that plays a critical role.

Wow lots of discussion about score and rating here.

An important thing that makes FAF different from a shooter game (on which much of this score research has been done) is that the map - and the finite resources on the map - matter more in FAF. Everybody knows that if you kill the 900 player in a 900/1500/1900 team fighting a 1400/1400/1500 team, you are doing that team a favor, particularly in full share, as the 1900 player will make far greater use of the limited resources. I've seen at least one game in which the lower rated player gifts the higher rated player his base and hides his ACU in water for this exact reason. This is why "Share Until Death" became meta in the first place. (In unmodded SupCom, there were no restrictions on share... either you gifted your base manually when you died, and your team got it, or you didn't do this and they didn't.)

@funkoff As a somewhat new player 'looking in', the current 'share until death' mechanics seem really, really problematic!
I can see how high-rated players like it, but honestly it seems like a disaster for matchmaking.

(I have similar 'noobie' misgivings about being able to gift engineers to allies... I guess at least that makes faction balancing easy, since they're all basically the same! :D)

If you think it’s problematic or a disaster for matchmaking you haven’t realized

  1. how much of an issue d/c’s can be in faf
  2. how little map variation is allowed to make share until death remotely playable
  3. how one-note the tactics are

T2 air is considered OP in full share where killing any ACU doesn’t carry an innate 20k of mass killed in infrastructure at min 10.

@clyf said in How exactly do we expect low rated players to play the game?:

with your kills and deaths being reflected directly in the game deciding score

Without knowing much about professional Halo (is that what team slayer is referring to?), I can tell you that is wrong. The only relevant metric is whether you won; rating players based on their kills turns the team game into an FFA where there are some people you can't shoot.

@clyf said in How exactly do we expect low rated players to play the game?:

performance is far more impactful [...] than two ~400 ranked players sitting in the back of an otherwise 1000-1500 ranked match.

Impact on the game ofc differs by rating and position, but over several games there will be bad players in impactful positions in both teams.

@clyf said in How exactly do we expect low rated players to play the game?:

If the model for team performance doesn't reflect individual performance

Ofc the outcome (win or lose) depends 100% on the performance of each player, so the metric measures, to a certain degree, the performance of each player no matter how much they contributed. As the number of players goes up, the noise (as you also mention) goes up too, but over several games that noise gets evened out by putting that player in different teams.
So if team A wins, you know that team is (probably) better, but you have no idea whether player 1 was better than 2, or 3 was better than 4. To find that out you need more games with different permutations of the players.
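
A toy sketch of the "different permutations" point, in Python. Everything here is an assumption for illustration: the hidden skill numbers are made up, and the model pretends the team with the higher summed skill always wins, which is far cruder than a real game. It plays every possible 3v3 split of six players once and records only who won.

```python
from itertools import combinations


def win_counts(skills, team_size):
    """Play every distinct split into two teams once; count wins per player.

    Toy assumption: the team with the higher summed skill always wins.
    Only the win/loss result is recorded, never any individual statistic.
    """
    n = len(skills)
    assert n == 2 * team_size
    wins = [0] * n
    # Pin player 0 to team A so each split is generated exactly once.
    for rest in combinations(range(1, n), team_size - 1):
        team_a = (0,) + rest
        team_b = tuple(i for i in range(n) if i not in team_a)
        sum_a = sum(skills[i] for i in team_a)
        sum_b = sum(skills[i] for i in team_b)
        if sum_a == sum_b:
            continue  # a draw gives nobody a win
        for i in (team_a if sum_a > sum_b else team_b):
            wins[i] += 1
    return wins


# Hypothetical hidden skills, listed from weakest to strongest.
skills = [400, 800, 1000, 1200, 1500, 1900]
wins = win_counts(skills, team_size=3)
print(wins)  # prints [2, 4, 4, 6, 6, 8]
```

Ten games are enough to sort the players into the right order, but not to separate neighbours like the 800 and the 1000. That takes more games, or a model that uses how surprising each result was.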

@blackyps said in How exactly do we expect low rated players to play the game?:

@redx said in How exactly do we expect low rated players to play the game?:

How are people going to get better when they're lucky to get 1 or 2 decent games in a night? I understand not wanting to play with terrible players, but we also need to be realistic about how many players are actually on at a given time and what's realistic to expect if we want a healthy playerbase.

Low rated people are the biggest rating bracket. I don't know why we don't see loads and loads of noob lobbies. Maybe these people don't feel confident to host a lobby. Maybe they all play in the matchmaker instead, after all that was a big reason to create it. But you already said that the matchmaker is dead in your timezone, so honestly I have no idea where the low rated players in your timezone are hiding. But they must be somewhere.

I'm not sure where all the low rated players go either. Even in all welcome lobbies it's usually a bunch of 1k-1600 players and a couple 800s or so. Maybe they all hide in gap, I tend to just ignore gap lobbies altogether.

Side note, can we keep the full share debate to another thread and keep from sidetracking this one too far?

@nex

"professional" Halo
team slayer
I can tell you that is wrong

Well, good that you A. don't have the faintest idea what I'm talking about, B. couldn't take 30 seconds to google it, and C. were still confident enough to tell me that I'm wrong!

I'd imagine this "lost" group of low rated players are just fighting AI. Stress free, toxicity free, etc. Venturing into playing against other people is a big step.

@clyf I googled 🤔 that's how I got that "team slayer" is some halo gamemode where 2 teams compete to get 50 kills first and it fit into the rest of your statement, so I assumed that's what you meant.
But it doesn't matter which game this is about. 4v4 FAF is about the first team to reach 4 ACU kills, but rating players based on how many ACUs they killed would be stupid.

@nex

but rating players based on how many ACUs they killed would be stupid

Exactly! Why does rating by number of kills make perfect sense for team slayer, but would be stupid for FAF?

@clyf said in How exactly do we expect low rated players to play the game?:

Why does rating by number of kills make perfect sense for team slayer

It doesn't because it disregards players actually playing in a team.
what if you have some supportive role that allows your team to get more kills?
What if you secure important map positions that hold good weapons?
Just by rating players by this metric you change the game from a team game into a single player target shooting game, where your "teammates" are your opponents and your "opponents" are just targets you are competing to shoot. That's where a lot of toxicity in online teamgames comes from (see league of legends and the term "killsteal").
Players never play what you believe your game is, they play what the rating system tells them the game is. So if your rating system tells them the more kills you get the better, then they will do everything to get more kills themselves and won't do anything else that could help their team. Some would even start shooting teammates if that didn't have penalties, just to get more kills for themselves.
Consider hurdling: the runners don't avoid the obstacles because the game's designer envisioned it so, but because there's a penalty on their rating when they don't.

The rating system is the actual game in which you compete. The "game" is just the means to do so.

what if you have some supportive role that allows your team to get more kills?
What if you secure important map positions that hold good weapons?

My brother, the name of the gamemode, the gamemode that TrueSkill was invented to judge, is slayer. It is judged by the number of kills you get, and how few kills you give up to the enemy team.

The good weapons allow you to get more kills. Not a ton of support interactions in team slayer (CTF, oddball, another story). Definitely different in other shooters, but in Halo you support your team by helping to kill your opponents before they can kill your teammates.

TrueSkill judges team performance as a sum of individual performance because, in a game where you get points by killing the other team and the other team gets points by killing you, team performance literally just is a sum of individual kills and deaths. That doesn't carry over to an RTS, where map texture is much more important.
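
For reference, a minimal sketch of what "team performance as a sum of individual performance" looks like in the TrueSkill model, in Python. Each player's performance is treated as a normal distribution around their mean `mu` with spread `sigma` plus a per-game noise term `beta`, and the team's performance is just the sum. The `sigma` and `beta` values below are placeholders picked for illustration, not FAF's actual settings, and draws are ignored.

```python
import math


def win_probability(team_a, team_b, beta):
    """Probability that team A outperforms team B under a sum-of-players model.

    team_a, team_b: lists of (mu, sigma) tuples, one per player.
    beta: per-player performance noise (assumed value, see the text above).
    Draw margin is ignored in this sketch.
    """
    delta_mu = sum(mu for mu, _ in team_a) - sum(mu for mu, _ in team_b)
    variance = sum(sigma ** 2 + beta ** 2 for _, sigma in team_a + team_b)
    # Normal CDF of delta_mu / sqrt(variance), written with math.erf.
    return 0.5 * (1.0 + math.erf(delta_mu / math.sqrt(2.0 * variance)))


# The 900/1500/1900 vs 1400/1400/1500 teams mentioned earlier in the thread,
# with a hypothetical sigma of 100 for everyone and a hypothetical beta of 250.
mixed = [(900, 100), (1500, 100), (1900, 100)]
even = [(1400, 100), (1400, 100), (1500, 100)]
print(win_probability(mixed, even, beta=250))  # prints 0.5
```

Both teams sum to 4300, so a pure sum model calls this game a coin flip regardless of how the rating is spread inside each team. That is exactly the assumption being argued about here.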

But we're getting way off track here. Why do you think TrueSkill is not as efficient at evaluating players on teams if you don't agree with any of the above?

@clyf like I said above: you only have data on the performance of the team, not the individuals, and there is no 100% correct algorithmic way to tell how much someone contributed to the win. So to rate a single player accurately you need them to play with different teams.
So to find the correct rating for a player in team games, that player will need to play more games than they would need to if they only played 1v1.

You need to play more team games to rate a player accurately, but playing more team games doesn't guarantee you can rate a player accurately.

More games is not the issue, I think everybody recognizes that. How the system models how an individual player contributes to the success of the team is the issue.

If who your teammates are is the dominant factor for success, "your" rating will converge on the (shifting) average of the performance of who you are playing with.

@clyf said in How exactly do we expect low rated players to play the game?:

If who your teammates are is the dominant factor for success, "your" rating will converge on the (shifting) average of the performance of who you are playing with.

But who the deciding factor is differs between games and as there are bad players in both teams the team with the less bad one is still more likely to win (assuming the "good" players are about equal).
The TrueSkill test was also done on Halo 2 data. A player having 0 kills and 1 death is probably less of a contribution than a player with 10 kills and 0 deaths, but their algorithm completely disregards that: it only compares team performance and from that infers each player's new skill estimate, without creating any kind of player performance estimation from the game data. And according to their tests that worked and the ratings converged.

Though in FAF, if the rating discrepancy becomes too high and it would be straight up better for the lower players to give their base and not play, then it becomes hard to evaluate their contribution, but as long as they are contributing their rating will over time adjust to their real rating. Depending on team sizes and the actual contribution they had, this might take an infeasible number of games though. But that's basically like a Halo player just sitting at the spawn to not die, which would also prevent the system from accurately rating him. (Unless you consider that this was just the optimal play in that situation, so he is actually not that bad.)

The problem is that it is simply impossible to design a rating system that accurately evaluates a player's contribution without affecting how people play the game in unintended ways, unless your game is completely solved and you can precisely rate each move at each step. Unless you know the perfect path to victory you can't tell by how much a player diverged from it during the course of the game.

@nex

the team with the less bad one is still more likely to win

This is the part I don't agree with. At a certain point bad players drop off the bottom of the influence spectrum.

@nex said in How exactly do we expect low rated players to play the game?:

Though in FAF, if the rating discrepancy becomes too high and it would be straight up better for the lower players to give their base and not play, then it becomes hard to evaluate their contribution, but as long as they are contributing their rating will over time adjust to their real rating.

@clyf said in How exactly do we expect low rated players to play the game?:

This is the part I don't agree with. At a certain point bad players drop off the bottom of the influence spectrum.

yes, but it's like that in any other game too

No, because in the example of team slayer, bad players will continue to contribute (negatively, most likely) to the outcome of their team no matter how bad they are.

@clyf they could just stay at their spawn and do nothing, that might even be better than bad players hogging eco in FAF.
While not every bad player has "influence" on the game win (I think even in FAF bad players have at least some influence on the game even if they don't feel like that) they do have the opportunity cost of having someone better in that spot.

// Even read in Discord recently from some ~2k player (was it zwaffel??) that being matched together with a 1k against 2 ~1600 is just unwinnable. This is 2v2, but it translates to teamgames, just their contribution will be much lower.

I apologise for the following misunderstanding if people are talking about trueskill 2 here... I tried searching!

TrueSkill wasn't just invented to judge the teamslayer game mode. What made people think that? Are you sure you're not talking about trueskill2?

I've looked into trueskill (but admittedly not trueskill2), and the internet search results I'm putting in now about it are fraught with explanations that really seem to miss how much the ELO system is already doing!
There are straightforward ways to get a normal distribution out of the statistics, but without doing anything needlessly complicated (and ultimately pointless) you can summarise what trueskill added to ELO as simply as:

Trueskill was designed basically to be the ELO system, with 3 major differences (and they really do 'oversell' how different it is from ELO, imo!)

  • One "change" (if you can even call it that) was to have the 'k factor', (K-factor: the uncertainty used in ELO systems, essentially how much you're 'gambling' on the current game, typically higher for new players, lower for veterans, to help zero-in on skill quickly), to be a lot more granular and dynamic (many chess organisations did this already, to varying extent, so it's hard to even call this a trueskill change, but trueskill does seem to introduce an enlargement of the K-factor from unexpected results, so I'm giving them credit there...). This really isn't as clever as it sounds - you just go from a k-factor of '24 for a new player, 12 for someone we know', to 'K-factor is 24 to start, goes down by 5% every time a match ends, unless it didn't end how we predicted, in which case it goes up by 5%'... That level of simplicity works just fine (though microsoft 'tarted it up' a bit.)!
  • Another, rather simple change was to factor in team games (And it's really not hard to make the ELO system do this - we're talking addition and division with an adjustment factor based on rating 'spread'; though many games also add a separate factor for premades.)
  • Third (and I think this is both the biggest, and most 'brilliant' change): it was designed to make players 'feel' better about their rating, by ensuring that it trends upwards at the start of a player's career - and this change is even easier to achieve - you just hide their actual ELO, and instead display whatever rating you're relatively sure they're better than! (We've already done all the work with the dynamic K-factor, which as I said many ELO systems also already did!)
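
A rough Python sketch of the first and third bullets, to show how little machinery is involved. It uses the standard Elo expected-score formula, the "24 to start, down 5% after an expected result, up 5% after an upset" rule exactly as described above, and a displayed rating of mean minus three standard deviations, which is the conservative estimate TrueSkill shows. The `Player` class, the function names and the choice to treat an exactly even match as "not an upset" are assumptions made for the example. The team "spread" adjustment from the second bullet is left out because no formula is given for it.

```python
from dataclasses import dataclass


@dataclass
class Player:
    rating: float = 1500.0
    k: float = 24.0  # starting K-factor, taken from the example in the post


def expected_score(rating_a, rating_b):
    """Standard Elo expected score for A against B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def play(a, b, score_a):
    """Update both players. score_a is 1.0 for a win, 0.5 draw, 0.0 loss."""
    exp_a = expected_score(a.rating, b.rating)
    exp_b = 1.0 - exp_a
    for player, exp, score in ((a, exp_a, score_a), (b, exp_b, 1.0 - score_a)):
        player.rating += player.k * (score - exp)
        # Upset: the result went against the prediction for this player.
        upset = (score - 0.5) * (exp - 0.5) < 0
        # Assumed rule from the post: grow K by 5% on an upset, else shrink 5%.
        player.k *= 1.05 if upset else 0.95


def displayed_rating(mu, sigma):
    """Show a rating the system is fairly sure the player exceeds."""
    return mu - 3.0 * sigma


a, b = Player(), Player()
play(a, b, score_a=1.0)
print(a.rating, b.rating)  # prints 1512.0 1488.0
```

With a large starting uncertainty the displayed number starts low and climbs as the uncertainty shrinks, which is the "ego boost" effect: `displayed_rating(1500, 500)` is 0.0, while `displayed_rating(1500, 100)` is 1200.0.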

It isn't some special complex system that microsoft designed purely for teamslayer Halo games. Now it wouldn't surprise me if a certain gamemode used a variant, but it would shock me if a game used anything other than victory, defeat, and draw as possible after-game results (pipe up, I'm eager to learn!)
Trueskill is essentially the ELO system with a bit of stuff that tons of people were already doing with the ELO system, and an added ego boost!
@clyf Trueskill2 is doing more, factoring in more than wins and losses, from my understanding (I haven't seen the maths behind it). Again, I apologise if this is what was being discussed and I missed it! Could that be what you're talking about here?

.

As for why supreme commander can't base skill rewards on things other than games won; @nex summed it up well when talking about what behaviour you want to reward players for, aka what the 'intended' behaviour you want to encourage is.

It's true that basing the system on a factor less binary than 'win or lose' would probably zero-in on a rating faster than this, but it would be (quickly) 'zero-ing in' on the WRONG rating.

Rewarding the idiot that gladly suicides their entire army and commander to kill the red-health ACU that was about to get strat-bombed regardless?
(Or so many similar situations) : you don't have to play many moba games to realise just how eager players are to act this way if a number next to their name is involved!

I've written rating systems for team-based strategy games before, and tried all sorts of metrics. Games won has to be the only relevant metric for a few reasons, but the best I can do if those explanations don't feel like 'enough' is just saying I promise you - I've tried other metrics, they don't represent actual skill nearly as well.

tldr: There's no point 'zero-ing in' more quickly if we're zero-ing in on the wrong rating.

(And tbh, part of the success of the trueskill system is down to how players have more fun during the zeroing-in process! So it doesn't really need reducing in many ways!)