How exactly do we expect low rated players to play the game?

Nex

@nex said in How exactly do we expect low rated players to play the game?:

Though in FAF, if the rating discrepancy becomes too high and it would be straight up better for the lower players to give away their base and not play, then it becomes hard to evaluate their contribution; but as long as they are contributing, their rating will adjust to their real rating over time.

@clyf said in How exactly do we expect low rated players to play the game?:

This is the part I don't agree with. At a certain point bad players drop off the bottom of the influence spectrum.

Yes, but it's like that in any other game too.

clyf

No, because in the example of team slayer, bad players will continue to contribute (negatively, most likely) to the outcome of their team no matter how bad they are.

Nex

@clyf They could just stay at their spawn and do nothing; that might even be better than bad players hogging eco in FAF.
While not every bad player has "influence" on the outcome of the game (I think even in FAF bad players have at least some influence, even if they don't feel like it), they still carry an opportunity cost: someone better could have had that spot.

// I even read on Discord recently, from some ~2k player (was it zwaffel??), that being paired with a 1k against two ~1600s is just unwinnable. This is 2v2, but it translates to team games, just with the 1k's contribution being much lower.

Sylph_

I apologise in advance if people are talking about TrueSkill 2 here and I've misunderstood... I tried searching!

TrueSkill wasn't just invented to judge the Team Slayer game mode. What made people think that? Are you sure you're not talking about TrueSkill 2?

I've looked into TrueSkill (but admittedly not TrueSkill 2), and the search results I'm getting about it now are full of explanations that really seem to miss how much the Elo system already does!
There are straightforward ways to get a normal distribution out of the statistics, but without doing anything needlessly complicated (and ultimately pointless) you can summarise what TrueSkill added to Elo as simply as this:

TrueSkill was designed to be basically the Elo system with three major differences (and they really do 'oversell' how different it is from Elo, imo!):

One "change" (if you can even call it that) was to make the K-factor a lot more granular and dynamic. (The K-factor is the uncertainty used in Elo systems: essentially how much you're 'gambling' on the current game. It's typically higher for new players and lower for veterans, to help zero in on skill quickly.) Many chess organisations already did this to varying extents, so it's hard to even call it a TrueSkill change, but TrueSkill does seem to enlarge the K-factor after unexpected results, so I'm giving them credit there... This really isn't as clever as it sounds: you just go from a K-factor of '24 for a new player, 12 for someone we know' to 'K-factor is 24 to start and goes down by 5% every time a match ends, unless it didn't end how we predicted, in which case it goes up by 5%'... That level of simplicity works just fine (though Microsoft 'tarted it up' a bit)!
Another rather simple change was to factor in team games (and it's really not hard to make the Elo system do this: we're talking addition and division with an adjustment factor based on rating 'spread', though many games also add a separate factor for premades).
Third (and I think this is both the biggest and the most 'brilliant' change): it was designed to make players 'feel' better about their rating by ensuring that it trends upwards at the start of a player's career. This change is even easier to achieve: you just hide their actual Elo and instead display whatever rating you're relatively sure they're better than! (We've already done all the work with the dynamic K-factor, which, as I said, many Elo systems already did!)
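As a rough illustration of how little machinery those three changes need, here is a minimal Python sketch. It is not TrueSkill (which tracks a mean μ and an uncertainty σ per player and updates them by approximate message passing on a factor graph); it only follows the simplified description above. The starting K of 24 and the 5% step are the post's illustrative numbers, the function names are hypothetical, and the z = 3 multiplier mirrors TrueSkill's conservative estimate μ − 3σ.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of A against B (logistic curve, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def dynamic_k_update(rating, k, opp_rating, score, step=0.05):
    """Change 1 (hypothetical rule from the post): shrink K by 5% after an
    expected result, grow it by 5% after an upset. score is 1 (win) or 0 (loss)."""
    e = expected_score(rating, opp_rating)
    new_rating = rating + k * (score - e)
    # Upset = the favourite lost or the underdog won; an even match can't be one.
    upset = (e > 0.5 and score == 0) or (e < 0.5 and score == 1)
    new_k = k * (1 + step) if upset else k * (1 - step)
    return new_rating, new_k


def team_update(team_a, team_b, score_a, k=24.0):
    """Change 2: rate each team by its average rating ("addition and division");
    every member of a team gets the same rating change. No spread/premade factor."""
    avg_a = sum(team_a) / len(team_a)
    avg_b = sum(team_b) / len(team_b)
    delta = k * (score_a - expected_score(avg_a, avg_b))
    return [r + delta for r in team_a], [r - delta for r in team_b]


def displayed_rating(estimate, uncertainty, z=3.0):
    """Change 3: show a conservative lower bound rather than the estimate itself."""
    return estimate - z * uncertainty


if __name__ == "__main__":
    # An underdog win counts as an upset, so K grows instead of shrinking.
    print(dynamic_k_update(1500.0, 24.0, 1700.0, 1))
    print(team_update([1000, 2000], [1500, 1500], 0))
    # Same estimate with shrinking uncertainty: the shown number climbs from 0.
    print([displayed_rating(1500, s) for s in (500, 400, 300, 200, 100)])
```

With equal team sizes, averaging and summing rank teams identically. A real system would probably also want a floor on K so it can't decay towards zero.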

It isn't some special complex system that Microsoft designed purely for Team Slayer Halo games. Now, it wouldn't surprise me if a certain game mode used a variant, but it would shock me if a game used anything other than victory, defeat, and draw as possible after-game results (pipe up, I'm eager to learn!)
TrueSkill is essentially the Elo system, plus a bit of stuff that tons of people were already doing with the Elo system, plus an added ego boost!
@clyf TrueSkill 2 does more: from my understanding, it factors in more than wins and losses (I haven't seen the maths behind it). Again, I apologise if this is what was being discussed and I missed it! Could that be what you're talking about here?


As for why Supreme Commander can't base skill rewards on anything other than games won: @nex summed it up well when talking about what behaviour you want to reward players for, i.e. the 'intended' behaviour you want to encourage.

It's true that basing the system on a factor less binary than 'win or lose' would probably zero in on a rating faster than this, but it would be (quickly) zeroing in on the WRONG rating.

Rewarding the idiot who gladly suicides their entire army and commander to kill the red-health ACU that was about to get strat-bombed anyway?
(Or so many similar situations.) You don't have to play many MOBA games to realise just how eager players are to act this way if a number next to their name is involved!

I've written rating systems for team-based strategy games before, and tried all sorts of metrics. Games won has to be the only relevant metric for a few reasons, and if those explanations don't feel like 'enough', the best I can do is promise you: I've tried other metrics, and they don't represent actual skill nearly as well.

tl;dr: There's no point zeroing in more quickly if we're zeroing in on the wrong rating.

(And tbh, part of the success of the TrueSkill system is down to how much fun players have during the zeroing-in process! So in many ways it doesn't really need shortening!)

Nex

@sylph_ The Team Slayer mode was just an example where different rating mechanisms (KDA) might be applied. I also only specifically referred to the tests they did with TrueSkill on the Halo 2 dataset (including team games) that were mentioned in the paper.
No idea what they currently use for team slayer or what trueskill 2 is.

clyf

@nex

I even read on Discord recently, from some ~2k player (was it zwaffel??), that being paired with a 1k against two ~1600s is just unwinnable.

If it's impossible for a ~3000 team to win against a ~3200 team, then the score isn't a good predictor of composite team performance. Do you understand that?
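For reference, this is roughly what a TrueSkill-style model predicts for that 2k + 1k vs 2 × ~1.6k example. The sketch uses the common TrueSkill team win-probability formula (a team's performance is the sum of its players' Gaussian performances; the draw margin is ignored). β = 250 and σ = 75 are assumptions chosen to look FAF-like, not values read from FAF's server, and the displayed ratings are treated as means even though FAF displays a conservative μ − 3σ.

```python
import math

BETA = 250.0  # assumed per-player performance spread (FAF-like guess, unverified)


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def team_win_probability(team_a, team_b, beta=BETA):
    """P(team A beats team B) when team performance is the SUM of its players'
    Gaussian performances, as in TrueSkill. Each player is a (mu, sigma) pair."""
    delta_mu = sum(mu for mu, _ in team_a) - sum(mu for mu, _ in team_b)
    n_players = len(team_a) + len(team_b)
    total_var = n_players * beta ** 2 + sum(s ** 2 for _, s in team_a + team_b)
    return normal_cdf(delta_mu / math.sqrt(total_var))


if __name__ == "__main__":
    # sigma = 75 stands in for a "settled" player's uncertainty (assumption).
    p = team_win_probability([(2000, 75), (1000, 75)], [(1600, 75), (1600, 75)])
    print(f"{p:.2f}")
```

Under those assumptions it comes out at about 0.35: the model expects the 2k + 1k team to lose roughly two games in three, not every game.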

Nex

@clyf That is not the fault of the rating system per se, but more of the different environments the players play in: if they played more games together, the ratings would start to match their real skill for that game mode again. It's the same way some players get to a 3.6k rating.
Rating only really makes sense if there is sufficient exchange across the whole playerbase.
And of course there's the high volatility in the performance of a 1k player with a low game count: it might work out well, it might not. Also, a 50% win chance already feels like an autoloss to the average player, since for a game to feel balanced they'd need a 66% winrate, which is impossible for everyone at once.
But in this game the low-rated player has a high impact on the game; whether it's actually unwinnable is something that can and should be questioned.

Sylph_

@nex said in How exactly do we expect low rated players to play the game?:

No idea what they currently use for team slayer or what trueskill 2 is

TrueSkill 2 is a proposed system that, unlike TrueSkill, considers more metrics in its statistics (like kills) than just win/loss/draw.
Which paper you read might be very important here, since TrueSkill (1) has been released and assigns skill based ONLY on whether a game was won, lost, or drawn (no KDA, in-game score, frags or anything).
TrueSkill 2, I believe, still has its maths 'hidden' (with all the overhyping that such a move enables!). This might have changed since I last read about it a few years back, though.

As I understand it, TrueSkill 2 is geared towards FPS games with scores, which is why I thought it might be what was being referenced here regarding 'Team Slayer modes' and the like. At the very least, it's confirmed to take other factors besides a binary victory/defeat into account.

clyf

@nex

that is not the fault of the rating system per se, but more of the different environments the players play in

If there's one environment where the player is rated at 1000 and winning 50% of their games (steady state: a 4v4 of 4 × 1k vs 4 × 1k rated players), and another where a player is rated at 1000 and getting crushed (2k + 1k vs 2 × 1.5k), and no further distinction is made between those two environments, then their score is converging on a different point in each environment, and no amount of games played will make it converge on a single point.

(emphasis for emphasis, not to be a dick)
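To make the convergence argument concrete, here is a small Python sketch under a simplified Elo-style model (not FAF's actual TrueSkill setup): a team is rated by its average, everyone else's rating is held fixed, and the fixed point is where the model's expected score equals the player's actual win rate. The 20% win rate for the "getting crushed" environment is a made-up number.

```python
import math


def equilibrium_rating(teammates, opponents, true_win_rate):
    """Own rating at which a team-average Elo model expects exactly
    `true_win_rate`, holding everyone else's rating fixed (a simplification)."""
    n = len(teammates) + 1  # team size including the player
    opp_avg = sum(opponents) / len(opponents)
    # Invert the Elo curve: expected score p needs a gap of 400*log10(p/(1-p)).
    needed_team_avg = opp_avg + 400.0 * math.log10(true_win_rate / (1.0 - true_win_rate))
    # Solve (own + sum(teammates)) / n == needed_team_avg for the own rating.
    return n * needed_team_avg - sum(teammates)


if __name__ == "__main__":
    # Environment 1: 4 x 1k vs 4 x 1k, winning half the time.
    print(equilibrium_rating([1000, 1000, 1000], [1000] * 4, 0.5))
    # Environment 2: paired with a 2k against 2 x 1.5k, winning (hypothetically) 20%.
    print(round(equilibrium_rating([2000], [1500, 1500], 0.2)))
```

The same player has a fixed point of 1000 in the first environment and about 518 in the second, so a single number can only settle if the mix of environments they play in stays stable.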

Nex

@clyf My point was not that the statement "2k/1k vs 1.5k/1.5k is unwinnable" is true, but that the opposite of your statement "low-ranked players have no influence on the match" also occurs, and the truth lies somewhere in between.
The reason most 2k+ players don't play together with <1.5k players is that it immediately ruins the game, which would not be the case if those players had no impact. Lower-rated players are just a lot more inconsistent than high-level players, which makes a game more of a coin flip.

clyf

@nex

My original statement in regards to how TrueSkill evaluates expected team performance:

At higher level games, where players have the knowledge and wherewithal to identify and exploit weaknesses, it seems the weak link is the deciding factor.

At lower level games, where players are on average less able to identify and exploit weaknesses, it seems the strong link is.

To refine the entire point: TrueSkill assumes that the relationship between individual performance and composite team performance is mathematically linear, while empirical evidence in FAF suggests that it is not.
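The weak-link/strong-link idea can be made concrete with a power (generalised) mean. This is just one possible non-linear aggregate, picked for illustration; neither TrueSkill nor FAF uses it. With exponent 1 it reduces to the plain average, which for equal team sizes ranks teams the same way TrueSkill's linear sum does; strongly negative exponents approach the team's weakest player and strongly positive ones its strongest. Which exponent would fit which rating band is an open assumption that would need data.

```python
import math


def power_mean(ratings, p):
    """Generalised mean of positive ratings.

    p = 1     -> arithmetic mean (linear aggregate)
    p -> -inf -> minimum rating ('weak link decides')
    p -> +inf -> maximum rating ('strong link decides')
    """
    if any(r <= 0 for r in ratings):
        raise ValueError("power mean needs positive ratings")
    if p == 0:
        # The limit p -> 0 is the geometric mean.
        return math.exp(sum(math.log(r) for r in ratings) / len(ratings))
    return (sum(r ** p for r in ratings) / len(ratings)) ** (1.0 / p)


if __name__ == "__main__":
    for p in (-8, 1, 8):  # arbitrary exponents, for illustration only
        print(p, round(power_mean([1000, 2000], p)), round(power_mean([1500, 1500], p)))
```

For the 1k/2k team the three exponents give roughly 1090, 1500 and 1835, while the 1.5k/1.5k team stays at 1500 under all of them, so the same lobby goes from "even" to lopsided in either direction depending only on the assumed exponent.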

clyf

Okay, here's our problem:

I said:

your kills and deaths being reflected directly in the game-deciding score

You said:

I can tell you that is wrong. The only relevant metric is whether you won

What you thought I was saying was:

your kills and deaths are used to calculate your TrueSkill rating

... whereas what I actually meant is:

Your kills and deaths [are summed with the kills and deaths of your teammates] to determine the score [which determines who wins the game, which is then used to calculate TrueSkill]

At no point did I say that any metric other than win/lose was used in the TrueSkill calculation.

Nex

@clyf said in How exactly do we expect low rated players to play the game?:

At no point did I say that any metric other than win/lose was used in the TrueSkill calculation.

Yes, but you suggest that this is a metric that shows how good you are / how much you contributed to the win, which is false as there are parts to the game that are not reflected in that score (like killing an enemy who is about to kill your teammate, which in turn generates more kills for them).

@clyf said in How exactly do we expect low rated players to play the game?:

To refine the entire point: TrueSkill assumes that the relationship between individual performance and composite team performance is mathematically linear, while empirical evidence in FAF suggests that it is not.

As soon as teamplay is involved, the performance of the team is no longer simply the sum of the individuals; otherwise it would never make a difference whether you play with a friend or with random people on your team.
But assuming a linear relationship is fine, as it evens out over enough games, unless someone intentionally plays with higher-rated players and takes the spot with the least influence on the game; but that's a problem with custom games in general. Sometimes your spot makes the difference, sometimes it doesn't. Sometimes you are even the high-rated player in a lobby.

The rating system makes a statement about the statistical distribution of wins/draws/losses over time. So a game can feel absolutely unwinnable if you play 1k/2k vs 1.5k/1.5k, but that is just part of the randomness involved, and you get that feeling because suddenly your own contribution is lower than you expect, since a lot hinges on whether the weaker player has a good day or not.

clyf

@nex

metric that shows how good you are / how much you contributed to the win, which is false as there are parts to the game that are not reflected in that score

You don't understand what a metric is.

Are you Armistice840 on the discord?

RedX

How the hell did this thread veer so far off course? Shoo.

clyf

I wanted to downvote you but I'm going to make the right choice and bail on this whole conversation instead.

Sylph_

@clyf I get what you're saying.
Many team-game systems, on top of just adding up the total skill of each team, also add modifiers for various factors (I'm sure you know this, but I'm just detailing what I'm talking about before exploring further)... E.g. modifying the calculated 'rating' for a team, both for matchmaking and for post-game skill adjustments, when there's a large rating spread (1000+3000 vs 2000+2000).
Another common adjustment is for premade teams (huge in MOBA games).

I understand your (Clyf's) point about how such modifiers might be harder to design given the nature of high-skill vs low-skill games, and how a large rating spread can be an advantage in one place and a disadvantage in another; but I don't think it's impossible to overcome this by plugging in more variables... E.g. make the adjustment based on both the spread and the average rating, or even factor in maximum or minimum rating.
I don't envy whoever tries to do it though! I'd expect people to have vastly different opinions on where to start, and I'm not sure whether there's enough data to find a perfect 'maths only' approach!
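One way this suggestion could look, as a deliberately naive Python sketch: start from the team average and add a spread term whose sign flips with the lobby's average rating, so spread helps in low-level lobbies (the strong link carries) and hurts in high-level ones (the weak link gets exploited). Every constant here (`coef`, `low`, `high`) is a hypothetical placeholder that would have to be fitted to real game data, and nothing like this is claimed to exist in FAF.

```python
def adjusted_team_rating(ratings, lobby_mean, coef=0.1, low=800.0, high=2000.0):
    """Team average plus a spread term whose sign depends on the lobby's level.

    At or below `low` each point of spread adds `coef`; at or above `high`
    it subtracts `coef`; in between the weight is interpolated linearly.
    All constants are hypothetical placeholders.
    """
    mean = sum(ratings) / len(ratings)
    spread = max(ratings) - min(ratings)
    t = min(max((lobby_mean - low) / (high - low), 0.0), 1.0)  # 0 at low, 1 at high
    weight = coef * (1.0 - 2.0 * t)
    return mean + weight * spread


if __name__ == "__main__":
    # Sylph's example, 1000+3000 vs 2000+2000, in a lobby averaging 2000.
    print(adjusted_team_rating([1000, 3000], 2000), adjusted_team_rating([2000, 2000], 2000))
```

For that example it gives 1800 vs 2000, so the spread team is treated as the underdog; the same spread in a lobby averaging 600 would instead count in the team's favour.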

I assumed FAF already used these kinds of modifiers, but I admit that was perhaps a bad assumption to make! It might be worth asking a developer about it?

Nex

@clyf said in How exactly do we expect low rated players to play the game?:

Are you Armistice840 on the discord?

I take that as an insult.

But anyway

@clyf said in How exactly do we expect low rated players to play the game?:

I wanted to downvote you but I'm going to make the right choice and bail on this whole conversation instead.

Guess we can just agree to disagree on the performance of the rating system.

@redx said in How exactly do we expect low rated players to play the game?:

How the hell did this thread veer so far off course? Shoo.

Not sure if we actually went that far off.
The whole discussion about the rating system came up because higher-rated players put rating requirements on their games, since (they think) mixing with lower-rated players destroys their experience. So we were discussing whether this is a problem with the rating system (i.e. a 2v2 of 1k/2k vs 1.5k/1.5k is truly just unbalanced) or whether lower-rated players just inherently play less reliably.
If this were simply a problem with the rating system, it could be changed. Then games with larger rating discrepancies would be more balanced and thus more fun.

@sylph_ said in How exactly do we expect low rated players to play the game?:

make the adjustment based on both the spread, and the average rating, or even factor in maximum or minimum rating.
I don't envy whoever tries to do it though! I'd expect people to have vastly different opinions on where to start, and I'm not sure whether there's enough data to find a perfect 'maths only' approach!

Yeah, while this sounds like a nice idea, I don't think there is a mathematical solution that is consistent in its predictions (even less so when you go a step further and try to incorporate map slot influence).

@sylph_ said in How exactly do we expect low rated players to play the game?:

I assumed FAF already used these kinds of modifiers, but I admit that was perhaps a bad assumption to make! It might be worth asking a developer about it?

I think there is a premade bonus for TMM, but nothing aside from that.

clyf

@nex

I apologize, it was a low question.

I agree, I don't think we strayed too far from the main topic.

A metric is a quantifiable measure of the system, but in practice it does not (and cannot) account for every element of a system. Win/lose, kills:deaths, the number given by the TrueSkill system: none of these account for everything that can happen, yet all are metrics.

Instead of belaboring this point further, I think the way forward is to experiment with either A. introducing additional metrics into the score calculation a la TrueSkill 2 or B. modifying the score calculation function to something other than a linear sum. Let's conclude here until we have something to discuss in that regard.

Nex

@clyf said in How exactly do we expect low rated players to play the game?:

A. introducing additional metrics into the score calculation a la TrueSkill 2

I think that's off the table (ftx also often rants about how this would be a bad idea), because players play the rating system: whichever metric you use to calculate their rating, the players will try to maximize it, and any metric aside from win/loss will inevitably warp the game and might even cause toxicity within the team (kills/eco/quick wins/score: all of these will lead to certain kinds of abuse).

@clyf said in How exactly do we expect low rated players to play the game?:

B. modifying the score calculation function to something other than a linear sum

While good in theory (as I already mentioned in my response to sylph_), I don't believe there is a mathematically sound solution to this, as it is very opinion-based.

I think the problem is also that in custom games your contribution-to-rating ratio is very "random", since there is no control over how/where you got that rating and how it compares to what you are playing now.
And in ladder the sample sizes are quite low, since there aren't that many games played and almost no high-level players queue ladder/TMM.

Sidenote:
@clyf said in How exactly do we expect low rated players to play the game?:

A metric is a quantifiable measure of the system, but in practice it does not (and cannot) account for every element of a system. Win/lose, kills:deaths, the number given by the TrueSkill system: none of these account for everything that can happen, yet all are metrics.

I guess calling these metrics is wrong on my part.
Mathematically, the metric we want is the player's ability to play the game and cooperate well with their team, because that directly influences the game's outcome.
But we can't measure that directly, so we approximate it using other "metrics", and any "metric" where A > B in the skill metric but A <= B in the approximation metric is just a wrong metric to me.
For win/loss you could probably argue that this is also the case for certain games, but it will average out given enough games. So a player who consistently wins more games than he loses is better than his opponents were, while a player who consistently has a high KDA is not necessarily better than his opponents were.
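"It will average out given enough games" can be put in numbers with a plain binomial model. This is a minimal Python sketch assuming independent games at a fixed win probability (real matchmaking guarantees neither), and the 55% figure is a hypothetical example: it computes how likely such a player is to still sit at or below a 50% win rate after n games.

```python
import math


def prob_not_ahead(p_win, n_games):
    """Probability that a player who wins each game independently with
    probability p_win still has a win rate of 50% or less after n_games."""
    log_p, log_q = math.log(p_win), math.log(1.0 - p_win)
    total = 0.0
    for k in range(n_games // 2 + 1):  # every win count k with k / n_games <= 0.5
        # Binomial term C(n, k) p^k q^(n-k), computed in log space to avoid overflow.
        log_term = (math.lgamma(n_games + 1) - math.lgamma(k + 1)
                    - math.lgamma(n_games - k + 1) + k * log_p + (n_games - k) * log_q)
        total += math.exp(log_term)
    return total


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, round(prob_not_ahead(0.55, n), 3))
```

The probability drops from roughly a coin flip at 10 games to well under 1% at 1000, which is the sense in which win/loss evens out, and also why the low ladder sample sizes mentioned above matter.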
(Just so you understand where I'm coming from. We should probably cut this discussion here and accept we have different definitions/assumptions)