Lets Talk about Rating

Caliber

Suggestion: reduce the amount of rating gained or lost from each game. For me it currently averages out to about 10 points per game.

Aim: reduce the large rating variation players can experience through randomness.

Why:

This is my rating chart. It's an example of the huge rating variation a player can experience. I haven't changed the way I play or gotten any better or worse as far as I can tell; it's purely a result of the random encounters in team games.

Reducing the points lost or gained from each game would reduce the massive variation experienced and produce a more stable and accurate representation of player skill through rating, along with more equally balanced games overall. A player would then have to play consistently well or badly to gain or lose points, rather than rating moving because of a few badly balanced games.

Brutus5000

Sure. Let's ignore all the scientific analysis by professional statisticians that Microsoft paid for to develop the TrueSkill system and trust your gut feeling instead.

Rating gain/loss is not a reward/punishment for winning/losing.

Also, the rating number you see is just a simplified approximation of what the system uses internally.

This has been discussed a million times. We are not going to fiddle with the TrueSkill algorithms, and nobody has come up with a better system than TrueSkill yet (e.g. Elo isn't better suited to our use case).
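To illustrate the "simplified approximation" point: TrueSkill keeps a mean μ and an uncertainty σ per player, and a common convention is to display the conservative estimate μ − 3σ. The sketch below assumes FAF displays that value and uses hypothetical players; it is not FAF's code.

```python
def displayed_rating(mu: float, sigma: float) -> float:
    """Conservative skill estimate (assumed display convention: mu - 3*sigma)."""
    return mu - 3 * sigma

# Hypothetical players: a brand-new account and two established ones.
new_player = displayed_rating(1500, 500)  # large uncertainty pulls the shown value down
veteran_a = displayed_rating(1700, 100)
veteran_b = displayed_rating(1550, 50)

print(new_player, veteran_a, veteran_b)
```

Both veterans show the same number while the system holds quite different beliefs about them, so the displayed value alone does not tell you how the next result will move it.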

Nomander

@Caliber The parameters were tuned based on this forum post (https://forums.faforever.com/viewtopic.php?f=45&t=11698). It's much higher effort than yours, so I'm inclined to trust it more. No offence, just letting you know what you're competing with.

Your graph doesn't have any axes. Here's your global rating for the past year (the client's graph grid is very faint unfortunately):
[Image: Caliber's global rating over the past year]

Your swing of 300 rating takes ~30 net losses (at the 10 rating change per game you quoted). I think that number of games cancels out any "random encounters" in team games. I've had large rating swings as well, and in my experience they come from changing the group of players I play with, the map type, or just being in a different mood in terms of playstyle (strong, aggressive T1-T4 unit use vs. weak, ultra-greedy eco or overly aggressive T1 spam). Maybe some of that sounds familiar to you? I don't want to thoroughly analyze your replays, but I did notice that you were playing much less 3v3/4v4 when you were rated higher in global. That could make you naturally gravitate to a playstyle suited to global mapgen games with high rating disparity among the players, which is different from the smaller, lower-eco, less imbalanced 3v3/4v4 games.
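The ~30 net games figure can be put in context with a deliberately crude model, assuming a fixed 10-point step and every game an independent 50/50 coin flip (real TrueSkill steps depend on μ, σ and the teams). Under that model, wins minus losses after N games has a standard deviation of √N:

```python
import math

def net_games_for_swing(swing: float, change_per_game: float) -> float:
    """Net wins (or losses) needed to move the rating by `swing` at a fixed step."""
    return swing / change_per_game

def swing_in_std_devs(net_games: float, games_played: int) -> float:
    """How many standard deviations a net result is if every game were a fair coin flip."""
    # Each game contributes +1 or -1 with equal chance, so the sum has variance N.
    return net_games / math.sqrt(games_played)

net = net_games_for_swing(300, 10)   # 30 net losses, as in the post
for n_games in (100, 400, 900):      # hypothetical numbers of games played
    print(n_games, round(swing_in_std_devs(net, n_games), 2))
```

This checks only the start-to-end difference; the largest peak-to-trough swing inside a long run of games is naturally bigger than the end-point spread.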

ZLO

I always thought that slightly bumping up uncertainty before every game was FAF's own invention/implementation of TrueSkill. (AFAIK that is why a very highly rated player can lose a tiny bit of rating after winning against a low-rated player.)

Why? Because there were tons of complaints about rating moving too slowly and getting very stale over time.
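The effect described above can be reproduced with the τ ("dynamics") term of a 1v1 TrueSkill update, which adds τ² to each player's variance before the game. A sketch under stated assumptions: β = 250 and τ = 15 (3% of an assumed starting σ of 500) are illustrative values, the draw margin is ignored, and the display is assumed to be μ − 3σ.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal, used for pdf/cdf below

BETA = 250.0  # assumed performance noise per player
TAU = 15.0    # assumed pre-game uncertainty bump

def win_update(winner, loser, beta=BETA, tau=TAU):
    """1v1 TrueSkill update for a decisive result (no draw margin)."""
    (mu_w, sig_w), (mu_l, sig_l) = winner, loser
    # Dynamics step: inflate both variances by tau^2 before the game.
    var_w = sig_w ** 2 + tau ** 2
    var_l = sig_l ** 2 + tau ** 2
    c = math.sqrt(2 * beta ** 2 + var_w + var_l)
    t = (mu_w - mu_l) / c
    v = STD.pdf(t) / STD.cdf(t)  # mean-update factor, tiny for an expected win
    w = v * (v + t)              # variance-update factor, also tiny here
    new_w = (mu_w + var_w / c * v, math.sqrt(var_w * (1 - var_w / c ** 2 * w)))
    new_l = (mu_l - var_l / c * v, math.sqrt(var_l * (1 - var_l / c ** 2 * w)))
    return new_w, new_l

def displayed(mu, sigma):
    return mu - 3 * sigma  # assumed display convention

strong, weak = (2000.0, 60.0), (1000.0, 60.0)  # hypothetical players
after, _ = win_update(strong, weak)
print(f"mu:    {strong[0]:.1f} -> {after[0]:.2f}")
print(f"shown: {displayed(*strong):.1f} -> {displayed(*after):.1f}")
```

The near-certain win barely moves μ or shrinks σ, so the τ bump dominates and the displayed number drops slightly even though μ went up.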

Caliber

@Brutus5000 I don't believe I mentioned any desire to change TrueSkill.

Rather, slowing it down a bit.

This would mean that, in my situation, my chart would look more like these,

without such a large change in rating so quickly.

Strydxr

I've heard enough: one gorbilgillion rating reduction for non-FAF-Elite.

IndexLibrorum

@Caliber said in Lets Talk about Rating:

@Brutus5000 I don't believe I mentioned any desire to change TrueSkill.

Rather, slowing it down a bit.

Huh?

FtXCommando

FAF’s system is the result of fiddling with the results of professional statisticians. Our tau is about triple the value those professional statisticians recommended, which is what this guy is complaining about. Still a good change, though; otherwise people would be gaining or losing 3 points a game. Are you supposed to win 30 games in a row to climb 100 rating?

Caliber

@FtXCommando said in Lets Talk about Rating:

FAF’s system is the result of fiddling with the results of professional statisticians. Our tau is about triple the value those professional statisticians recommended, which is what this guy is complaining about. Still a good change, though; otherwise people would be gaining or losing 3 points a game. Are you supposed to win 30 games in a row to climb 100 rating?

Yes, pretty much bang on. Seeing as the average rating change for a standard 10-player team game is about 10 points, and you describe the alternative as a 3-point change, perhaps a middle ground would be a suitable alternative.

Also, seeing as the outcome of the game becomes vastly more difficult to influence the more players there are, maybe scale the change more strongly with the number of players in the game. For example, I see changes in a 4v4 of around 11 points, while larger team games can still be 10. A smaller change based on the number of players may be a worthwhile thing to look into, perhaps decreasing by 2 points for each additional player per team, like this:

1v1 15 points
2v2 13 points
3v3 11 points
4v4 9 points
5v5 7 points
6v6 5 points

Obviously this example excludes player uncertainty calculations, but you get the idea.

The reasoning: the lower the player count, the more you impact the game, so the rating change would be more in line with individual performance.

@Nomander In the link you gave, the guy who created the "optimum" parameters (Axle) even states that it is very volatile, which is exactly what I'm trying to say.
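For comparison with the proposed per-team-size table: in the standard TrueSkill two-team model, team performance is the sum of player performances, so the normalizer c sums β² + σ² over every player in the game and the per-player step already shrinks as games get larger. A sketch under assumed values (β = 250, every player at σ = 100, evenly matched teams, no τ or draw margin):

```python
import math
from statistics import NormalDist

BETA = 250.0   # assumed performance noise per player
SIGMA = 100.0  # hypothetical uncertainty, identical for every player

def per_player_change(team_size, sigma=SIGMA, beta=BETA):
    """Mean shift for each winner in an even n-vs-n game (standard team model)."""
    players = 2 * team_size
    # c grows with the total number of players in the game.
    c = math.sqrt(players * (beta ** 2 + sigma ** 2))
    v = NormalDist().pdf(0) / NormalDist().cdf(0)  # even match: t = 0
    return sigma ** 2 / c * v

for n in range(1, 7):
    print(f"{n}v{n}: {per_player_change(n):5.2f}")
```

Under these assumptions the step falls as 1/√n with team size rather than linearly, and it acts on μ, not on the displayed number.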

FtXCommando

You can’t do what you’re talking about, because you’re using shown rating for these values and TrueSkill as a system doesn’t care about the shown rating value. Every TrueSkill adjustment is a tinkering of your mu and your sigma; saying to “just” adjust it by X amount based on the size of the game is impossible because TrueSkill doesn’t work like that internally.

If you wanted a lower tau, then argue for it to be 2% of sigma rather than 3%.
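How much τ matters can be sketched with an idealized mirror match: two players with equal μ and σ play repeatedly until σ settles, and we read off the per-game step at that equilibrium. Assumptions: a starting σ₀ = 500 (so "2%" and "3%" mean τ = 10 and 15), β = σ₀/2 mirroring the reference TrueSkill default ratio, 1v1, no draw margin.

```python
import math
from statistics import NormalDist

SIGMA0 = 500.0        # assumed starting uncertainty
BETA = SIGMA0 / 2     # assumed performance noise

def steady_state(tau, beta=BETA, games=20_000):
    """Return (sigma, per-game mean change) once an even 1v1 reaches equilibrium."""
    v = NormalDist().pdf(0) / NormalDist().cdf(0)  # even match: t = 0
    w = v * v                                      # w = v * (v + t) with t = 0
    sigma = SIGMA0
    for _ in range(games):
        var = sigma ** 2 + tau ** 2                # pre-game tau bump
        c2 = 2 * beta ** 2 + 2 * var               # both players share this variance
        sigma = math.sqrt(var * (1 - var / c2 * w))
    var = sigma ** 2 + tau ** 2
    change = var / math.sqrt(2 * beta ** 2 + 2 * var) * v
    return sigma, change

for pct in (0.01, 0.02, 0.03):                     # tau as a fraction of SIGMA0
    sigma, change = steady_state(pct * SIGMA0)
    print(f"tau={pct * SIGMA0:4.1f}  sigma~{sigma:5.1f}  change/game~{change:4.1f}")
```

In this idealized case the equilibrium step works out to exactly τ (the σ shrinkage per game balances the τ² bump when w = v²), so going from 3% to 2% would cut the typical step by about a third; real team games with a draw margin shift the exact numbers.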

Skrat

Hi!
If you want to reduce rating variations, play several global 1v1 rating games. For the next few months, your rating variations will look like this. xD

Nomander

@Caliber said in Lets Talk about Rating:

@Nomander In the link you gave, the guy who created the "optimum" parameters (Axle) even states that it is very volatile, which is exactly what I'm trying to say.

The purpose of TrueSkill is to correctly predict games, which includes getting players up to their skill level quickly and having the rating value be accurate enough once they reach it. If you can prove that some parameter adjustment improves its predictions, that would be convincing. Axle linked a GitHub repository for his work, so it should be possible for someone savvy to adapt it.
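A minimal sketch of how "improves its predictions" could be scored: for each game, take the predicted probability that the actual winner would win (computed from pre-game ratings) and average the negative log of it; a parameter set with lower average log loss predicts better. The ratings below are hypothetical, only 1v1 is handled, and a real comparison would first replay the full history under each candidate parameter set to regenerate the pre-game ratings.

```python
import math
from statistics import NormalDist

def win_probability(a, b, beta):
    """P(a beats b) for 1v1 under TrueSkill's Gaussian performance model."""
    (mu_a, sig_a), (mu_b, sig_b) = a, b
    c = math.sqrt(2 * beta ** 2 + sig_a ** 2 + sig_b ** 2)
    return NormalDist().cdf((mu_a - mu_b) / c)

def mean_log_loss(games, beta):
    """Average negative log-likelihood of the observed winners (lower is better).

    `games` holds (winner_rating, loser_rating) pairs, each (mu, sigma) as it
    stood before that game.
    """
    total = 0.0
    for winner, loser in games:
        total -= math.log(win_probability(winner, loser, beta))
    return total / len(games)

# Hypothetical pre-game ratings, purely for illustration.
history = [((1600, 80), (1500, 90)), ((1450, 70), (1550, 60)), ((1700, 50), (1400, 120))]
for beta in (200.0, 250.0, 300.0):
    print(beta, round(mean_log_loss(history, beta), 4))
```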

rampeer

I wanted to pick it up, as I have some expertise (I once made a program that computes Elo ratings for chess puzzles) and wanted to contribute to FAF somehow.

But I realized that it's hard to devise any sort of controlled experiment. Some players get better over time and some get worse, so expecting the rating to be stable is wrong, and it's impossible to differentiate between rating drift due to misconfiguration and genuine changes in skill. Maybe let different AIs battle each other?

But my hunch is that the current change per game is too high. These zigzags are not normal; just look at what a rating graph looks like in chess:

People stay in FAF for many thousands of games, so I do not buy the "it's for people to get to their real rating quicker" argument.

Also, a 4.5% draw probability is just wrong. Who has that many draws?
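One way around the lack of a controlled experiment is a synthetic one: give a simulated player a fixed, known strength, so any movement in their rating is noise by construction, and measure how that noise depends on the step size. A sketch using plain Elo as a stand-in (not FAF's TrueSkill), with every opponent assumed to be exactly as strong as the player:

```python
import random
import statistics

def simulate_elo(k, games=20_000, seed=1):
    """Std dev of an Elo rating whose true strength never changes.

    Every game is a true 50/50, so all rating movement is noise; k is the step size.
    """
    rng = random.Random(seed)
    rating, opponent = 1500.0, 1500.0  # opponents fixed at the player's true level
    history = []
    for i in range(games):
        expected = 1 / (1 + 10 ** ((opponent - rating) / 400))
        score = 1.0 if rng.random() < 0.5 else 0.0  # true win chance is 50%
        rating += k * (score - expected)
        if i >= 1_000:  # skip a burn-in period
            history.append(rating)
    return statistics.pstdev(history)

for k in (10, 20, 40):
    print(k, round(simulate_elo(k), 1))
```

Under this model the spread grows roughly with √k, so a smaller step buys a flatter graph at the cost of slower ramp-up; the same harness could be pointed at a TrueSkill update with different τ values.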

waffelzNoob

That's because chess doesn't have wildcard teammates and opponents, where a player rated 1800 can perform anywhere between 1200 and 2000. Global rating varies so much because the performance of your opponents and teammates is almost completely random. If you play 1v1 ladder, like in chess, your rating won't go up and down by more than 100.

rampeer

I do not believe you. The 1v1 rating graph looks just as jagged as global:

I am certain it's possible to sacrifice a bit of ramp-up time (note the spike on the left) for an overall smoother graph and more stable rating.

Also, what about draw probability? It feels like all the coefficients are off; will look into it later.

maudlin27

One problem is when rating covers a wide range of games as one number.

This is most obviously an issue with global: you can play like a 1700 on one popular map and like an 800 on another.
Making rating take far longer to adjust in turn makes it far harder for people to try different games/maps, and could hurt retention. It also takes longer for a returning player's rating to reflect that they're not as good after a multi-year gap, playing only infrequently compared to when they were playing constantly.

So to me it feels better to err on the side of faster rating adjustments than slower (outside of new players). I also don't see a smoother graph as being that big a benefit compared to the downsides.

waffelzNoob

@rampeer said in Lets Talk about Rating:

I do not believe you. The 1v1 rating graph looks just as jagged as global:

I am certain it's possible to sacrifice a bit of ramp-up time (note the spike on the left) for an overall smoother graph and more stable rating.

Also, what about draw probability? It feels like all the coefficients are off; will look into it later.

That is a graph that deviates no more than 100 rating up and down, as I said. The rating is fairly solid around 1500-1700, and that is normal because humans are inconsistent. One day we play well, one day we don't. One day we run into an opponent who's having a good day, one day we don't. The same thing happens in chess, where you also lose/gain 10 rating per game, by the way. And there is no problem with this system, because going on a 10-game win streak and only getting 50 rating sucks.

And this 100-rating deviation is an entirely different scenario from what Caliber described with his global rating: he lost 400, not 200 (I now see the 400 was actually exaggerated; he was 1850 and dropped to 1550, not 1900 to 1500).

Sainse

@rampeer said in Lets Talk about Rating:

Also, a 4.5% draw probability is just wrong. Who has that many draws?

FAF estimates the draw probability at 10%. The math behind it is already mentioned above. A lower draw expectation would actually increase the jumps.
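For reference, TrueSkill turns the configured draw probability p into a draw margin ε, the performance gap small enough to count as a draw, via ε = Φ⁻¹((p + 1)/2) · √n · β, where n is the number of players in the game. A sketch for the two figures mentioned in this thread, with β = 250 assumed and a 1v1:

```python
import math
from statistics import NormalDist

def draw_margin(draw_probability, beta, total_players=2):
    """Draw margin implied by a draw probability (standard TrueSkill relation)."""
    return NormalDist().inv_cdf((draw_probability + 1) / 2) * math.sqrt(total_players) * beta

for p in (0.045, 0.10):  # the two draw probabilities quoted above
    print(p, round(draw_margin(p, beta=250.0), 1))  # beta = 250 is an assumption
```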

Caliber

One other point I would like to raise, which I forgot about in the opening post: in my experience, most games seem to be won or lost quite heavily. Although most games are rated at 90%+ balance, they often go very one-sided; I would say only around 1 in 10 games actually feels well balanced and lasts a while.
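Assuming the lobby's balance percentage is TrueSkill's match-quality measure, it is computed from μ and σ alone, and TrueSkill models only win, loss or draw, not the margin of victory, so a high-quality pairing says nothing about whether the game will be a stomp. A sketch of the 1v1 quality formula with β = 250 assumed and hypothetical ratings:

```python
import math

def match_quality(a, b, beta=250.0):
    """TrueSkill 1v1 match quality in [0, 1]; a and b are (mu, sigma)."""
    (mu_a, sig_a), (mu_b, sig_b) = a, b
    c2 = 2 * beta ** 2 + sig_a ** 2 + sig_b ** 2
    # Uncertainty caps quality below 1; a gap in means lowers it further.
    return math.sqrt(2 * beta ** 2 / c2) * math.exp(-((mu_a - mu_b) ** 2) / (2 * c2))

even = match_quality((1500, 100), (1500, 100))      # identical means
lopsided = match_quality((1500, 100), (1800, 100))  # 300-point gap in means
print(round(even, 3), round(lopsided, 3))
```

Even two players with identical means and σ = 100 score below 100% here, because uncertainty alone lowers the measure.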
