Upcoming rating changes: Gaining global from playing matchmaker

Katharsas · 28 Nov 2021, 12:53

@blackyps said in Upcoming rating changes: Gaining global from playing matchmaker:

So we effectively do not trust ladder rating?

No? How do you get that impression?

Because you are assuming that ladder rating is correct and global is not when both rating contexts have built up different predictions of skill difference between two players.

Inserting a prediction into the match that Trueskill never actually made (by replacing ONLY the opponent's rating with their ladder rating) is bonkers!

It seems like there is a misunderstanding here. I'll give you a 2v2 example: When performing the adjustment for player 1 we take the tmm ratings of everyone and replace only player 1's rating with global. Then we let trueskill rate this game and if the conditions mentioned in the first post are met, save the new global rating for player 1. Then we do this with all other players. So for player 2 everyone else will have tmm rating again and player 2's global will be used.
We could maybe perform the adjustment with using only global ratings, but we don't know the global ratings of the players at all. The only thing we know is that their tmm ratings are similar enough to give a balanced game, so we use these. Also we have more trust in the matchmaker ratings than global as I explained above.
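To make the substitution described above concrete, here is a minimal sketch of how the input ratings for one player's adjustment could be assembled. The function name, the dictionaries and the example numbers are made up for illustration and are not taken from the FAF server code; the actual TrueSkill rating step is left out on purpose.

```python
from typing import Dict, Tuple

Rating = Tuple[float, float]  # (mean, deviation)


def ratings_for_adjustment(
    player: str,
    tmm: Dict[str, Rating],
    global_: Dict[str, Rating],
) -> Dict[str, Rating]:
    """Ratings fed to TrueSkill when adjusting `player` (hypothetical helper).

    Everyone keeps their matchmaker (tmm) rating, except `player`,
    whose global rating is substituted in.
    """
    mixed = dict(tmm)
    mixed[player] = global_[player]
    return mixed


# Invented example ratings for a 2v2.
tmm = {"p1": (1200.0, 80.0), "p2": (1100.0, 90.0),
       "p3": (1150.0, 85.0), "p4": (1180.0, 75.0)}
global_ = {"p1": (1500.0, 500.0), "p2": (900.0, 120.0),
           "p3": (1500.0, 500.0), "p4": (1000.0, 100.0)}

# One separate rating pass per player, each with only that player swapped.
for name in tmm:
    print(name, ratings_for_adjustment(name, tmm, global_))
```

Each pass would then be rated as a normal game, and only the swapped-in player's result would be considered for saving.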

There is no misunderstanding here. You are using ladder rating numbers, which have no meaning inside the global rating context, as prediction to adjust the prediction of the global rating context, in fact mixing them inside the same calculation. That makes no sense.

If you adjust the global prediction you MUST only use global predictions as input! The different rating contexts gain inflation and deflation independently of each other; as I have already said, 500 in global is not guaranteed to be close to 500 in ladder!

Why is it important to know what their global ratings are?? You don't need to! Trueskill was made so you don't have to worry about this, just let it calculate inside its context and improve its own prediction!

I hope you plan to remove global at some point then because if you do it as it is currently planned you are no longer doing things that make sense for having a predictable well-behaved rating system, you are deep inside band-aid mode already.

Yes, I do not believe that global as it is now is a healthy rating system. There are too many variables players can control in custom games to influence their rating. You can see the result in people that are way higher rated than you would expect them to be. Removing global right now would make the gap, setons and astro players rise up in arms, so unless you have an idea how to solve their needs, global stays for now. In the long term people hopefully play the matchmaker more, so global becomes less relevant, but when it has become irrelevant it doesn't really matter if it gets removed or not.

And the interpolation i suggested would let you slowly phase out global by adjusting the global weight over time, which is another advantage it has.

So if i am correct this is the first time we really try to fuck with Trueskill itself!

I really fail to see what we are fucking with. Yes, this change will lead to "communication" between the ratings, but as we all initialize them with 1500+-500 I don't see a problem with that.

The problem is that every single player entering or leaving the rating context shifts that context's rating range! If a 2000 rated player stops playing global, then the global rating pool just lost 2000 rating! So right from the start the average mean rating will start to diverge. The default values only guarantee that deviation has the same meaning; mean rating is not predictably related!

The proposed change would also make simulating a change to global rating parameters with past data much harder i think, because global game outcomes are no longer enough to redo all global rating calculations.

Does anyone actually want to do this? This is a genuine question. I know that in the past it was demanded to do this to test changes to trueskill, but to me it seemed more like an ostensible reason to prevent changes to the rating system. There has been a change of the tau value in the past and I don't think it was simulated beforehand with all the rating data we have.

Yeah that Tau value was changed when we still had a person that had a lot of confidence in their own understanding of Trueskill. But if I remember correctly, Tau does not really do much, it basically just changes the average amount of mean rating gain/loss. So a very low-risk change to make.

Also, how about we (as a community) discuss such intrusive changes BEFORE they are implemented?

There has been discussion about this. I don't remember everywhere it took place, but there was discussion on the issue: https://github.com/FAForever/server/issues/845
We also had a voice call with Morax as the player councillor, some developers and some ladder team members.

So no open discussion in the forum where such things need to take place if you want others than developers and councillors to give input on these things.

Katharsas · 28 Nov 2021, 13:10

@blackyps said in Upcoming rating changes: Gaining global from playing matchmaker:

I'm sorry, but at the moment it looks more like you are the one having strange ideas about the rating system. A player winning his first match will have a mean higher than 1500; this is the way it has always been. That is not a problem, because when he keeps playing it's not like his mean will stay that way. When he loses against players with a much lower mean than him, his mean will also drastically reduce. In ladder this happens because of the rating interpolation we use for matching, in global this happens because people balance by displayed rating, so our "1600 mean concealed by high deviation" guy would play in a game with let's say average 500 players and because he is new he won't win all of these games.

So we are potentially destroying the balance of 1 to maybe 5 global games (if deviation was reduced a lot it needs to build back up to where mean rating changes are bigger) whenever a ladder player enters global so that we can clean up a shitty global mean rating that was basically produced out of thin air? Why is that an OK solution?

Why don't we just randomly generate a small number between -150 and 150, add that to starting rating and then make displayed rating in lobby always white? Simple, no discussion necessary, guaranteed to not have bad side effects.

There are various solutions to this problem, and the proposed one is not close to being necessary.

BlackYps · 28 Nov 2021, 13:11

So what specific problems do you think this introduces? I get that you think it messes with the integrity, but what issues will arise from that?

In my experience asking about complicated things on the forum doesn't really give good results. Most of the time you either get no answers or input that is not really well thought out. If there are in depth answers they are most of the times from people that are already contributing, so basically the developers and councillors you mentioned. I linked the github issue because I know that you are also a developer and more familiar with that. I don't expect regular players to browse github.
In a way the open forum discussion was the retention thread where people repeatedly mentioned that it is very hard to get let into games when you are a new player. So we started working on a solution in the typical developer media. The implementation details of the solution are not really suitable to be discussed on the forum.

I don't know in what parts of the project you are contributing to, but if you are also a server developer how did you manage to completely miss the development of that pull request?

BlackYps · 28 Nov 2021, 13:18

So we are potentially destroying the balance of 1 to maybe 5 global games (if deviation was reduced a lot it needs to build back up to where mean rating changes are bigger) whenever a ladder player enters global so that we can cleanup a shitty global mean rating that was basically produced out of thin air?

What would happen when an established ladder player started playing global is that basically he would start with his ladder rating (because his global would float up until it is more or less equal to his ladder rating. It was not produced out of thin air) instead of 0 rating (assuming his ladder is under the 1400 threshold). How would that destroy the balance of the first global games more than starting with 0?

Why don't we just randomly generate a small number between -150 and 150, add that to starting rating and then make displayed rating in lobby always white? Simple, no discussion necessary, guaranteed to not have bad side effects.

Please don't drag this thread down into shitposting territory

Katharsas · 28 Nov 2021, 13:23

I am not a developer (anymore)? That is the point.

Also that was not a shitpost, but a genuine suggestion. What are the problems with it that make you think it was shitposting?

I am just disappointed with bad and complex solutions being implemented in the face of substantially better possible solutions, two of which I have proposed here. In the past, rating related changes have always been extensively discussed in the forum prior to being implemented. Can we not at least try to discuss such things in the forum, even if we do not always end up with valuable input?

What would happen when an established ladder player started playing global is that basically he would start with his ladder rating (because his global would float up until it is more or less equal to his ladder rating. It was not produced out of thin air) instead of 0 rating (assuming his ladder is under the 1400 threshold). How would that destroy the balance of the first global games more than starting with 0?

No he would not start with his ladder rating! Because you can have the same displayed rating while having entirely different mean and deviation! Trueskill rating cannot be expressed in a single number if you want to actually manipulate it in a way that makes sense to Trueskill! For the god of love at least make the 3 conditions from initial post operate on mean rating instead of displayed rating!
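A tiny sketch of why one displayed number is not enough. It assumes the usual convention that displayed rating is mean minus three deviations, which matches a fresh 1500 ± 500 player showing as 0; the function name and the second player's numbers are invented for the example.

```python
def displayed_rating(mean: float, deviation: float) -> float:
    # Assumption: displayed rating = mean - 3 * deviation.
    return mean - 3 * deviation


new_player = (1500.0, 500.0)     # default starting rating
settled_player = (225.0, 75.0)   # invented low-rated, established player

print(displayed_rating(*new_player))      # 0.0
print(displayed_rating(*settled_player))  # 0.0
```

Both show as 0 in the lobby, but Trueskill treats them completely differently because it works on the (mean, deviation) pair.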

BlackYps · 28 Nov 2021, 13:37

I am not a developer (anymore)? That is the point.

Oh, then I got the wrong impression, my apologies!

Also that was not a shitpost, but a genuine suggestion. What are the problems with it that make you think it was shitposting?

It was the adding a random number thing combined with "guaranteed to not have bad side effects". Making lobby rating always white is a good suggestion and already on the way to being implemented. I believe it is planned to be included in the next faf patch. But I don't understand what adding a random number is supposed to accomplish?

No he would not start with his ladder rating! Because you can have the same displayed rating while having entirely different mean and deviation! Trueskill rating cannot be expressed in a single number if you want to actually manipulate it in a way that makes sense to Trueskill!

Yes that was a bit of an oversimplification by me. Only if he only played ladder and nothing else his global rating would be exactly the same in mean and dev to his ladder rating. However, even if he also played other games, a settled deviation is somewhere between 60 and 80, so I am pretty confident that if he played enough games in the matchmakers to have established rating there, his global would also have a similar deviation and in consequence a similar mean.

The rules operate on displayed rating, because that is what the client shows to the players, and because using mean would accidentally exclude new players from the adjustments: their mean is massively higher than their displayed rating in comparison to established players.

I don't really understand why you think it is so much worse to have these rules use displayed rating instead of mean.

CheeseBerry · 28 Nov 2021, 13:46

I assume the mean vs displayed rating thing is Katharsas assuming that you are going to apply the trueskill algorithm to the displayed rating.

That wouldn't make much sense and is also not how I understood Askaholic's initial post.

I understood it such that you are going to apply the trueskill algorithm to the mean+deviation, as that is the only thing that makes sense and use the displayed ratings only to figure out if you should apply the algorithm in the first place.

Am I correct?

Katharsas · 28 Nov 2021, 14:03

@blackyps said in Upcoming rating changes: Gaining global from playing matchmaker:

I am not a developer (anymore)? That is the point.

Oh, then I got the wrong impression, my apologies!

Also that was not a shitpost, but a genuine suggestion. What are the problems with it that make you think it was shitposting?

It was the adding a random number thing combined with "guaranteed to not have bad side effects". Making lobby rating always white is a good suggestion and already on the way to being implemented. I believe it is planned to be included in the next faf patch. But I don't understand what adding a random number is supposed to accomplish?

Preventing a clean 0 showing up inside the player tooltip (unless that tooltip would also get removed). You probably would have to hide it anyway though to hide deviation, because 500 deviation is just as obvious as 0 mean.

The rules operate on displayed rating, because that is what the client shows to the players, and because using mean would accidentally exclude new players from the adjustments: their mean is massively higher than their displayed rating in comparison to established players.

I see. Here is another possible solution:

When a player plays his first global game after having played at least one ladder game, simply set their global starting rating to their ladder rating once (and increase global deviation by 100 points compared to ladder deviation).

This one should be abuse-free since it is a one-time operation for each player. It should have close to no side-effects.
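As a rough sketch of that one-time operation (the function name and the example rating are hypothetical, and nothing here reflects actual server code):

```python
from typing import Tuple

Rating = Tuple[float, float]  # (mean, deviation)


def seed_global_from_ladder(ladder: Rating, extra_deviation: float = 100.0) -> Rating:
    """One-time seed: copy the ladder mean, widen the deviation by 100 points."""
    mean, deviation = ladder
    return (mean, deviation + extra_deviation)


ladder_rating = (1250.0, 80.0)  # invented established ladder rating
print(seed_global_from_ladder(ladder_rating))  # (1250.0, 180.0)
```

The extra deviation tells Trueskill to trust the copied mean a bit less, so the first few global games can still move it quickly.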

I still think that there are a lot of benefits to the idea of treating every ladder game as a global game (in addition to it being a ladder game). This would not only make life easier for ladder players coming to global, it would also simply improve the prediction capability of the entire global rating context because more games are effectively being played in "global".

I also don't see how it could be abused unless you lose on purpose in ladder to be able to be underrated in global (but you could already do that by losing in global on purpose).

@CheeseBerry
Trueskill itself has no real notion of displayed rating, so you need to plug the mean and deviation into the algorithm (there is no way to just put in a single number). So we are discussing only whether the rules in the first post are based on mean or displayed rating. But I'm not too interested in that discussion because I think that both would not be great.

Edit:
I think that treating every ladder game as also being global is really close to your current solution except you don't need to recalculate the change for every player that meets conditions and you have no conditions. Should be significantly simpler in code without any new code being required really.

Askaholic · 28 Nov 2021, 17:46

I think this thread is a great example of why having development discussions on the forums is not useful. There is a ton of text here and a lot of rage but not very much that is actually productive.

We can’t just treat ladder games as global. That’s what was done in the past and it led to massive rating manipulation by people who wanted to artificially lower their global rating. So the restriction of only doing positive adjustments is absolutely necessary to prevent intentional abuse. This then necessitates the other restrictions to prevent your global from just infinitely inflating. It doesn’t really matter if we use displayed rating or mu to implement the restrictions, but using displayed rating is way more maintainable in the long run as that’s what people are familiar with, and makes it much easier to figure out what the right configuration settings need to be. It will also prevent all of the questions of “why did I suddenly lose 5 global rating by playing ladder” that will inevitably pop up in the other case.

The only useful thing that I think this discussion has yielded so far is the idea that the games could be rated using the global rating of all players instead of just the one for the player who’s being adjusted. The reason I did it the other way was because I know the other ratings will at least be somewhat balanced, but maybe that doesn’t matter. The most important thing is that the global rating of new players has a chance to change.

Anachronism_ · 29 Nov 2021, 16:11

@askaholic said in Upcoming rating changes: Gaining global from playing matchmaker:

Whenever you play a matchmaker game (1v1, 2v2, etc) and the game is rated, the rating system will perform an additional rating step using players' global rating. This works by taking the matchmaker ratings of all players, substituting in the global rating for the player whose rating is being adjusted. The changes to global are then applied to players that meet a few conditions:

The player's global rating must be below a threshold (currently 1400)
The player's matchmaker rating must be higher than the player's global rating
The global rating change must result in an increase in displayed rating

Any players who don't meet all of those conditions will not have their global rating changed by the game.
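The three conditions can be read as a simple filter. Below is a minimal sketch under two assumptions: that all three checks use displayed rating (as stated elsewhere in the thread), and that displayed rating is mean minus three deviations. The function names are made up and this is not the server implementation.

```python
from typing import Tuple

Rating = Tuple[float, float]  # (mean, deviation)


def displayed(rating: Rating) -> float:
    # Assumption: displayed rating = mean - 3 * deviation.
    mean, deviation = rating
    return mean - 3 * deviation


def should_apply_global_change(
    old_global: Rating,
    new_global: Rating,
    matchmaker: Rating,
    threshold: float = 1400.0,
) -> bool:
    """Return True only if all three conditions from the post hold."""
    below_threshold = displayed(old_global) < threshold
    matchmaker_is_higher = displayed(matchmaker) > displayed(old_global)
    change_is_increase = displayed(new_global) > displayed(old_global)
    return below_threshold and matchmaker_is_higher and change_is_increase


# New global player (shows as 0) with an established matchmaker rating.
print(should_apply_global_change((1500.0, 500.0), (1550.0, 450.0), (1200.0, 100.0)))
# Same player, but the rated result would lower the displayed rating.
print(should_apply_global_change((1500.0, 500.0), (1450.0, 500.0), (1200.0, 100.0)))
```

The first call passes all three checks; the second fails the "must be an increase" check, so global would stay untouched.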

This proposed idea is heavily flawed, and I don't think it should be implemented in this manner.

Afaik, the rating system is currently a zero sum game*. If the idea proposed above is implemented, FAF would have perpetual global rating inflation; it would no longer be a zero sum game. That seems like a very bad change to make, especially when there are alternative ways to address the issues regarding new players.

So, I propose that we implement an alternative solution:

One alternative solution would be to change the above proposal to specifically affect rating sigma but not rating mu. The conditions for when it would apply could be the same, and the value to change sigma by could be calculated in whatever way seems most sensible (such as changing sigma by an amount that would result in the same change in displayed rating as you would get via normal rating calculations). This would avoid perpetual rating inflation, as players' base ratings would remain the same, while grays' displayed ratings and rating certainty could increase (if the conditions are met) towards the proposed 1400 rating threshold.
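One way to read the sigma-only idea, sketched under the assumption that displayed rating is mean minus three deviations: rate the game normally, look at the displayed rating that result would give, and then shrink only the deviation until the old mean shows that same number. The function name and the example numbers are invented.

```python
from typing import Tuple

Rating = Tuple[float, float]  # (mean, deviation)


def sigma_only_adjustment(old: Rating, normally_rated: Rating) -> Rating:
    """Keep the old mean; pick the deviation that yields the same displayed
    rating the normal calculation would have produced (hypothetical helper)."""
    old_mean, _ = old
    rated_mean, rated_deviation = normally_rated
    target_displayed = rated_mean - 3 * rated_deviation
    new_deviation = (old_mean - target_displayed) / 3
    if new_deviation <= 0:
        raise ValueError("target displayed rating is not reachable via sigma alone")
    return (old_mean, new_deviation)


old_global = (1500.0, 500.0)       # displayed 0
normal_result = (1550.0, 450.0)    # displayed 200 after a normal rating step
print(sigma_only_adjustment(old_global, normal_result))
```

Here the mean stays at 1500 and the deviation drops to roughly 433, which also displays as 200.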

Another alternative solution would be to create a universal rating that is affected by all rated normal game types on ladder, TMM, and global, and display that (perhaps in a different color, such as gold) in lieu of global for players with high rating uncertainty (grays).

@askaholic said in Upcoming rating changes: Gaining global from playing matchmaker:

I think this thread is a great example of why having development discussions on the forums is not useful. There is a ton of text here and a lot of rage but not very much that is actually productive.

This might be a complicated topic, but that doesn't mean it shouldn't be discussed by more of the community than the particular developers/etc who happened to look over the relevant commits on github/etc.

In fact, there are several other alternative solutions (beyond my suggestions and the basic idea of just applying regular global rating calculations to ladder/TMM games for all players) that potentially could be suggested/considered/discussed as well. There is no reason we can't have transparency and useful community discussion on this.

*Yes, you can make the argument that players permanently quitting adds rating inflation/deflation. Regardless, the proposed change detailed in Askaholic's post would add further perpetual global rating inflation on top of whatever we may or may not already have. So, increasing inflation is still something that would be good to avoid.

FtXCommando · 29 Nov 2021, 17:05

Somebody explain to me why inflation ruins a TrueSkill system. Doesn’t matter if 800 on global is -47108689 on ladder, the dude gaining rating will feed it back into global as he loses games and there are hardly that many players with global lower than their ladder or other matchmaker ratings. Even fewer that are only that temporarily and actually are somehow great at those game modes but terrible in custom games.

Making a new rating that is expressed differently from global changes 0, people will still kick anybody considered an unknown entity. If he’s 1000 on ladder and 0 in the global game he might be good, but he still ruins “my rating” by being here in my 1000 median rating lobby where he might win.

Also, FAF has general deflation not inflation in all its implementations.

CheeseBerry · 29 Nov 2021, 17:10

Afaik, the rating system is currently a zero sum game*.

I'm not sure this is actually true, or there are at least a couple examples where it is definitely not true.

If your uncertainty is high, you gain and lose many more points of mean rating than when your uncertainty is low, while your opponent doesn't gain or lose that much.

A completely new player may lose like 100 points of mean rank in a single loss, while his non-grey opponent only gains like 10 rank.
In essence, 90 rank just vanished into the ether.

There may still be some conservation law given by the algorithm of trueskill (maybe something like mean + n*deviation is always conserved?) but I don't know what it would be.
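To check this kind of claim without the server, here is a small sketch of the standard two-player TrueSkill update (win/loss only, no draws, no tau). The beta of 250 and the example ratings are assumptions for illustration, not FAF's configured values.

```python
import math
from typing import Tuple

Rating = Tuple[float, float]  # (mean, deviation)


def pdf(x: float) -> float:
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def cdf(x: float) -> float:
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def rate_1v1(winner: Rating, loser: Rating, beta: float = 250.0) -> Tuple[Rating, Rating]:
    """Two-player TrueSkill update without draws or dynamics (tau)."""
    (mu_w, sigma_w), (mu_l, sigma_l) = winner, loser
    c = math.sqrt(2 * beta**2 + sigma_w**2 + sigma_l**2)
    t = (mu_w - mu_l) / c
    v = pdf(t) / cdf(t)   # size of the mean shift
    w = v * (v + t)       # size of the variance shrink
    new_winner = (mu_w + sigma_w**2 / c * v,
                  sigma_w * math.sqrt(1 - sigma_w**2 / c**2 * w))
    new_loser = (mu_l - sigma_l**2 / c * v,
                 sigma_l * math.sqrt(1 - sigma_l**2 / c**2 * w))
    return new_winner, new_loser


veteran, newcomer = (1500.0, 75.0), (1500.0, 500.0)
new_veteran, new_newcomer = rate_1v1(winner=veteran, loser=newcomer)
print("veteran gains ", new_veteran[0] - veteran[0])
print("newcomer loses", newcomer[0] - new_newcomer[0])
```

Each player's mean moves in proportion to their own variance, so in this sketch the sum of means is only conserved when both deviations are equal. With a high-deviation newcomer against a settled veteran, the newcomer's mean drops by far more than the veteran's rises.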

While I agree with FTX that inflation really isn't that big of a problem, should it even occur significantly, we could figure out what the result of implementing the above system would be:

If we run the new algorithm over the games that have been played in the last year, we can see how global rank would have changed.

Also, FAF has general deflation not inflation in all its implementations.

It does?

FtXCommando · 29 Nov 2021, 17:17

Yes, every year’s players have settled at a lower and lower average as time has gone on. During the first few years FAF matched closer to the intended distribution around 1500, then it slowly deviated to where 1000 or so mu is now the peak of the curve.

I attribute it to a skewed sample at the start of FAF’s implementation which skewed the “skill level” of players since the system got settled on some win rate against 1200s (who may 4 years later have been considered 1800s at that skill level) being the expected competency of a 1500.

As time has gone on, less and less old players arrive with the new players and so it’s more people with zero exposure to the game and average rating in that “year” decreases.

Does this matter? Not really. It’s all about your relative position on the distribution. Doesn’t matter if we rate players from (0,1) (0,100) or (0,10000). In the end people will still lose their games, biggest issue is the efficiency of your initial games since 1500 is intended to be the top of the curve, but we already went away from that because of interpolation due to FAF’s deflation.

CheeseBerry · 29 Nov 2021, 17:24

Oh cool, so it's not that rating deflation is in the math, but instead a result of its population. Does it matter? Not really, it's quite interesting though.

FtXCommando · 29 Nov 2021, 18:45

Also there isn’t a conservation law but rather a parameter (tau) which FAF adjusted to be higher. It essentially controls a “floor” for your uncertainty. This is why people seemingly hover around the 70-100 mark depending on the types of games they play.
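The "floor" effect of tau can be seen in a toy simulation. This is a simplified sketch, not FAF's setup: it assumes a player who always meets an opponent with the same mean and the same deviation, uses the standard two-player update without draws, and the beta and tau values are picked for illustration only.

```python
import math


def pdf(x: float) -> float:
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def cdf(x: float) -> float:
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def settle_sigma(tau: float, games: int = 2000,
                 sigma: float = 500.0, beta: float = 250.0) -> float:
    """Deviation after `games` evenly matched 1v1 games (toy model)."""
    v = pdf(0.0) / cdf(0.0)  # evenly matched, so t = 0
    w = v * v
    for _ in range(games):
        variance = sigma**2 + tau**2         # tau inflates uncertainty before each game
        c_squared = 2 * beta**2 + 2 * variance  # opponent assumed to have equal variance
        sigma = math.sqrt(variance * (1 - variance / c_squared * w))
    return sigma


print("tau = 0:", settle_sigma(tau=0.0))
print("tau = 5:", settle_sigma(tau=5.0))
```

Without tau the deviation keeps creeping toward zero; with tau it levels off at a value where the per-game shrink and the per-game inflation cancel out, and a larger tau gives a higher plateau.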

With regards to the idea of conservation you could vaguely stretch it to exist, but it’s really just TrueSkill ironing out where you should exist based on your performance across a variety of other entities. The problem with “settled” ratings comes in when the system has a solid pool of players it has placed at 1200 with low uncertainty and it takes A LOT of games where they beat the “true 1200s” for them to adjust. This is partially why I imagine FAF did adjust their tau value in the past as many complaints of having to farm weaker players for near no gain in rating existed.

It doesn’t care about the new 1500 that a new player puts into the system as the singular impact gets dissipated across the whole population. For it to matter you need to specifically target and farm new players for your rating (playing all welcome games as an 1800 and farming the 1500,500 new players for 600 more rating and then never or rarely playing with others) which cannot happen in a coherent trueskill implementation.

Askaholic · 29 Nov 2021, 19:35

It does seem from my empirical tests that the sum of trueskill means is conserved before and after doing a rating calculation. I’ll have to go find that trueskill paper again after work to see if that is actually always the case. One thing we could do is rate the game as a global game but only apply the change if both displayed ratings increase. The only problem is that this really only happens when two new players play against each other so it won’t help any new players that get opponents who have played more than like 1 or 2 global games.

I don’t like adjusting only sigma as that will just make the system more confident in whatever rating the player had which makes no sense.

ThomasHiatt · 29 Nov 2021, 22:30

Rating isn't supposed to be a currency that is conserved, it's supposed to be a number that represents a person's skill at the game. There isn't a fixed amount of skill on FAF that is conserved and traded between players. I can improve without someone else getting worse at the game.

Also, 1500 mean are added into the "economy" every time a player joins FAF, either they are quitting fast enough that it doesn't inflate rating, or it isn't a zero sum system.

Katharsas · 30 Nov 2021, 00:50

@askaholic said in Upcoming rating changes: Gaining global from playing matchmaker:

I think this thread is a great example of why having development discussions on the forums is not useful. There is a ton of text here and a lot of rage but not very much that is actually productive.

As somebody who cares about how rating works and has done quite a bit to educate people on how it works, a decision that manipulates global in a very new way was made without asking the wider community, and in a borderline careless fashion compared to how rating changes were made in the past. Yes, I was angry at the start of the post, and I would have had no reason to be angry if this thread had been opened to actually ask people how to solve the problem, or present suggestions, instead of presenting a decision that was de facto already made and implemented.

So you better take at least partial responsibility for the not very stellar constructiveness of the thread. If you feel like FAF decisions in general should be made like this, then I'm just gonna stop caring. Because what is the point? How could I even contribute in a constructive way if no open discussion takes place?

I propose to immediately lock such threads in the future so you can only get upvoted and no longer have to deal with any angry responses, and we no longer need to pretend that there is a point for a non-dev to try to impact such decisions.

BlackYps · 30 Nov 2021, 21:00

I get the impression that you are not upset that the community hasn't been asked, but more that specifically you have not been asked. I already explained that asking the general public about implementation details is in general not feasible. You are a special case, because you have been a developer at some point and have experience with trueskill. However you have stopped being active and therefore missed the developer discussions about this topic. That is unfortunate. But it is the simple reality that knowledge about who past contributors are, that could give valuable input, gets lost over time, so you can't expect people to come to you to ask you.
So in my opinion you have to decide if you want to be treated as a normal player where it is reasonable that they have no detailed knowledge about trueskill, or if you want to be treated as developer, but then you also have to participate in the relevant discussion channels to make your opinion heard.

Regarding the constructiveness of the thread, proposing to immediately lock such threads is not a constructive thing. You can still make your case here, however I am not convinced yet why the current solution should not be used.
You made two main arguments:

This change interferes with trueskill like nothing before
True, but that doesn't say anything about if this is good or bad.
re-simulating the evolution of global ratings gets significantly harder
Debatable, but more importantly I don't see why this is relevant at all. This has no impact on the players and would really only be relevant if we needed that for some kind of testing. I have never heard anyone calling for something like this in the last years.

So please give me specific problems you see that will arise when using our modification.