Upcoming rating changes: Gaining global from playing matchmaker

Anachronism_

@askaholic said in Upcoming rating changes: Gaining global from playing matchmaker:

Whenever you play a matchmaker game (1v1, 2v2, etc) and the game is rated, the rating system will perform an additional rating step using players' global ratings. This works by taking the matchmaker ratings of all players, substituting in the global rating for the player whose rating is being adjusted. The changes to global are then applied to players that meet a few conditions:

The player's global rating must be below a threshold (currently 1400)
The player's matchmaker rating must be higher than the player's global rating
The global rating change must result in an increase in displayed rating

Any players who don't meet all of those conditions will not have their global rating changed by the game.
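The three conditions above can be sketched as a single check. This is only an illustration, not the server's actual code: the `Rating` class and `should_apply_global_change` function are hypothetical names, the displayed rating is assumed to be mu - 3 * sigma, and the threshold and the matchmaker comparison are assumed to use displayed ratings (the post does not say whether it means displayed rating or mean).

```python
from dataclasses import dataclass

# "currently 1400" according to the quoted post
GLOBAL_THRESHOLD = 1400


@dataclass(frozen=True)
class Rating:
    """Hypothetical container for a TrueSkill-style rating."""
    mu: float
    sigma: float

    @property
    def displayed(self) -> float:
        # Assumption: displayed rating is mu - 3 * sigma.
        return self.mu - 3 * self.sigma


def should_apply_global_change(global_before: Rating,
                               global_after: Rating,
                               matchmaker: Rating) -> bool:
    """Return True only if all three quoted conditions hold."""
    below_threshold = global_before.displayed < GLOBAL_THRESHOLD
    matchmaker_is_higher = matchmaker.displayed > global_before.displayed
    displayed_increases = global_after.displayed > global_before.displayed
    return below_threshold and matchmaker_is_higher and displayed_increases
```

For example, a fresh player with a global rating of (1500, 500) and a matchmaker rating of (1200, 100) would keep a global update to (1550, 450), but an update to (1400, 470) would be discarded because the displayed rating drops.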

This proposed idea is heavily flawed, and I don't think it should be implemented in this manner.

Afaik, the rating system is currently a zero sum game*. If the idea proposed above is implemented, FAF would have perpetual global rating inflation; it would no longer be a zero sum game. That seems like a very bad change to make, especially when there are alternative ways to address the issues regarding new players.

So, I propose that we implement an alternative solution:

One alternative solution would be to change the above proposal to specifically affect rating sigma but not rating mu. The conditions for when it would apply could be the same, and the value to change sigma by could be calculated in whatever way seems most sensible (such as changing sigma by an amount that would result in the same change in displayed rating as you would get via normal rating calculations). This would avoid perpetual rating inflation, as players' base ratings would remain the same, while grays' displayed ratings and rating certainty could increase (if the conditions are met) towards the proposed 1400 rating threshold.
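As a rough sketch of the sigma-only idea: if the displayed rating is assumed to be mu - 3 * sigma, then a given gain in displayed rating can be reproduced by shrinking sigma by a third of that gain while leaving mu alone. The function name `sigma_only_update` is hypothetical and the numbers are only for illustration.

```python
def sigma_only_update(mu: float, sigma: float, displayed_gain: float):
    """Apply a displayed-rating gain by reducing sigma only.

    Assumption: displayed rating = mu - 3 * sigma, so lowering sigma by
    displayed_gain / 3 raises the displayed rating by displayed_gain.
    """
    if displayed_gain < 0:
        raise ValueError("this sketch only handles increases in displayed rating")
    new_sigma = sigma - displayed_gain / 3.0
    if new_sigma <= 0:
        raise ValueError("gain is too large to express through sigma alone")
    # mu is returned unchanged: the base rating stays where it was.
    return mu, new_sigma


# Example: a gray at (1500, 500) who would have gained 60 displayed points.
mu, sigma = sigma_only_update(1500.0, 500.0, 60.0)
```

In this example mu stays at 1500 and only sigma changes, so no mean rating is added to the system.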

Another alternative solution would be to create a universal rating that is affected by all rated normal game types on ladder, TMM, and global, and display that (perhaps in a different color, such as gold) in lieu of global for players with high rating uncertainty (grays).

@askaholic said in Upcoming rating changes: Gaining global from playing matchmaker:

I think this thread is a great example of why having development discussions on the forums is not useful. There is a ton of text here and a lot of rage but not very much that is actually productive.

This might be a complicated topic, but that doesn't mean it shouldn't be discussed by more of the community than the particular developers/etc who happened to look over the relevant commits on github/etc.

In fact, there are several other alternative solutions (beyond my suggestions and the basic idea of just applying regular global rating calculations to ladder/TMM games for all players) that potentially could be suggested/considered/discussed as well. There is no reason we can't have transparency and useful community discussion on this.

*Yes, you can make the argument that players permanently quitting adds rating inflation/deflation. Regardless, the proposed change detailed in Askaholic's post would add further perpetual global rating inflation on top of whatever we may or may not already have. So, increasing inflation is still something that would be good to avoid.

FtXCommando

Somebody explain to me why inflation ruins a TrueSkill system. Doesn’t matter if 800 on global is -47108689 on ladder, the dudes gaining rating will feed it back into global as they lose games, and there are hardly that many players with global lower than their ladder or other matchmaker ratings. Even fewer are only that way temporarily and are actually somehow great at those game modes but terrible in custom games.

Making a new rating that is expressed differently from global changes 0, people will still kick anybody considered an unknown entity. If he’s 1000 on ladder and 0 in the global game he might be good, but he still ruins “my rating” by being here in my 1000 median rating lobby where he might win.

Also, FAF has general deflation not inflation in all its implementations.

CheeseBerry

Afaik, the rating system is currently a zero sum game*.

I'm not sure this is actually true, or there are at least a couple examples where it is definitely not true.

If your uncertainty is high, you gain and lose many more points of mean rating than when your uncertainty is low, while your opponent doesn't gain or lose that much.

A completely new player may lose like 100 points of mean rank in a single loss, while his non-grey opponent only gains like 10 rank.
In essence, 90 rank just vanished into the ether.

There may still be some conservation law given by the algorithm of trueskill (maybe something like mean + n*deviation is always conserved?) but I don't know what it would be.
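One way to check this is a minimal sketch of the standard two-player TrueSkill update, without draws and without the tau dynamics step. The `beta` value and the example ratings are illustrative assumptions, not FAF's actual configuration, and `rate_1v1` is a hypothetical helper rather than a library call. In this sketch each player's mean moves in proportion to their own sigma squared, so the sum of the two means is only conserved when both players have the same sigma.

```python
import math


def _pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def rate_1v1(winner, loser, beta: float = 250.0):
    """Two-player TrueSkill update, no draws, no tau.

    winner and loser are (mu, sigma) tuples; beta is an assumed value.
    Returns the updated (mu, sigma) tuples in the same order.
    """
    mu_w, sigma_w = winner
    mu_l, sigma_l = loser
    c = math.sqrt(2.0 * beta ** 2 + sigma_w ** 2 + sigma_l ** 2)
    t = (mu_w - mu_l) / c
    v = _pdf(t) / _cdf(t)   # size of the mean shift
    w = v * (v + t)         # size of the variance shrink, between 0 and 1
    new_winner = (mu_w + sigma_w ** 2 / c * v,
                  sigma_w * math.sqrt(1.0 - sigma_w ** 2 / c ** 2 * w))
    new_loser = (mu_l - sigma_l ** 2 / c * v,
                 sigma_l * math.sqrt(1.0 - sigma_l ** 2 / c ** 2 * w))
    return new_winner, new_loser


# Established player (1200, 80) beats a brand new player (1500, 500).
established, newcomer = rate_1v1((1200.0, 80.0), (1500.0, 500.0))
gain = established[0] - 1200.0
loss = 1500.0 - newcomer[0]
print(f"winner gains {gain:.1f} mean, loser drops {loss:.1f} mean")
```

Under these formulas the ratio of the loser's drop to the winner's gain is (sigma_loser / sigma_winner) squared, which is why a new player's mean can fall much further than the opponent's rises.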

While I agree with FTX that inflation really isn't that big of a problem, should it even occur significantly, we could figure out what the result of implementing the above system would be:

If we run the new algorithm over the games that have been played in the last year, we can see how global rank would have changed.

Also, FAF has general deflation not inflation in all its implementations.

It does?

FtXCommando

Yes, every year’s players have settled at a lower and lower average as time has gone on. During the first few years FAF matched the intended distribution around 1500 more closely, then it slowly deviated to where 1000 or so mu is now the peak of the curve.

I attribute it to a skewed sample at the start of FAF’s implementation which skewed the “skill level” of players since the system got settled on some win rate against 1200s (who may 4 years later have been considered 1800s at that skill level) being the expected competency of a 1500.

As time has gone on, fewer and fewer old players arrive with the new players, so it’s more people with zero exposure to the game, and the average rating in that “year” decreases.

Does this matter? Not really. It’s all about your relative position on the distribution. It doesn’t matter if we rate players from (0,1), (0,100) or (0,10000); in the end people will still lose their games. The biggest issue is the efficiency of your initial games, since 1500 is intended to be the top of the curve, but we already went away from that because of interpolation due to FAF’s deflation.

CheeseBerry

Oh cool, so it's not that rating deflation is in the math, but instead a result of its population. Does it matter? Not really, it's quite interesting though.

FtXCommando

Also there isn’t a conservation law but rather a parameter (tau) which FAF adjusted to be higher. It essentially controls a “floor” for your uncertainty. This is why people seemingly hover around the 70-100 mark depending on the types of games they play.
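The "floor" effect of tau can be illustrated with a small sketch. It assumes the usual TrueSkill dynamics step (tau squared is added to the variance before each game) and a deliberately simplified setting: every game is a 1v1 against an evenly matched opponent with the same sigma, with no draws. The `beta` and `tau` values are illustrative assumptions, not FAF's settings, and `sigma_after_even_games` is a hypothetical helper.

```python
import math


def sigma_after_even_games(sigma: float, tau: float,
                           beta: float = 250.0, games: int = 2000) -> float:
    """Iterate the sigma update for evenly matched 1v1 games.

    Assumptions: equal means and equal sigmas for both players, no draws.
    With equal means, t = 0, so v = pdf(0) / cdf(0) = sqrt(2 / pi).
    """
    v = math.sqrt(2.0 / math.pi)
    w = v * v
    for _ in range(games):
        var = sigma ** 2 + tau ** 2           # dynamics step: inflate variance
        c2 = 2.0 * beta ** 2 + 2.0 * var      # both players share the same variance
        sigma = math.sqrt(var * (1.0 - var / c2 * w))
    return sigma


for tau in (0.0, 5.0, 10.0):
    print(f"tau={tau}: sigma settles near {sigma_after_even_games(500.0, tau):.1f}")
```

In this sketch a larger tau leaves sigma hovering at a higher value, while tau = 0 lets sigma keep shrinking toward zero as more games are played.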

With regards to the idea of conservation you could vaguely stretch it to exist, but it’s really just TrueSkill ironing out where you should exist based on your performance across a variety of other entities. The problem with “settled” ratings comes in when the system has a solid pool of players it has placed at 1200 with low uncertainty and it takes A LOT of games where they beat the “true 1200s” for them to adjust. This is partially why I imagine FAF did adjust their tau value in the past as many complaints of having to farm weaker players for near no gain in rating existed.

It doesn’t care about the new 1500 that a new player puts into the system as the singular impact gets dissipated across the whole population. For it to matter you need to specifically target and farm new players for your rating (playing all welcome games as an 1800 and farming the 1500,500 new players for 600 more rating and then never or rarely playing with others) which cannot happen in a coherent trueskill implementation.

Askaholic

It does seem from my empirical tests that the sum of trueskill means is conserved before and after doing a rating calculation. I’ll have to go find that trueskill paper again after work to see if that is actually always the case. One thing we could do is rate the game as a global game but only apply the change if both displayed ratings increase. The only problem is that this really only happens when two new players play against each other, so it won’t help any new players that get opponents who have played more than like 1 or 2 global games.

I don’t like adjusting only sigma as that will just make the system more confident in whatever rating the player had which makes no sense.

ThomasHiatt

Rating isn't supposed to be a currency that is conserved, it's supposed to be a number that represents a person's skill at the game. There isn't a fixed amount of skill on FAF that is conserved and traded between players. I can improve without someone else getting worse at the game.

Also, 1500 mean is added into the "economy" every time a player joins FAF, so either players are quitting fast enough that it doesn't inflate rating, or it isn't a zero sum system.

Katharsas

@askaholic said in Upcoming rating changes: Gaining global from playing matchmaker:

I think this thread is a great example of why having development discussions on the forums is not useful. There is a ton of text here and a lot of rage but not very much that is actually productive.

As somebody who cares about how rating works and has done quite a bit to educate people on how it works, a decision that manipulates global in a very new way was made without asking the wider community, and in a borderline careless fashion compared to how rating changes were made in the past. Yes, I was angry at the start of the post, and I would have had no reason to be angry if this thread had been opened to actually ask people how to solve the problem, or present suggestions, instead of presenting a decision that was de facto already made and implemented.

So you better take at least partial responsibility for the not very stellar constructiveness of the thread. If you feel like FAF decisions in general should be made like this, then I'm just gonna stop caring. Because what is the point? How could I even contribute in a constructive way if no open discussion takes place?

I propose to immediately lock such threads in the future so you can only get upvoted and no longer have to deal with any angry responses, and we no longer need to pretend that there is a point for a non-dev to try to impact such decisions.

BlackYps

I get the impression that you are not upset that the community hasn't been asked, but more that specifically you have not been asked. I already explained that asking the general public about implementation details is in general not feasible. You are a special case, because you have been a developer at some point and have experience with trueskill. However, you have stopped being active and therefore missed the developer discussions about this topic. That is unfortunate. But it is the simple reality that knowledge about who the past contributors are that could give valuable input gets lost over time, so you can't expect people to come to you to ask you.
So in my opinion you have to decide if you want to be treated as a normal player where it is reasonable that they have no detailed knowledge about trueskill, or if you want to be treated as developer, but then you also have to participate in the relevant discussion channels to make your opinion heard.

Regarding the constructiveness of the thread, proposing to immediately lock such threads is not a constructive thing. You can still make your case here, however I am not convinced yet why the current solution should not be used.
You made two main arguments:

This change interferes with trueskill like nothing before
True, but that doesn't say anything about if this is good or bad.
Re-simulating the evolution of global ratings gets significantly harder
Debatable, but more importantly I don't see why this is relevant at all. This has no impact on the players and would really only be relevant if we needed that for some kind of testing. I have never heard anyone calling for something like this in the last years.

So please give me specific problems you see that will arise when using our modification.