Upcoming rating changes: Gaining global from playing matchmaker

Katharsas

@blackyps said in Upcoming rating changes: Gaining global from playing matchmaker:

I am not a developer (anymore)? That is the point.

Oh, then I got the wrong impression, my apologies!

Also that was not a shitpost, but a genuine suggestion. What are the problems with it that make you think it was shitposting?

It was the "adding a random number" part combined with "guaranteed to not have bad side effects". Making lobby rating always white is a good suggestion and already on the way to being implemented. I believe it is planned to be included in the next FAF patch. But I don't understand what adding a random number is supposed to accomplish?

Preventing a clean 0 showing up inside the player tooltip (unless that tooltip would also get removed). You probably would have to hide it anyway though to hide deviation, because 500 deviation is just as obvious as 0 mean.

The rules operate on displayed rating because that is what the client shows to the players. If they operated on the mean instead, we would accidentally exclude new players from the adjustments, because their mean is massively higher than their displayed rating in comparison to established players.

I see. Here is another possible solution:

When a player plays their first global game after having played at least one ladder game, simply set their global starting rating to their ladder rating once (and increase global deviation by 100 points compared to ladder deviation).

This one should be abuse-free since it is a one-time operation for each player. It should have close to no side-effects.
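
As a rough sketch of that one-time seeding: the `Rating` class, the function name and the game counters below are made up for illustration and are not taken from the FAF server code. Only the "copy ladder once, add 100 deviation" rule comes from the suggestion above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rating:
    mean: float
    deviation: float


def seed_global_from_ladder(
    global_rating: Rating,
    ladder_rating: Optional[Rating],
    global_games: int,
    ladder_games: int,
    extra_deviation: float = 100.0,  # the "100 points" from the suggestion
) -> Rating:
    """Return the global rating to use for the player's next global game."""
    first_global_game = global_games == 0
    has_ladder_history = ladder_games >= 1 and ladder_rating is not None
    if first_global_game and has_ladder_history:
        # One-time copy: ladder mean, ladder deviation widened by 100.
        return Rating(ladder_rating.mean, ladder_rating.deviation + extra_deviation)
    # Every other case: leave global untouched.
    return global_rating


# Example: a fresh global rating gets replaced by the widened ladder rating.
seeded = seed_global_from_ladder(Rating(1500.0, 500.0), Rating(1200.0, 150.0), 0, 3)
```

Because the copy only triggers while `global_games` is zero, it cannot fire a second time for the same player.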

I still think that there is a lot of benefit to the idea of treating every ladder game as a global game (in addition to it being a ladder game). This would not only make life easier for ladder players coming to global, it would also simply improve the prediction capability of the entire global rating context, because more games are effectively being played in "global".

I also don't see how it could be abused, unless you lose on purpose in ladder to be underrated in global (but you could already do that by losing in global on purpose).

@CheeseBerry
Trueskill itself has no real notion of displayed rating, so you need to plug the mean and deviation into the algorithm (there is no way to just put in a single number). So we are only discussing whether the rules in the first post should be based on mean or displayed rating. But I'm not too interested in that discussion, because I think that neither would be great.

Edit:
I think that treating every ladder game as also being global is really close to your current solution except you don't need to recalculate the change for every player that meets conditions and you have no conditions. Should be significantly simpler in code without any new code being required really.

Askaholic

I think this thread is a great example of why having development discussions on the forums is not useful. There is a ton of text here and a lot of rage but not very much that is actually productive.

We can’t just treat ladder games as global. That’s what was done in the past and it led to massive rating manipulation by people who wanted to artificially lower their global rating. So the restriction of only doing positive adjustments is absolutely necessary to prevent intentional abuse. This then necessitates the other restrictions to prevent your global from just infinitely inflating. It doesn’t really matter if we use displayed rating or mu to implement the restrictions, but using displayed rating is way more maintainable in the long run as that’s what people are familiar with, and makes it much easier to figure out what the right configuration settings need to be. It will also prevent all of the questions of “why did I suddenly lose 5 global rating by playing ladder” that will inevitably pop up in the other case.

The only thing useful that I think this discussion has yielded so far is the idea that the games could be rated using the global rating of all players instead of just the one for the player who’s being adjusted. The reason I did it the other way was because I know the other ratings will at least be somewhat balanced, but maybe that doesn’t matter. The most important thing is that the global rating of new players has a chance to change.

Anachronism_

@askaholic said in Upcoming rating changes: Gaining global from playing matchmaker:

Whenever you play a matchmaker game (1v1, 2v2, etc) and the game is rated, the rating system will perform an additional rating step using players' global rating. This works by taking the matchmaker ratings of all players, substituting in the global rating for the player whose rating is being adjusted. The changes to global are then applied to players that meet a few conditions:

The player's global rating must be below a threshold (currently 1400)
The player's matchmaker rating must be higher than the player's global rating
The global rating change must result in an increase in displayed rating

Any players who don't meet all of those conditions will not have their global rating changed by the game.
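
The three conditions can be read as a single check, sketched below. The names are hypothetical. Two things are assumed: displayed rating is taken to be mean minus three deviations, and all three conditions are compared on displayed rating (the thread says the rules operate on displayed rating, but the actual server code may differ in detail).

```python
from dataclasses import dataclass

GLOBAL_THRESHOLD = 1400  # "currently 1400" in the quoted post


@dataclass(frozen=True)
class Rating:
    mean: float
    deviation: float

    @property
    def displayed(self) -> float:
        # Assumption: displayed rating = mean - 3 * deviation.
        return self.mean - 3 * self.deviation


def should_apply_global_change(
    old_global: Rating, new_global: Rating, matchmaker: Rating
) -> bool:
    """True only if all three quoted conditions hold for this player."""
    below_threshold = old_global.displayed < GLOBAL_THRESHOLD
    matchmaker_is_higher = matchmaker.displayed > old_global.displayed
    change_is_increase = new_global.displayed > old_global.displayed
    return below_threshold and matchmaker_is_higher and change_is_increase


# Example: a new player (displayed 0) with a decent matchmaker rating.
eligible = should_apply_global_change(
    old_global=Rating(1500.0, 500.0),
    new_global=Rating(1550.0, 450.0),
    matchmaker=Rating(1400.0, 200.0),
)
```

Here `new_global` stands for the result of the extra rating step; a player failing any one check simply keeps `old_global`.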

This proposed idea is heavily flawed, and I don't think it should be implemented in this manner.

Afaik, the rating system is currently a zero sum game*. If the idea proposed above is implemented, FAF would have perpetual global rating inflation; it would no longer be a zero sum game. That seems like a very bad change to make, especially when there are alternative ways to address the issues regarding new players.

So, I propose that we implement an alternative solution:

One alternative solution would be to change the above proposal to specifically affect rating sigma but not rating mu. The conditions for when it would apply could be the same, and the value to change sigma by could be calculated in whatever way seems most sensible (such as changing sigma by an amount that would result in the same change in displayed rating as you would get via normal rating calculations). This would avoid perpetual rating inflation, as players' base ratings would remain the same, while grays' displayed ratings and rating certainty could increase (if the conditions are met) towards the proposed 1400 rating threshold.
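
A minimal sketch of the "same displayed gain, but through sigma only" idea from the paragraph above. It assumes displayed rating is mean minus three deviations, and the function and parameter names are invented for illustration. `rated_mean` and `rated_deviation` stand for whatever the normal rating calculation would have produced.

```python
from typing import Tuple


def sigma_only_adjustment(
    mean: float,
    deviation: float,
    rated_mean: float,
    rated_deviation: float,
) -> Tuple[float, float]:
    """Keep the mean, shrink the deviation to match the normal displayed gain."""
    old_displayed = mean - 3 * deviation
    rated_displayed = rated_mean - 3 * rated_deviation
    gain = rated_displayed - old_displayed
    if gain <= 0:
        # Only positive adjustments are applied; otherwise nothing changes.
        return mean, deviation
    # displayed = mean - 3 * deviation, so a gain of g needs deviation - g / 3.
    # Clamp at zero so the deviation can never go negative.
    new_deviation = max(deviation - gain / 3, 0.0)
    return mean, new_deviation


# Example: the normal calculation would move (1500, 500) to (1560, 470),
# a displayed gain of 150, so the deviation alone drops by 50.
adjusted = sigma_only_adjustment(1500.0, 500.0, 1560.0, 470.0)
```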

Another alternative solution would be to create a universal rating that is affected by all rated normal game types on ladder, TMM, and global, and display that (perhaps in a different color, such as gold) in lieu of global for players with high rating uncertainty (grays).

@askaholic said in Upcoming rating changes: Gaining global from playing matchmaker:

I think this thread is a great example of why having development discussions on the forums is not useful. There is a ton of text here and a lot of rage but not very much that is actually productive.

This might be a complicated topic, but that doesn't mean it shouldn't be discussed by more of the community than the particular developers/etc who happened to look over the relevant commits on github/etc.

In fact, there are several other alternative solutions (beyond my suggestions and the basic idea of just applying regular global rating calculations to ladder/TMM games for all players) that potentially could be suggested/considered/discussed as well. There is no reason we can't have transparency and useful community discussion on this.

*Yes, you can make the argument that players permanently quitting adds rating inflation/deflation. Regardless, the proposed change detailed in Askaholic's post would add further perpetual global rating inflation on top of whatever we may or may not already have. So, increasing inflation is still something that would be good to avoid.

FtXCommando

Somebody explain to me why inflation ruins a TrueSkill system. Doesn’t matter if 800 on global is -47108689 on ladder: the dude gaining rating will feed it back into global as he loses games, and there are hardly that many players with global lower than their ladder or other matchmaker ratings. Even fewer are only that temporarily and actually are somehow great at those game modes but terrible in custom games.

Making a new rating that is expressed differently from global changes nothing; people will still kick anybody considered an unknown entity. If he’s 1000 on ladder and 0 in the global game he might be good, but he still ruins “my rating” by being here in my 1000 median rating lobby where he might win.

Also, FAF has general deflation not inflation in all its implementations.

CheeseBerry

Afaik, the rating system is currently a zero sum game*.

I'm not sure this is actually true, or there are at least a couple examples where it is definitely not true.

If your uncertainty is high, you gain and lose many more points of mean rating than when your uncertainty is low, while your opponent doesn't gain or lose that much.

A completely new player may lose like 100 points of mean rank in a single loss, while his non-grey opponent only gains like 10 rank.
In essence, 90 rank just vanished into the ether.

There may still be some conservation law given by the algorithm of trueskill (maybe something like mean + n*deviation is always conserved?) but I don't know what it would be.
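
To make the effect concrete, here is a sketch of the published two-player TrueSkill update in its simplest form: no draws and no tau term. The function names and the numbers (`beta=250`, ratings of 1500 with deviations 100 and 500) are illustrative assumptions, not FAF's actual configuration.

```python
import math
from typing import Tuple

RatingPair = Tuple[float, float]  # (mean, deviation)


def _pdf(x: float) -> float:
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def _cdf(x: float) -> float:
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def rate_1v1(
    winner: RatingPair, loser: RatingPair, beta: float = 250.0
) -> Tuple[RatingPair, RatingPair]:
    """Two-player TrueSkill update without draw margin or dynamics (tau)."""
    (mu_w, sigma_w), (mu_l, sigma_l) = winner, loser
    c = math.sqrt(2 * beta**2 + sigma_w**2 + sigma_l**2)
    t = (mu_w - mu_l) / c
    v = _pdf(t) / _cdf(t)  # how surprising the result was
    w = v * (v + t)        # how much the variances shrink
    # Each mean moves in proportion to that player's own variance.
    new_winner = (mu_w + sigma_w**2 / c * v,
                  sigma_w * math.sqrt(1 - sigma_w**2 / c**2 * w))
    new_loser = (mu_l - sigma_l**2 / c * v,
                 sigma_l * math.sqrt(1 - sigma_l**2 / c**2 * w))
    return new_winner, new_loser


established = (1500.0, 100.0)
newcomer = (1500.0, 500.0)
new_established, new_newcomer = rate_1v1(established, newcomer)
gain = new_established[0] - established[0]  # roughly 13 points of mean
loss = newcomer[0] - new_newcomer[0]        # roughly 320 points of mean
```

In this simplified form the two mean shifts differ by the ratio of the players' variances, so they only match when both deviations are equal.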

While I agree with FTX that inflation really isn't that big of a problem, should it even occur significantly, we could figure out what the result of implementing the above system would be:

If we run the new algorithm over the games that have been played in the last year, we can see how global rank would have changed.

Also, FAF has general deflation not inflation in all its implementations.

It does?

FtXCommando

Yes, every year’s players have settled at a lower and lower average as time has gone on. During the first few years FAF matched the intended distribution around 1500 more closely, then it slowly deviated to where 1000 or so mu is now the peak of the curve.

I attribute it to a skewed sample at the start of FAF’s implementation which skewed the “skill level” of players since the system got settled on some win rate against 1200s (who may 4 years later have been considered 1800s at that skill level) being the expected competency of a 1500.

As time has gone on, fewer and fewer old players arrive with the new players, so it’s more people with zero exposure to the game, and the average rating in that “year” decreases.

Does this matter? Not really. It’s all about your relative position on the distribution. Doesn’t matter if we rate players from (0,1), (0,100) or (0,10000). In the end people will still lose their games. The biggest issue is the efficiency of your initial games, since 1500 is intended to be the top of the curve, but we already went away from that because of interpolation due to FAF’s deflation.

CheeseBerry

Oh cool, so it's not that rating deflation is in the math, but instead a result of its population. Does it matter? Not really, it's quite interesting though.

FtXCommando

Also there isn’t a conservation law but rather a parameter (tau) which FAF adjusted to be higher. It essentially controls a “floor” for your uncertainty. This is why people seemingly hover around the 70-100 mark depending on the types of games they play.
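
A small sketch of why tau acts like a floor. In standard TrueSkill, tau squared is added to a player's variance before every game, so the deviation stops shrinking once the per-game shrink equals that added amount. The simulation below is a deliberate simplification: both players are always evenly matched with the same deviation, there are no draws, and `beta=250` and the tau values are illustrative rather than FAF's real settings.

```python
import math


def settled_deviation(
    tau: float, beta: float = 250.0, start: float = 500.0, games: int = 2000
) -> float:
    """Deviation after many evenly matched games with dynamics factor tau."""
    # For evenly matched players (t = 0), v = pdf(0) / cdf(0) and w = v**2 = 2 / pi.
    w = 2 / math.pi
    variance = start**2
    for _ in range(games):
        variance += tau**2                     # uncertainty added before each game
        c_squared = 2 * beta**2 + 2 * variance  # both players share the same variance
        variance *= 1 - variance / c_squared * w
    return math.sqrt(variance)


floor_without_tau = settled_deviation(0.0)
floor_low_tau = settled_deviation(5.0)
floor_high_tau = settled_deviation(10.0)
```

With tau at zero the deviation keeps creeping towards zero, while a larger tau makes it level off at a higher value.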

With regards to the idea of conservation you could vaguely stretch it to exist, but it’s really just TrueSkill ironing out where you should exist based on your performance across a variety of other entities. The problem with “settled” ratings comes in when the system has a solid pool of players it has placed at 1200 with low uncertainty and it takes A LOT of games where they beat the “true 1200s” for them to adjust. This is partially why I imagine FAF did adjust their tau value in the past as many complaints of having to farm weaker players for near no gain in rating existed.

It doesn’t care about the new 1500 that a new player puts into the system as the singular impact gets dissipated across the whole population. For it to matter you need to specifically target and farm new players for your rating (playing all welcome games as an 1800 and farming the 1500,500 new players for 600 more rating and then never or rarely playing with others) which cannot happen in a coherent trueskill implementation.

Askaholic

It does seem from my empirical tests that the sum of trueskill means is conserved before and after doing a rating calculation. I’ll have to go find that trueskill paper again after work to see if that is actually always the case. One thing we could do is rate the game as a global game but only apply the change if both displayed ratings increase. The only problem is that this really only happens when two new players play against each other, so it won’t help any new players that get opponents who have played more than like 1 or 2 global games.

I don’t like adjusting only sigma as that will just make the system more confident in whatever rating the player had which makes no sense.

ThomasHiatt

Rating isn't supposed to be a currency that is conserved, it's supposed to be a number that represents a person's skill at the game. There isn't a fixed amount of skill on FAF that is conserved and traded between players. I can improve without someone else getting worse at the game.

Also, 1500 mean is added into the "economy" every time a player joins FAF, so either players are quitting fast enough that it doesn't inflate rating, or it isn't a zero sum system.

Katharsas

@askaholic said in Upcoming rating changes: Gaining global from playing matchmaker:

I think this thread is a great example of why having development discussions on the forums is not useful. There is a ton of text here and a lot of rage but not very much that is actually productive.

As somebody who cares about how rating works and has done quite a bit to educate people on how it works, I see a decision that manipulates global in a very new way, made without asking the wider community and in a borderline careless fashion compared to how rating changes were made in the past. Yes, I was angry at the start of the post, and I would have had no reason to be angry if this thread had been opened to actually ask people how to solve the problem, or to present suggestions, instead of presenting a decision that was de facto already made and implemented.

So you better take at least partial responsibility for the not very stellar constructiveness of the thread. If you feel like FAF decisions in general should be made like this, then I'm just gonna stop caring. Because what is the point? How could I even contribute in a constructive way if no open discussion takes place?

I propose to immediately lock such threads in the future so you can only get upvoted and no longer have to deal with any angry responses, and we no longer need to pretend that there is a point for a non-dev to try to impact such decisions.

BlackYps

I get the impression that you are not upset that the community hasn't been asked, but more that specifically you have not been asked. I already explained that asking the general public about implementation details is in general not feasible. You are a special case, because you have been a developer at some point and have experience with trueskill. However, you have stopped being active and therefore missed the developer discussions about this topic. That is unfortunate. But it is the simple reality that knowledge about who past contributors are, and who could give valuable input, gets lost over time, so you can't expect people to come to you to ask you.
So in my opinion you have to decide if you want to be treated as a normal player, where it is reasonable that they have no detailed knowledge about trueskill, or if you want to be treated as a developer, but then you also have to participate in the relevant discussion channels to make your opinion heard.

Regarding the constructiveness of the thread, proposing to immediately lock such threads is not a constructive thing. You can still make your case here; however, I am not convinced yet why the current solution should not be used.
You made two main arguments:

This change interferes with trueskill like nothing before
True, but that doesn't say anything about if this is good or bad.
re-simulating the evolution of global ratings gets significantly harder
Debatable, but more importantly I don't see why this is relevant at all. This has no impact on the players and would really only be relevant if we needed that for some kind of testing. I have never heard anyone calling for something like this in the last years.

So please give me specific problems you see that will arise when using our modification.