Matchmaker Algorithm Feedback Thread

BlackYps

I also don't like the idea of a button. It is a bandaid for the problem that currently the matchmaker doesn't do a good job.
Your example match of 700 + 1200 vs. 1300 + 1300 has a whopping 700 rating disparity. For established players this shouldn't match, because the chance of the first team winning is extremely small.
At the moment it does match eventually, because there is a bug in the current server version that rates extremely badly balanced games better than it should. If you want to play this anyway you could ask in aeolus who is queued up and if they want to play a custom instead, but this doesn't warrant an extra ui element.

To give you all a better understanding of how the algorithm works, I prepared some graphs for you. I created a script that passes artificial data to the matching algorithm and plots the results. In the bottom right you can see the rating distribution used for that run. Newbies are people with at most 10 games played in that rating. The distribution is based on real data from global. Newbies get extra bonuses to match faster. This is especially important, because they can drop to extremely low ratings if they lose their first games, and they would then have the same difficulty getting matched as top players. We don't want them to get stuck, so we need to help them a little to get games. "search" in the diagram refers to a queued up party, which can contain multiple players. In the top right you can see the wait time of each search based on its average rating.
The graph in the top left shows some metrics about the created games. The games were sorted beforehand, so the game number doesn't correlate to when the game got created. Rating disparity is the difference in total rating between the two teams. Rating deviation is the standard deviation of the ratings of all the participating players in the game. Skill difference is the rating difference between the lowest and highest rated players in the game. It is roughly 2.5 to 3 times the standard deviation. This is just mathematics. You can ignore the rating deviation line and instead focus on the rating disparity and skill difference, because these are the "hands on" game metrics. Finally the graph in the lower left depicts the wait time again, this time sorted by wait time. Honestly the most interesting parts are the averages and means written in the top right corner of the plot.
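For anyone who wants to check these metrics by hand, here is a minimal sketch of the three definitions above, applied to the 700 + 1200 vs. 1300 + 1300 example. The function name is made up for illustration, and using the population standard deviation is an assumption; this is not the code from the plotting script.

```python
from statistics import pstdev


def game_metrics(team_a, team_b):
    """Return (rating disparity, rating deviation, skill difference) for one game.

    Hypothetical helper: population standard deviation is an assumption.
    """
    players = list(team_a) + list(team_b)
    # Difference in total rating between the two teams.
    rating_disparity = abs(sum(team_a) - sum(team_b))
    # Standard deviation of the ratings of everyone in the game.
    rating_deviation = pstdev(players)
    # Gap between the highest and lowest rated player in the game.
    skill_difference = max(players) - min(players)
    return rating_disparity, rating_deviation, skill_difference


disparity, deviation, skill_diff = game_metrics([700, 1200], [1300, 1300])
print(disparity, round(deviation, 1), skill_diff)  # 700 248.7 600
```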
All plots have fixed y ranges (except the rating distribution). I did this, so it is easier to spot differences. This means however that some very high values are cut off. I consider these outliers, so it doesn't matter too much, just keep it in mind. I still have the maximum values in my spreadsheet.

This should be enough introduction, let's start with the bucket team matchmaker. This one was used until the server update 10 days ago.
BucketTeamMatchmaker 0-2.png

The currently running configuration of the new matchmaking algorithm looks like this:
current 2v2 0-2.png

As you can see it improved the overall game quality and equalized the wait time a bit. It still suffers from extremely bad games, just like the bucket team matchmaker. The reduced wait time outliers are what you experience as getting very bad games when you are high rated or are queueing during quiet hours. This is mainly because of the mentioned bug, but we can still get some more performance by tuning the parameters.
This is what I came up with, that will be available with the next server update:
higher_newbie_bonus 2v2 0-2.png
As you can see the lines in the top left don't go off the charts anymore, so we got rid of the horrible games. The general skill difference also improved a bit. The top players are back to being in queue forever if no suitable match can be found. The careful use of bonuses makes sure that we match more aggressively on the lower rated end. By the way, this is why you see these spikes at the end of the curves. They are the newbie matches being more lenient with balance.

Because the algorithm is configurable we will even be able to see some of these improvements before the next server update.

I hope I could answer some of your questions. I know that I covered a lot of stuff and I glossed over some of the details to keep the post readable, so if you have any follow up questions, don't hesitate to ask.

BattleMoose

The biggest issue you are running into is the implicit assumption behind your metrics, that is, a rating difference represents the same skill difference at all rating levels. For example, a 500 rating difference from 800 to 1300 represents a much greater level in skill than does 1900 to 2400. This "error" is also included in opti balance games when there are large skill differences and results in very unbalanced games even though the algorithm can assign it a very high game quality metric.

Also, it doesn't appear that your game numbers correspond to specific games. But rather that you monotonically sorted your metrics and plotted them. If you plotted specific games along the x-axis, sorted by rating disparity and plotted the skill disparity, that would be revealing.

Also, I really wouldn't be worrying about the wait times for the top end of the spectrum. There simply aren't many of them and to have them all online at the same time, randomly, is extremely unlikely. But if a competitive scene develops, they will do what they already do and communicate with each other and search at the same time, in a cooperative fashion.

Having hard limits on skill differences at the different rating levels would be what I would want to see. Should just take a few if functions...

FtXCommando

These are the exact same differences as far as how Trueskill works. To be 2300 you need a positive win rate against 2200s, to be 400 you need a positive win rate against 300, both at the same rate.

The confusion likely comes from the fact that the lower the rating, the higher the typical deviation is. Meanwhile all the people at 2200 have thousands of games with minimum deviation and so their skill is "settled" and seems definitive because everything looks like the expected result.

BattleMoose

@ftxcommando said in Matchmaker Algorithm Feedback Thread:

These are the exact same differences as far as how Trueskill works.

I guessed as much. Which is exactly the issue....

Askaholic

Well the new algorithm uses displayed rating now (μ - 3σ), not TrueSkill mean, so that probably makes a difference.
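To make the difference concrete, here is a small sketch of the displayed rating formula. The function name and the example numbers are only illustrative, not taken from the server code.

```python
def displayed_rating(mean, deviation):
    """Displayed rating: TrueSkill mean minus three times the deviation."""
    return mean - 3 * deviation


# Two hypothetical players with the same mean but different deviation.
print(displayed_rating(1500, 500))  # 0
print(displayed_rating(1500, 100))  # 1200
```

So two players with the same mean can end up far apart in displayed rating if one of them has only played a few games.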

BlackYps

@battlemoose said in Matchmaker Algorithm Feedback Thread:

Also, it doesn't appear that your game numbers correspond to specific games. But rather that you monotonically sorted your metrics and plotted them. If you plotted specific games along the x-axis, sorted by rating disparity and plotted the skill disparity, that would be revealing.

That's right. But I don't think it really makes any difference. What do you expect to see? I expect to see basically the same graph again, with more noise, because these metrics are correlated. You can't have all players be the exact same rating when you have a big rating disparity between the two teams.

Having hard limits on skill differences at the different rating levels would be what I would want to see. Should just take a few if functions...

What limits would you like to have?

BattleMoose

@blackyps said in Matchmaker Algorithm Feedback Thread:

What limits would you like to have?

Based on the highest rated player in the game:

less than 1000: no limit on skill difference
1000<1500: maximum skill difference of 500
more than 1500: maximum skill difference of 800

As a first pass, obviously to be adjusted depending on results and such.
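A rough sketch of what those "few if functions" could look like. The function names are made up, the numbers are the first-pass values from the list above, and treating exactly 1500 as part of the top bracket is an assumption since the list leaves that boundary open.

```python
import math


def max_skill_difference(highest_rating):
    """Allowed skill difference, based on the highest rated player in the game."""
    if highest_rating < 1000:
        return math.inf  # no limit
    if highest_rating < 1500:
        return 500
    return 800  # assumption: 1500 itself falls in this bracket


def within_limit(ratings):
    """True if the spread between lowest and highest player respects the limit."""
    skill_difference = max(ratings) - min(ratings)
    return skill_difference <= max_skill_difference(max(ratings))


print(within_limit([2000, 1200, 1900, 1300]))  # True
print(within_limit([2000, 300, 1200, 1100]))   # False
```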

FtXCommando

You are the first person I've ever seen to suggest that the difference between a 2200 and 2300 player is less than a 700 and 800 btw; I actually misread your first post because I thought it was the classic post that everyone below 1000 is the same skill level.

Also, these suggestions would make you match with an even larger difference than what currently exists. It would basically double the search range for 1500 players. Or are you suggesting that teammates can only be within these rating limits? In which case I don't understand these rating limit brackets.

Why would you have no skill limit, followed by a 500 skill limit, followed by an 800 one? That doesn't even make any logical sense. It can't be based on playerbase size because the group with 70% of the players (<1000) has no limit, but it also can't be based on any sort of rationale about skill level being larger or lower based on your place on the rating spectrum.

BattleMoose

@ftxcommando said in Matchmaker Algorithm Feedback Thread:

Also, these suggestions would make you match with an even larger difference than what currently exists.

This is demonstrably untrue because these limits can only prevent games with large skill differences: impossible to produce more... My suggestion is the imposition of hard limits where currently none exist. I cannot even begin to try and untangle what you think I am suggesting...

BlackYps

Why would you have no skill limit, followed by a 500 skill limit, followed by an 800 one?

I'm still interested in your explanation for this.
Also it would really make the discussion easier if you replied to more than one question per post.

BattleMoose

@blackyps said in Matchmaker Algorithm Feedback Thread:

I'm still interested in your explanation for this.

For highest rated player less than 1000: no limit is as it currently is. Effectively no change. It's so hard to accurately determine skill in this bracket that I think it's fine.

For highest rated player greater than 1000 and less than 1500: max skill difference is suggested at 500. For a 1200 player, to have a 700 rated player as a teammate vs a 1000 and a 900 I think this is a very big ask. Personally as a 1200 player you effectively need to carry the 700 to victory, it removes the "fun" of a 2v2 with such a large skill disparity.

Note that in this example the rating disparity will be zero, which alone would be an indication of a high quality or balanced game. This is why I recommended in my previous post looking at these metrics together; they should not be separated. Rating disparity and skill difference do not necessarily increase together but could be largely independent of each other.
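A quick sketch to illustrate the point. Only the middle game is the example from this post; the other two are invented for comparison, and the helper names are hypothetical.

```python
def rating_disparity(team_a, team_b):
    """Difference in total rating between the two teams."""
    return abs(sum(team_a) - sum(team_b))


def skill_difference(team_a, team_b):
    """Gap between the highest and lowest rated player in the game."""
    players = team_a + team_b
    return max(players) - min(players)


games = {
    "all equal": ([950, 950], [950, 950]),       # invented
    "this post": ([1200, 700], [1000, 900]),     # example from the post
    "heavy carry": ([1800, 100], [1000, 900]),   # invented
}
for name, (team_a, team_b) in games.items():
    print(name, rating_disparity(team_a, team_b), skill_difference(team_a, team_b))
# all equal 0 0
# this post 0 500
# heavy carry 0 1700
```

All three games have a rating disparity of zero, yet the skill difference ranges from 0 to 1700.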

For highest rated player of 2000 or more being matched with a 1200, I think could still result in a viable game. You can generally expect a 1200 to be independent and contribute to a fight in a meaningful way right up to the t4 stage of a game.

Matching a 2000 with a 300 I think will just result in a bad gaming experience for everyone and discourage people from searching tmm. It has for me. Which is why I am suggesting a hard limit on skill difference.

I am much less interested on what the numbers actually should be. I don't know what they should be. But I certainly think that if they are allowed to be too large, whatever too large is, that will just produce games that people will not want to play.

FtXCommando

And how many 2000 rated players have you talked to for their opinion on 1200s? Because I promise you they view them even worse than how you seem to be viewing 700s. Plenty of 2k players refuse to play lobbies with rating minimums below 1.8k.

It literally makes zero sense for rating disparity and skill difference to not be correlated. To be 200 points above 2000 or 200 requires the exact same level of “skill” beyond 2000 or 200 to consistently beat those levels and maintain your rating. The only difference here is that 200s have higher deviation in general.

BattleMoose

"And how many 2000 rated players have you talked to for their opinion on 1200s? Because I promise you they view them even worse than how you seem to be viewing 700s. Plenty of 2k players refuse to play lobbies with rating minimums below 1.8k."

All the more reason to use a hard limit on skill difference. This has been my main point the entire time. I really don't need to have it parroted back to me as if I don't understand something.

https://forum.faforever.com/topic/2172/can-we-please-talk-about-tmm-match-making

I was specifically asked what I thought the limits should be and responded.
I was specifically asked to justify my thoughts, I did.

But you don't like my numbers or reasons, FINE. Fix it or don't fix it: I am done here.

archsimkat

@battlemoose said in Matchmaker Algorithm Feedback Thread:

For example, a 500 rating difference from 800 to 1300 represents a much greater level in skill than does 1900 to 2400.

Assuming the same number of games, an 800 has the exact same chance of beating a 1300 as a 1900 has against a 2400.
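As a rough illustration, here is the usual two-player TrueSkill win probability without a draw margin. The beta value and the example means and deviations are assumptions picked for the example, and the server may compute this differently.

```python
import math

BETA = 250.0  # assumed performance variability, not taken from the server config


def win_probability(mu_a, sigma_a, mu_b, sigma_b, beta=BETA):
    """Chance that player A beats player B (standard normal CDF of the scaled gap)."""
    denom = math.sqrt(2 * beta**2 + sigma_a**2 + sigma_b**2)
    x = (mu_a - mu_b) / denom
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


# Displayed 800 vs 1300 and 1900 vs 2400, everyone with deviation 100,
# so each mean is the displayed rating plus 300.
low = win_probability(1100, 100, 1600, 100)
high = win_probability(2200, 100, 2700, 100)
print(low == high)  # True

# Same mean gap, but both players have a higher deviation of 300.
uncertain = win_probability(1100, 300, 1600, 300)
print(uncertain > low)  # True
```

Only the gap in means and the deviations enter the formula, so where the players sit on the rating scale doesn't matter. Higher deviation pulls the prediction closer to 50%.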

Cyborg16

Honestly @BlackYps that sounds like good work. I've partly given up playing 2v2 due to bad experiences, but will have to try some more.

Slightly off topic: will the 3v3 and 4v4 queues share the 2v2 ratings or use new ratings? I can see why 1v1, 2v2 and global use separate ratings (team communication/coordination being a separate skill, and half of global games being gap/setons/canis), but TMM games are probably similar enough to share ratings?

Askaholic

I’m not 100% up to date on what the plans for ratings are but afaik, 4v4 will definitely have a different rating, and 3v3 will get bundled in with either the 4v4 one or the 2v2 one. There will also be a queue that uses global rating.

Cyborg16

Last night's game (https://replay.faforever.com/15370273) showed another issue with the TMM ratings: my opponents were both rated around 580. In reality, one was a much better player than the other; presumably they always played 2v2 together thus always won/lost (rating) together. Perhaps one could call the result "balanced" (1248 + 16 vs 580 + 584), but their ratings still massively misrepresented the players (the better of the two is 985 on 1v1 and 1100 on global, the other ~500 on global).

BlackYps

People having inaccurate ratings is a problem that the matchmaker can't do anything about.
For specific players it should go away over time when they play more.

Valki

@cyborg16 I thought this too... After being initialized at the wrong rating and then playing always with the same friend I feared I was still massively overrated. I am 1100 he was 600

Somehow though, my winrate is pretty close to 50% when queuing up solo. Apparently I really am as good in 2v2 as my rating suggests (now at least).

Katharsas

my opponents were both rated around 580. In reality, one was a much better player than the other; presumably they always played 2v2 together thus always won/lost (rating) together

Yeah no rating system in the world can really fix that (if they indeed always play together), unless it measures other things than win/loss, which would open the door to people trying to game the rating system (potentially griefing their teammates to gain points) instead of trying to win.