No, my assumption is correct. What is flawed is your assumption that the result of a single game is evidence that TrueSkill fails in the long term.
Your analogy about TrueSkill is also flawed, because TrueSkill can only build a relative distribution from the data it has. If the distribution is built from games on 5x5 maps, then it is accurate for gauging play on 5x5 maps. If you suddenly add 20x20 maps, you have added a source of error to the rating. Since TMM removes the ability to select slots and maps (but keeps the ability to select teammates), it is about as close as you can get to rating the quality of that individual as a teammate.
So: since there is no situation where TrueSkill needs to account for one 2k player playing twenty 100-rated players on a 5x5 map, that scenario is irrelevant here. All that matters is 2v2 capability on the matchmaker's curated map pool.
So can a 2k player beat twenty 100-rated players on the maps in TMM? If he can't, then he isn't a 2k according to the construction of this TrueSkill environment. If he can, then he is. Luckily, that matchup cannot happen in the matchmaker, so the system never has to consider it.
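To put rough numbers on the single-game point, here is a minimal sketch. It assumes the win-probability formula commonly used with TrueSkill-style ratings (draws ignored), and every number in it is made up: the `beta` value, the ratings, and the helper name `win_probability` are all hypothetical and not taken from the actual matchmaker.

```python
import math

def win_probability(team_a, team_b, beta=250.0):
    """Approximate P(team_a beats team_b), ignoring draws.

    Each team is a list of (mu, sigma) tuples.
    beta is an ASSUMED performance-variance parameter, not the
    value any real matchmaker is known to use.
    """
    # Difference in summed mean skill between the two teams.
    delta_mu = sum(mu for mu, _ in team_a) - sum(mu for mu, _ in team_b)
    # Total rating uncertainty across every player in the game.
    variance = sum(sigma ** 2 for _, sigma in team_a + team_b)
    n_players = len(team_a) + len(team_b)
    denom = math.sqrt(n_players * beta ** 2 + variance)
    # Standard normal CDF evaluated at delta_mu / denom.
    return 0.5 * (1.0 + math.erf(delta_mu / (denom * math.sqrt(2.0))))

# Hypothetical 2v2: (mu, sigma) values invented for illustration only.
favored = [(1800.0, 80.0), (1700.0, 80.0)]
underdog = [(1600.0, 80.0), (1500.0, 80.0)]

p = win_probability(favored, underdog)
print(f"favored team win probability: {p:.2f}")
```

With these invented ratings the stronger team is a clear favorite but nowhere near certain to win, so one upset is exactly what the rating itself predicts will happen some of the time. It says nothing about whether the rating works over many games.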