Desync on all replays and games with Windows 11 Arm on M1 Max Apple

@nex Exactly, you would need an emulator. The M1 is not powerful enough to emulate FA though.

@geosearchef Not sure what you're talking about here, you can perfectly play solo campaign, so yeah the M1 is perfectly fine. The problem is playing a multiplayer game or running a replay.

Not sure if there will ever be a way to mitigate that calculation discrepancy.

On that note, has anyone had the chance to try running two FAF setups on M1 in a multiplayer game?

We do know that it does not work between a Windows PC and Windows with emulation on an M1... but how about two identical emulated setups?

In addition to FPU shenanigans, loosely quoting Marcan, the cores in the M1 speculate execution extremely hard, and that is making pre-existing unknown bugs surface. It's possible FA itself is broken in a way that never shows up on Intel/AMD CPUs, and if that were the case it's probably pretty much unfixable.

It would also mean that the game would likely stop working at some future point when x86 CPUs get more complex and trigger the bug. (You could argue that backward compatibility is x86's only reason to exist, though, so that last point may never happen.)

@corsaire said in Desync on all replays and games with Windows 11 Arm on M1 Max Apple:

Not sure what you're talking about here, you can perfectly play solo campaign,

Yes, what he/we meant was that you would need to emulate x86 perfectly to play with other x86 machines.
Singleplayer and playing with other M1s should/might be fine?

But I also heard from someone on Discord that he successfully plays ladder on an M1 but gets desyncs in lobbies, which is quite strange.

I've checked some sources. In https://gafferongames.com/post/floating_point_determinism/, a developer of SupCom confirms that they already rely on IEEE 754 floats to avoid desync problems between different x86 processors (AMD and Intel). They call _controlfp to set how floating-point calculations are carried out. The problem is that on ARM Windows, _controlfp has a slightly different effect: it affects a different set of registers (https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/control87-controlfp-control87-2?view=msvc-170).
This is only a guess, but it seems like we could rewrite some code to target the right floating-point registers.
I'm not sure that this will solve everything, but it looks like a good first attempt.

@Eyvind that is an interesting find 👍, but I'm not sure how trivial it would be to fix what you describe. We don't have access to the source code of the engine, for example.

@Eyvind Great find! Even without source code, perhaps it could be possible to patch? (Or does Microsoft need to do something?)

It would be interesting to compare with a Qualcomm ARM-based Windows 11 laptop, if anyone has access to one of those?

From the first link, these two function calls are important to ensure deterministic floating-point calculations across different systems (such as Intel and AMD):

_controlfp(_PC_24, _MCW_PC)
_controlfp(_RC_NEAR, _MCW_RC)

The first is Precision Control, the second is Rounding Control. According to the Microsoft documentation linked for _controlfp, setting the Precision Control to 24 bits is not supported on ARM, so making this call on ARM will simply not change anything. Setting the Rounding Control to NEAR seems to be OK.

(Precision Control is not supported on x64 either, but Supreme Commander FA is a 32-bit application, I think.)
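
To make the Precision Control point a bit more concrete, here is a minimal sketch in plain C of why the precision of intermediate results matters. It does not call _controlfp at all (that is MSVC-only) and it is not taken from FA. The function names and input values are made up for illustration. It just evaluates the same expression once with every intermediate rounded to 24 bits and once with wider intermediates, which is the kind of difference a precision setting decides.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical example: (a * b) - c with every intermediate rounded to
   binary32 (24-bit significand). The volatile stores force each
   intermediate to actually be rounded to float. */
float mul_sub_single(float a, float b, float c) {
    volatile float product = a * b;
    volatile float result = product - c;
    return result;
}

/* Same expression, but intermediates are kept in binary64 (53-bit
   significand) and only the final value is rounded to float. */
float mul_sub_wide(float a, float b, float c) {
    volatile double product = (double)a * (double)b;
    volatile double result = product - (double)c;
    return (float)result;
}
```

With a = b = 1.0f + FLT_EPSILON and c = 1.0f + 2 * FLT_EPSILON, the exact product is 1 + 2^-22 + 2^-46. Rounded to 24 bits the 2^-46 term is lost, so mul_sub_single returns 0, while mul_sub_wide keeps it and returns 2^-46. Tiny, but in a lockstep simulation one differing bit is enough for a desync.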

On second thought, it might also be this assert that fails on the first tick:

gpAssert( (_controlfp(0, 0) & _MCW_PC) == _PC_24 );

Since setting the _PC_24 flag does nothing on ARM, the flags will be unchanged when they are checked. First thought: could we disable one of these asserts with a patch? Or somehow pretend the flags are always set as expected on Apple Silicon with _PC_24 set (or redefine _MCW_PC)?

However, I found two more interesting bits of information comparing Intel and Apple Silicon, which seem to hint that calculations using binary64 (double) match but those using binary32 (single float, 32 bits with 24 bits of precision) don't:
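
As a side note, a check like that assert does not have to read the control word. Here is a rough sketch of a different approach, probing the arithmetic itself with known answers. This is a hypothetical helper I made up (the name float_rounding_looks_nearest is not from FA or any library), and it only covers the rounding-mode half of the story, not the 24-bit precision setting.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical probe: returns 1 if binary32 addition currently behaves
   like round-to-nearest, 0 otherwise. volatile keeps the additions from
   being folded at compile time, so they run under the active FP state. */
int float_rounding_looks_nearest(void) {
    volatile float one = 1.0f;
    volatile float quarter_ulp = 0x1p-25f;          /* 0.25 * FLT_EPSILON */
    volatile float three_quarter_ulp = 0x1.8p-24f;  /* 0.75 * FLT_EPSILON */

    /* 1 + 0.25 ulp must round down to 1 (rules out rounding up). */
    volatile float low = one + quarter_ulp;
    /* 1 + 0.75 ulp must round up to 1 + ulp (rules out rounding
       down or toward zero). */
    volatile float high = one + three_quarter_ulp;

    return low == 1.0f && high == 1.0f + FLT_EPSILON;
}
```

A per-tick check could then be written as assert(float_rounding_looks_nearest());. Note that skipping or faking a check like this only silences the assert. It does not change how the CPU actually computes the floats, so by itself it would not be expected to fix a desync.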

https://stackoverflow.com/questions/71441137/np-float32-floating-point-differences-between-intel-macbook-and-m1

https://eclecticlight.co/2021/04/22/can-you-trust-floating-point-arithmetic-on-apple-silicon/

If so, that would mean the proper fix would be to replace all single-precision floats in the game with doubles.