Desync on all replays and games with Windows 11 Arm on M1 Max Apple

@harzernoob said in Desync on all replays and games with Windows 11 Arm on M1 Max Apple:

@corsaire apparently I am quoted now for super low-level debugging B).
If it is indeed due to some non-deterministic part of the emulation, Microsoft might be able to fix it. But good luck getting them to troubleshoot it.

Well I just did that.
Chances are slim yeah, but hey, no harm in pointing out the discrepancy between x86-64 and emulated on arm64... Maybe with a bit of luck someone at Redmond will be interested.

One can hope 😉

Sorry to resurrect an older thread, but I'm wondering if anyone's had or heard of any progress on this front? I just got FA at a crazy discount on Steam because I used to love this game... and I'd love to get it running smoothly if possible!

Tried again now with the latest Parallels and Windows updates:

Edition Windows 11 Pro Insider Preview
Version 22H2
Installed on 6/12/2022
OS build 25136.1000
Experience Windows Feature Experience Pack 1000.25136.1000.0

Same issue still, i.e. immediate desync on replay playback (but the replay continues fine). Single player works.

Maybe it will improve with Metal 3 in macOS Ventura and the new DX12 support in Parallels later this fall? Will try again in a few months. 🙂

Just to add my two cents to this issue, this sounds a lot like you're running into a numerical issue. This basically means your system makes an error during the calculation. This is completely normal and taken into account by most software.

The IEEE 754 standard defines a minimum precision for floating-point calculations, but not a maximum. Therefore, different CPUs may produce slightly different results for the same computation/code. Add numerical instability, and those errors can get large, fast. FA deals with this using strict/stable floating-point calculations. (I think they don't do floating point at all; they do fixed-point math.)

The issue now arises when there is some aspect of the way your non-x86 CPU works that wasn't anticipated by the developers of FA (e.g. when computing a sine/cosine as mentioned above), that causes your system to make a different calculation error than everyone else. Then you get a different result and a desync.

If those speculations are correct, it would be nearly impossible to run FA on a non-x86 chip and get the same deterministic results as on an x86 chip.
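To make the "different calculation error" point concrete, here is a minimal sketch in plain C (not engine code; the function names are made up for illustration). It sums the same three values in two different orders and rounds to a 32-bit float after every step. Nothing here is specific to FA or the M1; it only shows how sensitive float results are to how a computation is carried out.

```c
#include <stddef.h>

/* Hypothetical helper: adds the values left to right.
 * The volatile accumulator forces each partial sum to be stored
 * (and therefore rounded) as a 32-bit float. */
float sum_forward(const float *values, size_t count) {
    volatile float acc = 0.0f;
    for (size_t i = 0; i < count; ++i) {
        acc = acc + values[i];
    }
    return acc;
}

/* Hypothetical helper: adds the same values right to left. */
float sum_backward(const float *values, size_t count) {
    volatile float acc = 0.0f;
    for (size_t i = count; i > 0; --i) {
        acc = acc + values[i - 1];
    }
    return acc;
}
```

Called on the array {1.0e8f, -1.0e8f, 1.0f}, sum_forward returns 1.0f and sum_backward returns 0.0f: in the backward order the 1.0f is absorbed when it is added to -1.0e8f, because neighbouring floats at that magnitude are 8 apart. A simulation that feeds such a result back into the next tick would diverge from then on.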

Having issues with connectivity / ICE? Talk to me.

@geosearchef Well, if your ARM PC emulates an x86 (perfectly ofc.), then you could do it.

But yeah, playing a synchronized simulation on two different architectures would need special software to make sure both machines generate the same output.
Since the FA developers didn't expect people to run it on ARM, you would need to implement that step yourself (in the form of an x86 emulator for ARM).

@nex Exactly, you would need an emulator. The M1 is not powerful enough to emulate FA though.

@geosearchef Not sure what you're talking about here, you can perfectly play solo campaign, so yeah the M1 is perfectly fine. The problem is playing a multiplayer game or running a replay.

Not sure if there will ever be a way to mitigate that calculation discrepancy.

On that note, has anyone had the chance to try running 2 FAF setups on M1 in a multiplayer game?

We do know that it does not work between a Windows PC and an emulated Windows on M1... but how about 2 identical emulated setups?

In addition to FPU shenanigans, loosely quoting Marcan, the cores in the M1 speculate execution extremely hard, and that is making pre-existing unknown bugs surface. It's possible FA itself is broken in a way that never shows up on Intel/AMD CPUs, and if that were the case it's probably pretty much unfixable.

It would also mean that the game would likely stop working at some future point when x86 CPUs get more complex and trigger the bug. (You could argue that backward compatibility is x86's only reason to exist, though, so that last point may never happen.)

@corsaire said in Desync on all replays and games with Windows 11 Arm on M1 Max Apple:

Not sure what you're talking about here, you can perfectly play solo campaign,

Yes, what he/we meant was that you would need to perfectly emulate an x86 to play with other x86 machines.
Singleplayer and playing with other M1s should/might be fine?

But I also heard from someone on Discord that he successfully plays ladder with an M1 but gets desyncs in lobbies, which is quite strange.

I've checked some sources. In https://gafferongames.com/post/floating_point_determinism/, a developer of SupCom confirms that they already rely on the IEEE 754 float standard to avoid desync problems between different x86 processors (AMD and Intel). They use the _controlfp function to control how floating-point calculations work. The problem is that on ARM Windows, _controlfp has a slightly different effect: it affects a different set of registers (https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/control87-controlfp-control87-2?view=msvc-170).
This is only a guess, but it seems like we could just rewrite some code and change which floating-point registers are used.
I'm not sure that this will solve everything, but it looks like a good first attempt.

@Eyvind that is an interesting find 👍 , I'm not sure how trivial it would be to fix what you describe. We don't have access to the source code of the engine, for example.

A work of art is never finished, merely abandoned

@Eyvind Great find! Even without source code, perhaps it could be possible to patch? (Or does Microsoft need to do something?)

It would be interesting to compare with a Qualcomm ARM-based Windows 11 laptop, if anyone has access to one of those?

According to the first link, these two function calls are important to ensure deterministic floating-point calculations across different systems (such as Intel and AMD):

_controlfp(_PC_24, _MCW_PC)
_controlfp(_RC_NEAR, _MCW_RC)

The first is Precision Control, the second is Rounding Control. According to the linked Microsoft documentation for _controlfp, setting the Precision Control to 24 bits is not supported on ARM. Making this call on ARM will simply not change anything. Setting the Rounding Control to NEAR seems to be OK.

(Precision Control is not supported on x64 either, but Supreme Commander FA is a 32-bit application, I think.)
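To show why the precision setting can matter at all, here is a small sketch in plain C (not engine code; the function names are hypothetical). One function rounds every intermediate result to a 32-bit float, which is roughly the effect _PC_24 is meant to have on x87 arithmetic. The other keeps the intermediate product at higher precision and rounds only once at the end, as an FPU without that setting might.

```c
/* Hypothetical helper: computes a * b + c with the product rounded
 * to a 32-bit float before the addition. The volatile store forces
 * that rounding and keeps the compiler from fusing the two steps. */
float mul_add_narrow(float a, float b, float c) {
    volatile float product = a * b;
    return product + c;
}

/* Hypothetical helper: computes a * b + c with the product kept in
 * double precision, rounding to float only at the very end. */
float mul_add_wide(float a, float b, float c) {
    double product = (double)a * (double)b;
    return (float)(product + (double)c);
}
```

With a = b = 1.0f + 1.0f / 4096.0f and c = -(1.0f + 1.0f / 2048.0f), mul_add_narrow returns 0.0f while mul_add_wide returns 1.0f / 16777216.0f (2^-24). The inputs are identical; only the precision of the intermediate differs. If the emulation on ARM does not honour the precision setting the way a real x86 does, this is the kind of discrepancy one would expect.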

On second thought, it might also be this assert that fails on the first tick:

gpAssert( (_controlfp(0, 0) & _MCW_PC) == _PC_24 );

Since setting the _PC_24 flag does not do anything on ARM, the flags will be unchanged when they are checked. First thought: could we disable one of these asserts with a patch? Or somehow pretend the flags are always set as expected on Apple Silicon, with _PC_24 set (or redefine _MCW_PC)?

However, I found two more interesting bits of information comparing Intel and Apple Silicon. They seem to hint that calculations using binary64 (double) match, but calculations using binary32 (single float: 32 bits, 24 bits of precision) don't:
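For illustration only, a platform-aware version of that check could look something like the sketch below. This is an assumption about how one might write it, not what the engine does; the enum and function name are made up. It only reads the precision-control bits where the Microsoft documentation says they are supported (32-bit x86 builds with MSVC) and reports "unsupported" everywhere else instead of asserting.

```c
#include <float.h>

/* Hypothetical result codes for the query below. */
enum pc_state {
    PC_UNSUPPORTED = -1, /* no x87 precision control on this target */
    PC_NOT_24 = 0,       /* supported, but not set to 24 bits */
    PC_IS_24 = 1         /* supported and set to 24 bits */
};

/* Hypothetical helper: reports whether precision control is at 24 bits. */
enum pc_state query_precision_control(void) {
#if defined(_MSC_VER) && defined(_M_IX86)
    /* A mask of 0 reads the control word without changing it. */
    unsigned int control_word = _controlfp(0, 0);
    return ((control_word & _MCW_PC) == _PC_24) ? PC_IS_24 : PC_NOT_24;
#else
    /* ARM, x64 and non-MSVC builds: nothing to query through this API. */
    return PC_UNSUPPORTED;
#endif
}
```

Note that FA is a 32-bit x86 binary running under emulation, so from the game's point of view it is still on the first branch. Whether the emulator then honours the setting is exactly the open question in this thread.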

https://stackoverflow.com/questions/71441137/np-float32-floating-point-differences-between-intel-macbook-and-m1

https://eclecticlight.co/2021/04/22/can-you-trust-floating-point-arithmetic-on-apple-silicon/

If so, that would mean the proper fix would be to replace all single-precision floats in the game with doubles.