Wifi: when better emulation makes things worse
After LAN was merged, I was informed that it wasn't working as smoothly as the old season2 branch. In particular, it was harder to get a connection going in a game.

I didn't quite know where to look. The LAN interface itself functions the same way as its season2 counterpart. The only thing I could think of was the improvements we'd made to wifi emulation since season2.

I fired up my other computer and gave LAN a test. I had to do a bit of setup to get both computers connected over ethernet, since wifi has too much latency. But then I observed the same thing that had been reported to me: I could sometimes get a connection going, but it had a high failure rate.

As further confirmation, I even happened to get a connection failure over local multiplayer. It was just a lot less likely there.

When logging the wifi traffic, I observed something peculiar: the client would send an 802.11 authentication frame and receive another authentication frame from the host, then it would send an association frame, but by that point the host had entered power saving mode. It would only leave power saving mode much later, which was too late.

The LAN interface does something a bit particular: due to how it functions, it may receive frames at any time, so these are added to a receive queue to be consumed where needed. To avoid clogging up that queue, frames that are more than 16 milliseconds old are considered stale and are deleted. This should normally be more than enough: when the wifi system is active, it checks for incoming frames fairly often, at least every 0.5 milliseconds.
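The stale-frame policy can be pictured as a timestamped FIFO that discards old entries when it is read. This is a hypothetical Python illustration, not the actual melonDS LAN code: the class and method names are made up, and only the 16 ms threshold comes from this post.

```python
from collections import deque
from dataclasses import dataclass

STALE_THRESHOLD_MS = 16.0  # frames older than this are dropped (value from the post)


@dataclass
class Frame:
    payload: bytes
    arrival_ms: float  # time at which the frame came in over LAN


class RxQueue:
    """Hypothetical receive queue that discards stale frames on read."""

    def __init__(self, stale_ms: float = STALE_THRESHOLD_MS):
        self.stale_ms = stale_ms
        self.frames = deque()

    def push(self, payload: bytes, now_ms: float) -> None:
        # Frames can arrive at any time: just timestamp and enqueue them.
        self.frames.append(Frame(payload, now_ms))

    def pop(self, now_ms: float):
        # Frames are queued in arrival order, so the oldest one is at the front.
        # Drop everything that has been waiting longer than the threshold.
        while self.frames and now_ms - self.frames[0].arrival_ms > self.stale_ms:
            self.frames.popleft()
        return self.frames.popleft().payload if self.frames else None
```

For example, if an authentication frame arrives at t = 0 ms and an association frame at t = 10 ms, but nothing reads the queue until t = 20 ms, the first frame is discarded as stale and only the second one is returned.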

I could see how this was a problem in the aforementioned case, but raising the 'stale frame' threshold as far as 500ms did not fix the problem. So I had to approach it differently, namely, by considering how things work in wifi land.

The host sends beacon frames every ~200 milliseconds to advertise its presence to potential clients. A client that wants to connect then initiates the process by sending the host an authentication frame after receiving a beacon, and so on. The authentication+association exchange is aligned to beacon frames, and that is important.

The host uses power saving mode between beacons. The basic process is as follows: send a beacon frame, wait for a while to see if any clients want to connect, then enter power saving mode until the next beacon. A timer (W_POST_BEACON) determines how long to wait before entering power saving mode. It is typically set to 10 milliseconds or so, which is more than enough for an authentication+association exchange.
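The beacon/power-saving cycle can be modeled as a simple function of time. This Python sketch uses the approximate values given in this post and is deliberately simplified: it assumes the host listens only during the post-beacon window and ignores anything else that might wake it up. The function name is hypothetical.

```python
BEACON_INTERVAL_MS = 200.0  # "every ~200 milliseconds"
W_POST_BEACON_MS = 10.0     # "typically set to 10 milliseconds or so"


def host_awake(t_ms: float,
               beacon_interval_ms: float = BEACON_INTERVAL_MS,
               post_beacon_ms: float = W_POST_BEACON_MS) -> bool:
    """True if the (simplified) host is listening at time t_ms.

    A beacon goes out at t = 0, 200, 400, ... ms; the host stays awake for
    post_beacon_ms after each beacon, then sleeps until the next one.
    """
    since_beacon = t_ms % beacon_interval_ms
    return since_beacon < post_beacon_ms
```

In this model, a frame reaching the host 5 ms after a beacon is heard, while one reaching it 150 ms after the beacon arrives during power saving mode.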

But what happens when we try to emulate wifi communication over different protocols with different constraints? We have a synchronization mechanism that ensures everything is received on time, but it only kicks in after a client has connected. Before that, things are much less reliable, and the authentication/association frames may lag behind and arrive too late (for example, after the host has already entered power saving mode). This is exactly what is happening with LAN here. Sometimes the frames get delivered the next time the host leaves power saving mode; sometimes that works, and sometimes it doesn't.
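To make the timing problem concrete, here is a toy timeline in Python. It is not melonDS code. It assumes a single hypothetical one-way latency for every frame, instant replies on both sides, and a host that only listens during the post-beacon window. Since every frame pays the latency, the association frame reaches the host four hops after the beacon.

```python
W_POST_BEACON_MS = 10.0  # post-beacon window, "10 milliseconds or so"


def handshake_timeline(latency_ms: float) -> list:
    """Arrival time (ms after the beacon) of each step of the exchange.

    Toy model: every frame takes latency_ms (a hypothetical value) to arrive,
    and each side replies as soon as it receives something.
    """
    steps = ["beacon -> client", "auth -> host", "auth reply -> client", "assoc -> host"]
    return [(step, latency_ms * (i + 1)) for i, step in enumerate(steps)]


def host_frames_on_time(latency_ms: float,
                        post_beacon_ms: float = W_POST_BEACON_MS) -> bool:
    # Every frame addressed to the host must arrive before it starts power saving.
    return all(t < post_beacon_ms
               for step, t in handshake_timeline(latency_ms)
               if step.endswith("-> host"))
```

With these illustrative numbers, a 1 ms latency puts the association frame at 4 ms, well inside the window, while a 3 ms latency puts it at 12 ms, after the host has gone to sleep.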

And why was this not a problem before? Because power saving mode wasn't emulated correctly. It's all covered in a previous post: I implemented channels, but things broke precisely because of the power saving bugs, and you know how this goes. Either way, we were kinda just receiving frames at all times, which is why things worked.

So, how do we fix this bug, anyway? I could think of two possible ways.

The first way is to extend the synchronization mechanism so that it kicks in earlier, during the auth+assoc process or even before. It could be worth exploring, but it is also likely to cause more problems than it would fix. The sync mechanism was designed specifically around the multiplayer protocol, so I don't know how well it would perform outside of that use case, or whether it could cause problems when connecting more than two players.

The second way is to extend the post-beacon interval when receiving authentication or association frames. This is simply done by increasing W_POST_BEACON by 10 or so, ensuring there is enough time for the auth+assoc process to complete. However, it's a hack: it deviates from hardware behavior, and if anything is coded to expect a specific post-beacon interval (instead of just waiting for the post-beacon IRQ), it will cause problems.
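Here is a rough Python model of that hack. It is a sketch, not the actual melonDS change. It assumes W_POST_BEACON acts as a millisecond countdown started at the beacon, so adding 10 to it pushes the sleep deadline back by 10 ms, and it reuses a toy latency model (hypothetical one-way latency L, instant replies, so the host receives authentication at 2L and association at 4L).

```python
W_POST_BEACON_MS = 10.0  # normal post-beacon window
EXTENSION_MS = 10.0      # "increasing W_POST_BEACON by 10 or so"


def handshake_completes(latency_ms: float, extend_on_auth_assoc: bool) -> bool:
    """Toy model of the host side of the auth+assoc exchange."""
    sleep_at = W_POST_BEACON_MS  # host enters power saving mode at this time
    # Host-bound frames: authentication at 2L, association at 4L after the beacon.
    for arrival in (2 * latency_ms, 4 * latency_ms):
        if arrival >= sleep_at:
            return False  # host is already in power saving mode
        if extend_on_auth_assoc:
            sleep_at += EXTENSION_MS  # the hack: stay awake a bit longer
    return True
```

In this model, a 3 ms latency makes the exchange fail without the extension and succeed with it. It also shows the limit of the approach: if even the authentication frame misses the original window (6 ms latency here), extending the interval on receipt cannot help.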

I decided to experiment with the second way. It is less invasive, considering how heavy-handed the sync mechanism is. I also doubt that it will ever cause problems, since the post-beacon interval stuff is mostly used for power saving purposes. But if it ever does cause problems, we can always change it back (and we get to write another juicy tech blog post about it, heh).

It's a bit anticlimactic given our standards, but remember, emulation is a bunch of compromises. Especially when we have to emulate low-level wifi communication with its peculiar timing requirements.

In the meantime, this hack seems to have fixed the problems I had observed with getting connections going over LAN. It seems to work pretty smoothly now.
Foxeh says:
Aug 14th 2024
ah interesting, i was wondering why suddenly i was having a harder time since season 2.
so it was because power saving was fixed lol

LEGO_Vince says:
Aug 15th 2024
I feel so listened to. I'll take a look at the newest update. Thanks so much for the responsiveness.
mixo says:
Aug 16th 2024
Sorry that this question has nothing to do with the post (which I read in its entirety, it's very interesting to see how you present the problems and the solutions) but, I'm not very informed about the subject, forums, etc, so I ask here. Is there a plan to implement the option to split the screens into different windows? (upper and lower screen in 2 different windows) It would help a lot to those who use more than 1 monitor (and those of us who do some tricks like using phones as touch screens with moonlight)
thisisaname says:
Aug 16th 2024
A friend and I tried the Nightly Build (the one marked with an increased player limit) to play Dragon Quest IX, and we still get a communication error, while having ~90ms ping. Tried a few times and switched some settings around, but doesn't seem to be working- we had managed to connect before season 2, too (although it was super unstable).