Fun with custom WFC servers
Nov 18th 2018, by StapleButter
PeeJay Bonobo and his friends have been having some fun with melonDS and custom WFC servers. For example:
So, what do we learn from this?
• This is some pretty cool shit!
• Graphically, melonDS is hardly distinguishable from the real thing
• The wifi stack is also fairly robust! There are no stability issues, although this is less demanding than local multiplayer (nifi).
• They managed to get this experimental, undocumented feature working.
So yeah, that last point.
Local multiplayer was celebrated and widely advertised, despite suffering from data loss every now and then. Internet connectivity, on the other hand, was implemented later but never mentioned in any changelog. There was only this thread mentioning it.
The first part just requires building up the UI for selecting a network adapter, as well as some extra code for naming them, at least under Windows: winpcap provides a 'name' and a 'description' for each adapter, but the former is some GUID-like identifier string and the latter seems to always be 'Microsoft', so, not too user-friendly. I will have to dig into the Windows API to look for a better method. Haven't checked under Linux, but you probably get names like your typical eth0, which would be good enough.
Second issue is due to the way this works.
Packets sent by the emulated DS will be forwarded as-is over the network, bearing the DS's MAC address. In return, packets sent to the DS will be readable from the host using promiscuous mode, which is widely supported. Basically, on the network, the emulated DS is seen as an entirely separate device from its host. It will typically contact the DHCP server and get its own IP address and all.
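To make the receiving side a bit more concrete, here is a minimal sketch of the filtering step, assuming frames arrive as raw Ethernet II buffers. The names (`IsFrameForGuest`, `MacAddr`) are made up for illustration and are not melonDS code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using MacAddr = std::array<uint8_t, 6>;

// An Ethernet II header is 6 bytes of destination MAC, 6 bytes of
// source MAC, then a 2-byte EtherType.
constexpr size_t kEthHeaderLen = 14;

// Hypothetical helper: decides whether a frame captured in promiscuous
// mode should be handed to the emulated DS. A frame qualifies if it is
// addressed to the guest's MAC, or if it is a broadcast frame (which
// ARP requests and some DHCP replies are).
bool IsFrameForGuest(const uint8_t* frame, size_t len, const MacAddr& guestMac)
{
    if (len < kEthHeaderLen) return false; // too short to carry a header

    static const MacAddr kBroadcast = {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF};

    // The destination MAC is the first 6 bytes of the frame.
    return std::memcmp(frame, guestMac.data(), 6) == 0 ||
           std::memcmp(frame, kBroadcast.data(), 6) == 0;
}
```

Everything else seen in promiscuous mode (traffic for the host or for other machines) would simply be dropped before it reaches the emulated wifi hardware.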
This is the simplest way to make it work. The main downside is that, well, it doesn't work over wifi. Reason is simple enough: over wifi, traffic goes through an access point, and all devices talking to that access point have to be authenticated and associated with it. This will be the case for the host, but the guest (emulated DS) is seen as an entirely separate device, and has never associated with the AP. Thus, any guest traffic will be rejected.
Two solutions for that:
1. Associating melonDS to the AP.
Amusingly, this would basically be an emulated DS talking to an emulated AP (melonAP) talking to an actual AP.
But there is a giant pile of complication arising from this.
First, we have to somehow figure out that the connection is a wifi connection. Then, figure out the AP's MAC address. Associate melonDS to it, either somehow grabbing the password from the system or asking the user for it. How fun.
Then there is the whole issue that this requires a wifi adapter that supports monitor mode and injection, so we are able to send proper auth/assoc frames. If we have that much control over it, might as well just drop melonAP and directly forward melonDS's wifi traffic to the host wifi adapter, that would be a whole lot simpler. (although we might want to come up with something to take care of WPA2/etc transparently, so you don't need an insecure network)
In the end, this is so unwieldy that requiring an ethernet connection is a better alternative.
2. Doing our own DHCP and NAT
The idea is to create a small subnet between the host and the guest. A small DHCP server will hand out a fixed IP address to each. The host will basically act as a bridge between the actual network and the guest.
The advantage would be that this method would be agnostic to the kind of connection used. All the traffic would be explicitly going through the host, instead of pretending to be a separate device.
Of course, we still need to figure out the host's network adapter details, such as its MAC address and its current IP address. libpcap doesn't have all the functionality needed there so we'll likely need some platform-specific code (which we'll need anyway to get proper interface names).
Then it'll need to do some analysis on packets going through it, to change their MAC/IP addresses as needed.
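As a rough idea of what that rewriting could look like for an outgoing packet, here is a sketch assuming plain IPv4 over Ethernet II. The function names are hypothetical. It only patches the source MAC, the source IP and the IPv4 header checksum; a real NAT would also have to fix up TCP/UDP checksums (they cover the IP addresses too), track connections, and handle ARP.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr size_t kEthHeaderLen = 14;

// Standard Internet checksum (RFC 1071) over an IPv4 header.
// 'len' is the header length in bytes (always a multiple of 4).
uint16_t IPv4HeaderChecksum(const uint8_t* hdr, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t(hdr[i]) << 8) | hdr[i + 1];
    while (sum >> 16)                       // fold carries back in
        sum = (sum & 0xFFFF) + (sum >> 16);
    return uint16_t(~sum);
}

// Hypothetical helper: makes an outgoing guest frame look like it was
// sent by the host. Returns false for anything that isn't IPv4.
bool RewriteOutgoing(uint8_t* frame, size_t len,
                     const uint8_t hostMac[6], const uint8_t hostIp[4])
{
    if (len < kEthHeaderLen + 20) return false;
    if (frame[12] != 0x08 || frame[13] != 0x00) return false; // EtherType 0x0800 = IPv4

    uint8_t* ip = frame + kEthHeaderLen;
    if ((ip[0] >> 4) != 4) return false;                      // IP version field
    size_t ihl = size_t(ip[0] & 0x0F) * 4;                    // header length in bytes
    if (ihl < 20 || len < kEthHeaderLen + ihl) return false;

    std::memcpy(frame + 6, hostMac, 6);  // Ethernet source MAC
    std::memcpy(ip + 12, hostIp, 4);     // IPv4 source address

    ip[10] = ip[11] = 0;                 // checksum field must be zero while summing
    uint16_t csum = IPv4HeaderChecksum(ip, ihl);
    ip[10] = uint8_t(csum >> 8);
    ip[11] = uint8_t(csum & 0xFF);
    return true;
}
```

Incoming packets would get the mirror treatment: destination MAC and IP swapped back to the guest's before the frame is handed to the emulated DS.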
Of course, we could keep the old direct mode as an option, more straightforward and less likely to break but requiring an ethernet connection.
That is, when we're done with the timing renovation.
How's that going btw? Still testing and measuring the DS.
I have general timings more or less covered now. But, while I was at it, I had to get rid of this old hack:
void CmdFIFOWrite(CmdFIFOEntry& entry)
{
    if (CmdFIFO->IsEmpty() && !CmdPIPE->IsFull())
        CmdPIPE->Write(entry);
    else
    {
        //printf("!!! GX FIFO FULL\n");
        // temp. hack
        // SM64DS seems to overflow the FIFO occasionally
        // either leftover bugs in our implementation, or the game accidentally doing that
        // TODO: investigate.
        // TODO: implement this behavior properly (freezes the bus until the FIFO isn't full anymore)
        while (CmdFIFO->IsFull()) ExecuteCommand();
        CmdFIFO->Write(entry);
    }
}
This is when we have a complete GX command entry that we can store in the FIFO. Note the case where the FIFO is full.
What happens then? It stalls the bus until there is room in the FIFO. According to GBAtek, this goes as far as stalling the ARM7.
What does the code above do? Run GX commands until the FIFO isn't full anymore, but this doesn't stall the bus, it basically acts like the FIFO is infinite. Probably has weird implications on timings, given ExecuteCommand() will still count cycles.
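The difference between the two behaviours can be shown with a toy model. This is only a sketch: `ToyGXFIFO`, the flat per-command cost and the way stall time is counted are all assumptions for illustration, and the 256-entry capacity is the figure GBAtek gives for the GXFIFO.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Toy model of a bounded command FIFO. Not melonDS code.
struct ToyGXFIFO
{
    static constexpr size_t kCapacity = 256;          // per GBAtek
    static constexpr uint32_t kCyclesPerCommand = 9;  // arbitrary flat cost for the sketch

    std::deque<uint32_t> entries;
    uint64_t gpuCycles = 0;  // cycles spent executing commands
    uint64_t busStall = 0;   // cycles the writer was kept off the bus

    bool IsFull() const { return entries.size() >= kCapacity; }

    void ExecuteOne()
    {
        entries.pop_front();
        gpuCycles += kCyclesPerCommand;
    }

    // Old hack: make room right away. The writer never waits, so from
    // its point of view the FIFO is infinite.
    void WriteHack(uint32_t cmd)
    {
        while (IsFull()) ExecuteOne();
        entries.push_back(cmd);
    }

    // Stalling: same draining, but the time it takes is charged to the
    // writer as a bus stall.
    void WriteStall(uint32_t cmd)
    {
        while (IsFull())
        {
            ExecuteOne();
            busStall += kCyclesPerCommand;
        }
        entries.push_back(cmd);
    }
};
```

Both variants end up with the same FIFO contents. The only difference is whether the CPU side gets to see the wait, which is exactly what matters for timings.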
This dates back to the original GXFIFO implementation (it's nearly two years old, wow). After some fixing, I eventually had it working well enough that all the games I tested worked without shitting themselves. Except, of course, Super Mario 64 DS, which still kept overflowing the FIFO. I just figured that the game was, well, not very well programmed (it uses immediate-mode DMA instead of the adequate GXFIFO mode, for instance). So I hacked it and postponed the whole thing, instead focusing on, you know, building the emulator core and making it be more than a curiosity.
But now we're at the stage where our core is mostly good and we have to make the emulator be awesome in all other ways.
And, especially, the timing renovation. Attempting to fix a whole bunch of bugs that are timing issues.
So, of course, when I went there, I ran into SM64DS again. Such is destiny. But I went on anyway, and implemented GXFIFO bus stalls.
And of course, SM64DS wasn't too happy with it. The stalls caused visible lag spikes. Those don't occur on hardware, so clearly we're doing something wrong there.
Logging shows that the game is constantly blasting the GXFIFO with 118-word bursts, regardless of FIFO levels. It never checks GXSTAT. It just keeps transferring chunks over and over again until all is transferred. It doesn't care what the GPU has to say about this.
So I inspect the code that is doing the transfers. This is not a bug, the code is meant to work this way.
Side note: the 118-word chunk looks like an attempt at replicating the GXFIFO DMA mode, which transfers 112-word chunks whenever the FIFO level is below 128. I don't see the point, even more so when the implementation is flawed in that it never actually checks FIFO levels and just keeps firing at the FIFO.
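Put side by side, the two transfer policies look something like this. It's a sketch with made-up function names; the numbers (112 words, threshold of 128, 118-word bursts) are the ones mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// GXFIFO DMA mode as described above: send up to 112 words, but only
// while the FIFO level is below 128. Otherwise, wait.
size_t DmaModeChunk(size_t fifoLevel, size_t wordsRemaining)
{
    if (fifoLevel >= 128) return 0;
    return std::min<size_t>(112, wordsRemaining);
}

// What SM64DS appears to do: send up to 118 words no matter what.
// The FIFO level isn't even an input here.
size_t BlindChunk(size_t wordsRemaining)
{
    return std::min<size_t>(118, wordsRemaining);
}
```

With the first policy the FIFO can't overflow as long as it has at least 112 free entries below the threshold. With the second one, nothing prevents it, so the game relies entirely on the hardware stalling the bus.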
So the immediate implication of this is that melonDS is likely taking too long to run the display lists.
Which is where I do a few hardware tests, checking whether writing to the GXFIFO has higher waitstates or whatever, but nope...
However it turns out that some commands execute faster than we thought.
For example, TEXCOORD, NORMAL, VTX_16. Individually, they take 1, 9 and 9 cycles respectively, as GBAtek says. All is good.
If you combine the three, it should thus take 19 cycles. But on hardware, it takes... 9 cycles.
This is showing us that there is parallel execution going on. Which explains why our display lists are taking too long.
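For reference, this is the naive serial model that produces the 19-cycle figure. The helper name is made up, and the sketch deliberately does not try to model the overlap, since the actual rules are still unknown at this point.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Naive model: commands run strictly one after another, so the total
// is just the sum of the individual costs.
uint32_t SerialCycles(const std::vector<uint32_t>& costs)
{
    return std::accumulate(costs.begin(), costs.end(), uint32_t{0});
}

// Individual costs as listed by GBAtek: TEXCOORD, NORMAL, VTX_16.
const std::vector<uint32_t> kTexNormalVtx = {1, 9, 9};

// What the hardware test measured for the same sequence.
constexpr uint32_t kMeasuredCycles = 9;
```

`SerialCycles(kTexNormalVtx)` gives 19, more than double the 9 cycles measured on hardware, so a purely additive model can't be right for this sequence.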
So, here it is. I'm way down the rabbit hole, working out how this works, which commands support parallel execution, with which commands and when.