melonDS RSS
The latest news on melonDS.

melonDS 0.9.5 is out! -- by Arisotura Thu, 03 Nov 2022 21:35:25 +0000
As of today, the melonDS project is 6 years old. For this occasion, we present you this special version of the melonDS logo, recolored to the same pretty sky-blue color as 6 itself:

We wanted to have Peach bake a cake shaped like this, but Bowser kidnapped her again. We aren't great bakers at the melonDS HQ, so... yeah.

Regardless, these 6 years are a great success. Back in 2016, when I started working on melonDS, I was mostly just making it to have fun and pass time until my job started. I had absolutely no idea the project would go on for so long, and be as much of a success as it has been. So, first of all, I want to thank all the comrades who have helped make this possible. The melonDS team and other contributors. nocash and his great documentation. Everybody else who has been involved in reverse-engineering the DS/DSi hardware, cracking the DSi security, etc... And of course, everybody who has been using melonDS, testing games in it, reporting issues, suggesting improvements, etc...

Thank you all. melonDS is a team effort, and you deserve your part of the birthday cake.

And, of course, the birthday present. There's only one, but it's a big one. We bring you melonDS 0.9.5, and if you've been following the blog lately, you know it's going to be big.

melonDS 0.9.5

So what are the highlights of this 0.9.5 release?

Improved local multiplayer

This is a big change. If you know melonDS, you know that local multiplayer has always been finicky. You had to disable your framerate limiters, sacrifice some goats to the wifi deities, and hope everything would work without disconnecting. Well, that is in the past now! melonDS 0.9.5 is the result of the first season of the local multiplayer saga, and I dare say that the result is pretty good.

As an example: before, there was no way to get Mario Kart DS multiplayer to stay connected for more than a few seconds. Now? It's smooth as butter. Many games have been tested and most of them work absolutely fine, at least in two-player mode. It is also possible to go for three players or more, but more atypical multiplayer settings might run into problems, or suffer from decreased performance. Oh, and if you have an original DS firmware, it's also possible to use download play and Pictochat.

However, it is worth noting that I've had to completely rework the way local multiplayer communication and sync were handled. The new method requires all participating melonDS instances to be running on the same machine, so it isn't possible to play over LAN. Not that it has ever worked well tbh. But, essentially, the new IPC communication layer has extra smarts to avoid lag as much as possible, which are made possible by the use of shared memory -- these would be much more difficult, if not impossible, to replicate with BSD sockets over the network.

The melonDS UI has also been revamped to make the multiplayer experience smoother. It is possible to launch new instances of melonDS easily from the emulator's System menu (opening them just by opening the melonDS executable should also work). Certain parts of the emulator's configuration will be unique to each instance: for example, it is possible to configure each instance to use a different joystick, to select which instances can output sound, and so on.

There are a couple of shortcomings to this. First, some emulator settings (like BIOS/firmware files) are shared across all instances, but if you modify them, the changes may not be reflected in instances which are already open. These settings should be easily identifiable as I made them editable from the first melonDS instance only. Also, keyboard input, due to how it works, won't be suitable for playing a multiplayer game with a friend -- you will need joysticks for that.

However, we have big plans for this. There will be a second season to the local multiplayer saga, where we will implement netplay. This will make it possible to play with your friends over LAN, or even over the internet. The downside is that it will require each participating machine to run every participating melonDS instance, but we're confident that any decent computer can handle this. Besides, this is the cost of keeping a multiplayer game in sync. Due to how tight the local multiplayer timings are, there's no way Nintendo's wifi protocol would ever work over the internet.

The improvements to local multiplayer include improvements to wifi emulation itself, which may also improve stability in WFC games.

DSi camera support

melonDS has had basic DSi camera emulation since version 0.9.1, but it wasn't very useful as it was just feeding a fixed stripe pattern as camera input. As you can guess, this doesn't help sell our DSi emulation, seeing as cameras are the number one feature of that console, and the main reason some games support it at all instead of just sticking to the original DS.

melonDS 0.9.5 has actual DSi camera support now. That is, you can configure camera input to be sourced either from physical cameras on your computer, or from a fixed image file. Camera emulation has also been improved, which means that for example it is possible to take pictures in the DSi camera app, and they will be saved to your emulated SD card if you have one. Not the most useful thing, but this should also mean better camera support in DSi games.

Revamped OpenGL context handling

The way melonDS handled OpenGL contexts in the Qt frontend was weird. It was responsible for a number of problems, such as the inability to support proper vsync.

That is in the past now. Generic has been porting Stenzek's OpenGL context code from DuckStation to melonDS. This reworks the way OpenGL contexts are handled to be more sane and less prone to problems. This also means that we now have an actual, proper vsync setting.

DSi DSP support

Well, technically, we've had support for that since melonDS 0.9.3. It just didn't work due to a bunch of issues with it. Now it does.

Don't get overly excited over this, though. While I have verified that it at least works to some extent, teakra (the DSP interpreter) is slow enough that melonDS will likely fall to a single-digit framerate when the DSP is used. That's an improvement over just freezing, but certainly nowhere near playable. We will work on a DSP JIT to alleviate this.

CLI improvements courtesy of patataofcourse

The command line interface (CLI) of melonDS has been revamped. Most notably:

• It is possible to boot melonDS with no game loaded, or to load a game without booting, with the -b (or --boot) switch (values: always/never/auto)
• the -f (or --fullscreen) switch lets you start the emulator in fullscreen
• when loading an archive via the CLI, it is possible to specify which file to load from the archive
• the CLI help (invoked by the --help option) will show all the possible CLI arguments and values
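
To make the shape of these switches concrete, here is a rough sketch of an equivalent argument parser. It is only an illustration: melonDS is a C++/Qt application and does not use Python's argparse, and the default of "auto" for the boot switch and the positional name `rom` are assumptions made for this example.

```python
import argparse

def build_parser():
    # Hypothetical parser mirroring the switches listed above.
    # Not melonDS code; default values are assumptions for this sketch.
    parser = argparse.ArgumentParser(prog="melonDS")
    parser.add_argument("-b", "--boot", choices=["always", "never", "auto"],
                        default="auto", help="whether to boot after loading")
    parser.add_argument("-f", "--fullscreen", action="store_true",
                        help="start in fullscreen")
    parser.add_argument("rom", nargs="?", default=None,
                        help="game to load (optional)")
    return parser

# Load a game without booting it.
args = build_parser().parse_args(["-b", "never", "game.nds"])
```

An invalid value for the boot switch is rejected by the parser, which matches the idea of a fixed set of accepted values.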

We plan for this release to be the last of the 0.9.x series. melonDS 1.0 is coming next, and it will be big.

In the meantime, enjoy, and stay tuned for more!

melonDS 0.9.5, Windows x64
melonDS 0.9.5, Linux x64
melonDS 0.9.5, Linux ARM64
melonDS 0.9.5, macOS universal
Having 'fun' with the DSP -- by Arisotura Fri, 14 Oct 2022 11:32:45 +0000
aging cart for the DSi has been discovered, which will help implement and test the DSi features more thoroughly than commercial games do. And then there are the various quality-of-life improvements that come to mind, like not requiring BIOS/firmware/NAND dumps...

Anyway, the DSP.

The thing I have always kept pushing back, for two good reasons. First, the DSP instruction set and encoding are a mess, and the documentation on them is lackluster. Second, there's hardly anything on the DSi that uses the DSP. The DSi sound app, and a few other DSi titles, and that's it. Everything else sticks to the old DS sound mixer.

But what we have going for us is that the 3DS uses the same DSP, and it is much more popular there with every game using it, so the 3DS scene has already been dealing with it. Namely, we have teakra, which is a fairly reliable DSP interpreter and disassembler.

PoroCYon had integrated teakra into melonDS in an attempt at bringing DSP emulation. It didn't work, but it was at least a pretty good base. I had tried quickly fixing a couple bugs in it, with no real success. I wasn't really looking forward to having to debug DSP code, either, to be honest.

Lately, I felt like looking into it again.

I launched the DSi sound app, and was sidetracked by another, unrelated bug: the sound app crashes when starting due to a bad memory access. It went unnoticed before because we didn't emulate data aborts, but now, we do, and we can't go back on that.

So I researched that bug. It's a timing issue.

The crash happens when trying to dereference a particular pointer, because it is NULL. During startup, the main thread will allocate some memory, then run a bunch of initialization, then initialize that pointer. During the initialization, it sends an IPC request to the ARM7 to determine whether headphones are connected. While it waits for the ARM7 to respond, another thread runs, which does some other initialization, sends other IPC requests to the ARM7 (to get the date/time and battery status), then tries to do things with the aforementioned pointer, which is expected to have been initialized by now.

On the ARM7, the IPC-receive IRQ handler dispatches requests to their appropriate callbacks, which will then forward the request to the appropriate thread, which later services the request and responds to the ARM9. It's worth noting that the threads which service the RTC and PMIC requests have higher priority than the one which services requests like the aforementioned 'get headphone status'.

What happens in melonDS is that the ARM9 runs too fast, and sends its IPC requests too fast, causing the RTC and PMIC requests to overtake the initial headphone-status request. When they are serviced, the ARM9 thread which sent them will then try to access the problematic pointer, before the ARM7 has had a chance to service the headphone-status request, and thus before the main ARM9 thread has had a chance to finish its initialization. The problematic pointer is NULL, hence the crash.
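
The overtaking effect can be reproduced with a toy model. The sketch below is not melonDS or DSi firmware code: it assumes a single ARM7 that runs, tick by tick, whichever pending request has the highest priority, and the priority numbers and tick counts are made up for illustration.

```python
# Hypothetical priorities: lower number = serviced first.
PRIORITY = {"rtc": 0, "pmic": 0, "headphones": 1}

def completion_order(arrival_ticks, work_ticks):
    """Simulate a preemptive, priority-driven ARM7.

    arrival_ticks: {request name: tick at which it reaches the ARM7}
    work_ticks: ARM7 ticks needed to service any one request
    Returns the request names in the order they complete.
    """
    remaining = {}
    done = []
    tick = 0
    while len(done) < len(arrival_ticks):
        for name, arrival in arrival_ticks.items():
            if arrival == tick:
                remaining[name] = work_ticks
        if remaining:
            # Run the highest-priority pending request for one tick.
            name = min(remaining, key=lambda n: (PRIORITY[n], arrival_ticks[n]))
            remaining[name] -= 1
            if remaining[name] == 0:
                del remaining[name]
                done.append(name)
        tick += 1
    return done

# ARM9 at a realistic pace: the headphone request finishes before the others arrive.
paced = completion_order({"headphones": 0, "rtc": 50, "pmic": 60}, work_ticks=10)
# ARM9 too fast: RTC and PMIC arrive while the headphone request is still in progress.
too_fast = completion_order({"headphones": 0, "rtc": 2, "pmic": 3}, work_ticks=10)
```

In the second case the headphone-status reply comes back last, which is the ordering that leaves the pointer uninitialized.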

While discussing timing issues with Generic, he brought up that the unused ARM9 instruction cache implementation in melonDS, when hooked up, helped fix some of the known timing issues. So I gave it a quick try, and it fixes the DSi sound app crash. So this is something we need to think about -- it doesn't magically fix all the timing issues, but it seems to help more than I originally thought. It's also worth noting that the performance penalty from emulating the instruction cache isn't very bad, because instruction fetches are largely predictable.

For now, I kept it as a quick fix that I didn't commit or anything, just so I could run the DSi sound app and try to get the DSP working.

The first issue was that the DSP was just not running at all, because accesses to the DSP registers were rejected if the DSP wasn't already running. Except you need to access these registers to start the DSP, so... yeah.

A couple fixes later, the DSP was running... except it wasn't doing much at all, besides crashing after a while, because the memory it was reading was all zeroes. Because, due to another silly bug, this time in the NWRAM mapping code, the I/O writes that mapped NWRAM banks to the DSP weren't getting through.

At this point, the DSP was running its code, and all seemed good... except it didn't do much besides get stuck in a loop. The ARM9, on its side, was waiting for feedback from the DSP, but wasn't getting anything.

So this meant I had to dive into DSP code. The instruction set itself isn't as bad as I thought. Given the mess the encoding is, I expected DSP code to be an unreadable mess, but it wasn't nearly that bad and I could somewhat figure out what the code was doing. Now, I had to figure out why it was getting stuck in that loop and what the expected operation was. When I talked about this in the emudev Discord, PSI gave me a disassembly of aac.a (the DSP binary the DSi sound app uses), which helped a lot. Not only did I not have to awkwardly hook into the teakra disassembler and generate lengthy instruction logs to try to figure out what was going on, but the disassembly also has function names, which helps a lot with figuring out what the code is trying to achieve.

In this case, we looked at the code, and found that it was getting stuck in that loop after an unsuccessful malloc() call. That call was failing because the requested size was wrong: it was 0xD1C0, but that particular malloc() implementation couldn't take sizes larger than 0x8000. Except the size passed to malloc() was loaded from memory, and wasn't initialized by the DSP code, so it was part of the DSP binary itself.

For a while, both PSI and I were stumped. We couldn't figure out how this was supposed to work.

Until I finally figured it out. PoroCYon was also aware of the issue. Due to the way the DSi sound app does NWRAM mapping, it is done in separate steps: each given NWRAM bank is first disabled, then remapped to the DSP, then re-enabled, then the offset is changed (changing where the bank appears within its region's address space). The DSP-side mapping code in place was ignoring the last change, which resulted in NWRAM banks in the DSP data region ending up in the wrong place. Hence the wrong malloc() size.
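
A simplified model of that bug, for illustration only. The class, field names and slot numbers below are hypothetical and do not reflect the actual NWRAM registers or the melonDS code; the point is just that a mapper which only refreshes its view on enable/owner writes will miss an offset change that arrives afterwards.

```python
class DspMapper:
    """Toy DSP-side view of NWRAM banks (hypothetical structure)."""

    def __init__(self, track_offset_writes):
        self.track_offset_writes = track_offset_writes
        self.banks = {}   # bank id -> {"enabled", "owner", "slot"}
        self.slots = {}   # slot index as seen by the DSP -> bank id

    def _refresh(self):
        self.slots = {b["slot"]: bank_id
                      for bank_id, b in self.banks.items()
                      if b["enabled"] and b["owner"] == "dsp"}

    def write(self, bank_id, field, value):
        bank = self.banks.setdefault(
            bank_id, {"enabled": False, "owner": "arm9", "slot": 0})
        bank[field] = value
        # The buggy variant ignores offset ("slot") changes.
        if field != "slot" or self.track_offset_writes:
            self._refresh()

def remap_like_sound_app(mapper, bank_id, slot):
    # The four separate steps described above.
    mapper.write(bank_id, "enabled", False)
    mapper.write(bank_id, "owner", "dsp")
    mapper.write(bank_id, "enabled", True)
    mapper.write(bank_id, "slot", slot)

buggy = DspMapper(track_offset_writes=False)
fixed = DspMapper(track_offset_writes=True)
remap_like_sound_app(buggy, bank_id=0, slot=3)
remap_like_sound_app(fixed, bank_id=0, slot=3)
```

With the buggy variant the bank stays visible at its old slot, so the DSP reads its data from the wrong place.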

The way NWRAM mapping was done was also problematic in general. teakra uses its own, flat memory buffer to emulate the DSP code and data memory. To get around this, the DSP interface code was trying to detect NWRAM mapping changes and copy data around to keep teakra's buffer in sync with melonDS's NWRAM banks. I decided that this way of doing things was too complex and prone to problems, and instead modified teakra to directly access melonDS's NWRAM banks.

Then, the aforementioned malloc() call got a size of 0x14, which seemed much more reasonable, and resulted in a successful allocation.

This is the point I've gotten to, now. The DSP runs and communicates with the ARM9. It's not perfect yet, though. We have yet to implement the SoundEx module, so the DSP's audio output can be interleaved with that of the DS audio mixer. There are probably also other bugs on the DSP side (there are lots of underrun warnings in the DSP audio output module for some reason). teakra itself is also unoptimized, and pretty slow (the DSi sound app runs at a whopping 4 FPS for me).

But we at least have a working base we can build upon. It's possible to optimize teakra once everything is good, but a more productive avenue will be to build a DSP JIT (or integrate an existing one into melonDS if possible).

All in all, it was certainly a fun ride.
Merge party -- by Arisotura Sun, 02 Oct 2022 16:55:01 +0000

First, local_wifi has been deemed close enough to completion, and has been merged.

I added some tidbits of wifi emulation (for example, the Inazuma Eleven games use WEP during local multiplayer, so I had to implement enough of that to make them work), but most notably, I worked on the UI side of this feature. No more opening melonDS instances from different folders, MAC randomization, and other awkward workarounds. Now melonDS is able to detect when coexisting with other melonDS instances, and makes sure to save things like user settings, save file, etc, to separate files. For example, that makes it possible to configure each instance to use its own joystick.

It is probably still not perfect, but it's certainly a start. I also want to hear some user feedback on all this.

Regarding local multiplayer, the BSD socket interface is going byebye. This means it's no longer possible to play over LAN (not that it has ever really worked). But we plan to implement netplay for melonDS 1.0, so that will make up for it. It should even work better, in that the connection will be more reliable, but due to having to emulate all the participating consoles on every user's computer, it will be more demanding (although testing has shown that any decent computer can handle at least two melonDS instances at full speed).

Next, I finished the work on camera_betterer, and merged it too.

Basically it was only a matter of finishing up the UI side of things, adapting the Qt camera code for Qt6, and fixing up some tidbits (like camera image formats -- most cameras should be able to provide YUYV image data, but Mac cameras can only provide NV12).
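
For readers unfamiliar with the two formats: YUYV packs pixels as Y0 U Y1 V with one chroma pair per two horizontal pixels, while NV12 stores a full Y plane followed by an interleaved UV plane at half resolution in both directions. The sketch below shows one way to repack NV12 into YUYV. It only illustrates the layout difference and is not the conversion code melonDS uses; the function name is made up for this example.

```python
def nv12_to_yuyv(nv12, width, height):
    """Repack an NV12 frame (bytes) as packed YUYV.

    width and height must be even. No color conversion is done;
    each UV row of the NV12 input is reused for two pixel rows.
    """
    y_plane = nv12[:width * height]
    uv_plane = nv12[width * height:]
    out = bytearray(width * height * 2)
    for row in range(height):
        uv_row = (row // 2) * width  # start of the shared UV row
        for col in range(0, width, 2):
            o = (row * width + col) * 2
            out[o] = y_plane[row * width + col]          # Y0
            out[o + 1] = uv_plane[uv_row + col]          # U
            out[o + 2] = y_plane[row * width + col + 1]  # Y1
            out[o + 3] = uv_plane[uv_row + col + 1]      # V
    return bytes(out)

# 2x2 frame: four luma samples, then a single U/V pair.
frame = bytes([1, 2, 3, 4, 9, 8])
packed = nv12_to_yuyv(frame, width=2, height=2)
```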


You can, of course, feed video input from a physical camera into melonDS, but you can also choose to just feed a fixed picture into it.

There's also a setting for horizontally flipping the picture. A note on that: the DSi cameras have a register for flipping the picture horizontally and vertically, which is typically used to horizontally flip the picture from the inner camera, and melonDS emulates that. But, if for whatever reason the picture you get isn't in the correct orientation, you can use the provided horizontal flip setting to fix that.

I might also add in some more fun features, so stay tuned! And of course, if you're willing to experiment, we're open to issue reports on these new features. They will come in melonDS 0.9.5, but for now you can get dev builds from GitHub to try them out.
The local multiplayer saga, ep 10 -- by Arisotura Thu, 08 Sep 2022 08:18:45 +0000
We have also been porting our semaphore code so that our shared-memory based communication layer could work under Linux and macOS. This has proven to be a bit challenging, and I've been making several attempts to try and determine the best method for fast and reliable IPC.

What is left is mostly UI-related work. Handling several situations that can be problematic during multiplayer: supporting multiple controllers, maybe also things like multiple firmware settings, making sure the multiple running games don't write to the same save file, firmware file, and so on. I have a few ideas for that.

That will help turn local multiplayer support into a more finished product and less of a clunky experimental thing.

But for now, I'm going to take a break. I've been pouring a lot of my free time into this, and I think I need to relax for a lil' while.

I'm also going to debate the direction to take now with the rest of the melonDS team. There will be a second season to this saga, and it will be about implementing netplay. The results of all this wifi work are well beyond my original expectations, and now we have big ideas and hopes for this. But first, I want to take a collective decision about whether to make a release before season 2, whether to go straight for melonDS 1.0, these things. Stable local multiplayer was one of the must-haves for melonDS 1.0, but that's not all there is.

I'm also collecting ideas for netplay, so if you know more about this than I do (which is relatively little), feel free to reach out to me!
The local multiplayer saga, ep 9 -- by Arisotura Sun, 28 Aug 2022 15:47:19 +0000
I looked deeper into the issue. I couldn't let that slip. But I also didn't really have an idea what could be causing it or where to begin looking.

I thought about bit0 in USCOMPARE, again. Basically, when they're connected to a host and receiving beacons from it, games do that weird little dance where they take the beacon's timestamp value, add some offset to it, and write that to USCOMPARE, with bit0 set.

A bit of background. In the DS wifi module, there are two ways of triggering IRQ14: BEACON_COUNT and USCOMPARE. The former triggers IRQ14 every time the BEACON_COUNT timer reaches zero, the latter triggers it when USCOUNT matches USCOMPARE. All fine and dandy.

Back then, I had done hardware tests to figure out what bit0 in USCOMPARE does, given it's a special write-only bit. I had observed that it blocks BEACON_COUNT from triggering IRQ14 until USCOUNT matches USCOMPARE, effectively ensuring that the next IRQ14 will be triggered by USCOMPARE.

But that still didn't quite make sense. Here, the value written to USCOMPARE was based off the beacon's timestamp, which was basically the host's USCOUNT register, which was obviously different than the client's. So I couldn't really see what that was supposed to achieve.

Yesterday, as I went to bed, I had an idea. What if, when receiving a beacon frame with the right BSSID, the DS automatically sets its USCOUNT register to the beacon's timestamp?

You'd think that sounds far-fetched? I thought so too.

I tested it anyway this morning. Guess what I found out?

The DS does exactly that.


And implementing that in melonDS got rid of the slowdown problem. Local multiplayer connections are now smooth as butter.
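
The behavior described in this post can be summed up in a small model. This is a sketch, not the melonDS implementation: the class and method names are invented, the BEACON_COUNT timer is decremented once per tick for simplicity, and reload handling is left out.

```python
class WifiTimers:
    """Toy model of the IRQ14 sources discussed above (hypothetical names)."""

    def __init__(self, bssid):
        self.bssid = bssid
        self.uscount = 0
        self.uscompare = 0
        self.beacon_count = 0
        self.block_beacon_irq = False  # models write-only bit0 of USCOMPARE
        self.irq14_sources = []

    def write_uscompare(self, value):
        self.block_beacon_irq = bool(value & 1)
        self.uscompare = value & ~1

    def receive_beacon(self, bssid, timestamp):
        # The finding: a beacon from our host resets USCOUNT to its timestamp.
        if bssid == self.bssid:
            self.uscount = timestamp

    def tick(self):
        self.uscount += 1
        if self.uscount == self.uscompare:
            self.block_beacon_irq = False
            self.irq14_sources.append("USCOMPARE")
        if self.beacon_count > 0:
            self.beacon_count -= 1
            if self.beacon_count == 0 and not self.block_beacon_irq:
                self.irq14_sources.append("BEACON_COUNT")

client = WifiTimers(bssid="host")
client.receive_beacon("host", timestamp=1000)
client.write_uscompare((1000 + 6) | 1)  # timestamp plus offset, bit0 set
client.beacon_count = 3
for _ in range(6):
    client.tick()
```

Because USCOUNT now tracks the host's clock, a compare value derived from the beacon timestamp fires a few ticks later instead of at some unrelated point on the client's own timeline.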

To make the whole experience equally smooth, I'm also in the process of ironing out the kinks that may happen when trying to connect multiple games together. Turns out, some games do a bunch of weird shit, and I need to make my comm layer resilient to that.

Also, since the BSD-socket interface was brought up in the comments to the last post: I'm thinking of bringing it back, but now there are some issues with that. The shared-memory interface was originally a simple dumb comm layer akin to the BSD-socket interface. But, as I worked to iron out the kinks in multiplayer connections, I had to add in some extra intelligence to avoid hitting timeouts and causing lag as much as possible. A lot of it works due to the availability of a shared memory space, and would be difficult to port to the BSD-socket interface. So, while it could be brought back and updated to work again, it may be suboptimal under certain circumstances.
The local multiplayer saga, ep 8 -- by Arisotura Fri, 26 Aug 2022 14:48:02 +0000
In this case, it was related to the power-down registers. The game code may regularly decide to stop and restart the wifi hardware to reset some of its state, and I found out that sometimes that would happen while a frame was being received, which caused a bunch of problems in melonDS.

The code uses register W_POWERFORCE to turn off the wifi transceiver. But, that's the thing, we thought that operation was instant. However, the game had a loop waiting on some status registers after writing to W_POWERFORCE, which implied that the shutdown operation might not always be instant. Hmm...

A couple hardware tests later, this theory is confirmed: if the transceiver is turned off while transmitting or receiving a frame, it will first finish that operation before actually turning off. There we go.
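
In emulator terms, this amounts to deferring the power-off until the frame in flight is done. A minimal sketch of that idea, with invented names and a frame duration counted in abstract ticks (not the actual melonDS code):

```python
class Transceiver:
    """Toy transceiver whose power-off waits for the current frame."""

    def __init__(self):
        self.powered = True
        self.busy_ticks = 0            # remaining ticks of the frame in flight
        self.pending_power_off = False

    def start_frame(self, ticks):
        if self.powered:
            self.busy_ticks = ticks

    def force_power_off(self):
        if self.busy_ticks > 0:
            # Mid-frame: remember the request, finish the frame first.
            self.pending_power_off = True
        else:
            self.powered = False

    def tick(self):
        if self.busy_ticks > 0:
            self.busy_ticks -= 1
            if self.busy_ticks == 0 and self.pending_power_off:
                self.powered = False
                self.pending_power_off = False

radio = Transceiver()
radio.start_frame(ticks=3)
radio.force_power_off()
still_on_mid_frame = radio.powered
for _ in range(3):
    radio.tick()
```

Software polling a status flag after the power-off write, as the game does, would see the transceiver stay on until the last tick of the frame.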

Implementing that into melonDS didn't fix the issue with NSMB minigames, but it helped make local-multiplayer connections more stable overall, for example the slowdowns I had observed in MvsL seem to be gone now. Then I also fixed the issue with the minigames, which happened to be caused by some leftover code that was bad and no longer needed.

As it seems that I have gotten local multiplayer pretty stable and resilient now, folks from our community have been stress-testing it. For example:

All in all, certainly not bad. Consider that there is no way this would ever work reliably on the current melonDS release.

However, we're not done yet. While we can claim victory on the horseman of wifi, there's still a bunch of things to do for a better user experience.

First of all, I need to clean up my code. Then, as far as wifi emulation is concerned, I want to add some features to make it more resilient all around, and rewrite it to use a proper event scheduler instead of running every microsecond, which should hopefully improve performance.
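
To show what "a proper event scheduler instead of running every microsecond" means in practice, here is a generic sketch using a priority queue. It is not the scheduler melonDS uses; the class and the example events are made up.

```python
import heapq

class Scheduler:
    """Jump from event to event instead of stepping one microsecond at a time."""

    def __init__(self):
        self.now = 0
        self._events = []
        self._seq = 0  # tie-breaker so callbacks are never compared

    def schedule(self, delay_us, callback):
        heapq.heappush(self._events, (self.now + delay_us, self._seq, callback))
        self._seq += 1

    def run_until(self, target_us):
        while self._events and self._events[0][0] <= target_us:
            when, _, callback = heapq.heappop(self._events)
            self.now = when
            callback()
        self.now = target_us

log = []
sched = Scheduler()
sched.schedule(100, lambda: log.append(("beacon", sched.now)))
sched.schedule(30, lambda: log.append(("rx_done", sched.now)))
sched.run_until(1000)
```

Covering 1000 microseconds here costs two callback invocations rather than 1000 loop iterations, which is where the performance gain would come from.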

I also need to handle disconnects more gracefully. Right now, there's a chance that they will just leave the other instances hanging or slowing to a crawl, and I need to deal with that. The current system is cool and all but it certainly doesn't fail gracefully. I think most of it is the egregious receive-wait timeout, but I might still have some work to do in the error handling department. The timeout will also be lowered, and maybe made user-configurable.

Something I'm considering is bringing back the old BSD-socket comm layer, in complement to the current shared-memory one. It at least deserves a fair performance comparison, as it appears that its original slowness was due to the initial, overengineered sync mechanism -- according to a quick test, BSD sockets aren't slower than shared memory, at least when all instances are running on the same computer. It is also theoretically possible to play multiplayer games over LAN, but it needs an enterprise-grade network to be somewhat usable, so I'm not yet sure whether it's worth making this possible.

And of course, there are many improvements to be done to the UX side of things, to make local multiplayer comfortable. Reworking input handling, sound output, how things like melonDS.ini, savefiles, etc should be dealt with, etc...

And, once all that is taken care of, season 2 can begin: netplay support.

NSMB MvsL over the internet, anyone?
The local multiplayer saga, ep 7 -- by Arisotura Thu, 25 Aug 2022 12:51:11 +0000
In the previous post, we saw that we had lag in some games because the client was sometimes not replying, so I had to deal with that issue.

I introduced the new local-multiplayer data exchange/sync mechanism in ep 2, and since then it has seen several iterations as I made it simpler and more efficient.

For one, I used special-purpose microsecond counters for this instead of relying on USCOUNT. The first issue with USCOUNT was that it can be disabled or modified at any time by software, which isn't ideal if we're going to rely on it for sync purposes. The second benefit is that now, when a client connects, I can just sync up its counter to the host's, and not have to deal with any offset between the two. So, all in all, it's more reliable this way, and a lot easier to deal with.

Next change was that I got rid of the specific 'sync point' messages and simply attached timestamps to packets. This reduces the amount of data being exchanged, and let me simplify the sync mechanism too. Now, when receiving a packet from the host, clients know when to actually start emulating reception of that packet, and how long they can keep running on their own before having to wait for another packet from the host. This way, clients can't run too far ahead of the host, and we also ensure the host doesn't run too far ahead by dropping any replies that are too old. All in all, this system is more efficient.
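
A sketch of how timestamps alone can bound how far each side runs ahead. The two limits and all names below are assumptions picked for the example, not values or code from melonDS.

```python
MAX_CLIENT_LEAD_US = 2000  # hypothetical: how far a client may run past the host
MAX_REPLY_AGE_US = 2000    # hypothetical: older replies are dropped by the host

class Client:
    def __init__(self):
        self.clock_us = 0
        self.run_limit_us = 0

    def on_host_packet(self, timestamp_us):
        # Each host packet extends how far the client may run on its own.
        self.run_limit_us = timestamp_us + MAX_CLIENT_LEAD_US

    def run(self, wanted_us):
        """Advance by up to wanted_us; return how much was actually run."""
        allowed = max(0, self.run_limit_us - self.clock_us)
        step = min(wanted_us, allowed)
        self.clock_us += step
        return step

def host_accepts_reply(host_clock_us, reply_timestamp_us):
    return host_clock_us - reply_timestamp_us <= MAX_REPLY_AGE_US

client = Client()
client.on_host_packet(timestamp_us=1000)
ran_first = client.run(5000)   # capped at the limit
ran_second = client.run(100)   # must wait for the next host packet
```

No separate sync messages are needed: the limit moves forward only when a new timestamped packet arrives.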

Lately, I also reworked how replies are dealt with, to make things more efficient and reliable, and to deal with the issue at hand in this post. Replies go through a separate FIFO, and instead of being broadcast to everybody, they're only received by the host, which mirrors the way local-multiplayer communication works in real life.

To deal with the problem of clients sometimes not being ready to reply, I simply made clients send a blank if they receive a CMD frame they can't reply to. The blank is a frame with no data, that the host ignores -- it's just there so that there is something to receive, so that we don't hit the timeout.
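
Roughly, the blank-frame idea looks like this. The FIFO is modeled with an in-process deque and the function names are invented; a real host would block on the FIFO until the timeout rather than check it once.

```python
from collections import deque

def client_handle_cmd(reply_fifo, client_id, pending_reply):
    # A client that cannot reply still sends an empty frame (the "blank").
    frame = pending_reply if pending_reply is not None else b""
    reply_fifo.append((client_id, frame))

def host_collect_replies(reply_fifo, expected_clients):
    """Return (data_replies, timed_out).

    Blank frames count as received but carry no data, so they are ignored.
    """
    seen = set()
    data = {}
    while reply_fifo:
        client_id, frame = reply_fifo.popleft()
        seen.add(client_id)
        if frame:
            data[client_id] = frame
    # Stand-in for "the host waited and hit the timeout".
    timed_out = seen != set(expected_clients)
    return data, timed_out

fifo = deque()
client_handle_cmd(fifo, client_id=1, pending_reply=b"\x01\x02")
client_handle_cmd(fifo, client_id=2, pending_reply=None)
replies, timed_out = host_collect_replies(fifo, expected_clients=[1, 2])
```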

A problem with this system is that it won't function well if there are two or more concurrent local-multiplayer games running on the same computer, but I don't think that's a use case we need to worry about.

Anyway, this isn't perfect yet, but the results are encouraging. Tetris DS and Mario Kart DS connect with no hitch now, so that issue is dealt with. However, I'm noticing some regressions. For example, NSMB minigames won't connect: I immediately get a comm error screen on the client. I looked into it quickly, and found out that the client is doing weird things (like occasionally setting RXBUF_RDCSR to 0x05FE, which doesn't look right), but I need to look deeper into it. It's possible that the issue was always here but was less apparent due to the different way of doing things. I have also seen some occasional slowdown in NSMB and Mario Kart, so that may be related.

Oh well, we'll get there. It's getting close at this point.
The local multiplayer saga, ep 6 -- by Arisotura Mon, 22 Aug 2022 08:35:56 +0000
And I certainly did deal a blow to that horseman back in 2017 when a working local multiplayer connection was emulated for the first time, but of course it wasn't going to be all. While the connection worked, it was always finicky, and getting it to work was a bit like voodoo magic. After melonDS 0.4 was released, melonDS changed from being a mere curiosity to being 'the wifi emulator', and many people came in for that feature. And we just kept telling them to disable their framerate limiter, pray, sacrifice a goat or two to the wifi deities, and hope things will work.

It was certainly substandard for melonDS. One of the main reasons why it stayed that way for so long was, besides having to work on other fronts, that it was a precarious position of equilibrium. Any attempts at improving wifi emulation resulted in a more unstable connection. I thought it would take an overhaul of how the connection is emulated to make it work reliably, but I always had that fear of putting much time and effort into it only to get terrible results, so the status quo was maintained. But you know how it is: when making your emulation more accurate gives worse results, it means you're missing something else. After 5 years of business, we're no strangers to this.

So what does it take to beat the horseman of wifi? Two things: a stable, reliable medium of communication, and reverse-engineering of the DS wifi hardware, which is full of fun little details that sometimes matter a lot.

But, of course, it's not going to go down without resistance. It's a horseman of the apocalypse. It will put up a fight at every stage.

Lately, we got to a point where local multiplayer connections are pretty stable, and it's possible to reach decent speeds on good hardware. For reference, my 2014 laptop Crepe can run New Super Mario Bros. MvsL multiplayer at near full speed with the JIT on. I'm confident that any decent computer will be up to the task when we're done.

But, of course, that's with well-behaving games, where everybody sends their packets as they should and everything is good. While we were testing other games, it appeared to us that there are... cursed games. Tetris DS and Mario Kart DS, for example, have exhibited intense lag while trying to get a multiplayer game going.

I looked at what was going on in Tetris DS. The lag is caused by the host hitting the timeout when waiting for incoming client replies, because sometimes the client isn't sending any reply at all. It was egregious because I set the timeout to a ridiculous value (500 milliseconds) to make sure everything would get through reliably. A lower timeout reduces the lag in this situation, but also increases the chance of missing incoming packets when we need them, so this is essentially a compromise (we might end up making the timeout user-configurable).

But, why are we not receiving replies from the client to begin with? In a previous post, I have mentioned that the DS, when it acts as a multiplayer client, always replies to CMD frames, even when no reply frame is configured, and melonDS emulates this behavior too.

But how does the DS know when it's acting as a multiplayer client? When the AID_LOW register is set to a non-zero value. That register is essentially the client number in a multiplayer setting. If it's set to zero, the DS will not reply to CMD frames.

This is what is happening in Tetris DS. When the game receives a CMD frame, it sets up a callback that will run after the end of the entire CMD exchange and reset AID_LOW to zero. Then, whenever it wants to send a reply frame, it will set AID_LOW to the correct value. Why it's doing that, I don't know. But there's a small chance that AID_LOW will still be zero by the time a CMD frame is received, which means no reply will be sent.
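
The reply rule and the race can be written down in a few lines. This is a sketch with invented names; the empty byte string standing in for "the reply sent when no reply frame is configured" is a placeholder, not the actual frame contents.

```python
class ClientMac:
    """Toy model of the CMD reply rule described above."""

    def __init__(self):
        self.aid_low = 0         # zero: not acting as a multiplayer client
        self.reply_frame = None  # reply configured by the game, if any

    def on_cmd_frame(self):
        if self.aid_low == 0:
            return None          # stays silent, so the host waits for nothing
        # Non-zero AID_LOW: always reply, even with no reply frame configured.
        return self.reply_frame if self.reply_frame is not None else b""

mac = ClientMac()
mac.aid_low = 1
replied_while_client = mac.on_cmd_frame() is not None

# Tetris DS-like sequence: the end-of-exchange callback clears AID_LOW,
# and the next CMD frame arrives before the game sets it again.
mac.aid_low = 0
reply_during_window = mac.on_cmd_frame()
```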

I don't know if this is an issue with melonDS or if this also happens on hardware. All I know is that I had trouble falling asleep last night because I was thinking about ways to deal with this without hitting timeouts and lagging.

Oh well. I refuse to give up. You're going down, horseman of wifi.
The local multiplayer saga, ep 5 -- by Arisotura Sat, 20 Aug 2022 11:40:08 +0000
Lately, I have been working on faster IPC, so I could get a more accurate idea of how much it would take to emulate local multiplayer at acceptable speeds, and whether my ideas were feasible at all. First order of business was getting rid of the old, inefficient BSD sockets.

Instead, I built a data exchange system based on shared memory. The idea is simple: have a large block of memory shared between all melonDS instances, where you have a FIFO buffer for exchanging wifi packets, another FIFO buffer for exchanging sync points, and a small header containing some useful status bits. Then each instance has a set of semaphores that other instances can use to signal when new packets or sync points are sent. All fine and dandy.
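
To give an idea of what one of those FIFO buffers could look like, here is a minimal sketch of a ring buffer holding length-prefixed packets in a fixed-size block. The layout, the 4096-byte size and all names (`PacketFifo`, `Write`, `Read`) are assumptions for illustration, not the actual melonDS structures, and the sketch leaves out the locking and semaphore signaling that sharing it between processes would require.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical FIFO that could live inside a shared memory block:
// plain integers and a byte array only, no pointers.
struct PacketFifo {
    static constexpr std::uint32_t kSize = 4096; // arbitrary size for the sketch
    std::uint32_t readPos = 0;
    std::uint32_t writePos = 0;
    std::uint8_t data[kSize];

    std::uint32_t Used() const { return (writePos + kSize - readPos) % kSize; }
    // One byte stays unused so that "full" and "empty" are distinguishable.
    std::uint32_t Free() const { return kSize - 1 - Used(); }

    // Stores a 2-byte little-endian length followed by the packet bytes.
    bool Write(const std::uint8_t* packet, std::uint16_t len) {
        if (Free() < len + 2u) return false; // not enough room
        PutByte(static_cast<std::uint8_t>(len & 0xFF));
        PutByte(static_cast<std::uint8_t>(len >> 8));
        for (std::uint16_t i = 0; i < len; i++) PutByte(packet[i]);
        return true;
    }

    // Pops the oldest packet into `out`; returns false if the FIFO is empty.
    bool Read(std::vector<std::uint8_t>& out) {
        if (Used() < 2) return false;
        std::uint16_t len = GetByte();
        len |= static_cast<std::uint16_t>(GetByte() << 8);
        out.clear();
        for (std::uint16_t i = 0; i < len; i++) out.push_back(GetByte());
        return true;
    }

    void PutByte(std::uint8_t b) {
        data[writePos] = b;
        writePos = (writePos + 1) % kSize; // wrap around at the end
    }
    std::uint8_t GetByte() {
        std::uint8_t b = data[readPos];
        readPos = (readPos + 1) % kSize;
        return b;
    }
};
```

Keeping the structure free of pointers matters because each process may map the shared block at a different address.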

I figured this would be easily taken care of, with QSharedMemory and QSystemSemaphore. And I ran into my first issue: QSystemSemaphore sucks. Basically, it isn't possible to wait on a QSystemSemaphore with a timeout, which means that waiting on it will block until it is signaled. What if, for whatever reason, the other side isn't signaling the semaphore? That's a deadlock. In our situation (and probably many others), this is unacceptable. The problem was reported to the Qt team in 2008 and they don't seem very interested in fixing it. So this basically means I had to code my own thing, directly using the OS's named semaphores. Currently, I have only coded it for Windows, so it will have to be ported to Linux and macOS.

On the other hand, QSharedMemory works like a charm, so there's that.

I also reworked the sync system some. It's currently not perfect, but at least now I know I'm not being held back by faulty emulation: as long as no frames are being dropped or delayed, the communication should work fine.

Overall, this gives much better performance than the previous iteration: NSMB multiplayer reaches near-fullspeed on my laptop Crepe, and that's with the interpreter. There are still issues to iron out, but this gives me hope that we can pull off fun multiplayer features in melonDS.

Also worth noting that there is another candidate for optimization, which I haven't talked much about yet: the wifi module itself.

The wifi module, you say? What's worth optimizing in there?

The wifi module has a bunch of microsecond-precision timers, all sorts of fun status bits, and some parts of Nintendo's code rely on packet transmission/reception taking time and not being instant. And then you have local-multiplayer CMD/reply/ack exchanges and their tight timings. You get the picture.

So currently, in melonDS, it is emulated at the microsecond level, which means it is run every 33 system cycles (not exactly one microsecond, but it works well enough for our purposes), and all the various counters and things are checked and updated every time. Which is less than optimal.
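
The per-microsecond approach boils down to something like the sketch below. `WifiTicker` and its members are hypothetical names; only the 33-cycle step comes from the text.

```cpp
#include <cstdint>

// Hypothetical model of "tick the wifi module once every 33 system cycles".
struct WifiTicker {
    static constexpr std::uint32_t kCyclesPerTick = 33; // roughly one microsecond
    std::uint32_t leftover = 0;      // cycles not yet turned into a tick
    std::uint64_t microseconds = 0;  // number of ticks run so far

    void TickMicrosecond() {
        // A real wifi module would check and update every timer and
        // status bit here, whether or not anything is due to change.
        ++microseconds;
    }

    void Run(std::uint32_t cycles) {
        leftover += cycles;
        while (leftover >= kCyclesPerTick) {
            leftover -= kCyclesPerTick;
            TickMicrosecond();
        }
    }
};
```

The cost is that `TickMicrosecond()` runs about a million times per emulated second, regardless of how little is actually happening on the wifi side.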

Compare this to the main emulation loop in melonDS: instead of checking on all the DS subsystems every CPU instruction or every clock cycle, melonDS uses an event scheduler. It is basically a way of planning hardware events in the future (like "VBlank needs to happen in 342 cycles") and keeping track of them. melonDS can then determine the next event to come and how many cycles are left before it needs to be run; the CPUs are run for that many cycles, then the event is run. Rinse and repeat. Compared to checking everything per-instruction, this system offers much better performance while retaining decent accuracy.
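
The general idea of an event scheduler can be sketched like this. It is a generic illustration under assumed names (`Scheduler`, `Schedule`, `RunFor`, and a `runCpu` callback standing in for the CPU cores), not the scheduler melonDS actually uses.

```cpp
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical event scheduler: events are kept sorted by due time, and the
// CPU is run in one go up to the next event instead of cycle by cycle.
class Scheduler {
public:
    using Callback = std::function<void()>;
    using CpuRunner = std::function<void(std::uint64_t)>;

    // Plans `cb` to run `delay` cycles from now.
    void Schedule(std::uint64_t delay, Callback cb) {
        events_.push(Event{now_ + delay, nextSeq_++, std::move(cb)});
    }

    // Emulates `cycles` cycles, running every event that falls due on the way.
    void RunFor(std::uint64_t cycles, const CpuRunner& runCpu) {
        const std::uint64_t target = now_ + cycles;
        while (!events_.empty() && events_.top().when <= target) {
            Event ev = events_.top();
            events_.pop();
            Advance(ev.when, runCpu); // run the CPU up to the event
            ev.callback();            // then run the event itself
        }
        Advance(target, runCpu);      // no more events before the target
    }

    std::uint64_t Now() const { return now_; }

private:
    struct Event {
        std::uint64_t when; // absolute timestamp in cycles
        std::uint64_t seq;  // keeps same-time events in scheduling order
        Callback callback;
    };
    struct Later {
        bool operator()(const Event& a, const Event& b) const {
            return a.when != b.when ? a.when > b.when : a.seq > b.seq;
        }
    };

    void Advance(std::uint64_t to, const CpuRunner& runCpu) {
        if (to > now_) runCpu(to - now_);
        now_ = to;
    }

    std::priority_queue<Event, std::vector<Event>, Later> events_;
    std::uint64_t now_ = 0;
    std::uint64_t nextSeq_ = 0;
};
```

For example, `Schedule(342, vblank)` followed by `RunFor(342, runCpu)` runs the CPU for 342 cycles in a single call and then fires the VBlank callback, with no per-cycle checks in between.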

So it would make sense to use a similar system to emulate the wifi module. The main issue is that it would require extra cleverness to emulate things like the timers, or W_RXTX_ADDR (wifi RAM pointer that is updated during transmission or reception to reflect which address the hardware is accessing). Nothing insurmountable, but it's a pretty big change and it has quite the potential to break things.

So, all in all, I'm hopeful we can get local multiplayer working and good. It will take time, but we will definitely get there for melonDS 1.0.
The local multiplayer saga, ep 4 -- by Arisotura Wed, 17 Aug 2022 22:37:45 +0000
So I did.

Except I still had data loss problems and random hangs in Pictochat.

I ran several hardware tests to make sure I wasn't missing any detail of this feature, and all seemed correct.

So I turned to debugging Pictochat. I noticed that when the client was getting stuck, it just stopped sending reply frames entirely. I backtracked and backtracked to find out why it was doing that, and finally found the cause. It was quite stupid.

Basically, when the DS sends a reply frame, it marks the previously sent frame by changing the first byte of its header to 0x01. All fine and dandy. melonDS of course does the same thing.

Except that when I added support for substituting a frame with a default-empty frame, I copy-pasted some code without paying enough attention to what it did. And that code was setting the internal reply frame address to zero, which caused the above frame header manipulation to fail unless the previous frame happened to be at address zero (basically, the beginning of the wifi RAM).
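
The effect of that oversight can be reproduced in a toy model. Everything here is a hypothetical simplification: `ReplyModel`, its method names, the `buggy` switch and the 0x2000-byte RAM size are assumptions for the sketch, not the real melonDS code.

```cpp
#include <array>
#include <cstdint>

// Hypothetical model of the reply frame bookkeeping described above.
struct ReplyModel {
    std::array<std::uint8_t, 0x2000> ram{}; // stand-in for wifi RAM (size assumed)
    std::uint32_t replyAddr = 0;            // internal address of the reply frame

    // The game sets up a reply frame at `addr`.
    void QueueReply(std::uint32_t addr) {
        replyAddr = addr;
        ram[addr] = 0x00;
    }

    // Substituting a default-empty frame. The copy-pasted code also reset
    // the internal reply frame address, which is the bug.
    void SubstituteEmptyFrame(bool buggy) {
        if (buggy) replyAddr = 0;
    }

    // Marks the frame as sent by writing 0x01 to the first header byte.
    void MarkSent() { ram[replyAddr] = 0x01; }
};
```

With `buggy` set, `MarkSent()` writes to address zero instead of the frame's header, so the frame at its real address never gets marked.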


After fixing this oversight, I was rewarded with a perfectly stable and smooth connection. No hiccups, no data loss, no hanging. I didn't even see the host try to retransmit any frames.

Finally, I did it. Stable local multiplayer communication.

There's still quite some work before this can be released, though.

For one, I'd like to implement some more features of the wifi hardware. Especially those related to error handling, so if we ever need more lax sync which could incur some data loss, we could let the game deal with it properly.

There are also some issues to iron out with the sync mechanism. Said mechanism is still more or less a proof of concept: it is currently hardcoded for two players and doesn't really fail gracefully if someone leaves the game abruptly. And of course, I also need to implement faster IPC. With the current setup, NSMB multiplayer runs at ~20FPS, which is nice if you enjoy slideshows, but not very playable.

We would also need to integrate multiplayer support more cleanly into melonDS. Maybe we should just go for supporting multiple consoles inside the same melonDS instance.

Stay tuned!