Having 'fun' with the DSP
DSi support in melonDS has been getting to a pretty good state lately. Basically, the only remaining 'big' thing to deal with is DSP support. The rest is mostly bugfixes, implementing misc features (hi, power button), etc., you get the idea. Plus, an aging cart for the DSi has been discovered, which will help us implement and test the DSi features more thoroughly than commercial games allow. And then there are the various quality-of-life improvements that come to mind, like not requiring BIOS/firmware/NAND dumps...

Anyway, the DSP.

It's the thing I have always kept pushing back, for two good reasons. First, the DSP instruction set and its encoding are a mess, and the documentation on them is lackluster. Second, hardly anything on the DSi uses the DSP: the DSi sound app, a few other DSi titles, and that's it. Everything else sticks to the old DS sound mixer.

But what we have going for us is that the 3DS uses the same DSP, and there it is much more popular, with every game using it, so the 3DS scene has already been dealing with it. Namely, we have teakra, a fairly reliable DSP interpreter and disassembler.

PoroCYon had integrated teakra into melonDS in an attempt to bring DSP emulation. It didn't work, but it was at least a pretty good base. I had tried quickly fixing a couple of bugs in it, with no real success. To be honest, I wasn't really looking forward to having to debug DSP code, either.

Lately, I felt like looking into it again.

I launched the DSi sound app, and was sidetracked by another, unrelated bug: the sound app crashes when starting due to a bad memory access. It went unnoticed before because we didn't emulate data aborts, but now, we do, and we can't go back on that.

So I researched that bug. It's a timing issue.

The crash happens when trying to dereference a particular pointer, because it is NULL. During startup, the main thread will allocate some memory, then run a bunch of initialization, then initialize that pointer. During the initialization, it sends an IPC request to the ARM7 to determine whether headphones are connected. While it waits for the ARM7 to respond, another thread runs, which does some other initialization, sends other IPC requests to the ARM7 (to get the date/time and battery status), then tries to do things with the aforementioned pointer, which is expected to have been initialized by now.

On the ARM7, the IPC-receive IRQ handler dispatches requests to their appropriate callbacks, which will then forward the request to the appropriate thread, which later services the request and responds to the ARM9. It's worth noting that the threads which service the RTC and PMIC requests have higher priority than the one which services requests like the aforementioned 'get headphone status'.

What happens in melonDS is that the ARM9 runs too fast and sends its IPC requests too quickly, so the RTC and PMIC requests get serviced before the initial headphone-status request. Once they are serviced, the ARM9 thread which sent them tries to access the problematic pointer before the ARM7 has had a chance to service the headphone-status request, and thus before the main ARM9 thread has had a chance to finish its initialization. The pointer is still NULL, hence the crash.
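To make the ordering problem concrete, here's a toy model of the race. It is not the sound app's or melonDS's code: the ARM7 side is reduced to one priority queue, the request names and priority values are made up for illustration, and "fast ARM9" simply means the RTC/PMIC requests arrive before the headphone request has been answered.

```python
import heapq

# Toy model of the startup race; request names and priorities are hypothetical.
# Lower number = higher priority for the ARM7 thread that services the request.
PRIORITY = {"rtc": 0, "pmic": 0, "headphone": 1}


def simulate(fast_arm9: bool) -> str:
    """Return "crash" if the second ARM9 thread sees a NULL pointer, else "ok"."""
    pointer = None
    queue = []  # entries are (priority, arrival order, request name)
    arrivals = 0

    def send(request):
        nonlocal arrivals
        heapq.heappush(queue, (PRIORITY[request], arrivals, request))
        arrivals += 1

    def service():
        return heapq.heappop(queue)[2]

    # Main ARM9 thread: asks for the headphone status and blocks on the reply.
    send("headphone")
    if not fast_arm9:
        # With realistic timing, the ARM7 answers before the other requests
        # arrive, so the main thread finishes its init and sets the pointer.
        service()
        pointer = "initialized"

    # Second ARM9 thread: sends its RTC and PMIC requests.
    send("rtc")
    send("pmic")

    answered = set()
    while queue:
        request = service()
        answered.add(request)
        if request == "headphone":
            pointer = "initialized"  # main thread resumes and finishes init
        if request in ("rtc", "pmic") and {"rtc", "pmic"} <= answered:
            # Second thread got both replies and now dereferences the pointer.
            return "crash" if pointer is None else "ok"
    return "ok"
```

In this model, simulate(True) returns "crash" and simulate(False) returns "ok": the higher-priority RTC/PMIC requests jump ahead of the pending headphone request only when they are already queued next to it.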

While discussing timing issues with Generic, he brought up that the unused ARM9 instruction cache implementation in melonDS, when hooked up, helped fix some of the known timing issues. So I gave it a quick try, and it fixes the DSi sound app crash. So this is something we need to think about -- it doesn't magically fix all the timing issues, but it seems to help more than I originally thought. It's also worth noting that the performance penalty from emulating the instruction cache isn't very bad, because instruction fetches are largely predictable.
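A minimal sketch of why an instruction cache model shifts timing without costing much to emulate. It assumes an ARM946E-S-like geometry (8 KB, 4-way set associative, 32-byte lines), round-robin replacement, and made-up cycle costs; it is not melonDS's implementation.

```python
class ICache:
    """Toy instruction-cache timing model; geometry and costs are assumptions."""

    def __init__(self, size=8 * 1024, ways=4, line_size=32,
                 hit_cycles=1, miss_cycles=8):
        self.ways = ways
        self.line_size = line_size
        self.num_sets = size // (ways * line_size)
        self.tags = [[None] * ways for _ in range(self.num_sets)]
        self.next_victim = [0] * self.num_sets  # round-robin pointer per set
        self.hit_cycles = hit_cycles
        self.miss_cycles = miss_cycles
        self.hits = 0
        self.misses = 0

    def fetch(self, addr: int) -> int:
        """Return the number of cycles an instruction fetch at addr costs."""
        line = addr // self.line_size
        index = line % self.num_sets
        tag = line // self.num_sets
        ways = self.tags[index]
        if tag in ways:
            self.hits += 1
            return self.hit_cycles
        # Miss: fill the line into the way chosen round-robin for this set.
        victim = self.next_victim[index]
        ways[victim] = tag
        self.next_victim[index] = (victim + 1) % self.ways
        self.misses += 1
        return self.miss_cycles


cache = ICache()
# A 16-instruction ARM loop (64 bytes) executed 100 times.
total = sum(cache.fetch(0x02000000 + 4 * i) for _ in range(100) for i in range(16))
```

Here the loop only misses on its two cache lines the first time through (2 misses, 1598 hits, 1614 cycles), which is the "largely predictable" behavior that keeps the lookup cheap on the common path.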

For now, I kept it as a quick fix that I didn't commit or anything, just so I could run the DSi sound app and try to get the DSP working.

The first issue was that the DSP was just not running at all, because accesses to the DSP registers were rejected if the DSP wasn't already running. Except you need to access these registers to start the DSP, so... yeah.
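The chicken-and-egg problem can be shown with a hypothetical register model. The class, the `running` flag and the meaning of bit 0 are assumptions for illustration, not melonDS's code or the hardware's register layout.

```python
class DSPInterface:
    """Hypothetical DSP control interface; names and bit meanings are illustrative."""

    def __init__(self, gate_on_running: bool):
        self.gate_on_running = gate_on_running  # True reproduces the bug
        self.running = False
        self.control = 0

    def write_control(self, value: int) -> bool:
        """Write the control register and return whether the write was accepted."""
        if self.gate_on_running and not self.running:
            # Buggy path: the very write that would start the DSP gets rejected.
            return False
        self.control = value
        # Assumption for this model: bit 0 set means "release the DSP and run".
        self.running = bool(value & 1)
        return True
```

With the gate enabled, the DSP can never leave the stopped state; the fix is to stop gating the registers that are needed to start it.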

A couple of fixes later, the DSP was running... except it wasn't doing much at all besides crashing after a while, because the memory it was reading was all zeroes. That was due to another silly bug, this time in the NWRAM mapping code: the I/O writes that mapped NWRAM banks to the DSP weren't getting through.

At this point, the DSP was running its code, and all seemed good... except it didn't do much besides get stuck in a loop. The ARM9, on its side, was waiting for feedback from the DSP, but wasn't getting anything.

So this meant I had to dive into DSP code. The instruction set itself isn't as bad as I thought. Given the mess the encoding is, I expected DSP code to be an unreadable mess, but it wasn't nearly that bad, and I could somewhat figure out what the code was doing. Now, I had to figure out why it was getting stuck in that loop and what the expected operation was. When I talked about this in the emudev Discord, PSI gave me a disassembly of aac.a (the DSP binary the DSi sound app uses), which helped a lot. Not only did I not have to awkwardly hook into the teakra disassembler and generate lengthy instruction logs to try to figure out what was going on, but the disassembly also has function names, which helps a lot with figuring out what the code is trying to achieve.

In this case, we looked at the code and found that it was getting stuck in that loop after an unsuccessful malloc() call. That call was failing because the requested size was wrong: it was 0xD1C0, but that particular malloc() implementation couldn't take sizes larger than 0x8000. But the size passed to malloc() was loaded from memory that the DSP code never initialized, so the value came from the DSP binary itself.
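A toy allocator with the size limit described above shows the failure mode. The heap base, heap size, bump-allocation strategy and units are assumptions for illustration; only the 0x8000 limit and the two sizes come from the text.

```python
MAX_ALLOC = 0x8000  # size limit of the malloc() described above


class ToyHeap:
    """Hypothetical bump allocator; layout and units are illustrative."""

    def __init__(self, base: int, size: int):
        self.next_free = base
        self.end = base + size

    def malloc(self, size: int):
        """Allocate size units; return the address, or None on failure."""
        if size == 0 or size > MAX_ALLOC or self.next_free + size > self.end:
            return None
        addr = self.next_free
        self.next_free += size
        return addr


heap = ToyHeap(base=0x1000, size=0x10000)
```

Even though this heap has room for 0xD1C0 units, heap.malloc(0xD1C0) returns None because of the limit, while heap.malloc(0x14) succeeds; the DSP code then sits in its failure loop when it gets the former.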

For a while, PSI and I were both stumped. We couldn't figure out how this was supposed to work.

Until I finally figured it out. PoroCYon was also aware of the issue. The DSi sound app does NWRAM mapping in separate steps: each NWRAM bank is first disabled, then remapped to the DSP, then re-enabled, and finally its offset is changed (changing where the bank appears within its region's address space). The DSP-side mapping code in place was ignoring that last change, which resulted in NWRAM banks in the DSP data region ending up in the wrong place. Hence the wrong malloc() size.
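Here's a small model of why the final offset write matters. The slot count, field names and the `map_to_dsp` helper are hypothetical and do not follow the real MBK register layout; the point is only that a mapper which recomputes on enable/disable but ignores offset changes leaves the bank in the wrong slot.

```python
from dataclasses import dataclass

NUM_SLOTS = 8  # hypothetical number of slots in the DSP data region


@dataclass
class Bank:
    name: str
    master: str = "arm9"
    offset: int = 0
    enabled: bool = False


class DSPDataMap:
    def __init__(self, banks, recompute_on_offset_change: bool):
        self.banks = banks
        self.recompute_on_offset_change = recompute_on_offset_change
        self.slots = [None] * NUM_SLOTS
        self.recompute()

    def recompute(self):
        """Rebuild the slot table from the current state of every bank."""
        self.slots = [None] * NUM_SLOTS
        for bank in self.banks:
            if bank.enabled and bank.master == "dsp":
                self.slots[bank.offset % NUM_SLOTS] = bank.name

    def write(self, bank, field, value):
        setattr(bank, field, value)
        # The buggy variant skips the rebuild when only the offset changes.
        if field != "offset" or self.recompute_on_offset_change:
            self.recompute()


def map_to_dsp(mapper, bank, slot):
    """Same step order as the sound app: disable, remap, enable, set offset."""
    mapper.write(bank, "enabled", False)
    mapper.write(bank, "master", "dsp")
    mapper.write(bank, "enabled", True)
    mapper.write(bank, "offset", slot)
```

With the buggy mapper, a bank meant for slot 3 stays visible at slot 0, so the DSP reads whatever data happens to sit there instead of the intended values.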

The way NWRAM mapping was done was also problematic in general. teakra uses its own flat memory buffer to emulate the DSP code and data memory. To get around this, the DSP interface code was trying to detect NWRAM mapping changes and copy data around to keep teakra's buffer in sync with melonDS's NWRAM banks. I decided that this way of doing things was too complex and prone to problems, and instead modified teakra to directly access melonDS's NWRAM banks.
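The difference between copying and direct access can be sketched as a memory interface that resolves every DSP access to whichever bank is currently mapped. This is not teakra's actual interface; the slot size, class names and the shared `slot_map` list are assumptions for illustration.

```python
SLOT_WORDS = 0x4000  # hypothetical slot size, in 16-bit words


class NWRAMBank:
    """Stands in for a melonDS-owned NWRAM bank."""

    def __init__(self):
        self.data = [0] * SLOT_WORDS


class DirectDSPMemory:
    """DSP-side view that reads/writes banks in place instead of a private copy."""

    def __init__(self, slot_map):
        # slot_map[i] is the bank mapped at slot i, or None if unmapped.
        # It is shared with the emulator, so remaps take effect immediately.
        self.slot_map = slot_map

    def _resolve(self, addr):
        slot = addr // SLOT_WORDS
        if slot >= len(self.slot_map):
            return None
        return self.slot_map[slot]

    def read16(self, addr):
        bank = self._resolve(addr)
        return 0 if bank is None else bank.data[addr % SLOT_WORDS]

    def write16(self, addr, value):
        bank = self._resolve(addr)
        if bank is not None:
            bank.data[addr % SLOT_WORDS] = value & 0xFFFF
```

Because there is only one copy of the data, an ARM9-side write is visible to the DSP right away and a remap needs no copying; the cost is an extra lookup per access.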

Then, the aforementioned malloc() call got a size of 0x14, which seemed much more reasonable, and resulted in a successful allocation.

This is the point I've gotten to now. The DSP runs and communicates with the ARM9. It's not perfect yet, though. We have yet to implement the SoundEx module, so the DSP's audio output can be mixed with that of the DS audio mixer. There are probably also other bugs on the DSP side (there are lots of underrun warnings in the DSP audio output module for some reason). teakra itself is also unoptimized and pretty slow (the DSi sound app runs at a whopping 4 FPS for me).
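One way to picture combining the DSP's output with the DS mixer's output is a per-sample weighted blend. The 0..8 ratio scale and its direction are assumptions made for illustration, not a statement of how the real SoundEx register is laid out.

```python
def mix_sample(ds_sample: int, dsp_sample: int, ratio: int) -> int:
    """Blend two signed 16-bit samples; ratio 0 = all DS mixer, 8 = all DSP.

    The ratio scale is a hypothetical choice for this sketch. A weighted
    average of two in-range samples stays in range, so no clamping is needed.
    """
    if not 0 <= ratio <= 8:
        raise ValueError("ratio must be in 0..8")
    return (ds_sample * (8 - ratio) + dsp_sample * ratio) // 8
```

Applied per output sample, this assumes both streams have already been brought to the same sample rate, which is a separate problem in its own right.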

But we at least have a working base we can build upon. It's possible to optimize teakra once everything is good, but a more productive avenue will be to build a DSP JIT (or integrate an existing one into melonDS if possible).

All in all, it was certainly a fun ride.
🤔 says:
Oct 15th 2022
Huh. I remember improved audio was one of the key features of TWL-enhanced games, or at least it was marketed as such.
What did they do if you are telling me the DSP is almost unused?
ari32 says:
Oct 15th 2022
Eh, Nintendo rarely makes use of the shiny new features they add. I still haven't played a game with HD Rumble on the Switch, between Zelda, Smash, Mario Kart and Splatoon.

Anywho, I was under the impression you were a pretty good programmer Arisotura, but the rate at which you implement such massive subprojects makes it seem like this stuff is child's play to you. I'm always really impressed at how fast you pull this project along, and with your seemingly endless work ethic despite all that's going on in your life.
AsPika says:
Oct 16th 2022
Keep moving forward! 👍
BLsquared says:
Oct 17th 2022
Super fun read! Wow.
Arisotura says:
Oct 18th 2022
I wouldn't call it child's play :P

but regardless, thanks!
poudink says:
Oct 18th 2022
Would Citra's DSP implementation be a viable alternative if Teakra is too slow?
EDIT: Wait, does Citra use Teakra for LLE audio? I don't remember it being that slow, though.
EDIT2: Nevermind, tried it again and it is indeed that slow.
EDIT3: Does Corgi3DS have a decent DSP implementation?
Sam says:
Oct 19th 2022
On an unrelated topic, is it possible to add "Enable Triple Buffering" to the 1.0 release? I enable v-sync to avoid screen tearing but can't stand the input lag; also, there is no way to force triple buffering on my Intel integrated graphics card...
poudink says:
Oct 20th 2022
Icache fixes TWEWY too, right? Also potentially Sonic Chronicles and a few more games. Worth it, IMO. If there's another way to fix those titles, then by all means, but TWEWY is a cult classic, Sonic Chronicles is pretty notorious (for the wrong reasons) and people are nostalgic for the sound app. This'd allow some of the most popular titles among those that remain broken in melonDS to finally work, so I think it's worth losing a little performance for.
Shadowwolf1337 says:
Oct 21st 2022
I'm always patiently waiting for the day when Mario Party DS netplay becomes a reality

you're doing god's work
poudink says:
Oct 22nd 2022
In the meantime, I wonder how well the new Wi-Fi implementation works over Parsec.
Maxim Katsur says:
Oct 29th 2022
I hope 0.9.5 will have a good number of quality-of-life updates for DSi emulation, especially the DSi CDN file support...
hoangson2007 says:
Oct 29th 2022
Also, I hope 0.9.5 will make setting up an SD card image for modding the emulated DSi easier. I made multiple SD card images, but Unlaunch either says "Error: Empty MBR entry", "Error: Cluster too small", or just doesn't recognize the SD card at all.
hoangson2007 says:
Oct 29th 2022
And the SD card trouble comes after the 0.9.4 version straight-up refuses to launch Twilight Menu, forcing me to download a pre-release build.