Having 'fun' with the DSP
Oct 14th 2022, by Arisotura
DSi support in melonDS has been getting to a pretty good state lately. Basically, the only remaining 'big' thing to deal with is DSP support. The rest is mostly bugfixes, implementing misc features (hi, power button), etc, you get the idea. Plus, an aging cart for the DSi has been discovered, which will help implement and test the DSi features more thoroughly than commercial games do. And then there are the various quality-of-life improvements that come to mind, like not requiring BIOS/firmware/NAND dumps...
Anyway, the DSP.
It's the thing I have always kept putting off, and for two good reasons. First, the DSP instruction set and encoding are a mess, and the documentation on them is lackluster. Second, there's hardly anything on the DSi that uses the DSP: the DSi sound app and a few other DSi titles, and that's it. Everything else sticks to the old DS sound mixer.
But what we have going for us is that the 3DS uses the same DSP, and it is much more popular there with every game using it, so the 3DS scene has already been dealing with it. Namely, we have teakra, which is a fairly reliable DSP interpreter and disassembler.
PoroCYon had integrated teakra into melonDS in an attempt at bringing DSP emulation. It didn't work, but it was at least a pretty good base. I had tried quickly fixing a couple of bugs in it, with no real success. I wasn't really looking forward to having to debug DSP code, either, to be honest.
Lately, I felt like looking into it again.
I launched the DSi sound app, and was sidetracked by another, unrelated bug: the sound app crashes when starting due to a bad memory access. It went unnoticed before because we didn't emulate data aborts, but now, we do, and we can't go back on that.
So I researched that bug. It's a timing issue.
The crash happens when trying to dereference a particular pointer, because it is NULL. During startup, the main thread will allocate some memory, then run a bunch of initialization, then initialize that pointer. During the initialization, it sends an IPC request to the ARM7 to determine whether headphones are connected. While it waits for the ARM7 to respond, another thread runs, which does some other initialization, sends other IPC requests to the ARM7 (to get the date/time and battery status), then tries to do things with the aforementioned pointer, which is expected to have been initialized by now.
On the ARM7, the IPC-receive IRQ handler dispatches requests to their appropriate callbacks, which will then forward the request to the appropriate thread, which later services the request and responds to the ARM9. It's worth noting that the threads which service the RTC and PMIC requests have higher priority than the one which services requests like the aforementioned 'get headphone status'.
What happens in melonDS is that the ARM9 runs too fast, and sends its IPC requests too fast, so the RTC and PMIC requests overtake the initial headphone-status request on the ARM7 side. Once they are serviced, the ARM9 thread which sent them tries to access the problematic pointer before the ARM7 has had a chance to service the headphone-status request, and thus before the main ARM9 thread has had a chance to finish its initialization. The problematic pointer is still NULL, hence the crash.
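To make the ordering problem concrete, here is a toy model of it. This is not melonDS or DSi firmware code: the `Request` struct, the `serviceOrder` function, the priority values and the time units are all made up for illustration. It only assumes what is described above, namely that the ARM7 services the highest-priority pending request first.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of one IPC request waiting on the ARM7 side.
struct Request {
    std::string name;
    int priority;  // higher value = serviced first (assumed convention)
    int arrival;   // time at which the ARM9 sent it, arbitrary units
};

// Returns the order in which requests get serviced, assuming the ARM7
// first gets to run its service threads at time firstFree, always picks
// the highest-priority request that has already arrived, and spends
// serviceTime units on each one.
std::vector<std::string> serviceOrder(std::vector<Request> pending,
                                      int firstFree, int serviceTime) {
    assert(serviceTime > 0);
    std::vector<std::string> order;
    int now = firstFree;
    while (!pending.empty()) {
        int best = -1;     // highest-priority request that has arrived
        int earliest = 0;  // request with the smallest arrival time
        for (int i = 0; i < static_cast<int>(pending.size()); i++) {
            if (pending[i].arrival < pending[earliest].arrival) earliest = i;
            if (pending[i].arrival > now) continue;
            if (best < 0 || pending[i].priority > pending[best].priority) best = i;
        }
        if (best < 0) {
            // Nothing has arrived yet: idle until the next request shows up.
            now = pending[earliest].arrival;
            continue;
        }
        order.push_back(pending[best].name);
        pending.erase(pending.begin() + best);
        now += serviceTime;
    }
    return order;
}
```

With the headphone request sent first at low priority and the RTC and PMIC requests sent shortly after at higher priority, a late `firstFree` (the ARM9 got all three requests out before the ARM7 serviced anything) yields RTC, PMIC, headphones. An early `firstFree` yields headphones first, which is the order the sound app seems to rely on.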
While discussing timing issues with Generic, he brought up that the unused ARM9 instruction cache implementation in melonDS, when hooked up, helped fix some of the known timing issues. So I gave it a quick try, and it fixes the DSi sound app crash. So this is something we need to think about -- it doesn't magically fix all the timing issues, but it seems to help more than I originally thought. It's also worth noting that the performance penalty from emulating the instruction cache isn't very bad, because instruction fetches are largely predictable.
For now, I kept it as a quick fix that I didn't commit or anything, just so I could run the DSi sound app and try to get the DSP working.
The first issue was that the DSP was just not running at all, because accesses to the DSP registers were rejected if the DSP wasn't already running. Except you need to access these registers to start the DSP, so... yeah.
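The chicken-and-egg problem can be sketched like this. Everything here is hypothetical: the `DspInterface` struct, the `kRunBit` layout and the `powered` flag are invented for the example and do not reflect the real DSP registers. The post also doesn't say what the actual fix gates on, so gating on a separate power flag is just one plausible option.

```cpp
#include <cstdint>

// Made-up bit layout for a made-up control register.
constexpr std::uint16_t kRunBit = 0x0001;

// Hypothetical, heavily simplified DSP interface.
struct DspInterface {
    bool powered = false;  // assumed: set by some system-level enable
    bool running = false;  // set once the run bit has been written

    // Buggy gating: writes are dropped unless the DSP is already running,
    // so the write that would start it can never land.
    bool writeControlBuggy(std::uint16_t value) {
        if (!running) return false;
        running = (value & kRunBit) != 0;
        return true;
    }

    // One possible fix: gate on whether the DSP is powered instead.
    bool writeControl(std::uint16_t value) {
        if (!powered) return false;
        running = (value & kRunBit) != 0;
        return true;
    }
};
```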
A couple fixes later, the DSP was running... except it wasn't doing much at all, besides crashing after a while, because the memory it was reading was all zeroes. Because, due to another silly bug, this time in the NWRAM mapping code, the I/O writes that mapped NWRAM banks to the DSP weren't getting through.
At this point, the DSP was running its code, and all seemed good... except it didn't do much besides get stuck in a loop. The ARM9, on its side, was waiting for feedback from the DSP, but wasn't getting anything.
So this meant I had to dive into DSP code. The instruction set itself isn't as bad as I thought. Given the mess the encoding is, I expected DSP code to be an unreadable mess, but it wasn't nearly that bad and I could somewhat figure out what the code was doing. Now, I had to figure out why it was getting stuck in that loop and what the expected operation was. When I talked about this in the emudev Discord, PSI gave me a disassembly of aac.a (the DSP binary the DSi sound app uses), which helped a lot. Not only did I not have to awkwardly hook into the teakra disassembler and generate lengthy instruction logs to try to figure out what was going on, but the disassembly also has function names, which makes it much easier to see what the code is trying to achieve.
In this case, we looked at the code, and found that it was getting stuck in that loop after an unsuccessful malloc() call. That call was failing because the requested size was wrong: it was 0xD1C0, but that particular malloc() implementation couldn't take sizes larger than 0x8000. Except the size passed to malloc() was loaded from memory, and wasn't initialized by the DSP code, so it was part of the DSP binary itself.
For a while, both PSI and I were stumped. We couldn't figure out how this was supposed to work.
Until I finally figured it out. PoroCYon was also aware of the issue. Due to the way the DSi sound app does NWRAM mapping, it is done in separate steps: each given NWRAM bank is first disabled, then remapped to the DSP, then re-enabled, then the offset is changed (changing where the bank appears within its region's address space). The DSP-side mapping code in place was ignoring the last change, which resulted in NWRAM banks in the DSP data region ending up in the wrong place. Hence the wrong malloc() size.
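Here is a rough sketch of why the last step matters. It is a simplified model, not the actual melonDS mapping code: `BankState`, `buildDspView`, the bank count and the "one bank per slot" layout are all assumptions made for the example, and the real register encoding is deliberately left out.

```cpp
#include <array>
#include <cassert>

// Hypothetical model of one NWRAM bank's mapping state.
struct BankState {
    bool enabled = false;
    bool ownedByDsp = false;
    int offset = 0;  // which slot of the DSP data region the bank occupies
};

constexpr int kNumBanks = 8;  // assumed number of banks and slots
constexpr int kUnmapped = -1;

// Rebuilds the slot -> bank table from scratch. Calling this after every
// mapping write, including offset-only writes, keeps the DSP's view right.
// If two banks claim the same slot, this sketch lets the higher-numbered
// one win; what hardware does in that case is not covered here.
std::array<int, kNumBanks> buildDspView(
        const std::array<BankState, kNumBanks>& banks) {
    std::array<int, kNumBanks> view;
    view.fill(kUnmapped);
    for (int b = 0; b < kNumBanks; b++) {
        const BankState& s = banks[b];
        if (!s.enabled || !s.ownedByDsp) continue;
        assert(s.offset >= 0 && s.offset < kNumBanks);
        view[s.offset] = b;
    }
    return view;
}
```

If the view is only rebuilt after the disable, remap and enable steps, a bank that is then moved to, say, slot 5 still shows up in slot 0. The DSP reads its data from the wrong place, which is how a bogus malloc() size can appear.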
The way NWRAM mapping was done was also problematic in general. teakra uses its own, flat memory buffer to emulate the DSP code and data memory. To get around this, the DSP interface code was trying to detect NWRAM mapping changes and copy data around to keep teakra's buffer in sync with melonDS's NWRAM banks. I decided that this way of doing things was too complex and prone to problems, and instead modified teakra to directly access melonDS's NWRAM banks.
Then, the aforementioned malloc() call got a size of 0x14, which seemed much more reasonable, and resulted in a successful allocation.
This is the point I've gotten to, now. The DSP runs and communicates with the ARM9. It's not perfect yet, though. We have yet to implement the SoundEx module, so the DSP's audio output can be interleaved with that of the DS audio mixer. There are probably also other bugs on the DSP side (there are lots of underrun warnings in the DSP audio output module for some reason). teakra itself is also unoptimized, and pretty slow (the DSi sound app runs at a whopping 4FPS for me).
But we at least have a working base we can build upon. It's possible to optimize teakra once everything is good, but a more productive avenue will be to build a DSP JIT (or integrate an existing one into melonDS if possible).
All in all, it was certainly a fun ride.