Fixing Pokémon White
This post will get into the bugfixing process for a particular bug. Bugfixing in emulation may look like black magic, but it's not so different from general bugfixing-- it boils down to understanding the bug and why it occurs. Of course, emulation makes it harder when it involves blobs of machine code for which you have no source code, but nothing insurmontable.

Anyway, the issue with Pokémon White (and probably others) was that the game wouldn't boot unless it was launched from the firmware. Not really convenient, especially as at the time of writing this, firmware boot in melonDS is unstable.

I first suspected the RAM setup done prior to a direct boot (NDS::SetupDirectBoot). There are several variables stored in memory by the firmware, which games can use for various purposes. For example, the firmware stores the cartridge chip ID there, games then check the cartridge chip ID against the stored value, and throw an exception if it has changed (which typically means that the cart was ejected).

However, some testing revealed that there was nothing the direct boot setup was missing that could have broken Pokémon, atleast regarding the RAM.

So I had to dig deeper into the issue. It turned out that during initialization, the ARM9 got interrupted by an IRQ, but for some reason, it never returned to its work after processing the IRQ.

DS games often use multiple threads, so it isn't uncommon to switch to a different task after an IRQ. But that wasn't the case here, it wasn't even picking a thread to return to. It got stuck inside the IRQ handler.

The IRQ occured upon receiving data from the IPC FIFO. In particular, the ARM7 sent the word 0x0008000C. The ARM9-side handler was coded to panic and enter an infinite loop upon receiving this word.

More investigation of the FIFO traffic showed that the ARM9 first sent the word 0x0000004D, which is part of the initialization procedure. To which the ARM7 replied with 0x0008000C. But it appeared that the ARM7-side FIFO handler was coded to do that. For a while, that stumped me. I couldn't understand how it was supposed to work.

I then logged FIFO traffic when booting the game from the firmware, whenever it successfully booted, to see where the exchange differed. The ARM9 sent 0x0000004D, to which the ARM7 replied 0x0000004D. But it appeared that in that case, the ARM7 was using a different handler.

I checked the ARM7 code responsible for setting up the FIFO handler. It first set up the 'good' handler, then called a function, then set up the 'bad' handler. Looking at the function inbetween, I noticed that it checked register 0x04000300, which is POSTFLG, and is set to 1 by the firmware.

Hey, what if...

So I quickly checked the direct boot setup, and it wasn't initializing the POSTFLG registers. And, surprise, addressing that fixed the issue, letting Pokémon White boot directly.

Well, not bad. Next up is fixing the issues that plague firmware boot, I guess.
Danzou22 says:
Apr 26th 2017
Very interesting post, thanks!
Johnny Doe-Eye says:
Apr 27th 2017
These kinds of posts give me the weirdest hard-ons.
Post a comment