More fun fixes
So as I said in the last post, I've been fixing the firmware boot issues. Basically, when booting from the firmware, there was a high chance that the game would crash upon boot. But the bug and its effects were random. Sometimes it would simply hang on a white screen, sometimes it would jump to an invalid address... and sometimes it would work fine.


So we're going to see how we try to fix a random bug.


First thing to do is finding a way to reproduce the bug consistenly. So, hoping my bug would be affected by the time taken before clicking the health/safety warning screen, I set up a quick hack to automatically click the screen after a fixed number of frames. That didn't cut it, so I made the RTC return a fixed time. Which finally allowed me to reproduce the bug reliably.

I landed on a variation where the ARM9 jumped to an invalid address, which made it easy to find where it was doing that. I then started backtracking, which eventually led me to find out that some data were being copied from the wrong place, but it appeared the bug had more consequences than anticipated, making the backtracking long and tedious.

So instead, I dumped the RAM at the time the game booted, and compared it with a RAM dump from before a direct boot. It appeared that a chunk of code got accidentally erased. The range that got erased was within the cart's secure area, so I checked the code that handled secure area reads, but the bug wasn't there.

So I tracked where that code region was getting erased. The bug appeared, and it was another stupid bug. It turned out to be a DMA from the ARM7 accidentally doing that.

The bug was that during cart transfers, melonDS tried to start DMA for both CPUs. The cart interface can only be enabled for one CPU at a time, and thus DMA should only be checked for that CPU.

The bug resulted from a combination of factors:

1. The ARM7 BIOS loads the cart secure area, using DMA. When it's done, it leaves the DMA enabled(!). So when the ARM9 went to load something else from the cart, it accidentally triggered the ARM7-side DMA.

2. The secure area is made of 4 blocks, which the BIOS loads in a random order. So, if the last secure area block loaded happened to be the last one, the bug wouldn't overwrite the secure area and would only write after it -- which isn't a big deal because the rest of the game binary is loaded later. So in that rare case, the game would boot fine.


Was a fun ride, though.
Nick says:
Apr 28th 2017
I'm really enjoying these blogs! They provide a good balance of tech and explanation so I can follow along. I'm excited to see MelonDS as it progresses.
Lurkon says:
Apr 29th 2017
Please keep writing these, they're really enjoyable!
River says:
May 8th 2017
Thank you for sharing your dev insights! I got a lot of fun from it. A man devotes his time to interest is respected. Keep it up!
FleshMechPilot says:
Jul 31st 2024
This is great and all but... How does one fix/workaround the bug though?
Post a comment
Name:
DO NOT TOUCH