|< melonDS 0.9.3 is out!|
Sometimes issues are simple... sometimes not
Sep 5th 2021, by Arisotura
You might have noticed that one of my goals for the 1.0 release is to get DSi mode in melonDS up to par with DS mode. Not just in the sense of running DSi games faithfully: a good reproduction of the DSi environment is also useful to would-be homebrewers.
However, the road to DSi emulation is paved with all sorts of challenges. One example of a fun issue that had been reported a while ago: the DSi menu would freeze after the health/safety screen if any pictures were stored that could be displayed on the top screen. The issue was another unimplemented AES feature, and was fixed in melonDS 0.9.3.
Sometimes I wish all issues were this simple. I felt like looking at another of the known DSi-mode issues: the fact that we currently don't implement the RAM size register in SCFG_EXT9. The RAM size register is mainly used to restrict the accessible main RAM to 4MB before launching a DS game. In theory, not a very difficult thing to implement. In practice, however, there is an issue that kept us from enabling that feature: with it enabled, the DSi launcher crashes when launching a DS game, whereas DS games otherwise run fine (albeit with the full 16MB of RAM instead of the 4MB they might expect).
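To make the effect of the RAM size register concrete, here is a minimal Python sketch of how an emulator might turn a main RAM address into an offset. It is not melonDS code. The bit encoding in `RAM_LIMIT` is my reading of GBAtek and should be checked against it, and the sketch assumes that the restricted RAM is simply mirrored across the main RAM region. `main_ram_offset` and `physical_size` are hypothetical names.

```python
MAIN_RAM_BASE = 0x02000000
MB = 1024 * 1024

# Assumed encoding of SCFG_EXT9 bits 14-15 (verify against GBAtek):
# 0 and 1 select 4MB, 2 selects 16MB, 3 selects 32MB (debugger units).
RAM_LIMIT = {0: 4 * MB, 1: 4 * MB, 2: 16 * MB, 3: 32 * MB}


def main_ram_offset(addr, scfg_ext9, physical_size=16 * MB):
    """Map a main RAM address to an offset inside the RAM array.

    Assumes plain mirroring: the address is masked with the visible size.
    """
    limit = RAM_LIMIT[(scfg_ext9 >> 14) & 0x3]
    size = min(limit, physical_size)  # cannot expose more RAM than exists
    return (addr - MAIN_RAM_BASE) & (size - 1)
```

Under these assumptions, an access to 0x02400000 lands at offset 0x400000 in 16MB mode but at offset 0 in 4MB mode, which is how a write aimed at the upper part of RAM could end up on top of code sitting in the first 4MB.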
As explained in GBAtek, nocash ran into the same issue:
SCFG_EXT9.bit14-15 affect the Main RAM mapping on <both> ARM9 and ARM7 side (that, at least AFTER games have been booted, however, there's a special case DURING boot process: For NDS games, the firmware switches to 4MB mode on ARM9 side, whilst ARM7 is still relocating memory from the 16MB area at the same time - unknown how that is working exactly, maybe ARM7 isn't affected by SCFG_EXT9 setting until ARM7 has configured/disabled its own SCFG_EXT7 register).
The basic process of the loader is as follows: the ARM9 syncs with the ARM7 via IPCSYNC, then both CPUs run through lists of memory areas to copy or clear, then the ARM9 changes the main RAM size if required. However, while the ARM7 has a bunch of regions in main RAM to clear, the ARM9 is given empty copy/clear lists, and all it has to do is clear its DTCM, which is quickly done. As a result, in melonDS, the ARM9 changes the main RAM size while the ARM7 is still clearing regions, so the ARM7 ends up overwriting the ARM9's code, and you can guess how this goes: kaboom.
Yet, the same code works fine on hardware.
I had already experimented with the RAM size register, to try and find out if there's anything fancy about it, but there's nothing special at all. The RAM size gets changed instantly on both sides, and there's nothing fancy about memory mapping either. Oh, and the ARM9 caches are disabled when the loader is running, so they don't come into play here.
So I made a homebrew that reproduced the loader code: same ASM code, same memory regions, same everything. My first tests were to see if there was any kind of secret register altering main RAM mapping somehow, but there was none. Then, another test determined that, in fact, on hardware, the RAM size change isn't applied until the ARM7 has cleared all its memory regions.
We then added code to measure how long each side takes to complete its tasks, and it turns out that the ARM9 takes much longer than expected. The ARM9 code is running in main RAM, and the ARM7 has a bunch of main RAM regions to copy and clear: as EXMEMCNT is set to give priority over main RAM to the ARM7, the concurrent accesses are slowing down the ARM9. A lot.
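The difference between the two outcomes can be illustrated with a toy model in Python. This is a deliberately exaggerated sketch, not a description of the real bus: it assumes one main RAM access is served per cycle, and that with ARM7 priority the ARM7 wins every cycle in which it has work left. `run_loader` and `ram_switch_is_safe` are hypothetical names, and the access counts in the usage note are made up.

```python
def run_loader(arm9_accesses, arm7_accesses, arm7_has_priority):
    """Return (arm9_done_cycle, arm7_done_cycle) in a toy shared-bus model.

    Both access counts must be positive. Without priority, the two CPUs
    are treated as never interfering with each other.
    """
    cycle = 0
    arm9_left, arm7_left = arm9_accesses, arm7_accesses
    arm9_done = arm7_done = None
    while arm9_left > 0 or arm7_left > 0:
        cycle += 1
        if arm7_has_priority:
            # The ARM7 owns the bus while it has work; the ARM9 stalls.
            if arm7_left > 0:
                arm7_left -= 1
            else:
                arm9_left -= 1
        else:
            # Idealised case: both CPUs progress every cycle.
            if arm7_left > 0:
                arm7_left -= 1
            if arm9_left > 0:
                arm9_left -= 1
        if arm9_left == 0 and arm9_done is None:
            arm9_done = cycle
        if arm7_left == 0 and arm7_done is None:
            arm7_done = cycle
    return arm9_done, arm7_done


def ram_switch_is_safe(arm9_done, arm7_done):
    """The ARM9 changes the RAM size when it finishes its own work."""
    return arm9_done >= arm7_done
```

With, say, 10 ARM9 accesses and 100 ARM7 accesses, the no-contention case has the ARM9 finishing at cycle 10, long before the ARM7, which matches the crash. With ARM7 priority, the ARM9 only finishes after the ARM7 is done, which matches what the hardware test showed.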
This is some shitty news.
First of all, this is probably not an isolated case: the same sort of thing can also affect timing in games, although probably on a lesser scale.
Secondly, there is no way to correctly emulate this sort of thing without a cycle-accurate emulator. Given the current performance characteristics of DS emulation, and the sheer complexity of DS timings, cycle accuracy can be considered off-limits. In this particular case, the best we could do would be some kind of estimation for a cycle penalty if several concurrent main RAM accesses are detected within a given timeslice (that's the thing, there's no real way to determine whether they are actually concurrent, due to how we run things).
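For what it's worth, the kind of estimation mentioned above could look something like the following Python sketch. It is purely hypothetical and not something melonDS does: it assumes the emulator can count main RAM accesses per CPU within a timeslice, and it treats every ARM9 access that could have collided with an ARM7 access as if it did, so it gives an upper bound rather than an accurate figure. `estimate_arm9_penalty` and `cycles_per_conflict` are made-up names, and the cost per conflict is a free parameter rather than a measured value.

```python
def estimate_arm9_penalty(arm9_accesses, arm7_accesses, cycles_per_conflict=1):
    """Estimate extra ARM9 cycles for one timeslice.

    arm9_accesses / arm7_accesses: main RAM accesses counted in the slice.
    The number of conflicts is capped by the smaller of the two counts,
    since each conflict needs one access from each side.
    """
    conflicts = min(arm9_accesses, arm7_accesses)
    return conflicts * cycles_per_conflict
```

The obvious weakness is the one already stated: within a timeslice there is no way to know whether the accesses really overlapped, so any such penalty is a guess.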