|Home | Downloads | Screenshots | Forums | Source code | RSS|
Dec 7th 2018, by StapleButter
I had a few bugs in my implementation. After fixing those, the timings I got were closer to what it should have been. Too fast, but way too fast.
Anyway, I had issues with my test app too. One of the memory buffers it accesses was in main RAM instead of DTCM; fixing that oversight gave me more realistic timings. In reality, the Millionaire issue was what I first thought-- on hardware, the FMV decoder completes in 146 scanlines, and I was being too slow.
So, obviously, fixing the timings made it fast enough that the glitch was entirely gone. But, aside from that, we were back to square one: running too fast and the issues it causes everywhere. Rayman DS is a good test for that, and it was shitting itself big time, so... yeah.
If we raise the cached memory timings to 2 cycles, as some sort of average between cache hits and cache misses, we get closer to the real thing. Still a bit too fast, but this time Rayman DS is running normally.
The new timing logic, coupled with PU emulation, is a pretty noticeable speed hit.
We can keep the PU logic for a potential future 'homebrew developer' build, that would also have several features to warn against typical DS pitfalls. For example, some form of cache emulation and/or warning about things like DMA without flushing/invalidating caches, 8-bit accesses to VRAM, etc...
If there is demand for such a build, that is.
As far as regular gamers are concerned, we can go for a faster compromise that would run most of the shit (really most of it) well. It wouldn't run things like GBA emulators that make extensive use of the PU, but... eh.
All the commercial games around use the same PU settings. We can say as much about libnds homebrew, its settings are a little different (for example, putting the DTCM at 0x0B000000 instead of the typical 0x027E0000), but nothing terrible there. All the differences we can observe are essentially in where the DTCM goes.
Since the DTCM is already handled separately (in CP15.cpp), we can handle its timings separately, and use a coarse table for the rest of the memory regions, like we did before. All while retaining enough of the new timing model that melonDS's timing characteristics would be much closer to the real thing.
As explained before, it's near impossible to perfectly emulate ARM9 timings due to the sheer complexity of the architecture and the way it's implemented in the DS, so if we can get 'close enough', might as well do it without wasting too much performance over it.
|2 comments have been posted.|