Timing renovation, ep 3: the new GX timings
Nov 24th 2018, by StapleButter
So, this is it. GX timings are covered.
After days of feeding specific command lists to the geometry engine, measuring how long they take to run, and trying to figure out the logic behind it, we finally did it, and implemented it in melonDS.
I have yet to write a post that gets into this in detail, but that will go on the board.
I believe it was definitely worth it.
Looking at history, it is apparent that first-gen DS emulators ran into issues caused by display lists taking too long to execute, despite following the timings given by GBAtek. Judging from measurements of their timings, we can guess that they went for the easier solution and lowered their per-command timings. Nothing wrong with that, if it gets games running, and gets them running at better speeds on lower-end platforms, but that ain't accurate.
On a less "accuracy horseman" note, we'll quote byuu's article on accuracy, again:
If you do not get the timing perfect, you will end up playing a perpetual game of whack-a-mole. Fixing one game to break two others, fixing them to break yet two more. Fixing them to break the initial game once again. You need only look at the changelogs over the past fifteen years to verify this.
You might not end up needing absolute perfection there, but history has shown that, if you don't have the basic logic down, hacking around timing issues can only get you so far.
A prime example may well be Burnout Legends, which JMC47 mentioned in his blog post "The next generation of DS emulators". The game seems to have built-in frameskipping or slowdown compensation, but it's not working correctly on emulators, resulting in random slowdowns or speedups.
I haven't looked into this game in detail, though, so I don't know for sure what it's doing.
But one possibility is that the game knows ahead of time how long its 3D scene will take to render. This is possible if you have measured your individual display lists by running them on hardware, and given them metadata indicating their time cost. From there, if you know which objects are going to be shown, you know how long the scene will take to render. You can then use that knowledge to keep your game logic in sync even if you're dropping frames, so the game doesn't appear to run slower.
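To make that idea concrete, here's a minimal sketch of what such a scheme could look like. This is a guess at the general technique, not what Burnout Legends actually does: the display list names, their cycle costs, and the per-frame cycle budget are all made up for illustration.

```python
import math

# Hypothetical per-frame budget for the geometry engine, in cycles.
# NOT a measured hardware value; chosen only to keep the numbers readable.
CYCLES_PER_FRAME = 100_000

# Hypothetical metadata: cost of each display list, as it would have been
# measured offline by running the list on real hardware.
DISPLAY_LIST_COST = {
    "car": 30_000,
    "track_segment": 45_000,
    "crowd": 50_000,
}


def predicted_frames(visible):
    """Return how many display frames the scene is expected to occupy."""
    total = sum(DISPLAY_LIST_COST[name] for name in visible)
    return max(1, math.ceil(total / CYCLES_PER_FRAME))


def step_game(state, visible):
    """Advance game logic by one tick per frame the render is expected to take."""
    ticks = predicted_frames(visible)
    state["logic_ticks"] += ticks
    state["rendered_frames"] += 1
    return ticks
```

With these made-up numbers, a scene showing all three objects costs 125000 cycles, so the game would advance its logic by two ticks for that single rendered frame. The prediction only holds as long as the costs in the table match what the geometry engine really takes.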
But of course, if your emulator's GX timings don't match the hardware's, this falls apart. And if a game is doing that kind of thing, no amount of hodgepodging GX timings will fix it if you haven't gotten the logic right.
Anyway, I tested Burnout Legends with the new GX timings, but I could hardly judge whether it was correct as the framerate dips below 60FPS on my PC. So I asked JMC47, who gave it a try and said that the game is now running as it should.
Coming back to our problem games, those that overflow the GXFIFO: Super Mario 64 DS, and also Rayman Raving Rabbids 2. SM64DS was the worst offender, with more than 10000 stall cycles per frame, against a ~1800 cycle average for Rayman RR2.
None of the other games I tested overflowed the FIFO. They're generally somewhat well-programmed, and use the appropriate DMA mode, which avoids overflows.
After the timing renovation, these two games still overflow the FIFO, but much less. The stall cycles were nearly halved in SM64DS, and the reduction in Rayman RR2 is tenfold.
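For the sake of illustration, here's a toy model of how stall cycles pile up when a CPU writes commands into a bounded FIFO faster than the geometry engine drains it. It is not melonDS's code: the function name, the FIFO capacity, and the command costs are assumptions, and it simplifies things by treating a command as leaving the FIFO the moment it starts executing.

```python
def count_stall_cycles(costs, capacity, write_cost=1):
    """Count cycles the writer spends waiting on a full FIFO.

    costs      -- execution cost in cycles of each command, in order
    capacity   -- number of FIFO entries (hypothetical, not the real GXFIFO size)
    write_cost -- cycles the writer needs to issue one command
    """
    starts = []       # cycle at which each command starts executing
    engine_free = 0   # cycle at which the engine finishes its current command
    cpu_time = 0      # writer's current cycle
    stall = 0

    for i, cost in enumerate(costs):
        cpu_time += write_cost
        if i >= capacity:
            # A slot frees up when command (i - capacity) leaves the FIFO.
            free_at = starts[i - capacity]
            if free_at > cpu_time:
                stall += free_at - cpu_time
                cpu_time = free_at
        # The command runs once it is written and the engine is idle.
        start = max(cpu_time, engine_free)
        starts.append(start)
        engine_free = start + cost

    return stall
```

In this toy model, four commands of 10 cycles each pushed through a 2-entry FIFO stall the writer for 7 cycles, while the same commands at 5 cycles each stall it for only 2. Per-command timings feed directly into the stall count, which is why they need to be right.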
I guess this is how they work. What was less nice was that Rayman RR2 got regular music streaming issues from the stalls. So, quickly, we fire up a new hardware test, and find out that GXFIFO stalls don't halt the ARM7. Once this is addressed, the audio issues are gone.
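A rough sketch of what that finding means for an emulator's scheduler, assuming a GXFIFO stall is modelled as cycles during which the ARM9 executes nothing. The `Core` class and `run_timeslice` function are hypothetical, not melonDS's actual code, and both cores are counted in the same cycle units for simplicity.

```python
class Core:
    """Minimal stand-in for a CPU core that just counts executed cycles."""

    def __init__(self, name):
        self.name = name
        self.cycles_run = 0

    def run(self, cycles):
        self.cycles_run += cycles


def run_timeslice(arm9, arm7, cycles, stall_cycles, stall_halts_arm7=False):
    """Run both cores for `cycles`, of which `stall_cycles` are a GXFIFO stall.

    The ARM9 loses the stalled cycles. The ARM7 only loses them if
    stall_halts_arm7 is True, which is the behaviour the hardware test ruled out.
    """
    stall = min(stall_cycles, cycles)
    arm9.run(cycles - stall)
    arm7.run(cycles - stall if stall_halts_arm7 else cycles)
```

With a 1000-cycle timeslice and 400 stall cycles, the ARM7 gets its full 1000 cycles in the default mode, and only 600 if it is wrongly halted along with the ARM9. That kind of shortfall is the sort of thing that starves code running on the ARM7, like audio streaming.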
So, in the end, this first part of the timing renovation turns out rather well.
Now it's time to get to the more daunting part: the general timings. Mostly memory access timings. I have these down already, so it's mostly a matter of implementing them in a way that doesn't slow things down.
And, regarding what I said above, I hope that I don't run into that kind of problem. Implementing the ARM9 caches is possible, but has a performance cost. So, if I hack around this, the game code would likely just end up running too fast. If this causes problems, we will have to hack around them, or propose ARM9 cache emulation as an option.
We'll see what we can do there. Apparently, the current timings are not that bad, but some things are running too slow, as Who Wants To Be A Millionaire has shown. The other issue is that, well, the current code for timings is a gross hack.
Love and waffles!