Bugfix0ring streak
So many things to do and so few coders. Poor melonDS company :P

Regardless, a bugfixing streak started happening. So, while I'm busy brainstorming about the current issue (which I'll talk about), have a post with juicy technical details about the process. It's oh so fun, you'll see.

First bug to go down is this weird 3D glitch that shows up in Mario Kart DS and probably others. The issue post shows what it's about: random glitch lines showing up in 3D graphics.

First attempts are some tests to determine the nature of the glitchy pixels. Disabling polygon edges, disabling translucent polygons, disabling antialiasing, whatever, you name it.

Eventually, we find out that those pixels are edge pixels.

Then, I suspected a depth test related issue. Stencil test can be excluded, shadow polygons are always translucent, and if you've read the previous post about the GPU innards, you know that translucent pixels don't set edge flags.

Enter depth buffer debugging. AKA Mario Kart LSD edition.

The purpose of this isn't to have an epic trip racing crapo AI players in a trippy setting, but to have a visual reading of the depth buffer. Not very readable to the bare eye, but the 24-bit Z values are mapped to 24-bit RGB, and this is a gross hack that bypasses the whole rendering pipeline (which is 18-bit).
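For the curious, the mapping can be sketched roughly like this. It's a minimal illustration and not the actual melonDS hack: the names DepthToRGB/RGBToDepth are made up, and which byte of Z lands in which colour channel is an assumption.

```cpp
#include <cstdint>

// Hypothetical helper type, not from melonDS.
struct RGB { uint8_t r, g, b; };

// Map a 24-bit Z value straight onto a 24-bit colour.
// Assumption: high byte goes to red, middle byte to green, low byte to blue.
RGB DepthToRGB(uint32_t z)
{
    z &= 0xFFFFFF; // keep only the 24 depth bits
    return RGB{ uint8_t(z >> 16), uint8_t(z >> 8), uint8_t(z) };
}

// Inverse mapping: what you'd do by hand after picking a pixel
// in an image editor.
uint32_t RGBToDepth(RGB c)
{
    return (uint32_t(c.r) << 16) | (uint32_t(c.g) << 8) | uint32_t(c.b);
}
```

Reading a pixel back is just the reverse: a colour of (0x12, 0x34, 0x56) would be Z = 0x123456 under this channel order.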

You can see a glitched line in the middle of this frame, and use any good image editing software to read the raw values of the pixels. We find out that the Z values on the glitch line are greater than those of surrounding pixels, which means the line is further away than the polygons around it: it should be hidden behind them, not punching through.

What the fuck?

Attempting to log these bogus depth test cases, we find out that interpolation is somehow generating Z values that are out of range, of course in a way that manages to slip past Depth Buffer Viewer 5000000. Sometimes they're so out-of-bounds that they end up negative, which causes the glitch lines.

(should probably not be using signed ints there, as Z values are always positive, anyway. but then that'd have hidden the glitch, probably)

Tracking those, we find that some polygons with long, nearly-horizontal edges cause the rasterizer to accidentally draw outside of the polygon bounds. In turn, the perspective-correct interpolation factor for these pixels is also out of bounds, which, in some cases, screws up interpolation of vertex attributes, making things overflow and shit themselves.
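To give an idea of how an out-of-bounds factor turns into a negative Z, here is a simplified sketch. It uses plain linear interpolation with an 8-bit fractional factor, which is NOT how the DS GPU or melonDS actually interpolates; the function names and the fixed-point format are assumptions picked for illustration.

```cpp
#include <cstdint>

// Linear interpolation between z0 and z1.
// f is nominally in 0..256 (256 meaning 1.0). Nothing clamps it, so a pixel
// drawn outside the polygon bounds can come in with f > 256.
int32_t InterpZ(int32_t z0, int32_t z1, int32_t f)
{
    int64_t delta = int64_t(z1) - int64_t(z0);
    return int32_t(z0 + (delta * f) / 256);
}

// "Less than" depth test with signed values: a negative Z beats everything.
bool DepthLessSigned(int32_t z, int32_t dst)
{
    return z < dst;
}

// Same test with unsigned values: the bogus Z becomes huge and is rejected.
bool DepthLessUnsigned(int32_t z, int32_t dst)
{
    return uint32_t(z) < uint32_t(dst);
}
```

With z0 = 0x1000 and z1 = 0x0800, a sane factor of 128 lands halfway, but a factor of 768 (three times past the end of the span) overshoots below zero. The signed compare then lets that pixel through in front of anything, while the unsigned compare would have silently thrown it away.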

Talk about a butterfly effect.

(and remind me to talk about that 'perspective-correct interpolation factor' thing someday, but for now, that's how the DS GPU does things. no W reciprocals.)

But, finally, the bug is down.

Next one on the hunt is this one: enemies invisible in Bionicle Heroes. SilasLaspada usefully noted that the console keeps complaining about 3D-related matrix stack overflows/underflows.

What's this about? I mentioned it in the GPU post, the GPU has a set of matrix stacks that are basically a hardware implementation of glPushMatrix() & co. You get the standard push/pop operations, but also store/restore operations which allow accessing arbitrary slots in the stack.

There are four matrix stacks: projection, position, vector, and texture. The projection matrix stack is limited to one slot. I'm pretty sure the texture matrix stack is too, but this one is hazy. Finally, the position and vector matrix stacks have 32 slots, which makes sense. These two are interlinked, the position matrix is used to transform models and the vector matrix is used for lighting calculations. The idea is to avoid having to normalize normals and light directions after having transformed them, by instead supplying a vector matrix which is (normally) an unscaled version of the position matrix. For this purpose, you get a matrix mode in which all matrix manipulation commands apply to both position and vector matrices, except, of course, the scale command.
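The position/vector pairing might be modelled like so. This is a rough sketch with assumptions all over: floats instead of the DS's fixed-point matrices, a made-up PosVecState type, and a multiplication order chosen arbitrarily. The only point is that the scale command skips the vector matrix.

```cpp
#include <array>

// Row-major 4x4 matrix. Floats are an assumption made for readability.
using Mat4 = std::array<float, 16>;

Mat4 Identity()
{
    Mat4 m{};
    m[0] = m[5] = m[10] = m[15] = 1.0f;
    return m;
}

// Returns a * b.
Mat4 Mul(const Mat4& a, const Mat4& b)
{
    Mat4 r{};
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            for (int k = 0; k < 4; k++)
                r[i*4 + j] += a[i*4 + k] * b[k*4 + j];
    return r;
}

// Hypothetical model of the "position and vector" matrix mode.
struct PosVecState
{
    Mat4 position = Identity();
    Mat4 vector = Identity();

    // Any generic matrix command in this mode hits both matrices.
    void MultBoth(const Mat4& m)
    {
        position = Mul(m, position);
        vector = Mul(m, vector);
    }

    // The scale command only touches the position matrix, so whatever
    // goes through the vector matrix keeps its length.
    void Scale(float x, float y, float z)
    {
        Mat4 s = Identity();
        s[0] = x; s[5] = y; s[10] = z;
        position = Mul(s, position);
    }
};
```

After a Scale(2, 3, 4) followed by a rotation through MultBoth, the vector matrix holds just the rotation, while the position matrix holds rotation and scale combined.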

Anyway, it's not too uncommon for games to accidentally overflow/underflow a matrix stack by pushing or popping too much, as a result of a bug in the game code. I saw it happen in NSMB's intro when the castle gets struck by lightning, for example, without any visible consequence. So I dismissed such cases as benign game bugs.

Until, well, this one bug. Missing shit onscreen, constant complaints about overflow/underflow, too fishy to be a coincidence.

A bit of logging shows that, at some point, the game proceeds to push the position matrix 36 times, without ever popping. And it seems to do a lot of stupid shit generally, like pretending the projection matrix stack has two slots, etc... Terrible game code? Emulator bug? One way to find out, I guess.

So how does this work? What happens if you overflow or underflow a matrix stack?

GXSTAT has a flag signalling that, so I guess you... raise that flag, and cancel the operation?



Half of that is true, because of course. This is the DS we're dealing with.

It raises the over/underflow flag, and... it fucking goes on and performs the operation anyway, using only the relevant bits of the stack pointer. For example, if the stack pointer is 31 and you push the matrix, it's stored in slot 31, pointer is incremented, error flag is raised (it starts raising at 31). Now if you push again, the pointer is 32, whose lower 5 bits are 0, so your matrix is stored in slot 0. Yes.

For the position/vector stacks, the stack pointer is 6-bit, so if you push or pop enough, it will eventually wrap around and stop raising the over/underflow flag.

For the projection and texture stacks, abusive pushing and popping will keep accessing the same slot, since there's only one.
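In code, the position stack behaviour described above would look something like this. It's a sketch, not the melonDS source: the PosStack type is made up, each "matrix" is a single integer to keep it short, and the exact condition for flagging on pop is an assumption (the push side follows the description above).

```cpp
#include <cstdint>

// Hypothetical model of the 32-slot position matrix stack.
struct PosStack
{
    uint32_t slots[32] = {}; // stand-in for 32 matrices
    uint8_t pointer = 0;     // 6-bit stack pointer (0..63)

    // Pushes 'current' and returns true if the over/underflow flag got raised.
    // The operation goes through either way.
    bool Push(uint32_t current)
    {
        bool error = pointer > 30;                // starts raising at 31
        slots[pointer & 31] = current;            // only the low 5 bits pick the slot
        pointer = uint8_t((pointer + 1) & 63);    // pointer itself wraps at 6 bits
        return error;
    }

    // Pops into 'out' and returns true if the flag got raised.
    // Assumption: the check uses the pointer value after decrementing.
    bool Pop(uint32_t& out)
    {
        pointer = uint8_t((pointer - 1) & 63);
        out = slots[pointer & 31];
        return pointer > 30;
    }
};
```

Push 31 times and everything is fine. The 32nd push lands in slot 31 and raises the flag, the 33rd lands in slot 0 and raises it again. Keep going until the 6-bit pointer wraps back to 0 and the pushes are suddenly "valid" again.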

Anyway, revising melonDS according to all this seemed to fix the bug, so I guess this is shoddy programming on the game's part. Another bug bites the dust.

Next one is FMVs flickering in Who Wants To Be A Millionaire.

In particular, the bottom screen is partly flickering. Quick test, this only happens when using the threaded 3D renderer, which means that the game is changing VRAM mappings while the renderer is running. As melonDS has safeguards to ensure the threaded renderer doesn't run outside of the real hardware's 3D rendering period, that means the game is malfunctioning.

Basically, it's using the 3D renderer to draw FMV to both screens. Which is a bit silly during the intro as the bottom screen is a fixed image. I haven't seen other FMVs in that game though, so... yeah.

So, we log shit, as usual. We find out that, regularly, the game unmaps texture VRAM, updates the contents via DMA, then maps it back to the GPU. This normally starts upon VBlank, so the texture memory is updated before the GPU starts rendering. Oh but sometimes it gets delayed by something and starts at scanline 243, which is way too late (3D rendering starts at scanline 214).

The non-threaded renderer wouldn't care, it runs 'instantly' upon scanline 214, so it would only be lagging one frame behind, which nobody would notice. The threaded renderer, however, cares.
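Put differently, there's a window in which the game can safely mess with texture VRAM. A tiny sketch of that check, assuming VBlank begins at scanline 192 (the usual DS value) and taking the 214 from above; the function name is made up, and where the rendering period ends isn't modelled at all.

```cpp
// Assumed scanline numbers: VBlank start is the usual DS value,
// the 3D rendering start is the one mentioned in the post.
constexpr int kVBlankStart = 192;
constexpr int kRenderStart = 214;

// True if a texture VRAM update starting on this scanline begins
// before the 3D renderer kicks in.
bool StartsBeforeRendering(int scanline)
{
    return scanline >= kVBlankStart && scanline < kRenderStart;
}
```

An update kicked off at scanline 192 is fine. One that starts at scanline 243 is not, and that's the case the threaded renderer trips over.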

We find out that the VBlank handler is being held back because something keeps IRQs disabled. This 'something' turns out to be the FMV decoder. It's running in ITCM and keeps IRQs disabled so it can complete ASAP, which makes sense. However, as far as melonDS is concerned, the decoder is taking too much time to do its thing and ends up delaying the VBlank handler.

There is no doubt that this is a big fat timing issue. We have a few of those already, namely RaymanDS which does weird things given bad timings. But also, less severely, 3D graphics sporadically glitching or disappearing for one frame, FMVs playing slightly too slow...

Y'know what this means: time to renovate the timings in melonDS to make them suck less. We can't get perfect without severe performance loss given the complexity of the DS architecture (two CPUs, shared memory, ARM9 caches...), but we can get close. Which wouldn't matter too much, timing on the real thing tends to be rather nondeterministic. With the ARM9, there are many little things factoring in: two-opcodes-at-once THUMB prefetch, parallel code and data accesses under certain circumstances, sometimes skipping internal cycles... and also the fact that the ARM9 runs at 66MHz, but the system clock is 33MHz, so the ARM9 may have to add one extra cycle to align to the system clock when using the bus. The ARM9 is on the borderline where CPUs start getting too complex for accurate emulation of their timings.
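The clock alignment bit is the easy part to picture. A minimal sketch, assuming timestamps are counted in ARM9 cycles and that a bus access can only start on a system clock edge, i.e. every other ARM9 cycle; the function name is hypothetical and none of this is current melonDS code.

```cpp
#include <cstdint>

// Timestamps are in ARM9 (66MHz) cycles. The bus ticks at 33MHz, so in this
// model it can only be entered on even ARM9 cycles.
// Returns the first timestamp >= arm9Cycles that falls on a bus clock edge.
uint64_t AlignToBusClock(uint64_t arm9Cycles)
{
    return (arm9Cycles + 1) & ~uint64_t(1);
}
```

An access attempted at cycle 4 goes ahead right away, one attempted at cycle 5 waits until cycle 6. That's the 'one extra cycle', and whether you get it depends on everything that ran before.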

Anyway, we're not there yet, just brainstorming the renovation.

For example, complete emulation of the ARM9 caches will not be possible without a performance hit, but I guess we can emulate the MPU and thus properly determine which regions are cached instead of hardcoding it.

We'll see how that goes.
Yashi says:
Nov 7th 2018
Nice job! I'm really looking forward to your progress.
NM64 says:
Nov 9th 2018
So I'm no emulation expert as I've stated previously, but I'm guessing that the DS's only 4MB of RAM ends up effectively being much larger in the case of melonDS?

Otherwise I would think that 4MB is small enough to fit into the L3 cache of most modern CPUs which could then be readily shared between individual CPU cores without issue where you could have one core emulating ARM7 and another ARM9.

...but again, this is probably one of those things that only people with no experience in coding or with emulators would think is even remotely possible.
Arisotura says:
Nov 25th 2018
emulating ARM9/ARM7 on separate cores would cause all sorts of timing issues, I'm afraid