VRAM? Lots of fun!
First of all, a little status update. From the previous list, DMA timings are covered, they're properly emulated except for cart DMA. I have also been fixing several GXFIFO related bugs, which lets most games run fine. Some badly programmed games (SM64DS or Rayman RR2, using immediate DMA instead of proper GXFIFO DMA) still overflow the FIFO a little, and since I haven't yet figured out why, I added a little hack to cover that for now.

Improving the 3D renderer, the other big TODO item, mostly means implementing textures. And also that weird focus I developed on making the renderer pixel-perfect.

Pixel perfection can be postponed. Most games would still be playable fine without it.

Textures mean supporting texture memory, which is a special case of VRAM mapping. I first want to revamp VRAM emulation to better match the hardware. There are notes about VRAM mapping in the melonDS code, and they have been there since a while now, waiting for a proper implementation.

Thing is, VRAM mapping on the DS isn't linear like with most old consoles. You get 9 VRAM banks of different sizes (from 16K to 128K), each can be mapped to different addresses for different purposes. GBAtek documents the mapping modes for each, but it's missing details like how the banks are mirrored and what happens if two or more VRAM banks overlap (which does happen in some games). So, I wanted to test that.

I first assumed the mapping was 1:1, as most DS emulators out there handle it. Which would mean that if two banks overlapped, it would either map the last one, or apply a priority order. So the first test was to map banks A and B to different addresses, write 0x1111 into A and 0x2222 into B, then map them at the same address and read, then do it again but map in a different order. The results would tell whether A has priority over B, or B over A, or whether it just maps the last bank.

The result wasn't quite what I expected. The tests didn't read 0x1111 or 0x2222, they both read 0x3333.

VRAM mapping isn't a 1:1 table. It basically just tells each bank which address range it should respond to. When two banks share an address range, they respond at the same time: writes go to both banks, and reads read from both banks too, ORing the values together, as shown above.

VRAM mirroring is also funky, each bank is mirrored in its own way, and the mirroring scheme doesn't depend on the bank's size. But for special cases like extended palettes or texture memory, mirroring doesn't matter.

I have yet to finish covering the VRAM mapping oddities without completely killing performance, and make the 2D GPU code suck less. Once that's done, we can get to the fun part of implementing textures in the 3D renderer.
On the way to 3D graphics
Unrelated note: I changed the way the site displays comments, to start from the oldest. It feels more natural this way, atleast to me.

So a while ago, I was at the point where getting further with melonDS required emulating the 3D engine.

As tempting as it is to hack things together to get some graphics showing up, I decided I'd take the time to do things right. Plus, GBAtek documents the 3D engine well, so let's make good use of it.

This also means that the renderer I'm building is a software one. This doesn't exclude the possibility of implementing hardware rendering for 3D or even 2D later on, but starting this way is more interesting than mapping the DS 3D features to OpenGL. I'm able to learn about deeper aspects of the 3D engine, and emulate it more closely than a hardware renderer can get. There are also features and details of the original hardware that can't be reproduced well with OpenGL, as is typical with older console GPUs, and it isn't always super-specific details, it can also be about basic features.

For example, the DS is able to draw triangles and quads natively, but modern GPUs can only draw triangles -- the more complex primitives supported by OpenGL are broken down into triangles. This can affect how vertex attributes (color and texture coordinates) are interpolated across a quad. Below are two renders of a quad with the same attributes:


The left one shows color interpolation across the original quad. The right one shows color interpolation after the quad was broken down into triangles. You can notice the difference.

At this point, melonDS's renderer is able to draw polygons without exploding, which is a good start. It can do color interpolation, but it lacks perspective correction for now. It also lacks more important features, like texturing.

Anyway, melonDS isn't too far away from a first release. Here's a quick list of the things that remain to be done before it can be shipped:

* a better 3D renderer, as explained above
* emulating DMA timings
* building a somewhat proper interface
* misc. tidbits: timers that suck less, a few missing bits of the 2D renderer, etc...

Stay tuned for more!
Deeper than it seems
I've been wanting to work out the DS's color precision. Incoming colors (palettes, vertex colors...) are all 15-bit, but the screens actually output 18-bit color. The 3D engine is well documented in this regard, we know that the colors are converted to 18-bit before being blended together and even how they're converted. However, details on the 2D engines are at a loss. GBAtek doesn't say much, other than that master brightness is applied to 18-bit color.

For reference, GBAtek's diagram of the whole video system.

I first tested master brightness. The idea is to configure it in "brightness decrease" mode with a factor of 8, such that incoming colors are halved. Then, fill two parts of the screen with colors 0,2,0 and 0,3,0 respectively. If the colors are processed as 15-bit, they will come out the same, but if they're converted to 18-bit beforehand, there will be a difference. Noting that I picked green colors to make the difference as visible as possible, as the human eye sees green best.

The test reveals that there is a difference, subtle but visible. Which confirms that master brightness is applied to 18-bit color.

To find out where the color conversion takes place, I did the test again, using the "brightness decrease" special effect instead of master brightness. It works the same, but it's applied earlier than master brightness. The test also showed a difference, meaning that color special effects are applied to 18-bit color.

So now we know that colors are converted as soon as possible, but there is one remaining detail to work out: how they're converted.

We know that the 3D engine converts color components the following way:

if (in == 0) out = 0;
else out = in*2 + 1;

From there, the idea is to display identical colors from the 2D and 3D engines, side by side, and compare them. By doing it for each possible intensity (from 0 to 31), we can work out where they differ and infer how the 2D engine converts colors.

The result of this test is a little surprising. For all the intensities above 0, the colors differ.

out = in*2;

So this is how the 2D engine converts colors. Which means that it's impossible to reach full intensity, for example, white (31,31,31) will come out as 62,62,62 instead of 63,63,63. This also means that colors output by the 2D and 3D engines will be subtly different, atleast unless special effects are used.

I also checked the VRAM/FIFO display modes of 2D engine A, and they convert colors the same way as the 2D engine. However, the white displayed by a disabled screen is 63,63,63.
A lot closer to a finished product
Even if still very far.

After a while, melonDS finally boots commercial games. Among my small game library, the compatibility rate is encouraging, even. Here are a few screenshots, for example:

Getting there took a while of implementing new features, but also bashing my head against obscure bugs that often turn out to come from silly little things.

As an example, a bug that prevented games from booting.

On ARMv4/v5, the LDR opcode has the particularity that if you read from an address that isn't word-aligned, it aligns the address, and rotates the word it read so that the LSB is the byte pointed by the original address. In melonDS, it was implemented the following way:

u32 val = ROR(cpu->DataRead32(offset), ((offset&0x3)<<3));

At first glance, looks alright, doesn't it? Except ROR() is a macro, defined as follows:

#define ROR(x, n) (((x) >> (n)) | ((x) << (32-(n))))

This basically ends up calling cpu->DataRead32() twice. This isn't a big deal when reading from RAM, it only wastes some CPU time, but the bug goes unnoticed. However, it has nasty side-effects when reading from I/O registers like the IPC FIFO.

Next up, aside from GPU-related work, were features like touchscreen or save memory.

Touchscreen emulation isn't too hard. Saves are a little more involved. In itself, it's nothing big, the save memory is an EEPROM or Flash chip accessed over a dedicated SPI bus. The issue is how to determine the correct memory type. The ROM header doesn't contain that information, so we must guess it. The current implementation waits for the game to start writing to the save memory and tries to determine the memory type from the length of the longest write. The idea is to guess the memory page size, from which the memory type and size can be inferred. If a save file is already present, those variables are inferred from the file's size.

With that covered, games can do something more interesting than sitting on a "failed to erase data" screen.

So far, this is what I have tested:

New Super Mario Bros: non-3D minigames playable, freezes when going ingame
Super Mario 64 DS, Rayman Raving Rabbids 2, Meteos demo: "playable", but no 3D graphics
Mario Slam Basketball, Rayman DS: get stuck trying to do a GX FIFO DMA
Mario & Sonic at the Olympic Games, Mario Kart DS: freeze when trying to display 3D graphics
Super Princess Peach: playable, 3D effects missing
Worms 2 Open Warfare: seems to work, but menus invisible -- this game is all 3D

It appears that there are now two main immediate directions for melonDS: UI and 3D support.

The UI part will need some thinking to pick the best framework. The current UI is something I quickly threw together using the Win32 APIs so I could see graphics, but for the "final" product, I want something cross-platform. An idea is to provide a quick SDL-based interface and a more complete Qt interface. I don't like some aspects of Qt, but regardless, it's a possible candidate, and a powerful one.

A decent UI would also support things like selecting a ROM file instead of hardcoding the filename, choosing a save memory type should autodetection fail, choosing whether to boot from the BIOS or from the game directly, all those things.

3D support is going to be required to get further into emulating games at this point. Past the obvious reason that they can be unplayable without 3D graphics, some require the hardware support to get further.

The 3D GPU has a FIFO for sending commands to it, called GX FIFO. It can be set to trigger an IRQ when it gets empty or less than half-full. Some games wait for the IRQ before sending more commands, some others use DMA to send their command lists automatically. Without proper support, these games would just hang.

The most bizarre game is probably Super Mario 64 DS. It does enable the GX FIFO IRQ at times, but never waits for it -- instead polling the GXSTAT register with the following code:

0205A390                 LDR     R12, =0x4000600
0205A394                 LDR     R4, [R12]
0205A398                 AND     R4, R4, #0x7000000
0205A39C                 MOV     R4, R4, LSR#24
0205A3A0                 ANDS    R4, R4, #2
0205A3A4                 BEQ     #0x0205A394

This is basically an inefficient way of checking whether bit 25 of GXSTAT (0x04000600) is set.

((GXSTAT & 0x0700000) >> 24) & 0x2

Could have as well been:

GXSTAT & 0x02000000

Why it doesn't just wait for the GX FIFO IRQ is a mystery (waiting for an IRQ lets the CPU go idle and saves power, unlike this kind of busy loop).

Stay tuned for more reports of the melonDS adventure!
Getting somewhere
melonDS has progressed nicely since the last post. To give you an idea:

melonDS running the DS firmware

It's got some graphics capabilities, even though those are quite limited. Rotated/scaled sprites, like those used for the clock up there, gave me some trouble, but I eventually got it working.

Speaking of the clock, it's frozen in time. I had to implement the RTC to get the firmware to actually boot instead of constantly entering "please enter time/date" mode, but for now, I hardcoded the time/date. The RTC is real fun to work with btw-- DS software talks to it by bitbanging a GPIO register. From the emulator's point of view, you get a series of zeros and ones which you must put back together to get data you can work with.

I also implemented proper-ish support for the DS cart hardware. This is quite the fun too. There's an initialization sequence done by the BIOS, and two different encryption methods are used. The BIOS starts by retrieving the ROM header and chip ID, then switches to Key1-encrypted command mode and retrieves the secure area (a protected part of the ROM not readable via the normal read command), and finally switches to Key2-encrypted command mode, under which the game will operate.

It is also worth noting that on actual carts, the first 2KB of the secure area are Key1 encrypted. The first 8 bytes are a double-encrypted identifier (string 'encryObj') that is used by the BIOS to verify whether the decryption was successful. However, all the DS ROM dumps out there have that 2KB block decrypted. Thus, we need to re-encrypt it for it to be loaded successfully.

Getting Key1 to work took a while (and uncovered a bug in the CPU core). Key1 is based on Blowfish and is the most complicated encryption algorithm used. Key2 is based on a XOR stream generated from two 39-bit registers, but it can be ignored, it is entirely implemented in hardware.

The really hacky part in the current implementation is how cart DMA is implemented. Cart DMA basically works by automatically transferring a word from the cart data output register to memory as soon as there's data ready. The DMA engine doesn't know how long the transfer is, it only knows to wait until there's data available, transfer it, then repeat.

When you take the lazy approach of making cart reads return data instantly, you may run into trouble when implementing cart DMA properly. A read from the cart data output register would advance the read pointer, and if DMA is enabled, perform a DMA transfer, which would trigger another read, and so on until the recursion overflows the stack.

So instead, there's an entirely separate code path for cart DMA. Atleast until the transfer delays are emulated. Then again, I also envision optimizing common DMA transfers with special code paths avoiding a lot of address decoding work, like I did in blargSNES. Common DMA transfers would mostly be transfers to VRAM, palette memory or OAM, though. Optimizing cart DMA would only speed up loading times at best.

Noting that with all that, the firmware does detect the cartridge, the game title and icon show up fine... booting a game doesn't quite work yet, though.

Well, actually, the game tested (NSMB) does boot, but it hangs before doing anything visible. The twist is that it isn't quite stable, sometimes it behaves differently, but still hangs. This tends to indicate that either the timings are really bad, or there's some evil "out-of-bounds array write" type bug hiding somewhere.

The main loop/scheduler needs rewritten badly anyway. The current system is messy at best and incorrect in certain cases (fast timers would be a good example). I should use absolute 64bit timestamps instead of trying to keep track of cycle differences here and there.
So what's melonDS?
melonDS is a new emulator project. My goals for it are to do things right and fast. Time will tell how far we can get in that direction.

If you know me, I'm not new in the emulation scene. I'm responsible for lolSNES/blargSNES, among others. I'm not new in the DS scene either, I worked a lot on DeSmuME back in the days. Why I'm straying away from DeSmuME is not something I will get into here.

So here I am, writing my own emulator for the second time. The hardest part in starting an emulator project from zero is managing to build something that resembles an emulator. It takes a fair amount of work before you can start getting results.

To this day, melonDS has gotten over the initial stage and does give some results. Nothing actually works aside from the very simple ARMWrestler CPU test, but hey, it's a start. melonDS has a 'good enough' subset of the ARM instruction set implemented, as well as a few hardware features, so getting things to run is mostly a matter of implementing the rest of the DS hardware.

One thing I want to do is being able to boot from the BIOS, like an actual DS. I was unable to get it to work in DeSmuME back then, but manually loading a DS firmware and booting it (thus skipping the BIOS) did work.

The progress of melonDS in regard to this is encouraging. The BIOS loads the firmware into memory and boots it. The firmware gets to the point where it tries to use the IPC FIFO, which I haven't implemented yet. Interesting things ahead!

Of course, melonDS will later have the option to boot a game directly, like all the emulators out there do. But for the sake of accuracy, I want to be able to boot from the BIOS.

The holy grail would be wifi emulation, especially local multiplayer, which I could never get working in DeSmuME. Time will tell if that goal is achievable.
melonDS dev blog
I will use this blog to post about the development of melonDS, my DS emulator project. I expect it to be lots of fun.