melonDS aims at providing fast and accurate Nintendo DS emulation. While it is still a work in progress, it has a pretty solid set of features:

• Nearly complete core (CPU, video, audio, ...)
• JIT recompiler for fast emulation
• OpenGL renderer, 3D upscaling
• RTC, microphone, lid close/open
• Joystick support
• Savestates
• Various display position/sizing/rotation modes
• (WIP) Wifi: local multiplayer, online connectivity
• (WIP) DSi emulation
• (WIP) GBA slot add-ons
• and more are planned!

Download melonDS

If you're running into trouble: Howto/FAQ
Redesigning the cart interface
The kind of change that doesn't immediately mean a lot for end users, but means a lot for us coders (and ultimately means something for end users, too).

Anyway, this tends to show why it's good to think ahead when designing your code. That being said, I need to find a balance with this: I tend to either think ahead too much and end up paralyzed by questions that don't mean much, or just write code as it comes to my mind.

The cart interface in melonDS was originally built without much consideration for the future. If you're wondering, the cart interface is the part of the emulator that lets emulated software access the emulated cartridge. On the DS, the cart isn't just directly mapped to CPU address space like on older consoles. Instead, there are a bunch of commands you can send to the cart to retrieve various parts of its contents, and different encryption protocols securing it.

As melonDS became capable enough to run commercial software, emulating the cart interface was a must. So NDSCart.cpp was born. The main component is the NDSCart namespace, which originally emulated the cart interface hardware (basically the DS side) and command responses for a generic cart. There is also NDSCart_SRAM, which emulates the on-cart SPI save memory. A tad hacky, but for most games, it did the job.

But, that's the thing, not all DS carts are the same!

There were already some exceptions for homebrew ROMs, which might want to use the cart interface and, depending on how old they are, need a more lax implementation of the generic cart protocol. Namely, retail carts don't let you read addresses lower than 0x8000 via the generic data read command (0xB7), because that region contains the ROM header (read via a different command), the Key1 encryption data and the secure area. However, old homebrew ROMs don't have any of that (save for the old-style DS header), and have their ARM9 binary start at 0x200. Newer homebrew ROMs are closer to the layout of a retail cart, mostly due to the added DSi support (the DSi header is 0x1000 bytes instead of 0x200), but since not everybody is here to rebuild their ROMs, we still need to support the older ones.
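To make the 0xB7 restriction a bit more concrete, here's a rough sketch of how a generic cart could answer that command, with a lax mode for old homebrew. GenericCart, ReadData and LaxReads are made-up names, not melonDS's actual code, and redirecting reads below 0x8000 to 0x8000 + (addr & 0x1FF) is an assumption about retail behavior; a cart that simply refused such reads would fit the description just as well.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical generic cart answering the 0xB7 data read command.
class GenericCart {
public:
    GenericCart(std::vector<uint8_t> rom, bool laxReads)
        : ROM(std::move(rom)), LaxReads(laxReads) {}

    // Fill 'out' with 'len' bytes read from ROM address 'addr'.
    void ReadData(uint32_t addr, uint8_t* out, uint32_t len) const {
        assert(out != nullptr);
        // Retail behavior (assumed): the header/Key1/secure area below 0x8000
        // isn't readable through 0xB7, so such reads get redirected.
        // Old homebrew (LaxReads) needs the raw data at 0x200 and up.
        if (!LaxReads && addr < 0x8000)
            addr = 0x8000 + (addr & 0x1FF);

        for (uint32_t i = 0; i < len; i++) {
            uint32_t a = addr + i;
            out[i] = (a < ROM.size()) ? ROM[a] : 0xFF; // open bus past the end
        }
    }

private:
    std::vector<uint8_t> ROM;
    bool LaxReads;
};
```

With a ROM whose every byte holds its address divided by 256, a retail-mode read at 0x200 returns the data at 0x8000, while the lax homebrew mode returns what is actually at 0x200.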

Homebrew aside, there are also different types of retail carts.

A prime example is the Pokémon games. The carts are fitted with an IR transceiver, which is accessed via the save-memory SPI bus. In practice, the first byte of an SPI transfer is a command for the IR transceiver. For now, we know that command 0x08 is some ping command that should reply 0xAA, and command 0x00 is the pass-through command, where any further bytes are forwarded to the save memory. Emulating this is required for Pokémon games to be playable at all. In melonDS, these commands were added to the generic save-memory code. A bit of a hack, since this means they would be 'functional' in any game instead of just Pokémon games, but it did the trick.
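Here's a minimal sketch of that IR command layer, assuming the 0xAA ping reply comes on the byte right after the command, and that the save memory is modeled as a callback taking each forwarded byte plus a flag for the first one. Only the two commands mentioned above are handled; IRFrontend and its members are hypothetical names, not melonDS's actual classes.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical IR command layer sitting in front of the save memory on the
// SPI bus. Only commands 0x00 (pass-through) and 0x08 (ping) are modeled.
class IRFrontend {
public:
    // Save-memory callback: gets a forwarded byte and whether it is the first
    // byte of the save-memory transfer, returns the byte read back.
    using SaveMemXfer = std::function<uint8_t(uint8_t value, bool first)>;

    explicit IRFrontend(SaveMemXfer saveMem) : SaveMem(std::move(saveMem)) {}

    // Chip select released: the next byte starts a new transfer.
    // (A real save memory would also need to be told about this.)
    void Release() { Pos = 0; }

    // Exchange one byte over SPI and return the byte read back.
    uint8_t Transfer(uint8_t value) {
        uint8_t ret = 0x00;                  // assumption: idle reply is 0x00
        if (Pos == 0)
            Cmd = value;                     // first byte is the IR command
        else if (Cmd == 0x00)
            ret = SaveMem(value, Pos == 1);  // pass-through to save memory
        else if (Cmd == 0x08)
            ret = 0xAA;                      // ping reply
        Pos++;
        return ret;
    }

private:
    SaveMemXfer SaveMem;
    uint8_t Cmd = 0;
    uint32_t Pos = 0;
};
```

Doing it this way keeps the IR layer out of the generic save-memory code, so only carts that actually have the transceiver would get these commands.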

But there's more. Games like WarioWare DIY, or Jam with the Band, don't even use the save-memory SPI bus. They have save memory, but it's a NAND memory that is accessed via the same bus as the ROM itself, through a set of specific commands.

Release 0.9.2 coming out soon
As the title says. Right now, the only thing holding us back is that we need to set up Azure CI for proper Mac builds. I have contacted Azure to get a CI grant, but they haven't gotten back to me yet.

We have some cool ideas, too, but these will be for further releases.

Also, in somewhat related news, I'm starting to work on another idea. It's not related to melonDS, but it's related to the DS. For now, this is going to be a surprise, but those who have seen my Twitter lately might figure out what I'm up to. I will make a post once I've got a working prototype.
Status updatezorz!
This is the first time I write a blog post in a while, so I will try to keep this short.

First, what's new on my side?

I finished my hearing for the gender marker change thing. You know, so I can get a big fat F on my ID card. You prolly don't care a lot about my trans shenanigans but this means it's one thing out of my way, and we can now proceed to full-speed melonDSing (and hopefully not from a squat, but we're doing our best).

What else is there to say?

I can't keep my focus on one thing aaaaaa

I wanna maaaaaybe try to emulate some new fun shit in melonDS. like the pokémon keyboard thingy.

Wait, no, we need to make DSi emulation better. We can prolly add a file explorer thing, so you can put your DSiware into the thing easily, and idk what other cool features there were. Just suggest them below this post, pretty sure we can get this done together! melonDS will soar through union and friendship!
Announcing ARM64 Mac (aka Apple Silicon Mac) support!
First of all I'd like to thank StLouisCPhT for testing all the changes I made to try and get melonDS to work on ARM64 Macs. Without them, this wouldn't be possible as I don't have an ARM64 Mac (I don't have any Mac at all, however x86-64 macOS can be run in VMs). I would also like to thank Generic (aka RSDuck) for helping me out with JIT issues.

This started when a user named "Joel" (now StLouisCPhT) commented on my earlier post about compiling melonDS for ARM64 Macs. We took this to Private Messages on the forum board, and we were making progress slowly, but surely.

Here's a quick overview of the things that were needed for melonDS to run on ARM64 Macs (if you want to download the beta, scroll to the bottom of this post).

- The JIT compiler
The first issue was adding a way to get the Program Counter on ARM64 Macs. This was easy: all it needed was an ifdef to use "uc_mcontext->__ss.__pc".
The second issue was that Apple introduced W^X for JIT memory. This means JIT memory can only be Read-Write or Read-Execute at any one time, never both, and each thread toggles between the two by calling "pthread_jit_write_protect_np" with false (writable) or true (executable). These calls had to be added in the right places, otherwise the JIT would crash when trying to run something.
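Here's a small sketch of what that toggling can look like for a single JIT buffer. On ARM64 macOS it maps the buffer with MAP_JIT and flips it with pthread_jit_write_protect_np(); elsewhere it falls back to mprotect() to get the same read-write vs read-execute switch. JitBuffer, BeginWrite and EndWrite are hypothetical names, and a signed app under Apple's hardened runtime may additionally need the JIT entitlement.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <sys/mman.h>
#if defined(__APPLE__)
#include <pthread.h>
#endif

// Hypothetical W^X-aware JIT buffer: never writable and executable at once.
class JitBuffer {
public:
    explicit JitBuffer(size_t size) : Size(size) {
#if defined(__APPLE__) && defined(__aarch64__)
        // MAP_JIT memory is toggled per thread with pthread_jit_write_protect_np.
        Mem = mmap(nullptr, size, PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_JIT, -1, 0);
#else
        // Start out executable; BeginWrite() switches to read-write.
        Mem = mmap(nullptr, size, PROT_READ | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
#endif
        assert(Mem != MAP_FAILED);
    }
    ~JitBuffer() { munmap(Mem, Size); }
    JitBuffer(const JitBuffer&) = delete;
    JitBuffer& operator=(const JitBuffer&) = delete;

    // Make the buffer writable (not executable) before emitting code.
    void BeginWrite() {
#if defined(__APPLE__) && defined(__aarch64__)
        pthread_jit_write_protect_np(0);
#else
        int r = mprotect(Mem, Size, PROT_READ | PROT_WRITE);
        assert(r == 0);
        (void)r;
#endif
    }

    // Make the buffer executable (not writable) before running code.
    void EndWrite() {
#if defined(__APPLE__) && defined(__aarch64__)
        pthread_jit_write_protect_np(1);
#else
        int r = mprotect(Mem, Size, PROT_READ | PROT_EXEC);
        assert(r == 0);
        (void)r;
#endif
        // Freshly written code must not be served from a stale instruction cache.
        __builtin___clear_cache(static_cast<char*>(Mem), static_cast<char*>(Mem) + Size);
    }

    uint8_t* Data() { return static_cast<uint8_t*>(Mem); }

private:
    void* Mem;
    size_t Size;
};
```

Since the flip is per thread on Apple Silicon, the thread that emits code has to be the one calling BeginWrite()/EndWrite(); forgetting EndWrite() before jumping into the buffer is exactly the kind of thing that crashes as soon as the JIT tries to run something.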

There was also a strange issue where the following lines would make the JIT crash:
   // Fill the whole JIT buffer (main + secondary regions) with BRK instructions
   for (int i = 0; i < (JitMemMainSize + JitMemSecondarySize) / 4; i++)
       *(((u32*)GetRWPtr()) + i) = brk_0;
Generic (aka RSDuck) told me that this fills the JIT buffer with breakpoint instructions. StLouisCPhT and I tried various things, but to no avail. However, once the changes were rebased on master, it seemed to work fine...

- Microphone
This wasn't affecting just ARM64 Macs, but I'll add it anyway.
We got a bug report on GitHub for the microphone not working on macOS. The solution? Add the "NSMicrophoneUsageDescription" key in the plist file, otherwise macOS would not ask for microphone access.

- Known Issues

* The fastmem checkbox is disabled: This is intentional, I cannot get fastmem to work on either x86-64 Macs or ARM64 Macs.
* Local Multiplayer doesn't work by running more than one instance on the same Mac: This needs to be investigated, although Local Multiplayer doesn't have the best support yet. It seems to work by running two copies of melonDS on different Macs though.

Well, that's all.
The ARM64 beta builds come with the required libraries bundled, so you won't have to install them separately.

This build is outdated - see Downloads
melonDS 0.9.1 beta for ARM64 macOS
(For future reference, this build is based off commit 2c2e868.)
Been silent lately...
Winter depression and covid shito don't help; thankfully, we'll soon be getting more sunlight, so there's at least that.

Anyway, you may have noticed I haven't been doing a lot for melonDS lately... there are a couple of reasons for that, besides that side project of fixing up vintage Macs and other shit.

I don't want to abandon melonDS, it's my project, but it's true that I have less interest in it at this point. Things like emulating new hardware features and hunting emulation bugs interest me much more than, say, UI shenanigans. As far as these things go, I tend to just accept work from other contributors, sometimes to the point of them becoming fulltime members of the 'melonDS team'. It's better this way, after all; as I say, an emulator project is like a tree: once you're done with the trunk, it branches off in a billion directions.

So, what can we attempt doing now?

I've been idly trying to figure out why DSiware will only load when installed on the NAND. So far I haven't gotten very far with this, mostly because, well, my motivation tends to be an all-or-nothing thing: either I get the spark and find myself coding until 5:00, or I have a hard time doing anything at all. Also, working with the DSi firmware is a real pleasure: it's a huge spaghetti network of threads and callbacks and shit, making it a massive pain to track anything, even more so when all you have is a bunch of ARM ASM.

From there, there would be two possibilities: a quick hack to bypass whatever check the DSi firmware uses, like we do for the region check, or actually installing the provided DSiware into the NAND. The latter implies dealing with an encryption layer and a FAT filesystem, which I can't wrap my head around. We'd need to get a good lil' FAT library, but then this would open more possibilities regarding DSi emulation.

Then there are the remaining popular-request items, and the big pile of issues.

For example, a looooot of the issues are with the OpenGL renderer. This is why I stand for emulating things accurately: you are far less likely to run into constant issues and get yourself caught in a game of whack-a-mole, perpetually hacking around issues. While things like the OpenGL renderer are creative and cool, we have seen countless times that the DS GPU is a pile of quirks, and that emulating it correctly is only possible with something like our software renderer. We can keep coming up with creative solutions to try and fix OpenGL issues, but at some point there is only so much we can do when using a fundamentally inadequate tool.

An alternative may be unearthing my old 'shaderzorz' experiment: that was an attempt at a compute-shader rasterizer that worked similarly to that of the DS. Such a renderer, implemented either in compute shaders or in fragment shaders, may be able to get around the shortcomings of the current OpenGL renderer, and maybe also support higher-resolution rendering without too much of a performance penalty. Back then, I ended up ditching it in favor of a classical polygon-based renderer because I wasn't positive about the performance. However, the current renderer is far from optimal too, due to how rendering has to be done.

Update on donations
There are now multiple ways to send donations to the melonDS team:

* Patreon for monthly donations
* Paypal for one-time donations

I will add a donation page with these links and extra info.

Also, while I'm at it, some notes on these donation links:

Since these go to accounts that are mine, I was worrying that some people would think I spend the donation money on drugs and hookers, or whatever else. That ain't the case.

The donations mainly go towards the hosting costs, and other expenses related to the melonDS project (acquiring oddball addons to emulate, dumping the DSi bootrom, ...).

The domain name costs about 20€/year. The server itself costs $5/month, plus extra bandwidth usage; the server plan might have to be upgraded in the future as melonDS gets more popular.

The current influx of donations already covers that, so the extra money is there for the melonDS team members to use if they face an emergency. For example, back in 2018 I had to cover transition-related expenses while having no income and trying to survive, so the Patreon helped a bit with that, thank you folks. Currently things are going better for me, but we never know when one of us will find themselves facing precarity, especially in the current context.

So I hope this clears up any doubt over the donations.

Merry Christmas from the melonDS team!
We hope you're having a good time with your family, friends, etc. Either way, to make things better, we bring you a little Christmas release.

melonDS 0.9.1

There are several changes since melonDS 0.9.

First, you may notice that we removed the nonfunctional vsync option from the video settings dialog. Admittedly, that setting was functional in the 0.8 versions. However, with Qt and the new multi-context OpenGL rendering we do, implementing vsync will take a bit more effort, and we haven't figured it out yet.

However, Generic implemented a new framerate limiter, based on that of Citra. This should help a lot with frame pacing issues.

I removed the hardcoded debug hotkey which had been accidentally left into the 0.9 release (oops).

Speaking of which, we now have a proper fullscreen hotkey. People were trying to use F11 as a fullscreen hotkey before, which not only was not implemented, but was actually triggering the hardcoded debug hotkey, freezing melonDS for a while. Now you can actually use F11 (or any key of your liking) for fullscreen.

On the DSi side, it is now possible to run unlaunch'd NANDs in melonDS. It may not yet be possible to hack melonDS and install unlaunch on it, though. We also added preliminary camera support: for now it feeds a fixed stripe pattern, but at least the basics are there, so games do better than just crashing.

We also now have a Mac build, courtesy of WaluigiWare64. Speaking of builds, these release builds are pulled straight from our GitHub CI instead of being compiled on my computer. Let us know if there are any issues with them.
You can easily install melonDS and its dependencies on macOS with Homebrew by running:
brew install --cask melonds

On the subject of package managers, melonDS is now also available as a Flatpak package on Flathub, providing a simple, unified way to install melonDS on all Linux systems. First, install and set up the Flatpak package manager, then install melonDS by running this in a shell:
flatpak install flathub net.kuribo64.melonDS

And, as usual, we have a bunch of little fixes and tweaks, which you can discover in our changelog or in the GitHub commit list.

Back in business
Sometimes, there's nothing quite like an interesting issue to motivate a lazy Arisotura.

For example: no graphics in the third flying level in Power Rangers - Super Legends.


Yeah, it's not very playable like this, even more so as this is a shoot-'em-up level.

When I looked at this, I saw Generic was already on it. He figured out that, when entering the level, the horizontal offset for BG0 was not being reset, and its previous value was 256, which caused BG0 (and thus the 3D graphics) to be pushed offscreen.
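As a rough model of why a leftover offset of 256 hides everything, assume the 3D layer on BG0 is 256 pixels wide, gets scrolled by a 9-bit horizontal offset that wraps at 512, and is transparent wherever the scrolled position falls outside those 256 pixels. Under that assumption, the hypothetical helpers Layer3DColumn and CountVisibleColumns show that an offset of 256 leaves no visible column at all:

```cpp
#include <cassert>
#include <cstdint>

// Assumption for this sketch: the 3D layer is 256 pixels wide, BG0's
// horizontal offset is 9 bits and wraps at 512, and anything scrolled
// outside the 256 layer pixels is transparent.
// Returns the 3D layer column shown at screen column x, or -1 if transparent.
int Layer3DColumn(int screenX, uint16_t bg0hofs) {
    int col = (screenX + (bg0hofs & 0x1FF)) & 0x1FF;
    return (col < 256) ? col : -1;
}

// How many of the 256 screen columns show any 3D pixels at a given offset.
int CountVisibleColumns(uint16_t bg0hofs) {
    int visible = 0;
    for (int x = 0; x < 256; x++)
        if (Layer3DColumn(x, bg0hofs) >= 0)
            visible++;
    return visible;
}
```

Offsets from 0x100 to 0x1FF then act like negative scrolls (0x1F0 shifts the layer 16 pixels to the right), and 256 is exactly the point where the layer has been pushed fully out of view.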

Moreover, NO$GBA and DeSmuME both suffered from the same issue, which meant that once again we were stepping into uncharted territory. Exciting!

So I set to work. I logged what the game was doing and made a disassembly. There were two main possibilities: either the game was misbehaving due to some emulation issue, or it was working as intended but relied on unknown hardware behavior. The second hypothesis seemed more likely, seeing as when entering the glitched level, the game took care to clear VRAM and set up 2D layers and all that. It didn't seem to be a timing issue either: those typically react to tweaking melonDS's cache timing constants, but tweaking these made no difference here.

So, looking at my logs, I made several tests on hardware. I figured something was resetting the BG0 scroll position, but couldn't manage to reproduce that. So, seeing as it was 5:00, I went to bed, like normal people do.

A tour through melonDS's JIT recompiler Part 1
I already talked about the JIT recompiler on this blog before, but that was mostly blabla. Now we go into the nitty-gritty details of how everything works! Maybe this will help other people working on JIT recompilers, as there doesn't seem to be much written on this; I learned a lot about it from reading other people's source code and talking to them (which I still encourage!). Also, the JIT isn't my only work on melonDS, so I have some other topics to talk about later as well.

The heart of almost every emulator is the CPU emulation. The Nintendo DS has two ARM cores: the ARM7 inherited from the GBA, and an ARM9, which is the main processor. Fortunately, for us the main differences between the two are a few extra instructions and the memory integrated into the ARM9 (DTCM, ITCM and cache; the latter deserves its own article btw). Otherwise it's just a faster processor.

The most straightforward way to emulate a processor is an interpreter, i.e. replicating its function step by step. First the current instruction is fetched, then it's decoded to determine which handler is the appropriate one to execute it, and that handler is invoked. Then the program counter is advanced and the cycle starts again.
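To illustrate that loop, here's a toy interpreter for a made-up four-instruction ISA (not ARM; the encoding is invented for this sketch). In this version the PC is advanced before the handler runs, so branch handlers can simply overwrite it; advancing it afterwards works just as well if branches compensate.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy CPU state for a made-up ISA (not ARM).
struct CPU {
    uint32_t R[4] = {};
    uint32_t PC = 0;
    bool Halted = false;
};

// Hypothetical encoding: bits 31-24 opcode, 23-16 rd, 15-8 rs, 7-0 immediate.
enum Op : uint8_t { MOVI = 0, ADD = 1, SUBI = 2, BNZ = 3, HALT = 0xFF };

constexpr uint32_t Enc(uint8_t op, uint8_t rd, uint8_t rs, uint8_t imm) {
    return (uint32_t(op) << 24) | (uint32_t(rd) << 16) | (uint32_t(rs) << 8) | imm;
}

using Handler = void (*)(CPU&, uint32_t);

static uint32_t& Rd(CPU& c, uint32_t i) { return c.R[(i >> 16) & 3]; }
static uint32_t Rs(const CPU& c, uint32_t i) { return c.R[(i >> 8) & 3]; }

void OpMovi(CPU& c, uint32_t i) { Rd(c, i) = i & 0xFF; }
void OpAdd(CPU& c, uint32_t i)  { Rd(c, i) += Rs(c, i); }
void OpSubi(CPU& c, uint32_t i) { Rd(c, i) -= i & 0xFF; }
// Branch back 'imm' instructions from the next one if rd != 0.
void OpBnz(CPU& c, uint32_t i)  { if (Rd(c, i) != 0) c.PC -= (i & 0xFF) * 4; }
void OpHalt(CPU& c, uint32_t)   { c.Halted = true; }

// Decode: pick the handler for an instruction. Unknown opcodes halt.
Handler Decode(uint32_t instr) {
    switch (instr >> 24) {
    case MOVI: return OpMovi;
    case ADD:  return OpAdd;
    case SUBI: return OpSubi;
    case BNZ:  return OpBnz;
    default:   return OpHalt;
    }
}

void Run(CPU& cpu, const std::vector<uint32_t>& mem) {
    while (!cpu.Halted && cpu.PC / 4 < mem.size()) {
        uint32_t instr = mem[cpu.PC / 4]; // fetch
        Handler handler = Decode(instr);  // decode, every single time
        cpu.PC += 4;                      // advance
        handler(cpu, instr);              // execute
    }
}
```

Even in this tiny example, every emulated instruction costs a memory fetch, a switch, an indirect call and the handler itself, which is where the slowness discussed next comes from.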

This approach has the advantage that it's relatively easy to implement while allowing for very accurate emulation (provided, of course, that you take everything into account: instruction behaviour, timing, …), but it has the major disadvantage of being pretty slow. For every emulated instruction, quite a lot of native instructions have to be executed.

One way to improve this is what Dolphin and Mupen call a "cached interpreter". The idea is to take a few instructions at a time (a block) when they're first executed and save their decoding. The next time this block is executed, we just need to follow this list of saved handlers. Viewing multiple instructions at once has other advantages as well, e.g. we can analyse the block and detect idle loops to skip them.
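A minimal sketch of a cached interpreter, again on a made-up ISA rather than ARM: each block is decoded once into a list of (handler, instruction) pairs, ended at the first branch or halt, and replayed on later visits. The std::unordered_map keyed by start PC is only a shortcut for this sketch (the lookup structure discussed further down avoids hash tables), and invalidation for self-modifying code is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

struct CPU { uint32_t R[4] = {}; uint32_t PC = 0; bool Halted = false; };

// Hypothetical encoding: bits 31-24 opcode, 23-16 register, 7-0 immediate.
enum Op : uint32_t { ADDI = 0, SUBI = 1, BNZ = 2, HALT = 3 };
constexpr uint32_t Enc(uint32_t op, uint32_t reg, uint32_t imm) {
    return (op << 24) | ((reg & 3) << 16) | (imm & 0xFF);
}

using Handler = void (*)(CPU&, uint32_t);
struct DecodedInstr { Handler Fn; uint32_t Instr; };

void OpAddi(CPU& c, uint32_t i) { c.R[(i >> 16) & 3] += i & 0xFF; }
void OpSubi(CPU& c, uint32_t i) { c.R[(i >> 16) & 3] -= i & 0xFF; }
// Branch back 'imm' instructions from the next one if the register is nonzero.
void OpBnz(CPU& c, uint32_t i)  { if (c.R[(i >> 16) & 3]) c.PC -= (i & 0xFF) * 4; }
void OpHalt(CPU& c, uint32_t)   { c.Halted = true; }

Handler Decode(uint32_t instr) {
    switch (instr >> 24) {
    case ADDI: return OpAddi;
    case SUBI: return OpSubi;
    case BNZ:  return OpBnz;
    default:   return OpHalt;
    }
}

// Branches and halts (and unknown opcodes, which halt) end a block.
bool EndsBlock(uint32_t instr) { return (instr >> 24) >= BNZ; }

class CachedInterpreter {
public:
    explicit CachedInterpreter(std::vector<uint32_t> mem) : Mem(std::move(mem)) {}
    size_t BlocksDecoded = 0;

    void Run(CPU& cpu) {
        while (!cpu.Halted && cpu.PC / 4 < Mem.size()) {
            auto it = Cache.find(cpu.PC);
            if (it == Cache.end())
                it = Cache.emplace(cpu.PC, DecodeBlock(cpu.PC)).first;
            // Replay the saved handlers; no decoding happens here.
            for (const DecodedInstr& d : it->second) {
                cpu.PC += 4;
                d.Fn(cpu, d.Instr);
            }
        }
    }

private:
    std::vector<DecodedInstr> DecodeBlock(uint32_t pc) {
        std::vector<DecodedInstr> block;
        for (uint32_t a = pc; a / 4 < Mem.size(); a += 4) {
            uint32_t instr = Mem[a / 4];
            block.push_back({Decode(instr), instr});
            if (EndsBlock(instr)) break;
        }
        BlocksDecoded++;
        return block;
    }

    std::vector<uint32_t> Mem;
    std::unordered_map<uint32_t, std::vector<DecodedInstr>> Cache;
};
```

In the loop used for testing, the body runs five times but only three blocks are ever decoded: the entry block, the loop body (entered at its start by the backward branch) and the final halt.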

But even the cached interpreter is still comparatively inefficient. What if we could generate a function at runtime which does the equivalent job of a block of emulated instructions, and save it, so that the next time this block has to be executed we only need to call this function? With this method we completely bypass branching out to the handlers which implement the respective instructions, because essentially everything is inlined. Other optimisations become possible too: we can keep emulated registers in native registers, or completely eliminate the computation of values which aren't used, and that's merely the beginning. That's where the speed of JIT recompilers comes from.

Before we can start recompiling instructions, we first need to clear up a few things about blocks of instructions. There are two main questions here:
  • where does a block begin and where does it end?
  • how are blocks saved/looked up?
Note that most of this applies to cached interpreters as well.

First, we say a block can only be entered via its first instruction and left via its last one. This makes code generation significantly easier for us, and also makes the generated code more efficient. So it's not possible to jump into a block halfway; instead we create another block starting at that point. This has one problem: with the interpreter we can stop after any instruction and continue somewhere else, e.g. when an interrupt occurs or the CPU's timeslot is over, while a JIT block has to be executed until the end. For this reason the maximum block size is adjustable in DeSmuME (and some games require setting it below a certain value), which is the case for melonDS as well, though thanks to some more hacks we haven't heard of a game breaking with too high block sizes yet ;). The last thing to consider is that we can't just take the next n instructions after the first one and compile them into a block. We need to keep in mind that branch instructions can send the PC anywhere else, including somewhere inside this block, and can also split execution into two paths if they're conditional. While all of this could be handled to generate even more efficient code (we do this to some degree, more on that later), for now we leave it out. So after a branch instruction we end the block.
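The block-ending rule above can be sketched for ARM mode like this. IsBlockEndingInstr only recognizes a few common ways of changing the PC (B/BL/BLX with an immediate, BX/BLX with a register, LDR into PC, and LDM with PC in the register list); data-processing instructions writing PC, Thumb mode and everything else are left out, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified check for ARM-mode instructions that may change the PC.
// A false positive only ends a block early, which is harmless.
bool IsBlockEndingInstr(uint32_t instr) {
    if ((instr & 0x0E000000) == 0x0A000000) return true; // B, BL (and BLX imm with cond 0xF)
    if ((instr & 0x0FFFFFD0) == 0x012FFF10) return true; // BX Rm, BLX Rm
    if ((instr & 0x0C10F000) == 0x0410F000) return true; // LDR PC, [...]
    if ((instr & 0x0E108000) == 0x08108000) return true; // LDM with PC in the list
    return false;
}

// Number of instructions in the block starting at code[start]: it stops after
// the first PC-changing instruction, at maxBlockSize, or at the end of code.
size_t FindBlockLength(const std::vector<uint32_t>& code, size_t start, size_t maxBlockSize) {
    size_t count = 0;
    for (size_t i = start; i < code.size() && count < maxBlockSize; i++) {
        count++;
        if (IsBlockEndingInstr(code[i])) break;
    }
    return count;
}
```

For example, with mov r0,#1 / add r0,r0,#1 / b ... / mov r0,#1 / bx lr, the block at index 0 is three instructions long (two if the maximum block size is 2), and a new block starts right after the branch.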

The pivot of the second question is the block cache. melonDS's block cache has gone through a few iterations; originally I just copied DeSmuME's, which is the one I'm going to describe here (we'll get fancier in the future). The way the generated code is stored might sound crude, but it's simply a large buffer (32 MB) which we fill from bottom to top; once it's full, we reset everything. That works surprisingly well, as it fits the code of most games, and we still do it like this. Now we need to associate the entry point of a block inside that buffer with the PC in the emulated system where that block starts. Since big parts of the address space are unused, it would be unwise to have a big buffer with a pointer for every possible address (that would also take 32 GB on a 64-bit system). A hash table would be an option, but lookups can be relatively slow with those. Instead we add one layer of indirection. There is a first array of pointers which divides the address space into regions of 16 KB or so. Each of those pointers points into other arrays, one for each memory bank that exists, which in turn point to the entry point of each JIT block function. We also only need to store a pointer for every second address, as ARM (4 byte) and Thumb (2 byte) instructions are always aligned to their respective sizes.
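A simplified sketch of that layout: a bump-allocated code buffer that gets reset when full, plus a first-level array of 16 KB regions whose entries point to lazily allocated second-level arrays with one slot per halfword. Unlike the scheme described above, the second-level arrays here are per region rather than per memory bank (so mirrors don't share entries), and the buffer just stores bytes instead of being executable. BlockCache and its members are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

class BlockCache {
public:
    using Entry = const uint8_t*;
    static constexpr uint32_t RegionShift = 14;                              // 16 KB regions
    static constexpr uint32_t RegionMask = (1u << RegionShift) - 1;
    static constexpr size_t NumRegions = size_t(1) << (32 - RegionShift);
    static constexpr size_t SlotsPerRegion = (size_t(1) << RegionShift) / 2; // one per halfword

    explicit BlockCache(size_t codeBufferSize)
        : CodeBuffer(codeBufferSize), Regions(NumRegions) {}

    // Copy generated code into the buffer (bottom to top) and register it
    // as the block starting at 'pc'. When the buffer is full, throw it all away.
    Entry AddBlock(uint32_t pc, const uint8_t* code, size_t len) {
        assert(len <= CodeBuffer.size());
        if (Used + len > CodeBuffer.size())
            Reset();
        uint8_t* entry = CodeBuffer.data() + Used;
        std::copy(code, code + len, entry);
        Used += len;

        auto& region = Regions[pc >> RegionShift];
        if (!region)
            region = std::make_unique<SlotArray>(); // value-initialized: all nullptr
        (*region)[(pc & RegionMask) >> 1] = entry;
        return entry;
    }

    // Two array lookups instead of hashing; nullptr means "not compiled yet".
    Entry Lookup(uint32_t pc) const {
        const auto& region = Regions[pc >> RegionShift];
        return region ? (*region)[(pc & RegionMask) >> 1] : nullptr;
    }

    void Reset() {
        Used = 0;
        for (auto& region : Regions) region.reset();
    }

private:
    using SlotArray = std::array<Entry, SlotsPerRegion>;
    std::vector<uint8_t> CodeBuffer;
    size_t Used = 0;
    std::vector<std::unique_ptr<SlotArray>> Regions;
};
```

With 16 KB regions the first level has 2^18 entries, so it stays at 2 MB of pointers on a 64-bit host, while second-level arrays only exist for regions that actually contain code.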

melonDS - now also for macOS!

If you want to test it, scroll down to the bottom of the post. I’ll be explaining what needed to be changed for it to work.

This originally started as a little challenge. "It shouldn't be that hard," I thought. However, it wasn't as easy as I would have hoped, but I got there in the end.

- The JIT recompiler

Thanks Generic (aka RSDuck) for helping me out a lot here and guiding me!


On Linux, the fastmem memory was mapped using "memfd_create()", which doesn't exist on macOS. Instead, on macOS shm_open is used to create the fastmem memory.
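Here's a minimal sketch of creating such a memory object on both systems, assuming the goal is a file descriptor that can be mapped several times so the same memory shows up at multiple addresses. CreateSharedMemory is a hypothetical helper; on macOS the shm_open name is unlinked right away so nothing lingers after the process exits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Returns an fd for 'size' bytes of shared memory, or -1 on failure.
int CreateSharedMemory(size_t size) {
#if defined(__APPLE__)
    // No memfd_create on macOS: use a named POSIX shared memory object
    // and unlink the name immediately so it behaves like an anonymous one.
    char name[32];
    std::snprintf(name, sizeof(name), "/melonds-sketch-%d", (int)getpid());
    int fd = shm_open(name, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd >= 0)
        shm_unlink(name);
#else
    // Linux: anonymous memory-backed file.
    int fd = memfd_create("fastmem-sketch", 0);
#endif
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Mapping the returned fd twice with MAP_SHARED gives two views of the same memory: a byte written through one mapping is visible through the other, which is the property fastmem relies on.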
macOS also didn't have "->gregs" in "uc_mcontext" and no "REG_RIP" either. This has to be changed to "->__ss.__rip" instead.
Then it would crash with a "bus error" on attempting to load. This was because macOS raised a "bus error" (SIGBUS) instead of a "segmentation fault" (SIGSEGV), so the signal handler couldn't handle it.
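The two fixes above can be sketched together: one helper that reads the faulting PC from the ucontext on each supported OS/architecture combination, and one handler registered for both SIGSEGV and SIGBUS. GetFaultPC, FaultHandler and InstallFaultHandler are hypothetical names, and a real fastmem handler would rewrite the faulting access before returning instead of just recording the fault.

```cpp
#include <cassert>
#include <csignal>
#include <cstdint>
#include <signal.h>
#if defined(__linux__)
#include <ucontext.h>
#endif

// Program counter at the time of the fault, per OS/architecture.
uintptr_t GetFaultPC(void* ctx) {
    auto* uc = static_cast<ucontext_t*>(ctx);
#if defined(__APPLE__) && defined(__aarch64__)
    return (uintptr_t)uc->uc_mcontext->__ss.__pc;
#elif defined(__APPLE__) && defined(__x86_64__)
    return (uintptr_t)uc->uc_mcontext->__ss.__rip;
#elif defined(__linux__) && defined(__x86_64__)
    return (uintptr_t)uc->uc_mcontext.gregs[REG_RIP];
#elif defined(__linux__) && defined(__aarch64__)
    return (uintptr_t)uc->uc_mcontext.pc;
#else
    (void)uc;
    return 0;
#endif
}

volatile sig_atomic_t g_FaultCount = 0;
volatile uintptr_t g_LastFaultPC = 0;

void FaultHandler(int sig, siginfo_t* info, void* ctx) {
    (void)sig;
    (void)info;
    g_LastFaultPC = GetFaultPC(ctx);
    g_FaultCount = g_FaultCount + 1;
    // A real handler would patch the faulting access here before returning,
    // otherwise a genuine memory fault would simply trigger again.
}

bool InstallFaultHandler() {
    struct sigaction sa = {};
    sa.sa_sigaction = FaultHandler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    // macOS may report these faults as SIGBUS where Linux uses SIGSEGV.
    return sigaction(SIGSEGV, &sa, nullptr) == 0 &&
           sigaction(SIGBUS, &sa, nullptr) == 0;
}
```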
Note: fastmem was disabled because it caused all sorts of errors while trying to boot firmware or run games. If anyone manages to fix it, send a pull request!

The JIT itself

The JIT would build, but at link time it would complain about "ARM_Dispatch" and "ARM_Ret" being undefined. Apparently in the Mach-O format (used in macOS) global function names defined in assembly are required to be prefixed with an underscore.
Then it would crash upon booting firmware or trying to load a game. This was caused by a line which tried to reprotect some memory to make it executable. On macOS, new memory is now mmap'ed instead.
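A small sketch of the usual workaround: a macro that adds the underscore only when targeting Apple platforms, used here to define a trivial assembly function that C++ code can call. ASM_SYMBOL and ReturnFortyTwo are hypothetical names, and the assembly only covers x86-64 and AArch64.

```cpp
#include <cassert>

// Mach-O prefixes C symbol names with '_', ELF does not.
#if defined(__APPLE__)
#define ASM_SYMBOL(name) "_" #name
#else
#define ASM_SYMBOL(name) #name
#endif

extern "C" int ReturnFortyTwo();

#if defined(__x86_64__)
asm(".text\n"
    ".globl " ASM_SYMBOL(ReturnFortyTwo) "\n"
    ".p2align 4\n"
    ASM_SYMBOL(ReturnFortyTwo) ":\n"
    "    movl $42, %eax\n"
    "    ret\n");
#elif defined(__aarch64__)
asm(".text\n"
    ".globl " ASM_SYMBOL(ReturnFortyTwo) "\n"
    ".p2align 2\n"
    ASM_SYMBOL(ReturnFortyTwo) ":\n"
    "    mov w0, #42\n"
    "    ret\n");
#else
#error "This sketch only covers x86-64 and AArch64"
#endif
```

Without the macro, the assembly would define "ReturnFortyTwo" while the C++ side on macOS looks for "_ReturnFortyTwo", which is exactly the kind of undefined-symbol error described above.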
