melonDS RSS
The latest news on melonDS.

Been silent lately... -- by Arisotura Sun, 31 Jan 2021 15:37:59 +0000
Anyway, you may have observed I haven't been doing a lot for melonDS lately... there are a couple of reasons for that, besides that side project of fixing up vintage Macs and other shit.

I don't want to abandon melonDS, it's my project, but it's true that I have less interest in it at this point. Things like emulating new hardware features and hunting emulation bugs interest me much more than, say, UI shenanigans. As far as these things go, I tend to just accept work from other contributors, sometimes to the point of them becoming fulltime members of the 'melonDS team'. It's better this way, after all; as I say, an emulator project is like a tree: once you're done with the trunk, it branches off in a billion directions.

So, what can we attempt doing now?

I've been idly trying to figure out why DSiware will only load when installed on the NAND. So far, haven't gone very far with this, mostly because, well, my motivation tends to be an all-or-nothing thing. Either get the spark and find yourself coding until 5:00, or have a hard time doing anything at all. Also, working with the DSi firmware is a real pleasure: it's a huge spaghetti network of threads and callbacks and shit, making it a massive pain to track anything, even moreso when all you have is a bunch of ARM ASM.

From there, there would be two possibilities: a quick hack to bypass whatever check the DSi firmware uses, like we do for the region check, or actually installing the provided DSiware into the NAND. The latter implies dealing with an encryption layer and a FAT filesystem, which I can't wrap my head around. We'd need to get a good lil' FAT library, but then this would open more possibilities regarding DSi emulation.

Then there are the remaining popular-request items, and the big pile of issues.

For example, a looooot of the issues are with the OpenGL renderer. This is why I stand for emulating things accurately: you are far less likely to run into constant issues and get yourself caught in a game of whack-a-mole, perpetually hacking around issues. While things like the OpenGL renderer are creative and cool, we have seen countless times that the DS GPU is a pile of quirks, and that emulating it correctly is only possible with something like our software renderer. We can keep coming up with creative solutions to try and fix OpenGL issues, but at some point there is only so much we can do when using a fundamentally inadequate tool.

An alternative may be unearthing my old 'shaderzorz' experiment: that was an attempt at a compute-shader rasterizer that worked similarly to that of the DS. Such a renderer, implemented either in compute shaders or in fragment shaders, may be able to get around the shortcomings of the current OpenGL renderer, and maybe also support higher-resolution rendering without too much of a performance penalty. Back then, I ended up ditching it in favor of a classical polygon-based renderer because I wasn't positive about the performance. However, the current renderer is far from optimal too, due to how rendering has to be done.

So, we might offer such a renderer for platforms that support it, and keep the classical renderer for the remaining users, but we would have to accept that the latter will remain imperfect.

There are also the timing issues. The holy grail of DS emulation, I guess. This, like several of the other things I have in mind (hi pixel-perfection), is pretty much a high-effort low-reward item.

No hacking around timings will get us really far if we don't have the logic down. Thus, we would need to work out the CPU timings, how memory waitstates for code and data regions interact for each instruction, and so on. Only once we have the general logic down, could we try implementing a model efficiently, and then seeing whether such a model can work with estimated cache timings, or whether we need actual cache emulation. Until then, no real point attempting to emulate the ARM9 caches if our base timing logic is off.

Another thing that would definitely be good for user-friendliness would be making melonDS plug-and-play: basically, not requiring original BIOS and firmware.

As far as DS mode is concerned, it's possible to use DraStic's alternate BIOS (even though I would like to make my own eventually). Building a basic firmware image would also be doable, obviously it wouldn't come with the DS menu, but it would be enough to run games. We would just need to provide an interface for changing the firmware config.

DSi mode would definitely be trickier. It requires a NAND dump, and I haven't looked into how feasible it would be to craft a working basic NAND. We haven't even looked into DSi-mode direct boot yet, and that would definitely be a requirement.

Then there is the good ol' wifi quest. Not sure how far we can get there...


I want to be there for melonDS 1.0. That's going to be one big release :)

Maybe I originally wanted to write about more things? My brain is running out of fuel. Oh well.
Update on donations -- by Arisotura Wed, 13 Jan 2021 17:34:31 +0000
* Patreon for monthly donations
* Paypal for one-time donations

I will add a donation page with these links and extra info.

Also, while I'm at it, some notes on these donation links:

Since these go to accounts that are mine, I was worrying that some people would think I spend the donation money on drugs and hookers, or whatever else. That ain't the case.

The donations mainly go towards the hosting costs, and other expenses related to the melonDS project (acquiring oddball addons to emulate, dumping the DSi bootrom, ...).

The domain name costs about 20€/year. The server itself costs $5/month, plus extra bandwidth usage; the server plan might have to be upgraded in the future as melonDS gets more popular.

The current influx of donations is already covering that, so the extra money is there for the melonDS team members to use if they face any emergency. For example, back in 2018 I had to cover transition-related expenses while having no income and trying to survive, so the Patreon helped a bit with that, thank you folks. Currently things are going better for me, but we never know when one of us will find themselves facing precarity, especially within the current context.

So I hope this clears up any doubt over the donations.

And of course, I want to remind you that while donating is a nice gesture, it doesn't entitle you to anything in return, and conversely, you don't miss out on anything if you're not donating. You do as you wish.


I am also updating this site, mainly adding a dynamic page system to replace the old hardcoded pages (the howto/FAQ page has been ported to that). Let me know if anything is broken or missing.
Merry Christmas from the melonDS team! -- by Arisotura Fri, 25 Dec 2020 15:19:08 +0000
melonDS 0.9.1

There are several changes since melonDS 0.9.

First, you may notice that we removed the nonfunctional vsync option from the video settings dialog. Admittedly, that setting was functional in the 0.8 versions. However, with Qt and the new multi-context OpenGL rendering we do, implementing vsync will take a bit more effort, and we haven't figured it out yet.

However, Generic implemented a new framerate limiter, based on that of Citra. This should help a lot with frame pacing issues.

I removed the hardcoded debug hotkey which had been accidentally left in the 0.9 release (oops).

Speaking of which, we now have a proper fullscreen hotkey. People were trying to use F11 as a fullscreen hotkey before, which not only was not implemented, but was actually triggering the hardcoded debug hotkey, freezing melonDS for a while. Now you can actually use F11 (or any key of your liking) for fullscreen.

On the DSi side, it is now possible to run unlaunch'd NANDs in melonDS. It may not yet be possible to hack melonDS and install unlaunch on it, though. We also added preliminary camera support. For now it feeds a fixed stripe pattern, but at least the basics are there, so games do better than just crashing.

We also now have a Mac build, courtesy of WaluigiWare64. Speaking of builds, these release builds are pulled straight from our Github CI instead of being compiled on my computer. Let us know if there are any issues with them.
You can easily install melonDS and its dependencies on macOS by running:
brew install --cask melonds

On the subject of package managers, melonDS is now also available as a Flatpak package on Flathub, providing a simple, unified way to install melonDS on all Linux systems. First, install and set up the Flatpak package manager, then install melonDS by running this in a shell:
flatpak install flathub net.kuribo64.melonDS

And, as usual, we have a bunch of little fixes and tweaks, which you can discover in our changelog or in the Github commit list.


melonDS 0.9.1, Windows 64-bit
melonDS 0.9.1, Linux 64-bit
melonDS 0.9.1, Linux ARM64
melonDS 0.9.1, MacOS 64-bit

If you're feeling generous: here's our Patreon
Back in business -- by Arisotura Thu, 10 Dec 2020 22:22:46 +0000
For example: no graphics in the third flying level in Power Rangers - Super Legends.


Yeah, it's not very playable like this, even moreso as this is a shoot-em-up level.

When I looked at this, I saw Generic was already on it. He figured out that, when entering the level, the horizontal offset for BG0 was not being reset, and its previous value was 256, which caused BG0 (and thus the 3D graphics) to be pushed offscreen.

Moreover, NO$GBA and DeSmuME both suffered from the same issue, which meant that once again we were stepping into uncharted territory. Exciting!

So I set to work. I logged what the game was doing and made a disassembly. There were two main possibilities there: either the game was misbehaving due to some emulation issue, or it was working as intended but relied on unknown hardware behavior. The second hypothesis seemed more likely, seeing as when entering the glitched level, the game took care to clear VRAM and set up 2D layers and all that. It didn't seem to be a timing issue either, as those are typically affected when tweaking melonDS's cache timing constants, but tweaking these made no change there.

So, looking at my logs, I made several tests on hardware. I figured something was resetting the BG0 scroll position, but couldn't manage to reproduce that. So, seeing as it was 5:00, I went to bed, like normal people do.

The next day, Generic gave me a test ROM that reproduced the issue at hand. It was one of the basic libnds 3D examples, except it was modified to write a value to BG0HOFS at the very beginning of main(), before doing anything else. Sure enough, the example cube found itself scrolled on melonDS, but not on hardware. Well, looks like after hours of banging our heads at that issue, he'd figured out something.

Going from this, I was able to pinpoint the exact conditions that caused the BG0 scroll position to be reset. In particular, in the glInit() function:

powerOn(POWER_3D_CORE | POWER_MATRIX);    // enable 3D core & geometry engine

POWCNT1. I'd suspected it, I'd tried messing with it, but didn't find anything, because the hardware behavior was not what I'd expected.

Basically, POWCNT1 is the ARM9-side power control register. It lets you enable or disable the 2D renderers, the two components of the 3D renderer (the geometry engine and the rendering engine), the screens, and also swap the screens. Fun shit.

Further experiments showed us that, unlike what we thought, horizontal scrolling for the 3D layer isn't done by the 2D engine, but by the 3D rendering engine, right before the 3D layer is passed to the 2D engine. This was further confirmed by Hydr8gon, who observed that when capturing the 3D layer alone, it also has the scrolling applied.

So how does this work? The 3D rendering engine has its own scroll position register, which is updated when writing to BG0HOFS. Except when the 3D rendering engine is disabled, in which case its register stays untouched. There we go.
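As a rough illustration of that behavior, here is a minimal C++ sketch. It is not melonDS's actual code: `Scroll3DModel` and its members are made-up names, and the enable flag stands in for whatever POWCNT1 bit controls the 3D rendering engine.

```cpp
#include <cstdint>

// Hypothetical model of the behavior described above; all names are invented.
struct Scroll3DModel {
    bool renderEngineEnabled = false;  // stand-in for the POWCNT1 enable bit
    uint16_t engine2DScrollX = 0;      // BG0 scroll as seen by the 2D engine
    uint16_t renderEngineScrollX = 0;  // the 3D rendering engine's own copy

    // A write to BG0HOFS always reaches the 2D engine, but only reaches
    // the 3D rendering engine's register while that engine is powered on.
    void writeBG0HOFS(uint16_t value) {
        engine2DScrollX = value;
        if (renderEngineEnabled) {
            renderEngineScrollX = value;
        }
    }
};
```

With this model, a game that writes 256 to BG0HOFS while the 3D rendering engine is off, then turns it on later, still renders its 3D layer with the old scroll value held by the 3D engine.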

Sure enough, Power Rangers doesn't turn on the 3D renderer until it actually has to render 3D graphics, that is, in our glitched level. Hence, the scroll position given to BG0 in the previous 2D-only levels would never end up in the 3D rendering engine's scroll register.

Sure enough, implementing that behavior in melonDS did the trick.

Did we tell you the DS has quirky hardware?
A tour through melonDS's JIT recompiler Part 1 -- by Generic aka RSDuck Sat, 05 Dec 2020 21:56:31 +0000
The heart of almost every emulator is the CPU emulation. The Nintendo DS has two ARM cores: the ARM7 inherited from the GBA, and an ARM9 core which is the main processor. Fortunately, for us the main differences between the two are a few extra instructions and the memory integrated into the ARM9 (DTCM, ITCM and cache; the latter deserves its own article, btw). Otherwise it's just a faster processor.

The most straightforward way to emulate a processor is an interpreter, i.e. replicating its function step by step. So first the current instruction is fetched, then it's decoded to determine which handler is the appropriate one to execute it, which then is invoked. Then the program counter is increased and the cycle starts again.
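To illustrate the loop, here is a toy C++ sketch. The instruction set (`AddImm`, `Halt`) and all names are invented for the example; a real ARM interpreter decodes 32-bit or 16-bit opcodes and tracks far more state.

```cpp
#include <cstdint>
#include <vector>

// Made-up two-instruction machine, only meant to show the loop structure.
enum class Op : uint8_t { AddImm, Halt };
struct Instr { Op op; int32_t imm; };
struct ToyCpu { uint32_t pc = 0; int32_t acc = 0; bool halted = false; };

void handleAddImm(ToyCpu& cpu, const Instr& i) { cpu.acc += i.imm; }
void handleHalt(ToyCpu& cpu, const Instr&) { cpu.halted = true; }

void runInterpreter(ToyCpu& cpu, const std::vector<Instr>& program) {
    while (!cpu.halted && cpu.pc < program.size()) {
        const Instr& instr = program[cpu.pc];  // fetch
        switch (instr.op) {                    // decode, then invoke the handler
            case Op::AddImm: handleAddImm(cpu, instr); break;
            case Op::Halt:   handleHalt(cpu, instr);   break;
        }
        cpu.pc++;                              // advance the program counter
    }
}
```

Note that the fetch and decode steps are repeated every time an instruction runs, even if it was already executed a thousand times before.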

This approach has the advantage that it's relatively easy to implement while allowing for very accurate emulation, of course only if you take everything into account (instruction behaviour, timing, …), but has the major disadvantage that it's pretty slow. For every emulated instruction, quite a lot of native instructions have to be executed.

One way to improve this is what Dolphin and Mupen call a "cached interpreter". The idea is to take a few instructions at a time (a block) when they're first executed and save the decoding for them. Next time this block is executed we just need to follow this list of saved handlers. Viewing multiple instructions at once has other advantages as well, e.g. we can analyse the block and detect idle loops to skip them.
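A minimal sketch of the idea in C++, under heavy assumptions: the guest encoding is made up (even opcodes add, odd opcodes multiply), the block length is passed in by hand, and a `std::unordered_map` stands in for a real block cache. It is not how Dolphin, Mupen or melonDS implement this.

```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

struct MiniCpu { uint32_t pc = 0; int32_t acc = 0; };
using Handler = void (*)(MiniCpu&, int32_t);
struct DecodedOp { Handler fn; int32_t operand; };
struct RawInstr { uint8_t opcode; int32_t operand; };  // invented encoding

void opAdd(MiniCpu& c, int32_t v) { c.acc += v; }
void opMul(MiniCpu& c, int32_t v) { c.acc *= v; }

struct CachedInterpreter {
    std::unordered_map<uint32_t, std::vector<DecodedOp>> blocks;
    int decodeCount = 0;  // counts how often the decoder actually runs

    // Decode a block once, then reuse the saved handler list.
    const std::vector<DecodedOp>& lookupOrDecode(
            uint32_t startPc, const std::vector<RawInstr>& mem, uint32_t length) {
        auto it = blocks.find(startPc);
        if (it != blocks.end()) return it->second;
        std::vector<DecodedOp> decoded;
        for (uint32_t i = 0; i < length; i++) {
            const RawInstr& raw = mem[startPc + i];
            Handler fn = (raw.opcode & 1) ? &opMul : &opAdd;
            decoded.push_back(DecodedOp{fn, raw.operand});
            decodeCount++;
        }
        return blocks.emplace(startPc, std::move(decoded)).first->second;
    }

    void runBlock(MiniCpu& cpu, const std::vector<RawInstr>& mem, uint32_t length) {
        const uint32_t start = cpu.pc;
        for (const DecodedOp& op : lookupOrDecode(start, mem, length)) {
            op.fn(cpu, op.operand);
        }
        cpu.pc = start + length;
    }
};
```

Running the same block twice only decodes it once; the second run just walks the saved list of handlers.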

But even the cached interpreter is still comparatively inefficient. What if we could generate a function at runtime which does the equivalent job of a block of emulated instructions and save it, so next time this block of instructions has to be executed we only need to call this function? With this method we could completely bypass branching out to the handlers which implement the respective instructions, because essentially everything is inlined. Other optimisations become possible: we can keep emulated registers in native registers, or we can completely eliminate the computation of values which aren't used, and that's merely the beginning. That's where the speed of JIT recompilers comes from.

Before we can start recompiling instructions we first need to clear up on blocks of instructions. There are two main questions here:

  • where does a block begin and where does it end?

  • how are blocks saved/looked up?

Note that most of this applies for cached interpreters as well.

First we say a block can only be entered via the first instruction and left via the last one. This makes the code generation significantly easier for us, and also makes the generated code more efficient. So it's not possible to jump into a block halfway in; instead we would create another block which starts at that point. This has one problem: with the interpreter we can leave or resume execution at another point after every instruction, e.g. when an interrupt occurred or the timeslot of the CPU is over, while a JIT block has to be executed until the end. For this reason the maximum block size is adjustable in desmume (and some games require setting it below a certain value), which is the case for melonDS as well, though we have some more hacks and haven't heard of a game breaking at too high block sizes yet ;). The last thing to consider is that we can't just take the next n instructions from the first one and compile them into a block. We need to keep in mind that branch instructions can bring the pc to any other place, including somewhere inside this block, and can also split the execution into two paths if they're conditional. While this all could be handled to generate even more efficient code (we do this to some degree, more on that later), for now we leave this out. So after a branch instruction we end a block.

The pivot of the second question is the block cache. melonDS's block cache has gone through a few iterations, though originally I just copied desmume's, which is the one I'm going to describe here; we'll get fancier in the future. The way the generated code is stored might sound crude, but it's simply a large buffer (32 MB) which we fill from bottom to top; once it's full we reset everything. That works surprisingly well, as it fits the code of most games, and we still do it like this. Now we need to associate the entry point of a block inside that buffer with the pc in the emulated system where that block starts. Since big parts of the address space are unused, it would be unwise to have a big buffer with a pointer for every possible address (that would also take 32 GB on a 64-bit system). A hash table would be an option, but lookup can be relatively slow with those. Instead we add one layer of indirection. There is a first array of pointers which divides the address space into regions of 16 KB or so. Each of those pointers points into other arrays for all the memory banks which exist, which then point to the entry point of each JIT block function. We also only need to store a pointer for every second address, as ARM (4 byte) and Thumb (2 byte) instructions are always aligned to their respective sizes.
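To make the two-level lookup more concrete, here is a minimal C++ sketch. It is not melonDS's or desmume's actual code: `BlockLookup`, `JitBlockEntry` and `exampleBlock` are made-up names, the region size is fixed at exactly 16 KB, and the second-level arrays are allocated lazily on first insert rather than laid out per memory bank.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

using JitBlockEntry = void (*)();  // entry point of a compiled block

inline void exampleBlock() {}      // placeholder standing in for generated code

class BlockLookup {
public:
    static constexpr uint32_t kRegionBits = 14;                   // 16 KB regions
    static constexpr uint32_t kRegionSize = 1u << kRegionBits;
    static constexpr uint32_t kSlotsPerRegion = kRegionSize / 2;  // one slot per 2 bytes

    // First level: one (initially null) pointer per 16 KB of the 32-bit space.
    BlockLookup() : firstLevel_(1u << (32 - kRegionBits)) {}

    JitBlockEntry lookup(uint32_t pc) const {
        const auto& region = firstLevel_[pc >> kRegionBits];
        if (!region) return nullptr;  // nothing compiled in this region
        return region[slotIndex(pc)];
    }

    void insert(uint32_t pc, JitBlockEntry entry) {
        auto& region = firstLevel_[pc >> kRegionBits];
        // Second level is created on demand; new slots start out as nullptr.
        if (!region) region = std::make_unique<JitBlockEntry[]>(kSlotsPerRegion);
        region[slotIndex(pc)] = entry;
    }

private:
    // Instructions are at least 2-byte aligned, so bit 0 carries no information.
    static uint32_t slotIndex(uint32_t pc) { return (pc & (kRegionSize - 1)) >> 1; }

    std::vector<std::unique_ptr<JitBlockEntry[]>> firstLevel_;
};
```

A lookup is then two array indexings and a null check, which is the property that makes this attractive compared to a hash table.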

Now, instead of the usual interpreter loop described before, we look up whether there's a JIT block at the current execution address. If yes we execute it, otherwise we compile a new one starting at that address and insert it into the block cache for that address.

That concludes the first part of this series. Next time we'll look into the recompilation process itself!
melonDS - now also for macOS! -- by WaluigiWare64 Sun, 29 Nov 2020 17:32:59 +0000
melonDS now also supports macOS!

If you want to test it, scroll down to the bottom of the post. I'll be explaining what needed to be changed for it to work.

This originally started as a little challenge. "It shouldn't be that hard," I thought. However, it wasn't as easy as I would have hoped, but I got there in the end.

- The JIT recompiler

Thanks Generic (aka RSDuck) for helping me out a lot here and guiding me!


It mapped memory using "memfd_create()" on Linux, which didn't exist on macOS. Instead, on macOS shm_open is used to create the fastmem memory.
macOS also didn't have "->gregs" in "uc_mcontext" and no "REG_RIP" either. This has to be changed to "->__ss.__rip" instead.
Then, it would crash with a "bus error" on attempting to load. This was caused because macOS returned "bus error" instead of "segmentation fault", so the signal handler couldn't handle it.
Note: fastmem was disabled because it caused all sorts of errors while trying to boot firmware or run games. If anyone manages to fix it, send a pull request!

The JIT itself

The JIT would build, but at link time it would complain about "ARM_Dispatch" and "ARM_Ret" being undefined. Apparently in the Mach-O format (used in macOS) global function names defined in assembly are required to be prepended by an underscore.
Then it would crash upon booting firmware or trying to load a game. This was caused by the line here which tried to reprotect some memory to make it executable. On macOS, new memory is now mmap'ed instead.

- The OpenGL renderer

macOS complained about not being able to find "GL/gl.h" and "GL/glext.h". These includes had to be changed to "OpenGL/gl3.h" and "OpenGL/gl3ext.h" on macOS and the OpenGL framework was linked.
Also the functions defined by the OpenGL macronator already existed on macOS, which caused "ambiguous reference" errors. The macronator was ifndef'd out.

- Direct Mode

Direct Mode used "AF_PACKET" to get the MAC address, which doesn't exist on macOS. "AF_LINK" was used instead.
The library names of libpcap had to be changed to "libpcap.A.dylib" and "libpcap.dylib" on macOS.

- Binding Keys

This was a simple fix. For some reason, macOS didn't give focus to the buttons in the key binding menu when they were pressed, which meant that they couldn't detect keys. I had to set the focus policy to Qt::StrongFocus to get them to accept focus.

- App Bundle

Now it built fine and it worked, but it came as a Unix executable, not a macOS app bundle. I had to add some lines in CMakeLists.txt to make it build an app bundle.
I also generated a macOS ".icns" icon file for melonDS, so now the icon showed up on the app bundle.

- No libslirp available in Homebrew

Homebrew (the package manager) didn't have libslirp in their repositories, so I created a pull request here which was merged.

Here are the downloads. If you find any issues, make sure you comment here and tell me so I can fix it!
You will have to install the appropriate libraries beforehand with the Homebrew Package Manager.
In Terminal paste the following command to install the required libraries.
brew install qt5 sdl2 libslirp

melonDS 0.9 beta for macOS x86_64
To unzip the above download, you may need to use a program like The Unarchiver.
A lil' message to would-be translators -- by Arisotura Thu, 12 Nov 2020 12:11:09 +0000
I am wary about internationalizing software before the end of the dev cycle. That being said, is there really an 'end of dev cycle' for an emulator project? I think it'd be a good idea to make melonDS accessible to languages that aren't English. I have a couple concerns about this though:

- I'd like translators to stick around. If they can be around to fix up their respective translations before each release, that will be great. I just really want to avoid having translations become incomplete and/or obsolete because their author is long gone.

- I want to ensure the translators are good at English and understand the terminology used in melonDS's UI. Just basic quality assurance, no Google Translate crap.

- What about this website? It's a whole different can of worms. The interface could be translated, but having to translate each and every blog post would be a massive pain in the ass.

There are also a bunch of technical concerns, but, overall, maybe we can try and pull this off for melonDS 1.0, or even earlier?

If you're in, check out this thread.

Thank you!
Changes to the website -- by Arisotura Sat, 31 Oct 2020 13:07:11 +0000
Of course, comments are also still open to guests.

There are more updates planned to this site, so, let me know asap if anything breaks.
The DSi camera adventure -- by Arisotura Tue, 27 Oct 2020 13:34:06 +0000
Well, it's not been that easy.

I had started work in the dsi_camera branch, but so far it was a large trainwreck. I couldn't really understand how camera transfers work and how everything interacts together. My attempt at a guessed implementation was getting nowhere, which meant it was time for some hardware research.

So I started work on a DSi camera test homebrew. I first went and implemented the initialization procedure found in GBAtek, only to be rewarded with a hang when trying to activate a camera. I tried many things, taking the init procedure from some open-source Aptina MT9V113 driver (the model of camera the DSi uses), reverse-engineering the DSi camera app to use its exact init procedure, all to no avail.

I felt stuck there. I even tried looking for existing examples using the DSi cameras, found this one by Epicpkmn11, but at the time it seemed to have the same issue I was having.

I eventually went out and asked for help in several places. A side effect is that I'm now found in some Discord servers. I also posted a thread at nesdev, knowing nocash hangs around there. The documentation in GBAtek implied he did get the cameras working, so I figured he'd be able to help. And he did, thanks there.

I first looked at the code he provided, checking for any meaningful differences in the init procedure, but it looked like I had all the essential stuff right. I was stumped.

It eventually occurred to me that maybe I should try initializing both cameras simultaneously, like Nintendo does, rather than only initializing one camera. You know how it is, when you're desperate, anything can look like a valid solution. Anyway, that didn't cut it, but it revealed something interesting when I tried to read some registers from both cameras. Some reads were getting corrupted. So I knew something was up with the I2C code.

Looking at nocash's I2C code, I was able to spot and fix the issue. Turns out that during an I2C read, you don't raise an ack when reading the last byte. This fixed the corruption I was observing, and finally allowed the camera to activate successfully. At the same time, Epicpkmn11 happened to be in the same Discord server I was in, so they could fix their code too (turns out it did have the same issue as mine).
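The ack rule can be sketched like this in C++. This is an illustration against a fake bus, not the DSi's I2C register interface or nocash's code: `FakeI2CBus`, `readByte` and `i2cRead` are hypothetical names made up for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Invented stand-in for an I2C device; it just records what the master did.
struct FakeI2CBus {
    std::vector<uint8_t> data;  // bytes the device will send
    std::vector<bool> ackLog;   // whether the master acked after each byte
    size_t pos = 0;

    uint8_t readByte(bool ack) {
        ackLog.push_back(ack);
        return pos < data.size() ? data[pos++] : 0xFF;
    }
};

// Read `count` bytes: ack every byte except the last one, which tells the
// device that the transfer is over.
std::vector<uint8_t> i2cRead(FakeI2CBus& bus, size_t count) {
    std::vector<uint8_t> out;
    for (size_t i = 0; i < count; i++) {
        const bool isLast = (i + 1 == count);
        out.push_back(bus.readByte(!isLast));
    }
    return out;
}
```

Acking the last byte as well leaves the device thinking more data is wanted, which is one plausible way to end up with the kind of corrupted follow-up reads described above.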

Next thing I did was enable a camera transfer, and, lo and behold, I was able to display camera input on my DSi. From this, I tested the camera transfer hardware in several ways, to try figuring out how it works.

Details are still hazy, but this time I was able to make a working implementation in melonDS.

I made the data register return a fixed value, hence the red/blue stripes. But now that we have a working base, next step is feeding an image buffer into this, and ironing out the remaining issues (for example, taking a picture causes a system error).

But after that, I felt like chilling some, and figured I would try finding out why ZXDS was running abysmally slow in melonDS. Basically, ZXDS is a ZX Spectrum emulator for the DS. I'm not into Spectrum emulation, but from what I could read, the emulator is quite impressive technically, and I enjoyed reading the author's developer diary.

Anyway, ZXDS abuses the DS's writable VCount to limit the framerate to 50FPS. To quote its author:

Having this common master frequency, it was now simple to use it for converting T cycles passed to amount of samples to generate, as well as to use it to set up a timer which would count the 50Hz which would drive the LCD, and still keep everything in perfect sync. The refresh rate of the LCD itself can't be set directly, though, however the DS features a writable VCOUNT register which can be used to delay the start of the next frame as needed. It is normally intended to be used to synchronize display of machines participating in multiplayer games, but it was trivial to abuse it for holding the retrace after each frame displayed until the interrupt handler driven by the 50Hz timer allowed it to go, effectively slowing the LCD refresh rate to 50Hz as well.

Technically, ZXDS uses two cascading timers to achieve this. Timer 2 is set to an interval of 16 cycles, and drives timer 3 which is set so that its IRQ will fire every 20ms. ZXDS will then hold VCount at a certain value until the IRQ fires. All fine and dandy.

Of course, as far as melonDS is concerned, this is where the problem was lying. ZXDS would hold VCount for way too long, causing each frame to last absurdly long. Sure enough, the issue came from the timers. What, a timer issue in melonDS in 2020? Madness!

In particular, it came from timer 2 and its tight interval. melonDS updates its timers more loosely than the actual DS, but it assumed a timer would only overflow once after each update. What happened here was that updates were far enough apart that timer 2 had the time to overflow more than once. The assumption that it would only overflow once caused it to start behaving wrong, and that is why the timer 3 IRQs were so far apart.

So I did something similar to audio timers, which are updated every 1024 cycles (and do take into account the fact that they could overflow several times in one update). This fixed the issue, ZXDS now runs at the expected 50FPS.
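Here is a small C++ sketch of a timer update that allows for several overflows per update. It is an illustration of the idea, not melonDS's timer code: `LazyTimer` and `advance` are made-up names, and prescalers and IRQ delivery are left out.

```cpp
#include <cstdint>

// Hypothetical 16-bit up-counting timer that is only updated once in a while.
struct LazyTimer {
    uint32_t reload = 0;   // value loaded into the counter on overflow (0..0xFFFF)
    uint32_t counter = 0;  // current counter value (0..0xFFFF)

    // Advance by `ticks` and return how many times the timer overflowed.
    uint32_t advance(uint32_t ticks) {
        const uint32_t period = 0x10000 - reload;       // ticks between overflows
        const uint32_t untilFirst = 0x10000 - counter;  // ticks to the next overflow
        if (ticks < untilFirst) {
            counter += ticks;
            return 0;
        }
        ticks -= untilFirst;
        // One overflow for reaching 0x10000, plus one per full period after that.
        const uint32_t overflows = 1 + ticks / period;
        counter = reload + ticks % period;
        return overflows;
    }
};
```

With a period of 16 ticks and an update covering 64 ticks, `advance` reports 4 overflows rather than 1. In a cascade, that return value is what would be fed into the next timer as its tick count.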

Sorry for the silence lately -- by Arisotura Sun, 11 Oct 2020 15:26:20 +0000
Anyway, what can we attempt doing, at this point?

Besides dealing with the pull requests and issue reports?

One thing I was working on lately was DSi camera support, but I didn't get too far. I'm going to need hardware tests to figure out how the camera hardware works. Considering the lengthy initialization procedure for those, it's not quite something I look forward to. So I'll post more about this when I get further into it.

I have ideas for the OpenGL renderer, namely, a better method for rendering quads. It would need more work for an implementation though, but might be worth it.

But, one of my main concerns is about wifi, especially local multiplayer.

At this point, melonDS is mainly known as 'the wifi emulator'. It's a bit sad that, 3 years after we got it working, we're still telling people to disable their framerate limiter and pray. We can probably do better.

It's not like we haven't tried, though. You might have seen that branch named 'betterer_wifi' in the repo. I was hoping to run the wifi with more stable timing, but it was a trainwreck, it performed even worse than our current method.

The main issue with local multiplayer is that it requires tight synchronization to function. You might remember how finicky it was back in the old days, you would start lagging and disconnecting as soon as your friend was more than 10m away from you. Long story short, the protocol works by having the host repeatedly poll its clients, multiple times per frame, and each client is given a narrow window to respond (the time given is barely greater than what it takes to transfer the response frame).

On melonDS, things are even worse as it's difficult to tell whether wifi issues arise from bad emulation of the wifi hardware, or from transmit errors, or both. We use BSD sockets as a means of transmitting frames, which inherently adds some lag. The way melonDS runs is also problematic in that it's just running as fast as possible, which can result in the wifi system running faster than it should. The throttling mechanisms, be it audio sync or framerate limiter, only kick in every once in a while, so they only make things worse at the scale of wifi operation.

So, for local multiplayer to function correctly, we would need to overhaul it. Basically, synchronize things tightly based on multiplayer frame exchanges. How to do so without ruining performance, good question.

One possibility I thought of would be running multiple DSes inside one melonDS instance, akin to NO$GBA. However this would require quite some refactoring, as the melonDS codebase was built around the assumption that it would only ever emulate one system per instance.

The other possibility is, well, reworking how we do the whole IPC. Figuring out a fast way to do IPC (we're talking about microsecond-order timings). Synchronizing melonDS instances tightly. And so on.

Welp. Time will tell how this goes, I guess.