DS emulator, sorta

melonDS aims at providing fast and accurate Nintendo DS emulation. While it is still a work in progress, it has a pretty solid set of features:

• Nearly complete core (CPU, video, audio, ...)
• RTC, microphone, lid close/open
• Joystick support
• Savestates
• Various display position/sizing/rotation modes
• (WIP) Wifi: local multiplayer, online connectivity
• and more are planned!

Download melonDS

If you're running into trouble: Howto/FAQ

If you're feeling generous: melonDS Patreon
Doing the time warp

This looks a lot like another screenshot, from two years ago:

So why am I posting this now? Well the answer is simple, we're going back in time and preparing a new melonDS release that is roughly equivalent to 0.1.


Joke aside, there are some key differences between those screenshots:

* newer one has proper clipping at screen edges
* both lack Z-buffering, but newer one is different, likely because the older one didn't have Y-sorting
* newer one is using OpenGL

So, yeah, that's the long-awaited OpenGL renderer. Like the good ol' software renderer in its time, it's taking its baby steps, and barely beginning to render something, but I'm working on it, so in a while it will become awesome :P

This renderer will aim for reasonable accuracy. As a result, it will require a pretty recent OpenGL version, and compatible hardware. It's set to OpenGL 4.3 currently, but I will adjust the minimum requirement once the renderer is finished.

If needed, I can provide alternate versions of the renderer for lower-end hardware supporting older OpenGL versions, but they will be less accurate. While the software renderer is the 'gold standard', the current OpenGL renderer is a sort of 'minimum standard' to get most games to render correctly. Any lower-spec renderer may render certain games wrong and that will be unlikely to get fixed (or it would be fixed but at the cost of killing performance).


Speaking of the software renderer, I also felt like doing a bit more research towards the holy grail: pixel perfection. I'm not done yet, but I finally have those pesky polygon edge slope functions down. Someday we will get those aging cart tests to pass ;)

But that will be for later. I may also write a post about all the juicy low-level hardware details.


But, back to OpenGL. Might as well explain why the planning phase for this renderer took so long. Although you guess that it's 50% the DS GPU being a pile of quirks and 50% me being a lazy fuck.

The first experiments were made with a compute shader based rasterizer. That way, I could get it perfect, while supporting graphical enhancements. I ended up ditching this solution because the performance wasn't good.

So, back to more standard rendering methods, aka pushing triangles. We won't get to rasterize quads correctly that way, but in most cases, the difference shouldn't matter.

First thing to do is to devise an efficient way to push triangles. This requires straying away from standard rendering methods, especially in how we do texturing and all.

On the DS, a game can choose to change the current texture at any time. Polygon attributes can only be changed before a BEGIN_VTXS command, but that doesn't make it any better. Polygons are sorted by their Y coordinates before rendering, which can completely change their ordering. Basically, there is no guarantee that polygons will be grouped by polygon/texture attributes, and the ordering after Y-sorting must be preserved or you might break things like UIs that rely on it.

This is shitty for our purposes though. If, for the DS, changing polygon/texture attributes is mostly free, you can't say as much about OpenGL (or any desktop graphics API for that matter). You would end up with one draw call per polygon, which isn't really a good thing.

Another thing worth considering is that our window for 3D rendering is not a full frame (16.667ms). On the DS, 3D rendering starts at scanline 215 (or 214?). Rendering any sooner would be a bad idea as the game might still be updating texture VRAM. But, we need 3D graphics as soon as scanline 0 of the next frame, which leaves us only 48 scanlines worth of time to do the rendering.

The software renderer is able to work around this limitation by using threading and per-scanline rendering (pretty much like the real thing, except that one seems to render two scanlines at once), which extends the rendering time frame to 192 scanlines.

OpenGL does not render per-scanline, though. So we can forget about this. However, a possibility would be splitting the frame in four 256x48 chunks. I will study that possibility if performance is an issue -- would have to see how far the extended rendering timeframe can outweigh the extra draw calls. Maybe propose the two rendering methods as options.

Back to pushing triangles, for now. I devised a way to pass polygon/texture attributes to the fragment shader, and render all the polygons in one draw call. Nice and dandy, but we're not out of trouble. This will imply passing the raw DS VRAM to the fragment shader and having it handle all the details of texturing, akin to the TextureLookup() function in the software renderer. No idea about the performance implications of this.

Also, we will have to think of something for shadow polygons, I don't think we can use the regular stencil buffer with this.

Well. I hope this renderer will be compatible with OpenGL ES, with all the tricks it may be pulling, but... we'll see.
melonDS 0.7.4
Arisotura finally stopped being lazy, among other things, so here we go, we present you melonDS 0.7.4.

This release is hardly going to be revolutionary though, but I had to get it out before starting work on the hardware renderer.

The highlight of this release is the upgrade to the online wifi feature. Two main points there:

1. Under direct mode, you can finally choose which network adapter will be used by libpcap. The wifi settings dialog also shows their MAC and IP as to help you ensure you're picking the right adapter.

2. Indirect mode, which, as mentioned before, works without libpcap and on any connection. However, it's in alpha stages and it's a gross hack in general, so you can feel free to try it out, but don't expect miracles.

As usual, direct mode requires libpcap and an ethernet connection, and that you run melonDS with admin/superuser privileges.

Other than that, there are a few little fixes to the SDL audio code. melonDS now only emits sound when running something, so it shouldn't emit obnoxious noises when starting anymore. Also, it no longer crashes if you use WAV microphone input with a file that has multiple channels.

And other little things.

Now, full steam ahead towards 0.8! Or not quite. I also need to finish the redesign for this site, among other things.


melonDS 0.7.4, Windows 64-bit
melonDS 0.7.4, Linux 64-bit

melonDS Patreon if you're feeling generous
Immediate plans
Indirect mode is still flaky, but it's good enough for a beta release. I only have to polish a couple details and it'll be good for a 0.7.4 release, so, expect that release soon.

I also wanted to fix one of the issues with local multiplayer: when more than two players are involved, clients receive replies sent by other clients, which they shouldn't receive, and this likely contributes to it shitting itself big time.

But alas, this will be more complicated than anticipated. This is also the main reason why local multiplayer pretty much stagnated after melonDS gained its "wifi emulator" reputation back in 2017: emulating local MP is a large pile of issues that are all interconnected. Wifi emulation in melonDS is more or less a pile of hacks, and it works, but it's full of issues. Some stem from incomplete understanding of the DS wifi hardware (after 15 years. welp), but most of them are timing issues.

(which are also why I'm pretty much pessimistic about ever connecting melonDS to a real DS)

Local multiplayer 'a la Nintendo' works on a host-client scheme detailed here. Long story short, the host polls the clients at a regular, small interval (for example, every 4ms for Pictochat). Dealing with the sheer amount of packets transferred is in itself a challenge.

But that's not all. When the host sends its packet, each client is given a window within which it should send its response. Miss your window and it's considered a failure.

This works well with actual DSes because they're all running at the same speed, so the timings are reliable.

With melonDS, it's another story. We get lag inherent to the platform on which we're running: the network stack, thread scheduling, etc... Running multiple DSes in one melonDS instance might help alleviate these lag sources, but it wouldn't be a perfect solution either (it would likely be running the DSes on separate threads).

We also learned by experimentation that the framerate limiter is a problem, and connections worked better when disabling it. As silly as that sounds, it makes sense when looking closer. When disabling the framerate limiter (or when running below 60FPS), the melonDS instances run as fast as possible, and they may end up running at roughly the same speed, consistently, which makes for a better connection (less chance that MP replies miss their window). However, when enabling it, your melonDS instances may be running at any speed above 60FPS, but they will be spending several milliseconds per frame doing nothing in order to bring the framerate back to 60FPS.

Which, you guess, is bad bad bad for MP communications. The wifi system is driven by the emulator's scheduler, so it will end up running faster than it should, squishing MP reply windows and making it way more likely that replies get dropped.

And indeed, disabling the framerate limiter greatly reduces the amount of MP replies missing their window, even if the framerates are barely above 60FPS. The framerate limiter might be a bit zealous there.

However, the emulator instances might run at different framerates and possibly desync. And that's not too convenient if they end up running at absurdly high framerates.


To get anywhere with this local multiplayer shito, we'll need a redesign. No amount of hacky solutions will get us anywhere with this pile of hacks.

First part is how the wifi system is driven.

melonDS runs ~560190 cycles per frame, which is 33611400Hz under ideal circumstances, close enough to the DS clock frequency.

Wifi is updated every microsecond. The handler is called every 33 cycles, which means that one emulated microsecond actually lasts ~0.9818 microseconds. I'll spare you the nerdy calculations about how much that represents in offset, because so far that hasn't prevented it from working.

But on the DS, the wifi system is driven by a 22MHz clock, independently from the system clock. So driving melonDS's wifi system independently from the scheduler would not only be accurate, but also isolate it from the core's variable execution speed. However, two main issues arise from this:

1. We need to keep the core somewhat in sync with the wifi system, or the game would eventually shit itself. How to synchronize and when to do so? That's the question.

2. The current wifi system needs to be updated per microsecond. If we put it on a separate thread, how to take care of this without pegging the host CPU and killing performance? We could design a scheduler similar to the core one. However the wifi system has a readable microsecond counter, and we need to take care of that somehow. Without per-microsecond updates, we can only approximate it and hope it will be good enough.

To further prove my point:


We'll think of all that later. Past 0.7.4, we're going to focus on the hardware renderer.
Getting somewhere, finally

Finally, ClIRC is cooperating!

This took a bit of hackery to get around the limitations we face when working with plain old BSD sockets. Namely, how we're handling TCP acks for now.

Regardless, it's finally working without exploding! Well, I don't know about things like altWFC. But ClIRC is working about as well as when using direct mode.

We'll be polishing this and testing it with things like altWFC. You can expect a release real soon.

(also, a side effect of using sockets is that closing melonDS terminates connections correctly, which is nice for testing)
Change in plans
Indirect mode was already down to a level where it was dissecting TCP/UDP headers and keeping track of TCP sockets and all.

So I figured I would just rewrite it to use regular sockets instead of libpcap. There will be a lot less issues getting things to work this way, and this has the significant advantage that it doesn't require libpcap, and likely can work without requiring admin privileges.

The downside is that we have less control this way, as all the low-level TCP/IP details are handled by the host OS, so some things might break, but I'm confident that this will work just fine for the most typical uses, which involve standard TCP connections.

Anyway, I have DHCP, ARP and DNS covered. I started work on TCP, it's now able to initiate a connection to a server. Now I need to get the actual data exchange working.

I'm also unsure this can work for any use case where the DS is used as a server, but for now we don't have to care about that. And there's always direct mode.
Uninspired title is uninspired. Anyway, two things for this post.

1. Coming-out, sorta.

I am, well... nonbinary, likely some flavor of girl.

I'm a bit sick of the constant assumption that internet users are male by default, or that 'there are no girls on the internet'. Other genders, trans or not, are here too.

If you don't know, you can either ask or use gender-neutral prounous, it's still better than perpetrating the 'masculine by default' norm.

2. Network shito

The issue I mentioned with the DS receiving packets addressed to the host, also happens the other way around, ie the host receives traffic addressed to the DS. This might be the cause for the lag I observed.

Still thinking of a workaround to this. May need to read about network routing and all.

That's all folks!
Indirect mode progress
After a while of dealing with the silly issues like failing to properly compute checksums, indirect mode is finally showing some promising results.

The idea is that outgoing network traffic is altered to make it look like it's coming from the host machine, so that it will get past wifi access points without trouble. In return, incoming traffic is addressed to the host machine, so I first took the easy path of redirecting all traffic to the emulated DS regardless.

Bad idea tho.

This caused all connections on the host machine to die constantly. Reason is that on the DS, the sgIP stack is receiving TCP traffic it did not initiate (because those are destined to the host), and interfering with these connections, likely just trying to kill what it perceives as bogus connections.

To work around this, we have to keep track of TCP sockets. Examining the flags in the TCP header, we know when a connection begins, and can know to only redirect incoming traffic if the source IP and source/destination ports match an existing socket.

Basically, NAT.

This gave promising results with ClIRC, but for some reason there's quite some lag on messages sent from ClIRC, which I don't remember being that laggy.

The current implementation will be limited to TCP and UDP, but for uses like AltWFC, this should be enough. For more advanced shito, there's always direct mode, which I think can be made to work on a wifi connection with some hackery.
Full steam ahead!
You might have noticed the beginning of another coding streak. Is an awesome feature being coded? You'd be right.

One of the main reasons why coding is going at this weird pace, besides the recent surge in local political activity and moving to a betterer apartment, is, well, how I achieve work. Whenever big-ish changes are required (designing something new, redesigning something...), I need time to think about it, and sometimes just do something else entirely for a long while. Until I go 'fuck it' and just code the damn thing.

So, I started coding the damn thing.

The "bind socket to any address" setting for local wifi has been moved to this spanking new wifi settings dialog.

There's also the new UI for setting up online mode, so you can, you know, pick a network adapter, etc.

That's why it took a while btw. It displays adapter information such as MAC, IP, DNS servers, which aren't provided by stock libpcap. So I had to use the Win32 API to retrieve those (and I have yet to code the Linux equivalent). We also get more meaningful adapter names this way, instead of having them all be named 'Microsoft'.

Then, there is the 'direct mode' thingablarg. What's that for?

Direct mode is basically the old way of forwarding network traffic: directly passing packets from/to libpcap. While this is the most straightforward and less likely to shit itself, it has the downside that it requires an ethernet connection. Reason is that under this mode, melonDS is seen as an entirely separate device on the network, with its own MAC address and all. On an ethernet connection, you can just use promiscuous mode to retrieve the packets sent to melonDS. But wifi is a different affair. Devices need to be associated to the access point. As that isn't the case for melonDS, its packets are just thrown away.

To work around this issue, not-direct mode is being developed. The idea is to do DHCP/NAT inside the emulator. melonDS would be in a little subnet along with its access point melonAP and its miniature DHCP server, and a bridge to the actual network. This way, packets sent to melonDS would bear the MAC address of your computer instead of that of melonDS, and wifi access points wouldn't throw a fit.

Hoping I can get this to work. But first, I have to finish fixing up the water heater.
Some news
A little update on the plans for upcoming melonDS versions.

Things have slowed down a bit since 0.7.3. I wanted to address some of the newer timing issues, but there's bigger trouble arising from this, I'll get to it.

Also, attempting to get things done(tm) in real life, which is another reason why it's slowing down.

Regardless, we're still alive and kicking.

For melonDS 0.7.4, I want to fix some interface issues, and finally add some interface for setting up online-mode wifi. There might be other fixes, but that release will hardly be a revolution.

0.8 will be, as promised, the hardware renderer and upscaling. I don't want to delay this any further, so instead I'm going to delay what will come next.

For 0.9, or something like that. We're running into timing issues that can't be addressed by simply tweaking our timing values for cached memory. Not without breaking other games. I am reluctant to using per-game hacks.

So, the last line before the dreaded "per-game hacks or full ARM9 cache emulation" situation: perfecting CPU timings. The current CPU timings are imperfect, making things run faster than their hardware counterpart, especially on the ARM9 side. The ARM9 is complex so working out its complete timings will take a while.

Also, there's been some changes to this site, and we're not done :)
melonDS 0.7.3
After another round of fixoring, we present you melonDS 0.7.3. So, what are the changes in this version? Mostly addressing the issues that popped up in 0.7.1/0.7.2, and improving the interface a bit.

Something that should have been done long ago, and has finally been done: the emulator's main loop was rewritten to use absolute timestamps to keep track of cycle counts. Overall, it's more reliable (far less chances of desyncs), more accurate, and also a bit faster, and the code is cleaner. So we win on all fronts.

This should fix the recent flickering issues, like that of Colour Cross.

The downside is that this breaks savestate compatibility with older melonDS versions.

In the same vein, there's another fix to GX timings. We knew that vertex timings were different when a vertex completed a polygon, but we just found out that timings also differ based on whether the polygon passes culling and clipping. This will help with games that need their display lists to be running fast enough.

To hopefully fix the saving issues, we built a new database, resulting of a mix between the Advanscene XML and savelist.bin from the R4 Wood firmware. For better detection, the new romlist.bin indexes games by their serial code rather than by the ROM's CRC32, which also means that it will work with hacked ROMs. This means you will need to make sure to replace romlist.bin with the newer one if you're extracting this melonDS release over an older one.

There's also a particular fix for Pokémon Mystery Dungeon - Explorers of Sky, which uses 128K EEPROM. It should now save properly.

However, note that the correct savefile size for this game is 128KB. If you have a savefile which is 256KB, the game will read it fine but fail to save to it (because that size is detected as FLASH memory and not EEPROM). You can fix this by opening your savefile in a hex editor and deleting the upper half of it. This is especially important if you use a savefile coming from DeSmuME as it will create a 256KB file.

We also have a neat little pile of UI improvements.

melonDS now supports hotplugging joysticks. That being said, it still defaults to the first joystick available on your system. I'm not quite sure how to go about handling multiple joysticks.

Under Linux, the crashes coming from the input config dialog should be fixed.

The main window is smarter about remembering its size: it also remembers whether it was maximized.

There's a tentative fix for hiDPI under Windows, which should atleast make the screens scale correctly.

There's a new menu for setting the window size to an integer scaling factor (1x to 4x).

The setting for savefile relocation when using savestates is now disabled by default. While the feature may be useful, having it enabled by default was confusing for unaware users.

And finally, the misc fixes:

* fix STRD_POST (fixes music in Just Sing - Vol 2)
* 2D: fix blending bugs
* add support for 8bit reads to DISPCNT/BGxCNT (fixes The Wild West)
* make nocashprint also work in ARM mode


melonDS 0.7.3, Windows 64-bit
melonDS 0.7.3, Linux 64-bit

melonDS Patreon if you're feeling generous