melonDS aims at providing fast and accurate Nintendo DS emulation. While it is still a work in progress, it has a pretty solid set of features:

• Nearly complete core (CPU, video, audio, ...)
• JIT recompiler for fast emulation
• OpenGL renderer, 3D upscaling
• RTC, microphone, lid close/open
• Joystick support
• Savestates
• Various display position/sizing/rotation modes
• (WIP) Wifi: local multiplayer, online connectivity
• (WIP) DSi emulation
• (WIP) GBA slot add-ons
• and more are planned!

Download melonDS

If you're running into trouble: Howto/FAQ
LAN multiplayer: getting there!
As of today:

LAN gameplay between the two computers I have at hand over here.

LAN is actually fairly close to a finished product, now. It's obviously still a bit rough around the edges, may be a tad unstable, but from my testing it has shown to be pretty smooth.

From this point, I encourage anybody who is interested to grab the experimental builds from our Github CI. It's the runs labelled 'season2 test PR'; note that you will need a Github account to download them.

But if you're motivated, please give them some deep testing, abuse them in whatever ways you can think of, and report anything that seems odd to you, in our IRC or Discord or in this thread. We'd like these features to be rock-solid for the 1.0 release.

How to get a LAN game going:

1. On the host machine: open melonDS, then System -> Multiplayer -> Host LAN game. You enter a player name and you're good to go.

2. On client machines: open melonDS, then System -> Multiplayer -> Join LAN game. Enter your player name there, then it should list any existing LAN games. If not, you can always try using the direct connect button.

... read more
LAN multiplayer: in the works
First of all, thank you all for the kind comments to the last post, it means a lot!

Anyway, I've been throwing together a basic interface for LAN multiplayer. It's far from complete, but it's already showing promising results so far.

The issue is that it requires a low enough latency to work well. When connecting my two computers over a crosswired cable, yielding a latency of ~1ms, I had decent success connecting the two sides together. However, when using the regular network (my laptop connects over wifi, while the other computer uses a PLC adapter), latency was all over the place and it was impossible to get a local multiplayer connection going.

This isn't too different from the old socket-based network interface, really. Except that interface performed even worse, because it was really just tossing packets over the network with zero care.

This new LAN mode will function more like typical LAN games: you get a host starting a LAN game, then everyone else connects to the host using their local IP (or their computer's host name). Then you would launch your game and connect to other players like you would on an actual DS.

There's still some more work to be done before this can be called finished, mostly UI work but also other tidbits. I'll keep you informed.
Sorry for the silence lately
I've been merging some pull requests earlier today, and, looking at some of them...

"Wow, has it really been one year?"

I'm sorry about this. It's not that we, melonDS team, don't care about user contributions. We do, and we're grateful for them. However it takes time to review them and make sure they're mergeable.

And while I can't speak for the rest of the team, the way I am doesn't help. I don't really notice the passage of time, anything in the past just feels like it happened ages ago, and it's very easy for me to let myself be carried into a comfortable routine for a long time. As you can imagine, it's a problem for certain daily life things. It also means that I end up postponing some things a lot, when I don't immediately feel like dealing with them and there's no urgency, I end up just thinking that "I'll do it later when I feel like it", let time pass, let routine carry me, and... "wow, has it really been one year?". Things like dealing with pull requests definitely fall into this pattern.

Lately, side effects from meds haven't been helping either. I take ritalin for ADHD, and venlafaxine for depression, and the two don't go together well for me. The latest dose increases (in July) led me to a situation where I had bad side effects: absurdly dry mouth, headaches and light sensitivity, bad sleep (waking up every hour or so), and intense fatigue, entirely negating ritalin's benefits.

I stopped taking ritalin and things quickly got a lot better. Although this also means I'm back to my usual ADHD self, in terms of focus and productivity: I don't really control what my brain focuses on, and it can take a long while to get myself to do things.

I won't try taking ritalin again until I'm off venlafaxine. I'm tapering down that; I get more tired some days but overall it's going well. I think venlafaxine has done its job, I also feel a lot better about myself and the world than even one year ago, so I'm pretty confident I can do this. I also want to go out, meet people, do things (all of which are very good ways to keep depression at bay), but it doesn't help when meds leave me too tired to do much at all. So yeah.

From a coding perspective: I want to attempt getting a proper LAN mode going for melonDS. Although lately my brain has been wanting me to focus on something else: a fun side project adventure that involves DSi DSP programming. It's not directly related to melonDS, but it may end up benefitting the DS/DSi scene either way, so hey. I'd want to talk more about it, if it does pan out, but that'd go on another blog.
Optimizing wifi, take two
I've been thinking about the problem and making another experiment, which seems more successful.

The basic idea is that games send multiplayer data at a generally steady rate and the timings are predictable. When the host sends a CMD frame, generally there won't be anything else being sent (other than replies and ack) until the end of the CMD_COUNT window, unless some clients failed to reply. The only exception to this is when the host sends a beacon frame, but since these are sent at a fixed interval, it's not hard to take them into account.

I had to do a few tweaks to get it reliable in certain situations, but so far it gives quite a substantial speed boost and doesn't seem to impact reliability. We still have to see how this goes -- we might have to make it optional if it causes problems somewhere.

I'm also trying to fix some other problems, namely how in some games you can't connect more than two players. I have a bit of an idea what's going on, but I'll need to work on it more.

We're also thinking of bringing back LAN, but as a proper LAN mode instead of just tossing packets around the network. It may even possible to build something robust around enet. I'll have to give this a try.
Trying to optimize LLE wifi sync...
In the previous post, I mentioned I had an idea to optimize LLE wifi. I've been experimenting, but for now, nothing really conclusive came out of it.

Basically, if you remember your local multiplayer 101, this is how it goes:

The CMD frame isn't sent immediately, there is always a variable delay before. In a real world setting, you'll want to ensure your wifi channel isn't being used before sending stuff, so that's what the delay is. However, the CMD frame and subsequent reply frames use the 802.11 duration field to reserve the needed timeslot for the entire CMD exchange, so that everything can be pretty tight.

Since version 0.9.5, melonDS uses a synchronization mechanism to ensure this works smoothly. It is actually fairly simple: each frame that is sent has a timestamp attached to it, that is used to keep things in sync. Clients may not run ahead of the host -- when they receive a frame from the host, they know how much they need to catch up, when to start actually receiving that frame, and in the case of a CMD frame, how long they can run ahead before waiting for another frame. And the host discards stale reply frames to prevent itself from running too far ahead. All fine and dandy.

However, I observed that local multiplayer exchanges aren't as tight as I thought before. In most games, the host only sends a CMD frame once per video frame (that is, once every ~16.67ms). Pictochat is a bit different -- it does send CMD frames every ~4ms, but it also does that thing where it polls 3 or 4 clients per frame in a rolling order, so it amounts to a pretty similar thing anyway.

This means that in our case, with the current sync mechanism, clients will spend a whole lot of time waiting for a CMD frame, then having to catch up a whole lot while the host is waiting for reply frames. From a performance standpoint, this is far from ideal -- it largely negates the benefits of multi-core CPUs.

I came up with two ways to work around this, and experimented with them, but so far they both have issues.

The first way would be to abuse the pretty generous CMDCOUNT window games use when sending a CMD frame. I thought that in theory it should be possible to send the CMD frame to clients early, and even have the clients send their reply early as soon as possible, and delay the actual emulation of the CMD/reply/ack exchange until the end of the CMDCOUNT window, and thus things would run more smoothly.

... read more
HLE, wifi, netplay: recent developments and ideas
So yeah, I have been pretty silent lately, sorry about it. Been tired, a bit of a depression burst, medication adjustment and stuff, but overall I'm doing well. I've been taking some vacation time, so this helps a lot, too. I've mostly been chilling, bathing, toasting under the Summer sun, pretty standard vacation stuff.

However I've also been able to do some melonDS-related work.

I started implementing wifi in melonHLE. So far, I got to the point that there's some data exchange going on, but the connection isn't working due to a timing problem -- I need to rework when I send data frames and how to keep all the melonDS instances in sync.

It's too early to tell, but if I can get it working, it may be a good base for local multiplayer and netplay on lower-end platforms, as I said before. On one hand, the way wifi works at a high level seems to provide enough leeway that I could get away with somewhat lax sync, and thus better performance. On the other hand, I don't have it working yet.

The current implementation is also a bit of a pile of hacks. I will need to clean it up and implement a bunch of the more minor details. I'm also not very confident that the compatibility rate will be on par with the tried and true LLE approach, but the only way to find out is to try out.

There are still several more general issues that limit melonHLE's general compatibility. While testing some other games, I found out that there are atleast 3 different versions of the sound module, more recent games (like Picross 3D) have a different power management module, and there are games that get stuck due to missing functionality. And that's without even getting into DSi stuff. Ideally, I'd want to index all the ARM7 binary versions out there to have a better idea how viable melonHLE may be.

Regardless, this work will likely prove useful for future melonDS/Dolphin interop. As you may already know, some Wii games are able to connect to a DS to provide extra features. This connection uses the same protocol as DS local multiplayer. On the Wii, the functionality is exposed through an API similar to what DS games use at a high level. Dolphin, being HLE, deals with high-level service calls, while regular LLE melonDS has to emulate the wifi hardware and deal with the 802.11 protocol in all their complexity. So obviously some work will be needed on Dolphin's side in order to make it communicate with melonDS.

Also, while I was brainstorming how to efficiently synchronize melonDS instances in HLE wifi, I had an idea for LLE wifi. I will need to try it out, but if it works, it may drastically loosen up the synchronization requirements for local multiplayer (although the gain will likely depend on how many players are connected).

If any of this pans out, we might be able to bring back the ability to do local multiplayer over LAN (I mean, without using the netplay system). I don't know how well it would work in practice -- I remind you that the current IPC comm layer has a bunch of extra smarts to avoid causing lag by unwarranted blocking, and I'm not confident these can be reliably replicated over the network.

Stay tuned!
melonHLE, facts and ideas
So first of all, I'm done with the mold problem and its consequences, meaning that my apartment finally looks like an actual apartment and I can exhale a big sigh of relief.

I continued my work on melonHLE, taking it to a point where it may be something serious. The compatibility rate seemed good, even though some games don't run because some auxiliary services aren't completely implemented. I've had some fun reverse-engineering the sound engines and implementing them, with decent results.

So I did a quick performance comparison:

The tests were done on my laptop Crepe (Core i7-5500U, 2.4GHz). Numbers are an average measure of frames per second.

melonHLE shows to be faster, but it's not mindblowing either. However, keep in mind that this is largely a quick and dirty experiment. There are some simple ways to make melonHLE faster, one of which is increasing the maximum CPU time slice (kMaxIterationCycles). The current value of 64 is chosen to keep the ARM9 and ARM7 somewhat in sync, but obviously, in melonHLE we don't need to keep the ARM7 in sync. A much bigger kMaxIterationCycles value increases performance to some extent and has no downsides.

Regardless, melonHLE may prove a viable option for lower-end platforms. In the end, it might be integrated into melonDS as an option, though it needs more work and testing.

Now, you may ask, how does any of this relate to the netplay saga?

This joins the general idea of optimizing melonDS for lower-end platforms. For example, Generic is trying to optimize melonDS for the Switch, and more particularly trying to optimize the whole 3D graphics pipeline. Full 3D games are more demanding, so this is a worthy optimization target.

... read more
While my situation is being sorted out, I will make a post about an idea I had a while ago: attempting HLE for DS emulation.

All the existing DS emulators, as far as I'm aware, are essentially LLE. DS games are mostly self-contained and run on the bare metal, relying on the small BIOSes for basic functions like interrupt waits, decompression, etc.

Some emulators, like DeSmuME, are able to HLE the BIOS calls, basically replicating them inside the emulator. The main advantage to this is that the emulator doesn't require a proper BIOS dump to run games, but there is no other real benefit from this. BIOS calls aren't critical enough that HLEing them might boost performance significantly.

What I've been experimenting with melonHLE goes further: HLEing the ARM7.

It may seem feasible if you consider that Nintendo never allowed game developers to write their own ARM7 binaries. This means that, in theory, all commercial games out there will have one of the few possible ARM7 binary versions. It also means that the ARM7 is limited to taking care of utility tasks, while all the game logic is running on the ARM9.

In practice, how does it work?

The ARM9 communicates with the ARM7 via the IPC hardware (IPCSYNC and the IPC FIFO), and some shared memory areas. When the game boots, there is a IPCSYNC handshake, then the ARM7 exposes a bunch of services that are accessed via the IPC FIFO. The services serve to provide access to the ARM7-side hardware: sound, wifi, touchscreen controller, PMIC, firmware memory, etc. Most of these services are fairly simple, with sound and wifi being by far the most complex ones.

So I've been experimenting with this in a private repo. So far, I've implemented enough of the utility services to get some games to boot, and observe a few things:

* There is a substantial speed gain from HLEing the ARM7. If this proves to be viable in the long run (despite the problems I will get to later), it may be an option for low-end platforms.

... read more
I've awoken from my slumber
All the way back in early 2021 (I can't believe it's been 2 years) I wrote the compute shader renderer for the Switch port of melonDS as described in my previous post on it. If you don't know much about it the compute shader renderer then I recommend checking that post out.

After more or less completing it for Switch (the port desparately needs an update, it will come, I promise), I didn't really touch the code much. Over the last couple of weeks this finally changed.

The renderer had to be ported from Switch's homebrew GPU API deko3D to OpenGL, which fortunately wasn't that hard, because A. most of the complexity lies within the shader there is not that much buffer jougling and B. Nvidia GPUs (or atleast Maxwell) being somewhat of a OpenGL hardware implementation.

But let's come to the main attraction, besides some fixes, high resolution rendering is finally implemented for it. And it works wonderfully, with far fewer or no artefacts compared to the classic OpenGL renderer. And even on my integrated Intel UHD 620 I can reach up to 3x-4x resolution depending on the game.

With local wireless there is now another reason you might want to use it over the software renderer. If you are short on CPU cores for all the melonDS instances you can offload the rasterisation onto the GPU.

There are still a few things left to do. For some reason the shaders (which are all compiled on startup, so no stuttering while playing) seem to compile quite slowly on Windows for Intel and Nvidia GPUs. Bizzarely this seems to be related to the very large SSBOs, atleast reducing their size seems to lead to speed up. So my plan is to replace the large buffers which scale proportionally to the resolution with ones which have unspecified size or image load and store. If I had to guess the driver performs the layout calculation somehow for every array entry. In case I don't get the compile times low enough, I need to implement a shader binary cache.

The outlines generated through edge marking (e.g. used by the Zelda games) are always only pixel thick, which quickly becomes very thin for higher resolutions. Thus I want to add an option to counteract that (I am still not exactly sure how to do it.

Another issue that currently the compute shader renderer isn't integrated into the GUI at all, it currently just replaces the OpenGL renderer.

And like always there is still some clean up to be done in the code. As a last note, the compute shader renderer already uses a texture cache (which as part of this clean up should also be used by the OpenGL renderer). Implementing texture replacement on top of that is not hard and is on my list as well, but one step after the other.

And yes, it allows you to play Pokemon in higher resolutions with no back lines.
The netplay saga, ep 3.5
Just a quick post...

I'm having a mold problem in my apartment, so there won't be a lot of progress on netplay (or melonDS in general) while this is being dealt with. To give you an idea, I'm typing this from another place.

Regarding netplay: I'm not going to go with the idea of sending ROMs over. The other solutions suck from an end user perspective, but I don't want to deal with the legal grey area.

For the rest, we're waiting for JesseTG's pull request for in-memory savestates.