melonDS aims at providing fast and accurate Nintendo DS emulation. While it is still a work in progress, it has a pretty solid set of features:

• Nearly complete core (CPU, video, audio, ...)
• JIT recompiler for fast emulation
• OpenGL renderer, 3D upscaling
• RTC, microphone, lid close/open
• Joystick support
• Savestates
• Various display position/sizing/rotation modes
• (WIP) Wifi: local multiplayer, online connectivity
• (WIP) DSi emulation
• (WIP) GBA slot add-ons
• and more are planned!

Download melonDS

If you're running into trouble: Howto/FAQ
Trying to optimize LLE wifi sync...
In the previous post, I mentioned I had an idea to optimize LLE wifi. I've been experimenting, but for now, nothing really conclusive came out of it.

Basically, if you remember your local multiplayer 101, this is how it goes:

The CMD frame isn't sent immediately, there is always a variable delay before. In a real world setting, you'll want to ensure your wifi channel isn't being used before sending stuff, so that's what the delay is. However, the CMD frame and subsequent reply frames use the 802.11 duration field to reserve the needed timeslot for the entire CMD exchange, so that everything can be pretty tight.

Since version 0.9.5, melonDS uses a synchronization mechanism to ensure this works smoothly. It is actually fairly simple: each frame that is sent has a timestamp attached to it, that is used to keep things in sync. Clients may not run ahead of the host -- when they receive a frame from the host, they know how much they need to catch up, when to start actually receiving that frame, and in the case of a CMD frame, how long they can run ahead before waiting for another frame. And the host discards stale reply frames to prevent itself from running too far ahead. All fine and dandy.

However, I observed that local multiplayer exchanges aren't as tight as I thought before. In most games, the host only sends a CMD frame once per video frame (that is, once every ~16.67ms). Pictochat is a bit different -- it does send CMD frames every ~4ms, but it also does that thing where it polls 3 or 4 clients per frame in a rolling order, so it amounts to a pretty similar thing anyway.

This means that in our case, with the current sync mechanism, clients will spend a whole lot of time waiting for a CMD frame, then having to catch up a whole lot while the host is waiting for reply frames. From a performance standpoint, this is far from ideal -- it largely negates the benefits of multi-core CPUs.

I came up with two ways to work around this, and experimented with them, but so far they both have issues.

The first way would be to abuse the pretty generous CMDCOUNT window games use when sending a CMD frame. I thought that in theory it should be possible to send the CMD frame to clients early, and even have the clients send their reply early as soon as possible, and delay the actual emulation of the CMD/reply/ack exchange until the end of the CMDCOUNT window, and thus things would run more smoothly.

... read more
HLE, wifi, netplay: recent developments and ideas
So yeah, I have been pretty silent lately, sorry about it. Been tired, a bit of a depression burst, medication adjustment and stuff, but overall I'm doing well. I've been taking some vacation time, so this helps a lot, too. I've mostly been chilling, bathing, toasting under the Summer sun, pretty standard vacation stuff.

However I've also been able to do some melonDS-related work.

I started implementing wifi in melonHLE. So far, I got to the point that there's some data exchange going on, but the connection isn't working due to a timing problem -- I need to rework when I send data frames and how to keep all the melonDS instances in sync.

It's too early to tell, but if I can get it working, it may be a good base for local multiplayer and netplay on lower-end platforms, as I said before. On one hand, the way wifi works at a high level seems to provide enough leeway that I could get away with somewhat lax sync, and thus better performance. On the other hand, I don't have it working yet.

The current implementation is also a bit of a pile of hacks. I will need to clean it up and implement a bunch of the more minor details. I'm also not very confident that the compatibility rate will be on par with the tried and true LLE approach, but the only way to find out is to try out.

There are still several more general issues that limit melonHLE's general compatibility. While testing some other games, I found out that there are atleast 3 different versions of the sound module, more recent games (like Picross 3D) have a different power management module, and there are games that get stuck due to missing functionality. And that's without even getting into DSi stuff. Ideally, I'd want to index all the ARM7 binary versions out there to have a better idea how viable melonHLE may be.

Regardless, this work will likely prove useful for future melonDS/Dolphin interop. As you may already know, some Wii games are able to connect to a DS to provide extra features. This connection uses the same protocol as DS local multiplayer. On the Wii, the functionality is exposed through an API similar to what DS games use at a high level. Dolphin, being HLE, deals with high-level service calls, while regular LLE melonDS has to emulate the wifi hardware and deal with the 802.11 protocol in all their complexity. So obviously some work will be needed on Dolphin's side in order to make it communicate with melonDS.

Also, while I was brainstorming how to efficiently synchronize melonDS instances in HLE wifi, I had an idea for LLE wifi. I will need to try it out, but if it works, it may drastically loosen up the synchronization requirements for local multiplayer (although the gain will likely depend on how many players are connected).

If any of this pans out, we might be able to bring back the ability to do local multiplayer over LAN (I mean, without using the netplay system). I don't know how well it would work in practice -- I remind you that the current IPC comm layer has a bunch of extra smarts to avoid causing lag by unwarranted blocking, and I'm not confident these can be reliably replicated over the network.

Stay tuned!
melonHLE, facts and ideas
So first of all, I'm done with the mold problem and its consequences, meaning that my apartment finally looks like an actual apartment and I can exhale a big sigh of relief.

I continued my work on melonHLE, taking it to a point where it may be something serious. The compatibility rate seemed good, even though some games don't run because some auxiliary services aren't completely implemented. I've had some fun reverse-engineering the sound engines and implementing them, with decent results.

So I did a quick performance comparison:

The tests were done on my laptop Crepe (Core i7-5500U, 2.4GHz). Numbers are an average measure of frames per second.

melonHLE shows to be faster, but it's not mindblowing either. However, keep in mind that this is largely a quick and dirty experiment. There are some simple ways to make melonHLE faster, one of which is increasing the maximum CPU time slice (kMaxIterationCycles). The current value of 64 is chosen to keep the ARM9 and ARM7 somewhat in sync, but obviously, in melonHLE we don't need to keep the ARM7 in sync. A much bigger kMaxIterationCycles value increases performance to some extent and has no downsides.

Regardless, melonHLE may prove a viable option for lower-end platforms. In the end, it might be integrated into melonDS as an option, though it needs more work and testing.

Now, you may ask, how does any of this relate to the netplay saga?

This joins the general idea of optimizing melonDS for lower-end platforms. For example, Generic is trying to optimize melonDS for the Switch, and more particularly trying to optimize the whole 3D graphics pipeline. Full 3D games are more demanding, so this is a worthy optimization target.

... read more
While my situation is being sorted out, I will make a post about an idea I had a while ago: attempting HLE for DS emulation.

All the existing DS emulators, as far as I'm aware, are essentially LLE. DS games are mostly self-contained and run on the bare metal, relying on the small BIOSes for basic functions like interrupt waits, decompression, etc.

Some emulators, like DeSmuME, are able to HLE the BIOS calls, basically replicating them inside the emulator. The main advantage to this is that the emulator doesn't require a proper BIOS dump to run games, but there is no other real benefit from this. BIOS calls aren't critical enough that HLEing them might boost performance significantly.

What I've been experimenting with melonHLE goes further: HLEing the ARM7.

It may seem feasible if you consider that Nintendo never allowed game developers to write their own ARM7 binaries. This means that, in theory, all commercial games out there will have one of the few possible ARM7 binary versions. It also means that the ARM7 is limited to taking care of utility tasks, while all the game logic is running on the ARM9.

In practice, how does it work?

The ARM9 communicates with the ARM7 via the IPC hardware (IPCSYNC and the IPC FIFO), and some shared memory areas. When the game boots, there is a IPCSYNC handshake, then the ARM7 exposes a bunch of services that are accessed via the IPC FIFO. The services serve to provide access to the ARM7-side hardware: sound, wifi, touchscreen controller, PMIC, firmware memory, etc. Most of these services are fairly simple, with sound and wifi being by far the most complex ones.

So I've been experimenting with this in a private repo. So far, I've implemented enough of the utility services to get some games to boot, and observe a few things:

* There is a substantial speed gain from HLEing the ARM7. If this proves to be viable in the long run (despite the problems I will get to later), it may be an option for low-end platforms.

... read more
I've awoken from my slumber
All the way back in early 2021 (I can't believe it's been 2 years) I wrote the compute shader renderer for the Switch port of melonDS as described in my previous post on it. If you don't know much about it the compute shader renderer then I recommend checking that post out.

After more or less completing it for Switch (the port desparately needs an update, it will come, I promise), I didn't really touch the code much. Over the last couple of weeks this finally changed.

The renderer had to be ported from Switch's homebrew GPU API deko3D to OpenGL, which fortunately wasn't that hard, because A. most of the complexity lies within the shader there is not that much buffer jougling and B. Nvidia GPUs (or atleast Maxwell) being somewhat of a OpenGL hardware implementation.

But let's come to the main attraction, besides some fixes, high resolution rendering is finally implemented for it. And it works wonderfully, with far fewer or no artefacts compared to the classic OpenGL renderer. And even on my integrated Intel UHD 620 I can reach up to 3x-4x resolution depending on the game.

With local wireless there is now another reason you might want to use it over the software renderer. If you are short on CPU cores for all the melonDS instances you can offload the rasterisation onto the GPU.

There are still a few things left to do. For some reason the shaders (which are all compiled on startup, so no stuttering while playing) seem to compile quite slowly on Windows for Intel and Nvidia GPUs. Bizzarely this seems to be related to the very large SSBOs, atleast reducing their size seems to lead to speed up. So my plan is to replace the large buffers which scale proportionally to the resolution with ones which have unspecified size or image load and store. If I had to guess the driver performs the layout calculation somehow for every array entry. In case I don't get the compile times low enough, I need to implement a shader binary cache.

The outlines generated through edge marking (e.g. used by the Zelda games) are always only pixel thick, which quickly becomes very thin for higher resolutions. Thus I want to add an option to counteract that (I am still not exactly sure how to do it.

Another issue that currently the compute shader renderer isn't integrated into the GUI at all, it currently just replaces the OpenGL renderer.

And like always there is still some clean up to be done in the code. As a last note, the compute shader renderer already uses a texture cache (which as part of this clean up should also be used by the OpenGL renderer). Implementing texture replacement on top of that is not hard and is on my list as well, but one step after the other.

And yes, it allows you to play Pokemon in higher resolutions with no back lines.
The netplay saga, ep 3.5
Just a quick post...

I'm having a mold problem in my apartment, so there won't be a lot of progress on netplay (or melonDS in general) while this is being dealt with. To give you an idea, I'm typing this from another place.

Regarding netplay: I'm not going to go with the idea of sending ROMs over. The other solutions suck from an end user perspective, but I don't want to deal with the legal grey area.

For the rest, we're waiting for JesseTG's pull request for in-memory savestates.
The netplay saga, ep 3
In the previous episode, I had basic input forwarding working, but we had problems due to the initial state being different on each side. So in this episode, I've been working on tackling this.

The first problem is the ROMs themselves. The first iteration of netplay required that each side have the same ROM, but this has the problem that there can be multiple revisions of the same game, and some games (hi Pokémon) even support multiplayer interaction with different games. Requiring every player to have the exact same ROM feels really restrictive, especially compared to a real-life DS multiplayer session, where each player has their own game cart (or doesn't, and uses download play).

Yet, we do need to ensure that every mirror client is using the exact same ROM as their mirror host. Two solutions: either having mirror hosts send their ROM to their mirror clients, or requiring all ROMs to already be present on all sides.

From an end user perspective, I don't like the second solution. It may require users to deal with complex multi-ROM setups, making sure they load everything in the right place; there's quite the potential for things to go wrong, or just for users to be confused.

So I went and experimented with the first solution. While it keeps things simple, it has the downside that transferring DS ROMs takes a while, due to their average size of 64MB. But there are ways to alleviate this: compressing the transferred data, but also skipping the transfer entirely if all sides already have the exact same ROM (which we can verify with a simple CRC).

Keep in mind that none of this is set in stone, and I'm largely experimenting here. We are still pretty far from a finished product.

Next step is ensuring that the emulator state on boot is the same on each side. For this, I had the idea of using the savestate system: basically, have the mirror host take a savestate after the ROM is loaded, send that state over to mirror clients, have them apply it, and it's guaranteed that all sides start with the exact same state.

I ran into a few issues with this. First, the savestate system doesn't save the BIOS and firmware, because it wasn't deemed necessary at the time I designed it. But right now, it's a requirement if we want our mirror clients to have the same user settings, MAC address, etc... as their mirror host. I also ran into a bug in the savestate system itself, which isn't a problem in most cases but turned out to be problematic in this current situation. After addressing all this, I was finally able to have all sides start from the exact same state. And it does fix the issues I had observed: games stay in perfect sync, items in Mario Kart will always pull the same item on each side, the AI players will stay in sync, etc...

This does have a bit of the same problem as sending ROMs around, though: melonDS savestates tend to be ~18MB in size. So, definitely, compression will come in handy here. I also want to look into other ways to optimize this: enet (the network library we use for this) isn't well suited for transferring large amounts of data like that, so it's slow.

... read more
The netplay saga, ep 2
I've been building the basic network infrastructure for netplay lately. Oh, the fun that is dealing with synchronization.

If you remember the graph from the previous post:

Implementing this proves to be tricky, because since each individual instance there is its own process, there's a lot of moving parts. So first, we're going to name them.

Assuming player 1 is the player who initiated the game: player 1's instance 1 acts as the game host, while player 2's instance 2 and player 3's instance 3 act as game clients. The game host transmits useful information to the game clients, tells them when to start running, ...

Then player 1's instance 1 acts as a mirror host: players 2 and 3's instance 1, the mirror clients, connect to it, and receive their input from it, thus mirroring player 1's input on players 2 and 3's machines. Similarly, player 2's instance 2 and player 3's instance 3 are also mirror hosts.

As is typical with netplay implementations, inputs are delayed by a fixed amount, which is hardcoded to 4 frames in the current test branch, but will be configurable in the final product. The basic idea is to delay inputs a bit on all sides to counter network lag.

Each input frame sent by a mirror host is given a frame count, which lets mirror clients make sure to apply that frame at the exact same time as their host, thus ensuring all sides are given the exact same inputs. If a mirror client runs out of input frames (because the mirror host is running slower), it will need to block until it receives input frames -- missing an input frame would cause a desync.

During a local multiplayer game, this has shown to be enough to form a somewhat viable netplay implementation: this crude synchronization mechanism, combined with local multiplayer sync, do a good job at keeping all instances in sync. But when the players aren't engaging in a local multiplayer game yet, there is the possibility that mirror clients run too slow and end up lagging behind an awful lot.

... read more
The local multiplayer saga, season 2: Netplay
As the apartment shito is beginning to settle down, I have been able to start working on this.

My first goal was to fix the two annoying issues I described in the previous post. In this situation, I was trying to get the pause command to simultaneously pause all local melonDS instances, instead of just the one that received the command. There is more to be done in the way of cross-instance sync, but this seemed an obvious starting point to me.

The first issue was due to the way the interface works. Originally, the only way to pause melonDS was through the interface (System->Pause). Later on, the pause hotkey was added. Hotkeys are checked and handled in the emu thread (separate from the UI thread), so to keep things simple, the pause hotkey would just send a signal to the main window which would behave like using the System->Pause menu command. It's a bit of a roundabout way to handle this, but it has the advantages that it avoids duplicating code too much, and keeps the UI state (the Pause checkmark) in sync without having to worry about it.

When I started adding cross-instance pause, I made the pause command handler send a message to other melonDS instances through IPC. Then the other instances would receive that message and treat it the same as pressing the pause hotkey. Easy peasy.

Yeah, except doing so would cause these instances to send more pause messages, essentially entering a feedback loop.

So I had to add a separate handler for the IPC pause command to avoid this. Not the best solution, but it works.

Next problem was that during a local multiplayer game, cross-instance pause would interfere with the local multiplayer sync system, and could essentially cause some instances to get stuck. To deal with this, I had to add some more intelligence to the IPC comm layer to avoid waiting on instances that are paused. And it does the trick. Pausing a local multiplayer game may cause minor packet loss, due to the way this works, but I haven't seen any problems in my testing -- Nintendo's local multiplayer protocol is resilient, so this should be mostly fine.

There is more state that should be shared across melonDS instances, like the recent ROM menu. We'll get there. Now that the system is in place for cross-instance comm, it shouldn't be very difficult.

But for now, I want to build the base for netplay. So let's talk about this.

... read more
Status update
Not a lot to report melonDS wise. I'm running into problems with my cross-instance sync design, the kind of issues that are very annoying to solve, so... yeah.

Real life wise, I have finished the diagnosis process, so I'm now diagnosed with ADHD. I'm also able to get medicated for this; we'll have to see how this goes, but so far it's been a night and day difference for me. It's much easier for me to get myself to start tasks and stick to them. I'm impressed at how productive I have been at work these days. And so far I'm not seeing any adverse effects.

I hope this improvement can also help me getting things done for melonDS or my other personal projects, but right now it's a tad complicated. I'm moving to a new, better apartment, and while this is a great thing, it also means that there are a lot of things to take care of still, and that tends to be energy consuming for me.

But hopefully, by the end of March this should all settle down and allow me to take a big sigh of relief. In the meantime I'm trying to think of how to address the problems I'm facing regarding melonDS. Netplay is really something I want to get going.