melonDS RSS The latest news on melonDS.
melonHLE, facts and ideas -- by Arisotura Tue, 06 Jun 2023 09:49:33 +0000
I continued my work on melonHLE, taking it to a point where it may be something serious. The compatibility rate seemed good, even though some games don't run because some auxiliary services aren't completely implemented. I've had some fun reverse-engineering the sound engines and implementing them, with decent results.

So I did a quick performance comparison:

The tests were done on my laptop Crepe (Core i7-5500U, 2.4GHz). Numbers are an average measure of frames per second.

melonHLE turns out to be faster, but the difference isn't mind-blowing either. However, keep in mind that this is largely a quick and dirty experiment. There are some simple ways to make melonHLE faster, one of which is increasing the maximum CPU time slice (kMaxIterationCycles). The current value of 64 is chosen to keep the ARM9 and ARM7 somewhat in sync, but obviously, in melonHLE we don't need to keep the ARM7 in sync. A much bigger kMaxIterationCycles value increases performance to some extent and has no downsides.
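To get a feel for why a larger slice helps, here is a rough back-of-the-envelope sketch. It assumes the main loop runs the CPU in slices of at most kMaxIterationCycles cycles and pays some fixed overhead per slice. IterationsPerFrame is a made-up helper for illustration, not melonDS code, and the frame length in the note below is just an example figure.

```cpp
#include <cstdint>

// Hypothetical helper: how many slices are needed to cover one frame
// when each slice runs at most maxSliceCycles cycles (rounds up).
constexpr std::uint64_t IterationsPerFrame(std::uint64_t frameCycles,
                                           std::uint64_t maxSliceCycles)
{
    return (frameCycles + maxSliceCycles - 1) / maxSliceCycles;
}
```

For an example frame of 560190 cycles, a slice of 64 means 8753 trips through the loop per frame, while a slice of 1024 brings that down to 548. Whatever bookkeeping happens once per slice gets paid that much less often.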

Regardless, melonHLE may prove a viable option for lower-end platforms. In the end, it might be integrated into melonDS as an option, though it needs more work and testing.

Now, you may ask, how does any of this relate to the netplay saga?

This joins the general idea of optimizing melonDS for lower-end platforms. For example, Generic is trying to optimize melonDS for the Switch, and more particularly trying to optimize the whole 3D graphics pipeline. Full 3D games are more demanding, so this is a worthy optimization target.

Optimizing melonDS will also benefit local multiplayer and netplay, seeing as these require running multiple melonDS instances at the same time. I want to make it accessible to a broad audience and not just those with the absolute bestest computers, especially in these troubled times of chip shortages and such.

Which brings me to another point: HLE wifi. So far, I haven't done any work towards it, but I'm tempted to give it a try. The wifi service may be high-level enough to be emulated separately from melonDS's current wifi implementation, and with less complexity and more lax synchronization, but the only way to know is to give it a try. If it works out, it might make local multiplayer possible with lower performance requirements.

Stay tuned!
melonHLE? -- by Arisotura Tue, 09 May 2023 07:52:25 +0000
All the existing DS emulators, as far as I'm aware, are essentially LLE. DS games are mostly self-contained and run on the bare metal, relying on the small BIOSes for basic functions like interrupt waits, decompression, etc.

Some emulators, like DeSmuME, are able to HLE the BIOS calls, basically replicating them inside the emulator. The main advantage to this is that the emulator doesn't require a proper BIOS dump to run games, but there is no other real benefit from this. BIOS calls aren't critical enough that HLEing them might boost performance significantly.

What I've been experimenting with in melonHLE goes further: HLEing the ARM7.

It may seem feasible if you consider that Nintendo never allowed game developers to write their own ARM7 binaries. This means that, in theory, all commercial games out there will have one of the few possible ARM7 binary versions. It also means that the ARM7 is limited to taking care of utility tasks, while all the game logic is running on the ARM9.

In practice, how does it work?

The ARM9 communicates with the ARM7 via the IPC hardware (IPCSYNC and the IPC FIFO), and some shared memory areas. When the game boots, there is an IPCSYNC handshake, then the ARM7 exposes a bunch of services that are accessed via the IPC FIFO. The services serve to provide access to the ARM7-side hardware: sound, wifi, touchscreen controller, PMIC, firmware memory, etc. Most of these services are fairly simple, with sound and wifi being by far the most complex ones.
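As a rough illustration of what "services accessed via the IPC FIFO" could look like on the HLE side, here is a minimal dispatch sketch. Everything in it is an assumption made for the example: the message layout (low 5 bits as service ID, the rest as payload), and the names FifoMessage, DecodeFifoWord and ServiceTable. It is not the actual melonHLE code.

```cpp
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical message layout, for illustration only:
// low 5 bits = service ID, remaining bits = payload.
struct FifoMessage
{
    std::uint32_t service;
    std::uint32_t payload;
};

inline FifoMessage DecodeFifoWord(std::uint32_t word)
{
    return FifoMessage{ word & 0x1Fu, word >> 5 };
}

class ServiceTable
{
public:
    using Handler = std::function<void(std::uint32_t payload)>;

    void Register(std::uint32_t service, Handler handler)
    {
        handlers[service] = std::move(handler);
    }

    // Returns false when the ARM9 talks to a service that is not implemented.
    bool Dispatch(std::uint32_t word)
    {
        FifoMessage msg = DecodeFifoWord(word);
        auto it = handlers.find(msg.service);
        if (it == handlers.end()) return false;
        it->second(msg.payload);
        return true;
    }

private:
    std::unordered_map<std::uint32_t, Handler> handlers;
};
```

The point of a table like this is that each service (sound, touchscreen, firmware, ...) can be implemented and tested on its own, and an unhandled service ID is easy to log when a game doesn't boot.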

So I've been experimenting with this in a private repo. So far, I've implemented enough of the utility services to get some games to boot, and observed a few things:

* There is a substantial speed gain from HLEing the ARM7. If this proves to be viable in the long run (despite the problems I will get to later), it may be an option for low-end platforms.

* Even if not, I still find it quite interesting to reverse-engineer the ARM7 binary and figure out how things work.

* Super Mario 64 DS has an earlier version of the sound engine, where some of the commands are different.

* Super Princess Peach has a completely different sound engine.

* Mario Kart DS's ARM7 binary has an extra service, which is used to assist loading code to ARM7 VRAM. Nothing we really need to worry about here.

* Aside from that, the smaller utility services seem to be pretty identical across games. However, I haven't started looking into wifi.

Now, the problems with this approach:

* Obviously, this only works for commercial games. It may be possible to support most homebrew by implementing libnds's default ARM7 binary, but anything with a custom ARM7 binary won't work.

* It is far less accurate than the current LLE approach. Given the ARM7's average workload, it may not matter to most games, but there's still the potential for timing issues (which stem from bad game programming).

* I have tried a few games, but I don't know how much variation there is across the entire DS library, and how I should go about identifying the different possible ARM7 binaries. This is going to be the main determining factor for whether ARM7 HLE is viable: how much code complexity is required to attain decent compatibility?

Also, I might want to avoid getting sidetracked too much. I want to finish implementing netplay, once my apartment is less of a mess.
I've awoken from my slumber -- by Generic aka RSDuck Sat, 22 Apr 2023 00:02:00 +0000
previous post on it. If you don't know much about the compute shader renderer, then I recommend checking that post out.

After more or less completing it for Switch (the port desperately needs an update, it will come, I promise), I didn't really touch the code much. Over the last couple of weeks this finally changed.

The renderer had to be ported from deko3D, the Switch's homebrew GPU API, to OpenGL, which fortunately wasn't that hard, because A. most of the complexity lies within the shaders, so there is not that much buffer juggling, and B. Nvidia GPUs (or at least Maxwell) are somewhat of a hardware implementation of OpenGL.

But let's come to the main attraction: besides some fixes, high resolution rendering is finally implemented for it. And it works wonderfully, with far fewer or no artefacts compared to the classic OpenGL renderer. Even on my integrated Intel UHD 620, I can reach up to 3x-4x resolution depending on the game.

With local wireless there is now another reason you might want to use it over the software renderer. If you are short on CPU cores for all the melonDS instances you can offload the rasterisation onto the GPU.

There are still a few things left to do. For some reason the shaders (which are all compiled on startup, so no stuttering while playing) seem to compile quite slowly on Windows for Intel and Nvidia GPUs. Bizarrely, this seems to be related to the very large SSBOs; at least, reducing their size seems to speed things up. So my plan is to replace the large buffers, which scale proportionally to the resolution, with ones which have unspecified size, or with image load and store. If I had to guess, the driver somehow performs the layout calculation for every array entry. In case I don't get the compile times low enough, I will need to implement a shader binary cache.

The outlines generated through edge marking (e.g. used by the Zelda games) are always only one pixel thick, which quickly becomes very thin at higher resolutions. Thus I want to add an option to counteract that (I am still not exactly sure how to do it).

Another issue is that the compute shader renderer isn't integrated into the GUI at all yet; it currently just replaces the OpenGL renderer.

And like always there is still some clean up to be done in the code. As a last note, the compute shader renderer already uses a texture cache (which as part of this clean up should also be used by the OpenGL renderer). Implementing texture replacement on top of that is not hard and is on my list as well, but one step after the other.

And yes, it allows you to play Pokemon in higher resolutions with no black lines.
The netplay saga, ep 3.5 -- by Arisotura Mon, 17 Apr 2023 19:22:16 +0000
I'm having a mold problem in my apartment, so there won't be a lot of progress on netplay (or melonDS in general) while this is being dealt with. To give you an idea, I'm typing this from another place.

Regarding netplay: I'm not going to go with the idea of sending ROMs over. The other solutions suck from an end user perspective, but I don't want to deal with the legal grey area.

For the rest, we're waiting for JesseTG's pull request for in-memory savestates.
The netplay saga, ep 3 -- by Arisotura Fri, 07 Apr 2023 07:58:53 +0000

The first problem is the ROMs themselves. The first iteration of netplay required that each side have the same ROM, but this has the problem that there can be multiple revisions of the same game, and some games (hi Pokémon) even support multiplayer interaction with different games. Requiring every player to have the exact same ROM feels really restrictive, especially compared to a real-life DS multiplayer session, where each player has their own game cart (or doesn't, and uses download play).

Yet, we do need to ensure that every mirror client is using the exact same ROM as their mirror host. Two solutions: either having mirror hosts send their ROM to their mirror clients, or requiring all ROMs to already be present on all sides.

From an end user perspective, I don't like the second solution. It may require users to deal with complex multi-ROM setups, making sure they load everything in the right place; there's quite the potential for things to go wrong, or just for users to be confused.

So I went and experimented with the first solution. While it keeps things simple, it has the downside that transferring DS ROMs takes a while, due to their average size of 64MB. But there are ways to alleviate this: compressing the transferred data, but also skipping the transfer entirely if all sides already have the exact same ROM (which we can verify with a simple CRC).
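The "skip the transfer if the CRC matches" idea could be sketched like this. The CRC-32 routine is the common bitwise variant (reflected polynomial 0xEDB88320). CanSkipRomTransfer and its parameters are hypothetical names for this example, and the post doesn't say which CRC melonDS would actually use.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Plain bitwise CRC-32 (reflected polynomial 0xEDB88320).
std::uint32_t Crc32(const std::uint8_t* data, std::size_t len)
{
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; i++)
    {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ ((crc & 1u) ? 0xEDB88320u : 0u);
    }
    return ~crc;
}

// Hypothetical check: the mirror host announces its ROM's size and CRC,
// and a client that already has matching data doesn't need the transfer.
bool CanSkipRomTransfer(const std::vector<std::uint8_t>& localRom,
                        std::size_t hostSize, std::uint32_t hostCrc)
{
    return localRom.size() == hostSize
        && Crc32(localRom.data(), localRom.size()) == hostCrc;
}
```

A CRC can collide in principle, so this is a cheap convenience check rather than a hard guarantee. A table-driven CRC would also be faster on a 64MB ROM, but the bitwise version keeps the sketch short.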

Keep in mind that none of this is set in stone, and I'm largely experimenting here. We are still pretty far from a finished product.

Next step is ensuring that the emulator state on boot is the same on each side. For this, I had the idea of using the savestate system: basically, have the mirror host take a savestate after the ROM is loaded, send that state over to mirror clients, have them apply it, and it's guaranteed that all sides start with the exact same state.

I ran into a few issues with this. First, the savestate system doesn't save the BIOS and firmware, because it wasn't deemed necessary at the time I designed it. But right now, it's a requirement if we want our mirror clients to have the same user settings, MAC address, etc... as their mirror host. I also ran into a bug in the savestate system itself, which isn't a problem in most cases but turned out to be problematic in this current situation. After addressing all this, I was finally able to have all sides start from the exact same state. And it does fix the issues I had observed: games stay in perfect sync, items in Mario Kart will always pull the same item on each side, the AI players will stay in sync, etc...

This does have a bit of the same problem as sending ROMs around, though: melonDS savestates tend to be ~18MB in size. So, definitely, compression will come in handy here. I also want to look into other ways to optimize this: enet (the network library we use for this) isn't well suited for transferring large amounts of data like that, so it's slow.

But, overall, at this point we have something close to a viable netplay implementation. As I said, there's still a lot of work to turn this into a finished product, but so far it's looking pretty promising.

For one, the current savestate system uses files, which isn't ideal in this situation. We've had the idea of changing it to use memory buffers for a while, because the way it works (lots of small fread/fwrite calls) isn't ideal on some platforms (like the Switch). JesseTG is working on it, and this change will come in handy here, so I'm waiting for it. For the sake of testing, I circumvented this limitation in a pretty gross way, but... yeah.

Then there's a lot of UI work to be done. Integrating all this into the user interface properly, making things configurable instead of being hardcoded, making everything clear and intuitive, handling problematic situations gracefully... And, of course, I need to add support for headless melonDS instances, so other players can stay hidden from your view, replicating the true DS multiplayer experience.

There's also a bunch of performance testing and tuning to be done. During my testing, I observed some hiccups, but it's also worth noting that my computers are like a decade old, so they're not the best hardware around. I also haven't had the chance to test this over the internet, so I want to see how it performs in these situations.
The netplay saga, ep 2 -- by Arisotura Mon, 27 Mar 2023 11:23:52 +0000
If you remember the graph from the previous post:

Implementing this proves to be tricky: since each individual instance there is its own process, there are a lot of moving parts. So first, we're going to name them.

Assuming player 1 is the player who initiated the game: player 1's instance 1 acts as the game host, while player 2's instance 2 and player 3's instance 3 act as game clients. The game host transmits useful information to the game clients, tells them when to start running, ...

Then player 1's instance 1 acts as a mirror host: players 2 and 3's instance 1, the mirror clients, connect to it, and receive their input from it, thus mirroring player 1's input on players 2 and 3's machines. Similarly, player 2's instance 2 and player 3's instance 3 are also mirror hosts.
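The naming above boils down to a small rule, which might be written down like this. The function names are made up for the sketch. Players and instances are numbered from 1 as in the description, and player 1 is assumed to be the one who initiated the game.

```cpp
// Player and instance numbers both start at 1, as in the description above.
enum class MirrorRole { Host, Client };

// On a given player's machine, the instance with that player's own number
// is driven by local input (mirror host); the others replay remote input.
constexpr MirrorRole RoleOf(int localPlayer, int instance)
{
    return (instance == localPlayer) ? MirrorRole::Host : MirrorRole::Client;
}

// A mirror client for instance N takes its input from player N's machine.
constexpr int InputSourcePlayer(int instance)
{
    return instance;
}

// Assumption: player 1 initiated the game, so their instance 1 is the game host.
constexpr bool IsGameHost(int localPlayer, int instance)
{
    return localPlayer == 1 && instance == 1;
}
```

So on player 2's machine, instance 2 is a mirror host, while instances 1 and 3 are mirror clients fed by players 1 and 3 respectively.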

As is typical with netplay implementations, inputs are delayed by a fixed amount, which is hardcoded to 4 frames in the current test branch, but will be configurable in the final product. The basic idea is to delay inputs a bit on all sides to counter network lag.

Each input frame sent by a mirror host is given a frame count, which lets mirror clients make sure to apply that frame at the exact same time as their host, thus ensuring all sides are given the exact same inputs. If a mirror client runs out of input frames (because the mirror host is running slower), it will need to block until it receives input frames -- missing an input frame would cause a desync.
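Here is a minimal sketch of that frame-stamped input scheme, leaving out the networking entirely. InputFrame, StampInput and InputQueue are names invented for the example. The 4-frame delay is the value mentioned above, and the key mask layout is left unspecified.

```cpp
#include <cstdint>
#include <map>

constexpr std::uint32_t kInputDelay = 4; // frames, as in the test branch

struct InputFrame
{
    std::uint32_t frame; // frame on which the input must be applied
    std::uint32_t keys;  // key mask (layout is not specified here)
};

// Mirror host side: stamp the current input with a frame in the future.
inline InputFrame StampInput(std::uint32_t currentFrame, std::uint32_t keys)
{
    return InputFrame{ currentFrame + kInputDelay, keys };
}

// Mirror client side: holds received inputs until their frame comes up.
class InputQueue
{
public:
    void Push(const InputFrame& in) { pending[in.frame] = in.keys; }

    // Returns true and fills keysOut if the input for this frame has arrived.
    // On false, the caller has to wait rather than run the frame without it.
    bool Take(std::uint32_t frame, std::uint32_t& keysOut)
    {
        auto it = pending.find(frame);
        if (it == pending.end()) return false;
        keysOut = it->second;
        pending.erase(it);
        return true;
    }

private:
    std::map<std::uint32_t, std::uint32_t> pending;
};
```

The mirror host applies its own input with the same delay, so both sides see a given key press on the same frame number. A client that gets false from Take blocks until the missing frame arrives, which is the "block instead of desync" behavior described above.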

During a local multiplayer game, this has proven to be enough to form a somewhat viable netplay implementation: this crude synchronization mechanism, combined with local multiplayer sync, does a good job at keeping all instances in sync. But when the players aren't engaging in a local multiplayer game yet, there is the possibility that mirror clients run too slow and end up lagging behind an awful lot.

My basic idea for dealing with this was to have mirror clients report their frame count, and the mirror host then waits for them to catch up if any of them is more than 16 frames behind. This mechanism doesn't have to be tight, but just enough to keep things reasonably synced up when local multiplayer sync isn't doing the job.

Except it didn't work. It always caused big fat lag spikes when starting the game. I feared I was running into a deadlock situation. I thought again about how my sync mechanism worked, checked my code, and, well...

               if (clientframes < (NDS::NumFrames - 16))
               {
                   event.peer->data = (void*)1;
                   block = true;
               }

Basically, checking if the mirror client's frame count (clientframes) is less than the current frame count (NDS::NumFrames) minus 16. Innocuous code, huh?

clientframes and NDS::NumFrames are unsigned integers. This means that if NDS::NumFrames is less than 16, (NDS::NumFrames - 16) wraps around to a large positive number, causing erroneous blocking.


Casting to signed integers fixed the issue. So as of now, the whole sync mechanism seems to be behaving as expected. This will need more serious testing, though. I have only tested with two players, because I only have so many viable computers.
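To make the pitfall concrete, here is a small standalone sketch of both versions of the check. The function names and kMaxLagFrames are made up for the example, and the signed version is only one way of doing the cast, not necessarily the exact fix that went into the branch.

```cpp
#include <cstdint>

constexpr std::uint32_t kMaxLagFrames = 16;

// Buggy form: with unsigned operands, hostFrames - 16 wraps around
// whenever hostFrames < 16, so almost any client looks "behind".
constexpr bool ClientTooFarBehindBuggy(std::uint32_t clientFrames,
                                       std::uint32_t hostFrames)
{
    return clientFrames < (hostFrames - kMaxLagFrames);
}

// Signed form: widen to a signed type before subtracting, so the
// threshold can legitimately go negative during the first 16 frames.
constexpr bool ClientTooFarBehind(std::uint32_t clientFrames,
                                  std::uint32_t hostFrames)
{
    return static_cast<std::int64_t>(clientFrames)
         < static_cast<std::int64_t>(hostFrames)
           - static_cast<std::int64_t>(kMaxLagFrames);
}
```

With the host at frame 5 and a client at frame 3, the buggy version compares 3 against 4294967285 and reports the client as lagging, which matches the lag spikes at startup. Another option would be to move the constant to the other side (clientFrames + 16 < hostFrames), which avoids the subtraction altogether.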

This is still far from being a viable netplay implementation, though. Because for this to work reliably, we need to ensure that mirror clients maintain the same state as their mirror hosts.

As of now, they are fed the same inputs, but the initial state isn't guaranteed to be the same. I observed this in Mario Kart, for example: each side may pull different items from boxes, AI players will behave differently, etc. In my test branch, I hardcoded the RTC to a fixed time, but we will need to adopt a solution for it. But I believe Mario Kart may be initializing its RNG from other sources, like firmware configuration data or save data, so we need to ensure that these are properly synchronized when starting a game.

And even then, there is another thing I'm worried about: uncertainty in the local multiplayer comm resulting in slightly different state on each side. We will have to see whether this can be a problem, and if so, how we might address it.

There is also a lot to be done on the user interface side of things. Failing gracefully when something bad happens, presenting an interface that is user-friendly, and so on.

Speaking of, I'm open to suggestions and input from end users regarding this feature. If you have any ideas, I opened a thread for them: right here.
The local multiplayer saga, season 2: Netplay -- by Arisotura Fri, 24 Mar 2023 10:44:26 +0000

My first goal was to fix the two annoying issues I described in the previous post. In this situation, I was trying to get the pause command to simultaneously pause all local melonDS instances, instead of just the one that received the command. There is more to be done in the way of cross-instance sync, but this seemed an obvious starting point to me.

The first issue was due to the way the interface works. Originally, the only way to pause melonDS was through the interface (System->Pause). Later on, the pause hotkey was added. Hotkeys are checked and handled in the emu thread (separate from the UI thread), so to keep things simple, the pause hotkey would just send a signal to the main window which would behave like using the System->Pause menu command. It's a bit of a roundabout way to handle this, but it has the advantages that it avoids duplicating code too much, and keeps the UI state (the Pause checkmark) in sync without having to worry about it.

When I started adding cross-instance pause, I made the pause command handler send a message to other melonDS instances through IPC. Then the other instances would receive that message and treat it the same as pressing the pause hotkey. Easy peasy.

Yeah, except doing so would cause these instances to send more pause messages, essentially entering a feedback loop.

So I had to add a separate handler for the IPC pause command to avoid this. Not the best solution, but it works.
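The shape of that fix, as I'd sketch it: keep one handler for pauses that originate locally (which notifies the other instances) and another for pauses that arrive over IPC (which doesn't). PauseController and its callback are hypothetical names. The real code goes through the main window and the IPC layer rather than a std::function.

```cpp
#include <functional>
#include <utility>

class PauseController
{
public:
    // sendToOthers is a stand-in for "send a pause message to the other instances".
    explicit PauseController(std::function<void(bool)> sendToOthers)
        : broadcast(std::move(sendToOthers)) {}

    // Local origin (menu command or hotkey): apply, then tell the others.
    void OnLocalPause(bool pause)
    {
        Apply(pause);
        broadcast(pause);
    }

    // Remote origin (IPC message): apply only, never re-broadcast.
    void OnIpcPause(bool pause)
    {
        Apply(pause);
    }

    bool IsPaused() const { return paused; }

private:
    void Apply(bool pause) { paused = pause; }

    std::function<void(bool)> broadcast;
    bool paused = false;
};
```

Since OnIpcPause never calls the broadcast callback, a pause message can't bounce back and forth between instances.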

Next problem was that during a local multiplayer game, cross-instance pause would interfere with the local multiplayer sync system, and could essentially cause some instances to get stuck. To deal with this, I had to add some more intelligence to the IPC comm layer to avoid waiting on instances that are paused. And it does the trick. Pausing a local multiplayer game may cause minor packet loss, due to the way this works, but I haven't seen any problems in my testing -- Nintendo's local multiplayer protocol is resilient, so this should be mostly fine.

There is more state that should be shared across melonDS instances, like the recent ROM menu. We'll get there. Now that the system is in place for cross-instance comm, it shouldn't be very difficult.

But for now, I want to build the base for netplay. So let's talk about this.

Due to the way local multiplayer games work on the DS, this netplay implementation is going to be somewhat different. The main thing to take into account is that, when I say the local multiplayer protocol is resilient, it can deal with missing packets, but due to the way it works, packets can't be received late -- they're either received on time or not received. This is the main reason why it would not be feasible to extend this protocol over the network.

So instead, the entire local multiplayer network has to be emulated on each player's computer. For example, consider this case of a 3-player game:

Here, all 3 instances are running on player 1's computer, but player 1 only sees their corresponding instance -- the other players' instances will be kept hidden. Player 1 controls their instance, and their inputs are forwarded to the corresponding instances on player 2 and 3's computers. Similarly, instances 2 and 3 on player 1's computer receive inputs from players 2 and 3 and mirror them. Same deal for every player. Since a given instance can only be controlled by one player (just like how you'd have one player per DS in a real-world setting), the communication is mostly unidirectional. Then, since the local multiplayer comm layer does the job of keeping local instances in sync, we only have to worry about keeping players in sync with each other.

This has the downside that running multiple melonDS instances requires more powerful hardware, but we're confident that modern hardware can handle this.

So I'm currently working on building the basic netplay system. So far, I'm able to forward inputs to another instance over the network, but I have yet to structure this properly for a local multiplayer game.

Stay tuned!
Status update -- by Arisotura Wed, 08 Mar 2023 19:36:20 +0000
Real life wise, I have finished the diagnosis process, so I'm now diagnosed with ADHD. I'm also able to get medicated for this; we'll have to see how this goes, but so far it's been a night and day difference for me. It's much easier for me to get myself to start tasks and stick to them. I'm impressed at how productive I have been at work these days. And so far I'm not seeing any adverse effects.

I hope this improvement can also help me get things done for melonDS or my other personal projects, but right now it's a tad complicated. I'm moving to a new, better apartment, and while this is a great thing, it also means that there are still a lot of things to take care of, and that tends to be energy consuming for me.

But hopefully, by the end of March this should all settle down and allow me to take a big sigh of relief. In the meantime I'm trying to think of how to address the problems I'm facing regarding melonDS. Netplay is really something I want to get going.
Slow days -- by Arisotura Mon, 30 Jan 2023 20:21:44 +0000
On my side, that certainly didn't help. I've been taking a new antidepressant, and the first month has been rough. Generally, feeling tired and lethargic, but also having to deal with insomnia, side effects from sleeping pills... you get the picture.

The side effects have largely subsided now, I'm able to sleep without having to take anything, I'm more energetic in general, and feeling good. I have had bad experiences with antidepressants in the past, so I'm glad that this one is working.

Now I'm looking forward to my ADHD diagnosis appointment, which will be next week. Hopefully they can help me there, it sure would be nice if I didn't take forever to motivate myself to do things (including, but not limited to, working on melonDS).

In the meantime, what I'm currently laying out is a base for proper communication between melonDS instances. For example, pausing all instances simultaneously, starting new instances with the right game pre-loaded, and so on. This is going to be a must for netplay.
Lil' site updates -- by Arisotura Tue, 13 Dec 2022 20:43:22 +0000
Anyway, little update to this site. I changed the way IP bans work: they no longer block you from the entire site, instead they will just restrict your ability to post comments.

I will likely also do some other updates and cleanup related to this. So let me know if anything is broken or if you're erroneously banned from somewhere.