melonDS RSS The latest news on melonDS. LAN multiplayer: in the works -- by Arisotura Fri, 08 Sep 2023 10:33:14 +0000
Anyway, I've been throwing together a basic interface for LAN multiplayer. It's far from complete, but it's already showing promising results so far.

The issue is that it requires a low enough latency to work well. When connecting my two computers over a crosswired cable, yielding a latency of ~1ms, I had decent success connecting the two sides together. However, when using the regular network (my laptop connects over wifi, while the other computer uses a PLC adapter), latency was all over the place and it was impossible to get a local multiplayer connection going.

This isn't too different from the old socket-based network interface, really. Except that interface performed even worse, because it was really just tossing packets over the network with zero care.

This new LAN mode will function more like typical LAN games: you get a host starting a LAN game, then everyone else connects to the host using their local IP (or their computer's host name). Then you would launch your game and connect to other players like you would on an actual DS.

There's still some more work to be done before this can be called finished, mostly UI work but also other tidbits. I'll keep you informed.]]>
Sorry for the silence lately -- by Arisotura Sun, 27 Aug 2023 12:48:58 +0000
"Wow, has it really been one year?"

I'm sorry about this. It's not that we, melonDS team, don't care about user contributions. We do, and we're grateful for them. However it takes time to review them and make sure they're mergeable.

And while I can't speak for the rest of the team, the way I am doesn't help. I don't really notice the passage of time, anything in the past just feels like it happened ages ago, and it's very easy for me to let myself be carried into a comfortable routine for a long time. As you can imagine, it's a problem for certain daily life things. It also means that I end up postponing some things a lot, when I don't immediately feel like dealing with them and there's no urgency, I end up just thinking that "I'll do it later when I feel like it", let time pass, let routine carry me, and... "wow, has it really been one year?". Things like dealing with pull requests definitely fall into this pattern.

Lately, side effects from meds haven't been helping either. I take ritalin for ADHD, and venlafaxine for depression, and the two don't go together well for me. The latest dose increases (in July) led me to a situation where I had bad side effects: absurdly dry mouth, headaches and light sensitivity, bad sleep (waking up every hour or so), and intense fatigue, entirely negating ritalin's benefits.

I stopped taking ritalin and things quickly got a lot better. Although this also means I'm back to my usual ADHD self, in terms of focus and productivity: I don't really control what my brain focuses on, and it can take a long while to get myself to do things.

I won't try taking ritalin again until I'm off venlafaxine. I'm tapering down that; I get more tired some days but overall it's going well. I think venlafaxine has done its job, I also feel a lot better about myself and the world than even one year ago, so I'm pretty confident I can do this. I also want to go out, meet people, do things (all of which are very good ways to keep depression at bay), but it doesn't help when meds leave me too tired to do much at all. So yeah.

From a coding perspective: I want to attempt getting a proper LAN mode going for melonDS. Although lately my brain has been wanting me to focus on something else: a fun side project adventure that involves DSi DSP programming. It's not directly related to melonDS, but it may end up benefitting the DS/DSi scene either way, so hey. I'd want to talk more about it, if it does pan out, but that'd go on another blog.]]>
Optimizing wifi, take two -- by Arisotura Mon, 31 Jul 2023 08:41:43 +0000
The basic idea is that games send multiplayer data at a generally steady rate and the timings are predictable. When the host sends a CMD frame, generally there won't be anything else being sent (other than replies and ack) until the end of the CMD_COUNT window, unless some clients failed to reply. The only exception to this is when the host sends a beacon frame, but since these are sent at a fixed interval, it's not hard to take them into account.

I had to do a few tweaks to get it reliable in certain situations, but so far it gives quite a substantial speed boost and doesn't seem to impact reliability. We still have to see how this goes -- we might have to make it optional if it causes problems somewhere.

I'm also trying to fix some other problems, namely how in some games you can't connect more than two players. I have a bit of an idea what's going on, but I'll need to work on it more.

We're also thinking of bringing back LAN, but as a proper LAN mode instead of just tossing packets around the network. It may even possible to build something robust around enet. I'll have to give this a try.]]>
Trying to optimize LLE wifi sync... -- by Arisotura Fri, 21 Jul 2023 22:41:59 +0000
Basically, if you remember your local multiplayer 101, this is how it goes:

The CMD frame isn't sent immediately, there is always a variable delay before. In a real world setting, you'll want to ensure your wifi channel isn't being used before sending stuff, so that's what the delay is. However, the CMD frame and subsequent reply frames use the 802.11 duration field to reserve the needed timeslot for the entire CMD exchange, so that everything can be pretty tight.

Since version 0.9.5, melonDS uses a synchronization mechanism to ensure this works smoothly. It is actually fairly simple: each frame that is sent has a timestamp attached to it, that is used to keep things in sync. Clients may not run ahead of the host -- when they receive a frame from the host, they know how much they need to catch up, when to start actually receiving that frame, and in the case of a CMD frame, how long they can run ahead before waiting for another frame. And the host discards stale reply frames to prevent itself from running too far ahead. All fine and dandy.

However, I observed that local multiplayer exchanges aren't as tight as I thought before. In most games, the host only sends a CMD frame once per video frame (that is, once every ~16.67ms). Pictochat is a bit different -- it does send CMD frames every ~4ms, but it also does that thing where it polls 3 or 4 clients per frame in a rolling order, so it amounts to a pretty similar thing anyway.

This means that in our case, with the current sync mechanism, clients will spend a whole lot of time waiting for a CMD frame, then having to catch up a whole lot while the host is waiting for reply frames. From a performance standpoint, this is far from ideal -- it largely negates the benefits of multi-core CPUs.

I came up with two ways to work around this, and experimented with them, but so far they both have issues.

The first way would be to abuse the pretty generous CMDCOUNT window games use when sending a CMD frame. I thought that in theory it should be possible to send the CMD frame to clients early, and even have the clients send their reply early as soon as possible, and delay the actual emulation of the CMD/reply/ack exchange until the end of the CMDCOUNT window, and thus things would run more smoothly.

In practice, however, there's a limit as to how far this can be taken. In NSMB MvsL, the CMD/reply/ack exchange needs to occur before the next VBlank or the game will slow down. Other games, like Mario Kart, exhibit stability issues like random disconnects.

In a way, I guess this makes sense -- delaying CMD frames a lot kinda tells the game that the wifi network is really busy.

The second way would be to let clients run freely for a given amount of time after the end of a CMD/reply/ack exchange, so that when the next CMD frame is received, there's much less catching up to do.

But this has its own set of issues too: it requires being able to predict when the next CMD frame will be sent. In general, CMD frames are sent at a fairly predictable interval, but for whatever reason a CMD frame may be sent earlier than expected (for example, if a retransmit is required). In such a situation, what do we do if the client has already run past the point that frame was supposed to be received? This ends up causing cascading problems no matter how you deal with it.

Working out the exact circumstances that may cause a CMD retransmit could make this a viable path, but I'm still not too confident about it.

In the end, I don't know. Maybe HLE is going to be better for this, I don't know.]]>
HLE, wifi, netplay: recent developments and ideas -- by Arisotura Thu, 13 Jul 2023 23:52:55 +0000
However I've also been able to do some melonDS-related work.

I started implementing wifi in melonHLE. So far, I got to the point that there's some data exchange going on, but the connection isn't working due to a timing problem -- I need to rework when I send data frames and how to keep all the melonDS instances in sync.

It's too early to tell, but if I can get it working, it may be a good base for local multiplayer and netplay on lower-end platforms, as I said before. On one hand, the way wifi works at a high level seems to provide enough leeway that I could get away with somewhat lax sync, and thus better performance. On the other hand, I don't have it working yet.

The current implementation is also a bit of a pile of hacks. I will need to clean it up and implement a bunch of the more minor details. I'm also not very confident that the compatibility rate will be on par with the tried and true LLE approach, but the only way to find out is to try out.

There are still several more general issues that limit melonHLE's general compatibility. While testing some other games, I found out that there are atleast 3 different versions of the sound module, more recent games (like Picross 3D) have a different power management module, and there are games that get stuck due to missing functionality. And that's without even getting into DSi stuff. Ideally, I'd want to index all the ARM7 binary versions out there to have a better idea how viable melonHLE may be.

Regardless, this work will likely prove useful for future melonDS/Dolphin interop. As you may already know, some Wii games are able to connect to a DS to provide extra features. This connection uses the same protocol as DS local multiplayer. On the Wii, the functionality is exposed through an API similar to what DS games use at a high level. Dolphin, being HLE, deals with high-level service calls, while regular LLE melonDS has to emulate the wifi hardware and deal with the 802.11 protocol in all their complexity. So obviously some work will be needed on Dolphin's side in order to make it communicate with melonDS.

Also, while I was brainstorming how to efficiently synchronize melonDS instances in HLE wifi, I had an idea for LLE wifi. I will need to try it out, but if it works, it may drastically loosen up the synchronization requirements for local multiplayer (although the gain will likely depend on how many players are connected).

If any of this pans out, we might be able to bring back the ability to do local multiplayer over LAN (I mean, without using the netplay system). I don't know how well it would work in practice -- I remind you that the current IPC comm layer has a bunch of extra smarts to avoid causing lag by unwarranted blocking, and I'm not confident these can be reliably replicated over the network.

Stay tuned!]]>
melonHLE, facts and ideas -- by Arisotura Tue, 06 Jun 2023 09:49:33 +0000
I continued my work on melonHLE, taking it to a point where it may be something serious. The compatibility rate seemed good, even though some games don't run because some auxiliary services aren't completely implemented. I've had some fun reverse-engineering the sound engines and implementing them, with decent results.

So I did a quick performance comparison:

The tests were done on my laptop Crepe (Core i7-5500U, 2.4GHz). Numbers are an average measure of frames per second.

melonHLE shows to be faster, but it's not mindblowing either. However, keep in mind that this is largely a quick and dirty experiment. There are some simple ways to make melonHLE faster, one of which is increasing the maximum CPU time slice (kMaxIterationCycles). The current value of 64 is chosen to keep the ARM9 and ARM7 somewhat in sync, but obviously, in melonHLE we don't need to keep the ARM7 in sync. A much bigger kMaxIterationCycles value increases performance to some extent and has no downsides.

Regardless, melonHLE may prove a viable option for lower-end platforms. In the end, it might be integrated into melonDS as an option, though it needs more work and testing.

Now, you may ask, how does any of this relate to the netplay saga?

This joins the general idea of optimizing melonDS for lower-end platforms. For example, Generic is trying to optimize melonDS for the Switch, and more particularly trying to optimize the whole 3D graphics pipeline. Full 3D games are more demanding, so this is a worthy optimization target.

Optimizing melonDS will also benefit local multiplayer and netplay, seeing as these require running multiple melonDS instances at the same time. I want to make it accessible to a broad audience and not just those with the absolute bestest computers, especially in these troubled times of chip shortages and such.

Which brings me to another point: HLE wifi. So far, I haven't done any work towards it, but I'm tempted to give it a try. The wifi service may be high-level enough to be emulated separately from melonDS's current wifi implementation, and with less complexity and more lax synchronization, but the only way to know is to give it a try. If it works out, it might make local multiplayer possible with lower performance requirements.

Stay tuned!]]>
melonHLE? -- by Arisotura Tue, 09 May 2023 07:52:25 +0000
All the existing DS emulators, as far as I'm aware, are essentially LLE. DS games are mostly self-contained and run on the bare metal, relying on the small BIOSes for basic functions like interrupt waits, decompression, etc.

Some emulators, like DeSmuME, are able to HLE the BIOS calls, basically replicating them inside the emulator. The main advantage to this is that the emulator doesn't require a proper BIOS dump to run games, but there is no other real benefit from this. BIOS calls aren't critical enough that HLEing them might boost performance significantly.

What I've been experimenting with melonHLE goes further: HLEing the ARM7.

It may seem feasible if you consider that Nintendo never allowed game developers to write their own ARM7 binaries. This means that, in theory, all commercial games out there will have one of the few possible ARM7 binary versions. It also means that the ARM7 is limited to taking care of utility tasks, while all the game logic is running on the ARM9.

In practice, how does it work?

The ARM9 communicates with the ARM7 via the IPC hardware (IPCSYNC and the IPC FIFO), and some shared memory areas. When the game boots, there is a IPCSYNC handshake, then the ARM7 exposes a bunch of services that are accessed via the IPC FIFO. The services serve to provide access to the ARM7-side hardware: sound, wifi, touchscreen controller, PMIC, firmware memory, etc. Most of these services are fairly simple, with sound and wifi being by far the most complex ones.

So I've been experimenting with this in a private repo. So far, I've implemented enough of the utility services to get some games to boot, and observe a few things:

* There is a substantial speed gain from HLEing the ARM7. If this proves to be viable in the long run (despite the problems I will get to later), it may be an option for low-end platforms.

* Even if not, I still find it quite interesting to reverse-engineer the ARM7 binary and figure out how things work.

* Super Mario 64 DS has an earlier version of the sound engine, where some of the commands are different.

* Super Princess Peach has a completely different sound engine.

* Mario Kart DS's ARM7 binary has an extra service, which is used to assist loading code to ARM7 VRAM. Nothing we really need to worry about here.

* Aside from that, the smaller utility services seem to be pretty identical across games. However, I haven't started looking into wifi.

Now, the problems to this approach:

* Obviously, this only works for commercial games. It may be possible to support most homebrew by implementing libnds's default ARM7 binary, but anything with a custom ARM7 binary won't work.

* It is far less accurate than the current LLE approach. Given the ARM7's average workload, it may not matter to most games, but there's still the potential for timing issues (which stem from bad game programming).

* I have tried a few games, but I don't know how much variation there is across the entire DS library, and how I should go about identifying the different possible ARM7 binaries. This is going to be the main determining factor for whether ARM7 HLE is viable: how much code complexity is required to attain decent compatibility?

Also, I might want to avoid getting sidetracked too much. I want to finish implementing netplay, once my apartment is less of a mess.]]>
I've awoken from my slumber -- by Generic aka RSDuck Sat, 22 Apr 2023 00:02:00 +0000 previous post on it. If you don't know much about it the compute shader renderer then I recommend checking that post out.

After more or less completing it for Switch (the port desparately needs an update, it will come, I promise), I didn't really touch the code much. Over the last couple of weeks this finally changed.

The renderer had to be ported from Switch's homebrew GPU API deko3D to OpenGL, which fortunately wasn't that hard, because A. most of the complexity lies within the shader there is not that much buffer jougling and B. Nvidia GPUs (or atleast Maxwell) being somewhat of a OpenGL hardware implementation.

But let's come to the main attraction, besides some fixes, high resolution rendering is finally implemented for it. And it works wonderfully, with far fewer or no artefacts compared to the classic OpenGL renderer. And even on my integrated Intel UHD 620 I can reach up to 3x-4x resolution depending on the game.

With local wireless there is now another reason you might want to use it over the software renderer. If you are short on CPU cores for all the melonDS instances you can offload the rasterisation onto the GPU.

There are still a few things left to do. For some reason the shaders (which are all compiled on startup, so no stuttering while playing) seem to compile quite slowly on Windows for Intel and Nvidia GPUs. Bizzarely this seems to be related to the very large SSBOs, atleast reducing their size seems to lead to speed up. So my plan is to replace the large buffers which scale proportionally to the resolution with ones which have unspecified size or image load and store. If I had to guess the driver performs the layout calculation somehow for every array entry. In case I don't get the compile times low enough, I need to implement a shader binary cache.

The outlines generated through edge marking (e.g. used by the Zelda games) are always only pixel thick, which quickly becomes very thin for higher resolutions. Thus I want to add an option to counteract that (I am still not exactly sure how to do it.

Another issue that currently the compute shader renderer isn't integrated into the GUI at all, it currently just replaces the OpenGL renderer.

And like always there is still some clean up to be done in the code. As a last note, the compute shader renderer already uses a texture cache (which as part of this clean up should also be used by the OpenGL renderer). Implementing texture replacement on top of that is not hard and is on my list as well, but one step after the other.

And yes, it allows you to play Pokemon in higher resolutions with no back lines.]]>
The netplay saga, ep 3.5 -- by Arisotura Mon, 17 Apr 2023 19:22:16 +0000
I'm having a mold problem in my apartment, so there won't be a lot of progress on netplay (or melonDS in general) while this is being dealt with. To give you an idea, I'm typing this from another place.

Regarding netplay: I'm not going to go with the idea of sending ROMs over. The other solutions suck from an end user perspective, but I don't want to deal with the legal grey area.

For the rest, we're waiting for JesseTG's pull request for in-memory savestates.]]>
The netplay saga, ep 3 -- by Arisotura Fri, 07 Apr 2023 07:58:53 +0000

The first problem is the ROMs themselves. The first iteration of netplay required that each side have the same ROM, but this has the problem that there can be multiple revisions of the same game, and some games (hi Pokémon) even support multiplayer interaction with different games. Requiring every player to have the exact same ROM feels really restrictive, especially compared to a real-life DS multiplayer session, where each player has their own game cart (or doesn't, and uses download play).

Yet, we do need to ensure that every mirror client is using the exact same ROM as their mirror host. Two solutions: either having mirror hosts send their ROM to their mirror clients, or requiring all ROMs to already be present on all sides.

From an end user perspective, I don't like the second solution. It may require users to deal with complex multi-ROM setups, making sure they load everything in the right place; there's quite the potential for things to go wrong, or just for users to be confused.

So I went and experimented with the first solution. While it keeps things simple, it has the downside that transferring DS ROMs takes a while, due to their average size of 64MB. But there are ways to alleviate this: compressing the transferred data, but also skipping the transfer entirely if all sides already have the exact same ROM (which we can verify with a simple CRC).

Keep in mind that none of this is set in stone, and I'm largely experimenting here. We are still pretty far from a finished product.

Next step is ensuring that the emulator state on boot is the same on each side. For this, I had the idea of using the savestate system: basically, have the mirror host take a savestate after the ROM is loaded, send that state over to mirror clients, have them apply it, and it's guaranteed that all sides start with the exact same state.

I ran into a few issues with this. First, the savestate system doesn't save the BIOS and firmware, because it wasn't deemed necessary at the time I designed it. But right now, it's a requirement if we want our mirror clients to have the same user settings, MAC address, etc... as their mirror host. I also ran into a bug in the savestate system itself, which isn't a problem in most cases but turned out to be problematic in this current situation. After addressing all this, I was finally able to have all sides start from the exact same state. And it does fix the issues I had observed: games stay in perfect sync, items in Mario Kart will always pull the same item on each side, the AI players will stay in sync, etc...

This does have a bit of the same problem as sending ROMs around, though: melonDS savestates tend to be ~18MB in size. So, definitely, compression will come in handy here. I also want to look into other ways to optimize this: enet (the network library we use for this) isn't well suited for transferring large amounts of data like that, so it's slow.

But, overall, at this point we have something close to a viable netplay implementation. As I said, there's still a lot of work to turn this into a finished product, but so far it's looking pretty promising.

For one, the current savestate system uses files, which isn't ideal in this situation. We've had the idea of changing it to use memory buffers for a while, because the way it works (lots of small fread/fwrite calls) isn't ideal on some platforms (like the Switch). JesseTG is working on it, and this change will come in handy here, so I'm waiting for it. For the sake of testing, I circumvented this limitation in a pretty gross way, but... yeah.

Then there's a lot of UI work to be done. Integrating all this into the user interface properly, making things configurable instead of being hardcoded, making everything clear and intuitive, handling problematic situations gracefully... And, of course, I need to add support for headless melonDS instances, so other players can stay hidden from your view, replicating the true DS multiplayer experience.

There's also a bunch of performance testing and tuning to be done. During my testing, I observed some hiccups, but it's also worth noting that my computers are like a decade old, so they're not the best hardware around. I also haven't had the chance to test this over the internet, so I want to see how it performs in these situations.]]>