melonDS RSS
The latest news on melonDS.

Little update on what's going on -- by Arisotura Mon, 18 Oct 2021 00:17:04 +0000
The focus for the next release is basically on making melonDS less obtuse to the average user. One of the biggest parts of this is removing the requirement for external BIOS and firmware files.

We started the work in that direction by merging some relevant pull requests, making melonDS able to fall back to the DraStic FreeBIOS and a generated firmware for DS mode. There are still some issues we need to take care of in one way or another, namely:

* We need to either force direct boot, or include some small bootloader inside the replacement BIOS/firmware. The easiest route is to force direct boot.

* The original BIOS includes the Blowfish key required to decrypt the secure area in encrypted ROMs. Lacking the key would make it impossible to support encrypted ROMs, but including it is muddy waters from a legal standpoint.

* A notable exception is that some of the encrypted ROMs, like those found in VC titles, contain the data for Blowfish crypto. It could be possible to make melonDS take advantage of it, but not all ROM dumps out there contain that (it's not normally readable in retail cartridges).

All in all, it seems pretty manageable on the DS side. Things get more rocky on the DSi side:

* As the NUS is still online, it is in theory possible to build a DSi NAND from scratch, though it needs to be studied closer. Some of the contents, like font files, would have to be replicated in free variants. The NAND bootloader would not be terribly difficult to replicate.

* There are several more crypto keys found in the DSi BIOS. While it is still possible to load DSi ROMs without requiring any key from the BIOS (provided the secure area is decrypted, which it generally is), I'm not sure about things like DSiWare.

* There is also the fun part of replicating the DSi BIOS. It has more SWI functions (namely, SHA1 and RSA functions, how fun) and also some fun bugs and shit.

It's nothing insurmountable, but it needs more work.

Another part I'm currently working on is making melonDS's DLDI support less obtuse. For now, DLDI support requires the user to provide a raw SD card image, which isn't very convenient.

Well, you might have read about my latest fatfs adventures. I decided to take this further: basically coding a system that will take a given folder on the host filesystem, create a FAT image from that, and then keep the two in sync. For now, it seems to work reasonably well, but it's going to need some intensive testing before being merged.

There are also some tidbits left: integrating this feature into the UI, making it configurable without being overly complex, ensuring the ROM being loaded is present in the DLDI volume and passing its path to libnds's argv system, adding a read-only mode that will preserve the host folder from any modification, ...

We're working on all these fun features, but it takes time. Especially if you're anything like me, dealing with ADHD (it took me over one hour to write this post). I'm also dealing with some minor real life shito, finalizing details of my transition, social programs to help me find a job adapted to me, etc... nothing bad, though.
Sometimes issues are simple... sometimes not -- by Arisotura Sun, 05 Sep 2021 14:39:52 +0000
However, the road to DSi emulation is paved with all sorts of challenges. One example of a fun issue that had been reported a while ago: the DSi menu would freeze after the health/safety screen if any pictures were stored that could be displayed on the top screen. The issue was another unimplemented AES feature, and was fixed in melonDS 0.9.3.

Sometimes I wish all issues were this simple. I felt like looking at another of the known DSi-mode issues: the fact that we currently don't implement the RAM size register in SCFG_EXT9. The RAM size register is mainly used to restrict the accessible main RAM to 4MB before launching a DS game. In theory, not a very difficult thing to implement. In practice, however, there is an issue that kept us from enabling that feature: when it's enabled, the DSi launcher crashes when launching a DS game, while they would otherwise run fine (albeit with the full 16MB RAM instead of the 4MB they might expect).

As explained in GBAtek, nocash ran into the same issue:
SCFG_EXT9.bit14-15 affect the Main RAM mapping on <both> ARM9 and ARM7 side (that, at least AFTER games have been booted, however, there's a special case DURING boot process: For NDS games, the firmware switches to 4MB mode on ARM9 side, whilst ARM7 is still relocating memory from the 16MB area at the same time - unknown how that is working exactly, maybe ARM7 isn't affected by SCFG_EXT9 setting until ARM7 has configured/disabled its own SCFG_EXT7 register).
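
To make the register a bit more concrete, here is a minimal sketch of how bits 14-15 could be decoded and applied to a main RAM address. The value-to-size table is an assumption taken from GBAtek's description of SCFG_EXT9, and the function names and the simple mirroring are made up for illustration; this is not melonDS's code.

```python
# Sketch only: decode the main RAM limit field of SCFG_EXT9 (bits 14-15).
# The size table follows GBAtek's description and is an assumption here,
# as are all names in this block.

MB = 1024 * 1024
MAIN_RAM_BASE = 0x02000000

def main_ram_size(scfg_ext9: int) -> int:
    """Return the accessible main RAM size in bytes."""
    field = (scfg_ext9 >> 14) & 0x3
    # 0 and 1 select the 4MB DS layout, 2 the 16MB DSi layout,
    # 3 the 32MB debugger layout.
    return {0: 4 * MB, 1: 4 * MB, 2: 16 * MB, 3: 32 * MB}[field]

def main_ram_offset(addr: int, scfg_ext9: int) -> int:
    """Map a bus address in the main RAM region to an offset into RAM,
    mirroring by the currently selected size (assumed behaviour)."""
    return (addr - MAIN_RAM_BASE) & (main_ram_size(scfg_ext9) - 1)
```

Under these assumptions, an access to 0x02400000 lands at offset 0x400000 in 16MB mode but wraps around to offset 0 in 4MB mode, which is the kind of aliasing that makes the timing of the switch matter.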

The basic process of the loader is as follows: the ARM9 syncs with the ARM7 via IPCSYNC, then both CPUs run through lists of memory areas to copy or clear, then the ARM9 changes the main RAM size if required. However, while the ARM7 has a bunch of regions in main RAM to clear, the ARM9 is given empty copy/clear lists, and all it has to do is clear its DTCM, which is quickly done. As a result, the ARM9 changes the main RAM size while the ARM7 is still clearing regions, causing it to overwrite the ARM9's code, and you guess how this goes: kaboom.

Yet, the same code works fine on hardware.

I had already experimented with the RAM size register, to try and find out if there's anything fancy about it, but there's nothing special at all. The RAM size gets changed instantly on both sides, and there's nothing fancy about memory mapping either. Oh and the ARM9 caches are disabled when the loader is running, so they don't come into play here.

So I made a homebrew that reproduced the loader code: same ASM code, same memory regions, same everything. My first tests were to see if there was any kind of secret register altering main RAM mapping somehow, but there was none. Then, another test determined that, in fact, on hardware, the RAM size change isn't applied until the ARM7 has cleared all its memory regions.

We then added code to measure how long each side takes to complete its tasks, and it turns out that the ARM9 takes much longer than expected. The ARM9 code is running in main RAM, and the ARM7 has a bunch of main RAM regions to copy and clear: as EXMEMCNT is set to give priority over main RAM to the ARM7, the concurrent accesses are slowing down the ARM9. A lot.

This is some shitty news.

First of all, this is probably not an isolated case: the same sort of thing can also affect timing in games, although probably on a lesser scale.

Secondly, there is no way to correctly emulate this sort of thing without a cycle-accurate emulator. Given the current performance characteristics of DS emulation, and the sheer complexity of DS timings, cycle accuracy can be considered off-limits. In this particular case, the best we could do would be some kind of estimation for a cycle penalty if several concurrent main RAM accesses are detected within a given timeslice (that's the thing, there's no real way to determine whether they are actually concurrent, due to how we run things).
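
As a rough idea of what such an estimation could look like, here is a purely hypothetical sketch. The function name, the per-conflict cost and the pairing rule are all invented for illustration and have nothing to do with actual melonDS code or measured hardware numbers.

```python
# Hypothetical sketch: none of these names or constants come from melonDS.

def arm9_penalty(arm9_accesses: int, arm7_accesses: int,
                 cycles_per_conflict: int = 4) -> int:
    """Estimate extra ARM9 cycles for one timeslice.

    Both counts are main RAM accesses seen during the same timeslice.
    We cannot tell whether they really overlapped, so this pairs them
    one-to-one: each ARM9 access is delayed by at most one ARM7 access.
    """
    conflicts = min(arm9_accesses, arm7_accesses)
    return conflicts * cycles_per_conflict
```

If only one CPU touched main RAM during the timeslice, the penalty is zero; otherwise it grows with the smaller of the two access counts. Whether that comes anywhere near hardware behaviour would have to be tested.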

Oh well.
melonDS 0.9.3 is out! -- by Arisotura Wed, 01 Sep 2021 23:59:32 +0000
First of all, we fixed touchscreen input, it should now work as expected in all screen modes. We also added support for touchscreen devices (tablets etc).

On the emulation side, we added support for audio interpolation, as an optional emulation improvement. Depending on how good your game's samples are, you may see an improvement in audio quality. There are multiple interpolation types to choose from, so you can see which one you like best.

We also added a setting to optionally degrade the audio output to 10-bit, like the actual DS, for more authentic experience. This goes hand in hand with emulation of the SOUNDBIAS register, too. Emulating this register means nothing for the average game, but it could be used for cool tricks in homebrew. Hell, we even managed to make the DS play a song solely by regularly changing SOUNDBIAS.
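
For the curious, a minimal sketch of what a 10-bit degradation could look like. It assumes the output stage drops the low 6 bits of a signed 16-bit sample, adds the SOUNDBIAS level and clamps to 10 bits; the function names and the default bias of 0x200 are assumptions, not melonDS's actual code.

```python
# Sketch under assumptions: signed 16-bit mixer sample in, unsigned 10-bit
# level out, with a bias added before clamping. Names are made up.

def to_10bit(sample16: int, bias: int = 0x200) -> int:
    """Reduce a signed 16-bit sample to an unsigned 10-bit output level."""
    level = (sample16 >> 6) + bias    # drop the 6 low bits, then add the bias
    return max(0, min(0x3FF, level))  # clamp to the 10-bit range

def back_to_16bit(level10: int, bias: int = 0x200) -> int:
    """Re-expand a 10-bit level so it can be fed to a 16-bit audio backend."""
    return (level10 - bias) << 6
```

In this model, changing the bias moves the output level even when the mixer itself is silent, which is presumably what makes the SOUNDBIAS-only song possible.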

There have been several improvements to DSi mode too. Namely, touchscreen calibration is now automatically patched in DSi mode, eliminating the need for a recalibration. It is also possible to boot DSi games and homebrew directly now, although this feature is still experimental. Last but not least, the DSi title manager allows you to easily install your DSiWare titles to your emulated NAND.

A ROM info dialog has also been added under System -> ROM Info.

There's also the usual slew of bugfixes and other little additions, you can check the changelog for the full list.

For the eventual 1.0 release, we also want to make melonDS less obtuse all around: providing user-selectable paths for savefiles and such, BIOS/firmware substitutes at least for DS mode, less obtuse DLDI support, etc...


melonDS 0.9.3, Windows x64
melonDS 0.9.3, Linux x64
melonDS 0.9.3, Linux ARM64
melonDS 0.9.3, macOS x64
melonDS 0.9.3, macOS ARM64
melonDS 0.9.3, macOS universal
Buffing up DSi mode -- by Arisotura Tue, 24 Aug 2021 14:51:41 +0000
One of the recurring complaints is that, when running in DSi mode, touchscreen input is off, requiring the user to recalibrate the touchscreen, while this doesn't happen in DS mode.

Reason for that is that on the DS (and DSi), the touchscreen hardware doesn't return pixel coordinates, but raw digitizer readings. Calibration data is then used to convert these readings to pixel coordinates. Every touchscreen digitizer is going to have a slightly different range, which is why users have to calibrate their touchscreen.

melonDS makes up for that in a very simple fashion. It uses its own conversion, basically just multiplying the touchscreen pixel coordinates by 16 to make decent 'raw' coordinates. When booting, melonDS also patches the user's firmware data with its own adequate calibration data, so that no recalibration is required and the touchscreen Just Works(tm). Easy peasy.
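
The round trip can be sketched as follows. The multiply-by-16 step is from the description above; the two-point calibration layout, the names, and the choice of calibration points are assumptions made for illustration rather than the firmware's exact format.

```python
# Sketch of the idea, not melonDS's actual code. The two-point calibration
# layout and all names are assumptions.

def pixel_to_raw(px: int) -> int:
    """Fake a digitizer reading from a pixel coordinate (multiply by 16)."""
    return px << 4

def raw_to_pixel(raw: int, raw1: int, px1: int, raw2: int, px2: int) -> int:
    """Linear conversion through two calibration points (raw, pixel)."""
    return (raw - raw1) * (px2 - px1) // (raw2 - raw1) + px1

# Calibration data chosen to match the multiply-by-16 conversion above.
CAL_X = (pixel_to_raw(0), 0, pixel_to_raw(255), 255)
CAL_Y = (pixel_to_raw(0), 0, pixel_to_raw(191), 191)
```

With calibration data that matches the fake conversion, every pixel maps back to itself, so the emulated system never sees a reason to ask for recalibration.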

However, in DSi mode, it's another story. The old DS firmware data still exists, but the DSi system instead uses user settings files stored in the NAND. Thing is, it's less easy to access the data there: the NAND is mostly a FAT volume with an encryption layer. Not exactly trivial to deal with.

I was shown fatfs, which is basically a lightweight FAT driver. It is meant to be used to access storage media such as SD cards on embedded devices, however it is trivial to make it work on a FAT volume contained within an image file. I wrote code to do that with the DSi NAND, taking care of encryption transparently, and bam, I had a viable base for NAND manipulation.

I then wrote code to access the user settings files inside the NAND, and patch the touchscreen calibration data there. After taking care of all the details like SHA-1 hashes and whatnot, the initial issue was covered: the DSi-mode touchscreen Just Worked(tm), with no recalibration needed, just like its DS-mode counterpart.

With this proof of concept being a success, I took it further:

This neat little toy allows installing DSiWare to your NAND. It's also possible to delete titles you don't want anymore, and to import and export save data. However, a few notes about this:

* the title manager can only be used while emulation is not running, to avoid risks of NAND corruption
* only DSiWare can be imported (i.e., not cartridge ROMs)
* if you don't have a TMD file, it can be downloaded automatically from the NUS
* you have to provide your own DSiWare files, we can't help you obtain them
* if you're running a stock DSi menu, only DSiWare for your region will show up -- the current region-bypass hack only works for cart ROMs
* the importer requires a valid (augmented) dump of the DSi ARM7 BIOS

That's about it for now, but the inclusion of fatfs also opens other fun possibilities, like creating FAT images from given directories for things like DLDI or the DSi SD card.

Stay tuned!
Audio interpolation -- by Arisotura Thu, 22 Jul 2021 21:35:59 +0000
Anyway, audio interpolation is one of the emulation improvements that have been requested for melonDS. My general policy for emulation improvements is that they should allow for keeping the accurate code paths, and they shouldn't add too much complexity to the code. Audio interpolation is well within these bounds. Actually, I had implemented it in DeSmuME back then, and due to the way DeSmuME's mixer works, it was quickly done.

So I figured I would give it a try in melonDS.

The basic idea behind audio interpolation is to smooth out the audio samples as they're being upsampled. DS games may have downsampled audio to save on space and bandwidth, and the DS mixer doesn't perform any interpolation, which can lead to rough sounding samples. The reason the DS does no interpolation is most likely due to how its mixer hardware works, but obviously as an emulator we can ignore these constraints and do a better job.

It's also worth noting that, as far as melonDS is concerned, there are two parts we need to take care of: the DS mixer and the audio output.

In the DS, the mixer is driven by the system clock, like nearly everything else. If you ever coded for the DS, you might have wondered why the frequency registers for the audio channels are weird:

40004x8h - NDS7 - SOUNDxTMR - Sound Channel X Timer Register (W)

 Bit0-15  Timer Value, Sample frequency, timerval=-(33513982Hz/2)/freq

The PSG Duty Cycles are composed of eight "samples", and so, the frequency for Rectangular Wave is 1/8th of the selected sample frequency.
For PSG Noise, the noise frequency is equal to the sample frequency.

The SOUNDxTMR registers directly control the channel timers, which are driven at half the system clock. These work like the general purpose timers: they are incremented at half the system clock, and every time they overflow, they are reloaded to the SOUNDxTMR value and the channel advances to the next sample.
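
Here is a small sketch of the formula quoted above, going both ways: from a desired sample frequency to a SOUNDxTMR value, and back. The function names are made up, and rounding to the nearest timer period is an assumption about how one would pick the value.

```python
TIMER_CLOCK = 33513982 / 2  # channel timers run at half the system clock

def timer_reload(freq_hz: float) -> int:
    """SOUNDxTMR value for a sample frequency: timerval = -(clock/2)/freq,
    stored as a 16-bit two's complement number.
    Assumes the period fits in 16 bits (no range checking here)."""
    period = round(TIMER_CLOCK / freq_hz)  # timer ticks per sample
    return (0x10000 - period) & 0xFFFF

def sample_freq(reload: int) -> float:
    """Sample frequency actually produced by a given reload value."""
    return TIMER_CLOCK / (0x10000 - reload)
```

For example, asking for 32768 Hz gives a period of 511 ticks and a reload value of 0xFE01, which actually plays at about 32792.5 Hz. That granularity is why the frequencies look weird.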

This is a fairly simple and efficient design, but you can probably guess why it doesn't lend itself to interpolation. Basically, to get the sub-sample position you need for interpolation at any given time, you would need to subtract the reload value from the current timer value, then divide that by 0x10000 minus the reload value, which isn't convenient to implement in hardware.
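
In software, that sub-sample position is a one-liner. This sketch assumes the timer counts up from the reload value and overflows at 0x10000, as described above; the function name is made up.

```python
def subsample_position(timer: int, reload: int) -> float:
    """Fraction of the current sample period that has elapsed, in [0, 1).

    The timer counts up from `reload` and overflows at 0x10000.
    """
    return (timer - reload) / (0x10000 - reload)
```

The division by a value that changes with every SOUNDxTMR write is the part that would be awkward in hardware, while an emulator gets it for free.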

The mixer in melonDS works in a similar way, although it is only sample-accurate, for several reasons: sample accuracy is good enough for DS games, we don't know how the mixer operates on a per-cycle basis, and of course, performance reasons. To reach its sample rate of approximately 32.7 kHz, the DS needs to output one audio sample every 1024 system-clock cycles, and that is how often we run the mixer in melonDS. We have to be a bit smart about updating our channel timers, but it works well enough.

However, this design means the output sample rate of the melonDS core depends on how fast it's running. Basically, melonDS runs 560190 cycles per frame and outputs one audio sample every 1024 cycles, like the real thing. Assuming a framerate of 60 FPS (which is a bit faster than the real thing), this means an audio output rate of 32823.6328125 Hz.
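
The arithmetic, spelled out as a small sketch. The constants are the ones given above; the function name is made up.

```python
CYCLES_PER_FRAME = 560190   # system-clock cycles melonDS runs per frame
CYCLES_PER_SAMPLE = 1024    # one audio sample every 1024 cycles

def output_rate(fps: float) -> float:
    """Audio samples per second produced by the core at a given framerate."""
    return CYCLES_PER_FRAME * fps / CYCLES_PER_SAMPLE

print(output_rate(60))                    # rate when running at exactly 60 FPS
print(output_rate(33513982 / 560190))     # rate at the native framerate
```

At the native framerate (system clock divided by cycles per frame, a bit under 60 FPS), this works out to 33513982 / 1024, which is the roughly 32.7 kHz mentioned earlier.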

Well, yeah. Generally, you can't go and ask your audio library for a weird non-integer sample rate.

So what do we do, here? Well, early melonDS versions would just pick the closest integer sample rate, send out the audio output as-is, and pray. You guess, it didn't work that well. Not only was it impossible to attain perfect sync, but on some platforms we just could not get a sample rate of 32824 Hz.

Hence, a proper audio output stage was added. It lets us pick a more standard output rate of 48 kHz, lets the audio driver give us another sample rate if that one isn't available, then it resamples melonDS's audio output to match that output rate. The resampler also supports a small margin, which can make up for small variations in framerate.

This resampler would be another point of concern: currently, it upsamples audio with no interpolation, so there's room for improvement here too.
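
To illustrate what upsampling with no interpolation means, here is a minimal sketch that just repeats the most recent input sample. It is a generic nearest-sample resampler with made-up names, not the actual melonDS output stage, and it ignores the margin handling mentioned above.

```python
def resample_nearest(samples: list, in_rate: float, out_rate: float) -> list:
    """Resample by picking the most recent input sample (no interpolation)."""
    out_len = int(len(samples) * out_rate / in_rate)
    step = in_rate / out_rate  # input samples consumed per output sample
    last = len(samples) - 1
    return [samples[min(int(i * step), last)] for i in range(out_len)]
```

Going from 32823.6328125 Hz to 48000 Hz this way turns each input sample into one or two identical output samples, which produces the staircase shape that interpolation would smooth out.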

Anyway, I made a quick proof-of-concept in a separate branch. For now, it applies linear interpolation to all channels, and seems to work decently well. A few notes on this:

1. PSG channels are quite muffled. They should not be interpolated, but I'm partly tempted to keep that as a fun option.

2. Linear interpolation is the easiest but certainly not the best. I could implement better algorithms: cosine, cubic, gauss...

3. Of course, the feature would be made optional, and disabled by default.
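
For reference, here is what linear and cosine interpolation between two neighbouring samples look like, as a generic sketch. These are the textbook formulas with made-up function names, not the code from the branch; `t` is the sub-sample position between 0 and 1.

```python
import math

def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation between samples a and b, with t in [0, 1]."""
    return a + (b - a) * t

def cosine(a: float, b: float, t: float) -> float:
    """Cosine interpolation: same endpoints, but eased in and out."""
    t2 = (1.0 - math.cos(t * math.pi)) / 2.0
    return a + (b - a) * t2

def upsample(samples, factor, interp=lerp):
    """Insert factor-1 interpolated points between consecutive samples.
    Assumes a non-empty input list."""
    out = []
    for a, b in zip(samples, samples[1:]):
        out.extend(interp(a, b, i / factor) for i in range(factor))
    out.append(samples[-1])
    return out
```

Both functions hit the original samples exactly at t=0 and t=1 and only differ in between, which is also why a square-ish PSG wave ends up with sloped edges and sounds muffled.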

I might also add an option for interpolation in the resampler, or keep the two tied together for simplicity? Not sure. Note that interpolation makes things sound smoother but can also muffle sound to an extent. Your input is welcome!
Tragic news -- by Arisotura Sun, 27 Jun 2021 22:40:29 +0000
Well, today I was going to work on DMA timings (besides, the notes in the previous post aren't quite right, go figure), but I learnt about what just happened and decided I would write this post instead. This is not the kind of thing we can ignore.

Our fellow emudev Near committed suicide. While the circumstances leading to this sort of event can be complex, the role bullying and harassment have played in this is clear. Those people have blood on their hands.

I want this post to be a homage to Near. Their work on emulation and preservation has been astounding. These domains are important to me, thus I've always had great respect for Near's work. The amount of time and effort they poured into this is well over what I can hope to do, so I can only commend the dedication they have shown.

Rest in peace, Near.

I also want to reiterate that we at the melonDS team stand with our fellow emudevs, and against all practices of discrimination and harassment like those that have led to the deaths of too many of our fellows already.

There is no neutral position on this matter. I cannot stress this enough. Trying to remain 'neutral' enables the oppressors.
Pride month and DMA timings -- by Arisotura Mon, 07 Jun 2021 11:02:55 +0000
Now, let's talk about something more technical (so this post is not just a 'political statement' :P ).

DMA timings.

You might have noticed the timing17 branch. So what, another timing branch. Arisotura just loves these. Or something.

It's going to be some general timing renovation, depending on how far my motivation will take me. I started the work with DMA timings, figuring it would just be a matter of taking into account that sequential timings only work when the address is incrementing linearly... well, it's hairier than that.

This is based on tests done at like 04:00, and this is the DS, so take this with a rock of salt.

Most memory regions in the DS have such timing characteristics that sequential accesses make no difference, at least from the standpoint of DMA. The ARM9 and its 3-cycle penalty when using the bus are another story.

However, there are a couple regions that have different rules.

Main RAM (0x02000000)

Seeing how slow the DS's main RAM is (8 cycles for a 16-bit access), it makes sense for it to support some form of burst access: when accessing a bunch of consecutive memory addresses, the first one will get the full waitstate, then consecutive ones will be faster.

However, how this interacts with DMA is... weird. I did some testing on the ARM9 side, it's probably similar on the ARM7 side but I will have to test there to confirm it.

Main RAM reads can be parallelized to some extent with writes to other memory regions. In practice, this seems to shave off one cycle from the nominal nonsequential timing.

Main RAM writes can be done sequentially in maximum bursts of 120 halfwords in 16-bit mode, or 80 words in 32-bit mode. You guess, the first write in the burst gets the nonsequential timing, the rest get the sequential timing.

Reads are a bit weirder. In 32-bit mode, we get a maximum burst length of 118 words. In 16-bit mode, however, there seems to be a hardware glitch: we get two bursts of 119 halfwords each, then one burst of two halfwords, then two bursts of 119 halfwords, and so on, in a repeating pattern. It's like something (the DMA controller?) is enforcing a maximum burst length of 240 halfwords which was miscalculated.
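
One way to express these burst rules, as a sketch: split the transfer into bursts following a repeating pattern of maximum lengths, then charge the nonsequential timing for the first access of each burst and the sequential timing for the rest. The burst lengths are the ones listed above; the function names are made up, and the cycle costs are left as parameters because the actual sequential timings aren't given here.

```python
from itertools import cycle

def split_bursts(units: int, pattern: list) -> list:
    """Split a transfer of `units` accesses into bursts, cycling through
    the maximum burst lengths in `pattern`."""
    bursts = []
    for limit in cycle(pattern):
        if units <= 0:
            break
        n = min(units, limit)
        bursts.append(n)
        units -= n
    return bursts

def burst_cycles(bursts: list, nonseq: int, seq: int) -> int:
    """First access of each burst is nonsequential, the rest sequential."""
    return sum(nonseq + (n - 1) * seq for n in bursts)

WRITE_16 = [120]         # main RAM writes, 16-bit mode
WRITE_32 = [80]          # main RAM writes, 32-bit mode
READ_32 = [118]          # main RAM reads, 32-bit mode
READ_16 = [119, 119, 2]  # main RAM reads, 16-bit mode (the glitchy pattern)
```

For instance, a 300-halfword read in 16-bit mode splits into bursts of 119, 119, 2 and 60 halfwords under this model. It only covers the incrementing-address case and none of the parallelization effects.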

The whole sequential burst thing assumes two things: a) that you are DMAing from main RAM to another memory region, or vice versa, and b) that the main RAM address is incrementing linearly.

Regarding a), DMAing from main RAM to main RAM will force each access to be nonsequential with no parallelization possible, resulting in abysmal performance. It is even faster to DMA from main RAM to another memory and then back to main RAM in two separate transfers.

Regarding b), burst access only works with a linearly incrementing address. Setting the DMA to use a fixed or decrementing address results in all-nonsequential accesses. For whatever reason, it seems that in this case, reads to the last halfword of each 32-byte block are one cycle faster.

Got it all? Now figure out an elegant way to implement all this into melonDS. I'm not too concerned about performance, DMA transfers aren't a bottleneck, I just want the code not to be a huge mess.

The GBA slot

Things are simpler there. The timings for the GBA ROM region are as configured in EXMEMCNT. The DMA controller can do burst accesses with few restrictions: the first halfword is nonsequential, as well as the last halfword of each 0x20000-byte block, the rest are all sequential. This even works with a fixed or decrementing address, which suggests to me that maybe these modes don't work correctly in this region.
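
The GBA ROM rule is simple enough to sketch directly. This assumes a 16-bit transfer with an incrementing address; the function name is made up, and the nonsequential and sequential costs are parameters standing in for whatever EXMEMCNT is configured to.

```python
BLOCK = 0x20000  # block size in bytes

def gba_rom_dma_cycles(start: int, halfwords: int, nonseq: int, seq: int) -> int:
    """Cycle estimate for a 16-bit DMA from the GBA ROM region.

    The first halfword and the last halfword of each 0x20000-byte block
    are nonsequential; everything else is sequential.
    """
    total = 0
    for i in range(halfwords):
        addr = start + 2 * i
        last_in_block = (addr % BLOCK) == BLOCK - 2
        total += nonseq if (i == 0 or last_in_block) else seq
    return total
```

A transfer that crosses a block boundary pays the nonsequential cost twice in a row if it happens to start right before the last halfword of a block, which is easy to check with the function above.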

There are no weird parallelizing shenanigans or whatever, either.

The GBA RAM region does not support sequential accesses. And, seeing the results I get on hardware, it might just not support DMA at all, or only under certain specific circumstances.

The wifi regions

The wifi I/O ports and RAM are mapped across two mirrored regions, at 0x04800000 (WS0) and 0x04808000 (WS1). Each region can be configured to have different waitstates, via WIFIWAITCNT. The point seems to be that the I/O and RAM regions each have different preferential timings.

This one needs a bit more testing, but the interface seems very similar to that of the GBA ROM region, even down to the way WIFIWAITCNT works, so there's probably nothing too fancy there.

WIFIWAITCNT (ARM7 - 0x04000206)
Bit 0-1: WS0 nonsequential timing (0-3 = 10, 8, 6, 18 cycles)
Bit 2: WS0 sequential timing (0-1 = 6, 4 cycles)
Bit 3-4: WS1 nonsequential timing (0-3 = 10, 8, 6, 18 cycles)
Bit 5: WS1 sequential timing (0-1 = 10, 4 cycles)

Seem familiar?
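
Decoding the register as laid out above could look like this. The bit layout and cycle counts are copied from the table and still need more testing, as said; the function and key names are made up.

```python
NONSEQ = (10, 8, 6, 18)  # nonsequential cycles for field values 0-3
WS0_SEQ = (6, 4)         # WS0 sequential cycles for bit values 0-1
WS1_SEQ = (10, 4)        # WS1 sequential cycles for bit values 0-1

def decode_wifiwaitcnt(value: int) -> dict:
    """Split WIFIWAITCNT into per-region access timings (in cycles)."""
    return {
        "ws0_nonseq": NONSEQ[value & 0x3],
        "ws0_seq": WS0_SEQ[(value >> 2) & 0x1],
        "ws1_nonseq": NONSEQ[(value >> 3) & 0x3],
        "ws1_seq": WS1_SEQ[(value >> 5) & 0x1],
    }
```

The resulting nonseq/seq pairs can then be fed into the same kind of burst calculation as for the GBA ROM region, assuming the interface really does behave the same way.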

The DSi

Testing will need to be done on the DSi too. Most of the timing characteristics should be the same, but there are a couple things to take into account. Main RAM might have slightly different timings in some of the edge cases. There are the new WRAM regions. There are settings that can expand some busses from 16-bit to 32-bit (like for VRAM), which likely affects timings. There is the NDMA controller.

Why go through all that trouble?

Do the DMA timing details matter a lot? Yes and no.

Let's say you have a 4096-word DMA transfer, and you determine its duration to be 8192 cycles (assuming 2 cycles per word). If the actual hardware timing is like 8 cycles off, it doesn't matter a lot.

However, let's say the actual hardware timing is off by one cycle every 16 words. That's a total error of 256 cycles, which has more of an impact on things. Worse, in this hypothetical scenario, such an error would grow with larger DMA transfers too.
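
The two scenarios as plain arithmetic, in a tiny sketch with made-up function names:

```python
def dma_estimate(words: int, cycles_per_word: int) -> int:
    """Naive duration of a DMA transfer."""
    return words * cycles_per_word

def accumulated_error(words: int, cycles_off: int, every_n_words: int) -> int:
    """Total error when the estimate is off by `cycles_off` cycles
    once every `every_n_words` words."""
    return (words // every_n_words) * cycles_off

print(dma_estimate(4096, 2))           # 8192 cycles
print(accumulated_error(4096, 1, 16))  # 256 cycles
print(accumulated_error(8192, 1, 16))  # 512 cycles: it scales with length
```

A fixed setup error stays at a few cycles no matter what, while a per-burst error is proportional to the transfer size.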

So, while I'm not spending a lot of effort on things like DMA setup delays, I'm putting all the effort into understanding how the overall timing of a DMA transfer relates to the transfer length, because that is the important part.

As I said in the previous post, I really want to deal with the timing issues once and for all.

This is where emu coders tend to be like "oh no, you have to emulate the ARM9 caches". This sort of thing is a bit of an emulation holy-grail. The last big timing improvement possible before needing a cycle-accurate emulator (which, with the DS, well, good luck).

But, in reality, emulating the ARM9 caches is not going to be a magical fix if our underlying timing model is inaccurate. At best, it will just create a different inaccurate timing profile, and we would be telling users to switch between the two and hope one of them will work.

We might end up having to emulate the ARM9 caches. That would suck big time. But we're not going to try to find out before our timing model is accurate enough. And, at that point, that accuracy along with the current kCodeCacheTiming/kDataCacheTiming approximation might be enough. Who knows.

Well, this went ass-far. And we haven't even started doing CPU timings. That's gonna be a fun ride for sure.
Well, guess we owe you another release soon -- by Arisotura Wed, 19 May 2021 10:46:57 +0000
Like how touchscreen input will wrap around instead of properly clamping like it's always done. This is a good reminder to us to always test all features, even those we take for granted as working. They work... until somebody introduces broken code that breaks them. It's not like this hasn't happened in the past, either, so we should do more quality control.

Sorry about this.

I haven't been doing a whole lot of melonDS coding lately either. I began researching some issues, began working on timing test suites, then veered to MPU stuff, and, well...

There are other things going on in real life, which doesn't help. I can't really focus on more than one big thing at once. There are a few going on right now: trying to find a bigger apartment to move in with friends, another housing-related thing I can't really post details about, and dealing with the obnoxious little pricks who live right next to my apartment and like partying all the damn time. Seriously they look like kids who got a fancypants sound system for Christmas or something.

Once these things have settled I will hopefully be able to work more on melonDS.

The first test suites I was making were for DMA timings. I'd need to polish them and add more test cases to make it good. Although really, the timings mostly boil down to a set of rules, some of which I have yet to implement into melonDS: for example, the maximum length of a sequential read/write burst is 118 units, and (figures) sequential bursts only work when the address is incrementing. The main issue I face is implementing the rules in an elegant and efficient way.

This doesn't even get into the fun part of CPU timings. You get the code fetch cycles, data access cycles, and other internal/etc cycles, which may interact and overlap in all sorts of fancy ways.

I want to address the timing issues once and for all, and I feel it's pointless to attempt things like full cache emulation if our underlying timing model is wrong.
Introducing the compute renderer -- by Generic aka RSDuck Sat, 01 May 2021 08:42:51 +0000
Why are we doing this in the first place?
  • Enhancements such as higher resolution rendering at reasonable speeds compared to say a software rasteriser, but with fewer problems than the OpenGL renderer (though problems can never be fully excluded when running games differently than they were intended).

  • Fullspeed emulation of 3D games on Switch and potentially other devices which fit this weird niche where they have slow processors but pretty competent GPUs and good software side support for it.

You might have already heard of parallel-rdp from Themaister which provides a very accurate emulation of the RDP (i.e. that part of the N64 which in the end draws the triangles) running on the GPU. It has been a great inspiration for this project (which means where possible it's basically a clone). So thanks to Themaister for all the ideas and also for answering my questions!

Currently the main part of the work is done (it's already somewhat playable with a lot of games), so it's easier to list what's still missing:

  • Blending

  • Shadows

  • Equal depth testing

  • Antialiasing

  • Highlighting/Toon shading

  • Fog

  • Edgemarking

  • Rearimages

I plan on detailing some technical aspects later. Also I have not forgotten my A tour through melonDS's JIT recompiler "series", so expect to see some more posts by me here sooner or later.
melonDS 0.9.2 is out! -- by Arisotura Mon, 26 Apr 2021 22:43:30 +0000
Namely, improved Mac support: there have been fixes to the JIT, but also to the interface, so things should work more smoothly under macOS.

melonDS also supports loading ROMs from the most common archive formats, now, which means users with large ROM sets should have it easier.

We also have a new menu listing the ROMs you have opened recently, making it quicker to open them again.

We got new fancy screen modes courtesy of Generic. These make it possible to use 16:9 hacks along with melonDS, among other fun things. Speaking of renderers, he also went and fixed a whole bunch of OpenGL issues.

The cart interface refactor was finished in time for this release, and with it, support for NAND save memory. WarioWare DIY and Jam with the Band are now able to save correctly under melonDS.

And, as usual, there are a bunch of other misc changes, which you can find out about in the changelog or commit history.


melonDS 0.9.2, Windows 64-bit
melonDS 0.9.2, Linux 64-bit
melonDS 0.9.2, Linux ARM64
melonDS 0.9.2, MacOS 64-bit
melonDS 0.9.2, MacOS ARM64
melonDS 0.9.2, MacOS universal