melonDS RSS The latest news on melonDS. Audio interpolation -- by Arisotura Thu, 22 Jul 2021 21:35:59 +0000
Anyway, audio interpolation is one of the emulation improvements that have been requested for melonDS. My general policy for emulation improvements is that they should allow for keeping the accurate code paths, and they shouldn't add too much complexity to the code. Audio interpolation is well within these bounds. Actually, I had implemented it in DeSmuME back then, and due to the way DeSmuME's mixer works, it was quickly done.

So I figured I would give it a try in melonDS.

The basic idea behind audio interpolation is to smooth out the audio samples as they're being upsampled. DS games may have downsampled audio to save on space and bandwidth, and the DS mixer doesn't perform any interpolation, which can lead to rough sounding samples. The reason the DS does no interpolation is most likely due to how its mixer hardware works, but obviously as an emulator we can ignore these constraints and do a better job.

It's also worth noting that, as far as melonDS is concerned, there are two parts we need to take care of: the DS mixer and the audio output.

In the DS, the mixer is driven by the system clock, like nearly everything else. If you ever coded for the DS, you might have wondered why the frequency registers for the audio channels are weird:

40004x8h - NDS7 - SOUNDxTMR - Sound Channel X Timer Register (W)

 Bit0-15  Timer Value, Sample frequency, timerval=-(33513982Hz/2)/freq

The PSG Duty Cycles are composed of eight "samples", and so, the frequency for Rectangular Wave is 1/8th of the selected sample frequency.
For PSG Noise, the noise frequency is equal to the sample frequency.

The SOUNDxTMR registers directly control the channel timers, which are driven at half the system clock. These work like the general purpose timers: they are incremented at half the system clock, and every time they overflow, they are reloaded to the SOUNDxTMR value and the channel advances to the next sample.
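
As a rough sketch of the timer behaviour described above (not melonDS's actual code; the names `timer_reload_for` and `ChannelTimer` are made up for illustration), a channel timer can be modelled like this:

```python
BUS_CLOCK_HZ = 33513982  # system clock, from the register description above


def timer_reload_for(freq_hz: float) -> int:
    """Return the 16-bit SOUNDxTMR value for a desired sample frequency."""
    period = round((BUS_CLOCK_HZ / 2) / freq_hz)  # timer ticks per sample
    if not 1 <= period <= 0x10000:
        raise ValueError("frequency out of range for a 16-bit timer")
    return (0x10000 - period) & 0xFFFF  # -period as an unsigned 16-bit value


class ChannelTimer:
    """Counts up at half the system clock and reloads on overflow."""

    def __init__(self, reload: int):
        self.reload = reload
        self.counter = reload
        self.sample_pos = 0  # index of the current sample in the channel

    def tick(self, ticks: int = 1) -> None:
        for _ in range(ticks):
            self.counter += 1
            if self.counter > 0xFFFF:       # 16-bit overflow
                self.counter = self.reload  # reload from SOUNDxTMR
                self.sample_pos += 1        # advance to the next sample
```

For example, a 32768 Hz channel gets a period of 511 timer ticks, so the reload value is 0xFE01 and the channel advances one sample every 511 ticks.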

This is a fairly simple and efficient design, but you can probably guess why it doesn't lend itself to interpolation. Basically, to get the sub-sample position you need for interpolation at any given time, you would need to subtract the reload value from the current timer value, then divide that by 0x10000 minus the reload value, which isn't convenient to implement in hardware.
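
To make the sub-sample calculation concrete, here is a minimal sketch. The function name is hypothetical, and it assumes the counter runs from the reload value up to 0xFFFF as described above:

```python
def subsample_position(counter: int, reload: int) -> float:
    """Fraction of the way from the current sample to the next one.

    Returns a value from 0.0 up to (but not including) 1.0.
    """
    period = 0x10000 - reload  # timer ticks per sample
    elapsed = counter - reload  # ticks since the last reload
    return elapsed / period     # this division is the awkward part in hardware


# A channel with a 512-tick period, halfway between two samples:
print(subsample_position(0xFF00, 0xFE00))
```

In software the division is cheap, which is why an emulator can afford what the hardware mixer skips.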

The mixer in melonDS works in a similar way, although it is only sample-accurate, for several reasons: sample accuracy is good enough for DS games, we don't know how the mixer operates on a per-cycle basis, and of course, performance reasons. To reach its sample rate of approximately 32.7 kHz, the DS needs to output one audio sample every 1024 system-clock cycles, and that is how often we run the mixer in melonDS. We have to be a bit smart about updating our channel timers, but it works well enough.

However, this design means the output sample rate of the melonDS core depends on how fast it's running. Basically, melonDS runs 560190 cycles per frame and outputs one audio sample every 1024 cycles, like the real thing. Assuming a framerate of 60 FPS (which is a bit faster than the real thing), this means an audio output rate of 32823.6328125 Hz.
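
The arithmetic behind that number, using only the figures quoted above (the function name is made up for illustration):

```python
from fractions import Fraction

CYCLES_PER_FRAME = 560190   # cycles melonDS runs per frame
CYCLES_PER_SAMPLE = 1024    # one audio sample every 1024 cycles


def core_output_rate(fps) -> Fraction:
    """Exact audio output rate of the core, in Hz, at a given framerate."""
    return Fraction(CYCLES_PER_FRAME) * Fraction(fps) / CYCLES_PER_SAMPLE


rate = core_output_rate(60)
print(float(rate))            # the rate in Hz
print(rate.denominator != 1)  # True: the rate is not a whole number of Hz
```

Since the rate scales with the framerate, any deviation from 60 FPS shifts it further.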

Well, yeah. Generally, you can't go and ask your audio library for a weird non-integer sample rate.

So what do we do, here? Well, early melonDS versions would just pick the closest integer sample rate, send out the audio output as-is, and pray. You guessed it, it didn't work that well. Not only was it impossible to attain perfect sync, but on some platforms we just could not get a sample rate of 32824 Hz.

Hence, a proper audio output stage was added. It lets us pick a more standard output rate of 48 kHz, lets the audio driver give us another sample rate if that one isn't available, then it resamples melonDS's audio output to match that output rate. The resampler also supports a small margin, which can make up for small variations in framerate.

This resampler would be another point of concern: currently, it upsamples audio with no interpolation, so there's room for improvement here too.
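
To illustrate the difference, here is a simplified sketch of a resampler with and without interpolation. This is not the melonDS resampler: the function, its parameters and its buffer handling are assumptions made for the example, and it ignores the framerate margin mentioned above.

```python
def resample(samples, in_rate, out_rate, interpolate=False):
    """Resample a list of samples from in_rate to out_rate.

    Without interpolation, each output sample just repeats the nearest
    earlier input sample. With it, neighbours are blended linearly.
    """
    out = []
    n_out = int(len(samples) * out_rate / in_rate)
    for i in range(n_out):
        pos = i * in_rate / out_rate  # position in the input, in samples
        idx = int(pos)
        if not interpolate or idx + 1 >= len(samples):
            out.append(samples[idx])
        else:
            frac = pos - idx
            out.append(samples[idx] + (samples[idx + 1] - samples[idx]) * frac)
    return out


print(resample([0, 10], 1, 2))                    # steps
print(resample([0, 10], 1, 2, interpolate=True))  # ramp
```

The stepped output is what produces the rough sound; the ramp is smoother but also loses some high-frequency content.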

Anyway, I made a quick proof-of-concept in a separate branch. For now, it applies linear interpolation to all channels, and seems to work decently well. A few notes on this:

1. PSG channels are quite muffled. They should not be interpolated, but I'm partly tempted to keep that as a fun option.

2. Linear interpolation is the easiest but certainly not the best. I could implement better algorithms: cosine, cubic, gauss...

3. Of course, the feature would be made optional, and disabled by default.
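
For reference, two of the alternatives mentioned in note 2 could look like the sketch below. "Cubic" can mean several things; this example assumes a Catmull-Rom spline, which is only one possible choice and not necessarily what would end up in melonDS.

```python
import math


def cosine_interp(a: float, b: float, frac: float) -> float:
    """Blend from a to b along a half cosine, easing in and out."""
    weight = (1.0 - math.cos(frac * math.pi)) / 2.0
    return a * (1.0 - weight) + b * weight


def cubic_interp(p0: float, p1: float, p2: float, p3: float, frac: float) -> float:
    """Catmull-Rom spline between p1 and p2, using p0 and p3 as context."""
    return 0.5 * (
        2.0 * p1
        + (-p0 + p2) * frac
        + (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * frac ** 2
        + (-p0 + 3.0 * p1 - 3.0 * p2 + p3) * frac ** 3
    )
```

Cosine only needs the two neighbouring samples, like linear. The cubic variant needs four, so the mixer would have to keep a short history per channel.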

I might also add an option for interpolation in the resampler, or keep the two tied together for simplicity? Not sure. Note that interpolation makes things sound smoother but can also muffle sound to an extent. Your input is welcome!
Tragic news -- by Arisotura Sun, 27 Jun 2021 22:40:29 +0000
Well, today I was going to work on DMA timings (besides, the notes in the previous post aren't quite right, go figure), but I learnt about what just happened and decided I would write this post instead. This is not the kind of thing we can ignore.

Our fellow emudev Near committed suicide. While the circumstances leading to this sort of event can be complex, the role bullying and harassment have played in this is clear. Those people have blood on their hands.

I want this post to be a homage to Near. Their work on emulation and preservation has been astounding. These domains are important to me, thus I've always had great respect for Near's work. The amount of time and effort they poured into this is well over what I can hope to do, so I can only commend the dedication they have shown.

Rest in peace, Near.

I also want to reiterate that we at the melonDS team stand with our fellow emudevs, and against all practices of discrimination and harassment like those that have led to the deaths of too many of our fellows already.

There is no neutral position on this matter. I cannot stress this enough. Trying to remain 'neutral' enables the oppressors.
Pride month and DMA timings -- by Arisotura Mon, 07 Jun 2021 11:02:55 +0000
Now, let's talk about something more technical (so this post is not just a 'political statement' :P ).

DMA timings.

You might have noticed the timing17 branch. So what, another timing branch. Arisotura just loves these. Or something.

It's going to be some general timing renovation, depending on how far my motivation will take me. I started the work with DMA timings, figuring it would just be a matter of taking into account that sequential timings only work when the address is incrementing linearly... well, it's hairier than that.

This is based on tests done at like 04:00, and this is the DS, so take this with a rock of salt.

Most memory regions in the DS have such timing characteristics that sequential accesses make no difference, at least from the standpoint of DMA. The ARM9 and its 3-cycle penalty when using the bus are another story.

However, there are a couple regions that have different rules.

Main RAM (0x02000000)

Seeing how slow the DS's main RAM is (8 cycles for a 16-bit access), it makes sense for it to support some form of burst access: when accessing a bunch of consecutive memory addresses, the first one will get the full waitstate, then consecutive ones will be faster.

However, how this interacts with DMA is... weird. I did some testing on the ARM9 side, it's probably similar on the ARM7 side but I will have to test there to confirm it.

Main RAM reads can be parallelized to some extent with writes to other memory regions. In practice, this seems to shave off one cycle from the nominal nonsequential timing.

Main RAM writes can be done sequentially in maximum bursts of 120 halfwords in 16-bit mode, or 80 words in 32-bit mode. You guessed it, the first write in the burst gets the nonsequential timing, the rest get the sequential timing.

Reads are a bit weirder. In 32-bit mode, we get a maximum burst length of 118 words. In 16-bit mode, however, there seems to be a hardware glitch: we get two bursts of 119 halfwords each, then one burst of two halfwords, then two bursts of 119 halfwords, and so on, in a repeating pattern. It's like something (the DMA controller?) is enforcing a maximum burst length of 240 halfwords which was miscalculated.

The whole sequential burst thing assumes two things: a) that you are DMAing from main RAM to another memory region, or vice versa, and b) that the main RAM address is incrementing linearly.

Regarding a), DMAing from main RAM to main RAM will force each access to be nonsequential with no parallelization possible, resulting in abysmal performance. It is even faster to DMA from main RAM to another memory and then back to main RAM in two separate transfers.

Regarding b), burst access only works with a linearly incrementing address. Setting the DMA to use a fixed or decrementing address results in all-nonsequential accesses. For whatever reason, it seems that in this case, reads to the last halfword of each 32-byte block are one cycle faster.
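
One possible way to express the burst rules above is to split a transfer into bursts and count how many accesses pay the nonsequential timing. This is only a sketch built on the tentative measurements in this post. The names are made up, and it does not model the main RAM to main RAM case, the read/write parallelization, or the one-cycle quirk on 32-byte boundaries.

```python
from itertools import cycle

# Maximum burst lengths in transfer units (halfwords or words), keyed by
# access width in bits. Values come from the measurements described above.
WRITE_BURSTS = {16: [120], 32: [80]}
READ_BURSTS = {16: [119, 119, 2], 32: [118]}  # 16-bit reads repeat a pattern


def split_into_bursts(units: int, pattern: list) -> list:
    """Split a transfer of `units` accesses into consecutive bursts."""
    bursts = []
    for max_len in cycle(pattern):
        if units <= 0:
            break
        length = min(max_len, units)
        bursts.append(length)
        units -= length
    return bursts


def count_nonsequential(units: int, pattern: list, incrementing: bool = True) -> int:
    """Number of accesses that get the nonsequential timing."""
    if not incrementing:
        return units  # fixed/decrementing address: every access is nonsequential
    return len(split_into_bursts(units, pattern))  # one per burst


print(split_into_bursts(300, READ_BURSTS[16]))
```

The total duration would then be the nonsequential count times the nonsequential timing, plus the remaining accesses times the sequential timing.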

Got it all? Now figure out an elegant way to implement all this into melonDS. I'm not too concerned about performance, DMA transfers aren't a bottleneck, I just want the code not to be a huge mess.

The GBA slot

Things are simpler there. The timings for the GBA ROM region are as configured in EXMEMCNT. The DMA controller can do burst accesses with few restrictions: the first halfword is nonsequential, as well as the last halfword of each 0x20000-byte block, the rest are all sequential. This even works with a fixed or decrementing address, which suggests to me that maybe these modes don't work correctly in this region.
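
Taking that rule literally, the cost of a 16-bit DMA from GBA ROM could be estimated as in the sketch below. The function is hypothetical, and the nonsequential and sequential cycle counts are passed in because they depend on EXMEMCNT.

```python
def gba_rom_dma_cycles(start_addr: int, halfwords: int,
                       nonseq_cycles: int, seq_cycles: int) -> int:
    """Estimate the cycles for a 16-bit DMA read from the GBA ROM region."""
    total = 0
    for i in range(halfwords):
        addr = start_addr + 2 * i
        last_in_block = (addr & 0x1FFFF) == 0x1FFFE  # last halfword of a 0x20000-byte block
        if i == 0 or last_in_block:
            total += nonseq_cycles
        else:
            total += seq_cycles
    return total


# Four halfwords from the start of the region: 1 nonsequential + 3 sequential.
print(gba_rom_dma_cycles(0x08000000, 4, 10, 6))
```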

There are no weird parallelizing shenanigans or whatever, either.

The GBA RAM region does not support sequential accesses. And, seeing the results I get on hardware, it might just not support DMA at all, or only under certain specific circumstances.

The wifi regions

The wifi I/O ports and RAM are mapped across two mirrored regions, at 0x04800000 (WS0) and 0x04808000 (WS1). Each region can be configured to have different waitstates, via WIFIWAITCNT. The point seems to be that the I/O and RAM regions each have different preferential timings.

This one needs a bit more testing, but the interface seems very similar to that of the GBA ROM region, even down to the way WIFIWAITCNT works, so there's probably nothing too fancy there.

WIFIWAITCNT (ARM7 - 0x04000206)
Bit 0-1: WS0 nonsequential timing (0-3 = 10, 8, 6, 18 cycles)
Bit 2: WS0 sequential timing (0-1 = 6, 4 cycles)
Bit 3-4: WS1 nonsequential timing (0-3 = 10, 8, 6, 18 cycles)
Bit 5: WS1 sequential timing (0-1 = 10, 4 cycles)

Seem familiar?
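
The bitfield above can be decoded as follows. This is a sketch; the function name and dictionary keys are made up, and the cycle tables are copied from the list above.

```python
NONSEQ_CYCLES = (10, 8, 6, 18)  # shared by WS0 and WS1
WS0_SEQ_CYCLES = (6, 4)
WS1_SEQ_CYCLES = (10, 4)


def decode_wifiwaitcnt(value: int) -> dict:
    """Decode WIFIWAITCNT into cycle counts for both wifi regions."""
    return {
        "ws0_nonseq": NONSEQ_CYCLES[value & 0x3],          # bits 0-1
        "ws0_seq": WS0_SEQ_CYCLES[(value >> 2) & 0x1],     # bit 2
        "ws1_nonseq": NONSEQ_CYCLES[(value >> 3) & 0x3],   # bits 3-4
        "ws1_seq": WS1_SEQ_CYCLES[(value >> 5) & 0x1],     # bit 5
    }


print(decode_wifiwaitcnt(0x30))
```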

The DSi

Testing will need to be done on the DSi too. Most of the timing characteristics should be the same, but there are a couple things to take into account. Main RAM might have slightly different timings in some of the edge cases. There are the new WRAM regions. There are settings that can expand some busses from 16-bit to 32-bit (like for VRAM), which likely affects timings. There is the NDMA controller.

Why go through all that trouble?

Do the DMA timing details matter a lot? Yes and no.

Let's say you have a 4096-word DMA transfer, and you determine its duration to be 8192 cycles (assuming 2 cycles per word). If the actual hardware timing is like 8 cycles off, it doesn't matter a lot.

However, let's say the actual hardware timing is off by one cycle every 16 words. That's a total error of 256 cycles, which has more of an impact on things. Worse, in this hypothetical scenario, such an error would grow with larger DMA transfers too.
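
The numbers in this hypothetical scenario work out as follows (a throwaway sketch, names made up):

```python
def accumulated_error(words: int, error_cycles: int, every_n_words: int) -> int:
    """Total timing error when the model is off by a fixed amount per group of words."""
    return (words // every_n_words) * error_cycles


estimate = 4096 * 2                       # 8192 cycles at 2 cycles per word
error = accumulated_error(4096, 1, 16)    # one cycle off every 16 words
print(error, error / estimate)            # 256 cycles, about 3% of the estimate
```

A fixed 8-cycle error stays at 8 cycles no matter the length, while this one doubles every time the transfer length doubles.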

So, while I'm not spending a lot of effort on things like DMA setup delays, I'm putting all the effort into understanding how the overall timing of a DMA transfer relates to the transfer length, because that is the important part.

As I said in the previous post, I really want to deal with the timing issues once and for all.

This is where emu coders tend to be like "oh no, you have to emulate the ARM9 caches". This sort of thing is a bit of an emulation holy-grail. The last big timing improvement possible before needing a cycle-accurate emulator (which, with the DS, well, good luck).

But, in reality, emulating the ARM9 caches is not going to be a magical fix if our underlying timing model is inaccurate. At best, it will just create a different inaccurate timing profile, and we would be telling users to switch between the two and hope one of them will work.

We might end up having to emulate the ARM9 caches. That would suck big time. But we're not going to try to find out before our timing model is accurate enough. And, at that point, that accuracy along with the current kCodeCacheTiming/kDataCacheTiming approximation might be enough. Who knows.

Well, this went ass-far. And we haven't even started doing CPU timings. That's gonna be a fun ride for sure.
Well, guess we owe you another release soon -- by Arisotura Wed, 19 May 2021 10:46:57 +0000
Like how touchscreen input will wrap around instead of properly clamping like it's always done. This is a good reminder to us to always test all features, even those that are taken for granted as working. They work... until somebody introduces broken code that breaks them. It's not like this hasn't happened in the past, either, so we should do more quality control.

Sorry about this.

I haven't been doing a whole lot of melonDS coding lately either. I began researching some issues, began working on timing test suites, then veered to MPU stuff, and, well...

There are other things going on in real life, which doesn't help. I can't really focus on more than one big thing at once. There are a few going on right now: trying to find a bigger apartment to move in with friends, another housing-related thing I can't really post details about, and dealing with the obnoxious little pricks who live right next to my apartment and like partying all the damn time. Seriously they look like kids who got a fancypants sound system for Christmas or something.

Once these things have settled I will hopefully be able to work more on melonDS.

The first test suites I was making were for DMA timings. I'd need to polish them and add more test cases to make it good. Although really, the timings mostly boil down to a set of rules, some of which I have yet to implement into melonDS: for example, the maximum length of a sequential read/write burst is 118 units, and (figures) sequential bursts only work when the address is incrementing. The main issue I face is implementing the rules in an elegant and efficient way.

This doesn't even get into the fun part of CPU timings. You get the code fetch cycles, data access cycles, and other internal/etc cycles, which may interact and overlap in all sorts of fancy ways.

I want to address the timing issues once and for all, and I feel it's pointless to attempt things like full cache emulation if our underlying timing model is wrong.
Introducing the compute renderer -- by Generic aka RSDuck Sat, 01 May 2021 08:42:51 +0000
Why are we doing this in the first place?
  • Enhancements such as higher resolution rendering at reasonable speeds compared to say a software rasteriser, but with fewer problems than the OpenGL renderer (though problems can never be fully excluded when running games differently than they were intended).

  • Fullspeed emulation of 3D games on Switch and potentially other devices which fit this weird niche where they have slow processors but pretty competent GPUs and good software side support for it.

You might have already heard of parallel-rdp from Themaister which provides a very accurate emulation of the RDP (i.e. that part of the N64 which in the end draws the triangles) running on the GPU. It has been a great inspiration for this project (which means where possible it's basically a clone). So thanks to Themaister for all the ideas and also for answering my questions!

Currently the main part of the work is done (it's already somewhat playable with a lot of games), so it's easier to list what's still missing:

  • Blending

  • Shadows

  • Equal depth testing

  • Antialiasing

  • Highlighting/Toon shading

  • Fog

  • Edgemarking

  • Rearimages

I plan on detailing some technical aspects later. Also I have not forgotten my A tour through melonDS's JIT recompiler "series", so expect to see some more posts by me here sooner or later.
melonDS 0.9.2 is out! -- by Arisotura Mon, 26 Apr 2021 22:43:30 +0000
Namely, improved Mac support: there have been fixes to the JIT, but also to the interface, so things should work more smoothly under macOS.

melonDS also supports loading ROMs from the most common archive formats, now, which means users with large ROM sets should have it easier.

We also have a new menu listing the ROMs you have opened recently, making it quicker to open them again.

We got new fancy screen modes courtesy of Generic. These make it possible to use 16:9 hacks along with melonDS, among other fun things. Speaking of renderers, he also went and fixed a whole bunch of OpenGL issues.

The cart interface refactor was finished in time for this release, and with it, support for NAND save memory. WarioWare DIY and Jam with the Band are now able to save correctly under melonDS.

And, as usual, there are a bunch of other misc changes, which you can find out about in the changelog or commit history.


melonDS 0.9.2, Windows 64-bit
melonDS 0.9.2, Linux 64-bit
melonDS 0.9.2, Linux ARM64
melonDS 0.9.2, MacOS 64-bit
melonDS 0.9.2, MacOS ARM64
melonDS 0.9.2, MacOS universal
Change to the save file handling -- by Arisotura Thu, 15 Apr 2021 01:20:06 +0000
This is when I realized that the current way melonDS handles save files was going to be problematic. Basically, if a save file already exists, melonDS will determine the save memory type from that file's size, instead of using its built-in game list. This was designed in the old days, where we had some wonky heuristics instead of the game list -- the basic idea was that if melonDS failed to determine the correct save memory type, you could provide a known good save file and it would work, bypassing the problem.

Obviously, this is also a double-edged sword. If you happen to have a save file that isn't the correct size, melonDS will pick the wrong save memory type, potentially breaking things. In the end, this strategy now seems to cause more problems than it solves, especially since we have the game list.

The NAND thing was the final nail in the coffin of this strategy. It had been assumed, from some source I don't recall, that the save memory size for NAND was 32MB, and this was nice, because each possible save memory size had only one associated type, no confusion. However, the hardware tests I did showed that the NAND save memory is actually 8MB, and this conflicts with one of the possible FLASH sizes.

(EDIT- I was corrected by another tester. The save memory for WarioWare DIY is 16MB, not 8MB.)

So I ripped the old thing out, and instead made it always rely on the game list to determine the save memory type, regardless of the save file's size. As a bonus, save files that are too big will work too, melonDS will just ignore the extra contents. Which means that, for example, DeSmuME's .dsv files could be used as-is, by just changing the extension to .sav.
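
In other words, the lookup now goes roughly like the sketch below. Everything in it is hypothetical: the game code, the list entry and the function are invented for illustration, and padding short files with 0xFF is an assumption of this example, not something stated above.

```python
from typing import NamedTuple


class SaveInfo(NamedTuple):
    mem_type: str
    size: int


# Stand-in for the built-in game list; "DEMO" is a made-up entry.
GAME_LIST = {
    "DEMO": SaveInfo("FLASH", 512 * 1024),
}


def load_save(game_code: str, file_data: bytes):
    """Pick the save type from the game list, never from the file size."""
    info = GAME_LIST[game_code]
    data = file_data[:info.size]            # ignore any extra contents
    data = data.ljust(info.size, b"\xff")   # assumption: pad short files
    return info, data


# A file with extra trailing data still loads with the listed type and size.
info, data = load_save("DEMO", b"\x00" * (512 * 1024 + 100))
print(info.mem_type, len(data))
```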

If you believe this is going to be a major disadvantage, or see any issue with this, I encourage you to reply to this post.
Redesigning the cart interface -- by Arisotura Tue, 13 Apr 2021 10:25:01 +0000
Anyway, this tends to show why it's good to think forward when designing your code. That being said, I need to find a balance with this. I tend to either think forward too much and end up paralyzed by questions that don't mean much, or just write code as it comes to my mind.

The cart interface in melonDS was originally built without much consideration for the future. If you're wondering, the cart interface is the part of the emulator that lets emulated software access the emulated cartridge, because on the DS the cart isn't just directly mapped to CPU address space like on older consoles. Instead, there are a bunch of commands you can send to the cart to retrieve various parts of the contents, and different encryption protocols securing it up.

As melonDS became capable enough to run commercial software, emulating the cart interface was a must. So NDSCart.cpp was born. The main component is the NDSCart namespace, which originally emulated the cart interface hardware (basically the DS side) and command responses for a generic cart. There is also NDSCart_SRAM, which emulates the on-cart SPI save memory. A tad hacky, but for most games, it did the job.

But, that's the thing, not all DS carts are the same!

There were already some exceptions for homebrew ROMs, which might want to use the cart interface and, depending on how old they are, need a more lax implementation of the generic cart protocol. Namely, retail carts don't let you read addresses lower than 0x8000 via the generic data read command (0xB7), because that region contains the ROM header (read via a different command), the Key1 encryption data and the secure area. However, old homebrew ROMs don't have any of that (save for the oldstyle DS header), and have their ARM9 binary start at 0x200. Newer homebrew ROMs are closer to the layout of a retail cart, mostly due to the added DSi support (the DSi header is 0x1000 bytes instead of 0x200), but, since not everybody is here to rebuild their ROMs, we still need to support the older ROMs.
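
A sketch of how the two cases might be told apart when handling a 0xB7 read. The function is hypothetical. The redirect to 0x8000 + (addr & 0x1FF) for low addresses on retail carts is an assumption based on how this behaviour is commonly documented; it is not stated in this post.

```python
def effective_read_address(addr: int, is_homebrew: bool) -> int:
    """Address actually read by a 0xB7 data read command."""
    if not is_homebrew and addr < 0x8000:
        # Retail carts protect the header, Key1 data and secure area.
        # Assumption: low addresses are redirected rather than rejected.
        return 0x8000 + (addr & 0x1FF)
    return addr  # lax behaviour: old homebrew may read from 0x200 upwards


print(hex(effective_read_address(0x200, is_homebrew=True)))
print(hex(effective_read_address(0x200, is_homebrew=False)))
```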

Homebrew aside, there are also different types of retail carts.

A prime example is Pokémon games. The carts are fitted with an IR transceiver, which is accessed via the save-memory SPI bus. In practice, the first byte of a SPI transfer is a command for the IR transceiver. For now, we know that command 0x08 is some ping command that should reply 0xAA, and command 0x00 is the pass-through command, where any further bytes are forwarded to the save memory. Emulating this is required for Pokémon games to be playable at all. In melonDS, these commands were added to the generic save-memory code. A bit of a hack, since this means these would be 'functional' in any game instead of just Pokémon games, but it did the trick.
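
As a sketch of that protocol (not the melonDS implementation; the class names, the `begin_transfer` hook and the placeholder return values are assumptions), the IR transceiver can be modelled as a thin layer in front of the save memory:

```python
class SaveMemory:
    """Stand-in for the on-cart SPI save memory."""

    def __init__(self):
        self.received = []

    def transfer(self, byte: int) -> int:
        self.received.append(byte)
        return 0xFF  # placeholder response


class IRTransceiver:
    """Sits on the save-memory SPI bus and interprets the first byte."""

    def __init__(self, save_memory: SaveMemory):
        self.save_memory = save_memory
        self.command = None

    def begin_transfer(self) -> None:
        self.command = None  # a new SPI transfer starts with a command byte

    def transfer(self, byte: int) -> int:
        if self.command is None:
            self.command = byte  # first byte selects the IR command
            return 0x00          # assumption: nothing useful is returned yet
        if self.command == 0x08:
            return 0xAA          # ping reply
        if self.command == 0x00:
            return self.save_memory.transfer(byte)  # pass-through
        return 0x00              # unknown commands: placeholder
```

Keeping this in its own class means only carts that actually have the transceiver respond to these commands.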

But there's more. Games like WarioWare DIY, or Jam with the Band, don't even use the save-memory SPI bus. They have save memory, but it's a NAND memory that is accessed via the same bus as the ROM itself, through a set of specific commands.

And then there's Pokémon Typing Adventure, which comes with its own Bluetooth keyboard. The cart is fitted with a fancypants Bluetooth controller that is accessed via the save-memory SPI bus. That controller can even send IRQs to the DS, via the cart interface's IREQ_MC line.

There's probably even more to this, who knows what sorts of obscure stuff has been produced for the DS.

On the GBA side, things aren't much better. The DS can play GBA games, which is probably not something melonDS will emulate in the near future. But there are several other possibilities here too. You have add-ons that plug into the GBA cart slot, like the Rumble Pak or the Guitar Hero grip thing. Some games can even detect that a specific game is inserted in the GBA slot and unlock features when that is the case.

At the time some GBA slot add-on support was added to melonDS, it became evident that we would need a clean interface system for both NDS and GBA carts, or this would become a large mess.

Thus, the cart_refactor branch was born. Nothing terribly exciting for now, just using OOP to build a proper interface system for NDS carts. When this is done, the same will be done to the GBA interface.

Hell, given how things are going with the Azure CI thing, the refactored interfaces might make it into melonDS 0.9.2.
Release 0.9.2 coming out soon -- by Arisotura Sat, 10 Apr 2021 10:23:28 +0000
We have some cool ideas, too, but these will be for further releases.

Also, in somewhat related news, I'm starting to work on another idea. It's not related to melonDS, but it's related to the DS. For now, this is going to be a surprise, but those who have seen my Twitter lately might figure out what I'm up to. I will make a post once I've got a working prototype.
Status updatezorz! -- by Arisotura Fri, 26 Mar 2021 17:44:50 +0000
First, what's new on my side?

I finished my hearing for the gender marker change thing. You know, so I can get a big fat F on my ID card. You prolly don't care a lot about my trans shenanigans but this means it's one thing out of my way, and we can now proceed to full-speed melonDSing (and hopefully not from a squat, but we're doing our best).

What else is there to say?

I can't keep my focus on one thing aaaaaa

I wanna maaaaaybe try to emulate some new fun shit in melonDS. like the pokémon keyboard thingy.

Wait, no, we need to make DSi emulation better. We can prolly add a file explorer thing, so you can put your DSiware into the thing easily, and idk what other cool features there were. Just suggest them below this post, pretty sure we can get this done together! melonDS will soar through union and friendship!