Audio interpolation
Apologies for the slow Summer! We don't have air conditioners in the melonDS HQ. The current climate is causing the team to slowly melt.

Anyway, audio interpolation is one of the emulation improvements that have been requested for melonDS. My general policy for emulation improvements is that they should allow for keeping the accurate code paths, and they shouldn't add too much complexity to the code. Audio interpolation is well within these bounds. In fact, I implemented it in DeSmuME back in the day, and thanks to the way DeSmuME's mixer works, it was quickly done.

So I figured I would give it a try in melonDS.

The basic idea behind audio interpolation is to smooth out the audio samples as they're being upsampled. DS games may have downsampled audio to save on space and bandwidth, and the DS mixer doesn't perform any interpolation, which can lead to rough-sounding samples. The reason the DS does no interpolation is most likely due to how its mixer hardware works, but obviously as an emulator we can ignore these constraints and do a better job.

It's also worth noting that, as far as melonDS is concerned, there are two parts we need to take care of: the DS mixer and the audio output.

In the DS, the mixer is driven by the system clock, like nearly everything else. If you ever coded for the DS, you might have wondered why the frequency registers for the audio channels are weird:

40004x8h - NDS7 - SOUNDxTMR - Sound Channel X Timer Register (W)

 Bit0-15  Timer Value, Sample frequency, timerval=-(33513982Hz/2)/freq

The PSG Duty Cycles are composed of eight "samples", and so, the frequency for Rectangular Wave is 1/8th of the selected sample frequency.
For PSG Noise, the noise frequency is equal to the sample frequency.

The SOUNDxTMR registers directly control the channel timers, which are driven at half the system clock. These work like the general purpose timers: they are incremented at half the system clock, and every time they overflow, they are reloaded to the SOUNDxTMR value and the channel advances to the next sample.
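To make the timer behaviour described above concrete, here is a minimal sketch. It is not melonDS's actual code: the names `ChannelTimer`, `TimerReloadForFrequency`, `Run` and `Position` are made up for illustration, and the frequency is assumed to be nonzero and high enough for the result to fit in 16 bits.

```cpp
#include <cstdint>

constexpr uint32_t kSystemClock = 33513982; // Hz, from the register description

// Reload value for a desired sample frequency, following the quoted formula
// timerval = -(33513982Hz/2)/freq, stored as a 16-bit value.
// Assumption: freq is nonzero and the result fits in 16 bits.
uint16_t TimerReloadForFrequency(uint32_t freq)
{
    return static_cast<uint16_t>(0x10000 - (kSystemClock / 2) / freq);
}

// Hypothetical model of one channel timer.
struct ChannelTimer
{
    uint16_t Reload;   // SOUNDxTMR value
    uint32_t Counter;  // current timer value, counts up from Reload to 0x10000
    uint32_t Position; // index of the sample the channel is currently playing

    // Advance by a number of timer ticks (one tick = two system-clock cycles).
    void Run(uint32_t ticks)
    {
        Counter += ticks;
        while (Counter >= 0x10000)
        {
            // Overflow: reload the timer and step to the next sample.
            Counter = Counter - 0x10000 + Reload;
            Position++;
        }
    }
};
```

For a channel set up for 32768 Hz, the reload value leaves 511 ticks between overflows, so the channel steps to a new sample every 511 timer ticks.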

This is a fairly simple and efficient design, but you can probably guess why it doesn't lend itself to interpolation. Basically, to get the sub-sample position you need for interpolation at any given time, you would need to subtract the reload value from the current timer value, then divide that by 0x10000 minus the reload value, which isn't convenient to implement in hardware.
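In software that calculation is cheap. As a rough sketch (the function name and the use of floating point are my own choices here, not something the hardware or melonDS is known to do), the sub-sample position is the number of ticks elapsed since the last reload, divided by the channel's period in ticks:

```cpp
#include <cstdint>

// Fraction of the way from the current sample to the next one, in [0, 1).
// counter is the current timer value and reload is the SOUNDxTMR value.
// Assumption: counter lies in [reload, 0x10000).
float SubSamplePosition(uint32_t counter, uint16_t reload)
{
    uint32_t elapsed = counter - reload; // ticks since the last reload
    uint32_t period = 0x10000 - reload;  // ticks between two samples
    return static_cast<float>(elapsed) / static_cast<float>(period);
}
```

With a reload value of 0xFF00 the period is 256 ticks, so a timer value of 0xFF80 sits exactly halfway between two samples.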

The mixer in melonDS works in a similar way, although it is only sample-accurate, for several reasons: sample accuracy is good enough for DS games, we don't know how the mixer operates on a per-cycle basis, and of course, performance reasons. To reach its sample rate of approximately 32.7 KHz, the DS needs to output one audio sample every 1024 system-clock cycles, and that is how often we run the mixer in melonDS. We have to be a bit smart about updating our channel timers, but it works well enough.

However, this design means the output sample rate of the melonDS core depends on how fast it's running. Basically, melonDS runs 560190 cycles per frame and outputs one audio sample every 1024 cycles, like the real thing. Assuming a framerate of 60 FPS (which is a bit faster than the real thing), this means an audio output rate of 32823.6328125 Hz.

Well, yeah. Generally, you can't go and ask your audio library for a weird non-integer sample rate.

So what do we do here? Well, early melonDS versions would just pick the closest integer sample rate, send out the audio output as-is, and pray. You guessed it, it didn't work that well. Not only was it impossible to attain perfect sync, but on some platforms we just could not get a sample rate of 32824 Hz.

Hence, a proper audio output stage was added. It lets us pick a more standard output rate of 48 KHz, lets the audio driver give us another sample rate if that one isn't available, then it resamples melonDS's audio output to match that output rate. The resampler also supports a small margin, which can make up for small variations in framerate.
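For illustration, a resampler with no interpolation can be as simple as the sketch below. This is not the melonDS resampler: `ResampleNearest` is a hypothetical function that works on a whole buffer at once, whereas a real output stage has to work on a stream, and the margin handling is left out entirely.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Resample `in` (at inRate Hz) to outRate Hz with no interpolation: each
// output sample is the input sample at or just before the same point in time.
std::vector<int16_t> ResampleNearest(const std::vector<int16_t>& in,
                                     double inRate, double outRate)
{
    std::vector<int16_t> out;
    if (in.empty() || inRate <= 0.0 || outRate <= 0.0)
        return out;

    const double step = inRate / outRate; // input samples per output sample
    const size_t outLen = static_cast<size_t>(in.size() / step);
    out.reserve(outLen);

    for (size_t i = 0; i < outLen; i++)
    {
        size_t src = static_cast<size_t>(i * step);
        if (src >= in.size())
            src = in.size() - 1; // guard against rounding at the very end
        out.push_back(in[src]);
    }
    return out;
}
```

Going from 24000 Hz to 48000 Hz with this simply repeats every input sample twice, which is exactly the kind of stair-stepping that interpolation is meant to smooth out.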

This resampler would be another point of concern: currently, it upsamples audio with no interpolation, so there's room for improvement here too.

Anyway, I made a quick proof-of-concept in a separate branch. For now, it applies linear interpolation to all channels, and seems to work decently well. A few notes on this:

1. PSG channels are quite muffled. They should not be interpolated, but I'm partly tempted to keep that as a fun option.

2. Linear interpolation is the easiest but certainly not the best. I could implement better algorithms: cosine, cubic, gauss...

3. Of course, the feature would be made optional, and disabled by default.
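For reference, here is roughly what some of these algorithms look like. These are textbook formulas, not the code from the proof-of-concept branch. The function names are mine, and the cubic variant shown is a Catmull-Rom spline, which is only one of several possible choices of cubic. In each case t is the sub-sample position in [0, 1).

```cpp
#include <cmath>

// Linear: straight line between the current sample a and the next sample b.
float InterpLinear(float a, float b, float t)
{
    return a + (b - a) * t;
}

// Cosine: same endpoints as linear, but eases in and out of each sample.
float InterpCosine(float a, float b, float t)
{
    const float kPi = 3.14159265358979f;
    float t2 = (1.0f - std::cos(t * kPi)) * 0.5f;
    return a + (b - a) * t2;
}

// Cubic (Catmull-Rom): curve between p1 and p2 that also takes the
// previous sample p0 and the following sample p3 into account.
float InterpCubic(float p0, float p1, float p2, float p3, float t)
{
    float a = -0.5f * p0 + 1.5f * p1 - 1.5f * p2 + 0.5f * p3;
    float b = p0 - 2.5f * p1 + 2.0f * p2 - 0.5f * p3;
    float c = -0.5f * p0 + 0.5f * p2;
    return ((a * t + b) * t + c) * t + p1;
}
```

Linear and cosine only need the current and next sample, while cubic needs four samples per channel, so it requires keeping a little more history around.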

I might also add an option for interpolation in the resampler, or keep the two tied together for simplicity? Not sure. Note that interpolation makes things sound smoother but can also muffle sound to an extent. Your input is welcome!
Peduls says:
Jul 23rd 2021
Keep up the good work! Quick question... If I want to download the latest dev build for Windows do I go here
and download the latest "master"?
poudink says:
Jul 23rd 2021
I hope they achieve the best audio interpolation! They have a great future! :D
keisui says:
Jul 23rd 2021
this emulator is slowly becoming the only one i need to use with every update , loving melonds and i can definitely see it becoming the best ds emulator out there , amazing work👍
Rayyan says:
Jul 23rd 2021
Peduls: you can also click the badge in the README to get to it quicker.
Zyute says:
Jul 23rd 2021
I'm looking forward to seeing what the team can accomplish. Everyone here is doing a great job!
RinTohsaka says:
Jul 23rd 2021
Just to clarify, are the performance concerns regarding high quality interpolation due to the fact that it needs to be applied to multiple audio channels and not just one?

I ask because something like the very high quality SoX resampler tends to be able to easily do real-time resampling/interpolation even when set to "best" quality, albeit when dealing with simple two-channel audio. But I'm wondering if its faster, "normal" quality setting isn't something more relevant to our interest performance-wise (with perhaps the "high" quality option being an optional setting?)

Also, does melonDS always target 48KHz, or does it just target whatever the OS's audio output is configured to? It just seems like it'd be silly to resample/interpolate to 48KHz if the OS is then going to resample to something like 44.1KHz or 96KHz or even 192KHz.
Generic aka RSDuck says:
Jul 23rd 2021
> Also, does melonDS always target 48KHz, or does it just target whatever the OS's audio output is configured to? It just seems like it'd be silly to resample/interpolate to 48KHz if the OS is then going to resample to something like 44.1KHz or 96KHz or even 192KHz.

we ask for 48 KHz, though in the end we use whatever is preferred:
RinTohsaka says:
Jul 24th 2021
I just did a quick performance test using the SoX resampler on my 4th gen Intel Haswell CPU underclocked to a mere 800MHz and, even set to "best" quality and otherwise default settings in the SoX foobar2000 plugin (Passband @ 95%; Phase response @ 50%), it took only 5 or 6 seconds to resample a 1-minute custom-made 8-channel 32823Hz 32float LPCM WAV file to 48kHz.

Now I realize the DS actually has 16 channels, but 8 channels was the maximum I was able to get foobar2000 to accept. Nevertheless, the fact that SoX resampled so quickly even on "best" with my 2013-era CPU underclocked to a mere 800MHz means that it should be plenty fast if we wanted to implement a method of optional resampling that is much higher quality than the DS itself was capable of.

And for reference, SoX configured to "normal" quality took only around 3 seconds using the exact same test. Also to clarify, this was purely a single-threaded test (though I'm only on a 2c/2t CPU anyway).
Generic aka RSDuck says:
Jul 24th 2021
well we also have to consider other things, like e.g. we can't buffer too many samples. Also including another library as a dependency has implications.
Rin Tohsaka says:
Jul 25th 2021
Does it need to be included as a library though? I'm no coding guru, but I do know that SoX is completely open source under the LGPL license and therefore AFAIK should be completely license-compatible with melonDS's use of GPL, so is there a particular reason the code / algorithm can't just be basically used as-is?

Oh and btw, apparently the foobar2000 SoX plugin uses slightly different quality terminology from the source SoX code base - the source code base uses the quality terminology "High Quality" (equivalent to "Normal") and "VHQ" (equivalent to "Best").
Rin Tohsaka says:
Jul 25th 2021
Errr, what I meant is, rather than being an external library (or even a library that's bundled inside of the program binary or similar), why not just use the code / algorithm and integrate it right into the audio system that melonDS has in the same way the aforementioned sample resampler this blogpost speaks of?

...or perhaps my lack of coding knowledge is showing itself and I'm just spewing nonsense - it's audio stuff that's my expertise after all, not coding nor even emulation. :P
Generic aka RSDuck says:
Jul 25th 2021
it depends. Sometimes it's possible to isolate something from other code bases with little problems. But sometimes things rely so heavily on other parts of the code that isolating it would require pulling in half of the other code.
AsPika2219 says:
Jul 26th 2021
Oh yeah! Better audio! I will be waiting! 🥳️
Rin Tohsaka says:
Jul 27th 2021
This does remind me though - how would it handle audio if/when the game isn't running at full speed?

Does it stretch + pitch correct, or does it "down-pitch" in a manner similar to mGBA if/when you set the emulator to run at a slower speed (e.g. 55fps)?

I'm mainly asking because I actually really like the down-pitch option as it allows one to run at slightly slower speeds to match the screen refresh without audio quality taking a hit.