Display capture: oh, the fun!
This is going to be a juicy technical post related to what I'm working on at the moment.

Basically, if you've used 3D upscaling in melonDS, you know that there are games where it doesn't work. For example, games that display 3D graphics on both screens: they will flicker between the high-resolution picture and a low-resolution version. Or in other instances, it might just not work at all, and all you get is low-res graphics.

It's linked to the way the DS video hardware works. There are two 2D tile engines, but there is only one 3D engine. The output from that 3D engine is sent to the main 2D engine, where it is treated as BG0. You can also change which screen each 2D engine is connected to. But you can only render 3D graphics on one screen.

So how do those games render 3D graphics on both screens?

This is where display capture comes in. The principle is as follows: you render a 3D scene to the top screen, all while capturing that frame to, say, VRAM bank B. On the next frame, you switch screens and render a 3D scene to the bottom screen. Meanwhile, the top screen will display the previously captured frame from VRAM bank B, and the frame you're rendering will get captured to VRAM bank C. On the next frame, you render 3D graphics to the top screen again, and the bottom screen displays the capture from VRAM bank C. And so on.

This way, you're effectively rendering 3D graphics on both screens, albeit at 30 FPS. This is a typical use case for display capture, but not the only possibility.
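The per-frame bookkeeping for this trick can be sketched as a simple state swap. This is an illustrative model, not real libnds code; the struct fields and function name are made up:

```cpp
// Sketch of the dual-screen 3D ping-pong: each frame, live 3D output goes
// to one screen while being captured, and the other screen displays the
// previous frame's capture from a VRAM bank.
struct FrameState {
    int renderScreen;   // 0 = top, 1 = bottom: which screen gets live 3D
    int captureBank;    // VRAM bank the current frame is being captured to
    int displayBank;    // VRAM bank the other screen displays from
};

FrameState nextFrame(const FrameState& s) {
    // Swap screens, and swap the two banks: the bank we just captured to
    // becomes the one the idle screen displays next frame.
    return FrameState{ 1 - s.renderScreen, s.displayBank, s.captureBank };
}
```

Running this for two frames brings you back to the starting configuration, which is why each screen effectively updates at 30 FPS.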

Display capture can receive input from two sources: source A, which is either the main 2D engine output or the raw 3D engine output, and source B, which is either a VRAM bank or the main memory display FIFO. You can then select source A alone, source B alone, or a blend of the two. The result is written to the selected output VRAM bank. You can also choose to capture the entire screen or a region of it (128x128, 256x64 or 256x128).
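For the blend mode, GBATEK describes the operation as a weighted sum of the two sources, with the EVA and EVB weights (each 0 to 16) taken from the DISPCAPCNT register. A minimal sketch of that math on one RGB555 color channel, with a hypothetical function name:

```cpp
#include <algorithm>
#include <cstdint>

// Capture blend on a single 5-bit color channel, per GBATEK:
// result = (A*EVA + B*EVB) / 16, clamped to the 5-bit maximum of 31.
// EVA and EVB each range from 0 to 16.
uint8_t captureBlendChannel(uint8_t a, uint8_t b, int eva, int evb) {
    int val = (a * eva + b * evb) / 16;
    return (uint8_t)std::min(val, 31);
}
```

Note that with both weights at 16 the sum can exceed the 5-bit range, hence the clamp.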

All in all, quite an interesting feature. You can use it to do motion blur effects in hardware, or even to render graphics to a texture. Some games even do software processing on captured frames to apply custom effects. It is also used by the aging cart to verify the video hardware: it renders a scene and checksums the captured output.

For example, here's a demo of render-to-texture I just put together, based on the libnds examples:



The way this works isn't very different from dual-screen 3D.


Anyway, this stuff is closely related to what I'm working on, so I'm going to explain a bit about how upscaling works in melonDS.

When I implemented the OpenGL renderer, I first followed the same approach as other emulators: render 3D graphics with OpenGL, read the framebuffer back, and send it to the 2D renderer. Simple. Then, to support upscaling, I just had to increase the resolution of the 3D framebuffer and have the 2D renderer push out more pixels to match.

The issue was that this approach was suboptimal: if I pushed the scaling factor to 4x, it would get pretty slow. On one hand, pushing out more pixels in the 2D renderer takes more CPU time. On the other hand, on a PC, reading back from GPU memory is slow. Since the pixel count grows quadratically with the scaling factor, so does the overhead.

So instead, I went for a different approach. The 2D renderer renders at 256x192, but the 3D layer is replaced with placeholder values. This incomplete framebuffer is then sent to the GPU along with the high-resolution 3D framebuffer, and the two are spliced together. The final high-resolution output can be sent straight to the screen, never leaving GPU memory. This approach is a lot faster than the previous one.
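The per-pixel logic of that splicing step looks roughly like this toy sketch. In practice this runs as a GPU shader, and the placeholder convention and all names here are made up for illustration:

```cpp
#include <cstdint>
#include <vector>

// The 2D layer is rendered at native 256x192, with a placeholder value
// wherever the 3D layer (BG0) would show through. Splicing then samples the
// low-res 2D buffer, and substitutes the high-res 3D pixel at placeholders.
constexpr uint32_t kPlaceholder3D = 0xFF000000;

std::vector<uint32_t> splice(const std::vector<uint32_t>& lowres2D, // 256x192
                             const std::vector<uint32_t>& hires3D,  // scaled
                             int scale) {
    const int w = 256 * scale, h = 192 * scale;
    std::vector<uint32_t> out(w * h);
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            // Point-sample the native-res 2D layer...
            uint32_t px2d = lowres2D[(y / scale) * 256 + (x / scale)];
            // ...and punch the high-res 3D pixel through the placeholder.
            out[y * w + x] = (px2d == kPlaceholder3D) ? hires3D[y * w + x]
                                                      : px2d;
        }
    return out;
}
```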

This is what was originally implemented in melonDS 0.8. Since this rendering method bypassed the regular frame presentation logic, it was a bit of a hack - the final compositing step was done straight in the frontend, for example. The renderer in modern melonDS is a somewhat more refined version of this, but the same basic idea remains.

There is also an issue with this: display capture. The initial solution was to downscale the GPU framebuffer to 256x192 and read that back, so it could be stored in the emulated VRAM, "as normal". But since the capture goes through emulated VRAM, the captured frame is stuck at the original resolution. This is why upscaling in melonDS has those issues.
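Conceptually, that readback path boils down to shrinking each scale x scale block of the high-resolution framebuffer to one native pixel. A minimal sketch using a plain box filter on a single color channel; this is an illustration of the idea, not melonDS's actual readback code:

```cpp
#include <cstdint>
#include <vector>

// Downscale a (256*scale) x (192*scale) channel buffer back to the DS's
// native 256x192 by averaging each scale x scale block, so the capture can
// be stored in emulated VRAM at the original resolution.
std::vector<uint8_t> downscaleChannel(const std::vector<uint8_t>& hires,
                                      int scale) {
    const int w = 256, h = 192;
    std::vector<uint8_t> out(w * h);
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            int sum = 0;
            for (int sy = 0; sy < scale; sy++)
                for (int sx = 0; sx < scale; sx++)
                    sum += hires[(y * scale + sy) * (w * scale)
                                 + x * scale + sx];
            out[y * w + x] = (uint8_t)(sum / (scale * scale));
        }
    return out;
}
```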

To work around this, one would need to detect when a VRAM bank is being used as the destination for a display capture, and substitute a high-resolution version in later frames, the same way as the 3D layer itself. But obviously, it's more complicated than that. There are several issues. For one, the game can still access a captured frame in VRAM (to read it back or to do custom processing), so a valid low-resolution copy still needs to exist there. There are also several different ways a captured frame can be reused: as a bitmap BG layer (BG2 or BG3), as a bunch of bitmap sprites, or even as a texture in 3D graphics. This is kinda why it has been postponed for so long.
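The detection side could start out as bookkeeping along these lines. This is purely hypothetical scaffolding, not melonDS code; the real thing has to deal with bank remapping, partial captures, and more:

```cpp
#include <cstdint>
#include <map>

// Hypothetical tracker: when a display capture writes into a VRAM bank,
// remember which high-resolution buffer corresponds to it, so later frames
// that display from that bank can be spliced with the hi-res version.
struct HiResCaptureTracker {
    struct Entry { uint32_t offset, width, height; int hiresId; };
    std::map<int, Entry> byBank;   // VRAM bank -> last hi-res capture

    void onCapture(int bank, uint32_t offset,
                   uint32_t w, uint32_t h, int id) {
        byBank[bank] = Entry{offset, w, h, id};
    }
    // Returns the hi-res buffer id, or -1 if this bank holds no capture.
    int lookup(int bank) const {
        auto it = byBank.find(bank);
        return it == byBank.end() ? -1 : it->second.hiresId;
    }
    // A CPU write into the bank invalidates the hi-res copy, since the
    // low-res VRAM contents are now the authoritative ones.
    void onVramWrite(int bank) { byBank.erase(bank); }
};
```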

There are even more annoying details if we consider all the possibilities: while an API like OpenGL gives you an identifier for a texture, and you can only use the texture within its bounds, the DS isn't like that. When you specify a texture on the 3D engine, you're really just giving it a VRAM address. You could technically point it at the middle of a previously captured frame, or at an address before it... Tricky to work with. I made a few demos (like the aforementioned render-to-texture demo) to exercise display capture, but the sheer number of possibilities makes it hard to cover them all.
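Since a texture is really just an address and a size, figuring out whether one sources its data from a captured frame reduces to a range-overlap test over VRAM addresses. A tiny hypothetical helper:

```cpp
#include <cstdint>

// Do two half-open address ranges [start, start+len) overlap?
// They do iff each range starts before the other one ends.
bool rangesOverlap(uint32_t aStart, uint32_t aLen,
                   uint32_t bStart, uint32_t bLen) {
    return aStart < bStart + bLen && bStart < aStart + aLen;
}
```

A full-screen 16-bit capture occupies 256 * 192 * 2 = 0x18000 bytes, so any texture whose address range intersects that window would need the substitution treatment.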


So I'm basically trying to add support for high-resolution display capture.

The first step is to make a separate 2D renderer for OpenGL, which will go with the OpenGL 3D renderers. The goal is to remove the GLCompositor wart and the other hacks, and integrate the OpenGL rendering functionality more cleanly (and thus make it easier to implement other renderers in the future, too).

I'm also reworking this compositor to work around the original limitations, and make it easier to splice in high-resolution captured frames. I have a pretty good roadmap as far as the 2D renderer is concerned. For 3D, I'll have to see what I can do...

However, there will be more steps to this. I'm acutely aware of the limitations of the current approach: for example, it doesn't lend itself to applying filters to 2D graphics. I tried in the past, but kept running into issues.

There are several more visual improvements we could add to melonDS - 2D layer/sprite filtering, 3D texture filtering, etc... Thus, the second step of this adventure will be to rework the 2D renderer to do more of the actual rendering work on the GPU. A bit like the hardware renderer I had made for blargSNES a decade ago. This approach would make it easier to apply enhancements to 2D assets or even replace them with better versions entirely, much like user texture packs.

This is not entirely unlike the Deko3D renderer Generic made for the Switch port.

But hey, one step at a time... First, I want to get high-resolution capture working.

There's one detail that Nadia wants to fix before we can release melonDS 1.1. Depending how long this takes, and how long I take, 1.1 might include my improvements too. If not, that will be for 1.2. We'll see.
MatthewW says:
Nov 13th 2025
Glad you're still plugging away... feel like I learned a few things... and forgot a few more :)
Klauserus says:
Nov 13th 2025
That sounds really cool! I’m excited about the new high-resolution display capture feature and all the improvements you’re working on. Hopefully, it’ll be released soon — can’t wait to try it out!
Anonymous says:
Nov 13th 2025
You are just a genius. melonDS is becoming more accurate and the features like upscaling and filtering are gorgeous. You are just amazing