|
Hardware rendering, the fun
Dec 1st 2025, by Arisotura
|
This whole thing I'm working on gives me flashbacks to blargSNES. The goal and constraints are different, though: we weren't doing upscaling on the 3DS, but we also had no fragment shaders, so we were much more limited in what we could do. Anyway, these days I'm waist-deep in OpenGL. I'm determined to go further than my original approach to upscaling, and it's a lot of fun too. I might as well talk more about that approach, and what its limitations are.

First, let's talk about how 2D layers are composited on the DS. There are 6 basic layers: BG0, BG1, BG2, BG3, sprites (OBJ) and backdrop. Sprites are pre-rendered and treated as a flat layer (which means you can't blend a sprite with another sprite). Backdrop is a fixed color (entry 0 of the standard palette), which basically fills any space not occupied by another layer.

For each pixel, the PPU keeps track of the two topmost layers, based on priority order. Then you have the BLDCNT register, which lets you choose a color effect to be applied (blending or fade effects), and the target layers it may apply to. For blending, the "1st target" is the topmost pixel, and the "2nd target" is the pixel underneath. If the layers both pixels belong to are adequately selected in BLDCNT, they will be blended together, using the coefficients in the BLDALPHA register. Fade effects work in a similar fashion, except since they only apply to the topmost pixel, there's no "2nd target".

Then you also have the window feature, which can not only exclude individual layers from a given region, but can also disable color effects. There are also a few special cases: semi-transparent sprites, bitmap sprites, and the 3D layer. Those all ignore the color effect and 1st target selections in BLDCNT, as well as the window settings.
In melonDS, the 2D renderer renders all layers according to their priority order, and keeps track of the last two values for each pixel: when writing a pixel, the previous value is pushed down to a secondary buffer. This way, at the end, the two buffers can be composited together to form the final video frame.

I've talked a bit before about how 3D upscaling was done: basically, the 3D layer is replaced with a placeholder. The final compositing step is skipped, and instead, the incomplete buffer is sent to the GPU. There, a compositor shader can sample this buffer and the actual hi-res 3D layer, and finish the work. This requires keeping track of not just the last two values, but the last three values for any given pixel: if a given 3D layer pixel turns out to be fully transparent, we need to be able to composite the pixels underneath "as normal".

This approach was good in that it allowed for performant upscaling with minimal modifications to the 2D renderer. However, it was inherently limited in what was doable. That became apparent as I started working on hi-res display capture. My very crude implementation, built on top of that old approach, worked fine for simpler cases like dual-screen 3D. However, it was evident that anything more complex wouldn't work. For example, in a previous post, I showed a render-to-texture demo that uses display capture. I also made a similar demo that renders to a rotating BG layer rather than a 3D cube, and upscaling it with the old approach gives garbled results.

Basically, when detecting that a given layer is going to render a display capture, the renderer replaces it with a placeholder, like for the actual 3D layer. The placeholder values include the coordinates within the source bitmap, and the compositor shader uses them to sample the actual hi-res bitmap. The fatal flaw here is that this calculation doesn't account for the BG layer's rotation. Hence why it looks like shit.
Linear interpolation could solve this particular issue, but it's just one of many problems with this approach. Another big one was filtering. The basic reason is that when you're applying an upscaling filter to an image, for each given position within the destination image, you're going to be looking at not only the nearest pixel from the source image, but also the surrounding pixels, in an attempt to infer the missing detail. For example, a bilinear filter works on a 2x2 block of source pixels, while it's 4x4 for a bicubic filter, and as much as 5x5 for xBRZ.

In our case, the different graphical layers are smooshed together into a weird 3-layer cake. This makes it a major pain to perform filtering: say you're looking at a source pixel from BG2. You'd want to find neighboring BG2 pixels, but they may be at different levels within the layer cake, or they may just not be part of it at all. All in all, it's a massive pain in the ass to work with. Back in 2020, I attempted to implement an xBRZ filter as a bit of a demo, to see how it'd work. I even recorded a video of it running on Super Princess Peach, and it was looking pretty decent... but due to the aforementioned issues, there were always weird glitches and other oddball problems, and it was evident that this was stretching beyond the limits of the old renderer approach. The xBRZ filter shader did remain in the melonDS codebase, unused...

So, basically, I started working on a proper hardware-accelerated 2D renderer. As of now, I'm able to decode individual BG layers and sprites to flat textures. The idea is that doing so will simplify filtering a whole lot: instead of having to worry about the original format of the layer, the tiles, the palettes, and so on, it would just be a matter of fetching pixels from a flat texture. Here's an example of sprite rendering: sprites are first pre-rendered to an atlas texture, then placed on a hi-res sprite layer.
This type of renderer allows for other nifty improvements too: for example, hi-res rotation/scaling. Next up is going to be rendering BG layers to similar hi-res layers. Once it's all done, the layers can be sent to the compositor shader and the job can be finished. I also have to think of provisions to deal with possible mid-frame setup changes. Anyone remember that mid-frame-OAM-modifying foodie game? There will also be some work on the 3D renderers, to add support for things like render-to-texture, but also possibly to add 3D enhancements such as texture filtering.

I can hear the people already: "why make this with OpenGL? That's old, you should use Vulkan". Yeah, OpenGL is no longer getting updates, but it's a stable and mature API, and it isn't going to be deprecated any time soon. For now, I see no reason to stop using it.

However, I'm also reworking the way renderers work in melonDS. Back then, Generic made changes to the system so he could add different 2D renderers for the Switch port: a version of the software renderer that uses NEON SIMD, and a hardware-accelerated renderer that uses Deko3D. I'm building upon this, but I also want to integrate things better: for example, figuring out a way to couple the 2D and 3D renderers more tightly, and generally a cleaner API. The idea is also to make it easier to implement different renderers. For example, the current OpenGL renderer is made with fast upscaling in mind, but we could have different renderers for mobile platforms (i.e. OpenGL ES) that are first and foremost aimed at just being fast. Of course, we could also have a Vulkan renderer, or Direct3D, Metal, whatever you like.
8 comments have been posted.
|
qwertx says: Dec 1st 2025
|
this is some insane work and dedication. everyone in the ds community appreciates it |
|
Zyute says: Dec 2nd 2025
Every time an update is posted here I'm blown away by the dedication and creativity that sparks up. While I don't always understand the technical side of each point made, it's abundantly clear you, Arisotura, have a deep love and passion that isn't stopped by all the events you have to go through, so you have my deepest gratitude for not giving up. Hope your Thanksgiving was filled with good food and fun times.
|
poudink says: Dec 2nd 2025
I don't think they do Thanksgiving in France
|
Citrodata says: Dec 2nd 2025
|
While Vulkan is the current new shiny toy, it's also extremely complicated for beginners. Using OpenGL in 2025 is using something you know you can trust to run everywhere. Even the PS3 got a Mesa-ported OpenGL. So keep on with the good work you're doing. And be proud knowing so many people use your emulator. Thanks for the dedication.
|
Zyute says: Dec 2nd 2025
Didn't know Arisotura was based in France 😅. Well, Thanksgiving or not, hope all goes well with work and health.
|
Arisotura says: Dec 2nd 2025
haha, thank you all! :)
|
Khaos says: Dec 2nd 2025
I just want to say THANK YOU for your work, melonDS is the most accurate and coolest emulator I've ever used. The only thing I miss is a shader like the xBRZ one you mentioned; with the speed and accuracy it emulates at, I'd replay my whole catalog! I'll keep looking forward to your great work!
|
caffeine addict says: Dec 9th 2025
|
this sounds corny, but you and other emulator devs like you actually give me hope in humanity. it's beautiful to see brilliant people putting their time and energy towards a project like this. you've created a project that lets people like me re-live and continue to enjoy games from their childhoods. I'm gonna donate 15 bucks tonight. and everyone else in this thread should at least donate 5. arisotura deserves at least a cup of coffee for the incredible work she's doing.