Introducing the compute renderer
To make things short, I've been working on porting melonDS's software rasteriser to run on the GPU via compute shaders. So how is this different from melonDS's existing OpenGL renderer? The OpenGL renderer uses the built-in functionality of your GPU to draw triangles. This is of course fast, since it uses hardware made specifically for that purpose, but it has the downside that some things are out of our control, so the behaviour of the DS can't be replicated completely faithfully. The compute renderer, on the other hand, only uses the programmable parts of the GPU (which means we have full control over them), so it is like the software rasteriser, except that it harnesses the parallel computing power of the GPU. Ideally it should be able to be just as accurate as the software rasteriser.
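To give a very rough idea of what that means, here is a minimal, hypothetical sketch (not melonDS's actual shader, all names in it are made up, and the real DS rasteriser works per scanline rather than with edge functions): a compute shader can "draw" a triangle simply by letting every pixel decide for itself whether it is covered and writing the result into an image.

```glsl
#version 430
// One shader invocation per pixel; each invocation tests a single triangle
// and writes its colour into the framebuffer image if the pixel is covered.
layout(local_size_x = 8, local_size_y = 8) in;

layout(binding = 0, rgba8) uniform writeonly image2D ColorBuffer;

uniform vec2 Vertices[3]; // triangle in screen-space pixel coordinates
uniform vec4 Color;

// signed area test: positive if c lies to the left of the edge a->b
float EdgeFunction(vec2 a, vec2 b, vec2 c)
{
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

void main()
{
    vec2 p = vec2(gl_GlobalInvocationID.xy) + 0.5;

    float w0 = EdgeFunction(Vertices[1], Vertices[2], p);
    float w1 = EdgeFunction(Vertices[2], Vertices[0], p);
    float w2 = EdgeFunction(Vertices[0], Vertices[1], p);

    // the pixel is covered if it lies on the same side of all three edges
    if (w0 >= 0.0 && w1 >= 0.0 && w2 >= 0.0)
        imageStore(ColorBuffer, ivec2(gl_GlobalInvocationID.xy), Color);
}
```

Since all of this is ordinary shader code, every rule of the DS's rasteriser can in principle be reproduced exactly, just massively in parallel.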

Why are we doing this in the first place?
  • Enhancements such as higher resolution rendering at reasonable speeds compared to, say, a software rasteriser, but with fewer problems than the OpenGL renderer (though problems can never be fully excluded when running games differently from how they were intended to run).
  • Full-speed emulation of 3D games on the Switch and potentially other devices which fit this weird niche of having slow processors but pretty competent GPUs with good software support for them.
You might have already heard of parallel-rdp from Themaister, which provides very accurate emulation of the RDP (i.e. the part of the N64 that ultimately draws the triangles) running on the GPU. It has been a great inspiration for this project (which means that where possible it's basically a clone). So thanks to Themaister for all the ideas and also for answering my questions!

Currently the main part of the work is done (it's already somewhat playable with a lot of games), so it's easier to list what's still missing:
  • Blending
  • Shadows
  • Equal depth testing
  • Antialiasing
  • Highlighting/Toon shading
  • Fog
  • Edgemarking
  • Rear plane images
I plan on detailing some technical aspects later. Also, I have not forgotten my "A tour through melonDS's JIT recompiler" series, so expect to see some more posts by me here sooner or later.
Guest says:
May 8th 2021
>Generic aka RSDuck
Yeah, that would certainly be a good question; I'll test it out once we get there. I have a GT8200 somewhere here, as well as a GTS8400 in a friend's PC (at their house) and a GT9500, so I'll be sure to test this.

There are claims that Citra developers love NVidia graphics; I have no idea about that. However, I have seen an example of NVidia graphics working better than anything else, with Pokédex 3D Pro, which was not tested by anyone else at the time. Things were utterly slow, with completely broken artifacts, on my i3-540 (now on an i5-4570 and a dying HD6950, hoping to upgrade to an R9 270 as soon as possible), my i5-6200U (broken motherboard, need a new heatsink for the Radeon dGPU-equipped motherboard), a Radeon HD 5850 (dead now), and a Radeon HD 6670 (falsified as an HD7570 with 4GB VRAM) (might be corroded now). However, there were no issues whatsoever on the GT8200 with NVidia's proprietary drivers, so I believe them when they say that NVidia really optimized their OpenGL stack to take advantage of whatever it could supply.
Generic aka RSDuck says:
May 9th 2021
Marv: thanks for the reference, though even quirky blending rules (like on the DS) are relatively easy to implement in the rasterisation method taken from Themaister.
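To illustrate why (a very rough, hypothetical sketch, not the actual implementation; see GBATEK for the exact rules, e.g. the polygon ID conditions for translucent pixels): since the compute renderer reads and writes the framebuffer itself, a blend rule is just ordinary shader code rather than fixed-function blend state, roughly along these lines.

```glsl
#version 430
// Illustration only: a rough DS-style blend of a translucent pixel over the
// framebuffer, with alpha treated as a 5-bit value (0..31). Names invented.
layout(local_size_x = 8, local_size_y = 8) in;

layout(binding = 0, rgba8) uniform image2D ColorBuffer;

uniform vec4 PolygonColor; // colour + alpha of the incoming translucent pixel

void main()
{
    ivec2 pos = ivec2(gl_GlobalInvocationID.xy);
    vec4 dst = imageLoad(ColorBuffer, pos);

    float a = PolygonColor.a * 31.0;
    vec3 rgb = (PolygonColor.rgb * (a + 1.0) + dst.rgb * (31.0 - a)) / 32.0;
    // the destination keeps the higher of the two alpha values
    imageStore(ColorBuffer, pos, vec4(rgb, max(PolygonColor.a, dst.a)));
}
```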

Guest: yeah, I think it's really just that Nvidia has the best OpenGL drivers on Windows, while Intel's are known to be okay and AMD's to be meh.
Sorer says:
May 9th 2021
Would the compute renderer make texture dumping easier?
Seeing as the regular OpenGL renderer still has issues with textures in games, etc.
Generic aka RSDuck says:
May 9th 2021
Sorer: I assume those are two different points.

Regarding texture dumping and replacement, that would totally be possible with the current OpenGL renderer (there even exists a dead prototype of a texture cache for it in the texcache branch, which would basically be the basis for something like this).

Regarding the other point, what exactly are you referring to? I don't know of any game which has problems with the textures themselves (i.e. not misaligned texture coordinates, etc.) with the OpenGL renderer. Any more details?
Sorer says:
May 9th 2021
Oops, yeah, I meant misaligned texture coordinates and considered that "texture issues" lol.
Rin Tohsaka says:
May 11th 2021
Bit late on this, but just to clarify, the existing CPU-based software renderer isn't going away, right?

As I've found out, early-2010s non-Atom Intel laptops (Westmere, Sandy Bridge) have enough CPU grunt to run melonDS at full speed without much of any issue, but their GPU feature set on Windows is so lacking that the OpenGL renderer straight-up throws an error.

BTW regarding Nvidia OpenGL, it's my understanding that their Windows driver simply implements a lot of optional extensions that make it faster, while AMD sticks strictly to the spec. So any software that heavily relies on those optional OpenGL extensions just breaks massively on AMD GPUs on Windows. (Of course, Linux is a completely different story due to Mesa having absolutely fantastic kernel-level open source OpenGL support, which makes AMD and Intel GPUs the best choice for that platform - my friend wanted to run an eduke32 game via OpenGL on Sandy Bridge graphics that was running at ~1 fps in Windows, but in Linux it was an easy 40+ fps using nothing but the kernel-supplied drivers.)
poudink says:
May 11th 2021
I don't see any reason for the software renderer to go away.
Generic aka RSDuck says:
May 11th 2021
> Bit late on this, but just to clarify, the existing CPU-based software renderer isn't going away, right?

like poudink said, the CPU rasteriser is here to stay.

> BTW regarding Nvidia OpenGL, it's my understanding that their Windows driver simply implements a lot of optional extensions that make it faster, while AMD sticks strictly to the spec. So any software that heavily relies on those optional OpenGL extensions just breaks massively on AMD GPUs on Windows. (Of course, Linux is a completely different story due to Mesa having absolutely fantastic kernel-level open source OpenGL support, which makes AMD and Intel GPUs the best choice for that platform - my friend wanted to run an eduke32 game via OpenGL on Sandy Bridge graphics that was running at ~1 fps in Windows, but in Linux it was an easy 40+ fps using nothing but the kernel-supplied drivers.)

Nvidia's OpenGL Windows driver can be a bit more lenient from time to time, and they also implement some Nvidia-specific extensions, but that should rarely, if ever, matter for performance. Their speed comes simply from the fact that their driver is better optimised.
Guest says:
May 18th 2021
>Generic aka RSDuck
My testing was conducted on Linux, not Windows, and with an up-to-date kernel and Mesa (for the time), as well as (with NVidia) the last proprietary NVidia 340 drivers (the last that support my GT8200).
Salvalie says:
May 25th 2022
To be honest, I tested melonDS from the main branch on an RK3399, and while it doesn't have an amazing GPU, it performed FAR better with the software renderer compared to the OpenGL renderer at 1x with the desktop profile (3.3). No idea if it's a Panfrost failure, but the results are shocking. Every time I try a new open source DS emulator I wish DraStic would be open source. I know, you are probably focusing on accuracy rather than speed, but boy... DraStic is miles ahead on speed. I also wasn't able to scale up the games with the software renderer with the main branch compilation.
Generic aka RSDuck says:
May 25th 2022
have you enabled the JIT recompiler? Also, it's a known thing that the software renderer (which doesn't support non-native resolution rendering) is usually on par with or faster than the OGL renderer at 1x; Panfrost certainly doesn't help, though.
salva says:
May 26th 2022
Yes, I did enable the JIT, but it doesn't change that much. Well, Panfrost is a very good performer at certain tasks; especially desktop x86_64 games run decently with the OpenGL desktop profile and Box64... but yeah, it's not running great at all with the melonDS OpenGL renderer.

On the benchmarks, especially glmark2 on Wayland, I get very decent numbers: 945 with the desktop profile. I mean, quite decent compared to the RPi 4, though not at discrete GPU level of course.

But yeah, I will have to rely on what DraStic can deliver. I especially need it on aarch64 since Manjaro doesn't support multiarch... and well, it's mostly dead, with no aarch64 releases as a standalone emulator.