Doing the time warp

This looks a lot like another screenshot, from two years ago:

So why am I posting this now? Well, the answer is simple: we're going back in time and preparing a new melonDS release that is roughly equivalent to 0.1.


Jokes aside, there are some key differences between those screenshots:

* newer one has proper clipping at screen edges
* both lack Z-buffering, but the newer one renders differently, likely because the older one didn't have Y-sorting
* newer one is using OpenGL

So, yeah, that's the long-awaited OpenGL renderer. Like the good ol' software renderer in its time, it's taking its baby steps, and barely beginning to render something, but I'm working on it, so in a while it will become awesome :P

This renderer will aim for reasonable accuracy. As a result, it will require a pretty recent OpenGL version, and compatible hardware. It's set to OpenGL 4.3 currently, but I will adjust the minimum requirement once the renderer is finished.

If needed, I can provide alternate versions of the renderer for lower-end hardware supporting older OpenGL versions, but they will be less accurate. While the software renderer is the 'gold standard', the current OpenGL renderer is a sort of 'minimum standard' to get most games to render correctly. Any lower-spec renderer may render certain games wrong, and that is unlikely to get fixed (or it would be fixed, but at the cost of killing performance).


Speaking of the software renderer, I also felt like doing a bit more research towards the holy grail: pixel perfection. I'm not done yet, but I finally have those pesky polygon edge slope functions down. Someday we will get those aging cart tests to pass ;)

But that will be for later. I may also write a post about all the juicy low-level hardware details.


But, back to OpenGL. Might as well explain why the planning phase for this renderer took so long. Although you can guess that it's 50% the DS GPU being a pile of quirks and 50% me being a lazy fuck.

The first experiments were made with a compute shader based rasterizer. That way, I could get it perfect, while supporting graphical enhancements. I ended up ditching this solution because the performance wasn't good.

So, back to more standard rendering methods, aka pushing triangles. We won't get to rasterize quads correctly that way, but in most cases, the difference shouldn't matter.

First thing to do is to devise an efficient way to push triangles. This requires straying away from standard rendering methods, especially in how we do texturing and all.

On the DS, a game can choose to change the current texture at any time. Polygon attributes can only be changed before a BEGIN_VTXS command, but that doesn't make it any better. Polygons are sorted by their Y coordinates before rendering, which can completely change their ordering. Basically, there is no guarantee that polygons will be grouped by polygon/texture attributes, and the ordering after Y-sorting must be preserved or you might break things like UIs that rely on it.

This is shitty for our purposes though. If, for the DS, changing polygon/texture attributes is mostly free, you can't say as much about OpenGL (or any desktop graphics API for that matter). You would end up with one draw call per polygon, which isn't really a good thing.
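To make the batching problem concrete, here is a minimal Python sketch. It is not melonDS code: the polygon records, the `y_sort` and `count_batches` helpers, and the single-value sort key are all assumptions made up for illustration (the real sort is more involved). It only shows that once polygons are Y-sorted, merging consecutive polygons that share the same state barely reduces the number of draw calls.

```python
from itertools import groupby

# Hypothetical polygon records: (top_y, state), where `state` stands in for
# the combined polygon/texture attributes. Values are made up for illustration.
polygons = [
    (10, "texA"),
    (40, "texB"),
    (10, "texB"),
    (25, "texA"),
    (40, "texA"),
    (25, "texB"),
]

def y_sort(polys):
    """Stable sort by Y: polygons with equal Y keep their submission order."""
    return sorted(polys, key=lambda p: p[0])

def count_batches(polys):
    """Draw calls needed if only consecutive polygons sharing a state are merged."""
    return sum(1 for _ in groupby(polys, key=lambda p: p[1]))

sorted_polys = y_sort(polygons)
print("order after Y-sorting:", sorted_polys)
print("draw calls, merging consecutive runs:", count_batches(sorted_polys))
print("draw calls, one per polygon:", len(sorted_polys))
```

Reordering the list to group polygons by state would cut the count further, but that is exactly what the ordering requirement forbids.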

Another thing worth considering is that our window for 3D rendering is not a full frame (16.667ms). On the DS, 3D rendering starts at scanline 215 (or 214?). Rendering any sooner would be a bad idea as the game might still be updating texture VRAM. But, we need 3D graphics as soon as scanline 0 of the next frame, which leaves us only 48 scanlines worth of time to do the rendering.

The software renderer is able to work around this limitation by using threading and per-scanline rendering (pretty much like the real thing, except that one seems to render two scanlines at once), which extends the rendering time frame to 192 scanlines.
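For a rough idea of what those windows mean in wall-clock time, here is a small back-of-the-envelope calculation in Python. It assumes 263 scanlines per frame (192 visible plus blanking, which matches 215 + 48) and the 16.667ms frame time quoted above; the constant and function names are just for illustration.

```python
SCANLINES_PER_FRAME = 263   # assumption: 192 visible + 71 blanking scanlines
FRAME_MS = 16.667           # frame time quoted in the text
RENDER_START = 215          # scanline at which 3D rendering starts (or 214)

def window_ms(scanlines):
    """Convert a number of scanlines into milliseconds of real time."""
    return scanlines * FRAME_MS / SCANLINES_PER_FRAME

# Whole-frame rendering has to finish before scanline 0 of the next frame.
whole_frame_window = SCANLINES_PER_FRAME - RENDER_START

# Per-scanline rendering stretches the window to the 192 scanlines given above.
per_scanline_window = 192

print(f"whole frame:  {whole_frame_window} scanlines, {window_ms(whole_frame_window):.2f} ms")
print(f"per scanline: {per_scanline_window} scanlines, {window_ms(per_scanline_window):.2f} ms")
```

Under these assumptions, the whole-frame window is about a quarter of what the threaded software renderer gets.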

OpenGL does not render per-scanline, though. So we can forget about this. However, a possibility would be splitting the frame into four 256x48 chunks. I will study that possibility if performance is an issue; I would have to see whether the extended rendering timeframe outweighs the extra draw calls. Maybe offer both rendering methods as options.
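Here is a hedged sketch of how the deadlines could work out with the chunked approach. Nothing like this exists in the renderer yet; `chunk_schedule` and its parameters are hypothetical, and it assumes 263 scanlines per frame and that a chunk only has to be ready when the display reaches its first row.

```python
def chunk_schedule(chunk_height=48, screen_height=192,
                   render_start=215, scanlines_per_frame=263):
    """For each chunk, list its rows and how many scanlines after the start
    of rendering it must be finished (hypothetical model, not melonDS code)."""
    first_window = scanlines_per_frame - render_start
    schedule = []
    for top in range(0, screen_height, chunk_height):
        schedule.append({
            "rows": (top, top + chunk_height - 1),
            # The chunk is needed once the display reaches its first row.
            "due_after": first_window + top,
        })
    return schedule

for chunk in chunk_schedule():
    print(chunk)
```

With four chunks, the last one is due 192 scanlines after rendering starts, which matches the window the software renderer gets. The cost is that every chunk needs its own draw call(s).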

Back to pushing triangles, for now. I devised a way to pass polygon/texture attributes to the fragment shader, and render all the polygons in one draw call. Nice and dandy, but we're not out of trouble. This will imply passing the raw DS VRAM to the fragment shader and having it handle all the details of texturing, akin to the TextureLookup() function in the software renderer. No idea about the performance implications of this.
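As an illustration of what 'handling the details of texturing' means, here is a minimal Python sketch of a texel fetch from raw VRAM for the simplest case, the 16-bit direct color format (5 bits per channel, top bit as alpha). In the real renderer this logic would live in the fragment shader; Python is only used here to show the address arithmetic. The `poly_attrs` record, `fetch_direct_color` and `expand5to8` are hypothetical names, and the expansion to 8 bits per channel is an assumption made for display purposes.

```python
import struct

def expand5to8(c):
    """Expand a 5-bit channel to 8 bits by replicating the top bits."""
    return (c << 3) | (c >> 2)

def fetch_direct_color(vram, tex_addr, tex_width, s, t):
    """Read one 16-bit texel from raw VRAM and return it as (R, G, B, A)."""
    offset = tex_addr + 2 * (t * tex_width + s)
    (raw,) = struct.unpack_from("<H", vram, offset)
    r = raw & 0x1F
    g = (raw >> 5) & 0x1F
    b = (raw >> 10) & 0x1F
    a = 255 if raw & 0x8000 else 0
    return (expand5to8(r), expand5to8(g), expand5to8(b), a)

# Made-up VRAM contents: a 4x4 direct color texture stored at address 16.
vram = bytearray(64)
poly_attrs = {"tex_addr": 16, "tex_width": 4}   # hypothetical per-polygon data
struct.pack_into("<H", vram, 16 + 2 * (1 * 4 + 1), 0x801F)  # opaque red at (1, 1)

print(fetch_direct_color(vram, poly_attrs["tex_addr"], poly_attrs["tex_width"], 1, 1))
```

The paletted and compressed formats would need their own code paths, plus wrapping and clamping of the texture coordinates, which is where the performance question comes from.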

Also, we will have to think of something for shadow polygons, as I don't think we can use the regular stencil buffer with this.

Well. I hope this renderer will be compatible with OpenGL ES, with all the tricks it may be pulling, but... we'll see.
Branchus says:
Apr 11th 2019
I have waited for hardware renderer for a long time.
Thank you very much Arisotura
Zinx says:
Apr 11th 2019
Good job.
I hope this is gonna run well on AMD GPUs because their Windows OpenGL driver is not that good.
Sam says:
Apr 11th 2019
I hope it'll run well on AMD cards
Beansta says:
Apr 11th 2019
It most likely won't run well on AMD cards. OpenGL is most likely being used for portability, judging by the comment referring to GLES. Blame AMD for your shitty driver support.
Marv says:
Apr 12th 2019
I wonder if an OpenGL extension like ARB_fragment_shader_interlock would come in handy to emulate depth/stencil buffer behaviour and blending?
Ian says:
Apr 12th 2019
Looked up the specs for the DS. Is it correct that it can only do something like 2048 triangles per frame? I mean, the poly count is so low you really could do 1 draw call per triangle, although I wouldn't usually recommend that. Supermodel does something like up to 10k draw calls in OpenGL per frame. The reason it's so high is probably similar. It doesn't sort the polys in software, but the game worlds are aggressively broken up into bounding boxes for culling. But these bounding boxes group together dissimilar polys/texture types etc. State changes on the Model 3 are practically free; I am sure the DS is similar if it's all fixed-function H/W. I also do TextureLookup() or textureLod in the shader to do completely custom texturing, including mipmapping. It's cheap, you just have to turn off linear texture filtering, otherwise you double sample the pixels.
Marv says:
Apr 13th 2019
Going by previous documentation, the DS GPU is indeed limited to 2048 triangles per frame and in ideal conditions can output a total of 122880 triangles per second, which corresponds to the hard 2048 limit at 60 Hz. Doing 1 draw call per triangle might not be so bad on desktop GPUs, but it would be a massive disaster on mobile systems, even with relatively high-end GPUs using OpenGL ES. The driver overhead on most GL ES implementations is horrific, not to mention the sad driver bugs as well.
Generic says:
Apr 13th 2019
Since all attributes of the texture associated with each polygon are known and the textures are quite limited in size, maybe instead of uploading the whole VRAM, upload a texture atlas with the textures used during the frame stitched together in a uniform pixel format. Additionally, a simple uniform array with the offset and size of each texture inside the atlas is uploaded and indexed using a vertex attribute. Alternatively, something like texture arrays or bindless textures could be used, though the latter would decrease the amount of hardware supported by quite a bit.
poudink says:
Apr 13th 2019
you can have twice that amount (4096 triangles) using multipass rendering, which limits you to 30fps.
MelonMan says:
Apr 14th 2019
Couldn't you just use Mesa3D if you have hardware that doesn't support OpenGL?
poudink says:
Apr 14th 2019
if your hardware doesn't support opengl, then you have a big problem.
Marv says:
Apr 15th 2019
A lot of hardware doesn't officially support desktop OpenGL and they're mobile graphics hardware vendors. You can only get desktop OpenGL on mobile hardware by using open source mesa graphics stack such as freedreno, LIMA, etc like he mentioned. The biggest problem on the graphics side isn't going to be desktop OpenGL support, it's arguably going to be poor driver quality like anyone would especially point out on Android because god forbid they might fail to do even the simplest of things ...
MelonMan says:
Apr 16th 2019
poudink I meant if your hardware doesn't support OpenGL 2.1+ like mine (Intel Graphics 2000).
Anonymous says:
Apr 26th 2019
>Someday we will get those aging cart tests to pass
Just FYI, it looks like an updated aging cart has been found.
Guest says:
Apr 28th 2019
For what it's worth, direct mode appears to be working for me with WiFi under Linux, at least for local multiplayer (did not try AltWiFi), and at least at the beginning (I need to time it right). I have only tested with the Mystery Gift function in Pokémon HeartGold.