Nightmare in viewport street
Jul 5th 2017, by StapleButter
And another bug bites the dust... a quite old one, by the way.
Namely, it was reported two months ago in issue #18: missing character models in Homie Rollerz's character select screen. Actually, if you look closely, you can see they were there, but they got compressed to one scanline at the top, which was caused by a bug in the viewport transform.
In 3D graphics terms, the viewport defines how normalized device coordinates of polygons are transformed to screen coordinates, which can be used to render the polygons.
Most games specify a standard fullscreen viewport, but there are games that pull tricks. Homie Rollerz is one of them: the character select screen uses a 'bad' viewport. But, unlike high-level graphics APIs, the DS has no concept of a bad viewport. You can input whatever viewport coordinates you want; it doesn't reject or correct them.
So how does it behave? Well, if you've been following this, you surely know that the DS looks normal and friendly on the surface, but if you look deeper, you find out that everything is weird and quirky. Viewports are no exception.
That's why the bug stayed unfixed for so long. GBAtek doesn't document these cases, so it's a matter of running hardware tests, observing results, and doing it again and again until you figure out the logic.
For example, here's a test: the viewport is set to range from 64,0 to 128,192. Nothing special there.
Now, we change the viewport to range from 192,0 to 128,192, which results in a width of -64 (which, by the way, OpenGL would reject). One might expect such a viewport to result in graphics getting mirrored, like this:
It looks correct, but that's not what the hardware outputs. In reality, it looks more like this:
The GPU doesn't support negative screen coordinates or viewport sizes, so they wrap around to large positive values. The range for X coordinates is 0-511, so for example trying to specify a viewport width of -1 results in a 511 pixel wide viewport. Such values are likely to cause the viewport transform calculations to overflow, causing coordinates to wrap around in a similar fashion. This is what happens in the example above -- the green vertex actually goes above 511.
Y coordinates work in the same way but with a range of 0-255. However, one has to take into account the fact that they're reversed -- the Y axis for 3D graphics goes upwards, but the 3D scene is rendered from top to bottom. The viewport is also specified upside-down. Different ways of reversing the screen coordinates give slightly different results, so it took me a while to figure out the proper way. Y coordinates are reversed before the viewport transform, by inverting their sign. Viewport Y coordinates are reversed too, as such:
finalY0 = 191 - Y1;
finalY1 = 191 - Y0;
Past that, there are no special cases -- the viewport transform is the same, no matter the viewport coordinates and vertices. The only special case is that polygons with any Y coordinate greater than 192 aren't rendered (but they are still stored in vertex/polygon RAM). This is probably because the hardware can't figure out whether the coordinate should have been negative (as said before, it doesn't support negative screen coordinates), and thus can't guess what the correct result should be, so it's taking the easy way out. But this is a bit odd considering it has no problem with X coordinates going beyond 256.
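To make the wrapping behavior concrete, here's a rough sketch of a viewport transform that follows these findings. The function names, the use of floating point, and the normalized-coordinate representation are all made up for the example (the real hardware works in fixed point); only the wrapping and Y-reversal logic reflect the hardware tests described above.

```cpp
#include <cstdint>

// Sketch of a DS-style viewport transform. X wraps to 9 bits (0-511) and
// Y to 8 bits (0-255) instead of being clamped or rejected.
struct Viewport { int x0, y0, x1, y1; };

// Transform a normalized X coordinate in [-1, 1] to screen space.
int TransformX(double ndcX, const Viewport& vp)
{
    int w = (vp.x1 - vp.x0) & 0x1FF;           // width wraps, never negative
    int x = (int)((ndcX + 1.0) * w / 2.0) + vp.x0;
    return x & 0x1FF;                           // wrap to 0..511
}

int TransformY(double ndcY, const Viewport& vp)
{
    // The Y axis is flipped before the transform, and the viewport Y
    // coordinates are reversed too: finalY0 = 191 - Y1, finalY1 = 191 - Y0.
    int fy0 = 191 - vp.y1;
    int fy1 = 191 - vp.y0;
    int h = (fy1 - fy0) & 0xFF;                 // height wraps to 0..255
    int y = (int)((-ndcY + 1.0) * h / 2.0) + fy0;
    return y & 0xFF;                            // wrap to 0..255
}
```

With the 192,0 to 128,192 viewport from the test above, the wrapped width comes out to 448, which is why vertices end up going past X=511 and wrapping around instead of getting mirrored.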
With that, I'm not sure how coordinates should be scaled should 3D upscaling be implemented. But, I have a few ideas. We'll see.
But regardless, after revising viewport handling according to these findings, the Homie Rollerz character select screen Just Works™, and I haven't observed any regression, so it's all good.
Also, release 0.4 is coming very soon.
To make it clear
Jun 21st 2017, by StapleButter
There is no Patreon version of melonDS. There will never be such a version.
Donating doesn't entitle you to anything. Not donating doesn't make you miss out on anything. You get the same package in all cases.
(this post used to be directed towards a particular news post, but it has since been corrected, so I will leave this here as a general note)
While it's nice that there are emu sites spreading the news, it is really better when they do some basic research and fact checking instead of posting mere assumptions that only cause confusion.
Opening to the outer world
Jun 20th 2017, by StapleButter
If you've followed melonDS from the beginning, you know that wifi was one of the goals. And well, it's getting there.
The first melonDS release, 0.1, already included some wifi code, but it was a very minimalistic stub. The point was merely to allow games to get past wifi initialization successfully. And it even failed at that due to a bug.
Chocolate waffles to you if you can locate the bug, by the way ;)
But well, at that stage, the focus wasn't much on wifi.
It was eventually fixed in 0.2, and some functionality was added, but it still didn't do much at all. Games finally got past wifi initialization, but that was about it.
It wasn't until 0.3 that some serious work was done. With the emulator core getting more robust, I could try going for the wifi quest again. Not that 0.3 went far at all -- it merely allowed players to see each other, but it wasn't possible to actually connect. But it was something, and infrastructure for sending and receiving packets was in place and working, as well as a good chunk of the wifi hardware functionality.
You may already know how it went back in the DeSmuME days. As far as local multiplayer was concerned, I kept hitting a wall. Couldn't get it working, no matter how hard I tried. WFC is a separate issue.
It didn't help drive motivation knowing that my work was doomed to stay locked behind a permanent EXPERIMENTAL_WIFI wall, requiring a custom wifi-enabled build, and that the DeSmuME team's public attitude is to sweep wifi under the carpet and pretend it doesn't exist, but the main issue was the lack of documentation as far as local multiplayer is concerned.
The DS wifi hardware isn't a simple, standard transceiver. It is a custom, proprietary wifi system made by Nintendo for their purposes. It has special features to assist local multiplay communication at a fast rate.
GBAtek documents the more regular wifi features well enough to get things like WFC working, but the specific multiplay features weren't well known until now. Many packet captures have been made, and several multiplay protocols like Pictochat and download play have been reverse-engineered, but details on how the hardware operates were lacking. The hardware operates in a very specific way during multiplay, and there's some real clever design there.
As the DS doesn't support ad-hoc communication, multiplay follows a host-client scheme. Clients can't directly communicate between themselves, everything goes through the host. Who is the host depends on the game -- it is generally the person who starts the multiplayer game.
The host sends beacon frames at a regular interval to advertise the game to potential clients, pretty much like how access points send beacons to advertise their presence. The DS wifi hardware can automatically send beacons at a set interval, but there's nothing too special there.
When a client connects, the regular 802.11 association procedure is followed. The client sends an auth frame, to which the host replies with an auth frame, then the client sends an association request, and the host replies with an association response which gives the client its ID.
At this point, multiplay communication begins, and this is where the real meat is.
The host sends data frames using a special TX slot (which GBAtek names CMD). It doesn't simply send the data, it actually assists the whole exchange.
The host data frame has a bitmask telling which clients should reply. Each client is given an ID from 1 to 15.
So, after the data frame is sent, the hardware waits for those clients to reply. It uses said bitmask, as well as a register telling how long a client reply should last, to determine how long the wait should last. It can report when clients fail to reply within their window (assuming clients reply in order).
The client doesn't have much control over the process. It has a register that tells where its reply frame is in memory, and a register that holds its client ID, and that's about it. When receiving a data frame, the hardware checks its client bitmask against the client ID register to determine whether it should reply. The data frame also contains the client reply time. With these data, the client can determine when it should send its reply, and do so automatically. This happens even if no reply frame address is set -- in that case, it sends a default empty reply.
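The two sides of that exchange can be sketched as follows. The function names are invented for the example, and the exact timing units of the reply-time register are an assumption; the point is how the client bitmask and the reply window interact.

```cpp
#include <cstdint>

// Client side: a data frame carries a 16-bit bitmask of client IDs that
// must reply. Bit 0 is the host; bits 1-15 correspond to client IDs.
bool ClientShouldReply(uint16_t clientBitmask, int clientID)
{
    return clientID >= 1 && clientID <= 15
        && ((clientBitmask >> clientID) & 1);
}

// Host side: the wait window is long enough for every polled client to
// send its reply in turn (clients are assumed to reply in ID order).
// 'replyTimePerClient' stands in for the reply-duration register.
int HostReplyWindow(uint16_t clientBitmask, int replyTimePerClient)
{
    int numReplies = 0;
    for (int id = 1; id <= 15; id++)
        if ((clientBitmask >> id) & 1) numReplies++;
    return numReplies * replyTimePerClient;
}
```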
When all this is done, the host sends an automatic acknowledgement frame to the connected clients. This frame is different from the standard 802.11 ack in that it is received like a regular frame and processed by the clients. The standard ack isn't exposed to software.
If any replies are missing, the host will attempt to repeat the whole process (maybe altering the client bitmask so the clients that replied successfully aren't polled again, I haven't checked), if there is enough time left -- the CMDCOUNT timer defines a window during which the transfer is possible. When sending packets, it needs to wait until no one else is sending; crosstalk isn't a good idea. Regular TX slots seem to keep waiting until the channel is free, but the CMD slot can abort if it has waited for too long.
All in all, a neat piece of hardware. You can guess that missing details on all this doesn't help getting multiplay working, especially as there are various status bits that get set here and there during the process, some of which are crucial.
So I set to work. I managed to replicate multiplay communication between two DSes with homebrew, and used it to work out the details. This, coupled with reverse-engineering of the Pictochat code, finally allowed me to get it working in melonDS.
It isn't perfect, but this is definitely progress, as neither DeSmuME nor NO$GBA have gotten this far. As far as Pictochat is concerned, it appears that host->client transfers work fine, but client->host transfers sometimes get partially corrupted or lost entirely (and cause the client to get stuck unable to send).
NSMB MvsL also works, and seems rather stable, even though it can disconnect on you. There is some slowdown which appears to be caused by data loss.
There are also issues stemming from the communication medium used, namely, plain old BSD sockets. With UDP, it's easy and cross-platform, but there are issues. Binding to 127.0.0.1 gave the best results for me with Pictochat under Windows, but I got feedback that it doesn't work under Linux. Binding to all possible addresses works under both Windows and Linux, but seemed to be worse for Pictochat, while being better for some other games. So we need to work on this too. BSD sockets have the advantage that they allow playing on separate computers over LAN (in theory -- this hasn't been tested), but maybe we will need faster IPC methods. Multiplayer sends packets at an amazing rate (Pictochat sends every ~4ms), so the communication medium has to keep up with this.
Stay tuned for more exciting reports!
So there it is, melonDS 0.3
Jun 4th 2017, by StapleButter
So what's new in this version?
A bunch of bugfixes. This version has better compatibility than the previous one.
This includes graphical glitches ranging from UI elements folding on themselves to motion blur filters becoming acid trips, or your TV decoder breaking.
But also, more evil bugs. As stated in previous posts, booting your game from the firmware is no longer a gamble, it should be stable all the time now. An amusing side note on that bug, it has existed since melonDS 0.1, but in that version, the RTC returned a hardcoded time, so the bug would always behave the same (some games worked, others not). melonDS 0.2 started using the system time, which is what introduced the randomness.
The 3D renderer got one-upped too. Now it runs on a separate thread, which gives a pretty nice speed boost on multicore CPUs. This is optional, so if it causes issues or slows things down, you can disable it and the renderer will work mostly like before.
When I had to implement the less interesting aspects of this (controlling the 3D renderer thread), I procrastinated and implemented 3D features instead. Like fog or edge marking. You can see them demonstrated in the screenshots above.
Then I went back and finished the threading. I'm not a big fan of threaded code, but it seems to be completely stable.
However, resetting or loading a new game is still not completely stable, it has a chance of freezing. Oh and the UI still sucks. I plan to finally get at the UI shit for 0.4, and I want to ditch wxWidgets, so I don't really feel like pouring a lot of time into the current UI.
There are more things that went into this release, and I'll let you find out!
Also, here's the melonDS Patreon for those who feel generous.
May 24th 2017, by StapleButter
In the previous post, I said I wanted to run the 3D renderer on a separate thread. Well, we're going to see how all that works in detail.
Ever since 3D rendering was added into melonDS, it's been one of the bottlenecks whenever games use it. On the other hand, 2D rendering, while not being very well optimized, doesn't make for a big performance hit.
2D rendering isn't very expensive or difficult though -- the 2D renderers are oldschool tile engines, essentially drawing raster graphics onscreen at the specified coordinates, optionally with a bunch of fancy effects, but nothing too complex. In comparison, the 3D renderer is a full-fledged 3D GPU. It basically turns a bunch of polygons defined in 3D space, into a 2D representation that is then passed to the main 2D renderer and composited mostly like a regular 2D layer.
Transformations by various matrices, culling, clipping, viewport transform, rasterization with perspective-correct interpolation... it's a bunch of work.
The approach originally taken was to render a whole 3D frame upon scanline 215. I'm not sure whether rendering should start upon scanline 214 or 215 (GBAtek says it starts "48 scanlines in advance"), but melonDS starts at 215.
Which basically meant that the emulator had to wait until the whole 3D frame was rendered before doing anything else.
However, in emulation, you can't just throw everything on separate threads. Considering a component, whether you can put it on a separate thread depends on how tightly it is synchronized to other components.
This excellent article from byuu explains well how all this works.
In the case of our DS 3D renderer, threading is feasible because tight synchronization isn't required. Once the renderer starts rendering, you can't alter its state, the polygon and vertex lists are double-buffered and all the other important registers are latched. The only thing you can do is change VRAM mapping, but I have yet to come across a game that pulls stunts like swapping texture banks mid-frame.
So you can put the rendering process on a separate thread, and just tell it when to start rendering, and wait until it's done before telling it to start a new frame.
That's not the approach I tried, though. Rendering starts upon scanline 215, but the 2D renderer needs 3D graphics as soon as scanline 0, which only leaves 48 scanlines worth of time (out of 263) for the 3D thread to do its job. We can do better.
The 2D renderer is scanline-based. So, upon scanline 0, it doesn't need a whole 3D frame, but only the first scanline of it. And similarly for further scanlines. This gives the 3D thread 192 scanlines worth of time, which is a lot better.
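Here's a sketch of what that per-scanline handshake could look like. This isn't melonDS's actual implementation -- the class and function names are invented, and the real renderer has more state to manage -- but it shows the producer/consumer idea: the 3D thread publishes how many scanlines it has finished, and the 2D renderer blocks only until the line it needs is ready.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

class GPU3DThread
{
public:
    void StartFrame()                    // kicked off on scanline 215
    {
        linesDone = 0;
        worker = std::thread([this] {
            for (int line = 0; line < 192; line++) {
                RenderScanline(line);    // rasterize one line of polygons
                {
                    std::lock_guard<std::mutex> lk(mtx);
                    linesDone = line + 1;
                }
                cv.notify_one();
            }
        });
    }

    void WaitForLine(int line)           // 2D renderer: block until ready
    {
        std::unique_lock<std::mutex> lk(mtx);
        cv.wait(lk, [&] { return linesDone > line; });
    }

    void FinishFrame()                   // before the next frame starts
    {
        if (worker.joinable()) worker.join();
    }

    int TotalRendered() { return totalRendered; }

private:
    void RenderScanline(int) { totalRendered++; }  // stub for the example

    std::thread worker;
    std::mutex mtx;
    std::condition_variable cv;
    int linesDone = 0;
    int totalRendered = 0;
};
```

The 2D renderer calls WaitForLine(n) right before compositing scanline n, so on a fast machine it rarely blocks at all.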
It required more work though, as the 3D renderer had to be modified to work per-scanline. But nothing insurmountable. After a while spent tracking down threading bugs, this finally worked, and gave a nice speed boost. Except it caused glitches in Worms 2.
The glitches were because of the "wait for the frame to be finished before starting a new frame" bit. I first did it upon scanline 215, right before signalling the 3D thread to start rendering. Well, normally the renderer would be done by scanline 192 anyway.
But Worms 2 takes advantage of the fact that 3D rendering starts 48 scanlines in advance, which also means it finishes 48 scanlines in advance. So the game unmaps texture banks and starts loading new textures as soon as scanline 146.
Which would cause problems if the 3D thread was still running at that time. When I forced the main thread to wait for the 3D thread to finish upon scanline 144, the glitches went away.
In the end, the threaded 3D renderer gives a nice speed boost. On my laptop, Super Mario 64 DS runs fullspeed now, and Mario Kart DS is close to fullspeed too. And I haven't seen more glitches, so all should be fine.
It still needs a bunch of polish. I want to make it optional, and the 3D thread needs proper start/stop control.
Also noting that there will be no performance gain if you have a single-core CPU (the whole threading apparatus may even reduce performance).
And, of course, there are other targets to optimize too.
Love and waffles~
Slicing the melons!
May 10th 2017, by StapleButter
So here are the two main goals for melonDS 0.3: threading the 3D renderer, and starting work on wifi connectivity.
The first goal basically aims at running the 3D renderer in parallel with the rest of the emulator. On the hardware, the renderer's state can't be altered while it is rendering, so the timing doesn't have to be precise, and we can use it to our advantage. As the current 3D renderer is a bottleneck, threading it should give a nice speed boost for multi-core CPUs (which are quite the norm nowadays).
The second goal doesn't mean wifi will work, but hey, we need to start somewhere.
Well actually wifi emulation has already been upped a notch compared to 0.2. The wifi RAM and associated registers are functional, as well as most of the timers. What remains to be done is functionality for sending/receiving packets. Power management and specific multiplayer features also need proper investigation. Then we have things like the RF and BB chips, which will likely never be fully emulated since they control very low-level aspects of wifi, like "how much energy should it take to consider we're receiving data".
So where does this get us? I have tested Pictochat and NSMB multiplayer, and in both cases, the host sets up a beacon and attempts to send it regularly, which is a good sign.
(The beacon is a packet regularly sent by wifi access points to advertise their presence. Since the DS doesn't support ad-hoc communication, multiplayer games use a similar scheme to communicate, typically with the first player acting as a host and other players being clients.)
Anyway, don't get too hyped over this, there's nothing too new here. DeSmuME and NO$GBA both get at least this far, if not further.
In the meantime, I've been implementing another obscure feature of the DS: writable VCount.
Old consoles typically have a register that reflects which scanline is being drawn onscreen. There are various names it can be called (LY on the GameBoy, VCOUNT on the GBA/DS), but it's essentially the same thing.
The DS has the particularity that the VCount register can be written to. This feature is typically used to synchronize consoles during multiplay, but there are other possible uses -- ZXDS uses it to bring the refresh rate to 50Hz.
GBAtek doesn't provide a whole lot of details about this and how it works, besides a recommended VCount value range of 202-212. The I/O map listing also hints that VCount is only writable from the ARM7, but that seems to be a copy-paste error: I observed that VCount is writable from both CPUs.
So I set out and investigated it, which gave fun results, as always with the DS.
I immediately tried writing to VCount during active display (scanlines 0-191). And, surprise, it works. The GPU then continues drawing from the updated VCount, but the screens keep going from their old position as the LCD controllers can't skip forward or backward.
It's also worth noting that VCount writes are delayed until the next scanline, which has the advantage that it makes timings reliable.
The effects of VCount writes only last for one frame, things go back to normal after the end of the frame (unless you write to VCount every frame).
Implementing this feature in melonDS was also the perfect occasion to make the framerate limiter suck less. It had to be upgraded anyway in order to support adjusting the framerate cap based on how long the frame lasted. So for example, if you ran ZXDS, melonDS would run at 50 FPS instead of 60. The new framerate limiter is also more accurate, which makes for nearly perfect audio output.
(ZXDS wouldn't run currently due to the lack of DLDI support, but you get the idea)
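For reference, here's roughly how a limiter whose cap follows the emulated frame length can be derived. The frequency constants are the commonly published DS values, and this is a simplified sketch rather than melonDS's actual limiter code.

```cpp
// A normal DS frame is 263 scanlines at roughly 59.83 Hz. Writing VCount
// changes how many scanlines a frame actually lasts, so the target frame
// duration is derived from the scanline count instead of being a
// hardcoded 1/60 of a second.
const double kNormalFPS   = 59.8261;
const int    kNormalLines = 263;

// Seconds that one emulated scanline represents.
const double kSecondsPerLine = 1.0 / (kNormalFPS * kNormalLines);

// Target duration for a frame that lasted 'linesThisFrame' scanlines.
double TargetFrameDuration(int linesThisFrame)
{
    return linesThisFrame * kSecondsPerLine;
}

// Effective framerate cap for that frame length.
double EffectiveFPS(int linesThisFrame)
{
    return 1.0 / TargetFrameDuration(linesThisFrame);
}
```

A frame stretched to around 315 scanlines via VCount writes then naturally caps out near 50 FPS.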
More fun fixes
Apr 27th 2017, by StapleButter
So as I said in the last post, I've been fixing the firmware boot issues. Basically, when booting from the firmware, there was a high chance that the game would crash upon boot. But the bug and its effects were random. Sometimes it would simply hang on a white screen, sometimes it would jump to an invalid address... and sometimes it would work fine.
So we're going to see how we try to fix a random bug.
First thing to do is finding a way to reproduce the bug consistently. So, hoping my bug would be affected by the time taken before clicking the health/safety warning screen, I set up a quick hack to automatically click the screen after a fixed number of frames. That didn't cut it, so I made the RTC return a fixed time. Which finally allowed me to reproduce the bug reliably.
I landed on a variation where the ARM9 jumped to an invalid address, which made it easy to find where it was doing that. I then started backtracking, which eventually led me to find out that some data were being copied from the wrong place, but it appeared the bug had more consequences than anticipated, making the backtracking long and tedious.
So instead, I dumped the RAM at the time the game booted, and compared it with a RAM dump from before a direct boot. It appeared that a chunk of code got accidentally erased. The range that got erased was within the cart's secure area, so I checked the code that handled secure area reads, but the bug wasn't there.
So I tracked where that code region was getting erased. The bug appeared, and it was another stupid bug. It turned out to be a DMA from the ARM7 accidentally doing that.
The bug was that during cart transfers, melonDS tried to start DMA for both CPUs. The cart interface can only be enabled for one CPU at a time, and thus DMA should only be checked for that CPU.
The bug resulted from a combination of factors:
1. The ARM7 BIOS loads the cart secure area, using DMA. When it's done, it leaves the DMA enabled(!). So when the ARM9 went to load something else from the cart, it accidentally triggered the ARM7-side DMA.
2. The secure area is made of 4 blocks, which the BIOS loads in a random order. If the block that happened to be loaded last was also the final one in memory, the stray DMA wouldn't overwrite the secure area and would only write past it -- which isn't a big deal because the rest of the game binary is loaded later. So in that rare case, the game would boot fine.
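The fix itself can be sketched like this. Per GBAtek, bit 11 of EXMEMCNT selects which CPU owns the NDS cart slot, so cart events should only ever trigger the owning CPU's DMA channels. The function names here are invented for illustration:

```cpp
#include <cstdint>

enum CPU { ARM9 = 0, ARM7 = 1 };

// EXMEMCNT bit 11: NDS slot access rights (0 = ARM9, 1 = ARM7).
CPU CartOwner(uint16_t exmemcnt)
{
    return (exmemcnt & (1 << 11)) ? ARM7 : ARM9;
}

// Which CPU's DMA channels to check when cart data is ready. The old
// (buggy) behavior checked both CPUs, letting a stale ARM7 DMA fire
// while the ARM9 was reading the cart.
bool ShouldCheckCartDMA(CPU cpu, uint16_t exmemcnt)
{
    return cpu == CartOwner(exmemcnt);
}
```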
Was a fun ride, though.
Fixing Pokémon White
Apr 26th 2017, by StapleButter
This post will get into the bugfixing process for a particular bug. Bugfixing in emulation may look like black magic, but it's not so different from general bugfixing -- it boils down to understanding the bug and why it occurs. Of course, emulation makes it harder when it involves blobs of machine code for which you have no source code, but nothing insurmountable.
Anyway, the issue with Pokémon White (and probably others) was that the game wouldn't boot unless it was launched from the firmware. Not really convenient, especially as at the time of writing this, firmware boot in melonDS is unstable.
I first suspected the RAM setup done prior to a direct boot (NDS::SetupDirectBoot). There are several variables stored in memory by the firmware, which games can use for various purposes. For example, the firmware stores the cartridge chip ID there, games then check the cartridge chip ID against the stored value, and throw an exception if it has changed (which typically means that the cart was ejected).
However, some testing revealed that there was nothing the direct boot setup was missing that could have broken Pokémon, at least regarding the RAM.
So I had to dig deeper into the issue. It turned out that during initialization, the ARM9 got interrupted by an IRQ, but for some reason, it never returned to its work after processing the IRQ.
DS games often use multiple threads, so it isn't uncommon to switch to a different task after an IRQ. But that wasn't the case here, it wasn't even picking a thread to return to. It got stuck inside the IRQ handler.
The IRQ occurred upon receiving data from the IPC FIFO. In particular, the ARM7 sent the word 0x0008000C. The ARM9-side handler was coded to panic and enter an infinite loop upon receiving this word.
More investigation of the FIFO traffic showed that the ARM9 first sent the word 0x0000004D, which is part of the initialization procedure. To which the ARM7 replied with 0x0008000C. But it appeared that the ARM7-side FIFO handler was coded to do that. For a while, that stumped me. I couldn't understand how it was supposed to work.
I then logged FIFO traffic when booting the game from the firmware, whenever it successfully booted, to see where the exchange differed. The ARM9 sent 0x0000004D, to which the ARM7 replied 0x0000004D. But it appeared that in that case, the ARM7 was using a different handler.
I checked the ARM7 code responsible for setting up the FIFO handler. It first set up the 'good' handler, then called a function, then set up the 'bad' handler. Looking at the function inbetween, I noticed that it checked register 0x04000300, which is POSTFLG, and is set to 1 by the firmware.
Hey, what if...
So I quickly checked the direct boot setup, and it wasn't initializing the POSTFLG registers. And, surprise, addressing that fixed the issue, letting Pokémon White boot directly.
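For illustration, the shape of the fix looks something like this. The toy IOBus type stands in for the emulator's real memory system, and the function body is heavily abridged -- the point is just the two POSTFLG writes:

```cpp
#include <cstdint>
#include <map>

// When direct-booting (skipping the firmware), the emulator must
// reproduce the state the firmware leaves behind, including
// POSTFLG (0x04000300) = 1 on both CPUs.
using IOBus = std::map<uint32_t, uint8_t>;

void SetupDirectBoot(IOBus& arm9IO, IOBus& arm7IO)
{
    // ... copy binaries, fill firmware-provided RAM variables, etc. ...

    // The bit the Pokémon White fix added: set the post-boot flag,
    // so the ARM7 code that checks it installs the correct FIFO handler.
    arm9IO[0x04000300] = 1;
    arm7IO[0x04000300] = 1;
}
```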
Well, not bad. Next up is fixing the issues that plague firmware boot, I guess.
melonDS 0.2, there it is
Apr 24th 2017, by StapleButter
You can check it out on the downloads page.
I'll let you find out what the novelties are ;)
Breaking the silence
Apr 8th 2017, by StapleButter
You may have noticed that there's one thing melonDS 0.1 lacks sorely: sound. It's an integral part of the gaming experience.
It's also one of the big items in the TODO list, so let's do this.
Getting basic sound playback going wasn't too difficult. The DS sound hardware is pretty simple: 16 channels that can play sound data encoded in PCM8, PCM16 or IMA-ADPCM. Channels 8 to 13 also support rectangular waves (PSG), and channels 14 and 15 can produce white noise.
I first worked on PSG as it's pretty simple, there would be less things that could go wrong. And indeed, the issues I got were mostly dumb little things like forgetting to clear intermediate buffers. Once that was working, I completed it with support for the remaining sound formats.
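As a rough illustration of why PSG is the simple case, here's a minimal rectangular-wave generator. The output levels and polarity are assumptions (check GBAtek for the exact waveform); the duty setting just decides how much of each 8-step period the output spends at each level:

```cpp
#include <cstdint>

// One PSG sample. 'duty' is the 3-bit duty setting from SOUNDxCNT and
// 'phase' counts 0..7 across one wave period. Levels/polarity assumed.
int16_t PSGSample(int duty, int phase)
{
    return (phase <= duty) ? 0x7FFF : -0x8000;
}

// Step the 3-bit phase counter; the channel timer decides how often.
int NextPhase(int phase)
{
    return (phase + 1) & 7;
}
```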
And there it was, sound in melonDS. That thing finally begins resembling a DS emulator :)
The sound core in melonDS is synchronous. The major downside is that it sounds like shit if emulation isn't running fullspeed. But it's been stated that the goal of melonDS is to do things right. There are other reasons why it has to be synchronous, too; the main one is sound capture.
The DS has two sound capture units, which can record audio output and write it to memory. Those are used in some games to apply a surround effect to the audio output, or to do reverb. The idea is to send the mixer output to the capture units instead of outputting directly, then use channels 1 and 3 to output the captured audio data after it's been altered by software.
Setups using sound capture expect that capture buffers will be filled at a fixed interval. This breaks apart if your sound core is asynchronous, because there is no guarantee that it will produce sound at a regular rate. What if you end up producing more sound than the game's buffer can hold? In these situations, the best way is to ignore those effects entirely. This is the same reason the old HLE audio implementation in Dolphin didn't support reverb effects.
Anyway, sound output is still in the works, but it's fairly promising so far.
Oh by the way, have I mentioned that there are other DS emulators being worked on? Check out medusa!