|Home | Downloads | Screenshots | Forums | Source code | RSS|
Why there is no 32-bit build of melonDS
Jul 25th 2017, by StapleButter
The main reason is that I don't have an incentive to provide 32-bit builds. Most people already have 64-bit OSes.
That being said, melonDS can currently run on 32-bit platforms. It may be less performant, as the 3D renderer does a lot of 64-bit math, but it is still possible.
But if I ever decide to implement a JIT, for example, there will be no 32-bit version of it.
If you're stuck on a 32-bit OS for hardware reasons, your computer will not be fast enough to run melonDS at playable speeds.
melonDS will be optimized, it will run faster, but it will also tend towards more accuracy. So I can't tell how fast it will be in the end. But I highly doubt it will run well on a PC from 2004. Maybe it will, if a JIT is made, but that's not a high priority task.
If you are stuck on such hardware, NO$GBA is a better choice for you. Or NeonDS if you don't mind lacking sound. Or hell, the commonly mentioned method of running DraStic in an Android emulator -- those who bash DeSmuME at every turn claim it's fast.
Truth is, emulating the DS is not a walk in the park. People tend to assume it should be easy to emulate fast because the main CPU is clocked at a measly 66MHz. Let's see:
There are two CPUs. ARM9 and ARM7, 66MHz and 33MHz respectively. Which means you need to keep them more or less in sync. Each time you stop emulating one CPU to go emulate the other (or to emulate any other hardware) impacts performance negatively, but synchronizing too loosely (not enough) can cause games to break. So you need to find the right compromise.
The ARM7 generally handles tasks like audio playback, low-level wifi access, and accessing some other peripherals (power management, firmware FLASH...). All commercial games use the same ARM7 program, because Nintendo never provided another one or allowed game devs to write their own. This means that in theory the ARM7 could be emulated in HLE. In practice, this has never been attempted, unless DraStic happens to do it. It's also worth noting that it would be incompatible with homebrew, since they don't use Nintendo's ARM7 program.
If it is possible to get ARM7 timings reasonably accurate without too much effort, the ARM9 is another deal. It is clocked at 66MHz, but the bus speed is 33MHz, so the CPU needs to adjust to that when accessing external memory. Accesses to main RAM are slow, moreso than on the ARM7, due to what appears to be bad hardware design. But the ARM9 has caches which can attenuate the problem (provided the program is built right). When the caches are used, timings can vary dramatically between cache hits and cache misses.
So emulating ARM9 timings is a choice between two unappealing options: 1) emulating the whole memory protection unit and caches, or 2) staying grossly inaccurate. I went with the second option with melonDS, but I'm considering attempting MPU/cache emulation to see how much it would affect performance.
Noting that RaymanDS is an example of timing-sensitive game: when timings are bad enough, it will start doing weird things. Effects get worse as timings are worse, and can range from text letters occasionally jumping out of place to travellings going haywire. I have observed polygons jumping out in melonDS, so the timings aren't good enough for this game.
And then you have all sorts of hardware attached to the CPUs. Timers, DMA channels, or the video hardware-- oldschool 2D tile engines and a primitive, quirky custom 3D GPU. The 2D GPUs need to be emulated atleast once per scanline as games can do scanline effects by changing registers midframe (it is less common than on the SNES for example, but it's still a thing). The 3D GPU is another deal: geometry is submitted to it by feeding commands to a FIFO. You need to take care of running commands often enough to avoid the FIFO getting full, especially as games often use DMA to fill it.
Oh by the way, the 2D GPUs are actually pretty complex, supporting a variety of BG modes and sizes, multiple ways to access graphics for sprites, and so on. The 3D GPU isn't any better, I think we have already established that it's a pile of quirks.
So yeah, with the sheer amount of hardware that must be emulated, the DS isn't a piece of cake.
|5 comments (last by StapleButter) | Post a comment|
melonDS 0.4 -- It's here, finally!
Jul 16th 2017, by StapleButter
melonDS 0.4 was long awaited, and finally, it's here!
So here's a quick rundown of the changes since 0.3. I'm keeping the best for the end.
The infamous boxtest bug that plagued several games has finally been fixed. The bug generally resulted in missing graphics.
The boxtest feature of the DS lets you determine whether a given box is within the view volume. Several games use it to avoid submitting geometry that would end up completely offscreen. A prime example would be Nanostray, which uses it for everything, including 2D menu elements that are always visible.
Technically, you send XYZ coordinates and sizes to the GPU, which calculates the box vertices from that. The box faces are then transformed and clipped like regular geometry, and the test returns true if any geometry makes it through the process (which would mean that it appears onscreen). This also means that the result will be false if the view volume is entirely contained within the box.
I had no idea where the bug was, as melonDS did all that correctly, and some tests with the libnds BoxTest sample revealed nothing. It turned out that the issue lied within the initial calculation of the box coordinates. When melonDS calculated, say, "X + width", it did so with 32-bit precision. However, the hardware calculates it with 16-bit precision, so if the result overflows, it just gets truncated. And, surprise, games tend to rely on that behavior. Getting it wrong means you end up testing a different box, and getting different results. Hence the bug.
There have been various other improvements to the 3D renderer. Things have been revised to be closer to the hardware.
As a result, the Pokémon games look nicer, they don't have those random black dots/lines all over the place anymore. The horrendous Z-fighting observed in Black/White is also gone.
Other games that suffered from random dots/lines around things, should be fixed too.
As well as things like wrong layering in UI screens rendered with 3D polygons.
The 2D renderer got its share of fixes too. Mainly related to bitmap BGs, but also a silly copypaste bug where rotscaled bitmap sprites didn't show up.
But there have also been improvements to an obscure feature: the main memory display FIFO. If you happen to remember:
The display FIFO is another obscure feature -- I don't know of any retail game that uses it, but I have yet to be surprised.
And, when debugging rendering issues in Splinter Cell, I have been surprised -- that game does use the display FIFO when using the thermal/night vision modes.
The issue in that game was unrelated to that feature (it was bad ordering of rendering and capture that caused flickering), but it was a good occasion to improve display FIFO emulation. It atleast allowed me to finally axe the last DMA hack as the associated DMA now works like on hardware. The FIFO is also sampled to a scanline-sized buffer with the same level of accuracy.
This doesn't matter when the FIFO is fed via DMA, but it enables some creative use of the feature -- for example, you can write 16 pixels to the FIFO to render a vertical stripe pattern on the whole screen. Or you will get bad results should you try to use it the wrong way. All of which is similar to what happens on hardware.
Pushing for accuracy, the last few big things that magically happened instantly now have their proper delays emulated, namely SPI transfers (including backup memory SPI) and hardware division/sqrt.
Firmware write was implemented, meaning you can change the firmware settings from the firmware itself (to run the firmware with no game: System -> Run). A backup of your firmware image will be made should anythig go wrong.
And, last but not least: working wifi multiplayer. It was already spoiled, but regardless, it's the first time DS emulation gets this far. And it doesn't require an unofficial build! It's there and ready to use.
It's not perfect though. But it's a start. Pictochat, NSMB and Pokémon are known to be working, but you might encounter spurious disconnects (or, more likely, lag). Mario Kart and Rayman RR2 refuse to work.
I added a setting ("Wifi: bind socket to any address") which you can play with to try making wifi work better. It can make things better or worse. Try it out if needed. Leaving it unchecked should be optimal, but I was told that it doesn't work under Linux.
With all this, no work was done on the UI, despite what I had stated. My apologies for this. But the UI will be worked on, I promise!
You can find melonDS 0.4 on the downloads page.
Patreon for melonDS if you're feeling generous.
|11 comments (last by AsPika2219) | Post a comment|
Nightmare in viewport street
Jul 5th 2017, by StapleButter
And another bug bites the dust... a quite old one, by the way.
Namely, it was reported two months ago in issue #18: missing character models in Homie Rollerz's character select screen. Actually, if you look closely, you can see they were there, but they got compressed to one scanline at the top, which was caused by a bug in the viewport transform.
In 3D graphics terms, the viewport defines how normalized device coordinates of polygons are transformed to screen coordinates, which can be used to render the polygons.
Most games specify a standard fullscreen viewport, but there are games that pull tricks. Homie Rollerz is one of them, the character select screen uses a 'bad' viewport. But, unlike high-level graphics APIs, the DS has no concept of bad viewport. You can input whatever viewport coordinates, it doesn't reject them or correct them.
So how does it behave? Well, if you've been following this, you surely know that the DS looks normal and friendly on the surface, but if you look deeper, you find out that everything is weird and quirky. Viewports are no exception.
That's why the bug stayed unfixed for so long. GBAtek doesn't document these cases, so it's a matter of running hardware tests, observing results, and doing it again and again until you figure out the logic.
For example, here's a test: the viewport is set to range from 64,0 to 128,192. Nothing special there.
Now, we change the viewport to range from 192,0 to 128,192, which results in a width of -64 (which, by the way, OpenGL would reject). One would say that such a viewport results in graphics getting mirrored, like this:
It looks correct, but that's not what the hardware outputs. In reality, it looks more like this:
The GPU doesn't support negative screen coordinates or viewport sizes, so they wrap around to large positive values. The range for X coordinates is 0-511, so for example trying to specify a viewport width of -1 results in a 511 pixel wide viewport. Such values are likely to cause the viewport transform calculations to overflow, causing coordinates to wrap around in a similar fashion. This is what happens in the example above -- the green vertex actually goes above 511.
Y coordinates work in the same way but with a range of 0-255. However, one has to take into account the fact that they're reversed -- the Y axis for 3D graphics goes upwards, but the 3D scene is rendered from top to bottom. The viewport is also specified upside-down. Different ways of reversing the screen coordinates give slightly different results, so I had to take a while to figure out the proper way. Y coordinates are reversed before the viewport transform, by inverting their sign. Viewport Y coordinates are reversed too, as such:
finalY0 = 191 - Y1;
finalY1 = 191 - Y0;
Past that, there are no special cases -- the viewport transform is the same, no matter the viewport coordinates and vertices. The only special case is that polygons with any Y coordinate greater than 192 aren't rendered (but they are still stored in vertex/polygon RAM). This is probably because the hardware can't figure out whether the coordinate should have been negative (as said before, it doesn't support negative screen coordinates), and thus can't guess what the correct result should be, so it's taking the easy way out. But this is a bit odd considering it has no problem with X coordinates going beyond 256.
With that, I'm not sure how coordinates should be scaled should 3D upscaling be implemented. But, I have a few ideas. We'll see.
But regardless, after revising viewport handling according to these findings, the Homie Rollerz character select screen Just Works™, and I haven't observed any regression, so it's all good.
Also, release 0.4 is for very soon.
|4 comments (last by huckleberrypie) | Post a comment|
To make it clear
Jun 21st 2017, by StapleButter
There is no Patreon version of melonDS. There will never be such a version.
Donating doesn't entitle you to anything. Not donating doesn't make you miss out on anything. You get the same package in all cases.
(this post used to be directed towards a particular news post, but it has since been corrected, so I will leave this here as a general note)
While it's nice that there are emu sites spreading the news, it is really better when they do some basic research and fact checking instead of posting mere assumptions that only cause confusion.
|3 comments (last by Simply) | Post a comment|
Opening to the outer world
Jun 20th 2017, by StapleButter
If you have followed melonDS from the beginning, you'd know that wifi was one of the goals. And well, it's getting there.
The first melonDS release, 0.1, already included some wifi code, but it was a very minimalistic stub. The point was merely to allow games to get past wifi initialization successfully. And it even failed at that due to a bug.
Chocolate waffles to you if you can locate the bug, by the way ;)
But well, at that stage, the focus wasn't much on wifi.
It was eventually fixed in 0.2, and some functionality was added, but it still didn't do much at all. Games finally got past wifi initialization, but that was about it.
It wasn't until 0.3 that some serious work was done. With the emulator core getting more robust, I could try going for the wifi quest again. Not that 0.3 went far at all -- it merely allowed players to see eachother, but it wasn't possible to actually connect. But it was something, and infrastructure for sending and receiving packets was in place and working, as well as a good chunk of the wifi hardware functionality.
You may already know how it went back in the DeSmuME days. As far as local multiplayer was concerned, I kept hitting a wall. Couldn't get it working, no matter how hard I tried. WFC is a separate issue.
It didn't help drive motivation knowing that my work was doomed to stay locked behind a permanent EXPERIMENTAL_WIFI wall, requiring a custom wifi-enabled build, and that the DeSmuME team's public attitude is to sweep wifi under the carpet and pretend it doesn't exist, but the main issue was the lack of documentation as far as local multiplayer is concerned.
The DS wifi hardware isn't a simple, standard transceiver. It is a custom, proprietary wifi system made by Nintendo for their purposes. It has special features to assist local multiplay communication at a fast rate.
GBAtek documents the more regular wifi features well enough to get things like WFC working, but the specific multiplay features weren't well known until now. Many packet captures have been made, several multiplay protocols like Pictochat and download play have been reverse-engineered, but details on hardware operation were at a loss. The hardware operates in a very specific way during multiplay, and there's some real clever design there.
As the DS doesn't support ad-hoc communication, multiplay follows a host-client scheme. Clients can't directly communicate between themselves, everything goes through the host. Who is the host depends on the game -- it is generally the person who starts the multiplayer game.
The host sends beacon frames at a regular interval to advertise the game to potential clients, pretty much like how access points send beacons to advertise their presence. The DS wifi hardware allows to automatically send beacons at a set interval, but there's nothing too special there.
When a client connects, the regular 802.11 association procedure is followed. The client sends an auth frame, to which the host replies with an auth frame, then the client sends an association request, and the host replies with an association response which gives the client its ID.
At this point, multiplay communication begins, and this is where the real meat is.
The host sends data frames using a special TX slot (which GBAtek names CMD). It doesn't simply send the data, it actually assists the whole exchange.
The host data frame has a bitmask telling which clients should reply. Each client is given an ID from 1 to 15.
So, after the data frame is sent, the hardware waits for those clients to reply. It uses said bitmask, as well as a register telling how long a client reply should last, to determine how long the wait should last. It can report when clients fail to reply within their window (assuming clients reply in order).
The client doesn't have much control over the process. It has a register that tells where its reply frame is in memory, and a register that holds its client ID, and that's about it. When receiving a data frame, the hardware checks its client bitmask against the client ID register to determine whether it should reply. The data frame also contains the client reply time. With these data, the client can determine when it should send its reply, and do so automatically. This happens even if no reply frame address is set -- in that case, it sends a default empty reply.
When all this is done, the host sends an automatic acknowledgement frame to the connected clients. This frame is different from the standard 802.11 ack in that it is received like a regular frame and processed by the clients. The standard ack isn't exposed to software.
If any replies are missing, the host will attempt to repeat the whole process (maybe altering the client bitmask so the clients that replied successfully aren't polled again, I haven't checked), if there is enough time left -- the CMDCOUNT timer defines a window during which the transfer is possible. When sending packets, it needs to wait until noone else is sending things, crosstalk isn't a good idea. Regular TX slots seem to keep waiting until the channel is free, but the CMD slot can abort if it has waited for too long.
All in all, a neat piece of hardware. You can guess that missing details on all this doesn't help getting multiplay working, especially as there are various status bits that get set here and there during the process, some of which are crucial.
So I set to work. I managed to replicate multiplay communication between two DS's with homebrew, and used it to work out the details. This, coupled with reverse-engineering of the Pictochat code, allowed to finally get it working in melonDS.
It isn't perfect, but this is definitely progress, as neither DeSmuME nor NO$GBA have gotten this far. As far as Pictochat is concerned, it appears that host->client transfers work fine, but client->host transfers sometimes get partially corrupted or lost entirely (and cause the client to get stuck unable to send).
NSMB MvsL also works, and seems rather stable, even though it can disconnect on you. There is some slowdown which appears to be caused by data loss.
There are also issues stemming from the communication medium used, namely, plain old BSD sockets. With UDP, it's easy and cross-platform, but there are issues. Binding to 127.0.0.1 gave the best results for me with Pictochat under Windows, but I got feedback that it doesn't work under Linux. Binding to all possible addresses works under both Window and Linux, but seemed to be worse for Pictochat, while being better for some other games. So we need to work on this too. BSD sockets have the advantage that they allow playing on separate computers over LAN (in theory -- this hasn't been tested), but maybe we will need faster IPC methods. Multiplayer sends packets at an amazing rate (Pictochat sends every ~4ms), so the communication medium has to keep up with this.
Stay tuned for more exciting reports!
|12 comments (last by lewis28042) | Post a comment|
So there it is, melonDS 0.3
Jun 4th 2017, by StapleButter
So what's new in this version?
A bunch of bugfixes. This version has better compatibility than the previous one.
This includes graphical glitches ranging from UI elements folding on themselves to motion blur filters becoming acid trips, or your TV decoder breaking.
But also, more evil bugs. As stated in previous posts, booting your game from the firmware is no longer a gamble, it should be stable all the time now. An amusing side note on that bug, it has existed since melonDS 0.1, but in that version, the RTC returned a hardcoded time, so the bug would always behave the same (some games worked, others not). melonDS 0.2 started using the system time, which is what introduced the randomness.
The 3D renderer got one-upped too. Now it runs on a separate thread, which gives a pretty nice speed boost on multicore CPUs. This is optional, so if it causes issues or slows things down, you can disable it and the renderer will work mostly like before.
When I had to implement the less interesting aspects of this (controlling the 3D renderer thread), I procrastinated and implemented 3D features instead. Like fog or edge marking. You can see them demonstrated in the screenshots above.
Then I went back and finished the threading. I'm not a big fan of threaded code, but it seems to be completely stable.
However, resetting or loading a new game is still not completely stable, it has a chance of freezing. Oh and the UI still sucks. I plan to finally get at the UI shit for 0.4, and I want to ditch wxWidgets, so I don't really feel like pouring a lot of time into the current UI.
There are more things that went into this release, and I'll let you find out!
Also, here's the melonDS Patreon for those who feel generous.
|3 comments (last by greenDarkness) | Post a comment|
May 24th 2017, by StapleButter
In the previous post, I said I wanted to run the 3D renderer on a separate thread. Well, we're going to see how all that works in detail.
Ever since 3D rendering was added into melonDS, it's been one of the bottlenecks whenever games use it. On the other hand, 2D rendering, while not being very well optimized, doesn't make for a big performance hit.
2D rendering isn't very expensive or difficult though -- the 2D renderers are oldschool tile engines, essentially drawing raster graphics onscreen at the specified coordinates, optionally with a bunch of fancy effects, but nothing too complex. In comparison, the 3D renderer is a full-fledged 3D GPU. It basically turns a bunch of polygons defined in 3D space, into a 2D representation that is then passed to the main 2D renderer and composited mostly like a regular 2D layer.
Transformations by various matrices, culling, clipping, viewport transform, rasterization with perspective-correct interpolation... it's a bunch of work.
The approach originally taken was to render a whole 3D frame upon scanline 215. I'm not sure whether rendering should start upon scanline 214 or 215 (GBAtek says it starts "48 scanlines in advance"), but melonDS starts at 215.
Which basically meant that the emulator had to wait until the whole 3D frame was rendered before doing anything else.
However, in emulation, you can't just throw everything on separate threads. Considering a component, whether you can put it on a separate thread depends on how tightly it is synchronized to other components.
This excellent article from byuu explains well how all this works.
In the case of our DS 3D renderer, threading is feasible because tight synchronization isn't required. Once the renderer starts rendering, you can't alter its state, the polygon and vertex lists are double-buffered and all the other important registers are latched. The only thing you can do is change VRAM mapping, but I have yet to come across a game that pulls stunts like swapping texture banks mid-frame.
So you can put the rendering process on a separate thread, and just tell it when to start rendering, and wait until it's done before telling it to start a new frame.
That's not the approach I tried, though. Rendering starts upon scanline 215, but the 2D renderer needs 3D graphics as soon as scanline 0, which only leaves 48 scanlines worth of time (out of 263) for the 3D thread to do its job. We can do better.
The 2D renderer is scanline-based. So, upon scanline 0, it doesn't need a whole 3D frame, but only the first scanline of it. And similarly for further scanlines. This gives the 3D thread 192 scanlines worth of time, which is a lot better.
It required more work though, as the 3D renderer had to be modified to work per-scanline. But nothing insurmontable. After a while tracking threading bugs, this finally worked, and gave a nice speed boost. Except it caused glitches in Worms 2.
The glitches were because of the "wait for the frame to be finished before starting a new frame" bit. I first did it upon scanline 215, right before signalling the 3D thread to start rendering. Well, normally the renderer would be done by scanline 192 anyway.
But Worms 2 takes advantage of the fact that 3D rendering starts 48 scanlines in advance, which also means it finishes 48 scanlines in advance. So the game unmaps texture banks and starts loading new textures as soon as scanline 146.
Which would cause problems if the 3D thread was still running at that time. When I forced the main thread to wait for the 3D thread to finish upon scanline 144, the glitches went away.
In the end, the threaded 3D renderer gives a nice speed boost. On my laptop, Super Mario 64 DS runs fullspeed now, and Mario Kart DS is close to fullspeed too. And I haven't seen more glitches, so all should be fine.
It still needs a bunch of polish. I want to make it optional, and the 3D thread needs proper start/stop control.
Also noting that there will be no performance gain if you have a single-core CPU (the whole threading apparatus may even reduce performance).
And, of course, there are other targets to optimize too.
Love and waffles~
|4 comments (last by Nintendo Maniac 64) | Post a comment|
Slicing the melons!
May 10th 2017, by StapleButter
So here are the two main goals for melonDS 0.3: threading the 3D renderer, and starting work on wifi connectivity.
The first goal basically aims at running the 3D renderer in parallel with the rest of the emulator. On the hardware, the renderer's state can't be altered while it is rendering, so the timing doesn't have to be precise, and we can use it to our advantage. As the current 3D renderer is a bottleneck, threading it should give a nice speed boost for multi-core CPUs (which are quite the norm nowadays).
The second goal doesn't mean wifi will work, but hey, we need to start somewhere.
Well actually wifi emulation has already been upped a notch compared to 0.2. The wifi RAM and associated registers are functional, as well as most of the timers. What remains to be done is functionality for sending/receiving packets. Power management and specific multiplayer features also need proper investigation. Then we have things like the RF and BB chips, which will likely never be fully emulated since they control very low-level aspects of wifi, like "how much energy should it take to consider we're receiving data".
So where does this get us? I have tested Pictochat and NSMB multiplayer, and in both cases, the host sets up a beacon and attempts to send it regularly, which is a good sign.
(The beacon is a packet regularly sent by wifi access points to advertise their presence. Since the DS doesn't support ad-hoc communication, multiplayer games use a similar scheme to communicate, typically with the first player acting as a host and other players being clients.)
Anyway, don't get too hyped over this, there's nothing too new here. DeSmuME and NO$GBA both get atleast this far if not further.
In the meantime, I've been implementing another obscure feature of the DS: writable VCount.
Old consoles typically have a register that reflects which scanline is being drawn onscreen. There are various names it can be called (LY on the GameBoy, VCOUNT on the GBA/DS), but it's essentially the same thing.
The DS has the particularity that the VCount register can be written to. This feature is typically used to synchronize consoles during multiplay, but there are other possible uses -- ZXDS uses it to bring the refresh rate to 50Hz.
GBAtek doesn't provide a whole lot of details about this and how it works, besides a recommended VCount value range of 202-212. The I/O map listing also hints that VCount is only writable from the ARM7, but that seems to be a copy-paste error, I observed that VCount is writable from both CPUs.
So I set out and investigated it, which gave fun results, as always with the DS.
I immediately tried writing to VCount during active display (scanlines 0-191). And, surprise, it works. The GPU then continues drawing from the updated VCount, but the screens keep going from their old position as the LCD controllers can't skip forward or backward.
It's also worth noting that VCount writes are delayed until the next scanline, which has the advantage that it makes timings reliable.
The effects of VCount writes only last for one frame, things go back to normal after the end of the frame (unless you write to VCount every frame).
Implementing this feature in melonDS was also the perfect occasion to make the framerate limiter suck less. It had to be upgraded anyway in order to support adjusting the framerate cap based on how long the frame lasted. So for example, if you ran ZXDS, melonDS would run at 50 FPS instead of 60. The new framerate limiter is also more accurate, which makes for nearly perfect audio output.
(ZXDS wouldn't run currently due to the lack of DLDI support, but you get the idea)
|1 comment (by Nintendo Maniac 64) | Post a comment|
More fun fixes
Apr 27th 2017, by StapleButter
So as I said in the last post, I've been fixing the firmware boot issues. Basically, when booting from the firmware, there was a high chance that the game would crash upon boot. But the bug and its effects were random. Sometimes it would simply hang on a white screen, sometimes it would jump to an invalid address... and sometimes it would work fine.
So we're going to see how we try to fix a random bug.
First thing to do is finding a way to reproduce the bug consistenly. So, hoping my bug would be affected by the time taken before clicking the health/safety warning screen, I set up a quick hack to automatically click the screen after a fixed number of frames. That didn't cut it, so I made the RTC return a fixed time. Which finally allowed me to reproduce the bug reliably.
I landed on a variation where the ARM9 jumped to an invalid address, which made it easy to find where it was doing that. I then started backtracking, which eventually led me to find out that some data were being copied from the wrong place, but it appeared the bug had more consequences than anticipated, making the backtracking long and tedious.
So instead, I dumped the RAM at the time the game booted, and compared it with a RAM dump from before a direct boot. It appeared that a chunk of code got accidentally erased. The range that got erased was within the cart's secure area, so I checked the code that handled secure area reads, but the bug wasn't there.
So I tracked where that code region was getting erased. The bug appeared, and it was another stupid bug. It turned out to be a DMA from the ARM7 accidentally doing that.
The bug was that during cart transfers, melonDS tried to start DMA for both CPUs. The cart interface can only be enabled for one CPU at a time, and thus DMA should only be checked for that CPU.
The bug resulted from a combination of factors:
1. The ARM7 BIOS loads the cart secure area, using DMA. When it's done, it leaves the DMA enabled(!). So when the ARM9 went to load something else from the cart, it accidentally triggered the ARM7-side DMA.
2. The secure area is made of 4 blocks, which the BIOS loads in a random order. So, if the last secure area block loaded happened to be the last one, the bug wouldn't overwrite the secure area and would only write after it -- which isn't a big deal because the rest of the game binary is loaded later. So in that rare case, the game would boot fine.
Was a fun ride, though.
|3 comments (last by River) | Post a comment|
Fixing Pokémon White
Apr 26th 2017, by StapleButter
This post will get into the bugfixing process for a particular bug. Bugfixing in emulation may look like black magic, but it's not so different from general bugfixing-- it boils down to understanding the bug and why it occurs. Of course, emulation makes it harder when it involves blobs of machine code for which you have no source code, but nothing insurmontable.
Anyway, the issue with Pokémon White (and probably others) was that the game wouldn't boot unless it was launched from the firmware. Not really convenient, especially as at the time of writing this, firmware boot in melonDS is unstable.
I first suspected the RAM setup done prior to a direct boot (NDS::SetupDirectBoot). There are several variables stored in memory by the firmware, which games can use for various purposes. For example, the firmware stores the cartridge chip ID there, games then check the cartridge chip ID against the stored value, and throw an exception if it has changed (which typically means that the cart was ejected).
However, some testing revealed that there was nothing the direct boot setup was missing that could have broken Pokémon, atleast regarding the RAM.
So I had to dig deeper into the issue. It turned out that during initialization, the ARM9 got interrupted by an IRQ, but for some reason, it never returned to its work after processing the IRQ.
DS games often use multiple threads, so it isn't uncommon to switch to a different task after an IRQ. But that wasn't the case here, it wasn't even picking a thread to return to. It got stuck inside the IRQ handler.
The IRQ occured upon receiving data from the IPC FIFO. In particular, the ARM7 sent the word 0x0008000C. The ARM9-side handler was coded to panic and enter an infinite loop upon receiving this word.
More investigation of the FIFO traffic showed that the ARM9 first sent the word 0x0000004D, which is part of the initialization procedure. To which the ARM7 replied with 0x0008000C. But it appeared that the ARM7-side FIFO handler was coded to do that. For a while, that stumped me. I couldn't understand how it was supposed to work.
I then logged FIFO traffic when booting the game from the firmware, whenever it successfully booted, to see where the exchange differed. The ARM9 sent 0x0000004D, to which the ARM7 replied 0x0000004D. But it appeared that in that case, the ARM7 was using a different handler.
I checked the ARM7 code responsible for setting up the FIFO handler. It first set up the 'good' handler, then called a function, then set up the 'bad' handler. Looking at the function inbetween, I noticed that it checked register 0x04000300, which is POSTFLG, and is set to 1 by the firmware.
Hey, what if...
So I quickly checked the direct boot setup, and it wasn't initializing the POSTFLG registers. And, surprise, addressing that fixed the issue, letting Pokémon White boot directly.
Well, not bad. Next up is fixing the issues that plague firmware boot, I guess.
|2 comments (last by Johnny Doe-Eye) | Post a comment|