Updated romlist.bin
The database was rebuilt, this time from a mishmash of the Advanscene and Wood databases. It should be a lot better.

In the near future, the database will be changed to identify ROMs by game code rather than CRC, so it will also work with hacked ROMs and whatnot. But for melonDS 0.7.2 or 0.7.1, you can get this updated database.

-> Download here <-

Just overwrite your existing romlist.bin with this and you're good to go!

Remember, if your game is still failing to save, you may need to delete the savefile.
World of shit
I've been looking at another 'timing renovation victim' issue: dual-screen 3D shitting itself in Colour Cross.

The game draws all its animated background tiles/etc separately, giving each its own transforms. It sends unpacked GXFIFO commands and doesn't use DMA. All in all, not very efficient, but probably not a bad way to do it given what they're doing.

Anyway, the game expects to take a certain time to draw each screen (for the bottom screen, it spans two frames). If the code runs too slow or too fast, it shits itself. Given how geometry is sent, GX timings don't affect this.

It doesn't have to be precise, but there's a window within which this has to fall, so we can't make it absurdly fast or absurdly slow and hope to fix it.


On the other hand, we have the Spellbound issue which I mentioned in the previous post. There, the code has to run slowly enough; I can't see any other way out. Both DMA and GX timings are correct there. I could always try dumping the display list and running it on hardware under the same conditions to see how it goes, but I'm sure the issue is with code timings.


So I guess this is it. No amount of hodgepodging timings will get us out of this, as I feared. And I really want to avoid 'solutions' like game-specific hacks or 'toggle this timing setting and see if it fixes your game'.

So, we have to emulate the ARM9 instruction cache.

On the plus side, since code fetches are mostly sequential, this shouldn't be a big performance penalty. The cache can be checked only upon branches and on cache line boundaries (on the DS, that is every 32 bytes). The ARM9 caches have 4-way associativity, which means a measly 4 cache lines to check before we know whether the current address is cached. So I believe we can afford to emulate it without killing performance.
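
To give an idea of what such a lookup could look like, here is a minimal C++ sketch. It is not melonDS code: the ICacheModel name, the 8 KB size (hence 64 sets), the round-robin replacement and the cycle counts passed in by the caller are all assumptions made for illustration. Only the 32-byte lines and the 4 ways come from the description above.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr uint32_t kLineSize = 32;        // bytes per cache line
constexpr uint32_t kWays = 4;             // 4-way set associative
constexpr uint32_t kSets = 64;            // assumption: 8 KB / (32 bytes * 4 ways)
constexpr uint32_t kInvalid = 0xFFFFFFFF; // never a valid line number

// Hypothetical instruction cache model, for illustration only.
struct ICacheModel
{
    std::array<std::array<uint32_t, kWays>, kSets> tags; // line number stored per way
    std::array<uint32_t, kSets> nextWay;                 // round-robin victim (assumption)
    uint32_t currentLine = kInvalid;                     // line of the previous fetch

    ICacheModel()
    {
        for (auto& set : tags) set.fill(kInvalid);
        nextWay.fill(0);
    }

    // Returns true on a hit; on a miss, fills a line and returns false.
    bool Lookup(uint32_t addr)
    {
        uint32_t line = addr / kLineSize;
        uint32_t set = line % kSets;
        for (uint32_t way = 0; way < kWays; way++)
            if (tags[set][way] == line) return true;
        tags[set][nextWay[set]] = line;
        nextWay[set] = (nextWay[set] + 1) % kWays;
        return false;
    }

    // Returns the cost of one code fetch, in cycles.
    uint32_t Fetch(uint32_t addr, uint32_t hitCycles, uint32_t missCycles)
    {
        assert(missCycles >= hitCycles);
        uint32_t line = addr / kLineSize;
        // Still inside the line we fetched from last time: no lookup needed.
        if (line == currentLine) return hitCycles;
        // A branch or a line boundary took us elsewhere: check the 4 ways.
        currentLine = line;
        return Lookup(addr) ? hitCycles : missCycles;
    }
};
```

With this shape, sequential fetches only pay for one integer compare, and the 4-way search happens once per 32 bytes of straight-line code or once per branch to a different line.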

The data cache would be the one killing performance; because data accesses are far less predictable, the cache would have to be checked upon every memory access. For now, it appears we can get away with not emulating it, so let's pray.


For less performant devices, we could always have a 'performance' profile that uses the old timing model (2 cycles per code fetch in cached memory, always). It's less exact and more likely to break something else, but it seems to run most things reasonably well, so it's a good candidate for a performance/accuracy compromise.


I still have to look into Spellbound, because of course making code fetches take 2c doesn't fix it. It needs to be at least 5c to work correctly, but 5c breaks other things, no surprise there.

Quick testing confirms what I suspected there: it only ran correctly in older melonDS versions because the GXFIFO DMA was too slow, taking 3c per word. On hardware, transferring from main RAM to IO registers (GXFIFO or whatever else) takes 2c per word, which is also the case in melonDS after the timing renovation. So, well, we can't take it back.

(the difference looks puny, but when transferring hundreds of units, it quickly adds up)


In brighter news, well, 0.7.3 will come out soon-ish, once I come up with something for the ARM9 code cache.

There are also a few improvements making it more user-friendly. For example, I made savemem relocation when loading savestates disabled by default, as I figure that having it enabled by default can be confusing.

We're also trying to fix the input config dialog crashes under Linux. I wasn't able to reproduce them because I wasn't looking in the right place: they happen when mapping joystick buttons. Taking that into consideration, I could reproduce the crash and fix it. It was quite silly: we use an SDL timer callback to sense joystick input, and were updating the UI directly from that callback, forgetting that GTK is not thread-safe and that the UI shouldn't be manipulated outside of the UI thread, even after I ran into similar issues with the main window. Using uiQueueMain() lets us get around this and fixes the crash.
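
For readers unfamiliar with the pattern, here is a rough C++ sketch of what "queue it to the UI thread" means. This is a stand-in and not libui's implementation: MainThreadQueue, Post() and Drain() are hypothetical names. The idea is that the timer callback only posts a task, and the UI thread runs it later from its own loop.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// Stand-in for a toolkit's "run this on the UI thread" facility.
class MainThreadQueue
{
public:
    // Safe to call from any thread, e.g. from a timer callback.
    void Post(std::function<void()> task)
    {
        assert(task); // an empty std::function would throw when called
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push(std::move(task));
    }

    // Called only by the UI thread, once per loop iteration.
    // Returns the number of tasks that were run.
    int Drain()
    {
        std::queue<std::function<void()>> pending;
        {
            // Hold the lock only while taking the tasks, not while running them.
            std::lock_guard<std::mutex> lock(mutex_);
            std::swap(pending, tasks_);
        }
        int count = 0;
        while (!pending.empty())
        {
            pending.front()();
            pending.pop();
            count++;
        }
        return count;
    }

private:
    std::mutex mutex_;
    std::queue<std::function<void()>> tasks_;
};
```

The callback would then do something like `queue.Post([&] { /* update the label */ });` instead of touching the widget itself, so all widget access stays on one thread.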

I'm still not 100% sure there though, there was a crash under FreeBSD that appeared to be different.

There's also a little improvement to joystick input: it can detect if the joystick is disconnected/reconnected while melonDS is running.

It's still limited to one joystick though, I'm not quite sure how to go about handling multiple joysticks and whether we can trust the OS not to go and change their indexes.

I also want to work on network support, adding the interface selector and built-in DHCP/NAT, but maybe for 0.7.4. I already have enough on my plate here, and that's without getting into the 0.8 hardware renderer.
Short break
Much needed to avoid burnout these days.

Especially when it seems that all you're throwing at the wall is pointless. That sure deals a blow to motivation.


I picked one of the available Advanscene databases for savemem type detection, not knowing what all the available ones mean and what the differences are. Well, this one seems to be a Swiss cheese, full of errors and missing games.

If you have any ideas for a better database (like, what the fuck are the differences between all the Advanscene ones), I'd love to hear about it. One of the suggestions is to use AKAIO's database, so I might look into that.

Also, a possibility to consider: detecting ROMs by game code rather than by CRC.
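
As a rough illustration of that idea: the game code is a 4-character field at offset 0x0C of the NDS ROM header, so reading it is cheap and doesn't depend on the rest of the ROM's contents. The C++ sketch below is not melonDS code, and ReadGameCode is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Reads the 4-character game code from an NDS ROM header.
// Returns an empty string if the data is too short to hold one.
std::string ReadGameCode(const std::vector<uint8_t>& rom)
{
    constexpr std::size_t kGameCodeOffset = 0x0C;
    constexpr std::size_t kGameCodeLength = 4;
    if (rom.size() < kGameCodeOffset + kGameCodeLength) return "";
    return std::string(rom.begin() + kGameCodeOffset,
                       rom.begin() + kGameCodeOffset + kGameCodeLength);
}
```

Unlike a CRC over the whole file, this key stays the same when a hack or translation patch changes data elsewhere in the ROM, as long as the header field is left alone.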


We're getting more timing renovation victims, and this time they're goddamn real. They all seem to be the same thing, missing/corrupted geometry. I don't know about all the games, but I looked into one of them (Spellbound), and it looks, well, not very good. The developers coded their own software version of GXFIFO DMA (the base idea is to transfer a fixed-size chunk of display list data every time the GXFIFO is less than half-full), which works by relying on the GXFIFO IRQ set to trigger when it's less than half-full.

However, the way this is coded expects that by the time a DMA transfer is finished, the GXFIFO will already be less than half-full, so the next chunk will be started immediately. Of course, there's a point in melonDS where that doesn't happen, causing the game to think the transfer has completed, and send a SWAP_BUFFERS command in the middle of the display list, which works as well as you'd expect.

Something's up with our timings. I hope we don't need to emulate the ARM9 caches.


And the goddamn input config dialog under Linux. I have no fucking idea why that would crash. I can't reproduce the crash. It's working just fine under my setup (Ubuntu 16.04). I can't even tell if we're just hitting some obscure GTK bug.


argl.


and merry Christmas to you reader.
Hotfix release
It appeared that, due to an oversight, melonDS 0.7.2 would crash when loading a ROM if the physical microphone init failed.

This has since been fixed, and a hotfix has been pushed. If you downloaded melonDS 0.7.2 before this post, just redownload it to get the fixed version.
melonDS 0.7.2
The melonDS company celebrates Christmas! Albeit one week early. But this new release of melonDS is a neat little pile of presents.

You have already gotten a glimpse of it, but let's go over all the changes since 0.7.1, because it's been a fast week.


So first of all, the issues we have seen pop up in NSMB, Pokémon, or Etrian Odyssey after the timing renovation have been fixed. Nicely, the new timings uncovered some stealth GX bugs that would surely have bitten us another day under other circumstances.

So the claim that was made for 0.7.1 ("now your games run better than ever") is finally more than a phony advert :P


Second big thing is, as you probably guessed by now, microphone support. If your machine has a microphone connected and if you are using SDL 2.0.5 or more recent, you can blow or blare bullshit into it and it just works! If that is not the case, you can also opt to feed a WAV file or white noise as microphone input.

Note that this feature is still experimental. Quality of microphone input may not be optimal, especially when using a physical microphone. WAV input works better.

WAV and white noise modes send input when pressing a microphone hotkey (default is the key right next to right Shift, '?' on QWERTY keyboards). WAV mode can take any reasonably small file, encoded as 8-bit or 16-bit PCM, signed or unsigned, any number of channels (it will read the first channel).
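
To make the format handling concrete, here is a small C++ sketch of how such input could be normalized to signed 16-bit mono by keeping only the first channel. It is an illustration under assumptions and not the actual melonDS loader: FirstChannelToS16 is a hypothetical name, the WAV header is assumed to be parsed already, and 16-bit samples are assumed little-endian.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Extracts the first channel of interleaved PCM data as signed 16-bit samples.
// 'bits' is 8 or 16; 'channels' is the number of interleaved channels.
std::vector<int16_t> FirstChannelToS16(const std::vector<uint8_t>& data,
                                       int bits, bool isSigned, int channels)
{
    assert(bits == 8 || bits == 16);
    assert(channels >= 1);
    std::size_t bytesPerSample = static_cast<std::size_t>(bits / 8);
    std::size_t frameSize = bytesPerSample * static_cast<std::size_t>(channels);

    std::vector<int16_t> out;
    for (std::size_t pos = 0; pos + frameSize <= data.size(); pos += frameSize)
    {
        uint16_t raw;
        if (bits == 8)
            raw = static_cast<uint16_t>(data[pos] << 8); // widen 8-bit to 16-bit
        else
            raw = static_cast<uint16_t>(data[pos] | (data[pos + 1] << 8));

        // Unsigned PCM is centred on the midpoint: flip the top bit to recentre on zero.
        if (!isSigned) raw = static_cast<uint16_t>(raw ^ 0x8000);

        // Reinterpret the 16-bit pattern as a two's complement value.
        int32_t value = raw;
        if (value >= 0x8000) value -= 0x10000;
        out.push_back(static_cast<int16_t>(value));
    }
    return out;
}
```

Only the first sample of each frame is read; the remaining channels are skipped by stepping a whole frame at a time.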

All this can be configured in the new audio settings dialog, where you can also set the volume for audio output.


Which brings us to the new hotkey system. For now, aside from the aforementioned mic hotkey, there is only one other: 'Close/open lid', which simulates closing/opening the DS. Default key is Backspace.

Oh and the hotkey system is an extension of the regular input system, which means you can also assign joystick buttons to these hotkeys.


Speaking of the input system, Windows users may have noticed that the input config dialog was abysmally slow, taking several seconds to open and generally feeling quite laggy. With a quick little fix in libui, that is no more, and the dialog now feels a lot more normal.

Some attempt was made at fixing possible crashes with that dialog under Linux, but those crashes may need more investigation. In my current setup (Ubuntu 16.04), I am unable to reproduce them, or break the input config dialog in any way. I don't know which are caused by melonDS and which are obscure GTK bugs. While one of the stack traces that were reported showed something I could easily work around, the other pointed at some obscure bug where some function internal to GTK is getting a NULL value and crashing.


On a whim, I added support for nocash-style debug print, which enables homebrew to print to the emulator's console. Also, Windows users don't need to get a debug build to get console output -- running the release version of melonDS from cmd will dump the console output there.


We also have some welcome contributions from some fine Github comrades:

* FPS limiter toggle, courtesy abcdjdj
* flatpak manifest, by cpba
* Linux libpcap library names added to the libpcap loader by dogtopus
* and finally Aqueminivan renovating the readme, it looks cool now!


And, last but not least, a whole bunch of misc bugs were fixed:

* black screens in Puzzler World 2
* screeching garbage audio in the house in American Girl - Kit Mystery Challenge!
* blending fail in Pokémon Mystery Dungeon - Explorers of Sky
* lack of background music in Club Penguin: Herbert's Revenge
* a few wrong entries in romlist.bin were corrected
* config dialogs could be opened multiple times


Merry Christmas!


melonDS 0.7.2, Windows 64-bit
melonDS 0.7.2, Linux 64-bit

melonDS Patreon if you're feeling generous
I am fucking relieved
You might already know that 0.7.1 sports all new timings and that it also causes a few issues that were not present in 0.7. Ranging from NSMB's cannons putting on some weight to Pokémon characters getting magical growth to, less amusingly, dual-screen 3D shitting itself in Etrian Odyssey.

The NSMB and Etrian Odyssey issues also had the fun aspect that they didn't occur in the USA ROMs, for some reason.

Well, that's a thing of the past now.

I knew the issues appeared after the timing renovation, but that was about it. I figured that instead of blindly hodgepodging timings until the issues disappeared, and potentially getting into some whack-a-mole game, I would take the time to understand the issues.


So the first one was the NSMB cannon.

Technically, the cannon body is a billboard (3D sprite that always faces the camera). The 'big' version is just the same but with the cannon body scaled too big.

So I investigate how the cannon body is rendered. For that, we can use NO$GBA's debugger version, at least until melonDS gets similar tools :P

Conveniently, the cannon body is the first polygon in the display list. Before it gets rendered, there are some transforms set up so it will face the camera. In this case, the second-to-last scale transform sometimes got wrong values that caused the billboard to appear too big, hence the bug.

So I look into the game's code to find out where and how that scale transform is calculated. Note that debugging NSMB is made way easier by the awesome folks at the NSMB Hacking Domain, who have a complete IDA database of the game.

The game feeds some transform commands to the GX, waits until it's done, then reads the clip matrix (projection and position matrices product), and derives the scale transform from it. In our case, the bug came from how melonDS handled the GXSTAT busy flag: there was a thin chance that, after submitting some commands, the game could manage to read GXSTAT before the busy flag was actually set. Thus it would decide that the GX was already done, and proceed to read the clip matrix while the transforms were executing, and get some of the values wrong.
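
Here is a toy C++ model of that kind of race, just to show its shape. It is not how melonDS implements the GX, and the actual fix may look different: ToyGX and its members are hypothetical. BusyLate() only reports busy once a command has actually started executing, while Busy() also counts commands that were submitted but not picked up yet.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>

// Toy geometry engine: commands are queued, then executed as time advances.
struct ToyGX
{
    std::deque<int> fifo; // cost in cycles of each queued command
    int cyclesLeft = 0;   // remaining cycles of the command being executed

    void Submit(int cost)
    {
        assert(cost > 0);
        fifo.push_back(cost);
    }

    // Advances the engine by the given number of cycles.
    void Run(int cycles)
    {
        while (cycles > 0)
        {
            if (cyclesLeft == 0)
            {
                if (fifo.empty()) return; // nothing left to do
                cyclesLeft = fifo.front();
                fifo.pop_front();
            }
            int step = std::min(cycles, cyclesLeft);
            cyclesLeft -= step;
            cycles -= step;
        }
    }

    // Racy variant: idle until Run() has picked up a command.
    bool BusyLate() const { return cyclesLeft > 0; }

    // Safer variant: pending work counts as busy right after Submit().
    bool Busy() const { return cyclesLeft > 0 || !fifo.empty(); }
};
```

A game that submits commands and polls right away would see BusyLate() return false and read the clip matrix too early, which is the kind of window described above.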

The issue was unrelated to CPU/memory timings, it just happened to be a simple bug that slow enough timings kept hidden, until now. So, it was easily fixed.

The Pokémon issue is likely the same issue, and is likely fixed too. We just need to check it.


The Etrian Odyssey issue, however, is different.

Quick investigation showed that the game was almost never swapping its screens, despite attempting to do dual-screen 3D.

The DS can only render 3D to one screen at a time, so how do you do dual-screen 3D? With a bit of trickery. For example, you would render a 3D frame to the top screen, and at the same time, capture that frame to VRAM. Then, during the next frame, you swap the screens, so that you're now rendering 3D on the bottom screen, and on the top screen you render the bitmap you previously captured. With this typical method, you get 3D on both screens, but you're limited to 30 FPS.
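
The alternation can be written down as a tiny C++ sketch. The names (Source, FramePlan, PlanFrame, EffectiveFps) are made up for illustration, and which screen goes first is an arbitrary choice here.

```cpp
#include <cassert>
#include <cstdint>

enum class Source { Live3D, CapturedBitmap };

// What each screen displays during one frame.
struct FramePlan
{
    Source top;
    Source bottom;
};

// Even frames: 3D goes to the top screen, the bottom shows the last capture.
// Odd frames: the screens are swapped, so the roles are reversed.
// In both cases the live 3D output is captured for use in the next frame.
constexpr FramePlan PlanFrame(uint32_t frame)
{
    return (frame % 2 == 0)
        ? FramePlan{Source::Live3D, Source::CapturedBitmap}
        : FramePlan{Source::CapturedBitmap, Source::Live3D};
}

// Each screen only gets a fresh 3D image every other frame.
constexpr uint32_t EffectiveFps(uint32_t refreshRate)
{
    return refreshRate / 2;
}
```

With a refresh rate of about 60 Hz, EffectiveFps gives the 30 FPS limit mentioned above. It also shows why a missed swap is so visible: the same screen keeps getting the live 3D while the other is stuck on a stale capture.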

So obviously, if the game is not swapping the screens, the rendering is only going to be a trainwreck.

So, we look into the code responsible for swapping the screens and making that whole thing work. It runs upon VBlank, and basically does the following:

1. wait ~400 cycles, by running a subs/bcs loop 200 times
2. check the GXSTAT busy flag; if that is set, give up
3. swap screens, do more setup

The base idea there was likely that if a frame took too long to render, the game would avoid swapping the screens at the wrong time and causing flickering.

In our case, however, the SWAP_BUFFERS command took a bit too long and accidentally triggered that.

It was set to take 392 cycles as per GBAtek's data, but measurements on hardware showed that that duration is more like 325 cycles. Revising our code accordingly, Etrian Odyssey finally rendered normally.


So, as the title says, I am fucking relieved there. I thought we were faced with some problem that could only be solved by emulating the ARM9 caches, which, urgl. It turned out to not be the case. And melonDS 0.7.2 is totally gonna rock!
Sneak peek
New feature of melonDS 0.7.2, among a nice little pile of other features and fixes:



I'll let you guess ;)
melonDS on the Nintendo Switch
What's this?

In case you didn't know, melonDS has a port for the Switch. And it's made by me!

It's actually been around for a while now, but just today I put out an update with a major UI overhaul. As in, it actually has a UI now.

For the longest time the port was just text-based, with few options and it ran like crap. I just did it for fun, and because melonDS was surprisingly easy to port. A couple weeks ago we gained the ability to overclock the Switch, and this let melonDS actually reach playable framerates for a few games!

Normally overclocking is scary, but not so much on the Switch. The Switch rocks an Nvidia Tegra X1, the same processor used in the Nvidia Shield TV. Maybe to save battery life, or maybe to prevent overheating, the Switch runs at about half the clock speed of the Shield by default. Since melonDS isn't GPU intensive at all, the cooling of the Switch is plenty good enough to handle the higher clock speed on its own for emulation.

So now that that's a thing, melonDS on Switch seems to be more than just a novelty. It seems like it could actually become a usable NDS emulator! Unfortunately the UI is still shit. So I built a UI with OpenGL that resembles the UI of the Switch itself.

It's not much, but it gets the job done. On top of that I added all of the screen layouts available in desktop melonDS for some extra fun. Aside from a few minor things, melonDS for Switch is now at feature-parity with core melonDS, and it doesn't look terrible, either.

So what's in store for this project?

Well, since the UI is more or less finished, the plan from here on is to keep the Switch version up-to-date with core melonDS. Once the hardware renderer comes along in 0.8, it'll likely be able to run most, if not all, games at full speed. I'd also love to help StapleButter out with the core project, although my expertise in writing actual emulation code is extremely limited.

I guess I'm able to post on the blog now, so you can expect more updates from me at some point in the future.

If you're interested in my project, you can find it on GitHub here.
melonDS 0.7.1 -- here it is!
As title says.

We're not showing screenshots because they wouldn't be a good medium for conveying the number of changes in this release.


The biggest change here is that the core timings were entirely renovated to try being closer to the hardware.

First of all, after several days of gruelling testing and guesswork, we were finally able to understand the GX timings, and emulate them properly for the first time. But, as we weren't gonna stop there, we also renovated the timings for DMA and memory accesses, so that both of them are closer to their hardware counterparts. DMA and ARM7 should be pretty close to perfection now, ARM9 less so but it's still more realistic.

We have also been fixing the emulator's main loop, so that the ARM9, ARM7 and system clocks shouldn't desync anymore.

All of this, with a few added optimizations, fixes a whole bunch of issues, from things flickering to audio crackling to games outright going haywire (hi RaymanDS). Your games are now running better than ever!

... or not. That's also the point of the 0.7.1 release, I want to hear about any issues caused by the timing renovation, so we can get them fixed for the epic 0.8.

We already have one such issue: all of this is causing sprite flickering in Pokémon Platinum. Quick attempts at fixing this went nowhere, so we will have to investigate this properly.


There's also a number of misc fixes. For example, the 3D glitches that showed up in Mario Kart DS were fixed, but there's already a lengthy post about this.

There's a small fix to 2D windows, nothing really noteworthy, just fixes a game that was setting up backwards windows.

The input system no longer requires a dpad to be mapped to directional keys for joystick axes to work.

The code that looks for local files (melonDS.ini, BIOS, firmware...) was modified to explicitly check the directory melonDS is in, if that is not the same as the working directory. So in these cases it will no longer fail to find its files. Also, if melonDS.ini is absent, the preferred directory for creating it is the directory melonDS is in.
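
A minimal C++ sketch of that lookup logic, under assumptions: CandidatePaths and PreferredConfigPath are hypothetical names, the search order (working directory first) is a guess, and paths are joined with '/' for simplicity. A caller would try each candidate in turn and use the first one that opens.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Lists the places to look for a local file such as melonDS.ini.
// The executable's directory is only added when it differs from the working directory.
std::vector<std::string> CandidatePaths(const std::string& name,
                                        const std::string& workingDir,
                                        const std::string& exeDir)
{
    assert(!name.empty());
    std::vector<std::string> paths;
    paths.push_back(workingDir + "/" + name);
    if (exeDir != workingDir)
        paths.push_back(exeDir + "/" + name);
    return paths;
}

// Where to create the config file when none of the candidates exist.
std::string PreferredConfigPath(const std::string& name, const std::string& exeDir)
{
    return exeDir + "/" + name;
}
```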

And, finally, we got rid of the old savemem type autodetect code. Considering its complexity and the amount of failures, it was a trainwreck. So instead, now, melonDS will pull that information from the provided ROM database (romlist.bin).

Edit- If your game is still failing to save: you might have an old failed save file lying around, try deleting it. When a save file is present, melonDS determines the savemem type from the file size, so if a file is present with a wrong size, you will have to delete it.
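
To show why a wrong-sized file gets in the way, here is a C++ sketch of size-based detection. The table lists save sizes commonly seen on DS cartridges; it is an assumption made for illustration and not necessarily the exact list melonDS uses, and the SaveType and SaveTypeFromSize names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

enum class SaveType { Unknown, EEPROM_4Kbit, EEPROM_64Kbit, EEPROM_512Kbit, Flash };

// Guesses the savemem type from the size of an existing save file, in bytes.
constexpr SaveType SaveTypeFromSize(uint32_t bytes)
{
    switch (bytes)
    {
    case 512:         return SaveType::EEPROM_4Kbit;
    case 8 * 1024:    return SaveType::EEPROM_64Kbit;
    case 64 * 1024:   return SaveType::EEPROM_512Kbit;
    case 256 * 1024:  // 2 Mbit
    case 512 * 1024:  // 4 Mbit
    case 1024 * 1024: // 8 Mbit
    case 8192 * 1024: // 64 Mbit
        return SaveType::Flash;
    default:
        return SaveType::Unknown;
    }
}
```

A stale file left over from a failed detection would have the size of the wrong type, so the size check keeps picking that wrong type until the file is deleted.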


Enjoy!


Windows 64-bit
Linux 64-bit

melonDS Patreon
Update
I had a few bugs in my implementation. After fixing those, the timings I got were closer to what it should have been. Too fast, but way too fast.

Anyway, I had issues with my test app too. One of the memory buffers it accesses was in main RAM instead of DTCM; fixing that oversight gave me more realistic timings. In reality, the Millionaire issue was what I first thought-- on hardware, the FMV decoder completes in 146 scanlines, and I was being too slow.

So, obviously, fixing the timings made it fast enough that the glitch was entirely gone. But, aside from that, we were back to square one: running too fast and the issues it causes everywhere. Rayman DS is a good test for that, and it was shitting itself big time, so... yeah.

If we raise the cached memory timings to 2 cycles, as some sort of average between cache hits and cache misses, we get closer to the real thing. Still a bit too fast, but this time Rayman DS is running normally.


However...


The new timing logic, coupled with PU emulation, is a pretty noticeable speed hit.

We can keep the PU logic for a potential future 'homebrew developer' build, that would also have several features to warn against typical DS pitfalls. For example, some form of cache emulation and/or warning about things like DMA without flushing/invalidating caches, 8-bit accesses to VRAM, etc...

If there is demand for such a build, that is.

As far as regular gamers are concerned, we can go for a faster compromise that would run most of the shit (really most of it) well. It wouldn't run things like GBA emulators that make extensive use of the PU, but... eh.

All the commercial games around use the same PU settings. The same mostly goes for libnds homebrew: its settings are a little different (for example, putting the DTCM at 0x0B000000 instead of the typical 0x027E0000), but nothing terrible there. All the differences we can observe are essentially in where the DTCM goes.

Since the DTCM is already handled separately (in CP15.cpp), we can handle its timings separately, and use a coarse table for the rest of the memory regions, like we did before. All while retaining enough of the new timing model that melonDS's timing characteristics would be much closer to the real thing.
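
As a sketch of what "DTCM separately, coarse table for the rest" could look like in C++: the DTCMWindow type, the table and the function name are hypothetical, and the cycle counts are placeholders apart from the 2 cycles for main RAM discussed above. The DTCM size of 16 KB is the DS's; its base is whatever the game programmed.

```cpp
#include <cassert>
#include <cstdint>

// Where the DTCM currently is, as configured by the game.
struct DTCMWindow
{
    uint32_t base;
    uint32_t size;
};

// Hypothetical coarse table: one entry per 16 MB region (address bits 24..27).
// Values are placeholders for illustration, except region 0x02 (main RAM).
constexpr uint8_t kRegionCycles[16] = {
    1, 1, 2, 4, 4, 4, 4, 4,
    8, 8, 8, 8, 8, 8, 8, 4,
};

// Returns the cost of one data access, in cycles.
constexpr uint32_t DataAccessCycles(uint32_t addr, DTCMWindow dtcm)
{
    // Unsigned subtraction wraps around when addr < base, so one compare covers both bounds.
    return (addr - dtcm.base < dtcm.size)
        ? 1u // TCM access (placeholder cost)
        : kRegionCycles[(addr >> 24) & 0xF];
}
```

For example, `DataAccessCycles(0x027E0010, {0x027E0000, 0x4000})` hits the DTCM window, while the same address with the libnds-style base of 0x0B000000 falls through to the main RAM entry. Moving the DTCM only changes the window and the table never has to be rebuilt.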

As explained before, it's near impossible to perfectly emulate ARM9 timings due to the sheer complexity of the architecture and the way it's implemented in the DS, so if we can get 'close enough', might as well do it without wasting too much performance over it.