Bugfix0ring streak
So many things to do and so few coders. Poor melonDS company :P

Regardless, a bugfixing streak started happening. So, while I'm busy brainstorming about the current issue (which I'll talk about), have a post with juicy technical details about the process. It's oh so fun, you'll see.



First bug to go down is this weird 3D glitch that shows up in Mario Kart DS and probably others. The issue post shows what it's about: random glitch lines showing up in 3D graphics.

First attempts are some tests to determine the nature of the glitchy pixels. Disabling polygon edges, disabling translucent polygons, disabling antialiasing, whatever, you name it.

Eventually, we find out that those pixels are edge pixels.

Then, I suspect a depth-test-related issue. The stencil test can be excluded: shadow polygons are always translucent, and if you've read the previous post about the GPU innards, you know that translucent pixels don't set edge flags.

Enter depth buffer debugging. AKA Mario Kart LSD edition.

The purpose of this isn't to have an epic trip racing crapo AI players in a trippy setting, but to get a visual reading of the depth buffer. Not very readable to the naked eye, but the 24-bit Z values are mapped to 24-bit RGB; this is a gross hack that bypasses the whole rendering pipeline (which is 18-bit).

You can see a glitched line in the middle of this frame, and use any good image editing software to read the raw values of the pixels. We find out that the Z values on the glitch line are greater than those of the surrounding pixels, which means it should not be punching through the surrounding polygons (those polygons should be in front of it).

What the fuck?

Attempting to log these bogus depth test cases, we find out that interpolation is somehow generating Z values that are out of range, of course in a way that manages to slip past Depth Buffer Viewer 5000000. Sometimes they're so out-of-bounds that they end up negative, which causes the glitch lines.

(should probably not be using signed ints there, as Z values are always positive, anyway. but then that'd have hidden the glitch, probably)

Tracking those, we find that some polygons with long, nearly-horizontal edges cause the rasterizer to accidentally draw outside of the polygon bounds. In turn, the perspective-correct interpolation factor for these pixels is also out of bounds, which, in some cases, screws up interpolation of vertex attributes, making things overflow and shit themselves.

Talk about a butterfly effect.

(and remind me to talk about that 'perspective-correct interpolation factor' thing someday, but for now, that's how the DS GPU does things. no W reciprocals.)
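For the curious, here's a minimal sketch of what such a factor looks like in its general no-reciprocal form (the fixed-point details here are illustrative, and the real hardware piles its own shortcuts and quirks on top; types like s32/s64 are the codebase's usual shorthands):

// perspective-correct interpolation factor without W reciprocals:
// the textbook 1/W formula with numerator and denominator both
// multiplied by w0*w1. x runs from 0 to xmax across the span.
s32 InterpFactor(s32 x, s32 xmax, s32 w0, s32 w1)
{
   s64 num = (s64)x * w0;
   s64 den = (s64)(xmax - x) * w1 + num;
   if (den == 0) return 0;
   return (s32)((num << 12) / den); // 12-bit fraction, illustrative
}

Note how an x outside of [0, xmax] (which is exactly what the rasterizer bug produced) sends the factor itself out of range.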

But, finally, the bug is down.



Next one on the hunt is this one: enemies invisible in Bionicle Heroes. SilasLaspada usefully noted that the console keeps complaining about 3D-related matrix stack overflows/underflows.

What's this about? I mentioned it in the GPU post: the GPU has a set of matrix stacks that are basically a hardware implementation of glPushMatrix() & co. You get the standard push/pop operations, but also store/restore operations which allow accessing arbitrary slots in the stack.

There are four matrix stacks: projection, position, vector, and texture. The projection matrix stack is limited to one slot. I'm pretty sure the texture matrix stack is too, but this one is hazy. Finally, the position and vector matrix stacks have 32 slots. These two are interlinked: the position matrix is used to transform models, and the vector matrix is used for lighting calculations. The idea is to avoid having to normalize normals and light directions after transforming them, by instead supplying a vector matrix which is (normally) an unscaled version of the position matrix. For this purpose, you get a matrix mode in which all matrix manipulation commands apply to both the position and vector matrices, except, of course, the scale command.

Anyway, it's not too uncommon for games to accidentally overflow/underflow a matrix stack by pushing or popping too much, as a result of a bug in the game code. I saw it happen in NSMB's intro when the castle gets struck by lightning, for example, without any visible consequence. So I dismissed such cases as benign game bugs.

Until, well, this one bug. Missing shit onscreen, constant complaints about overflow/underflow, too fishy to be a coincidence.

A bit of logging shows that, at some point, the game proceeds to push the position matrix 36 times, without ever popping. And it seems to do a lot of stupid shit generally, like pretending the projection matrix stack has two slots, etc... Terrible game code? Emulator bug? One way to find out, I guess.

So how does this work? What happens if you overflow or underflow a matrix stack?

GXSTAT has a flag signalling that, so I guess you... raise that flag, and cancel the operation?

Right?

Hahaha.

Half of that is true, because of course. This is the DS we're dealing with.

It raises the over/underflow flag, and... it fucking goes on and performs the operation anyway, using only the relevant bits of the stack pointer. For example, if the stack pointer is 31 and you push the matrix, it's stored in slot 31, pointer is incremented, error flag is raised (it starts raising at 31). Now if you push again, the pointer is 32, whose lower 5 bits are 0, so your matrix is stored in slot 0. Yes.

For the position/vector stacks, the stack pointer is 6-bit, so if you push or pop enough, it will eventually wrap around and stop raising the over/underflow flag.

For the projection and texture stacks, abusive pushing and popping will keep accessing the same slot, since there's only one.
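In code form, the push behavior boils down to something like this (a minimal sketch of the observed behavior, not the actual melonDS code):

struct Matrix { s32 m[16]; }; // 4x4 fixed-point matrix

// position/vector stack: 32 slots, but a 6-bit stack pointer
void StackPush(Matrix stack[32], u32& ptr, const Matrix& m, u32& gxstat)
{
   if ((ptr & 0x3F) >= 31)
      gxstat |= (1<<15);   // over/underflow flag (it starts raising at 31)

   stack[ptr & 0x1F] = m;  // only the relevant bits index the stack
   ptr = (ptr + 1) & 0x3F; // 6-bit pointer: wraps from 63 back to 0
}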

Anyway, revising melonDS according to all this seemed to fix the bug, so I guess this is shoddy programming on the game's part. Another bug bites the dust.



Next one is FMVs flickering in Who Wants To Be A Millionaire.

In particular, the bottom screen is partly flickering. Quick test: this only happens when using the threaded 3D renderer, which means the game is changing VRAM mappings while the renderer is running. As melonDS has safeguards to ensure the threaded renderer doesn't run outside of the real hardware's 3D rendering period, that means the game is malfunctioning.

Basically, it's using the 3D renderer to draw FMV to both screens. Which is a bit silly during the intro as the bottom screen is a fixed image. I haven't seen other FMVs in that game though, so... yeah.

So, we log shit, as usual. We find out that, regularly, the game unmaps texture VRAM, updates the contents via DMA, then maps it back to the GPU. This normally starts upon VBlank, so the texture memory is updated before the GPU starts rendering. Oh but sometimes it gets delayed by something and starts at scanline 243, which is way too late (3D rendering starts at scanline 214).

The non-threaded renderer wouldn't care: it runs 'instantly' upon scanline 214, so it would only be lagging one frame behind, which nobody would notice. The threaded renderer, however, cares.

We find out that the VBlank handler is being held back because something keeps IRQs disabled. This 'something' turns out to be the FMV decoder. It's running in ITCM and keeps IRQs disabled so it can complete ASAP, which makes sense. However, as far as melonDS is concerned, the decoder is taking too much time to do its thing and ends up delaying the VBlank handler.

There is no doubt that this is a big fat timing issue. We have a few of those already, namely RaymanDS which does weird things given bad timings. But also, less severely, 3D graphics sporadically glitching or disappearing for one frame, FMVs playing slightly too slow...

Y'know what this means: time to renovate the timings in melonDS to make them suck less. We can't get perfect without a severe performance loss, given the complexity of the DS architecture (two CPUs, shared memory, ARM9 caches...), but we can get close. Not that perfection would matter much: timing on the real thing tends to be rather nondeterministic. With the ARM9, there are many little things factoring in: two-opcodes-at-once THUMB prefetch, parallel code and data accesses under certain circumstances, sometimes skipping internal cycles... and also the fact that the ARM9 runs at 66MHz while the system clock is 33MHz, so the ARM9 may have to add one extra cycle to align to the system clock when using the bus. The ARM9 is on the borderline where CPUs start getting too complex for accurate emulation of their timings.

Anyway, we're not there yet, but brainstorming the renovation.

For example, complete emulation of the ARM9 caches will not be possible without a performance hit, but I guess we can emulate the MPU and thus properly determine which regions are cached instead of hardcoding it.
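Something along these lines, say (a rough sketch of the idea; names and details are illustrative):

// the ARM9's MPU defines 8 regions; on an access, the highest-numbered
// matching region wins, and its attributes determine (among other
// things) whether the access is cacheable
struct MPURegion
{
   bool Enabled;
   u32 Base, Mask; // base address and size mask (power-of-two sizes)
   bool Cacheable;
};

bool IsRegionCached(u32 addr, const MPURegion regions[8])
{
   for (int i = 7; i >= 0; i--)
   {
      const MPURegion& r = regions[i];
      if (r.Enabled && ((addr & r.Mask) == r.Base))
         return r.Cacheable;
   }
   return false; // no matching region: on hardware, that's a fault
}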

We'll see how that goes.
The DS GPU and its fun quirks
Oh hey, another 'technical shito' post. Kind of a preamble to what will be the flagship feature of 0.8.

(would be nice to add actual tags/categories to this blog, btw. heh)



Anyway, I want to talk about how the DS GPU works, how it's different from modern GPUs, and why I don't think that using Vulkan over OpenGL would be of any benefit.

I don't know Vulkan very well, so don't quote me on that, but from what I get, Vulkan stands out by working on a lower level than OpenGL, letting you manage the GPU memory and similar things. This may be good for emulating more modern consoles, where sometimes proprietary graphics APIs are used that allow levels of control that aren't possible with OpenGL.

For example, the blargSNES hardware renderer -- one of the tricks it pulls is that during some of the operations, the same depth/stencil buffer is used with different color buffers. This isn't possible with OpenGL.

Also, there's less cruft between the application and the GPU, meaning better performance, provided you're doing things right. While OpenGL drivers are full of optimizations for common use cases and even for specific games, with Vulkan it's all up to the application to be, you know, well programmed.

So basically, with more power comes more responsibility.



I am no 3D API expert, so, back to what we know well: the DS GPU.

There are already a few posts about specific parts of it (the fancypants quads, the viewport shito, a fun rasterizer quirk, the wonderful antialiasing implementation), but here we're going to cover the whole thing, with all the juicy details. Or at least all those that we know of. Heh.

The GPU in itself is a fairly old and obsolete piece of hardware. It is limited to 2048 polygons and/or 6144 vertices per frame. The resolution is 256x192. Even if you multiplied that by 4, performance would not be a concern. The DS can output a maximum of 122880 polygons per second under optimal conditions, which is puny compared to modern GPUs.

Now, we get into the details of the GPU operation. It looks fairly standard on the surface, but deep down the operation is different from modern GPUs, which can make it difficult to emulate some features properly.

The GPU is split in two parts: the geometry engine and the rendering engine. The geometry engine processes incoming vertices, builds polygons and transforms them so they can be passed to the rendering engine, which (you guess) draws them to the screen.



The geometry engine

Fairly standard geometry pipeline.

A detail worth noting is that all the arithmetic is done using fixed-point integers, as the DS doesn't support floating-point numbers.

The geometry engine is emulated entirely in software (GPU3D.cpp), so it's not too relevant to what we use to render graphics, but I'm going to detail it anyway.

1. Transform and lighting. Incoming vertices and texture coordinates are transformed using sets of 4x4 matrices. Lighting is optionally applied to vertex colors. Pretty standard stuff there, the only nonstandard part is how texture coordinates work (1.0 = one texel on the DS). We can also note the whole matrix stack system, which is more or less a hardware implementation of glPushMatrix() et al.

2. Polygon setup. Transformed vertices are assembled into polygons, which can be triangles, quads, triangle strips or quad strips. Quads are natively handled and not converted to triangles, which is a bit of a problem as modern GPUs only support triangles. However, it seems that someone came up with a solution, which I will have to test out.

3. Culling. Polygons can be eliminated based on whether they're facing the screen and which culling mode is selected. Pretty standard too. I have yet to determine how this works for quads, though.

4. Clipping. Polygons that are outside of the view volume are eliminated. Polygons that are partly outside are clipped; this step doesn't create new polygons but can add vertices to the existing ones. Basically, each of the 6 clipping planes can end up adding one vertex to the polygon, which means we can end up with as many as 10 vertices. The part about the rendering engine will explain how this is dealt with.

5. Viewport transform. X/Y coordinates are transformed to screen coordinates. Z coordinates are transformed to fit within the depth buffer's 24-bit range.

An interesting bit is how W coordinates are handled: those are 'normalized' to fit within a 16-bit range. For this, each polygon's W coordinates are considered, and if any is greater than 0xFFFF, they are all shifted right by 4 until they all fit within 16 bits. Conversely, if they're all less than 0x1000, they're shifted left until they reach that range, presumably to get better precision for interpolation.
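In rough code form (a sketch of the behavior as described; the exact shift amounts and details may differ):

// normalize a polygon's W coordinates to a 16-bit range
// returns the shift applied: positive = right, negative = left
int NormalizeW(u32* w, int nverts)
{
   // if any W doesn't fit within 16 bits, shift them all right
   // by 4 until they all do
   int rshift = 0;
   for (int i = 0; i < nverts; i++)
      while ((w[i] >> rshift) > 0xFFFF)
         rshift += 4;

   if (rshift > 0)
   {
      for (int i = 0; i < nverts; i++) w[i] >>= rshift;
      return rshift;
   }

   // conversely, if they're all below 0x1000, shift left until
   // one of them reaches that range (capped, in case all W are zero)
   int lshift = 0;
   while (lshift < 16)
   {
      bool allSmall = true;
      for (int i = 0; i < nverts; i++)
         if ((w[i] << lshift) >= 0x1000) allSmall = false;
      if (!allSmall) break;
      lshift += 4;
   }

   for (int i = 0; i < nverts; i++) w[i] <<= lshift;
   return -lshift;
}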

6. Sorting. Polygons are sorted so that translucent polygons are drawn last. Then they are sorted by their Y coordinates (yes), which is mandatory for opaque polygons and optional for translucent ones.

This is also why it's limited to 2048 polygons: it has to store them somewhere to perform the sorting. There are two internal memory banks dedicated to storing polygons and vertices. There is even a register telling you how many polygons and vertices are stored.



The rendering engine

This is where the fun begins!

Once all the polygons are set up and sorted, the rendering engine begins its work.

First fun thing is how it fills polygons. It's nothing like modern GPUs, which seem to fill by tiles and use algorithms optimized for triangles. I don't know how all of them work (heh), but I have seen a bit of how the 3DS GPU does it, and it's tile-based.

Anyway, on the DS, it's a scanline renderer. It has to work this way so that it can render in parallel with the oldschool 2D tile engines, which operate on a per-scanline basis. There is a small buffer of 48 scanlines which can serve to even out overload on some scanlines.

The rasterizer is a scanline-based convex polygon renderer. It can handle arbitrary amounts of vertices. It can render things wrong if you give it polygons that aren't convex or that have crossed edges, for example:



A butterfly polygon. All fine and dandy.

What happens if we rotate it, though?



Oops.

What went wrong there? Let's outline the original polygon to get an idea:



The renderer is only able to fill one span per scanline. It determines the left and right edges starting at the topmost vertices, and follows those edges until it meets new vertices.

In the picture above, it starts from the topmost vertex, the top left one, and keeps filling until it meets the end of the left edge (the bottom left vertex). It is unaware that the edges are crossing.

At this point, it looks for the next vertex on the left edge. Interestingly, it knows not to pick vertices that are higher than the current one, and it also knows that the left and right edges have been swapped. Thus, it continues filling until the end of the polygon.

I'd also picture some examples of non-convex polygons, but we're drifting away here, you guess how those go.

Instead, you probably wonder how Gouraud shading and texturing can work with arbitrary amounts of vertices. There are barycentric algorithms that work for interpolating things over a triangle, but... yeah.

The DS renderer, once again, does its own thing. More fancy pics to describe it.



The polygon vertices are points 1, 2, 3 and 4. The numbers don't match what the actual winding order would be, but you get the point.

For the current scanline, the renderer determines the vertices that directly surround the edges (as said above, it starts with the topmost vertices, then walks the edges until they end). In this case, those vertices are 1 and 2 for the left edge, 3 and 4 for the right edge.

The edge slopes are used to determine the limits of the span, which are 5 and 6. The vertex attributes at these points are interpolated, based on the vertical positions within the edges (or horizontal positions for edges whose slopes are X-major).

With that, for each pixel within the span (for example, point 7), the attributes at that pixel are interpolated from the attributes previously calculated at points 5 and 6, based on the X position within the span.

The factors used here are all 50% to make things easier to deal with, but you get the point.
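In code, the two-stage scheme looks something like this (a sketch using plain linear factors and floats for readability; the real interpolation is fixed-point and perspective-correct):

// interpolate one vertex attribute (color channel, texcoord, ...)
// a1/a2: attribute at vertices 1 and 2 (left edge)
// a3/a4: attribute at vertices 3 and 4 (right edge)
// fy12/fy34: position within each edge, 0..1
// fx: position within the span, 0..1
float InterpAttr(float a1, float a2, float a3, float a4,
                 float fy12, float fy34, float fx)
{
   float a5 = a1 + (a2 - a1) * fy12; // point 5: left limit of the span
   float a6 = a3 + (a4 - a3) * fy34; // point 6: right limit of the span
   return a5 + (a6 - a5) * fx;       // point 7: pixel within the span
}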

I'm not getting into how attributes are interpolated in detail, this would be beyond the scope of this post. Although this would be interesting to write about too. It is basically perspective-correct interpolation, but there are some fun shortcuts and quirks.

Now you know how the DS fills polygons.


How about the actual fill rules now? This is a bunch of fun too!


Yeah. Well, first of all, there are different fill rules for opaque and translucent polygons. But the killer detail is that the rules are applied per pixel. Translucent polygons can have opaque pixels, and those will follow the same rules as opaque polygons. You guess how this goes -- emulating this kind of shit on a modern GPU requires separate rendering passes.

Then, various polygon attributes can affect rendering in various fun ways. In addition to the pretty standard color and depth buffers, the renderer also has an attribute buffer, which keeps track of all sorts of fun things. Namely: polygon IDs (separately for opaque and translucent pixels), whether a pixel was translucent, whether it should receive fog, whether it is from a front-facing or back-facing polygon (yes), and whether it is on the edge of a polygon. And maybe more.

Emulating this sort of thing is not going to be trivial. Your average GPU has a stencil buffer that is limited to 8 bits, which is not nearly enough to emulate all the things the attribute buffer can store. So we will need to think of clever workarounds for this.

Let's see:

* depth buffer update: it is mandatory for opaque pixels, optional for translucent ones.

* polygon IDs: polygons are assigned 6-bit IDs, that can serve a few purposes. Opaque polygon IDs are used for edge marking. Translucent polygon IDs can be used to control where those are drawn: a translucent pixel will not be drawn if its polygon ID is the same as the existing translucent polygon ID in the attribute buffer. Both polygon IDs also serve to control where shadows are drawn, in a similar fashion, so for example you can have a shadow that covers the floor but not your character.

(side note: shadows are a simple stencil buffer implementation, nothing too terrible there)

Worth noting that when drawing translucent pixels, the existing opaque polygon ID is preserved, as well as edge flags from the last opaque polygon.
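As a quick sketch of the translucent polygon ID rule above (illustrative names, not the actual melonDS code):

struct AttrPixel { u8 OpaquePolyID, TransPolyID; /* fog, edge flags, ... */ };

// decide whether a translucent pixel gets drawn, and update the
// attribute buffer accordingly
bool DrawTranslucentPixel(u8 polyID, AttrPixel& attr)
{
   // not drawn if its polygon ID matches the translucent polygon ID
   // already stored for this pixel
   if (polyID == attr.TransPolyID) return false;

   attr.TransPolyID = polyID;
   // the opaque polygon ID and the edge flags from the last opaque
   // polygon are left untouched
   return true;
}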

* fog flag: determines whether the fog pass should be applied to this pixel. How it is updated depends on whether the incoming pixel is opaque or translucent.

* front-facing flag: this one is a bastard. If you happen to remember:


(oh, what a lazy ass, she reused old screenshots for this)

Sands of Destruction, living up to its name. The screens in that game are quirk alley. Not only do they do things like tweaking their Y coordinates to influence Y-sorting, but the screen shown in that screenshot is probably the worst.

It relies on a depth test edge case: the 'less than' compare function accepts equal values when you're drawing a front-facing polygon over opaque back-facing polygon pixels. Yes. As the polygons' Z values are all zero, the screen will be missing elements if you don't emulate this quirk.

I guess the intended effect was that you'd want your object's front side to always be visible over the back side, even if it's so flat that the Z values are all the same. The DS renderer feels like a hardware version of those DOS-era renderers in that it's full of hacks and quirks like that.

Anyway, this will be tricky to emulate with a GPU. And there are other similar depth test edge cases that have to be tested and documented, too.
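In code, the Sands of Destruction edge case would look something like this (a sketch, not the actual melonDS implementation):

// 'less than' depth test, with the quirk described above
bool DepthTestLess(s32 z, s32 bufZ, bool frontFacing, bool bufOpaqueBackFacing)
{
   if (z < bufZ) return true;

   // edge case: equal Z passes when a front-facing polygon is drawn
   // over opaque back-facing polygon pixels
   if (z == bufZ && frontFacing && bufOpaqueBackFacing) return true;

   return false;
}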

* edge flags: the renderer keeps track of where polygon edges are. These serve for the final passes, namely, edge marking and antialiasing. There are also special edge filling rules for opaque polygons when antialiasing is disabled. The following graph, made from an old image I found, describes these rules:



Side note: wireframe polygons are rendered by only filling the edges! How smart.


A fun note on depth buffering, too:

On the DS, you get two possible modes for depth buffering: Z-buffering and W-buffering. Seems quite typical. Of course that's without getting into the details.

* Z-buffering uses Z coordinates, transformed to fit within the 24-bit range of the depth buffer. The Z coordinates are interpolated linearly across polygons (with a few oddities, but that shouldn't matter). Nothing too atypical there.

* W-buffering uses W coordinates as-is. Typically, modern GPUs seem to use 1/W instead, but, y'know, the DS uses fixed-point arithmetic, so using reciprocals wouldn't be a very good thing. Anyway, in this mode, the W coordinates are interpolated with perspective correction.


And, finally, the final rendering passes, in order:

* edge marking: the pixels that have edge flags set are given a color, taken from a table and determined based on the opaque polygon ID.

This will color polygon edges. Note that if a translucent polygon was drawn over an opaque polygon, the opaque polygon's edges will still be colored.

Side effect of how clipping works: the borders where polygons intersect with the screen borders will also be colored. You can notice it for example in Picross 3D screenshots.

* fog: it is applied per-pixel, based on depth values which are used to index a fog density table. You guess, it is applied where fog flags in the attribute buffer are set.

* antialiasing: it is applied to the (opaque) polygon edges. Pixel coverage values are calculated during polygon rendering, based on edge slopes. During the final pass, those pixels are blended with the pixels below, via a clever mechanism that was described in a previous post.

Antialiasing doesn't need to (and cannot) be emulated this way on a GPU, so this doesn't matter here.

Except for the fact that if edge marking and antialiasing are to be applied to the same pixels, they only get edge marking, but with 50% opacity.


I guess we have more or less described the rendering process. We didn't get into texture blending (combining vertex and texture colors), but this can be emulated with a fragment shader. Same goes for edge marking and fog, provided we can find ways around the whole attribute buffer thing.


But regardless, my point there is that OpenGL or Vulkan (or Direct3D or Glide or whatever) will not make a difference here. Our modern GPUs have more than enough power to handle the raw polygons. The issue is all the rasterization details and quirks. It's not even about pixel perfection here: just take a look at DeSmuME's issue tracker, for example, to see the issues they're facing with OpenGL rendering -- the same issues we will have to tackle somehow.


As a side note, going with OpenGL would have the advantage of allowing it to be ported to, say, the Switch (GitHub user Hydr8gon has started making a Switch port of our emulator).


Well... wish me luck.


(fun reminders of the blargSNES times, where I initially thought that hardware rendering was impossible... and you know how that went)
Quick bugfix update for Windows
If you downloaded melonDS before now and it crashes when you try to launch it, try redownloading it.

I pushed a quick bugfix update for the Windows build. Fixed the code that was responsible for loading INI/BIOS/firmware from AppData. So now, it shouldn't crash randomly anymore, and it should actually work as intended.

The bug went unnoticed because a) it strikes on a fresh melonDS install, and b) it's a heisenbug! It tried to access memory that was uninitialized and potentially freed, so whether it crashed depended on a lot of things internal to the OS. You guess how it goes.
melonDS 0.7 -- Granting popular wishes
Or at least starting to do so. There isn't a lot in this release, but hey, we have to start somewhere.

At least, The Spark is back, somewhat. So I guess we can take melonDS somewhere.


Not a lot of novelty visually speaking, so there will only be one screenshot:



Nothing changed! :D

Except, if you look closely, the bottom border of the blue platform thing.

Couldn't resist.

That was caused by a small bug regarding shadows and antialiasing, which is now fixed.


What else? Miscellaneous fixes. melonDS shouldn't crash randomly when closing it anymore. And other things.


Oh, and savestates. Which were one of the popular demands, and which are finally brought to you.

Here is the outline of our implementation:

• 8 savestate slots, which are per-game. Savestate files are stored in the same place as save files.
• 'Undo state load' will revert to the state from right before loading a savestate. For example, if you loaded it accidentally.
• In savestate settings: 'Separate savefiles' allows saving to a separate save file after loading a savestate. The separate save file is bound to that savestate slot.
• Shift-Fx to save state, Fx to load state. x = 1 to 8 for regular slots, 9 for using an arbitrary file. F12 to undo the latest state load.


Have fun!


Windows 64-bit
Linux 64-bit

melonDS Patreon
Coming soooooooon!
The savestates feature is finished! Not saying much more though, the details are a surprise ;)

Other than that, there were more bugs like the ones mentioned in the previous post that bit me, but eh, that's all over now.

There are a few issues to iron out regarding menus under GTK, though, so it'll take a little while.

But it's coming soon!
When your innocuous little code goes kaboom
So until we get something amazing to release, we're going to post some technical shito again.


As said in the previous post, I've been coding savestates. The base idea seems simple enough: just throw all the emulator's state into a file (or read it from that file if you're loading a state), make extra sure that you're not missing anything important, regenerate things that need it when loading, etc...


So I first designed the bases of the implementation:

• A savestate file is divided into several sections. Those can be ordered arbitrarily. The file format itself is very simple; there's a sketch of the idea after this list.

• To reduce complexity, we only handle loading/saving between frames.

• Some questions are still not clear, and I still need a way to avoid loading a savestate over the wrong ROM. (or I could allow it as a fun Easter egg, even if it generally results in a crash)
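For the file format, think something along these lines (a minimal sketch; the actual layout details may differ):

#include <stdio.h>

// write one savestate section: an ID, its length, then its contents;
// since each section is self-describing, they can be ordered arbitrarily
void WriteSection(FILE* f, const char magic[4], const void* data, u32 len)
{
   fwrite(magic, 4, 1, f);
   fwrite(&len, sizeof(len), 1, f);
   fwrite(data, len, 1, f);
}

// for example (hypothetical section names):
// WriteSection(f, "ARM9", &arm9state, sizeof(arm9state));
// WriteSection(f, "GPU3", &gpu3dstate, sizeof(gpu3dstate));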


Then, the gruelling work, storing all the emulator state.


Including, you guess, the event scheduler, which is an array of event structures with an entry per possible event source. Just write that to the savestate file, right?

Think again.

typedef struct
{
   void (*Func)(u32 param); // handler called when the event fires
   s32 WaitCycles;          // cycles remaining before the event fires
   u32 Param;               // parameter passed to the handler

} SchedEvent;


This is the event structure. We have a handler for when the event fires, how many cycles remain before it fires, and a parameter that is passed to the handler. The idea is that we can use different handlers for one event source, acting sort of like a state machine, which is what is done in display emulation for example (HBlank, next scanline, frame end).

The handler is a function pointer. Which means that storing it as-is in a file is a bad idea. For various reasons, there is absolutely no guarantee that the functions those pointers refer to will be at the same addresses between two runs. In fact, it even seems completely impossible, because modern-day OSes will randomize the address at which the melonDS process is loaded. Then you have things like using savestates across different versions or even different builds...


Well, good thing I thought about that before trying to run that code.


(now I also realized that there are several other instances of dumping structure arrays straight to the file, namely in the 3D pipeline, which might cause compatibility issues between builds as compilers might lay out the structures differently (padding, alignment); that will have to be dealt with too)


So, what do we do now?


As I originally wanted to keep the event system fairly flexible, an idea I had was to store relative function pointers, i.e. basing them off of main()'s address or something like that. While this would work for making savestates reusable between runs, it wouldn't fix any of the other issues. For example, run a different melonDS build and the event functions may be at different places, meaning the relative pointers wouldn't point to them anymore.


So instead, the way I went with was to store the handlers as indexes.

The idea is to keep a table of event handlers. When saving, we look up the table to convert each event entry's handler to an index. When loading, we do the conversion backwards. Et voilà, no problem. There are a couple of safety checks around it, for example, to prevent haxing via a bogus savestate file containing an event handler ID that is out of range.
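Roughly like this (a sketch; the handler names are made up for illustration):

void HBlank(u32 param);       // hypothetical event handlers
void NextScanline(u32 param);
void FrameEnd(u32 param);

// table of every function that may be used as an event handler
void (*EventFuncs[])(u32) =
{
   NULL, // ID 0 = no handler
   HBlank, NextScanline, FrameEnd,
   // ...
};

const u32 NumEventFuncs = sizeof(EventFuncs) / sizeof(EventFuncs[0]);

// saving: convert the handler pointer to a table index
u32 FuncToID(void (*func)(u32))
{
   for (u32 i = 0; i < NumEventFuncs; i++)
      if (EventFuncs[i] == func) return i;
   return 0;
}

// loading: convert the index back, rejecting out-of-range IDs
// coming from bogus savestate files
void (*IDToFunc(u32 id))(u32)
{
   if (id >= NumEventFuncs) return NULL;
   return EventFuncs[id];
}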

The main disadvantage of this method is that any function that is to be used as an event handler needs to be added to the event handler table.

But at least our savestates are portable. I guess we win.


That being said, if any of you folks has a better idea for this problem, I'd like to hear about it.
gettin' there
Apologies for not keeping this up to date lately.

Anyway, savestates are being coded, and we need your help to test them out. More about this in this reddit post.

Other than that, I tried some hax on my new green DSi. Was able to install unlaunch. I'm not yet able to run DSi homebrew, though: ndstool builds oldschool DS ROM headers, which causes unlaunch to load these ROMs in DS mode. I have yet to figure out how to make ndstool build DSi ROM headers.

On the real life side... I'm getting there. Depression is a bitch tho. And capitalism too. I would feel a lot better if that didn't exist.
the plaaaaaans: getting staaaaaarted
this is spaaaaaarta





Meet Green.

It's a big DSi. Pretty cool, eh?

Quick inspection reveals that everything is alright. It was factory-reset, tho, so it might not have Flipnote on it. But whatever, we'll find a way. Actually, it does still have Flipnote. All good!


Still no news of the other order. If you happen to remember, it was returned to sender, and the sender notified me about it and asked me what they should do. I gave them another address to ship to, and even offered to pay for the reship, but since then, silence.

I guess I'll have to give the seller a reminder. Or ask for a refund.


Regardless, I think we can get started with the interesting things... once I'm a bit more settled.
The 'timing' branch
In the meantime, I figured I'd give this a try.

Basic idea is that emulating timings with reasonable accuracy will require emulating the ARM9 caches.

I want to see if I can emulate those without completely killing performance. If I pull it off, there'll be several benefits:

* much more accurate timings, fixing games that do weird things
* homebrew developers using melonDS wouldn't fall into the typical 'forgot to flush/invalidate cache' pitfall that reveals itself when testing on hardware
* if we take it even further, we can emulate the MPU (memory protection unit), which means emulating crashes

I guess that if I manage to get it going, there will be a sorta beta release so I can get feedback about this, whether it's killing performance, etc...
Updates on the plaaaaaaans
The package was, surprise, returned to sender, because some idiot can't find a mailbox that is street-facing and not hidden or anything fancy. Hell, it's right next to the building's front gate.

Whatever.

I'm trying to see if I can have it shipped at another address which I know should work.

I ordered another one, too (that was before the seller for the first one contacted me about the package), but it won't be shipped for another week.


I guess it can't be a bad idea to get two consoles either way. Like, if one turns out to be less/not haxable, or if I need to do hardmods...