GBAtek addendum/errata


Arisotura
Posted on 04-11-17 04:16 PM (rev. 27 of 07-23-19 09:34 PM) Link | #87
GBAtek is an amazing piece of documentation, but it can be improved upon :)


This is a general pile of findings. I claim no ownership of these; they come from several individuals.



ROM transfers

* transfer time is 8 cycles for a command, 4 cycles per response word (basically 1 cycle per byte) (see ROMCTRL bit27 for cycle duration)
* plus start delay and 0x200-block delay at the start of each 0x200 block
* bit28 allows skipping incoming bytes automatically during delays
* DELAYS DO NOT APPLY WHEN THE WR BIT IS SET
* DRQ bit (bit23) is set once a response word has been transferred
* reading from 0x04100010 clears the DRQ bit, and:
** if the transfer is finished: clears the busy bit and triggers IRQ if specified
** if there are more words to transfer: begins transferring the next word
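
A rough way to put numbers on the timing rules above. This is only a sketch: the function and parameter names are made up, the start delay is assumed to apply once per transfer and the block delay once per 0x200-byte block, and a response word is taken to be 32 bits. The result is in cart bus cycles (whatever duration ROMCTRL bit27 selects).

```python
def rom_transfer_cycles(n_words, start_delay=0, block_delay=0, wr=False):
    """Estimate the cart bus cycles taken by one ROM transfer.

    n_words     -- number of 32-bit response words
    start_delay -- delay applied once at the start (assumption)
    block_delay -- delay applied at the start of each 0x200-byte block (assumption)
    wr          -- state of the WR bit; delays are skipped when it is set
    """
    BLOCK_BYTES = 0x200
    cycles = 8 + 4 * n_words  # 8 for the command, 4 per response word
    if not wr:
        n_blocks = -(-(n_words * 4) // BLOCK_BYTES)  # ceiling division
        cycles += start_delay + block_delay * n_blocks
    return cycles
```

For example, a single 0x200-byte block (0x80 words) with hypothetical delays of 10 and 5 would come out at 8 + 512 + 10 + 5 cycles.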


2D

* The main memory display FIFO is a simple circular buffer that holds 16 pixels. The video controller reads from it regardless of whether you fill it. It doesn't get 'empty' or 'full'.
* Writing to the upper halfword of 0x04000068 increments the FIFO write pointer by two (writes to the lower halfword leave it unchanged). The write pointer simply wraps to 0 when reaching the end of the buffer. It is also reset upon VBlank.
* 8-bit writes to 0x04000068 don't work well. TODO: figure out what's happening. eventually.
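
The FIFO write pointer behaviour described above can be modelled roughly like this. The class and method names are hypothetical, and where exactly each halfword lands in the buffer (lower halfword at the pointer, upper halfword at pointer+1) is a guess on my part; only the pointer rules come from the findings.

```python
class MainMemDisplayFifo:
    SIZE = 16  # the buffer holds 16 pixels

    def __init__(self):
        self.pixels = [0] * self.SIZE
        self.write_ptr = 0

    def write_low(self, value):
        # Lower halfword of 0x04000068: pointer is left unchanged.
        # Storing at write_ptr is an assumption.
        self.pixels[self.write_ptr] = value & 0xFFFF

    def write_high(self, value):
        # Upper halfword: pointer advances by two and wraps at the end.
        # Storing at write_ptr + 1 is an assumption.
        self.pixels[(self.write_ptr + 1) % self.SIZE] = value & 0xFFFF
        self.write_ptr = (self.write_ptr + 2) % self.SIZE

    def write32(self, value):
        # A 32-bit write modelled as a lower then an upper halfword write.
        self.write_low(value)
        self.write_high(value >> 16)

    def vblank(self):
        # The write pointer is reset upon VBlank.
        self.write_ptr = 0
```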

* Colors are converted early from 5-bit to 6-bit, as such: 6bit = 5bit*2
* Color special effects (brightness, blending) are applied to the 6-bit color components
* In some cases, the MSb of color values is used as LSb for the green component. TODO: find out where this applies. Confirmed to apply to the standard BG palette.
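
A small sketch of the color conversion described above. The function names are made up, and "MSb of color values" is assumed here to mean bit 15 of the 16-bit color entry.

```python
def rgb555_to_rgb666(color):
    """Expand a 15-bit color to three 6-bit components (6bit = 5bit*2)."""
    r5 = color & 0x1F
    g5 = (color >> 5) & 0x1F
    b5 = (color >> 10) & 0x1F
    return r5 * 2, g5 * 2, b5 * 2


def rgb555_to_rgb666_green_lsb(color):
    """Same conversion, but bit 15 becomes the LSb of green (assumed reading)."""
    r6, g6, b6 = rgb555_to_rgb666(color)
    return r6, g6 | ((color >> 15) & 1), b6
```

Note that plain 5bit*2 never reaches 63; only green can, and only in the cases where the extra bit applies.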

* Bitmap sprite blending follows the same rules as non-bitmap semitransparent sprites, with EVA=alpha+1 and EVB=16-EVA. Except: bitmap sprites with alpha=0 are always hidden.

* 3D layer blending follows rules similar to those of semitransparent sprites (only requires second target bits set in BLDCNT, overrides BLDCNT color effect selection and window 'enable color effect' setting where it applies).
* 3D layer blending uses 5-bit alpha values (from the 3D graphics), such that: EVA=alpha+1 and EVB=32-EVA.
* When the 3D alpha is less than 16, the final color components are incremented by one. (seems to be some hardware glitch??)
* 3D layer pixels with alpha=0 are always hidden (not rendered). They're preserved when capturing the 3D output alone, though.
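
Putting the 3D layer blending rules together, as a sketch only. The findings give EVA/EVB, the +1 quirk and the alpha=0 case; the actual blend expression (weighted sum divided by 32) and the clamp to 63 are my assumptions, and the function name is hypothetical.

```python
def blend_3d_over_2d(top, bottom, alpha):
    """Blend a 3D layer pixel over the layer below it.

    top, bottom -- (r, g, b) tuples of 6-bit components
    alpha       -- 5-bit alpha from the 3D graphics
    """
    if alpha == 0:
        return bottom  # alpha=0 pixels are always hidden
    eva = alpha + 1
    evb = 32 - eva
    out = []
    for t, b in zip(top, bottom):
        c = (t * eva + b * evb) >> 5  # assumed: weighted sum divided by 32
        if alpha < 16:
            c += 1                    # the odd +1 seen when alpha is less than 16
        out.append(min(c, 63))        # assumed: clamp to the 6-bit range
    return tuple(out)
```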

* BG mode 6 works on both GPUs. On the sub GPU, it only gets 128K of VRAM, so it will repeat the same bitmap 4 times.
* BG mode 7 renders (text-mode) BG0, BG1, and sprites. No BG2/BG3.

* large BG sizes 2-3 are the same as corresponding sizes for regular bitmap BGs (512x256, 512x512)


3D

* Shadow polygons can use textures. In that case, decal blending is applied.
* The stencil buffer can hold two scanlines. It's cleared only when the current scanline contains shadow mask polygons, before rendering a group of shadow mask polygons.
* Stencil buffer bits are set only where the shadow mask is drawn but fails the depth test.
* Visible shadow polygons (polyID>0) are only drawn where stencil buffer bits are set and where the destination pixel's polygonID is different from that of the shadow, regardless of whether that pixel was translucent.

* Drawing visible shadow polygons supposedly resets the stencil bits. TODO: check. I guess not.

* Toon highlight mode uses the following formula: (GBAtek is wrong)
v=vertex t=texture s=tooncolor=toontable[Rv]
R = ((Rt+1)*(Rv+1)-1)/64+Rs ;truncated to MAX=63
G = ((Gt+1)*(Rv+1)-1)/64+Gs ;truncated to MAX=63
B = ((Bt+1)*(Rv+1)-1)/64+Bs ;truncated to MAX=63
A = ((At+1)*(Av+1)-1)/64
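
The same formula as runnable code, in case it helps. This is a sketch: all components are assumed to be 6-bit here, the division is assumed to be an integer division, and the toon color is passed in already looked up so the table indexing is left out. The function name is made up.

```python
def toon_highlight(vertex, texture, toon):
    """Toon highlight color combine.

    vertex, texture -- (r, g, b, a) tuples, assumed 6-bit each
    toon            -- (r, g, b) toon table entry selected by the vertex red
    """
    rv, _, _, av = vertex
    rt, gt, bt, at = texture
    rs, gs, bs = toon

    def channel(ct, cs):
        # Texture channel modulated by vertex RED, plus toon color, max 63.
        return min(((ct + 1) * (rv + 1) - 1) // 64 + cs, 63)

    a = ((at + 1) * (av + 1) - 1) // 64
    return channel(rt, rs), channel(gt, gs), channel(bt, bs), a
```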

* Translucent pixels are only drawn where the destination pixel has a different polygonID OR where the destination pixel was opaque.

* for each separate polygon, W values are 'normalized', they're collectively shifted left or right by 4 until they all fit within 16 bits (if they fit within 12 bits or less, they can be shifted left to use the 16-bit range better)
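
One possible reading of the W normalization, as a sketch. It assumes the W values are non-negative integers and that the left shift is repeated in steps of 4 for as long as everything still fits in 16 bits; the function name is hypothetical.

```python
def normalize_w(ws):
    """Shift all W values of one polygon by a multiple of 4 bits so the
    largest one just fits within 16 bits."""
    size = 0
    # Find the smallest multiple of 4 bits that holds every W value.
    while size < 32 and any(w >> size for w in ws):
        size += 4
    if size > 16:
        shift = size - 16
        return [w >> shift for w in ws]
    shift = 16 - size
    return [w << shift for w in ws]
```

With this reading, values 0x12345 and 0x100 become 0x1234 and 0x10, while 0x123 and 0x45 become 0x1230 and 0x450.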

* conversion for Z values:
** Z-buffering: zbuf = (((Z * 0x4000) / W) + 0x3FFF) * 0x200 (using original W)
** W-buffering: zbuf = W (but it appears to use normalized W)

* conversion for clear depth:
** clearZ = (val * 0x200) + 0x1FF
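
The two depth conversions above as code. A sketch with made-up function names; it uses Python floor division, and how the hardware rounds or clamps for negative Z isn't covered by these findings, so only trust it for Z >= 0.

```python
def zbuf_from_z(z, w):
    """Z-buffering: depth buffer value from Z and the original (non-normalized) W."""
    return (((z * 0x4000) // w) + 0x3FFF) * 0x200


def clear_depth(val):
    """Expand the 15-bit clear depth register value to a depth buffer value."""
    return (val * 0x200) + 0x1FF
```

The largest clear depth value, 0x7FFF, maps to 0xFFFFFF, i.e. the top of a 24-bit range.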

* There are special depth-test rules for polygon borders. TODO: work it out.
** it seems to only apply to wireframe polygons
** when Z values are equal, left edges have priority over right edges, and top edges have priority over bottom edges (TODO: check wireframe vs normal)
** 'less or equal' depth test has no margin
** (dunno about other orders but they should use the regular rules. Y-sorting gets in the way)

* Cases where 'less than' depth test becomes 'less or equal':
** wireframe polygon borders as mentioned above
** apparently, polygon borders in some other cases too
** when rendering frontfacing polygon pixels over existing opaque backfacing polygon pixels

* in W-buffering mode, 'equal' depth test mode has a margin of 0xFF in either direction. That is, for example, incoming Z range of 0x100-0x2FE is considered equal to an existing Z-buffer value of 0x1FF.
* in Z-buffering mode, margins are +-0x200.
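
The 'equal' depth test margins can be sketched like this. Names are hypothetical, and treating the Z-buffering margin as inclusive on both ends is an assumption (the W-buffering one is inclusive going by the example range above).

```python
W_BUFFER_MARGIN = 0xFF
Z_BUFFER_MARGIN = 0x200


def depth_equal(incoming, existing, w_buffering):
    """'Equal' depth test: passes when the values are within the margin."""
    margin = W_BUFFER_MARGIN if w_buffering else Z_BUFFER_MARGIN
    return abs(incoming - existing) <= margin
```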

* PUSH/POP/STORE/RESTORE to the modelview matrix always apply to the vector matrix too, even in matrix mode 1.
* "NORMAL/VEC_TEST require matrix mode 2" <- wrong. They work the same regardless of the matrix mode.

* edge marking
Posted by GBAtek
Technically, when rendering a polygon, it's edges (ie. the wire-frame region) are flagged as possible-edges (but it's still rendered normally, without using the edge-color). Once when all opaque polygons (*) have been rendered, the edge color is applied to these flagged pixels, under following conditions: At least one of the four surrounding pixels (up, down, left, right) must have different polygon_id than the edge, and, the edge depth must be LESS than the depth of that surrounding pixel (ie. no edges are rendered if the depth is GREATER or EQUAL, even if the polygon_id differs). At the screen borders, edges seem to be rendered in respect to the rear-plane's polygon_id entry (see Port 4000350h).

-> polygon ID rule for screen edges confirmed
-> at screen edges, the aforementioned depth test uses CLEAR_DEPTH (when testing against a pixel that would be offscreen), even when using a clear bitmap

* antialiasing
** seems to be calculated from edge slopes
** topmost two pixels are retained, antialiasing blends them together including alpha (except color isn't blended when the pixel below has alpha=0)
** during rendering, if an incoming pixel fails the depth test with the topmost pixel, it is checked against the pixel below


DMA

* ARM9 DMA start mode 3 is similar to the GBA's 'video capture' DMA, although GBAtek doesn't make it obvious. It is triggered at the start of each scanline from 2 to 193. The enable bit is automatically cleared on scanline 194.
* TODO: find when 'main memory display' DMA starts. Probably 8 pixels (48 cycles) in advance from the actual display. -- DMA starts ~32 cycles after the start of the scanline. Actual display starts ~48 cycles after the start of the scanline.


Sound

* repeat mode 3 behaves the same as mode 1 (loops)
* TODO: check to see what can be changed while a channel is playing. Format can be changed and that's a whole fucking pile of things to check.

____________________
Kuribo64

TechnoNightz
Posted on 07-12-18 08:43 AM Link | #625
I'd actually like to see an open-source documentation (perhaps a melonDS wiki) on this. As you say GBATEK is great but yeah.

Also you did a nice job explaining things, you've always been clear ^^

PoroCYon
Posted on 12-01-19 12:59 PM (rev. 5 of 11-16-20 05:38 PM) Link | #1406
Some that need to be confirmed:

DSP

  • DSP_PSTS bits 10..12 (REP0..REP2) are active-high (as in, 1=was written by DSP), while GBAtek says they're active-low

  • DSP_PCFG bits 12..15 have an undocumented transfer mode (7: ARM9 bus loopback): transfers to/from the ARM9 bus, cf. DSP-internal DMA transfer mode 7. This mode requires some additional setup: you first need to set the following DSP-internal DMA registers to the following values (using transfer mode 1):

    [0x81BE] = 0 // select channel 0
    [0x81C6] = 0xABCD // destination address, high 16-bit
    [0x80E2] = 0 | (0<<1) | (1<<4) // configure AHBM (DSP->ARM9 DMA) // example value works for 16-bit transfers (see GBATek/Teakra docs for details)
    [0x80E4] = (1<<9) | (1<<8) // resp. mandatory bit, direction (0=read, 1=write) (see GBATek/Teakra docs for details)
    [0x80E6] = 1 // enable channel 0

    Then perform a transfer as follows (the example writes 0x1337 to 0xABCDEF98):

    DSP_PADR = 0xEF98 // destination address, low 16-bit
    DSP_PDATA = 0x1337 // write the value here for a write; for a read, read from this register instead
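
    For reference, the way the 32-bit ARM9 address in the example is split between the two registers can be sketched as below. The helper name is made up; it only does the arithmetic and touches no hardware.

```python
def split_arm9_address(addr):
    """Split a 32-bit ARM9 bus address into (high, low) 16-bit halves.

    In the example above, the high half goes to [0x81C6] and the low half
    to DSP_PADR.
    """
    return (addr >> 16) & 0xFFFF, addr & 0xFFFF
```

    For the example address 0xABCDEF98 this gives 0xABCD and 0xEF98.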

Chagall
Posted on 09-15-20 06:38 PM Link | #2336
Rumble pak detection needs bit 6 set in addition to bit 1 reset (GBAtek only mentions bit 1).
Found empirically: https://github.com/Arisotura/melonDS/pull/719#discussion_r474976353

Generic aka RSDuck
Posted on 11-15-20 12:02 PM (rev. 6 of 11-16-20 05:45 PM) Link | #2756
There is a power saving mode for the wifi in which it stops receiving and transmitting data. There are two registers relevant to it: W_POWERSTATE (0x480803C) and W_POWERFORCE (0x4808040). An existing wifi program I tried continued to work fine, except that nothing was being received anymore. W_CONTENTFREE seemed to still count down. My guess is that only the antenna hardware is affected.

Power saving mode can be initiated by writing 0x8001 to W_POWERFORCE.

To leave power saving mode, W_POWERFORCE needs to be set to 1 and bit 1 of W_POWERSTATE needs to be set (the order doesn't matter; this is also part of the init routine). Alternatively 0x8001 can be written to W_POWERFORCE. Gbatek discourages doing this, saying it leads to other weird behaviour, though the official wifi firmware seems to happily do this.

Not implementing this feature can lead to freezes in the Pokemon games. See here: http://melonds.kuribo64.net/board/thread.php?pid=2314#2314

There's probably still a lot more to unpack. What is disabled and what not is currently an educated guess and the consequences of writing 0x8001 to W_POWERFORCE mentioned in gbatek are still to be explored.

EDIT for some additional findings:
  • W_US_COUNT seems to continue to count in power saving mode
  • Gbatek's statement "Note: That queue stuff seems to work only if W_POWER_US=0 and W_MODE_RST=1." seems to be wrong. All power saving stuff documented here continues to work even when W_POWER_US=1
EDIT 2:
  • Regarding transmission: a request to send data seems to simply do nothing, so W_TXBUSY is never flagged and W_TX_SEQNO is not incremented. A request made in power saving mode is discarded, i.e. it is not sent once power saving mode is off.


____________________
Take me to your heart / never let me go!

"clearly you need to mow more lawns and buy a better pc" - Hydr8gon

PoroCYon
Posted on 11-16-20 05:30 PM (rev. 7 of 11-16-20 06:40 PM) Link | #2762

IR cartridges

IR cartridges seem to work as follows, but I'd like to have someone else to verify this (seems to work with HGSS, BW and B2W2 carts, idk about others):

Everything automatically happens at 115200 baud, 8n1.

There seem to be three main SPI commands that are sent to what normally would be the savegame SPI bus, there's a fourth command to perform actual savegame operations. All transfers happen at 1 MHz (serial AUXSPI mode) unless indicated otherwise.

The cartridge needs to be powered on, but nothing more besides this. No header reading or KEY1/KEY2 init, and so on. (I rebooted the cart with SCFG_MC and started doing SPI commands immediately afterwards, seems to work fine).

The commands:
  • 0x00: savegame escape byte: as long as chip select is held, the bytes that follow will be treated like a regular savegame transfer. These can also happen at any clock speed, but the 0x00 byte itself needs to be transferred at 1 MHz. (This bit was already known.)
  • 0x01: receive data from IR: one command byte (0x01) is written, after which data bytes are read by the DS. The first byte read indicates the amount of bytes that will follow. It doesn't seem to be able to receive more than 255 bytes afaics. Bytes written to perform the reads are unused as far as I know, but usually set to 0 (HGSS does this, at least).
    • When there are zero bytes to read, you still have to deselect the SPI chip, or the next transfer will fail. Disable the 'chip select hold' bit in AUXSPICNT and send a zero byte.
  • 0x02: send data over IR: this one has no length prefix, chip select is used to determine when the transfer ends, as usual.
  • 0x08: not too sure about this, but probably a status thing. A command byte (0x08) is sent, and a status byte(?) is received from the cart. HGSS seems to always send two of these one after another, carts seem to return 0x00 on the first one and 0xaa for the second, unless other IR devices are sending actively, then both bytes are 0xaa.
HGSS, while trying to connect to a Pokéwalker, seems to first do a cmd 0x01, which returns 0 bytes, then 0x08 twice, after which it repeatedly issues other 0x01 commands until the return data of one of these indicates a Pokéwalker presence (the Pokéwalker sends out a fixed byte value as 'beacon' thing, the game will receive a 1-byte packet containing that beacon). 0x08 is never used again after being used twice in the beginning as far as I can see (but I might be wrong).
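
To make the 0x01 (receive) flow more concrete, here is a sketch of how a read might be structured. `spi_transfer(byte_out, hold_cs)` is a hypothetical helper that sends one byte over AUXSPI at 1 MHz with the 'chip select hold' bit set or cleared and returns the byte read back; it isn't a real API. Dropping chip select on the last data byte is also my assumption.

```python
CMD_IR_RECEIVE = 0x01


def ir_receive(spi_transfer):
    """Read one packet from the IR chip.

    spi_transfer(byte_out, hold_cs) -> byte_in is a hypothetical AUXSPI helper.
    """
    spi_transfer(CMD_IR_RECEIVE, True)   # command byte
    length = spi_transfer(0x00, True)    # first byte read = number of bytes that follow
    if length == 0:
        # Nothing to read: still deselect the chip by sending a zero byte
        # with 'chip select hold' disabled.
        spi_transfer(0x00, False)
        return b""
    data = bytearray()
    for i in range(length):
        last = i == length - 1
        # Dummy zero bytes are written to clock the data in; chip select
        # is released on the final byte (assumption).
        data.append(spi_transfer(0x00, not last))
    return bytes(data)
```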

Allegedly, the chip in carts responsible for IR is a H8/38606R (connected to the SPI bus on one side, and to some IR leds or so on the other), just like the one in the Pokéwalker. Getting its ROM dumped sounds like a fun challenge.

____________________
TiTAN Forever

