Main - Development - GX timings


Arisotura
Posted on 11-30-18 07:39 PM (rev. 4 of 04-02-19 01:29 PM) Link | #769
Actually GBAtek isn't too far off, as far as individual timings are concerned.


Matrix command timings

The timings for commands 0x11-0x1C depend on the current matrix mode.

Mode 0:

Command Cycles Remarks
0x10 - MTX_MODE 1
0x11 - MTX_PUSH 17
0x12 - MTX_POP 36
0x13 - MTX_STORE 17
0x14 - MTX_RESTORE 36
0x15 - MTX_IDENTITY 19
0x16 - MTX_LOAD_4x4 34
0x17 - MTX_LOAD_4x3 30
0x18 - MTX_MULT_4x4 35
0x19 - MTX_MULT_4x3 35
0x1A - MTX_MULT_3x3 35
0x1B - MTX_SCALE 35
0x1C - MTX_TRANS 35

Mode 1:

Timings are identical to mode 0.

Mode 2:

Timings are identical to mode 0, except that the MULT and TRANS commands (0x18-0x1A and 0x1C) take 30 more cycles. MTX_SCALE is unaffected.

Command Cycles Remarks
0x10 - MTX_MODE 1
0x11 - MTX_PUSH 17
0x12 - MTX_POP 36
0x13 - MTX_STORE 17
0x14 - MTX_RESTORE 36
0x15 - MTX_IDENTITY 19
0x16 - MTX_LOAD_4x4 34
0x17 - MTX_LOAD_4x3 30
0x18 - MTX_MULT_4x4 65
0x19 - MTX_MULT_4x3 65
0x1A - MTX_MULT_3x3 65
0x1B - MTX_SCALE 35
0x1C - MTX_TRANS 65

Mode 3:

This mode has completely different timings, probably because the texture matrices are smaller internally, or because it doesn't have to update the clip matrix, or both. The latter would explain the huge timing difference for command 0x15.

Command Cycles Remarks
0x10 - MTX_MODE 1
0x11 - MTX_PUSH 17
0x12 - MTX_POP 18
0x13 - MTX_STORE 17
0x14 - MTX_RESTORE 18
0x15 - MTX_IDENTITY 1
0x16 - MTX_LOAD_4x4 26
0x17 - MTX_LOAD_4x3 19
0x18 - MTX_MULT_4x4 33
0x19 - MTX_MULT_4x3 33
0x1A - MTX_MULT_3x3 33
0x1B - MTX_SCALE 33
0x1C - MTX_TRANS 33
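
For emulator use, the four matrix tables above can be collapsed into one lookup. This is only a sketch: the cycle counts are transcribed from the tables, but the names (MODE0_CYCLES, matrix_command_cycles and so on) are made up for illustration and don't come from any existing emulator.

```python
# Cycle counts transcribed from the tables above. Mode 1 shares mode 0's table.
MODE0_CYCLES = {
    0x10: 1, 0x11: 17, 0x12: 36, 0x13: 17, 0x14: 36, 0x15: 19, 0x16: 34,
    0x17: 30, 0x18: 35, 0x19: 35, 0x1A: 35, 0x1B: 35, 0x1C: 35,
}

# Mode 2: same as mode 0, but MULT (0x18-0x1A) and TRANS (0x1C) cost 30 more.
MODE2_CYCLES = {**MODE0_CYCLES, 0x18: 65, 0x19: 65, 0x1A: 65, 0x1C: 65}

# Mode 3 (texture matrix) has its own table.
MODE3_CYCLES = {
    0x10: 1, 0x11: 17, 0x12: 18, 0x13: 17, 0x14: 18, 0x15: 1, 0x16: 26,
    0x17: 19, 0x18: 33, 0x19: 33, 0x1A: 33, 0x1B: 33, 0x1C: 33,
}

CYCLES_BY_MODE = (MODE0_CYCLES, MODE0_CYCLES, MODE2_CYCLES, MODE3_CYCLES)


def matrix_command_cycles(command: int, matrix_mode: int) -> int:
    """Return the cycle count of a matrix command (0x10-0x1C) in a given mode."""
    if not 0 <= matrix_mode <= 3:
        raise ValueError("matrix mode must be 0..3")
    return CYCLES_BY_MODE[matrix_mode][command]
```

For example, matrix_command_cycles(0x18, 2) gives 65, and matrix_command_cycles(0x15, 3) gives 1.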


Other commands

Command Cycles Remarks
0x20 - COLOR 1
0x21 - NORMAL 9-12 / 2-5 9/9/10/11/12 for 0/1/2/3/4 lights enabled. also, see: normal parallel execution
0x22 - TEXCOORD 1
0x23 - VTX_16 5 / 7 / 9 see: vertex parallel execution. one extra cycle because one more parameter.
0x24 - VTX_10 4 / 6 / 8 see: vertex parallel execution
0x25 - VTX_XY 4 / 6 / 8 see: vertex parallel execution
0x26 - VTX_XZ 4 / 6 / 8 see: vertex parallel execution
0x27 - VTX_YZ 4 / 6 / 8 see: vertex parallel execution
0x28 - VTX_DIFF 4 / 6 / 8 see: vertex parallel execution
0x29 - POLYGON_ATTR 1
0x2A - TEXIMAGE_PARAM 1
0x2B - PLTT_BASE 1
0x30 - DIF_AMB 4
0x31 - SPE_EMI 4
0x32 - LIGHT_VECTOR 6 + pipeline stall
0x33 - LIGHT_COLOR 2
0x34 - SHININESS 32 (could not be measured accurately)
0x40 - BEGIN_VTXS 1 + pipeline stall
0x41 - END_VTXS 1
0x50 - SWAP_BUFFERS 392 wait till VBlank + 325 cycles (measured: 319/325/331)
0x60 - VIEWPORT 1
0x70 - BOX_TEST 257 + pipeline stall
0x71 - POS_TEST 7
0x72 - VEC_TEST 5

All other commands (nop/invalid) take one cycle.

Vertex parallel execution

Vertex commands are able to execute in parallel with most other commands.

Timings are expressed from the moment the vertex command starts. VTX_16 is preceded by one cycle because it takes two parameters, and starts upon the second cycle.

Commands 0x20, 0x30, 0x31, 0x72 can run 6 cycles after a vertex command.

Commands 0x29, 0x2A, 0x2B, 0x33, 0x34, 0x41, 0x60, 0x71 run 8 cycles after a vertex command (they cannot run in parallel).

Commands 0x32, 0x40, 0x70 stall the pipeline (see below for what this implies).

All other commands are able to run 4 cycles after a vertex command.

Further commands also abide by these rules, at least until the end of the vertex command. For example, with vertex/texcoord/color: texcoord runs 4 cycles after the vertex, and color is delayed by one cycle (it starts 6 cycles after the vertex).
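
The rules in this section can be sketched as a small scheduler. The command groups are taken from the text above; the function names and the simple model (each follow-up command starts once the previous one has ended and its own minimum delay after the vertex has passed) are assumptions made for illustration. Stalling commands are left out since they follow the pipeline stall rules instead.

```python
RUN_AFTER_6 = frozenset({0x20, 0x30, 0x31, 0x72})
RUN_AFTER_8 = frozenset({0x29, 0x2A, 0x2B, 0x33, 0x34, 0x41, 0x60, 0x71})
STALLING = frozenset({0x32, 0x40, 0x70})


def min_delay_after_vertex(command):
    """Earliest start, in cycles after the vertex starts; None if it stalls."""
    if command in STALLING:
        return None
    if command in RUN_AFTER_6:
        return 6
    if command in RUN_AFTER_8:
        return 8
    return 4


def start_offsets(followups):
    """Start cycle of each (command, cycles) pair issued after a vertex at cycle 0.

    Assumed model: a command waits for the previous follow-up command to end
    and for its own minimum delay after the vertex.
    """
    cursor = 0
    starts = []
    for command, cycles in followups:
        delay = min_delay_after_vertex(command)
        if delay is None:
            raise ValueError(f"command {command:#04x} stalls the pipeline")
        start = max(cursor, delay)
        starts.append(start)
        cursor = start + cycles
    return starts
```

With the vertex/texcoord/color example, start_offsets([(0x22, 1), (0x20, 1)]) puts texcoord at cycle 4 and color at cycle 6, which matches the one-cycle delay described above.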

Normal parallel execution

Normals are able to run in parallel with vertices coming right after.

The vertex can run 2/2/3/4/5 cycles after the normal starts, for 0/1/2/3/4 lights enabled respectively.

Under these circumstances, further commands don't get delayed until the normal has finished. (maybe some commands do! I haven't tested them all)

This explains why "texcoord/normal/vertex" runs faster than "normal/texcoord/vertex".
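
A rough way to see the difference between the two orderings, as a sketch only. The per-light numbers are from the post; the assumption (not verified here) is that in normal/texcoord/vertex the texcoord has to wait for the normal to finish completely, since only a vertex coming right after the normal gets to overlap with it. The function name is made up for illustration.

```python
NORMAL_CYCLES = (9, 9, 10, 11, 12)            # full duration, 0..4 lights
VERTEX_DELAY_AFTER_NORMAL = (2, 2, 3, 4, 5)   # vertex right after a normal
TEXCOORD_CYCLES = 1


def vertex_start(order, lights_enabled):
    """Cycle at which the vertex starts, counted from the first command."""
    if not 0 <= lights_enabled <= 4:
        raise ValueError("lights_enabled must be 0..4")
    if order == "texcoord/normal/vertex":
        # texcoord runs first, then the vertex overlaps with the normal
        return TEXCOORD_CYCLES + VERTEX_DELAY_AFTER_NORMAL[lights_enabled]
    if order == "normal/texcoord/vertex":
        # ASSUMPTION: texcoord waits for the whole normal, vertex follows it
        return NORMAL_CYCLES[lights_enabled] + TEXCOORD_CYCLES
    raise ValueError("unknown order")
```

Under that assumption, with no lights enabled the vertex starts at cycle 3 in the first ordering and at cycle 10 in the second.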

Polygon pipeline

Each vertex which completes a polygon places restrictions on when further vertices can run.

The process lasts 27 cycles for a triangle and 36 cycles for a quad. This duration is divided into 9-cycle slots in which vertices have to fit. The first slot is obviously occupied by the vertex that is executing (and building a polygon). Exceptions for strips: for triangle strips, all 3 slots are occupied; for quad strips, the first 2 slots are occupied.

EXCEPT: the process lasts only one slot if the polygon is rejected by culling or clipping.

When a vertex starts within one slot, the slot is occupied, and the next vertex is delayed until the next slot.

Vertices running outside of the polygon-building process are free of any restrictions, and can run 4 cycles after the start of a previous vertex.
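
One possible reading of the slot rules, as a sketch. The 9-cycle slot length, the slot counts and the strip exceptions are from the text above; everything else is an interpretation: a delayed vertex is assumed to start exactly at the beginning of the next free slot (or at the end of the process if none is left), and the names are hypothetical.

```python
SLOT_CYCLES = 9


def polygon_slots(kind, rejected=False):
    """Return the slot occupancy list right after a polygon-completing vertex."""
    if rejected:
        return [True]                      # process only lasts one slot
    if kind == "triangle":
        return [True, False, False]        # 27 cycles
    if kind == "triangle_strip":
        return [True, True, True]          # all 3 slots occupied
    if kind == "quad":
        return [True, False, False, False] # 36 cycles
    if kind == "quad_strip":
        return [True, True, False, False]  # first 2 slots occupied
    raise ValueError("unknown polygon kind")


def place_vertex(slots, requested):
    """Return the start cycle of a vertex that wants to start at `requested`
    (cycles since the polygon-building process began), marking its slot."""
    if requested >= len(slots) * SLOT_CYCLES:
        return requested                   # outside the process: no restriction
    slot = requested // SLOT_CYCLES
    while slot < len(slots) and slots[slot]:
        slot += 1                          # ASSUMPTION: wait for next slot start
        requested = slot * SLOT_CYCLES
    if slot < len(slots):
        slots[slot] = True
    return requested
```

For a plain triangle, a vertex requested 4 cycles in lands at cycle 9, the next one (requested at 13) lands at 18, and the one after that (requested at 22) has to wait until the process ends at 27.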

Pipeline stalls

Commands 0x32, 0x40 and 0x70 stall the pipeline. That is, if they happen during the polygon-building process described above, they are delayed until the end of the process.

Commands 0x32 and 0x70 get an extra delay when a pipeline stall happens: 8 and 10 cycles respectively.

If 0x32 happens outside of the polygon-building process, it can run 6 cycles after a vertex.
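
The stall rule can be sketched the same way. The extra delays (8 cycles for 0x32, 10 for 0x70) are from the post; modelling them as added on top of the end of the polygon-building process is an assumption, and the function name is made up.

```python
# Extra delay applied when a stall actually happens (0x40 has none listed).
STALL_EXTRA_DELAY = {0x32: 8, 0x40: 0, 0x70: 10}


def stalled_start(command, requested, process_end):
    """Return the start cycle of a stalling command.

    `requested` is when the command would otherwise start and `process_end`
    is when the current polygon-building process finishes (same time base).
    """
    if command not in STALL_EXTRA_DELAY:
        raise ValueError("command does not stall the pipeline")
    if requested >= process_end:
        return requested                   # no polygon being built: no stall
    # ASSUMPTION: the extra delay is counted from the end of the process
    return process_end + STALL_EXTRA_DELAY[command]
```

For example, 0x32 requested at cycle 10 while a triangle is being built (process ends at 27) would start at cycle 35 under this model, while 0x40 in the same situation would start at 27.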

____________________
Kuribo64


