GPU divider RE


Arisotura
Posted on 04-02-19 01:41 PM (rev. 3 of 04-04-19 11:09 AM) #934
which is where we try to figure out how the GPU does divisions, because it's weird

I don't think they embedded a divider for each purpose tho?



theory

GPU has a general-purpose unsigned 32bit divider

some special measures are taken before using it, to ensure that numerator and denominator a) are positive and b) fit within 32 bits
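
as a mental model, that divider can be sketched in Python like this. `udiv32` and `U32_MAX` are names made up for the sketch, and rejecting a zero denominator is a simplification of mine: what the hardware does on divide-by-zero isn't covered here.

```python
U32_MAX = 0xFFFFFFFF

def udiv32(num, den):
    """Toy model of an unsigned 32-bit divider (hypothetical helper)."""
    # callers are expected to have made both operands non-negative
    # and small enough to fit in 32 bits before getting here
    if not 0 <= num <= U32_MAX:
        raise ValueError("numerator outside unsigned 32-bit range")
    if not 0 < den <= U32_MAX:
        raise ValueError("denominator outside unsigned 32-bit range")
    return num // den
```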



viewport transform

barring overflow cases (when W is greater than 0xFFFFFF -- it gets truncated to 24 bits)

sX = ((X + W) * sW) / (W*2)

when W is greater than 0xFFFF, (W*2) loses two bits of precision (effectively taking one bit from W).

so, first, assuming W within 0001..FFFF

X: -FFFF..FFFF (has to be between -W and W)
W: 0001..FFFF
sW: 000..1FF

X+W: 00000..1FFFF -> 17 bits
((X + W) * sW): 26 bits
(W*2): 17 bits
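
quick check of those bit counts, as a Python sketch. `screen_x` is a hypothetical name; it just evaluates the formula above with integer division and ignores the overflow cases.

```python
def screen_x(x, w, sw):
    """sX = ((X + W) * sW) / (W*2), for -W <= X <= W and W >= 1."""
    return ((x + w) * sw) // (w * 2)

W_MAX = 0xFFFF    # largest W in this range
SW_MAX = 0x1FF    # largest sW

num_max = (W_MAX + W_MAX) * SW_MAX   # X = W gives the largest X + W
den_max = W_MAX * 2

print(num_max.bit_length())          # 26
print(den_max.bit_length())          # 17
print(screen_x(0, 0x1000, 0x100))    # 128: X = 0 lands halfway across sW
```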

next, when W is greater than FFFF

X: -FFFFFF..FFFFFF
W: 010000..FFFFFF

X+W: 0000000..1FFFFFF -> 25 bits
((X + W) * sW): 34 bits
(W*2): 25 bits
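
same check for the big-W range (Python sketch, constant names made up): the worst-case numerator no longer fits in an unsigned 32-bit operand.

```python
W_MAX = 0xFFFFFF     # W after truncation to 24 bits
SW_MAX = 0x1FF
U32_MAX = 0xFFFFFFFF

num_max = (W_MAX + W_MAX) * SW_MAX   # X = W gives the largest X + W

print(num_max.bit_length())   # 34
print(num_max > U32_MAX)      # True
```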

denominator is shifted right by two, same for numerator??

TODO: how is numerator handled? presumably it has to always fit within 32bit unsigned range, which would explain the precision loss at denominator

numerator W doesn't lose precision
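
to make the shift-by-two idea testable, here is one way to model it in Python. this is only the hypothesis from the question above, not confirmed hardware behaviour: `screen_x_shifted` is a made-up name, and applying the shift to the full numerator product is my assumption.

```python
U32_MAX = 0xFFFFFFFF

def screen_x_shifted(x, w, sw):
    """Viewport transform with a guessed >>2 on both operands when W > 0xFFFF."""
    num = (x + w) * sw
    den = w * 2
    if w > 0xFFFF:
        # hypothesis only: drop two bits from both operands
        # so that the numerator fits in 32 bits
        num >>= 2
        den >>= 2
    if num > U32_MAX:
        raise OverflowError("numerator does not fit in 32 bits")
    return num // den

# worst case from the ranges above: fits after the shift, no exception
print(screen_x_shifted(0xFFFFFF, 0xFFFFFF, 0x1FF))
```

comparing the output of a model like this against values read back from hardware would be the way to confirm or kill the hypothesis.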



interpolation

((x * w0) << shift) / ((x * w0) + ((xmax-x) * w1))

x: 00..FF (in practice, never going to be greater)
xmax-x: same
wn: 0001..FFFF
shift: 8 or 9

numerator: 8+16+shift bits
denominator: 26 bits at most

numerator reaches 32 bits along X, 33 bits along Y
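
the widths can be checked with a small Python sketch. `interp_factor` is a hypothetical name; I'm reading the formula as `(xmax-x) * w1` in the denominator and assuming shift 8 goes with X and shift 9 with Y, which is what the bit counts suggest.

```python
def interp_factor(x, xmax, w0, w1, shift):
    """((x * w0) << shift) / ((x * w0) + ((xmax - x) * w1))"""
    num = (x * w0) << shift
    den = (x * w0) + ((xmax - x) * w1)
    return num // den

X_MAX = 0xFF
W_MAX = 0xFFFF

num_x = (X_MAX * W_MAX) << 8   # along X (assumed shift 8)
num_y = (X_MAX * W_MAX) << 9   # along Y (assumed shift 9)

print(num_x.bit_length())   # 32
print(num_y.bit_length())   # 33
print(interp_factor(64, 128, 0x1000, 0x1000, 8))   # 128: halfway, equal W's
```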

that would explain the interpolation quirks along Y: reducing W's to 15 bits so that the numerator fits within 32 bits
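
in numbers (Python sketch; dropping the low bit with `>> 1` is just one assumed way of getting W down to 15 bits, and shift 9 along Y is inferred from the 33-bit count):

```python
X_MAX = 0xFF
W_MAX = 0xFFFF
SHIFT_Y = 9
U32_MAX = 0xFFFFFFFF

num_full = (X_MAX * W_MAX) << SHIFT_Y             # 16-bit W
num_reduced = (X_MAX * (W_MAX >> 1)) << SHIFT_Y   # W cut to 15 bits

print(num_full.bit_length(), num_full <= U32_MAX)         # 33 False
print(num_reduced.bit_length(), num_reduced <= U32_MAX)   # 32 True
```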
