Main - Development - GBAtek addendum/errata
| kuratius |
Newcomer Normal user Level: 2 Posts: 1/1 EXP: 17 Next: 29 Since: 04-24-25 Last post: 322 days ago Last view: 247 days ago |
Posted by PoroCYon
Some additions to this:
The MCU getting reset might be due to the homebrew driver you used as a base sending a reset command every frame.
The 1ADC/2ADC labels on some row speed settings in GBAtek refer to the low power mode in the readmode setting; 1ADC is low power mode.
The pixel output clock can be calculated as
pixel_clock = 16.76 MHz * M / (N+1) / 8 / 2
and correspondingly the frame rate should be
fps = pixel_clock / (frame_length_lines * line_length_pck)
There's another way to calculate it, based on coarse and fine integration time:
integration_fps = pixel_clock / (coarse_int_time * line_length_pck + fine_int_time)
This is usually more accurate for the actual camera FPS with auto exposure, if you read out the current values from the registers.
Here are some logs of the I2C comms in FaceTraining's DSi version in melonDS, captured by forcing the ARM7 to jump to the function that changes the frame rate. Preset 12 and above appear to be invalid; the function does a bounds check on the contents of R1. The function is at address 037dec10 and expects R0 to contain some sort of camera model ID (I recommend just setting all the bits to 1) and R1 to contain the preset number.
FrameRatePresetRegisterWrites.zip
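The three formulas above, written out as Python helpers so the units are explicit. M, N and the other arguments are values you'd read back from the camera's registers; the post doesn't say which register holds each one, so any example numbers are made up, and the 16.76 MHz base is taken as-is from the formula.

```python
CAM_BASE_CLOCK_HZ = 16.76e6  # base clock used in the formula above


def pixel_clock(m, n):
    """Pixel output clock in Hz: 16.76 MHz * M / (N+1) / 8 / 2."""
    return CAM_BASE_CLOCK_HZ * m / (n + 1) / 8 / 2


def frame_rate(pclk, frame_length_lines, line_length_pck):
    """Nominal frame rate from the frame length and line length values."""
    return pclk / (frame_length_lines * line_length_pck)


def integration_fps(pclk, coarse_int_time, fine_int_time, line_length_pck):
    """Frame rate implied by the current coarse/fine integration (exposure) time."""
    return pclk / (coarse_int_time * line_length_pck + fine_int_time)
```

With auto exposure running, feeding `integration_fps` the live integration values is what the post suggests for matching the real camera FPS.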
| Arisotura |
Big fire melon magical melon girl Level: 64
Posts: 1016/1113 EXP: 2123836 Next: 90261 Since: 03-28-17 From: France Last post: 2 days ago Last view: 20 hours ago |
speaking of camera
I never posted about the camera interface thing, and GBAtek doesn't say a lot, so:
CAM_CNT:
* if a camera transfer is underway, clearing bit15 is delayed until the end of that transfer
CAM_DAT:
* data FIFO holds 512 words
* there are two FIFOs
when a FIFO is full (ie. contains as many scanlines as the DMA interval set in CAM_CNT):
* if the other FIFO is empty (was fully read): swap FIFOs, fire DMA
* if not: raise overrun error
transferring too many scanlines per block doesn't raise the overrun error.
reading too much of the FIFO doesn't raise the overrun error.
____________________
Kuribo64
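Those swap/overrun rules as a small Python model. Sizes are in 32-bit words; `words_per_block` stands for the DMA interval in scanlines times the words per scanline. What happens to the data on overrun, and what an over-read returns, isn't covered by the notes, so those parts are guesses and are marked as such in the comments. The class name is made up.

```python
class CamFifoPair:
    """Hypothetical model of the CAM_DAT double FIFO."""

    def __init__(self, words_per_block):
        # words_per_block = scanlines per DMA interval (CAM_CNT) * words per scanline
        self.words_per_block = words_per_block
        self.fill = []         # FIFO currently receiving camera data
        self.drain = []        # FIFO currently being read by the CPU/DMA
        self.dma_requests = 0
        self.overrun = False

    def push(self, word):
        """Camera side: store one 32-bit word."""
        self.fill.append(word)
        if len(self.fill) == self.words_per_block:
            if not self.drain:
                # other FIFO was fully read: swap FIFOs and fire DMA
                self.fill, self.drain = [], self.fill
                self.dma_requests += 1
            else:
                # other FIFO still holds data: overrun error
                self.overrun = True
                self.fill = []  # assumption: the newly filled block is discarded

    def pop(self):
        """CPU/DMA side: read one word. Reading past the end does not set overrun."""
        return self.drain.pop(0) if self.drain else 0  # value when empty is an assumption
```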
| Arisotura |
audio related
POWCNT2:
* "Note: Bit0 disables the internal Speaker only, headphones are not disabled." - not true. bit0 does disable headphone output too.
* on the DSi, bit0 only disables NITRO mixer output, not DSP output.
* also, it only disables output - the mixer still runs.
* it does disable SOUNDBIAS too.
MIC_CNT:
* mic mode/freq bits can't be changed while bit15 is set.
* IRQ bits, however, can be changed.
IRQ bits:
* 0 = none
* 1 = when FIFO half-full
* 2 = when FIFO full (seems to actually be overrun IRQ)
* 3 = both
the IRQ triggers just once (ie. not like the GXFIFO IRQ).
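A sketch of the MIC_CNT IRQ modes as an edge-triggered model, in Python. It treats mode 2 as firing on the first sample that arrives while the FIFO is already full (matching the "seems to actually be overrun" observation). The FIFO depth is a parameter since it isn't given here, the IRQ field is passed in already extracted (its bit position isn't stated above), and the class name is hypothetical.

```python
MIC_IRQ_HALF = 1  # IRQ field value 1: FIFO half-full
MIC_IRQ_FULL = 2  # IRQ field value 2: FIFO full (observed to behave like overrun)


class MicFifoIrq:
    """Hypothetical edge-triggered model of the MIC_CNT IRQ modes 0..3."""

    def __init__(self, irq_mode, capacity):
        self.irq_mode = irq_mode   # the 2-bit IRQ field value
        self.capacity = capacity   # FIFO depth in entries (a parameter, not a known value)
        self.level = 0
        self.overrun = False       # never cleared here: clearing isn't covered by the notes

    def push_sample(self):
        """Add one sample; return the list of IRQ triggers that fire."""
        fired = []
        if self.level < self.capacity:
            self.level += 1
            # fires when the level reaches half, not continuously while it stays there
            if self.level == self.capacity // 2 and self.irq_mode & MIC_IRQ_HALF:
                fired.append("half")
        else:
            # a sample arrives while the FIFO is already full: overrun
            if not self.overrun and self.irq_mode & MIC_IRQ_FULL:
                fired.append("full")
            self.overrun = True
        return fired

    def pop_sample(self):
        if self.level:
            self.level -= 1
```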
| Arisotura |
CAM_MCNT:
is read-only when the camera transfer is enabled.
doesn't actually reset anything - none of the bits have any effect on the other camera registers, except bit 5, which goes to the cameras' reset lines.
bit 7 seems to be set all the time. theory: might be a GPIO thing?
EDIT - https://melonds.kuribo64.net/board/thread.php?pid=6140#6140 still weird.
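A tiny Python model of those CAM_MCNT rules. Tying "transfer enabled" to CAM_CNT bit15 is an assumption, as is treating the register as 16 bits wide; the reset-line polarity isn't modelled, and the class name is made up.

```python
class CamMcnt:
    """Hypothetical model of CAM_MCNT."""

    def __init__(self):
        self.value = 0
        self.transfer_enabled = False  # assumption: mirrors the CAM_CNT transfer-enable bit (bit15)

    def write(self, value):
        if not self.transfer_enabled:  # read-only while a camera transfer is enabled
            self.value = value & 0xFFFF

    def read(self):
        return self.value | 0x0080     # bit 7 always reads as set

    def reset_line(self):
        # bit 5 drives the cameras' reset lines; active level not covered here
        return bool(self.value & 0x0020)
```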
| Jakly |
Member Normal user Level: 7 Posts: 11/13 EXP: 1257 Next: 191 Since: 03-22-24 Last post: 72 days ago Last view: 2 days ago |
Notes and Findings on NDS ARM946E-S Pipelining and Timings:
All research has been performed on a New 3DS XL in TWL mode; some details may vary based on hardware revision or other misc factors. The subject is also very complex, and thus hard to verify comprehensively, so some parts of this may be speculative.
ARM9E-S processors use a 5-stage pipeline:
Fetch: future instruction data is read off the bus.
Decode: the next instruction is decoded.
Execute: most instruction logic is handled here.
Memory: some extra instruction logic is optionally handled here. All instructions spend at least 1 cycle in the Memory stage, even if no logic is performed during it.
Writeback: instruction results are written back to the register bank. Some load instructions finalize their output in this stage.
Forwarding and Interlocks:
To alleviate the effects of the longer pipeline, the CPU will attempt to use shortcut paths to send results to the instructions that need them before they are properly written back. In some cases the results may not be available in time, which delays later instructions in the pipeline. This is called interlocking. Note that interlocks on R15 will always be incurred, as the result is needed immediately for the branch address.
Side note: MRC is implemented as a load from the coprocessor, and amusingly, MRC to R15, being a load to the CPSR flag nibble, results in timing behavior consistent with the interlock being triggered by the Decode stage.
It’s worth noting there are two types of interlock: Types A/B and Type C. This comes down to which internal register bank port a given operand is fetched from: certain register writeback paths don’t have paths to the C register bank port. Type C interlocks are confirmed to happen for LDR/LDM results, and take an extra cycle to resolve for store data (Rd, not the base) and multiply-accumulate inputs. (Speculation: is this because both of those are registers being read during the Memory stage…?)
Speculation on how pipeline stages overlap:
It’s unclear when exactly the Decode stage occurs. It might be possible to infer it somehow, though it’s not particularly important, so I haven’t tried yet.
The Execute stage seems to begin on the last cycle of the Fetch stage or the previous Memory stage, whichever takes longer; this results in a 1-cycle overlap.
The Fetch stage begins the cycle after the Execute stage ends.
The Memory stage seems to usually begin the cycle after the Execute stage ends; note that this doesn’t quite apply to LDM/STM.
The Writeback stage begins the cycle after the Memory stage ends.
LDM/STM seem to be unique in that they can overlap the Execute and Memory stages: the Memory stage begins on the second cycle of the instruction, and the Execute stage lasts for one cycle per memory access. This should also apply to LDRD/STRD, as they are internally LDM/STM. The Writeback stage allegedly also occurs once per access, after each individual one completes, though this doesn’t seem too useful to model.
Note: I’m not 100% sure how SWP works here?
Multi-bus drifting?!:
The ARM946E-S has 3 internal buses: the Instruction Bus, the Data Bus, and the Write Buffer Bus.
ITCM: sane as long as you never try to access it on the data bus. Data reads from ITCM (writes are weird?) seem to stall the instruction bus for 1 cycle. This has been observed with instruction accesses to ITCM, to the ICache, and to the latched halfword that Thumb instruction fetches use. It has not been confirmed to occur on instruction fetches from uncached external bus accesses or actively streamed-in external bus accesses, and it has not yet been tested with prefetch aborts.
Writes on the data bus stall the data bus for one cycle if you attempt to read from it on the next cycle. This has been confirmed for writes to both DTCM and the Write Buffer.
AHB buffer latency:
All accesses to the external bus (AHB) must go through an internal buffer, for some reason I don’t fully understand, and it adds a disgusting amount of latency. The buffer takes 3 bus cycles with a 2:1 CPU:bus clock ratio, and 2 bus cycles with 4:1.
Buffering is only done while the ARM9 has confirmed ownership of the AHB. If the ARM9 loses ownership of the bus for any reason, all accesses must be rebuffered from scratch. While the ARM9 has ownership, all 3 internal buses can queue up accesses, and ownership is not relinquished until all 3 queues are empty; this can hide the latency partially or fully under some circumstances.
The ARM9 (probably?) can’t lose ownership of the external bus while buffering. (I guess this is the same or similar logic that prevents it from being interrupted in the middle of an individual fetch?)
Note that the ARM9 seemingly will never queue up an access to the AHB if it is sequential and an access is still finishing on the current bus. I don’t know why, it just is.
Note that an internal bus cannot be interrupted by another internal bus, or by itself, until its current burst is finished. I wanna say the priority in case of a tie is data > instruction > write buffer, but that’s not something I’ve confirmed.
Bus alignment:
The ARM9 runs at a higher clock than the AHB, which means any access to it will stall until it is aligned with the bus clock. Note that the bus is rising-edge, which apparently means results are obtained on the first ARM9 cycle of a given bus cycle. So a 2-cycle access to the AHB would finish in 3 ARM9 cycles rather than 4 (at 2:1). This point tends to be largely moot due to alignment requirements, but it can be used to interleave extra instructions between two different accesses.
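To make the alignment arithmetic concrete, here's a Python sketch that counts ARM9 cycles for a single AHB access and reproduces the 3-vs-4 cycle example. Treating the buffer latency as a fixed delay added before the access starts is an assumption; as noted above, queued accesses can partly or fully hide that latency, which this doesn't model. The function and table names are made up.

```python
# AHB buffer latency in bus cycles, keyed by CPU:bus clock ratio (numbers from the notes above).
AHB_BUFFER_BUS_CYCLES = {2: 3, 4: 2}


def ahb_access_cycles(issue_cycle, bus_cycles, ratio, buffered=False):
    """ARM9 cycles (inclusive) from issuing an AHB access until its result is available.

    Bus cycle k covers ARM9 cycles [k*ratio, k*ratio + ratio - 1]. The access
    waits for the next bus cycle boundary; because the bus is rising-edge, the
    result is available on the first ARM9 cycle of its last bus cycle.
    """
    start_bus = -(-issue_cycle // ratio)  # align up to the next bus cycle boundary
    if buffered:
        # assumption: buffering just delays the start by a fixed number of bus cycles
        start_bus += AHB_BUFFER_BUS_CYCLES[ratio]
    done = (start_bus + bus_cycles - 1) * ratio  # first ARM9 cycle of the last bus cycle
    return done - issue_cycle + 1
```

For example, `ahb_access_cycles(0, 2, 2)` is the "3 ARM9 cycles rather than 4" case, and issuing one ARM9 cycle later (`issue_cycle=1`) costs an extra cycle for alignment.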
Cache streaming jank:
Cache streaming allows the processor to keep executing, to a limited extent, while cache line fills occur in the background.
While a cache stream is active, attempting a nonsequential access on that bus will stall until the stream is fully complete; this includes internal accesses, and latency will be incurred if the access is to the AHB.
If the next streamed-in word finishes streaming in before the ARM9 wants it, this counts as a nonsequential access, and you will be forced to wait until the stream is fully complete to get the data.
| Arisotura |
not sure if I posted before about 2D stuff
anyway
* internal rotscale reference points are updated even when forced blank is enabled in DISPCNT.
* same goes for mosaic counters.
also, mosaic weirdness when changing it midframe:
* BG mosaic: the Y counter keeps counting until the old Y size is reached, then uses the new Y size.
* OBJ mosaic: the Y counter keeps counting until the new Y size is reached. if the new Y size is smaller than the old one, it will wrap around (the counter is 4 bits).
* in both cases, the new X size applies immediately.
* the OBJ mosaic counter is NOT updated when the OBJ layer is disabled.
* TODO: check if it's the case for BG layers (and if they have separate counters, too)
* also, internal BG reference points are not updated when the corresponding layer is disabled.
* THEORY: internal rotscale reference points are updated during VBlank too.
* possibly also the case for mosaic counters.
WHEN DISABLING BG/OBJ LAYERS OR ENABLING FORCED BLANK
* disabling BG/OBJ layers or enabling forced blank midframe applies immediately.
* enabling layers or disabling forced blank takes two scanlines to apply.
* however, the OBJ mosaic counter starts counting again immediately when the OBJ layer is re-enabled. (TODO: verify for BG layers)
* disabling/enabling windows applies immediately.
WHEN DISABLING A 2D UNIT VIA POWCNT1
* internal rotscale ref points and mosaic counters aren't updated
* TODO: check if windows are updated
* unit B outputs white.
* unit A outputs black.
* when re-enabling, the next OBJ scanline will be what it was prior to disabling (due to rendering sprites one scanline in advance)
* POWCNT1 bit1 does not affect VRAM display (and presumably main RAM FIFO)
* does not affect master brightness (still applied to the "disabled" output)
* does not affect display capture
* disabling a 2D unit between scanlines 262 and 0 causes it to not properly reset the rotscale ref points
* same goes for the mosaic counters
POWCNT1 BIT 0
seems to cut off the LCD video signal? as well as the power, presumably.
toggling that bit midframe does weird shit. prolly not good for hardware. two of my DSes emit repeated clicks when doing this.
when disabling the screens, they become black, backlight off. when enabling them, they seem to stay black or white (? prolly depends on screen make/etc) for a moment before displaying.
VCOUNT register is 9 bits wide. it gets reset to 0 at the end of scanline 262, normally. but you can write whatever value to it (it gets applied at the next scanline).
if you set it to 263, the next VCount is 264, and so on. it just keeps counting until it reaches 511 and wraps back to 0.
if you set it to 262 or more, and the previous VCount is 192..260, the VBlank flag in DISPSTAT doesn't get cleared until the end of the next frame. this also suppresses the VBlank IRQ for the next frame.
if you set it to 262 or more while the previous VCount is 261, the VBlank flag does get cleared.
order of operations:
1. at end of scanline, VCount is incremented
2. VCount is checked for values 192 or 262, to adjust the VBlank flag (and raise IRQ upon rising edge)
3. if a value was written to VCount, it is applied
4. VMatch
WRITING VCOUNT DURING ACTIVE DISPLAY
if VCount is advanced, ie. scanlines beyond 191 end up on screen:
I thought it would just display white pixels, but it's weirder than that, and what actually happens seems to depend on hardware. different results observed on DS Phat, DS Lite, DSi XL. might be entirely up to the LCD itself, since apparently this messes with the video signal...
I wanted to test with my capture-card DS, but the capture card just gives up in that situation. I guess it doesn't receive enough data to form a full frame, or something.
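Two of the behaviors above are easy to get subtly wrong in an emulator: the midframe mosaic counter rules and the end-of-scanline VCOUNT order. Here's a minimal Python sketch of both, assuming the mosaic Y counter resets when it equals the 4-bit MOSAIC size field (ie. field = block height minus one), and that the VBlank flag is set on entering line 192 and cleared on entering line 262. All names (`bg_mosaic_step`, `VCountModel`, `lyc`, ...) are hypothetical, not melonDS code.

```python
def bg_mosaic_step(counter, latched_size, reg_size):
    """Advance the BG mosaic Y counter one scanline; returns (counter, latched_size).

    BG keeps comparing against the size it latched earlier and only picks up
    the new register value once that old size is reached.
    """
    if counter == latched_size:
        return 0, reg_size
    return counter + 1, latched_size


def obj_mosaic_step(counter, reg_size):
    """Advance the OBJ mosaic Y counter one scanline.

    OBJ compares against the current register value directly, so a counter
    already past a smaller new size counts up and wraps at 4 bits.
    """
    if counter == reg_size:
        return 0
    return (counter + 1) & 0xF


class VCountModel:
    """End-of-scanline VCOUNT handling in the 4-step order listed above."""

    def __init__(self, lyc=0):
        self.vcount = 0
        self.vblank = False   # DISPSTAT VBlank flag
        self.vblank_irqs = 0  # VBlank IRQs raised so far
        self.pending = None   # value written to VCOUNT, applied at the next scanline
        self.lyc = lyc        # VCount match setting (hypothetical name)
        self.vmatch = False

    def write_vcount(self, value):
        self.pending = value & 0x1FF  # register is 9 bits wide

    def end_of_scanline(self):
        # 1. increment: normal reset after 262, otherwise 9-bit wrap (511 -> 0)
        self.vcount = 0 if self.vcount == 262 else (self.vcount + 1) & 0x1FF
        # 2. VBlank flag: set on 192 (IRQ on rising edge only), cleared on 262
        if self.vcount == 192:
            if not self.vblank:
                self.vblank_irqs += 1
            self.vblank = True
        elif self.vcount == 262:
            self.vblank = False
        # 3. apply a pending VCOUNT write
        if self.pending is not None:
            self.vcount, self.pending = self.pending, None
        # 4. VMatch compare
        self.vmatch = self.vcount == self.lyc
```

With this ordering, writing 262 or more while VCount is 192..260 skips step 2's clear, so the flag is still set when line 192 comes around again (no rising edge, no IRQ), while doing it on line 261 still lets the increment to 262 clear the flag first, which matches the observations above.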
DISPLAY CAPTURE
clearing up a few things:
* the busy bit in DISPCAPCNT is always cleared upon VBlank, even if the capture size is smaller
* the stride for input B (VRAM, and display FIFO, too) is always 256, even if the capture size is 128x128 (which means the output stride is 128)
* DISPCAPCNT is completely writable at any time.
* the capture start bit is latched upon scanline 0 (VCOUNT transition 262->0), and cleared upon VBlank (if and only if it was latched).
* the VRAM address capture writes to is based on VCOUNT.
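A Python sketch of the stride and start-bit rules above. Offsets are in pixels (halfwords) relative to the start of the source/destination VRAM block; DISPCAPCNT's read/write offset fields are left out on purpose, and the function and class names are made up.

```python
CAPTURE_SRC_B_STRIDE = 256  # input B (VRAM / display FIFO) stride is always 256


def capture_offsets(vcount, x, capture_width):
    """Return (source_b_offset, destination_offset) for pixel x on scanline vcount."""
    src_b = vcount * CAPTURE_SRC_B_STRIDE + x
    dst = vcount * capture_width + x  # output stride follows the capture width (128 or 256)
    return src_b, dst


class CaptureEnable:
    """DISPCAPCNT capture start/busy bit: latched on scanline 0, cleared on VBlank only if latched."""

    def __init__(self):
        self.bit = False      # bit as seen by software
        self.latched = False  # capture actually armed for this frame

    def write(self, enable):
        self.bit = enable     # DISPCAPCNT is writable at any time

    def on_scanline_0(self):  # VCOUNT transition 262 -> 0
        self.latched = self.bit

    def on_vblank(self):
        if self.latched:
            self.bit = False
            self.latched = False
```

Setting the bit mid-frame therefore survives that frame's VBlank untouched and only gets cleared at the end of the next frame, once it has actually been latched.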
| Arisotura |
oh by the way
DISP_MMEM_FIFO
I researched how 8-bit writes work:
* byte 0: behaves like a 16-bit write to halfword 0. value is duplicated across the entire halfword.
* byte 1: ignored.
* byte 2: behaves like a 16-bit write to halfword 1. value is duplicated across the entire halfword. does not advance the write pointer.
* byte 3: advances the write pointer. value is ignored.
similarly, for VCOUNT, byte writes just write to one half or the other of the VCOUNT override value, and set the 'apply override at next scanline' flag.
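A small Python model of the byte-write rules, assuming (not stated above) that a plain 16-bit write to halfword 1 advances the write pointer, and that advancing pushes the two latched halfwords as one 32-bit FIFO entry. The class and method names are made up for illustration.

```python
class MainMemFifoWriter:
    """Hypothetical model of CPU writes to DISP_MMEM_FIFO."""

    def __init__(self):
        self.halves = [0, 0]  # latched halfword 0 (low) and halfword 1 (high)
        self.fifo = []        # 32-bit entries pushed when the write pointer advances

    def _advance(self):
        self.fifo.append(self.halves[0] | (self.halves[1] << 16))

    def write16(self, half, value):
        self.halves[half] = value & 0xFFFF
        if half == 1:  # assumption: a real 16-bit write to halfword 1 advances
            self._advance()

    def write8(self, byte, value):
        dup = (value & 0xFF) * 0x0101  # byte duplicated across the halfword
        if byte == 0:
            self.halves[0] = dup       # like a 16-bit write to halfword 0
        elif byte == 2:
            self.halves[1] = dup       # like halfword 1, but no pointer advance
        elif byte == 3:
            self._advance()            # value ignored, pointer advances
        # byte 1: ignored
```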
| Arisotura |
CART INTERFACE
* there are separate register sets (and separate cart interfaces) for ARM9 and ARM7
* cart command regs (040001A8) aren't readable
* if the cart interface isn't selected on the current CPU, it only reads FFFFFFFF - the actual hardware still works, just can't access the cart
* doesn't apply if there's no cart inserted - both sides read 00000000
* SPI behaves similarly
* cart commands while no cart is inserted read FFFFFF00 on DS, 00000000 on DSi
* cart commands with ROMCTRL.bit29=0 read garbage (ie FFFF3F00 on DSi)
* SPI works regardless of ROMCTRL.bit29
DSi SPECIFIC
unlike what GBAtek implies, the two interfaces are fully working - there's actually a new EXMEMCNT bit for interface 2 (bit 10)
* interface 1: regs: 040001Ax/04100010, transfer IRQ: 19, cart IRQ: 20, NDMA start mode: 04, EXMEMCNT bit: 11
* interface 2: regs: 040021Ax/04102010, transfer IRQ: 26, cart IRQ: 27, NDMA start mode: 05, EXMEMCNT bit: 10
SCFG_MC
state=0: cart is off
state=1: cart is on but in reset - reads FFFFFFFF
state=2: cart is on and ready
state=3: turn cart off (switch to state 0) after timer
trying to switch to state 1 or 2 with no cart inserted forcefully switches to state 3, and triggers a card IRQ. (same on slot 2, but there's no card IRQ)
ejecting the cart switches to state 3.
switching to state 0 in any way (directly or through state 3) triggers a card IRQ if the cart was previously on.
similarly, switching to state 0 (from a previous on state) clears bit 29 in ROMCTRL.
if a cart is inserted and powered on: toggling bit 15 (interface swap bit) triggers a card IRQ on the interface which was connected to the actual cart (ie to slot 1).
NOTE on bit15: this bit switches which addresses each interface responds to, as well as the IRQ lines, DMA trigger, etc. so whatever values were in interface 1's registers appear at interface 2's, and vice versa.
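The SCFG_MC rules above as a Python state machine for one slot. The timer length, the bit15 interface swap and slot 2 aren't modelled, what ejection does in state 0 is a guess, and all names are made up.

```python
ROMCTRL_BIT29 = 1 << 29


class SlotPower:
    """Hypothetical model of the SCFG_MC slot 1 power states 0..3."""

    def __init__(self, cart_inserted=True):
        self.cart_inserted = cart_inserted
        self.state = 0
        self.was_on = False  # cart reached state 1/2 since the last power-off
        self.romctrl = 0
        self.card_irqs = 0

    def write_state(self, new_state):
        if new_state in (1, 2):
            if not self.cart_inserted:
                self.state = 3       # forced to "turn off after timer"
                self.card_irqs += 1
                return
            self.state = new_state
            self.was_on = True
        elif new_state == 3:
            self.state = 3           # power-off happens when the timer expires
        else:
            self._power_off()

    def eject(self):
        self.cart_inserted = False
        if self.state != 0:          # assumption: ejecting in state 0 changes nothing
            self.state = 3

    def timer_expired(self):
        if self.state == 3:
            self._power_off()

    def _power_off(self):
        self.state = 0
        if self.was_on:              # card IRQ and bit29 clear only if the cart was on
            self.card_irqs += 1
            self.romctrl &= ~ROMCTRL_BIT29
            self.was_on = False
```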
LOGIC STUFF
state=0: power off, /RES low, no clock output - reads 00000000 or so
state=1: power on, /RES low, no clock output - reads FFFFFFFF
state=2: power on, /RES high, clock output - reads normal
state=3: power on, /RES low, no clock output - reads 00000000 or garbage
/RES is held low (active) if either the SCFG_MC state is not 2, or ROMCTRL bit29 is cleared. however, ROMCTRL bit29=0 doesn't stop clock output.
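The pin table above boils down to a few boolean rules. A Python sketch (the function name is made up):

```python
ROMCTRL_BIT29 = 1 << 29


def slot_pins(scfg_mc_state, romctrl):
    """Return (powered, res_high, clock_out) for an SCFG_MC state and ROMCTRL value."""
    powered = scfg_mc_state != 0
    clock_out = scfg_mc_state == 2  # ROMCTRL bit29=0 doesn't stop the clock
    # /RES is released (high) only in state 2 with ROMCTRL bit29 set
    res_high = clock_out and bool(romctrl & ROMCTRL_BIT29)
    return powered, res_high, clock_out
```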
| Arisotura |
ROM TRANSFER UPDATE
transfers when WR=1:
"DELAYS DO NOT APPLY WHEN THE WR BIT IS SET" -> not true.
DRQ gets set immediately after starting a command, for the first data word. for the second data word, it gets set after the command + delays (+ first word?) are transferred.
transfers with no data still cause DRQ to get set, which may cause DMA to fire spuriously (suggesting that the DMA trigger is a level trigger).
DRQ will remain set until 0x04100010 is written to. toggling WR doesn't clear DRQ. writing to 0x04100010 doesn't clear DRQ if WR was cleared.
toggling AUXSPICNT bit13 or bit15 doesn't clear DRQ (and doesn't affect ROMCTRL at all).
GCDATAIN (0x04100010)
this register is an 8-byte FIFO. the same FIFO is used for both transfer directions. reading and writing GCDATAIN use the same FIFO pointer (named "FIFO CPU-side pointer" for lack of a better name - the other pointer is on the cart side).
reading from GCDATAIN returns the last FIFO entry. if the WR bit in ROMCTRL is cleared, it also advances the FIFO CPU-side pointer to the next word.
when reading GCDATAIN in 16-bit or 8-bit units: any read will advance the CPU-side pointer.
writing to GCDATAIN only works if the WR bit in ROMCTRL is set. then, it also advances the FIFO CPU-side pointer to the next word.
when writing to GCDATAIN in 16-bit or 8-bit units: 16-bit and 8-bit writes work correctly, but the CPU-side pointer is only advanced when accessing the high-order byte (0x04100013).
HARDWARE BUG when WR=1
if the last delay gap to apply at the beginning of a transfer isn't a multiple of 4, the hardware may accidentally send out one data word even if the FIFO is empty (ie if you fail to respond to the very first DRQ in time). the data that gets sent is whatever was in the FIFO. this glitch will also break the rest of the transfer by causing some weird offsetting (words getting written to GCDATAIN at the same time as they're being sent out).
unsure if this also happens between 0x200-byte blocks.
no idea why or how this even happens. for example:
gap1=0, gap2=0: no glitch
gap1=1, gap2=0: glitch
gap1=2, gap2=0: glitch
gap1=3, gap2=0: glitch
gap1=4, gap2=0: no glitch

gap1=1, gap2=0: glitch (last gap to apply was gap1)
gap1=1, gap2=1: glitch
gap1=1, gap2=2: glitch
gap1=1, gap2=3: glitch
gap1=1, gap2=4: no glitch (last gap to apply was gap2)

gap1=3, gap2=0: glitch
gap1=4, gap2=0: no glitch
AUXSPICNT bit13/15:
if they're "set wrong", ROMCTRL is still readable/writable, but won't start transfers.
changing AUXSPICNT won't start a transfer if ROMCTRL bit31 is set. it needs to be toggled again.
clearing AUXSPICNT bit13 during a SPI transfer causes /CS to go high (released). setting that bit again causes it to go back to low (active).
toggling AUXSPICNT bit15 during a SPI transfer has no effect on /CS.
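A Python sketch of the CPU side of GCDATAIN plus the glitch condition from the gap examples. Two assumptions, both marked in the code: reads return the word at the CPU-side pointer, and the "WR cleared" condition for advancing also applies to 8/16-bit reads. The glitch rule (gap2 is the last gap to apply when it's nonzero, otherwise gap1) is inferred from the example list, not confirmed. Names are made up.

```python
GCDATAIN = 0x04100010


class GcDataIn:
    """Hypothetical model of the CPU side of the 8-byte GCDATAIN FIFO (two 32-bit words)."""

    def __init__(self):
        self.words = [0, 0]
        self.ptr = 0     # "FIFO CPU-side pointer"
        self.wr = False  # ROMCTRL WR bit

    def read(self, addr, size):
        shift = (addr - GCDATAIN) * 8
        # assumption: returns the word at the CPU-side pointer
        value = (self.words[self.ptr] >> shift) & ((1 << (size * 8)) - 1)
        if not self.wr:  # assumption: this WR condition also applies to 8/16-bit reads
            self.ptr ^= 1  # any read width advances
        return value

    def write(self, addr, size, value):
        if not self.wr:  # writes only work when WR is set
            return
        offset = addr - GCDATAIN
        mask = ((1 << (size * 8)) - 1) << (offset * 8)
        self.words[self.ptr] = (self.words[self.ptr] & ~mask) | ((value << (offset * 8)) & mask)
        if offset + size == 4:  # access touched the high-order byte (0x04100013)
            self.ptr ^= 1


def wr_start_glitch_possible(gap1, gap2):
    """Whether the WR=1 start-of-transfer glitch may occur (rule inferred from the examples)."""
    last_gap = gap2 if gap2 != 0 else gap1
    return last_gap % 4 != 0
```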