
Main - Posts by PoroCYon

PoroCYon
Posted on 12-01-19 05:59 PM, in GBAtek addendum/errata (rev. 5 of 11-16-20 10:38 PM) | #1406
Some findings that still need to be confirmed:

DSP

  • DSP_PSTS bits 10..12 (REP0..REP2) are active-high (i.e. 1 = written by the DSP), while GBATEK says they're active-low.

  • DSP_PCFG bits 12..15 have an undocumented transfer mode (7: ARM9 bus loopback) that transfers to/from the ARM9 bus, cf. DSP-internal DMA transfer mode 7. This mode requires some additional setup: you first need to set the following DSP-internal DMA registers to the following values (using transfer mode 1):

    [0x81BE] = 0 // select channel 0
    [0x81C6] = 0xABCD // destination address, high 16-bit
    [0x80E2] = 0 | (0<<1) | (1<<4) // configure AHBM (DSP->ARM9 DMA); example value works for 16-bit transfers (see GBATEK/Teakra docs for details)
    [0x80E4] = (1<<9) | (1<<8) // bit 9: mandatory bit, bit 8: direction (0=read, 1=write) (see GBATEK/Teakra docs for details)
    [0x80E6] = 1 // enable channel 0

    Then perform a transfer as follows (the example writes 0x1337 to 0xABCDEF98):

    DSP_PADR = 0xEF98 // destination address, low 16-bit
    DSP_PDATA = 0x1337 // data to write; for a read, read from this register instead
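A minimal C sketch of the sequence above, for illustration only. It doesn't touch hardware: the setup is built as a list of (DSP register address, value) pairs that you would then write using DSP_PCFG transfer mode 1. The helper names (`build_loopback_setup`, `pcfg_set_mode`, `loopback_target`, `psts_rep_written`) are hypothetical; the register addresses and values are the ones given above, and the REP check follows the active-high reading from the first bullet.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One write to a DSP-internal register, done with DSP_PCFG transfer mode 1. */
typedef struct {
    uint16_t addr;
    uint16_t value;
} dsp_reg_write;

/* Fill `out` with the AHBM/DMA setup from the post (16-bit transfer example).
 * `dest_hi` is the high 16 bits of the ARM9-bus address; `write` selects the
 * direction bit. Returns the number of register writes. */
static size_t build_loopback_setup(uint16_t dest_hi, bool write, dsp_reg_write out[5])
{
    out[0] = (dsp_reg_write){0x81BE, 0};                       /* select channel 0 */
    out[1] = (dsp_reg_write){0x81C6, dest_hi};                 /* address, high 16 bits */
    out[2] = (dsp_reg_write){0x80E2, 0 | (0 << 1) | (1 << 4)}; /* AHBM config (16-bit example) */
    out[3] = (dsp_reg_write){0x80E4,                           /* bit 9 mandatory, bit 8 direction */
                             (uint16_t)((1 << 9) | ((write ? 1 : 0) << 8))};
    out[4] = (dsp_reg_write){0x80E6, 1};                       /* enable channel 0 */
    return 5;
}

/* Put `mode` into DSP_PCFG bits 12..15 (7 = the undocumented ARM9 bus loopback). */
static uint16_t pcfg_set_mode(uint16_t pcfg, unsigned mode)
{
    return (uint16_t)((pcfg & 0x0FFFu) | ((mode & 0xFu) << 12));
}

/* ARM9-bus address of a transfer: high half from [0x81C6], low half from DSP_PADR. */
static uint32_t loopback_target(uint16_t dest_hi, uint16_t padr)
{
    return ((uint32_t)dest_hi << 16) | padr;
}

/* DSP_PSTS REP0..REP2 live in bits 10..12; read here as active-high
 * (1 = written by the DSP), per the observation above. */
static bool psts_rep_written(uint16_t psts, unsigned n)
{
    return ((psts >> (10 + n)) & 1u) != 0;
}
```

For the example in the post, `loopback_target(0xABCD, 0xEF98)` gives 0xABCDEF98, and `pcfg_set_mode(pcfg, 7)` selects the loopback mode before 0x1337 is written to DSP_PDATA.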

PoroCYon
Posted on 12-10-19 01:04 AM, in How to run Melon DS on Ubuntu? | #1413
sudo apt-get install libsdl2-2.0-0 (the SDL2 runtime library package; if that name doesn't exist on your release, try 'apt-cache search libsdl2')

PoroCYon
Posted on 09-28-20 01:10 PM, in Melonds not working since the update on linux | #2451
Try running "ldd path/to/melonDS": melonDS switched from pcap to slirp, so ldd might tell you that you're missing the libslirp libraries.

Or you can try installing libslirp and see if that works.

PoroCYon
Posted on 10-24-20 06:56 PM, in Framerate dropping while recording | #2618
That, or, if you're lucky, it's only your OS scheduler messing things up. You could try raising the priority of the melonDS process, and switching to a less CPU-intensive encoder on the OBS side.

PoroCYon
Posted on 10-24-20 08:29 PM, in Framerate dropping while recording | #2622
Posted by Generic aka RSDuck
it's just that the Windows scheduler gives defocused graphical applications less cpu time, because if you don't focus them, you probably aren't interested in seeing what's happening in them at fullspeed.


Either that, or the "window idle" event fires less often for the same reason (if that's what drives each main loop iteration; I'm not sure about the latter).

PoroCYon
Posted on 11-16-20 10:05 PM, in If you'd like to translate melonDS: read this post | #2758
I can do Dutch (and Lojban), and you know the answer to the other questions I think.

PoroCYon
Posted on 11-16-20 10:10 PM, in How do you run? | #2760
Which Qt version do you have installed? Which distro + version are you running?

Because to me this sounds a lot like https://github.com/Arisotura/melonDS/issues/751 .

PoroCYon
Posted on 11-16-20 10:30 PM, in GBAtek addendum/errata (rev. 9 of 03-01-21 09:21 PM) | #2762

IR cartridges

IR cartridges seem to work as follows, but I'd like someone else to verify this (it seems to work with HGSS, BW and B2W2 carts; idk about others):

Everything automatically happens at 115200 baud, 8n1.

There seem to be three main SPI commands that are sent to what would normally be the savegame SPI bus, plus a fourth command to perform actual savegame operations. All transfers happen at 1 MHz (serial AUXSPI mode) unless indicated otherwise.

The cartridge needs to be powered on, but nothing more besides this. No header reading or KEY1/KEY2 init, and so on. (I rebooted the cart with SCFG_MC and started doing SPI commands immediately afterwards, seems to work fine).

This seems to be relevant for pretty much all NTR-031 carts, so Pokémon HGSS, BW, B2W2, "Walk With Me" and similar games, ...

The commands:
  • 0x00: savegame escape byte: as long as chip select is held, the bytes that follow will be treated like a regular savegame transfer. These can also happen at any clock speed, but the 0x00 byte itself needs to be transferred at 1 MHz. (This bit was already known.)
  • 0x01: receive data from IR: one command byte (0x01) is written, after which data bytes are read by the DS. The first byte read indicates the number of bytes that will follow. It doesn't seem to be able to receive more than 255 bytes afaics. The bytes written to perform the reads are unused as far as I know, but usually set to 0 (HGSS does this, at least).
    • When there are zero bytes to read, you still have to deselect the SPI chip, or the next transfer will fail. Disable the 'chip select hold' bit in AUXSPICNT and send a zero byte.
  • 0x02: send data over IR: this one has no length prefix, chip select is used to determine when the transfer ends, as usual.
  • 0x03: write byte to in-cart IR MCU RAM: send high addr byte and low addr byte, then send a data byte. Writes the data byte to the specified address in the in-cartridge IR MCU. Discovered by nocash.
  • 0x04: read byte from in-cart IR MCU RAM: send high addr byte and low addr byte, then read a data byte. Can be used to dump the ROM inside the in-cart IR MCU. Discovered by nocash.
  • 0x05: write word to in-cart IR MCU RAM: send high addr byte and low addr byte, then send two data bytes. Discovered by nocash.
  • 0x06: read word from in-cart IR MCU RAM: send high addr byte and low addr byte, then read two data bytes. Discovered by nocash.
  • 0x07: mystery command, purpose unclear. Discovered by nocash.
  • 0x08: not too sure about this, but probably a status thing. A command byte (0x08) is sent, and a status byte(?) is received from the cart. HGSS seems to always send two of these one after another, carts seem to return 0x00 on the first one and 0xaa for the second, unless other IR devices are sending actively, then both bytes are 0xaa. Allegedly, "Walk With Me" and similar games don't have this command.
HGSS, while trying to connect to a Pokéwalker, seems to first do a cmd 0x01, which returns 0 bytes, then 0x08 twice, after which it repeatedly issues other 0x01 commands until the return data of one of these indicates a Pokéwalker presence (the Pokéwalker sends out a fixed byte value as 'beacon' thing, the game will receive a 1-byte packet containing that beacon). 0x08 is never used again after being used twice in the beginning as far as I can see (but I might be wrong).
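A rough C sketch of how the commands above could be framed as byte sequences, assuming some lower-level routine (not shown) clocks them out over AUXSPI with chip select held. All function names are hypothetical. The post doesn't state the data-byte order for the 2-byte MCU write (0x05), so the caller passes the bytes in whatever order is wanted. The airtime helper only applies the 8n1 framing (start + 8 data + stop = 10 bit times per byte).

```c
#include <stddef.h>
#include <stdint.h>

enum ir_cmd {
    IR_SAVE_ESCAPE = 0x00, /* following bytes go to the savegame chip */
    IR_RECV        = 0x01,
    IR_SEND        = 0x02,
    IR_MCU_WRITE8  = 0x03,
    IR_MCU_READ8   = 0x04,
    IR_MCU_WRITE16 = 0x05,
    IR_MCU_READ16  = 0x06,
    IR_STATUS      = 0x08,
};

/* Bytes to send for an in-cart IR MCU RAM write: command (0x03 for one byte,
 * 0x05 for two), address high, address low, data. Returns the length, or 0
 * if n is not 1 or 2. */
static size_t ir_build_mcu_write(uint8_t tx[5], uint16_t addr, const uint8_t *data, size_t n)
{
    if (n != 1 && n != 2)
        return 0;
    tx[0] = (n == 1) ? IR_MCU_WRITE8 : IR_MCU_WRITE16;
    tx[1] = (uint8_t)(addr >> 8);
    tx[2] = (uint8_t)(addr & 0xFF);
    for (size_t i = 0; i < n; i++)
        tx[3 + i] = data[i];
    return 3 + n;
}

/* Same framing for a read (0x04 / 0x06); the trailing 0x00 bytes are dummies
 * clocked out while the data bytes are read back. */
static size_t ir_build_mcu_read(uint8_t tx[5], uint16_t addr, size_t n)
{
    if (n != 1 && n != 2)
        return 0;
    tx[0] = (n == 1) ? IR_MCU_READ8 : IR_MCU_READ16;
    tx[1] = (uint8_t)(addr >> 8);
    tx[2] = (uint8_t)(addr & 0xFF);
    for (size_t i = 0; i < n; i++)
        tx[3 + i] = 0x00;
    return 3 + n;
}

/* `rd` holds the bytes read after sending 0x01. The first one is the payload
 * length (so at most 255 bytes). Returns that length and points *payload at
 * the data, or returns -1 if `rd` is shorter than announced. A length of 0
 * still requires deselecting the chip afterwards. */
static int ir_parse_recv(const uint8_t *rd, size_t rd_len, const uint8_t **payload)
{
    if (rd_len < 1 || rd_len < (size_t)rd[0] + 1)
        return -1;
    *payload = rd + 1;
    return rd[0];
}

/* Time on the IR link for n bytes at 115200 baud, 8n1, in microseconds. */
static uint32_t ir_airtime_us(uint32_t nbytes)
{
    return (uint32_t)((uint64_t)nbytes * 10u * 1000000u / 115200u);
}
```

At 115200 baud 8n1 a single byte takes about 86 µs on the air, so a maximum-size 255-byte packet takes roughly 22 ms.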

Allegedly, the chip in carts responsible for IR is another H8/38606F (or 38602F?), connected to the SPI bus on one side and to some IR LEDs or so on the other.

[UPDATE: 2021-03-01: added info on cmd 0x03..0x07, and non-Pokémon games. info from nocash, not me.]

PoroCYon
Posted on 01-15-21 02:57 PM, in Why does Linux version weigh more than 1 GB? | #3144
Linux binaries distributed here are built using flatpak in order to have less hassle with dependencies etc. If you want a (much) smaller binary, you can always check if your distro has a melonDS package, or you can try compiling from source.

PoroCYon
Posted on 02-01-21 05:10 AM, in GBAtek addendum/errata (rev. 9 of 02-01-21 07:35 PM) | #3240
Recently I've done some timing tests with the DSi's NDMA units. These are all my findings:

NOTE: all testing has been done as main RAM->main RAM (functional), or as TIMER0_DATA->main RAM (priority/timing). CPU timing comparison was done with TIMER0_DATA->DTCM.

I did not yet test the interaction between the DSP's DMA capabilities, NDMA, and the ARM9.

Function


  • GCNT bit 0 does nothing. 3dbrew made it sound like it could be used to enable/disable NDMA globally, but it doesn't seem to do anything?
  • inc/dec/fix modes, FDATA work as expected.
  • Both the physical and logical block counts are just word sizes (or log2 thereof in the case of the physical block count). The logical block count (WCNT) is just the total number of words transferred; it's not used as a multiple of the physical block count. It also doesn't have to be a multiple of the physical block size, be aligned, or anything like that.
  • When accessing peripherals/devices, WCNT signifies the number of words that should be transferred by one peripheral event (when using the corresponding startup mode), and before the next peripheral event. When the device uses a FIFO, this register should contain the size of the FIFO, in words.
  • The timer (BCNT) doesn't implement any sort of timeout/deadline after which the transfer is cancelled; it's only meant for inserting delays (see below for more details on that). (Not sure why I tested this, I might've just been confused by the name.)
  • TCNT RESET mode does indeed reset the src or dest address after each logical block, not after each physical block.
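To make the WCNT/physical-block relationship concrete, here is a small C sketch (hypothetical helper names) that splits a transfer of WCNT words into physical blocks of 2^n words. The observations above only say WCNT need not be a multiple of the physical block size; treating the last block as simply shorter is an assumption of this sketch, not something tested here.

```c
#include <stdint.h>

/* Physical block size in words: the register holds log2 of it. */
static uint32_t ndma_block_words(uint32_t pbs_log2)
{
    return 1u << pbs_log2;
}

/* Number of physical blocks needed for `wcnt` words (rounded up). */
static uint32_t ndma_num_blocks(uint32_t wcnt, uint32_t pbs_log2)
{
    uint32_t bs = ndma_block_words(pbs_log2);
    return wcnt / bs + (wcnt % bs != 0);
}

/* Size of the final block; assumed to be the remainder when wcnt isn't a multiple. */
static uint32_t ndma_last_block_words(uint32_t wcnt, uint32_t pbs_log2)
{
    uint32_t bs = ndma_block_words(pbs_log2);
    if (wcnt == 0)
        return 0;
    return (wcnt % bs) ? (wcnt % bs) : bs;
}
```

With a physical block size of 16 words (log2 value 4) and 32k words, as in the `P16` timing tests, that's 2048 physical blocks, each of which is followed by the BCNT pause.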

Access ordering and priority


  • NDMA has priority over old DMA. This is especially visible when there are multiple ODMA and NDMA channels waiting to be resumed as soon as one NDMA channel is suspended through BCNT: first the NDMA channels will be picked (first to last), and only then the ODMA channels. ODMA transfers cannot be suspended during their entire lifetime, however, not even by GCNT round-robin mode (unlike NDMA). That is, an ODMA transfer acts like a single NDMA physical block transfer.
  • Letting NDMA channels suspend using BCNT seems to start the next available one. The GCNT setting seems to change nothing in this situation.
  • GCNT does indeed seem to be made to let the CPU have some time to do stuff while DMA is running.
  • Scheduling between NDMA channels (BCNT) and between NDMA and the CPU (GCNT) seems to happen according to the following rules:
    • Under NO circumstances is a physical block transfer interrupted, paused, aborted, or anything. It always completes once it has started, regardless of whatever may happen (except maybe a full power down of the entire SoC).
    • NDMA startup seems to have a few cycles delay, enough for the CPU to access the bus once or twice (in my case, copy TIMER0_DATA to somewhere in main RAM). Sadly, I don't have an exact number. (However, this seems to contradict the earlier observations wrt. ODMA vs NDMA startup priority behavior?) At this point, the "GCNT timer" starts.
    • Once a physical block has been transferred, the channel suspends for the time specified in BCNT (or, if it is 0, immediately continues with the next block). If the next channel is enabled, this one will start transfers. If no others are enabled, the CPU is now allowed to master the bus. This is true even if GCNT is set to highest-priority!
    • Once the "GCNT timer" expires, first, the hardware waits for the current physical block to finish transferring (if there is any currently happening), once that's finished, the CPU is allowed to run (for the same amount of cycles as specified in GCNT? I think it is). If all channels are either inactive or suspended due to BCNT, the CPU is resumed immediately.
    • In other words, only BCNT controls the priority and switching between NDMA channels, only GCNT controls switching between NDMA globally and the CPU. GCNT knows nothing about scheduling between DMA channels themselves, BCNT however, does seem to be at least slightly aware that it can relinquish control over to the CPU when all NDMA channels are either suspended or inactive. GCNT switches bus master every period, while BCNT causes a pause after every physical block transfer, it doesn't do a suspend/resume after every timer period.
  • As every channel is at least suspended for the amount of time specified in BCNT, channels can be fired in at least a round-robin fashion. However, when the BCNT period is smaller than the time it takes for a physical block to be transferred, the lowest pending channel is always resumed first, (regardless of GCNT setting)(?).
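The rules above can be condensed into a toy arbitration function. This is a simplified C model of the observations, not of the real hardware: it only decides who gets the bus once the current physical block has finished, ignores the startup delay and exact cycle counts, and every name in it is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Bus master IDs returned by pick_bus_master(): >= 0 is an NDMA channel index. */
enum { MASTER_CPU = -1, MASTER_ODMA = -2 };

typedef struct {
    bool active;        /* channel enabled and has words left */
    unsigned bcnt_left; /* remaining BCNT pause after its last physical block; 0 = ready */
} ndma_chan;

/* Decide the next bus master at a physical-block boundary
 * (a physical block itself is never interrupted). */
static int pick_bus_master(const ndma_chan *ch, size_t nch, bool odma_pending, bool gcnt_expired)
{
    /* GCNT round-robin: once its timer expires, the CPU gets a turn. */
    if (gcnt_expired)
        return MASTER_CPU;
    /* NDMA beats ODMA; among NDMA channels, the lowest ready one goes first. */
    for (size_t i = 0; i < nch; i++)
        if (ch[i].active && ch[i].bcnt_left == 0)
            return (int)i;
    if (odma_pending)
        return MASTER_ODMA;
    /* Every channel is inactive or paused by BCNT: the CPU may use the bus,
     * even with GCNT set to highest priority. */
    return MASTER_CPU;
}
```

Keeping GCNT and BCNT as separate inputs mirrors the split described above: GCNT only switches between NDMA as a whole and the CPU, while BCNT only decides which channel is eligible.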

Timing stuff


  • NDMA seems to copy data as fast as ODMA (often pretty much exactly the same speed). However, NDMA seems to have slightly less variance on the timing in some tests, but more in others.
  • Physical block sizing/moving to the next one doesn't seem to cause any delays (as in, there's no delay between finishing one physical block and starting the next one), or at least not as far as I've noticed. This is also true for writing to main RAM, so a new physical block is still considered sequential access.
  • Main RAM writes are slow, lmao (at least when comparing main RAM->VRAM and VRAM->main RAM copies of bank A: VRAM->main RAM needs about 100k timer ticks vs. about 60k for main RAM->VRAM, i.e. the latter takes roughly 40% fewer cycles).
  • Pausing transfers using BCNT between physical blocks does seem to insert delays, however, with a VRAM->main RAM copy, these seem to be disproportionate to the actual delay inserted. Therefore, this probably does cause new nonsequential accesses on the start of the next block. HOWEVER, timing variability seems to go down DRASTICALLY with higher BCNT values!
    • `BCNT=0 P16 32kwords vram2main` -> `N=100897 sigma=4005`
    • `BCNT=1 P16 32kwords vram2main` -> `N=110634 sigma=1635`: higher than expected
    • `BCNT=8 P16 32kwords vram2main` -> `N=118801 sigma= 597`: lower! than expected
    • `BCNT=16 P16 32kwords vram2main` -> `N=135170 sigma= 255`: lower! than expected
    • `BCNT=0 P16 32kwords main2vram` -> `N= 67846 sigma=3128`
    • `BCNT=1 P16 32kwords main2vram` -> `N= 83990 sigma= 920`: higher than expected
    • `BCNT=8 P16 32kwords main2vram` -> `N= 94221 sigma= 784`: lower! than expected
    • `BCNT=16 P16 32kwords main2vram` -> `N=110595 sigma= 463`: lower! than expected
    • `BCNT=0 P16 32kwords vrmA2vrmB` -> `N= 65623 sigma=3112`
    • `BCNT=1 P16 32kwords vrmA2vrmB` -> `N= 69651 sigma= 816`: ok
    • `BCNT=8 P16 32kwords vrmA2vrmB` -> `N= 83974 sigma= 226`: ok
    • `BCNT=16 P16 32kwords vrmA2vrmB` -> `N=100352 sigma= 447`: ok, WTF stddev?
    Sample size of 256 for all cases. `P16` means a physical block size of 16 words. Units of `N` and `σ` are timer 0 ticks (clockdiv set to 1, so 33 MHz).
  • GCNT round-robin mode does seem to add delays, linear(?) in the number of cycles specified in GCNT (so exponential in the actual value in that register).
    • `GCNT=1<<0 P1 32kwords vrmA2vrmB` -> `N=65630 sigma= 3543`
    • `GCNT=1<<1 P1 32kwords vrmA2vrmB` -> `N=65625 sigma= 3216`
    • `GCNT=1<<4 P1 32kwords vrmA2vrmB` -> `N=65623 sigma= 3117`
    • `GCNT=1<<8 P1 32kwords vrmA2vrmB` -> `N=65680 sigma= 3202`
    • `GCNT=1<<a P1 32kwords vrmA2vrmB` -> `N=66017 sigma= 4176`
    • `GCNT=1<<c P1 32kwords vrmA2vrmB` -> `N=67530 sigma= 5084`: +3.125% cf. 0
    • `GCNT=1<<e P1 32kwords vrmA2vrmB` -> `N=73650 sigma= 9418`: +12.5%
    • `GCNT=1<<f P1 32kwords vrmA2vrmB` -> `N=81810 sigma=16886`: +25%
  • FDATA/FILL mode doesn't incur an access penalty, unlike ODMA. It's fast. Clearing an entire VRAM bank takes 32k cycles, probably fewer when you enable the 32-bit VRAM bus enhancement.
  • Just like other types of accesses/DMA, copies with the source and destination within the same region (e.g. both in main RAM, or both in the same VRAM or (N)WRAM bank) are really slow (intra-main RAM can be up to 6 times slower than a VRAM-to-main RAM copy, and a copy within VRAM A takes 1.5x the time of a VRAM A -> VRAM B copy). The physical block size, again, has no effect on whether an access is deemed sequential or nonsequential.
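For reading the timing lists above, a small C sketch of the arithmetic: converting timer ticks to time and to ticks per word, and expressing a measurement as overhead relative to a baseline. It assumes the "33 MHz" timer base is the usual 33.513982 MHz DS bus clock (prescaler 1), that "32kwords" means 32768 words, and that the `GCNT=1<<n` labels denote 2^n cycles; the function names are hypothetical.

```c
#include <stdint.h>

/* Assumed timer 0 frequency with clockdiv 1 (the "33 MHz" above). */
#define TIMER0_HZ 33513982.0

static double ticks_to_us(double ticks)
{
    return ticks * 1e6 / TIMER0_HZ;
}

/* Average cost per transferred word, in timer ticks. */
static double ticks_per_word(double ticks, uint32_t words)
{
    return ticks / words;
}

/* Slowdown of `n` relative to `baseline`, in percent. */
static double overhead_pct(double n, double baseline)
{
    return (n - baseline) / baseline * 100.0;
}

/* Extra ticks per GCNT cycle, for a GCNT setting of 2^log2_cycles cycles.
 * Roughly constant values across settings would support the "linear in cycles" guess. */
static double extra_ticks_per_gcnt_cycle(double n, double baseline, unsigned log2_cycles)
{
    return (n - baseline) / (double)(1u << log2_cycles);
}
```

With the numbers above: the `BCNT=0` VRAM-to-main RAM copy (N=100897) works out to about 3.08 ticks per word, or roughly 3.01 ms. For GCNT, taking the `GCNT=1<<0` row (N=65630) as baseline, `1<<f` gives about +24.7% and about 0.49 extra ticks per GCNT cycle, `1<<e` also about 0.49, and `1<<c` about 0.46.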


PoroCYon
Posted on 06-27-21 08:00 PM, in the DSi findings stash | #3900
(very late reply, I know, oh well)
Posted by Digifiend
ARM7 bootloader dumped? Might be useful later, but not right now. Isn't ARM9 needed as well?

Technically yes, but trying to dump the ARM9 one has proven to be much harder than the ARM7's. It'll take A While (no ETA), even now, months later.

PoroCYon
Posted on 07-09-21 12:20 PM, in GBAtek addendum/errata | #3979
In the past I once tried overlapping NWRAM and the IO space (at the request of either profi200 or normmatt, iirc), and reads always returned IO stuff. Maybe a write would go to both; I didn't test that, but I think it ended up at least at the IO registers, iirc. (Testing wasn't very thorough, though: we mainly wanted to know if we could exploit possible IO r/w redirection stuff, and the result is 'no'.)

PoroCYon
Posted on 08-19-21 11:05 PM, in GBAtek addendum/errata (rev. 7 of 08-22-21 12:11 AM) | #4268

Aptina MT9V113 internal MCU stuff


The Aptina cameras have an internal 68HC11-based MCU. Parts of its address space can be accessed through the XDMA registers (0x98c, 0x990, reachable over I2C).
There's a "physical" address range (0x0000..0x1fff) and a "virtual"/"logical" one (0x2000..0x3fff). The former can be used to access "system RAM" and "user RAM" (0x0000..0x03ff and 0x0400..0x07ff respectively), as well as "Special Function Registers" (SFRs), basically just the MMIO registers of the HC11. The latter allows access to the variable spaces of several "firmwares"/"services" running on the MCU, used for autofocus, auto white balance, etc. Each camera has a separate HC11.

The above is already kinda known, but now for the new stuff. (Keep this section of GBATEK at hand while reading this, as I guess only a handful of people ever looked at the cameras to begin with.)

(NOTE: as the HC11 is an 8/16-bit MCU, pointers etc. are 16-bit (its address space is 16 bits wide). Also, it's a big-endian architecture, keep that in mind.)

SFRs

  • 0x1040: watchdog reset: write 0 to this address to calm down the watchdog timer.
  • 0x1048: "high-precision timer": 32-bit timer value that probably increases by 1 every 16 MHz tick.
  • 0x1050: "pagetable" pointer: pointer to a list of 16 addresses used to determine to which physical addresses the virtual ones will resolve.
  • 0x1060..0x1066: "ring bus access" or so, not too sure what this is, but it looks like it gives you some sort of DMA access. Sadly, I don't know which registers in this range are address ones, and which ones are used for data.
  • 0x1070: GPIO (already documented on GBATEK, didn't touch it myself.)

Address translation


As alluded to in the previous part, SFR 0x1050 is used to set the mapping. Its value is 0x0100, which means we can access and modify the "pagetable mapping" over I2C. (I'm using quotes here because it's far from anything like a real MMU.)

At 0x0100, a list of 16 pointers can be found: 0x0140 (MONITOR), 0x0000 (SEQ), 0x005d (AF), 0x0165 (AWB), 0x01d4 (FD), 0, 0, 0x282 (MODE), 0, 0, 0, 0x0220 (HG), and then more zeros.

This matches what you'll find when dumping the 0x2000..0x2fff area. While system RAM has space for 16 more pointers (starting at 0x0120) for addresses 0x3000..0x3fff, in practice these are mirrors of the 0x2*** range.
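Assuming each of the 16 pagetable entries covers a 256-byte page (16 x 256 bytes = 0x2000..0x2fff) and that 0x3xxx simply mirrors 0x2xxx as observed, the translation can be sketched in C like this. `mem` stands for a 64 KiB image of the HC11 address space (e.g. dumped over XDMA), `pt` for the value of SFR 0x1050, and the function names are hypothetical; note the big-endian pointer reads.

```c
#include <stdint.h>

/* Read a big-endian 16-bit word from the HC11's address space. */
static uint16_t be16(const uint8_t *mem, uint16_t addr)
{
    return (uint16_t)((mem[addr] << 8) | mem[(uint16_t)(addr + 1)]);
}

/* Translate a "logical" MCU address (0x2000..0x3fff) to a physical one.
 * Assumption: 256-byte pages, page index = bits 8..11 of the address, so
 * 0x3xxx folds onto the same entries as 0x2xxx (the observed mirroring). */
static uint16_t aptina_logical_to_phys(const uint8_t *mem, uint16_t pt, uint16_t logical)
{
    unsigned page = (logical >> 8) & 0xFu;
    uint16_t base = be16(mem, (uint16_t)(pt + 2 * page));
    return (uint16_t)(base + (logical & 0xFFu));
}
```

With the pointers listed above (SFR 0x1050 = 0x0100), logical 0x2204 lands at 0x0061, inside the AF variables, and 0x3204 resolves to the same place.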

Setting a pagetable entry to an address of 0x2000 or higher seems to return zeros; maybe there's a carveout, or maybe there's just nothing behind those addresses. Additionally, it does seem to mirror the upper half of memory (0x8000 and up) to the lower half, which is *quite* suspicious, as a datasheet says there are 32 kilobytes of firmware ROM, and the exception vectors of the HC11 are at 0xfffe etc. (6502-style). So I'm betting the firmware ROM is in the upper half of MCU memory.

System RAM layout


With the above, we can start building a map of the system RAM space. As one datasheet (links at the end of this post) alludes to the stack of the HC11 firmware also being in system RAM and being 128 bytes in size, it's not too hard to guess where it is.
  • 0x000..0x047: SEQ
  • 0x05d..0x0c0: AF
  • 0x100..0x11f: page table
  • 0x120..0x13f: unused shadow pagetable? idk
  • 0x140..0x168: MONITOR
  • 0x165..0x1d2: AWB
  • 0x1d4..0x1f4: FD
  • 0x220..0x286: HG
  • 0x282..0x2ea: MODE (yes this overlaps with HG)
  • 0x300..0x37f: ??? (maybe stack but I doubt it)
  • 0x380..0x3ff: stack
The 0x380 region seems to be 0xdeadbeef (big-endian)-filled and grows from high addresses to low ones. Additionally, it's one of the RAM regions that seems to get reset regularly (see below). This gives me relatively high confidence to say it is the stack of the HC11.
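Since the stack region is pre-filled with 0xdeadbeef and grows downward, its high-water mark can be estimated by scanning upward from 0x380 until the pattern breaks (the usual "stack painting" trick). A C sketch, assuming `ram` is a dump of at least the first 0x400 bytes of system RAM and that the pattern is aligned so 0xde sits at multiples of 4 (0x380 is 4-aligned); the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_LO 0x0380u /* lowest address of the stack region */
#define STACK_HI 0x0400u /* one past the top; the stack grows down from here */

/* Estimate how many bytes of the 0x380..0x3ff stack have been used by scanning
 * upward while the 0xDEADBEEF fill is intact. If the deepest used byte happens
 * to equal the fill byte at that spot, this underestimates by a little. */
static size_t stack_bytes_used(const uint8_t *ram)
{
    static const uint8_t fill[4] = {0xDE, 0xAD, 0xBE, 0xEF};
    uint32_t a = STACK_LO;
    while (a < STACK_HI && ram[a] == fill[a & 3u])
        a++;
    return STACK_HI - a;
}
```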

User RAM (0x0400 and on) doesn't seem to be writable, sadly. Or maybe I'm missing some kind of magic switch.

Running code on the HC11


A datasheet alludes to using the MONITOR variables to run code: arg1 is the pointer of the code to run, arg2 an optional argument (where is this put in? the accumulators?), and then set cmd to 0x01 to start the code.

Sadly, this did not work for me: it resets the MCU while doing not much else. Maybe it needs some kind of CRC (which the datasheet also alludes to), or maybe the feature is just locked away. (EDIT: maybe the reset is there to keep XDMA accesses from messing with internal state. But does using the virtual addressing mode bypass it? Could chaining through another peripheral (the "ring bus access", maybe?) be used as a bypass? I haven't tested.)

Then I tried putting some shellcode in low system RAM and filling the stack with bad addresses (that is at the same time a NOP sled). Sadly this didn't work either, as the MCU also got reset before it got to execute my code. Maybe I'm triggering some kind of (bad) crash, or maybe there's a safety mechanism that automatically resets everything. (I hope it's not the latter.)

(MCU resets seem to clear/reset the pagetable, stack (0x0380 and up), and some of the "logical variable" spaces in system RAM. 0x300..0x37f seems to be preserved, as well as some other small places in low system RAM.)

Maybe this is a fun challenge for someone else here? :P

P.S. I used Pk11's dsi-camera proof-of-concept as a base to mess with the cameras. Not really publishing my code because it's mostly just a spaghetti of I2C accesses and FIFO ugliness. I used vasm as HC11 assembler, though be aware that it gets some opcodes wrong, check with A09 (which doesn't seem to be able to process "org" directives) and this and this opcode listing I found. For more info on the HC11, see this (original DIP CPU info) and this (more modern microcontroller impl, only gave it a cursory glance, idk how useful it is outside of the instruction set info).

Datasheets

Also a neat link: https://files.niemo.de/aptina_pdfs/

PoroCYon
Posted on 11-30-22 07:40 PM, in Remote debugging - GDB | #5679
Hi, do you have any update on this? I just wanted to start on implementing a gdbstub myself, but if you have something working already, I could probably save myself the trouble.

PoroCYon
Posted on 12-13-22 05:55 PM, in Remote debugging - GDB | #5694
FYI, I ended up writing my own. Feedback would be nice.
