Main - Posts by Arisotura |
Arisotura |
Big fire melon magical melon girl Level: 56 Posts: 221/889 EXP: 1348093 Next: 50083 Since: 03-28-17 From: France Last post: 1 day ago Last view: 5 hours ago |
that sounds like plenty, considering mine is 2.4GHz and it manages to run some shit at 60FPS
but that also depends on what you're trying to run
also, it doesn't use the GPU, except for drawing the final screens to the window
____________________
Kuribo64
Arisotura |
mhh
take a screenshot of the directory melonDS 0.7 is in? with all the files
Arisotura |
other than that, all the files seem to be here and good
ninja'd
anyway, might have to do with not finding melonDS.ini. there was a related bug, but weird shit
check the archive you downloaded -- I had updated it to include a stock melonDS.ini so this wouldn't happen
Arisotura |
well there were multiple issues in some code meant to look for config files in AppData
* got a string buffer with the appdata path base, resized it, but didn't complete it, leaving the end uninitialized
* CoTaskMemRealloc() can move the buffer if needed, which was not accounted for, so possibly trying to access freed memory
ie. bad bad bad and likely why it crashed at random
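to make the two bugs concrete, here's a minimal sketch of the safe pattern. big assumptions: it uses std::realloc as a portable stand-in for CoTaskMemRealloc(), plain char strings instead of whatever the real code uses, and the helper names (append_suffix, heap_copy) are made up. the actual melonDS code isn't shown in the post, so this is not it. the two things that matter: keep the pointer the realloc call returns (the old one may be dead), and fill in everything you added, terminator included.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: appends `suffix` to a heap-allocated C string.
// Takes ownership of `base`; returns the (possibly moved) buffer,
// or nullptr if the reallocation failed.
char* append_suffix(char* base, const char* suffix) {
    assert(base != nullptr && suffix != nullptr);
    const std::size_t baseLen = std::strlen(base);
    const std::size_t suffixLen = std::strlen(suffix);

    // realloc may move the block: only the returned pointer is valid afterwards.
    char* grown = static_cast<char*>(std::realloc(base, baseLen + suffixLen + 1));
    if (grown == nullptr) {
        std::free(base);  // on failure the old block is still allocated
        return nullptr;
    }

    // Fill the newly added space, including the terminating NUL,
    // so nothing at the end is left uninitialized.
    std::memcpy(grown + baseLen, suffix, suffixLen + 1);
    return grown;
}

// Hypothetical helper: heap copy of a string, used to set up the example.
char* heap_copy(const char* s) {
    assert(s != nullptr);
    char* p = static_cast<char*>(std::malloc(std::strlen(s) + 1));
    if (p != nullptr) std::strcpy(p, s);
    return p;
}
```

usage would be something like `char* path = append_suffix(heap_copy(basePath), "/melonDS/");`, and from then on only `path` gets touched, never the pointer that was passed in.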
Arisotura |
https://github.com/StapleButter/melonDS/blob/master/src/libui_sdl/Platform.cpp#L358
alldevs is a linked list.
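for anyone unsure what that implies: you don't index it like an array, you follow the links until you hit the end. a minimal sketch, assuming a singly linked list with a `next` pointer. the `Device` struct and `collect_names` are hypothetical, the post doesn't spell out the real node type.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical node type; only the `next` link matters here.
struct Device {
    const char* name;
    Device* next;
};

// Walks the list from its head and collects every node's name.
// A null head means an empty list.
std::vector<std::string> collect_names(const Device* alldevs) {
    std::vector<std::string> names;
    for (const Device* dev = alldevs; dev != nullptr; dev = dev->next) {
        assert(dev->name != nullptr);
        names.emplace_back(dev->name);
    }
    return names;
}
```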
Arisotura |
we have an IRC channel. irc.badnik.net, #melonDS
Discord is bad and can go fuck itself.
Arisotura |
it's a software renderer.
there might be some obscure emulator somewhere that uses DirectX? not sure. they seem to all use either OpenGL or software renderers.
Arisotura |
AppData/Roaming/melonDS
you can put melonDS.ini and BIOS/firmware there if you wish. you don't have to.
Arisotura |
noting. if I can make it work with SDL.
not sure how it'd work for keyboard input and if that would be desirable tho.
Arisotura |
working out NS timings

LDR repeated 0x10000 times. cache disabled. overhead=8 consistently.
02000000: 1196658 (consistent) -> 18 cycles. 9 code, 9 data.
05000000: 759174-765830 -> 11 cycles. 9 code, 5 data, 3 cycle gain (parallel-ish)
06800000: 729405-737764 -> same.
07000000: 663869 (consistent) -> 10 cycles. 9 code, 4 data, 3 cycle gain.
FFFF0000: 663869 (consistent) -> same.

repeated NOP:
mainRAM: 9
ITCM: 0.5

LDR repeated 0x1000 times. cache disabled. code in ITCM.
02000000: 36866-36982 -> 9 cycles. 0.5 code, 9 data, gain can only be as much as 0.5.
05000000: 23254-23271 -> 5 cycles. same shit.
06800000: 20482 consistently
07000000: 16386-24719 -> 4 cycles or 6 cycles. weird. 4 cycles data.
FFFF0000: same

STR repeated 0x10000 times. cache disabled. overhead=8 consistently.
02380000: 1196658 or 1205017
05000000: 756161-774193
06800000: 729405-737764
07000000: 663869-672228
FFFF0000: 663869-672228
(same numbers as above)

STR repeated 0x1000 times. cache disabled. code in ITCM.
02380000: 36864-36978
05000000: 23257-23271
06800000: 20482 consistently
07000000: 16386-24719 (one or the other?? weird. either 4 or 6?? alignment of 66MHz cycles to bus shito??)
FFFF0000: same

pretty similar timings for read and write.

ARM7
----

running from WRAM (normal shit)
00000000 -> 3 (1 code fetch + 1 data fetch + 1 internal??)
01000000 -> 3
02000000 -> 9 (1 code fetch + 1 data fetch + 1 16bit-penalty + 1 internal + 5 penalty)
03000000 -> 3
03800000 -> 3
04000000 -> 3
04800000 -> 14 (1 code + 1 data + 1 16bit-penalty + 1 internal + 10. I guess)
04808000 -> 14
06000000 -> 4 (1 code + 1 data + 1 16bit-penalty + 1 internal)
08000000 -> 18 (1 code + 1 data + 1 16bit-penalty + 1 internal + 14 penalty)
0F000000 -> 3
FFFF0000 -> 3

running from VRAM
00000000 -> 4 (1 code fetch + 1 16bit-penalty + 1 data fetch + 1 internal??)
01000000 -> 4
02000000 -> 9 (1 code fetch + 1 16bit-penalty + 1 data fetch + 1 16bit-penalty + 1 internal + 5 penalty ???? doesn't fit)
03000000 -> 4
03800000 -> 4
04000000 -> 4
06000000 -> 5 (1 code + 1 16bit-penalty + 1 data + 1 16bit-penalty + 1 internal)
08000000 -> 19 (1 code + 1 16bit-penalty + 1 data + 1 16bit-penalty + 1 internal + 14 penalty)

running from mainRAM
00000000 -> 9
01000000 -> 9
02000000 -> 18
03000000 -> 9
03800000 -> 9
04000000 -> 9
06000000 -> 9 (1 code + 1 16bit-penalty + 7 penalty + 1 data + 1 16bit-penalty + 1 internal - 3 gain)
08000000 -> 23 (22 when writing) (1 code + 1 16bit-penalty + 7 penalty + 1 data + 1 16bit-penalty + 1 internal + 14 penalty - 3 gain)

STR seems to get 1c penalty when accessing same memory as code

main RAM is always 9c. as if it was somehow able to do parallel accesses, when the other fetch is in another memory region. with a max gain of 3c, like the ARM9. this also eats up internal cycles.
so, seems the penalty is 7c, like on the ARM9.

timings for 32bitbus/mainRAM/wifi0/wifi1/VRAM/GBA
wifi0 = 2 (6/6)
wifi1 = 7 (18/4)

LDR unaligned: no change

LDMIA
code in mainRAM:
1r: 9 / 18 / 19 / 29 / 9 / 23
2r: 9 / 20 / 31 / 37 / 11 / 35
max gain: 3c (2c on memory timings, LDMIA has 1I)

NOP
code in mainRAM: 2c (sequential code fetch)

LDRH TIMINGS
code on WRAM/VRAM/mainRAM
timings for 32bitbus/mainRAM/wifi0/wifi1/VRAM/GBA
WRAM: 3 / 8 / 8 / 20 / 3 / 12
VRAM: 4 / 8 / 9 / 21 / 4 / 13
mainRAM: 9 / 17 / 13 / 25 / 9 / 17

STRH TIMINGS
code on WRAM/VRAM/mainRAM
timings for 32bitbus/mainRAM/wifi0/wifi1/VRAM/GBA
WRAM: 2 / 8 / 7 / 19 / 2 / 11
VRAM: 3 / 8 / 8 / 20 / 4 / 12 (noting penalty for storing to same region as code)
mainRAM: 9 / 17 / 12 / 24 / 9 / 16

same effect observed with code in mainRAM. internal/data cycles seem to get merged with code cycles, for a max gain of 3c.
like, GBA: from WRAM: 1 code cycle, 10 data cycles. from mainRAM: 9 code cycles, 10 data cycles, 3 gain.
noting we still get the internal cycle if data>code. internal cycle is lumped with data.

wifi timing is 8/20 (5/17), compared to 14/24 (10/20) in 32bit mode. odd.

the timings are nice and clean, nothing like the ARM9.
also, crap, forgot about the LDR internal cycle for the ARM9 part. then again the ARM9 does some weird parallel shito. oh well. also its internal cycles are weird. fuck the ARM9.

wifi timings are weird:
WIFIWAITCNT
002A (2,5): 14,14 (10,10)
003A (2,7): 14,24 (10,20)
003B (3,7): 26,24 (22,20)

timings barring code cycles, 32/16:
WS0:
0: 16/10
1: 14/8
2: 12/6
3: 24/18
4: 14/10
5: 12/8
6: 10/6
7: 22/18
WS1:
0: 20/10
1: 18/8
2: 16/6
3: 28/18
4: 14/10
5: 12/8
6: 10/6
7: 22/18

weird as fuck. actually kind of similar to the EXMEMCNT settings for GBA shito.
16bit timings are always 10/8/6/18. same as EXMEMCNT. bit0-1 set the base timing.
bit2 sets the 2nd access timing for 32bit mode. which is: 6/4 for WS0, 10/4 for WS1. weird.
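the WS0/WS1 tables above follow directly from the bit0-1 / bit2 rule, so they can be rebuilt from it. a minimal sketch under that reading: the 16bit timing is the base timing, and the 32bit timing is the base timing plus the 2nd access timing. `WifiTiming` and `wifi_timing` are names made up for this example, not melonDS functions.

```cpp
#include <cassert>

// Cycles for one wifi access, excluding code cycles.
struct WifiTiming {
    int cycles32;  // 32bit access (two 16bit accesses)
    int cycles16;  // 16bit access
};

// `region` is 0 for WS0 and 1 for WS1; `setting` is the 3-bit value (0-7)
// taken from WIFIWAITCNT for that region.
WifiTiming wifi_timing(int region, int setting) {
    assert(region == 0 || region == 1);
    assert(setting >= 0 && setting <= 7);

    // bit0-1: base timing, same values as the 16bit column of the tables.
    static const int kBase[4] = {10, 8, 6, 18};
    const int first = kBase[setting & 3];

    // bit2: 2nd access timing. 6/4 for WS0, 10/4 for WS1.
    const int second = (setting & 4) ? 4 : (region == 0 ? 6 : 10);

    return WifiTiming{first + second, first};
}
```

this reproduces all 16 rows of the two tables, eg. `wifi_timing(1, 3)` gives 28/18.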
Arisotura |
soooo, summary of timings
for now, barring shit like GBA slot

general rules
* 1 cycle baseline for all accesses
* 1 cycle penalty when using a 16bit bus for a 32bit access (really two accesses)

ARM9
* nonseq penalty of 3 cycles when using the bus (even when accessing unmapped areas)
* extra nonseq penalty of 4 cycles when accessing mainRAM (total penalty 7 cycles)
* code/data accesses in parallel if in different memory regions. somewhat. weird. gains 3 cycles max.
* code fetches forced nonseq 32bit

ARM7
* nonseq penalty of 7 cycles when accessing mainRAM
* mainRAM accesses can be parallelized to some extent. they can happen alongside internal cycles and accesses to any other memory region. max gain: 3c for code in mainRAM, 5c for data in mainRAM and writing. weird.
* separate bus for mainRAM?
* data accesses cause simultaneous code accesses to be nonseq. applies everywhere. matters a lot when running code from mainRAM.
* writing data to same region as code has 1c extra penalty, except for mainRAM and wifi/gba.

DMA
* in 32bit mode, transferring from mainRAM to another memory region is 1 cycle faster
* 1 cycle penalty if source and destination are the same memory region
* if source and destination are mainRAM, all accesses are forced nonseq, resulting in trainwreck timings of 18 cycles/unit in 32bit mode and 16 cycles/unit in 16bit mode.
* seems that the maximum length for a sequential burst is 120 units? needs more checking

note on 'memory regions', esp VRAM
* different VRAM banks are considered different regions!
* VRAM address space with no bank mapped is the same as empty space (no 16bit bus penalty for a 32bit access)
* overlapping banks don't add penalties or affect timings
* shared WRAM is one bank

rules for parallel cycles
* ARM9: code cycles vs data cycles. max gain 3c.
* ARM7: when accessing mainRAM. max gain 3c/5c.
* DMA: when reading from mainRAM in 32bit mode. max gain 1c.
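the ARM9 rules can be turned into a tiny model and checked against the measured LDR numbers. this is only a sketch of one reading of the rules, with made-up function names. the assumption that needs flagging: the parallel gain is taken as the smallest of 3, the code cycles and the data cycles, which is how "gains 3 cycles max" and the ITCM measurements ("gain can only be as much as 0.5") seem to fit together. it doesn't explain the occasional 6-cycle result at 07000000.

```cpp
#include <algorithm>
#include <cassert>

// Cycles for one nonsequential 32bit ARM9 access over the bus.
// `bus16`: the region sits on a 16bit bus. `mainRAM`: the region is main RAM.
int arm9_bus_access(bool bus16, bool mainRAM) {
    int cycles = 1;          // baseline for all accesses
    if (bus16) cycles += 1;  // 32bit access over a 16bit bus is really two accesses
    cycles += 3;             // nonseq penalty when using the bus
    if (mainRAM) cycles += 4;  // extra mainRAM penalty (7 total)
    return cycles;
}

// Total cycles for one LDR given its code fetch and data access costs.
// Assumption: code and data only overlap when they are in different regions,
// and the overlap is capped by 3 and by the shorter of the two accesses.
double arm9_ldr_total(double codeCycles, double dataCycles, bool sameRegion) {
    assert(codeCycles >= 0.0 && dataCycles >= 0.0);
    const double gain =
        sameRegion ? 0.0 : std::min({3.0, codeCycles, dataCycles});
    return codeCycles + dataCycles - gain;
}
```

with code in mainRAM (9 cycles per fetch) this gives 18 for data in mainRAM, 11 for 05000000 and 10 for 07000000. with code in ITCM (0.5 cycles) it gives 9 and 5, same as the measured values.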
Arisotura |
ARM7 DMA
32/16
wifiwaitcnt: 2/7

mainRAM->mainRAM: 18/16
WRAM->mainRAM: 3/2
IO->mainRAM: 3/2
wifi0->mainRAM: 14/7
wifi1->mainRAM: 10/5
VRAM->mainRAM: 4/2
mainRAM->VRAM: 3/2
WRAM->VRAM: 3/2
WRAM->VRAM: 3/2
wifi0->VRAM: 14/7
wifi1->VRAM: 10/5
VRAM->VRAM: 5/3 (same-region penalty)

WIFI DMA TIMINGS
since it's weird too

setting: WS0 32/16, WS1 32/16
0-3: 14/7, 22/11
4-7: 10/5, 10/5

so, all are sequential and not just 1/2? weird.

timings of wifi0->wifi0, wifi1->wifi1
setting: WS0 32/16, WS1 32/16
0-3: 24/12, 40/20 (!!) -> seq cycles 6, 10
4-7: 16/8, 16/8 -> seq cycles 4

ok now I guess it makes sense again? in 32bit mode we just do two accesses and thus double the 16bit timing. no 16bit-bus-penalty, no nonseq shito. no sameregion penalty either.
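the wifi DMA numbers above can be rebuilt from the seq cycles alone. a minimal sketch, with made-up function names. one interpretation of mine in here: the write to mainRAM/VRAM is counted as 1 cycle, which is what makes 6 become 7 and 4 become 5. the rest is straight from the post: 32bit mode just doubles the 16bit timing, and wifi->same wifi pays the seq cycles twice (read + write).

```cpp
#include <cassert>

// Sequential cycles for one 16bit wifi access.
// `region` is 0 for WS0 and 1 for WS1; `setting` is the 3-bit value (0-7).
int wifi_seq_cycles(int region, int setting) {
    assert(region == 0 || region == 1);
    assert(setting >= 0 && setting <= 7);
    return (setting & 4) ? 4 : (region == 0 ? 6 : 10);
}

// Cycles per DMA unit when reading from wifi and writing to mainRAM or VRAM.
// Assumption: the write side costs 1 cycle.
int dma_wifi_to_other(int region, int setting, bool is32bit) {
    const int unit16 = wifi_seq_cycles(region, setting) + 1;
    return is32bit ? unit16 * 2 : unit16;
}

// Cycles per DMA unit when source and destination are the same wifi region:
// one sequential read plus one sequential write, no same-region penalty.
int dma_wifi_to_same_wifi(int region, int setting, bool is32bit) {
    const int unit16 = wifi_seq_cycles(region, setting) * 2;
    return is32bit ? unit16 * 2 : unit16;
}
```

eg. with wifiwaitcnt 2/7, `dma_wifi_to_other(0, 2, true)` is 14 and `dma_wifi_to_other(1, 7, false)` is 5, matching wifi0->mainRAM 14/7 and wifi1->mainRAM 10/5.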
Arisotura |
this is only a wild theory but maybe the RNG uses the system time, which advances slower than it might expect if you fast-forward
Arisotura |
might just be that some OpenGL drivers for Android are total shit. dunno tho.
Android is crap too tbh. but, like... dunno maybe this is not running in performance mode or whatever. I don't know shit about Android.
Arisotura |
well yeah debug builds are slow as shit
that's why the CodeBlocks project has that DebugFast config btw
it gets all the optimizations so it's fast, but it also gets shit like the debug console
though it's not well suited to actual debugging or profiling
Arisotura |
that's a great cake there
also well, it's going to be a bit tricky if you don't have a paypal
Arisotura |
hey hey hey comrades
how 'bout joining the melonDS IRC so we could exchange about all that
irc.badnik.net #melonds
Arisotura |
re: semaphores
in the case of the 3D thread, there is a semaphore that is incremented each time the renderer finishes a scanline, and decremented each time a scanline is read to be composited in the final video output. there's another semaphore that tells the renderer when it can start, and one that the renderer uses to signal when it has completed a frame.
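a rough sketch of that three-semaphore handshake, for one frame. this is not the melonDS code: the `Semaphore` class is a small stand-in built on a mutex and a condition variable, `run_one_frame` is a made-up name, and "rendering" a scanline just writes a number. the only hard fact borrowed from the hardware is the 192 scanlines per frame.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Minimal counting semaphore (stand-in, not the emulator's own type).
class Semaphore {
public:
    void post() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++count_;
        }
        cv_.notify_one();
    }
    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }
private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int count_ = 0;
};

constexpr std::size_t kScanlines = 192;

// Renders one frame on a worker thread while the caller composites
// each scanline as soon as it is available.
std::vector<int> run_one_frame() {
    Semaphore start;      // tells the renderer it can start
    Semaphore lineDone;   // incremented once per finished scanline
    Semaphore frameDone;  // renderer signals it has completed the frame

    std::vector<int> lines(kScanlines, -1);
    std::vector<int> composited;

    std::thread renderer([&] {
        start.wait();
        for (std::size_t y = 0; y < kScanlines; ++y) {
            lines[y] = static_cast<int>(y) * 2;  // placeholder for real rendering
            lineDone.post();
        }
        frameDone.post();
    });

    start.post();
    for (std::size_t y = 0; y < kScanlines; ++y) {
        lineDone.wait();  // decremented each time a scanline is read
        composited.push_back(lines[y]);
    }
    frameDone.wait();
    renderer.join();
    return composited;
}
```

the point of the per-scanline semaphore is that the compositor never reads a line the renderer hasn't finished, but also never has to wait for the whole frame.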
Arisotura |
Actually GBATEK isn't too far off, as far as individual timings are concerned.

Matrix command timings

The timings for commands 0x11-0x1C depend on the current matrix mode.

Mode 0:

Mode 1: Timings are identical to mode 0.

Mode 2: Timings are identical to mode 0. MULT/TRANS take 30 more cycles.

Mode 3: This mode has completely different timings. Probably because the texture matrices are smaller internally, or because it doesn't have to update the clip matrix, or both. The latter would explain the huge timing difference for command 0x15.

Other commands

All other commands (nop/invalid) take one cycle.

Vertex parallel execution

Vertex commands are able to execute in parallel with most other commands. Timings are expressed from the moment the vertex command starts. VTX_16 is preceded by one cycle because it takes two parameters, and starts upon the second cycle.

Commands 0x20, 0x30, 0x31, 0x72 can run 6 cycles after a vertex command.
Commands 0x29, 0x2A, 0x2B, 0x33, 0x34, 0x41, 0x60, 0x71 run 8 cycles after a vertex command (they cannot run in parallel).
Commands 0x32, 0x40, 0x70 stall the pipeline (see below for what this implies).
All other commands are able to run 4 cycles after a vertex command.

Further commands also abide by these rules, at least until the end of the vertex command. For example: vertex/texcoord/color: texcoord runs 4 cycles after the vertex, color is delayed by one cycle (starts 6 cycles after).

Normal parallel execution

Normals are able to run in parallel with vertices coming right after. The vertex can run 2/2/3/4/5 cycles after the normal starts, for 0/1/2/3/4 lights enabled respectively.

Under these circumstances, further commands don't get delayed until the normal has finished. (maybe some commands do! I haven't tested them all)

This explains why "texcoord/normal/vertex" runs faster than "normal/texcoord/vertex".

Polygon pipeline

Each vertex which completes a polygon places restrictions on when further vertices can run.

The process lasts 27 cycles for a triangle and 36 cycles for a quad. This duration is divided into 9-cycle slots in which vertices have to fit. The first slot is obviously occupied by the vertex that is executing (and building a polygon).

Exceptions for strips: for triangle strips, all 3 slots are occupied; for quad strips, the first 2 slots are occupied.

EXCEPT: the process only lasts one slot if the polygon is rejected by culling/clipping

When a vertex starts within one slot, the slot is occupied, and the next vertex is delayed until the next slot.

Vertices running outside of the polygon-building process are free of any restrictions, and can run 4 cycles after the start of a previous vertex.

Pipeline stalls

Commands 0x32, 0x40 and 0x70 stall the pipeline. That is, if they happen during the polygon-building process described above, they are delayed until the end of the process.

Commands 0x32 and 0x70 get an extra delay when a pipeline stall happens: 8 and 10 cycles respectively.

If 0x32 happens outside of the polygon-building process, it can run 6 cycles after a vertex.
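The fixed delays listed above boil down to a few lookups. Here is a minimal sketch of them as code; the function names and the `kStallsPipeline` sentinel are made up for this example, and it only covers the constant delays, not the slot scheduling itself.

```cpp
#include <cassert>

// Sentinel: the command stalls the pipeline instead of having a fixed delay.
constexpr int kStallsPipeline = -1;

// Earliest cycle, counted from the start of a vertex command, at which
// command `cmd` can run. Stalling commands return kStallsPipeline.
// (Outside of the polygon-building process, 0x32 can run after 6 cycles;
// that case is not modeled here.)
int cycles_after_vertex(int cmd) {
    switch (cmd) {
    case 0x20: case 0x30: case 0x31: case 0x72:
        return 6;
    case 0x29: case 0x2A: case 0x2B: case 0x33:
    case 0x34: case 0x41: case 0x60: case 0x71:
        return 8;  // cannot run in parallel
    case 0x32: case 0x40: case 0x70:
        return kStallsPipeline;
    default:
        return 4;
    }
}

// Cycles after the start of a normal at which a following vertex can run.
int vertex_delay_after_normal(int lightsEnabled) {
    assert(lightsEnabled >= 0 && lightsEnabled <= 4);
    static const int kDelay[5] = {2, 2, 3, 4, 5};
    return kDelay[lightsEnabled];
}

// Length of the polygon-building process, in cycles (9-cycle slots).
int polygon_process_cycles(bool isQuad, bool rejected) {
    const int kSlot = 9;
    if (rejected) return kSlot;  // culled/clipped: one slot only
    return (isQuad ? 4 : 3) * kSlot;
}
```

For the vertex/texcoord/color example: `cycles_after_vertex(0x22)` is 4 for the texcoord and `cycles_after_vertex(0x20)` is 6 for the color.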