Reviving and rewriting paraLLEl-RDP – Fast and accurate low-level N64 RDP emulation

Over the last few months, after completing the paraLLEl-RSP rewrite to a Lightrec-based recompiler, I’ve been plugging away on a project which I had been putting off for years: implementing the N64 RDP with Vulkan compute shaders in a low-level fashion. Every design decision of the old implementation has been scrapped, and a new implementation has arisen from the ashes. I’ve learned a lot of advanced compute techniques, and I’m able to use far better methods than I was ever able to use back in the early days. This time, I wanted to do it right. Writing a good, accurate software renderer on a massively parallel architecture is not easy and you need to rethink everything. Serial C code will get you nowhere on a GPU, but it’s a fun puzzle, and quite rewarding when stuff works.

The new implementation is a standalone repository that could be integrated into any emulator given the effort: https://github.com/Themaister/parallel-rdp. For this first release, I integrated it into parallel-n64. It is licensed as MIT, so feel free to integrate it in other emulators as well.

Why?

I wanted to prove to myself that I could, and it’s … a little fun? I won’t claim this is more than it is. 🙂

Chasing bit-exactness

The new implementation was developed in a test-driven way. The Angrylion renderer is used as a reference, and the goal is to generate the exact same output in the new renderer. I started by writing an RDP conformance suite. Here, we generate RDP commands in C++, run the commands across different implementations, and compare results in RDRAM (and hidden RDRAM of course, 9-bit RAM is no joke). To pass, we must get an exact match. This is all fixed-point arithmetic, no room for error! I’ve basically just been studying Angrylion to understand what on earth is supposed to happen, and trying to make sense of what the higher level goal of everything is. In LLE, there’s a lot of weird magic that just happens to work out.
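The pass criterion can be sketched roughly as follows. This is a minimal illustration in Python, not the suite's actual C++ code: `first_mismatch` and `is_bit_exact` are hypothetical names, and each implementation's output is modelled as a visible RDRAM byte buffer plus a separate buffer holding the hidden bits.

```python
def first_mismatch(reference: bytes, candidate: bytes):
    """Return the first offset where the buffers differ, or None if identical."""
    if len(reference) != len(candidate):
        # A size difference counts as a mismatch at the end of the shorter buffer.
        return min(len(reference), len(candidate))
    for offset, (ref, got) in enumerate(zip(reference, candidate)):
        if ref != got:
            return offset
    return None


def is_bit_exact(ref_rdram, ref_hidden, got_rdram, got_hidden) -> bool:
    """A test passes only if visible and hidden RDRAM both match exactly."""
    return (first_mismatch(ref_rdram, got_rdram) is None
            and first_mismatch(ref_hidden, got_hidden) is None)
```

Returning the offset of the first difference, rather than just a boolean, makes it easier to map a failure back to a pixel.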

I’m quite happy with where I’ve ended up with testing and seeing output like this gives me a small dopamine shot before committing:

122/163 Test #122: rdp-test-interpolation-color-texture-ci4-tlut-ia16 ………………………….. Passed 2.50 sec
Start 123: rdp-test-interpolation-color-texture-ci8-tlut-ia16
123/163 Test #123: rdp-test-interpolation-color-texture-ci8-tlut-ia16 ………………………….. Passed 2.40 sec
Start 124: rdp-test-interpolation-color-texture-ci16-tlut-ia16
124/163 Test #124: rdp-test-interpolation-color-texture-ci16-tlut-ia16 …………………………. Passed 2.37 sec
Start 125: rdp-test-interpolation-color-texture-ci32-tlut-ia16
125/163 Test #125: rdp-test-interpolation-color-texture-ci32-tlut-ia16 …………………………. Passed 2.45 sec
Start 126: rdp-test-interpolation-color-texture-2cycle-lod-frac
126/163 Test #126: rdp-test-interpolation-color-texture-2cycle-lod-frac ………………………… Passed 2.51 sec
Start 127: rdp-test-interpolation-color-texture-perspective
127/163 Test #127: rdp-test-interpolation-color-texture-perspective ……………………………. Passed 2.50 sec
Start 128: rdp-test-interpolation-color-texture-perspective-2cycle-lod-frac
128/163 Test #128: rdp-test-interpolation-color-texture-perspective-2cycle-lod-frac ……………… Passed 3.29 sec
Start 129: rdp-test-interpolation-color-texture-perspective-2cycle-lod-frac-sharpen
129/163 Test #129: rdp-test-interpolation-color-texture-perspective-2cycle-lod-frac-sharpen ………. Passed 3.26 sec
Start 130: rdp-test-interpolation-color-texture-perspective-2cycle-lod-frac-detail
130/163 Test #130: rdp-test-interpolation-color-texture-perspective-2cycle-lod-frac-detail ……….. Passed 3.48 sec
Start 131: rdp-test-interpolation-color-texture-perspective-2cycle-lod-frac-sharpen-detail
131/163 Test #131: rdp-test-interpolation-color-texture-perspective-2cycle-lod-frac-sharpen-detail … Passed 3.26 sec
Start 132: rdp-test-texture-load-tile-16-yuv

151/163 Test #151: vi-test-aa-none …………………………………………………………. Passed 21.19 sec
Start 152: vi-test-aa-extra-dither-filter
152/163 Test #152: vi-test-aa-extra-dither-filter ……………………………………………. Passed 48.77 sec
Start 153: vi-test-aa-extra-divot
153/163 Test #153: vi-test-aa-extra-divot …………………………………………………… Passed 64.29 sec
Start 154: vi-test-aa-extra-dither-filter-divot
154/163 Test #154: vi-test-aa-extra-dither-filter-divot ………………………………………. Passed 65.90 sec
Start 155: vi-test-aa-extra-gamma
155/163 Test #155: vi-test-aa-extra-gamma …………………………………………………… Passed 48.28 sec
Start 156: vi-test-aa-extra-gamma-dither
156/163 Test #156: vi-test-aa-extra-gamma-dither …………………………………………….. Passed 48.18 sec
Start 157: vi-test-aa-extra-nogamma-dither
157/163 Test #157: vi-test-aa-extra-nogamma-dither …………………………………………… Passed 47.56 sec

100% tests passed, 0 tests failed out of 163 #feelsgoodman

Ideally, if someone is clever enough to hook up a serial connection to the N64, it might be possible to run these tests on a real N64. That would be interesting.

I also fully implemented the VI this time around. It passes bit-exact output with Angrylion in my tests and there is a VI conformance suite to validate this as well. I implemented almost the entire thing without even running actual content. Once I got to test real content and sort out the last weird bugs, we get to the next important part of a test-driven development workflow …

The importance of dumping formats

A critical aspect of verifying behavior is being able to dump RDP commands from the emulator and replay them.

On the left I have Angrylion and on the right paraLLEl-RDP running side by side from a dump where I can step draw by draw, and drill down any pesky bugs quite effectively. This humble tool has been invaluable. The Angrylion backend in parallel-n64 can be configured to generate dumps which are then used to drill down rendering bugs offline.

Compatibility

Compatibility is much improved and should be quite high. I won’t claim it’s perfect, but I’m quite happy with it so far. We went through essentially all relevant titles during testing (just the first few minutes), and found and fixed the few issues which popped up. Many games which were completely broken in the old implementation now work just fine. Bugs surely remain, but I’m fairly confident that they are solvable this time around if/when they show up.

Implementation techniques

With Vulkan in 2020 I have some more tools in my belt than were available back in the day. Vulkan is quite a capable compute API now.

Enforcing RDRAM coherency

A major pain point of any N64 emulator is the fact that RDRAM is shared for the CPU and RDP, and games sure know how to take advantage of this. This creates a huge burden on GPU-accelerated implementations as we now have to ensure full coherency to make it accurate. Most HLE emulators simply don’t care or employ complicated heuristics and workarounds, and that’s fine, but it’s not good enough for LLE.

In the previous implementation, it would try to do “framebuffer manager” techniques similar to HLE emulators, but this was the wrong approach and led to a design which was impossible to fix. What if … we just import RDRAM as a buffer straight into the Vulkan driver and render to that, wouldn’t that be awesome? Yes … yes, it would be, and that’s what I did. We have an obscure, but amazing extension in Vulkan called VK_EXT_external_memory_host which lets me import RDRAM from the emulator straight into Vulkan and render to it over the PCI-e bus. That way, all framebuffer management woes simply disappear, I render straight into RDRAM, and the only thing left to do is to handle synchronization. If you’re worried about rendering over the PCI-e bus, then don’t be. The bandwidth required to write out a 320×240 framebuffer is absolutely trivial especially considering that we’re doing …
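To put a rough number on "absolutely trivial", here is a back-of-the-envelope calculation. The 16-bit pixel size, the 60 frames per second and the approximate PCIe 3.0 x16 peak figure are all assumptions chosen for illustration, and each pixel is counted as written once per frame.

```python
WIDTH, HEIGHT = 320, 240
BYTES_PER_PIXEL = 2                    # assumption: 16-bit framebuffer
FRAMES_PER_SECOND = 60                 # assumption
PCIE3_X16_BYTES_PER_SECOND = 15.75e9   # assumption: approximate PCIe 3.0 x16 peak

bytes_per_frame = WIDTH * HEIGHT * BYTES_PER_PIXEL
bytes_per_second = bytes_per_frame * FRAMES_PER_SECOND
bus_fraction = bytes_per_second / PCIE3_X16_BYTES_PER_SECOND

print(f"{bytes_per_frame} bytes per frame")
print(f"{bytes_per_second / 1e6:.2f} MB/s")
print(f"{bus_fraction:.4%} of the assumed bus bandwidth")
```

Under these assumptions the final framebuffer writes use well under a tenth of a percent of the bus.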

Tile-based rendering

The last implementation was tile-based as well, but the design is much improved. This time around all tile binning is done entirely on the GPU in parallel, using techniques I implemented in https://github.com/Themaister/RetroWarp, which was the precursor project for this new paraLLEl-RDP. Using tile-based rendering, it does not really matter that we’re effectively rendering over the PCI-e bus as tile-based rendering is extremely good at minimizing external memory bandwidth. Of course, for iGPU, there is no (?) external PCI-e bus to fight with to begin with, so that’s nice!
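For readers unfamiliar with binning, the data structure involved can be sketched like this. It is a serial CPU illustration only: the real renderer does this work on the GPU in parallel and operates on RDP primitives, whereas this sketch uses plain bounding boxes, a made-up tile size and a hypothetical `bin_primitives` helper.

```python
TILE_SIZE = 8  # pixels per tile edge (assumption for this sketch)


def bin_primitives(bounding_boxes, width, height, tile_size=TILE_SIZE):
    """Return one bitmask per tile; bit i is set if primitive i may touch it.

    bounding_boxes holds inclusive (x0, y0, x1, y1) pixel rectangles.
    """
    tiles_x = (width + tile_size - 1) // tile_size
    tiles_y = (height + tile_size - 1) // tile_size
    masks = [0] * (tiles_x * tiles_y)
    for index, (x0, y0, x1, y1) in enumerate(bounding_boxes):
        # Clamp to the framebuffer and skip primitives entirely outside it.
        x0, y0 = max(x0, 0), max(y0, 0)
        x1, y1 = min(x1, width - 1), min(y1, height - 1)
        if x0 > x1 or y0 > y1:
            continue
        for ty in range(y0 // tile_size, y1 // tile_size + 1):
            for tx in range(x0 // tile_size, x1 // tile_size + 1):
                masks[ty * tiles_x + tx] |= 1 << index
    return masks
```

Once every tile knows which primitives can affect it, a tile can be shaded completely in fast on-chip memory and written out once, which is where the bandwidth saving comes from.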

Ubershaders with asynchronous pipeline optimization

The entire renderer is split into a very small selection of Vulkan GLSL shaders which are precompiled into SPIR-V. This time, I take full advantage of Vulkan specialization constants which allow me to fine-tune the shader for specific RDP state. This turned out to be an absolutely massive win for performance. To avoid the dreaded shader compilation stutter, I can always fall back to a generic ubershader while the specialized pipeline is being compiled. The ubershader is slow, but works for any combination of state. This is a very similar idea to what Dolphin pioneered for emulation a few years ago.
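The fallback logic can be modelled in a few lines. This is a toy sketch, not the renderer's code: `PipelineCache`, `UBERSHADER` and the string "pipelines" are hypothetical stand-ins, and a thread pool plays the role of the asynchronous pipeline compiler.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

UBERSHADER = "ubershader"  # generic pipeline: slow, but valid for any state


class PipelineCache:
    """Hand out a specialized pipeline if it is ready, else the ubershader."""

    def __init__(self, compile_fn, executor):
        self._compile_fn = compile_fn
        self._executor = executor
        self._pending = {}
        self._lock = threading.Lock()

    def get(self, state_key):
        with self._lock:
            future = self._pending.get(state_key)
            if future is None:
                # First time this state is seen: compile in the background.
                future = self._executor.submit(self._compile_fn, state_key)
                self._pending[state_key] = future
        # Never block the render thread on compilation.
        return future.result() if future.done() else UBERSHADER

    def wait(self, state_key):
        """Block until a previously requested pipeline is compiled (for tests)."""
        return self._pending[state_key].result()
```

The important property is that `get` never waits: the first draws with a new state run on the slow generic path, and later draws pick up the specialized pipeline once it exists.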

8/16-bit integer support

Memory accesses in the RDP are often 8 or 16 bits, and thus it is absolutely critical that we make use of 8/16-bit storage features to interact directly with RDRAM, and if the GPU supports it, we can make use of 8 and 16-bit arithmetic as well for good measure.
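One way to see why narrow storage matters: without it, a 16-bit write has to be emulated by reading, patching and rewriting the containing 32-bit word. The sketch below models both paths on a byte array. The helper names are hypothetical, offsets are assumed to be 2-byte aligned, and the big-endian layout is an assumption of this sketch rather than a statement about how any emulator stores RDRAM.

```python
def store_u16_native(memory: bytearray, offset: int, value: int) -> None:
    """With 16-bit storage, the two bytes are written directly."""
    memory[offset:offset + 2] = (value & 0xFFFF).to_bytes(2, "big")


def store_u16_via_u32(memory: bytearray, offset: int, value: int) -> None:
    """Without it, the containing 32-bit word is read, patched and rewritten."""
    word_offset = offset & ~3
    word = int.from_bytes(memory[word_offset:word_offset + 4], "big")
    # Big-endian: the lower address holds the upper half of the word.
    shift = 16 if (offset & 2) == 0 else 0
    word = (word & ~(0xFFFF << shift)) | ((value & 0xFFFF) << shift)
    memory[word_offset:word_offset + 4] = word.to_bytes(4, "big")
```

Both functions produce the same bytes when run serially, but the second one also rewrites the neighbouring 16 bits, so two parallel invocations targeting the same word could overwrite each other's result unless atomics are used.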

Async compute

Async compute is critical as well, since we can make the async compute queue high priority and ensure that RDP shading work happens with very low latency, while VI filtering and frontend shaders can happily chug along in the fragment/graphics queue. Both AMD and NVIDIA now have competent implementations here.

GPU-driven TMEM management

A big mistake I made previously was doing TMEM management in the CPU timeline. This all came crashing down once we needed framebuffer effects. To avoid this, all TMEM uploads are now driven by the GPU. This is probably the hairiest part of paraLLEl-RDP by far, but I have quite a lot of gnarly tests to cover all the relevant corner cases. There are some truly insane edge cases that I cannot handle yet, but the results created would be completely meaningless to any actual content.

Performance

To talk about FPS figures it’s important to consider the three major performance hogs in a low-level N64 emulator, the VR4300 CPU, the RSP and finally the RDP. Emulating the RSP in an LLE fashion is still somewhat taxing, even with a dynarec (paraLLEl-RSP) and even if I make the RDP infinitely fast, there is an upper bound to how fast we can make the emulator run as the CPU and RSP are still completely single threaded affairs. Do keep that in mind. Still, even with multithreaded Angrylion, the RDP represents a quite healthy chunk of overhead that we can almost entirely remove with a GPU implementation.

GPU bound performance

It’s useful to look at what performance we’re getting if emulation was no constraint at all. By adding PARALLEL_RDP_BENCH=1 to environment variables, I can look at how much time is spent on GPU rendering.

Playing on a GTX 1660 Ti outside the castle in Mario 64:

[INFO]: Timestamp tag report: render-pass
[INFO]: 0.196 ms / frame context
[INFO]: 0.500 iterations / frame context

We’re talking ~0.2ms on GPU to render one frame on average, hello theoretical 5000 VI/s … Somewhat smaller frame times can be observed on my Radeon 5700 XT, but we’re getting frame rates so ridiculously high they become meaningless here. We’ve tested it on quite old cards as well and the difference in FPS between even something ancient like an R9 290x card and a 2080 Ti is minimal, since the time now spent in RDP rendering is completely irrelevant compared to CPU + RSP workloads. We seem to be getting about a 50-100% uplift in FPS, which represents the shaved away overhead that the CPU renderer had. Hello 300+ VI/s!
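The "theoretical 5000 VI/s" figure follows directly from the report. A small sketch that pulls the frame time out of the log text quoted above and inverts it; `gpu_ms_per_frame` is a hypothetical helper, and one "frame context" is treated as one frame, as the text does:

```python
import re

REPORT = """\
[INFO]: Timestamp tag report: render-pass
[INFO]: 0.196 ms / frame context
[INFO]: 0.500 iterations / frame context
"""


def gpu_ms_per_frame(report: str) -> float:
    """Extract the 'ms / frame context' figure from a timestamp report."""
    match = re.search(r"([0-9.]+) ms / frame context", report)
    if match is None:
        raise ValueError("no frame time found in report")
    return float(match.group(1))


ms = gpu_ms_per_frame(REPORT)
frames_per_second = 1000.0 / ms
print(f"{ms} ms per frame -> about {frames_per_second:.0f} frames per second")
```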

Unfortunately, Intel iGPU does not fare as well, with an overhead high enough that it does not generally beat multithreaded Angrylion running on CPU. I was somewhat disappointed by this, but I have not gone into any real shader optimization work. My early analysis suggests extremely poor occupancy and a ton of register spilling. I want to create a benchmark tool at some point to help drill into these issues down the line.

It would be interesting to test on the AMD APUs, but none of us have the hardware handy sadly 🙁

Synchronous vs Asynchronous RDP

There are two modes for the RDP. In async mode, the emulation thread does not wait for the GPU to complete rendering. This improves performance, at the cost of accuracy. Many games unfortunately really rely on the unified memory architecture of the N64. The default option is sync, and should be used unless you have a real need for speed, or the game in question does not need sync.

Here we see an example of broken blob shadows caused by async RDP in Jet Force Gemini. This happens because the CPU is actually reading the shadowmap rendered by the RDP, and blurring it on the CPU timeline (why on earth the game would do that is another question), then reuploading it to the RDP. These kinds of effects require very tight sync between CPU and GPU and come up in many games. N64 is particularly notorious for these kinds of rendering challenges.

Of course, given how fast the GPU implementation is on discrete GPUs, sync mode does not really pose an issue. Do note that since we’re using async compute queues here, we are not stalling on frontend shading or anything like that. Typical stall times on the CPU are on the order of 1 ms per frame, which is very acceptable. That includes the render thread doing its thing, submitting that to GPU, getting it executed and coming back to CPU, which has some extra overhead.
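The difference between the two modes can be shown with a toy model. Everything here is hypothetical (`FakeRdp`, `cpu_read_after_render`): the "GPU" is just a queue whose writes land in RDRAM only when it is drained. In a real async setup the CPU may or may not see the finished data depending on timing; the model simply makes the stale case deterministic.

```python
class FakeRdp:
    """Toy stand-in for a GPU renderer that writes results into RDRAM later."""

    def __init__(self, rdram: bytearray):
        self.rdram = rdram
        self._pending = []

    def submit(self, offset: int, data: bytes) -> None:
        # Work is only queued here; nothing lands in RDRAM yet.
        self._pending.append((offset, data))

    def wait_idle(self) -> None:
        # Stands in for waiting on a GPU fence: all queued writes complete.
        for offset, data in self._pending:
            self.rdram[offset:offset + len(data)] = data
        self._pending.clear()


def cpu_read_after_render(synchronous: bool) -> bytes:
    rdram = bytearray(4)
    rdp = FakeRdp(rdram)
    rdp.submit(0, b"\xAA\xBB\xCC\xDD")  # e.g. a rendered shadow map
    if synchronous:
        rdp.wait_idle()                 # sync mode: stall until the GPU is done
    return bytes(rdram)                 # the game's CPU code reads RDRAM
```

With `synchronous=True` the CPU sees the rendered bytes; with `synchronous=False` it reads zeros, which is the kind of stale read behind the broken shadows described above.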

Road-map for future improvement

I believe this is solid enough for a first release, but there are further avenues for improvement.

Figure out poor performance on Intel iGPU

There is something going on here that we should be able to improve.

Implement a workaround for implementations without VK_EXT_external_memory_host (EDIT: Now implemented as of 2020-05-18)

Unfortunately there is one particular driver on desktop which doesn’t support this, and that’s NVIDIA on Linux (Windows has been supported since 2018 …). Hopefully this gets implemented soon, but we will need a fallback. This will get ugly since we’ll need to start shuffling memory back and forth between RDRAM and a GPU buffer. Hopefully the async transfer queue can help make this less painful. It might also open up some opportunities for mobile, which also doesn’t implement this extension as we speak. There might also be incentives to rewrite some fundamental assumptions in the N64 emulator plugin specifications (can we please get rid of this crap …). If we can let the GPU backend allocate memory, we don’t need any fancy extension, but that means uprooting 20 years of assumptions and poking into the abyss … Perhaps a new implementation can break new ground here (hi @ares_emu!).

EDIT: This is now done! Takes a 5-10% performance hit in sync mode, but the workaround works quite well. A fine blend of masked SIMD moves, a writemask buffer, and atomics …
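The post only names the ingredients of the workaround, so the following is a guess at the role the writemask could play, not a description of the actual code. The assumption is that the GPU renders into its own copy of RDRAM and flags every byte it touches, so that the copy back into emulator RDRAM leaves bytes written only by the CPU alone. `flush_to_rdram` is a hypothetical name, and the masked SIMD moves and atomics are not modelled.

```python
def flush_to_rdram(rdram: bytearray, gpu_copy: bytes, writemask: bytearray) -> None:
    """Copy back only the bytes the GPU flagged as written, then clear the flags."""
    for offset, written in enumerate(writemask):
        if written:
            rdram[offset] = gpu_copy[offset]
            writemask[offset] = 0
```

A plain full-buffer copy would be simpler, but it would clobber anything the CPU wrote to RDRAM in the meantime, which is exactly the coherency problem described earlier.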

Internal upscaling?

It is rather counter-intuitive to do upscaling in an LLE emulator, but it might yield some very interesting results. Given how obscenely fast the discrete GPUs are at this task, we should be able to do a 2x or maybe even 4x upscale at way-faster-than-realtime speeds. It would be interesting to explore if this lets us avoid the worst artifacts commonly associated with upscaling in HLE.

Fancier deinterlacer?

Some N64 content runs at 480i, and we can probably spare some GPU cycles running a fancier deinterlacer 😉

Esoteric use cases?

PS1 wobbly polygon rendering has seen some kind of resurgence in the last years in the indie scene, perhaps we’ll see the same for the fuzzy N64 look eventually. With paraLLEl-RDP, it should be possible to build a rendering engine around a N64-lookalike game. That would be cool to see.

Conclusion

This is a somewhat esoteric implementation, but I hope I’ve inspired more implementations like this. Compute-based accurate renderers will hopefully spread to more systems that have difficulties with accurate rendering. I think it’s a very interesting topic, and it’s a fun take on emulation that is not well explored in general.

paraLLEl-RDP rewritten from scratch – available in paraLLEl n64 right now for RetroArch



The ParaLLEl N64 Libretro core has received an update today that adds the brand new paraLLEl-RDP Vulkan renderer to the emulator core.

I implore everybody to read Themaister’s blog post (Reviving and rewriting paraLLEl-RDP – Fast and accurate low-level N64 RDP emulation) for a deep dive into this new renderer.

Requirements

  • You need a graphics card that supports the Vulkan graphics API.
  • It’s currently only available on Windows and Linux.
  • Right now the renderer requires a specific Vulkan extension, called ‘VK_EXT_external_memory_host’. Currently, only Nvidia’s binary Linux drivers for Vulkan do not support this extension. It has been requested but there is no ETA yet on when they will implement this.

What’s new since the old ParaLLEl RDP?

  • Completely rewritten from the ground up
  • Bit-exact renderer
  • Should be pretty much on par with Angrylion accuracy-wise now – none of the issues that plagued the old paraLLEl RDP
  • Now emulates the VI (Video Interface) as well
  • Basic deinterlacing for interlaced video modes

How to install and set it up

  • In RetroArch, go to Online Updater.
  • (If you have paraLLEl N64 already installed) – Select ‘Update Installed Cores’. This will update all the cores that you already installed.
  • (If you don’t have paraLLEl N64 installed already) – go to ‘Core Updater’, and select ‘Nintendo – Nintendo 64 (paraLLEl N64)’.
  • Now start up a game with this core.
  • Go to the Quick Menu and go to ‘Options’. Scroll down the list until you reach ‘GFX Plugin’. Set this to ‘parallel’. Set ‘RSP plugin’ to ‘parallel’ as well.
  • For the changes to take effect, we now need to restart the core. You can either close the game or quit RetroArch and start the game up again.

Progress and development in N64 emulation over the past decade

State of HLE emulation

IMHO, this release today represents one of the biggest steps that have been taken so far to elevate Nintendo 64 emulation as a whole. N64 emulation has had a bad rep for decades because of HLE RDP renderers that fail to accurately reproduce every game’s graphics and tons of unemulated RSP microcode, but it’s gotten significantly better over the years. On the HLE front, things have progressed. GLideN64 has made big strides in emulating most of the major games, and the HLE RSP implementation used by Mupen 64 Plus is starting to emulate most of the major microcodes that developers made for N64 games. So on that front, things have certainly improved. There are also obviously limiting factors on the HLE front. For instance, GLideN64 still requires OpenGL, and renderers for Vulkan and other modern graphics APIs have not been implemented as of this date (although they could be).

State of LLE emulation

So that’s the HLE front. But for the purpose of this blog article, we are mostly concerned here about Low-Level Emulation. Both HLE and LLE N64 emulation are valid approaches, but if we want to reproduce the N64 accurately, we ultimately have to go LLE. So, what is the state of LLE emulation?

For LLE emulation, one of the advancements over the past few years has been a multithreaded version of Angrylion. Angrylion is the most accurate software RDP renderer to date. Its main problem has always been how slow it is. Up until say the mid to late ’10s, desktop PCs just did not have the CPU power to run any game at full speed with this renderer. Multithreading has given Angrylion some big gains in the performance department that were previously thought unimaginable.

However, Angrylion as a software renderer can only be taken so far. The fact remains that it is a big bottleneck on the CPU, and you can easily see CPU activity exceeding 65% on a modern rig with the multithreaded Angrylion renderer. Software rendering is just never going to be a particularly fast way of doing 3D rasterization.

So, back in 2016, the first attempt at making a hardware renderer that can compete with Angrylion was made. It was a big release for us and it marked one of the first pieces of software to be released that was designed exclusively around the then-new Vulkan graphics API. You can read our old blog post here.

It was a valiant first attempt at making a speedy Angrylion port to hardware. Unfortunately, this first version was full of bugs, and it had some big architectural issues that just made further development on it very hard. So it didn’t see much further development for the past few years.

This year, all the stars have aligned. First out of the gates was the resurrection of paraLLEl-RSP, another project by Themaister. Low-level N64 emulation places a big demand on the CPU, and while cxd4’s RSP interpreter is very accurate, to get at least a 2x leap in performance, a dynamic recompiler approach has to be taken. To that end, this year not only was paraLLEl-RSP resurrected, but we moved the dynamic recompiler architecture from LLVM to Lightrec. It’s a bit less performant than LLVM to be sure but it also has some big advantages – LLVM runtime libraries are very hard to embed and integrate for various platforms, while Lightrec doesn’t have these dependency issues. Furthermore, LLVM would take a long time recompiling code blocks, and it would cause big stutters during gameplay (for instance, bringing up the map in Doom 64 for the first time would cause like a 5-second freeze in the gameplay while it was recompiling a code block – obviously not ideal). With Lightrec, all those stutters were more or less gone.

So, Q1 2020. We now have multithreaded Angrylion which leverages the multi-core CPUs of today’s hardware to get better performance results. We have ParaLLEl RSP, a low-level RSP plugin with a dynamic recompiler that gives us a big bump in performance. But one piece of the puzzle is still missing, and it’s perhaps the most significant. Multithreaded Angrylion still is a software renderer and therefore it still massively bottlenecks the CPU. Whether you can spread that load out over multiple cores or not ultimately matters little – CPUs just are not good at doing fast 3D rasterization, a lesson learned by nearly every mid ’90s PC game developer, and why 3D accelerated hardware could not have come sooner.

So, the obvious Next Big Thing in N64 emulation was to get rid of this CPU bottleneck and move Angrylion kicking and screaming to the GPU, and this time avoid all of the issues that plagued the initial paraLLEl RDP prototype.

Where does that leave us?

With a very accurate Angrylion-quality LLE RDP renderer running on the GPU, and a dynarec LLE RSP core, you will be surprised at how accurate Mupen 64 Plus is now. Nearly every commercial game runs now as expected with nearly no graphical issues, the sound is as you’d expect it to be, it looks, runs and functions just like a real N64. And if you’re on a discrete Nvidia or AMD GPU, your GPU activity will be 4% on average, whether it’s a stone-age GPU from the year 2013 like an AMD R9 290x, or an Nvidia Geforce 2080 Ti. Nearly any discrete GPU made from 2013 to 2020 that supports the Vulkan API seems to eat low-level N64 graphics for breakfast. CPU activity also has decreased significantly. With multithreaded Angrylion and Parallel RSP, there would be about 68% CPU activity on my rig. This is brought down to just 7 to 10% using paraLLEl RDP instead of Angrylion. Software rendering on the CPU is just a huge bottleneck no matter which way you slice it.

So for most practical purposes, using the paraLLEl RDP and paraLLEl RSP cores in tandem, the future is now. Accurate N64 emulation is here, it’s no longer slow, and it’s no longer completely CPU bound either. And you can play it on RetroArch right now, right today. We don’t have to wait for a near-accurate representation of an N64, it’s already here with us for all practical gameplay purposes.

How much faster is paraLLEl RDP compared to Angrylion? That is hard to say, and depends on the game you’re running. On average you can expect a 2x speedup. However, notice that at native resolution rendering, any discrete GPU since 2013 eats this workload for breakfast. This means you’re completely CPU bound in terms of performance most of the time. The better your CPU is at single threaded workloads (IPC), the better it will perform. Core count is a less significant factor. I think on my specific rig, it was my CPU that was the weakest link in the chain (a 7700k i7 Intel CPU paired with a 2080 Ti). The GPU matters relatively little, the 2080 Ti was mostly being completely idle during these tests. For that matter, so was an old 2013 AMD card that I would test with the same CPU – GPU activity remained flat at around 4%. As Themaister has indicated in his blog post, this leaves so much room for upscaled resolutions, which is on the roadmap for future versions.

Benchmarks

System specs: CPU – Intel Core i7 7700k | GPU – Geforce RTX 2080 Ti (11GB VRAM, 2018) | 16GB RAM

Title | Angrylion | ParaLLEl RDP (Synchronous) | ParaLLEl RDP (Asynchronous)
007 GoldenEye | 82fps | 119fps | 133fps
Banjo Tooie | 72fps | 132fps | 148fps
Doom 64 | 174fps | 282fps | 322fps
F-Zero X | 158fps | 370fps | 478fps
Hexen | 156fps | 300fps | 360fps
Indiana Jones and the Infernal Machine | 61fps | 94fps | 114fps
Killer Instinct Gold | ~103fps | ~168fps | ~240fps
Legend of Zelda: Majora’s Mask | 122fps | 202fps | 220fps
Mario Kart 64 | ~178fps | ~309fps | ~330-350fps
Perfect Dark (High-res) | 70fps | 125fps | 130fps
Pilotwings 64 | 87fps | 125fps | 144fps
Quake | 188fps | 262fps | 300fps
Resident Evil 2 | 183fps | 226fps | 383fps (*)
Star Wars Episode I: Battle for Naboo | 90fps | 136fps | 178fps
Super Mario 64 | 129fps | 204fps | 220fps
Vigilante 8 (Low-res) | 63fps | 91fps | 112fps
Vigilante 8 (High-res) | ~46-55fps | ~92-99fps | ~119fps
World Driver Championship | ~109fps | ~225fps | ~257fps

* – Has game breaking issues in this mode
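To relate these figures to the "2x on average" estimate, the speedup for a given title is simply the ratio of the two frame rates. A short sketch using three rows copied from the RTX 2080 Ti results above (the selection of rows is arbitrary):

```python
# (Angrylion fps, paraLLEl-RDP synchronous fps), copied from the 2080 Ti results.
RESULTS = {
    "007 GoldenEye": (82, 119),
    "F-Zero X": (158, 370),
    "Super Mario 64": (129, 204),
}

speedups = {title: sync / angrylion for title, (angrylion, sync) in RESULTS.items()}
for title, factor in speedups.items():
    print(f"{title}: {factor:.2f}x")
```

The spread between titles is considerable, which fits the remark that the gain depends heavily on the game.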

System specs: CPU – Intel Core i7 7700k | GPU – AMD Radeon R9 290x (4GB VRAM, 2013) | 16GB RAM

Title | Angrylion | ParaLLEl RDP (Synchronous) | ParaLLEl RDP (Asynchronous)
007 GoldenEye | 82fps | 119fps | 133fps
Banjo Tooie | 72fps | 132fps | 148fps
Doom 64 | 174fps | 282fps | 322fps
F-Zero X | 158fps | 360fps | 439fps
Hexen | 156fps | 288fps | 352fps
Indiana Jones and the Infernal Machine | 61fps | 94fps | 114fps
Killer Instinct Gold | ~93fps | ~162fps | ~239fps
Legend of Zelda: Majora’s Mask | 122fps | 202fps | 220fps
Mario Kart 64 | ~157fps | ~274fps | ~292fps
Perfect Dark (High-res) | 70fps | 125fps | 130fps
Pilotwings 64 | 87fps | 125fps | 144fps
Quake | 189fps | 262fps | 326fps
Resident Evil 2 | 156fps | 226fps | 383fps (*)
Star Wars Episode I: Battle for Naboo | 90fps | 136fps | 178fps
Super Mario 64 | 129fps | 195fps | 209fps
Vigilante 8 (Low-res) | 63fps | 91fps | 112fps
Vigilante 8 (High-res) | ~46-55fps | ~92-99fps | ~119fps
World Driver Championship | ~109fps | ~224fps | ~257fps

* – Has game breaking issues in this mode

Core option explanations


paraLLEl RDP has some special dedicated options. You can change these by going to Quick Menu and going to Options. Here’s a quick breakdown of what they do –

ParaLLEl Synchronous RDP:

Turning this off allows for higher CPU/GPU parallelism. However, there are certain games that might produce problems if left disabled. An example of such a game is Resident Evil 2.

It has been verified that with the vast majority of games, disabling this can provide for at least a +10fps speedup. Usually the performance difference is much higher though. Try experimenting with it. If you experience no game breaking bugs or visual anomalies, it’s safe to disable this for the game you’re running and enjoy higher performance.

Video Interface Options

ParaLLEl-RDP emulates the N64 RDP’s VI module. The VI applies plenty of postprocessing to the final output image to further smooth out the picture. Some of the options below allow you to enable/disable some of these VI settings on the fly. Toggling them can be useful if you want to use several frontend shaders on top, since disabling some of these postprocessing effects could result in a radically different output image.

(ParaLLEl-RDP) VI Interlacing: Disabling this will disable the VI serration bits used for interlaced video modes. Turning this off essentially looks like basic bob deinterlacing, the picture might become shaky as a result when leaving this off.

(ParaLLEl-RDP) VI Gamma Filter: Disabling this will disable the hardware gamma filter that some games use.

(ParaLLEl-RDP) VI Divot filter: Disabling this will disable the median filter which is intended to clean up some glitched pixels coming out of the RDP. Subtle difference in output, but usually seems to apply to shadow blob decals.

(ParaLLEl-RDP) VI AA: Disabling this will disable Anti-Aliasing.

(ParaLLEl-RDP) VI Dither Filter: The VI’s dither filter is used to make color banding less apparent with 16-bit pixels.

(ParaLLEl-RDP) VI Bilinear: VI bilinear is the internal upscaler in the VI. Disabling this is typically a good idea, since it’s typically used to upscale horizontally.

By disabling VI AA and enabling VI Bilinear, the picture output looks just like Angrylion’s “Unfiltered” mode currently does.
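As a rough idea of what a median filter such as the divot filter does to isolated glitched pixels, here is a simplified 3-tap horizontal median over one row of single-channel values. The tap layout and the helper names are assumptions for illustration; the real VI filter likely has additional conditions that are not modelled here.

```python
def median3(a: int, b: int, c: int) -> int:
    """Median of three values without sorting."""
    return max(min(a, b), min(max(a, b), c))


def divot_row(row):
    """Apply a 3-tap horizontal median to one row of single-channel pixels."""
    if len(row) < 3:
        return list(row)
    out = list(row)
    for x in range(1, len(row) - 1):
        out[x] = median3(row[x - 1], row[x], row[x + 1])
    return out
```

A lone outlier such as the 255 in `[10, 10, 255, 10, 10]` is removed, while a genuine edge like `[0, 0, 200, 200, 200]` passes through unchanged, which matches the "subtle difference in output" described for this option.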

FAQ

Will this renderer be ported to OpenGL?

Here is the short answer – no. Not by us, at least. Reasons: OpenGL is an outdated API compared to Vulkan and does not support the features required by ParaLLEl-RDP. GL does not support 8/16-bit storage, external memory host, or async compute. Even if someone were able to make it work, it would only work on the very best GL implementations, where Vulkan is supported anyway, rendering it mostly moot.

Ports to DirectX 12 are similarly not going to be considered by us, others can feel free to do so. One word of warning – even DirectX12 (yes, even Ultimate) is found lacking when it comes to providing the graphics techniques that ParaLLEl RDP is built around. Whoever will take on the endeavor to port this to DX12 or GL 4.5/4.6 will have their work cut out for them.

RetroArch 1.8.6 released!


RetroArch 1.8.6 has just been released.

Grab it here.

We will release a Cores Progress report soon going over all the core changes that have happened since the last report. It’s an exhaustive list, and especially the older consoles will receive a lot of new cores and improvements.

Remember that this project exists for the benefit of our users, and that we wouldn’t keep doing this were it not for spreading the love with our users. This project exists because of your support and belief in us to keep going doing great things. If you’d like to show your support, consider donating to us. Check here in order to learn more. In addition to being able to support us on Patreon, there is now also the option to sponsor us on Github Sponsors! You can also help us out by buying some of our merch on our Teespring store!

Highlights

There are many things this release post will not touch upon, such as all the extra cores that have been added to the various console platforms. We’ll spend some more time on that in a future Cores Progress Report post. We’ll go over some of the other highlights instead.

PSL1GHT PlayStation3 port

A new port of RetroArch to the PSL1GHT toolchain has been made for PlayStation3.

Right now there are no automated nightly builds for this, but you can download our experimental stable for it instead.

Working:

  • packaging
  • running cores
  • switching cores
  • gamepad including axis
  • RGUI menu driver
  • audio
  • video
  • cores: 2048, ecwolf, freechaf

Not working:

  • OSD
  • Menus other than RGUI
  • Shaders
  • Graphical acceleration
  • Proper signing
  • ODE build
  • Rumble
  • mouse

iOS/tvOS – Fix audio getting cut off on interruption

While using RetroArch, if you played back other audio content (such as via the Control Center) or were interrupted by a phone call, the audio in RetroArch would stop entirely.

Changed to set the audio session category to “ambient” so that you can playback other audio sources and have sounds in RA at the same time.

Also, took out the bit to save the config when the app loses focus – it became too much of a distraction (the notification is distracting – this was not working previously anyway).

OpenGL Core – Slang shader improvements

Before, the OpenGL Core shader driver did not correctly initialise loaded textures. The texture filtering and wrap mode were forced on texture creation, but these settings were not recorded – subsequent updates would set garbage values, which then resolved to linear filtering OFF and wrap mode = CLAMP_TO_EDGE.

The wrap mode seemed to work regardless – perhaps once this is set the first time, it cannot change? (I don’t understand the inner workings of OpenGL…) But the texture filtering was certainly wrong. For example, this is what a background image with linear filtering enabled looks like:

…what you actually get is nearest neighbour.

This PR fixes texture initialisation so the filtering and wrap mode are recorded correctly. A linear filtered background image now looks like this:

Only write config files to disk when parameters change

We’ve been looking at ways to reduce disk I/O overhead, since it tends to be a big bottleneck on slower platforms.

Before, RetroArch would continuously overwrite its configuration files:

  • retroarch.cfg is written every time content is closed, and when closing RetroArch itself
  • Core options are written every time content is closed

This represents a large amount of unnecessary disk access, which is quite slow (and also causes wear on solid state drives!)

With 1.8.6, configuration files are only written to disk when the content actually changes.

All types of configuration file should now be ‘well behaved’ – with the exception of cheat files. These are still overwritten when closing content, since reusing old parameters may cause issues (and since I don’t use cheats at all, I didn’t feel confident enough to dabble with this)
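The write-on-change idea can be illustrated with a minimal Python sketch. It is not RetroArch's actual C implementation: the `ConfigFile` class, its `modified` flag and the `key = "value"` line format used here are assumptions made purely for illustration.

```python
import os
import tempfile


class ConfigFile:
    """Hypothetical key/value config that remembers whether it was modified."""

    def __init__(self, path):
        self.path = path
        self.entries = {}
        self.modified = False

    def set(self, key, value):
        # Mark the file dirty only when the stored value really changes.
        if self.entries.get(key) != value:
            self.entries[key] = value
            self.modified = True

    def save(self):
        """Write to disk only if something changed; return True if written."""
        if not self.modified:
            return False
        with open(self.path, "w", encoding="utf-8") as f:
            for key in sorted(self.entries):
                f.write(f'{key} = "{self.entries[key]}"\n')
        self.modified = False
        return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = ConfigFile(os.path.join(tmp, "example.cfg"))
        cfg.set("video_fullscreen", "true")
        print(cfg.save())  # first save writes the file
        print(cfg.save())  # nothing changed, so no disk access
```

Setting a key to the value it already holds leaves the flag untouched, so closing content without touching any option results in no write at all.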

While making these changes, we also discovered and fixed a number of bugs:

  • RetroArch no longer crashes when attempting to save a config file after ‘unsetting’ a parameter (currently, this can be triggered quite easily by manipulating input remaps)
  • When using Material UI, RetroArch no longer modifies the wrong setting (or segfaults…) when tapping entries in the Quick Menu > Controls input remapping submenu
  • Quite a few real and potential memory leaks have been fixed.

Playlist compression

There’s a new Compress playlists option under Settings > Playlists. When enabled, playlists are stored in an archived format (using the new rzip_stream interface).

The obvious benefit is that playlist file size is reduced by ~90%, with a corresponding reduction in disk wear on solid state drives (playlists are rewritten to disk quite frequently!).

Given the small size of playlist files, these savings aren’t hugely significant – but of more interest is the fact that on one of our development machines (Linux + mechanical HDD), loading a compressed playlist takes ~20% less time than an uncompressed one (despite the extra zlib overheads). This produces noticeably smoother scrolling when switching playlists in XMB. This improvement is most likely platform-dependent, but on devices where storage speed is a real issue (e.g. 3DS, UWP) the difference in playlist loading times should be quite pronounced.

We’ve also fixed some small playlist-related bugs/issues:

  • When saving playlists using the old format, default core association is now written correctly (not sure when this regression happened…)
  • When saving playlists using the old format, per-playlist sort mode is now recorded (I miscounted the number of available metadata ‘slots’ in the old format files – there was in fact just enough room for this one extra setting)
  • Whenever a playlist is cached by the menu (i.e. when a playlist is opened for display), RetroArch will check the format of the playlist (old/new) and its compression state – if either differ from the current user-set values, the file will be updated. This ensures playlists remain in sync with menu settings. (Previously, toggling the ‘use old format’ setting would do nothing unless the playlist was subsequently modified – this has long been an annoyance for me, since it meant ‘fully populated’ playlists languished in whatever state they were originally created)

It goes without saying that RetroArch will automatically detect whether or not a playlist is compressed and handle it appropriately.

If a playlist has been compressed and a user subsequently wants to edit it by hand, they can simply toggle Compress playlists off and then view the playlist via the menu – it will automatically be decompressed to plain text/JSON.

In addition to this, since human readability is not a factor when compressing playlists, we now omit all whitespace (newlines/indentation) when writing compressed JSON.

This reduces performance overheads when reading compressed JSON playlists by ~16% (!)
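As a rough illustration of the two ideas above (whitespace-free JSON for compressed playlists, and automatic detection on load), here is a small Python sketch. It uses plain zlib rather than RetroArch's rzip_stream container, and the function names and the "starts with `{`" detection rule are assumptions for this example only.

```python
import json
import zlib


def serialize_playlist(playlist, compress):
    """Return the bytes that would be written to disk for a playlist."""
    if compress:
        # Nobody reads compressed files by hand, so drop all whitespace.
        text = json.dumps(playlist, separators=(",", ":"))
        return zlib.compress(text.encode("utf-8"))
    # Plain-text playlists stay indented for human readability.
    return json.dumps(playlist, indent=2).encode("utf-8")


def load_playlist(data):
    """Detect whether the data is compressed and decode it accordingly."""
    # Assumption for this sketch: plain JSON playlists start with '{',
    # anything else is treated as a zlib stream.
    if not data.lstrip().startswith(b"{"):
        data = zlib.decompress(data)
    return json.loads(data.decode("utf-8"))
```

Because the loader inspects the data itself, the compression setting can be toggled at any time without breaking existing files.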

SRAM Compression

This is a minor follow-up to PR #10454. It adds a new SaveRAM Compression option under Settings > Saving. When enabled, SRAM save files are written to disk as compressed archives.

While SRAM saves are generally quite small, this can still yield a not insignificant space saving on storage-starved devices (e.g. the SNES/NES Classic consoles). Moreover, it reduces wear on solid state drives when SaveRAM Autosave Interval is set (in the worst case, this can write a couple of MB to disk per minute – vs. a few kB when compression is enabled).

Actual compression ratios will vary greatly depending upon core and loaded content. Here are a few examples of SRAM save sizes for random cores/games:

Core                       Uncompressed   Compressed
Gambatte                   32 kB          178 B
Genesis Plus GX            32 kB          83 B
mGBA                       64 kB          1.1 kB
Mupen64Plus-Next OpenGL    290 kB         736 B
PCSX-ReARMed               128 kB         605 B
Snes9x                     8.0 kB         183 B

In many cases, the actual on-disk save size can be reduced to almost nothing.

Notes:

  • As with save states, RetroArch will automatically detect whether SRAM saves are compressed and handle them appropriately (SaveRAM Compression can be toggled at any time).
  • This only works with cores that use the libretro SRAM interface for saving games. Many (most?) do, but there are some exceptions – e.g. Flycast writes save files directly, and so too does Beetle PSX depending on core settings.

Savestate compression

There’s a new Savestate Compression option under Settings > Saving. When enabled, save state files are written to disk as compressed archives. This both saves a substantial amount of disk space and reduces wear on solid state drives.

Actual compression ratios will vary depending upon core and loaded content. Here are a few examples of save state sizes for random cores/games:

Core                       Compression OFF   Compression ON
Beetle PSX HW              16 MB             1.5 MB
Flycast                    27 MB             8.9 MB
Genesis Plus GX            1012 kB           47 kB
mGBA                       453 kB            45 kB
Mupen64Plus-Next OpenGL    17 MB             1.5 MB
PPSSPP                     40 MB             9.3 MB
PCSX-ReARMed               4.3 MB            2.3 MB
PUAE                       11 MB             793 kB
Snes9x                     421 kB            82 kB

Notes:

  • RetroArch will automatically detect whether state files are compressed or not, and load them appropriately – i.e. Savestate Compression can be toggled at any time, and everything will Just Work (TM)
  • We now have a new file stream for reading/writing archived data: rzip_stream. This can be used to handle any compressed data writing tasks we might have in the future

(Manual content scanner/playlist cleaner) Prevent redundant playlist entries when handling M3U content

Before, when the manual content scanner was used to scan content that includes M3U files, redundant playlist entries were created. For example, content like this:

  • Panzer Dragoon Saga CD1 (Saturn) (U).cue
  • Panzer Dragoon Saga CD2 (Saturn) (U).cue
  • Panzer Dragoon Saga CD3 (Saturn) (U).cue
  • Panzer Dragoon Saga CD4 (Saturn) (U).cue
  • Panzer Dragoon Saga (Saturn) (U).m3u

(where the .m3u references all the .cue files) would generate playlist entries for both the .m3u file and each of the .cue files. This is annoying, since the latter are pointless, and must be removed manually by the user.

1.8.6 adds M3U ‘awareness’ to the manual content scanner. Now whenever M3U files are encountered, they are parsed, and anything they reference internally is removed/omitted from the output playlist.

This functionality has also been added to the Playlist Management Clean Playlist task, so these redundant entries can be removed easily from existing playlists.
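The filtering step can be sketched in a few lines of Python. This is only an illustration of the idea, not the scanner's real code: the function names are hypothetical, M3U contents are passed in as strings to keep the example self-contained, and paths are treated as POSIX-style.

```python
import posixpath


def parse_m3u(m3u_path, m3u_text):
    """Return the normalised paths referenced inside one M3U file."""
    base = posixpath.dirname(m3u_path)
    referenced = []
    for line in m3u_text.splitlines():
        line = line.strip()
        # Skip blank lines and '#' comment/directive lines.
        if not line or line.startswith("#"):
            continue
        # Relative entries are resolved against the M3U's own directory.
        referenced.append(posixpath.normpath(posixpath.join(base, line)))
    return referenced


def remove_m3u_references(content_paths, m3u_texts):
    """Drop every scanned file that some M3U already references.

    content_paths: list of scanned file paths
    m3u_texts: dict mapping an M3U path to its text contents
    """
    referenced = set()
    for m3u_path, text in m3u_texts.items():
        referenced.update(parse_m3u(m3u_path, text))
    return [p for p in content_paths
            if posixpath.normpath(p) not in referenced]
```

Applied to the Panzer Dragoon Saga example above, only the .m3u entry would remain in the output list.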

(Side note: 1.8.6 also adds a simple but feature complete M3U handling library – this may have additional use if someone wants to add the ability to generate M3U files for existing content…)

Improved handling of ‘broken’ playlists

RetroArch previously would fall apart when handling ‘broken’ playlists – i.e. when playlist entries have missing or invalid path/core path/core name fields. 1.8.6 should fix the most significant issues:

  • RetroArch will no longer segfault when attempting to run content via a playlist entry with missing path or core path fields.
  • When a playlist entry has either core path and/or core name set to NULL, DETECT or an empty string, attempting to load content will fall back to the normal ‘core selection’ code (currently this happens only if both core path and core name are DETECT – this is wholly inadequate!)
  • RetroArch will no longer segfault when attempting to fetch content runtime information when core path is NULL
  • Core name + runtime info will only be displayed on playlists and in the Information submenu if both the core path and core name fields are ‘valid’ (i.e. not NULL or DETECT)
  • When handling entries with missing path fields, the menu sorting order now matches that of the playlist sorting order (at present, everything goes out of sync when paths are empty). Moreover, entries with missing path fields can now be ‘selected’, so users can remove them (currently, hitting A on such an entry immediately tries – and fails – to load the content, so the only way to remove the broken entry is via the Playlist Management > Clean Playlist feature)
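The validity rules listed above boil down to a small check. The following Python sketch is an assumption-laden illustration (the dictionary keys and function names are hypothetical, and `None` stands in for NULL), not the actual playlist code:

```python
def is_valid_core_field(value):
    """A core path/name is usable only if it is set and not 'DETECT'."""
    return value is not None and value != "" and value != "DETECT"


def resolve_core(entry):
    """Return the core path to launch, or None to request core selection."""
    core_path = entry.get("core_path")
    core_name = entry.get("core_name")
    # If either field is unusable, fall back to the normal
    # 'core selection' flow instead of trying to launch directly.
    if is_valid_core_field(core_path) and is_valid_core_field(core_name):
        return core_path
    return None
```

The same predicate can decide whether core name and runtime info are shown for an entry.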

(Playlist Management) Add optional per-playlist alphabetical sorting

At present, RetroArch offers a global Sort playlists alphabetically option – but several users have requested more fine-grained control, i.e. users with highly customised setups might want a number of ‘hand-crafted’ playlists with specific ordering (release date, games in a particular series, etc.) without losing the ability to automatically sort their other conventional platform-based playlists.

1.8.6 adds a new Sorting Method option to the Playlist Management interface. This allows the sorting method to be overridden on a per-playlist basis. Available values are System Default (reflects Sort playlists alphabetically setting), Alphabetical and None.

Notes:

  • Content history playlists are excluded – they are never sorted (this has always been the case!)
  • This option is only available when using the ‘new’ format playlists (i.e. Save playlists using old format = OFF). There’s just not enough room in the old-style playlists for additional metadata. Since pretty much everyone uses the new format (by default), I don’t think this is an issue.
  • 1.8.6 also tweaks the way that the displayed menu entries are handled – previously, it would go as follows:

    Sort playlist
    Loop through playlist and generate menu entries
    Sort menu entries

…not only did this duplicate effort, but it meant there was a chance of the playlist and menu going out of sync – especially when using the Label Display Mode feature, which could lead to a different alphabetical ordering when processing the generated menu entries. As of 1.8.6, only the playlist is ever sorted, and menu entries are listed in exactly the same order.
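A small Python sketch of the new flow may help. The sort-mode strings, the case-insensitive sort key and the `display_label` helper (standing in for Label Display Mode) are assumptions for illustration, not RetroArch's real identifiers:

```python
import re


def display_label(entry):
    # Stand-in for Label Display Mode: strip parenthesised tags.
    return re.sub(r"\s*\([^)]*\)", "", entry["label"]).strip()


def build_menu_entries(playlist_entries, sort_mode, global_sort_alphabetical):
    """sort_mode is 'default', 'alphabetical' or 'none' (per playlist)."""
    if sort_mode == "default":
        do_sort = global_sort_alphabetical
    else:
        do_sort = sort_mode == "alphabetical"

    entries = list(playlist_entries)
    if do_sort:
        # The playlist itself is the only thing that is ever sorted.
        entries.sort(key=lambda e: e["label"].lower())

    # Menu entries are emitted in exactly the playlist order;
    # there is no second sort on the generated display labels.
    return [display_label(e) for e in entries]
```

Since the display labels are never re-sorted, the menu index of an entry always matches its playlist index.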

Ozone

Before, Ozone could display either one thumbnail + content metadata or two thumbnails (with content metadata fallback when one image is missing) for each playlist entry.

With 1.8.6, if two thumbnails are enabled then the user can toggle between the second thumbnail and content metadata by pressing RetroPad ‘select’. When metadata is shown in this way, an image icon is displayed to indicate that a second thumbnail is available. The toggle may also be performed with a mouse/touchscreen by clicking/tapping the thumbnail sidebar.

Ozone menu – Mouse/Touch input fixes

  • Pointer input is now correctly disabled when message boxes are displayed
  • It turns out that Windows reports negative pointer coordinates when the mouse cursor goes beyond the left hand edge of the RetroArch window (this doesn’t happen on Linux, so I never encountered this issue before!). As a result, if Ozone is currently not showing the sidebar (menu depth > 1), moving the cursor off the left edge of the window generates a false positive ‘cursor in sidebar’ event – which breaks menu navigation, as described in #10419. With this PR, we now handle ‘cursor in sidebar’ status correctly in all cases.
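To show how a negative coordinate can produce such a false positive, here is a hypothetical Python reduction of the hit test. Both functions are illustrative guesses at the shape of the problem, not the actual Ozone code:

```python
def cursor_in_sidebar_naive(x, sidebar_width, sidebar_visible):
    # Hypothetical buggy check: a hidden sidebar has an effective width of 0,
    # so any negative x still satisfies "x < right edge".
    right_edge = sidebar_width if sidebar_visible else 0
    return x < right_edge


def cursor_in_sidebar(x, sidebar_width, sidebar_visible):
    # Reject the event outright when the sidebar is hidden, and require the
    # pointer to be inside the window (x >= 0) otherwise.
    if not sidebar_visible:
        return False
    return 0 <= x < sidebar_width
```

With the sidebar hidden and the cursor at x = -5, the naive check reports a sidebar hit while the stricter one does not.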

(RGUI) Enable automatic menu size reduction when running at low resolutions (down to 256×192)

Before, on all platforms other than the Wii/NGC, RGUI had a fixed frame buffer size of [320-426]x240 (width takes one of three values depending upon current menu aspect ratio).

In most cases this is fine, with an important exception: when running content at its native resolution (usually when connected to a CRT), the display size is often smaller than 320×240. For example, SNES titles run at 256×224; Master System titles at 256×192. In these cases, RGUI gets ‘squished’ – there are not enough scanlines on the screen, so rows of menu pixels get dropped (or blurred together if bilinear filtering is enabled). This makes the menu difficult to read/use.

This PR modifies RGUI such that its frame buffer dimensions are automatically reduced when running at low resolutions. The minimum nominal menu size is 256×192, which should enable content for almost all TV-connected consoles to be run at native resolution while maintaining pixel perfect menu scaling.

(Unfortunately, going any smaller than this breaks RGUI – so for handheld systems it’s still best to run at higher resolutions with a shader or video filter)

While implementing this, we also narrowed down the detection of when the aspect ratio lock should be disabled: previously, RGUI’s aspect ratio lock ‘turned off’ when accessing the video settings menu – this now only happens when accessing the video scaling submenu, since this is the only section that can cause conflicts with the aspect lock method. (Note that the old behaviour is maintained for the Wii port, because it has special requirements relating to resolution changes)

Menu – widget and font improvements

  • The font ascender/descender metrics are now used to achieve ‘pixel perfect’ vertical text alignment
  • Message queue text now uses its own dedicated font. Previously, a single (larger) font was used for all active widgets, and this was scaled down for message queue items. This ‘squished’ the text a little; more importantly, when using the stb font renderers (on Android, etc.) it caused ugly artefacts around the edges of glyphs due to pixel interpolation errors. Now that a correctly sized font is used, the message queue is always rendered cleanly.
  • Previously, each widget font was ‘flushed’ (font_driver_flush()) at least once a frame. This is quite a slow operation. Now we only flush fonts if they have actually been used.
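The "flush only if used" change is essentially a dirty check. A minimal Python sketch, with a hypothetical `WidgetFont` class standing in for the real font driver state:

```python
class WidgetFont:
    """Hypothetical widget font that batches text until it is flushed."""

    def __init__(self, name):
        self.name = name
        self.pending = []      # text queued since the last flush
        self.flush_count = 0   # how many (expensive) flushes were performed

    def draw_text(self, text):
        self.pending.append(text)

    def flush_if_used(self):
        """Flush only when something was drawn; return True if flushed."""
        if not self.pending:
            return False
        self.pending.clear()
        self.flush_count += 1
        return True
```

A frame in which no widget draws text with a given font then costs no flush for that font.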

Content scanner was unable to identify games from CHD images on Android builds

The content scanner was unable to identify games from CHD images on Android builds (the same files are properly identified on Windows builds).

It was discovered that both the extracted magic number and the CRC hash differed between the two builds. This should now be resolved.

Changelog

What you’ve read above is just a small sampling of what 1.8.6 has to offer. There might be things that we forgot to list in the changelog listed below, but here it is for your perusal regardless.

1.8.6

  • 3DS: Add IDs for UZEM, TGB Dual, and NeoCD
  • 3DS: Fix font driver horizontal text alignment
  • 3DS: Allow button presses up to INPUT_MAX_USERS – this enables the 3DS to bind and use buttons and axis for users up to the maximum set by ‘Max Users’ in the input settings menu.
  • 3DS: Disable video filter if upscaled resolution exceeds hardware limits. The 3DS has a maximum video buffer size of 2048×2048. This is sufficient for every core that it supports, but when using software video filters the core output resolution is doubled. This is made worse by the fact that the video filter upscaling buffer size is dependent upon the maximum output resolution of the core – which in some cases is very large indeed (e.g. pcsx-rearmed sets a maximum width of 1024, for enhanced resolution support). The 3DS has very limited ‘linear memory’ for graphics buffer purposes, and a large base core buffer + video filter buffer can easily exceed this – which may also disable video output, or cause a crash. This PR very simply adds a 3DS-specific check to the video filter initialisation: if the resultant upscaling buffer exceeds the hardware limitation, then the filter is automatically disabled.
  • 3DS/FONT/BUGFIX: Text colour was wrong: the RGBA channels were muddled, and R was always set to 255
  • 3DS/FONT/BUGFIX: When drawing multiline strings, the line spacing was completely incorrect
  • 3DS/FONT: Improves the appearance of the drop shadow effect on notification text.
  • 3DS/ARCHIVE/7Z: Re-enable 7zip support.
  • ARCHIVE/ZIP: Expand functionality of ‘rzip_stream’ interface. This PR expands the functionality of the new rzip_stream archived stream interface such that it now has almost complete feature parity with the standard file_stream interface, and can therefore be used as a drop-in replacement in most situations
  • AI SERVICE: Hide redundant entries when service is disabled
  • AI SERVICE: Added in auto-translate support
  • AI SERVICE: support for NVDA and SAPI narration
  • AUTOCONFIG: Use correct port index in input device configured/disconnected notifications
  • BUGFIX: Fix race condition where task could momentarily not be in the queue when reordering
  • CHEEVOS/BUGFIX: Prevent null reference rendering achievement list while closing application
  • CHEEVOS/BUGFIX: Report non-memorymap GBA cores as unsupported
  • COMMANDLINE: Advise against using -s and -S variables on the command line.
  • CONFIG FILE: Only write config files to disk when parameters change
  • CONFIG FILE/BUGFIX: RetroArch no longer crashes when attempting to save a config file after ‘unsetting’ a parameter (currently, this can be triggered quite easily by manipulating input remaps)
  • CONFIG FILE/BUGFIX: When using Material UI, RetroArch no longer modifies the wrong setting (or segfaults…) when tapping entries in the Quick Menu > Controls input remapping submenu
  • CONFIG FILE/BUGFIX: Quite a few real and potential memory leaks have been fixed.
  • CHD: Fixes a crash caused by ignoring the return value from one of the CHD library functions
  • FASTFORWARDING: A new Mute When Fast-Forwarding option has been added under Settings > Audio. When enabled, users can fast forward without having to listen to distorted audio.
  • GLCORE/SLANG: Set filter and wrap mode correctly when initialising shader textures. Before, the glcore shader driver did not correctly initialise loaded textures. The texture filtering and wrap mode were forced on texture creation, but these settings were not recorded – subsequent updates would set garbage values, that would resolve to linear filtering OFF and wrap mode = CLAMP_TO_EDGE.
  • LOCALIZATION: Update Japanese translation
  • LOCALIZATION: Update Spanish translation
  • LOCALIZATION: Update Portuguese Brazilian translation
  • IOS: Set audio session category to ambient so sound does not get cut off on interruption (phone call/playing back audio)
  • MAC/IOHIDMANAGER/BUGFIX: Fix for Mayflash N64 adapter, in case the last hatswitch does not match the cookie. With the Mayflash N64 adapter, I was getting a BAD EXC ADDRESS (in macOS 10.13) for this line (tmp was NULL). RetroArch would crash in the GUI if I pressed a button from the DPAD on controller 2. With this change, it no longer crashes in the GUI and still registers the button push.
  • MAC/COCOA: Fix mouse input – this brings back two lines of code that have been removed over time but appear to be required in order for mouse input to work on macOS
  • METAL/BUGFIX: GPU capture on Metal/OSX/NVidia could crash
  • METAL/BUGFIX: Taking screenshots could capture black frames. Resulting PNG screenshots were black.
  • METAL/BUGFIX: Corrupted image due to incorrect viewport copy when taking screenshot
  • MENU: Prevent font-related segfaults when using extremely small scales/window sizes
  • MENU: Fix ‘gfx_display_draw_texture_slice()’
  • MENU/FONT: Enable correct vertical alignment of text (+ font rendering fixes)
  • MENU/RGUI: Enable automatic menu size reduction when running at low resolutions (down to 256×192)
  • MENU/OZONE: Update timedate style options for Last Played sublabel metadata
  • MENU/OZONE: Hide ‘Menu Color Theme’ setting when ‘Use preferred system color theme’ is enabled
  • MENU/OZONE: Fix thumbnail switching via ‘scan’ button functionality
  • MENU/OZONE: Prevent glitches when rendering Ozone’s selection cursor
  • MENU/OZONE: Enable proper vertical text alignment + thumbnail display improvements
  • MENU/OZONE: Enable second thumbnail/content metadata toggle using RetroPad ‘select’
  • MENU/OZONE: Refactor footer display
  • MENU/OZONE: Hide thumbnail button hints when viewing file browser lists
  • MENU/OZONE/INPUT/BUGFIX: Fix undefined behaviour when using touch screen to change input remaps
  • MENU/OZONE/INPUT/BUGFIX: It turns out that Windows reports negative pointer coordinates when the mouse cursor goes beyond the left hand edge of the RetroArch window (this doesn’t happen on Linux, so I never encountered this issue before!). As a result, if Ozone is currently not showing the sidebar (menu depth > 1), moving the cursor off the left edge of the window generates a false positive ‘cursor in sidebar’ event – which breaks menu navigation, as described in #10419. With this PR, we now handle ‘cursor in sidebar’ status correctly in all cases
  • MENU/OZONE/INPUT/BUGFIX: Pointer input is now correctly disabled when message boxes are displayed
  • MENU/XMB: Fix thumbnail switching via ‘scan’ button functionality
  • ODROID GO ADVANCE: Add DRM HW context driver
  • PSL1GHT: Initial port
  • PSL1GHT/KEYBOARD: Implement PSL1GHT keyboard
  • PLAYLIST/BUGFIX: Improve handling of ‘broken’ playlists – RetroArch will no longer segfault when attempting to run content via a playlist entry with missing path or core path fields.
  • PLAYLIST/BUGFIX: Improve handling of ‘broken’ playlists – when a playlist entry has either core path and/or core name set to NULL, DETECT or an empty string, attempting to load content will fall back to the normal ‘core selection’ code (currently this happens only if both core path and core name are DETECT – this is wholly inadequate!)
  • PLAYLIST/BUGFIX: RetroArch will no longer segfault when attempting to fetch content runtime information when core path is NULL
  • PLAYLIST/BUGFIX: Core name + runtime info will only be displayed on playlists and in the Information submenu if both the core path and core name fields are ‘valid’ (i.e. not NULL or DETECT)
  • PLAYLIST/BUGFIX: When handling entries with missing path fields, the menu sorting order now matches that of the playlist sorting order (at present, everything goes out of sync when paths are empty). Moreover, entries with missing path fields can now be ‘selected’, so users can remove them (currently, hitting A on such an entry immediately tries – and fails – to load the content, so the only way to remove the broken entry is via the Playlist Management > Clean Playlist feature)
  • PLAYLIST: Add optional per-playlist alphabetical sorting
  • PLAYLIST: Omit whitespace when writing compressed JSON format playlists
  • PLAYLIST: Add optional playlist compression
  • QNX: Support analog sticks
  • SAVESTATES: Add optional save state compression (enabled by default now)
  • SRAM: Add optional save (SRAM) file compression
  • SCANNER: Prevent redundant playlist entries when handling M3U content
  • SCANNER/ANDROID: Fix content scanner being unable to identify certain games from CHD images (raw data sector/subcode)
  • TASKS/BUGFIX: Fix task deadlocks
  • TASKS/SCREENSHOT/BUGFIX: Fix heap-use-after-free error when widgets are disabled
  • TVOS: Disable overlays for tvOS, fix app icon
  • VIDEO/WIDGETS/BUGFIX: The font ascender/descender metrics added in #10375 are now used to achieve ‘pixel perfect’ vertical text alignment
  • VIDEO/WIDGETS/BUGFIX: Message queue text now uses its own dedicated font. Previously, a single (larger) font was used for all active widgets, and this was scaled down for message queue items. This ‘squished’ the text a little; more importantly, when using the stb font renderers (on Android, etc.) it caused ugly artefacts around the edges of glyphs due to pixel interpolation errors. Now that a correctly sized font is used, the message queue is always rendered cleanly.
  • VIDEO/WIDGETS/BUGFIX: Previously, each widget font was ‘flushed’ (font_driver_flush()) at least once a frame. This is quite a slow operation. Now we only flush fonts if they have actually been used.
  • VULKAN/BUGFIX: Fix display of statistics text
  • UNIX/BUGFIX: Fix overflow when computing total memory on i386
  • WIIU/BUGFIX: Fix font driver horizontal text alignment
  • WIIU/BUGFIX: Fix non-vertex coordinates in draws using tex shader
  • WIIU/BUGFIX: Update and fix meta.xml file for the WiiU release. This change makes it so the information from the meta.xml file parsed for the WiiU’s Homebrew Launcher is displayed properly.

Kronos 2.1.2 progress report (Sega Saturn emulator)

It has been some time since the last report, let’s try to go a bit more in-depth this time.

The OpenGL CS video renderer

The Saturn is a beast. It features 8 processors, among them 2 custom graphics processors called VDP1 and VDP2. The VDP2 handled backgrounds, while the VDP1 handled sprites, textures and polygons.

The VDP1 rendered “quads” line per line: the general idea was to interpolate endpoints along the horizontal edges, then to draw textured lines between those endpoints. It had to draw the lines with an extra pixel where the slope changed, so that every pixel had a neighbor to the left, right, top, or bottom. This was done to prevent gaps between the lines.
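To make the idea concrete, here is a rough Python sketch of this style of rasterization. It is a simplified illustration, not Kronos's compute shader or an exact model of the hardware: the function names (`line_pixels`, `rasterize_quad`), the Bresenham stepping, the integer interpolation and the choice of which extra pixel to fill on a diagonal step are all assumptions.

```python
def line_pixels(x0, y0, x1, y1, anti_gap=True):
    """Bresenham line; with anti_gap, diagonal steps get an extra pixel
    so that consecutive pixels always share an edge."""
    dx, dy = abs(x1 - x0), abs(y1 - y0)
    sx = 1 if x1 >= x0 else -1
    sy = 1 if y1 >= y0 else -1
    err = dx - dy
    x, y = x0, y0
    pixels = [(x, y)]
    while (x, y) != (x1, y1):
        e2 = 2 * err
        step_x = e2 >= -dy
        step_y = e2 <= dx
        if step_x:
            err -= dy
            x += sx
        if step_y:
            if step_x and anti_gap:
                # Slope change: also plot the pixel reached by the x move
                # alone, so the line has no diagonal-only connections.
                pixels.append((x, y))
            err += dx
            y += sy
        pixels.append((x, y))
    return pixels


def lerp_point(p, q, i, steps):
    """Integer interpolation from p to q at step i of `steps`."""
    if steps == 0:
        return p
    return (p[0] + (q[0] - p[0]) * i // steps,
            p[1] + (q[1] - p[1]) * i // steps)


def rasterize_quad(a, b, c, d):
    """Cover quad a-b-c-d by walking edges a->d and b->c in lockstep
    and drawing a line between the two interpolated endpoints."""
    steps = max(abs(d[0] - a[0]), abs(d[1] - a[1]),
                abs(c[0] - b[0]), abs(c[1] - b[1]))
    covered = set()
    for i in range(steps + 1):
        left = lerp_point(a, d, i, steps)
        right = lerp_point(b, c, i, steps)
        covered.update(line_pixels(*left, *right))
    return covered
```

With `anti_gap=False`, a 45° line only touches its neighbours diagonally, which is exactly where holes can open up between adjacent lines of a distorted quad.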

A modern graphics API like OpenGL doesn’t know how to do that, because its rendering pipeline is based on triangle geometry, so it can’t directly reproduce VDP1 behavior. There are tricks like tessellation, but ultimately they are just workarounds for specific issues and not all-in-one solutions. Here is some good news though: OpenGL 4.3 introduced a new feature called compute shaders. You might have heard about them through Flycast’s order-independent transparency, or the N64’s paraLLEl. This new component gives lots of flexibility to OpenGL, and allows the implementation of routines to render quads line per line. That is what this renderer is about: reproducing VDP1 behavior accurately.

Let’s do some comparisons. From first to last, these images were shot from console, Mednafen/Beetle, Kronos (OpenGL CS renderer) and Kronos (the old OpenGL renderer, based on YabaSanshiro’s). There are 2 noticeable things related to this VDP1 behavior in them :

  • border of the road : on console, Mednafen and Kronos’s new renderer, if you zoom in, you’ll notice it’s not a smooth line, there are dots, this is the accurate behavior; the last screen, while the smooth line might look better, is actually inaccurate.
  • holes everywhere : if you zoom in on the last screenshot, you’ll notice some holes here and there, on the top of the hills, on the road in the back, those holes don’t exist on the other screenshots.

It’s possible to work around those holes with the OpenGL renderer, but at the end of the day you end up creating other issues in the process. Until recently we used such a workaround but, in the case of Sega Rally, it was magnifying the dots on the border of the road.

The only known downside of this new renderer is that it will require a fairly good GPU!

ST-V support was improved

While still a bit preliminary, some major rework was done recently on ST-V support :

  • You can now set your favorite bios region (NB : it will be ignored if the game doesn’t support that region though)
  • The EEPROM is now properly saved and loaded
  • ROM loading mechanism was fixed, there should be no more messages of the ST-V bios telling you there is something wrong with the game you are trying to launch
  • Lots of input issues, going from the lack of kick harness (used for 5th & 6th buttons on some games) to the inputs not responding at all, were fixed

Improvements on the Libretro port

There were some long-term issues with the Libretro implementation, but many of them have now been addressed :

  • Resolution switching, which is something that happens every few seconds on Saturn, was somehow wrong; one of the worst side effects was artifacts, especially visible in “mesh” (if you don’t use the “improved mesh” core option). This was fixed
  • Toggling between fullscreen and windowed was causing issues from glitches to crashes, it has been mostly fixed
  • While the Saturn framerate should be 50 or 60 fps depending on the region, sometimes nothing is rendered because the Saturn is actually shutting down its video output. Kronos tries to behave accurately here too, which is a bit of a headache for the libretro ecosystem, which expects a more linear framerate. A better way of handling this was implemented.

Also, here is a summary of this core’s options :

  • Force HLE BIOS : it will ignore your bios file and use the old HLE bios from yabause instead, this function is unmaintained and is mainly there for debugging purpose (there is at least one known case where it’s unlocking the game though : Astal, for some reason the real bios is shutting down the video output), don’t report issues if you enabled this option.
  • Video format : will force format to PAL or NTSC, default is auto
  • Frameskip : will skip rendering at a fixed rate, it can improve playability dramatically on lower end devices
  • SH-2 cpu core : default is “kronos”, our cross-platform cached interpreter, the other one is the unmaintained yabause SH-2 interpreter, we have the same policy for it as for the HLE bios.
  • OpenGL version : this option was introduced as a workaround for setups giving false positive when asking if a specific OpenGL version was supported (it happened…), set this to the highest version your gpu supports.
  • Video renderer : to enable the new renderer, default is the old one for compatibility reasons
  • Share saves with beetle : will share save paths with beetle-saturn, allowing you to use the same savefiles.
  • Addon cartridge : to change cartridge, default is auto, it is recommended to leave the default unless you intend to play Heart of Darkness, a prototype requiring the 16M extended RAM.
  • 6Player Adaptor on Port 1 : self explanatory
  • 6Player Adaptor on Port 2 : same, one word of warning though, enabling the second multitap is known for causing a weird autofire behavior.
  • Internal Resolution : self explanatory
  • Polygon Mode : works with the default OpenGL renderer, used to fix wobbling textures issues, OpenGL CS doesn’t need this, default is cpu tessellation but gpu tessellation is recommended if your gpu supports it (OpenGL 4.2), perspective correction is more cpu friendly but heavily glitched.
  • Improved mesh : will replace fake transparency (mesh) by real transparency, default is disabled
  • RBG Compute shaders : will use compute shaders to rotate background, it is recommended if your gpu supports it, default is disabled
  • Wireframe mode : self-explanatory ? It works only with OpenGL CS, mostly for debugging but can be a fun feature, give it a try for curiosity !
  • ST-V Service/Test Buttons : enable buttons to access service menu in ST-V game, default is disabled to avoid misspress
  • ST-V Favorite Region : select your region for ST-V, default is EU for censorship and language reasons.
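
The Frameskip option in the list above skips rendering at a fixed rate. As a rough illustration of what that means, here is a minimal sketch in C. The helper name `should_render` and the meaning of its `frameskip` parameter (number of frames skipped after each rendered one) are assumptions made for this example, not the actual Kronos code.

```c
#include <stdbool.h>

/* Hypothetical helper, not taken from Kronos: returns true when the frame
 * with the given index should be rendered, assuming `frameskip` frames are
 * skipped after every rendered frame. frameskip == 0 renders every frame. */
static bool should_render(unsigned frame_index, unsigned frameskip)
{
    unsigned period = frameskip + 1u;

    /* Guard against wrap-around if frameskip is the maximum unsigned value. */
    if (period == 0u)
        return true;

    return frame_index % period == 0u;
}
```

With a setting of 2, only every third frame is rendered, while emulation itself would still advance on every frame.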

On a side note, lots of other things were fixed or improved since my last report, but nothing seemed major, so we decided to skip them. If you want to know more about this emulator, you can check the YouTube channel, or join us on Discord.

Libretro Cores Progress Report – April 2, 2020

Our last core progress report was on February 29, 2020. Below we detail the most significant changes to all the Libretro cores we and/or upstream partners maintain. We are listing changes that have happened since then.

How to update your cores in RetroArch

There are two ways to update your cores:

a – If you have already installed the core before, you can go to Online Updater and select ‘Update Installed Cores’.

b – If you haven’t installed the core yet, go to Online Updater, ‘Core Updater’, and select the core from the list that you want to install.

Final Burn Neo

Description: Multi-system arcade emulator

  • Latest updates from upstream

blueMSX

Description: Home computer MSX emulator

  • Fix non-smooth scrolling in PAL 50Hz
  • Buildfix for libnx (Switch)
  • Buildfix for 3DS

Beetle PSX

Description: Sony PlayStation emulator

  • Added “fast PAL” hack to allow PAL games to play at NTSC framerates
  • Added Force NTSC aspect ratio
  • Vulkan: Disable adaptive smoothing by default

    This should be disabled by default like the other Vulkan-exclusive
    enhancements so as to better match stock settings

  • Hide scanline core options based on content region
  • Refactor memory card core options logic

    Get rid of confusing check_variables() memcard startup logic and
    corresponding redundant variables, and update core option
    labels/sublabels to match actual core functionality.

  • Implement aspect ratio core option (psx.correct_aspect equivalent)

    Beetle PSX implementation of “psx.correct_aspect” introduced in Mednafen
    1.24.0-UNSTABLE (no relevant code backported from upstream).
    Additionally fixes aspect ratio scaling issues when cropping overscan or
    adjusting visible scanlines. “Force 4:3” is left as a legacy option for
    users preferring the old inaccurate behavior.

  • Add option for setting core-reported FPS timing
  • WIP: increase RAM to 8MB instead of the default 2MB
  • Improve internal FPS detection

Vitaquake 2

Description: Quake 2 game engine core

Vitaquake 2 is now available for the first time on Emscripten.

Hatari

Description: Atari ST/STE/TT/Falcon emulator

  • Port: Ported Hatari to PS Vita

Atari 800

Description: Atari 8-bit home computer emulator

  • Port: Ported Atari 800 to 3DS

Dosbox core

Description: MS-DOS home computer emulator

  • Latest updates from Github

Dosbox SVN

Description: MS-DOS home computer emulator

  • Latest updates from Github
  • Make 16MB RAM default, change default cycle mode to “fixed” and “10000”

    Max and auto modes are broken on some systems.

LRMAME

Description: Multi-system arcade emulator

  • Updated to version 0.219

ECWolf

Description: Wolfenstein 3D game engine core

  • Latest updates

Flycast

Description: Sega Dreamcast emulator

  • fix alignment issues reported by ubsan on x64
  • Fix chd lzma and zlib buffers alignment
  • Fix rec/x64 block check alignment
  • Fix ChannelEx struct alignment
  • nvmem: generate console ID at startup. rec-x64: Call stack alignment

    Generate console ID in dc_nvmem.bin if blank. Used by chuchu rocket
    login.
    Align stack to 16-byte when calling function from x64 rec

  • (NAOMI) add sfz3ugd button labels
  • (NAOMI) Alien Front Naomi needs DIV matching disabled
  • (NAOMI) VMU support (vonot, sf3zu). Fix otrigger inputs.
  • input: only use R2/L2 for trigger input even with digital triggers
  • renderer: generate mipmaps for custom textures
  • custom texture: stop loader thread before loading state
  • renderer: decrease MipmapD bias – fixes street lights in Sonic Adventure 1
  • gdrom: don’t resume CDDA if not playing. stop if cur > end – implement ATA_IDENTIFY
  • Protect RAM and VRAM when VMEM is disabled
  • (Switch) Initial Port
  • ta: defer index building and strip merging, filter out infinite vertices
  • pvr: reserve more opaque polys. Don’t crash on TA overrun
  • vmem: unprotect vram when releasing memory if NO_VMEM
  • (Switch) Iterate each Page for Permission set
  • Use -O2 for YUV_Block8x8 due to UB
  • pvr: don’t reset tile clipping value on each frame – Fixes Irides – master of blocks
  • support multi-session cue/bin. mipmap D-adjust only to increase LoD
  • limit maple schedule time
  • allow VRAM 8-bit reads
  • gl: use common ReadFramebuffer() func
  • sort triangles even with 1 polygon – fixes missing Naomi boot logo and vtennis2 black frame during replay
  • fix crash when TR poly count is 0
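
Several of the Flycast fixes above concern alignment, for instance aligning the stack to 16 bytes before the x64 recompiler calls a function. The arithmetic behind that kind of fix can be sketched in C as follows. The helper names are hypothetical and this is not Flycast's code, only an illustration of rounding an address to a 16-byte boundary.

```c
#include <stdint.h>

/* Hypothetical helpers, not Flycast's: round an address to a 16-byte
 * boundary by clearing (or carrying past) the low four bits. */

/* Round down, as one would for a stack pointer on a stack that grows
 * towards lower addresses. */
static uintptr_t align_down_16(uintptr_t addr)
{
    return addr & ~(uintptr_t)15;
}

/* Round up, as one would for the start of a buffer inside a larger block. */
static uintptr_t align_up_16(uintptr_t addr)
{
    return (addr + 15u) & ~(uintptr_t)15;
}
```

An address that is already aligned is returned unchanged by both helpers.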

ChaiLove

  • Port: Ported ChaiLove core to 3DS
  • Port: Ported ChaiLove core to Android

HBMAME

Description: Emulator of homebrew and hacked games for arcade hardware

  • New core

VICE

Description: Commodore 64 home computer emulator

  • Split “Paddles” joyport type to first two RetroPads:
    Player1 vertical axis = Player2 horizontal axis
    Player1 2nd button = Player2 1st button
    Add speed modifier hotkeys (slower+faster) for paddles/mouse

    Because “Paddles” is in fact 2 controllers in one joyport, and currently it is read like a mouse with 2 axes and 2 buttons, this is not convenient for 2 player games, like Panic Analogue, which use paddles as 2 separate entities with one axis and one button each.

  • Fixes for JiffyDOS, Disk Control & Statusbar –

    To evade JiffyDOS incompatibilities with CRTs, PRGs & TAPs, the allowance method is changed from whitelist to blacklist.
    Also, M3U playlists of D64 images will be allowed, and playlists of TAP images will not
    Fixed not being able to insert disks at all when starting without content
    Drive type defaults to 1541, so inserting D81s will not work for now, because drive type autodetection happens only on autostart
    Fine-tuned statusbar

  • Turbofire & JiffyDOS fix –
    Minor fixes:

    Turbofire pulse was off (value 2 was actually 4)
    No reason to allow JiffyDOS core option with anything other than D64 & D81, or is there..?

  • Remove Nuklear GUI, Add VKBD touch control –
    Replaced bloaty Nuklear with the lightweight VKBD from PUAE

    No drawbacks, only benefits: Touch control, better performance, simpler maintenance

  • Port: Fixed VICE core Android build
  • Port: Ported VICE core to 3DS
  • Port: Ported VICE core to Emscripten
  • Add support for disk control interface v1 (disk display labels)
  • x64: Exclude vicii-clock-stretch.c – vicii-clock-stretch.c is not really used on x64, it’s only for x128
  • Disable cpmcart on x128 –
    Both x128 and cpmcart have z80 cpu and both have z80_regs symbols.
    On platforms other than emscripten those symbols end up being aliased
    due to “-fcommon” behaviour. This would lead to very weird results if they
    would ever be used together.

    On real hw cpmcart is unnecessary due to integrated CP/M mode

    In the emulator cpmcart is runtime-enable only on x64 and x64sc but
    the relevant code is still compiled-in.

    So just remove cpmcart.c and #ifndef to avoid references

  • Core option for disabling autostart joined with autostart warp
  • Statusbar improvements, VKBD transparency core option

GME

Description: Game Music Emulator core

  • Port: Ported GME core to PSL1GHT (PS3)
  • Port: Ported GME core to 3DS

prBoom

Description: Doom 1/2 game engine core

  • retro_run: Don’t attempt to run doom loop after exit – This fixes crash on exit on 3DS
  • Port: Added PSL1GHT port (PS3)

MelonDS

Description: Nintendo DS emulator

  • (Switch) Latest updates

P-UAE

Description: Commodore Amiga emulator

  • VKBD updates, CD turbo speed backport
  • WHDLoad changes (button overrides)
  • VKBD touch control –
    Only tested with Windows and mouse, since RETRO_DEVICE_POINTER also reacts with it. Hence also disabled real mouse control while VKBD is visible.
  • Port: Ported P-UAE to PSVita
  • Minor save state improvements
  • Extended ZIP support

ScummVM

  • Update to ScummVM 2.1.1
  • Allow launching games directly from game files

Mr. Boom

Description: Bomberman clone

  • Port: Ported Mr. Boom to 3DS
  • Port: Ported Mr. Boom to Emscripten
  • Port: Ported Mr. Boom to PSP
  • Port: Ported Mr. Boom to PS Vita
  • Port: Ported Mr. Boom to Apple tvOS
  • Fix unaligned casts

FCEUmm

Description: NES emulator core

  • Fix unable to load some unif carts
  • M274 update
  • Add 42-in-80000 multicart (m380)
  • Add mapper 389 (Caltron 9-in-1)
  • BMCFK23C – update
  • Fix default palette
  • Add Mortal Kombat Trilogy – 8 People (M1274) (Ch) [!].nes to ines-cor…
  • Merge unif board BMC-Super24in1SC03 to BMC-FK23C
  • M176: Minor tweak to chr mixed ram/rom logic check and others
  • Simplify dipswitch options for Nintendo World Championships 1990 cart
  • MMC3: Make sure to free any allocated memory when using MMC3 as an external module
  • Misc mapper updates
  • m269: Move chr unscrambling to mapper init
  • Unif: Show raw values for prg/chr rom size in logs
  • Remove unneeded code in BMC-Super24in1SC03
  • Remove duplicate code in bmc-fk23c
  • Rewrite BMC-FK23C/A (m176) based on updated notes and testing
  • Fix incompatible pointer type warning
  • Add 168-in-1 New Contra Function 16 to ines-correct.h
  • unif.c: Align board map struct
  • ines.c: Cleanup mapper struct and iNESLoad()
  • Fix unterminated savestate struct
  • Update mapper 79
  • vrc2and4: Fix mapper 22 games not working (regression) and refactoring
  • Update ines-correct.h

The Powder Toy

Description: Game engine core

  • Port: Ported The Powder Toy to 3DS

PocketCDG

Description: MP3 Karaoke audio player

  • Eliminate overly verbose output – On 3DS stderr is printed on the lower screen and is slow, which ruins performance completely.

Picodrive

Description: Sega Megadrive/Genesis/32X/CD emulator

  • Add option to change sound quality – Even with the fast renderer (#116), the framerate on the PSP slows down at some points in some games. Reducing the sound rate can help increase the framerate in these cases.

    It’s not ideal but it’s better than frame skipping. [bmaupin]

VBA-M

Description: Game Boy Advance emulator

  • Fix Save Failed error for Super Monkey Ball Jr.

gpSP

Description: Game Boy Advance emulator

  • [3DS] Fix dynarec prefetch aborts
  • Add automatic frame skipping

Frodo

Description: Commodore 64 emulator

  • Support running without ROM

PX68K

Description: Sharp X68000 Emulator

  • Prevent simultaneous up+down / left+right button presses
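
Filtering opposing directions, as in the fix above, is usually done on the input bitmask before it reaches the emulated machine. The following is a minimal sketch in C, assuming hypothetical `PAD_*` bit flags and a "both pressed means neither" policy. The actual PX68K change may use different names and may resolve the conflict differently.

```c
#include <stdint.h>

/* Hypothetical bit flags for this example, not the PX68K or libretro ones. */
enum {
    PAD_UP    = 1 << 0,
    PAD_DOWN  = 1 << 1,
    PAD_LEFT  = 1 << 2,
    PAD_RIGHT = 1 << 3
};

/* Clears both bits of an axis when opposing directions are held together.
 * Assumed policy: conflicting input on an axis is treated as neutral. */
static uint32_t filter_opposing(uint32_t pad)
{
    if ((pad & PAD_UP) && (pad & PAD_DOWN))
        pad &= ~(uint32_t)(PAD_UP | PAD_DOWN);

    if ((pad & PAD_LEFT) && (pad & PAD_RIGHT))
        pad &= ~(uint32_t)(PAD_LEFT | PAD_RIGHT);

    return pad;
}
```

Other bits in the mask, such as fire buttons, pass through untouched.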

Lutro

Lutro now runs on the 3DS.

Snake runs at 60 fps
Platformer runs at 20 fps

  • (3DS) Fix build
  • (Switch) Fix build