paraLLEl-RDP rewritten from scratch – available in paraLLEl n64 right now for RetroArch



The ParaLLEl N64 Libretro core has received an update today that adds the brand new paraLLEl-RDP Vulkan renderer to the emulator core.

I implore everybody to read Themaister’s blog post (Reviving and rewriting paraLLEl-RDP – Fast and accurate low-level N64 RDP emulation) for a deep dive into this new renderer.

Requirements

  • You need a graphics card that supports the Vulkan graphics API.
  • It’s currently only available on Windows and Linux.
  • Right now the renderer requires a specific Vulkan extension, called ‘VK_EXT_external_memory_host’. Currently, only Nvidia’s binary Linux drivers for Vulkan lack support for this extension. Support has been requested, but there is no ETA yet on when it will be implemented.
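If you want to check the extension requirement yourself, here is a minimal sketch. It assumes the `vulkaninfo` utility (from the Vulkan tools package) is installed, and it does a plain substring match, since the exact output layout differs between versions. The helper names `has_extension` and `query_driver` are made up for this example.

```python
import shutil
import subprocess

# Extension name as given in the requirements list above.
REQUIRED_EXTENSION = "VK_EXT_external_memory_host"


def has_extension(vulkaninfo_text: str, name: str = REQUIRED_EXTENSION) -> bool:
    """Return True if any line of the given text mentions the extension name."""
    return any(name in line for line in vulkaninfo_text.splitlines())


def query_driver() -> str:
    """Run `vulkaninfo` if it is installed and return its output, or '' on failure."""
    exe = shutil.which("vulkaninfo")
    if exe is None:
        return ""
    try:
        result = subprocess.run([exe], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.SubprocessError):
        return ""
    return result.stdout
```

Calling `has_extension(query_driver())` returns False both when the driver does not report the extension and when `vulkaninfo` is not available, so treat a False result as "could not confirm" rather than a definitive no.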

What’s new since the old ParaLLEl RDP?

  • Completely rewritten from the ground up
  • Bit-exact renderer
  • Should be pretty much on par with Angrylion accuracy-wise now – none of the issues that plagued the old paraLLEl RDP
  • Now emulates the VI (Video Interface) as well
  • Basic deinterlacing for interlaced video modes

How to install and set it up

  • In RetroArch, go to Online Updater.
  • (If you have paraLLEl N64 already installed) – Select ‘Update Installed Cores’. This will update all the cores that you already installed.
  • (If you don’t have paraLLEl N64 installed already) – go to ‘Core Updater’, and select ‘Nintendo – Nintendo 64 (paraLLEl N64)’.
  • Now start up a game with this core.
  • Go to the Quick Menu and go to ‘Options’. Scroll down the list until you reach ‘GFX Plugin’. Set this to ‘parallel’. Set ‘RSP plugin’ to ‘parallel’ as well.
  • For the changes to take effect, we now need to restart the core. You can either close the game or quit RetroArch and start the game up again.

Progress and development in N64 emulation over the past decade

State of HLE emulation

IMHO, this release today represents one of the biggest steps that have been taken so far to elevate Nintendo 64 emulation as a whole. N64 emulation has had a bad rep for decades because of HLE RDP renderers that fail to reproduce every game’s graphics correctly and tons of unemulated RSP microcode, but it has gotten significantly better over the years. On the HLE front, things have progressed: GLideN64 has made big strides in emulating most of the major games, and the HLE RSP implementation used by Mupen 64 Plus is starting to emulate most of the major microcodes that developers made for N64 games. There are also obviously limiting factors on the HLE front. For instance, GLideN64 still requires OpenGL, and renderers for Vulkan and other modern graphics APIs have not been implemented as of this date (although they could be).

State of LLE emulation

So that’s the HLE front. But for the purpose of this blog article, we are mostly concerned here about Low-Level Emulation. Both HLE and LLE N64 emulation are valid approaches, but if we want to reproduce the N64 accurately, we ultimately have to go LLE. So, what is the state of LLE emulation?

For LLE emulation, one of the advancements over the past few years has been a multithreaded version of Angrylion. Angrylion is the most accurate software RDP renderer to date. Its main problem has always been how slow it is. Up until say the mid to late ’10s, desktop PCs just did not have the CPU power to run any game at fullspeed with this renderer. With multithreading, Angrylion has made some big gains in the performance department that were previously thought unimaginable.

However, Angrylion as a software renderer can only be taken so far. The fact remains that it is a big bottleneck on the CPU, and you can easily see CPU activity exceeding 65% on a modern rig with the multithreaded Angrylion renderer. Software rendering is just never going to be a particularly fast way of doing 3D rasterization.

So, back in 2016, the first attempt at making a hardware renderer that can compete with Angrylion was made. It was a big release for us and it marked one of the first pieces of software to be released that was designed exclusively around the then-new Vulkan graphics API. You can read our old blog post here.

It was a valiant first attempt at making a speedy Angrylion port to hardware. Unfortunately, this first version was full of bugs, and it had some big architectural issues that just made further development on it very hard. So it didn’t see much further development for the past few years.

This year, all the stars have aligned. First out of the gates was the resurrection of paraLLEl-RSP, another project by Themaister. Low-level N64 emulation places a big demand on the CPU, and while cxd4’s RSP interpreter is very accurate, to get at least a 2x leap in performance, a dynamic recompiler approach has to be taken. To that end, this year not only was paraLLEl-RSP resurrected, but we moved the dynamic recompiler architecture from LLVM to Lightrec. It’s a bit less performant than LLVM to be sure but it also has some big advantages – LLVM runtime libraries are very hard to embed and integrate for various platforms, while Lightrec doesn’t have these dependency issues. Furthermore, LLVM would take a long time recompiling code blocks, and it would cause big stutters during gameplay (for instance, bringing up the map in Doom 64 for the first time would cause like a 5-second freeze in the gameplay while it was recompiling a code block – obviously not ideal). With Lightrec, all those stutters were more or less gone.

So, Q1 2020. We now have multithreaded Angrylion, which leverages the multi-core CPUs of today’s hardware to get better performance results. We have ParaLLEl RSP, a low-level RSP plugin with a dynamic recompiler that gives us a big bump in performance. But one piece of the puzzle is still missing, and it’s perhaps the most significant. Multithreaded Angrylion still is a software renderer and therefore it still massively bottlenecks the CPU. Whether you can spread that load out over multiple cores or not ultimately matters little – CPUs just are not good at doing fast 3D rasterization, a lesson learned by nearly every mid ’90s PC game developer, and why 3D accelerated hardware could not have come soon enough.

So, the obvious Next Big Thing in N64 emulation was to get rid of this CPU bottleneck and move Angrylion kicking and screaming to the GPU, and this time avoid all of the issues that plagued the initial paraLLEl RDP prototype.

Where does that leave us?

With a very accurate Angrylion-quality LLE RDP renderer running on the GPU, and a dynarec LLE RSP core, you will be surprised at how accurate Mupen 64 Plus is now. Nearly every commercial game runs now as expected with nearly no graphical issues, the sound is as you’d expect it to be, it looks, runs and functions just like a real N64. And if you’re on a discrete Nvidia or AMD GPU, your GPU activity will be 4% on average, whether it’s a stone-age GPU from the year 2013 like an AMD R9 290x, or an Nvidia Geforce 2080 Ti. Nearly any discrete GPU made from 2013 to 2020 that supports the Vulkan API seems to eat low-level N64 graphics for breakfast. CPU activity also has decreased significantly. With multithreaded Angrylion and Parallel RSP, there would be about 68% CPU activity on my rig. This is brought down to just 7 to 10% using paraLLEl RDP instead of Angrylion. Software rendering on the CPU is just a huge bottleneck no matter which way you slice it.

So for most practical purposes, using the paraLLEl RDP and paraLLEl RSP cores in tandem, the future is now. Accurate N64 emulation is here, it’s no longer slow, and it’s no longer completely CPU bound either. And you can play it on RetroArch right now, right today. We don’t have to wait for a near-accurate representation of an N64, it’s already here with us for all practical gameplay purposes.

How much faster is paraLLEl RDP compared to Angrylion? That is hard to say, and depends on the game you’re running. On average you can expect a 2x speedup. However, notice that at native resolution rendering, any discrete GPU since 2013 eats this workload for breakfast. This means you’re completely CPU bound in terms of performance most of the time. The better your CPU is at single threaded workloads (IPC), the better it will perform. Core count is a less significant factor. I think on my specific rig, it was my CPU that was the weakest link in the chain (a 7700k i7 Intel CPU paired with a 2080 Ti). The GPU matters relatively little, the 2080 Ti was mostly being completely idle during these tests. For that matter, so was an old 2013 AMD card that I would test with the same CPU – GPU activity remained flat at around 4%. As Themaister has indicated in his blog post, this leaves so much room for upscaled resolutions, which is on the roadmap for future versions.

Benchmarks

System specs: CPU – Intel Core i7 7700k | GPU – Geforce RTX 2080 Ti (11GB VRAM, 2018) | 16GB RAM

Title | Angrylion | ParaLLEl RDP (Synchronous) | ParaLLEl RDP (Asynchronous)
007 GoldenEye | 82fps | 119fps | 133fps
Banjo Tooie | 72fps | 132fps | 148fps
Doom 64 | 174fps | 282fps | 322fps
F-Zero X | 158fps | 370fps | 478fps
Hexen | 156fps | 300fps | 360fps
Indiana Jones and the Infernal Machine | 61fps | 94fps | 114fps
Killer Instinct Gold | ~103fps | ~168fps | ~240fps
Legend of Zelda: Majora’s Mask | 122fps | 202fps | 220fps
Mario Kart 64 | ~178fps | ~309fps | ~330-350fps
Perfect Dark (High-res) | 70fps | 125fps | 130fps
Pilotwings 64 | 87fps | 125fps | 144fps
Quake | 188fps | 262fps | 300fps
Resident Evil 2 | 183fps | 226fps | 383fps (*)
Star Wars Episode I: Battle for Naboo | 90fps | 136fps | 178fps
Super Mario 64 | 129fps | 204fps | 220fps
Vigilante 8 (Low-res) | 63fps | 91fps | 112fps
Vigilante 8 (High-res) | ~46-55fps | ~92-99fps | ~119fps
World Driver Championship | ~109fps | ~225fps | ~257fps

* – Has game breaking issues in this mode

System specs: CPU – Intel Core i7 7700k | GPU – AMD Radeon R9 290x (4GB VRAM, 2013) | 16GB RAM

Title | Angrylion | ParaLLEl RDP (Synchronous) | ParaLLEl RDP (Asynchronous)
007 GoldenEye | 82fps | 119fps | 133fps
Banjo Tooie | 72fps | 132fps | 148fps
Doom 64 | 174fps | 282fps | 322fps
F-Zero X | 158fps | 360fps | 439fps
Hexen | 156fps | 288fps | 352fps
Indiana Jones and the Infernal Machine | 61fps | 94fps | 114fps
Killer Instinct Gold | ~93fps | ~162fps | ~239fps
Legend of Zelda: Majora’s Mask | 122fps | 202fps | 220fps
Mario Kart 64 | ~157fps | ~274fps | ~292fps
Perfect Dark (High-res) | 70fps | 125fps | 130fps
Pilotwings 64 | 87fps | 125fps | 144fps
Quake | 189fps | 262fps | 326fps
Resident Evil 2 | 156fps | 226fps | 383fps (*)
Star Wars Episode I: Battle for Naboo | 90fps | 136fps | 178fps
Super Mario 64 | 129fps | 195fps | 209fps
Vigilante 8 (Low-res) | 63fps | 91fps | 112fps
Vigilante 8 (High-res) | ~46-55fps | ~92-99fps | ~119fps
World Driver Championship | ~109fps | ~224fps | ~257fps

* – Has game breaking issues in this mode
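To put the benchmark numbers in perspective, here is a small sketch that turns a handful of rows from the first table (the RTX 2080 Ti rig) into speedup factors relative to Angrylion. The figures are copied from the table; the `speedups` helper is just for illustration.

```python
# Frames per second copied from the first benchmark table (RTX 2080 Ti rig).
# Columns: Angrylion, paraLLEl RDP synchronous, paraLLEl RDP asynchronous.
RESULTS = {
    "007 GoldenEye": (82, 119, 133),
    "Banjo Tooie": (72, 132, 148),
    "Doom 64": (174, 282, 322),
    "F-Zero X": (158, 370, 478),
    "Super Mario 64": (129, 204, 220),
}


def speedups(angrylion, sync, async_):
    """Return (sync, async) speedup factors relative to Angrylion."""
    return sync / angrylion, async_ / angrylion


for title, fps in RESULTS.items():
    s, a = speedups(*fps)
    print(f"{title}: sync {s:.2f}x, async {a:.2f}x")
```

GoldenEye works out to roughly 1.45x in synchronous mode and 1.62x in asynchronous mode, while F-Zero X lands at about 2.3x and 3.0x, which shows how much the gain depends on the game.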

Core option explanations


paraLLEl RDP has some special dedicated options. You can change these by going to Quick Menu and going to Options. Here’s a quick breakdown of what they do –

ParaLLEl Synchronous RDP:

Turning this off allows for higher CPU/GPU parallelism. However, there are certain games that might produce problems if left disabled. An example of such a game is Resident Evil 2.

It has been verified that with the vast majority of games, disabling this can provide for at least a +10fps speedup. Usually the performance difference is much higher though. Try experimenting with it. If you experience no game breaking bugs or visual anomalies, it’s safe to disable this for the game you’re running and enjoy higher performance.

Video Interface Options
ParaLLEl-RDP emulates the N64 RDP’s VI module. The VI applies plenty of postprocessing to the final output image to further smooth out the picture. Some of the options below allow you to enable/disable some of these VI settings on the fly. Toggling them could be beneficial if you want to use several frontend shaders on top, since disabling some of these postprocessing effects could result in a radically different output image.

(ParaLLEl-RDP) VI Interlacing – Disabling this will disable the VI serration bits used for interlaced video modes. With this turned off, the output essentially looks like basic bob deinterlacing, and the picture might become shaky as a result.

(ParaLLEl-RDP) VI Gamma Filter – Disabling this will disable the hardware gamma filter that some games use.

(ParaLLEl-RDP) VI Divot filter – Disabling this will disable the median filter which is intended to clean up some glitched pixels coming out of the RDP. The difference in output is subtle, but it usually seems to apply to shadow blob decals.

(ParaLLEl-RDP) VI AA – Disabling this will disable Anti-Aliasing.

(ParaLLEl-RDP) VI Dither Filter – The VI’s dither filter is used to make color banding less apparent with 16-bit pixels.

(ParaLLEl-RDP) VI Bilinear – VI bilinear is the internal upscaler in the VI. Disabling this is typically a good idea, since it is mostly used to upscale horizontally.

By disabling VI AA and enabling VI Bilinear, the picture output looks just like Angrylion’s “Unfiltered” mode currently does.

FAQ

Will this renderer be ported to OpenGL?

Here is the short answer – no. Not by us, at least. Reasons: OpenGL is an outdated API compared to Vulkan that does not support the features required by Parallel-RDP. GL does not support 8/16bit storage, external memory host, or async compute. If one would be able to make it work, it would only work on the very best GL implementation, where Vulkan is supported anyways, rendering it mostly moot.

Ports to DirectX 12 are similarly not going to be considered by us, others can feel free to do so. One word of warning – even DirectX12 (yes, even Ultimate) is found lacking when it comes to providing the graphics techniques that ParaLLEl RDP is built around. Whoever will take on the endeavor to port this to DX12 or GL 4.5/4.6 will have their work cut out for them.

Mupen64Plus Next 2.0 – 64DD Support, Angrylion and GlideN64 in one build, Parallel RSP support, and Android!



What a massive release we have for you today! M4xw has been really delivering the goods now and we’re pleased to release Mupen64plusNext 2.0 today. This release would not be as significant as it is today without the combined efforts of LuigiBlood, Gillou, Fzurita and Themaister.

The latest version is now available on Android, Linux, Windows, and Libnx (Switch)! Updating to the latest core is as easy as starting RetroArch, going to Online Updater, and selecting ‘Update Installed Cores’. If you have not installed the core yet, instead go to Online Updater and select ‘Mupen64 Plus Next’ or ‘Mupen64 Plus Next GLES3’ from the list.

64DD support


Previously, only Parallel N64 had 64 Disk Drive support, courtesy of LuigiBlood. Work on it was left rather incomplete though.

Mupen64Plus Next now has a new implementation that LuigiBlood feels more comfortable with. Currently the way that you load 64DD content with Mupen64 Plus Next is completely different from how you do it on Parallel N64.

First, you need a BIOS file. Make sure the file ‘IPL.n64’ is located in your /Mupen64plus directory.

You can either use the subsystem for 64DD, or you can name the disk image the same as the ROM including extension.

If you need to load a specific cart with the Disk image, that would be: “homebrew.n64” and “homebrew.n64.ndd” then Load Content “homebrew.n64”.
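As a rough sketch of the naming rule described above (the disk image carries the ROM's full file name, extension included, plus `.ndd`), the following helper computes the file name you would expect next to a cartridge ROM. The function names are hypothetical and this is not code from the core itself.

```python
from pathlib import Path


def disk_image_for(rom_path: str) -> Path:
    """Return the disk image path expected next to a cartridge ROM.

    Follows the naming rule from the text: the disk image keeps the ROM's
    full file name, extension included, and appends '.ndd'.
    """
    rom = Path(rom_path)
    return rom.with_name(rom.name + ".ndd")


def can_pair(rom_path: str) -> bool:
    """True if both the ROM and its matching disk image exist on disk."""
    rom = Path(rom_path)
    return rom.is_file() and disk_image_for(rom_path).is_file()
```

For example, `disk_image_for("homebrew.n64").name` gives `homebrew.n64.ndd`, matching the pair of file names mentioned above.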

Zelda 64: Dawn & Dusk – unofficial 64DD expansion game to OOT
64DD had an exclusive Sim City version, called Sim City 64

Angrylion and GlideN64 in same build!

Previously, Mupen64Plus Next only had GLideN64 as an RDP graphics option, and only ParaLLEl N64 had Angrylion.

Now, Mupen64Plus Next has both, and allows you to choose between them. To do so, go to Quick Menu -> Options, and change RDP Mode. Angrylion is a low-level software-rendered accurate renderer, while Gliden64 is a high-level emulation OpenGL renderer.

Angrylion is the most accurate the graphics are going to get with an N64 emulator – and it can be made relatively fast now thanks to the multithreading capabilities of Angrylion RDP Plus, as well as the Parallel RSP dynarec. You cannot internally change the resolution with Angrylion beyond what the N64 was capable of.

Gliden64 on the other hand takes a more pragmatic approach and emulates the RDP with a high-level approach. It is an OpenGL renderer. You can upscale the graphics, and there is a wide array of settings to tweak.

Most regular people will probably be satisfied by Gliden64 and HLE RSP, and indeed, for many platforms, that might be the only feasible way of attaining fullspeed. But Angrylion definitely fulfills a niche for those that want a more accurate portrayal of N64 graphics – and combined with an upscaling shader, it can still look remarkably good.

Parallel RSP support

Parallel RSP made its debut in ParaLLEl N64, and now we have it backported to Mupen64Plus Next as well! Read our articles here and here for more information on Parallel RSP.

Parallel RSP is a Low-Level RSP plugin that serves as a replacement for Cxd4. You can use it in combination with Gliden64 and/or Angrylion. With Angrylion you are pretty much required to use either Parallel RSP or Cxd4 as your RSP plugin, HLE RSP won’t work. Cxd4 is an interpreter RSP plugin while Parallel RSP is a dynarec RSP plugin. Parallel RSP should be noticeably faster across the board than Cxd4.

You might see better performance with Mupen64plus Next and Angrylion/Parallel RSP vs. ParaLLEl N64, because Mupen64Plus Next uses the New_dynarec CPU core. ParaLLEl N64 instead uses the Hacktarux dynarec CPU core, which can be a tad bit slower.

NOTE: You can also use Parallel RSP in combination with Gliden64. While HLE RSP has made significant strides in emulating the vast majority of known RSP microcodes, there might still be some microcodes that have either not been reversed at all or were not accurately reversed. In this case, an LLE RSP plugin is always an option, and Parallel RSP ought to be the faster one of the two options.

Angrylion + Parallel RSP on Android – approaching fullspeed on high end phones?

50fps with Super Mario 64 on a Galaxy S10+ – RetroArch Mupen64Plus – Angrylion – Parallel RSP

Angrylion is now available as an option for both Parallel N64 and Mupen64plus Next on Android.

Mupen64plus Next definitely has a performance advantage over Parallel N64 when it comes to Angrylion. Tests have shown that the first area in Mario 64 gets about 50-51fps on a Samsung Galaxy S10+ American Snapdragon version and 40/45fps on a Samsung Galaxy S10+ European Exynos version.

Will the next generation of phones be capable of pulling off Angrylion at fullspeed? It’s certainly a tantalizing prospect!

NOTE: There might be several ways you have to ‘nudge’ your Android device to get the best performance out of Angrylion/Parallel RSP. Some things you can try:

– Enable ‘Sustained Performance Mode’. If you find it helps with the framerate, leave it on. If not, disable it.
– Enable ‘Disable Expansion Pak’. It might result in a small performance boost for games that don’t support the Expansion Pak.
– Go to Quick Menu -> Options. VI Overlay can have an additional performance impact on the framerate. ‘Filtered’ is the most demanding option while ‘Unfiltered’ should be fastest.
– Go to Quick Menu -> Options. ‘(AL) Multi threading’ is set to ‘all threads’ by default, but in case the software for whatever reason does not make the right core determination, you might want to set the number of cores manually here. Base this number on the number of CPU cores that your Android device has.

Angrylion + Cxd4
Performance results – LG G8X and Samsung Galaxy S10+ (Snapdragon)

Angrylion + Parallel RSP
Performance results – LG G8X and Samsung Galaxy S10+ (Snapdragon)

HLE RSP improvements – HVQM support

The HVQM RSP microcode has now been implemented for HLE RSP (thanks to the combined efforts of CrashOveride and Gillou). In the past, the FMVs for Pokemon Puzzle League would only show up if you used Angrylion and an LLE RSP plugin. Now the graphics glitches in Pokemon Puzzle League and Yakouchuu II should be gone! This means that you can now use the GlideN64 renderer for these games as well.

Difference between ParaLLEl N64 and Mupen64Plus Next

Available plugins Mupen64Plus Next: Gliden64, Angrylion

Available plugins Parallel N64: Glide64, Parallel RDP, Rice, GLN64, Angrylion

In Mupen64Plus Next’s favor – it is based on a much more recent mupen64plus-core version than Parallel N64, and thus has benefited from years of fixes and architectural improvements. It also uses the New_dynarec CPU core on Windows/Linux/Mac. It is a bit faster than the Hacktarux dynarec from Parallel N64.

There are also currently some disadvantages. The sound is currently crackly with some games like Doom 64 and Quake 64. There are currently some experiments being explored to deal with these issues.

64DD support right now is implemented completely differently in both cores.

Changelog

  • 64DD support (works through the subsystem menu)
  • Angrylion and GlideN64 are now inside the same build – you can switch between them
  • HLE and LLE RSP support – with LLE your choices are between cxd4 [Interpreter] and Parallel RSP [Lightning/Lightrec dynarec]
  • Parallel RSP support for the first time in Mupen64 Plus Next
  • Available on Android with all of the above!
  • The latest HLE RSP improvements – HVQM support – Pokemon Puzzle League FMV support works now with HLE RDP renderers like GlideN64
  • Mitigation for SPECIAL_INT on downcounter flip – fixes freezes in Legend of Zelda: Majora’s Mask
  • Killer Instinct Gold now works with Angrylion + LLE RSP

paraLLEl RDP and RSP updates (September 2016)

Unfortunately, I haven’t had much time to work on paraLLEl lately, but there is plenty to update about.

paraLLEl RSP – Clang/LLVM RSP recompiler experiment

Looking at CPU profiles, paraLLEl RDP could never really shine, as it was being held back by the CXD4 RSP interpreter, so groundbreaking speedups could not be achieved. With paraLLEl RDP, the RSP was consuming well over 50% CPU time. This was known from the beginning before RDP work even started. After the first RDP pre-alpha release, focus shifted to RSP performance, and that’s what I’ve spent most time on. None of my machines are super-clocked modern i7s, which have been required to run N64 LLE at good speed.

Micro-optimizing the interpreter is a waste of time, I needed a dynarec. However, I have never written a dynarec or JITer for that matter before, and I was not going to spend months (years?) learning how to JIT code well for ~4 architectures (x86, x64, ARMv7, ARMv8). Instead, using libclang/libllvm as my codegen proved to be an interesting hack that worked surprisingly well in practice for this project.


Nintendo 64 Vulkan Low-Level emulator – paraLLEl – pre-alpha release

Vigilante 8 running in ParaLLEl.
Here is a pre-alpha release of the hotly anticipated N64 Vulkan renderer, paraLLEl. To coincide with this, a new RetroArch version has also been released that includes support for the async compute interface that this new renderer requires.

Also see our other major announcements today:

RetroArch 1.3.6 released

Lutro – easy retro game creation, powered by libretro

And our earlier story featured a couple days back on ParaLLEl –

First ever Vulkan Nintendo 64 emulator, ParaLLEl, coming soon, only for Libretro/


RetroArch 1.3.6 released

RetroArch keeps moving forward, being the reference frontend for libretro and all. Here comes version 1.3.6, and once again we have a lot to talk about.

Where to get it

Windows/Mac/iOS (build only)/Nintendo/PlayStation – Get it here.

Android: You can either get it from F-Droid or from Google Play Store.

Linux: Since RetroArch is included now on most mainline Linux distributions’ package management repository systems, we expect their versions to be updated to 1.3.6 shortly.

I will release versions for MacOSX PowerPC (10.5 Leopard) and 32-bit Intel MacOS X 10.6 (Snow Leopard) later on, maybe today or tomorrow.

Usability improvements

Windows Drag and Drop support

Courtesy of mudlord, with the Windows version, you can now drag and drop a ROM (or any other content) onto RetroArch’s window, and it will attempt to load the correct core for it. If there is more than one core available for the type of content you dragged and dropped, it will present you with a slidedown list of cores to select from.

Vastly improved content downloading features

Starting with v1.3.6, RetroArch users can download compatible freeware content, such as the shareware release of Doom, right from the app. This video goes through the steps, which include fetching the core from the online updater, fetching the content from the repository and then launching the core and content we just downloaded.

Menu customization and aesthetics – XMB and MaterialUI

RetroArch v1.3.6 adds support for a number of themes in the default mobile menu, including both bright and dark themes.

There’s also the ability now to set a custom wallpaper in XMB and be able to colorize it with a color gradient. To do this, you go to Settings -> Menu, you set a wallpaper, and from there you have to set ‘Menu Shader Pipeline’ to OFF. You can then choose from one of the color palettes in ‘Color Theme’ in order to shade the background wallpaper, or just select ‘Plain’ in case you don’t want to colorize it.

Undo Load/Save State

Have you ever gotten through a tough part of a game and wanted to make a savestate only to hit the “load state” button instead and have to do it all over again? Or maybe you were practicing a particularly difficult maneuver–for a speedrun, perhaps–and accidentally saved a bad run over your practice point because you hit “save state” instead of “load state”? While savestates are considered one of the great advantages to emulating retro games, they can also lead to these frustrating situations where they wipe out progress instead of saving it, all because of one slip of the finger. RetroArch now has the ability to undo a save- or load-state action through some automatic state-shuffling that happens behind the scenes, so you never have to worry about these situations again.

Undo Load State – Before the ‘current’ state is altered by e.g. a ‘Load Savestate’ operation, ‘current’ is saved in memory and ‘Undo Load State’ restores it; you can also undo this option by using it again, which will make you flip-flop between 2 states.

Undo Save State – If there was a savestate file that was overwritten, this option restores it.
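The two undo options can be pictured with a toy model like the one below. This is only a sketch of the behaviour described above, not RetroArch's actual implementation; the class and attribute names are invented for the example, and states are plain byte strings.

```python
class StateSlots:
    """Toy model of the undo behaviour described above (not RetroArch's code)."""

    def __init__(self, current: bytes):
        self.current = current      # live emulator state
        self.saved_file = None      # contents of the savestate file, if any
        self._undo_load = None      # 'current' as it was before the last load
        self._undo_save = None      # file contents overwritten by the last save

    def save_state(self):
        # Keep the file contents that are about to be overwritten.
        self._undo_save = self.saved_file
        self.saved_file = self.current

    def load_state(self) -> bool:
        if self.saved_file is None:
            return False
        # Keep the live state that is about to be replaced.
        self._undo_load = self.current
        self.current = self.saved_file
        return True

    def undo_load_state(self) -> bool:
        if self._undo_load is None:
            return False
        # Swap instead of assign, so using the option again flips back.
        self.current, self._undo_load = self._undo_load, self.current
        return True

    def undo_save_state(self) -> bool:
        if self._undo_save is None:
            return False
        # Restore the file that the last save overwrote.
        self.saved_file = self._undo_save
        self._undo_save = None
        return True
```

Because `undo_load_state` swaps the two buffers, calling it twice in a row brings you back to where you started, which is the flip-flop between 2 states mentioned above.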

New Features

The main event of RetroArch 1.3.6 is obviously the fact that it makes it possible to run the N64 Vulkan core, paraLLEl. Previous versions of RetroArch will not be able to run this because of the new extensions to libretro Vulkan which we had to push to make this renderer possible.

Vulkan

Async compute core support – ready for ParaLLEl

It was already possible to run Vulkan-enabled libretro cores, but with this release, a few crucial features have been added. Support for queue transfers was added and a context negotiation interface was added.

With this we can now use multiple queues to overlap compute and shading in the frontend level, i.e. asynchronous compute. ParaLLEl would certainly not have been as fast or as effective were it not for this.

ParaLLEl now joins triple-A games like Rise of the Tomb Raider and Doom in heavily relying on Vulkan’s async compute capabilities for maximum efficiency. A test core was also written as a proof of concept for this interface.

If you want to read more about ParaLLEl, we have a compendium blog post for you to digest here.

Supports Windows, Linux, Android equally well now

The previous version already had Vulkan support to varying degrees, but now we feel we are finally at the point where Vulkan driver support in RetroArch is very much mature across most of the supported platforms.

Vulkan should work now on Android, on Windows, and on Linux, provided your GPU has a working Vulkan driver.

On Linux we now support even more video driver context features, such as VK_KHR_display support. This is a platform-agnostic KMS-like backend for Vulkan, which should allow you to run RetroArch with Vulkan without the need of an X11 or Wayland server running.

On Windows and Android, we include Vulkan support now. Vulkan has been tested on Android with NVIDIA Shield Tablet/Console, and both work. Be aware that there are some minuscule things which might not work correctly yet with Vulkan on Android. For instance, orientation changing still doesn’t work. This will be investigated.

Max swapchain images – driving latency even lower with Vulkan and friends

RetroArch already has built up quite a reputation for itself for being able to drive latency down to very low levels. But with new technologies, there is always room for improvement.

A maximum number of swapchain images can now be set for both the DRM/KMS context driver for OpenGL (usable on Linux) and Vulkan. What this entails is that you can programmatically tell your video card to provide you with either triple buffering (3), double buffering (2) or single buffering (1). The previous default with DRM/KMS was 3 (triple buffering), so setting it to 2 could potentially shave off latency by at least 1 frame (as was verified by others). Setting it to 1 often won’t get you single buffering with most monitors and drivers due to tearing, and they will fall back to (2) double buffering.

With Vulkan, RetroArch can programmatically tell the video card what kind of buffering method it would like to use, a vast improvement over the nonexistent options that existed before with OpenGL (from a platform-agnostic perspective).
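To get a feel for what one frame of latency means in milliseconds, here is a back-of-the-envelope sketch. It assumes a 60 Hz display and the simple model that every swapchain image you drop removes about one frame of queueing; real drivers may behave differently, and the function names are made up for this example.

```python
def frame_time_ms(refresh_hz: float) -> float:
    """Duration of a single frame in milliseconds."""
    return 1000.0 / refresh_hz


def frames_saved(old_images: int, new_images: int) -> int:
    """Simple model: each swapchain image removed is one frame less queued."""
    return max(old_images - new_images, 0)


def latency_saved_ms(old_images: int, new_images: int, refresh_hz: float = 60.0) -> float:
    """Estimated latency reduction when lowering the swapchain image count."""
    return frames_saved(old_images, new_images) * frame_time_ms(refresh_hz)


# Going from triple buffering (3) to double buffering (2) on a 60 Hz display.
print(f"{latency_saved_ms(3, 2):.1f} ms")
```

Under these assumptions, dropping from 3 to 2 images is worth roughly 16.7 ms on a 60 Hz display.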

What Vulkan brings to the table on Android

Vulkan has been tested to run on Android devices that support Vulkan, like Shield Tablet/Console. Latency has always been very bad on Android in the past. With Vulkan, frame times are significantly lower than with OpenGL, and we no longer have to leave Threaded Video enabled by default. Instead, we can turn off Threaded Video and let RetroArch monitor the refresh rate dynamically, which is the more desirable solution since it allows for less jittery screen updates.

Audio latency can also be driven down significantly now with Vulkan. The current default is 128ms, with Vulkan we can drive it down to 64 or even 32ms.

Couple this with the aforementioned swapchain images support and there are multiple ways to drive latency down on Android now.
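The audio latency figures above translate directly into how much audio sits buffered at any time. Here is a quick sketch of that arithmetic, assuming a 48000 Hz output rate (the text does not state one).

```python
def buffered_samples(latency_ms: int, sample_rate_hz: int = 48000) -> int:
    """Number of sample frames that fit in the given latency window."""
    return latency_ms * sample_rate_hz // 1000


# The three latency values mentioned in the text.
for ms in (128, 64, 32):
    print(ms, "ms ->", buffered_samples(ms), "sample frames")
```

At the assumed rate, 128ms corresponds to 6144 buffered sample frames, while 32ms is only 1536, so there is a lot less audio queued up between the emulator and your speakers.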

OpenGL music visualizer (for FFmpeg-enabled builds)

Versions of RetroArch like the Linux and Windows port happen to feature built-in integrated FFmpeg support, which allows you to watch movies and listen to music from within the confines of RetroArch.

We have added a music visualizer now. The scene is drawn as a cylindrical mesh with FFT (Fast Fourier Transform) heightmap lookups. Different colors are shaded using mid/side channels as well as left/right information for height.

Note that this requires at least GLES3 support (which is available as well through an extension which most GPUs should support by now).
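For the curious, the building blocks mentioned above (mid/side channels and a frequency-domain lookup) can be sketched in a few lines. This is a toy illustration only: it uses a naive DFT in pure Python rather than a real FFT, and it is not the visualizer's actual code. All names are made up for the example.

```python
import cmath
import math


def mid_side(left, right):
    """Split stereo samples into mid (average) and side (half difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def dft_magnitudes(samples):
    """Naive O(n^2) DFT; returns normalized magnitudes of the first n/2 bins."""
    n = len(samples)
    mags = []
    for k in range(n // 2):
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        mags.append(abs(acc) / n)
    return mags


# A pure tone in frequency bin 4, identical in both channels.
n = 32
left = [math.sin(2 * math.pi * 4 * i / n) for i in range(n)]
right = list(left)

mid, side = mid_side(left, right)
height = dft_magnitudes(left)   # per-bin values that could drive mesh height
colour = dft_magnitudes(side)   # per-bin values that could drive colouring
```

Since both channels carry the same tone here, the side channel is silent and all the energy shows up in a single bin of `height`.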

Improvements to cores

TyrQuake


User leileilol contributed a very cool feature to TyrQuake, Quake 64-style RGB colored lighting, except done in software.

To be able to use this feature, you need to create a subdir in your Quake data directory called ‘maps’, and you need to move ‘.lit’ files to this directory. These are the lighting map files that the Tyrquake core will use in order to determine how light should be positioned.

From there, load the TyrQuake core, go to Quick Menu -> Options, and enable Colored Lighting. Restart the core, and if your files are placed correctly, you should now see the difference.
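If you have many ‘.lit’ files to place by hand, the manual step can be scripted. The sketch below is one way to do it; the helper name, the ‘id1’ data directory and the file names in the demonstration are assumptions for illustration, so substitute your own paths.

```python
import shutil
import tempfile
from pathlib import Path


def install_lit_files(source_dir, quake_data_dir):
    """Move every .lit file from source_dir into <quake_data_dir>/maps."""
    maps_dir = Path(quake_data_dir) / "maps"
    maps_dir.mkdir(parents=True, exist_ok=True)  # create 'maps' if missing
    moved = []
    for lit_file in sorted(Path(source_dir).glob("*.lit")):
        shutil.move(str(lit_file), str(maps_dir / lit_file.name))
        moved.append(lit_file.name)
    return moved


# Demonstration in a throwaway directory; paths and file names are made up.
with tempfile.TemporaryDirectory() as tmp:
    downloads = Path(tmp) / "downloads"
    quake_data = Path(tmp) / "quake" / "id1"
    downloads.mkdir(parents=True)
    quake_data.mkdir(parents=True)
    (downloads / "e1m1.lit").write_bytes(b"")
    (downloads / "readme.txt").write_text("not a lit file")

    print(install_lit_files(downloads, quake_data))
    print(sorted(p.name for p in (quake_data / "maps").iterdir()))
```

Note that the glob pattern is case-sensitive on most Linux filesystems, so files named ‘.LIT’ would need a second pattern.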

Be aware that to do this, the game renderer switches to 24-bit RGB rendering, which makes things significantly slower, although it should still be fairly playable even at higher resolutions.

View the image gallery here.

To download the colored lighting pack, go to ‘Add Content’ -> ‘Download Content’. Go to ‘Tyrquake’, and download ‘quake-colored-lighting-pack.zip’. The zip should be extracted to your Downloads directory, inside the Quake directory. From there, you can just load Quake, and the colored lighting maps should be found, provided the ‘Colored Lighting’ option has been enabled.

SNES9x emulator input lag reduction

A user on our forum, Brunnis, began investigating input latency and found that there were significant gains to be made in Super Nintendo emulators by rescheduling when input polling and video blitting are performed. Based upon these findings, and after some pull requests made to SNES9x, SNES9x Next, and FCEUmm, at least 1 to 2 frames of input lag should now be shaved off.
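To see why the ordering matters at all, consider the deliberately simplified toy model below. It is not how SNES9x or any other core is actually structured: one loop iteration stands for one displayed frame, and the emulated game is assumed to react instantly to whatever input it was last given. The function name and parameters are hypothetical.

```python
def frames_of_lag(poll_before_emulation, press_frame=3, total_frames=10):
    """Count frames between a button press and the first frame that reacts.

    Toy model: one loop iteration is one displayed frame, and the emulated
    game reacts instantly to whatever input state it was last given.
    """
    latched = False  # input state most recently handed to the emulated game
    for frame in range(total_frames):
        pressed = frame >= press_frame  # physical button state this frame
        if poll_before_emulation:
            latched = pressed   # poll first, so the frame sees fresh input
            reacted = latched
        else:
            reacted = latched   # emulate using input polled last frame
            latched = pressed   # poll afterwards, too late for this frame
        if reacted:
            return frame - press_frame
    return None


print("poll before emulation:", frames_of_lag(True), "frame(s) of lag")
print("poll after emulation:", frames_of_lag(False), "frame(s) of lag")
```

Even in this toy version, polling at the wrong point in the loop costs a full frame, which is the kind of saving the rescheduling work goes after.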

Do read this highly interesting forum thread that led to these improvements here.

News for iOS 10 beta users

There is now a separate version for iOS 10 users. Apple has once again changed a lot of things, which makes it even more difficult for us to distribute RetroArch the regular way.

Dynamic library cores can no longer be opened from the Documents directory of the app in iOS 10. They can be opened from the app bundle, as long as they are code-signed. This reverts to the previous behavior of RetroArch, where the cores need to be in the modules directory of the app bundle.

Go to this directory:

https://github.com/libretro/RetroArch/tree/master/pkg/apple

and open RetroArch_iOS10.xcodeproj inside Xcode.

Note – you will need to manually compile the cores, sign them, and drag them over to the modules directory inside Xcode.

Example –

1. Download and build a core with libretro-super.

A quick example (type this on the command line):

git clone https://github.com/libretro/libretro-super.git

cd libretro-super

./libretro-fetch.sh 2048

./libretro-build.sh 2048

This will compile the 2048 core inside /dist/ios.

2. Move the contents of this directory over to the ‘modules’ directory inside the RetroArch iOS 10 Xcode project. Xcode should presumably handle signing by itself.

Bugfixes/other miscellaneous things

  • Stability/memory leak fixes – We subjected RetroArch to numerous Valgrind/Coverity/Xcode memory leak checks in order to fix a plethora of memory leaks that had reared their ugly heads in between releases. We pretty much eliminated all of them. Not a sexy feature to brag about, but it involved lots of sweat, tears and effort, and the ramifications it has for the overall stability of the program are considerable.
  • There were some problems with Cg and GLSL shader selections which should now be taken care of.
  • ScummVM games can now be scanned in various ways (courtesy of RobLoach)
  • Downloading multiple updates at once could crash RetroArch – now fixed.
  • Several cores have gotten Retro Achievements support now. The official list of systems that support achievements now is: Mega Drive, Nintendo 64, Super Nintendo, Game Boy, Game Boy Advance, Game Boy Color, NES, PC Engine, Sega CD, Sega 32X, and Sega Master System.
  • You can now turn the supported extensions filter on or off from the file browser.

Efforts to address user experience feedback

I think a couple of things should be addressed first and foremost. First, there is every intent to build a WIMP (Windows, Icons, Menus, Pointer) interface around RetroArch. To this end, we are starting to write cross-platform UI widget toolkit code that will make it easy for us to target Qt/GTK/Win32 UI/Cocoa in one fell swoop.

We have also spent a lot of time plugging some of the rough edges around RetroArch and making the user interface more pleasurable to work with.

YouTube libretro channel

Hunterk/hizzlekizzle is going to be running the libretro YouTube channel from now on, and we’ll start putting up quick and direct YouTube videos there on how to use RetroArch. It is our intent that this will do a couple of things:

1. It shows people that RetroArch is easy to use and has numerous great features beneath the surface too.
2. It allows users to give constructive criticism and feedback on the UI operations they see and how they think these should be improved.
3. We hope to engage some seasoned C/C++ coders to help us get some of these UI elements done sooner rather than later. RetroArch development mostly relies on a handful of guys – 5 at the most. It is a LOT of hard work for what amounts to a hobbyist project, and if we had a lot more developers seasoned in C/C++, stuff could be done quicker.
4. There is no intention at all to make RetroArch ‘obtuse’ for the sake of it; there is every intention to make it more accessible for people. Additional help would go a very long way towards that.

Regarding the current UIs and their direction, they are obviously meant to be a console-like UI experience. This might not be what desktop users are used to on their PCs, but it is what we designed menu drivers like XMB to be. It is true that keyboard and mouse are mostly treated as afterthoughts in this UI, but we wrote the UI with game consoles in mind, where a gamepad is the primary input device at all times, particularly since a keyboard is to us a poor way of playing these console-based games anyway.

Anyway, menu drivers like XMB and MaterialUI will never have any WIMP UI elements. HOWEVER, in upcoming versions, we will be able to flesh out the menubar and to allow for more basic WIMP UI elements.

RetroArch is meant to be a cutting-edge program that is ultra-powerful in terms of features. With that comes a bit of added complexity. However, we have every intent of making things easier, and with every release we put a lot of time and effort into improving things. But again, more developers would help substantially in speeding up certain parts that we are working on.

Our vision for the project involves an enormous workload and we’re considering different ways of generating additional support. If a Patreon might allow us to get more developers and get more stuff done faster, we might consider it. But we want such things to be carefully deliberated by both our internal development staff and the users at large. I hope you’ll be able to look past the relative rough edges around the program and appreciate the scope and the craft we have poured into it. Please appreciate that we are pouring a lot of blood, sweat and tears into the program and that we mostly try to maintain a stiff upper lip when faced with all the criticism, but we do care and we do intend to do better. Volunteer coders who have some time to spare and who want to make a difference are very welcome. We ask for your understanding here, and we hope that by finally speaking out on this, users can gain a better understanding of our intent and be able to appreciate the program better in light of that.