Cores progress report – Catering to high-end desktops – Dolphin libretro core and others now support resolutions of 8K and up!

Soul Calibur 2 running on the Dolphin core. Internal resolution is 12K, which gets downsampled to a 4K desktop resolution through Nvidia DSR.
Here at RetroArch/libretro, we have always insisted on catering to both the low end and the high end. To that end, we make our design decisions from this perspective: whatever we do shouldn’t come at the cost of worse performance on the lower-specced hardware that we still support.

Newer generation emulators are increasingly catering to the high end and almost demand it by virtue of them being based on much more recent videogame systems. While testing RetroArch and various libretro cores on our new high-end Windows desktop PC, we noticed that we could really take things up a few notches to see what we could get out of the hardware.

Dolphin

While working on the Dolphin libretro core some more, we stumbled upon the issue that internal resolution increases were still not working properly. So while fixing that in the latest core, we felt that the default scaled resolution choices that Dolphin provides (up to 8x native resolution) weren’t really putting any stress on our Windows development box (a Core i7 7700K equipped with a Titan XP).

So, in the process we added some additional resolution options so you can get up to 12K. The highest possible resolution right now is 19x (12160×10032).

As for performance results, even at the highest 19x resolution, the average framerate was still around 81fps, although there were some frame drops here and there, and I found it generally safer to dial the internal resolution down to a more conservative 12x or 15x instead. 12x resolution would be 7680×6336, which is still well over 8K resolution.

Note that the screenshots here are compressed and they are downscaled to 4K resolution, which is my desktop resolution. This desktop resolution in turn is an Nvidia DSR custom resolution, so it effectively is a 4K resolution downsampled to my 1080p monitor. From that, I am running RetroArch with the Dolphin core. With RetroArch, downscaling is pretty much implicit and works on the fly, so through setting the internal resolution of the EFB framebuffer, I can go beyond 4K (unlike most games which just query the available desktop resolutions).





We ran some performance tests on Soul Calibur 2 with an uncapped framerate. Test box is a Core i7 7700k with 16GB of DDR4 3000MHz RAM, and an Nvidia Titan XP video card. We start out with the base 8x (slightly above 4K Ultra HD) resolution which is the highest integer scaled resolution that Dolphin usually supports. If you want to go beyond that on regular Dolphin, you have to input a custom resolution. Instead, we made the native resolution scales go all the way up to 19x.

On the Nvidia Control panel, nearly everything is maxed out – 8x anti-aliasing, MFAA, 16x Anisotropic filtering, FXAA, etc.

| Resolution | Performance (with OpenGL) | Performance (with Vulkan) |
| --- | --- | --- |
| 8x (5120×4224) [for 5K] | 166fps | 192fps |
| 9x (5760×4752) | 165fps | 192fps |
| 10x (6400×5280) | 164fps | 196fps |
| 11x (7040×5808) | 163fps | 197fps |
| 12x (7680×6336) [for 8K] | 161fps | 193fps |
| 13x (8320×6864) | 155fps | 193fps |
| 14x (8960×7392) | 152fps | 193fps |
| 15x (9600×7920) [for 9K] | 139fps | 193fps |
| 16x (10240×8448) [for 10K] | 126fps | 172fps |
| 17x (10880×8976) | 115fps | 152fps |
| 18x (11520×9504) [for 12K] | 102fps | 137fps |
| 19x (12160×10032) | 93.4fps | 123fps |
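The resolutions in the table are the scale factor applied to a 640×528 base, which is what the 8x row (5120×4224) implies. A minimal Python sketch of that arithmetic follows; `BASE_WIDTH`, `BASE_HEIGHT` and `scaled_resolution` are illustrative names, not identifiers from the Dolphin core.

```python
# Base size implied by the table above: 5120x4224 at 8x gives 640x528.
BASE_WIDTH = 640
BASE_HEIGHT = 528


def scaled_resolution(scale):
    """Return (width, height) of the internal framebuffer at an integer scale."""
    if scale < 1:
        raise ValueError("scale must be at least 1")
    return BASE_WIDTH * scale, BASE_HEIGHT * scale


if __name__ == "__main__":
    for scale in (8, 12, 19):
        width, height = scaled_resolution(scale)
        print(f"{scale}x -> {width}x{height}")
```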

OpenLara

OpenLara running at over 16K

The OpenLara core was previously capped at 1440p (2560×1440). We have now added resolutions of up to 16K.

Resolution Performance
2560×1440 [for 1440p/2K] 642fps
3840×2160 [for 4K] 551fps
7680×4320 [for 8K] 407fps
15360×8640 [for 16K] 191fps
16000×9000 176fps

Craft

Craft core running at over 16K

Previously, the Craft core supported only up to 1440p. Now it supports up to 16K and slightly higher.

For the Craft core, we are setting the ‘draw distance’ to 32, which is the highest draw distance available to this core. With the draw distance set this far back, you can even see some pop-in right now (terrain that is not yet rendered and will only be shown once the viewer gets closer to it).

Resolution Performance
2560×1600 [for 1440p/2K] 720fps
3840×2160 [for 4K] 646fps
7680×4320 [for 8K] 441fps
15360×8640 [for 16K] 190fps
16000×9000 168fps

Parallel N64 – Angrylion software renderer

This scene serves as our benchmark test for both the software Angrylion renderer as well as the Vulkan-based Parallel renderer.

So accurate software-based emulation of the N64 has remained an elusive pipe dream for decades. However, it seems things are finally changing now on high-end hardware.

This test was conducted on an Intel i7 7700K running at Boost Mode (4.80GHz). We are using both the OpenGL video driver and the Vulkan video driver for this test, and we are running the game Super Mario 64. The exact spot we are testing is the Princess Peach castle courtyard.

Super Mario 64

| Description | Performance (with OpenGL) | Performance (with Vulkan) |
| --- | --- | --- |
| Angrylion [no VI filter] | 73fps | 75fps |
| Angrylion [with VI filter] | 61fps | 63fps |

Quake 64

| Description | Performance (with OpenGL) | Performance (with Vulkan) |
| --- | --- | --- |
| Angrylion [no VI filter] | 81fps | 82.5fps |
| Angrylion [with VI filter] | 68fps | 72fps |

Killer Instinct Gold

| Description | Performance (with OpenGL) | Performance (with Vulkan) |
| --- | --- | --- |
| Angrylion [no VI filter] | 57.9fps | 58.7fps |
| Angrylion [with VI filter] | 54.6fps | 55fps |

GoldenEye 007

Tested at the Dam level – beginning

| Description | Performance (with OpenGL) | Performance (with Vulkan) |
| --- | --- | --- |
| Angrylion [no VI filter] | 54.9fps | 43.8fps |
| Angrylion [with VI filter] | 45.6fps | 40.9fps |

Note that we are using the cxd4 RSP interpreter which, despite the SSE optimizations, would still be pretty slow compared to any RSP dynarec, so these results are impressive to say the least. There are games which dip more than this – for instance, Killer Instinct Gold can run at 48fps on the logo title screen, but on average, if you turn off VI filtering, most games should run at fullspeed with this configuration.

In case you didn’t notice already, Vulkan doesn’t really benefit us much when we do plain software rendering. We are talking maybe a conservative 3fps increase with VI filtering, and about 2fps or maybe even a bit less with VI turned off. Not much to brag about, but it could help in case you barely get 60fps and you need a 2+ fps boost to avoid v-sync stutters.

Oddly enough, the sole exception to this is GoldenEye 007, where the tables are actually turned, and OpenGL actually leaps ahead of Vulkan quite significantly, conservatively by about 5fps with VI filter applied, and even higher with no VI filter. I tested this many times over to see if there was maybe a slight discrepancy going on, but I got the exact same results each and every time.
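To make the OpenGL versus Vulkan comparison concrete, here is a small Python sketch that computes the per-game difference from the Angrylion figures listed above. The numbers are copied from those tables; `ANGRYLION_FPS` and `vulkan_gain` are names made up for this illustration.

```python
# (OpenGL fps, Vulkan fps) pairs copied from the Angrylion tables above.
ANGRYLION_FPS = {
    "Super Mario 64": {"no VI filter": (73.0, 75.0), "with VI filter": (61.0, 63.0)},
    "Quake 64": {"no VI filter": (81.0, 82.5), "with VI filter": (68.0, 72.0)},
    "Killer Instinct Gold": {"no VI filter": (57.9, 58.7), "with VI filter": (54.6, 55.0)},
    "GoldenEye 007": {"no VI filter": (54.9, 43.8), "with VI filter": (45.6, 40.9)},
}


def vulkan_gain(opengl_fps, vulkan_fps):
    """Frames per second gained (positive) or lost (negative) by using Vulkan."""
    return round(vulkan_fps - opengl_fps, 1)


if __name__ == "__main__":
    for game, modes in ANGRYLION_FPS.items():
        for mode, (opengl_fps, vulkan_fps) in modes.items():
            print(f"{game} [{mode}]: {vulkan_gain(opengl_fps, vulkan_fps):+.1f}fps")
```

A negative value means OpenGL was faster, which only happens for GoldenEye 007 in this data set.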

Parallel N64 – Parallel Vulkan renderer

Quake 64 on Parallel N64 - tested with both Angrylion and Parallel

So we have seen how software-based LLE RDP rendering runs. This puts all the workload on the CPU. So what if we reverse the situation and put it all on the GPU instead? That is essentially the promise of the Parallel Vulkan renderer. So let’s run the same tests on it.

This test was conducted on an Intel i7 7700K running at Boost Mode (4.80GHz). We are using the Vulkan video driver for this test, and we are running the game Super Mario 64. The exact spot we are testing is the Princess Peach castle courtyard.

Super Mario 64

Description Performance
With synchronous RDP 192fps
Without synchronous RDP 222fps

Quake 64

Description Performance
With synchronous RDP 180fps
Without synchronous RDP 220fps

Killer Instinct Gold

Description Performance
With synchronous RDP 174fps
Without synchronous RDP 214fps

GoldenEye 007

Tested at the Dam level – beginning

Description Performance
With synchronous RDP 88fps
Without synchronous RDP 118fps

As you can see, performance nearly doubles when going from Angrylion to Parallel renderer with synchronous RDP enabled, and beyond with it disabled. Do note that asynchronous RDP is regarded as a hack and it can result in many framebuffer oriented glitches among other things, so it’s best to run with synchronous RDP for best results.
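How large the gain is differs per game. The sketch below computes the ratio between the Parallel renderer (synchronous RDP) and the fastest Angrylion figure for each game, which is a deliberately conservative pairing chosen for this example. The numbers come from the tables above; `RESULTS` and `speedup` are illustrative names.

```python
# (fastest Angrylion fps, Parallel fps with synchronous RDP) from the tables above.
RESULTS = {
    "Super Mario 64": (75.0, 192.0),
    "Quake 64": (82.5, 180.0),
    "Killer Instinct Gold": (58.7, 174.0),
    "GoldenEye 007": (54.9, 88.0),
}


def speedup(angrylion_fps, parallel_fps):
    """How many times faster Parallel ran than Angrylion, rounded to 2 places."""
    return round(parallel_fps / angrylion_fps, 2)


if __name__ == "__main__":
    for game, (angrylion_fps, parallel_fps) in RESULTS.items():
        print(f"{game}: {speedup(angrylion_fps, parallel_fps)}x")
```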

We are certain that by using the LLVM RSP dynarec, the performance difference between Angrylion and Parallel would widen even further. Even though there are still a few glitches and omissions in the Parallel renderer compared to Angrylion, it’s clear that there is a lot of promise to this approach of putting the RDP on the GPU.

Conclusion: It’s quite clear that even on a quad-core 4.8GHz i7 CPU, the CPU ‘nearly’ manages to run most games with Angrylion [software] at fullspeed but it doesn’t leave you with a lot of headroom really. Moving it to the GPU [through Parallel RDP] results in a doubling of performance with the conservative synchronous option enabled and even more if you decide to go with asynchronous mode (buggier but faster).

Beetle PSX

Previously, Beetle PSX would only provide internal resolution increases up to 8 times the original resolution. We have now extended this to 32x for software and Vulkan, and 16x for OpenGL.

The results are surprising – while the Vulkan renderer is far more mature than the OpenGL renderer and implements the mask bit unlike the GL renderer (along with some other missing bits in the current GL renderer), the GL renderer leaps ahead in terms of performance at nearly every resolution.

Crash Bandicoot

Crash Bandicoot running at over 10K. Note this is being downsampled to 4K.

Crash Bandicoot is a game that ran at a resolution of 512×240.

| Resolution | Performance (with OpenGL) [with PGXP] | Performance (with OpenGL) [w/o PGXP] | Performance (with Vulkan) [with PGXP] | Performance (with Vulkan) [w/o PGXP] | Performance (software OpenGL) | Performance (software Vulkan) |
| --- | --- | --- | --- | --- | --- | --- |
| 8192×3840 [16x] [for 5K] | 188.8fps | 266fps | 217fps | 239fps | 4.4fps | 5.3fps |
| 4096×1920 [8x] [for 2K] | 216fps | 296fps | 218fps | 240fps | 16fps | 17.5fps |
| 2048×960 [4x] | 215fps | 296fps | 216fps | 239fps | 52fps | 57.9fps |
| 1024×480 [2x] | 216fps | 296fps | 216fps | 239fps | 138fps | 145fps |

Tekken 3

Tekken 3 running at over 10K, being downsampled to 4K.

Tekken 3 is a game that ran at a resolution of 368×480.

| Resolution | Performance (with OpenGL) [with PGXP] | Performance (with OpenGL) [w/o PGXP] | Performance (with Vulkan) [with PGXP] | Performance (with Vulkan) [w/o PGXP] | Performance (software OpenGL) | Performance (software Vulkan) |
| --- | --- | --- | --- | --- | --- | --- |
| 11776×15360 [32x] [for 12K] | N/A | N/A | 127fps | 127.4fps | N/A | N/A |
| 5888×7680 [16x] [for 4K] | 188.5fps | 266fps | 184.4fps | 211fps | 4.4fps | 6.6fps |
| 2944×3840 [8x] [for 2K] | 186.5fps | 208fps | 183.5fps | 269fps | 22fps | 25.2fps |
| 1472×1920 [4x] | 184.5fps | 270fps | 230.5fps | 210fps | 52fps | 59.4fps |
| 736×960 [2x] | 232fps | 271fps | 185.5fps | 210fps | 129fps | 137fps |

Reicast

Dead or Alive 2 running at over 12K resolution on Reicast

Daytona USA 2001 running at over 12K resolution on Reicast

Sonic Adventure running at over 12K resolution on Reicast

Dead or Alive 2

Description Performance
4480×3360 206fps
5120×3840 206fps
5760×4320 206fps
6400×4800 204fps
7040×5280 206fps
7680×5760 206fps
8320×6240 204fps
8960×6720 204fps
9600×7200 207fps
10240×7680 206fps
10880×8160 207fps
11520×8640 207fps
12160×9120 194fps
12800×9600 193fps

As you can see, it isn’t until we reach 12160×9120 that Reicast’s performance finally drops from an almost constant 206/207fps to a somewhat lower value. Do note that every measurement was taken in the same environment. When alpha effects and RTT (Render to Texture) effects are being applied onscreen, there may well be dips at resolutions above 8K, whereas 8K and below should handle them with relative ease.

Mupen64plus – GlideN64 OpenGL renderer

Super Mario 64 running at 8K resolution with GlideN64.

This core uses Mupen64plus as the core emulator plus the GlideN64 OpenGL renderer.

Super Mario 64

Description Performance
3840×2880 – no MSAA 617fps
3840×2880 – 2x/4x MSAA 181fps
4160×3120 – no MSAA 568fps
4160×3120 – 2x/4x MSAA 112fps
4480×3360 – no MSAA 538fps
4480×3360 – 2x/4x MSAA 103fps
4800×3600 – no MSAA 524fps
4800×3600 – 2x/4x MSAA 94fps
5120×3840 – no MSAA 486fps
5120×3840 – 2x/4x MSAA 82fps
5440×4080 – no MSAA 199fps
5440×4080 – 2x/4x MSAA 80fps
5760×4320 – no MSAA 194fps
5760×4320 – 2x/4x MSAA 74fps
6080×4560 – no MSAA 190fps
6080×4560 – 2x/4x MSAA 68fps
6400×4800 – no MSAA 186fps
6400×4800 – 2x/4x MSAA 61.3fps
7680×4320 – no MSAA 183fps
7680×4320 – 2x/4x MSAA 39.4fps

GoldenEye 007

Tested at the Dam level – beginning

Description Performance
3840×2880 – no MSAA 406fps
3840×2880 – 2x/4x MSAA 100fps
4160×3120 – no MSAA 397fps
4160×3120 – 2x/4x MSAA 65fps
4480×3360 – no MSAA 375fps
4480×3360 – 2x/4x MSAA 60fps
4800×3600 – no MSAA 342fps
4800×3600 – 2x/4x MSAA 54fps
5120×3840 – no MSAA 310fps
5120×3840 – 2x/4x MSAA 51fps
5440×4080 – no MSAA 70fps
5440×4080 – 2x/4x MSAA 46fps
5760×4320 – no MSAA 78.9fps
5760×4320 – 2x/4x MSAA 42fps
6080×4560 – no MSAA 86fps
6080×4560 – 2x/4x MSAA 37fps
6400×4800 – no MSAA 79fps
6400×4800 – 2x/4x MSAA 27fps
7680×4320 – no MSAA 79fps
7680×4320 – 2x/4x MSAA 33.2fps

Preface: Immediately after going beyond 3840×2880 (slightly higher than 4K), we notice that turning on MSAA results in several solid black strips being rendered where there should be textures and geometry. We also notice that enabling MSAA incurs a huge performance hit, and it doesn’t matter whether you apply 2 or 4 samples; it is uniformly slow. There are also several bottlenecks in rendering throughput: in GoldenEye 007, as soon as we move from 5120×3840 to 5440×4080 (a relatively minor bump), we suddenly go from 310fps to 70fps, a huge dropoff. Suffice it to say, while you can play with Reicast (Dreamcast emulator) and Dolphin (GameCube/Wii) at 8K without effort and even have enough headroom to go all the way to 12K, don’t try this anytime soon with GlideN64.

We suspect there are several huge bottlenecks in this renderer that prevent it from reaching higher performance, especially since people on 1060s have also complained about less than stellar performance. That being said, GlideN64 has certain advantages over Glide64: it emulates certain FBO effects which Glide64 doesn’t. It is also less accurate than Glide64 in other areas, so you have to pick your poison on a per-game basis.

We still believe that the future of N64 emulation relies more on accurate renderers like Parallel RDP, which are not riddled with per-game hacks, than on the traditional HLE RDP approach as seen in GlideN64 and Glide64. Nevertheless, people love their internal resolution upscaling, so there will always be a built-in audience for these renderers, and it’s always nice to have choices.

Shader Changes

Abstract

GLSL shaders now preferred over Cg when possible
Update to latest RetroArch for compatibility with updated GLSL shaders

Cg shaders demoted, GLSL promoted to first-class

Portability and compatibility are major goals for RetroArch and libretro, so we invested heavily in Nvidia’s Cg shader language, which worked natively anywhere their Cg Toolkit framework was available (that is, Windows, Linux and Mac OS X), as well as on PS3 and Vita, and could be machine-compiled to messy-but-usable GLSL (lacking a few features, such as runtime parameters) for platforms that lacked the framework (primarily ARM / mobile platforms). Cg was also so close to Microsoft’s HLSL shader language that many Cg shaders will compile successfully with HLSL compilers, such as those available with Windows’ D3D driver and on Xbox 360.

This was great for us because we could write shaders once and have them work pretty much everywhere.
Sadly, Nvidia deprecated the Cg language in 2012, which left us in a bad spot. Since then, we’ve been limping along with the same strategy as before, but with the uneasy understanding that Nvidia could stop supplying their Cg Toolkit framework at any time. Rather than sit idly by, waiting for that other shoe to drop, we took it upon ourselves to hand-convert the vast majority of our Cg shaders to native GLSL with all of the bells and whistles. TroggleMonkey’s monstrous masterpiece, CRT-Royale, still has a couple of bugs but is mostly working, along with its popular BVM-styled variant from user Kurozumi. Additionally, before this conversion, many of our Cg shaders were flaky or completely unusable on libretro-gl cores, such as Beetle-PSX-HW’s OpenGL renderer, but these native GLSL conversions should work reliably and consistently with any core/context except for those that require Vulkan (namely, ParaLLEl-N64’s and Beetle-PSX-HW’s Vulkan renderers).

With the GLSL shaders brought up to speed, we can finally join Nvidia in deprecating Cg, though it will still remain as an option–that is, we’re not *removing* support for Cg shaders or contexts at this point–and we will continue to use it where there is no other choice; namely, Windows’ D3D driver and the Xbox 360, PS3 and Vita ports. Moving forward, our focus for shaders will be on native GLSL and our slang/Vulkan formats, though we will likely still port some to Cg from time to time.

RetroArch now correctly handles #version directives in GLSL shaders; GLSL shader repo updated to match

There have been a number of updates to the GLSL shader language/spec over its long life, and shader authors can use #version directives (that is, a line at the top of the shader that says #version 130 or whatever) to tell compilers which flavor/version of GLSL is required for that shader. However, RetroArch has long had a strange behavior whereby it injected a couple of lines at the beginning of all GLSL shader files at compile time, and this broke any shader that attempted to use a #version directive, since those directives must be on the first line of the shader. This meant that our shaders couldn’t use #version directives at all, and all of our shaders lacked #version directives until very recently for this reason. These #version-less GLSL shaders are still perfectly compliant GLSL because GLSL v1.10 didn’t support directives, either, but the necessity of leaving off the #version started to cause some problems as we whipped our GLSL shader library into shape.

The error caused by adding a #version directive under the old behavior.

On AMD and Nvidia GPUs, the compilers would just toss up a warning about the missing directive and still expose whatever GLSL features were available to the GPU, which worked out great. On Intel IGPs, however, the compiler tosses the error and then reverts to only exposing the features available in ancient GLSL v1.10 (released way back in 2004). As a stopgap, we gave many shaders fallback codepaths that would still work in these circumstances, but a number of other shaders were either impossible to make compatible or even the compatible result was imperfect.

So, as of this commit (courtesy of aliaspider), RetroArch will no longer reject shaders with explicit #version directives, and we have added those directives to any shaders that require them at the lowest version that still compiles/functions properly. That is, if a shader doesn’t use any features that require greater than #version 110, it will still have no #version specified, and any shader that requires #version 120 but not #version 130 will not have its requirements increased to the higher version for no reason. This should keep our GLSL shaders as compatible as possible with older hardware, and including the #versions explicitly when needed will also make it easier for other programs/developers to utilize our shaders without any unnecessary guesswork due to behind-the-scenes magic.
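The text does not describe how the commit handles this internally. As a rough illustration of the general idea only, the Python sketch below shows one way a frontend could inject its own lines into a shader while keeping an existing #version directive on the first line. `inject_after_version` and the `#define VERTEX` line are assumptions for this example, not RetroArch code.

```python
def inject_after_version(source, injected_lines):
    """Insert frontend lines into a shader, keeping any #version line first."""
    lines = source.splitlines()
    if lines and lines[0].lstrip().startswith("#version"):
        # Keep the directive on line one and inject right after it.
        head, body = lines[:1], lines[1:]
    else:
        # No directive: the injected lines can simply go first.
        head, body = [], lines
    return "\n".join(head + list(injected_lines) + body) + "\n"


if __name__ == "__main__":
    shader = "#version 130\nvoid main() {}\n"
    print(inject_after_version(shader, ["#define VERTEX"]), end="")
```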

This change does require a clean break, insofar as older versions of RetroArch will choke on the new #version directives (that is, they’ll fail to compile with the “#version must occur before any other program statement” error pictured above), so users with Nvidia or AMD GPUs must update their RetroArch installation if they want to use the updated shaders. Users with Intel IGPs will be no worse off if they don’t update, since those shaders were already broken for them, but they’ll probably *want* to update to gain access to the many fancy shaders that now work properly on their machines.

Mobile GPUs using GLES had many of the same issues that Intel IGPs had, with many shaders refusing to work without #version directives, but GLES compatibility added in a further complication: GLES requires its own separate #version directives, either #version 100 es or #version 300 es, which are different from and incompatible with desktop GL’s #versions. To get around this, we added a trick in RetroArch to change any #version of 120 or below to #version 100, which is roughly comparable in features to 120, and any #version 130 or above to #version 300 es whenever a GLES context is used. This should get everything working as effectively and consistently as possible on mobile GPUs, but if anything slipped through the cracks, be sure to file an issue report at the GLSL shader repo.
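The mapping rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the text (120 or below becomes `#version 100`, 130 or above becomes `#version 300 es`); the function name `remap_version_for_gles` is hypothetical and the real logic lives in RetroArch’s C code.

```python
import re

# Matches a whole "#version NNN ..." line, including any profile suffix.
_VERSION_LINE = re.compile(r"^[ \t]*#[ \t]*version[ \t]+(\d+)[^\n]*", re.MULTILINE)


def remap_version_for_gles(source):
    """Rewrite the first desktop GLSL #version line to its GLES counterpart."""

    def replace(match):
        version = int(match.group(1))
        return "#version 100" if version <= 120 else "#version 300 es"

    # Shaders without a directive are returned unchanged.
    return _VERSION_LINE.sub(replace, source, count=1)


if __name__ == "__main__":
    print(remap_version_for_gles("#version 130\nvoid main() {}\n"), end="")
```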

RetroArch 1.3.6 released

RetroArch keeps moving forward, being the reference frontend for libretro and all. Here comes version 1.3.6, and once again we have a lot to talk about.

Where to get it

Windows/Mac/iOS (build only)/Nintendo/PlayStation – Get it here.

Android: You can either get it from F-Droid or from Google Play Store.

Linux: Since RetroArch is included now on most mainline Linux distributions’ package management repository systems, we expect their versions to be updated to 1.3.6 shortly.

I will release versions for MacOSX PowerPC (10.5 Leopard) and 32-bit Intel MacOS X 10.6 (Snow Leopard) later on, maybe today or tomorrow.

Usability improvements

Windows Drag and Drop support

Courtesy of mudlord, with the Windows version, you can now drag and drop a ROM (or any other content) onto RetroArch’s window, and it will attempt to load the correct core for it. If there is more than one core available for the type of content you dragged and dropped, it will present you with a slidedown list of cores to select from.
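The selection behaviour described above boils down to a lookup by file extension. Here is a minimal Python sketch of that decision, assuming a made-up extension-to-core table; `EXAMPLE_CORES`, `cores_for_content` and `handle_drop` are hypothetical names and do not reflect how RetroArch stores its core info.

```python
import os

# Hypothetical table: file extension -> cores that accept it.
EXAMPLE_CORES = {
    ".sfc": ["snes_core_a", "snes_core_b"],
    ".z64": ["n64_core"],
}


def cores_for_content(path, cores_by_extension):
    """Return the cores that claim support for this file's extension."""
    extension = os.path.splitext(path)[1].lower()
    return list(cores_by_extension.get(extension, []))


def handle_drop(path, cores_by_extension):
    """Decide what to do with a dropped file: load, ask the user, or give up."""
    candidates = cores_for_content(path, cores_by_extension)
    if not candidates:
        return "unsupported", []
    if len(candidates) == 1:
        return "load", candidates
    return "choose", candidates  # present a list of cores to pick from


if __name__ == "__main__":
    print(handle_drop("game.SFC", EXAMPLE_CORES))
```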

Vastly improved content downloading features

Starting with v1.3.6, RetroArch users can download compatible freeware content, such as the shareware release of Doom, right from the app. This video goes through the steps, which include fetching the core from the online updater, fetching the content from the repository and then launching the core and content we just downloaded.

Menu customization and aesthetics – XMB and MaterialUI

RetroArch v1.3.6 adds support for a number of themes in the default mobile menu, including both bright and dark themes.

You can now also set a custom wallpaper in XMB and colorize it with a color gradient. To do this, go to Settings -> Menu, set a wallpaper, and then set ‘Menu Shader Pipeline’ to OFF. You can then choose one of the color palettes in ‘Color Theme’ to shade the background wallpaper, or just select ‘Plain’ if you don’t want to colorize it.

Undo Load/Save State

Have you ever gotten through a tough part of a game and wanted to make a savestate only to hit the “load state” button instead and have to do it all over again? Or maybe you were practicing a particularly difficult maneuver–for a speedrun, perhaps–and accidentally saved a bad run over your practice point because you hit “save state” instead of “load state”? While savestates are considered one of the great advantages to emulating retro games, they can also lead to these frustrating situations where they wipe out progress instead of saving it, all because of one slip of the finger. RetroArch now has the ability to undo a save- or load-state action through some automatic state-shuffling that happens behind the scenes, so you never have to worry about these situations again.

Undo Load State – Before the ‘current’ state is altered by e.g. a ‘Load Savestate’ operation, ‘current’ is saved in memory and ‘Undo Load State’ restores it; you can also undo this option by using it again, which will make you flip-flop between 2 states.

Undo Save State – If there was a savestate file that was overwritten, this option restores it.
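The state-shuffling behind these two options can be sketched as follows. This is a simplified Python model under stated assumptions (a single save slot held in memory, states represented as bytes); `StateManager` and its attribute names are invented for the illustration and are not RetroArch’s implementation.

```python
class StateManager:
    """Toy model of undoable save/load state with a single slot."""

    def __init__(self, current):
        self.current = current    # live emulator state
        self.slot = None          # contents of the savestate "file"
        self._undo_load = None    # state that was live before the last load
        self._undo_save = None    # slot contents overwritten by the last save

    def save_state(self):
        self._undo_save = self.slot  # remember what we are about to overwrite
        self.slot = self.current

    def load_state(self):
        if self.slot is None:
            return False
        self._undo_load = self.current  # remember the state we are leaving
        self.current = self.slot
        return True

    def undo_load_state(self):
        if self._undo_load is None:
            return False
        # Swap, so that using the option again flips back to the other state.
        self.current, self._undo_load = self._undo_load, self.current
        return True

    def undo_save_state(self):
        if self._undo_save is None:
            return False
        self.slot = self._undo_save
        self._undo_save = None
        return True
```

The swap in `undo_load_state` is what gives the flip-flop between two states mentioned above.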

New Features

The main event of RetroArch 1.3.6 is obviously the fact that it makes it possible to run the N64 Vulkan core, paraLLEl. Previous versions of RetroArch will not be able to run this because of the new extensions to libretro Vulkan which we had to push to make this renderer possible.

Vulkan

Async compute core support – ready for ParaLLEl

It was already possible to run Vulkan-enabled libretro cores, but with this release, a few crucial features have been added: support for queue transfers and a context negotiation interface.

With this we can now use multiple queues to overlap compute and shading in the frontend level, i.e. asynchronous compute. ParaLLEl would certainly not have been as fast or as effective were it not for this.

ParaLLEl now joins triple-A games like Rise of the Tomb Raider and Doom in heavily relying on Vulkan’s async compute capabilities for maximum efficiency. A test core was also written as a proof of concept for this interface.

If you want to read more about ParaLLEl, we have a compendium blog post for you to digest here.

Supports Windows, Linux, Android equally well now

The previous version already had Vulkan support to varying degrees, but now we feel we are finally at the point where Vulkan driver support in RetroArch is very much mature across most of the supported platforms.

Vulkan should work now on Android, on Windows, and on Linux, provided your GPU has a working Vulkan driver.

On Linux we now support even more video driver context features, such as VK_KHR_display support. This is a platform-agnostic KMS-like backend for Vulkan, which should allow you to run RetroArch with Vulkan without the need of an X11 or Wayland server running.

On Windows and Android, we include Vulkan support now. Vulkan has been tested on Android with NVIDIA Shield Tablet/Console, and both work. Be aware that there are some minuscule things which might not work correctly yet with Vulkan on Android. For instance, orientation changing still doesn’t work. This will be investigated.

Max swapchain images – driving latency even lower with Vulkan and friends

RetroArch already has built up quite a reputation for itself for being able to drive latency down to very low levels. But with new technologies, there is always room for improvement.

A maximum swapchain image count setting has now been implemented for both the DRM/KMS context driver for OpenGL (usable on Linux) and Vulkan. This means you can programmatically ask your video card for triple buffering (3), double buffering (2) or single buffering (1). The previous default with DRM/KMS was 3 (triple buffering), so setting it to 2 could potentially shave off at least 1 frame of latency (as was verified by others). Setting it to 1 usually won’t get you single buffering with most monitors and drivers because of tearing; they will fall back to double buffering (2).
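To put a number on “1 frame”, here is a back-of-the-envelope Python sketch. It assumes a 60Hz display and that each extra queued swapchain image costs up to one refresh period; both are simplifications for illustration, and the helper names are made up.

```python
def frame_time_ms(refresh_hz):
    """Duration of one refresh period in milliseconds."""
    return 1000.0 / refresh_hz


def latency_saved_ms(old_images, new_images, refresh_hz=60.0):
    """Rough latency saved by requesting fewer swapchain images."""
    return (old_images - new_images) * frame_time_ms(refresh_hz)


def effective_images(requested, minimum_supported=2):
    """Model the fallback described above: a request for 1 may become 2."""
    return max(requested, minimum_supported)


if __name__ == "__main__":
    saved = latency_saved_ms(3, effective_images(2))
    print(f"3 -> 2 images saves about {saved:.1f}ms at 60Hz")
```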

With Vulkan, RetroArch can programmatically tell the video card which buffering method it would like to use, a vast improvement over the nonexistent options that existed before with OpenGL (from a platform-agnostic perspective).

What Vulkan brings to the table on Android

Vulkan has been tested to run on Android devices that support Vulkan, like Shield Tablet/Console. Latency has always been very bad on Android in the past. With Vulkan, frame times are significantly lower than with OpenGL, and we no longer have to leave Threaded Video enabled by default. Instead, we can turn off Threaded Video and let RetroArch monitor the refresh rate dynamically, which is the more desirable solution since it allows for less jittery screen updates.

Audio latency can also be driven down significantly now with Vulkan. The current default is 128ms; with Vulkan we can drive it down to 64ms or even 32ms.
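For a sense of scale, an audio latency in milliseconds translates directly into the number of sample frames that sit in the buffer. The Python sketch below assumes a 48kHz output rate, which is an assumption for the example and not a figure from this post.

```python
def latency_to_frames(latency_ms, sample_rate_hz=48000):
    """Number of audio frames buffered for a given latency."""
    return sample_rate_hz * latency_ms // 1000


if __name__ == "__main__":
    for latency_ms in (128, 64, 32):
        print(f"{latency_ms}ms -> {latency_to_frames(latency_ms)} frames")
```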

Couple this with the aforementioned swapchain images support and there are multiple ways to drive latency down on Android now.

OpenGL music visualizer (for FFmpeg-enabled builds)

Versions of RetroArch like the Linux and Windows port happen to feature built-in integrated FFmpeg support, which allows you to watch movies and listen to music from within the confines of RetroArch.

We have added a music visualizer now. The scene is drawn as a cylindrical mesh with FFT (Fast Fourier Transform) heightmap lookups. Different colors are shaded using mid/side channels as well as left/right information for height.
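As a rough illustration of the data that feeds such a visualizer, the Python sketch below derives mid/side channels from a stereo block and computes per-bin magnitudes. A naive DFT stands in for a real FFT to keep the example dependency-free, and the split into “height” and “colour” values is only an interpretation of the description above; all names are hypothetical.

```python
import cmath
import math


def mid_side(left, right):
    """Convert left/right samples to mid (sum) and side (difference) channels."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side


def dft_magnitudes(samples):
    """Magnitude of each frequency bin below Nyquist (naive O(n^2) DFT)."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        magnitudes.append(abs(acc) / n)
    return magnitudes


def visualizer_bins(left, right):
    """Per-bin heights from left/right and colour weights from mid/side."""
    mid, side = mid_side(left, right)
    return {
        "height_left": dft_magnitudes(left),
        "height_right": dft_magnitudes(right),
        "colour_mid": dft_magnitudes(mid),
        "colour_side": dft_magnitudes(side),
    }
```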

Note that this requires at least GLES3 support (which is available as well through an extension which most GPUs should support by now).

Improvements to cores

TyrQuake


User leileilol contributed a very cool feature to TyrQuake, Quake 64-style RGB colored lighting, except done in software.

To use this feature, create a subdirectory called ‘maps’ in your Quake data directory and move your ‘.lit’ files into it. These are the lighting map files that the TyrQuake core will use in order to determine how light should be positioned.

From there, load up the TyrQuake core, go to Quick Menu -> Options, and enable Colored Lighting. Restart the core and, if your files are placed correctly, you should now see the difference.
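The file placement step described above can be scripted. Here is a small Python sketch that creates the ‘maps’ subdirectory and moves any ‘.lit’ files into it; the function name and both directory arguments are placeholders that you would point at your own setup.

```python
from pathlib import Path
import shutil


def install_lit_files(source_dir, quake_data_dir):
    """Move every .lit file from source_dir into <quake_data_dir>/maps."""
    maps_dir = Path(quake_data_dir) / "maps"
    maps_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for lit_file in sorted(Path(source_dir).glob("*.lit")):
        target = maps_dir / lit_file.name
        shutil.move(str(lit_file), str(target))
        moved.append(target)
    return moved
```

Files with other extensions are left where they are.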

Be aware that in order to do this, the game renderer shifts to 24bit color RGB rendering, and this in turn makes things significantly slower, although it should still be fairly playable even at higher resolutions.

View the image gallery here.

To download this, go to ‘Add Content’ -> ‘Download Content’. Go to ‘Tyrquake’, and download ‘quake-colored-lighting-pack.zip’. This should extract the zip to your Downloads dir, inside the Quake directory. From there, you can just load Quake and the colored lighting maps should be found, provided the ‘Colored Lighting’ option has been enabled.

SNES9x emulator input lag reduction

A user on our forum, Brunnis, began some investigations into input latency and found that there were significant gains to be made in Super Nintendo emulators by rescheduling when input polling and video blitting are being performed. Based upon these findings and after some pull requests made to SNES9x, SNES9x Next, and FCEUmm, at least 1 to 2 frames of input lag should be shaved off now.

Do read this highly interesting forum thread that led to these improvements here.

News for iOS 10 beta users

There is now a separate version for iOS 10 users. Apple once again changed a lot of things which makes it even more difficult for us to distribute RetroArch the regular way.

Dynamic library cores can no longer be opened from the Documents directory of the app in iOS 10. They can be opened from the app bundle, as long as they are code-signed. This reverts to the previous behavior of RetroArch, where the cores need to be in the modules directory of the app bundle.

Go to this directory:

https://github.com/libretro/RetroArch/tree/master/pkg/apple

and open RetroArch_iOS10.xcodeproj inside Xcode.

Note – you will need to manually compile the cores, sign them, and drag them over to the modules directory inside Xcode.

Example –

1. You’d download a core with libretro-super.

A quick example (type this inside the commandline)

git clone https://github.com/libretro/libretro-super.git

cd libretro-super

./libretro-fetch.sh 2048

./libretro-build.sh 2048

This will compile the 2048 core and place it in the dist/ios directory inside libretro-super.

2. Move the contents of this directory over to the ‘modules’ directory inside the RetroArch iOS 10 Xcode solution. It should presumably handle signing by itself.
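Step 2 can be sketched as a small shell function. The name `copy_cores_to_modules` and both arguments are hypothetical, and we are assuming the built cores are `.dylib` files; pass in your own build output directory and the ‘modules’ directory of your checkout. Note that this sketch copies rather than moves, so the build output stays in place.

```shell
#!/bin/sh
# Hypothetical helper, not part of libretro-super or RetroArch.
# Usage: copy_cores_to_modules DIST_DIR MODULES_DIR
# Copies every .dylib found directly inside DIST_DIR into MODULES_DIR
# and prints how many files were copied.
copy_cores_to_modules() {
    dist_dir=$1
    modules_dir=$2
    copied=0

    mkdir -p "$modules_dir" || return 1

    for core in "$dist_dir"/*.dylib; do
        # When nothing matches, the glob stays literal; skip that case.
        [ -e "$core" ] || continue
        cp -- "$core" "$modules_dir/" || return 1
        copied=$((copied + 1))
    done

    echo "$copied"
}
```

If it prints 0, the build most likely did not produce a core in the directory you pointed it at.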

Bugfixes/other miscellaneous things

  • Stability/memory leak fixes – We subjected RetroArch to numerous Valgrind/Coverity/Xcode Memory leak checks in order to fix a plethora of memory leaks that had reared their ugly heads in between releases. We pretty much eliminated all of them. Not a sexy feature to brag about, but it involved lots of sweat, tears and effort, and the ramifications it has on the overall stability of the program are considerable.
  • There were some problems with Cg and GLSL shader selections which should now be taken care of.
  • ScummVM games can now be scanned in various ways (courtesy of RobLoach)
  • Downloading multiple updates at once could crash RetroArch – now fixed.
  • Several cores have gotten Retro Achievements support now. The official list of systems that support achievements now is: Mega Drive, Nintendo 64, Super Nintendo, Game Boy, Game Boy Advance, Game Boy Color, NES, PC Engine, Sega CD, Sega 32X, and Sega Master System.
  • You can now turn the supported extensions filter on or off from the file browser.

Efforts to address user experience feedback

I think a couple of things should be addressed first and foremost. First, there is every intent to indeed build a WIMP (Windows, Icons, Menus, Pointer) interface around RetroArch. To this end, we are starting to make crossplatform UI widget toolkit code that will make it easy for us to target Qt/GTK/Win32 UI/Cocoa in one fell swoop.

We have also spent a lot of time plugging some of the rough edges around RetroArch and making the user interface more pleasurable to work with.

YouTube libretro channel

Hunterk/hizzlekizzle is going to be running the libretro YouTube channel from now on, and we’ll start putting up quick and direct YouTube videos there on how to use RetroArch. It is our intent that this will do a couple of things:

1. Show people that RetroArch is easy to use and has numerous great features beneath the surface too.
2. It allows users to give constructive criticism and feedback on the UI operations they see and how they think they should be improved.
3. We hope to engage some seasoned C/C++ coders to help us get some of these UI elements done sooner rather than later. RetroArch development relies mostly on a handful of guys – 5 at the most. It is a LOT of hard work for what amounts to a hobbyist project, and if we had more developers seasoned in C/C++, stuff could be done quicker.
4. There is no intention at all to make RetroArch ‘obtuse’ for the sake of it; there is every intention to make it more accessible for people. Additional help would go a very long way towards that.

Regarding the current UIs and their direction, they are obviously meant to be a console-like UI experience. This might not be what desktop users are used to on their PCs, but it is what we designed menu drivers like XMB to be. It is true that keyboard and mouse are mostly seen as afterthoughts in this UI, but really, we wrote the UI with game consoles in mind, where a gamepad is the primary input device at all times, particularly since a keyboard to us is a poor way of playing these console-based games anyway.

Anyway, menu drivers like XMB and MaterialUI will never have any WIMP UI elements. HOWEVER, in upcoming versions, we will be able to flesh out the menubar and to allow for more basic WIMP UI elements.

RetroArch is meant to be a cutting-edge program that is ultra-powerful in terms of features. With that comes a bit of added complexity. However, we have every intent of making things easier, and with every release we put a lot of time and effort into improving things. But again, more developers would help substantially in speeding up certain parts that we are working on.

Our vision for the project involves an enormous workload and we’re considering different ways of generating additional support. If a Patreon might allow us to get more developers and get more stuff done faster, we might consider it. But we want such things to be carefully deliberated by both our internal development staff and the users at large. I hope you’ll be able to look past the relative rough edges around the program and appreciate the scope and the craft we have poured into it. Please appreciate that we are pouring a lot of blood, sweat and tears into the program and that mostly we try to keep a stiff upper lip when faced with all the criticism, but we do care and we do intend to do better. Volunteer coders who have some time to spare and who want to make a difference are very welcome. We ask for your understanding here, and we hope that by finally speaking out on this, users can gain a better understanding of our intent and be able to appreciate the program better in light of that.

Vulkan progress report and initial impressions

It has already been a couple of days since Vulkan launched. We were one of the first non-test programs in the world to launch on Day One with a quite mature Vulkan implementation. So far it seems only ARM Mali, nVidia and Intel Broadwell Mesa drivers can handle all the features inside RetroArch’s Vulkan backend.

This is a followup to the earlier article which can be found here.

WSI XCB support

Back when RetroArch added Vulkan support on Launch Day, there was only a working Wayland implementation.

We support Vulkan over XCB now too. This means you can now get Vulkan to run on Linux with an nVidia GPU, since their binary blob driver doesn’t support Wayland.

I have tested the XCB context successfully on an Intel Ivy Bridge GPU as well. You need DRI3 support in order for this to work.

In case the Vulkan driver does not work for you on Intel, try creating the file ‘/etc/X11/xorg.conf.d/20-intel.conf’. The file can be found here.
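The linked file is not reproduced in this post. As a rough sketch only, assuming its purpose is to turn on DRI3 for the xf86-video-intel X11 driver, a file of this kind usually looks like what the function below writes. The function name `write_intel_dri3_conf` and the Identifier string are our own illustration, and the contents are an assumption rather than a copy of the linked file.

```shell
#!/bin/sh
# Hypothetical helper; the config contents are an assumption, not the linked file.
# Usage: write_intel_dri3_conf CONF_PATH
# Writes an Xorg "Device" section that asks the 'intel' driver to use DRI3.
write_intel_dri3_conf() {
    conf_path=$1

    cat > "$conf_path" <<'EOF'
Section "Device"
    Identifier "Intel Graphics"
    Driver     "intel"
    Option     "DRI" "3"
EndSection
EOF
}
```

To try it, run the function as root with ‘/etc/X11/xorg.conf.d/20-intel.conf’ as the argument and then restart your X11 server.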

We recommend that, if you have the option to choose between the two (on some GPUs, like nVidia, you simply might not have this option), you pick Wayland over XCB and an X11 server. It’s a lot smoother.
