Introducing the RetroArch Open Hardware project


So ProjectFuture finally materialises! From the beginning, we have been unsatisfied with the general state of the retrogaming scene when it comes to being able to dump and play your own legally bought game cartridges. Solutions exist like the Retrode. There are some big issues with them though that limits their viability as something an average consumer can just buy readily off the shelf:

1. Super expensive.
2. No longer in production/out of stock
3. Rights to the product changing hands between sellers/store owners
4. Because of 3, usually one or two stores can only sell them.
5. The specs are closed so only a select few can assemble and sell them, limiting the ability of DIY homebrewers to make their own device.

While as a general rule of thumb, developers will always tell people to dump their own game cartridges, in reality there is nobody stepping up to the plate to make this either affordable, to integrate it well with existing software, or to make it possible for your homebrew hardware maker to easily build his own.

RetroArch Open Hardware is our attempt to shake up this sector of the retro games market, and our effort to revitalize the DIY market and shift it away from proprietary solutions. Our first Proof of Concept hardware device is an N64 cartridge adapter that you connect to any device with a USB Type-C cable. It will be relatively cheap to assemble and much faster than any existing competing device out there that does the same task.

RetroArch Integration


We have some high-level goals we aim to achieve with this project. We want seamless integration with RetroArch. When you attach this to RetroArch, it should be hopefully as simple to play the game as it is on a real game console when you plugged in the cartridge. That’s the level of integration we are aiming to achieve with this project, and none of the existing solutions out there really fit the bill.

When we mentioned before that we want RetroArch to be its own game console, we pretty much meant it. And being able to take your own game copies with you and run them with RetroArch seems like an obvious next step to take.

We have come up with a completely custom and lean design so that the person aiming to build this for themselves in DIY fashion will be able to build these relatively cheaply. We are convinced the transfer speeds are far in excess of any other similar product out on the market right now, which is just as well considering the biggest N64 game out there is 64MB in size.

The current transfer speed that we are achieving is ~4MB per second on a prototype device. Our target is a transfer speed of approximately ~ 4.5MB per second give or take.

In addition, Switch dock support will be there from Day One, working out of the box.

How does it work?

Attach it to any device and it will mount itself as a Mass Volume Storage device, mapping the cartridge as a bunch of files on the filesystem.

You insert the N64 cartridge into the cartridge reader and you connect it to a PC (or some other device) with a USB Type C-cable. The device will then map the contents of the cartridge itself as a Mass Storage device volume. EEPROM, Flash, ROM, and SRAM are mapped as separate files on this volume. (*)

Playing the game should be as easy as just loading the ROM from this device. So already even without the aforementioned RetroArch integration, it already works. But our hope is that with the RetroArch integration, we finally get the promise of a true cross-platform game console where you can take your games library with you, whether it’s digital or physical, and just use it across the devices that you already have RetroArch on. This is the dream and promise we have been slowly building towards – the power lies in the user’s hands, not that of any corporation or organization.

* – This might be subject to change. We are still considering whether to change this to a dedicated protocol to allow using cartridge hardware in an emulator core without just reading all of the cart as one big contiguous ROM file.

Prototype


This project has been ongoing now for the better part of a year. We have some internal prototypes and so far we can definitely confirm high success rates with our own cartridge collection. SRAM support already works in the firmware, but no EEPROM/FlashROM support yet. Your SRAM should work as long as your SRAM battery is not dead yet anyway. Some of these cartridges are over 20+ years old by now after all so an SRAM battery being dead is not an unlikely prospect at this point.

Some cartridges will need their cartridge connectors cleaned in order to work properly with this device. It’s a common problem among N64 game preservationists that I’m sure should not be news to anyone at this point.

Q&A


I’m sure there will be many questions in response to this article. We will remain tight lipped for now until we feel the time is right to release more details. We hope that RetroArch Open Hardware will be a contagious project that will see many contributors and participants working towards one common goal – being able to interface with the games media they’ve bought for all these decades and just being able to make it work with the software they’re already using without having to buy new closed-spec proprietary devices that lock you out of the software you’re already using. Free software is one thing, but it’s only as good as the hardware you’re running it on. Consider this our valiant effort of trying to get both sides in order.

For now, here is a gallery of screenshots to a few of our prototypes, brought to you by Sasa and m4xw.




paraLLEl N64 – Low-level RDP upscaling is finally here!

ParaLLEl RDP this year has singlehandedly caused a breakthrough in N64 emulation. For the first time, the very CPU-intensive accurate Angrylion renderer was lifted from CPU to GPU thanks to the powerful low-level graphics API Vulkan. This combined with a dynarec-powered RSP plugin has made low-level N64 emulation finally possible for the masses at great speeds on modest hardware configurations.

ParaLLEl RDP Upscaling

Jet Force Gemini running with 2x internal upscale
Jet Force Gemini running with 2x internal upscale

It quickly became apparent after launching ParaLLEl RDP that users have grown accustomed to seeing upscaled N64 graphics over the past 20 years. So something rendering at native resolution, while obviously accurate, bit-exact and all, was seen as unpalatable to them. Many users indicated over the past few weeks that upscaling was desired.

Well, now it’s here. ParaLLEl RDP is the world’s first Low-Level RDP renderer capable of upscaling. The graphics output you get is unlike any HLE renderer you’ve ever seen before for the past twenty years, since unlike them, there is full VI emulation (including dithering, divot filtering, and basic edge anti-aliasing). You can upscale in integer steps of the base resolution. When you set resolution upscaling to 2x, you are multiplying the input resolution by 2x. So 256×224 would become 512×448, 4x would be 1024×896, and 8x would be 2048×1792.

Now, here comes the good stuff with LLE RDP emulation. As said before, unlike so many HLE renderers, ParaLLEl RDP fully emulates the RCP’s VI Interface. As part of this interface’s postprocessing routines, it automatically applies an approximation of 8x MSAA (Multi-Sampled Anti-Aliasing) to the image. This means that even though our internal resolution might be 1024×896, this will then be further smoothed out by this aggressive AA postprocessing step.

Super Mario 64 running on ParaLLEl RDP with 2x internal upscale
Super Mario 64 running on ParaLLEl RDP with 2x internal upscale

This results in even games that run at just 2x native resolution looking significantly better than the same resolution running on an HLE RDP renderer. Look for instance at this Mario 64 screenshot here with the game running at 2x internal upscale (512×448).

How to install and set it up

RDP upscaling is available right now on Windows, Linux, and Android. We make no guarantees as to what kind of performance you can expect across these platforms, this is all contingent on your GPU’s Vulkan drivers and its compute power.

Anyway, here is how you can get it.

  • In RetroArch, go to Online Updater.
  • (If you have paraLLEl N64 already installed) – Select ‘Update Installed Cores’. This will update all the cores that you already installed.
  • (If you don’t have paraLLEl N64 installed already) – go to ‘Core Updater’ (older versions of RA) or ‘Core Downloader’ (newer version of RA), and select ‘Nintendo – Nintendo 64 (paraLLEl N64)’.
  • Now start up a game with this core.
  • Go to the Quick Menu and go to ‘Options’. Scroll down the list until you reach ‘GFX Plugin’. Set this to ‘parallel’. Set ‘RSP plugin’ to ‘parallel’ as well.
  • For the changes to take effect, we now need to restart the core. You can either close the game or quit RetroArch and start the game up again.

In order to upscale, you need to first set the Upscaling factor. By default, it is set to 1x (native resolution). Setting it to 2x/4x/8x then restarting the core makes the upscaling take effect.

Explanation of core options

A few new core option features have been added. We’ll briefly explain what they do and how you can go about using them.

  • (ParaLLEl-RDP) Upscaling factor (Restart)

Available options: 1x, 2x, 4x, 8x

The upscaling factor for the internal resolution. 1x is default and is the native resolution. 2x, 4x, and 8x are all possible. NOTE: It bears noting that 8x requires at least 5GB/6GB VRAM on your GPU. System requirements are steep for 8x and we generally don’t recommend anything less than a 1080 Ti or better for this. Your mileage may vary, just be forewarned. 2x and 4x by comparison are much lighter. Even when upscaling, the rendering is still rendering at full accuracy, and it is still all software rendered on the GPU. 4x upscale means 16x times the work that 1x Angrylion would churn through.

  • (paraLLEl-RDP) Downsampling

Available options: Disabled, 1/2, 1/4, 1/8

Also known as SSAA, this works pretty similar to the SSAA downscaling feature in Beetle PSX HW’s Vulkan renderer. The idea is that you internally upscale at a higher resolution, then set this option from ‘Disabled’ to any of the other values. What happens from there is that this internal higher resolution image is then downscaled to either half its size, one quarter of its size, or one eight of its size. This gives you a very smoothed out anti-aliased picture that for all intents and purposes still outputs at 240p/240i. From there, you can apply some frontend shaders on top to create a very nice and compelling look that still looks better than native resolution but is also still very faithful to it.

So, if you would want 4x resolution upscaling with 4x SSAA, you’d set ‘Downsample’ to ‘1/2’. With 4x upscale, and 1/4 downsample, you get 240p output with 16x SSAA, which looks great with CRT shaders.

  • (paraLLEl-RDP) Use native texture LOD when upscaling

This option is disabled by default.

We have so far only found one game that absolutely required this to be turned on for gameplay purposes. If you don’t have this enabled, the Princess-to-Bowser painting transition in Mario 64 is not there and instead you just see Bowser in the portrait from a far distance. There might be further improvements later to attempt to automatically detect these cases.

Most N64 games didn’t use mipmapping, but the ones that do on average benefit from this setting being off – you get higher quality LOD textures instead of a lower-quality LOD texture eventually making way for a more detailed one as you look closer. However, turning this option on could also be desirable depending on whether you favor accurate looking graphics or a facsimile of how things used to look.

  • (paraLLEl-RDP) Use native resolution for TEX_RECT

This option is on by default.

2D elements such as sprites are usually rendered with TEX_RECT commands, and trying to upscale them inevitably leads to ugly “seams” in the picture. This option forces native resolution rendering for such sprites.

 

Managing expectations

It’s important that people understand what the focus of this renderer is. There is no intent to have yet another enhancement-focused renderer here. This is the closest there has ever been to date of a full software rendered reimplementation of Angrylion on the GPU with additional niceties like upscaling. The renderer guarantees bit-exactness, what you see is what you would get on a real N64, no exceptions.

With a HLE renderer, the scene is rendered using either OpenGL or Vulkan rasterization rules. Here, neither is done – the exact rasterization steps of the RDP are followed instead, there are no API calls to GL to draw triangles here or there. So how is this done? Through compute shaders. It’s been established that you cannot correctly emulate the RDP’s rasterization rules by just simply mapping it to OpenGL. This is why previous attempts like z64gl fell flat after an initial promising start.

So the value proposition here for upscaling with ParaLLEl RDP is quite compelling – you get upscaling with the most accurate renderer this side of Angrylion. It runs well thanks to Vulkan, you can upscale all the way to 8x (which is an insane workload for a GPU done this way). And purists get the added satisfaction of seeing for the first time upscaled N64 graphics using the N64’s entire postprocessing pipeline finally in action courtesy of the VI Interface. You get nice dither filtering that smooths out really well at higher resolutions and can really fake the illusion of higher bit depth. HLE renderers have a lot of trouble with the kind of depth cuing and dithering being applied on the geometry, but ParaLLEl RDP does this effortlessly. This causes the upscaled graphics to look less sterile, whereas with traditional GL/Vulkan rasterization, you’d just see the same repeated textures everywhere with the same basic opacity everywhere. Here, we get dithering and divot filtering creating additional noise to the image leading to an overall richer picture.

So basically, the aim here is actually emulating the RDP and RSP. The focus is not on getting the majority of commercial games to just run and simulating the output they would generate through higher level API calls.

Won’t be done – where HLE wins

Therefore, the following requests will not be pursued at least in the near future:

* Widescreen rendering – Can be done through game patches (ASM patches applied directly to the ROM, or bps/ups patches or something similar). Has to be done on a per-game basis, with HLE there is some way to modify the view frustum and viewport dimensions to do this but it almost never works right due to the way the game occludes geometry and objects based on your view distance, so game patches implementing widescreen and DOF/draw distance enhancements would always be preferable.

So, in short, yes, you can do this with ParaLLEl RDP too, just with per-game specific patches. Don’t expect a core option that you can just toggle on or off.
* Rendering framebuffer effects at higher resolution – not really possible with LLE, don’t see much payoff to it either. Super-sampled framebuffer effects might be possible in theory.
* Texture resolution packs – Again, no. The nature of an LLE renderer is right there in the name, Low-Level. While the RDP is processing streams of data (fed to it by the RSP), there is barely any notion whatsoever of a ‘texture’ – it only sees TMEM uploads and tile descriptors which point to raw bytes. With High Level emulation, you have a higher abstraction level where you can ‘hook’ into the parts where you think a texture upload might be going on so you can replace it on the fly. Anyway, those looking for something like that are really at the wrong address with ParaLLEl RDP anyway. ParaLLEl RDP is about making authentic N64 rendering look as good as possible without resorting to replacing original assets or anything bootleg like that.
* Z-fighting/subpixel precision: In some games, there is some slight Z-fighting in the distance that you might see which HLE renderers typically don’t have. Again, this is because this is accurate RDP emulation. Z-fighting is a thing. The RDP only has 18-bit UNORM of depth precision with 10 bits of fractional precision during interpolation, and compression on top of that to squeeze it down to 14 bits. A HLE emulator can render at 24+ bits depth. Compounding this, because the RSP is Low-level, it’s sending 16-bit fixed point vertex coordinates to the RDP for rendering. A typical HLE renderer and HLE RSP would just determine that we are about to draw some 3D geometry and then just turn it into float values so that there is a higher level of precision when it comes to vertex positioning. If you recall, the PlayStation1’s GTE also did not deal with vertex coordinates in floats but in fixed point. There, we had to go to the effort of doing PGXP in order to convert it to float. I really doubt there is any interest to contemplate this at this point. Best to let sleeping dogs lie.
* Raw performance. HLE uses the hardware rasterization and texture units of the GPU which is far more efficient than software, but of course, it is far less accurate than software rendering.

Where LLE wins

Conversely, there are parts where LLE wins over HLE, and where HLE can’t really go –

* HLE tends to struggle with decals, depth bias doesn’t really emulate the RDP’s depth bias scheme at all since RDP depth bias is a double sided test. Depth bias is also notorious for behaving differently on different GPUs.
* Correct dithering. A HLE renderer still has to work with a fixed function blending pipeline. A software rendered rasterizer like ParaLLEl RDP does not have to work with any pre-existing graphics API setup, it implements its own rasterizer and outputs that to the screen through compute shading. Correct dither means applying dither after blending, among other things, which is not something you can generally do [with HLE]. It generally looks somewhat tacky to do dithering in OpenGL. You need 8-bit input and do blending in 8-bit but dither + quantization at the end, which you can’t do in fixed function blending.
* The entire VI postprocessing pipeline. Again, it bears repeating that not only is the RDP graphics being upscaled, so is the VI filtering. VI filtering got a bad rep on the N64 because this overaggressive quasi-8x MSAA would tend to make the already low-resolution images look even blurrier. But at higher resolutions as you can see here, it can really shine. You need programmable blending to emulate the VI’s coverage, and this just is not practical with OpenGL and/or the current HLE renderers out there. The VI has a quite ingenious filter that distributes the dither noise where it reconstructs more color depth. So not only are we getting post-processing AA courtesy of the RDP, we’re also getting more color depth.
* What You See Is What You Get. This renderer is the hypothetical what-if scenario of how an N64 Pro unit would look like that could pump out insane resolutions while still having the very same hardware. Notice that the RDP/VI implementations in ParaLLEl RDP have NOT been enhanced in any way. The only real change was modifying the rasterizer to test fractional pixel coordinates as well.
* Full accuracy with CPU readbacks. CPU can freely read and write on top of RDP rendered data, and we can easily deal with it without extra hacks.

Known issues

  • The deinterlacing process for interlaced video modes is still rather poor (just like Angrylion), basic bob and weave setup. There are plans to come up with a completely new system.
  • Mario Tennis glitches out a bit with upscaling for some reason, there might be subtle bugs in the implementation that only manifest on that game. This seems to not happen on Nvidia Windows drivers though.

Screenshots

The screenshots below here show ParaLLEl RDP running at its maximum internal input resolution, 8x the original native image. This means that when your game is running at say 256×224, it would be running at 2048×1792. But if your game is running at say 640×480 (some interlaced games actually set the resolution that high, Indiana Jones IIRC), then we’d be looking at 5120×3840. That’s bigger than 4K! Then bear in mind that on top of that you’re going to get the VI’s 8x MSAA on top of that, and you can probably begin to imagine just how demanding this is on your GPU given that it’s trying to run a custom software rasterizer on hardware. Suffice it to say, the demands for 2x and 4x will probably not be too steep, but if you’re thinking of using 8x, you better bring some serious GPU horsepower. You’ll need at least 5-6GB of VRAM for 8x internal resolution for starters.

Anyway, without much further ado, here are some glorious screenshots. GoldenEye 007 now looks dangerously close to the upscaled bullshot images on the back of the boxart!

GoldenEye 007 running with ParaLLEl RDP at 8x internal upscale
GoldenEye 007 running with ParaLLEl RDP at 8x internal upscale
Super Mario 64 running on ParaLLEl RDP with 8x internal upscale
Super Mario 64 running on ParaLLEl RDP with 8x internal upscale
Star Fox 64 running on ParaLLEl RDP with 8x internal upscale
Star Fox 64 running on ParaLLEl RDP with 8x internal upscale
Perfect Dark running on ParaLLEl RDP with 8x internal upscale in high-res mode
Perfect Dark running on ParaLLEl RDP with 8x internal upscale in high-res mode
World Driver Championship running on ParaLLEl RDP with 8x internal upscale
World Driver Championship running on ParaLLEl RDP with 8x internal upscale

Videos

Body Harvest

Perfect Dark

Legend of Zelda: Ocarina of Time

Super Mario 64

Coming to Mupen64Plus Next soon

ParaLLEl RDP will also be making its way into the upcoming new version of Mupen64Plus Next as well. Expect increased compatibility over ParaLLEl N64 (especially on Android) and potentially better performance in many games.

Future blog posts

There might eventually be some future blog post by Themaister going into more technical detail on the inner workings of ParaLLEl RDP. I will also probably release a performance test-focused blog post later testing a variety of different GPUs and how far we can take them as far as upscaling is concerned.

I can already tell you to neuter your expectations with regards to Android/mobile GPUs. I tested ParaLLEl RDP with 2x upscaling on a Samsung Galaxy S10+ and performance was about 36fps, this is with vsync off. With 1x native resolution I manage to get on average 64 to 70fps with the same games. So obviously mobile GPUs still have a lot of catching up to do with their discrete big brothers on the desktop.

At least it will make for a nice GPU benchmark for mobile hardware until we eventually crack fullspeed with 2x native!

paraLLEl N64 RDP – Android support and Intel iGPU improvements – What you should know (and what to expect)

Ridge Racer 64 running on Parallel RDP on an Android phone (with RetroArch)
Ridge Racer 64 running on Parallel RDP on an Android phone (with RetroArch)

Themaister wrote an article a few days ago talking in-depth about all the work that has gone into ParaLLEl RDP since launch.

Two of the important things discussed in this article were:
* Intel iGPU performance
* Android support

What you might not have realized from reading the article is that with the right tweaks, you can already get ParaLLEl RDP to run reasonably well. As indicated in the article he wrote, Themaister will be looking at WSI Vulkan issues specifically related to RetroArch since there definitely do seem to be some issues that have to be resolved. In the meantime, we have to resort to some workarounds. Workarounds or not, they will do the job for now.

How to install and set it up

  • In RetroArch, go to Online Updater.
  • (If you have paraLLEl N64 already installed) – Select ‘Update Installed Cores’. This will update all the cores that you already installed.
  • (If you don’t have paraLLEl N64 installed already) – go to ‘Core Updater’, and select ‘Nintendo – Nintendo 64 (paraLLEl N64)’.
  • Now start up a game with this core.
    Go to the Quick Menu and go to ‘Options’. Scroll down the list until you reach ‘GFX Plugin’. Set this to ‘parallel’. Set ‘RSP plugin’ to ‘parallel’ as well.
  • For the changes to take effect, we now need to restart the core. You can either close the game or quit RetroArch and start the game up again.

Intel iGPU

What you should do for optimum performance right now:

  • For Intel iGPU, I have found that what makes the biggest difference by far (on Windows 10 at least) is to run it in windowed mode instead of fullscreen. Fullscreen mode will have horribly crippled performance by comparison.

Performance

Once you have done this, the performance will actually not be that far behind with a run-off-the-mill iGPU from say a 2080 Ti (in asynchronous mode). Sure, it’s still a bit slower by about ~30fps, but it’s no longer the massive gulf in performance it was before where even Angrylion was beating ParaLLEl RDP in the performance department.

With synchronous, the difference between say a 2080 Ti and an iGPU should be a bit more pronounced.

Hopefully in future RetroArch versions, it will no longer be necessary to have to resort to windowed mode for good performance with Intel iGPUs. For now, this workaround will do.

Android

What you should do for optimum performance right now:

  • Turn vsync off. Go to Settings -> Video -> Synchronization, and make sure that ‘Vertical Sync (Vsync)’ is disabled.

NOTE: It is imperative that you turn V-Sync off for now. If not, performance will be so badly crippled that even Angrylion will be faster by comparison. Fortunately, there will be no noticeable screen tearing even with Vsync disabled right now.

Performance

I tested ParaLLEl RDP on two devices:

  • Nvidia Shield TV (2015)
  • Samsung Galaxy S10 Plus (2019) [European Exynos model]

NOTE: The European model of the Galaxy S10 Plus used here has the Samsung Exynos SoC (System-On-A-Chip). Generally these perform worse than the US models of the Galaxy phones, which use a Qualcomm Snapdragon SoC instead. You should therefore expect significantly better performance on a US model.

Performance on Shield TV

Here are some rough performance figures for the Nvidia Shield TV –

Title Performance
Mortal Kombat Trilogy 87 to 94fps
Yoshi’s Story 99fps
Doom 64 90 to 117fps
Tetris 64 117fps
Starcraft 64 177fps

It’s hard to put an exact number on other games, but just from a solely gameplay-focused perspective, you can get a near-locked framerate with games like Legend of Zelda: Ocarina of Time and Super Mario 64 if you run the PAL versions (which limit the framerate to 50fps instead of 60fps with NTSC versions). There might still be the odd frame drop in certain graphics intensive scenes but nothing too serious.

Similarly, games like 1080 Snowboarding drop below fullspeed with the NTSC version, but running them with the PAL version is nearly a locked framerate in all but the most intensive scenes.

Performance on Samsung Galaxy S10 Plus

Performance on a high-end 2019 phone like the Galaxy S10 Plus can tend to be more variable, probably because of the aggressive dynamic throttling being done on phones. Sometimes performance would be a significant step above the Shield TV where it could run NTSC versions of games like Legend of Zelda: Ocarina of Time and Super Mario 64 at fullspeed with no problem (save for the very odd frame drop here and there in very rare scenes), and then at other times it would perform similarly to a Shield TV. Your mileage may vary there.

Conclusions

Overall, it’s clear that certain battles have to be won on the Vulkan side, especially when it comes down to having to disable vsync at all so far for acceptable performance.

We’d like to learn more from people who have a Samsung Galaxy S20 or a similar high end phone released in 2020. Even a Snapdragon version of the S10 Plus would produce better results than what we see here.

So, Low-Level N64 emulation, is it attainable on Android? Yes, with the proper Vulkan extensions, and provided you have a reasonably modern and fast high end phone. The Shield TV is also a decent mid-range performer considering its age. Far from every game runs at fullspeed yet but the potential is certainly there for this to be a real alternative to HLE based N64 emulation on Android as hardware grows more powerful over the years.

FAQ

Some specific issues should be addressed –

Game compatibility is significantly lower on Android right now

The mupen64plus-core part of ParaLLEl N64 is older than the one found in Mupen64plus next. While on PC this is not so much of an issue because of the generally mature (but slower) Hacktarux dynarec, on ARM platforms it is a different story since new_dynarec was in a premature state back then. Not only that, LLE RDP + RSP plugin compatibility with new_dynarec was not even a consideration back then. So some games might not work at all right now with Parallel RDP+RSP on Android.

ParaLLEl N64 will likely receive a mupen64plus-core update soon, and Mupen64Plus Next might also in the near future get ParaLLEl RDP + ParaLLEl RSP support. So this situation will sort itself out.

You get a display error showing ‘ERR’ on your Android device

The Vulkan driver for your GPU is likely missing these two Vulkan extensions, which ParaLLEl RDP requires.

VK_KHR_8bit_storage
VK_KHR_16bit_storage

(Intel iGPU) Performance is halved (or more) in fullscreen mode

Known issue, read above. These issues have been identified and it’s a matter of finding the appropriate solution for these issues.

RetroArch 1.8.7 released!


RetroArch 1.8.7 has just been released.

Grab it here.

We will release a Cores Progress report soon going over all the core changes that have happened since the last report. It’s an exhaustive list, and especially the older consoles will receive a lot of new cores and improvements.

Remember that this project exists for the benefit of our users, and that we wouldn’t keep doing this were it not for spreading the love with our users. This project exists because of your support and belief in us to keep going doing great things. If you’d like to show your support, consider donating to us. Check here in order to learn more. In addition to being able to support us on Patreon, there is now also the option to sponsor us on Github Sponsors! You can also help us out by buying some of our merch on our Teespring store!

Highlights

There are many things this release post will not touch upon, such as all the extra cores that have been added to the various console platforms. We’ll spend some more time on that in a future Cores Progress Report post. We’ll go over some of the other highlights instead.

Netplay bugfixes

Some major netplay regressions snuck into version 1.8.5 and has remained a problem ever since. 1.8.7 finally fixes these issues.

New desktop-style Playlist View mode for MaterialUI

1.8.7 adds a new Desktop Thumbnail View to Material UI, available when using landscape display orientations. This is similar to Ozone’s playlist view. Above is a random screenshot showing what it looks like.

Notes:

  • The status bar at the bottom can be hidden by disabling Settings > Playlists > Show Playlist Sub-Labels
  • Touching/clicking the thumbnail bar toggles the fullscreen thumbnails view

This also represents a major refactor of MaterialUI’s menu entry handling code, which will make other kinds of playlist view mode easier to implement in the future.

Finally, this fixes two small existing issues:

  • Entry dividers now fade correctly during menu transition animations (this is subtle, and I only just realised that is wasn’t working!)
  • The ‘missing thumbnail’ placeholders now fade into view, just like normal thumbnails (previously, they were always displayed instantly, which was quite jarring)

Disable ‘Use Global Core Options File’ by default

1.8.7 changes the default setting of Use Global Core Options File to OFF.

This was only set to ON by default for consistency with legacy setups. There is no material benefit to this – in fact, a global core options file has the following downsides:

More file I/O – all options have to be read/written every time content is loaded or options are saved

Difficulty in editing option values by hand – e.g. sometimes this is necessary if a particular setting causes a core to crash, and if options for all cores are bundled together then sifting through them to find the one you need becomes a chore

Obsolescence – settings for old/unused/outdated cores hang around forever, and bloat the global options file without purpose. With per-core options, it is easy to remove settings for unwanted cores

Since settings are automatically imported from the legacy global file on first run when per-core files are enabled, changing the default behaviour will not harm any existing installation.

Don’t perform unnecessary cheevos initialisation when cheevos are disabled

Before, on all platforms with cheevos support rcheevos_load() is called each time content is loaded. This means the following happens even when cheevos are disabled:

  • If the core does not require the full content path (i.e. if RetroArch passes a data buffer directly), then a copy of the content data is made (up to 64 MB in size)
  • If the content is an m3u file, the file is opened and parsed to get the extension of the first file listed inside
  • A checksum is calculated for the content file extension
  • A task is pushed
  • A mutex is locked/unlocked several times

When Cheevos (Achievements) are disabled, all these things are unnecessary work, causing increased loading times and memory usage. On platforms with low memory (i.e. consoles) the unnecessary content data duplication is potentially harmful and may cause crashes.

1.8.7 very simply adds an ‘early out’ to rcheevos_load() which prevents the above unnecessary work when cheevos are disabled.

Cheevos: option to start a session with all achievements active

Adding an option to allow the players to start a gaming session with all achievements active (even the ones they have as unlocked on RetroAchievements.org).

When cheevos_start_active = true, instead of You have X of Y achievements unlocked, the player will see a message like this:

How the option looks in XMB:

And in Ozone:

Fallback directories for shader presets

This allows us to use the Menu Config and config file directories as fallback to store shader presets when the Video Shader directory is not writable by the user, thus following the same behavior shown by the “Save shader as” menu option.

This allows users to handle their own presets without having to mess with the directory configuration on distros such as ArchLinux, where shaders (among other assets) are managed through additional packages. But it also goes a bit further and changes the order of the preset directories, searching first on the Menu Config path, then on the Video Shader path, and finally on the directory of the config file.

This would improve the portability of the configuration for Android users, because they cannot explore the default shaders directory without rooting their devices. Moreover, I think it makes more sense, as regular configuration overrides are already being stored on the Menu Config path by default.

Testing
Assuming these directory values:

  • menu_config: /home/user/.config/retroarch/config/ (non-writable for testing purposes)
  • video_shaders: /home/user/.local/share/libretro/shaders/ (non-writable)
  • retroarch.cfg: /home/user/.config/retroarch/retroarch.cfg
  • The following menu options have all been successfully tested (see appended log output).

    Save options

    Save Shader Preset As

    [WARN] Failed writing shader preset to /home/user/.config/retroarch/config/foobar.glslp.
    [WARN] Failed writing shader preset to /home/user/.local/share/libretro/shaders/foobar.glslp.
    [INFO] Saved shader preset to /home/user/.config/retroarch/foobar.glslp.

    Save Global Preset

    [WARN] Failed to create preset directory /home/user/.config/retroarch/config/presets/.
    [WARN] Failed to create preset directory /home/user/.local/share/libretro/shaders/presets/.
    [INFO] Saved shader preset to /home/user/.config/retroarch/presets/global.glslp.

    Save Core Preset

    [WARN] Failed to create preset directory /home/user/.config/retroarch/config/presets/Snes9x/.
    [WARN] Failed to create preset directory /home/user/.local/share/libretro/shaders/presets/Snes9x/.
    [INFO] Saved shader preset to /home/user/.config/retroarch/presets/Snes9x/Snes9x.glslp.

    Save Content Directory Preset
    [WARN] Failed to create preset directory /home/user/.config/retroarch/config/presets/Snes9x/.
    [WARN] Failed to create preset directory /home/user/.local/share/libretro/shaders/presets/Snes9x/.
    [INFO] Saved shader preset to /home/user/.config/retroarch/presets/Snes9x/SNES.glslp.

    Save Game Preset
    [WARN] Failed to create preset directory /home/user/.config/retroarch/config/presets/Snes9x/.
    [WARN] Failed to create preset directory /home/user/.local/share/libretro/shaders/presets/Snes9x/.
    [INFO] Saved shader preset to /home/user/.config/retroarch/presets/Snes9x/Legend of Zelda, The – A Link to the Past (USA).glslp.

    Apply Changes option
    [WARN] Failed writing shader preset to /home/user/.config/retroarch/config/retroarch.glslp.
    [WARN] Failed writing shader preset to /home/user/.local/share/libretro/shaders/retroarch.glslp.
    [INFO] Saved shader preset to /home/user/.config/retroarch/retroarch.glslp.

    Run content log output
    Game specific shader preset found on fallback directory
    [INFO] [Shaders]: preset directory: /home/user/.config/retroarch/config/presets
    [INFO] [Shaders]: preset directory: /home/user/.local/share/libretro/shaders/presets
    [INFO] [Shaders]: preset directory: /home/user/.config/retroarch/presets
    [INFO] [Shaders]: Specific shader preset found at /home/user/.config/retroarch/presets/Snes9x/Legend of Zelda, The – A Link to the Past (USA).glslp.
    [INFO] [Shaders]: game-specific shader preset found.

    Folder specific shader preset found on fallback directory
    [INFO] [Shaders]: preset directory: /home/user/.config/retroarch/config/presets
    [INFO] [Shaders]: preset directory: /home/user/.local/share/libretro/shaders/presets
    [INFO] [Shaders]: preset directory: /home/user/.config/retroarch/presets
    [INFO] [Shaders]: Specific shader preset found at /home/user/.config/retroarch/presets/Snes9x/SNES.glslp.
    [INFO] [Shaders]: folder-specific shader preset found.

    Core specific shader preset found on fallback directory
    [INFO] [Shaders]: preset directory: /home/user/.config/retroarch/config/presets
    [INFO] [Shaders]: preset directory: /home/user/.local/share/libretro/shaders/presets
    [INFO] [Shaders]: preset directory: /home/user/.config/retroarch/presets
    [INFO] [Shaders]: Specific shader preset found at /home/user/.config/retroarch/presets/Snes9x/Snes9x.glslp.
    [INFO] [Shaders]: core-specific shader preset found.

    Global shader preset found on fallback directory
    [INFO] [Shaders]: preset directory: /home/user/.config/retroarch/config/presets
    [INFO] [Shaders]: preset directory: /home/user/.local/share/libretro/shaders/presets
    [INFO] [Shaders]: preset directory: /home/user/.config/retroarch/presets
    [INFO] [Shaders]: Specific shader preset found at /home/user/.config/retroarch/presets/global.glslp.
    [INFO] [Shaders]: global shader preset found.

    Remove options

    Remove Global Preset
    [INFO] Deleted shader preset from /home/user/.config/retroarch/presets/global.glslp.

    Remove Core Preset
    [INFO] Deleted shader preset from /home/user/.config/retroarch/presets/Snes9x/Snes9x.glslp.

    Remove Content Directory Preset
    [INFO] Deleted shader preset from /home/user/.config/retroarch/presets/Snes9x/SNES.glslp.

    Remove Game Preset
    [INFO] Deleted shader preset from /home/user/.config/retroarch/presets/Snes9x/Legend of Zelda, The – A Link to the Past (USA).glslp.

    Some other noteworthy things

    • RetroArch WiiU now has working graphics widgets. OSD notifications are no longer just plain yellow-colored text.
    • RetroArch 3DS now has basic networking and Cheevos (RetroAchievements) support.
    • With RetroArch 1.8.7 and overclocking, NeoCD reaches fullspeed on RetroArch PSVita. Even audio playback doesn’t stutter any more

    Changelog

    What you’ve read above is just a small sampling of what 1.8.6 has to offer. There might be things that we forgot to list in the changelog listed below, but here it is for your perusal regardless.

    1.8.7

    • 3DS: Add IDs for Frodo
    • 3DS: Enable basic networking / cheevos
    • CHEEVOS/BUGFIX: Opening achievements list would crash RetroArch with badges enabled (on new games)
    • CHEEVOS: Option to start a session with all achievements active
    • CHEEVOS: Don’t perform unnecessary cheevos initialisation when cheevos are disabled. Should reduce startup times when loading content.
    • CORE OPTIONS: Disable ‘Use Global Core Options File’ by default
    • DOS/DJGPP: Add 32bit color support for cores
    • GLCORE: Switch to glcore video driver when requested by a core
    • LINUX/XDG: Use GenericName correctly in desktop entry
    • MAC/COCOA: Fix mouse cursor tracking
    • MENU/MATERIALUI: Add desktop-style playlist view mode
    • MENU/MATERIALUI/DESKTOPVIEW: When scrolling playlists, show last selected thumbnails while waiting for next entry to load
    • MENU/MATERIALUI: Limit tab switch rate when input repeat is active
    • MENU/OZONE: Fix sidebar playlist sort order when ‘Truncate Playlist Names’ is enabled
    • MENU/RGUI: Adjusted menu defaults, adjusted default scrolling speed
    • MENU/RGUI: Enable custom wallpaper when menu size is reduced at low resolutions
    • MENU/XMB: Limit tab switch rate when input repeat is active
    • NETPLAY: Fix regressions introduced in 1.8.5
    • RGUI: Add option to always stretch menu to fill the screen
    • WIIU: Enable graphics widgets

paraLLEl-RDP rewritten from scratch – available in paraLLEl n64 right now for RetroArch



The ParaLLEl N64 Libretro core has received an update today that adds the brand new paraLLEl-RDP Vulkan renderer to the emulator core.

I implore everybody to read Themaister’s blog post (Reviving and rewriting paraLLEl-RDP – Fast and accurate low-level N64 RDP emulation) for a deep dive into this new renderer.

Requirements

  • You need a graphics card that supports the Vulkan graphics API.
  • It’s currently only available on Windows and Linux.
  • Right now the renderer requires a specific Vulkan extension, called ‘VK_EXT_external_memory_host’. Only Nvidia Linux binary drivers for Vulkan currently doesn’t support this extension. It has been requested but there is no ETA yet on when they will implement this.

What’s new since the old ParaLLEl RDP?

  • Completely rewritten from the ground up
  • Bit-exact renderer
  • Should be pretty much on par with Angrylion accuracy-wise now – none of the issues that plagued the old paraLLEl RDP
  • Now emulates the VI (Video Interface) as well
  • Basic deinterlacing for interlaced video modes

How to install and set it up

  • In RetroArch, go to Online Updater.
  • (If you have paraLLEl N64 already installed) – Select ‘Update Installed Cores’. This will update all the cores that you already installed.
  • (If you don’t have paraLLEl N64 installed already) – go to ‘Core Updater’, and select ‘Nintendo – Nintendo 64 (paraLLEl N64)’.
  • Now start up a game with this core.
  • Go to the Quick Menu and go to ‘Options’. Scroll down the list until you reach ‘GFX Plugin’. Set this to ‘parallel’. Set ‘RSP plugin’ to ‘parallel’ as well.
  • For the changes to take effect, we now need to restart the core. You can either close the game or quit RetroArch and start the game up again.

Progress and development in N64 emulation over the past decade

State of HLE emulation

IMHO, this release today represents one of the biggest steps that have been taken so far to elevate Nintendo 64 emulation as a whole. N64 emulation has gotten a bad rep for over decades because of HLE RDP renderers that fail to accurately reproduce every game’s graphics correctly and tons of unemulated RSP microcode, but it’s gotten significantly better over the years. On the HLE front, things have progressed. GLideN64 has made big strides in emulating most of the major significant games, the HLE RSP implementation used by Mupen 64 Plus is starting to emulate most of the major micro codes that developers made for N64 games. So on that front, things have certainly improved. There are also obviously limiting factors on the HLE front. For instance, GLideN64 still requires OpenGL, and renderers for Vulkan and other modern graphics APIs have not been implemented as of this date (although they could be).

State of LLE emulation

So that’s the HLE front. But for the purpose of this blog article, we are mostly concerned here about Low-Level Emulation. Both HLE and LLE N64 emulation are valid approaches, but if we want to reproduce the N64 accurately, we ultimately have to go LLE. So, what is the state of LLE emulation?

For LLE emulation, some of the advancements over the past few years has been a multithreaded version of Angrylion. Angrylion is the most accurate software RDP renderer to date. Its main problem has always been how slow it is. Up until say the mid to late ’10s, desktop PCs just did not have the CPU power to run any game at fullspeed with this renderer. Multithreaded Angrylion has seen Angrylion make some big gains in the performance department previously thought unimaginable.

However, Angrylion as a software renderer can only be taken so far. The fact remains that it is a big bottleneck on the CPU, and you can easily see CPU activity exceeding over 65% on a modern rig with the multithreaded Angrylion renderer. Software rendering is just never going to be a particularly fast way of doing 3D rasterization.

So, back in 2016, the first attempt at making a hardware renderer that can compete with Angrylion was made. It was a big release for us and it marked one of the first pieces of software to be released that was designed exclusively around the then-new Vulkan graphics API. You can read our old blog post here.

It was a valiant first attempt at making a speedy Angrylion port to hardware. Unfortunately, this first version was full of bugs, and it had some big architectural issues that just made further development on it very hard. So it didn’t see much further development for the past few years.

This year, all the stars have aligned. First out of the gates was the resurrection of paraLLEl-RSP, another project by Themaister. Low-level N64 emulation places a big demand on the CPU, and while cxd4’s RSP interpreter is very accurate, to get at least a 2x leap in performance, a dynamic recompiler approach has to be taken. To that end, this year not only was paraLLEl-RSP resurrected, but we moved the dynamic recompiler architecture from LLVM to Lightrec. It’s a bit less performant than LLVM to be sure but it also has some big advantages – LLVM runtime libraries are very hard to embed and integrate for various platforms, while Lightrec doesn’t have these dependency issues. Furthermore, LLVM would take a long time recompiling code blocks, and it would cause big stutters during gameplay (for instance, bringing up the map in Doom 64 for the first time would cause like a 5-second freeze in the gameplay while it was recompiling a code block – obviously not ideal). With Lightrec, all those stutters were more or less gone.

So, Q1 2020. We now have multithreaded Angrylion which leverages the multi-core CPUs of today’s hardware to get better performance results. We have ParaLLEl RSP, a low-level RSP plugin with a dynamic recompiler that gives us a big bump in performance. But one piece of the puzzle is still missing, and it’s perhaps the most significant. Multithreaded Angrylion still is a software renderer and therefore it still massively bottlenecks the CPU. Whether you can spread that load out over multiple cores or not ultimately matters little – CPUs just are not good at doing fast 3D rasterization, a lesson learned by nearly every mid ’90s PC game developer, and why 3D accelerated hardware could not have come sooner.

So, the obvious Next Big Thing in N64 emulation was to get rid of this CPU bottleneck and move Angrylion kicking and screaming to the GPU, and this time avoid all of the issues that plagued the initial paraLLEl RDP prototype.

Where does that leave us?

With a very accurate Angrylion-quality LLE RDP renderer running on the GPU, and a dynarec LLE RSP core, you will be surprised at how accurate Mupen 64 Plus is now. Nearly every commercial game runs now as expected with nearly no graphical issues, the sound is as you’d expect it to be, it looks, runs and functions just like a real N64. And if you’re on a discrete Nvidia or AMD GPU, your GPU activity will be 4% on average, whether it’s a stone-age GPU from the year 2013 like an AMD R9 290x, or an Nvidia Geforce 2080 Ti. Nearly any discrete GPU made from 2013 to 2020 that supports the Vulkan API seems to eat low-level N64 graphics for breakfast. CPU activity also has decreased significantly. With multithreaded Angrylion and Parallel RSP, there would be about 68% CPU activity on my rig. This is brought down to just 7 to 10% using paraLLEl RDP instead of Angrylion. Software rendering on the CPU is just a huge bottleneck no matter which way you slice it.

So for most practical purposes, using the paraLLEl RDP and paraLLEl RSP cores in tandem, the future is now. Accurate N64 emulation is here, it’s no longer slow, and it’s no longer completely CPU bound either. And you can play it on RetroArch right now, right today. We don’t have to wait for a near-accurate representation of an N64, it’s already here with us for all practical gameplay purposes.

How much faster is paraLLEl RDP compared to Angrylion? That is hard to say, and depends on the game you’re running. On average you can expect a 2x speedup. However, notice that at native resolution rendering, any discrete GPU since 2013 eats this workload for breakfast. This means you’re completely CPU bound in terms of performance most of the time. The better your CPU is at single threaded workloads (IPC), the better it will perform. Core count is a less significant factor. I think on my specific rig, it was my CPU that was the weakest link in the chain (a 7700k i7 Intel CPU paired with a 2080 Ti). The GPU matters relatively little, the 2080 Ti was mostly being completely idle during these tests. For that matter, so was an old 2013 AMD card that I would test with the same CPU – GPU activity remained flat at around 4%. As Themaister has indicated in his blog post, this leaves so much room for upscaled resolutions, which is on the roadmap for future versions.

Benchmarks

System specs: CPU – Intel Core i7 7700k | GPU – Geforce RTX 2080 Ti (11GB VRAM, 2018) | 16GB RAM

Title Angrylion ParaLLEl RDP (Synchronous) ParaLLEl RDP (Asynchronous)
007 GoldenEye 82fps 119fps 133fps
Banjo Tooie 72fps 132fps 148fps
Doom 64 174fps 282fps 322fps
F-Zero X 158fps 370fps 478fps
Hexen 156fps 300fps 360fps
Indiana Jones and the Infernal Machine 61fps 94fps 114fps
Killer Instinct Gold ~103fps ~168fps ~240fps
Legend of Zelda: Majora’s Mask 122fps 202fps 220fps
Mario Kart 64 ~178fps ~309fps ~330-350fps
Perfect Dark (High-res) 70fps 125fps 130fps
Pilotwings 64 87fps 125fps 144fps
Quake 188fps 262fps 300fps
Resident Evil 2 183fps 226fps 383fps (*)
Star Wars Episode I: Battle for Naboo 90fps 136fps 178fps
Super Mario 64 129fps 204fps 220fps
Vigilante 8 (Low-res) 63fps 91fps 112fps
Vigilante 8 (High-res) ~46-55fps ~92-99fps ~119fps
World Driver Championship ~109fps ~225fps ~257fps

* – Has game breaking issues in this mode

System specs: CPU – Intel Core i7 7700k | GPU – AMD Radeon R9 290x (4GB VRAM, 2013) | 16GB RAM

Title Angrylion ParaLLEl RDP (Synchronous) ParaLLEl RDP (Asynchronous)
007 GoldenEye 82fps 119fps 133fps
Banjo Tooie 72fps 132fps 148fps
Doom 64 174fps 282fps 322fps
F-Zero X 158fps 360fps 439fps
Hexen 156fps 288fps 352fps
Indiana Jones and the Infernal Machine 61fps 94fps 114fps
Killer Instinct Gold ~93fps ~162fps ~239fps
Legend of Zelda: Majora’s Mask 122fps 202fps 220fps
Mario Kart 64 ~157fps ~274fps ~292fps
Perfect Dark (High-res) 70fps 125fps 130fps
Pilotwings 64 87fps 125fps 144fps
Quake 189fps 262fps 326fps
Resident Evil 2 156fps 226fps 383fps (*)
Star Wars Episode I: Battle for Naboo 90fps 136fps 178fps
Super Mario 64 129fps 195fps 209fps
Vigilante 8 (Low-res) 63fps 91fps 112fps
Vigilante 8 (High-res) ~46-55fps ~92-99fps ~119fps
World Driver Championship ~109fps ~224fps ~257fps

* – Has game breaking issues in this mode

Core option explanations


paraLLEl RDP has some special dedicated options. You can change these by going to Quick Menu and going to Options. Here’s a quick breakdown of what they do –

ParaLLEl Synchronous RDP:

Turning this off allows for higher CPU/GPU parallelism. However, there are certain games that might produce problems if left disabled. An example of such a game is Resident Evil 2.

It has been verified that with the vast majority of games, disabling this can provide for at least a +10fps speedup. Usually the performance difference is much higher though. Try experimenting with it. If you experience no game breaking bugs or visual anomalies, it’s safe to disable this for the game you’re running and enjoy higher performance.

Video Interface Options
ParaLLEl-RDP emulates the N64 RDP’s VI module. This applied plenty of postprocessing to the final output image to further smooth out the picture. Some of the options down below allow you to enable/disable some of these VI settings on the fly. Disabling some of these and enabling some others could be beneficial if you want to use several frontend shaders on top, since disabling some of these postprocessing effects could result in a radically different output image.

(ParaLLEl-RDP) VI Interlacing Disabling this will disable the VI serration bits used for interlaced video modes. Turning this off essentially looks like basic bob deinterlacing, the picture might become shaky as a result when leaving this off.

(ParaLLEl-RDP) VI Gamma Filter Disabling this will disable the hardware gamma filter that some games use.

(ParaLLEl-RDP) VI Divot filter Disabling this will disable the median filter which is intended to clean up some glitched pixels coming out of the RDP. Subtle difference in output, but usually seems to apply to shadow blob decals.

(ParaLLEl-RDP) VI AA Disabling this will disable Anti-Aliasing.

(ParaLLEl-RDP) VI Dither Filter The VI’s dither filter is used to make color banding less apparent with 16-bit pixels.

(ParaLLEl-RDP) VI Bilinear VI bilinear is the internal upscaler in the VI. Disabling this is typically a good idea, since it’s typically used to upscale horizontally.

By disabling VI AA and enabling VI Bilinear, the picture output looks just like how Angrylion’s “Unfiltered” mode currently looks like.

FAQ

Will this renderer be ported to OpenGL?

Here is the short answer – no. Not by us, at least. Reasons: OpenGL is an outdated API compared to Vulkan that does not support the features required by Parallel-RDP. GL does not support 8/16bit storage, external memory host, or async compute. If one would be able to make it work, it would only work on the very best GL implementation, where Vulkan is supported anyways, rendering it mostly moot.

Ports to DirectX 12 are similarly not going to be considered by us, others can feel free to do so. One word of warning – even DirectX12 (yes, even Ultimate) is found lacking when it comes to providing the graphics techniques that ParaLLEl RDP is built around. Whoever will take on the endeavor to port this to DX12 or GL 4.5/4.6 will have their work cut out for them.