Here is a pre-alpha release of the hotly anticipated N64 Vulkan renderer, paraLLel. To coincide with this, a new RetroArch version has also been released that includes support for the async compute interface that this new renderer requires.
Also see our other major announcements today:
And our earlier story featured a couple days back on ParaLLEl –
What is paraLLEl?
This is a standalone libretro core for now that we keep separate from the regular Mupen64plus libretro core that it is based on. It includes only a Vulkan rendering backend and a low-level RSP. This core will only work right now if you are running it with a Vulkan driver.
In the future, paraLLEl will be the new name for our N64 emulator which (while initially starting out as a Mupen64plus core) has grown into its very own entity. It will have among other things:
– Completely rewritten CPU cores, both interpreter and dynarec. We want to fix the remaining CPU bugs that prevents Mupen64plus from being able to run certain games that PJ64 can, for instance. We also want to be able to move away from having two separate dynarec systems in Mupen64plus, where one is aimed at Atoms and ARM CPUs, and the other is meant for desktop x86, and neither is particulary fast.
– A unified HLE video renderer that combines the best of Glide64, Gliden64, Rice, and GLN64, and offers optional runtime codepaths for performance.
– A unified LLE video renderer infrastructure that allows for both software rendering (Angrylion) and the Vulkan/GL 4.3 powered equivalent (the video plugin we now call paraLLEl).
– As part of the CPU rewrite, an RSP dynarec will also have been written around that time.
How to use this
- Download RetroArch 1.3.6 (or any future version from this point on). See this blog post here.
- Download the ParaLLel core. To do this, start up RetroArch. Inside the menu, go to Online Updater -> Core Updater. Scroll down in the list until you see Nintendo 64 (ParaLLEl).
- Download it.
- IMPORTANT! READ! Before starting, make sure that you have selected the Vulkan display driver. To check this, go to Settings -> Driver, and see if ‘Video’ says ‘vulkan’. If not, select it, and then restart RetroArch for the changes to take effect.
- Load the core with a ROM.
* A GPU capable of supporting the Vulkan API
* Vulkan drivers installed on your system
ParaLLEl has been tested on an nVidia Maxwell GPU and has been confirmed to be running well.
ParaLLEl has been tested on an AMD Radeon 250 and has been confirmed to be running well.
Intel MESA iGPU
Easily the biggest problem right now out of all Big Three. Thankfully, Mesa developer Jason Ekstrand was very receptive to our feedback and together with him, we have been able to get Vulkan N64 successfully booting now on Ivy Bridge all the way up to Broadwell.
When we first started the renderer, it would crash at startup and not even let us ingame.
Apparently some of the patches that helped get paraLLEl up and running also helped The Talos Principle finally start to go-ingame, so that is nice to see!
If you want to test paraLLEl on an Intel iGPU on Linux, be sure to read on until the last paragraphs. You will have to compile a bleeding edge fork of Mesa, we will explain to you how to do this. Do note that in the future, all these patches will have been pushed to upstream, so if you are reading this blog post a month from now, just go with the latest upstream Mesa instead.
At a later date in time it is likely that your Linux distribution’s package management system will feature newer versions of Mesa with these patches already incorporated.
See some of the before/after screenshots here.
RDP emulation status
As this is a pre-alpha release, the full transplantation of Angrylion to compute shaders is not yet completely done, although many games already run reliably.
Hunterk made a couple of videos of RetroArch running ParaLLEl using RetroArch’s built-in recording feature.
Super Mario 64
The quintessential N64 game. Not too many surprises here.
Note – you might have to enable ‘Synchronous Sync’ on Intel Ivy Bridge in order to be able to progress beyond the title screen. This might not be an issue on other hardware.
Mario Kart 64
The popular karting game. Note the framebuffer readback activity on the wall.
Legend of Zelda: Majora’s Mask
The follow-up to the Legend of Zelda: Ocarina of Time.
Note – for both Majora’s Mask and Ocarina of Time, it is required to enable ‘Synchronous Sync’ so that the subscreen works correctly.
In some ways the spiritual precursor to GTA3, made even by the same company. Note a couple of things: unlike with any RDP HLE plugin ever, not only can we cross the bridge, but we can do so without the player literally walking through the bridge instead of walking correctly on top of it. High-level RSP up until now has been too buggy to accurately enable both Body Harvest US and European versions from working with the crossbridge (a patch by LegendofDragoon made it work for the European version, but with any HLE plugin you would still see the player character essentially walking through the bridge when passing over it, and in the US version you just plain cannot cross the bridge unless you are using a low-level RSP plugin.
So, low-level RDP in conjunction with low-level RSP takes care of all that.
Jet Force Gemini
One of the later games released by Rare before the big Nintendo-Rare breakup. Shares some microcode with Diddy Kong Racing and Mickey’s Speedway USA, so all three games pretty much work without any major issues on paraLLEl.
There are some things to talk about here. There is one core option in paraLLEl in specific which has a heavy bearing on emulation accuracy:
* Synchronous RDP *
Turning this off allows for higher CPU/GPU parallelism. However, we found so far that many games either won’t run or have broken framebuffer effects if left disabled, so we felt it was safer to leave it enabled for now.
It has been verified that with certain games, disabling this can provide for at least a +10fps speedup. Try experimenting with it.
Things you need to know before running the alpha
Like they say, Rome wasn’t built in a day. This is an alpha release. Keep in mind that not every game right now will run correctly but that it’s expected we will be able to nail all these issues in quick succession shortly.
* Even though this renderer delivers on its promises and completely eliminates the RDP rendering bottleneck, the RSP bottleneck still remains. In order to make emulation significantly faster, we will therefore have to start moving away from having only an LLE interpreter RSP core, and move towards a dynamic recompiler. Work on this is commencing, so this bottleneck is just temporary. For now, the beefier your CPU is, the less chances you have of running into said bottlenecks.
* Not all games will work correctly right now. Porting the Angrylion renderer to Vulkan has been a challenge to say the least. There are still some unimplemented edge cases which will be fleshed out in the upcoming weeks.
Major bugs remaining/things left to be done
- VI overlay.
- Some framebuffer management corner cases
- Some obscure combiner features
- Interlacing bugs – games that output to the screen in interlaced mode will show some graphics glitches right now. Should be one of the more trivial things to fix.
What’s different between this and Angrylion RDP AIO/CEN64?
Angrylion RDP AIO and CEN64 are using the software-rendered Angrylion plugin (the base for most RDP-based renderers) and optimizing it with SSE code and multithreading. ParaLLEl instead is porting Angrylion to the GPU (through Vulkan) with compute shaders and leveraging async compute to be able to parallelize it as best as possible. CEN64 is also going for an RSP dynarec to speed up things further.
The next step for ParaLLEl will also be an RSP dynarec to increase performance, and in fact MarathonMan (the author of CEN64) has been willing to help us out with some performance ideas in this department.
Both approaches (optimizing the software-rendered Angrylion plugin) and hardware acceleration (ParaLLEl) are valid and worthwhile.
Honestly, for this release I have not yet bothered compiling it for Android and seeing if it will run on the Shield tablet. I will start doing that after this release is over. I think it will become more useful once RSP dynarec is in but it might already be neat to be able to run it regardless.
The sourcecode for ParaLLEl can be seen here. In order to compile it yourself right now, you need to git clone the mupen64plus libretro repository, and run the following command:
Compiling bleeding-edge Mesa for Intel Vulkan (Linux) in order to use paraLLEl
There are not a lot of Vulkan test cases out right now, so paraLLEl needed some special patches pushed to Mesa in order for it to work.
Here are the steps required:
1. Type these lines into the commandline:
git clone git://people.freedesktop.org/~jekstrand/mesa
Go to the directory by typing:
git checkout wip/retroarch
./configure --with-vulkan-drivers=intel --with-gallium-drivers= --with-egl-platforms=wayland,x11 --with-dri-drivers=i965
(Verify that in the output, Vulkan drivers does not say ‘no’).
2. After we have done all this, we will need to setup a file to point to our vulkan driver. If it does not exist, create it here:
This file should point to the compiled Vulkan driver library file’s path. This is the file you just created before.
Here is a local example of how it could look like. You will have to change the filename path to wherever your libvulkan_intel.so file is located.
Here is an example of how it could look:
You should replace the library path with the exact filename location on your system. Save this file (you might need root permissions for this).
Expect major speedups soon and other exciting news regarding dynarec unification plans. Daeken will be guest blogging about that soon, and it will have far ranging ramifications for libretro cores and emulators in general. Also, expect Beetle PSX to be one of the other major beneficaries of that.