Mupen64Plus Next 2.0 – 64DD Support, Angrylion and GlideN64 in one build, Parallel RSP support, and Android!



What a massive release we have for you today! M4xw has really been delivering the goods, and we’re pleased to release Mupen64Plus Next 2.0 today. This release would not be as significant as it is without the combined efforts of LuigiBlood, Gillou, Fzurita and Themaister.

The latest version is now available on Android, Linux, Windows, and Libnx (Switch)! Updating to the latest core is as easy as starting RetroArch, going to Online Updater, and selecting ‘Update Installed Cores’. If you have not installed the core yet, instead go to Online Updater and select ‘Mupen64 Plus Next’ or ‘Mupen64 Plus Next GLES3’ from the list.

64DD support


Previously, only Parallel N64 had 64 Disk Drive support, courtesy of LuigiBlood. Work on it was left rather incomplete though.

Mupen64Plus Next now has a new implementation that LuigiBlood feels more comfortable with. Currently the way that you load 64DD content with Mupen64 Plus Next is completely different from how you do it on Parallel N64.

First, you need a BIOS file. Make sure the file ‘IPL.n64’ is located in your /Mupen64plus directory.

You can either use the subsystem for 64DD, or you can give the disk image the same name as the ROM, including its extension, with ‘.ndd’ appended.

For example, to load a specific cart together with a disk image, name the files “homebrew.n64” and “homebrew.n64.ndd”, then use Load Content on “homebrew.n64”.
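The naming convention can be sketched in a few lines of Python. This is only an illustration, not part of the core: the helper names `disk_image_for` and `has_matching_disk` are hypothetical, and the only rule taken from the text is that the disk image carries the ROM's full file name plus `.ndd`.

```python
from pathlib import Path


def disk_image_for(rom_path):
    """Return the disk image path expected next to a ROM.

    Assumption taken from the example in the text: the disk image is the
    ROM's full file name (extension included) with '.ndd' appended.
    """
    rom = Path(rom_path)
    return rom.with_name(rom.name + ".ndd")


def has_matching_disk(rom_path):
    """True if a disk image following the convention exists on disk."""
    return disk_image_for(rom_path).is_file()


if __name__ == "__main__":
    # 'homebrew.n64' is the example file name used in the text.
    print(disk_image_for("homebrew.n64").name)
```

Running `has_matching_disk` on a ROM before loading it is a quick way to catch a misnamed disk image, for instance one saved as `homebrew.ndd` instead of `homebrew.n64.ndd`.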

Zelda 64: Dawn & Dusk – unofficial 64DD expansion game to OOT
64DD had an exclusive Sim City version, called Sim City 64

Angrylion and GlideN64 in same build!

Previously, Mupen64Plus Next only had GlideN64 as an RDP graphics option, and only ParaLLEl N64 had Angrylion.

Now, Mupen64Plus Next has both, and allows you to choose between them. To do so, go to Quick Menu -> Options, and change RDP Mode. Angrylion is an accurate, low-level software renderer, while GlideN64 is a high-level emulation OpenGL renderer.

Angrylion is the most accurate the graphics are going to get with an N64 emulator – and it can be made relatively fast now thanks to the multithreading capabilities of Angrylion RDP Plus, as well as the Parallel RSP dynarec. You cannot internally change the resolution with Angrylion beyond what the N64 was capable of.

GlideN64, on the other hand, takes a more pragmatic approach and emulates the RDP at a high level. It is an OpenGL renderer. You can upscale the graphics, and there is a wide array of settings to tweak.

Most people will probably be satisfied with GlideN64 and HLE RSP, and indeed, on many platforms, that might be the only feasible way of attaining fullspeed. But Angrylion definitely fills a niche for those who want a more accurate portrayal of N64 graphics, and combined with an upscaling shader, it can still look remarkably good.

Parallel RSP support

Parallel RSP made its debut in ParaLLEl N64, and now we have it backported to Mupen64Plus Next as well! Read our articles here and here for more information on Parallel RSP.

Parallel RSP is a Low-Level RSP plugin that serves as a replacement for Cxd4. You can use it in combination with either GlideN64 or Angrylion. With Angrylion you are pretty much required to use either Parallel RSP or Cxd4 as your RSP plugin, because HLE RSP won’t work. Cxd4 is an interpreter RSP plugin while Parallel RSP is a dynarec RSP plugin. Parallel RSP should be noticeably faster across the board than Cxd4.

You might see better performance with Mupen64Plus Next and Angrylion/Parallel RSP vs. ParaLLEl N64, because Mupen64Plus Next uses the New_dynarec CPU core. ParaLLEl N64 instead uses the Hacktarux dynarec CPU core, which can be a tad slower.

NOTE: You can also use Parallel RSP in combination with GlideN64. While HLE RSP has made significant strides in emulating the vast majority of known RSP microcodes, there might still be some microcodes that have either not been reversed at all or were not accurately reversed. In this case, an LLE RSP plugin is always an option, and Parallel RSP ought to be the faster of the two options.

Angrylion + Parallel RSP on Android – approaching fullspeed on high end phones?

50fps with Super Mario 64 on a Galaxy S10+ – RetroArch Mupen64Plus – Angrylion – Parallel RSP

Angrylion is now available as an option for both Parallel N64 and Mupen64plus Next on Android.

Mupen64Plus Next definitely has a performance advantage over Parallel N64 when it comes to Angrylion. Tests have shown that the first area in Mario 64 runs at about 50-51fps on a Samsung Galaxy S10+ (American Snapdragon version) and 40-45fps on a Samsung Galaxy S10+ (European Exynos version).

Will the next generation of phones be capable of pulling off Angrylion at fullspeed? It’s certainly a tantalizing prospect!

NOTE: There might be several ways you have to ‘nudge’ your Android device to get the best performance out of Angrylion/Parallel RSP. Some things you can try:

– Enable ‘Sustained Performance Mode’. If you find it helps with the framerate, leave it on. If not, disable it.
– Enable ‘Disable Expansion Pak’. It might result in a small performance boost for games that don’t support the Expansion Pak.
– Go to Quick Menu -> Options. VI Overlay can have an additional performance impact on the framerate. ‘Filtered’ is the most demanding option while ‘Unfiltered’ should be fastest.
– Go to Quick Menu -> Options. ‘(AL) Multi threading’ is set to ‘all threads’ by default, but in case the software does not determine the core count correctly for whatever reason, you might want to set the number of cores manually here. Base this number on the number of CPU cores that your Android device has.
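If you are unsure how many cores a device reports, a small script can tell you. The sketch below is a generic illustration, not something RetroArch ships: `thread_count` is a hypothetical helper, and it assumes you can run Python on the device in question (for example through a terminal app).

```python
import os


def thread_count(detected=None, fallback=1):
    """Pick a value for a manual thread setting.

    'detected' defaults to what the operating system reports through
    os.cpu_count(); if that is missing or not positive, use 'fallback'.
    """
    if detected is None:
        detected = os.cpu_count()
    if detected is None or detected < 1:
        return fallback
    return detected


if __name__ == "__main__":
    print("Suggested thread count:", thread_count())
```

Note that `os.cpu_count()` reports logical cores, so on a phone with a mix of fast and slow cores it may still be worth trying a lower value and comparing framerates.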

Angrylion + Cxd4
Performance results – LG G8X and Samsung Galaxy S10+ (Snapdragon)

Angrylion + Parallel RSP
Performance results – LG G8X and Samsung Galaxy S10+ (Snapdragon)

HLE RSP improvements – HVQM support

The HVQM RSP microcode has now been implemented for HLE RSP (thanks to the combined efforts of CrashOveride and Gillou). In the past, the FMVs for Pokemon Puzzle League would only show up if you used Angrylion and an LLE RSP plugin. Now the graphics glitches in Pokemon Puzzle League and Yakouchuu II should be gone! This means that you can now use the GlideN64 renderer for these games as well.

Difference between ParaLLel N64 and Mupen64Plus Next

Available plugins Mupen64Plus Next: Gliden64, Angrylion

Available plugins Parallel N64: Glide64, Parallel RDP, Rice, GLN64, Angrylion

In Mupen64Plus Next’s favor – it is based on a much more recent mupen64plus-core version than Parallel N64, and thus has benefited from years of fixes and architectural improvements. It also uses the New_dynarec CPU core on Windows/Linux/Mac. It is a bit faster than the Hacktarux dynarec from Parallel N64.

There are also currently some disadvantages. The sound is currently crackly with some games like Doom 64 and Quake 64. There are currently some experiments being explored to deal with these issues.

64DD support right now is implemented completely differently in both cores.

Changelog

  • 64DD support (works through the subsystem menu)
  • Angrylion and GlideN64 are now inside the same build – you can switch between them
  • HLE and LLE RSP support – with LLE your choices are between cxd4 [Interpreter] and Parallel RSP [Lightning/Lightrec dynarec]
  • Parallel RSP support for the first time in Mupen64 Plus Next
  • Available on Android with all of the above!
  • The latest HLE RSP improvements – HVQM support – Pokemon Puzzle League FMV support works now with HLE RDP renderers like GlideN64
  • Mitigation for SPECIAL_INT on downcounter flip – fixes freezes in Legend of Zelda: Majora’s Mask
  • Killer Instinct Gold now works with Angrylion + LLE RSP

Parallel N64 with Parallel RSP dynarec release – fast and accurate N64 emulation is now here!

Hot on the heels of our Beetle PSX dynarec public beta release, here comes another bombshell, this time targeting that other big 5th generation console, the Nintendo 64.

ParaLLel N64 is back with a vengeance, and this time Parallel RSP coupled with Angrylion Plus RDP is the enabling technology for a big performance jump in low-level accurate N64 emulation.

Parallel RSP

We have released today a new version of Parallel N64 for Windows and Linux that adds back the Parallel RSP dynarec. This together with multithreaded Angrylion makes Parallel N64 the fastest LLE N64 emulator by far.

We have also added missing bits and pieces to Parallel RSP that allow several games to work, such as World Driver Championship, Stunt Racer 64, and Gauntlet Legends. On top of that, we also optimized some parts of Angrylion RDP Plus, which resulted in greatly enhanced core/thread utilization on a 16-core Ryzen CPU. It should also scale well downwards.

Just to illustrate how dramatic of a difference all these performance-focused enhancements make: on an underpowered 2012 Core i5 laptop, the following games are capable of being played at fullspeed: Resident Evil 2, Mario Kart 64, 1080 Snowboarding, Doom 64, Quake 64, Mischief Makers, Bakuretsu Muteki Bangaioh, Bust-A Move 2 Arcade Edition, Kirby 64 – The Crystal Shards, Mortal Kombat Trilogy, Forsaken 64, Harvest Moon 64.

A 2012 Core i5 desktop CPU like the Core i5 3570k should be more than capable enough of playing the vast majority of N64 games at fullspeed with Angrylion+Parallel RSP, with only rare exceptions like Star Wars Episode 1 Racer being too much for it.

Dramatically lowering the hardware requirements like this has vast implications for the viability of low-level accurate N64 emulation, and opens the door for more people to enjoy bug-free N64 emulation, previously the preserve of only the most overpowered PCs.

Right now, only Linux and Windows are able to enjoy the benefits of Parallel RSP due to the LLVM dependency. This might solve itself later on, though.

How to get it

On Windows: Just download the latest Parallel N64 core from RetroArch’s Online Updater menu. You can either download it from Core Updater, or you can select ‘Update Installed Cores’ if you already had a prior version of Parallel N64 installed.

On Linux: Download the latest Parallel N64 core from RetroArch’s Online Updater menu. You can either download it from Core Updater, or you can select ‘Update Installed Cores’ if you already had a prior version of Parallel N64 installed.

IMPORTANT: On Linux, the situation is a bit more complicated. If you find that the core won’t load on your Linux distribution, it is because LLVM on Linux normally links against libtinfo. For this version of Parallel N64, you will need to make sure you have libtinfo5 installed.

On Ubuntu Linux, you can do this by going into the terminal and typing in the following:

sudo apt-get install libtinfo5

Consult the documentation of your Linux distribution’s package manager for more details on how you can download this package. Once installed, the core should work. If you have a more recent version of libtinfo already installed, creating a symlink named libtinfo.so.5 that points to it might be enough.
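To see which situation applies on your system, you can inspect the library directory. The following is a rough sketch under stated assumptions: `libtinfo_status` is a hypothetical helper, the file naming pattern `libtinfo.so.<major>` is assumed, and the directory in the usage example is a typical Ubuntu location that may differ on your distribution.

```python
import os
import re


def libtinfo_status(filenames):
    """Classify a directory listing with respect to libtinfo version 5.

    Returns 'ok' if a libtinfo.so.5 file is present, 'newer-only' if only a
    later major version exists (the symlink workaround may apply), and
    'missing' if no libtinfo is found at all.
    """
    majors = set()
    for name in filenames:
        match = re.fullmatch(r"libtinfo\.so\.(\d+)(?:\.\d+)*", name)
        if match:
            majors.add(int(match.group(1)))
    if 5 in majors:
        return "ok"
    if any(major > 5 for major in majors):
        return "newer-only"
    return "missing"


if __name__ == "__main__":
    lib_dir = "/lib/x86_64-linux-gnu"  # assumed location, adjust as needed
    if os.path.isdir(lib_dir):
        print(libtinfo_status(os.listdir(lib_dir)))
```

A result of `newer-only` is the case where the symlink approach might be worth trying; `missing` means the package needs to be installed first.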

How to use it

Go into Quick Menu -> Options.

Make sure that ‘GFX Plugin’ is set to ‘angrylion’, and ‘RSP Plugin’ is set to ‘parallel’.

Restart the core.

Explanation of the options

Angrylion (VI Mode) – The N64’s video output interface ‘VI’ would postprocess the image in a very elaborate way. We will explain all of the modes down below:

VI Filter: – All VI filtering is being applied with no omissions.

VI Filter – AA+Blur – Only anti-aliasing + blur filtering by the VI is being applied.

VI Filter – AA+DeDither – Only anti-aliasing + dedithering filtering by the VI is being applied.

VI Filter – AA Only – Only anti-aliasing by the VI is being applied.

Unfiltered: – This is the fastest mode. There is no VI filtering being applied of any sort, the color buffer is being directly output.

Depth: – The VI’s depth buffer as a grayscale image

Coverage: – Coverage as a grayscale image

Angrylion (Thread sync level) – With most games it is fine to leave this at ‘Low’. Low will be the fastest, with High being significantly slower. Games which need Thread sync level set to High to prevent rendering glitches include but are not limited to Starcraft 64, Conker’s Bad Fur Day and Paper Mario.

Angrylion (Multi-threading) – You will want to always enable this. When it is disabled, Angrylion will be entirely single-threaded. It will be much slower in single-threaded mode.

Increased compatibility for Parallel RSP

The following games will now work with Parallel RSP:

* World Driver Championship
* Gauntlet Legends
* Stunt Racer 64
* Mario no Photopie (graphics fixed)

Performance tests

NOTE: All these tests were performed with the game Super Mario 64 (USA). Exact scene being tested can be seen in the screenshot down below.


* Cxd4 means the Interpreter RSP core. This was previously the only option you could use in combination with Angrylion – it needs an LLE RSP core, it cannot work with a HLE RSP core.
* All the VI Filter tests are done with Thread Sync Low.

Test hardware: Desktop PC – AMD Ryzen 9 3950x, Windows 10 (16 cores, 32 HW threads)

| RSP Mode | VI Filter | VI Filter – AA+Blur | VI Filter – AA+DeDither | VI Filter – AA Only | Unfiltered – Thread Sync Low | Unfiltered – Thread Sync Mid | Unfiltered – Thread Sync High |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Cxd4 | 174 VI/s | 176 VI/s | 175 VI/s | 178 VI/s | 180 VI/s | 175 VI/s | 92 VI/s |
| Parallel RSP | 235 VI/s | 238 VI/s | 238 VI/s | 240 VI/s | 245 VI/s | 235 VI/s | 106 VI/s |

Test hardware: Desktop PC – Intel Core i7 7700k @ 4.2GHz, Windows 10 (4 cores, 8 HW threads)

| RSP Mode | VI Filter | VI Filter – AA+Blur | VI Filter – AA+DeDither | VI Filter – AA Only | Unfiltered – Thread Sync Low | Unfiltered – Thread Sync Mid | Unfiltered – Thread Sync High |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Cxd4 | 95 VI/s | 96 VI/s | 96 VI/s | 99 VI/s | 104 VI/s | 96 VI/s | 86 VI/s |
| Parallel RSP | 139 VI/s | 145 VI/s | 144 VI/s | 151 VI/s | 170 VI/s | 146 VI/s | 121 VI/s |

Test hardware: Desktop PC – Intel Core i5 3570k @ 4GHz, Windows 7 x64 (4 cores, 4 HW threads)

| RSP Mode | VI Filter | VI Filter – AA+Blur | VI Filter – AA+DeDither | VI Filter – AA Only | Unfiltered – Thread Sync Low | Unfiltered – Thread Sync Mid | Unfiltered – Thread Sync High |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Cxd4 | 89 VI/s | 92 VI/s | 91 VI/s | 95 VI/s | 103 VI/s | 103 VI/s | 97 VI/s |
| Parallel RSP | 118 VI/s | 124 VI/s | 121 VI/s | 128 VI/s | 145 VI/s | 145 VI/s | 133 VI/s |

Test hardware: Laptop PC – Intel Core i5 3210M @ 2.50GHz, Ubuntu Linux 19.04 (2 cores, 4 HW threads)

This is a 2012 Lenovo G580 laptop.

| RSP Mode | VI Filter | VI Filter – AA+Blur | VI Filter – AA+DeDither | VI Filter – AA Only | Unfiltered – Thread Sync Low | Unfiltered – Thread Sync Mid | Unfiltered – Thread Sync High |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Cxd4 | 39 VI/s | 39 VI/s | 42 VI/s | 41 VI/s | 43 VI/s | 39 VI/s | 39 VI/s |
| Parallel RSP | 52 VI/s | 55 VI/s | 54 VI/s | 58 VI/s | 67 VI/s | 66 VI/s | 61 VI/s |

We recommend for a configuration as low-end as this that you just keep VI Overlay to ‘Unfiltered’.
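The reasoning behind that recommendation can be checked with a little arithmetic on the laptop results. The sketch below uses the figures from the Core i5 3210M test and assumes that fullspeed for this NTSC game means 60 VI/s; the function name `summarize` and the shortened mode labels are just illustrative.

```python
MODES = [
    "VI Filter",
    "AA+Blur",
    "AA+DeDither",
    "AA Only",
    "Unfiltered (Sync Low)",
    "Unfiltered (Sync Mid)",
    "Unfiltered (Sync High)",
]
# VI/s figures from the Core i5 3210M laptop test.
CXD4 = [39, 39, 42, 41, 43, 39, 39]
PARALLEL_RSP = [52, 55, 54, 58, 67, 66, 61]
FULLSPEED_VIS = 60  # assumption: fullspeed for an NTSC game is 60 VI/s


def summarize(modes, baseline, candidate, target=FULLSPEED_VIS):
    """Return (mode, speedup over baseline, reaches target?) per mode."""
    rows = []
    for mode, old, new in zip(modes, baseline, candidate):
        rows.append((mode, round(new / old, 2), new >= target))
    return rows


if __name__ == "__main__":
    for mode, speedup, fullspeed in summarize(MODES, CXD4, PARALLEL_RSP):
        label = "fullspeed" if fullspeed else "below fullspeed"
        print(f"{mode}: {speedup}x over Cxd4, {label}")
```

On this machine only the three ‘Unfiltered’ modes clear 60 VI/s with Parallel RSP, and none do with Cxd4, which is why the filtered modes are best avoided on hardware of this class.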

The majority of games on a configuration this low-end experience dips below fullspeed. However, there are a fair few games which already run at fullspeed. This is by no means a definitive or exhaustive list, but here are the ones I can confirm run at fullspeed with very rare dips, if any:

Resident Evil 2, Mario Kart 64, 1080 Snowboarding, Doom 64, Quake 64, Mischief Makers, Bakuretsu Muteki Bangaioh, Bust-A Move 2 Arcade Edition, Kirby 64 – The Crystal Shards, Mortal Kombat Trilogy, Forsaken 64, Harvest Moon 64

Currently known issues

* Do not use ‘Sync to Exact Content Framerate’ for Angrylion + Parallel RSP. You won’t get good results.
* Recompiling blocks the first time can lead to a very slight stutter. Thankfully this only happens the first time, and never happens afterwards for the runtime duration.
* On Linux there might be dependency issues related to libtinfo5. Read the section ‘How to Get It’ where we try to explain how to solve this problem at least for Ubuntu Linux. NOTE: This situation might resolve itself in the future in case we move away from LLVM for the RSP part.

Parallel N64 Multithreaded Angrylion update

Here is a quick update on some new patches we have pushed to the Parallel N64 core –

1 – You can now get anywhere from a 6fps (conservative) to a 10fps or greater performance boost with the multithreaded Angrylion core by enabling a new option called ‘Send Audio Lists To RSP HLE’. Instead of sending audio lists to the low-level RSP plugin (cxd4), it will send them to the HLE (High-Level Emulated) RSP plugin. Note: If a game does not use the RSP for audio processing, you will not notice a speedup by enabling this. Nevertheless, many games already benefit from this.

NOTE: Indiana Jones and the Infernal Machine might have bad audio with this option enabled, my guess is that the MusyX HLE audio code is still not perfect or we need to have something backported still to make it so. Will look into that tomorrow.

2 – We followed the advice of ata4 (the author of the original Angrylion RDP Plus plugin) and refactored some of the RDRAM code. As a result we are getting a very minor performance boost now on Linux. It’s still not anywhere near where it should be compared to the Windows version, but it is an improvement nonetheless –

Mario 64 – VI overlay on – 77fps (after) instead of 72fps (before)
Mario 64 – VI overlay off – 87fps (after) instead of 84fps (before)

Hope you enjoy these low-hanging fruit performance gains. Back to getting RetroArch 1.6.8 ready!

Parallel N64 with Multithreaded Angrylion released!

We originally intended to release this together with the new RetroArch version right before the end of this month. However, we want to take a few more days to ensure that the release of RetroArch 1.6.8 is solid and that we don’t rush it out of the gates in a premature state. We ask for your patience, it won’t take too long, a couple of days at most. In the meantime, we have the Parallel N64 core with multithreaded Angrylion ready to go!

This is a heavily modified version of ata4’s Angrylion RDP Plus plugin. It has the following distinctive characteristics so far:

1 – Made a bunch of changes so that performance in Linux/Mingw is not as bad as it was previously (still worse than Windows though).
2 – Does not require OpenGL context 3.2, or OpenGL at all. It is purely a software renderer that can use any output video driver you want in your libretro frontend. So you can use this in conjunction with OpenGL, Direct3D, Vulkan, etc.

Credit goes to mudlord, Brad Parker and AIO for being able to get this done in such short notice. I helped out along the way too.

Available for

  • Linux
  • Windows
  • Android

Where to get it

1. Start RetroArch.
2. Go to Online Updater -> Update Cores.
3. Download ‘Nintendo 64 (Parallel N64)’ from the list.

How to use it

1. Start up the Parallel N64 core with any game.

2. Go to Quick Menu -> Options. Make sure that you set ‘GFX Plugin’ to ‘angrylion’ and ‘RSP Plugin’ to ‘cxd4’. Restart RetroArch.

3. It should now use multithreaded Angrylion as the graphics plugin.

Performance

This scene serves as our benchmark test. Fullspeed framerate has been enabled.

For the purpose of this performance test, I am running the game Super Mario 64.

The system on which the tests are being performed is a Core i7 7700k processor with 16GB of RAM running Windows 10 and Linux respectively.

Windows

| CPU Core | OS | Angrylion version | Performance (with VI Overlay on) | Performance (with VI Overlay off) |
| --- | --- | --- | --- | --- |
| Cached interpreter | Windows 10 | Old Angrylion | 52fps | 63fps |
| Dynarec | Windows 10 | Old Angrylion | 52fps | 64fps |
| Dynarec | Windows 10 | New Angrylion Multithreaded | 114fps | 123fps |
| Cached interpreter | Windows 10 | New Angrylion Multithreaded | 106fps | 118fps |

Linux

| CPU Core | OS | Angrylion version | Performance (with VI Overlay on) | Performance (with VI Overlay off) |
| --- | --- | --- | --- | --- |
| Cached interpreter | Linux | Old Angrylion | 53fps | 63fps |
| Dynarec | Linux | Old Angrylion | 55fps | 65fps |
| Dynarec | Linux | New Angrylion Multithreaded | 72fps | 84fps |
| Cached interpreter | Linux | New Angrylion Multithreaded | 69fps | 82fps |

macOS

Too slow to be worth bothering with; single-threaded Angrylion actually turned out faster here. That is why the Mac version will still be using the old Angrylion version.

Videos

Conker’s Bad Fur Day

Banjo Tooie

Biohazard 2/Resident Evil 2

Killer Instinct Gold

Super Mario 64

Sources

https://github.com/libretro/parallel-n64

https://github.com/ata4/angrylion-rdp-plus/commits/master

Performance tips

Some core options have the potential to dramatically improve performance.

Quick Menu -> Options -> Framerate – You can set this to either ‘Original’ or ‘Fullspeed’. Original will attempt to run the game at its original framerate, while Fullspeed bumps it up to 60 VI/s. Note – if you find a game is running below fullspeed on your system, consider setting this to ‘Original’. I know that in Conker’s Bad Fur Day and Pilotwings 64, there is a big performance impact if you set it to ‘Fullspeed’.

Quick Menu -> Options -> VI Overlay – Disabling this can give you a 10 to 20fps speedup at the expense of the VI overlay’s filtering being lost, leading to a more pixelated but less blurry image. Also note that some games may not work properly with VI Overlay off right now, such as Resident Evil 2.

How to improve the graphics

In case you find the N64’s native resolution and blurry VI filter to be unpalatable, we want to bring your attention to various things you can do to improve your graphics.

In this video we will be showing you how to apply a so-called ‘Super VI Mode’ filter in order to improve the N64’s graphics.

Note – how these shaders will perform depends entirely on the power of your GPU. The configuration you see later in the video (nnedi-4x) requires a lot more GPU power than the former one (2x). Be mindful of this.

This video will teach you:
* How to load shader presets
* How to stack additional shader chains on top of existing shader presets
* How to configure shader parameters to adjust the screen.

We hope this video will tickle your curiosity so that you will try to hit upon even more fancy shader configurations! The sky is the limit with RetroArch and our common shaders library.

Cores progress report – Catering to high-end desktops – Dolphin libretro core and others now support resolutions of 8K and up!

Soul Calibur 2 running on the Dolphin core. Internal resolution is 12K, which gets downsampled to a 4K desktop resolution through Nvidia DSR.
Here at RetroArch/libretro, we have always insisted on catering to both the low-end as well as the high end. To further this purpose, we always make design considerations from this perspective, that whatever we do shouldn’t be at the cost of worse performance on lower specced hardware that we still support.

Newer generation emulators are increasingly catering to the high end and almost demand it by virtue of them being based on much more recent videogame systems. While testing RetroArch and various libretro cores on our new high-end Windows desktop PC, we noticed that we could really take things up a few notches to see what we could get out of the hardware.

Dolphin

While working on the Dolphin libretro core some more, we stumbled upon the issue that internal resolution increases were still not working properly. So while fixing that in the latest core, we felt that the default scaled resolution choices that Dolphin provides (up to 8x native resolution) weren’t really putting any stress on our Windows development box (a Core i7 7700K equipped with a Titan XP).

So, in the process we added some additional resolution options so you can get up to 12K. The highest possible resolution right now is 19x (12160×10032).

As for performance results, even at the highest 19x resolution, the average framerate was still around 81fps, although there were some frame drops here and there, and I found it generally safer to dial the internal resolution down to a more conservative 12x or 15x instead. 12x resolution would be 7680×6336, which is still well over 8K resolution.
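The scaled resolutions are simple multiples of the native framebuffer size. Here is a minimal sketch of that arithmetic, assuming a native size of 640×528, which is what the 19x figure divides down to (12160/19 = 640, 10032/19 = 528). The name `scaled_resolution` is hypothetical and not part of the Dolphin core.

```python
NATIVE_WIDTH = 640   # assumed native width, derived from 12160 / 19
NATIVE_HEIGHT = 528  # assumed native height, derived from 10032 / 19


def scaled_resolution(scale):
    """Return (width, height) for an integer internal resolution scale."""
    if scale < 1:
        raise ValueError("scale must be at least 1")
    return NATIVE_WIDTH * scale, NATIVE_HEIGHT * scale


if __name__ == "__main__":
    for scale in (8, 12, 15, 19):
        width, height = scaled_resolution(scale)
        print(f"{scale}x -> {width}x{height}")
```

This makes it easy to see where a given scale lands relative to a display standard: 12x already matches the 7680-pixel width of 8K while being considerably taller.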

Note that the screenshots here are compressed and they are downscaled to 4K resolution, which is my desktop resolution. This desktop resolution in turn is an Nvidia DSR custom resolution, so it effectively is a 4K resolution downsampled to my 1080p monitor. From that, I am running RetroArch with the Dolphin core. With RetroArch, downscaling is pretty much implicit and works on the fly, so through setting the internal resolution of the EFB framebuffer, I can go beyond 4K (unlike most games which just query the available desktop resolutions).





We ran some performance tests on Soul Calibur 2 with an uncapped framerate. Test box is a Core i7 7700k with 16GB of DDR4 3000MHz RAM, and an Nvidia Titan XP video card. We start out with the base 8x (slightly above 4K Ultra HD) resolution which is the highest integer scaled resolution that Dolphin usually supports. If you want to go beyond that on regular Dolphin, you have to input a custom resolution. Instead, we made the native resolution scales go all the way up to 19x.

On the Nvidia Control panel, nearly everything is maxed out – 8x anti-aliasing, MFAA, 16x Anisotropic filtering, FXAA, etc.

| Resolution | Performance (with OpenGL) | Performance (with Vulkan) |
| --- | --- | --- |
| 8x (5120×4224) [for 5K] | 166fps | 192fps |
| 9x (5760×4752) | 165fps | 192fps |
| 10x (6400×5280) | 164fps | 196fps |
| 11x (7040×5808) | 163fps | 197fps |
| 12x (7680×6336) [for 8K] | 161fps | 193fps |
| 13x (8320×6864) | 155fps | 193fps |
| 14x (8960×7392) | 152fps | 193fps |
| 15x (9600×7920) [for 9K] | 139fps | 193fps |
| 16x (10240×8448) [for 10K] | 126fps | 172fps |
| 17x (10880×8976) | 115fps | 152fps |
| 18x (11520×9504) [for 12K] | 102fps | 137fps |
| 19x (12160×10032) | 93.4fps | 123fps |

OpenLara

OpenLara running at over 16K

The OpenLara core was previously capped at 1440p (2560×1440). We have added available resolutions now of up to 16K.

| Resolution | Performance |
| --- | --- |
| 2560×1440 [for 1440p/2K] | 642fps |
| 3840×2160 [for 4K] | 551fps |
| 7680×4320 [for 8K] | 407fps |
| 15360×8640 [for 16K] | 191fps |
| 16000×9000 | 176fps |

Craft

Craft core running at over 16K

Previously, the Craft core supported only up to 1440p. Now it supports up to 16K and slightly higher.

For the Craft core, we are setting the ‘draw distance’ to 32, which is the highest draw distance available to this core. With the draw distance set this far back, you can still see some pop-in right now (terrain that is not yet rendered and will only be shown when the viewer gets closer to it).

| Resolution | Performance |
| --- | --- |
| 2560×1600 [for 1440p/2K] | 720fps |
| 3840×2160 [for 4K] | 646fps |
| 7680×4320 [for 8K] | 441fps |
| 15360×8640 [for 16K] | 190fps |
| 16000×9000 | 168fps |

Parallel N64 – Angrylion software renderer

This scene serves as our benchmark test for both the software Angrylion renderer as well as the Vulkan-based Parallel renderer.

So accurate software-based emulation of the N64 has remained an elusive pipe dream for decades. However, it seems things are finally changing now on high-end hardware.

This test was conducted on an Intel i7 7700K running at Boost Mode (4.80GHz). We are using both the OpenGL video driver and the Vulkan video driver for this test, and we are running the game Super Mario 64. The exact spot we are testing is the Princess Peach castle courtyard.

Super Mario 64

| Description | Performance (with OpenGL) | Performance (with Vulkan) |
| --- | --- | --- |
| Angrylion [no VI filter] | 73fps | 75fps |
| Angrylion [with VI filter] | 61fps | 63fps |

Quake 64

| Description | Performance (with OpenGL) | Performance (with Vulkan) |
| --- | --- | --- |
| Angrylion [no VI filter] | 81fps | 82.5fps |
| Angrylion [with VI filter] | 68fps | 72fps |

Killer Instinct Gold

| Description | Performance (with OpenGL) | Performance (with Vulkan) |
| --- | --- | --- |
| Angrylion [no VI filter] | 57.9fps | 58.7fps |
| Angrylion [with VI filter] | 54.6fps | 55fps |

GoldenEye 007

Tested at the Dam level – beginning

| Description | Performance (with OpenGL) | Performance (with Vulkan) |
| --- | --- | --- |
| Angrylion [no VI filter] | 54.9fps | 43.8fps |
| Angrylion [with VI filter] | 45.6fps | 40.9fps |

Note that we are using the cxd4 RSP interpreter which, despite the SSE optimizations, would still be pretty slow compared to any RSP dynarec, so these results are impressive to say the least. There are games which dip more than this – for instance, Killer Instinct Gold can run at 48fps on the logo title screen, but on average, if you turn off VI filtering, most games should run at fullspeed with this configuration.

In case you didn’t notice already, Vulkan doesn’t really benefit us much when we do plain software rendering. We are talking maybe a conservative 3fps increase with VI filtering, and about 2fps or maybe even a bit less with VI filtering turned off. Not much to brag about, but it could help in case you barely get 60fps and you need a 2+ fps boost to avoid v-sync stutters.

Oddly enough, the sole exception to this is GoldenEye 007, where the tables are actually turned, and OpenGL actually leaps ahead of Vulkan quite significantly, conservatively by about 5fps with VI filter applied, and even higher with no VI filter. I tested this many times over to see if there was maybe a slight discrepancy going on, but I got the exact same results each and every time.

Parallel N64 – Parallel Vulkan renderer

Quake 64 on Parallel N64 – tested with both Angrylion and Parallel

So we have seen how software-based LLE RDP rendering runs. This puts all the workload on the CPU. So what if we reverse the situation and put it all on the GPU instead? That is essentially the promise of the Parallel Vulkan renderer. So let’s run the same tests on it.

This test was conducted on an Intel i7 7700K running at Boost Mode (4.80GHz). We are using the Vulkan video driver for this test, and we are running the game Super Mario 64. The exact spot we are testing is the Princess Peach castle courtyard.

Super Mario 64

| Description | Performance |
| --- | --- |
| With synchronous RDP | 192fps |
| Without synchronous RDP | 222fps |

Quake 64

| Description | Performance |
| --- | --- |
| With synchronous RDP | 180fps |
| Without synchronous RDP | 220fps |

Killer Instinct Gold

| Description | Performance |
| --- | --- |
| With synchronous RDP | 174fps |
| Without synchronous RDP | 214fps |

GoldenEye 007

Tested at the Dam level – beginning

| Description | Performance |
| --- | --- |
| With synchronous RDP | 88fps |
| Without synchronous RDP | 118fps |

As you can see, performance nearly doubles when going from Angrylion to Parallel renderer with synchronous RDP enabled, and beyond with it disabled. Do note that asynchronous RDP is regarded as a hack and it can result in many framebuffer oriented glitches among other things, so it’s best to run with synchronous RDP for best results.

We are certain that by using the LLVM RSP dynarec, the performance difference between Angrylion and Parallel would widen even further. Even though there are still a few glitches and omissions in the Parallel renderer compared to Angrylion, it’s clear that there is a lot of promise to this approach of putting the RDP on the GPU.

Conclusion: It’s quite clear that even on a quad-core 4.8GHz i7 CPU, the CPU ‘nearly’ manages to run most games with Angrylion [software] at fullspeed but it doesn’t leave you with a lot of headroom really. Moving it to the GPU [through Parallel RDP] results in a doubling of performance with the conservative synchronous option enabled and even more if you decide to go with asynchronous mode (buggier but faster).
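To put a number on that conclusion, the ratio can be computed per game from the results above. This is a small sketch, assuming the comparison is between Angrylion with the VI filter under the Vulkan video driver and the Parallel renderer with synchronous RDP; the dictionary layout and the name `speedups` are illustrative only.

```python
# (Angrylion with VI filter, Vulkan video driver; Parallel with synchronous RDP)
RESULTS_FPS = {
    "Super Mario 64": (63.0, 192.0),
    "Quake 64": (72.0, 180.0),
    "Killer Instinct Gold": (55.0, 174.0),
    "GoldenEye 007": (40.9, 88.0),
}


def speedups(results):
    """Return the Parallel-over-Angrylion ratio per game, rounded to 2 places."""
    return {
        game: round(parallel / angrylion, 2)
        for game, (angrylion, parallel) in results.items()
    }


if __name__ == "__main__":
    for game, ratio in speedups(RESULTS_FPS).items():
        print(f"{game}: {ratio}x")
```

With these figures, GoldenEye 007 is the game closest to a plain doubling, while the other three tested games gain noticeably more.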

Beetle PSX

Previously, Beetle PSX would only provide internal resolution increases up to 8 times the original resolution. We have now extended this to 32x for software and Vulkan, and 16x for OpenGL.

The results are surprising – while the Vulkan renderer is far more mature than the OpenGL renderer and implements the mask bit unlike the GL renderer (along with some other missing bits in the current GL renderer), the GL renderer leaps ahead in terms of performance at nearly every resolution.

Crash Bandicoot

Crash Bandicoot running at over 10K. Note this is being downsampled to 4K.

Crash Bandicoot is a game that ran at a resolution of 512×240.

| Resolution | OpenGL, with PGXP | OpenGL, w/o PGXP | Vulkan, with PGXP | Vulkan, w/o PGXP | Software (OpenGL) | Software (Vulkan) |
| --- | --- | --- | --- | --- | --- | --- |
| 8192×3840 [16x] [for 5K] | 188.8fps | 266fps | 217fps | 239fps | 4.4fps | 5.3fps |
| 4096×1920 [8x] [for 2K] | 216fps | 296fps | 218fps | 240fps | 16fps | 17.5fps |
| 2048×960 [4x] | 215fps | 296fps | 216fps | 239fps | 52fps | 57.9fps |
| 1024×480 [2x] | 216fps | 296fps | 216fps | 239fps | 138fps | 145fps |
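
The resolution labels in this table follow directly from the game’s native resolution multiplied by the internal resolution scale. As a small sketch (the function name `scaled_resolution` is ours, purely for illustration):

```python
def scaled_resolution(native_width, native_height, scale):
    """Internal rendering resolution for a given upscaling factor."""
    return native_width * scale, native_height * scale


# Crash Bandicoot's native resolution, as stated above.
CRASH_NATIVE = (512, 240)

for scale in (2, 4, 8, 16):
    width, height = scaled_resolution(*CRASH_NATIVE, scale)
    print(f"{scale}x -> {width}x{height}")
```

The same multiplication applies to any other game once you know its native resolution.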

Tekken 3

Tekken 3 running at over 10K, being downsampled to 4K.

Tekken 3 is a game that ran at a resolution of 368×480.

| Resolution | OpenGL, with PGXP | OpenGL, w/o PGXP | Vulkan, with PGXP | Vulkan, w/o PGXP | Software (OpenGL) | Software (Vulkan) |
| --- | --- | --- | --- | --- | --- | --- |
| 11776×15360 [32x] [for 12K] | N/A | N/A | 127fps | 127.4fps | N/A | N/A |
| 5888×7680 [16x] [for 4K] | 188.5fps | 266fps | 184.4fps | 211fps | 4.4fps | 6.6fps |
| 2944×3840 [8x] [for 2K] | 186.5fps | 208fps | 183.5fps | 269fps | 22fps | 25.2fps |
| 1472×1920 [4x] | 184.5fps | 270fps | 230.5fps | 210fps | 52fps | 59.4fps |
| 736×960 [2x] | 232fps | 271fps | 185.5fps | 210fps | 129fps | 137fps |

Reicast

Dead or Alive 2 running at over 12K resolution on Reicast

Daytona USA 2001 running at over 12K resolution on Reicast

Sonic Adventure running at over 12K resolution on Reicast

Dead or Alive 2

Description Performance
4480×3360 206fps
5120×3840 206fps
5760×4320 206fps
6400×4800 204fps
7040×5280 206fps
7680×5760 206fps
8320×6240 204fps
8960×6720 204fps
9600×7200 207fps
10240×7680 206fps
10880×8160 207fps
11520×8640 207fps
12160×9120 194fps
12800×9600 193fps

As you can see, it isn’t until we reach 12160×9120 that Reicast’s performance finally drops from an almost constant 206/207fps to a somewhat lower value. Do note that every measurement was taken in the same environment. When alpha effects and RTT (Render to Texture) effects are applied onscreen, there may well be dips at resolutions above 8K, whereas 8K and below should handle them with relative ease.
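
To put that headroom in perspective, it can help to convert frame rates into frame times. The sketch below is our own illustration: it assumes a 60fps target for the game (an assumption on our part, not something measured here), and the names `frame_time_ms` and `TARGET_FPS` are made up for this example.

```python
def frame_time_ms(fps):
    """Time spent on one frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


TARGET_FPS = 60  # assumed target frame rate, not taken from the benchmark
budget_ms = frame_time_ms(TARGET_FPS)

# Frame rates copied from the Dead or Alive 2 table above.
samples = [("11520x8640", 207), ("12160x9120", 194), ("12800x9600", 193)]

for resolution, fps in samples:
    used_ms = frame_time_ms(fps)
    print(f"{resolution}: {used_ms:.2f} ms per frame, "
          f"{budget_ms - used_ms:.2f} ms left in a {TARGET_FPS}fps budget")
```

Even at 12800×9600, a frame takes only a little over 5 ms in this scene, which is why heavier effects can still cause dips without the average looking alarming.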

Mupen64plus – GlideN64 OpenGL renderer

Super Mario 64 running at 8K resolution with Gliden64.

This core uses Mupen64plus as the core emulator plus the GlideN64 OpenGL renderer.

Super Mario 64

Description Performance
3840×2880 – no MSAA 617fps
3840×2880 – 2x/4x MSAA 181fps
4160×3120 – no MSAA 568fps
4160×3120 – 2x/4x MSAA 112fps
4480×3360 – no MSAA 538fps
4480×3360 – 2x/4x MSAA 103fps
4800×3600 – no MSAA 524fps
4800×3600 – 2x/4x MSAA 94fps
5120×3840 – no MSAA 486fps
5120×3840 – 2x/4x MSAA 82fps
5440×4080 – no MSAA 199fps
5440×4080 – 2x/4x MSAA 80fps
5760×4320 – no MSAA 194fps
5760×4320 – 2x/4x MSAA 74fps
6080×4560 – no MSAA 190fps
6080×4560 – 2x/4x MSAA 68fps
6400×4800 – no MSAA 186fps
6400×4800 – 2x/4x MSAA 61.3fps
7680×4320 – no MSAA 183fps
7680×4320 – 2x/4x MSAA 39.4fps

GoldenEye 007

Tested at the Dam level – beginning

Description Performance
3840×2880 – no MSAA 406fps
3840×2880 – 2x/4x MSAA 100fps
4160×3120 – no MSAA 397fps
4160×3120 – 2x/4x MSAA 65fps
4480×3360 – no MSAA 375fps
4480×3360 – 2x/4x MSAA 60fps
4800×3600 – no MSAA 342fps
4800×3600 – 2x/4x MSAA 54fps
5120×3840 – no MSAA 310fps
5120×3840 – 2x/4x MSAA 51fps
5440×4080 – no MSAA 70fps
5440×4080 – 2x/4x MSAA 46fps
5760×4320 – no MSAA 78.9fps
5760×4320 – 2x/4x MSAA 42fps
6080×4560 – no MSAA 86fps
6080×4560 – 2x/4x MSAA 37fps
6400×4800 – no MSAA 79fps
6400×4800 – 2x/4x MSAA 27fps
7680×4320 – no MSAA 79fps
7680×4320 – 2x/4x MSAA 33.2fps

Preface: Immediately after going beyond 3840×2880 (slightly higher than 4K), we notice that turning on MSAA results in several solid black strips being rendered where there should be textures and geometry. Again, we notice that enabling MSAA takes a huge performance hit. It doesn’t matter whether you apply 2 or 4 samples either: it is uniformly slow. We also notice several rendering throughput bottlenecks. As soon as we move from 5120×3840 to 5440×4080 (a relatively minor bump), GoldenEye 007 suddenly goes from 310fps to 70fps, a huge dropoff. Suffice it to say, while you can play with Reicast (Dreamcast emulator) and Dolphin (GameCube/Wii) at 8K without effort and even have enough headroom to go all the way to 12K, don’t try this anytime soon with GlideN64.
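
The dropoff point can also be located mechanically from the ‘no MSAA’ rows of the two tables above. This is a minimal sketch of our own, with the made-up helper name `largest_drop`, that looks for the steepest fall between two neighbouring resolutions:

```python
# Resolutions and "no MSAA" frame rates copied from the two tables above.
RESOLUTIONS = ["3840x2880", "4160x3120", "4480x3360", "4800x3600", "5120x3840",
               "5440x4080", "5760x4320", "6080x4560", "6400x4800", "7680x4320"]
mario_no_msaa = [617, 568, 538, 524, 486, 199, 194, 190, 186, 183]
goldeneye_no_msaa = [406, 397, 375, 342, 310, 70, 78.9, 86, 79, 79]


def largest_drop(labels, fps):
    """Return (from_label, to_label, retained_fraction) for the steepest
    fall between two neighbouring resolutions."""
    best = None
    for i in range(len(fps) - 1):
        retained = fps[i + 1] / fps[i]
        if best is None or retained < best[2]:
            best = (labels[i], labels[i + 1], retained)
    return best


for name, series in [("Super Mario 64", mario_no_msaa),
                     ("GoldenEye 007", goldeneye_no_msaa)]:
    before, after, retained = largest_drop(RESOLUTIONS, series)
    print(f"{name}: steepest drop {before} -> {after}, "
          f"{retained:.0%} of the frame rate retained")
```

Both games hit their steepest drop at the same step, 5120×3840 to 5440×4080, with GoldenEye 007 keeping less than a quarter of its frame rate.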

We suspect there are several huge bottlenecks in this renderer that prevent it from reaching higher performance, especially since people on 1060s have also complained about less-than-stellar performance. That being said, GlideN64 has certain advantages over Glide64: it emulates certain FBO effects which Glide64 doesn’t. It is also less accurate than Glide64 in other areas, so you have to pick your poison on a per-game basis.

We still believe that the future of N64 emulation lies more with accurate renderers like Parallel RDP, which are not riddled with per-game hacks, than with the traditional HLE RDP approach seen in GlideN64 and Glide64. Nevertheless, people love their internal resolution upscaling, so there will always be a built-in audience for these renderers, and it’s always nice to have choices.