Mednafen/Beetle PSX – PGXP arrives!

Mednafen/Beetle PSX has made another significant stride forward! iCatButler has contributed a working backport of PGXP for Mednafen/Beetle PSX.

PlayStation rasterization issues

Several issues can be noticed in most PlayStation games’ graphics.

Wobbly polygons (lack of subpixel precision)

This can often be noticed on character models. In addition to the models looking very glitchy/wobbly, you can also often notice that their face textures look deformed. If you apply very granular movement to the character, you can often notice a character model’s head deforming based on its distance from the camera. For examples of this, try to find a scene in Final Fantasy VIII/Resident Evil 1 where the characters are very close to the camera. With no subpixel precision, you would see a continuously changing and deformed face texture on the main characters.

Dancing/warping textures (lack of perspective correct texturing)

In many games, you will see the texture maps on the ground dancing/warping as you move around the environment. See the video above for an example of what we are talking about.

In addition to this, other games (like Ridge Racer Revolution, or Tomb Raider 2) can also sometimes show random color outlines around texture maps while the camera is moving.

All of the aforementioned issues can be traced back to the lack of perspective correct texturing.

This, combined with the lack of subpixel precision, can result in a very glitchy look in many a PlayStation game. Most PlayStation-to-PC ports didn’t put the extra effort in to fix these issues either. For instance, the aforementioned issues were fixed in Resident Evil 1 PC but then by the time we get to Resident Evil 3: Nemesis for PC, we can spot wobbling polygons again on the character models. A similar thing happened with the Final Fantasy series. The issues were fixed for Final Fantasy VII but Final Fantasy VIII’s PC port featured the same subpixel precision issues as the PSX version. Lack of time/effort and optimizing specifically around the PSX’s limitations are probably a reason for developers taking shortcuts even on far more capable hardware.

Geometry Transformation Engine and integer precision math

The PlayStation’s MIPS R3000A processor was a very barebones MIPS CPU, very outdated even by 1994 standards. What made it special at the time was the coprocessor attached to it which made it very adequate at 3D rendering: the Geometry Transformation Engine. It’s a coprocessor (attached as COP2) which has fixed-function functionality allowing the programmer to do perspective transforms, rotations, light sourcing, depth cuing, etc. Being able to do this at nearly zero cost was the envy of many a PC around that time period. Without this coprocessor, the paltry 33MHz main CPU would have been unable to render most of the graphics at full speed.

The PSX had no floating point unit installed as COP1 (coprocessor 1), and neither did the GTE have any float support. It deals mainly in fixed point math, and because of the rotate and translation transformations involved that occur on the GTE side, precision errors will build up and this will result in the ‘wobbling’ effect you can see manifested onscreen. At the very end when the GTE has done its processing, it will output the results of the perspective transform to the GPU as integer pixel coordinates. The GPU then has to draw the scene. Polygons will snap into place until one of the vertices moves enough to snap into a different pixel.
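
To make the snapping behaviour concrete, here is a minimal sketch in Python. It is not GTE code: the projection distance `FOCAL` and the helper names are made up for illustration, and it only models the last step described above, where the fractional part of a projected coordinate is thrown away.

```python
import math

FOCAL = 256.0  # hypothetical projection distance, chosen only for illustration

def project_x(x, z):
    """Perspective-project a camera-space X coordinate to screen space."""
    return x * FOCAL / z

def snap(screen_x):
    """Keep only the whole-pixel part, as when integer coordinates reach the GPU."""
    return math.floor(screen_x)

# A vertex drifting slowly to the right at a fixed depth of 100 units.
xs = [10.0 + 0.1 * i for i in range(6)]
precise = [project_x(x, 100.0) for x in xs]
snapped = [snap(p) for p in precise]

print([round(p, 3) for p in precise])  # smooth subpixel motion
print(snapped)                         # stays put, then jumps a whole pixel
```

The precise positions move a quarter of a pixel per step, while the snapped ones sit at 25 for two steps and then jump to 26. Keeping the precise values around is, roughly speaking, what subpixel precision buys you.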

The N64 had a similar configuration to the PSX, except that the equivalent to GTE was called RSP there – also installed as COP2 (coprocessor 2). Unlike the PSX, though, the N64 also had an FPU installed as COP1 (coprocessor 1). Like the GTE, RSP deals mainly with integer fixed-point math, but unlike the GTE, custom microcode can be uploaded to the RSP, making it more flexible and capable of performing custom tasks programmed by the game developer specifically for the game. In certain games you can see similar wobbling issues (F-Zero X’s vehicles are a good example), but it’s far less severe.

Furthermore, what really gives the N64 more stable rendering than the PSX is perspective correct texturing, which the N64 has and the PSX lacks. Calculating the coordinates so that a texture would look correct from any angle would be computationally expensive, costing maybe 2 divisions per sample. By only applying perspective-correct sample coordinates at certain intervals, rendering could be done much quicker on the PSX, but at the cost of the ‘warping/dancing’ polygons that you can see in so many PSX games. To combat these issues, you’d see developers like Psygnosis using many tricks in games like Wipeout to ‘mask’ them (for instance by subdividing textures into many parts).
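
A small worked example shows where the warping comes from. This is a generic textbook sketch in Python, not code from either console: the function names are made up, and `w` stands for the depth term that the perspective divide uses.

```python
def affine_u(u0, u1, t):
    """Interpolate a texture coordinate linearly in screen space (no depth)."""
    return u0 + (u1 - u0) * t

def perspective_u(u0, w0, u1, w1, t):
    """Interpolate u/w and 1/w linearly in screen space, then divide per sample."""
    inv_w = (1.0 - t) / w0 + t / w1
    u_over_w = (1.0 - t) * u0 / w0 + t * u1 / w1
    return u_over_w / inv_w

# An edge running from a near vertex (w=1) to a far vertex (w=4),
# sampled halfway across the screen-space span.
print(affine_u(0.0, 1.0, 0.5))                 # 0.5
print(perspective_u(0.0, 1.0, 1.0, 4.0, 0.5))  # 0.2
```

Halfway across the screen the affine version samples the middle of the texture, while the correct answer is only a fifth of the way in, because the far half of the edge is squeezed into fewer pixels. The per-sample division in `perspective_u` (one for each of the two texture coordinates) is the cost the paragraph above refers to.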

The solution: PGXP

PGXP attempts to kill two birds with one stone. First, it introduces subpixel precision to get rid of the wobbling polygon issues. Second, it adds perspective correct texturing to stop the ‘texture warping/dancing’ issues.

iCatButler first started integrating PGXP into the emulator PCSX-R (by injecting it into Pete’s OGL2 plugin, a closed-source plugin). It seems this attempt has been successful and was the first one out of the starting gates, but there are some geometry issues which, it seems, can partly be attributed to the ageing fixed-function Pete’s OGL2 renderer.

iCatButler has now backported this functionality to Mednafen/Beetle PSX’s GL renderer. According to iCatButler, there are fewer geometry issues with Mednafen/Beetle PSX’s GL renderer. See this screenshot for instance –

A comparison between PGXP with PCSXR vs. Mednafen/Beetle PSX

Example video

This video shows a before/after example of PGXP + perspective correct texturing with the game Tomb Raider.

Other examples

How to use it?

You need to use the latest version of Mednafen/Beetle PSX HW. In case you don’t already have it or you are not on the latest version, you can get it from RetroArch’s Online Updater.

You need to be using the OpenGL renderer in order to use PGXP. To make sure of this, go to Quick Menu -> Options and make sure it says ‘opengl’ at ‘Renderer (restart)’.

To enable PGXP, you set ‘PGXP operation mode’ to a value other than ‘OFF’.

‘memory’ – the default enabled mode. This is a less CPU-intensive version of PGXP, and it also tends to be less buggy than the other one. Try this by default unless you want to experiment.
‘memory + CPU’ – the secondary enabled mode. This is more CPU intensive and can result in geometry glitches. However, it has been demonstrated that it can help reduce wobbliness in polygons even more for certain games (such as Resident Evil 3: Nemesis).

There are two other PGXP options which you might want to experiment with:

‘PGXP vertex cache’ – maintains a cache for all the vertices. This might result in better performance but we recommend you leave it off for the majority of games since it can result in graphics glitches.

‘PGXP perspective correct texturing’ – This enables or disables ‘perspective correct textures’. For most games you definitely want this enabled, otherwise enabling PGXP will only get you subpixel precision.

Performance tricks

The GL renderer is still in a very suboptimal state overall. In order to increase performance, you can turn V-Sync off. To do this, go to Settings -> Video and turn off V-Sync.

Other things that can help are reducing the internal resolution and/or setting Texture filtering to ‘nearest’. ‘3point N64’ and ‘bilinear’ can result in more visual glitches and definitely have a negative impact on performance.

What’s next for Mednafen/Beetle PSX?

The GL renderer is definitely not as optimal as it could be, and we are trying to find solutions for making it much faster. Stay tuned!

RetroArch Web Player

An Emscripten port of RetroArch has existed for years, but until recently, we never had a good opportunity to launch it in a state we felt comfortable with. Well, until now that is.

Web Player

So what is RetroArch Web Player? It’s a port of RetroArch that runs inside your web browser, powered by Emscripten and asm.js. Most modern browsers available today should be compatible. That being said, we strongly recommend you use Google Chrome right now for smooth v-synced gameplay with no audio crackling.

You can check it out right here!

Dropbox support may not work in the embedded player we added to this post, because we haven’t enabled SSL on our main site and Dropbox doesn’t allow any sort of wildcard or regex on web apps.

You can also click on ‘Web Player’ in the top-right corner of the website and in our page in order to use it.

paraLLEl RDP and RSP updates (September 2016)

Unfortunately, I haven’t had much time to work on paraLLEl lately, but there is plenty to update about.

paraLLEl RSP – Clang/LLVM RSP recompiler experiment

Looking at CPU profiles, paraLLEl RDP could never really shine, as it was being held back by the CXD4 RSP interpreter, so groundbreaking speedups could not be achieved. With paraLLEl RDP, the RSP was consuming well over 50% CPU time. This was known from the beginning before RDP work even started. After the first RDP pre-alpha release, focus shifted to RSP performance, and that’s what I’ve spent most time on. None of my machines are super-clocked modern i7s, which have been required to run N64 LLE at good speed.

Micro-optimizing the interpreter is a waste of time, I needed a dynarec. However, I have never written a dynarec or JITer for that matter before, and I was not going to spend months (years?) learning how to JIT code well for ~4 architectures (x86, x64, ARMv7, ARMv8). Instead, using libclang/libllvm as my codegen proved to be an interesting hack that worked surprisingly well in practice for this project.

The RSP has some characteristics which made it easier to design a JITer for than any normal CPU:

  • Separate instruction and data memory (4K each)
  • Only way to modify instruction memory is through explicit DMA instructions
  • Main CPU can poke into IMEM over MMIO, but it’s trivial to check for changes in IMEM on RSP entry
  • Fixed instruction length (MIPS)
  • No exceptions, IRQ handling, MMU or any annoying stuff
  • No complex state handling, just complex arithmetic in the vector co-processor

However, the RSP has one complicating issue, and that is micro-code. RSP IMEM will change rapidly as micro-code for graphics, UI, audio is shuffled in and out of the core, so the dynarec must deal with constantly changing IMEM.

Reusing CEN64/CXD4 RSP

The goal was to make it fast, not to write everything from scratch, so reusing CEN64’s excellent vector unit implementation made implementing COP2 a breeze. Various glue code like COP0/LWC/SWC was pulled from CXD4 as it was easier to reuse considering it was already in Mupen. Several bugs were found and fixed in CXD4’s COP2 implementation while trying to match the interpreter and dynarec implementations, which is always a plus.

Basic codegen approach

When we want to start executing at a given PC, we see if the code following that PC has been seen before. If not, generate equivalent C code, compile it to LLVM IR with libclang, then use LLVM MCJIT to generate optimized executable code. To avoid having to recompile long implementations of vector instructions all the time, external functions can be used in the C code, and LLVM can be given a symbol table which resolves all symbols on-the-fly.
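
The “have we seen this code before?” check can be sketched like this in Python. It is only an illustration: `BlockCache` and `compile_fn` are hypothetical names, the real C-to-LLVM step is replaced by a stand-in callback, and keying the cache on a hash of the bytes is an assumption rather than a description of the actual implementation.

```python
import hashlib

class BlockCache:
    """Maps (PC, hash of the code bytes) to an already compiled block."""

    def __init__(self, compile_fn):
        self._compile = compile_fn  # stand-in for the C -> LLVM IR -> MCJIT step
        self._blocks = {}
        self.compiles = 0

    def lookup(self, imem, pc, length):
        code = bytes(imem[pc:pc + length])
        key = (pc, hashlib.sha1(code).digest())
        block = self._blocks.get(key)
        if block is None:
            # Never seen this exact code at this PC: pay the slow compile once.
            block = self._compile(pc, code)
            self._blocks[key] = block
            self.compiles += 1
        return block

cache = BlockCache(lambda pc, code: "block@%#x" % pc)
imem = bytearray(4096)            # 4K of instruction memory
cache.lookup(imem, 0x40, 32)
cache.lookup(imem, 0x40, 32)      # same bytes, served from the cache
print(cache.compiles)
```

Because the key includes the code bytes and not just the PC, microcode being swapped in and out of IMEM simply maps to different cache entries.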

Obviously the JIT time is far longer than a hand-written JITer would be, so reducing recompiles to the bare minimum is critical. RSP IMEM is small enough that we could cache all generated C code and its generated code on disk if this becomes annoying enough.

Debuggable JIT output

Compile the binary with -rdynamic and instead of going through libclang/llvm, dump C code to disk, compile through system() and load as a dynamic library, and step through in GDB. The -rdynamic is important so that the .so can link automatically to COP0/COP2 calls.

Difficult control flow? Longjump!

While we need performance, we don’t need to go to extremes. Whenever the code-gen hits a particularly complicated case to handle, we can cop out by longjumping up our stack and re-entering from where our PC would be. A good case of this is the MIPS branch delay slot, which is one of the single most annoying features to implement in a MIPS dynarec. The common case is easy to implement, but branching in a branch delay slot? Classic ouch scenario. Last instruction in a block sets up branch delay slot and first instruction in next block needs to resolve it? Ouch. That first instruction can also set up a branch delay slot, and so it goes …

If IMEM has been invalidated due to DMA, we can similarly longjump out and re-check IMEM, similar for the BREAK instruction.
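
The bail-out idea can be illustrated in Python, with one important substitution: Python has no longjmp, so an exception that unwinds to the dispatcher plays that role here. All names are hypothetical and the “blocks” are plain functions standing in for generated code.

```python
class ExitToDispatcher(Exception):
    """Raised by generated code to abandon the current call stack."""

    def __init__(self, resume_pc):
        super().__init__(resume_pc)
        self.resume_pc = resume_pc

def dispatch(blocks, pc):
    """Run blocks until one returns None; each block returns the next PC."""
    visited = []
    while pc is not None:
        visited.append(pc)
        try:
            pc = blocks[pc]()
        except ExitToDispatcher as bail:
            pc = bail.resume_pc  # re-enter from where the PC would be
    return visited

def hard_case():
    # e.g. a branch inside a branch delay slot: give up and let the dispatcher sort it out
    raise ExitToDispatcher(0x20)

blocks = {
    0x00: lambda: blocks[0x10](),  # nested entry into another block
    0x10: hard_case,
    0x20: lambda: None,            # end of the run
}
print([hex(pc) for pc in dispatch(blocks, 0x00)])
```

However deep the nesting was when the hard case was hit, control lands back in one place and execution simply resumes from the recorded PC.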

Return stack prediction

JAL and JALR calls assume that their linked address will be returned to in a stack like fashion. On JAL/JALR, block entry is called recursively in the hope we can return back to it. To potentially avoid having to deal with indirect jump [jr $r31], jr will check the return stack and simply return if it matches an earlier JAL/JALR.
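
As a rough model of the bookkeeping involved, here is a Python sketch. `ReturnStack` is a hypothetical helper, and it only checks the most recent call, which is a simplification of whatever the generated code really does.

```python
class ReturnStack:
    """Tracks link addresses pushed by JAL/JALR so JR can try a plain return."""

    def __init__(self):
        self._links = []

    def on_call(self, link_addr):
        """JAL/JALR: remember where the callee is expected to return to."""
        self._links.append(link_addr)

    def on_jr(self, target):
        """JR: True if the target matches the latest call, so we can just return."""
        if self._links and self._links[-1] == target:
            self._links.pop()
            return True
        return False  # mispredicted: fall back to a normal indirect jump

rs = ReturnStack()
rs.on_call(0x108)         # a JAL at 0x100 links to 0x108 (PC + 8)
print(rs.on_jr(0x108))    # True: behaves like a normal function return
print(rs.on_jr(0x200))    # False: needs the slow indirect path
```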

Async JIT compiles?

To reduce stalls, we could kick JIT compiles off to a thread and interpret as a fallback.
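
Since this is only an idea at this point, the following Python sketch is purely speculative. `AsyncJit`, `compile_fn` and `interpret_fn` are hypothetical names; the point is just the shape of the fallback: interpret while the compile is in flight, switch over once it is done.

```python
from concurrent.futures import ThreadPoolExecutor

class AsyncJit:
    """Interpret a block until its background compile has finished."""

    def __init__(self, compile_fn, interpret_fn):
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._compile = compile_fn      # slow: returns a callable for the block
        self._interpret = interpret_fn  # always available, but slower to run
        self._pending = {}              # block key -> Future

    def execute(self, key):
        future = self._pending.get(key)
        if future is None:
            # First sighting: start compiling in the background, don't wait.
            self._pending[key] = self._pool.submit(self._compile, key)
            return self._interpret(key)
        if future.done():
            return future.result()(key)
        return self._interpret(key)

    def shutdown(self):
        """Wait for outstanding compiles to finish."""
        self._pool.shutdown(wait=True)

jit = AsyncJit(lambda key: (lambda k: ("jit", k)), lambda key: ("interp", key))
print(jit.execute(0x40))  # ('interp', 64): the compile was only just kicked off
jit.shutdown()
print(jit.execute(0x40))  # ('jit', 64): the compiled block is ready now
```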

Failed attempt #1 – Hashing entire IMEM and recompile it

This failed badly as even though micro-code is very static, IMEM will contain garbage data that is never executed. No compiled block was ever reused.

Failed attempt #2 – Hashing fixed block size from PC

The JIT lookahead was set to 64 instructions (tiny for a regular dynarec, but IMEM is already tiny …). This doesn’t really help. Blocks close to garbage regions would trigger 2-3 recompiles every frame, which killed performance.

Successful attempt #3? – Analyzing logical end of block before hash

The idea of #2 was okay, but the real fix was to pre-analyze the block and find where the block would logically have to end, then hash and compile the estimated range. I haven’t tested every game obviously, but it seems very promising. No recompiles have happened after a block is first seen.
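
A simplified version of that analysis might look like the Python sketch below. The stopping rule used here (stop after an unconditional J or JR plus its delay slot, or at BREAK) is an assumption made for illustration; a real analysis also has to account for conditional branches that jump past that point.

```python
import hashlib
import struct

IMEM_SIZE = 0x1000  # 4K of instruction memory

def logical_block_end(imem, pc):
    """Return the byte offset one past the last instruction of the block at pc."""
    addr = pc
    while addr < IMEM_SIZE:
        (word,) = struct.unpack_from(">I", imem, addr)  # big-endian MIPS word
        opcode = word >> 26
        funct = word & 0x3F
        if opcode == 0 and funct == 0x0D:                   # BREAK
            return addr + 4
        if opcode == 2 or (opcode == 0 and funct == 0x08):  # J or JR
            return min(addr + 8, IMEM_SIZE)                 # include the delay slot
        addr += 4
    return IMEM_SIZE

def block_hash(imem, pc):
    """Hash only the bytes that can actually execute as part of this block."""
    end = logical_block_end(imem, pc)
    return hashlib.sha1(bytes(imem[pc:end])).hexdigest()

imem = bytearray(b"\xff" * IMEM_SIZE)                   # garbage everywhere
struct.pack_into(">4I", imem, 0, 0, 0, 0x03E00008, 0)   # nop, nop, jr $ra, nop
before = block_hash(imem, 0)
imem[16] = 0x12                                         # garbage after the block changes
print(logical_block_end(imem, 0), block_hash(imem, 0) == before)
```

The garbage behind the `jr $ra` never makes it into the hash, so changes there no longer force a recompile.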

The obvious difficult RSP LLE games seem to work just fine along with the Angrylion software renderer. With Angrylion and paraLLEl RSP, the RDP eats up 70% of the profile, and RSP is barely anywhere to be seen, ~0.5% here and there from the expected heavy-hitters like VMADN, hashing and validating IMEM and so on.


Lots of games which used to dip down to ~35/40 FPS now ran at full-speed with paraLLEl RDP async/RSP combo, which was very pleasing.


LLVM RSP is really a proof-of-concept. Codegen should ideally be moved to a leaner JIT system, Tarogen by Daeken seems like a good way forward.

RDP bug-fixing

After I was happy with the RSP, it was time to squash low-hanging rendering bugs.

Paper Mario Glitches

The copy pipe in RDP works in strides of 64-bit, and the rasterizer rasterizes at this granularity. Lots of sprite based games seem to use this behavior.

Copy pipe glitches

The fix was to mask the X coordinate in varying stage so that rasterization tests would happen on 64-bit boundaries, as if RDP rasterization isn’t painful enough as is …
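
In spirit, the masking looks something like this Python sketch. The helper name is hypothetical, and it ignores the fact that the real rasterizer works on fixed-point coordinates; it only shows how an X position is rounded down to the start of its 64-bit word for a given pixel size.

```python
def copy_pipe_align_x(x, bits_per_pixel):
    """Round a pixel X coordinate down to the start of its 64-bit word."""
    pixels_per_word = 64 // bits_per_pixel  # 4 pixels for 16-bit color
    return x & ~(pixels_per_word - 1)

print(copy_pipe_align_x(13, 16))  # 12
print(copy_pipe_align_x(13, 8))   # 8
```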

Pilotwings shadows

Pilotwings broken shadows

Why did a classic HLE bug show up here? Well, this is caused by a clever hack in Pilotwings where the shadows are masked out by framebuffer aliasing!

First, the color buffer pointer and depth buffer pointers are assigned to the same location in memory. Depth test is turned on, but depth update is off … But, color writes to depth, so this is a problem. 5/5/5/3 16-bit color data now needs to alias per-pixel with a 3.11/4 depth buffer, and the fix was to implement a special path for the aliasing scenario where depth would be decoded after every color write. Pilotwings did stencil shadows without stencil, clever.

Fortunately, since it’s implemented with compute, this was trivial to implement once the problem was understood.

Interestingly enough, the UI bug in Pilotwings was also solved by this.

“Fixing” async mode

ParaLLEl framebuffer handling code is fairly incomplete (it’s really hard x___x), and async mode was causing several lockups, even in games which did not use the framebuffer for effects. The problem was that async framebuffer readbacks came in too late, and the game had already decided to reuse the existing memory for non-graphics data. Overwriting that data broke everything obviously.

The temporary fix is to maintain a shadow RDRAM buffer in async mode, separate from the regular RDRAM. At least stuff doesn’t crash anymore. The proper fix will be a unified model between full sync and async modes, but this is arguably the hardest part of writing any GPU accelerated plugin for these whacky graphics chips.

Async mode is how to unlock large performance gains. Can’t complain about 120+ FPS on Mario 64 on my toaster rig, used to be ~30 FPS with Angrylion/CXD4.

Corrupt textures in GoldenEye (and possibly other Rare games)

There have been many bugs in the palette part of TMEM emulation, and as expected, this was also a case of this. GoldenEye used TL parameter in load_tlut to do weird offsets from the base texture pointer. This was unimplemented before, so stepping through Angrylion line by line helped figure out how this offset should be implemented.

Weird looking wall textures in Turok


The problem here was RDP’s interesting “detail LOD” feature. This, along with LOD sharpen was unimplemented, and implementing that fixed the issue.

Mario Tennis / Mario Golf weird UI blending bugs

Mario Tennis/Golf are really hard games to emulate and they’re still pretty broken, but some UI bugs were bugging me.

Blending in RDP is very complex and a minefield for rendering bugs, of course there had to be yet another way to do alpha tested sprites.

Instead of alpha blending, or alpha testing directly like any sensible game would do, Camelot decided to use alpha-to-coverage, then color on coverage as a mux for the … blending mux? Coverage overflow would happen when alpha was non-zero, basically a bizarre way to do alpha testing. Funny enough, this was implemented correctly already, but a cute little underflow in coverage update was actually causing the bug. The blender passed its tests all the way to coverage update with a coverage of 0, who would have thought that was possible! That path could only trigger on the very specific render state bits that were set. A one-line fix and two whacky UI bugs were gone.

How to debug this

To drill down issues, first, I dump RDP traces from either paraLLEl RDP or Angrylion. The trace records all RDP commands and updates to RDRAM.

In the offline tool, I can replay the trace and dump all frames. Once I’ve zoomed in on the interesting frame, I trace that frame, primitive by primitive. The end result is a series of images for that frame. I can then replay the frame, and break on the exact primitive I want to debug.


This concludes the first paraLLEl update. Still lots of issues to sort out, with framebuffer management and full VI emulation being the biggest targets to shoot for.

RetroArch 1.3.6+ beta release for PlayStation3!

The PlayStation 3 port is back after it was decommissioned for a long time. Consider this a beta version in anticipation of the upcoming 1.3.7 version which will be further fleshed out.

Also check out our concurrent release for the PS Vita:

RetroArch 1.3.6+ beta released for PS Vita (HENkaku-ready)!

Thanks to PSGL, the PlayStation3 driver can use the XMB menu driver using the OpenGL rendering backend. The simplified ribbon should be running properly in the background too.

Where to get it

Ezi0 graciously provided these two binaries for us.

CEX version: download here.

DEX version: download here.

What doesn’t work yet

This version can be considered a beta release. Here are the current issues:

  • You cannot scan for content as of right now. Instead, for now you should just load content directly from the filesystem.
  • To be able to use zipped ROMs on emulators like SNES9x and other similar emulators, always use ‘Open Archive As Folder’, then select the ROM you want to use. Don’t use ‘Load Archive With Core’ which won’t work for now.
  • If you go to ‘Information’ -> ‘Core Information’, it currently doesn’t show anything. Not a big deal for now but something we will want to fix later on regardless.
  • None of the ‘downloading’ features right now will work in the PS3 port. Our networking stack code for PS3 apparently requires some customizations still. If there are any PS3 devs who can help with this, by all means.

The PS3 version now uses the XMB menu driver, a big step-up from the previous versions’ RGUI menu driver. The font driver we are currently using for PS3 is the default bitmap font, so it doesn’t look as good as it could be, but we are going to be moving over to more fancy font rendering shortly, possibly using stb_font or something similar.

A sneak peek at 1.3.7 features

Since this is a current nightly snapshot release, PS3 and Vita users are able to get a sneak peek at some of the features that will be part of RetroArch 1.3.7 for the other platforms.

  • Improved error handling. When loading the wrong ROM into a core, most cores should be able to now gracefully exit instead of just quitting or crashing RetroArch.
  • Much more complete info message system. Press ‘Select’ on any entry and 99% of the time it should show you a handy help message explaining to you what each setting does. The ‘English’ language setting is currently the one that is most complete, for all other languages we need to wait until translators have finished adding all the help messages in their own language.

RetroArch 1.3.6+ beta release for PS Vita HENkaku!

RetroArch appearing on the PS Vita Live Area homepage. Screenshot was taken on a PS TV.

Today we are releasing a beta version of RetroArch 1.3.6+ (latest snapshot, release candidate for 1.3.7) for the PlayStation3 and PS Vita. Be sure to thank frangarcj for the latter since he went through the trouble of making sure we could make the jump from Rejuvenate to HENkaku in swift order.

Where to get it

You can get the PS Vita/PS TV version here:

Grab the latest archive.

How to install

1. Extract the contents of this 7zip archive to a folder somewhere on your PC. This will extract a bunch of vpk files to your harddrive.
2. On the PS Vita/PS TV, make sure the HENkaku exploit has already been installed. Go to the bubble ‘molecularShell’ and start it. Once inside the filebrowser, press ‘Select’ to start the FTP server. Write down the FTP server address you see here.
3. Go back to your desktop PC, start up an FTP client, and input the IP address and port that was displayed on your PS Vita/PS TV. Transfer the vpk files to some place on either ur0: (internal storage) or ux0: (this being your Memory Card).
4. On the PS Vita/PS TV, press circle to go back. Then once back inside the filebrowser, go to the directory where you extracted the vpk files. Install the cores you want.
5. Exit the ‘molecularShell’ program. You should now be back inside the home screen. From here, RetroArch bubbles should start appearing in the menu. You can now use RetroArch.

How to install ROMs

1. The same way you installed RetroArch. Start ‘molecularShell’, go through the previous section’s steps 2 and 3 again, but this time transfer roms over instead. Then load them from RetroArch.

NXEngine/Cave Story running on RetroArch Vita.

State of the port

What works

* (Core-related) CatSFC / SNES. Works fine. Should run at fullspeed for most games (dip to 42fps on Yoshi’s Island intro screen (SuperFX2 game), seems to be fullspeed otherwise).
* (Core-related) FB Alpha CPS1. Works fine. Fullspeed. Runs all games. Use an older FBA romset.
* (Core-related) FB Alpha CPS2. Works fine. Fullspeed. Runs all the big CPS2 ROMs. Use an older FBA romset.
* (Core-related) FB Alpha Neo Geo. Works fine. Fullspeed. Not all big ROMs will work right now though. KOF 96 could be loaded (about 23/24MB). Might have to play around with heap to be able to get bigger ROMs to load. For now expect same size limitations as the Wii port. Hopefully we can get past this soon.
* (Core-related) FCEUmm. Works fine. Fullspeed.
* (Core-related) Genesis Plus GX. Works fine. Fullspeed.
* (Core-related) Mednafen Neo Geo Pocket Color. Works fine. Framerate at around 52/53fps. There might be a new Neo Geo Pocket Color core coming soon (not Mednafen NGP) which would run at fullspeed with no problems.
* (Core-related) Handy / Lynx. Works fine. Fullspeed.
* (Core-related) Mednafen Wonderswan. Works fine. Fullspeed.
* (Core-related) Mednafen Virtual Boy. Works. Too slow (around 26fps). Corrupted pitch (likely due to 32bit color). Speedups/idle loop optimization hacks MIGHT bring this fullspeed later on.
* (Core-related) NXEngine / Cave Story. Works fine. Fullspeed.

What doesn’t work

* ROMs currently have to be unzipped (EXCEPT for the FB Alpha ROMs).
* No core switching yet from inside RetroArch. For now, each core is a standalone program.
* Add Content -> Download Content currently doesn’t work. If you do try it, you might have to restart.
* History list doesn’t work yet.
* Save states can be saved, but cannot be loaded yet. We need to figure out why.
* Threading still needs to be implemented.
* (Core-related) 2048 core. ‘Start content’ does not work yet like it should. Wait until we fix this.
* (Core-related) Prboom/Doom core. Crashes after loading a Doom WAD. Wait until we fix this.
* (Core-related) Picodrive core. Crashes after loading a ROM. Wait until we fix this.
* (Core-related) QuickNES core. Doesn’t load a ROM. Wait until we fix this.
* (Core-related) Gambatte core. Doesn’t load a ROM. Wait until we fix this.
* (Core-related) SNES9x Next core. Doesn’t load a ROM. Wait until we fix this. Will likely be too slow compared to CatSFC anyway, so not sure if worth it for Vita.
* (Core-related) DOSBox core. Haven’t tried this yet. Likely too slow to be worthwhile.

Future plans

* More cores, fix remaining cores that are broken.
* Try to get Cg runtime working so we can have Cg shaders running.
* Try to get multiple gamepads working on PS TV.
* More.

A sneak peek at 1.3.7 features

Since this is a current nightly snapshot release, PS3 and Vita users are able to get a sneak peek at some of the features that will be part of RetroArch 1.3.7 for the other platforms.

  • Improved error handling. When loading the wrong ROM into a core, most cores should be able to now gracefully exit instead of just quitting or crashing RetroArch.
  • Much more complete info message system. Press ‘Select’ on any entry and 99% of the time it should show you a handy help message explaining to you what each setting does. The ‘English’ language setting is currently the one that is most complete, for all other languages we need to wait until translators have finished adding all the help messages in their own language.

Nintendo 64 Vulkan Low-Level emulator – paraLLEl – pre-alpha release

Vigilante 8 running in ParaLLEl.
Here is a pre-alpha release of the hotly anticipated N64 Vulkan renderer, paraLLEl. To coincide with this, a new RetroArch version has also been released that includes support for the async compute interface that this new renderer requires.

Also see our other major announcements today:

RetroArch 1.3.6 released

Lutro – easy retro game creation, powered by libretro

And our earlier story featured a couple days back on ParaLLEl –

First ever Vulkan Nintendo 64 emulator, ParaLLEl, coming soon, only for Libretro/

What is paraLLEl?

This is a standalone libretro core for now that we keep separate from the regular Mupen64plus libretro core that it is based on. It includes only a Vulkan rendering backend and a low-level RSP. This core will only work right now if you are running it with a Vulkan driver.

In the future, paraLLEl will be the new name for our N64 emulator which (while initially starting out as a Mupen64plus core) has grown into its very own entity. It will have among other things:

– Completely rewritten CPU cores, both interpreter and dynarec. We want to fix the remaining CPU bugs that prevent Mupen64plus from being able to run certain games that PJ64 can, for instance. We also want to be able to move away from having two separate dynarec systems in Mupen64plus, where one is aimed at Atoms and ARM CPUs, and the other is meant for desktop x86, and neither is particularly fast.
– A unified HLE video renderer that combines the best of Glide64, Gliden64, Rice, and GLN64, and offers optional runtime codepaths for performance.
– A unified LLE video renderer infrastructure that allows for both software rendering (Angrylion) and the Vulkan/GL 4.3 powered equivalent (the video plugin we now call paraLLEl).
– As part of the CPU rewrite, an RSP dynarec will also have been written around that time.

How to use this

  • Download RetroArch 1.3.6 (or any future version from this point on). See this blog post here.
  • Download the ParaLLEl core. To do this, start up RetroArch. Inside the menu, go to Online Updater -> Core Updater. Scroll down in the list until you see Nintendo 64 (ParaLLEl).
  • Download it.
  • IMPORTANT! READ! Before starting, make sure that you have selected the Vulkan display driver. To check this, go to Settings -> Driver, and see if ‘Video’ says ‘vulkan’. If not, select it, and then restart RetroArch for the changes to take effect.
  • Load the core with a ROM.

Preliminary requirements

* A GPU capable of supporting the Vulkan API
* Vulkan drivers installed on your system


ParaLLEl has been tested on an nVidia Maxwell GPU and has been confirmed to be running well.


ParaLLEl has been tested on an AMD Radeon 250 and has been confirmed to be running well.


Intel is easily the biggest problem right now out of the Big Three. Thankfully, Mesa developer Jason Ekstrand was very receptive to our feedback and together with him, we have been able to get Vulkan N64 successfully booting now on Ivy Bridge all the way up to Broadwell.

When we first started the renderer, it would crash at startup and not even let us ingame.

Apparently some of the patches that helped get paraLLEl up and running also helped The Talos Principle finally start to go ingame, so that is nice to see!

If you want to test paraLLEl on an Intel iGPU on Linux, be sure to read on until the last paragraphs. You will have to compile a bleeding edge fork of Mesa, we will explain to you how to do this. Do note that in the future, all these patches will have been pushed to upstream, so if you are reading this blog post a month from now, just go with the latest upstream Mesa instead.

At a later date in time it is likely that your Linux distribution’s package management system will feature newer versions of Mesa with these patches already incorporated.

See some of the before/after screenshots here.

RDP emulation status

As this is a pre-alpha release, the full transplantation of Angrylion to compute shaders is not yet completely done, although many games already run reliably.

Hunterk made a couple of videos of RetroArch running ParaLLEl using RetroArch’s built-in recording feature.

Super Mario 64

The quintessential N64 game. Not too many surprises here.

Note – you might have to enable ‘Synchronous RDP’ on Intel Ivy Bridge in order to be able to progress beyond the title screen. This might not be an issue on other hardware.

Mario Kart 64

The popular karting game. Note the framebuffer readback activity on the wall.

Legend of Zelda: Majora’s Mask

The follow-up to the Legend of Zelda: Ocarina of Time.

Note – for both Majora’s Mask and Ocarina of Time, it is required to enable ‘Synchronous Sync’ so that the subscreen works correctly.

Body Harvest

In some ways the spiritual precursor to GTA3, and even made by the same company. Note a couple of things: unlike with any RDP HLE plugin ever, not only can we cross the bridge, but we can do so with the player walking correctly on top of it instead of literally walking through it. High-level RSP emulation has up until now been too buggy for the bridge crossing to work properly in either the US or the European version of Body Harvest. A patch by LegendofDragoon made it work for the European version, but with any HLE plugin you would still see the player character essentially walking through the bridge when passing over it, and in the US version you simply cannot cross the bridge unless you are using a low-level RSP plugin.

So, low-level RDP in conjunction with low-level RSP takes care of all that.

Jet Force Gemini

One of the later games released by Rare before the big Nintendo-Rare breakup. Shares some microcode with Diddy Kong Racing and Mickey’s Speedway USA, so all three games pretty much work without any major issues on paraLLEl.

There are some things to talk about here. There is one core option in paraLLEl in particular which has a heavy bearing on emulation accuracy:

* Synchronous RDP *

Turning this off allows for higher CPU/GPU parallelism. However, we found so far that many games either won’t run or have broken framebuffer effects if left disabled, so we felt it was safer to leave it enabled for now.

It has been verified that with certain games, disabling this can provide a speedup of at least 10fps. Try experimenting with it.

Things you need to know before running the alpha

Killer Instinct Gold running in ParaLLEl. This background would normally be glitched in any other HLE plugin.

Like they say, Rome wasn’t built in a day. This is an alpha release. Keep in mind that not every game will run correctly right now, but we expect to be able to nail these issues in quick succession.

* Even though this renderer delivers on its promises and completely eliminates the RDP rendering bottleneck, the RSP bottleneck still remains. In order to make emulation significantly faster, we will therefore have to start moving away from having only an LLE interpreter RSP core, and move towards a dynamic recompiler. Work on this is commencing, so this bottleneck is just temporary. For now, the beefier your CPU is, the lower your chances of running into said bottlenecks.
* Not all games will work correctly right now. Porting the Angrylion renderer to Vulkan has been a challenge to say the least. There are still some unimplemented edge cases which will be fleshed out in the upcoming weeks.

Major bugs remaining/things left to be done

Bangai-o running in ParaLLEl.
Some of the current issues that still make this a pre-alpha candidate:

  • VI overlay.
  • Some framebuffer management corner cases
  • Some obscure combiner features
  • Interlacing bugs – games that output to the screen in interlaced mode will show some graphics glitches right now. Should be one of the more trivial things to fix.

What’s different between this and Angrylion RDP AIO/CEN64?

Angrylion RDP AIO and CEN64 are using the software-rendered Angrylion plugin (the base for most RDP-based renderers) and optimizing it with SSE code and multithreading. ParaLLEl instead is porting Angrylion to the GPU (through Vulkan) with compute shaders and leveraging async compute to be able to parallelize it as best as possible. CEN64 is also going for an RSP dynarec to speed up things further.

The next step for ParaLLEl will also be an RSP dynarec to increase performance, and in fact MarathonMan (the author of CEN64) has been willing to help us out with some performance ideas in this department.

Both approaches, optimizing the software-rendered Angrylion plugin and hardware acceleration (ParaLLEl), are valid and worthwhile.

Android port

Honestly, for this release I have not yet bothered compiling it for Android and seeing if it will run on the Shield tablet. I will start doing that after this release is over. I think it will become more useful once RSP dynarec is in but it might already be neat to be able to run it regardless.


Legend of Zelda: Ocarina of Time running in ParaLLEl.

The sourcecode for ParaLLEl can be seen here. In order to compile it yourself right now, you need to git clone the mupen64plus libretro repository, and run the following command:


Compiling bleeding-edge Mesa for Intel Vulkan (Linux) in order to use paraLLEl

There are not a lot of Vulkan test cases out right now, so paraLLEl needed some special patches pushed to Mesa in order for it to work.

Here are the steps required:

1. Type these lines into the commandline:

git clone git://

Go to the directory by typing:

cd mesa

git checkout wip/retroarch

autoreconf -vfi

./configure --with-vulkan-drivers=intel --with-gallium-drivers= --with-egl-platforms=wayland,x11 --with-dri-drivers=i965

(Verify that in the output, Vulkan drivers does not say ‘no’).


2. After we have done all this, we will need to set up a file that points to our Vulkan driver. If it does not exist, create it here:


This file should point to the path of the compiled Vulkan driver library, that is, the library you just built.

Here is an example of how it could look. You will have to change the path to wherever the library is located on your system:

{
    "file_format_version": "1.0.0",
    "ICD": {
        "library_path": "/home/squarepusher/libretro-super/mesa/lib/",
        "abi_versions": "1.0.3"
    }
}

You should replace the library path with the exact filename location on your system. Save this file (you might need root permissions for this).
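Since a missing brace or key in this file is easy to overlook, it can help to sanity-check it before launching RetroArch. The following is a minimal sketch in Python, assuming only the three fields shown in the example above; `check_icd_manifest` and the placeholder library path are hypothetical names made up for illustration, not part of Mesa or the Vulkan loader.

```python
import json


def check_icd_manifest(text):
    """Return a list of problems found in a Vulkan ICD manifest string.

    Hypothetical helper: it only checks the fields used in the example
    above, not everything a Vulkan loader may look at.
    """
    try:
        manifest = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err.msg}"]
    if not isinstance(manifest, dict):
        return ["top level must be a JSON object"]

    problems = []
    if "file_format_version" not in manifest:
        problems.append("missing file_format_version")

    icd = manifest.get("ICD")
    if not isinstance(icd, dict):
        problems.append("missing ICD object")
        return problems

    for key in ("library_path", "abi_versions"):
        value = icd.get(key)
        if not isinstance(value, str) or not value:
            problems.append(f"ICD.{key} missing or empty")
    return problems


# Placeholder path: substitute the location of the driver you compiled.
EXAMPLE = """
{
    "file_format_version": "1.0.0",
    "ICD": {
        "library_path": "/path/to/your/compiled/vulkan/driver.so",
        "abi_versions": "1.0.3"
    }
}
"""

if __name__ == "__main__":
    print(check_icd_manifest(EXAMPLE) or "manifest looks well-formed")
```

An empty list means the file parses as JSON and has the expected keys; it does not prove that the library path actually exists on disk.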

What’s next?

Expect major speedups soon and other exciting news regarding dynarec unification plans. Daeken will be guest blogging about that soon, and it will have far-reaching ramifications for libretro cores and emulators in general. Also, expect Beetle PSX to be one of the other major beneficiaries of that.

RetroArch 1.3.6 released

RetroArch keeps moving forward, being the reference frontend for libretro and all. Here comes version 1.3.6, and once again we have a lot to talk about.

Where to get it

Windows/Mac/iOS (build only)/Nintendo/PlayStation – Get it here.

Android: You can either get it from F-Droid or from Google Play Store.

Linux: Since RetroArch is included now on most mainline Linux distributions’ package management repository systems, we expect their versions to be updated to 1.3.6 shortly.

I will release versions for MacOSX PowerPC (10.5 Leopard) and 32-bit Intel MacOS X 10.6 (Snow Leopard) later on, maybe today or tomorrow.

Usability improvements

Windows Drag and Drop support

Courtesy of mudlord, with the Windows version, you can now drag and drop a ROM (or any other content) onto RetroArch’s window, and it will attempt to load the correct core for it. If there is more than one core available for the type of content you dragged and dropped, it will present you with a slidedown list of cores to select from.
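The decision described above (load directly when exactly one core matches, otherwise ask the user) can be sketched roughly as follows. This is only an illustration in Python, not RetroArch's actual code; the extension table and the core names in it are hypothetical.

```python
import os

# Hypothetical table: file extension -> cores that declare support for it.
CORES_BY_EXTENSION = {
    "sfc": ["example_snes_core_a", "example_snes_core_b"],
    "nes": ["example_nes_core"],
}


def cores_for(path):
    """Look up candidate cores by the (case-insensitive) file extension."""
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    return CORES_BY_EXTENSION.get(ext, [])


def on_drop(path):
    """Decide what to do with a file dropped onto the window."""
    candidates = cores_for(path)
    if not candidates:
        return ("unsupported", [])
    if len(candidates) == 1:
        return ("load", candidates)      # only one match: load it directly
    return ("ask_user", candidates)      # several matches: show a list
```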

Vastly improved content downloading features

Starting with v1.3.6, RetroArch users can download compatible freeware content, such as the shareware release of Doom, right from the app. This video goes through the steps, which include fetching the core from the online updater, fetching the content from the repository and then launching the core and content we just downloaded.

Menu customization and aesthetics – XMB and MaterialUI

RetroArch v1.3.6 adds support for a number of themes in the default mobile menu, including both bright and dark themes.

There’s also the ability now to set a custom wallpaper in XMB and be able to colorize it with a color gradient. To do this, you go to Settings -> Menu, you set a wallpaper, and from there you have to set ‘Menu Shader Pipeline’ to OFF. You can then choose from one of the color palettes in ‘Color Theme’ in order to shade the background wallpaper, or just select ‘Plain’ in case you don’t want to colorize it.

Undo Load/Save State

Have you ever gotten through a tough part of a game and wanted to make a savestate only to hit the “load state” button instead and have to do it all over again? Or maybe you were practicing a particularly difficult maneuver–for a speedrun, perhaps–and accidentally saved a bad run over your practice point because you hit “save state” instead of “load state”? While savestates are considered one of the great advantages to emulating retro games, they can also lead to these frustrating situations where they wipe out progress instead of saving it, all because of one slip of the finger. RetroArch now has the ability to undo a save- or load-state action through some automatic state-shuffling that happens behind the scenes, so you never have to worry about these situations again.

Undo Load State – Before the ‘current’ state is altered by e.g. a ‘Load Savestate’ operation, ‘current’ is saved in memory and ‘Undo Load State’ restores it; you can also undo this option by using it again, which will make you flip-flop between 2 states.

Undo Save State – If there was a savestate file that was overwritten, this option restores it.
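To make the state-shuffling a little more concrete, here is a minimal sketch of the idea in Python. It assumes a single savestate slot and treats states as opaque values; the class and attribute names are hypothetical and this is not how RetroArch itself is implemented.

```python
class UndoableStates:
    """Toy model of undoable save/load with a single savestate slot."""

    def __init__(self, current):
        self.current = current       # the live emulator state
        self.slot = None             # stands in for the savestate file
        self.undo_load_buf = None    # state that the last load replaced
        self.undo_save_buf = None    # slot contents the last save overwrote

    def save_state(self):
        self.undo_save_buf = self.slot   # keep what we are about to overwrite
        self.slot = self.current

    def load_state(self):
        if self.slot is None:
            return
        self.undo_load_buf = self.current  # keep 'current' before altering it
        self.current = self.slot

    def undo_load_state(self):
        if self.undo_load_buf is None:
            return
        # Swapping (rather than discarding) is what gives the flip-flop
        # behaviour when the option is used twice in a row.
        self.current, self.undo_load_buf = self.undo_load_buf, self.current

    def undo_save_state(self):
        if self.undo_save_buf is None:
            return
        self.slot = self.undo_save_buf
        self.undo_save_buf = None
```

For example, after saving at "level1", progressing to "level2" and loading by accident, `undo_load_state()` brings back "level2", and calling it again returns to "level1".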

New Features

The main event of RetroArch 1.3.6 is obviously the fact that it makes it possible to run the N64 Vulkan core, paraLLEl. Previous versions of RetroArch will not be able to run this because of the new extensions to libretro Vulkan which we had to push to make this renderer possible.


Async compute core support – ready for ParaLLEl

It was already possible to run Vulkan-enabled libretro cores, but with this release, a few crucial features have been added: support for queue transfers and a context negotiation interface.

With this we can now use multiple queues to overlap compute and shading at the frontend level, i.e. asynchronous compute. ParaLLEl would certainly not have been as fast or as effective were it not for this.

ParaLLEl now joins triple-A games like Rise of the Tomb Raider and Doom in heavily relying on Vulkan’s async compute capabilities for maximum efficiency. A test core was also written as a proof of concept for this interface.

If you want to read more about ParaLLEl, we have a compendium blog post for you to digest here.

Supports Windows, Linux, Android equally well now

The previous version already had Vulkan support to varying degrees, but now we feel we are finally at the point where Vulkan driver support in RetroArch is very much mature across most of the supported platforms.

Vulkan should work now on Android, on Windows, and on Linux, provided your GPU has a working Vulkan driver.

On Linux we now support even more video driver context features, such as VK_KHR_display support. This is a platform-agnostic KMS-like backend for Vulkan, which should allow you to run RetroArch with Vulkan without the need of an X11 or Wayland server running.

On Windows and Android, we include Vulkan support now. Vulkan has been tested on Android with NVIDIA Shield Tablet/Console, and both work. Be aware that there are some minuscule things which might not work correctly yet with Vulkan on Android. For instance, orientation changing still doesn’t work. This will be investigated.

Max swapchain images – driving latency even lower with Vulkan and friends

RetroArch already has built up quite a reputation for itself for being able to drive latency down to very low levels. But with new technologies, there is always room for improvement.

A maximum number of swapchain images can now be set for both Vulkan and the DRM/KMS context driver for OpenGL (usable on Linux). This means you can programmatically ask your video card for triple buffering (3), double buffering (2) or single buffering (1). The previous default with DRM/KMS was 3 (triple buffering), so setting it to 2 could potentially shave off at least 1 frame of latency (as was verified by others). Setting it to 1 often won’t get you single buffering with most monitors and drivers, because of tearing; they will fall back to double buffering (2).

With Vulkan, RetroArch can programmatically tell the video card which buffering method it would like to use, a vast improvement over the nonexistent options that existed before with OpenGL (from a platform-agnostic perspective).
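As a rough back-of-the-envelope illustration of what "1 frame" means in milliseconds, the sketch below assumes a deliberately simple model in which each extra swapchain image adds one refresh interval of latency, and a 60Hz display. Both are assumptions for illustration; real drivers and compositors can behave differently.

```python
def frame_time_ms(refresh_hz=60.0):
    """Duration of one refresh interval in milliseconds."""
    return 1000.0 / refresh_hz


def latency_saved_ms(old_images, new_images, refresh_hz=60.0):
    """Latency saved when lowering the swapchain image count.

    Simplified model: one refresh interval per swapchain image removed.
    """
    return (old_images - new_images) * frame_time_ms(refresh_hz)


if __name__ == "__main__":
    # Triple buffering (3) down to double buffering (2) at 60Hz.
    print(round(latency_saved_ms(3, 2), 2), "ms")
```

Under these assumptions, going from 3 to 2 images at 60Hz saves one refresh interval, a little under 17ms.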

What Vulkan brings to the table on Android

Vulkan has been tested to run on Android devices that support Vulkan, like Shield Tablet/Console. Latency has always been very bad on Android in the past. With Vulkan, frame times are significantly lower than with OpenGL, and we no longer have to leave Threaded Video enabled by default. Instead, we can turn off Threaded Video and let RetroArch monitor the refresh rate dynamically, which is the more desirable solution since it allows for less jittery screen updates.

Audio latency can also be driven down significantly now with Vulkan. The current default is 128ms; with Vulkan we can drive it down to 64 or even 32ms.

Couple this with the aforementioned swapchain images support and there are multiple ways to drive latency down on Android now.
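To put the audio latency figures in perspective, here is a small worked calculation of how many audio frames have to be queued for a given latency. The 48000Hz sample rate is an assumption chosen for illustration; the actual rate depends on the device and the audio driver.

```python
def buffer_frames(latency_ms, sample_rate_hz=48000):
    """Audio frames that must be queued to cover latency_ms of playback."""
    return latency_ms * sample_rate_hz // 1000


if __name__ == "__main__":
    for ms in (128, 64, 32):
        print(ms, "ms ->", buffer_frames(ms), "frames")
```

Halving the latency setting halves the amount of audio that sits in the queue, which is why a faster and steadier video path makes the lower settings practical.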

OpenGL music visualizer (for FFmpeg-enabled builds)

Versions of RetroArch like the Linux and Windows port happen to feature built-in integrated FFmpeg support, which allows you to watch movies and listen to music from within the confines of RetroArch.

We have added a music visualizer now. The scene is drawn as a cylindrical mesh with FFT (Fast Fourier Transform) heightmap lookups. Different colors are shaded using mid/side channels as well as left/right information for height.

Note that this requires at least GLES3 support (which is available as well through an extension which most GPUs should support by now).
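The actual visualizer runs on the GPU, but the data preparation it describes can be sketched on the CPU. The Python below is a rough illustration only: it uses a naive DFT instead of a real FFT for brevity, derives heights from the left/right spectra and a colour pair from the mid/side spectra, and all function names are hypothetical. How the real visualizer combines these values is not documented here, so the `max` and the colour pairing are assumptions.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitude spectrum (first half of the bins) via a naive O(n^2) DFT."""
    n = len(samples)
    mags = []
    for k in range(n // 2):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        mags.append(abs(acc) / n)
    return mags


def mid_side(left, right):
    """Convert left/right channels to mid (sum) and side (difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def visualizer_row(left, right):
    """One heightmap row: per-bin height plus a (mid, side) colour pair."""
    mid, side = mid_side(left, right)
    heights = [max(l, r) for l, r in
               zip(dft_magnitudes(left), dft_magnitudes(right))]
    colors = list(zip(dft_magnitudes(mid), dft_magnitudes(side)))
    return heights, colors
```

Feeding the same tone to both channels produces a single peak in the heights and no side energy, so under this sketch a mono signal would be coloured purely by its mid component.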

Improvements to cores



User leileilol contributed a very cool feature to TyrQuake, Quake 64-style RGB colored lighting, except done in software.

To be able to use this feature, you need to create a subdir in your Quake data directory called ‘maps’, and you need to move ‘.lit’ files to this directory. These are the lighting map files that the Tyrquake core will use in order to determine how light should be positioned.

From there on out, you load up the Tyrquake core, you go to Quick Menu -> Options, you enable Colored Lighting. Restart the core and if your files are placed correctly, you should now see the difference.

Be aware that in order to do this, the game renderer shifts to 24bit color RGB rendering, and this in turn makes things significantly slower, although it should still be fairly playable even at higher resolutions.

View the image gallery here.

To download this, go to ‘Add Content’ -> ‘Download Content’. Go to ‘Tyrquake’, and download ‘’. This should extract this zip to your Downloads dir, and inside the Quake directory. From there, you can just load Quake and the colored lighting maps should be found providing the ‘Colored Lighting’ option has been enabled.

SNES9x emulator input lag reduction

A user on our forum, Brunnis, began some investigations into input latency and found that there were significant gains to be made in Super Nintendo emulators by rescheduling when input polling and video blitting are being performed. Based upon these findings and after some pull requests made to SNES9x, SNES9x Next, and FCEUmm, at least 1 to 2 frames of input lag should be shaved off now.

Do read this highly interesting forum thread that led to these improvements here.

News for iOS 10 beta users

There is now a separate version for iOS 10 users. Apple once again changed a lot of things which makes it even more difficult for us to distribute RetroArch the regular way.

Dynamic library cores can no longer be opened from the Documents directory of the app in iOS 10. They can be opened from the app bundle, as long as they are code-signed. This reverts back to the previous behavior of RetroArch, where the cores need to be in the modules directory of the app bundle.

Go to this directory:

and open RetroArch_iOS10.xcodeproj inside Xcode.

Note – you will need to manually compile the cores, sign them, and drag them over to the modules directory inside Xcode.

Example –

1. You’d download a core with libretro-super.

A quick example (type this inside the commandline)

git clone

./ 2048

./ 2048

This will compile the 2048 core inside /dist/ios.

2. Move the contents of this directory over to the ‘modules’ directory inside the RetroArch iOS 10 Xcode solution. It should presumably handle signing by itself.

Bugfixes/other miscellaneous things

  • Stability/memory leak fixes – We subjected RetroArch to numerous Valgrind/Coverity/Xcode memory leak checks in order to fix a plethora of memory leaks that had reared their ugly heads in between releases. We pretty much eliminated all of them. Not a sexy feature to brag about, but it involved lots of sweat, tears and effort, and the ramifications for the overall stability of the program are considerable.
  • There were some problems with Cg and GLSL shader selections which should now be taken care of.
  • ScummVM games can now be scanned in various ways (courtesy of RobLoach)
  • Downloading multiple updates at once could crash RetroArch – now fixed.
  • Several cores have gotten Retro Achievements support now. The official list of systems that support achievements now is: Mega Drive, Nintendo 64, Super Nintendo, Game Boy, Game Boy Advance, Game Boy Color, NES, PC Engine, Sega CD, Sega 32X, and Sega Master System.
  • You can now turn the supported extensions filter on or off from the file browser.

Efforts to address user experience feedback

I think a couple of things should be addressed first and foremost. First, there is every intent to build something like a WIMP (Windows, Icons, Menus, Pointer) interface around RetroArch. To this end, we are starting to write cross-platform UI widget toolkit code that will make it easy for us to target Qt/GTK/Win32 UI/Cocoa in one fell swoop.

We have also spent a lot of time plugging some of the rough edges around RetroArch and making the user interface more pleasurable to work with.

Youtube libretro channel

Hunterk/hizzlekizzle is going to be running the libretro Youtube channel from now on, and we’ll start putting up quick and direct Youtube videos there on how to be able to use RetroArch. It is our intent that this will do a couple of things:

1. Show people that RetroArch is easy to use and has numerous great features beneath the surface too.
2. It allows users to give constructive criticism and feedback on the UI operations they see and how they think they should be improved.
3. We hope to engage some seasoned C/C++ coders to help us get some of these UI elements done sooner rather than later. RetroArch development mostly relies on a handful of guys – 5 at the most. It is a LOT of hard work for what amounts to a hobbyist project, and if we had a lot more developers seasoned in C/C++, stuff could be done quicker.
4. There is no intention at all to make RetroArch ‘obtuse’ for the sake of it; there is every intention to make it more accessible for people. Additional help would go a very long way towards that.

Regarding the current UIs and their direction, they are obviously meant to provide a console-like UI experience. This might not be what desktop users are used to on their PCs, but it is what we designed menu drivers like XMB to be. It is true that keyboard and mouse are mostly seen as afterthoughts in this UI, but we really wrote the UI with game consoles in mind, where a gamepad is the primary input device at all times, particularly since a keyboard to us is a poor way of playing these console-based games anyway.

Anyway, menu drivers like XMB and MaterialUI will never have any WIMP UI elements. HOWEVER, in upcoming versions, we will be able to flesh out the menubar and to allow for more basic WIMP UI elements.

RetroArch is meant to be a cutting-edge program that is ultra-powerful in terms of features. With that comes a bit of added complexity. However, we have every intent of making things easier, and with every release we put a lot of time and effort into improving things. But again, more developers would help out a substantial lot in speeding up certain parts that we are working on.

Our vision for the project involves an enormous workload and we’re considering different ways of generating additional support. If a Patreon might allow us to get more developers and get more stuff done faster, we might consider it. But we want such things to be carefully deliberated by both our internal development staff and the users at large. I hope you’ll be able to look past the relative rough edges around the program and appreciate the scope and the craft we have poured into it. Please appreciate that we are pouring a lot of blood, sweat and tears into the program and that we mostly try to keep a stiff upper lip when faced with all the criticism, but we do care and we do intend to do better. Volunteer coders who have some time to spare and who want to make a difference are very welcome. We ask for your understanding here, and we hope that by finally speaking out on this, users can gain a better understanding of our intent and be able to appreciate the program better in light of that.

Lutro – easy retro game creation powered by Libretro

We are going to be making Libretro (and RetroArch, by extension) more usable for content creators, and the first part in that endeavor is the official launch of Lutro.

Lutro is an in-development Love 2D reimplementation written in Lua and implemented as a libretro core. With Lutro, it is possible to easily create Lua games with no knowledge of C being necessary, or having to compile any code.

Sample games

To demonstrate the flexibility and power of Lutro, we have assembled a few Lutro-based games which you can freely download from our server. They are purposefully kept simple so that the content creator can use them for their own attempts at creating a game.



A recreation of the game Pong for Lutro.



This is a Love2D-based endless runner game that has been ported to Lutro.


One of kivutar’s first proof of concept demos showing off Lutro. It’s a scrolling 2D platform game with no real game mechanics beyond jumping.


Another 2D platformer showcase example for Lutro, this time illustrating how a Metroidvania-style game could work as a Lutro game. It has several screens which were implemented and some game mechanics including combat, item collecting, jumping, etc.

The Game Of Life

A recreation of Conway’s Game of Life. Press one of the buttons to regenerate the pattern. It can be quite CPU intensive depending on the system and environment you run Lutro on and/or whether or not LuaJIT is available.
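For readers unfamiliar with the rules, here is a minimal sketch of a single generation step. It is written in Python purely for illustration (the Lutro game itself is a Lua program, and this is not its code), and it stores the board as a set of live cell coordinates on an unbounded grid, which is an assumption of this sketch.

```python
from collections import Counter


def step(live):
    """Advance a set of live (x, y) cells by one generation."""
    # Count, for every cell, how many live neighbours it has.
    counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # A cell is alive next turn with exactly 3 neighbours,
    # or with 2 neighbours if it is already alive.
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}


if __name__ == "__main__":
    blinker = {(0, 1), (1, 1), (2, 1)}
    print(sorted(step(blinker)))
```

Every cell is revisited each generation, which is why the cost grows with the size of the live population and why a JIT can make a noticeable difference.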



A recreation of Tetris for Lutro.

How to use the existing Lutro games

Start up RetroArch (version 1.3.6 or later).

Make sure you have downloaded the Lutro core first. To do that, do the following:

Downloading the Lutro core

1. Go to ‘Online Updater’.

2. Go to ‘Core Updater’.

3. Browse through the list and select ‘Lutro’. This will download the core. Once done, exit this screen and go back to the main menu by pressing the back button.

Downloading a game

There are several games we allow you to download from our servers.

1. Go to ‘Add Content’.

2. Go to ‘Download Content’.

3. Select the folder ‘lutro’.

4. Download any of the games. Once you’re done, press the back button or another key to go back to the main menu.

5. Go to ‘Load Content’.

6. Go to ‘Select Downloaded File and Detect Core’.

7. Go to the lutro directory.

8. Select the subdir of the Lutro game you want to play.

9. Select ‘main.lua’, and the game will start up.

Case study example : LutroSpaceship

In order to show off how Lutro could be used to create a good 2D game, kivutar has created a Metroidvania-style platformer game. It’s called ‘LutroSpaceship’. In it, you are thrown into a 2D Metroidvania style world with multiple screens you can explore. You can swing your lightsword to kill enemies. Enemies will drop collectibles that you can pick up. There are several traps you will have to avoid such as a laser beam. The game ends once you have reached a passageway.

As a budding content creator, you can pick up where we left off in this demo and continue the game from there. All it takes is some familiarity with how Love2D-based games work, a text editor and editing the Lua source files. After having edited these files, you can run the game again and immediately sample your changes.

Work in progress

Be aware that Lutro right now is not feature-complete with Love2D.

  • There are several missing API functions that still have to be implemented. View the list here. We will keep this updated as we go along.
  • We are in the process of adding an audio mixer to Lutro to complete some of the remaining missing Love2D functionality.
  • Right now, there is no hardware rendering acceleration, everything is done mainly on the framebuffer. This would not be ideal for games that rely heavily on 3D-based rendering or transformation/scaling but it does have the advantage that the Lutro core/games can run on systems where there is no OpenGL support to begin with.

First ever revolutionary N64 Vulkan emulator coming soon – only for libretro (paraLLEl)


For years, Nintendo 64 emulation has been pretty bad and lagging significantly behind Nintendo Gamecube/Wii emulation. At least 90 to 95% of the remaining problems are at the RDP level, the N64’s video subcomponent chip. By moving away from High-Level Emulation of the RDP, we could solve most of the remaining problems. The problem has been that for a long time, it seemed impossible to do this at playable speeds. Software rendering is too slow for a GPU from this timeframe, and older versions of OpenGL have too many crippling limitations in order to allow for a 1:1 reprogramming and port of Angrylion to GL.

At last, this dire situation will change in the upcoming days and we can finally release to the public something that will revolutionize N64 emulation forever so that we can move away from all of the hacky HLE video plugins that have been released in recent years.

The world’s first-ever low-level N64 video plugin implemented using the Vulkan API!

And not just any video plugin either. This is a reimplementation/port of Angrylion to Vulkan. This will be the first time most will be able to get anywhere close to playable speeds with an accuracy-based N64 video renderer.

This hardware renderer is unique for the following reasons:

  • This is the first N64 emulator project ever so far to receive Vulkan support.
  • This is the first time ever that an emulator takes advantage of asynchronous compute (exclusive to Direct3D 12/Vulkan) for hardware rasterization of an emulated GPU.
  • This is the first time ever that the Angrylion renderer has been ported to a graphics API. It is the first time an RDP LLE video renderer for N64 has been capable of running at fullspeed. It marks a shift away from decades of inaccurate high-level emulation of the N64’s RDP which made for buggy N64 emulation in general.

How to use it?

When it will be released in the upcoming days, this is what you will need in order to use it.

  • You will need the latest RetroArch version (either nightlies or the upcoming 1.3.5 version). The libretro API has been updated to make asynchronous compute cores possible, hence why ‘Mupen64plus HW libretro’ will not work on any older version of RetroArch.
  • Your video card also needs to support the Vulkan graphics API.

When RetroArch 1.3.5 gets released

Download the new RetroArch 1.3.5, go to ‘Online Updater’, go to ‘Core Updater’.

From there, go to ‘Experimental’, and download Mupen64plus HW. This will download the Vulkan-enabled Mupen64plus core.

Before trying to use it, make sure your video card supports the Vulkan API otherwise it won’t work!

Why RDP LLE? Why is this significant?

For years, Nintendo 64 emulators have fixated upon a High-Level Emulation approach to emulate the RDP, the N64’s video rasterizer. Examples include Glide64, Rice, GLN64 (and its recent fork, GlideN64).

It is a practical but imperfect way of emulating the RDP for many reasons:

  • These plugins require numerous game-specific hacks and workarounds. It becomes a real maintenance chore and there are plenty of missing graphical effects to this day. Examples include: missing lens flares in Turok: The Dinosaur Hunter, corrupt backgrounds in Killer Instinct Gold and GoldenEye 007, fiddly auxiliary frame buffer glitches, inaccurate approximations of graphical effects due to combiner issues, etc.
  • Most of these HLE RDP plugins recycle a lot of old code. For instance, Gliden64 is mostly a collage of GLN64 + Glide64 code, but the code recycling goes deeper than that. Low-level triangle rasterization functions in both Glide64 and Gliden64 are borrowed from Z64 GL, an RDP plugin by Ziggy. The problem is that bugs still exist in these sections of the code. Most of the low-level rasterization functions that keep being borrowed in these high-level plugins are directly responsible for many of the remaining glitches you can see. And since the code was written by outside people who are no longer active in the scene, it doesn’t seem likely it is ever going to get fixed.
  • There are other legacy issues. The most notorious one of all is of course Glide64, which originally targeted (you guessed it) the obsolete 3Dfx graphics API Glide. We are talking GL 1.2 / 1.3-ish era here, really stone-age. An OpenGL wrapper for Glide had to be written around Glide64 in order to get it to run with OpenGL-supported video cards in the first place, but the wrapper code unfortunately is far from optimal. Other plugins like Z64 GL still seem to use OpenGL 1.4x-era code and lots of questionable fixed function wrapper code.
  • Many games use custom RSP microcode to do certain game tasks. For instance, Rogue Squadron uses custom RSP microcode for terrain heightmap generation, while games like Resident Evil 2 and Legend of Zelda: Ocarina of Time use the RSP for video and image decompression routines. Usually this would call for a high-level implementation/approximation of what the game would expect to be returned to the RDP, and to also implement corresponding high-level displaylist implementations on the RDP rasterization side. Many games simply have never had their custom microcode properly reverse engineered, so the only way to play these games is to use a combination of a low-level RSP plugin and a low-level RDP renderer. Most of the existing microcode was actually handed to devs on a silver platter and it seems the remaining microcodes will probably never be reversed for this reason.
  • You run into pretty big bottlenecks with traditional GL rendering for which no real solutions exist: frame buffer bottlenecks, depth buffer bottlenecks, etc. More recent versions of OpenGL (4.3+) have made it possible to fix some of the issues, like better depth compare and faster, more efficient framebuffer-to-framebuffer copying, but it’s still honestly a big suboptimal mess.
  • Coverage emulation is usually completely stubbed out in HLE video plugins.
  • All of these plugins have so far completely avoided trying to emulate the VI (Video Interface). The VI basically reads from the RDP’s frame buffer and sends it to the digital-to-analog converter to create the video output. Along the way it applies several postprocessing effects including what appears to be 8x MSAA. I guess some can blame the VI for the ‘smudged’/‘smoothed out’/‘blurry’ look of many N64 games. But hey, we’re going for authentic here :)

Enter this new renderer. It takes Angrylion (the most accurate RDP rasterizer so far) as a base and uses compute shaders to move the workload from the CPU to the GPU. Angrylion has been known to render nearly all games accurately, unlike regular HLE N64 video plugins. The only problem is that it has been too slow to run at full speed, because it is completely software rendered, which puts all the strain on the CPU. RDP LLE turns that around so that this rendering bottleneck is completely gone. With RDP LLE, the only remaining bottleneck will be the interpreter RSP plugin that a low-level RDP plugin has to use.

Work remaining to be done

With this video renderer we have aimed for a GL 4.3 / Vulkan featureset in order to escape most of the bottlenecks and limitations that usually drag N64 emulation down. From now on, there will be two big remaining tasks to be done:

  • We will have to port the code over to OpenGL 4.3+. Lower subsets of OpenGL won’t work as this renderer requires compute shader support.
  • With the RDP bottleneck being completely gone with this renderer, the RSP has now become the main bottleneck. We will have to write a recompiler for the RSP in order to attain even better performance and reduce the RSP bottleneck as much as possible. So far, only Project64 has an RSP recompiler like this, but there are plans to use Daeken’s generic recompiler system in order to come up with something equivalent for Mupen64plus libretro.

Asynchronous compute raymarching libretro test core


In order to make this renderer possible, extensions to the libretro API had to be added.

For educational reasons, and to serve as a proof of concept on how to make your own libretro core that takes advantage of the recently added asynchronous compute capabilities, a test core has been made, called ‘libretro-test-vulkan-async-compute’.

It is a basic test program that demonstrates raymarching being done in Vulkan. We’d very much like to see people improve upon this and collaborate to make a more impressive core out of it.

You can find the source code for this sample test core inside RetroArch’s source code directory tree (specifically, cores/libretro-test-vulkan-async-compute).


It has been a long time coming, but with paraLLEl, N64 emulation can finally become ‘good enough’, and we no longer need patchwork renderer plugins that try to fix graphics issues on a per-game basis.

Components of a Source Engine Libretro Frontend

When combined with a 3D game engine, such as the Source engine, the libretro framework is able to run on any surface in the game.

Replacing the simple game texture shown on an in-game TV screen with a fully functional libretro instance can do anything ranging from playing a full-length video with the ffmpeg core, to running interactive homebrew software, all on the in-game screen.

Adding libretro support to a 3D frontend allows a wide range of media to be loaded using libretro’s ever growing collection of cores. Only a few key components need to be in place to get started. This article outlines what was required to add libretro support to the Source engine for the 3D frontend Anarchy Arcade. The process for other popular game engines is very similar.

The Source engine has three notable road bumps when implementing libretro.

The first is that the Source engine renders with DirectX, while libretro requires OpenGL for 3D acceleration. Without OpenGL, only the cores that support software rendering will be able to function. Generally, this means only 2D cores will work properly, but technically it depends on what the core supports.

The second road bump is audio. Anarchy Arcade uses PortAudio to play the audio streams that libretro provides. However, the Source engine itself may be capable of playing such audio streams more optimally. Whatever the sound solution, it is still simple enough to get the audio streams from libretro and send them wherever they need to go.

The third road bump is frame scaling.  The resolution of the frame that libretro generates will not match the resolution of your texture.  The frame data must be re-scaled in real time before it can be written to the texture.
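As a rough illustration of the re-scaling step, here is a minimal nearest-neighbour sketch. It is not the code Anarchy Arcade uses: the function name ScaleFrameNearest is hypothetical, and it assumes the core delivers 32-bit pixels together with a row pitch in bytes (libretro rows can be wider than width times pixel size).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: nearest-neighbour rescale of a 32-bit-per-pixel frame.
// src_pitch is the size of one source row in bytes (rows may be padded).
std::vector<uint32_t> ScaleFrameNearest(const void* src,
                                        unsigned src_w, unsigned src_h,
                                        std::size_t src_pitch,
                                        unsigned dst_w, unsigned dst_h)
{
    std::vector<uint32_t> dst(static_cast<std::size_t>(dst_w) * dst_h, 0);
    if (!src || src_w == 0 || src_h == 0)
        return dst; // nothing to sample from, return a black frame

    const uint8_t* bytes = static_cast<const uint8_t*>(src);
    for (unsigned y = 0; y < dst_h; ++y) {
        // Map the destination row back to the nearest source row.
        const unsigned sy =
            static_cast<unsigned>(static_cast<uint64_t>(y) * src_h / dst_h);
        const uint8_t* src_row = bytes + static_cast<std::size_t>(sy) * src_pitch;
        for (unsigned x = 0; x < dst_w; ++x) {
            const unsigned sx =
                static_cast<unsigned>(static_cast<uint64_t>(x) * src_w / dst_w);
            uint32_t pixel;
            std::memcpy(&pixel, src_row + static_cast<std::size_t>(sx) * 4, 4);
            dst[static_cast<std::size_t>(y) * dst_w + x] = pixel;
        }
    }
    return dst;
}
```

Nearest-neighbour is the cheapest option and keeps pixel art sharp; a filtered scale would look smoother at a higher per-frame cost.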

After libretro is initialized, a core module must be loaded and its interface built so that it can communicate with your frontend.  The libretro core will be asking your frontend for different variables and options that your frontend must provide answers to.  The core will also be delivering the video buffer and the audio streams to your frontend.

All that your frontend needs to do is display what libretro gives it.  Your frontend can also forward keyboard & gamepad input to the libretro core so the user can interact with it.  This communication all takes place during the main loop.
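The exchange described above happens through callbacks that the frontend registers with the core. The sketch below shows what such callbacks might look like. The callback signatures and the two environment command numbers mirror libretro.h, but they are redeclared here only so the example compiles on its own (a real frontend would include libretro.h instead). FrontendState and the callback names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Values mirrored from libretro.h so the sketch compiles standalone.
constexpr unsigned kEnvGetCanDupe     = 3;   // RETRO_ENVIRONMENT_GET_CAN_DUPE
constexpr unsigned kEnvSetPixelFormat = 10;  // RETRO_ENVIRONMENT_SET_PIXEL_FORMAT

// Hypothetical container for everything the core hands to the frontend.
struct FrontendState {
    int pixel_format = 0;            // 0 = 0RGB1555, 1 = XRGB8888, 2 = RGB565
    unsigned width = 0, height = 0;
    std::size_t pitch = 0;           // bytes per row of the last frame
    std::vector<uint8_t> frame;      // copy of the last video frame
    std::vector<int16_t> audio;      // interleaved stereo samples, not yet played
};
static FrontendState g_state;

// The core asks questions and announces settings through this callback.
bool EnvironmentCallback(unsigned cmd, void* data)
{
    switch (cmd) {
    case kEnvSetPixelFormat:
        g_state.pixel_format = *static_cast<const int*>(data);
        return true;
    case kEnvGetCanDupe:
        *static_cast<bool*>(data) = true; // we accept "same frame as before"
        return true;
    default:
        return false; // unknown or unsupported request
    }
}

// Called by the core once per frame with the rendered image.
void VideoRefreshCallback(const void* data, unsigned width, unsigned height,
                          std::size_t pitch)
{
    if (!data)
        return; // duplicated frame: keep showing the previous one
    const uint8_t* bytes = static_cast<const uint8_t*>(data);
    g_state.width = width;
    g_state.height = height;
    g_state.pitch = pitch;
    g_state.frame.assign(bytes, bytes + pitch * height);
}

// Called by the core with a batch of stereo frames (two samples per frame).
std::size_t AudioSampleBatchCallback(const int16_t* data, std::size_t frames)
{
    g_state.audio.insert(g_state.audio.end(), data, data + frames * 2);
    return frames; // number of frames consumed
}
```

With a real core, these would be registered through retro_set_environment, retro_set_video_refresh and retro_set_audio_sample_batch before the core is initialized and content is loaded.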

Source Engine Main Loop

After libretro is initialized, its retro_run method needs to be called as often as possible, without interfering with normal engine operations. A good place to plug your code into the Source engine for this is CAutoGameSystemPerFrame::Update.
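Each retro_run call advances the core by one frame, so a frontend may want to pace those calls against the engine's own frame time. The following is one possible sketch of that idea, not something the original integration is known to do: CorePacer, its catch-up limit and the run_core callable are all hypothetical. In a Source engine build, Update would be driven from the per-frame hook mentioned above and run_core would simply call retro_run.

```cpp
#include <functional>
#include <utility>

// Hypothetical pacer: runs the core at its own frame rate regardless of
// how fast or slow the engine is rendering.
class CorePacer {
public:
    CorePacer(double core_fps, std::function<void()> run_core)
        : step_(1.0 / core_fps), run_core_(std::move(run_core)) {}

    // Call once per engine frame with the elapsed time in seconds.
    // Returns how many core frames were run during this call.
    int Update(double frametime)
    {
        accumulated_ += frametime;
        int ran = 0;
        while (accumulated_ >= step_ && ran < kMaxCatchUp) {
            run_core_();            // e.g. retro_run() in a real frontend
            accumulated_ -= step_;
            ++ran;
        }
        // After a long stall, drop the backlog instead of fast-forwarding.
        if (accumulated_ >= step_)
            accumulated_ = 0.0;
        return ran;
    }

private:
    static constexpr int kMaxCatchUp = 4; // assumed limit, tune to taste
    double step_;
    double accumulated_ = 0.0;
    std::function<void()> run_core_;
};
```

The catch-up limit keeps a hitch in the engine from turning into a burst of core frames that would stall the engine even further.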

With a core loaded and retro_run being called from your main loop, libretro is fully functioning.  Now the frontend just needs to display what it is given, and forward user input to the libretro core.

Source Engine Texture Access

Cores that have software rendering (such as most 2D cores) will basically require a memcopy of the frame that libretro gives your frontend. The Source engine has a special kind of texture called a procedural texture that allows you to write directly to its pixels, or do memcopies onto it. It works by plugging into the logic of the ITextureRegenerator::RegenerateTextureBits method to do a memcopy of the libretro frame buffer. This effectively draws whatever libretro is rendering onto the in-game texture.
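In practice the copy often cannot be a single memcpy, because the core's row pitch and pixel format may differ from the texture's. The sketch below assumes a core that outputs RGB565 and a destination texture stored as 8-bit B, G, R, A bytes; both are assumptions for illustration, and CopyRgb565ToBgra8888 is a hypothetical helper, not part of the Source SDK or libretro. Something like it could be called from inside RegenerateTextureBits with a pointer to the texture's pixel data.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copy an RGB565 frame into a BGRA8888 texture,
// honouring separate row pitches (in bytes) for source and destination.
void CopyRgb565ToBgra8888(const void* src, unsigned width, unsigned height,
                          std::size_t src_pitch,
                          uint8_t* dst, std::size_t dst_pitch)
{
    const uint8_t* src_bytes = static_cast<const uint8_t*>(src);
    for (unsigned y = 0; y < height; ++y) {
        const uint8_t* src_row = src_bytes + static_cast<std::size_t>(y) * src_pitch;
        uint8_t* dst_row = dst + static_cast<std::size_t>(y) * dst_pitch;
        for (unsigned x = 0; x < width; ++x) {
            uint16_t px;
            std::memcpy(&px, src_row + static_cast<std::size_t>(x) * 2, 2);
            const unsigned r5 = (px >> 11) & 0x1F;
            const unsigned g6 = (px >> 5) & 0x3F;
            const unsigned b5 = px & 0x1F;
            // Expand to 8 bits by replicating the high bits into the low bits,
            // so that full intensity maps to 255 rather than 248 or 252.
            dst_row[x * 4 + 0] = static_cast<uint8_t>((b5 << 3) | (b5 >> 2));
            dst_row[x * 4 + 1] = static_cast<uint8_t>((g6 << 2) | (g6 >> 4));
            dst_row[x * 4 + 2] = static_cast<uint8_t>((r5 << 3) | (r5 >> 2));
            dst_row[x * 4 + 3] = 0xFF; // opaque
        }
    }
}
```

If the core already outputs a 32-bit format that matches the texture, the inner loop collapses to one memcpy per row.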

Source Engine Input Capture

Finally, your frontend needs to send button state info to the libretro core so that the user can interact with it.  In the Source engine, button states can be determined whenever needed by using the vgui::input()->IsKeyDown method.
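On the libretro side, the core pulls button states through an input-state callback. Here is a minimal sketch of how keyboard keys might be mapped onto a libretro joypad. The device and button ID values mirror libretro.h; the key codes, the mapping table and g_is_key_down are hypothetical stand-ins. In a Source engine build the stand-in would wrap vgui::input()->IsKeyDown and the table would hold the engine's own key codes.

```cpp
#include <cstdint>
#include <functional>

constexpr unsigned kDeviceJoypad = 1; // RETRO_DEVICE_JOYPAD
constexpr unsigned kJoypadIdCount = 12;

// Hypothetical stand-in for the engine's key query.
static std::function<bool(int)> g_is_key_down = [](int) { return false; };

// Hypothetical key bindings, indexed by RETRO_DEVICE_ID_JOYPAD_* value:
// B, Y, SELECT, START, UP, DOWN, LEFT, RIGHT, A, X, L, R.
static const int kKeyForJoypadId[kJoypadIdCount] = {
    'Z', 'A', 'Q', 'W', 'I', 'K', 'J', 'L', 'X', 'S', 'D', 'C'
};

// The core calls this for each button it wants to read.
int16_t InputStateCallback(unsigned port, unsigned device,
                           unsigned index, unsigned id)
{
    (void)index; // only meaningful for analog devices
    if (port != 0 || device != kDeviceJoypad || id >= kJoypadIdCount)
        return 0; // this sketch only handles player one's joypad
    return g_is_key_down(kKeyForJoypadId[id]) ? 1 : 0;
}

// The core calls this once per frame before reading input. Nothing to do
// here because key states are queried on demand.
void InputPollCallback() {}
```

A real frontend would register these with retro_set_input_state and retro_set_input_poll, and would likely let the user rebind the table.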

Libretro Frontend on Source Engine w/ VR Mode

After all of these components are implemented in your frontend, you can use your interactive texture on however many surfaces you want. There is no additional performance impact for using it on more than one surface. You can also optimize your implementation so that memcopies only occur when they need to, by assigning a CEntityMaterialProxy to the material that references your procedural texture. With these three simple components, you are able to run many of the available libretro cores, or your own homebrew software, on the in-game screens of your Source engine 3D frontend.