Libretro Cores Updates – PCSX2 Alpha release for Windows and more!

PCSX2 Alpha core available for Windows now!

Today we’re releasing an alpha version of the PCSX2 libretro core on the buildbot. It’s available for Windows only right now, but the same core has been tested to work on Xbox One/Series systems as well.

Important things to know

  • Create the following directory in your system directory – “pcsx2”. You can put all the asset files there from your regular PCSX2 install. See pic below for an example.

    You need a working BIOS inside /pcsx2/bios. PCSX2, unlike Play!, will not work without a real BIOS.

  • It’s still an alpha version. Things are rough around the edges. Expect bugs and things to be incomplete.
  • There’s a working OpenGL renderer and a Direct3D11 renderer option. Direct3D 11 renderer can be faster than OpenGL but also has less features. Pick whichever works best for you. On Xbox you will only be able to use Direct3D11 anyways.
  • This core uses the x86_64 dynarec which was added to PCSX2 a year ago. It is still less compatible than the 32bit x86 dynarec in PCSX2, so keep that in mind. It’s for similar reasons that the software renderer right now won’t work (it’s not compatible yet with x86_64, not in upstream either).
  • There’s a bug that can happen right now upon closing content or exiting RetroArch with the PCSX2 core on Windows – the RetroArch process might not completely cleanly shut itself off and you might still be able to see a 0% CPU process remaining in the Task Manager. We have not been able to figure out how to fix that yet as the PCSX2 codebase is a definite case of ‘here be dragons’, but for now when this happens, you can just bring up the Task Manager and close it manually. It shouldn’t have a real detriment on performance but it is of course far from ideal and hopefully something we can fix soon with the help of some contributors. We have found this happens the most with the Direct3D 11 renderers.
  • Switching resolution at runtime right now can be a bit unstable, so does switching fullscreen resolution. We might just make resolution switching require a restart since this tends to be too unstable for now.

Update policy for this core
Hard-forked core for now. Govanify is going through many (necessary) refactors in PCSX2 upstream to make the code more portable, and he has also expressed his interest in an upstreamable libretro core somewhere down the line when the final refactoring of the GUI is complete. So either of two things can happen when that happens, if this is more closely aligned to upstream core is better in every way including performance, this will be replaced. If not, we will likely have two cores, one being the upstream-friendly core and this being the hard-forked one. As of this moment a lot of work remains to be done on PCSX2 to sort out all the internals that are chockful of nonportable code. Therefore, for now, the experimental PCSX2 core kinda is doing its own thing. PCSX2 still has a lot of inherently nonportable code in it, from WxWidgets to libglib. It’s for that reason that we don’t have a Linux core yet on our buildbot. We hope that we will be able to figure out a proper portable core for Linux users soon.

Expect a lot of Quality of Life enhancements to this core soon. We wanted to share this with you now rather than sit on it for even longer, now that it’s on the buildbot we can at least push regular updates to it, people can report issues and developers can contribute. Everyone wins.

Play! (experimental PlayStation2 emulator) is back on the buildbot!

The Play! libretro core is back on our buildbot! It took some time for this to be readded to our modern new buildbot but here it is!

Available right now for: Android (AArch64/ARMv7/x86), macOS (Intel), Linux (32bit/64bit), and Windows (32bit/64bit).

Update policy for this core
Upstream. Updates are pulled straight from the upstream repository.

DuckStation/SwanStation core updated

The DuckStation/SwanStation core has been updated to a build from a week ago.

Update policy for this core
This will remain a shallow fork and attempts are being made to make the surface area for patches small so that we can easily pull in updates. Some new contributors have jumped onboard and they want to ensure this core remains updated.

Mainline MAME available for iOS11 and up/tvOS !

The mainline MAME core is now available for iOS and tvOS users! It is also available for Mac Intel x64 users, and we hope that we can make it available for ARM Mac users soon as well.

Also important to note we will be updating to version 0.330 soon.

Update policy for this core
This is pretty much a shallow fork. It just attempts to pull in the latest changes from upstream without making many changes.

gpSP (Game Boy Advance emulator)

Many changes have been made under the hood to significantly increase performance of gpSP. One of the big changes that led to at least a 5-6% performance improvement was the removal of libco (a library used for cooperative threading). 3DS users especially should be in for a treat with gpSP, but realistically everyone benefits across the board, whether you’re on an ARM or x86 system. We can quite comfortably state you will be hard pressed to find a better and more well performing version of this emulator anywhere else right now.

* Support for generic MIPS (including Dingux)
* Built-in BIOS that ships internally and requires no external files/dependencies
* Removal of libco (5-6% perf or so) and some performance fixes for x86/arm/mips (for perf)
* Fixed the x86 dynarec and the ARM dynarec, that were pretty much broken. Good speedups (or power savings on portable devices)

Immediate roadmap for gpSP: PS2 support (coming soon (TM)), rumble and tilt sensor support, AARch64 dynarec (mid-term goal)

One caveat right now is that these changes have unfortunately caused it to no longer work on PS Vita. However, the plan is to fix this soon, and for this to be only a temporary thing.

Update policy for this core
Hard fork, self-maintained.

Picodrive (Sega Mega Drive/Master System/Sega CD/32X emulator)

This has received many speed improvements and enhancements courtesy of irixx, 32X support has seen plenty of improvements, and the results speak for themselves. Works wonderfully on OpenDingux-based devices and PS Vita, some of the lower end systems out there.

A separate blog article might be written about all the changes soon.

Update policy for this core

Hard fork, self-maintained

FCEUmm (Nintendo Entertainment System emulator)

Some important new updates from New Rising Sun that adds more mapper support (PR notes are his):

Add features to mapper 332/BMC-WS:

  • implement additional outer bank bit to support multicarts twice as large as WS-1001 (“Super 40-in-1”)
  • implement CNROM-128/-256 mode using two or four inner banks switched via CNROM-like latch at CPU $8000-$FFFF
  • implement solder pad/DIP switch change via soft-reset, resulting in different multicart menus
  • soft-resetting resets to menu as on real hardware, not just the currently-selected game.
    Tested with the HH-xxx series of multicarts, some of which available on LIBG, as well as the previously-supported WS-1001.
  • I wanted to add a couple of multicart mappers that use the J.Y. Company ASIC and fix a few bugs while doing so. I ended up rewriting the whole thing by porting the code from NintendulatorNRS. This improves the modularity of the code by wrapping the PRG and CHR sync functions with mapper-specific ANDs and ORs, which is necessary for multicart use, and hopefully improves the readability (though I understand that’s a matter of taste). Previous save states had to be invalidated in any case since the previous code only saved four of the eight CHR LSB registers.

    Source organization changes:

    Combine duplicate code from 90.c, bmc13in1jy110.c and sc-127.c into one new file jyasic.c.
    Features/corrections:

    Window text not shown at the beginning of Tiny Toon Adventures 6 (“Baabs is dreaming about becoming an actress”).
    Bad cursor sprite on “Mighty Morphin’ Power Rangers III”‘s title screen.
    Some mapper 295 multicarts not working, such as SC-126.
    Added Adder to ALU (not used by any game).
    Add IRQ mode 3 (writes to CPU address space).
    Save all eight CHR LSB registers (previous code only saved the first four).
    Ignore writes to x800-xFFF except 5800 and C800, needed for Final Fight 3 on SC-128.
    Allow DIP setting to be read from all possible locations: 5000, 5400, 5C00, necessary for a few multicarts.
    Adds the following mappers: 282, 358, 386, 387, 388, 397, 421.

    Remaining issues:

    IRQ timing is not as perfect as the NintendulatorNRS code from which it was taken, which had been verified against real hardware. This is due to there not being a true “PPU Read Handler” in the mapper interface (as far as I can see), and PA12 timing in the core PPU emulation being not accurate enough to replace the “clock IRQ eight times per horizontal blanking” solution that I kept from the previous code.
    There remains one J.Y. ASIC-using mapper: 394, which mounts both the J.Y. ASIC and an MMC3 clone and uses extra register bits to switch between the two. That is a bit difficult to code and will be for another day.
    Successfully tested with:
    Mapper 35:
    Warioland II (JY039)

    Mapper 281:
    (晶太 JY-052) 1996 Super HiK 4-in-1 – 新系列阿拉丁雙Ⅲ组合卡
    (晶太 JY-052) 1997 Super HiK 4-in-1 – 新系列叢林泰山組合卡
    (晶太 JY-053) 1996 Super HiK 4-in-1 – 新系列獅子王超强组合卡
    (晶太 JY-054) 1996 Super HiK 4-in-1 – 新系列快打旋風Ⅱ组合卡
    (晶太 JY-054) 1997 Super HiK 4-in-1 – 超級新系列大金鋼4代組合卡
    (晶太 JY-055) 1996 Super HiK 4-in-1 – 新系列阿拉丁Ⅲ.致命武器組合卡
    (晶太 JY-066) 1996 Power Rangers HiK 4-in-1 – 新系列金鋼戰士專輯組合卡
    (晶太 JY-066) 1997 Power Rangers HiK 4-in-1 – 新系列金鋼戰士專輯組合卡
    (晶太 JY-068) 1996 Super HiK 3-in-1 – 新系列阿拉丁Ⅲ.激龜Ⅲ組合強卡
    (晶太 JY-080) 1996 Super HiK 3-in-1 – 新系列中國兔寶寶全輯組合卡
    (晶太 JY-088) 1996 Super HiK 5-in-1

    Mapper 282:
    (晶太 JY-062) 1996 Super Mortal Kombat Ⅲ Series – 新系列真人快打三代組合卡 18-in-1
    (晶太 JY-064) 1996 新超強 18-in-1 阿拉丁組合系列卡 – Super Aladdin Ⅲ Series Card
    (晶太 JY-069) 1996 Super HiK 4-in-1 – 新系列忍Ⅲ.沼澤怪獸組合卡
    (晶太 JY-070) 1996 Super HiK 3-in-1 – 新系列眞人Ⅲ.兔寶寶組合強卡
    (晶太 JY-071) 1996 Super HiK 3-in-1 – 新系列眞人Ⅲ.小新2.明王組合卡
    (晶太 JY-079) 1996 Super HiK 3-in-1 – 新系列眞人Ⅲ.金鋼4.蜜蜂組合卡
    (晶太 JY-084) 1996 Photo Gun 9-in-1
    (晶太 JY-098) 1997 Super HiK 6-in-1
    (晶太 JY-101) 1997 Super HiK 18-in-1
    (晶太 JY-105) 1997 Super HiK 21-in-1
    (晶太 JY-114) 1998 Super HiK 5-in-1
    (晶太 SC-128) Super 25-in-1 Final Fight
    (晶太 SC-130) Super Photo-Gun 13-in-1

    Mapper 295:
    (晶太 JY-010) Super Ball Series 18-in-1
    (晶太 JY-014B) 1996 Soccer 7-in-1 – 足球專輯 (rev0)
    (晶太 JY-014B) 1996 Soccer 7-in-1 – 足球專輯 (rev1)
    (晶太 JY-050) 1997 Super HiK 8-in-1 (rev1)
    (晶太 JY-095) 1997 Super HiK 4-in-1
    (晶太 JY-096) 1997 Super HiK 7-in-1
    (晶太 JY-097) 1997 Super HiK 8-in-1
    (晶太 JY-099) 1997 Super HiK 4-in-1
    (晶太 JY-100) 1997 Super HiK 5-in-1
    (晶太 JY-109) 1997 Super 9-in-1
    (晶太 JY-110) 1997 Super 13-in-1
    (晶太 SC-126) 方塊專集 HiK Block 14-in-1

    Mappers 90/209/211:
    (晶太 CK-124) Super HiK 8-in-1
    (晶太 JY-118) 1998 Super 3-in-1
    (晶太 JY-120A) 1998 Super 45-in-1
    (晶太 JY-122) 115 超強合卡
    Aladdin – 阿拉丁
    Aladdin III, Popeye II꞉ Travels in Persia
    Contra Spirits (1995)
    Donkey Kong Country 4
    Final Fight 3
    Mickey Mania 7
    Mighty Morphin’ Power Rangers III
    Mighty Morphin’ Power Rangers IV꞉ The Movie
    Mike Tyson’s Punch-Out!! (JY021)
    MK3 – Special 56 Peoples
    Mortal Kombat 2-in-1
    Mortal Kombat II Special
    Super Aladdin꞉ The Return of Jafar
    Super Mario & Sonik 2 (rev1)
    Super Mario World
    Super Mortal Kombat 2-in-1
    Tiny Toon Adventures 6
    中國兔寶寶 Rabbit
    真 Samurai Spirits 2꞉ 覇王丸地獄変
    鉄拳 – Tekken 2

    mappers 358/386/387/388/397/421:
    (晶太 JY-016) 1997 Super Game 7-in-1
    (晶太 JY-056) 1996 Super HiK 4-in-1 – 新系列真人快打Ⅱ特別版 (rev0)
    (晶太 JY-056) 1996 Super HiK 5-in-1 – 新系列眞人快打Ⅲ56人特別版 (rev1)
    (晶太 JY-082) 1996 Soccer 6-in-1
    (晶太 JY-087) 1996 Super HiK 4-in-1
    (晶太 JY-089) 1996 Super HiK 4-in-1
    (晶太 JY-090) 1996 Super HiK 5-in-1
    (晶太 JY-093) 1996 Super HiK 4-in-1
    (晶太 JY-094) 1996 Super HiK 4-in-1
    (晶太 JY-113) 1998 Super HiK 5-in-1
    (晶太 JY-117) 1998 Super HiK 6-in-1
    (晶太 SC-129) 98 街頭快打格鬥 15-in-1

    bSNES HD Beta/bsnes mainline available for iOS/tvOS/ARM64 Mac

    Widescreen SNES emulation on your iDevice and ARM Mac is now available!

    Update policy for this core
    Upstream, gets built straight from DerKoun’s upstream repository.

    Genesis Plus GX Wide available for iOS/tvoS/ARM64 Mac

    Another welcome addition to the Apple fold – the widescreen-enhanced version of Genesis Plus GX! For more on that, read this article here.

    Update policy for this core
    Hard fork, as it makes many sweeping changes to Genesis Plus GX to achieve widescreen support, which would make resyncing with upstream very hard to do.

    Other cores which have received updates

    Here is a list of other cores that have received updates, but for which we cannot post any changelogs due to lack of time. We might go into some more of the changes here later on.

    Beetle PSX
    FB Neo
    MAME 2003 Plus
    VICE
    P-UAE (Amiga emulator)

paraLLEl N64 – Low-level RDP upscaling is finally here!

ParaLLEl RDP this year has singlehandedly caused a breakthrough in N64 emulation. For the first time, the very CPU-intensive accurate Angrylion renderer was lifted from CPU to GPU thanks to the powerful low-level graphics API Vulkan. This combined with a dynarec-powered RSP plugin has made low-level N64 emulation finally possible for the masses at great speeds on modest hardware configurations.

ParaLLEl RDP Upscaling

Jet Force Gemini running with 2x internal upscale
Jet Force Gemini running with 2x internal upscale

It quickly became apparent after launching ParaLLEl RDP that users have grown accustomed to seeing upscaled N64 graphics over the past 20 years. So something rendering at native resolution, while obviously accurate, bit-exact and all, was seen as unpalatable to them. Many users indicated over the past few weeks that upscaling was desired.

Well, now it’s here. ParaLLEl RDP is the world’s first Low-Level RDP renderer capable of upscaling. The graphics output you get is unlike any HLE renderer you’ve ever seen before for the past twenty years, since unlike them, there is full VI emulation (including dithering, divot filtering, and basic edge anti-aliasing). You can upscale in integer steps of the base resolution. When you set resolution upscaling to 2x, you are multiplying the input resolution by 2x. So 256×224 would become 512×448, 4x would be 1024×896, and 8x would be 2048×1792.

Now, here comes the good stuff with LLE RDP emulation. As said before, unlike so many HLE renderers, ParaLLEl RDP fully emulates the RCP’s VI Interface. As part of this interface’s postprocessing routines, it automatically applies an approximation of 8x MSAA (Multi-Sampled Anti-Aliasing) to the image. This means that even though our internal resolution might be 1024×896, this will then be further smoothed out by this aggressive AA postprocessing step.

Super Mario 64 running on ParaLLEl RDP with 2x internal upscale
Super Mario 64 running on ParaLLEl RDP with 2x internal upscale

This results in even games that run at just 2x native resolution looking significantly better than the same resolution running on an HLE RDP renderer. Look for instance at this Mario 64 screenshot here with the game running at 2x internal upscale (512×448).

How to install and set it up

RDP upscaling is available right now on Windows, Linux, and Android. We make no guarantees as to what kind of performance you can expect across these platforms, this is all contingent on your GPU’s Vulkan drivers and its compute power.

Anyway, here is how you can get it.

  • In RetroArch, go to Online Updater.
  • (If you have paraLLEl N64 already installed) – Select ‘Update Installed Cores’. This will update all the cores that you already installed.
  • (If you don’t have paraLLEl N64 installed already) – go to ‘Core Updater’ (older versions of RA) or ‘Core Downloader’ (newer version of RA), and select ‘Nintendo – Nintendo 64 (paraLLEl N64)’.
  • Now start up a game with this core.
  • Go to the Quick Menu and go to ‘Options’. Scroll down the list until you reach ‘GFX Plugin’. Set this to ‘parallel’. Set ‘RSP plugin’ to ‘parallel’ as well.
  • For the changes to take effect, we now need to restart the core. You can either close the game or quit RetroArch and start the game up again.

In order to upscale, you need to first set the Upscaling factor. By default, it is set to 1x (native resolution). Setting it to 2x/4x/8x then restarting the core makes the upscaling take effect.

Explanation of core options

A few new core option features have been added. We’ll briefly explain what they do and how you can go about using them.

  • (ParaLLEl-RDP) Upscaling factor (Restart)

Available options: 1x, 2x, 4x, 8x

The upscaling factor for the internal resolution. 1x is default and is the native resolution. 2x, 4x, and 8x are all possible. NOTE: It bears noting that 8x requires at least 5GB/6GB VRAM on your GPU. System requirements are steep for 8x and we generally don’t recommend anything less than a 1080 Ti or better for this. Your mileage may vary, just be forewarned. 2x and 4x by comparison are much lighter. Even when upscaling, the rendering is still rendering at full accuracy, and it is still all software rendered on the GPU. 4x upscale means 16x times the work that 1x Angrylion would churn through.

  • (paraLLEl-RDP) Downsampling

Available options: Disabled, 1/2, 1/4, 1/8

Also known as SSAA, this works pretty similar to the SSAA downscaling feature in Beetle PSX HW’s Vulkan renderer. The idea is that you internally upscale at a higher resolution, then set this option from ‘Disabled’ to any of the other values. What happens from there is that this internal higher resolution image is then downscaled to either half its size, one quarter of its size, or one eight of its size. This gives you a very smoothed out anti-aliased picture that for all intents and purposes still outputs at 240p/240i. From there, you can apply some frontend shaders on top to create a very nice and compelling look that still looks better than native resolution but is also still very faithful to it.

So, if you would want 4x resolution upscaling with 4x SSAA, you’d set ‘Downsample’ to ‘1/2’. With 4x upscale, and 1/4 downsample, you get 240p output with 16x SSAA, which looks great with CRT shaders.

  • (paraLLEl-RDP) Use native texture LOD when upscaling

This option is disabled by default.

We have so far only found one game that absolutely required this to be turned on for gameplay purposes. If you don’t have this enabled, the Princess-to-Bowser painting transition in Mario 64 is not there and instead you just see Bowser in the portrait from a far distance. There might be further improvements later to attempt to automatically detect these cases.

Most N64 games didn’t use mipmapping, but the ones that do on average benefit from this setting being off – you get higher quality LOD textures instead of a lower-quality LOD texture eventually making way for a more detailed one as you look closer. However, turning this option on could also be desirable depending on whether you favor accurate looking graphics or a facsimile of how things used to look.

  • (paraLLEl-RDP) Use native resolution for TEX_RECT

This option is on by default.

2D elements such as sprites are usually rendered with TEX_RECT commands, and trying to upscale them inevitably leads to ugly “seams” in the picture. This option forces native resolution rendering for such sprites.

 

Managing expectations

It’s important that people understand what the focus of this renderer is. There is no intent to have yet another enhancement-focused renderer here. This is the closest there has ever been to date of a full software rendered reimplementation of Angrylion on the GPU with additional niceties like upscaling. The renderer guarantees bit-exactness, what you see is what you would get on a real N64, no exceptions.

With a HLE renderer, the scene is rendered using either OpenGL or Vulkan rasterization rules. Here, neither is done – the exact rasterization steps of the RDP are followed instead, there are no API calls to GL to draw triangles here or there. So how is this done? Through compute shaders. It’s been established that you cannot correctly emulate the RDP’s rasterization rules by just simply mapping it to OpenGL. This is why previous attempts like z64gl fell flat after an initial promising start.

So the value proposition here for upscaling with ParaLLEl RDP is quite compelling – you get upscaling with the most accurate renderer this side of Angrylion. It runs well thanks to Vulkan, you can upscale all the way to 8x (which is an insane workload for a GPU done this way). And purists get the added satisfaction of seeing for the first time upscaled N64 graphics using the N64’s entire postprocessing pipeline finally in action courtesy of the VI Interface. You get nice dither filtering that smooths out really well at higher resolutions and can really fake the illusion of higher bit depth. HLE renderers have a lot of trouble with the kind of depth cuing and dithering being applied on the geometry, but ParaLLEl RDP does this effortlessly. This causes the upscaled graphics to look less sterile, whereas with traditional GL/Vulkan rasterization, you’d just see the same repeated textures everywhere with the same basic opacity everywhere. Here, we get dithering and divot filtering creating additional noise to the image leading to an overall richer picture.

So basically, the aim here is actually emulating the RDP and RSP. The focus is not on getting the majority of commercial games to just run and simulating the output they would generate through higher level API calls.

Won’t be done – where HLE wins

Therefore, the following requests will not be pursued at least in the near future:

* Widescreen rendering – Can be done through game patches (ASM patches applied directly to the ROM, or bps/ups patches or something similar). Has to be done on a per-game basis, with HLE there is some way to modify the view frustum and viewport dimensions to do this but it almost never works right due to the way the game occludes geometry and objects based on your view distance, so game patches implementing widescreen and DOF/draw distance enhancements would always be preferable.

So, in short, yes, you can do this with ParaLLEl RDP too, just with per-game specific patches. Don’t expect a core option that you can just toggle on or off.
* Rendering framebuffer effects at higher resolution – not really possible with LLE, don’t see much payoff to it either. Super-sampled framebuffer effects might be possible in theory.
* Texture resolution packs – Again, no. The nature of an LLE renderer is right there in the name, Low-Level. While the RDP is processing streams of data (fed to it by the RSP), there is barely any notion whatsoever of a ‘texture’ – it only sees TMEM uploads and tile descriptors which point to raw bytes. With High Level emulation, you have a higher abstraction level where you can ‘hook’ into the parts where you think a texture upload might be going on so you can replace it on the fly. Anyway, those looking for something like that are really at the wrong address with ParaLLEl RDP anyway. ParaLLEl RDP is about making authentic N64 rendering look as good as possible without resorting to replacing original assets or anything bootleg like that.
* Z-fighting/subpixel precision: In some games, there is some slight Z-fighting in the distance that you might see which HLE renderers typically don’t have. Again, this is because this is accurate RDP emulation. Z-fighting is a thing. The RDP only has 18-bit UNORM of depth precision with 10 bits of fractional precision during interpolation, and compression on top of that to squeeze it down to 14 bits. A HLE emulator can render at 24+ bits depth. Compounding this, because the RSP is Low-level, it’s sending 16-bit fixed point vertex coordinates to the RDP for rendering. A typical HLE renderer and HLE RSP would just determine that we are about to draw some 3D geometry and then just turn it into float values so that there is a higher level of precision when it comes to vertex positioning. If you recall, the PlayStation1’s GTE also did not deal with vertex coordinates in floats but in fixed point. There, we had to go to the effort of doing PGXP in order to convert it to float. I really doubt there is any interest to contemplate this at this point. Best to let sleeping dogs lie.
* Raw performance. HLE uses the hardware rasterization and texture units of the GPU which is far more efficient than software, but of course, it is far less accurate than software rendering.

Where LLE wins

Conversely, there are parts where LLE wins over HLE, and where HLE can’t really go –

* HLE tends to struggle with decals, depth bias doesn’t really emulate the RDP’s depth bias scheme at all since RDP depth bias is a double sided test. Depth bias is also notorious for behaving differently on different GPUs.
* Correct dithering. A HLE renderer still has to work with a fixed function blending pipeline. A software rendered rasterizer like ParaLLEl RDP does not have to work with any pre-existing graphics API setup, it implements its own rasterizer and outputs that to the screen through compute shading. Correct dither means applying dither after blending, among other things, which is not something you can generally do [with HLE]. It generally looks somewhat tacky to do dithering in OpenGL. You need 8-bit input and do blending in 8-bit but dither + quantization at the end, which you can’t do in fixed function blending.
* The entire VI postprocessing pipeline. Again, it bears repeating that not only is the RDP graphics being upscaled, so is the VI filtering. VI filtering got a bad rep on the N64 because this overaggressive quasi-8x MSAA would tend to make the already low-resolution images look even blurrier. But at higher resolutions as you can see here, it can really shine. You need programmable blending to emulate the VI’s coverage, and this just is not practical with OpenGL and/or the current HLE renderers out there. The VI has a quite ingenious filter that distributes the dither noise where it reconstructs more color depth. So not only are we getting post-processing AA courtesy of the RDP, we’re also getting more color depth.
* What You See Is What You Get. This renderer is the hypothetical what-if scenario of how an N64 Pro unit would look like that could pump out insane resolutions while still having the very same hardware. Notice that the RDP/VI implementations in ParaLLEl RDP have NOT been enhanced in any way. The only real change was modifying the rasterizer to test fractional pixel coordinates as well.
* Full accuracy with CPU readbacks. CPU can freely read and write on top of RDP rendered data, and we can easily deal with it without extra hacks.

Known issues

  • The deinterlacing process for interlaced video modes is still rather poor (just like Angrylion), basic bob and weave setup. There are plans to come up with a completely new system.
  • Mario Tennis glitches out a bit with upscaling for some reason, there might be subtle bugs in the implementation that only manifest on that game. This seems to not happen on Nvidia Windows drivers though.

Screenshots

The screenshots below here show ParaLLEl RDP running at its maximum internal input resolution, 8x the original native image. This means that when your game is running at say 256×224, it would be running at 2048×1792. But if your game is running at say 640×480 (some interlaced games actually set the resolution that high, Indiana Jones IIRC), then we’d be looking at 5120×3840. That’s bigger than 4K! Then bear in mind that on top of that you’re going to get the VI’s 8x MSAA on top of that, and you can probably begin to imagine just how demanding this is on your GPU given that it’s trying to run a custom software rasterizer on hardware. Suffice it to say, the demands for 2x and 4x will probably not be too steep, but if you’re thinking of using 8x, you better bring some serious GPU horsepower. You’ll need at least 5-6GB of VRAM for 8x internal resolution for starters.

Anyway, without much further ado, here are some glorious screenshots. GoldenEye 007 now looks dangerously close to the upscaled bullshot images on the back of the boxart!

GoldenEye 007 running with ParaLLEl RDP at 8x internal upscale
GoldenEye 007 running with ParaLLEl RDP at 8x internal upscale
Super Mario 64 running on ParaLLEl RDP with 8x internal upscale
Super Mario 64 running on ParaLLEl RDP with 8x internal upscale
Star Fox 64 running on ParaLLEl RDP with 8x internal upscale
Star Fox 64 running on ParaLLEl RDP with 8x internal upscale
Perfect Dark running on ParaLLEl RDP with 8x internal upscale in high-res mode
Perfect Dark running on ParaLLEl RDP with 8x internal upscale in high-res mode
World Driver Championship running on ParaLLEl RDP with 8x internal upscale
World Driver Championship running on ParaLLEl RDP with 8x internal upscale

Videos

Body Harvest

Perfect Dark

Legend of Zelda: Ocarina of Time

Super Mario 64

Coming to Mupen64Plus Next soon

ParaLLEl RDP will also be making its way into the upcoming new version of Mupen64Plus Next as well. Expect increased compatibility over ParaLLEl N64 (especially on Android) and potentially better performance in many games.

Future blog posts

There might eventually be some future blog post by Themaister going into more technical detail on the inner workings of ParaLLEl RDP. I will also probably release a performance test-focused blog post later testing a variety of different GPUs and how far we can take them as far as upscaling is concerned.

I can already tell you to neuter your expectations with regards to Android/mobile GPUs. I tested ParaLLEl RDP with 2x upscaling on a Samsung Galaxy S10+ and performance was about 36fps, this is with vsync off. With 1x native resolution I manage to get on average 64 to 70fps with the same games. So obviously mobile GPUs still have a lot of catching up to do with their discrete big brothers on the desktop.

At least it will make for a nice GPU benchmark for mobile hardware until we eventually crack fullspeed with 2x native!