Introducing Vulkan PSX renderer for Beetle/Mednafen PSX

I needed a break from paraLLEl RDP, and I wanted to give PSX a shot to have an excuse to write a higher level Vulkan renderer backend. The renderer backends in Beetle PSX are quite well abstracted away, so plugging in my own renderer was a trivial task.

The original PlayStation is certainly a massively simpler architecture than N64, especially in the graphics department. After one evening of studying the Rustation renderer by simias and PSX GPU docs, I had a decent idea of how it worked.

Many hardware features of the N64 are non-existent:

  • Perspective correctness (no W from GTE)
  • Texture filtering
  • Sub-pixel precision on vertices (wobbly polygons, wee)
  • Mipmapping
  • No programmable texture cache
  • Depth buffering
  • Complex combiners

My goal was to create a very accurate HW renderer which supports internal upscaling. Making anything at native res for PSX is a waste of time as software renderers are basically perfected at this point in Mednafen and more than fast enough due to the simplicity.

Another goal was to improve my experience with 2D heavy games like the Square RPGs which heavily mix 2D elements with 3D. I always had issues with upscaling plugins back in the day as I always had to accept blocky and ugly 2D in order to get crisp 3D. Simply sampling all textures with bilinear is one approach, but it falls completely flat on PSX. Content was not designed with this in mind at all, and you’ll quickly find that tons of artifacts are created when the bilinear filtering tries to filter outside its designated blocks in VRAM.

The final goal is to do all of this without ugly hacks, game specific workarounds or otherwise shitty code. It was excusable in a time where graphics APIs could not cleanly express what emulation authors wanted to express, but now we can. Development of this renderer was a fairly smooth ride, mostly done in spare time over ~2 months.

Credits

This renderer would not exist without the excellent Mednafen emulator and Rustation GL renderer.

Tested hardware/drivers

  • nVidia Linux/Windows 375.xx+ (works fully)
  • AMDGPU-PRO 16.30 Linux (works fully)
  • Mesa Intel (Ivy Bridge half-way working, Broadwell+, fully working, you’ll want to build from Git to get some important bug fixes which were uncovered by this renderer :D)
  • Mesa Radeon RADV (fully working, you’ll want to build from Git to get support for input attachments)

– But, but, I don’t have a Vulkan-capable GPU

Well, read on anyways, some of this work will benefit the GL renderer as well.

– But, but, you’re stupid, you should do this in GL 1.1 and hack it until it works

No 🙂

– Fine, but clearly this is just for shits and giggles

Doing it for the lulz is always a valid reason.

Source

The source will be merged upstream to Github immediately.

PSX GPU overview

The PSX GPU is a very simple and dumb triangle rasterizer with some tricks.

VRAM

The PSX has a 1024×512 VRAM at 16bpp, giving us 1MB of VRAM to work with. Interestingly enough, this VRAM is actually organized as a 2D grid, and not a flat array with width/height/stride. This certainly simplifies things a lot as we can now represent the VRAM as a texture instead of shuffling data in and out of SSBOs.

Unlike N64, the CPU doesn’t have direct access to this VRAM (phew), so access is mediated by various commands.

Textures

The PSX can sample textures at 4-bit palettes, 8-bit palettes or straight ABGR1555, very neat and simple. Texture coordinates are confined to a texture window, which is basically an elaborate way to implement texture repeats. Textures are sampled directly from VRAM, but there is a small texture cache. For purposes of emulation, this cache is ignored (except for one particular case which we’ll get to …).

An annoying feature is that the color “0x0000” on PSX is always transparent, so all fragment shaders which sample textures might have to discard, another reason to be careful with bilinear.

Shading options

PSX just has 3 shading options, which makes our life very simple:

  • Interpolate color from vertices
  • Interpolate UV and sample nearest neighbor
  • Sample texture multiplied by interpolated color (gouraud shading)

It is practical to not use uber-shading approaches here.

Semi-transparency

PSX has a weird way of dealing with transparency. There is no real alpha channel to speak of, we only have one bit, so what PSX does is set a constant transparency formula, (A + B, 0.5A + 0.5B, B – A, or 0.25A + B). If the high-bit of a texture color is set, transparency is enabled, if not, the fragment is considered opaque. Semi-transparent color-only primitives are simply always transparent.

Mask-bit

Possibly the most difficult feature of the PSX GPU is the mask-bit. The alpha bit in VRAM is considered a “read-only” bit if mask bit testing is enabled and the read-only bit is set. This affects rendering primitives as well as copies from CPU and VRAM-to-VRAM blits.

Especially mask-bit emulation + semi-transparency creates a really difficult blending scenario which I haven’t found a way to do correctly with fixed function (but that won’t stop us in Vulkan). Correctly emulating mask-bit lets us render Silent Hill correctly. The trees have transparent quads around them without it.

silent-hill-usa-161203-133827

Intersecting VRAM blits

It is possible, and apparently, well defined on PSX to blit from one part of VRAM to another part where the rects intersect. Reading the Mednafen/Beetle software implementation, we need to kind of emulate the texture cache. Fortunately, this was very doable with compute shaders, although not very efficient.

Implementation details

Feature – Adaptive smoothing

As mentioned, I prefer smooth 2D with crisp-looking 3D. I devised a scheme to do this in post.

The basic idea is to look at our 4x or 8x scaled image, we then mip-map that down to 1x with a box filter. While mip-mapping, we analyze the variance within the 4×4 or 8×8 block and stick that in alpha. The assumption here is that if we have nearest-neighbor scaled 2D elements, they typically have a 1:1 pixel correspondency in native resolution, and hence, the variance within the block will be 0. With 3D elements, there will be some kind of variance, either by values which were shaded slightly differently, or more dramatically, a geometry edge. We now compute an R8_UNORM “bias-mask” texture at 1x scale, which is 0.0 where we estimate we have 3D elements, and 1.0 where we estimate we have 2D. To avoid sharp transitions in LOD, the bias-mask is then blurred slightly with a 3×3 gaussian kernel (might be a better non-linear filter here for all I know).

On final scanout we simply sample the bias-mask, multiply that by log2(scale) and use that as an explicit lod in textureLod() with trilinear sampling, and magically 2D elements look smooth without compromising the 3D sharpness. Sure, it’s not perfect, but I’m quite happy with the result.

Consider this scene from FF IX. While some will prefer this look (it’s toggleable), I’m not a big fan of blocky nearest-neighbor backgrounds together with high-res models.

nearest

With adaptive smoothing, we can smooth out the background and speech bubble back to native resolution where they belong. You may notice that the shadow under Vivi is sharp, because the shadow which modulates the background is not 1:1. This is the downside of doing it in post certainly, but it’s hard to notice unless you’re really looking.

adaptive

The bias mask texture looks like this after the blur:

bias

Potential further ideas here would be to use the bias-mask as a lerp between xBR-style upscalers if we wanted to actually make the GPU not fall asleep.

There is nothing inherently Vulkan specific about this method, so it will possibly arrive in the GL backend at some point as well. It can probably be used with N64 as well.

Obviously, for 24-bpp display modes (used for FMVs), the output is always in native resolution.

GPU dump player

Just like the N64 RDP, having an offline dump player for debugging, playback and analysis is invaluable, so the first thing I did was to create a basic dump format which captures PSX GPU commands and plays them back. This is also nice for benchmarking as any half-capable GPU will be bottlenecked on CPU.

PGXP support

Supporting PGXP for sub-pixel precision and perspective correctness was trivial as all the work happens outside the renderer abstraction to begin with. I just had to pass down W to the vertex shader.

Mask bit emulation

Mask bit emulation without transparency is quite trivial. When rendering, we just use fixed function blending, src = INV_DST_ALPHA, dst = DST_ALPHA.

With semi-transparency things get weird. To solve this, I made use of Vulkan’s subpass self-dependency feature which allows us to read the pixel of the framebuffer which enables programmable blending. Now, mask-bit emulation becomes trivial. This feature is a standard way of doing the equivalent of GL_ARB_texture_barrier, GL_EXT_framebuffer_fetch and all the million extensions which implement the same thing in GL/GLES. For mask-bit in copies and blits, this is done in compute, so implementing mask bit here is trivial.

Copies/Blits

Copies in VRAM are all implemented in compute. The main reason for this is mask bit emulation becomes trivial, plus that we can now overlap GPU execution of rendering and blits if they don’t intersect each other in VRAM. It is also much easier to batch up these blits with compute, whereas doing it in fragment adds some restrictions as we would need to potentially create many tiny render passes to blit small regions one by one, and we need blending to implement masked blits, which places some restrictions on which formats we can use.

When blitting blocks which came from rendered output, the implementation blits the high-res data instead. This improves visual quality in many cases.

Being careful here made the FF8 battle swirl work for me, finally. I’ve never seen that work properly in HW plugins before 🙂 PGXP is enabled here with perspective correctness as well.

Intersecting VRAM blits

For intersected VRAM blits, I dispatch one 128-thread compute group which basically implements the C++ variant as-is. It emulates the texture cache by reading in data from VRAM into registers, barrier(), then writing out. This then loops through the blit region. It’s fairly rare, but this case does trigger in surprising places, so I figured I better do it as accurate as I could.

The Framebuffer Atlas – Hazard tracking

The entire VRAM is one shared texture where we do all our rendering, scanout, blits, texture sampling and so on. I needed a system where I could track hazards like sampling from a VRAM region that has been rendered to, and deal with changing resolutions where crazy scenarios like CPU blitting raw pixels over a framebuffer region which was rendered in high resolution. Vulkan allows us to go crazy with simultaneous use of textures (VK_IMAGE_LAYOUT_GENERAL is a must here), as long as we deal with sync ourselves.

In order to support high-res rendering and sampling from textures, I needed to deal with two variants of VRAM:

  • One RGBA8_UNORM texture at 1024x512xSCALE, rationale for RGBA8 is mandated support for image load/store, it has alpha (for mask bit) and good enough bit-depth. This texture also has log2(scale) + 1 mip-levels, so we can do the mip-mapping step of adaptive smoothing in place.
  • One R32_UINT texture at 1024×512. I actually wanted R16_UINT, but this is not mandated for image load/store. Here we store the “raw” bit pattern of VRAM, which makes paletted texture reads way cheaper than having to pack/unpack from UNORM.

I split the VRAM into 8×8 blocks. All hazards and dependencies are tracked at this level. I chose 8×8 simply because it fits neatly into 64 threads on blits (wavefront size on AMD), and the smallest texture window is 8 pixels large, nice little coincidence 🙂

Each block has 2 bits allocated to it to track domain ownership:

  • Block is only valid in UNSCALED domain.
  • Block is valid in both UNSCALED and SCALED, but prefer SCALED.
  • Block is valid in both UNSCALED and SCALED, but prefer UNSCALED.
  • Block only valid in SCALED domain.

Whenever I need to read or write VRAM in a particular domain, I need to check the atlas to see if the domain is out-of-sync. If it is, I will inject compute workgroups which “resolve” one domain to the other.

If the access is a “write” access, the block will be set to “UNSCALED only” or “SCALED only” so that if anyone tries to access the block in a different domain, they will have to resolve the block first.

To resolve SCALED to UNSCALED, a simple box-filter is used. In effect, at 4x scale we get 16x supersampling, or 64x SSAA at 8x scale 😀 To resolve UNSCALED to SCALED, nearest neighbor is used. The rationale for doing it this way is that resolving up and down in scale is a stable process. Using nearest neighbor for up-resolves also works excellent with adaptive smoothing since we will get a smoothed version of the block which was resolved from UNSCALED and wasn’t overwritten by SCALED later.

Another cool thing is that I use R32_UINT in the UNSCALED domain, so I actually pack in 10-bit color on resolve. Regular ABGR1555 goes in the lower 16-bits, but the upper 16 bits are used to hide “hidden” precision bits, which are used if the texture is ever read as straight ABGR1555. This greatly improves image quality on any framebuffer effects without having to resolve with dithering and keeps palette reads efficient.

One very interesting bug I had at some point was that the Silent Hill intro screen wouldn’t fade out, it turns out that it samples the previous frame with a feedback factor of ~0.998! We have to be very careful here with rounding, the problem I had was that at slightly darker tones round(unorm_color * 0.998) == unorm_color, so just a smear instead of fade out, especially since the main framebuffer was just 5-bit per color … The fix here was to try to mimic rounding modes closer to what PSX does on 8-bit -> 5-bit, simply chopping away LSBs, now it looked correct. Using 10-bpp resolves improved things a bit more.

Now, even though I can sync between domains, I still need to track write-after-write, read-after-write and write-after-read hazards within a domain. Whenever a GPU command reads or writes from an 8×8 block, I check to see if there are any pipeline stages which have also accessed the pipeline stage in a way which is a hazard. E.g. a write will need a pipeline barrier if there are any readers or writes, but a reader only needs to check if there are writers. If such a hazard is detected, a callback is signalled with which pipeline stages need to participate in a vkCmdPipelineBarrier and which caches need to be flushed/invalidated on the GPU. The bits are then cleared. Overall, I make due with 16 bits per block, which is very compact.

While this scheme is fine for blits and copies and whatnot, renderpasses are handled a bit special, instead of checking the atlas for every primitive, the bounding box of the render pass itself is considered instead. Only when the bounding box of the renderpass increases does it damage the atlas and resolve any hazards which may arise.

Render passes are always done in-order, so hazards between render passes are ignored in the atlas for simplicity. Overall, the performance of this approach turned out to be great, and seems to be very accurate for the content I’ve tested.

The atlas implementation is API agnostic, so hopefully this should fix some bugs in the GL renderer as well if integrated.

Render pass batching

PSX renders one primitive at a time, so it is quite obvious that we need to aggressively batch primitives. There is another side here which is important to consider, for tiled-based renderers on mobile, each render pass has a very significant cost in that beginning/ending the render pass needs to read-in/write-out all memory associated in the render area, which is a quite large drain on performance. PSX games tend to make it difficult for us as clear rects come in, scissor boxes change and we need to batch as aggressively as we can here. Not all games use clear rects, so using loadOp = CLEAR isn’t always an option.

The approach I took is very similar to the Rustation renderer, but with some extra considerations for Vulkan render passes.

As a primitive comes in, it is placed in one or more queues:

  • Opaque non-textured primitives
  • Opaque textured primitives
  • Semi-transparent textured primitives
  • Semi-transparent primitives (including textured) and primitives which use mask bit

The screen space bounding box is computed for this primitive. Using this information we can figure out if the primitive is “scissor invariant”, i.e. the scissor box cannot clip any pixel in the primitive. If this is the case, we can say the primitive belongs to scissor instance -1. If the primitive can be clipped, we assign the primitive a scissor index. The scissor index increases whenever set_draw_area() changes the scissor box. From here, we enter the atlas to see if the union of the scissored bounding box and existing render pass area increases, if it does, we need to check for hazards to avoid any synchronization issues. If this happens, we flush out the render pass, synchronize and start a new render pass.

For clear rects, we similarly expand the render area as needed. If the current render area == clear rect area, this becomes a clear op candidate. If the render pass is later flushed with this particular area, we know we can use loadOp = CLEAR and save lots of readbacks on tiled GPUs, yay.

When the render pass is flushed, we render out our queues in a particular order. While the PSX GPU does not have depth buffers, it doesn’t mean we cannot use depth buffering ourselves to sort primitives in a more favorable order.

Opaque primitives are sorted by scissor index, and then front-to-back. These are rendered first. Then, we consider the semi-transparent textured primitives, these are conditionally semi-transparent. Just like Rustation, we render the primitives as if they were opaque, and discard the fragments if they are indeed opaque. If they are opaque, we end up writing the primitives Z to the depth buffer, serving as a mask (depth test = LESS) when we later redraw these primitives again a second time.

Now that we have sorted out the opaque pixels, we render the semi-transparent primitives in-order, batching up as many primitives as we can depending on the VkPipeline they need to use. While some crazy reordering can be done here if primitives don’t overlap, I doubt it’s worth it.

Primitives which use mask bit and semi-transparency are always drawn alone, because we need to perform a by-region vkCmdPipelineBarrier(COLOR_ATTACHMENT -> INPUT_ATTACHMENT) to safely read the framebuffer. This is quite expensive on IMR GPUs, but performance is just fine in the prime example of this PSX feature, which is Silent Hill. On tile based GPUs, this is basically free though, so that will be interesting to test in the future. 🙂

An important case where having a tight bounding box on our draw area is the MGS codec, generated from a frame trace in rsx-player:

The “bloom” effect is done by rendering the codec text to the lower left, then blend it with offsets on top, effectively creating a gauss kernel (!?) If we used the draw rect naively as the bounding box, we would create 13-15 render passes just to draw this thing as the hazard tracker would think that we rendered to a framebuffer while also trying to sample from it at the same time, one render pass for each blend step.

Line rendering

Line rendering is always a PITA. PSX has a very particular rasterization pattern which games sometimes rely on to draw primitives correctly, you may have noticed the one pixel that was wrong in the video above … ye, it’s using lines, go figure. The current implementation generates a quad which tries its best to approximate wide lines to match the rasterization pattern of PSX, but it’s not quite there yet.

Vulkan higher level API

This time around I wanted an excuse to create a higher level Vulkan API. The Vulkan backend in the RDP is a bit too explicit in hindsight and adding things like VI filtering would require a ton of boilerplate crap to deal with render passes etc … so, this time around I wanted to do it better.

I’m quite happy with the API as a standalone renderer API and I hope to reuse this in the RDP and other side projects when I get back to that.

Reusable PSX renderer implementation

The renderer exposes a C++ API which closely matches the PSX GPU. It should be fairly straight forward to reuse in any other PSX emulator or maybe even used as a renderer for a retro-themed game which tries to mimic the look and feel of early 3D games.

Performance

As you can expect, performance is good. The better desktop GPUs easily render this stuff at 8K resolution if you’re crazy enough to try that. In more modest resolutions, 1000++ FPS is easy, you’re going to be CPU bound in the emulator anyways, might as well crank it as high as it’ll go. The atlas hazard tracking doesn’t seem to appear in my profiles, so I guess it’s fast enough.

Interestingly enough, I was worried that VK_IMAGE_LAYOUT_GENERAL would decimate performance on AMD, but it seems just fine, guess I’m not bandwidth bound. 🙂

Enabling this renderer in Beetle

Make sure you enable the Vulkan backend in RetroArch. Beetle will now try multiple backends until one of them succeeds, the final fallback is software.

Bugs

While I haven’t tested every game there is, I think it’s quite solid already, in far better shape than paraLLEl RDP is at least. The bugs I know of so far are all minor visual glitches which are likely due to either upscaling or slightly off rasterization rules.

Source code repository

The source code repository to Mednafen/Beetle PSX can be found here –

https://github.com/libretro/beetle-psx-libretro

 

We are now on Patreon!

unnamed

The Libretro Project (comprised of Libretro, Lakka, and RetroArch) is now on Patreon! We hope this Patreon will enable us to accelerate development and be able to serve users in lots of benevolent ways!

Visit us here: https://www.patreon.com/libretro

This Patreon covers the Libretro, Lakka, RetroArch projects. And another, soon to be disclosed project as well.

Right now we are at $230 as of this minute. We thank every Patron so far that has helped us get to this stage in such short time, suffice to say you won’t be let down! Let’s go over some of the goals as they stand!

$150 – Bounty for core work every month! Reached!

Already the $150 goal has been reached which will allow us to place bounties for core work to be done! We allocate a total of $50 / month that will go towards bounties.

$200 – ProjectFuture Greenlight! Reached!

I will be revealing soon what this project is about. Let’s just state it’s going to be an even bigger and more expansive project than RetroArch has been so far, and it’s one of the main reasons why we finally went ‘why not?’ with regards to the Patreon. Stay tuned!

This is going to take months and months of work, and will take other considerable resources in order for us to be able to see it to completion, and it’s definitely one of those ‘flying very close to the sun’-type endeavors, but as with everything with this project, ‘dreaming big’ and ‘foolhardy’ are comfortable bedfellows.

$400 – Netplay/matchmaking server!

We want RetroArch users to be able to play online multiplayer games with each other through the RetroArch interface. We are going to allow for PSN/XBLA-like features, except free of charge! The prospect of true crossplatform free netplay from an easy and console UI-style interface is soon to be within reach once we hit this target!

The aim is that every user will be able to quickly and easily setup a netplay game from within RetroArch without the need of a keyboard/mouse! We want console-style netplay ease of use !

$500 – Stability checks, Quality Assurance, etc!.

It’s no secret that for years we have relied on volunteer work in order to get where we are. This entire project entails a maddening amount of work that we have to put in on a daily basis to keep the entire show up and running, and the amount of work keeps growing every time we add another platform port or add a new core.

Once we hit our $500 target, we are going to be paying a couple of developers whom have been loyal towards the project to keep tabs and checkups on RetroArch and various libretro-related cores on a bi-monthly basis. This way, bugs and regressions are easily spotted and we can instantly fix them.

$600 – Development bounties!

We are going to be posting bounties for various remaining issues (whether it be RetroArch or cores), and any developer will be able to fix these issues and claim the reward!

Finally we can start claiming bounties for some of the things that RetroArch and Lakka might still be missing! Good developers don’t grow on trees, neither do contributors. We hope that through these bounties we will be able to significantly improve the software and get to our goals much quicker!

NOTE: The amount of money that will be allocated for this is variable and decided at our own discretion.

A RetroArch retrospective and what to look forward to

I have been following the events on a few libretro related threads in reddit and I find it quite disappointing to see the amount of hatred directed to a project that has done nothing but do what end-users wanted for more than three years now. I also find it terrible (but interesting none the less) that the social media post is more active than the actual highlight.

Disclaimer: this represents my own experiences and my points of view with regard of the situations that surround our project.

Anyway…

A bit of my personal history with the project:

Let’s look back all the way to 2013. RetroArch was still called SSNES, a fairly small commandline program with just a few cores, a launcher that could be used to adjust options and that’s it. No bells or whistles, just a few nice cores implemented under one frontend with a common feature set. I hadn’t really been using emulators since the zSnes days other than a few tries with mobile emulators on my WinCE device.

I just had built a game-room / tv-room. So I setup XBMC and loved it. Soon I started looking for emulators that would work nicely with my setup. I installed Nestopia and some XBMC plugin that acted as a launcher with worked mostly fine. I liked the emulation but I also like the fact that I could set hotkeys to save, load, and it presented nice OSD messages on non-game actions and I could drive the whole thing with my gamepad only. I hoped other emulators would have the same features but I was let down almost instantly. Regardless I pursued my objective with a miriad of tools (Pinnacle Game Profiler, Xpadder, Joy2Key, batch files, Daemon Tools to name a few).

Continue reading “A RetroArch retrospective and what to look forward to”

Mednafen/Beetle PSX – PGXP arrives!

Mednafen/Beetle PSX has made another significant stride forward! iCatButler has contributed a working backport of PGXP for Mednafen/Beetle PSX.

PlayStation rasterization issues

Several issues can be noticed in most PlayStation games’ graphics.

Continue reading “Mednafen/Beetle PSX – PGXP arrives!”

RetroArch Web Player

An Emscripten port of RetroArch has existed for years, but until recently, we never had a good opportunity to launch it in a state we felt comfortable with. Well, until now that is.

Web Player

So what is RetroArch Web Player? It’s a port of RetroArch that runs inside your web browser, powered by emscripten and asm.js. Most modern browsers available today should be compatible. That being said, we strongly recommend you use Google Chrome right now for smooth v-synced gameplay with no audio crackling.

You can check it out right here!

Continue reading “RetroArch Web Player”

paraLLEl RDP and RSP updates (September 2016)

Unfortunately, I haven’t had much time to work on paraLLEl lately, but there is plenty to update about.

paraLLEl RSP – Clang/LLVM RSP recompiler experiment

Looking at CPU profiles, paraLLEl RDP could never really shine, as it was being held back by the CXD4 RSP interpreter, so groundbreaking speedups could not be achieved. With paraLLEl RDP, the RSP was consuming well over 50% CPU time. This was known from the beginning before RDP work even started. After the first RDP pre-alpha release, focus shifted to RSP performance, and that’s what I’ve spent most time on. None of my machines are super-clocked modern i7s, which have been required to run N64 LLE at good speed.

Micro-optimizing the interpreter is a waste of time, I needed a dynarec. However, I have never written a dynarec or JITer for that matter before, and I was not going to spend months (years?) learning how to JIT code well for ~4 architectures (x86, x64, ARMv7, ARMv8). Instead, using libclang/libllvm as my codegen proved to be an interesting hack that worked surprisingly well in practice for this project.

Continue reading “paraLLEl RDP and RSP updates (September 2016)”

RetroArch 1.3.6+ beta release for PlayStation3!


The PlayStation 3 port is back after it was decommissioned for a long time. Consider this a beta version in anticipation of the upcoming 1.3.7 version which will be further fleshed out.

Also check out our concurrent release for the PS Vita:

RetroArch 1.3.6+ beta released for PS Vita (HENkaku-ready)!

Thanks to PSGL, the PlayStation3 driver can use the XMB menu driver using the OpenGL rendering backend. The simplified ribbon should be running properly in the background too.
Thanks to PSGL, the PlayStation3 driver can use the XMB menu driver using the OpenGL rendering backend. The simplified ribbon should be running properly in the background too.

Continue reading “RetroArch 1.3.6+ beta release for PlayStation3!”

RetroArch 1.3.6+ beta release for PS Vita HENkaku!

RetroArch appearing on the PS Vita Live Area homepage.  Screenshot was taken on a PS TV.
RetroArch appearing on the PS Vita Live Area homepage. Screenshot was taken on a PS TV.

Today we are releasing a beta version of RetroArch 1.3.6+ (latest snapshot, release candidate for 1.3.7) for the Playstation3 and PS Vita. Be sure to thank frangarcj for the latter since he went through the trouble of making sure we could make the jump from Rejuvenate to HENKaku in swift order.

Continue reading “RetroArch 1.3.6+ beta release for PS Vita HENkaku!”

RetroArch 1.3.6 released

RetroArch keeps moving forward, being the reference frontend for libretro and all. Here comes version 1.3.6, and once again we have a lot to talk about.

Where to get it

Windows/Mac/iOS (build only)/Nintendo/PlayStation – Get it here.

Android: You can either get it from F-Droid or from Google Play Store.

Linux: Since RetroArch is included now on most mainline Linux distributions’ package management repository systems, we expect their versions to be updated to 1.3.6 shortly.

I will release versions for MacOSX PowerPC (10.5 Leopard) and 32-bit Intel MacOS X 10.6 (Snow Leopard) later on, maybe today or tomorrow.

Usability improvements

Windows Drag and Drop support

Courtesy of mudlord, with the Windows version, you can now drag and drop a ROM (or any other content) onto RetroArch’s window, and it will attempt to load the correct core for it. If there is more than one core available for the type of content you dragged and dropped, it will present you with a slidedown list of cores to select from.

Vastly improved content downloading features

Starting with v1.3.6, RetroArch users can download compatible freeware content, such as the shareware release of Doom, right from the app. This video goes through the steps, which include fetching the core from the online updater, fetching the content from the repository and then launching the core and content we just downloaded.

Menu customization and aesthetics – XMB and MaterialUI

RetroArch v1.3.6 adds support for a number of themes in the default mobile menu, including both bright and dark themes.

There’s also the ability now to set a custom wallpaper in XMB and be able to colorize it with a color gradient. To do this, you go to Settings -> Menu, you set a wallpaper, and from there you have to set ‘Menu Shader Pipeline’ to OFF. You can then choose from one of the color palettes in ‘Color Theme’ in order to shade the background wallpaper, or just select ‘Plain’ in case you don’t want to colorize it.

Undo Load/Save State

Have you ever gotten through a tough part of a game and wanted to make a savestate only to hit the “load state” button instead and have to do it all over again? Or maybe you were practicing a particularly difficult maneuver–for a speedrun, perhaps–and accidentally saved a bad run over your practice point because you hit “save state” instead of “load state”? While savestates are considered one of the great advantages to emulating retro games, they can also lead to these frustrating situations where they wipe out progress instead of saving it, all because of one slip of the finger. RetroArch now has the ability to undo a save- or load-state action through some automatic state-shuffling that happens behind the scenes, so you never have to worry about these situations again.

Undo Load State – Before the ‘current’ state is altered by e.g. a ‘Load Savestate’ operation, ‘current’ is saved in memory and ‘Undo Load State’ restores it; you can also undo this option by using it again, which will make you flip-flop between 2 states.

Undo Save State – If there was a savestate file that was overwritten, this option restores it.

New Features

The main event of RetroArch 1.3.6 is obviously the fact that it makes it possible to run the N64 Vulkan core, paraLLEl. Previous versions of RetroArch will not be able to run this because of the new extensions to libretro Vulkan which we had to push to make this renderer possible.

Vulkan

Async compute core support – ready for ParaLLEl

It was already possible to run Vulkan-enabled libretro cores, but with this release, a few crucial features have been added. Support for queue transfers was added and a context negotiation interface was added.

With this we can now use multiple queues to overlap compute and shading in the frontend level, i.e. asynchronous compute. ParaLLEl would certainly not have been as fast or as effective were it not for this.

ParaLLEl now joins triple-A games like Rise of the Tomb Raider and Doom in heavily relying on Vulkan’s async compute capabilities for maximum efficiency. A test core was also written as a proof of concept for this interface.

If you want to read more about ParaLLEl, we have a compendium blog post for you to digest here.

Supports Windows, Linux, Android equally well now

The previous version already had Vulkan support to varying degrees, but now we feel we are finally at the point where Vulkan driver support in RetroArch is very much mature across most of the supported platforms.

Vulkan should work now on Android, on Windows, and on Linux, provided your GPU has a working Vulkan driver.

On Linux we now support even more video driver context features, such as VK_KHR_display support. This is a platform-agnostic KMS-like backend for Vulkan, which should allow you to run RetroArch with Vulkan without the need of an X11 or Wayland server running.

On Windows and Android, we include Vulkan support now. Vulkan has been tested on Android with NVIDIA Shield Tablet/Console, and both work. Be aware that there are some minuscule things which might not work correctly yet with Vulkan on Android. For instance, orientation changing still doesn’t work. This will be investigated.

Max swapchain images – driving latency even lower with Vulkan and friends

RetroArch already has built up quite a reputation for itself for being able to drive latency down to very low levels. But with new technologies, there is always room for improvement.

Max amount of swapchain images has now been implemented for both the DRM/KMS context driver for OpenGL (usable on Linux) and Vulkan now. What this entails, is that you can programmatically tell your video card to provide you with either triple buffering (3), double buffering (2) or single buffering (1). The previous default with DRM/KMS was 3 (triple buffering), so setting it to 2 could potentially shave off latency by at least 1 frame (as was verified by others). Setting to 1 won’t often get you single buffering with most monitors and drivers due to tearing and they will fall-back to (2) double buffering.

With Vulkan, RetroArch can programmatically infer to the video card what kind of buffering method it likes to be able to use, a vast improvement over the nonexistent options that existed before with OpenGL (from a platform-agnostic perspective).

What Vulkan brings to the table on Android

Vulkan has been tested to run on Android devices that support Vulkan, like Shield Tablet/Console. Latency has always been very bad on Android in the past. With Vulkan, frame times are significantly lower than with OpenGL, and we no longer have to leave Threaded Video enabled by default. Instead, we can turn off Threaded Video and letting RetroArch monitor the refresh rate dynamically, which is the more desirable solution since it allows for less jittery screen updates.

Audio latency can also be driven down significantly now with Vulkan. The current default is 128ms, with Vulkan we can drive it down to 64 or even 32ms.

Couple this with the aforementioned swapchain images support and there are multiple ways to drive latency down on Android now.

OpenGL music visualizer (for FFmpeg-enabled builds)

Versions of RetroArch like the Linux and Windows port happen to feature built-in integrated FFmpeg support, which allows you to watch movies and listen to music from within the confines of RetroArch.

We have added a music visualizer now. The scene is drawn as a cylindrical mesh with FFT (Fast Fourier Transform) heightmap lookups. Different colors are shaded using mid/side channels as well as left/right information for height.

Note that this requires at least GLES3 support (which is available as well through an extension which most GPUs should support by now).

Improvements to cores

TyrQuake

e0ia1Qg

User leileilol contributed a very cool feature to TyrQuake, Quake 64-style RGB colored lighting, except done in software.

To be able to use this feature, you need to create a subdir in your Quake data directory called ‘maps’, and you need to move ‘.lit’ files to this directory. These are the lighting map files that the Tyrquake core will use in order to determine how light should be positioned.

From there on out, you load up the Tyrquake core, you go to Quick Menu -> Options, you enable Colored Lighting. Restart the core and if your files are placed correctly, you should now see the difference.

Be aware that in order to do this, the game renderer shifts to 24bit color RGB rendering, and this in turn makes things significantly slower, although it should still be fairly playable even at higher resolutions.

View the image gallery here.

To download this, go to ‘Add Content’ -> ‘Download Content’. Go to ‘Tyrquake’, and download ‘quake-colored-lighting-pack.zip’. This should extract this zip to your Downloads dir, and inside the Quake directory. From there, you can just load Quake and the colored lighting maps should be found providing the ‘Colored Lighting’ option has been enabled.

SNES9x emulator input lag reduction

A user on our forum, Brunnis, began some investigations into input latency and found that there were significant gains to be made in Super Nintendo emulators by rescheduling when input polling and video blitting are being performed. Based upon these findings and after some pull requests made to SNES9x, SNES9x Next, and FCEUmm, at least 1 to 2 frames of input lag should be shaved off now.

Do read this highly interesting forum thread that led to these improvements here.

News for iOS 10 beta users

There is now a separate version for iOS 10 users. Apple once again changed a lot of things which makes it even more difficult for us to distribute RetroArch the regular way.

Dynamic libraries cores cannot be opened from the Documents directory of the app anymore in iOS 10. They can be opened from the app bundle, as long as they are code-signed. This reverts back to the previous behavior of RetroArch, where the cores need to be in the modules directory of the app bundle.

Go to this directory:

https://github.com/libretro/RetroArch/tree/master/pkg/apple

and open RetroArch_iOS10.xcodeproj inside Xcode.

Note – you will need to manually compile the cores, sign them, and drag them over to the modules directory inside Xcode.

Example –

1. You’d download a core with libretro-super.

A quick example (type this inside the commandline)

git clone https://github.com/libretro/libretro-super.git

./libretro-fetch.sh 2048

./libretro-build.sh 2048

This will compile the 2048 core inside /dist/ios.

2. Move the contents of this directory over to the ‘modules’ directory inside the RetroArch iOS 10 Xcode solution. It should presumably handle signing by itself.

Bugfixes/other miscellanous things

  • Stability/memory leak fixes – We subjected RetroArch to numerous Valgrind/Coverity/Xcode Memory leak checks in order to fix a plethora of memory leaks that had reared their ugly heads inbetween releases. We pretty much eliminated all of them. Not a sexy feature to brag about, but it involved lots of sweat, tears and effort, and the ramifications it has on the overall stability of the program is considerable.
  • There were some problems with Cg and GLSL shader selections which should now be taken care of.
  • ScummVM games can now be scanned in various ways (courtesy of RobLoach)
  • Downloading multiple updates at once could crash RetroArch – now fixed.
  • Several cores have gotten Retro Achievements support now. The official list of systems that support achievements now is: Mega Drive, Nintendo 64, Super Nintendo, Game Boy, Game Boy Advance, Game Boy Color, NES, PC Engine, Sega CD, Sega 32X, and Sega Master System.
  • You can now turn the supported extensions filter on or off from the file browser.

Effort to addressing user experience feedback

I think a couple of things should be addressed first and foremost. First, there is every intent to indeed make things like a WIMP (Windows Icons Mouse Pointers) interface around RetroArch. To this end, we are starting to make crossplatform UI widget toolkit code that will make it easy for us to target Qt/GTK/Win32 UI/Cocoa in one fell swoop.

We have also spent a lot of time plugging some of the rough edges around RetroArch and making the user interface more pleasurable to work with.

Youtube libretro channel

Hunterk/hizzlekizzle is going to be running the libretro Youtube channel from now on, and we’ll start putting up quick and direct Youtube videos there on how to be able to use RetroArch. It is our intent that this will do a couple of things:

1. Show people that RetroArch is easy to use and has numerous great features beneath the surface too.
2. It allows users to give constructive criticism and feedback on the UI operations they see and how they think they should be improved.
3. We hope to engage some seasoned C/C++ coders to help us get some of these UI elements done sooner rather than later. Most of RetroArch development mostly relies on a handful of guys – 5 at the most. It is a LOT of hard work for what amounts to a hobbyist project, and if we had a lot more developers seasoned in C/C++, stuff could be done quicker.
4. There is no intention at all to make RetroArch ‘obtuse’ for the sake of it, there is every intention to make it more accessible for people. Additional help would go a very long way towards that.

Regarding the current UIs and their direction, it is obviously meant to be a console-like UI experience. This might not be what desktop users are used to on their PCs but it is what we designed menu drivers like XMB to be. It is true that keyboard and mouse are mostly seen as afterthoughts in this UI but really, we wrote the UI with game consoles and something where a gamepad is the primary input device at all times, particularly since a keyboard to us is a poor way of playing these console-based games anyway.

Anyway, menu drivers like XMB and MaterialUI will never have any WIMP UI elements. HOWEVER, in upcoming versions, we will be able to flesh out the menubar and to allow for more basic WIMP UI elements.

RetroArch is meant to be a cutting-edge program that is ultra-powerful in terms of features. With that comes a bit of added complexity. However, we have every intent of making things easier, and with every release we put a lot of time and effort into improving things. But again, more developers would help out a substantial lot in speeding up certain parts that we are working on.

Our vision for the project involves an enormous workload and we’re considering differnt ways of generating additional support. If a Patreon might allow us to get more developers and get more stuff done faster, we might consider it. But we want such things to be carefully deliberated by both our internal development staff and the users at large. I hope you’ll be able to appreciate the relative rough edges around the program and appreciate the scope and the craft we have poured into the program. Please appreciate that we are pouring a lot of blood, sweat and tears into the program and that mostly we try to maintain an upper stiff chin when faced with all the criticism, but we do care and we do intend to do better. Volunteer coders are very welcome though, by people who have some time to spare and who want to make a difference. We ask for your understanding here, and we hope that by finally speaking out on this, users can gain a better understanding of our intent and be able to appreciate the program better in light of that.

Nintendo 64 Vulkan Low-Level emulator – paraLLel – pre-alpha release

Vigilante 8 running in ParaLLEl.
Vigilante 8 running in ParaLLEl.
Here is a pre-alpha release of the hotly anticipated N64 Vulkan renderer, paraLLel. To coincide with this, a new RetroArch version has also been released that includes support for the async compute interface that this new renderer requires.

Also see our other major announcements today:

RetroArch 1.3.6 released

Lutro – easy retro game creation, powered by libretro

And our earlier story featured a couple days back on ParaLLEl –

First ever Vulkan Nintendo 64 emulator, ParaLLEl, coming soon, only for Libretro/

Continue reading “Nintendo 64 Vulkan Low-Level emulator – paraLLel – pre-alpha release”