Flycast world’s first Dreamcast emulator to receive Vulkan renderer – available later today on RetroArch with nightly core!

The first Dreamcast emulator ever to get a Vulkan renderer. Completely open-source, written from scratch, and available later today on RetroArch. Update your core later today to get the latest version with the Vulkan renderer! Available for Android, Windows, and Linux.

For more information, read down below…

Wait … a new what?

The renderer is the emulator component that emulates the Dreamcast/Naomi GPU, the PowerVR Series2. It belongs to one of the first generations of 3D chips and has only a fixed-function pipeline. The PowerVR2 supported DirectX 6.0, which was the graphics API used by Windows CE games on the Dreamcast. Successors of the PowerVR2 would later be found in the original iPhone and iPod Touch (PowerVR4), the iPhone 4 and iPad (PowerVR5), and many other mobile devices. The Dreamcast GPU is now more than 20 years old. You might think it should be easy to emulate such an ancient chip on modern hardware, right? Well … yes, for the most part. But there’s one thing that the PVR2 does really well: order-independent transparency. Even today this is not trivial to implement on modern hardware. Neither OpenGL nor DirectX offers it as a built-in feature, and you need a fairly recent version of these APIs to emulate it. Emulating it means manually sorting the individual translucent fragments of each pixel from back to front and blending them together, and doing this for every visible pixel on the screen!
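To make the "sort and blend" idea concrete, here is a minimal CPU-side sketch for a single pixel. It is not Flycast code: the `resolve_pixel` name, the fragment layout, and the convention that a larger depth value means farther away are all assumptions made for illustration, and standard "source over" alpha blending stands in for the many blend modes a real renderer must support.

```python
from typing import List, Tuple

Color = Tuple[float, float, float]
# Hypothetical fragment layout: (depth, rgb, alpha).
# Assumption: a larger depth value means farther from the camera.
Fragment = Tuple[float, Color, float]


def resolve_pixel(background: Color, fragments: List[Fragment]) -> Color:
    """Blend one pixel's translucent fragments over an opaque background."""
    color = background
    # Sort back to front so the farthest fragment is blended first.
    for _, rgb, alpha in sorted(fragments, key=lambda f: f[0], reverse=True):
        # Plain "source over" blending: src * alpha + dst * (1 - alpha).
        color = tuple(alpha * s + (1.0 - alpha) * d for s, d in zip(rgb, color))
    return color
```

Because the fragments are sorted inside the function, the result does not depend on the order in which the game submitted them, which is exactly the "order-independent" part.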

OK, but what about Vulkan?

For those of you who are not familiar with Vulkan, it is a relatively new 3D graphics API, basically a follow-on to OpenGL. OpenGL is quite permissive and has few declarative constraints. You just throw stuff at the driver when you need to, and the driver’s job is to figure it out. The downside is that the OpenGL driver often needs to guess what you’ll do next, and it might not guess right. When it doesn’t, performance suffers. Vulkan is radically different in that everything must be declared in advance, in great detail, and there’s very little room for improvisation on the part of the driver. Vulkan works much closer to the hardware than OpenGL does, so you can expect less overhead, more reliability and better performance in many cases.

The downside of Vulkan is the sheer amount of code you have to write to display just a single triangle on the screen, let alone a full-featured Dreamcast renderer. Last time I checked, the Vulkan renderer had 47 source files and around 7800 lines of code. (The OpenGL renderer only has around 6000 lines of code.)

So what do we get?

As with OpenGL, there are actually two Vulkan renderers. The first one uses a traditional single render pass with per-triangle or per-mesh sorting done by the CPU. The second one is capable of order-independent transparency with per-pixel sorting performed by the GPU. It uses multiple subpasses to compose the final image. The first subpass draws the depth map of the opaque geometry and the shadows cast on it. The second subpass renders all opaque geometry to a temporary color framebuffer, and transparent geometry into a huge per-pixel linked list. The last subpass then renders shadow volumes for translucent geometry. Finally, all pixels are sorted and blended together, using the opaque framebuffer of the previous subpass as the background.
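The "pixel linked list" can be pictured as one head index per screen pixel plus a single shared pool of fragment nodes. The sketch below models that layout on the CPU. The `PixelLists` class and its method names are hypothetical, and the real renderer does this in shaders on the GPU, where the head update has to be an atomic operation; the depth convention (larger means farther) is again an assumption.

```python
class PixelLists:
    """Per-pixel linked lists stored in one shared node pool (illustrative)."""

    def __init__(self, width, height):
        self.width = width
        self.heads = [-1] * (width * height)  # -1 marks an empty list
        self.nodes = []  # each node is (depth, rgba, next_index)

    def add_fragment(self, x, y, depth, rgba):
        # Push the new node at the front of this pixel's list.
        pixel = y * self.width + x
        self.nodes.append((depth, rgba, self.heads[pixel]))
        self.heads[pixel] = len(self.nodes) - 1

    def fragments_at(self, x, y):
        # Walk the list, then sort back to front (larger depth = farther).
        out = []
        index = self.heads[y * self.width + x]
        while index != -1:
            depth, rgba, index = self.nodes[index]
            out.append((depth, rgba))
        return sorted(out, key=lambda f: f[0], reverse=True)
```

Every translucent fragment costs one node in the pool, so memory use and traffic grow with the number of overlapping translucent layers rather than with the number of triangles.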

The next Flycast nightly build will have support for Vulkan on all major platforms: Windows, Linux and Android. In terms of features, the new renderer should be on par with the OpenGL renderer, with the notable exceptions of the lightgun crosshair and VMU screen display, which will be added soon. However, expect to find bugs and crashes here and there, as with any new piece of software. It may also be slower than OpenGL depending on many factors such as GPU, driver version, game being played, etc. We’ll do our best to fix any issues encountered and overcome performance problems. When reporting problems, make sure to indicate what GPU you’re using and the Vulkan driver version. It is highly recommended to upgrade your drivers to the latest version available, especially on mobile.

Here is a showcase of the differences between the basic and OIT renderers. By the way, this also applies to OpenGL.

Here the hair of these ladies shows glitching triangles in basic mode.

In Speed Devils 2, the shadow volumes (called “Modifier Volumes” in Dreamcast literature) are used in a special way to project headlights. This is only possible by using deferred rendering.

In this example, look at Ryo’s cast shadow on his left. There is a fog effect applied to this scene, but the basic single pass renderer cannot apply a fog effect to the cast shadow. In the OIT renderer, the shadow is perfectly fogged.

In Jet Set Radio, the character is composed of translucent polygons, and these polygons can be shadowed as well. Only the OIT renderer can properly render shadows cast on translucent polygons.

To finish, here is another seldom-used GPU feature: the secondary accumulation buffer. It can be used to do tri-linear filtering and other effects. This is Evil Dead – Hail to the King and it is clear that the basic renderer is having a hard time here.

Final thoughts

Yes, the per-pixel alpha transparency option, which until now was only available on Windows and Linux, now also works on Android with the Vulkan renderer. However, keep in mind that per-pixel alpha sorting is heavily limited by memory bandwidth. It has been tested on a Mali G76 (Samsung Galaxy S10+), where it runs acceptably at 640×480 or 800×600 resolution. Your mileage may vary depending on the GPU power inside your Android phone. We recommend finding the sweet spot that works best for you; if results are too poor with per-pixel alpha enabled, switch back to per-triangle.

Some clear advantages of the Vulkan renderer are that frame pacing is much better than with the OpenGL renderer, and performance is far higher when it comes to texture uploads and/or framebuffer manipulation. For example – when you KO an opponent in Dead Or Alive 2 against an explosive wall – the framerate would often tumble a bit on GL, but there are no such issues with Vulkan. Similar improvements can be noticed in Virtua Tennis 2 – when certain framebuffer effects happen after a replay, performance is much steadier with Vulkan thanks to the high degree of parallelism.

With Vulkan, we have heard reports that virtually all sound crackles and stutters are gone. That’s because with Vulkan the application chooses the sync points where it waits, whereas in GL the driver has to guess and sometimes guesses wrong. The framebuffer effects mentioned above use render-to-texture, which creates sync issues with OpenGL.

Flycast WinCE has merged into regular Flycast – only one core now! Plus – Switch port teaser!

So, flyinghead finally feels confident enough that we are at a stage where the WinCE branch can be merged into master.

This is a pretty big deal. This marks the first time that an open source Dreamcast emulator has Windows CE support in a mainline release, along with arcade Naomi support. Right now, the only other emulator that manages to emulate both these is a closed source emulator called demul. So this is a pretty big milestone for us.

So, what do you have to do from this point on?

– Go to the Online Updater, select ‘Update Cores’, and download ‘Flycast’.
– Remove the old Flycast WinCE core. It no longer serves any purpose; you can just download the Flycast core instead, which now has Windows CE support built in. If you have any games in your old playlists that still use the Flycast WinCE core, reset the core association and make them use the main Flycast core now.

This has been a tremendous undertaking and in the process, so many improvements have been made as a result:

* (Windows CE) The reason why the Windows CE build was separate before was that the addition of the MMU codepath would greatly hurt the performance of every non-Windows CE-based game. This is no longer the case thankfully. Instead, performance has actually improved over the non-Windows CE build from before (see the next point).
* (Optimizations / Performance) Thanks to SSA optimizations, performance is better across the board now for every game, whether it is a Windows CE-based game or not. We figure it’s about 30% faster on average give or take.
* (Libretro core-specific) Certain core options now get hidden depending on other settings that are turned on/off which they are dependent on. For instance – ‘Show VMU Display Settings’ – if you enable this and then leave the Quick Menu options screen and re-enter it again, it will show all the VMU display options. Turn this off and repeat the same process in order to hide all the VMU display-related options again. This will greatly unclutter the options list.
* (Libretro core-specific) A new setting that appears when Threaded Rendering is enabled – ‘Delay Frame Swapping’. This waits until the frame is rendered by the digital video encoder which means it’s being displayed on the screen. This helps avoid displaying bogus/empty frames that would not be shown on a real console. Without this option enabled, you can get heavy screen flickering with some games (such as South Park Chef’s Luv Shack and NFL Quarterback Club 2000).
* (Libretro core-specific) Auto-configuration of early input polling when threaded rendering is enabled. Threaded rendering needs early input polling configured or else input will be buggy. In the past, the user needed to do this manually on RetroArch, which greatly complicated things. Now instead, we do this behind the scenes through a private libretro API extension, so you no longer have to tediously configure this yourself. It was annoying and could mess with your configuration since for other cores you would really want to set it to late input polling for the best input latency.
* AICA sound emulation has greatly improved and can now be considered mostly feature complete. To give you an idea of how much of an improvement it is, here is Skies of Arcadia before and after implementing the LPF (Low-Pass Filter) –

Before –

After –

So, in short, the low-pass filter has been implemented. This filter has an envelope, similar to the already implemented one for amplitude. It varies the cut-off frequency for attack/decay/sustain/release. The other thing added is the pitch LFO, which is used to create a vibrato effect. So far, the only sound inaccuracy I have noticed in a game is that certain music that is triggered upon events (such as in Resident Evil Code: Veronica) does not get properly faded out. Apart from that, sound accuracy has seen a tremendous improvement, really night and day in a lot of respects in games like Resident Evil 2 and Skies of Arcadia.
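For readers who want to see what "a low-pass filter whose cut-off follows an envelope" means in practice, here is a small sketch. It is not the AICA's actual filter, whose structure is not described here: a generic one-pole low-pass filter stands in for it, the function names are hypothetical, and the 44100 Hz sample rate is an assumed default.

```python
import math


def lowpass_with_envelope(samples, cutoffs_hz, sample_rate=44100.0):
    """One-pole low-pass filter whose cut-off can change on every sample."""
    out = []
    y = 0.0
    for x, cutoff in zip(samples, cutoffs_hz):
        # Smoothing coefficient of a one-pole filter for this cut-off.
        a = 1.0 - math.exp(-2.0 * math.pi * cutoff / sample_rate)
        y += a * (x - y)
        out.append(y)
    return out


def linear_ramp(start_hz, end_hz, count):
    """One envelope segment: move the cut-off linearly between two values."""
    if count == 1:
        return [start_hz]
    step = (end_hz - start_hz) / (count - 1)
    return [start_hz + step * i for i in range(count)]
```

An attack/decay/sustain/release envelope would simply chain several such ramps, for example a fast rise in cut-off during the attack followed by a slower fall during the decay, and feed the resulting list in as `cutoffs_hz`.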

* (AMD) Per-pixel alpha sorting has been fixed on AMD GPUs. Previously, it would look like this on Windows (at the main BIOS menu) –

Switch port teaser

Flycast Libretro is coming to RetroArch Switch courtesy of Datamats! For now it uses the interpreter core only, but m4xw is going to work on getting the dynarec working. No ETA and don’t bug the devs about it until it’s done!

RetroArch 1.7.8 (v3) released + big core updates! (bsnes HD, Flycast, Android, etc)

RetroArch 1.7.8 was a very ambitious release, and as a result, it is taking some time to iron out some of the kinks. Instead of leaving you waiting another month for fixes to some crucial bugs, we’d rather ship these point releases first, so that we leave you with a rock-solid 1.7.8 in the end while we shift our focus and attention to 1.7.9.

In light of that, we are releasing version 3 right now, which will be especially beneficial for Mac users. The future-proof Metal Mac version should now work flawlessly on Macs with an AMD graphics card (they previously produced heavy graphics glitches inside the menu). It has just been released! Grab it here.

If you’d like to show your support, consider donating to us. Check here in order to learn more.

For all other details surrounding version 1.7.8, we refer you to our original article here.

Changes

  • GLCORE: Ensure correct scaling of menu texture (with RGUI)
  • IPS: Soft-Patch any IPS size
  • METAL: Fix overlay issue – setup correct viewport before rendering overlay
  • METAL/STB: Fix font driver issue with AMD GPUs on MacOS.
  • MENU/RGUI: Correctly rescale menu when resizing window if aspect ratio lock is enabled.
  • OSX: Remove OSX suffix in window title
  • PSP: Fix audio conversion code
  • REMAPS: Fix analog remapping regression - analog remapping would break controls

bsnes HD – Released for Windows/Linux/Android, and soon iOS and Mac!

bsnes HD should now be available on Linux, Windows and Android for RetroArch users! It’s based on the latest version of bsnes, and it should be significantly faster than previous bsnes versions.

In this video, we show you some of the HD Mode 7 features that are unique to this version. They make for a fairly significant difference overall as we’re sure you’ll agree!

This core is not fully complete yet and might still have some omissions. Also, the ‘bsnes HD’ name is temporary, and we will be doing some house cleaning of the various bsnes cores we are maintaining soon. The plan is to have an improved Core Updater in later RetroArch versions that allows for better categorization and filtering in the future so that users can more easily manage their cores.

On Android for the first time!

This is the first time the latest version of bsnes will appear on Android, courtesy of Libretro/RetroArch! Our core version of this is called bsnes HD, and you can grab it from the Core Updater right now! Just make sure to update the core info files first (by going to Online Updater, then selecting ‘Update Core Info Files’). bsnes HD should be a fair bit faster than the other bsnes cores already available, plus it has enhanced overclocking features and the acclaimed HD Mode 7 features.

In this video you see it running on a Samsung Galaxy S10+ (Exynos model) with 3x HD Mode 7 scaling applied, and it runs at fullspeed all the way.

User and contributor harakari has reported that he can run HD Mode 7 at 4x scaling and still have games run at fullspeed on his iPhone XS Max, so if anything, expect even better performance on high-end iDevices!

Flycast – WinCE core now 30% faster on average and sound improvements!

The Flycast WinCE version should now be 30% faster on average for non-Windows CE games! We have figured out a way to have the Windows CE code additions no longer affect the main performance of the emulator. For that reason, after we have ironed out some of the final kinks, you can expect there to be only one Flycast core moving forward. On top of that, flyinghead has really gone to town with some much-needed audio improvements on the AICA and DSP side. Witness this long-standing audio sample bug in Resident Evil Code: Veronica that is now finally fixed. This and many other sound bugs (such as the audio samples continuing to be repeated during the battle loading scenes in Soul Calibur) have been fixed now.

World-first – Windows CE Dreamcast games running on Android!

The Flycast WinCE core is now available for Android users!

  • 30% speedup in non-Windows CE games thanks to extensive optimizations made to the dynarec by flyinghead. No more performance reduction of non-Windows CE games.
  • Windows CE support further improved.

It is now possible to play Dreamcast Windows CE games on Android! Please be aware that this is very CPU intensive and that you should probably expect 15 to 25fps on high-end Android phones right now.

Note that Windows CE games could still be unstable on Android, and that you need a real BIOS for Windows CE to work. It won’t work with the HLE BIOS.

NOTE: We anticipate that after the final kinks have been ironed out, we will merge the Windows CE parts of this core back into Flycast, and that the separate Flycast WinCE core will disappear from then on. So this separate core is only a temporary thing for now. We will let you know when this happens. Video was recorded on a Galaxy S10+.

Mupen64 Plus Next – No more 10 second startup times on Windows!

Thanks to an important bug fixed by mudlord, Mupen64 Plus Next should no longer take up to 10 seconds to start up any game on Windows.

Mupen64 Plus Next is an up-to-date version of Mupen64 Plus with the latest GlideN64 renderer.

More progress reports on other cores soon

Cores are often updated on a daily basis, and a lot of the time, all the amazing enhancements and improvements they receive go underreported because we’re so busy with development. We feel it is time to shine more of a light on these changes, so we will be posting more periodic reports on core updates as they come along.