RetroArch – In Development – World-first text to speech in emulators – Update!

Earlier this month we showed you RetroArch’s world-first text to speech implementation for emulators. You can read that previous article here.

Since then, this feature has been greatly improved. Onscreen character recognition and live text to speech conversion are now done at the press of a button. You bind the AI Service key to a button or key of your choice, and as soon as you press it, the current image is scanned in real time. Any text that is recognized is then converted to speech.
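As a rough illustration of that flow, here is a minimal Python sketch of what one press of the hotkey does. This is not RetroArch’s actual code. `TextRegion`, `on_ai_service_pressed` and the three callbacks are hypothetical names made up for this example, standing in for frame capture, the OCR service and the speech engine.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TextRegion:
    """A piece of recognized text plus where it was found (hypothetical type)."""
    text: str
    x: int
    y: int
    width: int
    height: int


def on_ai_service_pressed(
    grab_frame: Callable[[], bytes],
    recognize: Callable[[bytes], List[TextRegion]],
    speak: Callable[[str], None],
) -> List[TextRegion]:
    """Handle one hotkey press: scan the current frame and speak any text found."""
    frame = grab_frame()          # snapshot of the image currently onscreen
    regions = recognize(frame)    # OCR step, e.g. a request to a translation service
    for region in regions:
        if region.text.strip():   # skip regions that contain only whitespace
            speak(region.text)    # text to speech step
    return regions


if __name__ == "__main__":
    # Stub callbacks so the sketch runs without any real OCR or speech engine.
    spoken: List[str] = []
    on_ai_service_pressed(
        grab_frame=lambda: b"fake-frame",
        recognize=lambda frame: [TextRegion("PRESS START", 80, 120, 96, 8)],
        speak=spoken.append,
    )
    print(spoken)
```

Passing the three steps in as callbacks keeps the sketch independent of any particular OCR or speech backend.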

In this video, we are running a local instance of vgtranslate on the same computer. This greatly reduces the latency that was noticeable in the previous video. The other big difference is that the core no longer has to be paused manually and then unpaused to do the OCR scan – you now press a hotkey and the game continues running without any interruption. This makes for a much smoother and more seamless experience.
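To illustrate why the game no longer needs to be paused, here is a small Python sketch of the general idea: the slow OCR and speech round trip is handed to a background worker while the main loop keeps producing frames. This only shows the concept and says nothing about how RetroArch implements it internally. `slow_ocr_and_tts`, `run_frames` and the simulated 50 ms delay are assumptions made up for this example.

```python
import queue
import threading
import time
from typing import List, Optional, Tuple


def slow_ocr_and_tts(frame_id: int) -> str:
    """Stand-in for the round trip to a translation service (hypothetical)."""
    time.sleep(0.05)  # simulated service latency, an arbitrary value
    return f"speech for frame {frame_id}"


def _worker(frame_id: int, results: "queue.Queue[str]") -> None:
    """Run the slow request off the main loop and store its result."""
    results.put(slow_ocr_and_tts(frame_id))


def run_frames(total_frames: int, hotkey_frame: int) -> Tuple[int, List[str]]:
    """Render frames without stopping, starting one scan at hotkey_frame."""
    results: "queue.Queue[str]" = queue.Queue()
    worker: Optional[threading.Thread] = None
    frames_rendered = 0

    for frame_id in range(total_frames):
        if frame_id == hotkey_frame:
            # The hotkey was pressed: start the scan in the background.
            worker = threading.Thread(target=_worker, args=(frame_id, results))
            worker.start()
        frames_rendered += 1  # the game keeps running in the meantime

    if worker is not None:
        worker.join()  # wait for the scan to finish before collecting results

    spoken: List[str] = []
    while not results.empty():
        spoken.append(results.get())
    return frames_rendered, spoken


if __name__ == "__main__":
    print(run_frames(total_frames=10, hotkey_frame=3))
```

In this sketch the loop never waits on the service, so every frame is still rendered even though the scan takes time to complete.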

Shown in this video is a test run of several cores and games: Quake 1 with the Tyrquake core, Mega Man 4 with a NES emulator core, Trials of Mana/Seiken Densetsu 3 with a SNES emulator core, and finally Castlevania 3 with a NES emulator core. Right now, the OCR/text to speech system works with ANY libretro core that does not use hardware acceleration. So any core that doesn’t rely on OpenGL/Vulkan/Direct3D in order to function should be good to go.

RetroArch – In Development – World-first text to speech in emulators! OCR Bounty Progress!

The OCR bounty’s next milestone has been reached! Previously, the OCR/translation APIs were finally able to recognize patterns of text. Now, text to speech has been achieved!

NOTE: This is a work in progress/development, and does not yet reflect the actual finished product.

In this video, you see us running RetroArch with an old-school SNES RPG, Soul Blazer. Right now, this is a very experimental implementation that is far from finished and not really ready for general gameplay, but it will be developed further from this point on.

Anyway, in this video you see us doing the following:

* Whenever we pause RetroArch, the OCR process starts. It will first attempt to recognize text onscreen.

* Once it has recognized text, you will see outlines onscreen surrounding the recognized text.

* Then it passes this off to the Translate Text To Speech engine. Once done, it generates a log message about it (you can see this in the video).

* From this point on, you can unpause RetroArch and actually hear the voice-over.
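The steps above can be sketched in Python as a small pause-driven session. This is only an illustration of the sequence described in the list, not the actual implementation. `PausedOcrSession`, its callbacks and the log message text are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # x, y, width, height of one outline


class PausedOcrSession:
    """Hypothetical model of the pause, recognize, speak, unpause sequence."""

    def __init__(
        self,
        recognize: Callable[[bytes], List[Tuple[str, Box]]],
        synthesize: Callable[[str], bytes],
    ) -> None:
        self._recognize = recognize
        self._synthesize = synthesize
        self.outlines: List[Box] = []               # boxes drawn around found text
        self.pending_audio: Optional[bytes] = None  # speech waiting for unpause
        self.log: List[str] = []

    def on_pause(self, frame: bytes) -> None:
        """Pausing starts the OCR process on the current frame."""
        found = self._recognize(frame)
        self.outlines = [box for _, box in found]
        text = " ".join(piece for piece, _ in found)
        if text:
            # Hand the recognized text to the speech engine and log the result.
            self.pending_audio = self._synthesize(text)
            self.log.append("text to speech ready")  # made-up message text

    def on_unpause(self) -> Optional[bytes]:
        """Unpausing hands back the audio so it can be played, at most once."""
        audio, self.pending_audio = self.pending_audio, None
        return audio


if __name__ == "__main__":
    session = PausedOcrSession(
        recognize=lambda frame: [("HELLO", (16, 160, 40, 8))],
        synthesize=lambda text: text.encode("utf-8"),  # stand-in for real audio
    )
    session.on_pause(b"fake-frame")
    print(session.outlines, session.log, session.on_unpause())
```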

NOTE: Just to emphasize, this is NOT how it will work in the actual finished product. This is purely a ‘WIP developer’ video we’re showing.