RetroArch 1.11.1 release!


RetroArch 1.11.1 has just been released.

Grab it here.

If you’d like to learn more about upcoming releases, please consult our roadmap here.

Remember that this project exists for the benefit of our users, and that we wouldn’t keep doing this were it not for spreading the love to our users. This project exists because of your support and your belief in us to keep doing great things. We have always prioritized the end-user experience, and unlike others, we have never burdened our users with in-app ads, monetization SDKs or paywalled features, and we intend to keep it that way. If you’d like to show your support, consider donating to us. Check here in order to learn more. In addition to being able to support us on Patreon, there is now also the option to sponsor us on GitHub Sponsors! You can also help us out by buying some of our merch on our Teespring store!

NOTE: The Android version on Samsung Galaxy Store, Huawei AppGallery, and Amazon App Store will be updated soon. We will remove this notice when it has been updated. Until then, grab the APK from our site.

NOTE: Several size optimizations have been made to the packages. We no longer pre-install all of the optional XMB theme packs or other miscellaneous assets. Previously we also shipped autoconfig files that were irrelevant for that specific platform. By excluding these files from the package, we have managed to significantly reduce the file size and overall number of files of RetroArch downloads/installs. This will be very helpful on consoles, where SD card/FTP installs tend to be very slow.

If you still want to have all assets, you can go to Online Updater and select ‘Update Assets’. This will install all assets.

Changelog

1.11.1

  • GENERAL: Fix DEFAULT_FILL_TITLE_MACRO
  • NETWORKING: Add the const qualifier to some function parameters
  • NETWORKING/NETPLAY/UPNP: Add a private or CGNAT address warning to UPnP
  • SAVESTATES/SCREENSHOTS: Avoid ‘video_gpu_screenshot’ with savestates
  • UWP: Better ‘Save on quit’ fix

1.11.0

  • 3DS: Add unique IDs
  • 3DS: Add bottom menu options
  • 3DS: Set bottom_asset directory default
  • 3DS: Only enable internal counter with CONSOLE_LOG defined
  • 3DS: Set default bottom font values
  • 3DS: Fix CIA installation issues
  • 3DS: Support latest libctru
  • ANDROID: Add HAVE_ACCESSIBILITY
  • ANDROID: Gingerbread support
  • ANDROID: Touchpads support
  • ANDROID: Builtin Xperia Play autoconfig profile
  • ANDROID: Disable Feral GameMode for Android – only available on Linux
  • ANDROID: Add a configurable workaround for Android reconnecting devices
  • ANDROID/FDROID: Add F-Droid metadata to repo in Fastlane format
  • AUDIO/AUDIO MIXER: Add missing locks for thread safety
  • AUDIO/AUDIO MIXER: Fix audio mixer memory leak + remove redundant ‘single threaded’ rthreads implementation
  • AUTOSAVE: Change/improve exit behavior of autosave thread – if the condition variable is signaled, the loop is run one last time so we can do a final check/save before stopping the thread.
  • CDROM: Fix memory leak caught with asan – buf passed to filestream_read_file
  • CORE INFO/NETPLAY: Ensure current core info is initialized at runloop_event_init_core when netplay is enabled
  • CHEEVOS: Upgrade to rcheevos 10.4
  • CHEEVOS: Allow creating auto savestate in hardcore
  • CHEEVOS: prevent invalid memory reference if game has achievements but core doesn’t expose memory
  • CHEEVOS: Release achievement badge textures when video driver is deinitialized
  • CHEEVOS: Re-enforce hardcore limitations once achievements are loaded
  • CHEEVOS/MENU/MATERIALUI: Show achievement badge icons in MaterialUI driver
  • D3D9: D3D9 has been split up into two drivers – D3D9 HLSL (max compatibility, no shader support yet) and D3D9 Cg (dependent on deprecated Nvidia Cg runtime library)
  • D3D9/HLSL/XMB: XMB fix
  • D3D9/CG: D3D9 Cg driver fixed
  • D3D11: Fix overlay not showing up
  • D3D11/12: Reduce lag with WaitForVBlank – this rather simple addition seems to bring D3D11/12 very close to Vulkan/GLCore regarding input lag.
  • D3D11/12: Add waitable swapchains and max frame latency option
  • D3D11/12: Make waitable swapchains optional
  • DATABASE: Reformat ‘rdb_entry_int’ – Nitpick adjustments for database entries: Capitalize “Release Date”, and remove space before : from Release Date rows which use integer
  • DATABASE/EXPLORE: Allow On-Demand Thumbnails in Explore menu
  • DATABASE/EXPLORE/MENU/OZONE/XMB/RGUI: Explore menu thumbnails
  • DISC CONTROL: Better Disc Control append focus
  • DOS/DJGPP: Add a workaround for libc bug
  • AUTOMATIC FRAME DELAY: Added slowmotion resiliency
  • AUTOMATIC FRAME DELAY: Added string representation for seeing the current effective delay without opening statistics
  • AUTOMATIC FRAME DELAY: Added “ms” to logging and “(ms)” to label just like in Audio Latency
  • GENERAL: Don’t bake in OpenAL and libcaca by default unless explicitly enabled with configure switch.
  • GENERAL: Reduce amount of strlen calls
  • GENERAL: Reduce or simplify sin/cosf calls
  • GFX: Fix readability and precision issues in aspectratio_lut
  • GFX: Add option to manually enable/disable automatic refresh rate switching
  • GFX: Enable automatic configuration of ‘VSync Swap Interval’
  • GFX/FONT/FREETYPE: Use FT_New_Memory_Face – first read it from file to memory beforehand –
    this solves an asset extraction issue when selecting ‘Update Assets’ – apparently FT_New_Face keeps an open file handle to the font file which
    prevents it from being overwritten/deleted while the program is still running.
  • GFX/THUMBNAILS: Thumbnail aspect ratio fix
  • GFX/THREADED VIDEO: Optimizations, fixes and cleanups
  • GFX/VIDEO FILTERS: Add Upscale_240x160-320x240 video filter with ‘mixed’ method
  • GLSLANG: Fix compilation with ./configure --disable-builtinglslang – was missing linking against -lMachineIndependent and -lGenericCodeGen static libs
  • INPUT: Fix off by one error for input_block_timeout setting. Also default to 0 for this setting (pretty massive performance gain)
  • INPUT: Analog button mapping fixes
  • INPUT/HID/OSX: Fix DualShock3 support
  • INPUT/HID/LINUX: (qb) Disable HAVE_HID by default for now for Linux as long as there are no working backends for both
  • INPUT/HID/WINDOWS: (qb) Disable HAVE_HID by default for now for Windows as long as there are no working backends for both
  • INPUT/HID/WIIU: Fix DualShock3 support
  • INPUT/OVERLAY: Block pointer input when overlay is pressed
  • INPUT/REMAPPING: input_remapping_save_file – existing remapping file was needlessly reloaded
  • INPUT/REMAPPING: Add option to disable automatic saving of input remap files
  • INPUT/LINUX/UDEV: Fix lightgun scaling on Y axis
  • INPUT/LINUX/X11/LED: Add LED keyboard driver
  • INPUT/WINDOWS/LED: LED keyboard driver cleanup
  • INPUT/WINDOWS/WINRAW: Clear key states when unfocused
  • INPUT/WINDOWS/WINRAW: Fix pointer device position
  • IOS: iOS app icon fixes & revisions
  • LIBRETRO/SAVESTATES: Implement an api call for context awareness
  • LOCALIZATION: Updates
  • LOCALIZATION: Add Catalan language option
  • LOCALIZATION: Fix some bad localization
  • LINUX: Make memfd_create call more backwards compatible by calling it through syscall – on older systems, you’ll have to include linux/memfd.h for the MFD_ defines, and call memfd_create() via the syscall(2) wrapper (and include unistd.h and sys/syscall.h for it to work). We exclude the linux/memfd.h header include because we already provide the MFD_ defines in case they are missing
  • LINUX/MALI FBDEV: Fix assertion failed on video threaded switch
  • MENU: Menu paging navigation adjustments
  • MENU: New Menu Items for disabling Info & Search buttons in the menu
  • MENU: Allow the user to use volume up/down/mute hotkeys from within the menu
  • MENU: Add missing sublabels for non-running Quick Menu
  • MENU: Reorganize Quick Menu Information
  • MENU: Savestate thumbnails – Savestate slot reset action
  • MENU: Allow changing savestate slots with left/right on save/load
  • MENU: Add ‘Ago’ to playlist last played styles
  • MENU: Add proper icons for shader items
  • MENU/MATERIALUI: Add icon for ‘Download Thumbnails’
  • MENU/XMB: Add options for hiding header and horizontal title margin
  • MENU/XMB: Dynamic wallpaper fixes
  • MENU/XMB: Add Daite XMB Icon Theme
  • MENU/XMB/OZONE: Savestate thumbnail aspect ratio
  • MENU/XMB/OZONE: Core option category icon refinements
  • MENU/XMB/OZONE: Fullscreen thumbnail browsing
  • MENU/XMB/OZONE: Add playlist icons under ‘Load Content’
  • MENU/XMB/OZONE: Thumbnail improvements
  • MENU/XMB/OZONE: Savestate thumbnail fullscreen + dropdown
  • MENU/XMB/OZONE: Prevent unnecessary thumbnail requests when scrolling through playlists
  • MENU/OZONE: Fix playlist thumbnail mouse hover after returning from Quick Menu
  • MENU/OZONE: Thumbnail visibility corrections
  • MENU/OZONE: Playlist metadata reformat
  • MENU/OZONE: Savestate thumbnail fixes
  • MENU/OZONE: Add savestate thumbnails
  • MENU/OZONE: Header icon spacing adjustment
  • MENU/RGUI: Savestate thumbnails
  • MENU/SETTINGS: Turn Advanced Settings on by default, this entire filtering of settings will need a complete rethink anyways
  • MENU/WIDGETS: Widget color + position adjustments
  • MIYOO: Exclude unused HAVE_HID for Miyoo
  • MIYOO: Enable screenshots
  • MIYOO: Enable rewind
  • NETWORK: Allow MITM server selection on OK callback
  • NETWORK: Replace socket_select calls
  • NETWORK: Implement binary network streams
  • NETWORK: Poll support
  • NETWORK: Check connect errno for successful connection
  • NETWORK: Get rid of the timeout_enable parameter for socket_connect
  • NETWORK: Fix getnameinfo_retro’s port value for HAVE_SOCKET_LEGACY platforms
  • NETWORK: Define inet_ntop and inet_pton for older Windows versions
  • NETWORK: Define isinprogress function
  • NETWORK/NATT: Move natt files to “network”
  • NETWORK/NETWORK STREAMS: Add function netstream_eof
  • NETWORK/NETPLAY: Fix game CRC parsing
  • NETWORK/NETPLAY: Disable and hide stateless mode
  • NETWORK/NETPLAY: Change default for input sharing to “no sharing”
  • NETWORK/NETPLAY: Enforce a timeout during connection
  • NETWORK/NETPLAY: Disallow clients from loading states and resetting
  • NETWORK/NETPLAY: Special saves directory for client
  • NETWORK/NETPLAY: Ensure current content is reloaded before joining a host
  • NETWORK/NETPLAY: Fix client info devices index
  • NETWORK/NETPLAY: Fix input for some cores when hosting
  • NETWORK/NETPLAY: Memory leak fixes
  • NETWORK/NETPLAY: Force a core update when starting netplay
  • NETWORK/NETPLAY: Fix NAT traversal announce for HAVE_SOCKET_LEGACY platforms
  • NETWORK/NETPLAY: Refactor fork arguments
  • NETWORK/NETPLAY: Fix content reload deadlocks on static core platforms
  • NETWORK/NETPLAY: Disallow netplay start when content is not loaded for static core platforms
  • NETWORK/NETPLAY: Show client slowdown information
  • NETWORK/NETPLAY: Improve check frames menu entry
  • NETWORK/NETPLAY: Do not try to receive new data if the data is in the buffer
  • NETWORK/NETPLAY: Copy data on receive, even if the buffer is full
  • NETWORK/NETPLAY: Fix lobby sublabel CRC display on some platforms
  • NETWORK/NETPLAY: Support for customizing chat colors
  • NETWORK/NETPLAY: Small launch compatibility patch adjustments
  • NETWORK/NETPLAY: Support for banning clients
  • NETWORK/NETPLAY: Minor tweaks to the find content task
  • NETWORK/NETPLAY: Support for gathering client info and kicking
  • NETWORK/NETPLAY: Fix possible deadlock
  • NETWORK/NETPLAY: Initialize client’s allow_pausing to true
  • NETWORK/NETPLAY: Disable netplay for unsupported cores – with stateless mode being disabled for now, there is no reason not to include this. Refuse to initialize netplay when the current core is not supported (no proper savestates support)
  • NETWORK/NETPLAY/DISCOVERY: Ensure fixed width ints on packet struct
  • NETWORK/NETPLAY/DISCOVERY: Support for IPv4 tunneling (6to4)
  • NETWORK/NETPLAY/DISCOVERY/TASKS: Netplay/LAN Discovery Task refactor – aims to prevent blocking the main thread while waiting for the LAN discovery timeout. This is accomplished by moving the whole discovery functionality into its own task and using a non-blocking timer to finish the task. Also fixes discovery sockets not being made non-blocking, which could cause the main thread to hang for very long periods of time every pre-frame.
  • NETWORK/NETPLAY/TASKS: Find content task refactor – fixes many issues along the way, including a couple of nasty memory leaks that would leak thousands of bytes each time the task ran. It also expands the original concept by matching currently run content by filename (CRC matching is always performed first though).
  • NETWORK/NETPLAY/TASKS: Find content task refactor – Ensure CRC32 is 8 characters long
  • NETWORK/NETPLAY/LOBBY: Add setting for filtering out rooms with non-installed cores
  • NETWORK/NETPLAY/LOBBY: Hide older (incompatible) rooms
  • NETWORK/NETPLAY/LOBBY: Add a toggleable filter for passworded rooms. In addition, move lobby filters into their own submenu for better organization.
  • NETWORK/NETPLAY/MENU: Chat supported info for the host kick submenu
  • NETWORK/NETPLAY/MENU: Localize relay servers
  • NETWORK/NETPLAY/MENU: Host Ban Submenu
  • NETWORK/NETPLAY/MENU: Add client devices info to the kick sub-menu
  • NETWORK/NETPLAY/MENU: Path: Netplay -> Host -> Kick Client – Allows the host to kick clients. Allows the host to view client information: connected clients (names), status (playing/spectating) and ping.
  • NETWORK/NETPLAY/VITA: Add net_ifinfo support
  • NETWORK/NETPLAY/VITA: Enable partial LAN discovery
  • NETWORK/NETPLAY/VITA: Change default UDP port to 19492
  • NETWORK/NETPLAY/VITA: Do not multiply negative timeout values
  • NETWORK/NETPLAY/VITA: Fix epoll’s timeout parameter
  • NETWORK/NETPLAY/VITA: Launch compatibility patch
  • NETWORK/NETPLAY/3DS: Launch compatibility patch
  • NETWORK/NETPLAY/3DS: Adapt POLL for 3DS platform
  • NETWORK/NETPLAY/PS3: Launch compatibility patch
  • NETWORK/NETPLAY/WII: Enable net_ifinfo for some features. In practice, this only allows the netplay’s UPnP task to succeed on the Wii.
  • NETWORK/NETPLAY/WIIU: Launch compatibility patch
  • NETWORK/NETPLAY/SWITCH: Launch compatibility patch
  • NETWORK/UPNP: Attempt support for remaining platforms
  • NETWORK/UPNP: Support for IPv4 tunneling
  • ODROID GO2: Increase DEFAULT_MAX_PADS to 8 for ODROIDGO2, since that impacts the RG351[X] consoles. The RG351[X] have a USB host controller and can have an arbitrary number of USB gamepads.
  • ONLINE UPDATER: Online Updater menu reorganizing
  • OSX: Fixed items of system top menu bar on macOS
  • OSX: Revision to macOS app icon set
  • PLAYLISTS: Ensure history list will contain CRC32
  • PLAYLISTS: Fix CRC32 comparison – as state->content_crc has “|crc” suffix.
  • PS4/ORBIS: Orbis/PS4 Support using OrbisDev toolchain
  • PS4/ORBIS: Update xxHash dependency
  • PS4/ORBIS: Shader cache
  • RETROFW: Exclude unused HAVE_HID for RetroFW
  • RETROFW: Support battery indicator on RetroFW
  • RETROFW: Enable menu toggle button on retrofw devices
  • SHADERS: Shader Preset Loading of Multiple additional #references lines for settings
  • SHADERS: Shader Load Extra Parameter Reference Files – this adds the ability to put additional #reference lines inside shader presets, which will load additional settings. The first reference in the preset still needs to point at a chain of presets which ends with a shader chain, and subsequent #reference lines will load presets which only contain parameter value adjustments. This allows presets to be made with a modular selection of settings. For example, with the Mega Bezel, one additional reference could point at a preset containing settings for Night mode vs Day mode, and another reference could point to a preset containing settings for how much the screen should be zoomed in.
  • SHADERS/MENU: Increase shader scale max value
  • SCANNER/DC: Fix Redump bin/cue scan for some DC games
  • SCANNER/GC/WII: Add RVZ/WIA scan support for GC/Wii
  • SCANNER/PS1: Improved success rate of Serial scanning on PS1 by adding support for the xx.xxx format
  • SCANNER/PS1: Changed return value of detect_ps1_game function to actually return a failure when the Serial couldn’t be extracted. The scanner will then fall back on a CRC check, and usually ends up finding the games in the database.
  • SWITCH: Enable RWAV (WAV audio file) support
  • STRING: Do not assume char is unsigned
  • TASKS: More thread-awareness in task callbacks
  • TASKS: Fix race condition at task_queue_wait
  • TVOS: Revised tvOS icons w/ updated alien.
  • VFS: Fix various VFS / file stream issues
  • VULKAN: Fix more validation errors
  • VULKAN: Attempt to fix validation errors with HDR swapchain. Always use final render pass type equal to swapchain format. Use more direct logic to expose if filter chain emits HDR10 color space or not
  • VULKAN/ANDROID: Honor SUBOPTIMAL on non-Android since you’d want to recreate swapchains then. On Android it can be promoted to SUCCESS. SUBOPTIMAL_KHR can happen there when rotation (pre-rotate) is wrong.
  • VULKAN/DEBUG: Automatically mark buffer/images/memory with names
  • VULKAN/DEBUG: Move over to VK_EXT_debug_utils. The debug marker extension was deprecated years ago.
  • VULKAN/HDR: Fix leak of HDR UBO buffer
  • VULKAN/BFI: Fix BFI (Black Frame Insertion) regression
  • WINDOWS: Fix exclusive fullscreen video refresh rate when vsync swap interval is not equal to one – refresh rate in exclusive fullscreen mode was being incorrectly multiplied by vsync swap interval, breaking swap interval functionality at the gfx driver level
  • WIN32: Do optimization for Windows where we only update the title with SetWindowText when the previous title differs from the current title
  • WIN32: Skip console attach when logging to file
  • WIN32: Remove black margins with borderless non-fullscreen window
  • WIN32/TASKBAR: Release ITaskbarList3 on failed HrInit – pointer wasn’t NULL’d, thus set_window_progress would cause weird behavior
  • WII/GX: Fix potential datarace
  • WIIU: Implement sysconf and __clear_cache
  • WIIU: Add OS memory mapping imports
  • UWP: Added launch protocol arg ‘forceExit’ so a frontend can tell an already-running RetroArch UWP instance to quit.
  • UWP: Enable core downloader/updater
  • UWP: Remove copy permissions, as it’s inefficient; we can just directly assign the new ACL, and that works
  • Xbox/UWP: Remove expandedResources
  • Xbox/UWP: UWP OnSuspending crash fix
  • Xbox/UWP: Enable savestate file compression by default for UWP/Xbox – got told there are no more issues with it
  • Xbox/UWP: Add 4K support to ANGLE on Xbox for the MSVC2017 build

ParaLLEl-RDP – How the upscaled rendering works

This is a technical article on how upscaling in LLE works on the N64 RDP. Accurate upscaling in LLE is something which has not been done before (it has been done in a HLE framework, but accurate is the key word here), due to its extremely intense performance requirements, but with paraLLEl-RDP running on the GPU with Vulkan, this is now practical, and the results are faithful to what N64 games would look like if games rendered at a very high resolution. There are no compromises on accuracy, and I believe this is a correct representation of upscaling in a “what-if” scenario. The changes required to add this were actually fairly minimal, and there aren’t really any hacks involved. However, we have to be somewhat conservative in what we attempt to enhance.

Main concepts

Unified Memory Architecture – fully accurate frame buffer behavior

A complicated problem with the N64 is that the RDP and CPU have a unified memory architecture, and this complicates a lot of things. We must assume that the CPU can read arbitrary pixels that the RDP rendered, and that the CPU can overwrite pixels written by the RDP earlier. In upscaling, this gets weird very quickly since the CPU does not understand upscaling. To support this, the GPU renders everything twice, first in the native domain, and then in the upscaled domain. With this approach, the CPU cannot observe that upscaling is happening. It also improves performance in synchronous mode, since we can just render at native resolution before we unblock the CPU, and the GPU can go on to render the upscaled render passes asynchronously, which takes longer.

Rasterization at sub-pixel precision

The core mathematical problem to solve for upscaling is how we are going to rasterize at sub-pixel precision. This gets somewhat interesting, since the RDP is fully defined in fixed-point, and there is limited precision available. Fortunately, there are enough bits of precision that we can add extra sub-pixel precision to the rasterization equations. 8x is the theoretical maximum upscaling we can achieve without going beyond 32-bit fixed-point math. 8x is complete overkill; 2x and 4x are more than enough anyways.

Instancing RDRAM

Given the unified memory architecture (UMA) requirement mentioned above, paraLLEl-RDP implements UMA directly: the GPU reads and writes directly into RDRAM. This ensures full accuracy, and this is usually where HLE fails, as implementing UMA at this level is not practical with the traditional graphics pipeline in GPUs. To extend paraLLEl-RDP’s approach to upscaling, I went with multiple copies of RDRAM, one copy for each sub-sample. This works really well, because at any time, if we detect that a write happens in an unscaled context, e.g. CPU writes, we can simply duplicate the samples up to the upscaled domain. This is essentially a kind of faux MSAA where each pixel has multiple samples associated with it. This is the memory we end up allocating for a 4x upscale (4×4 = 16 samples):

  • RDRAM (8 MB) – Allocated on host with VK_EXT_external_memory_host. This is fully coherent with emulated CPU.
  • Hidden RDRAM (4 MB) – Device local
  • RDRAM reference buffer (8 MB) – Device local
  • Multisampled RDRAM (8 * 16 MB) – Device local
  • Multisampled Hidden RDRAM (4 * 16 MB) – Device local

The reference buffer is there so we can track when CPU writes to RDRAM. Essentially, before we render anything on the GPU, we compare RDRAM against the reference buffer. If there is a difference, the CPU must have clobbered the pixel, and the RDRAM is now duplicated to all the samples of RDRAM. After rendering something, we update the reference buffer, so we know it’s safe to use upscaled pixels later.
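
The compare-and-duplicate step described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function names are hypothetical, small lists stand in for the real buffers, and the whole reference buffer is refreshed after a render.

```python
NUM_SAMPLES = 16  # 4x upscale: 4 * 4 samples per native pixel


def sync_cpu_writes(rdram, reference, samples):
    """Duplicate CPU-modified RDRAM entries into every sample instance.

    An entry counts as clobbered by the CPU when it differs from the
    reference buffer. Returns the list of clobbered indices.
    """
    clobbered = []
    for i, value in enumerate(rdram):
        if value != reference[i]:
            for instance in samples:
                instance[i] = value
            clobbered.append(i)
    return clobbered


def update_reference(rdram, reference):
    """After rendering, remember what RDRAM looked like (assumed full copy)."""
    reference[:] = rdram
```

Calling `sync_cpu_writes` before each render and `update_reference` after it means an untouched upscaled pixel is never overwritten by its native-resolution value.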

When rendering an upscaled pixel (X, Y), we convert the coordinate to a native pixel (X >> SCALING_LOG2, Y >> SCALING_LOG2) and convert the sub-pixel position to an RDRAM instance, e.g.:

ivec2 upscaled_pixel = ivec2(x, y);
ivec2 subpixel = upscaled_pixel & (SCALING_FACTOR - 1);
ivec2 native_pixel = upscaled_pixel >> SCALING_LOG2;
int rdram_instance = subpixel.y * SCALING_FACTOR + subpixel.x;
read_write_rdram(native_pixel, rdram_instance);
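
For a concrete feel of the mapping, here is the same arithmetic as a small Python sketch, assuming a 4x upscale. The function name `locate_sample` is made up for this example.

```python
SCALING_LOG2 = 2                    # 4x upscale
SCALING_FACTOR = 1 << SCALING_LOG2


def locate_sample(x, y):
    """Map an upscaled pixel to (native pixel, RDRAM instance index)."""
    subpixel_x = x & (SCALING_FACTOR - 1)
    subpixel_y = y & (SCALING_FACTOR - 1)
    native_pixel = (x >> SCALING_LOG2, y >> SCALING_LOG2)
    rdram_instance = subpixel_y * SCALING_FACTOR + subpixel_x
    return native_pixel, rdram_instance
```

For example, upscaled pixel (9, 6) lands in native pixel (2, 1) with sub-pixel (1, 2), which selects RDRAM instance 2 * 4 + 1 = 9.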

Upscaled VI interface

Adding upscaling to the VI interface is fairly straightforward since we can convert e.g. 16 samples back to a 4×4 block of pixels. From there, we just follow the exact same algorithms that we do for native rendering. This means we get correct VI AA, divot and de-dither happening at high resolution.
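
Converting the samples back into a block of pixels is the inverse of the instance mapping used when rendering. A rough Python sketch follows, assuming each instance is stored as a 2D plane indexed as `samples[instance][native_y][native_x]`. That layout and the function name are illustrative only.

```python
def gather_upscaled(samples, native_width, native_height, scaling_factor):
    """Rebuild an upscaled image from per-sub-sample planes."""
    up_w = native_width * scaling_factor
    up_h = native_height * scaling_factor
    image = [[None] * up_w for _ in range(up_h)]
    for instance, plane in enumerate(samples):
        # Inverse of: instance = subpixel.y * SCALING_FACTOR + subpixel.x
        sub_y, sub_x = divmod(instance, scaling_factor)
        for ny in range(native_height):
            for nx in range(native_width):
                uy = ny * scaling_factor + sub_y
                ux = nx * scaling_factor + sub_x
                image[uy][ux] = plane[ny][nx]
    return image
```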

Modifying rasterization rules

The RDP is a span rasterizer, a very classic design. The rasterization rules are extremely specific and cannot be accurately represented using normal OpenGL/Vulkan triangle rasterization rules, which are based on barycentric plane equations (to the best of my knowledge you can only approximate).

The RDP receives pre-computed triangle setup data from the RSP. We specify three lines with the triangle setup, where one line is the “major” line XH, and a second line is picked from the two “minor” lines XM/XL, depending on y >= YM. Two values YH and YL limit which scanlines we should render. This lets us implement triangles, or more complicated primitives if we want to. Bisqwit made a really cool ongoing video series on software rendering a while back which also implements a span rasterizer, which is very useful to watch if you want a deeper understanding of this approach.

This triangle setup data is defined more specifically as:

  • XH, XM, XL: 32-bit values in the format of s12.15.x. The 4 MSBs are sign-extended, and the single LSB is ignored (we can exploit this bit for more precision later!)
  • dXHdy, dXMdy, dXLdy: 32-bit values in the format of s12.13.xxx. The 4 MSBs are sign-extended, and the 3 LSBs are ignored. This represents the slope of the line for XH, XM and XL.
  • YH: This is an s12.2 value which represents the first scanline we render. There are 2 bits of subpixel precision, which is very useful because the RDP will sample coverage for 4 sub-scanlines per scanline.
  • YM: This s12.2 value represents the first sub-scanline where XL is selected as the minor line, otherwise XM is used.
  • YL: This represents the final sub-scanline which is rendered. The sub-scanline of YL is not included in rasterization.
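
To illustrate how the s12.2 Y values split into scanlines and sub-scanlines, here is a small Python sketch. It assumes coverage spans the sub-scanline range [YH, YL), and the helper names are hypothetical.

```python
SUBPIXELS_LOG2 = 2
SUBPIXELS = 1 << SUBPIXELS_LOG2  # 4 sub-scanlines per scanline


def split_y(y_fixed):
    """Split an s12.2 Y value into (scanline, sub-scanline)."""
    return y_fixed >> SUBPIXELS_LOG2, y_fixed & (SUBPIXELS - 1)


def covered_sub_scanlines(yh, yl):
    """Group the sub-scanlines in [yh, yl) by the scanline they belong to."""
    rows = {}
    for y_sub in range(yh, yl):
        line, sub = split_y(y_sub)
        rows.setdefault(line, []).append(sub)
    return rows
```

With YH = 10 and YL = 17, scanline 2 gets sub-scanlines 2 and 3, scanline 3 gets all four, and scanline 4 gets only sub-scanline 0, since YL itself is excluded.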

The algorithm for native resolution in GLSL:

// Interpolate X at all 4 Y-subpixels.
// Check Y dimension.
int yh_interpolation_base = int(setup.yh) & ~(SUBPIXELS - 1);
int ym_interpolation_base = int(setup.ym);

int y_sub = int(y * SUBPIXELS);
ivec4 y_subs = y_sub + ivec4(0, 1, 2, 3);

// dxhdy and others are (setup value >> 2) since we're stepping one sub-scanline at a time, not whole lines. This is why more LSBs are ignored for the slopes.
ivec4 xh = setup.xh + (y_subs - yh_interpolation_base) * setup.dxhdy;
ivec4 xm = setup.xm + (y_subs - yh_interpolation_base) * setup.dxmdy;
ivec4 xl = setup.xl + (y_subs - ym_interpolation_base) * setup.dxldy;
xl = mix(xl, xm, lessThan(y_subs, ivec4(setup.ym)));

ivec4 xh_shifted = quantize_x(xh); // A very specific quantizer, see source ...
ivec4 xl_shifted = quantize_x(xl);

ivec4 xleft, xright;
if (flip) // Flip is a bit set in triangle setup to mark primitive winding.
{
    xleft = xh_shifted;
    xright = xl_shifted;
}
else
{
    xleft = xl_shifted;
    xright = xh_shifted;
}

We have now computed a range of which pixels to render for each sub-scanline, where [xleft, xright) is the range. If xright <= xleft, the sub-scanline does not receive coverage. The quantizer is somewhat esoteric, but we essentially quantize X down to 8 sub-pixels of precision (>> 13). This is used later for multi-sampled coverage in the X dimension.
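
The per-sub-scanline range computation can also be traced on the CPU. The Python sketch below mirrors the native-resolution GLSL one sub-scanline at a time. The `quantize_x` here is only a stand-in that shifts by 13, not the real quantizer, and the dictionary-based `setup` is an assumption of this example.

```python
SUBPIXELS = 4


def quantize_x(x):
    # Placeholder only: reduces X to 8 sub-pixels of precision (>> 13).
    # The actual quantizer in the source is more specific than this.
    return x >> 13


def sub_scanline_spans(setup, y):
    """Return [xleft, xright) per sub-scanline of scanline y, or None."""
    yh_base = setup["yh"] & ~(SUBPIXELS - 1)
    ym_base = setup["ym"]
    spans = []
    for y_sub in range(y * SUBPIXELS, (y + 1) * SUBPIXELS):
        xh = setup["xh"] + (y_sub - yh_base) * setup["dxhdy"]
        xm = setup["xm"] + (y_sub - yh_base) * setup["dxmdy"]
        xl = setup["xl"] + (y_sub - ym_base) * setup["dxldy"]
        # XM is the minor line above YM, XL from YM onwards.
        minor = xm if y_sub < setup["ym"] else xl
        xh_q, minor_q = quantize_x(xh), quantize_x(minor)
        if setup["flip"]:
            xleft, xright = xh_q, minor_q
        else:
            xleft, xright = minor_q, xh_q
        # No coverage when xright <= xleft.
        spans.append((xleft, xright) if xright > xleft else None)
    return spans
```

The returned values are in units of 1/8 pixel, matching the 8 sub-pixels of X precision left after quantization.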

To add upscaling, the modifications are straightforward.

int yh_interpolation_base = int(setup.yh) & ~(SUBPIXELS - 1);
int ym_interpolation_base = int(setup.ym);
yh_interpolation_base *= SCALING_FACTOR;
ym_interpolation_base *= SCALING_FACTOR;

int y_sub = int(y * SUBPIXELS);
ivec4 y_subs = y_sub + ivec4(0, 1, 2, 3);

// Interpolate X at all 4 Y-subpixels.
ivec4 xh = setup.xh * SCALING_FACTOR + (y_subs - yh_interpolation_base) * setup.dxhdy;
ivec4 xm = setup.xm * SCALING_FACTOR + (y_subs - yh_interpolation_base) * setup.dxmdy;
ivec4 xl = setup.xl * SCALING_FACTOR + (y_subs - ym_interpolation_base) * setup.dxldy;
xl = mix(xl, xm, lessThan(y_subs, ivec4(SCALING_FACTOR * setup.ym)));

This is an accurate representation, as the only thing we do here is shift more bits into the triangle setup. As long as this does not overflow, we’re golden. After this step, we have scissoring. Scissor coordinates are u10.2 fixed point, which means the maximum resolution for the RDP is 1024×1024. With 8x upscale and 8 sub-pixels of X precision, we can barely pack the resulting range into an unsigned 16-bit value without overflow.
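
Both claims can be checked with a few lines of Python. The sketch below is a simplified scalar version of the edge equation (hypothetical helper names, no quantization). It shows that an upscaled edge evaluated at a sub-scanline coinciding with a native one is exactly the native X times the scale, and that the largest packed X coordinate fits in 16 bits, assuming the upper bound of 1024 pixels is exclusive.

```python
def edge_x(x_base, slope, y_sub, y_base, scale=1):
    """X of an edge at sub-scanline y_sub; scale=1 is native resolution."""
    return x_base * scale + (y_sub - y_base * scale) * slope


def max_packed_x(resolution, upscale, x_subpixels):
    """Largest X value in sub-pixel units for a [0, resolution) range."""
    return resolution * upscale * x_subpixels - 1


if __name__ == "__main__":
    native = edge_x(5 << 16, -777, 7, 3)
    upscaled = edge_x(5 << 16, -777, 7 * 4, 3, scale=4)
    print(upscaled == 4 * native)
    print(hex(max_packed_x(1024, 8, 8)))
```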

Modifying varying interpolation

Attribute interpolation is a little more interesting. There are 8 varyings, which all have the same setup data:

  • Shade Red/Green/Blue/Alpha
  • S
  • T
  • 1/W
  • Z

Each varying has 4 values:

  • Base value – sampled at coordinate (XH, YH) (kinda … it’s complicated)
  • dVdx – Change in value for 1 pixel in X dimension
  • dVde – Change in value when following the major axis down one line, and sampling at the next line’s XH. Basically dVde = dVdx * dXdy + dVdy. I’m not sure why this even exists, it makes the interpolation math a little easier I suppose?
  • dVdy – This feels very redundant, but it is what it is. It is only used for coverage fixup and LOD computation.

We cannot shift in extra bits here, unlike rasterization, so we have to be a little creative. To stay faithful and avoid overflow, we need to ensure that the interpolation is correct for each sample point which matches a sample point at native resolution, and for the inner sub-pixels, we remove some bits of precision in the derivative. Essentially, instead of doing something like this (not the exact math, which is simplified here for brevity; see the code):

int base_interpolated_x = ((setup.xh + (y - base_y) * setup.dxhdy)) >> 16;
rgba = attr.rgba;
int dy = y - base_y;
int dx = x - base_interpolated_x;
rgba += dy * attr.drgba_de;
rgba += dx * attr.drgba_dx;

we do …

int base_interpolated_x = ((setup.xh + (y - base_y) * setup.dxhdy)) >> 16;
rgba = attr.rgba;
int dy = y - base_y;
int dx = x - base_interpolated_x;
rgba += (dy >> SCALING_LOG2) * attr.drgba_de + (dy & (SCALING_FACTOR - 1)) * (attr.drgba_de >> SCALING_LOG2);
rgba += (dx >> SCALING_LOG2) * attr.drgba_dx + (dx & (SCALING_FACTOR - 1)) * (attr.drgba_dx >> SCALING_LOG2);

The added error here is microscopic.
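
A quick way to see this is to compare the two stepping schemes numerically. The Python sketch below uses hypothetical helper names and a single scalar attribute. It shows that the split form reproduces the native result exactly whenever the step count is a multiple of the scaling factor, and that in between it stays within a few LSBs of the ideal fractional step.

```python
SCALING_LOG2 = 2                    # 4x upscale
SCALING_FACTOR = 1 << SCALING_LOG2


def step_native(base, delta, d):
    """Native interpolation: d whole-pixel steps of size delta."""
    return base + d * delta


def step_upscaled(base, delta, d):
    """Upscaled interpolation: d is counted in upscaled pixels."""
    whole = d >> SCALING_LOG2            # full native-pixel steps
    frac = d & (SCALING_FACTOR - 1)      # remaining sub-pixel steps
    return base + whole * delta + frac * (delta >> SCALING_LOG2)


if __name__ == "__main__":
    delta = 1003
    for d in range(9):
        ideal = d * delta / SCALING_FACTOR
        print(d, step_upscaled(0, delta, d), ideal)
```

Because the sub-pixel term only ever loses the low SCALING_LOG2 bits of the derivative, and is applied at most SCALING_FACTOR - 1 times, the deviation from the ideal value stays below SCALING_FACTOR units of the attribute's LSB.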

Workarounds

Some games do not work correctly when we upscale, since the game never intended to render sub-pixels. This usually comes into play in two major scenarios, which we need to work around.

Using LOD for clever hackery

The mip-mapping on N64 is quite flexible, and sometimes two entirely different textures represent LOD 0 and LOD 1 for smooth distance based effects. When upscaling with e.g. 4x, we essentially get a LOD factor which is a LOD bias of -2 (log2(1/4)). An optional workaround is to compensate by applying a positive LOD bias ourselves to emit LOD levels the game expects. Ideally, this workaround is applied only in places where it’s needed.
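
The arithmetic behind the bias is just a base-2 logarithm of the scaling factor. A minimal Python sketch (function names are illustrative) of the implicit shift and the bias that would cancel it:

```python
import math


def upscale_lod_shift(scaling_factor):
    """Implicit LOD bias introduced by upscaling, e.g. -2 for 4x."""
    return math.log2(1.0 / scaling_factor)


def compensating_lod_bias(scaling_factor):
    """Positive bias that restores the LOD levels the game expects."""
    return -upscale_lod_shift(scaling_factor)
```

For 2x, 4x and 8x this gives compensating biases of 1, 2 and 3 respectively.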

Sprite rendering / TEX_RECT

Many games render sprites with TEX_RECT with the expectation that textures are rendered 1:1 with input texels to output texels. When we start upscaling, the game might have forgotten to disable bilinear filtering, and we start filtering outside the texture boundaries, i.e., against garbage, which shows up as ugly seams in the image. The simple workaround is to render TEX_RECT primitives as if they are not upscaled. This is necessary anyways for the COPY pipe, since the COPY pipe only updates the varying interpolator every 8th framebuffer byte. We cannot safely upscale these kinds of primitives either way.

Conclusion

There isn’t much more to it. Adding upscaling to ParaLLEl-RDP was not all that complicated compared to the other insanity that went into making this renderer work. It’s a principled approach to the upscaling which I believe could theoretically work in a custom RDP hardware design.