
Introduction
The hardware core of this technology, the Graphics Processing Unit (GPU), is one of the major factors determining a computer’s performance in displaying 3D graphics scenes. However, GPU manufacturers are still limited by die size, power and heat dissipation issues, as well as the price/performance limits set by the market.
As a result, one of today’s common solutions for upgrading performance is to use multiple GPUs to share the graphics load. Not only does this maintain reasonable power consumption, it also allows consumers to upgrade existing systems by adding Add-in Graphics Boards (AiBs). The demand for scalable platforms and uncompromised visual quality is migrating from high-end enthusiasts to the mainstream segments. This mass-market segment now demands great graphics performance while expecting new solutions to be flexible, easy to deploy, and able to maintain current price levels.
Today’s multi-GPU solutions have been developed by a select number of vendors and require the consumer to use only identical GPUs from that particular vendor, which seriously limits consumer choice. Another obstacle is the requirement for special multi-GPU connectors. Furthermore, installing multiple GPUs requires the consumer to be tech-savvy; for most consumers, getting the proper hardware and performing this kind of installation is beyond their technical abilities. To overcome these obstacles, multi-GPU support needs to provide a smoother upgrade path and more flexibility for regular users, and enabling this requires a fundamentally different approach to interoperability between GPUs and to the architecture of the multi-GPU enabling technology. This white paper discusses graphics processing architecture and presents a new technological approach that can enable multi-GPU processing independent of the vendor. Through this approach, consumers will be able to upgrade to multi-GPU processing and load balancing with less complexity and without being locked into a particular vendor.

Graphics Processing Architecture Overview
The architecture of today’s GPU includes the primary computing components shown in Figure 1 below.
The geometry processor (also known as the geometry shader) is responsible for processing polygons: establishing the order of objects, their location in the frame, perspective distortion, and the partial removal of hidden polygons. The output of this processor/shader is a set of raster polygons; polygons that lie in a hidden part of an object, or that belong to objects hidden behind other objects, are discarded. The pixel processor (known as the pixel shader in advanced architectures) fills each polygon with the correct texture, adding shading, lighting effects and color variations. The final output of the two processors is stored in a frame buffer memory and sent to the display.
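To make that flow concrete, the following C++ sketch models the sequential pipeline just described. The types and stage functions are invented for illustration only (they are not a real GPU API); the point is simply that each frame passes through the stages in order, so any one stage can stall the whole frame.

#include <cstdint>
#include <vector>

struct Polygon    { float depth; bool hidden; };  // simplified scene polygon
struct RasterPoly { float depth; };               // screen-space polygon after geometry stage
struct Pixel      { std::uint32_t rgba; };

// Geometry stage: orders objects, applies perspective, discards hidden polygons.
std::vector<RasterPoly> geometryStage(const std::vector<Polygon>& scene) {
    std::vector<RasterPoly> out;
    for (const auto& p : scene)
        if (!p.hidden) out.push_back({p.depth});  // hidden polygons are dropped
    return out;
}

// Pixel stage: fills each surviving polygon with texture, shading and color.
void pixelStage(const std::vector<RasterPoly>& polys, std::vector<Pixel>& frame) {
    for (const auto& rp : polys) { (void)rp; /* rasterize into 'frame' (omitted) */ }
    (void)frame;
}

// One frame flows sequentially through both stages into the frame buffer,
// which is why each stage, and the memory behind them, can become a bottleneck.
void renderFrame(const std::vector<Polygon>& scene, std::vector<Pixel>& frameBuffer) {
    auto raster = geometryStage(scene);  // geometry-shader bottleneck
    pixelStage(raster, frameBuffer);     // pixel-shader and memory bottlenecks
}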
This sequential processing of the frame creates three major potential bottlenecks:
• Geometry shader: bottlenecked when processing frames with many changes, such as moving objects or newly appearing objects
• Pixel shader: bottlenecked by heavy per-pixel operations, such as high resolutions and anti-aliasing
• Memory capacity and access time: bottlenecked during memory-intensive operations, for example when large textures are being swapped
Different parallelization methods should be applied in different application scenarios to resolve each of these bottlenecks and allow better performance scaling. The flexibility to select, in real time, the parallelization method that matches the current application scenario is essential for getting optimized results.
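As a rough illustration of such a selection mechanism, the following C++ sketch picks a method per frame from a simple workload profile. The Method and FrameProfile names and the decision rules are assumptions made for exposition; they do not represent any vendor’s actual heuristic.

// Hypothetical per-frame method selection; all names and rules are
// illustrative assumptions, not an actual driver heuristic.
enum class Method { SingleGpu, SplitFrame, AlternateFrame };

struct FrameProfile {
    bool  interFrameDependency;  // e.g. render-to-texture results reused next frame
    float geometryLoad;          // relative geometry/polygon cost
    float pixelLoad;             // relative per-pixel cost (resolution, anti-aliasing)
};

Method chooseMethod(const FrameProfile& f) {
    if (f.interFrameDependency)        // both split-frame and alternate-frame degrade here
        return Method::SingleGpu;
    if (f.pixelLoad > f.geometryLoad)  // pixel-bound: dividing pixels between GPUs helps
        return Method::SplitFrame;
    return Method::AlternateFrame;     // otherwise overlap whole frames across GPUs
}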
Parallel Graphics Processing Methods
To address the need for processing power, GPU vendors have turned to the multi-GPU approach, similar to the multi-core approach taken with general-purpose CPUs and even network processors. Using this approach, any number of graphics cards can simultaneously process a single frame within an application or game. In this topology, the GPUs are connected to the Northbridge via PCIe slots and one of the graphics cards is connected to the display.

Split Frame / Tiling
The split frame or tiling methodology is not commonly used today. When implemented, each GPU is configured to handle a specific part of the screen, for example the upper or lower part in a dual-GPU configuration. The exact position at which the frame splits is determined dynamically according to the processing power required for each part.
The split frame method reduces the number of pixels processed by each GPU, so the pixel-shading bottleneck is reduced. However, each GPU still needs to store the entire frame in its memory, so the geometry shader and memory bottlenecks are not relieved. This duplicated memory activity slows down the system and constitutes a major drawback of this methodology.
Split frame/tiling works best when there are no inter-frame dependencies and per-pixel operations are the significant bottleneck, which is common in many of today’s games. However, it breaks down when other bottlenecks dominate, or when inter-frame dependencies and render-to-texture techniques exist in the application.
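The dynamic split position mentioned above can be understood as a simple feedback loop. Here is a hedged C++ sketch for the dual-GPU case, in which the split row drifts toward whichever half rendered faster last frame; the field names, gain and clamping bounds are our own assumptions, not Lucid’s or any vendor’s actual algorithm.

#include <algorithm>

// Fraction of screen rows assigned to GPU 0 (the top half); starts at an even split.
struct SplitState { float splitRatio = 0.5f; };

// msTop/msBottom: measured render times (ms) for each half of the previous frame.
void rebalance(SplitState& s, float msTop, float msBottom) {
    // If the bottom half took longer, its GPU is overloaded: move the split
    // downward so the top GPU takes more rows (and vice versa).
    float error = (msBottom - msTop) / (msTop + msBottom);
    s.splitRatio = std::clamp(s.splitRatio + 0.5f * error, 0.1f, 0.9f);
}

int splitRow(const SplitState& s, int frameHeight) {
    return static_cast<int>(s.splitRatio * frameHeight);  // rows [0, splitRow) go to GPU 0
}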

Alternate Frame
The Alternate Frame method is the most commonly used in today’s multi-GPU solutions. In this method, frames are assigned alternately to the GPUs, so that each GPU renders its frame while the other is still rendering the previous one. This provides more time for each GPU to render its frame. For example, in a two-GPU scenario, the first GPU handles the even (n) frames and the second GPU handles the odd (n+1) frames. The main drawbacks of this method are latency and poor scaling beyond two GPUs, although with high frame rates the latency is rarely noticeable.
The Alternate Frame methodology performs best when consecutive frames are well balanced, such that each frame takes approximately the same time to render, and when the GPUs are identical in performance. When the GPUs are not identical, or inter-frame dependencies exist in the application, this methodology tends to break down. Inter-frame dependencies are found in most game titles developed in the last few years.
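At its core, the even/odd assignment described above is a round-robin schedule over the frame index; a minimal C++ sketch (the function name is ours):

// Round-robin frame assignment: frame n goes to GPU (n mod gpuCount).
// With two GPUs this is exactly the even/odd split described above.
int gpuForFrame(long frameIndex, int gpuCount) {
    return static_cast<int>(frameIndex % gpuCount);
}
// e.g. gpuForFrame(4, 2) == 0 (even frame), gpuForFrame(5, 2) == 1 (odd frame)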
Real Time Distributed Processing
A multi-GPU solution that will be accessible to most users should meet the following requirements:
• Allow users to choose their preferred GPU for its performance and price
• Enable choice in future system upgrades; consumers should not have to scrap their current graphics technology or be locked in to their existing vendor when looking for an add-on to their existing system
• Eliminate the need for special or proprietary connectors
• Provide application scalability when more than one graphics card is installed
• Allow non-identical GPUs to work in one system, thereby avoiding the need to replace both graphics cards when one is faulty or outdated
With these requirements in mind, LucidLogix developed the Lucid HYDRA Engine, the first dedicated silicon solution implementing real time distributed processing (RTDP) to meet these requirements.
Load Balancing in a Frame and Between Frames
The HYDRA engine analyzes frames before rendering and intelligently distributes the rendering tasks among the GPUs on board. Its frame decision mechanism resolves bottlenecks and inter-frame dependencies prior to rendering, in real time, so that no additional latency is introduced.
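To illustrate the kind of distribution such an engine must perform (the actual HYDRA scheduler is proprietary and not detailed in this paper), here is a hedged C++ sketch that shares a frame’s rendering tasks among GPUs in proportion to their measured throughput, so that mismatched cards finish in roughly the same wall-clock time. All structures and the greedy fill rule are our own assumptions.

#include <cstddef>
#include <vector>

struct Gpu  { float throughput; };            // relative rendering speed of this card
struct Task { float cost; int assignedGpu = -1; };

void distribute(std::vector<Task>& tasks, const std::vector<Gpu>& gpus) {
    float totalThroughput = 0.f, frameCost = 0.f;
    for (const auto& g : gpus)  totalThroughput += g.throughput;
    for (const auto& t : tasks) frameCost += t.cost;

    // Each GPU's work budget is proportional to its share of total throughput.
    std::vector<float> budget(gpus.size());
    for (std::size_t i = 0; i < gpus.size(); ++i)
        budget[i] = frameCost * gpus[i].throughput / totalThroughput;

    // Greedy fill: each task goes to the GPU with the most budget remaining.
    for (auto& t : tasks) {
        std::size_t best = 0;
        for (std::size_t i = 1; i < budget.size(); ++i)
            if (budget[i] > budget[best]) best = i;
        t.assignedGpu = static_cast<int>(best);
        budget[best] -= t.cost;
    }
}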
The HYDRA engine contains a generic solution for different games and rendering methods, as well as an auto-correcting load-balancing scheme for scaling. For GPUs that are not identical in performance or manufacturer, the HYDRA engine allocates resources appropriately during processing to optimize the combined GPU rendering power. The following image shows the engine architecture:
