CPU, GPU or Both?
Finding the right multi-processing approach for engineering applications.
April 1, 2013
In general, GPUs have more of the floating-point computational horsepower needed for many engineering computations, but it’s rarely that clear-cut. Many computations aren’t strictly floating point, and even those that are often can’t be broken down to execute in parallel. Parallel execution requires computations that apply the same series of instructions to different data, a model known as single instruction, multiple data (SIMD).
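To make the idea concrete, here is a minimal sketch of the SIMD pattern in plain C (a hypothetical example, not drawn from any particular engineering package): one instruction stream applied independently to every element of an array.

```c
/* A minimal sketch of the SIMD pattern: the same operation applied
 * to every element of an array. On a CPU this loop runs serially;
 * SIMD hardware (or a GPU) can apply the multiply to many elements
 * at once, because the iterations are independent of each other. */
void scale(float *data, float factor, int n)
{
    for (int i = 0; i < n; i++)
        data[i] = data[i] * factor;   /* same instruction, different data */
}
```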
But for those SIMD computations that can execute in parallel on floating-point data, GPUs offer an enticing, high-performance alternative to industry-standard CPUs. Running in parallel, they can perform simulation-specific computations significantly faster than CPUs. While many performance comparisons exist, most cluster around a GPU advantage over CPUs of about 2.5x for single-processor operations.
Intel’s industry-standard processors provide high throughput, especially for integer operations. They tend to work well for most engineering computations, but aren’t optimized for every kind of computation.
NVIDIA’s CUDA enables massively parallel computing systems that can support hundreds of thousands of cores.
So, engineers seeking the highest level of performance need to look not only at the computations they perform, but also at their mix of computations. It is possible to make this analysis very detailed and specific, but most engineering teams will do fine simply looking at the type of work they do and the mix of computations it involves.
Weighing the Benefits
Most GPUs lack certain features that programmers need in many types of software. For example, GPUs don’t have stack pointers, and therefore don’t support recursion, the act of a function calling itself. That type of computation tends to be slow, and isn’t called for in graphics operations.
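As a hypothetical illustration of what that restriction means in practice, here is a recursive summation alongside the loop-based equivalent a programmer might substitute when targeting a GPU; the function names are invented for the example.

```c
/* Hypothetical sketch: restructuring recursion for GPU-friendly code.
 * The recursive form needs a call stack, which (per the discussion
 * above) GPU cores don't provide. */
float sum_recursive(const float *a, int n)
{
    if (n == 0)
        return 0.0f;
    return a[n - 1] + sum_recursive(a, n - 1);
}

/* The iterative form computes the same result with no call stack --
 * the kind of simplification typical of porting code to a GPU. */
float sum_iterative(const float *a, int n)
{
    float total = 0.0f;
    for (int i = 0; i < n; i++)
        total += a[i];
    return total;
}
```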
Leaving out features such as these lets GPUs execute code more quickly, but the code itself has to be changed or simplified to take advantage of it. Porting code requires engineering effort; how much depends largely on the type of code and the structure of the application. Porting an entire application is likely to be a significant effort; porting only the parts that can be effectively parallelized is far less demanding.
That’s the primary reason why CPUs and GPUs are more complementary than competitive in the nature of their workloads: They do different things well. Most, if not all, engineering application vendors leave user interface and editing code to run on the CPU, while computational code that can make use of parallel operations is increasingly being ported to GPUs.
One popular option for executing parallel code on GPUs is the Compute Unified Device Architecture (CUDA), a parallel computing platform and programming model developed by NVIDIA for its GPU families. It gives developers access to the virtual instruction set and memory of the parallel computational elements in CUDA GPUs, so that code can be written or ported to run directly on the GPU.
A CUDA GPU has its own processor cores and memory. The CPU dispatches GPU-compiled code and data to the GPU cores (the details differ slightly depending on the software technology being used), where the computation occurs. When the computation is complete, the results are passed back to the CPU, which controls the application.
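A minimal CUDA C sketch of that dispatch pattern follows, assuming the standard CUDA runtime API; the kernel and function names are illustrative, not taken from any vendor’s application.

```c
#include <cuda_runtime.h>

/* GPU-compiled code: each GPU thread scales one array element. */
__global__ void scale_kernel(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

/* Host-side dispatch, as described above: copy data to the GPU,
 * launch the kernel, then copy the results back to the CPU. */
void scale_on_gpu(float *host_data, float factor, int n)
{
    float *dev_data;
    size_t bytes = n * sizeof(float);

    cudaMalloc((void **)&dev_data, bytes);            /* GPU memory */
    cudaMemcpy(dev_data, host_data, bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale_kernel<<<blocks, threads>>>(dev_data, factor, n);  /* dispatch */

    cudaMemcpy(host_data, dev_data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev_data);
}
```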
Vendors such as MathWorks and AccelerEyes offer independent ways of dispatching code and data to CUDA GPUs, making it easy to run code written in the MATLAB engineering programming language on a GPU. MATLAB also makes it easy to break up a computation to run on a specified number of processors and cores.
In addition to the CUDA C/C++ and CUDA Fortran programming languages, the CUDA platform supports other computational interfaces, including OpenCL, Microsoft’s DirectCompute and C++ AMP. Third-party wrappers are available for languages such as Python, Fortran and Java, and Mathematica supports CUDA natively.
Best of Both Worlds
One approach is to combine CPUs and GPUs using a multiprocessing option called OpenCL. Adopted by a number of vendors (Apple, AMD and Intel, among others), OpenCL is a framework for writing programs that can run across different processors. It provides the ability to dispatch computations to either a CPU or a GPU (as well as to digital signal processors and even field-programmable gate arrays), depending on what the code is designed to run on.
For engineers, it has the potential to make the dichotomy between industry-standard CPUs and GPUs seamless and transparent, letting code run where it makes the most sense. If software vendors adopt the OpenCL standard, code should be portable across implementations that follow it.
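The sketch below, using the standard OpenCL host API in C, shows the idea: the same program can request a GPU and fall back to a CPU, with the rest of the code unchanged either way. It is a minimal example, assuming an OpenCL driver and headers are installed.

```c
#include <CL/cl.h>
#include <stdio.h>

/* Minimal sketch of OpenCL's processor-agnostic dispatch: ask the
 * platform for a GPU, and fall back to the CPU if none is present.
 * The rest of the program (kernels, buffers, command queues) stays
 * the same -- the code runs where it makes the most sense. */
int main(void)
{
    cl_platform_id platform;
    cl_device_id device;

    clGetPlatformIDs(1, &platform, NULL);
    if (clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL)
            != CL_SUCCESS)
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &device, NULL);

    char name[128];
    clGetDeviceInfo(device, CL_DEVICE_NAME, sizeof(name), name, NULL);
    printf("Dispatching to: %s\n", name);
    return 0;
}
```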
The AMD FirePro 5900 supports the OpenCL standard, which allows CPUs and GPUs to be shared in a single system.
AMD has integrated OpenCL as its programming framework for its FirePro family of GPUs, as well as its CPU offerings. According to Antoine Reymond, alliances manager at AMD, OpenCL is supported by The Khronos Group, an American not-for-profit member-funded industry consortium. “It is a collaborative effort that ensures there is a common programming standard across different implementations,” he explains.
OpenCL includes a programming language based on C for writing kernels that execute on OpenCL devices, plus application programming interfaces (APIs) that are used to define and then control the platforms. The kernel language is somewhat limited, in that it doesn’t allow function pointers or recursion, which means existing code still has to be modified to run on either vendor’s GPUs.
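For illustration, here is what a minimal kernel in that C-based language might look like (a hypothetical example, not production code):

```c
/* A minimal kernel in OpenCL's C-based kernel language. Note the
 * flat, loop-free structure: each work-item handles one element,
 * so the restrictions noted above (no function pointers, no
 * recursion) are easy to live with for code shaped like this. */
__kernel void scale(__global float *data, const float factor)
{
    size_t i = get_global_id(0);   /* this work-item's element index */
    data[i] *= factor;
}
```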
OpenCL can be used to give an application access to a GPU for non-graphical computing, such as engineering computations. The intriguing thing about OpenCL is that it offers the ability to use CPUs and GPUs in combination. Of course, code still has to be compiled for one or the other, so it’s not quite that straightforward.
Once again, the same computational limitations apply as with CUDA. But on systems where the GPUs and CPUs share memory, passing computations off to the GPU tends to be faster than with CUDA. In either case, both CUDA and OpenCL implementations on GPUs are likely to deliver significantly better parallel execution of computational code than CPUs alone.
The Choice is Yours
Is there truly timesaving value for engineers in using GPUs for parallel execution, whether through CUDA or OpenCL? It depends on the type of workstation or multiprocessing system being used, on the software, and on the types of computations being performed.
Chances are you will benefit if you do a lot of data analysis or simulation. Depending on which multiprocessing standard your systems support, CUDA and OpenCL offer similar performance advantages. But if you’re doing single-threaded operations, such as design, or if you’re engaged in a lot of more general-purpose computing, you will see little, if any, advantage.
Of course, your engineering software still has to support GPU execution. That’s becoming less of an issue with commercial software today, as an increasing number of vendors are compiling parts of their code for NVIDIA and/or AMD GPUs. Unless you’re using a niche vendor, or have your own code, chances are you’ll find a GPU solution.
All this points to why it might be more appropriate to use a shared multiprocessing or cluster system. You’re probably not going to make use of your workstation GPUs for general-purpose engineering computing. Instead, you can concentrate on making a single GPU system available to the engineers who can make the best use of it.
Rather than one or the other, you should be looking at a mix of both types of processors. If your work is more heavily skewed toward design, you probably want to lean toward a more CPU-heavy approach. Of course, you want those CPUs to be the fastest and most powerful in general, even if they have fewer cores.
But if you do a lot of analysis and simulation, you want plenty of processors and cores, and some of them should probably be GPUs, assuming that you can get compiled GPU code for your application. For those types of computations, a mixed system using both CPU and GPU cores, through either CUDA or OpenCL, would work well.
Contributing Editor Peter Varhol covers the HPC and IT beat for DE. His expertise is software development, math systems, and systems management. You can reach him at [email protected].