Help for HPC in the Cloud

Offloading simulation to the cloud requires help from specialists.

The flexible licensing terms of COMSOL Server can help empower enterprises to deploy its CAE application in an on-premise HPC cluster or in a public cloud. Image courtesy of COMSOL.

While the cloud is quickly becoming the platform of choice for many enterprise applications, it’s not a slam-dunk for deploying high-performance computing (HPC) environments, especially for organizations light on internal cloud and HPC expertise.

For general business tools such as sales or human resources applications, or even mission-critical back-office platforms such as ERP (enterprise resource planning), the cloud is increasingly the go-to approach as companies look to capitalize on pay-as-you-go pricing and the ability to be up and running quickly without a lot of IT handholding. According to Computerworld’s 2015 Forecast, 42% of IT decision makers planned to increase spending on cloud computing in 2015, with the greatest spike among enterprises with more than 1,000 employees.

All flavors of cloud deployment are feeling the love from the mainstream enterprise. The Cisco Global Cloud Index found that of the total cloud workload, 28% is projected to be Infrastructure-as-a-Service (IaaS) workloads, 13% will be Platform-as-a-Service (PaaS), and the most popular deployment model will be Software-as-a-Service (SaaS), owning 59% of the workloads by 2018.

Yet when it comes to spinning up HPC resources in the cloud, the offerings are far less mature and customer uptake is much slower. Public cloud platforms like Amazon Web Services (AWS), Microsoft Azure and IBM Bluemix are far less optimized for the more demanding and specialized requirements of HPC environments tuned to run engineering applications such as simulation and optimization. Moreover, some of the barriers slowing mainstream cloud deployments (security concerns, complications integrating legacy systems and unexpected costs) are also proving to be hurdles to establishing cloud HPC as a go-to resource for engineering organizations.

“If you look at offerings like Amazon or Azure, you see most of the cloud layers do have a solid infrastructure for business computing,” says Srikanth “Sam” Mahalingam, chief technology officer for HPC and Cloud Solutions at Altair, makers of CAE software. “However, when it comes to technical computing, there are other more demanding needs that are still not addressed.”

The most significant gap in current public cloud offerings’ ability to support robust HPC environments is the lack of high-speed interconnect fabrics. Traditional HPC clusters have integrated technologies like InfiniBand and Intel’s Omni-Path architecture to ensure fast communication within the system as well as between systems. This kind of low-latency communication is critical for running large-scale simulations and optimizations in a cloud HPC environment, and to date, few providers have built out these capabilities as part of their public cloud platforms, says Dominic Daninger, vice president of Engineering for Nor-Tech, a systems integrator specializing in HPC implementations.

“Many computational fluid dynamics (CFD) or heat modeling applications that have a lot of internodal chat going on really benefit from the very low latency fabrics like InfiniBand,” Daninger says. “Very few cloud providers can offer that and if they do, there’s a substantial cost to it.”
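
Daninger’s point about chatty applications can be made concrete with a toy model. The sketch below is not a benchmark: it assumes each solver iteration splits a fixed amount of compute evenly across nodes and then pays for a fixed number of message exchanges at the fabric’s latency. The function names and every number in it (compute time, message count, latencies) are illustrative assumptions, not measurements of any provider or interconnect.

```python
def iteration_time(nodes, compute_s, messages, latency_s):
    """Toy model: wall-clock time for one solver iteration on `nodes` nodes.

    compute_s -- serial compute time per iteration, in seconds (hypothetical)
    messages  -- latency-bound message exchanges per iteration (hypothetical)
    latency_s -- per-message latency of the interconnect, in seconds (hypothetical)
    """
    return compute_s / nodes + messages * latency_s


def speedup(nodes, compute_s, messages, latency_s):
    """Speedup relative to one node, which exchanges no inter-node messages."""
    return compute_s / iteration_time(nodes, compute_s, messages, latency_s)


if __name__ == "__main__":
    # Illustrative values only: 1 s of compute and 1,000 exchanges per iteration.
    fabrics = [("low-latency fabric", 2e-6), ("commodity network", 100e-6)]
    for label, latency in fabrics:
        s = speedup(64, compute_s=1.0, messages=1000, latency_s=latency)
        print(f"{label}: speedup on 64 nodes = {s:.1f}x")
```

Under these assumed values, the message term dominates on the slower network as nodes are added, which is the effect that makes low-latency fabrics valuable for tightly coupled jobs.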

Security is another big obstacle slowing cloud-based HPC adoption for many organizations, particularly those that see their HPC environments, tightly coupled with advanced simulation software, as a strategic competitive edge. “There’s a reluctance within many engineering groups [to go to the cloud] because they know their IP (intellectual property) is the company’s gold and they don’t want to put it outside of the company,” Daninger says.

Knowledge is HPC Power

Unlike business systems, which users can easily spin up in no time just by renting compute cycles and storage on AWS or Azure clouds, there’s a lot more complexity involved in configuring an HPC environment — even in the cloud. “It’s not as easy as it looks,” Daninger says. “You have to look deeper and be knowledgeable enough to know the types of jobs you’re looking to run in the cloud.”

When considering a cloud platform for HPC, organizations need as much in-house expertise as they would if they were configuring an on-premise HPC cluster. For example, they still need to determine the right operating system, cluster management software and post-processing visualization capabilities, and even what data must move back and forth to get the job done. Beyond HPC domain expertise, they need to be versed in the small details of the specific cloud offering and deployment: for example, the costs involved in moving data up to the cloud and, more importantly, back down to on-premise engineering systems.
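
As a rough illustration of the data-movement question, the sketch below estimates the cost and time of one upload-and-download round trip. It is a back-of-the-envelope calculation under stated assumptions: the function name, the per-gigabyte rates and the link speed are all hypothetical placeholders to be replaced with a provider’s actual pricing and a site’s measured bandwidth.

```python
def transfer_estimate(upload_gb, download_gb,
                      ingress_usd_per_gb, egress_usd_per_gb, link_mbps):
    """Estimate cost (USD) and time (hours) to move data to and from a cloud.

    All rates are caller-supplied placeholders, not real provider prices.
    Sizes use decimal units: 1 GB = 8,000 megabits.
    """
    cost = upload_gb * ingress_usd_per_gb + download_gb * egress_usd_per_gb
    total_megabits = (upload_gb + download_gb) * 8 * 1000
    hours = total_megabits / link_mbps / 3600
    return cost, hours


if __name__ == "__main__":
    # Hypothetical job: 10 GB of input uploaded, 100 GB of results downloaded.
    cost, hours = transfer_estimate(
        upload_gb=10, download_gb=100,
        ingress_usd_per_gb=0.0, egress_usd_per_gb=0.10,  # placeholder rates
        link_mbps=1000,                                   # placeholder link speed
    )
    print(f"estimated transfer cost: ${cost:.2f}, time: {hours:.2f} h")
```

Because simulation results are often far larger than the inputs, the download term tends to matter most, which matches the emphasis on moving data back down.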

“Many companies don’t have the expertise to spin this up on their own — they need specialists like us to help them do that,” Daninger contends.

Given the complexities and the minefield of potential hidden costs, experts say cloud-based HPC is not universally the best value proposition at this time, despite ongoing reports to the contrary. Larger firms that have already invested in data centers and have HPC expertise on-site are still better served keeping HPC infrastructure on-site, says Altair’s Mahalingam. However, the story changes for small and mid-sized companies.

“It’s really about a price/performance tradeoff,” Mahalingam says. “Since smaller players don’t have the HPC infrastructure, they’re better off going to a public cloud with their jobs running a little slower than outlaying a capital expense for systems with high-speed interconnects.”

Cloud-based HPC also makes sense for both large and small companies who have periods of peak compute requirements to accommodate simulation cycles for a particular project or a critical stage of the development cycle, according to Bill Mannel, vice president and general manager, HPC and Big Data Solutions for HP Servers at Hewlett-Packard. However, again, it’s not as straightforward to flip a switch on cloud-based capabilities for a short-term HPC burst as it is to turn up the processing volume for a CRM (customer relationship management) or mainstream business application.

That Was Then, This is Now

The limitations surrounding cloud HPC may be short-lived, however. Cloud providers, HPC specialists and CAE software vendors recognize the opportunity and are working hard to make cloud-based HPC capabilities more palatable for a larger swath of users. Microsoft, for example, recently announced it will offer professional graphics applications and accelerated computing capabilities on Azure with NVIDIA GPUs (graphics processing units), staking a claim to being the first cloud platform to provide NVIDIA GRID 2.0 virtualized graphics. By deploying the latest version of NVIDIA GRID in its N-Series virtual machine offering, Azure lets engineers and designers visualize complex, data-intensive designs and simulations from anywhere, Microsoft officials say.

The ANSYS Enterprise Cloud is designed to give more engineers access to the software and hardware tools they need. Image courtesy of ANSYS.

For its part, HP offers what it bills as a “self-service” HPC package, but it’s designed for organizations that want an appliance-like approach to HPC for an on-premise private cloud, not for the public cloud. Mannel also says HP doesn’t offer turnkey HPC solutions for the cloud or otherwise; rather its approach is to leverage reference architectures and consulting services to create the optimal HPC environment for individual customers, most of which are larger companies. “You may not make the right choices without some level of consultation about what works best for your workload,” he says.

Software Vendors Enter the Ring

Many of the simulation ISVs (independent software vendors) are also taking a stab at making cloud-based HPC more accessible, particularly for their specific CAE offerings. At COMSOL, cloud-based HPC isn’t currently in top demand among its customers, but the company believes that will change over time, according to Phil Kinnane, COMSOL’s vice president of Business Development. As part of its effort to support the shift, COMSOL now offers a more liberal floating licensing policy with its COMSOL Server offering, which now makes it possible to launch an instance on AWS. “You can use the software on a workstation with multi-core processors or use it in the cloud and distribute it on those resources,” he says.

The ANSYS Enterprise Cloud, launched in May, is intended to make moving simulation workloads to the cloud a turnkey process. Powered by the ANSYS Cloud Gateway and the ANSYS reference architecture, the ANSYS Enterprise Cloud is delivered as a single-tenant solution through a dedicated corporate account on the AWS public cloud. The platform, which can be managed either by internal IT experts or by ANSYS certified service partners, offers secure storage and data management.

Altair is now offering its HPC solutions on AWS and Microsoft Azure, and is even promoting a free trial of CAE on the cloud as part of a Cloud Challenge with AWS and Intel to showcase the benefits of this kind of deployment. In addition to its fully configured physical HPC/CAE HyperWorks Unlimited appliance that can be used in a private cloud, Altair has recreated a similar environment for use on the public cloud, accessible by engineers in small companies through a Web browser.

Having all of the components packaged up in a virtual or physical appliance is still critical given the complexity of the HPC environment. “For example, explicit solvers need computational cores more than memory, but implicit solvers need more memory than cores,” Mahalingam says. “If you look at the appliance stack, there are a lot of layers and there is a right configuration of nodes for different kinds of jobs. We have optimized the whole stack, eliminated complexity, and can also serve as the single point of contact for support.”
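
Mahalingam’s rule of thumb about solver types can be sketched as a simple selection step. This is a minimal illustration, not Altair’s method: the `NodeProfile` class, the `pick_profile` function and the two node shapes are hypothetical, and a real configuration would weigh many more factors across the stack.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeProfile:
    """A hypothetical compute-node shape."""
    name: str
    cores: int
    memory_gb: int


# Illustrative node shapes only; not taken from any vendor catalog.
PROFILES = [
    NodeProfile("core-heavy", cores=64, memory_gb=128),
    NodeProfile("memory-heavy", cores=16, memory_gb=512),
]


def pick_profile(solver_type, profiles=PROFILES):
    """Prefer cores for explicit solvers and memory for implicit solvers."""
    if solver_type == "explicit":
        return max(profiles, key=lambda p: p.cores)
    if solver_type == "implicit":
        return max(profiles, key=lambda p: p.memory_gb)
    raise ValueError(f"unknown solver type: {solver_type!r}")


if __name__ == "__main__":
    for solver in ("explicit", "implicit"):
        print(solver, "->", pick_profile(solver).name)
```

An appliance effectively bakes decisions like this into a pre-tuned stack, so the user does not have to make them job by job.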

About the Author

Beth Stackpole

Beth Stackpole is a contributing editor to Digital Engineering. Send e-mail about this article to [email protected].
