Optimizing high-performance computing via object storage for CAE applications.
Manufacturers in the automotive, aerospace, and oil and gas industries are turning to compute-intensive CAE software to gain a competitive edge. From the automotive industry’s perennial efforts to improve vehicle safety with simulation software and finite element analysis to the oil and gas industry’s use of computational fluid dynamics (CFD) to aid exploration, CAE technology is pushing organizations to speed time to market, improve design and quality, and generate returns on capital investment. While the software itself demands intense compute power, progress in CAE applications has accelerated with the use of powerful high-performance Linux clusters. However, these clusters and their parallel processing applications place enormous demands on storage systems ill-equipped to keep pace with the clusters’ computational power or the resulting data throughput.
To keep these Linux clusters operating efficiently and to extract maximum value from IT expenditures, companies using CAE software need to keep the clusters busy at all times. And to do that, a high-performance cluster depends on high-performance storage. Without an adequate storage solution, clusters (and the designers and engineers who rely on them) will stand idle, waiting for input/output bottlenecks to clear.

Traditional Storage

Currently, there are two types of network storage systems, each distinguished by its command set. The first is the SCSI block I/O command set, used by storage area networks (SANs), which provides high random I/O and data throughput performance through direct access to data at the level of the disk drive or Fibre Channel. Network attached storage (NAS) systems use file-level protocols such as NFS or CIFS to access data, with the benefit that multiple nodes can share the data because the metadata (which describes where the data resides) is shared. Achieving both the high performance and the data-sharing benefits that Linux clusters can provide requires a fundamentally new storage design, one that offers the performance of direct access to disk along with the easy administration of shared files and metadata. That new storage design is an object-based storage architecture.
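To make the distinction concrete, the short sketch below contrasts the two access models using ordinary Python file operations. The device path and NFS mount point are hypothetical examples for illustration, not references to any particular installation.

```python
# Illustrative contrast only; the device path and mount point are hypothetical.

BLOCK_SIZE = 4096

# Block I/O (SAN-style): the host addresses raw blocks on the device directly,
# which is fast, but each host must understand the on-disk layout itself.
with open("/dev/sdb", "rb") as disk:                 # hypothetical SAN LUN
    disk.seek(100 * BLOCK_SIZE)                      # jump straight to block 100
    block = disk.read(BLOCK_SIZE)

# File I/O (NAS-style): the host names a file on a shared mount and the NFS or
# CIFS server resolves the metadata, so many nodes can share the same data.
with open("/mnt/nfs/results/run42.dat", "rb") as shared_file:  # hypothetical mount
    data = shared_file.read()
```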
Figure 1: The Panasas ActiveScale Storage Cluster eliminates bottlenecks by separating the metadata from the data path, using a parallel file system with object-based storage.
A New Storage Frontier

Object storage offers virtually unlimited growth in capacity and bandwidth, making it well-suited to the large data sets generated by Linux clusters. Unlike conventional storage systems, an object storage system manages data as large virtual objects. An object is a combination of application (file) data and storage attributes (metadata) that describe the data. Managing data as objects, as opposed to traditional storage blocks, means that files can be divided into separate pieces. As illustrated in Figure 1 (above), these objects are then distributed across storage media known as object-based storage devices (OSDs). So just as a Linux cluster spreads work evenly across compute nodes for parallel processing, the object-based storage architecture allows data to be spread across OSDs for parallel access. It is massively parallel processing on the front end, matched by massively parallel storage on the back end.
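As a rough illustration of that idea, the sketch below splits a file into fixed-size pieces and places them round-robin across a set of OSD stand-ins. The 64MB object size, the class names, and the layout record are assumptions made for illustration; they are not the Panasas data format.

```python
# Illustrative sketch only: simplified round-robin striping of a file across
# object storage devices (OSDs). Object size, class names, and the layout
# record are assumptions for illustration, not the Panasas on-disk format.

OBJECT_SIZE = 64 * 1024 * 1024  # hypothetical 64MB objects

class OSD:
    """Stand-in for one object-based storage device."""
    def __init__(self, osd_id):
        self.osd_id = osd_id
        self._objects = {}               # object_id -> bytes

    def put(self, object_id, data):
        self._objects[object_id] = data

    def get(self, object_id):
        return self._objects[object_id]

def stripe_file(path, osds):
    """Split a file into objects and spread them round-robin across the OSDs.

    Returns the file's layout (its metadata): an ordered list of
    (object_id, osd) pairs that a client later needs to read the pieces back.
    """
    layout = []
    with open(path, "rb") as f:
        index = 0
        while True:
            chunk = f.read(OBJECT_SIZE)
            if not chunk:
                break
            osd = osds[index % len(osds)]        # round-robin placement
            object_id = f"{path}:{index}"
            osd.put(object_id, chunk)
            layout.append((object_id, osd))
            index += 1
    return layout
```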
Figure 2: The Panasas Shelf represents the heart of a Panasas storage cluster system. The Shelf accepts the company’s blade storage and networking devices. Full Shelf units link together to create a storage cluster. Each Panasas Shelf comes with two redundant power supplies and a unit-wide battery backup module.
This architecture delivers substantial benefits. By separating the control path from the data path, file system and metadata management capabilities move to the nodes in the Linux cluster, giving the nodes direct access to the storage devices. As a result, OSDs autonomously serve data to end users and radically improve data throughput by creating parallel data paths. Instead of pumping all information through one path, which creates major bottlenecks as data sizes and node counts increase, Linux cluster nodes can securely read and write data objects in parallel to all OSDs in the storage cluster.
With object-based storage, the Linux compute cluster has parallel and direct access to all of the data spread across the OSDs within the shared storage. The large volume of data is therefore accessed in one simple step by the Linux cluster for computation. While the simulation and visualization data may still take weeks to process, the object model of storage drastically improves the volume and speed of data movement between the storage and compute clusters.
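The sketch below, which reuses the OSD stand-in from the previous example, illustrates that separation of control and data paths: one request to a metadata service, followed by concurrent reads straight from the OSDs. The MetadataService class and its get_layout call are hypothetical names, not the DirectFLOW protocol.

```python
# Illustrative sketch only: the control path (one metadata lookup) separated
# from the data path (concurrent object reads). MetadataService and get_layout
# are hypothetical names, not the actual DirectFLOW protocol.
from concurrent.futures import ThreadPoolExecutor

class MetadataService:
    """Stand-in for a metadata manager: it only knows where each object lives."""
    def __init__(self, layouts):
        self.layouts = layouts            # file path -> [(object_id, osd), ...]

    def get_layout(self, path):
        return self.layouts[path]

def read_file_parallel(path, metadata_service):
    """One metadata round trip, then direct, parallel reads from the OSDs."""
    layout = metadata_service.get_layout(path)                # control path
    with ThreadPoolExecutor(max_workers=max(len(layout), 1)) as pool:
        # data path: each object is fetched straight from the OSD holding it,
        # so no single server sits between the compute node and the data
        chunks = pool.map(lambda pair: pair[1].get(pair[0]), layout)
    return b"".join(chunks)                                   # reassemble the file
```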
Garth Gibson is co-founder and chief technology officer at Panasas Inc. Gibson did the groundwork research and co-wrote the seminal paper on RAID, now a checklist feature throughout storage industry products.
Object Architecture Helps ICME Rock

The Stanford University Institute for Computational and Mathematical Engineering (ICME) is a multidisciplinary organization designed to develop advanced numerical simulation methodologies that will facilitate the design of complex, large-scale systems in which turbulence plays a controlling role. Researchers at the Institute use HPC (high-performance computing) as part of a comprehensive research program to better predict flutter and limit cycle oscillations for aircraft, to better understand the impact of turbulent flow on jet engines, and to develop computational methodologies that can be expected to facilitate the design of naval systems with lower signature levels. Their research can lead to improvements in the performance and safety of aircraft, reduction of engine noise, and new underwater navigation systems.
Researchers at the Institute use their 164-node Nivation Linux cluster to tackle large-scale simulations. The Institute recently deployed the Rocks open source cluster software on several large-scale clusters, including Nivation, to support its work. Rocks provides a collection of integrated components that can be used to build, maintain, and operate a cluster. Its core functions include installing Linux, configuring the compute nodes, and monitoring and managing the cluster.
The Challenge

While Rocks helped the Institute maximize the compute power of Nivation, an existing storage system hindered overall system performance, often hanging, which limited the productivity of the cluster and its end users. “It’s imperative that our clusters be fully operational at all times,” said Steve Jones, Technology Operations Manager at the Institute. “The productivity of our organization is dependent upon each cluster running at peak optimization.”
Packaged as dense, redundant 5TB shelves, the Panasas NAS supports two modes of data access: a DirectFLOW out-of-band path that enables parallel access, and an NFS/CIFS in-band path that supports Unix NFS and Microsoft Windows protocols.

The Institute needed a solution that could scale in both capacity and the number of cluster nodes supported while providing high random I/O performance. “High performance is critical to our success at the Institute,” said Jones. “We needed a storage solution that would allow our cluster to maximize its CPU capabilities.” Additionally, Jones supports several clusters by himself, so ease of installation and management was essential. A critical goal was to quickly install a storage system and immediately move data.

The Solution

The Institute selected the Panasas ActiveScale Storage Cluster to eliminate the I/O bottleneck hindering overall performance of the Nivation cluster. Because it is an integrated hardware and distributed file system software solution, a 5TB Panasas Storage Cluster was set up and in production within a matter of hours. This was an important benefit for Steve Jones. “We don’t have a huge team and weeks of time to integrate discrete systems,” said Jones. “We do science here, not IT.”
To achieve high data bandwidth, the Institute installed the Panasas DirectFLOW client protocol on each node in the cluster. The direct node-to-disk access offered by DirectFLOW allowed the Institute to achieve an immediate order-of-magnitude improvement in performance. “By leveraging the object-based architecture, we were able to completely eliminate our storage I/O bottleneck,” said Jones. The Panasas out-of-band DirectFLOW data path moves file system capabilities to the Linux compute cluster, enabling direct and highly parallel data access between the nodes and the storage devices.
The Result

Since installing the Panasas Storage Cluster, the Institute has been able to maximize the productivity of the Nivation cluster, consistently running at 100% utilization, which allows the Institute to deliver accelerated results to its customers. “Our user community is dependent upon our clusters to deliver results as quickly as possible,” said Jones. “Previously, we had to limit the growth of our clusters because of I/O issues. With the object-based architecture, we are empowered to build the largest, fastest clusters possible. We now have a shared storage resource that can scale in both capacity and performance.”—G.G.
Contact Information
Panasas, Inc. Fremont, CA
Rocks Cluster Distribution
Stanford University Institute for Computational and Mathematical Engineering Stanford, CA