Preparing for AI-Driven Simulation
Organizing past and present data is the key to developing surrogate models.
July 29, 2024
Simulation is costly—not just in computing expense but also in time. As a result, engineering teams that rely on the practice tend to save their simulation results, partly to avoid unnecessary duplication and partly to reuse them as guidance for subsequent design evaluations. Most of them probably didn’t anticipate that one day they would be able to feed the data into an artificial intelligence (AI) program to develop surrogate models or reduced order models. That day has arrived, ushered in by purpose-built applications like Ansys SimAI.
Ansys launched SimAI in January 2024, describing it as “a physics-agnostic, software-as-a-service (SaaS) application that combines the predictive accuracy of Ansys simulation with the speed of generative AI.” In this article, we speak to Ansys and other experts to understand how you can take advantage of this new method, and whether your legacy data is reusable.
Reduced Order Model (or) Surrogate Model
“Reduced order modeling (ROM) and model order reduction (MOR) are techniques for reducing the computational complexity of a full-order, high-fidelity model while preserving the expected fidelity within a satisfactory error,” according to engineering calculation software maker MathWorks. The terms ROM and surrogate model are often used interchangeably. In AI-driven simulation, such models drastically reduce the time required to evaluate performance of new designs by bypassing the need to run full-scale physics-based simulation.
What Data Do You Need?
When most people think of simulation data, what comes to mind is the color-coded results of finite element analysis (FEA) or computational fluid dynamics (CFD), revealing pressure distribution, heat buildup and structural deformation in a rotatable 3D model. But AI training requires something else, and it’s not the gigabyte-size 3D models.
“You need the raw data,” says Ilya Tolchinsky, lead product manager for artificial intelligence/machine learning (AI/ML) at Ansys. “Essentially it’s a cloud of points in a 3D field, showing the input values and correlating output pressures, stresses and displacements.”
Bjorn Sjodin, senior vice president of product management, COMSOL, similarly points out, “Essentially it’s a spreadsheet, but it contains so much data, maybe millions of rows, so it’s usually stored in a more manageable format.”
This is good news. It means the data is not tied to the authoring software or to a particular vendor. Since simulation results can be exported in a raw data format, the results from any FEA or CFD program can, in theory, serve as the source for AI training.
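To make the “cloud of points” idea concrete, here is a minimal Python sketch of what such an export might look like once loaded. The column names, values and the split between inputs and outputs are invented for illustration; they do not come from Ansys, COMSOL or any specific solver.

```python
import csv
import io

# Hypothetical export: one row per mesh point, with inputs and outputs side by side.
# Column names and values are made up for illustration only.
RAW_EXPORT = """x,y,z,inlet_velocity,pressure,displacement
0.0,0.0,0.0,12.5,101325.0,0.0000
0.1,0.0,0.0,12.5,101410.2,0.0003
0.1,0.1,0.0,12.5,101388.7,0.0005
"""

INPUT_COLUMNS = ["x", "y", "z", "inlet_velocity"]
OUTPUT_COLUMNS = ["pressure", "displacement"]


def load_points(text):
    """Parse the exported table into a list of (inputs, outputs) pairs of floats."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        inputs = [float(record[name]) for name in INPUT_COLUMNS]
        outputs = [float(record[name]) for name in OUTPUT_COLUMNS]
        rows.append((inputs, outputs))
    return rows


points = load_points(RAW_EXPORT)
print(len(points), "points loaded")
```

Nothing in the table refers back to the program that produced it, which is why exports of this kind are portable across vendors. A real export would hold far more rows and would likely use a binary or columnar format rather than CSV.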
Context-Dependent Solvers
In May 2024, at its annual software user conference Siemens Realize Live, Siemens promised to integrate more AI-powered features into its cloud-hosted Xcelerator portfolio [https://tinyurl.com/3kc4ctx6]. Wouter Dehandschutter, director of technical product management, Siemens Digital Industries Software (Siemens), says, “To make use of old simulation data for ML, you need to give it context, to be able to tell what your data is about and what use cases it can support.”
This context serves as a diagnostic tool when the surrogate model begins to exhibit strange behaviors. “You might find out that the end user is using it outside the spectrum of the training data,” he points out.
“Think of the surrogate model as a dedicated solver—a very fast solver for a specific application, limited by what the training data represents,” says Tolchinsky. “It’s important to note that, the more the end usage deviates from the conditions of the training data set, the less reliable the model is. A surrogate model developed from a training dataset involving hydrofoils on racing boats, for example, should not be used to predict how another type of nautical device might behave in air and water. The original dataset does not represent the case under investigation.”
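One simple way to act on this warning is to record the range of each input seen during training and flag any query that falls outside it. The sketch below is a hypothetical illustration in Python, not a feature of SimAI or any other product; the parameter names and values are invented.

```python
def training_bounds(samples):
    """Return the (min, max) of every parameter across the training samples."""
    names = samples[0].keys()
    return {
        name: (min(s[name] for s in samples), max(s[name] for s in samples))
        for name in names
    }


def out_of_range(query, bounds):
    """List the parameters of a query that fall outside the training bounds."""
    return [
        name
        for name, (low, high) in bounds.items()
        if not low <= query[name] <= high
    ]


# Hypothetical training inputs for a hydrofoil study.
training = [
    {"speed_knots": 20.0, "angle_deg": 2.0},
    {"speed_knots": 35.0, "angle_deg": 4.0},
    {"speed_knots": 50.0, "angle_deg": 6.0},
]
bounds = training_bounds(training)

print(out_of_range({"speed_knots": 40.0, "angle_deg": 3.0}, bounds))  # []
print(out_of_range({"speed_knots": 80.0, "angle_deg": 3.0}, bounds))  # ['speed_knots']
```

A check like this is necessary but not sufficient. A query can sit inside every individual range and still describe a kind of geometry or physics the training set never covered, which is the situation in the hydrofoil example above.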
Dehandschutter believes the context of the ROM is tied to product lifecycle management (PLM) data, suggesting the two need to be integrated.
“PLM solutions help you keep your data in the context of the product’s lifecycle. They have a record of products being updated or obsoleted. And your ROM is developed with the data from a certain point in time of the product’s lifecycle. It’s important to prevent people from using an outdated ROM to evaluate new products,” he says.
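As a rough illustration of the kind of check this implies, the Python sketch below stores the product revision a ROM was trained on and compares it with the current revision on record. The record fields, product IDs and revision letters are all hypothetical, and a real PLM integration would query the PLM system rather than a dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RomRecord:
    """Context stored alongside a surrogate model (all fields are hypothetical)."""
    name: str
    product_id: str
    trained_on_revision: str
    use_case: str


def check_rom(rom, current_revisions):
    """Return a warning if the ROM does not match the current product revision, else None."""
    current = current_revisions.get(rom.product_id)
    if current is None:
        return f"{rom.name}: product {rom.product_id} not found in PLM records"
    if current != rom.trained_on_revision:
        return (
            f"{rom.name}: trained on revision {rom.trained_on_revision}, "
            f"but current revision is {current}"
        )
    return None


# Stand-in for a PLM lookup: product ID mapped to its current revision.
plm_revisions = {"WING-A": "C", "PUMP-7": "B"}

rom = RomRecord("wing_lift_rom", "WING-A", "B", "lift prediction at cruise")
warning = check_rom(rom, plm_revisions)
print(warning)
```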
Tolchinsky views surrogate models as an aspect of digital twins. He says, “SimAI is surrogate models for design, so there may not even be an associated [physical] product at certain points in time, and the surrogate model can be used for multiple products, such as a ROM for any wing on any aircraft. Ansys already has [product data management (PDM)] solutions, so it makes perfect sense to integrate it with Ansys SimAI.”
Legacy Data vs. Fresh Start
In using simulation data for AI training, two distinct groups face different challenges: those attempting to reuse legacy data, and those starting with a clean slate. For the first group, the challenge is that their simulation protocols may have changed.
“If you have already tweaked the way you configure your simulations, and if you no longer use the same parameters, then you may not want to reuse these old results for developing surrogate models,” Tolchinsky says. In that sense, “those starting off from a clean slate will have a much simpler task.”
Simpler but not necessarily easier. To develop surrogate models, users need to feed considerable data into the algorithm; the more, the better. A 60- or 70-year-old automotive company with robust simulation practices should have no problem producing a dataset for ML. However, small and midsize engineering firms are unlikely to have hundreds of simulations in their archives. This raises questions about the reliability of a model trained on so few samples.
“Having not enough data is indeed very common,” says Sjodin. “But in COMSOL Multiphysics software, we have some good techniques for handling this issue. We can let you mathematically generate more data, by adding variations in things like length or materials.”
Machine learning functions are integrated into the core COMSOL Multiphysics software, according to Sjodin. COMSOL also offers an Uncertainty Quantification Module. The module “provides a general interface for screening, sensitivity analysis, uncertainty propagation, and reliability analysis,” COMSOL writes.
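One generic way to create more training cases is a parameter sweep: list the values each design parameter may take and enumerate the combinations, each of which becomes a new simulation run. The Python sketch below shows a full-factorial sweep with invented parameter names and values. It illustrates the general idea only and is not a description of how COMSOL generates its data.

```python
import itertools

# Hypothetical design parameters and the values each one may take.
lengths_mm = [80.0, 100.0, 120.0]
materials = ["aluminum", "steel"]
thicknesses_mm = [2.0, 3.0]


def design_variations(**parameters):
    """Yield one dict per combination of the supplied parameter values."""
    names = list(parameters)
    for combo in itertools.product(*(parameters[name] for name in names)):
        yield dict(zip(names, combo))


variations = list(
    design_variations(
        length_mm=lengths_mm,
        material=materials,
        thickness_mm=thicknesses_mm,
    )
)
print(len(variations))  # 12
print(variations[0])
```

A full-factorial sweep grows quickly as parameters are added, so sampling schemes that pick a representative subset of combinations are a common alternative when each run is expensive.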
For those trying to reuse legacy data, some manual work may be required. After all, AI training is about using computational methods to identify the correlations between inputs and output results: for example, the relationship between the loads applied to a specific surface and the fluctuations in the safety factor. To do that, the ML program needs to know at least these two related value sets.
“We’ve made it as easy as possible for you to sort and tag values like these in our software once you’ve imported the raw data,” says Sjodin.
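To show why the two tagged value sets matter, here is a deliberately tiny Python sketch that pairs applied loads with safety factors and fits a straight line through them by least squares. The numbers are invented, and a straight line stands in for the neural networks and other ML methods that commercial tools actually use; the point is only that training needs the inputs and outputs lined up.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical legacy results: the tagged input (load) and output (safety factor).
loads_kn = [10.0, 20.0, 30.0, 40.0]
safety_factors = [4.0, 3.5, 3.0, 2.5]

slope, intercept = fit_line(loads_kn, safety_factors)


def predict_safety_factor(load_kn):
    """Evaluate the fitted line; only meaningful within the 10 to 40 kN training range."""
    return slope * load_kn + intercept


print(round(predict_safety_factor(25.0), 3))
```

If the loads and safety factors had been stored without labels, or in mismatched order, no fit of any kind would be possible, which is the manual sorting and tagging work the legacy-data group faces.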
ROM as Intellectual Property Protection
In intellectual property (IP) sensitive industries like automotive, aerospace and defense, manufacturers often closely guard their design and simulation data, worrying that they contain competitive advantages and proprietary secrets. This concern also hinders collaboration, stopping them from sharing certain models and files with consultants and suppliers. Here, quite surprisingly, ROM applications could be a safeguard.
“The ROM or surrogate model serves as a kind of encryption,” says Sjodin. “When you compress the simulation results through ML in a neural network, they go through an irreversible process. There’s no way for someone to reverse-engineer the original design from the surrogate model.”
Cloud or On Premises?
With AI or ML training, an old dilemma—a ghost from the storage era—crops up once again. Does the training occur in the cloud, or on premises? And if the former, what are the security implications?
ML doesn’t necessarily mean the data need to be uploaded to the cloud, Dehandschutter points out. “There’s a lot of variability in this landscape. It depends on how heavy the algorithm is and how much data you have, among others,” he says.
“The path we have chosen is to let the customers do the training on their own hardware,” Sjodin says. “We offer the [AI/ML] technology, but the customers can do the training on premise or using their own cloud resources.”
Most PLM vendors offer cloud-hosted data storage, and many leading simulation vendors offer on-demand cloud resources to accelerate FEA and CFD runs. But offering on-demand cloud storage and compute for AI/ML training is not yet the norm. Trusted market leaders with time-tested infrastructure can easily fill the need, and doing so is bound to be a new source of revenue for them. At the same time, new vendors may emerge to capitalize on this trend.
About the Author
Kenneth Wong is Digital Engineering’s resident blogger and senior editor. Email him at [email protected] or share your thoughts on this article at digitaleng.news/facebook.