Can AI Eat Itself?

Looks like synthetic training data is here to stay—it's something engineering organizations should keep a close eye on.

The use of artificial intelligence (AI) in engineering applications is expanding rapidly, as new vendors and software solutions offering AI-assisted CAD and simulation tools enter the market. While these tools may help optimize and streamline some engineering workflows, they should be approached with caution and due diligence.

We already know that some popular AI platforms occasionally hallucinate, but there are other issues as well. One concept I only recently learned about is model collapse. The large language models (LLMs) that power systems like ChatGPT need to ingest a lot of data. Right now, many of these LLMs are being trained on material pulled from large swaths of the Internet, although it is unclear exactly where all of that content comes from. A lot of that content, particularly text and images, is likely copyright protected, or at least not as clearly in the public domain as generative AI proponents would have us believe.

And as the Internet fills up with more and more AI-generated content, LLMs may very well wind up being trained on AI-created material. When this happens, the models lose their connection to real, human-created data. The more AI content the LLMs absorb, the less reliable they become. Eventually, according to some research reports, they just start spewing out complete nonsense.
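The feedback loop behind model collapse can be illustrated with a deliberately tiny toy. The sketch below is not an LLM and is not taken from any of the research mentioned above; it assumes the "model" is nothing more than a Gaussian (a mean and a standard deviation), and the function name `simulate_collapse` and its parameter values are hypothetical choices for illustration. Each generation is trained only on samples produced by the previous generation's model, with no fresh human-created data, so small estimation errors compound instead of being corrected.

```python
import random
import statistics


def simulate_collapse(generations: int = 500, n: int = 10, seed: int = 0) -> list[float]:
    """Toy model-collapse demo (hypothetical helper, not a real LLM).

    Generation 0 is the "real" data distribution, N(0, 1). Every later
    generation fits a Gaussian only to samples drawn from the previous
    generation's fitted Gaussian, i.e. it trains on its own output.
    Returns the fitted standard deviation for each generation.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # the original, human-generated distribution
    history = [sigma]
    for _ in range(generations):
        # "Synthetic" training data: sampled from the current model.
        samples = [rng.gauss(mu, sigma) for _ in range(n)]
        # Refit using maximum-likelihood estimates on that data alone.
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples, mu)
        history.append(sigma)
    return history


if __name__ == "__main__":
    spread = simulate_collapse()
    for gen in (0, 10, 50, 100, 500):
        print(f"generation {gen:>3}: fitted std = {spread[gen]:.3e}")
```

With small training sets, the fitted spread tends to shrink by many orders of magnitude over the run: each generation slightly underestimates the variety in its training data, and nothing from the original distribution is ever reintroduced to correct it. Mixing fresh samples from the original distribution back in at each generation is the toy analogue of keeping human-generated data in the loop.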

For the most popular and well-publicized uses of AI, like ChatGPT or AI image generation, we have already seen examples of this, and since these tools let people create AI-generated books, papers, and artwork that are then reingested into the LLMs, I expect things are going to get worse.

But it’s one thing to create a horrible, AI-generated research paper full of inaccuracies. In the engineering world, model collapse could lead to catastrophic machine failure or even death.

Luckily, most AI use cases in design and simulation build their models from limited sets of reliable data. However, those base data sets could still become corrupted with AI-generated models over time. That means real, human engineers will have a new role to play in policing these systems and making sure there is human-generated data to maintain the models.

So far, that’s what we are seeing. AI-based simulation and design systems are largely in co-pilot mode, helping engineers sort through possibilities and identify design flaws. No one is asking an AI solution to design a jet engine on its own, and that shouldn’t be the goal.

But synthetic training data is a reality, and one that engineering organizations should keep a close eye on.

About the Author

Brian Albright

Brian Albright is the editorial director of Digital Engineering. Contact him at [email protected].