Dawning of a New Era in Data Science
New product development now incorporates machine learning, but lack of data science experts persists.
Engineering Computing News
Engineering Computing Resources
May 3, 2021
The use of data science and machine learning (ML) continue to grow in product development. Manufacturers have found new ways to gather and use the vast amounts of data now available. Not only do new products benefit from the use of ML during R&D, some even have ML added as a new feature within the product.
When we last looked at data science for product development, we found that the hardware and software for conducting data science work were available, but there was a lack of trained personnel. In the time since that report, the situation has not changed much. Gartner calls the situation one of using “citizen data scientists,”—people without an academic background in computer science but who have been drafted into being the local data science expert.
“It would be good to combine the two,” says Seth DeLand, product marketing manager, data analytics at MathWorks. “There are many certification programs out there. An engineering-savvy person can pick up data science techniques quickly. The field is ‘mathy’ and similar to the way engineers think.”
Yet, DeLand thinks companies with large enough R&D budgets should still hire data science experts.
“They see the big picture. With both practical and academic experience, they can understand the end-to-end lifecycle pipeline,” DeLand adds.
Data Science and Autonomous Vehicles
Autonomous vehicle (AV) R&D relies on data science as it moves closer to the dream of truly driverless vehicles. AVs are also a use case where engineers need the insights of data science beyond development.
“If there is uncertainty around the vehicle, the related data must be analyzed,” says Leslie Nooteboom, chief product officer at Humanising Autonomy, an artificial intelligence (AI) R&D firm specializing in the interaction of people and machines. “In our models, we don’t simply provide a prediction of what people think is happening around the car. We also provide an accurate prediction of what will happen.”
Humanising Autonomy creates intelligent software that embeds into the sensors and the main “brain” of an AV. The software enables a pipeline for learning from real-world deployments. Data is analyzed on the vehicle—an example of edge computing—and from a central cloud-based location.
Nooteboom says this allows for real-time information informing decisions, and an overall data analytics model for the AVs based on continuous learning that comes from outside the vehicle.
In real-time operation inside the vehicle, the software from Humanising Autonomy divides unexpected external behavior into four categories: True Positive, False Positive, True Negative and False Negative.
“These are the situations where the software did not function as it should. This is the data required for data model performance,” Nooteboom says. The results update the vehicle’s deep learning decision process, and add to the knowledge base of all other vehicles and new vehicles that are not yet built.
A Challenge to the Business Model
This bifurcated use of data within and outside the vehicle raises an important business model issue. Automotive manufacturing is based on the purchase as a final transaction. Traditionally, there is no added value on the part of the manufacturer beyond supporting its warranty. AVs require near-constant updates of their deep learning model. Either the manufacturer will host the continuously updating model or they must assign it to a third party.
“The old model is ‘ship and forget,’” says Nooteboom. To succeed with AVs, automotive companies must become more like software companies, and offer more of a subscription model approach to sales and customer relations.
The autonomous vehicle AI data model must be able to cope with unforeseen circumstances. Nooteboom tells the story of a client getting unusual readings that the AI could not interpret, so it displayed them as “flying people without any body movement.”
The development team and the vendor figured out the data was from people using electronic scooters in traffic, a class of behavior that was previously unaccounted.
“So we created a new class of object, a mobility class that behaves differently from bicycles or walking,” Nooteboom explains.
Creating a New Product Feature
Data science and ML are changing product development in two ways, says DeLand. The first is use of real-world data to inform product design and engineering. The second is the integration of ML into products, which makes the ML model a new product feature.
DeLand cites a new initiative at Ford as an example of the first use. They are logging data from vehicles driving in the real world, and uploading it to computing clusters for analysis.
“To us, this is traditional data analysis,” DeLand notes. In this use case, Ford notes how equipment is used, and they use the data for simulations to see how a component behaves.
For the second change—integrating ML into products—DeLand cites use cases from two more automotive companies, BMW and Mahindra. BMW is using a ML model to understand when a vehicle is in an oversteer/drift situation.
“There are many variables to determine traction,” DeLand notes, “so they hired professional drivers to do drifts.”
Based on the accumulated data, BMW not only tested various drift scenarios in simulation, but engineers created a ML model that predicts oversteer, and made it part of an electronic control unit (ECU).
Mahindra has created a ML model to estimate road conditions. The model assesses vehicle data in real time and makes adjustments. An example would be a truck that has to leave the interstate highway and drive on a dirt road.
DeLand says MathWorks has invested in software to connect to data sources, which is often not found within the computer running MathWorks software.
“Event data is the key,” DeLand says. “These repositories of in-field data are often huge; petabytes of real-world data.”
An engineer doesn’t care about all the data, but seeks something specific.
“We help users understand how to write expressions to study all their data and get back the subset they need,” DeLand explains. “The analyses are complex. Getting only the subset they need is essential to keeping the compute manageable.”
But ML principles come out of university math and computer science departments, not engineering schools.
“So we build apps for training machine learning models,” says DeLand. MathWorks often recommends best practices and offers “guide rails” to their users to help apply ML to product development.
Simulation and Data Science
Simulation makes large data repositories available for product development teams. The rise of real-world data and ML results makes for a second category of data. DeLand of MathWorks says the new way to think about simulation is, “data science informs simulation, and simulation informs data science.”
MathWorks sees more customers taking time to extract “interesting” data from on-board machine learning apps, which they then use in a simulation.
“They are simulating on real-world data as part of the design process,” says DeLand. This is more than testing correct behavior, but is a simulation of what happens when something goes wrong.
“You can build failure prediction into the model,” he says. This cutting-edge application of real-world data is especially valuable in situations where failure data is either lacking or not gathered because it is too expensive, as in jet engine design.
Keeping Hardware Up to Data
Computer vendors were quick to provide workstations for data science as soon as NVIDIA announced its Data Science Workstation specification in 2019.
“We see machine learning and deep learning impacting every industry, not just product development,” says Matt Allard, an industry strategist for Dell Technologies. “We break down product design into distinct phases of design, validation and operations. We see artificial intelligence, machine learning and deep learning applied across all phases.”
Dell is also noticing use of data science in manufacturing operations.
“Manufacturing lines are increasingly rigged with IT devices generating piles of process control data,” Allard says. “Pure data science can be useful there to draw inferences from processes.”
This data is unstructured, so it needs AI to “tease out from the noisy data the information needed,” he says. “This requires heavy compute platforms.”
“This truly is the golden era of AI, the confluence of affordable hardware at every level,” adds Kyle Harper, data science workstation strategist for Dell Technologies. “The economics make sense for both the hardware and the algorithms. They were once only theoretical but are now practical.”
When NVIDIA defined a workstation-class data science computing platform, it brought a new level of democratization for the use of data science in product development. NVIDIA continues to support data science with its hardware and software. There is new work in embedding predictive techniques into products, notes Scott McClellan, senior director of the data science product group at NVIDIA.
“Engineering has been data-driven for a long time,” McClellan says. “What is new is applying more probabilistic techniques, instead of just rule-based or deterministic techniques.”
The difference between R&D testing and real-world performance is huge.
“It requires a rigorous end-to-end lifecycle regarding training, data sets and the models used for training,” adds McClellan, who notes that NVIDIA is seeing a rise in a three-step approach: data management; experiment management and model management.
More Dell Coverage
More MathWorks Coverage
More NVIDIA Coverage
About the Author
Randall S. Newton is principal analyst at Consilia Vektor, covering engineering technology. He has been part of the computer graphics industry in a variety of roles since 1985.Follow DE