Can AI Be Effective with Lean Data?

Can AI be effective with lean data? Yes, but it requires balancing data analysis with what’s practical from an operations perspective.

Over the last decade, we’ve seen an explosion of conversations around data. We always recognized the value of having information, and the easier it became to collect data, the more we collected. The prevailing belief was that with enough data, we could predict the future and achieve:

  • Reduced carbon emissions
  • Eliminated unplanned failures
  • Improved efficiency
  • Improved mechanical availability
  • Improved production capacity

However, our desire to leverage all the information available to us created a problem. On one hand, we see data as the key to the future: it helps us become more efficient and sustainable in our operations. On the other hand, when we consider how to bring the technology into overall improvement efforts, we run into concerns about big data, and those concerns become the lock on the gate to the future. Data suddenly finds itself in a dichotomy: it is both the key to the future and the barrier.

Industrials are rich in data but poor in information

Since 2016, we’ve spent $2T on Industry 4.0 investments. The average manufacturing facility generates immense amounts of data and continues to do so, but the challenge is that generating more data is not the same as creating more information. This dichotomy is central to why lean data matters: we always knew we wanted the data, but we weren’t sure how to use it.

Why do we struggle with data?

  • We knew we wanted it before we knew how to manage it.
  • We don’t know if we can trust the predictions or insights.
  • Managing large volumes of data from diverse sources can be overwhelming.
  • We worry that we don’t have enough data, it isn’t accurate, or we are missing data.

Take the example of handheld devices when they first entered the market twenty years ago. Data became easy to collect, so we started collecting a multitude of data points, such as vibration and pressure, thinking that all this data would reveal trends, signs of equipment degradation, insights for incident investigations, and other useful information.

However, we collected so much data that we became overwhelmed by it. Instead of being put to use to achieve excellent results, the data went into a black hole. This, in turn, taught people that if data doesn’t deliver any value, there’s no need to maintain it.

Can we trust the data?

Fast forward to today, and we face the same challenge. We don’t know if we can trust AI because we’re not sure we can trust our data. We don’t know whether it’s accurate, what’s missing, or which instrumentation is well maintained and which is not. We see this relationship with data repeatedly, and it is the pattern behind many of our barriers. Taking a lean approach to data means not becoming enamored with all the possibilities. Data analysis needs to be balanced with what’s practical from an operations perspective.

How can organizations get a handle on their data?

For the last couple of decades, we’ve effectively taught operators (our main custodians of data and the primary recipients of analysis) that there is no inherent value in much of the data being analyzed. Inevitably, data collection and analysis became a burden to them. To move forward with AI today, we must ask whether we can trust the data. The answer lies in creating a value proposition early.

Adding value by changing our relationship with data

Creating value isn’t so much about solving the hardest problems as it is about working toward small improvements. Start by asking people how they get value from their data. To an operator, value could mean saving their engineers 30-40% of their time, enabling them to focus on what they do best (i.e., troubleshooting and solving complex problems) rather than on data collection.

To effectively implement AI, start with small applications using the available data to build confidence in the value extracted from data streams. As we begin to gain insights from the data, we simultaneously build trust because we see AI at work, noting interesting findings and saving time. This changes the relationship with the data.  

In other words, when you interact with data, start by finding the opportunity to create value. We don’t mean value to management or supervisors, but value to the people responsible for collecting and maintaining the data. For instance, if a process engineer is using AI, make sure there’s a feedback loop with the operators who maintain the data. The value proposition must start with them. Then you can layer in more data as needed.

How do you know if you have enough data to get started?

From AI’s perspective, the more data, the better. So the question is less about whether you have enough and more about the following:

  • Quality: Does it tell the story accurately?
  • Quantity: Is it enough to tell a story?
  • Descriptiveness: Does it tell a story you care about?

The focus shouldn’t be on trying to solve the problem outright, but on giving engineers a fast track to solving it. When people talk about data-rich vs. data-lean environments, everyone wants high-value, rich data, but the conversation always comes back to sufficiency.

Assess data sufficiency by asking yourself the following questions:

  • Is the data used frequently? Do people get value from it now?
  • Does the data effectively change with the process?
  • Is it collected frequently enough to get many points over time?
  • Does it contain a lot of noise (poor calibration, poorly maintained sensors, high variability in collection, etc.)?
  • Can the data be supplemented with synthetic data (soft sensors, model data, etc.)?
  • Can the data be correlated to known operating conditions for model training?
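To make these questions more concrete, here is a minimal Python sketch that screens a single historian tag against several of them: how many usable points it has, how much is missing, how often it is sampled, how noisy it is, whether it moves with the process, and how well it correlates with a known operating condition. The function names, the step-to-step noise heuristic, and the thresholds in `looks_sufficient` are hypothetical placeholders chosen for illustration, not a standard method; real limits depend on the asset and the model you plan to build.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean, median, pstdev


@dataclass
class SufficiencyReport:
    n_points: int               # usable (non-missing) samples
    missing_fraction: float     # share of samples with no reading
    median_interval_s: float    # typical spacing between usable samples, in seconds
    noise_ratio: float          # std of step-to-step changes / overall std (about 1.4 for pure noise)
    flatlined: bool             # True if the tag never moves with the process
    corr_with_condition: float  # Pearson correlation with a known operating condition


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either series has no variation."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def assess_tag(timestamps, values, condition, flat_tol=1e-9):
    """Score one tag. timestamps are in seconds, values use None for missing
    readings, and condition holds a known operating value (e.g. load) per sample."""
    rows = [(t, v, c) for t, v, c in zip(timestamps, values, condition) if v is not None]
    missing = 1 - len(rows) / len(values) if values else 1.0
    if len(rows) < 3:
        # Too few points to say anything about spacing, noise, or correlation.
        return SufficiencyReport(len(rows), missing, float("inf"), 0.0, True, 0.0)
    ts, vs, cs = zip(*rows)
    interval = median([b - a for a, b in zip(ts, ts[1:])])
    spread = pstdev(vs)
    flat = spread < flat_tol
    steps = [b - a for a, b in zip(vs, vs[1:])]
    # Heuristic: a smooth process signal changes little from sample to sample
    # relative to its overall range; white noise gives a ratio near sqrt(2).
    noise = 0.0 if flat else pstdev(steps) / spread
    return SufficiencyReport(len(rows), missing, interval, noise, flat, pearson(vs, cs))


def looks_sufficient(r, min_points=500, max_missing=0.1, max_noise=1.0, min_corr=0.3):
    """Hypothetical screening thresholds; tune them per asset and use case."""
    return (r.n_points >= min_points
            and r.missing_fraction <= max_missing
            and not r.flatlined
            and r.noise_ratio <= max_noise
            and abs(r.corr_with_condition) >= min_corr)


# Usage (made-up data): a temperature tag that tracks load, sampled once a minute, with one gap.
ts = list(range(0, 600, 60))
load = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
temp = [100.0 + 0.5 * x for x in load]
temp[3] = None
report = assess_tag(ts, temp, load)
print(report)
print("sufficient:", looks_sufficient(report))
```

In this toy run the tag is clean, regularly sampled, and strongly tied to load, yet it still fails the screen simply because ten samples fall far short of the hypothetical 500-point minimum. That mirrors the point above: quality, quantity, and descriptiveness have to be judged together.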

Once operators and engineers start getting value from AI, data governance programs begin to take shape, because people start caring about data differently.


AI is not a standalone solution. Data needs to be pulled from different sources, including process instrumentation, the process historian, lab data, and maintenance logs. Because there are multiple sources for enriching the data pool, you shouldn’t limit yourself to the data you already have or worry about being unable to draw linear correlations. That’s where AI shines: it’s good at analyzing seemingly disparate data systems and pulling in data that is difficult to monitor. Moreover, by integrating experience into the data, AI can help address the loss of institutional knowledge and guide operators in the right direction, even when data is constrained.
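As a small illustration of combining sources, the Python sketch below pairs infrequent lab results with the most recent process historian reading taken at or before each sample (an "as-of" join), then fits a one-variable least-squares model that could act as a very simple soft sensor between lab samples. The data layout, the 15-minute staleness limit, the function names, and the numbers in the usage lines are all assumptions for this example; a real soft sensor would typically use several inputs and proper validation.

```python
from bisect import bisect_right


def asof_align(lab, historian, max_age_s=900):
    """Pair each lab result (t, y) with the latest historian reading (t, x)
    taken at or before it. historian must be sorted by time. Pairs whose
    reading is older than max_age_s seconds are dropped as stale."""
    hist_times = [t for t, _ in historian]
    pairs = []
    for t_lab, y in lab:
        i = bisect_right(hist_times, t_lab) - 1  # last reading at or before t_lab
        if i >= 0 and t_lab - hist_times[i] <= max_age_s:
            pairs.append((historian[i][1], y))
    return pairs


def fit_soft_sensor(pairs):
    """Ordinary least squares y = a + b*x; returns a function that estimates y from x."""
    if len(pairs) < 2:
        raise ValueError("need at least two aligned samples")
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("process variable never changes; cannot fit a slope")
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


# Usage (made-up data): historian reading every 5 minutes, lab samples a few times per shift.
historian = [(0, 80.0), (300, 82.0), (600, 84.0), (900, 86.0), (1200, 88.0)]
lab = [(310, 2.2), (920, 2.6), (1250, 2.8), (5000, 3.1)]  # last one has no recent reading
pairs = asof_align(lab, historian)
estimate = fit_soft_sensor(pairs)
print(pairs)
print(round(estimate(84.0), 3))
```

The staleness check is the important design choice here: pairing a lab result with a reading taken hours earlier would quietly teach the model the wrong relationship, so such samples are dropped rather than forced into the training set.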

To learn more about using lean data with AI, tune in to How to Succeed with AI using Lean Data, an on-demand webinar featuring our very own chief commercial officer, Kevin Smith and ARC Advisory Group's senior analyst Peter Reynolds.
