In the last article I mentioned the digital landfill, which is the process data historian where we save the overwhelming volume of data generated. Why are we saving this data? What do we plan to do with it?
An August 2022 article in the Wall Street Journal, “International Paper’s CFO on Leveraging Disruptive Finance,” interviewed Tim Nicholls, the company's CFO. In this article, Nicholls stated:
"Technology is helping us improve our business model, our processes, our customer interactions, and much more. Access to large amounts of data has increased our capabilities for decision-making at every level of the organization. At IP, we have built cross-functional teams focused on using advanced analytics to identify patterns and streamline our operations. Our teams take the opportunities presented to us through advanced analytics and find ways to create unique value, with the ultimate goal of bringing scalable solutions to the enterprise."
They are not alone in recognizing the value of data. Throughout industry, investments to turn data into gold proliferate.
The term for this application is Data Analytics.
As with many terms in industry today, there are differing interpretations. Many vendors sell solutions labeled as data analytics, yet they can have very different features and functions.
To flesh this out further, some data analytics solutions on the market fill a particular niche or deliver a subset of the full scope of data analytics. A full data analytics solution includes the following:
CONNECTIVITY
Data originates in many sources. Most commonly in industry, data is stored as time series in a historian. A single historian may consolidate data from several control systems or digital devices. In our industry, we also have quality control systems (QCS), which may be an independent data source. Lab data may end up in SQL tables, and some data may need to be loaded from Excel. A facility may have several of each of these sources, and an enterprise certainly will.
A data analytics solution must be able to connect to all of these data sources. Ideally, we want the data in real time so that we can see changes as they occur.
This is the meaning of the term Big Data as used here. Big Data refers to drawing on a variety of data sources, such as corporate research connecting to the data historians at each mill. This is enabled by the internet connectivity of the 4th industrial era. A data analytics solution needs connectivity to big data.
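As a rough illustration of what connecting two kinds of sources involves, the sketch below joins a historian export with lab results held in a SQL table. Everything specific in it is an assumption made for the example: the tag name `basis_weight`, the table `lab_results`, the sample values, and the use of an in-memory SQLite database in place of a real lab system. A commercial platform would use its own connectors.

```python
import csv
import io
import sqlite3
from bisect import bisect_right
from datetime import datetime

# Hypothetical historian export: timestamped readings for one tag.
HISTORIAN_CSV = """timestamp,basis_weight
2022-08-01T08:00:00,49.8
2022-08-01T08:00:10,50.1
2022-08-01T08:00:20,50.4
2022-08-01T08:00:30,50.2
"""

def load_historian(text):
    """Parse the CSV export into a time-sorted list of (datetime, float)."""
    rows = csv.DictReader(io.StringIO(text))
    data = [(datetime.fromisoformat(r["timestamp"]), float(r["basis_weight"]))
            for r in rows]
    return sorted(data)

def load_lab(conn):
    """Read lab results from a SQL table as (datetime, float) pairs."""
    cur = conn.execute(
        "SELECT sampled_at, moisture FROM lab_results ORDER BY sampled_at")
    return [(datetime.fromisoformat(ts), value) for ts, value in cur]

def align(lab, historian):
    """Pair each lab sample with the latest historian reading at or before it."""
    times = [t for t, _ in historian]
    joined = []
    for ts, lab_value in lab:
        i = bisect_right(times, ts) - 1
        if i >= 0:  # skip lab samples taken before the first reading
            joined.append((ts, lab_value, historian[i][1]))
    return joined

# In-memory SQLite stands in for the lab database (assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lab_results (sampled_at TEXT, moisture REAL)")
conn.executemany(
    "INSERT INTO lab_results VALUES (?, ?)",
    [("2022-08-01T08:00:15", 7.2), ("2022-08-01T08:00:30", 7.4)],
)

combined = align(load_lab(conn), load_historian(HISTORIAN_CSV))
for ts, moisture, basis_weight in combined:
    print(ts.isoformat(), moisture, basis_weight)
```

The point of the alignment step is that sources rarely share a sampling schedule: lab tests arrive every few hours, while the historian records every few seconds, so some rule for matching timestamps is needed before the data can be analyzed together.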
VISUALIZATION
It is hard to analyze data that you can’t see. The volume of data we generate in industry cannot be comprehended by looking at it point by point. The desktop tools that we commonly use are not meant to handle the volume of data we generate. An Excel worksheet has a maximum of 1,048,576 rows by 16,384 columns. If you stored 6 months of data for a single sensor at 1-second frequency, one reading per row, that would require 15,552,000 rows (180 days at 86,400 seconds per day), roughly 15 times what a worksheet can hold. Visualizing all of that data requires a full toolbox of techniques. We are all accustomed to seeing time series trends, but that alone is not sufficient for discerning what all of the data together is telling you. It requires tools that can organize and relate data in a presentation that reveals answers.
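The arithmetic above, and one common way of shrinking a long series so it can be drawn, can be sketched as follows. The 180-day figure assumes 30-day months. The min/max bucketing shown is just one simple reduction technique chosen for illustration, not a description of any particular product; the function name and the synthetic signal are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400
DAYS = 180                        # six months, assuming 30-day months
EXCEL_MAX_ROWS = 1_048_576

samples = DAYS * SECONDS_PER_DAY
print(samples)                    # 15552000
print(samples / EXCEL_MAX_ROWS)   # a little under 15 worksheets' worth

def minmax_downsample(values, buckets):
    """Reduce values to at most 2 * buckets points.

    Each bucket contributes its minimum and maximum, so short spikes
    survive the reduction. Order within a bucket is not preserved.
    """
    size = -(-len(values) // buckets)  # ceiling division
    out = []
    for start in range(0, len(values), size):
        chunk = values[start:start + size]
        out.extend((min(chunk), max(chunk)))
    return out

# Synthetic signal with a single spike that a plain average would hide.
signal = [i % 100 for i in range(10_000)]
signal[4321] = 500
reduced = minmax_downsample(signal, buckets=50)
print(len(reduced), max(reduced))
```

Keeping the extremes of each bucket, instead of averaging, matters for process data because the brief excursions are often exactly what the analyst is looking for.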
PRE-PROCESSING
Not all data is good data. To be useful, there need to be tools to identify and remove bad or irrelevant data. Doing this point by point is practically impossible. Data analytics solutions need a rich set of filtering and condition detection tools, and these tools ideally should allow data cleansing to be applied comprehensively, regardless of the timeframe of the data.
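A minimal sketch of rule-based cleansing is shown below. The three rules (drop readings taken while the machine is down, drop values outside a plausible range, drop a value that repeats too many times in a row as a likely frozen sensor) and all thresholds are assumptions chosen for illustration; real criteria depend on the process and the instrument.

```python
def clean(samples, low, high, max_repeat=3):
    """Filter a time-ordered list of (value, running) tuples.

    Keeps a value only if the machine was running, the value lies in
    [low, high], and it has not repeated more than max_repeat times in
    a row (repeats beyond the limit are treated as a frozen sensor).
    """
    kept = []
    last, run_length = None, 0
    for value, running in samples:
        # Track how many consecutive times the same value has appeared.
        run_length = run_length + 1 if value == last else 1
        last = value
        if not running:
            continue              # condition filter: machine down
        if not (low <= value <= high):
            continue              # range filter: implausible reading
        if run_length > max_repeat:
            continue              # flatline filter: sensor likely frozen
        kept.append(value)
    return kept

# Hypothetical readings: (value, machine_running)
raw = [
    (50.1, True), (50.3, True),
    (0.0, False), (0.0, False),          # machine down
    (49.9, True),
    (999.0, True),                       # out of range
    (50.2, True), (50.2, True), (50.2, True),
    (50.2, True), (50.2, True),          # fourth and fifth repeats dropped
    (50.4, True),
]

cleaned = clean(raw, low=40.0, high=60.0)
print(cleaned)
```

Because the rules are expressed as conditions, not as a list of individual points, the same function can be applied to an hour of data or to several years of it.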
ANALYSIS
Once we have pre-processed the data to yield a valid dataset, analysis tools can find correlations that yield answers. Prediction models can be built to show what is likely to happen under different conditions. Statistical techniques help assess whether an apparent relationship reflects genuine cause and effect or mere coincidence. There is a multitude of techniques for analyzing data. A data analytics solution should offer a big toolbox for analysis.
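Two of the simplest tools in that toolbox are a correlation coefficient and a straight-line prediction model. The sketch below computes both by hand using ordinary least squares. The variable names and numbers are invented for the example, and a strong correlation on its own does not establish cause and effect; it only flags a relationship worth investigating.

```python
from math import sqrt

def _sums(x, y):
    """Return means and centered sums of squares and cross products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return mx, my, sxy, sxx, syy

def pearson(x, y):
    """Pearson correlation coefficient, between -1 and 1."""
    _, _, sxy, sxx, syy = _sums(x, y)
    return sxy / sqrt(sxx * syy)

def fit_line(x, y):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    mx, my, sxy, sxx, _ = _sums(x, y)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical cleaned data: dryer steam pressure vs. sheet moisture.
steam_pressure = [1.0, 2.0, 3.0, 4.0, 5.0]
moisture = [9.1, 8.0, 7.2, 5.9, 5.0]

r = pearson(steam_pressure, moisture)
slope, intercept = fit_line(steam_pressure, moisture)
predicted = slope * 6.0 + intercept   # extrapolated moisture at pressure 6.0

print(round(r, 3), round(slope, 2), round(intercept, 2), round(predicted, 2))
```

A fitted line like this is only trustworthy within the range of conditions the data covers, which is one reason the pre-processing step and the choice of dataset matter as much as the model itself.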
COLLABORATION
The results of data analytics aren’t useful unless they can be shared with the right people. Often, it takes a team to conduct the pre-processing and analysis. Having a collaborative environment for doing the work and sharing the results ensures everyone has the current version, instead of hunting through emails and hard drives for a file. Having enterprise-wide visibility in a common environment means real-time information with minimal additional effort. Internet connectivity in the 4th industrial era facilitates this collaboration.
In the 4th industrial era, data analytics goes beyond what can be seen in a single control loop. It is a platform for integrating Big Data, meaning a variety of sources, with a rich toolset and collaboration, so that the digital landfill can yield nuggets of gold.