Revisiting the Black Box
One of the questions LNS Research often gets is how to explain Artificial Intelligence (AI) and Machine Learning (ML), not only to plant personnel but also to operating management, so that they understand them well enough to place a reasonable degree of trust in them. In other words, well enough to allow their adoption for predictive maintenance and other data science-based applications.
First, let’s look at some definitions of AI, ML, statistics and black box. While one can Google these and get a range of definitions, I decided to check with a number of major institutions known for their strength in mathematics, data science and technology: Caltech, Stanford, MIT, and the Alan Turing Institute, whose founding universities include Cambridge (Sir Isaac Newton’s alma mater; Newton and his contemporary, Gottfried Leibniz, stand among the giants of mathematics, as did Turing in the 20th century). Note that “statistics” is a term coined in Germany in 1749, about 60 years after Newton’s landmark publication, “Mathematical Principles of Natural Philosophy.” Here we will not try to tackle all of AI, of which ML is a part. Interestingly enough, the definitions vary in their nuances even among the academic experts, which leaves many areas open to lively debate. Nevertheless, here goes:
Statistics is the discipline that concerns the collection, organization, display, analysis, interpretation and presentation of data. Traditional statistics includes data collection, frequentist statistics and inferential statistics. Frequentist statistics views probability as the limit of the relative frequency of an event after many trials. Bayesian statistics differs in that it treats probability as an expression of a degree of belief in an event. That degree of belief may be based on prior knowledge about the event, such as the results of previous experiments, or on personal beliefs about the event.
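To make the frequentist versus Bayesian distinction concrete, here is a minimal sketch in Python. The scenario (a pump that failed in 3 of 20 inspections) and the prior belief of roughly a 10% failure rate are hypothetical numbers chosen only for illustration, and the Beta prior is one common way, not the only way, to encode a degree of belief.

```python
def frequentist_estimate(failures, trials):
    """Relative frequency of the event in the observed trials."""
    return failures / trials


def bayesian_posterior_mean(failures, trials, prior_alpha, prior_beta):
    """Posterior mean of a failure probability under a Beta prior.

    A Beta(prior_alpha, prior_beta) prior is updated with the observed
    failures and successes; the posterior is again a Beta distribution.
    """
    posterior_alpha = prior_alpha + failures
    posterior_beta = prior_beta + (trials - failures)
    return posterior_alpha / (posterior_alpha + posterior_beta)


# Hypothetical data: 3 failures observed in 20 inspections.
failures, trials = 3, 20

# Frequentist view: only the observed frequency counts.
freq = frequentist_estimate(failures, trials)

# Bayesian view: start from an assumed prior belief of about 10%
# (Beta(1, 9) has mean 0.10), then update it with the same data.
bayes = bayesian_posterior_mean(failures, trials, prior_alpha=1.0, prior_beta=9.0)

print(f"Frequentist estimate: {freq:.3f}")
print(f"Bayesian posterior mean: {bayes:.3f}")
```

With the same data, the Bayesian estimate is pulled toward the prior belief. As more inspections accumulate, the two estimates converge.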
The Alan Turing Institute says, “There is no accepted definition of artificial intelligence or ‘AI’ but the term is often used to describe when a machine or system performs tasks that would ordinarily require human (or other biological) brainpower to accomplish, such as making sense of spoken language, learning behaviors or solving problems. There are a wide range of such systems, but broadly speaking they consist of computers running algorithms, often drawing on data.”
Machine learning is the scientific study and application of algorithms that computer systems use to perform a specific task without explicit instructions (that is, without being coded for that task), relying on patterns and inference instead.
A black box is a device, system or object that can be viewed in terms of its inputs and outputs (or transfer characteristics), without any knowledge of its internal workings.
What About the Difference Between Statistics, AI and Machine Learning?
Here, I found the best explanation through my membership in the Society of Petroleum Engineers, in an article written in November 2019 by Dr. Shahab Mohaghegh, Professor of Petroleum and Natural Gas Engineering at West Virginia University. Dr. Mohaghegh explains as follows, with my comments in parentheses:
“AI and ML do not start with any predetermined models or equations. They do not start with any assumptions regarding the type of behavior that variables may have in order to correlate them to the target output (which is what statistics does). The characteristic of AI and ML is to discover patterns from the existing data. The strength of the open AI and ML algorithms has to do with their amazing capabilities to discover highly complex patterns within the large amounts of variables (something that we engineers, even with statistics, cannot easily do, if at all). The final outcome of the models that are developed by AI and ML algorithms usually cannot be summarized by one or by a few equations (i.e. our engineering desire to express physics relationships by a set of fundamental equations). That is why many engineers like to use the term ‘Black Box.’ From a mathematical point of view, the models that are developed using AI and ML include a series of matrices that generate the models’ outcomes. Therefore, nothing is opaque (black) about these models (boxes).”
Source: Dr. Mohaghegh’s article cites a lecture given by Dr. Yaser Said Abu-Mostafa, Professor of Electrical Engineering and Computer Science at the California Institute of Technology.
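The point that an ML model is, mathematically, “a series of matrices” can be illustrated with a small sketch. The weights below are made-up numbers standing in for what training would produce, and the two-input, one-output network is a hypothetical example rather than anything taken from Dr. Mohaghegh’s article. Every number that produces the output is visible and can be inspected.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def add(a, b):
    """Element-wise sum of two vectors of equal length."""
    return [x + y for x, y in zip(a, b)]


# Hypothetical "trained" parameters: two weight matrices and two bias vectors.
W1 = [[0.5, -0.2],
      [0.1, 0.8]]
b1 = [0.0, 0.0]
W2 = [[1.0, -1.0]]
b2 = [0.5]


def predict(x):
    """Forward pass: matrix multiply, nonlinearity, matrix multiply."""
    hidden = [math.tanh(h) for h in add(matvec(W1, x), b1)]
    return add(matvec(W2, hidden), b2)


# Two hypothetical, already-scaled sensor inputs.
risk_score = predict([1.0, 2.0])
print(risk_score)
```

Nothing in this model is hidden. What is hard is summarizing thousands of such weights in one or a few equations, which is the sense in which engineers reach for the term “black box.”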
So, is the black box really black, meaning entirely opaque? Perhaps not. A better analogy may be a ‘crystal ball’ whose exterior is transparent but which becomes dark and opaque as one looks deeper into its center.
But then this raises two more questions. First, does this mean AI and ML are superior to statistical approaches and fundamental physics equations? Not at all. In fact, the literature is full of cases where all three techniques are used together for problem solving. Thus, while AI and ML may get all the attention, practical solutions rely on the full toolset of applied mathematics and various types of models.
Second, does the data always explain everything? Of course not. In fact, depending on the data science approach, errors can arise from data handling, sampling and bias, any of which can lead to poor models. This is why we cannot just leave the data to data scientists, but instead should use collaborative teams of engineers (as well as other plant personnel), data scientists, and IT to tackle complex problems.
Where does this leave us? How should we regard AI and ML? Since ML is all about identifying complex patterns in data, it should be considered another tool to help engineers identify operational anomalies and predict future behavior, tasks that more traditional analysis approaches struggle with or cannot do. Thus, while AI and ML can be game-changers in enabling a step-change in performance, they are neither unknowable nor incomprehensible, and certainly not to be feared.
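As a reference point for what “identifying operational anomalies” looks like at its simplest, here is a sketch of a traditional statistical baseline: flag any reading more than three standard deviations from a known-good period. The vibration readings, the baseline length and the threshold are all hypothetical. This is deliberately not an ML model; it is the kind of single-variable check that ML methods extend to many interacting variables.

```python
from statistics import mean, stdev


def flag_anomalies(readings, baseline_size, threshold=3.0):
    """Return indices of readings that deviate strongly from the baseline.

    The first `baseline_size` readings are assumed to represent normal
    operation; later readings are flagged when they lie more than
    `threshold` sample standard deviations from the baseline mean.
    """
    baseline = readings[:baseline_size]
    mu = mean(baseline)
    sigma = stdev(baseline)
    return [
        i
        for i, value in enumerate(readings)
        if i >= baseline_size and abs(value - mu) > threshold * sigma
    ]


# Hypothetical vibration readings (mm/s); the ninth value is a spike.
vibration = [2.0, 2.1, 1.9, 2.0, 2.2, 1.8, 2.0, 2.1, 5.0, 2.0]
anomalies = flag_anomalies(vibration, baseline_size=8)
print(anomalies)
```

A check like this is fully transparent but looks at one variable at a time. The case for ML is strongest where the anomaly only shows up in the combined pattern of many variables.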