deeper into the math.
But now apply the idea of conditional probability, and you have the basis of how computers translate language in real time. Which word or phrase in English, for example, is most likely to match a word or phrase in German, based on a vast data set of translated texts and given what you have written so far? Such a translation system is far from perfect, because it has no idea what the words mean, but it is getting better all the time.
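Here is a toy Python sketch of that idea. The six “translated” examples are invented, and real systems are far more elaborate. The point is only that the best guess for the English word “bank” depends on the word before it, because the program picks the German word with the highest conditional probability given that context.

```python
from collections import Counter, defaultdict

# Hypothetical aligned examples: (previous English word, English word, German word).
# A real system learns from millions of translated sentences; these rows are invented.
ALIGNED = [
    ("river", "bank", "Ufer"),
    ("river", "bank", "Ufer"),
    ("savings", "bank", "Bank"),
    ("the", "bank", "Bank"),
    ("the", "bank", "Bank"),
    ("the", "bank", "Ufer"),
]

def build_table(aligned):
    """Count how often each German word translated an English word in a given context."""
    table = defaultdict(Counter)
    for context, english, german in aligned:
        table[(context, english)][german] += 1
    return table

def translate(table, context, english):
    """Return (most likely German word, P(German word | English word, context)), or None if unseen."""
    counts = table.get((context, english))
    if not counts:
        return None
    german, count = counts.most_common(1)[0]
    return german, count / sum(counts.values())

table = build_table(ALIGNED)
print(translate(table, "river", "bank"))  # ('Ufer', 1.0)
print(translate(table, "the", "bank"))    # Bank, with probability 2/3
```

With no context to go on, the program can only return its best statistical guess, which is exactly why such systems still make mistakes.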
Parameters are just numbers in a prediction rule that you can adjust to obtain the best result. An example is the 220 in the maximum heart rate (MHR) equation: MHR = 220 – Age. That number can shift a bit depending on the data set used to derive it. Adding parameters can improve a model’s accuracy, up to a point: MHR = 208 – 0.7 × Age, which tunes two numbers (208 and 0.7) instead of one, is actually a better prediction rule. AI can handle many, many parameters, and they are essential for image recognition and identification. One model developed to distinguish between digitized images of two dog breeds has nearly 390,000 parameters.[6] No, you don’t have to program or enter them from scratch. The training process sets and adjusts them automatically, based on the data sets used to train the model.
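The sketch below compares the two heart-rate rules and then lets the computer set the parameters itself, using ordinary least squares (one common fitting method, chosen here for simplicity). The age and heart-rate figures are invented and were generated from the 208 – 0.7 × Age rule, so the fit should recover roughly 208 and 0.7. Large image models adjust their many parameters with iterative methods such as gradient descent rather than this one-step formula.

```python
def mhr_classic(age):
    # One tuned parameter: the intercept 220 (the slope is fixed at 1).
    return 220 - age

def mhr_tanaka(age):
    # Two tuned parameters: intercept 208 and slope 0.7.
    return 208 - 0.7 * age

def fit_line(xs, ys):
    """Ordinary least squares: pick intercept a and slope b minimizing squared error of y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical measurements: ages and maximum heart rates (invented for illustration).
ages = [20, 30, 40, 50, 60]
rates = [194, 187, 180, 173, 166]

a, b = fit_line(ages, rates)
print(mhr_classic(60), mhr_tanaka(60))  # the two rules disagree by about 6 beats at age 60
print(a, b)                             # parameters found from the data: about 208 and -0.7
```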
Bayes’s rule (also called Bayes’s law or theorem, or “Bayesian inference”) simply means that you update your current state of knowledge as more data becomes available. Bayes’s rule is derived from conditional probability, and the basic idea is easy: prior probability + new facts = revised probability. The formula takes the prior probability that something is the case (a percentage), multiplies it by the probability of seeing the new data if that something is true (another percentage), and divides the result by the overall probability of seeing the new data at all (a final percentage). (The actual formula is in this footnote.[7]) Thomas Bayes, by the way, is another oldie: a mathematically gifted English Presbyterian minister who lived and worked in the 1700s.
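A minimal Python sketch of the formula described above. The fraud numbers are hypothetical: assume 1 percent of transactions are fraudulent, a screening tool flags 90 percent of fraudulent ones, and it wrongly flags 5 percent of legitimate ones. The denominator, the overall chance of a flag, adds up the true alarms and the false alarms.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return the revised probability of a hypothesis after seeing the evidence (Bayes's rule)."""
    # Overall probability of seeing the evidence at all: true alarms plus false alarms.
    p_evidence = p_evidence_if_true * prior + p_evidence_if_false * (1 - prior)
    # Prior times the accuracy of the new data, divided by the overall chance of that data.
    return p_evidence_if_true * prior / p_evidence

# Hypothetical fraud screen (numbers invented for illustration).
posterior = bayes_update(prior=0.01, p_evidence_if_true=0.90, p_evidence_if_false=0.05)
print(round(posterior, 3))  # 0.154
```

Even with a fairly accurate screen, a single flag lifts the probability of fraud only from 1 percent to about 15 percent, because false alarms on the many legitimate transactions outnumber true ones. Passing that 15 percent back in as the new prior when a second flag arrives (assuming it is independent of the first) is exactly the “update as more data becomes available” step.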
A good applied example of Bayes’s rule is in self-driving cars. It is crucial for autonomous vehicles, drones, robots, and other devices to “know” where they are at any given moment, taking into account all the incoming sensor and location data. Each new sensor reading is used to revise the previous estimate of the vehicle’s position. The vehicle’s AI system must constantly calculate whether to adjust its position relative to other vehicles and potential obstacles (such as pedestrians), and perhaps even take evasive action.
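A toy version of that idea is the discrete Bayes filter, sketched below under heavy simplifying assumptions: a hypothetical circular track of five cells, each either a “door” or a “wall,” a sensor that reads correctly 80 percent of the time, and perfectly reliable movement. Each sensor reading applies Bayes’s rule to every possible position; real vehicles typically use more elaborate relatives of this method, such as Kalman or particle filters.

```python
def normalize(belief):
    """Rescale so the probabilities over all positions add up to 1."""
    total = sum(belief)
    return [b / total for b in belief]

def sense(belief, world, reading, p_hit=0.8, p_miss=0.2):
    """Measurement update: Bayes's rule applied to every possible position."""
    updated = [b * (p_hit if cell == reading else p_miss) for b, cell in zip(belief, world)]
    return normalize(updated)

def move(belief, steps):
    """Motion update: shift the belief (this toy assumes perfect movement on a circular track)."""
    n = len(belief)
    return [belief[(i - steps) % n] for i in range(n)]

# Hypothetical five-cell track with two kinds of landmark.
world = ["door", "door", "wall", "wall", "wall"]
belief = [0.2] * 5                     # start: no idea where we are
belief = sense(belief, world, "door")  # sensor sees a door
belief = move(belief, 1)               # vehicle moves one cell to the right
belief = sense(belief, world, "wall")  # sensor now sees a wall
print([round(b, 3) for b in belief])   # belief now peaks at cell 2 (about 55 percent)
```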
A vector is just a set of numbers. Think of it as a horizontal row in a table with many columns, in which all the numbers describe the item named in the row’s first cell. AI uses vectors to process language. AI “understands” human language by associating each word with others through numbers ranging from 0 to 1. Think of each number as the percentage chance that one word will appear near another. For example, “happy,” generally speaking, has a low association (close to 0) with “new,” but when typed at the end of December or the start of January, “happy” followed by “new” has an association much closer to 1 with the next word, “year.” Happy New Year. Words and their “semantic closeness,” expressed in numbers and compiled as “word co-location statistics,” power AI speech and language tools: Alexa, Siri, Cortana, Google voice, and chatbots.[8]
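The Python sketch below builds one such row of numbers from a tiny invented corpus. As a simplification, it counts only the word that immediately follows “happy” rather than every word that appears nearby; each number is the share of times that word came next, so the row always lies between 0 and 1 and sums to 1.

```python
from collections import Counter

# Tiny invented corpus standing in for a vast collection of real text.
corpus = [
    "happy new year",
    "happy birthday to you",
    "happy new year to all",
    "a happy dog",
]

def next_word_vector(corpus, word):
    """Return {next_word: P(next_word | word)}: a row of numbers between 0 and 1."""
    follows = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        for current, nxt in zip(tokens, tokens[1:]):
            if current == word:
                follows[nxt] += 1
    total = sum(follows.values())
    return {w: count / total for w, count in follows.items()} if total else {}

print(next_word_vector(corpus, "happy"))  # {'new': 0.5, 'birthday': 0.25, 'dog': 0.25}
print(next_word_vector(corpus, "new"))    # {'year': 1.0}
```

Modern systems usually learn denser vectors from far more text, but the principle of turning word associations into numbers is the same.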
Variability is crucial for AI to identify statistical anomalies in data sets. Banks and credit card companies use it for fraud detection, and supply chain managers, forensic accountants, equipment maintenance managers, and sports teams apply it to data to predict when things may start to go wrong before a full breakdown occurs. The square-root rule specifies how the variability of an average shrinks as the sample size grows: you take the variability (standard deviation) of a single measurement and divide it by the square root of the sample size. Formula One teams use this to check streams of data from their cars’ engines, tires, brakes, etc., for signs of impending failure. “Smart cities” use it to monitor, and target inspections for, many kinds of problems, from gas leaks to illegal subdivision of apartments.[9]
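A minimal Python sketch of the square-root rule used for anomaly detection. The brake-temperature figures are hypothetical: normal readings average 100 degrees with a single-reading standard deviation of 2 degrees, and the three-standard-error alarm threshold is a common rule of thumb, not a standard taken from any particular team.

```python
import math
from statistics import mean

def standard_error(sd_single, n):
    """Square-root rule: variability of an average of n readings."""
    return sd_single / math.sqrt(n)

def is_anomalous(readings, expected_mean, sd_single, threshold=3.0):
    """Flag a batch whose average sits more than `threshold` standard errors from normal."""
    se = standard_error(sd_single, len(readings))
    z = abs(mean(readings) - expected_mean) / se
    return z > threshold

# Hypothetical brake temperatures: sixteen readings averaging about 101.8 degrees.
batch = [101.0, 102.6] * 8
print(standard_error(2.0, 16))                # 0.5
print(is_anomalous([101.8], 100.0, 2.0))      # False: one high reading looks like noise
print(is_anomalous(batch, 100.0, 2.0))        # True: the average of sixteen is flagged
```

A single reading 1.8 degrees high is ordinary noise, but sixteen readings averaging that high stand out, because the variability of their average is only 2 ÷ √16 = 0.5 degrees.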
The constant threat of fraud in millions of electronic payment transactions demands the best tools for cost-effective, automated oversight. AI applications are cutting-edge, but they are built on old math. The square-root rule was discovered by Abraham de Moivre, a French mathematician living in London, in 1718.
See how there’s little that’s new in the math that underlies AI?! This chapter does not get into calculus (including derivatives) or linear algebra, but calculus is even older, stemming from the 1600s, and linear algebra has roots in the same era. Without them, the algorithms you will meet next would be unthinkable.
The Software of AI
Now for the software—the coded algorithms that perform the math.
You have probably heard the terms machine learning, predictive analytics, deep learning, and neural networks, which refer to groups of algorithms in code. We’re going to go out on a limb here: In artificial intelligence, all four are pretty much the same thing.[10] Okay, “deep learning” algorithms are usually associated with image classification and voice transcription, but they use “neural networks” just like the others can. All four use the prediction rules or models discussed above. All four involve mathematical, statistical algorithms working on data. There’s no need to parse technical jargon flaunted by marketers.
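To see why the labels matter less than the marketing suggests, here is a single artificial “neuron” (the classic perceptron) in Python. It is nothing but a prediction rule with two weights and a bias, all parameters in the sense discussed above, which a simple loop nudges whenever the rule gets a training example wrong. The AND-gate data is a textbook toy, not taken from this book.

```python
def predict(weights, bias, x):
    """The prediction rule: output 1 if the weighted sum plus bias exceeds zero."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

def train_perceptron(data, epochs=10, lr=1):
    """Nudge the weights and bias whenever the rule gets an example wrong."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for x, target in data:
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

# Toy data: output 1 only when both inputs are 1 (logical AND).
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(data)
print([predict(weights, bias, x) for x, _ in data])  # [0, 0, 0, 1]
```

Networks used in practice connect many such units in layers and tune them with calculus-based methods, but the principle of adjusting numbers to reduce errors on data is the same.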
And, just to reiterate, machines cannot “learn” the way a human does; hardware and software are nothing like neurons; and “deep” can mean anything. Computers do not have self-awareness, independent consciousness, feelings, or even thoughts. AI software is a set of data-analysis tools. All code is prone to bugs, and all computer systems crash from time to time. AI is down-to-earth.
One final observation before we get to term definitions: Software is a lot like the law. Both are grounded in reason. Our point is that business software and the law are not natural enemies. Software and the law both classify and perform procedures: the former with numbers, the latter with words.
Indulge us in a shallow dive into distant history: One and the same person, Gottfried Wilhelm Leibniz (1646–1716), developed calculus at about the same time as Isaac Newton, devised mechanical calculators, and created the binary number system that computers use today, whereby all numbers are expressed in 1s and 0s. Leibniz also developed a rational, legal “machine,” a code, for classifying disputes (input data) and generating rulings (classification outputs). For him, math and the law complemented each other.
Okay. We’re ready for the software.
Data captured in software must be accurate, reliable, and correctly classified for any of the above procedures to produce useful results. Computer data can come from many sources: keyboard entries, audio recordings, visual images, sonar readings, GPS, document files, spreadsheets, etc. The devices gathering sensory inputs must themselves be of high quality for the sake of accuracy. All the data is reduced to series of numbers and is normally stored in tables with many rows and columns. If the data is flawed in any way, the procedures will be risky at best. If the procedures are inaccurate, your business may make poor decisions and take the wrong actions.
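A basic safeguard is to check data before any procedure runs on it. This Python sketch uses an invented sensor table and an assumed plausible range of -50 to 60 degrees Celsius; it separates usable rows from missing or implausible ones instead of letting them flow silently into the analysis.

```python
# Hypothetical sensor table: each row is (sensor_id, temperature in degrees Celsius).
rows = [
    ("s1", 21.5),
    ("s2", None),   # missing reading
    ("s3", 19.8),
    ("s4", 999.0),  # physically implausible: likely a faulty device
]

def validate(rows, low=-50.0, high=60.0):
    """Split rows into usable data and flagged problems before any analysis runs."""
    clean, problems = [], []
    for sensor_id, value in rows:
        if value is None:
            problems.append((sensor_id, "missing"))
        elif not (low <= value <= high):
            problems.append((sensor_id, "out of range"))
        else:
            clean.append((sensor_id, value))
    return clean, problems

clean, problems = validate(rows)
print(problems)  # [('s2', 'missing'), ('s4', 'out of range')]
```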
As we said before, “Garbage in, garbage out” is as true today as it ever was. An AI firm proclaiming that its products or services can take data of any kind and quality and turn it into perfect predictions and decisions is peddling medieval alchemy. For many centuries, people tried in vain to turn common metals such as lead into gold. Even a genius like Isaac Newton poured many hours into that total waste of time and energy.
“Garbage in, garbage out” is true of the law as well. If the evidence is faulty or fake, the results of the court proceedings are going to be skewed, distorted, or just dead wrong. The goal is to reach correct determinations that benefit stakeholders. Data science and the law are compatible, indispensable tools in the struggle to make the right move and do the right thing.
Structured data refers to data stored in tables of rows and columns and formatted in a database for queries and analysis. Queries retrieve data, update it, insert more, delete some, and so on. SQL (Structured Query Language), developed in the 1970s and standardized in the 1980s, is still widely used, including by Microsoft and Amazon Web Services (AWS) and dozens of other vendors. Structured data has to be clean, complete, and accurate.
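Python’s built-in sqlite3 module is an easy way to try the four basic kinds of query mentioned above. The transactions table, its columns, and its values are hypothetical, and the database lives only in memory.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table: one row per card transaction.
cur.execute("CREATE TABLE transactions (id INTEGER PRIMARY KEY, card TEXT, amount REAL)")

# Insert: add rows (the ? placeholders keep values separate from the SQL text).
cur.executemany(
    "INSERT INTO transactions (card, amount) VALUES (?, ?)",
    [("A", 25.0), ("A", 40.0), ("B", 1200.0)],
)

# Update: correct a mistyped amount on the second row.
cur.execute("UPDATE transactions SET amount = ? WHERE id = ?", (45.0, 2))

# Delete: remove card B's rows.
cur.execute("DELETE FROM transactions WHERE card = ?", ("B",))

# Retrieve: total spending per card.
totals = cur.execute(
    "SELECT card, SUM(amount) FROM transactions GROUP BY card"
).fetchall()
print(totals)  # [('A', 70.0)]

conn.commit()
conn.close()
```

If the amount on the second row had not been corrected, every later query would have silently reported the wrong total, which is the “garbage in, garbage out” problem in miniature.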
Let’s use the 80/20 rule here: 80 percent of the time spent on your AI projects will be spent in data preparation. (More on that in chapter 5.)
AI guru Andrew Ng says