Until a few years ago it was generally believed that humans understood what their algorithms were doing and how they were doing it. Like Lovelace, programmers believed you couldn’t really get more out than you put in. But then a new sort of algorithm began to emerge, an algorithm that could adapt and change as it interacted with its data. After a while its programmer might no longer understand quite why it was making the choices it was. These programs were starting to produce surprises, and for once you could get more out than you put in. They were beginning to be more creative. These were the algorithms DeepMind exploited in its crushing of humanity in the game of Go. They ushered in the new age of machine learning.
Machines take me by surprise with great frequency.
Alan Turing
I first met Demis Hassabis a few years before his great Go triumph, at a meeting about the future of innovation. New companies were on the lookout for investment from venture capitalists and investors. Some were going to transform the future, but most would crash and burn. The art for the VCs and angel investors was to spot the winners. I must admit that when I heard Hassabis speak about code that could learn, adapt and improve, I dismissed him out of hand. I couldn’t see how, if you were programming a computer to play a game, the program could get any further than the person who was writing the code. How could you get more out than you were putting in? I wasn’t the only one. Hassabis admits that getting investors to give money to AI a decade ago was extremely difficult.
How I wish now that I’d backed that horse as it came trotting by! The transformative impact of the ideas Hassabis was proposing can be judged by the title of a recent session on AI: ‘Is machine learning the new 42?’ (The allusion to Douglas Adams’s answer to the question of life, the universe and everything from his book The Hitchhiker’s Guide to the Galaxy would have been familiar to the geeky attendees, many of whom were brought up on a diet of sci-fi.) So what has happened to spark this new AI revolution?
The simple answer is data. It is an extraordinary fact that 90 per cent of the world’s data has been created in the last five years. One exabyte (10¹⁸ bytes) of data is created on the internet every day, roughly the equivalent of the amount of data that can be stored on 250 million DVDs. Humankind now produces in two days the same amount of data it took us from the dawn of civilisation until 2003 to generate.
This flood of data is the main catalyst for the new age of machine learning. Before now there just wasn’t enough of an environment for an algorithm to roam around in and learn. It was like having a child and denying it sensory input. We know that children who have been trapped indoors fail to develop language and other basic skills. Their brains may have been primed to learn but didn’t encounter enough stimulus or experience to develop properly.
The importance of data to this new revolution has led many to speak of data as the new oil. If you have access to data you are straddling the twenty-first century’s oilfields. This is why the likes of Facebook, Twitter, Google and Amazon are sitting pretty – we are giving them our reserves for free. Well, not exactly for free, as we are exchanging our data for the services they provide. When I drive in my car using Waze, I have chosen to exchange data about my location in return for the most efficient route to my destination. The trouble is, many people are not aware of these transactions and give up valuable data for little in return.
At the heart of machine learning is the idea that an algorithm can be created that will find new questions to ask whenever it gets something wrong. It learns from its mistakes. Each mistake tweaks the algorithm’s equations so that next time it will act differently and won’t slip up in the same way. This is why access to data is so important: the more examples these smart algorithms can train on, the more experienced they become, and the more each tweak refines them. Programmers are essentially creating a meta-algorithm which creates new algorithms based on the data it encounters.
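As a toy illustration of this learn-from-your-mistakes loop, here is a sketch in Python (the data, starting weight and learning rate are all invented for the example): a one-parameter model guesses y from x and nudges its weight after each error.

```python
# Invented data where the hidden relationship is y = 2 * x.
examples = [(1, 2), (2, 4), (3, 6)]

weight = 0.5  # the algorithm's initial guess at the relationship
for x, y in examples:
    prediction = weight * x
    error = y - prediction
    weight += 0.1 * error * x  # tweak the equation after each mistake
    print(f"x={x}: predicted {prediction:.2f}, wanted {y}, new weight {weight:.2f}")
```

After just three examples the weight has drifted from 0.5 towards the true value of 2: in a modest way, the program has learned something its programmer never typed in.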
People in the field of AI have been shocked at the effectiveness of this new approach. Partly this is because the underlying technology is not that new. These algorithms are created by building up layers of questions that can help reach a conclusion. These layers are sometimes called neural networks because they mimic the way the human brain works. If you think about the structure of the brain, neurons are connected to other neurons by synapses. A collection of neurons might fire due to an input of data from our senses. (The smell of freshly baked bread.) Secondary neurons will then fire, provided certain thresholds are passed. (The decision to eat the bread.) A secondary neuron might fire if ten connected neurons are firing due to the input data, for instance, but not if fewer are firing. The trigger might also depend on the strength of the incoming signals from the other neurons.
As early as the 1950s, computer scientists created an artificial version of this process, which they called the perceptron. The idea is that a neuron is like a logic gate that receives input and then, depending on a calculation, decides either to fire or not.
Let’s imagine that the perceptron receives three input numbers and weights the importance of each of them. Suppose, for example, that x1 is three times as important as x2 and x3. The perceptron would calculate 3x1 + x2 + x3 and then, depending on whether this total fell above or below a certain threshold, it would fire an output or not. Machine learning hinges on reweighting the inputs when the perceptron gets the answer wrong. For example, perhaps x3 turns out to matter more than x2 in making a decision, so you might change the equation to 3x1 + x2 + 2x3. Or perhaps we simply need to dial the threshold up or down, making the perceptron harder or easier to fire. We can also create a perceptron whose output is proportional to how far the weighted sum has passed the threshold, so that the output becomes a measure of its confidence in its assessment of the data.
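A minimal sketch of such a perceptron in Python may help (the inputs, weights and threshold below are illustrative numbers, not taken from any real system):

```python
def perceptron(inputs, weights, threshold):
    """Fire (return 1) if the weighted sum of the inputs
    reaches the threshold; stay silent (return 0) otherwise."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

def graded_perceptron(inputs, weights, threshold):
    """Variant whose output grows with how far the weighted sum
    passes the threshold, giving a crude confidence score."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return max(0, total - threshold)

# The weighting from the text: 3*x1 + x2 + x3, with x1 three times
# as important as the other two inputs.
print(perceptron([2, 1, 1], [3, 1, 1], threshold=5))         # 8 >= 5, so it fires: 1
print(graded_perceptron([2, 1, 1], [3, 1, 1], threshold=5))  # fires with confidence 3
```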
Let’s cook up a perceptron to decide whether you are going to go out tonight. It will depend on three things: (1) is there anything good on TV; (2) are your friends going out; (3) what night of the week is it? Give each of these variables a score between 0 and 10 to indicate your level of preference; Monday might score 1 while Friday scores 10. Depending on your personal proclivities, some of these variables will count more than others. Perhaps you are a bit of a couch potato, so anything vaguely decent on TV will keep you in. This would mean the weighting on x1 comes out strongly negative, dragging the total down whenever the TV score is high. The art of this equation is tuning the weightings and the threshold value to mimic the way you actually behave.
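Here is how that going-out perceptron might look as a sketch; the weights and threshold are invented to match the couch-potato scenario, with the TV score weighted negatively:

```python
def go_out(tv_score, friends_score, night_score):
    """Decide whether to go out tonight. Each score runs from 0 to 10."""
    # A couch potato's weights: good TV counts heavily *against* going out.
    weighted_sum = -3 * tv_score + 2 * friends_score + 1 * night_score
    threshold = 10
    return weighted_sum >= threshold

print(go_out(tv_score=8, friends_score=5, night_score=10))  # False: stay in
print(go_out(tv_score=1, friends_score=9, night_score=10))  # True: go out
```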
Just as the brain consists of a whole chain of neurons, perceptrons can be layered, so that the triggering of nodes gradually causes a cascade through the network. This is what we call a neural network. In fact, there is a slightly subtler version of the perceptron called the sigmoid neuron that smoothes out the behaviour of these neurons so that they aren’t just simple on/off switches.
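A sketch of both ideas together, layering and the sigmoid’s smoothed output (the wiring and weights here are arbitrary, chosen only to show the cascade):

```python
import math

def sigmoid_neuron(inputs, weights, bias):
    """Like a perceptron, but outputs a smooth value between 0 and 1
    rather than a hard on/off decision."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 / (1 + math.exp(-z))

def tiny_network(x1, x2, x3):
    # First layer: two neurons, each reading all three inputs.
    h1 = sigmoid_neuron([x1, x2, x3], [0.5, -0.2, 0.1], bias=-0.3)
    h2 = sigmoid_neuron([x1, x2, x3], [-0.4, 0.9, 0.2], bias=0.1)
    # Second layer: one neuron fed by the first layer's outputs,
    # so activity cascades through the network.
    return sigmoid_neuron([h1, h2], [1.2, 0.7], bias=-0.5)

print(tiny_network(3, 1, 2))  # a value between 0 and 1
```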
Given that computer scientists had already understood how to create artificial neurons, why did it take so long to make these things work so effectively? This brings us back to data. The perceptron needs data from which to learn and evolve; algorithms and data are the two ingredients you need to create an effective machine-learning system. We could try to program our perceptron to decide when we should go out by assigning the weights and threshold by hand, but it is only by training it on our actual behaviour that it will have any chance of getting things right. Each failure to predict our behaviour allows it to learn and reweight itself.
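To make that reweighting concrete, here is a sketch of the classic perceptron learning rule trained on made-up going-out data (the observations, learning rate and number of passes are all invented for illustration; the threshold is folded into a bias term):

```python
# Each observation: (tv, friends, night) scores and whether you went out.
observations = [
    ((9, 2, 1), 0),   # great TV, quiet Monday: stayed in
    ((2, 8, 10), 1),  # nothing on, friends out, Friday: went out
    ((5, 7, 6), 1),
    ((8, 3, 3), 0),
]

weights = [0.0, 0.0, 0.0]
bias = 0.0            # plays the role of the threshold
learning_rate = 0.1

for _ in range(20):   # several passes over the data
    for inputs, went_out in observations:
        total = sum(w * x for w, x in zip(weights, inputs)) + bias
        fired = 1 if total >= 0 else 0
        error = went_out - fired  # -1, 0 or +1; zero when the guess was right
        # Reweight only when the prediction was wrong.
        weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
        bias += learning_rate * error

print(weights, bias)  # weights that mimic the behaviour in the data
```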
To see or not to see
One of the