“AI.”
The word is terribly misused, but the revolution is real.
The technology space is undergoing a major sea change, a fourth wave of rapid advancement in capabilities on par with the personal computer, the arrival of the internet itself, or the proliferation of smartphones. The media generally frames this as the “AI Revolution.” In my opinion, applying the word “AI” to what is happening is a misnomer; this sea change has little to do with true Artificial General Intelligence, at least as most people would envision it. But the revolution is real, and it will have a profound impact on society comparable to anything we have seen in our lifetime, with potential for real benefits in mitigating the ongoing climate change disaster.
“AI” is widely understood to mean a computer that can think like a human. Despite the media hype, computers are not evolving to think like humans, nor, in my opinion, is that outcome likely to occur anytime soon. However, they are getting pretty good at faking it for some narrow use cases, becoming not only very intelligent but also creative at solving problems in certain very specific domains.
Well aware of this distinction, most practitioners of these techniques prefer the term “Machine Learning,” or ML, rather than AI or “Artificial Intelligence.” Semantics notwithstanding, there is a very profound advancement underway in tech, and understanding it, and the capabilities it will provide over the coming years, is extremely important for all engineers and for society in general.
But what is this ML thing exactly? Wikipedia defines ML as an ‘umbrella term for solving problems for which development of algorithms by human programmers would be cost-prohibitive, and instead the problems are solved by helping machines “discover” their “own” algorithms.’
Most computer programs work by a programmer explicitly telling the computer what to do, in very specific steps expressed in a programming language. Programs are a set of rules: if this thing, then do that thing.
For example, imagine you are at a pub, trying to throw a dart at a dartboard. First you throw too high, then too low. After every throw you learn something and improve the next attempt. Humans learn from experience.
Now imagine you are programming a robot to throw darts. You could tell the computer to measure the distance, the air resistance, how drunk you are, and so on, and apply a formula to calculate the force and angle required. Do this correctly and you would hit the target every time.
Now if you add a fan (wind force) to the setup, the program will miss the target over and over, forever, and won’t learn anything from each failed attempt. To get the outcome right, you need a human to reprogram the computer, factoring the wind into the formula. To increase the accuracy of the program you need to keep adding complexity, continually accommodating new external factors.
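To make that concrete, here is roughly what the hand-written version looks like. This is a toy sketch in Python with made-up physics and invented parameter names, not a real ballistics model; the point is the shape of the program, not the numbers.

```python
import math

def throw_dart(distance_m: float, air_resistance: float, tipsiness: float,
               wind_force: float = 0.0) -> tuple[float, float]:
    """Return (force, angle) for the robot arm.

    Every factor here was identified and weighted by a human programmer.
    The "physics" is deliberately fake; only the structure matters.
    """
    angle = math.atan2(1.0, distance_m)           # aim flatter the farther away you stand
    force = distance_m * (1.0 + air_resistance)   # push harder to overcome drag
    force *= 1.0 + 0.5 * tipsiness                # compensate for the shaky thrower
    # The wind term did not exist in version 1. When the fan showed up,
    # a human had to come back and add it by hand:
    force += 0.8 * wind_force
    return force, angle

print(throw_dart(distance_m=2.37, air_resistance=0.05, tipsiness=0.2, wind_force=1.5))
```

Every new factor means another hand-written line like the wind term, decided and tuned by a person.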
This approach is very different from how a human “learns,” and it is clearly suboptimal for some problems. Machine Learning tries to teach computers to replicate the way humans learn: to learn from experience.
Now imagine that, instead of such a controlled experiment, you are throwing a pitch in a baseball game. You now have a million factors in play, and also a million examples of attempted throws (all the baseball games ever recorded to video). Your example data includes whether the pitcher was successful or not, along with all the associated environmental data at the time of each throw and every other piece of data you could possibly gather: the type of baseball, the weather, wind, humidity, lighting, the time of day, what music is playing in the background, and so on.
A machine learning algorithm can ingest all that data, chug for a while, and spit out a computer-generated program that could drive the robot to throw the ball accurately under any circumstance encountered in the past. It would also build up an opinion of how predictive each piece of data is: “Angle is highly predictive; what music is playing in the background generally isn’t.”
We call all that data “features” and the process of learning from the data to create a program “training.”
We call the computer-generated program a model.
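As a very rough sketch of what training can look like in code (this one uses scikit-learn, with invented feature names and a tiny made-up dataset standing in for those million recorded pitches), it might go something like this:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Made-up training data: each row is one recorded pitch, each column is a feature.
# Feature columns: release angle (degrees), wind speed (m/s), humidity, music volume.
features = np.array([
    [34.0, 0.5, 0.40, 0.2],
    [36.5, 1.2, 0.55, 0.9],
    [31.0, 0.1, 0.35, 0.1],
    [38.2, 2.0, 0.60, 0.7],
])
# One label per pitch: how successful the throw was (1.0 = strike, 0.0 = wild pitch).
success = np.array([1.0, 0.0, 1.0, 0.0])

# "Training": the algorithm chugs through the examples and builds the program
# (the model) for us, instead of a human writing the formula by hand.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(features, success)

# The model's opinion of how predictive each feature turned out to be,
# something like "angle matters a lot, background music barely at all."
print(dict(zip(["angle", "wind", "humidity", "music"], model.feature_importances_)))
```

A real project would use thousands or millions of examples and far more care, but the shape is the same: data in, model out.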
It’s actually quite a bit more complicated than this of course, but as a starting place, this is a fine analogy.
We call the process of using the model to make a prediction against new data it has not seen before “inference.”
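Continuing the hypothetical sketch above, inference is just handing the trained model a row of feature values it never saw during training and asking for its prediction:

```python
# Inference: a brand-new throw, described with the same four features used in training.
new_throw = [[35.0, 0.8, 0.45, 0.3]]   # angle, wind, humidity, music (made-up values)
print(model.predict(new_throw))        # the model's guess at how successful it will be
```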
We might even find that this model, with some tweaking (called fine-tuning the model), might generalize to other kinds of activities, like soccer or volleyball or shooting a cannon. We might need to retrain the model, adding new features and training against new success criteria, but without having to start over from scratch. This technique is called Transfer Learning.
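As a loose illustration of what that can look like (a PyTorch sketch, with an invented, untrained network standing in for the baseball model), fine-tuning often means freezing the layers that captured the general skill and retraining only a new output piece on the new sport’s data:

```python
import torch
import torch.nn as nn

# Hypothetical "pitching" network: a few layers that (we pretend) learned general
# throwing physics from baseball data, plus an output head for pitch success.
baseball_model = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(),   # general "throwing" layers
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),               # baseball-specific output head
)
# ... assume baseball_model was already trained on baseball data ...

# Transfer learning: keep and freeze the general layers so their weights don't change...
for param in baseball_model[:4].parameters():
    param.requires_grad = False

# ...and swap in a fresh head to be trained on the new task (say, soccer free kicks).
baseball_model[4] = nn.Linear(64, 1)

# Only the new head's parameters get updated during retraining.
optimizer = torch.optim.Adam(
    [p for p in baseball_model.parameters() if p.requires_grad], lr=1e-3
)
```

The general layers carry over; only the last piece starts from scratch, which is why retraining can be far cheaper than training from nothing.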
This approach is not always better than classical programming. It takes more effort, more data, and more computation. But it is often better at dealing with high levels of complexity.
These techniques are driving a major revolution, one that will touch most industries and have a major effect on society as a whole.