A Brief History of ML
The term Machine Learning was first used in 1959 by Arthur Samuel, one of the pioneers of Artificial Intelligence at IBM. The name came from researchers who observed computers recognizing patterns and developed the theory that computers could learn without being programmed to perform specific tasks. They began exploring artificial intelligence to see how capable computers were of learning from data. ML began to pick up speed in the 1990s, when it separated from Artificial Intelligence and became its own field rooted in statistical modeling and probability theory.
What is ML?
Machine Learning is all about developing computer programs that can receive various types of data as input (images, text, signals, numeric tables, etc.), recognize patterns in the data, and produce insights and predictions based on those patterns. One of the most important features of ML is “self learning”. That said, it is important to note that while computers can do a lot independently, there’s still a long way to go before they can take over the world. ML algorithms usually require the user to supply some function for the algorithm to optimize (minimize the loss or maximize the gain), and this function is a key part of the success or failure of such algorithms. If the function is not compatible with the data that is fed to the algorithm, it will likely do a poor job. If the function is appropriate but the data is messy, biased, or incomplete, we can also expect the algorithm to perform badly. Making sure inputs are reasonable and representative of the world, deciding on the appropriate optimization function, choosing the best algorithm for a specific task, and choosing good hyperparameters for that algorithm are all still human tasks.
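To make the idea of a user-supplied function to optimize more concrete, here is a minimal sketch in Python. It assumes a one-parameter model (y_hat = w * x), mean squared error as the loss, and plain gradient descent as the optimizer. The data, the function names, the learning rate and the step count are all invented for illustration and are not taken from any particular library.

```python
# Toy data, invented for illustration: y is roughly 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]


def mse_loss(w, xs, ys):
    """Mean squared error of the model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_gradient(w, xs, ys):
    """Derivative of mse_loss with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def fit(xs, ys, learning_rate=0.01, steps=500):
    """Plain gradient descent: repeatedly move w against the gradient."""
    w = 0.0  # arbitrary starting guess
    for _ in range(steps):
        w -= learning_rate * mse_gradient(w, xs, ys)
    return w


w = fit(xs, ys)
print(f"learned w = {w:.3f}, loss = {mse_loss(w, xs, ys):.4f}")
```

The human choices are all visible here: the loss function, the learning rate and the number of steps were picked by hand. A different loss (for example absolute error) would pull the learned w toward a different value on the same data.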
What does ML do?
One of the very cool properties of ML algorithms is that as models are exposed to new data, they are able to adapt automatically. That means that the more data the computer is exposed to, the more it learns and the better it gets at recognizing patterns and making decisions. So, what is the goal of ML? To eliminate, or at least mitigate, the corruption or disturbance of data analysis that comes from human participation. ML lets us create models that keep improving, handling quantities of data far larger than humans ever could and producing ever more efficient, accurate, and rapid results.
How do we talk about ML?
Today, there are numerous ML algorithms, which are often categorized by their type of output (classification vs. regression), their data (supervised vs. unsupervised), or the domain of the problem (e.g. image segmentation, NLP, recommendation systems). A specific algorithm can belong to more than one category. For example, decision trees belong to the supervised learning family and can solve both regression and classification problems.
What’s the difference between Supervised and Unsupervised algorithms?
Although it might sound like these two concepts refer to the relationship between the machine and the human, they actually have more to do with what the data looks like and how it will be used.
Supervised ML maps an input to an output based on examples of input-output pairs. Those examples, where we know the connection between inputs and outputs, are called labeled examples, and the known outputs are called “labels”. The algorithm uses the labeled examples (or a subset of them) to learn the patterns and relationships in “the training phase” (that’s why the labeled examples are often called the “training set” or “training examples”). Next, the computer uses the patterns it learned to predict the output for new data that wasn’t part of the training phase.
For example, supervised ML algorithms can be used to detect spam emails. First, the user divides their emails into “spam” and “normal”, creating labels for the computer to train with (the content of the email is the input, and the “spam” or “normal” tag is the output or the “label”). The “teaching” process starts with the analysis of that data. What makes an email spam? Maybe it was sent from a certain IP address, or used certain words, or had lots of exclamation marks. From these features, the algorithm learns the patterns. Once the patterns have been learned, the learning phase is over. Now, every new email the user gets (input) will get tagged as “spam” or “normal” (output). Supervised learning can be easily evaluated using labeled data that wasn’t part of the training set. We can just feed the algorithm labeled data that wasn’t used for training and compare the algorithm’s prediction with the true label (output) of each example. Then we can see how many mistakes the algorithm made, how big those mistakes were, how often it made them and so on. Common ways of judging the success of the model are accuracy, the ROC curve and the area under it (AUC), precision, recall and F-score.
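The spam workflow described above can be sketched in a few lines of Python. This is a deliberately simplistic illustration, not how real spam filters work: it assumes a made-up word-counting "learner" (each word gets a score of spam count minus normal count), and the emails, function names and decision threshold are all invented for the example. The evaluation step uses labeled emails that were not part of training.

```python
from collections import Counter

# Invented toy data: (email text, label). Real training sets are far larger.
train = [
    ("win money now", "spam"),
    ("free prize win", "spam"),
    ("meeting at noon", "normal"),
    ("lunch meeting tomorrow", "normal"),
]
held_out = [
    ("win a free prize", "spam"),
    ("free money", "spam"),
    ("noon meeting", "normal"),
    ("you win lunch", "normal"),
]


def train_word_scores(examples):
    """Training phase: score each word by (spam count - normal count)."""
    scores = Counter()
    for text, label in examples:
        for word in text.split():
            scores[word] += 1 if label == "spam" else -1
    return scores


def predict(scores, text):
    """Tag an email as spam when its words lean toward spam overall."""
    total = sum(scores[word] for word in text.split())  # unseen words count as 0
    return "spam" if total > 0 else "normal"


def evaluate(scores, examples, positive="spam"):
    """Compare predictions with the true labels of held-out examples."""
    tp = fp = fn = tn = 0
    for text, label in examples:
        guess = predict(scores, text)
        if guess == positive:
            if label == positive:
                tp += 1
            else:
                fp += 1
        elif label == positive:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(examples),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


scores = train_word_scores(train)
metrics = evaluate(scores, held_out)
print(metrics)
```

On this toy data the model wrongly tags one normal email as spam, which lowers precision but not recall. That is the kind of difference between metrics that accuracy alone would hide.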
Unsupervised ML uses unlabeled data, meaning data where we don’t have a specific output variable or target we want to predict based on the rest of the data. The goal of unsupervised learning is to create a function that describes the structure of this uncategorized data. So, instead of having a human teach the computer how to organize the data, the computer itself identifies commonalities and differences in the data and decides how to organize it. One of the biggest issues with unsupervised learning is how to measure the algorithm’s success. Without any labels, we don’t have an “ultimate truth”, so we can’t directly measure how many mistakes we had (or how big they were/how often they occurred). There are, however, a few measures that are often used for clustering, which is one of the most common unsupervised learning tasks. These are usually based on how consistent the results are, how homogeneous each cluster is, and how different the clusters are from each other. For example, clustering algorithms are often evaluated using the Calinski-Harabasz index, the C-index, the Silhouette index and the Dunn index.
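As a rough illustration of clustering and of label-free evaluation, the sketch below runs a basic k-means on a handful of invented 2-D points and then computes a mean silhouette score by hand. k-means is used here only as a familiar example of a clustering algorithm. The points, the hand-picked starting centroids and the function names are assumptions made for the example, and the code requires Python 3.8 or later for math.dist.

```python
import math

# Invented 2-D points forming two visibly separate groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.3, 7.9), (7.9, 8.2)]


def kmeans(points, centroids, iterations=10):
    """Basic k-means starting from the given initial centroids."""
    centroids = list(centroids)
    k = len(centroids)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels


def mean_distance(p, others):
    """Average Euclidean distance from p to every point in others."""
    return sum(math.dist(p, q) for q in others) / len(others)


def silhouette(points, labels):
    """Mean silhouette score; expects at least two clusters."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # common convention for single-point clusters
            continue
        a = mean_distance(p, own)  # cohesion: distance to own cluster
        b = min(  # separation: distance to the nearest other cluster
            mean_distance(p, [q for q, lab in zip(points, labels) if lab == c])
            for c in clusters
            if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Starting centroids are hand-picked, one from each visible group.
labels = kmeans(points, centroids=[points[0], points[-1]])
score = silhouette(points, labels)
print(labels, round(score, 3))
```

A silhouette close to 1 means each point sits much closer to its own cluster than to the nearest other cluster, while values near 0 or below suggest overlapping or poorly chosen clusters. Note that no labels were needed to compute it, only the points and the cluster assignments.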
Machine Learning has performed incredibly well in some fields, but has not reached the expected objectives in many others. One of the main issues that ML faces lies in the data it is fed. The quality of the results ML can provide is directly linked to the quality and quantity of the data provided. Lack of suitable data, limited access to data, and data bias are all issues that can lead, and have led, ML programs to fail. Another common cause of failure is the incorporation of a new kind of data into a model that has been trained solely on another kind of data. For example, let’s say that we would like to predict the sale of BMW bikes, and we attempt to take motorbike customer data and feed it into a model that predicts the purchase of BMW cars in Great Britain. The model would not be able to predict the behavior of customers who are not represented in its training data.