This kind of broad brush statements about Machine Learning algorithms are made often and there are lot of online content alluding to this simplistic view of Machine Learning. It’s tempting to gravitate towards simplistic views and use recipe like approach while solving Machine Learning problems.
Unfortunately, the reality of Machine Learning is complex. The choice of learning algorithms and the algorithm tuning process can be confounding. Choosing the right learning algorithm for a given problem is not a trivial task.
In this post we will delve into the computational learning theory and use it to debunk the myth of the so called Top Ten Machine Learning algorithms. Hopefully, after reading this post, you will be more cautious about blindly using one of the algorithms from the so called top 10 list. In this post, I will keep the discussion at an intuitive level. More rigorous technical details can be found in the papers cited.
The problem domain is an important factor to consider when choosing a Machine Learning algorithm. When embarking on the solution for a new problem, if you simply go through the top 10 list to find the algorithm with highest accuracy, there is no guarantee you have found the best solution, depending on the problem domain.
You may have to venture outside of the top 10 algorithms in search of the best solution for the problem in hand. In this post, we will explore why that is the case using concepts like Average Generalization Accuracy(AGA) and Expected Generalization Accuracy(EGA).
This post draws heavily on the original paper on this topic and the follow up excellent paper . My treatment of the subject will be more at a pragmatic and intuitive level. Please refer to the papers cited for in depth technical details.
Average Generalization Accuracy
According the law of conservation of Machine Learning algorithms, Total generalization accuracy over all learning situations is 0.5. This is not any better than randomly guessing for a prediction problem, which is not encouraging.
This paints a very hopeless and pessimistic picture and learning appears to be impossible in an uniformly random universe. However the characteristics of real world problems have some bias and regularity which enables us to have learners with greater than 0.5 accuracy, as we will be able to show with Expected Generalization Accuracy.
AGA is defined as below. The summation is over all possible target concepts and unseen examples belonging to a set of unseen data. The sum is divided over the number of possible target concepts for k unique examples, which is 2k
A target concept or a target function, in Machine Learning parlance, is an ideal classifier that will classify all data points accurately.
It can be proved that AGA is always 0.5, which is the average accuracy, which one can obtain simply by guessing . It’s based on the key observation that for a given data element, half of the 2k target concepts will classify it to have one value and the other half will classify it to have the other value. The proof can be found in any of the 2 papers cited above.
Intuitively, if a learning algorithm works well for some problem domains, it will perform equally poorly in some other problem domain, making it a zero sum game i.e the average accuracy will be close to 0.5.
By exchanging the positions of the two sums, we find that the AGA is the sum over all sample points of the average accuracy of a sample point over all target concepts, weighted by the probability of drawing the particular sample.
Expected Generalization Accuracy
One key assumption of AGA is that the target concepts are uniformly distributed. In real life problems, this assumption does not hold and target concepts tend to have skewed distribution. This fact paves the way for opportunities to find learning algorithms with above average accuracy.
Consider a simple loan payment classification problem involving 2 feature attributes as follows
The boolean class attribute is defined by whether the borrower will repay the loan. For this problem domain, we have the knowledge that older people with higher income are more likely to repay the loan. it’s simply the simply innate nature of the problem.
The positive examples will be in the top right area of the 2 dimensional feature space. The more likely target concepts or functions will separate the data accordingly. Hence the target concept distribution for this problem is not uniform.
In EGA, the target concept distributions is taken into account as follows, giving rise to the potential of finding learner with above average accuracy.
Depending on the target concept distribution and learner, there is potential for expected generalization accuracy to be greater than 0.5 for some learner. So learning is not so hopeless after all. Learning is possible because the data for real life problems have bias and regularity. If the learner is in tune with the bias, it will learn well.
We are interested in finding out how well a learner performs not for all possible target concepts, but for all plausible target concepts.
Although, EGA shows us it’s possible to find learners with above average accuracy for real world problems, it’s value can be 0.5 when certain symmetry conditions are met. More details on this can be found in the second paper cited.
It’s been speculated that such symmetries are rare in our universe, making learning possible for most real world problems.
Are There Top Ten Machine Learning Algorithms
We are back to the original question we had asked ourselves. All we can infer from EGA is that for some problem domain there may exist one or more learners that will give us better than average generalization accuracy.
In quest for the top ten learning algorithms across all domains, we can try the following mental exercise. We can find the top 10 learning algorithms, using EGA for all possible known problem domains. It’s very unlikely that the top 10 list will be same across all problem domains.
It will be more credible if we qualify the statement about the top learning algorithms as follows: The top 10 machine learning algorithms for such and such problem domains. Any such statement without the context of the problem domain is flawed.
Although the biases and regularities in the data enables us to find learners with above average accuracy, the nature of those biases and regularities is nuanced and depends on the problem domain. Hence it is difficult to find a set of learning algorithms that will perform equally well among all problem domains.
With help from computation learning theory, we have shown that there may be top 10 machine learning algorithms, but only within the context of a subset of problem domains. So while solving a machine learning problem, you may have to look outside the top 10 list, to find the best learning algorithm for the problem in hand.