Who invented the machine learning term?


The right question would be “Who are the fathers of machine learning?”, because there isn’t a single person who invented it. Machine learning is a multidisciplinary field that draws on probability, statistics, calculus and computer science (mostly the AI part of CS, plus coding if you are into that). In each of these areas you can name at least 2 or 3 people who made major contributions that are still used in machine learning today. Answering the question with only one name makes no sense.

ML is deeply rooted in probability and statistics. This is why most prediction models represent their output in terms of probability, for example [math]P(y=positive | x)[/math]. A common loss function, the softmax log loss, also comes from statistics: minimizing it is the same as maximizing the likelihood of the training labels. When we initialize the weights of a neural network, we often sample them from a Gaussian distribution.
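To make the softmax log loss concrete, here is a minimal NumPy sketch (my own illustration, not taken from any particular library). The logits and the true label are made-up numbers, and the 0.01 scale for the Gaussian weight initialization is an arbitrary assumption; real initializers usually choose the scale based on layer size.

```python
import numpy as np

def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtracting the max avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum()

def softmax_log_loss(logits, label):
    """Negative log-probability of the true class, computed via log-sum-exp."""
    m = np.max(logits)
    log_sum_exp = m + np.log(np.sum(np.exp(logits - m)))
    return log_sum_exp - logits[label]

# Made-up scores for 3 classes; class 0 is assumed to be the true label.
logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)        # a distribution P(y = k | x) over the 3 classes
loss = softmax_log_loss(logits, 0)
print(probs, loss)

# Gaussian weight initialization: each weight drawn from N(0, 0.01^2).
# The 0.01 scale is an arbitrary choice for illustration.
rng = np.random.default_rng(0)
W = rng.normal(loc=0.0, scale=0.01, size=(3, 4))
```

The loss is just [math]-\log P(y = \text{true} | x)[/math], so a confident correct prediction costs little and a confident wrong one costs a lot.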

Honestly, probability and statistics are everywhere in ML whether you like it or not. So to fully answer the question, I think we must first know who contributed the most to probability and statistics, at least to the parts that are still used frequently in ML.

Some would answer Pascal and/or Fermat, as they laid the foundation of probability in the 1600s. But at that point, they “invented” probability mainly for games of chance (i.e. gambling). Comparing their work to the current state of probability and statistics is like comparing 1 + 1 = 2 to calculus.

Personally, I think it’s a tie between Gauss, Laplace and Fisher. Gauss is famous for the Gaussian distribution, and he also published the least squares method (other mathematicians, Legendre among them, published it independently around the same time). Gauss made strong contributions to the beginnings of estimation theory.
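Here is a small NumPy sketch of least squares fitting a line. The data are synthetic (an assumed line y = 3x + 2 plus Gaussian noise), which also hints at the link to Gauss: when the noise is Gaussian, the least squares solution is exactly the maximum likelihood estimate.

```python
import numpy as np

# Synthetic data: an assumed line y = 3x + 2 plus Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 3.0 * x + 2.0 + rng.normal(0.0, 1.0, size=50)

# Design matrix with a column of ones so the model has an intercept.
X = np.column_stack([x, np.ones_like(x)])

# Least squares minimizes ||X w - y||^2.
# The normal equations give w = (X^T X)^{-1} X^T y ...
w_normal = np.linalg.solve(X.T @ X, X.T @ y)
# ... and np.linalg.lstsq solves the same problem in a numerically safer way.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_normal, w_lstsq)  # [slope, intercept], close to [3, 2]
```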

The second man, Laplace, found the general version of Bayes’ theorem and popularized it. Note that there are two main schools of thought in statistics. Before the rise of Bayes’ theorem, most (if not all) statisticians were frequentists. The main difference between the two schools is that a frequentist makes predictions only from the data at hand, while a Bayesian also lets a “prior” belief enter the prediction. Basically, a frequentist will not predict the chance that meteor X hits Earth because it has never happened before, but a Bayesian will, given a predefined prior and some error.
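The meteor example can be made concrete with Laplace’s own rule of succession. If an event happened [math]s[/math] times in [math]n[/math] trials, the frequentist (maximum likelihood) estimate of its probability is [math]s/n[/math], which is 0 for something never observed. With a uniform prior on the unknown probability, the posterior mean becomes [math](s+1)/(n+2)[/math]: small, but not zero. The counts below are made up, and the Beta(a, b) generalization is included only as an illustration.

```python
from fractions import Fraction

def frequentist_estimate(successes, trials):
    """Maximum likelihood estimate: the observed frequency s / n."""
    return Fraction(successes, trials)

def bayesian_estimate(successes, trials, a=1, b=1):
    """Posterior mean of p under a Beta(a, b) prior: (s + a) / (n + a + b).

    a = b = 1 is the uniform prior, which gives Laplace's rule of succession.
    """
    return Fraction(successes + a, trials + a + b)

# Made-up record: 1000 observation periods, meteor X never hit Earth.
n, s = 1000, 0
print(frequentist_estimate(s, n))  # 0
print(bayesian_estimate(s, n))     # 1/1002
```

The prior is what lets the Bayesian give a nonzero answer; a stronger prior (larger a and b) pulls the estimate harder toward [math]a/(a+b)[/math].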

This Bayesian school of thought created a lot of controversy (and still does to this day). However, much of what we use in machine learning today is derived from Bayes’ theorem or can be viewed from a Bayesian perspective. There is an entire field, probabilistic graphical models, that centers around Bayes’ theorem. When you use L2 regularization to train a neural network, you are implicitly placing a zero-mean Gaussian prior on the weights. (To learn more, refer to the book “Pattern Recognition and Machine Learning” by Christopher Bishop.)
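A quick way to see the L2/Gaussian-prior link is numerically, here on a linear model rather than a full neural network to keep it short. In this sketch the data are synthetic and the noise std sigma and prior std tau are assumed values. The ridge (L2-regularized) solution with [math]\lambda = \sigma^2 / \tau^2[/math] makes the gradient of the negative log posterior vanish, so it is the MAP estimate under a [math]N(0, \tau^2 I)[/math] prior on the weights.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))          # synthetic features
true_w = rng.normal(size=5)
sigma, tau = 0.5, 1.0                 # assumed noise std and prior std
y = X @ true_w + rng.normal(0.0, sigma, size=30)

# L2-regularized least squares (ridge) with lambda = sigma^2 / tau^2.
lam = sigma**2 / tau**2
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)

def neg_log_posterior_grad(w):
    """Gradient of -log p(w | X, y) with Gaussian likelihood and
    N(0, tau^2 I) prior on w (additive constants dropped)."""
    return X.T @ (X @ w - y) / sigma**2 + w / tau**2

# The objective is strictly convex, so a zero gradient means w_ridge is the MAP estimate.
print(np.abs(neg_log_posterior_grad(w_ridge)).max())  # ~0 up to rounding
```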

Finally, Fisher wrote two books that set the standard for how statistics is used in other disciplines (economics, business, health, medicine, biology, etc.). He popularized statistics in modern times (the 1900s). Ever heard of Fisher’s linear discriminant? Yeah, that’s him.
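Fisher’s linear discriminant picks the projection direction [math]w \propto S_W^{-1}(m_1 - m_0)[/math], where [math]m_0, m_1[/math] are the class means and [math]S_W[/math] is the within-class scatter matrix. Below is a minimal NumPy sketch on two synthetic Gaussian classes. Thresholding halfway between the projected means is my own simplifying assumption (reasonable when both classes are equally likely), not part of Fisher’s criterion itself.

```python
import numpy as np

def fisher_direction(X0, X1):
    """Fisher's linear discriminant direction w ∝ S_W^{-1} (m1 - m0), unit length."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: each class's scatter around its own mean, summed.
    S_w = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.solve(S_w, m1 - m0)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(2)
# Two synthetic 2-D Gaussian classes with different means.
X0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
X1 = rng.normal(loc=[3.0, 1.0], scale=1.0, size=(100, 2))
w = fisher_direction(X0, X1)

# Project onto w and split halfway between the projected class means (assumption).
threshold = 0.5 * (X0.mean(axis=0) @ w + X1.mean(axis=0) @ w)
accuracy = ((X0 @ w < threshold).sum() + (X1 @ w >= threshold).sum()) / 200
print(w, accuracy)
```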

For the optimization part, which comes from calculus, we should thank Newton, Leibniz and Euler for their findings. But honestly, optimization theory has developed a lot since then, and naming 2 or 3 people would not do it justice. Here is a nice read: History of Optimization.
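The calculus idea behind most ML training is simple: move against the derivative until it reaches zero. Here is a toy Python sketch on an assumed one-dimensional quadratic [math]f(w) = (w-3)^2 + 1[/math], comparing plain gradient descent (learning rate and step count chosen arbitrarily) with a single step of Newton’s method applied to [math]f'(w) = 0[/math].

```python
def f(w):
    """A toy convex objective with its minimum at w = 3."""
    return (w - 3.0) ** 2 + 1.0

def grad_f(w):
    """Derivative f'(w)."""
    return 2.0 * (w - 3.0)

def hess_f(w):
    """Second derivative f''(w), constant for a quadratic."""
    return 2.0

def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly take a small step against the derivative."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w

def newton_step(grad, hess, w):
    """One Newton step toward f'(w) = 0: w - f'(w) / f''(w)."""
    return w - grad(w) / hess(w)

w_gd = gradient_descent(grad_f, w0=0.0)
w_newton = newton_step(grad_f, hess_f, w=0.0)
print(w_gd, w_newton, f(w_gd))
```

On a quadratic, one Newton step lands exactly on the minimum because its second-order model is exact; gradient descent here shrinks the distance to the minimum by a factor of 0.8 per step instead.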

In computer science, Alan Turing conceived the Turing Test, which checks whether a machine can pass as intelligent by fooling a human judge. But he didn’t contribute much to machine learning directly (I don’t think the field even existed at that point). I would say Geoffrey Hinton is a great contributor. We all know how he helped popularize backpropagation, which led to the current fame of deep learning.
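For readers who haven’t seen backpropagation spelled out, here is a minimal NumPy sketch: a tiny two-layer network (the sizes, tanh activation and squared-error loss are arbitrary choices for illustration, not any specific published setup) whose gradients are computed with the chain rule and then checked against finite differences.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=(4, 3))          # 4 made-up examples with 3 features each
y = rng.normal(size=(4, 1))          # made-up regression targets
W1 = rng.normal(0.0, 0.5, size=(3, 5))
W2 = rng.normal(0.0, 0.5, size=(5, 1))

def forward(W1, W2):
    """Two-layer network: tanh hidden layer, linear output, mean squared error."""
    h = np.tanh(x @ W1)
    out = h @ W2
    loss = 0.5 * np.mean((out - y) ** 2)
    return loss, h, out

def backward(W1, W2):
    """Backpropagation: apply the chain rule from the loss back to each weight matrix."""
    loss, h, out = forward(W1, W2)
    d_out = (out - y) / out.size        # dL/d(out)
    dW2 = h.T @ d_out                   # dL/dW2
    d_h = d_out @ W2.T                  # dL/dh
    d_pre = d_h * (1.0 - h ** 2)        # tanh'(z) = 1 - tanh(z)^2
    dW1 = x.T @ d_pre                   # dL/dW1
    return loss, dW1, dW2

def numerical_grad_W1(eps=1e-6):
    """Central finite differences on W1, used only to check the backprop result."""
    g = np.zeros_like(W1)
    for i in range(W1.shape[0]):
        for j in range(W1.shape[1]):
            W_plus, W_minus = W1.copy(), W1.copy()
            W_plus[i, j] += eps
            W_minus[i, j] -= eps
            g[i, j] = (forward(W_plus, W2)[0] - forward(W_minus, W2)[0]) / (2 * eps)
    return g

loss, dW1, dW2 = backward(W1, W2)
print(loss, np.abs(dW1 - numerical_grad_W1()).max())  # difference should be tiny
```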

Many people were left out of this answer, I must admit. There is a tremendous number of contributors to ML, directly or indirectly. Honestly, I don’t even know why I wrote all of this, because it’s not even close to enough. To answer the original question properly, we would need an entire book on the history of ML.