
A broad overview of Machine Learning algorithms and their classifications

In its simplest form, Machine Learning (ML) could be described as a machine learning something from data. Another general definition of ML is “the art and science of programming computers so they can learn from data”. My favourite definition is the one below, because it tells you instantly what a Machine Learning program does.
According to Tom Mitchell (1997), “A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.”
We use Machine Learning every day in most of the apps that we use, from recommendation systems to our email. A general example of a Machine Learning program that we are all familiar with is the email spam filter. The email examples that the spam filter learns from are the training set, and each email used is known as a training instance.

The task T here is to flag new emails as spam or not, and the experience E is the training set. The data scientist defines the performance measure P. For example, P could be the ratio of correctly classified emails, which is known as accuracy in a classification task such as this one.
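
As a minimal illustration (the labels and predictions below are made up), accuracy is simply the share of predictions that match the true labels:

# Hypothetical labels for five emails: 1 = spam, 0 = not spam
true_labels = [1, 0, 1, 1, 0]
predictions = [1, 0, 0, 1, 0]

# Accuracy P = correctly classified emails / total emails
correct = sum(t == p for t, p in zip(true_labels, predictions))
accuracy = correct / len(true_labels)
print(accuracy)  # 0.8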

You only need to build your ML algorithm once, and it can adapt automatically to changes. This automates the task and saves the time you would otherwise spend rewriting scripts or programs each time the rules or data change. Another advantage of using an ML algorithm in the spam filter example is that the data scientist can inspect the model to see which words it flags as indicators of spam, which might uncover underlying patterns and trends in the data that were not obvious previously.

The task of using ML algorithms to analyse vast amounts of data in order to discover patterns that are not immediately apparent is called data mining.
Therefore, Machine Learning is useful in solving the following problems:
  • Problems that require a long list of hand-written rules, which can often be solved with a single ML algorithm.
  • Complex problems that cannot be solved with the traditional programming approach.
  • Constantly changing environments with new data streams, since ML models can be retrained to adapt to new data.
  • Uncovering insights into complex problems, and patterns and trends in big data.
So, how many types of Machine Learning algorithms do we have? Well, the discipline is constantly evolving as technological advancements drive the development of new algorithms. However, the main types of ML used in data science projects fall into a few broad categories.
The main classifications of ML models are:
  • Supervised versus unsupervised learning
  • Online versus batch learning
  • Instance-based versus model-based learning
Machine Learning programs can be divided into different types according to the amount of supervision required: supervised learning, unsupervised learning, semi-supervised learning and reinforcement learning.
 
Supervised learning involves including the desired results, called labels, in the training data. A typical supervised learning task is classification, such as the spam filter mentioned above, which categorises emails as spam or non-spam.

Regression is another supervised learning task: predicting a target numeric value, such as the price of a house, from a set of features known as predictors, such as the square footage of the house, the distance to good schools and the proximity to transport links.
Logistic regression can be used for a classification task such as calculating the probability of an email being spam.
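
As a minimal sketch (the features and labels below are invented for illustration), scikit-learn's LogisticRegression can estimate the probability that an email is spam from simple numeric features:

from sklearn.linear_model import LogisticRegression
import numpy as np

# Hypothetical features per email: [number of suspicious words, number of links]
X = np.array([[8, 5], [1, 0], [6, 3], [0, 1], [7, 4], [2, 0]])
y = np.array([1, 0, 1, 0, 1, 0])  # 1 = spam, 0 = not spam

clf = LogisticRegression()
clf.fit(X, y)

# predict_proba returns [P(not spam), P(spam)] for each email
print(clf.predict_proba([[5, 2]]))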

In data science, the main types of supervised learning algorithms used are:
  • k-Nearest Neighbors
  • Linear Regression
  • Logistic Regression
  • Support Vector Machines (SVMs)
  • Decision Trees and Random Forests
  • Neural networks
In unsupervised learning, as expected, the training set is unlabelled and the system tries to learn by itself. Examples include:
  • Clustering: k-Means, Hierarchical Cluster Analysis (HCA), Expectation Maximisation.
  • Visualisation and dimensionality reduction: Principal Component Analysis (PCA), Kernel PCA, Locally-Linear Embedding (LLE), t-Distributed Stochastic Neighbour Embedding (t-SNE).
  • Association rule learning: Apriori, Eclat.

Clustering is grouping similar instances together. For example, I can analyse my blog visitors by running a clustering algorithm to group similar visitors together, and use a hierarchical clustering algorithm to subdivide each group into subgroups for more insight.
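
A minimal sketch of the idea using scikit-learn's k-Means (the visitor features below are made up):

from sklearn.cluster import KMeans
import numpy as np

# Hypothetical visitor features: [pages viewed per visit, minutes on site]
visitors = np.array([[2, 1], [3, 2], [10, 12], [11, 15], [25, 5], [27, 4]])

# Group the visitors into three clusters
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(visitors)
print(labels)  # cluster assignment for each visitor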

Chart showing clusters uncovered in a dataset

Other unsupervised algorithms include anomaly detection and association rule learning. Anomaly detection use cases include preventing credit card fraud by detecting unusual transactions, identifying manufacturing defects, and automatically removing outliers from a dataset before feeding it into another algorithm.
You train the system mostly on normal instances (for example, expected transactions), and when new data comes in, the algorithm analyses it to decide whether it is normal or an anomaly.
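
A minimal sketch of this idea using scikit-learn's IsolationForest (the transaction amounts are invented):

from sklearn.ensemble import IsolationForest
import numpy as np

# Hypothetical transaction amounts: mostly normal, one unusually large
transactions = np.array([[25], [30], [22], [28], [27], [3500]])

detector = IsolationForest(contamination=0.2, random_state=42)
detector.fit(transactions)

# predict returns 1 for normal instances and -1 for anomalies
print(detector.predict([[26], [4000]]))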

The association rule learning technique is used to detect relations between attributes, such as the items customers buy together or their spending patterns and shopping habits. For instance, you can use the algorithm to analyse customers' shopping baskets and discover that people who buy nappies also tend to buy beer. This can inform the way the items are arranged in the store to attract customers and improve sales.
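
A very rough sketch of the basket idea using plain pandas (the baskets below are invented); a full implementation would use an algorithm such as Apriori:

import pandas as pd

# Hypothetical shopping baskets, one row per basket (1 = item bought)
baskets = pd.DataFrame({
    "nappies": [1, 1, 0, 1, 0],
    "beer":    [1, 1, 0, 1, 1],
    "milk":    [0, 1, 1, 0, 1],
})

# Support of {nappies, beer}: share of baskets containing both items
support = ((baskets["nappies"] == 1) & (baskets["beer"] == 1)).mean()
# Confidence of the rule nappies -> beer
confidence = support / (baskets["nappies"] == 1).mean()
print(support, confidence)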

A dimensionality reduction algorithm can be used to simplify a dataset and improve a program's performance by combining correlated variables or features into one. For instance, you could merge a car's age and mileage into a single feature representing its wear and tear.
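
A minimal sketch of this using Principal Component Analysis (the car data below is made up):

from sklearn.decomposition import PCA
import numpy as np

# Hypothetical cars: [age in years, mileage in thousands of miles]
cars = np.array([[2, 20], [5, 60], [8, 95], [10, 130], [3, 35]])

# Collapse the two correlated features into a single "wear and tear" component
pca = PCA(n_components=1)
wear_and_tear = pca.fit_transform(cars)
print(wear_and_tear.ravel())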

Semi-supervised learning involves labelling only a small portion of the training data and letting the algorithm work with the rest unlabelled. Such algorithms are usually a combination of supervised and unsupervised methods. An example is deep belief networks (DBNs), which are built by stacking unsupervised restricted Boltzmann machines (RBMs) on top of each other.

The technique involves training the algorithm in an unsupervised manner and then fine-tuning it using supervised learning. A familiar example is how Google Photos recognises family members in newly uploaded pictures once you have named them once on the platform.
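
This is not the DBN approach described above, but as a simple illustration of learning from partially labelled data, scikit-learn's LabelPropagation spreads the few known labels to the unlabelled samples (marked with -1 in this made-up example):

from sklearn.semi_supervised import LabelPropagation
import numpy as np

# Hypothetical one-dimensional feature; -1 marks unlabelled samples
X = np.array([[1.0], [1.2], [0.9], [8.0], [8.3], [7.8]])
y = np.array([0, -1, -1, 1, -1, -1])  # only two samples are labelled

model = LabelPropagation()
model.fit(X, y)

# The model propagates the two known labels to the unlabelled samples
print(model.transduction_)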

Reinforcement Learning: Here, the learning program, called an agent, observes the environment, then selects and performs actions and gets rewards in return (or penalties for poor actions). Over time the agent learns the best strategy for dealing with such situations, called a policy. Reinforcement learning is used, for example, to teach robots to perform tasks at the required level.
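
A tiny sketch of the reward-driven idea, using a simplified multi-armed bandit with made-up reward probabilities rather than a full reinforcement learning algorithm:

import random

# Hypothetical reward probabilities for three possible actions
reward_probs = [0.2, 0.5, 0.8]
values = [0.0, 0.0, 0.0]   # the agent's estimated value of each action
counts = [0, 0, 0]
epsilon = 0.1              # exploration rate

for step in range(1000):
    # Explore occasionally, otherwise exploit the best-known action
    if random.random() < epsilon:
        action = random.randrange(3)
    else:
        action = values.index(max(values))
    reward = 1 if random.random() < reward_probs[action] else 0
    counts[action] += 1
    values[action] += (reward - values[action]) / counts[action]

print(values)  # the learned policy favours the action with the highest value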

Batch and Online Learning: Batch learning involves training on all of the available data and evaluating the algorithm's performance offline before launching it. Online learning trains the program incrementally by feeding it data in small batches as it arrives; such a system is useful in applications with fast-changing data, such as stock prices.

However, in an online setting you might need to build an anomaly detection algorithm to catch bad data before it reaches end users, and to monitor the system closely so that learning can be switched off once an anomaly is detected.
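
A minimal sketch of incremental (online) training using scikit-learn's SGDRegressor, with made-up batches standing in for streaming data:

from sklearn.linear_model import SGDRegressor
import numpy as np

model = SGDRegressor()

# Hypothetical data arriving in small batches (e.g. streaming prices)
batches = [
    (np.array([[1.0], [2.0]]), np.array([2.1, 3.9])),
    (np.array([[3.0], [4.0]]), np.array([6.2, 7.8])),
]

# partial_fit updates the model incrementally as each batch arrives
for X_batch, y_batch in batches:
    model.partial_fit(X_batch, y_batch)

print(model.predict([[5.0]]))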

Instance-Based Versus Model-Based Learning: This classification groups Machine Learning models by how they generalise. In instance-based learning, the program learns the training examples by heart and generalises to new cases using a similarity measure.
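
k-Nearest Neighbors is a typical instance-based method; a minimal sketch with invented points:

from sklearn.neighbors import KNeighborsClassifier

# Hypothetical two-dimensional points belonging to two classes
X = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
y = [0, 0, 0, 1, 1, 1]

# The classifier memorises the training instances and labels a new point
# by looking at its three nearest neighbours
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)
print(knn.predict([[2, 2], [9, 9]]))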

Model-based learning builds a model from a set of examples and then uses that model to make predictions. Regression analysis is an example of model-based learning that can be used to uncover trends and patterns in a dataset.

Let us evaluate a univariate model exploring the relationship between income and the amount of energy consumed by consumers in the UK. We will use a linear regression algorithm in Python to analyse the data.
energy_consumption = A1 + A2 * income

The linear regression model above can be used to test the hypothesis that energy consumption increases as consumers' income increases, and also to predict the energy consumed by consumers for whom we only have income data. The main idea is to find the parameter values that make the linear model best fit the data.

You can use a cost function (or, conversely, a fitness function) to evaluate the performance of your model. The objective is to minimise the cost function, which measures the distance between the linear model's predictions and the values in the training sample. All you need to do is input the training examples into the algorithm, and it finds the parameters that make the linear model a best fit for the data.
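
A common choice of cost function for linear regression is the mean squared error; a minimal sketch with made-up numbers:

import numpy as np

def mse_cost(a1, a2, income, energy_consumption):
    """Mean squared error between the linear model's predictions and the data."""
    predictions = a1 + a2 * income
    return np.mean((predictions - energy_consumption) ** 2)

# Hypothetical training sample
income = np.array([20000, 30000, 45000])
energy = np.array([2500, 3100, 4000])
print(mse_cost(1500.0, 0.055, income, energy))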

A linear regression code example in Python:

# Import the libraries
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn.linear_model

# Load the data (replace "filename" with the paths to your CSV files)
energy_consumption = pd.read_csv("filename", thousands=',')
income = pd.read_csv("filename", thousands=',', delimiter='\t',
                     encoding='latin1', na_values="n/a")

# Prepare the data (prepare_energy_profile is assumed to be a helper that
# merges the two DataFrames into one with "income" and
# "energy_consumption" columns)
energy_profile = prepare_energy_profile(energy_consumption, income)
X = np.c_[energy_profile["income"]]
y = np.c_[energy_profile["energy_consumption"]]

# Visualise the data
energy_profile.plot(kind='scatter', x="income", y="energy_consumption")
plt.show()

# Select a linear model
model = sklearn.linear_model.LinearRegression()

# Train the model
model.fit(X, y)

# Make a prediction for a new customer with an income of £45,000
X_new = [[45000]]
print(model.predict(X_new))

To improve the performance of the model, you can add more variables and run a multivariate model. In the example above, you could add variables such as household size, number of appliances, age, level of education and the age of the house.
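
A minimal sketch of the multivariate version, reusing the energy_profile DataFrame from the example above and assuming the extra columns exist (the names are hypothetical):

import numpy as np
import sklearn.linear_model

# Use several predictors instead of income alone (hypothetical column names)
features = ["income", "household_size", "number_of_appliances"]
X = energy_profile[features].values
y = np.c_[energy_profile["energy_consumption"]]

model = sklearn.linear_model.LinearRegression()
model.fit(X, y)
print(model.predict([[45000, 3, 12]]))  # income, household size, appliances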

All in all, the steps involved in a regression analysis are:
  • State the hypothesis from the business problem.
  • Study the data available.
  • Select a model.
  • Train the model with the training set so that it finds the model parameters that minimise the cost function.
  • Use the model to make inferences. This is the predictive power of ML algorithms.
I guess this leads us to the next logical question: what challenges or issues can reduce the predictive power of a Machine Learning algorithm? There are basically two fundamental things that can go wrong: choosing a bad algorithm and using bad data. Let's break this down. The main bottlenecks are:
  • Insufficient data: You need a large amount of data for the model to make accurate predictions.
  • Non-representative training set: The training data should be representative of the new cases you want the model to generalise to; otherwise its predictive power suffers.
  • Poor-quality data: It is important to clean your training set to remove errors, outliers and noise, and to handle missing values.
  • Feature quality: You need to ensure that only relevant features are included in the model. You can use a dimensionality reduction algorithm to improve existing features, or the process known as feature engineering. Feature engineering comprises three things: feature selection, feature extraction and creating new features by collecting new data.
  • Overfitting the training data: Overfitting is when the model fits the random noise in the training set rather than just the relationship between the analysed features (variables), which leads to misleading p-values, coefficients and R-squared values. Use regularisation, and tune its hyperparameter, to reduce overfitting (see the short sketch after this list).
  • Underfitting the training data: This is the opposite of the situation described above. Underfitting can be addressed by using a model with more parameters (a multivariate instead of a univariate model), by feature engineering and by reducing the regularisation hyperparameter.
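
As a minimal sketch of regularisation, Ridge regression adds a penalty on the size of the coefficients, controlled by the hyperparameter alpha (the data here is invented):

from sklearn.linear_model import LinearRegression, Ridge
import numpy as np

# Hypothetical noisy training data
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(20, 1))
y = 3 * X.ravel() + rng.normal(0, 2, size=20)

plain = LinearRegression().fit(X, y)
regularised = Ridge(alpha=10.0).fit(X, y)  # larger alpha = stronger regularisation

# The regularised coefficient is shrunk towards zero
print(plain.coef_, regularised.coef_)
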
Finally, how do you test or validate a Machine Learning model before it is deployed? The recommended approach is to divide your data into two sets: the training set for training the model and the test set for testing its predictive power. The error rate when the model makes inferences on these new instances is called the generalisation error.

Looking at the generalisation error tells you how well your model is likely to perform on new instances. Cross-validation helps with choosing the right hyperparameters: the training set is split into complementary subsets, and the model is trained on different combinations of these subsets and validated on the remaining parts.

ML algorithm validation result example: if the training error is high (the model makes many mistakes on the training set) and the generalisation error is also high, the model is underfitting and needs fine-tuning. If the training error is low but the generalisation error is high, the model is overfitting the training data.

Typically, most data scientists use 80% of the data for training and 20% for testing.
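
A minimal sketch of this split, and of cross-validation, with scikit-learn (reusing the X and y arrays prepared in the earlier linear regression example):

from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LinearRegression

# Hold out 20% of the data for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)

# R-squared on unseen data gives an idea of the generalisation performance
print(model.score(X_test, y_test))

# 5-fold cross-validation on the training set
print(cross_val_score(model, X_train, y_train, cv=5))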


Clearly, Machine Learning is an exciting part of data science, with many capabilities and use cases for solving business problems and improving the performance of existing applications.
