
How to create an End-to-End Machine Learning model pipeline in Microsoft Azure


As a data scientist, your work life will focus on deriving value from data. Working with data and
extracting valuable information from this asset will always be your top priority. You do not only want
to explore the data; you also want the hidden patterns and trends you uncover to be used for decision
making. With the recurring problem of more data and less information, it is critical to have a strategy
in place for making the information extracted from data available to end users. That is, you want
the information derived from the data to be actioned and used by relevant stakeholders.

Otherwise, the effort spent exploring the data, building models and making predictions may not
realise the intended business benefits. This sounds counter-intuitive, but it is the reality for many data
scientists whose work results in nothing more than a PowerPoint presentation or a report of their findings.
You can change this with technologies that are more accessible and easier to deploy, at reduced cost and time.
After spending a lot of time preparing your data, training the models and making predictions, you need
to put all these into a pipeline so that the model can be used in applications and other downstream
processes. Also, you might want to display the results in a dashboard for end users.
Automating the E2E process also enables the pipeline to automatically ingest new data as it comes in
and include it in the dataset, picking up new trends and patterns in the data for more accurate
predictions.

There are a few options in the market: you could use AWS, Google Cloud or Microsoft Azure. In this
blog, I will be writing about Microsoft Azure. Cloud computing offers numerous advantages, including
flexibility, automated software updates, limited capital outlay, version control, collaboration
and remote working.

Before we look at model operationalisation, I would like to state the main steps involved in a typical data
science project after the business understanding, data ingestion and exploratory data analysis
stages. These are Data Preparation, Data Transformation, Model Training and Model Testing.
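As a minimal illustration of how those four stages chain together (plain Python with a toy "model" that just predicts the training-set mean; none of this is a specific Azure API), the flow looks like:

```python
# Toy end-to-end sketch of the four stages: preparation, transformation,
# training and testing. The "model" is just the training-set mean,
# a stand-in for a real estimator.

def prepare(raw):
    # Data Preparation: drop records with missing values
    return [r for r in raw if r["value"] is not None]

def transform(rows):
    # Data Transformation: extract the numeric feature we train on
    return [float(r["value"]) for r in rows]

def train(values):
    # Model Training: fit the simplest possible model (the mean)
    mean = sum(values) / len(values)
    return lambda _x: mean

def evaluate(model, holdout):
    # Model Testing: mean absolute error on held-out data
    return sum(abs(model(x) - x) for x in holdout) / len(holdout)

raw = [{"value": 1.0}, {"value": None}, {"value": 3.0}]
model = train(transform(prepare(raw)))
print(evaluate(model, [2.0]))  # the mean is 2.0, so the error here is 0.0
```

In a real project, each stage would be a notebook or script of its own; the point is that they compose into one callable pipeline.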


Note: This is just an example and will be different for every project; some projects will require more or less
aggregation of the datasets used.


Each of these data science processes will be part of the steps highlighted below, using products available
with an Azure subscription:

Step One:

Set up an Azure subscription to gain access to all the resources required. There are a few options here,
which include setting up a Data Science Virtual Machine or, if you have a big dataset, creating an Azure
Databricks workspace. Use a notebook to write all the code for your data science work. Azure Databricks
runs on Spark, which uses parallelisation and a distributed DataFrame structure to significantly
reduce execution time. This is very important, especially for big datasets and IoT data.
An alternative is to use the built-in Azure Machine Learning models that come with your Azure
subscription; however, these are not bespoke and offer less flexibility.
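The speed-up in Databricks comes from Spark partitioning a DataFrame across a cluster. Purely to illustrate that split-apply-combine idea locally (this is stdlib Python with threads, not Spark or the Databricks API, and threads do not give the multi-machine speed-up a cluster does):

```python
from concurrent.futures import ThreadPoolExecutor

def feature_sum(chunk):
    # Work applied to one partition of the data
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(data, workers=4):
    # Split the dataset into chunks and process them concurrently,
    # loosely mimicking how Spark maps over DataFrame partitions
    # and then reduces the partial results.
    chunks = [data[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(feature_sum, chunks))

print(parallel_sum_of_squares(list(range(10))))  # 285
```

On a real cluster, each chunk would live on a different worker node, which is what makes the approach pay off for big datasets.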

Step Two:

Productionise your machine learning models using Azure Data Factory. It will take care of data
ingestion, preparation and transformation by orchestrating scripts written in either Python or Scala in
Azure Databricks.
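A Data Factory pipeline typically invokes a parameterised script for each activity. As a hedged sketch of what such a transformation step might contain (the column names and the 0.79 conversion rate are made up, and an in-memory CSV string stands in for whatever storage your factory actually reads from):

```python
import csv
import io

def transform_rows(rows):
    # Transformation step a Data Factory activity might invoke:
    # filter out records with a missing amount and derive a new column.
    out = []
    for r in rows:
        if r["amount"]:
            out.append(dict(r, amount_gbp=round(float(r["amount"]) * 0.79, 2)))
    return out

def run(source_csv_text):
    # In a real pipeline the source path would be blob or data-lake
    # storage passed in as a pipeline parameter; a string stands in here.
    rows = list(csv.DictReader(io.StringIO(source_csv_text)))
    return transform_rows(rows)

print(run("id,amount\n1,100\n2,\n3,50"))
```

The key design point is that the script takes its input location as a parameter, so the same code runs unchanged whether the factory triggers it on a schedule or on new-data arrival.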

Step Three:

Use Application Insights to monitor the performance of your pipeline through configured telemetry, as it
reports all exceptions and events happening across the pipeline. Continuous Integration/Continuous
Delivery (CI/CD) can be set up in Azure DevOps for automated integration of new features in your code
from Git. New features will then be added to your models automatically through this CI/CD framework.
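The instrumentation pattern is the same wherever the telemetry ends up: record an event when a step starts and finishes, and record the exception if it fails. As a stand-in sketch using only stdlib logging (this is not the Application Insights SDK's API, and the event names are invented), a monitored pipeline step looks like:

```python
import logging

# Stand-in for a telemetry client: in Azure the equivalent calls would
# send events and exceptions to Application Insights instead of a log.
logging.basicConfig(level=logging.INFO)
telemetry = logging.getLogger("pipeline.telemetry")

def score_batch(records):
    telemetry.info("event: batch_started size=%d", len(records))
    try:
        results = [r * 2 for r in records]  # placeholder scoring step
    except TypeError:
        # .exception() records the full stack trace with the event
        telemetry.exception("event: batch_failed")
        raise
    telemetry.info("event: batch_succeeded")
    return results

print(score_batch([1, 2, 3]))  # [2, 4, 6]
```

Because failures are re-raised after being recorded, the orchestrator (Data Factory or DevOps) still sees the step as failed while the telemetry keeps the diagnostic detail.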
To summarise, following these steps will enable you to operationalise your models quickly and efficiently,
and to scale the solution easily. Scaling is straightforward because you can configure your cluster
to scale up or down automatically depending on the computing requirement.
Technology is always changing and data science is an iterative process, so an agile mindset and
methodology are important for continuous improvement of your applications or solution. Lastly, you
can publish a REST API through Azure API Management to share your predictions with other users.
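API Management sits in front of a backend that actually serves the model. As a minimal sketch of the request/response shape such a scoring endpoint might use (plain Python, with an invented `{"features": [...]}` payload and a placeholder in place of a real `model.predict` call):

```python
import json

def score_handler(request_body: str) -> str:
    # Minimal scoring handler of the kind API Management would front.
    # The payload shape {"features": [...]} is an assumption for this sketch.
    payload = json.loads(request_body)
    prediction = sum(payload["features"])  # placeholder for model.predict(...)
    return json.dumps({"prediction": prediction})

print(score_handler('{"features": [1.5, 2.5]}'))  # {"prediction": 4.0}
```

Keeping the contract to a small JSON body in and out means consumers never need to know whether the model behind the API was retrained or replaced.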

