About Client

The client builds artificial intelligence (AI) solutions for applications in the energy, oil and gas, manufacturing, finance, aerospace, defence, and security sectors. Their primary need was efficient data pipelines that could integrate diverse data sources and adapt as business requirements change.

Business Need

Challenges

Raw data is often fragmented, dirty, and lacking in context, which makes it difficult to extract insights from it.

Technical Details

We used the Data Orchestration Ecosystem (DOE) framework to solve this problem. DOE sits on top of Prefect and lets teams ingest, transform, and enrich data with little or no code. It is designed to be flexible and scalable, so it can be customized to a project's specific needs.

Architecture

Azure Functions triggers are used to automate processes, run tasks on a schedule, or respond to events in real time. In this pipeline, whenever a new file is added to Azure Blob Storage, a trigger fires an event that starts an Azure Function App.
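
As an illustration of the trigger step, the sketch below shows how a function might extract the container and blob path from a new-file notification. It assumes the function is subscribed to Azure Event Grid `Microsoft.Storage.BlobCreated` events (the case study does not say which trigger binding was used), and the account, container, and file names in the sample payload are hypothetical.

```python
import json
from typing import Tuple

# Hypothetical sample payload shaped like an Event Grid "BlobCreated" event;
# the account, container and blob names are made up for illustration.
SAMPLE_EVENT = json.dumps({
    "eventType": "Microsoft.Storage.BlobCreated",
    "subject": "/blobServices/default/containers/raw-data/blobs/2024/05/sensors.csv",
    "data": {
        "url": "https://exampleaccount.blob.core.windows.net/raw-data/2024/05/sensors.csv"
    },
})


def parse_blob_created(event_json: str) -> Tuple[str, str]:
    """Return (container, blob_path) for a BlobCreated event, else raise ValueError."""
    event = json.loads(event_json)
    if event.get("eventType") != "Microsoft.Storage.BlobCreated":
        raise ValueError(f"unexpected event type: {event.get('eventType')}")
    # The subject has the form /blobServices/default/containers/<container>/blobs/<path>
    prefix = "/blobServices/default/containers/"
    subject = event.get("subject", "")
    if not subject.startswith(prefix) or "/blobs/" not in subject:
        raise ValueError(f"unexpected subject: {subject}")
    container, blob_path = subject[len(prefix):].split("/blobs/", 1)
    return container, blob_path


if __name__ == "__main__":
    print(parse_blob_created(SAMPLE_EVENT))
```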

Azure Function Apps are part of Azure Functions, Microsoft Azure's serverless compute service. They run code in response to events without requiring you to manage the underlying infrastructure, and they scale automatically with demand, which makes them a good fit for event-driven applications and microservices. Here, the function app calls the Prefect API with the information the workflow needs, such as where to find the configuration and the input data.
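
A minimal sketch of that hand-off, assuming a Prefect 2.x server whose REST API exposes `POST /deployments/{id}/create_flow_run`. The deployment ID and the `config_path`/`input_path` flow parameters are hypothetical placeholders for whatever the DOE configuration actually expects; the request is only built here, not sent.

```python
import json
import urllib.request
from typing import Optional


def build_flow_run_request(
    api_url: str,
    deployment_id: str,
    config_path: str,
    input_path: str,
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST asking Prefect to start a deployment run."""
    # Assumed Prefect 2.x endpoint: POST <api_url>/deployments/<id>/create_flow_run
    url = f"{api_url.rstrip('/')}/deployments/{deployment_id}/create_flow_run"
    # Hypothetical flow parameters telling the run where its config and data live.
    body = {"parameters": {"config_path": config_path, "input_path": input_path}}
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )


if __name__ == "__main__":
    req = build_flow_run_request(
        "http://prefect-server:4200/api",          # hypothetical server URL
        "00000000-0000-0000-0000-000000000000",    # hypothetical deployment ID
        "configs/pipeline.yaml",
        "raw-data/2024/05/sensors.csv",
    )
    print(req.get_method(), req.full_url)
```

Calling `urllib.request.urlopen(req)` would submit the request; against Prefect Cloud, `api_url` would be the workspace API URL and the API key is sent as a bearer token.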

Prefect is a workflow automation tool that enables data engineers and data scientists to define, schedule, and execute complex data workflows in the cloud. It coordinates workflows: running them on a schedule, with automatic retries, caching, reusable configuration, a collaborative UI, and more.
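
To make "automatic retries" concrete, here is a plain-Python sketch of the behaviour: a failing step is re-run a fixed number of times before its error is surfaced. It is not Prefect's implementation; in Prefect 2.x the same policy is declared on a task with `@task(retries=..., retry_delay_seconds=...)`.

```python
import functools
import time
from typing import Any, Callable


def with_retries(retries: int, delay_seconds: float = 0.0) -> Callable:
    """Re-run the wrapped function up to `retries` more times if it raises."""
    if retries < 0:
        raise ValueError("retries must be >= 0")

    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            attempt = 0
            while True:
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt >= retries:
                        raise  # out of attempts: surface the last error
                    attempt += 1
                    time.sleep(delay_seconds)  # wait before the next attempt
        return wrapper
    return decorator


if __name__ == "__main__":
    calls = {"n": 0}

    @with_retries(retries=2, delay_seconds=0.1)
    def flaky_download() -> str:
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("transient failure")  # simulated network blip
        return "ok"

    print(flaky_download(), calls["n"])  # prints: ok 3
```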

Prefect is written in Python, and workflows are defined in ordinary Python code, which keeps the learning curve low for developers. It offers tools for managing flows, including versioning, serialization, and scheduling, which support reproducible and consistent execution. Built-in monitoring and logging let users track the progress and performance of workflows.

It also supports distributed execution, enabling workflows to be executed on multiple machines or in a cloud-based environment.

This is where Kubernetes comes in. Kubernetes is a container orchestration platform that automates the deployment, scaling, and management of containerized applications. Used together with Prefect, it lets data workflows run in containers that scale with the workload.
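
For a sense of what runs on the cluster, here is a sketch that builds a minimal Kubernetes `batch/v1` Job manifest for a single flow run, roughly the kind of object a Prefect Kubernetes worker submits. The image, entry point, and resource figures are hypothetical; the case study does not describe the actual job template used.

```python
import json
from typing import Any, Dict, List


def flow_run_job(name: str, image: str, command: List[str],
                 cpu: str = "500m", memory: str = "1Gi") -> Dict[str, Any]:
    """Return a minimal batch/v1 Job manifest that runs one flow in a container."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 0,  # leave retrying to the orchestrator, not Kubernetes
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "flow",
                        "image": image,
                        "command": command,
                        "resources": {
                            "requests": {"cpu": cpu, "memory": memory},
                            "limits": {"cpu": cpu, "memory": memory},
                        },
                    }],
                }
            },
        },
    }


if __name__ == "__main__":
    manifest = flow_run_job(
        "ingest-sensors-run",
        "registry.example.com/doe-pipeline:latest",  # hypothetical image
        ["python", "-m", "pipeline"],                 # hypothetical entry point
    )
    # kubectl accepts JSON as well as YAML, e.g. `kubectl apply -f job.json`.
    print(json.dumps(manifest, indent=2))
```

Setting `backoffLimit` to 0 is one design choice: retries then happen only at the Prefect level, so a failing step is not retried twice.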

When a Prefect run starts, it loads the configuration and input dataset from the locations passed by the trigger function. Orchestration then begins with cleaning the data so that it is compatible with the ML models. The cleaned data is fed into pre-trained ML models hosted as a separate web service; these models were also prepared with the DOE framework. Once the models have run successfully, the results are stored, following a predefined structure, in a graph database (Neo4j in this case) and a Solr index, depending on the use case.
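
The run described above can be sketched in plain Python as clean, predict, and shape-for-storage steps. Everything here is an assumption for illustration: the field names (`id`, `name`, `category`, `score`), the cleaning rules, the graph model (`Entity` and `Category` nodes), and `predict`, which stands in for the call to the ML web service. Database clients are left out so the sketch runs on its own.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Record = Dict[str, Any]

# Parameterised Cypher; MERGE keeps re-runs of the same input from duplicating nodes.
CYPHER = (
    "MERGE (e:Entity {id: $id}) SET e.name = $name "
    "MERGE (c:Category {name: $category}) "
    "MERGE (e)-[:IN_CATEGORY]->(c)"
)


def clean(records: Iterable[Record], required: List[str]) -> List[Record]:
    """Trim strings, drop rows missing required fields, de-duplicate on those fields."""
    seen = set()
    cleaned = []
    for rec in records:
        rec = {k: v.strip() if isinstance(v, str) else v for k, v in rec.items()}
        if any(rec.get(f) in (None, "") for f in required):
            continue  # incomplete row: skip rather than feed it to the model
        key = tuple(rec[f] for f in required)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned


def to_outputs(rec: Record, prediction: Record) -> Tuple[Tuple[str, Record], Record]:
    """Shape one enriched record as a (Cypher, params) pair and a Solr document."""
    params = {"id": rec["id"], "name": rec["name"], "category": prediction["category"]}
    solr_doc = {**params, "score": prediction["score"]}
    return (CYPHER, params), solr_doc


def run_pipeline(
    records: Iterable[Record], predict: Callable[[Record], Record]
) -> Tuple[List[Tuple[str, Record]], List[Record]]:
    graph_writes, solr_docs = [], []
    for rec in clean(records, required=["id", "name"]):
        write, doc = to_outputs(rec, predict(rec))  # predict() stands in for the ML service
        graph_writes.append(write)
        solr_docs.append(doc)
    return graph_writes, solr_docs


if __name__ == "__main__":
    raw = [
        {"id": "a1", "name": " Pump 7 "},
        {"id": "a1", "name": "Pump 7"},  # duplicate once trimmed
        {"id": "a2", "name": ""},        # missing name: dropped
    ]
    writes, docs = run_pipeline(raw, lambda rec: {"category": "equipment", "score": 0.9})
    print(len(writes), docs[0]["name"])  # prints: 1 Pump 7
```

With the official `neo4j` Python driver, each `(query, params)` pair could be executed via `session.run(query, params)`, and the Solr documents could be posted as a JSON array to the core's `/update` handler.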

Because the data extracted from the various sources is stored in graph and Solr stores, it is hard for people to interpret directly. To address this, we built a web application that fetches the data from these stores and presents it in an intuitive graphical view. Its interactive interface also offers filtering and searching, so users can easily explore and analyze the information.
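
On the search side, here is a hedged sketch of the kind of query the web application could send to Solr: free text in `q` plus one `fq` filter per selected facet, using Solr's standard `/select` parameters. The core name and field names are hypothetical, and the case study does not say whether the UI queries Solr directly or through a backend.

```python
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlencode


def solr_search_url(base_url: str, core: str, text: str,
                    filters: Optional[Dict[str, str]] = None, rows: int = 20) -> str:
    """Build a Solr /select URL: free-text query plus exact-match filter queries."""
    # Fall back to match-all when the search box is empty.
    params: List[Tuple[str, str]] = [("q", text or "*:*"), ("rows", str(rows)), ("wt", "json")]
    for field, value in (filters or {}).items():
        # Each fq narrows the result set without affecting relevance scores.
        params.append(("fq", f'{field}:"{value}"'))
    return f"{base_url.rstrip('/')}/solr/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical Solr host, core, and field names.
    print(solr_search_url("http://solr:8983", "entities", "pump", {"category": "equipment"}))
```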

Business Impact

Technology

azure
azure functions
python
solr
neo4j
mongodb
react
kubernetes
