Optimizing Data Platform Foundations For AI

Overview of the Project 

Just because an organization has a “Modern Data Stack” doesn’t mean all their problems are solved.  

Organizations also can’t accelerate the adoption of GenAI unless their data is clean, centralized, governed, and, ultimately, trustworthy. Sadly, the reality for most organizations is not what the marketers present: data quality is poor, reporting occurs in silos, data pipelines fail, data readiness can’t meet business SLAs, and data teams lack governance standards. The list goes on and on.

This was the case for one of our clients. As a leader in the Network Security industry, they needed to evolve their data platform foundations in order to unlock a variety of AI/ML workloads.

Business Problem 

In the Network Security industry, leveraging data is non-negotiable: real-time threat detection and prevention depend on analyzing patterns and anomalies in network traffic. Moreover, as the world continues to digitize and the adoption of generative AI grows, the industry demands that organizations stay at the cutting edge of technology.

Implementing a unified data and reporting platform is an ongoing journey. To unlock the full value of AI/ML, the foundations of an organization’s data platform need to match the speed, scale, agility, and complexities of the business units they support. Our client had ambitious AI initiatives but was restricted by its foundations.

Poor data quality, hundreds of ETL failures per month, a high volume of production bugs, and a lack of governance standards meant the data team was unable to operate at the speed of the business. The result was slow delivery of data downstream, delayed reporting, and high costs to resolve issues.

Solution Approach 

As experts in guiding clients through the insights journey, the CloudEQS team followed a phased approach to evolve our client’s data platform:

  1. Platform Health Check & Stabilization: Analyze current data pipelines, stabilize critical jobs, and size infrastructure accordingly.
  2. DevOps & Engineering Standards: Enable Git integration for code versioning and collaboration. Implement CI/CD automation to reliably deploy code.
  3. Role Based Access Control (RBAC): Create roles, reduce redundant roles, and enhance current roles to secure projects with confidential data.
  4. Governance: Apply masking policies for confidential / sensitive data and create project-level governance & reporting (job monitoring, token creation, Slack integration for alerts).
  5. Transformation & Data Prep Migrations: Migrate Matillion transformation jobs to DBT, and Tableau Prep logic to DBT & Snowflake.
  6. Unified Reporting & Metrics: Leverage Matillion’s orchestration capabilities to integrate DBT runs with Tableau report refreshes. Centralize KPIs leveraging DBT’s Semantic Layer.
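To make the governance step more concrete, here is a minimal Python sketch of two of its ideas: masking a sensitive value based on the caller's role, and building an alert message for a failed job. It is an illustration only, not the client's implementation. In the project itself, masking policies would be defined in the data platform rather than in application code. The role names, the `mask_email` and `build_alert` helpers, and the job name used below are all hypothetical. The only external behavior assumed is that a Slack incoming webhook accepts a JSON body with a `text` field.

```python
import json
import urllib.request

# Hypothetical role names; the client's actual roles are not described here.
UNMASKED_ROLES = {"SECURITY_ADMIN", "DATA_STEWARD"}


def mask_email(value: str, role: str) -> str:
    """Return the email unchanged for privileged roles, otherwise hide the local part."""
    if role in UNMASKED_ROLES:
        return value
    _, _, domain = value.partition("@")
    # Values without an "@" are masked entirely.
    return "*****@" + domain if domain else "*****"


def build_alert(job_name: str, status: str, error: str) -> dict:
    """Build a Slack incoming-webhook payload describing a job run."""
    return {"text": f"[{status}] {job_name}: {error}"}


def send_alert(webhook_url: str, payload: dict) -> int:
    """POST the payload to a Slack incoming webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

A monitoring job could call `build_alert("load_orders", "FAILED", "timeout")` and pass the result to `send_alert` along with the webhook URL configured for the team's alert channel.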

Results 

With the expertise of the CloudEQS team, we were able to achieve the following results for our client in 7 months with zero downtime:

  • Data Quality: Built 650 DBT models and 1,300 data quality checks. This resulted in a reduction in data quality issues, faster resolution of data quality errors, and zero production bugs.
  • Developer Productivity: 90% improvement in developer productivity 
  • Near real-time SLAs: With the new architecture, the data platform team can meet its 15-minute SLAs for real-time analytics for its consumers.
  • Self Service Analytics: With the DBT Semantic Layer, the client has centralized KPIs and a single source of truth. As a result, they’re able to extend the architecture to business units so they can self-serve the insights they need.
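As a rough illustration of what the data quality checks and the 15-minute SLA assert, the Python sketch below implements three simple checks: a not-null check, a uniqueness check, and a freshness check. It is a sketch under assumptions, not the project's code. In DBT, checks like these are declared against models rather than written as Python functions. The function names, the sample rows, and the column names are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def not_null_failures(rows, column):
    """Return the rows where the given column is missing or None."""
    return [row for row in rows if row.get(column) is None]


def duplicate_values(rows, column):
    """Return values that appear more than once in the column, ignoring None."""
    seen = set()
    duplicates = []
    for row in rows:
        value = row.get(column)
        if value is None:
            continue
        if value in seen and value not in duplicates:
            duplicates.append(value)
        seen.add(value)
    return duplicates


def is_fresh(last_loaded_at, now, max_age=timedelta(minutes=15)):
    """Return True if the data was loaded within the allowed age (15 minutes by default)."""
    return now - last_loaded_at <= max_age


# Hypothetical sample data for illustration.
rows = [
    {"event_id": 1, "source_ip": "10.0.0.1"},
    {"event_id": 2, "source_ip": None},
    {"event_id": 2, "source_ip": "10.0.0.3"},
]
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)

null_failures = not_null_failures(rows, "source_ip")
duplicate_ids = duplicate_values(rows, "event_id")
fresh = is_fresh(now - timedelta(minutes=10), now)
stale = is_fresh(now - timedelta(minutes=20), now)
```

A check passes when it returns no failing rows or values. Running such checks on every pipeline run is one way data quality errors can be caught before they reach production reports.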

What’s Next? 

With this new architecture, the client is looking to achieve the following: 

  • Natural Language Processing (NLP) for Self Service Analytics: Leverage NLP against their DBT Semantic Layer to further accelerate their self-service analytics culture.
  • Data Mesh Architecture: Advancing their DBT adoption, the client is going to implement a Data Mesh Architecture in DBT to extend the platform to other divisions.   
  • Unstructured Data Sources: With Snowflake & Matillion’s new capabilities to support unstructured datasets, the client is looking to ingest video, text and transcript datasets for analysis. 
  • AI Chat Bots: The client is implementing chat bots to further improve revenue growth and customer retention.  

Conclusion 

This case study highlights a project aimed at optimizing the data platform for a client in the Network Security industry to support AI and machine learning initiatives. Despite having a Modern Data Stack, the client faced challenges such as poor data quality, frequent ETL failures, and a lack of governance, which hindered their ability to deliver timely insights. The CloudEQS team stabilized the client’s data pipelines, established Engineering & DevOps standards, and enhanced governance measures. In 7 months and with zero downtime, this resulted in zero production bugs, 15-minute SLAs for real-time analytics, and centralized KPIs that business units can self-serve.

A curious data professional passionate about supporting clients on their data journey.