Overview
One of our recent clients struggled to observe their data pipelines across two vendors: Matillion & DBT. This article describes how the CloudEQS team built a Streamlit application on top of Snowflake so that data engineers, support engineers, and leadership have a centralized place to observe the health of their ETL pipelines.
Business Problem
Data engineering teams often rely on several data pipeline vendors to get their data ready for analytics and AI. Because each vendor tracks its own jobs separately, metadata silos persist. Without a centralized observability layer, teams spend more time troubleshooting ETL pipelines, which leaves less time for building data pipelines and solving business problems.
Why It’s Important
Data pipeline observability addresses these challenges by providing organizations with the tools and insights needed to understand their data flow fully. Here’s why it’s essential:
- Proactive Monitoring: Observability enables teams to detect issues before they escalate into larger problems.
- Enhanced Data Quality: With observability, teams can implement data validation checks and establish benchmarks for quality. This leads to cleaner data, which in turn supports more reliable analytics.
- Collaboration Across Teams: Improved visibility fosters collaboration between data engineers, analysts, and business stakeholders.
- Compliance and Governance: Observability tools can automate data lineage tracking, ensuring organizations can demonstrate compliance with regulations and maintain trust with stakeholders.
Solution Approach
Using a metadata-driven design, the CloudEQS team built the solution in the following steps:
- Metadata Tables: Using the Matillion & DBT APIs, we built metadata tables that consolidate ETL job names, schedules, job statuses, run times, etc.
- Snowflake Roles: Before creating a Streamlit app within a Snowflake account, we created roles with access to the database & schema where the metadata tables and master staging tables live.
- Streamlit Codebase: Thanks to Snowflake’s integration with Streamlit, the team was able to turn the underlying Python code into a web application.
- Consolidated Insights: The application dashboard surfaces insights across Matillion & DBT.
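To make the metadata consolidation step more concrete, here is a minimal sketch of how run records from two vendors could be mapped into one common shape before being loaded into a metadata table. It is an illustration only, not the code used in the client project: the payload field names (`jobName`, `state`, `startTime`, `endTime`, `job_name`, `status`, `started_at`, `duration_seconds`), the status vocabulary, and the `JobRun` type are all assumptions made for this example and are not taken from the Matillion or DBT API documentation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class JobRun:
    """One row of a unified metadata table (hypothetical schema)."""
    vendor: str
    job_name: str
    status: str            # normalized: SUCCESS, FAILED, RUNNING, or UNKNOWN
    started_at: datetime
    duration_seconds: float


# Assumed vendor status strings mapped to one shared vocabulary.
STATUS_MAP = {
    "success": "SUCCESS",
    "succeeded": "SUCCESS",
    "failed": "FAILED",
    "error": "FAILED",
    "running": "RUNNING",
}


def normalize_status(raw: str) -> str:
    """Map a vendor-specific status string to the shared vocabulary."""
    return STATUS_MAP.get(raw.strip().lower(), "UNKNOWN")


def from_matillion(payload: dict) -> JobRun:
    """Convert a Matillion-style record (assumed epoch-millisecond timestamps)."""
    start = datetime.fromtimestamp(payload["startTime"] / 1000, tz=timezone.utc)
    end = datetime.fromtimestamp(payload["endTime"] / 1000, tz=timezone.utc)
    return JobRun(
        vendor="matillion",
        job_name=payload["jobName"],
        status=normalize_status(payload["state"]),
        started_at=start,
        duration_seconds=(end - start).total_seconds(),
    )


def from_dbt(payload: dict) -> JobRun:
    """Convert a DBT-style record (assumed ISO 8601 timestamp and a duration field)."""
    return JobRun(
        vendor="dbt",
        job_name=payload["job_name"],
        status=normalize_status(payload["status"]),
        started_at=datetime.fromisoformat(payload["started_at"]),
        duration_seconds=float(payload["duration_seconds"]),
    )
```

Normalizing statuses and timestamps at ingestion time means the dashboard can query a single table with a single vocabulary, rather than handling each vendor's conventions in the Streamlit code.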
Expected Business Outcomes
The outcomes our client has achieved with this new observability layer include:
- Leadership-level observability on ETL pipeline health
- Ability to search ETL jobs within a given timeframe
- Streamlined communication to data consumers
- Faster resource allocation to resolve ETL failures
- Enhanced knowledge base of ETL jobs & failures
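As a rough illustration of the timeframe search and pipeline health outcomes listed above, the sketch below filters a list of job runs by time window and job name, then computes a failure rate. The record layout, the sample rows, and the function names `search_runs` and `failure_rate` are hypothetical and chosen for this example; in the actual application this logic would more likely run as a query against the Snowflake metadata tables.

```python
from datetime import datetime

# Illustrative sample rows; in practice these would come from the metadata tables.
RUNS = [
    {"vendor": "matillion", "job_name": "load_orders", "status": "SUCCESS",
     "started_at": datetime(2024, 5, 1, 2, 0)},
    {"vendor": "dbt", "job_name": "orders_models", "status": "FAILED",
     "started_at": datetime(2024, 5, 1, 3, 0)},
    {"vendor": "dbt", "job_name": "customer_models", "status": "SUCCESS",
     "started_at": datetime(2024, 5, 3, 3, 0)},
]


def search_runs(runs, start, end, name_contains=""):
    """Return runs that started in [start, end) and whose name contains the text."""
    needle = name_contains.lower()
    return [
        run for run in runs
        if start <= run["started_at"] < end and needle in run["job_name"].lower()
    ]


def failure_rate(runs):
    """Share of runs with status FAILED; 0.0 when there are no runs."""
    if not runs:
        return 0.0
    failed = sum(1 for run in runs if run["status"] == "FAILED")
    return failed / len(runs)


# Example: all "orders" jobs that started on 1 May 2024, across both vendors.
may_first_orders = search_runs(
    RUNS, datetime(2024, 5, 1), datetime(2024, 5, 2), name_contains="orders"
)
```

The half-open window `[start, end)` avoids counting a run twice when adjacent timeframes are compared side by side.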
What’s Next?
For now, this application is a Connected Application Powered by Snowflake, meaning a Snowflake user can access the application codebase from CloudEQS’s domain and integrate it into their Snowflake instance.
The next step is to make it a Native Application on the Snowflake Marketplace, so that any Snowflake, Matillion & DBT customer can self-serve this application.
Conclusion
The CloudEQS team addressed a client’s challenge in observing data pipelines across Matillion and DBT by building a centralized Streamlit application on Snowflake. This solution consolidates metadata from various ETL jobs, enhancing visibility and facilitating proactive monitoring, improved data quality, and streamlined collaboration among teams. The application allows leadership to monitor ETL pipeline health, search job statuses, and communicate more effectively so that failures are resolved faster.