Overview
As Data Engineering experts, we’re constantly engaging with our clients to tackle the most complex data problems in their business. One way we do this is by bridging skillsets, technologies, and change management across teams.
A trend we’re seeing among our clients is the integration of Matillion & DBT Labs. This article gives an unbiased perspective on these two companies, their products, and why we’re excited about the future.
Business Problem
Due to market dynamics and the explosion of venture capital around the Snowflake boom, clients are overwhelmed with the variety of data solutions at their disposal. The demand for data is only increasing, the amount of time a data engineer has is limited, and there aren’t enough resources to meet the business SLAs.
To put this into context, here’s the latest Snowflake Ecosystem diagram:
How long did it take for you to spot Matillion & DBT? As you can see, they’re both in the Data Integration category, but the barriers between these two companies are getting smaller. Let’s learn why.
Background
Here is a brief history of these two companies, their products, and their key differentiators.
Matillion – History
Snowflake classifies both companies in its Data Integration category. Matillion was founded in 2016 to make data professionals more productive by aligning the ELT architecture with cloud computing.
Matillion is an ELT/ETL provider that handles the full extract, load, and transform (E-L-T) process. For extract and load, Matillion provides 150+ pre-built connectors for common data sources. Customers can also upload their own JDBC drivers and use a framework to create API-based custom connectors.
Once data is loaded into Snowflake, Matillion users can apply their transformation logic visually. Under the hood, Matillion compiles native Snowflake SQL that is pushed down to Snowflake at run time. It also integrates with any Snowflake function as well as AWS & Azure.
Lastly, Matillion has an orchestration layer to tie together end-to-end pipelines. We often see customers orchestrate child jobs, refresh their Business Intelligence reports, integrate with their enterprise scheduler, or orchestrate DBT models.
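To make the push-down idea concrete, here is a minimal sketch of how an ordered set of visual components could be folded into a single SQL statement for the warehouse to execute. The `Filter` and `Aggregate` classes, the `compile_pipeline` function, and the `raw.orders` table are all hypothetical names invented for illustration; this is not Matillion’s actual compiler or API, only a toy model of the technique.

```python
from dataclasses import dataclass


@dataclass
class Filter:
    """Hypothetical visual component: keep rows matching a SQL condition."""
    condition: str


@dataclass
class Aggregate:
    """Hypothetical visual component: group rows and compute measures."""
    group_by: list
    measures: dict  # output column name -> SQL aggregate expression


def compile_pipeline(source_table, steps):
    """Fold an ordered list of components into one nested SQL statement.

    Nothing is executed here. The resulting string is what would be sent
    to the warehouse, so all of the heavy lifting happens there.
    """
    sql = f"SELECT * FROM {source_table}"
    for i, step in enumerate(steps):
        # Each component reads from the previous step as a subquery.
        upstream = f"({sql}) AS step_{i}"
        if isinstance(step, Filter):
            sql = f"SELECT * FROM {upstream} WHERE {step.condition}"
        elif isinstance(step, Aggregate):
            keys = ", ".join(step.group_by)
            aggs = ", ".join(
                f"{expr} AS {name}" for name, expr in step.measures.items()
            )
            sql = f"SELECT {keys}, {aggs} FROM {upstream} GROUP BY {keys}"
        else:
            raise TypeError(f"unsupported component: {step!r}")
    return sql


pipeline_sql = compile_pipeline(
    "raw.orders",
    [
        Filter("status = 'shipped'"),
        Aggregate(["customer_id"], {"total_amount": "SUM(amount)"}),
    ],
)
print(pipeline_sql)
```

The point of the design is that the tool only generates SQL; the data never leaves the warehouse, which is what lets the approach scale with the warehouse’s compute.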
Product Offerings
Matillion has 3 core products: Matillion ETL (METL), Matillion Data Loader (MDL) and the Data Productivity Cloud (DPC).
METL is deployed as a virtual machine within the customer’s cloud environment. In 2021, they released their SaaS offering, Data Productivity Cloud, to integrate all of their products in a single platform. You can read more about their journey here.
DPC is where their integration with DBT gets exciting (see the ‘Why The Synergy’ section below).
Data Culture
Matillion leans towards the GUI / low-code data culture thanks to its intuitive interface and lack of coding requirements.
Core Differentiators
- Ability to customize data pipelines (e.g. no restrictions due to pre-defined schemas)
- Cloud-native push-down architecture
- Intuitive user interface to build end-to-end data pipelines & apply business logic, visually
- Pre-built connectors & custom connector framework
- Repeatable code leveraging their Shared Pipelines framework
DBT Labs – History
Where Matillion focuses on end-to-end pipelines, DBT Labs focuses on the “T” part of the ELT process. DBT Labs was founded in 2016 as Fishtown Analytics, which created DBT (Data Build Tool) with the goal of applying best-in-class software engineering practices to data. Their CEO, Tristan Handy, calls this the Analytics Development Lifecycle.
Product Offerings
DBT is deployed on top of your data platform and comes in 2 flavors: DBT Core & DBT Cloud. Both products compile models into SQL that is pushed down to Snowflake.
DBT Core is their open-source product with a command-line experience where developers write SQL to develop, test, and run DBT models locally.
DBT Cloud is their SaaS offering. Built on top of Core, it lets users develop via a web UI where they can schedule models (i.e. no need for Airflow), observe changes over time, build data catalogs, and lock down metrics through what they call the DBT Semantic Layer.
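To illustrate what building models modularly means in practice, here is a small sketch of how `{{ ref('...') }}` references between models can be turned into a build order. The `{{ ref() }}` syntax is DBT’s own; everything else (the `MODELS` dictionary, the `build_order` and `render` helpers, the `analytics` schema) is a hypothetical stand-in, and DBT’s real parser and renderer are considerably more involved.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical model files: model name -> SQL containing DBT-style ref() calls.
MODELS = {
    "stg_orders": "select * from raw.orders",
    "stg_customers": "select * from raw.customers",
    "customer_orders": (
        "select c.customer_id, count(*) as order_count "
        "from {{ ref('stg_customers') }} c "
        "join {{ ref('stg_orders') }} o on o.customer_id = c.customer_id "
        "group by c.customer_id"
    ),
}

# Matches {{ ref('model_name') }} and captures the model name.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"](\w+)['\"]\s*\)\s*\}\}")


def build_order(models):
    """Return model names so that every model comes after the models it refs."""
    graph = {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


def render(sql, schema="analytics"):
    """Replace each ref() with a schema-qualified table name (toy version)."""
    return REF_PATTERN.sub(lambda m: f"{schema}.{m.group(1)}", sql)


order = build_order(MODELS)
rendered = render(MODELS["customer_orders"])
print(order)
print(rendered)
```

Because dependencies are declared inside the SQL itself, developers never maintain a separate run order by hand; adding a new `ref` is enough to place a model correctly in the graph.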
Data Culture
DBT spans both high-code & low-code data cultures. Core is typically found within high-code Data Engineering cultures where developers are comfortable with CLIs & Airflow, whereas DBT Cloud tends to be found in more analyst-centric data cultures where users aren’t as versed in complex orchestration layers and prefer to write SQL.
Core Differentiators
- Data quality checks
- DBT Semantic Layer
- Focus on Analytics Engineering (i.e. bridging the gap between Data Engineer & Analyst)
- Modularly build analytics code
- Version control & CI/CD via Git
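As an illustration of the data quality checks in the list above: DBT tests are SQL queries that select failing rows, and a test passes when the query returns nothing. The sketch below mimics that idea for “not null” and “unique” checks. It runs against an in-memory SQLite database purely as a stand-in for Snowflake, and the helper names (`not_null_sql`, `unique_sql`, `run_checks`) and the `customers` table are assumptions made up for this example, not DBT’s API.

```python
import sqlite3


def not_null_sql(table, column):
    """Query returning every row where the column is NULL."""
    return f"select {column} from {table} where {column} is null"


def unique_sql(table, column):
    """Query returning every non-NULL value that appears more than once."""
    return (
        f"select {column}, count(*) as n from {table} "
        f"where {column} is not null "
        f"group by {column} having count(*) > 1"
    )


def run_checks(conn, checks):
    """Run each check and report how many failing rows it returned."""
    return {name: len(conn.execute(sql).fetchall()) for name, sql in checks.items()}


# SQLite stands in for the warehouse; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("create table customers (customer_id integer, email text)")
conn.executemany(
    "insert into customers values (?, ?)",
    [(1, "a@example.com"), (2, None), (2, "c@example.com")],
)

results = run_checks(
    conn,
    {
        "not_null_customers_email": not_null_sql("customers", "email"),
        "unique_customers_customer_id": unique_sql("customers", "customer_id"),
        "not_null_customers_customer_id": not_null_sql("customers", "customer_id"),
    },
)
for name, failures in results.items():
    print(name, "PASS" if failures == 0 else f"FAIL ({failures} rows)")
```

Expressing checks as queries means they run where the data lives and can be versioned and reviewed alongside the models they protect.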
Why The Synergy
So why are organizations integrating these two technologies? Both companies are investing in their cloud products, and each solves key gaps for the other:
- DBT needs data in Snowflake in order to transform. Matillion solves that through their robust & highly customizable ingestion capabilities.
- Matillion needs to expand to high-code users. DBT solves that with their command-line / code-forward experience.
- DBT Core needs an orchestration layer. Matillion solves that through their orchestration capabilities.
- Matillion needs better data quality checks, testing, and version control. DBT solves that through their native DQ check framework and Git-based workflow.
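The hand-off described in the list above (load first, then transform, then test) can be sketched as a simple fail-fast sequence of steps. This is a generic illustration, not how Matillion invokes DBT internally. `dbt run` and `dbt test` are real DBT CLI commands, while `load_raw_data.py`, `run_pipeline`, and the runner functions are hypothetical names. The example executes with a dry-run runner so that nothing is actually launched.

```python
import subprocess


def shell_runner(cmd):
    """Run a command for real and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def run_pipeline(steps, runner=shell_runner):
    """Run steps in order, stopping at the first one that fails."""
    completed = []
    for name, cmd in steps:
        if runner(cmd) != 0:
            raise RuntimeError(f"step {name!r} failed; completed: {completed}")
        completed.append(name)
    return completed


STEPS = [
    # Hypothetical ingestion script standing in for the extract & load stage.
    ("load_raw_data", ["python", "load_raw_data.py"]),
    # Build the DBT models once the raw data has landed.
    ("dbt_run", ["dbt", "run"]),
    # Run the data quality checks against the freshly built models.
    ("dbt_test", ["dbt", "test"]),
]

# Dry run: record the commands instead of executing them.
executed = []


def dry_run(cmd):
    executed.append(" ".join(cmd))
    return 0


completed = run_pipeline(STEPS, runner=dry_run)
print(completed)
```

Passing the runner in as a parameter keeps the ordering logic separate from how commands are launched, so the same sequence could be driven by a scheduler or an orchestration tool instead of a local shell.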
Both companies are accelerating innovation in their cloud products (just look at Matillion’s RAG capabilities & DBT’s Semantic Layer). So, when integrated, these two technologies allow the entire data organization (Data Architects, Analysts, Data Scientists, Data Engineers, & Analytics Engineers) to get data business-ready and AI-ready, faster.
Conclusion
This article discussed the integration of Matillion and DBT Labs as a solution to the complex data challenges businesses face, especially in the context of increasing data demands and limited resources. Matillion’s strengths lie in data extraction and loading with its user-friendly, low-code interface, while DBT focuses on the transformation aspect with robust testing and analytics engineering capabilities. Together, these platforms provide a comprehensive approach to data management, enabling organizations to streamline workflows and enhance data quality across teams.