Overview
In this blog, we will discuss how to convert SQL code into a Matillion job. This can be achieved using multiple components provided by Matillion.
Before we dive in, let's define our terms. Matillion has two types of jobs: Orchestration and Transformation. The distinction between them is important.
- Orchestration Jobs – Simply put, Matillion Orchestration jobs cover the E & L of the ELT equation. It's in orchestration jobs that Matillion users can ingest data through pre-built connectors and access orchestration-level functionality, including uploading API profiles, calling child orchestration and transformation jobs, reverse ETL (e.g., output components), and flow control and alerting operations.
- Transformation Jobs – Matillion Transformation jobs cover the T of the ELT process: Matillion always loads the raw data first and pushes all transformation logic down to the underlying cloud data platform.
Let's look at the transformation job components most commonly used to convert SQL code into a Matillion job.
Read Components
These components define the source of the data to be transformed. They range from Table Input and Multi Table Input, which read tables from your database, to Stream Input for a data stream, and Fixed Flow and Generate Sequence for user-defined inputs.
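As a rough illustration, a Table Input is conceptually just a SELECT against the source table, while a Fixed Flow produces rows the user defines by hand. The table and column names below are made up, and the exact SQL Matillion generates may differ:

```sql
-- Table Input: read the chosen columns from an existing table
SELECT "customer_id", "amount" FROM "payment";

-- Fixed Flow: a small, user-defined set of rows
SELECT 1 AS "id", 'first' AS "label"
UNION ALL
SELECT 2, 'second';
```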
Join Components
Matillion provides a range of join components that allow users to join data from multiple sources and perform various types of joins, lookups, and merges. The most commonly used component in this group is the Join component, which is equivalent to the JOIN clause in SQL statements. The Join component allows users to perform an inner join, left join, right join, or full outer join on two data sources based on a common key. Users can select the join type, specify the join key, and select the columns to include in the output. These components are easy to use and can be configured through a drag-and-drop interface.
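In SQL terms, the Join component's configuration maps onto a standard join. A sketch, with hypothetical table names (the generated SQL may differ):

```sql
SELECT o."order_id", c."customer_name"    -- columns selected for the output
FROM "orders" o
INNER JOIN "customers" c                  -- join type: inner, left, right, or full outer
    ON o."customer_id" = c."customer_id"; -- the common key
```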
Transform Components
Matillion is an ELT (Extract, Load, Transform) tool that helps users extract data from various sources, load it into a target platform, and transform it into the desired format there. The transformation components in Matillion are pre-built blocks of logic that perform specific data transformations, such as filtering, sorting, aggregating, joining, and merging. Matillion provides a wide range of these components.
Some of the key transformation components available in Matillion are listed below; a combined SQL sketch follows the list:
- Filter: Allows users to select a subset of data based on a set of criteria.
- Sort: Orders data by one or more columns in ascending or descending order.
- Aggregate: Performs summary calculations on data, such as calculating the sum, count, or average.
- Join: Combines data from multiple sources into a single table based on a common key.
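Taken together, these components map onto familiar SQL clauses. A combined sketch, using hypothetical tables rather than Matillion's literal output:

```sql
SELECT c."category_name",
       SUM(p."amount") AS "total_amount"   -- Aggregate: sum per group
FROM "payment" p
JOIN "category" c                          -- Join: common key
    ON p."category_id" = c."category_id"
WHERE p."status" = 'complete'              -- Filter: subset of rows
GROUP BY c."category_name"
ORDER BY "total_amount" DESC;              -- Sort: descending order
```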
Write Components
These components define how the data that has been transformed will be written. Data can be written to a new table or view, appended to an existing table, or updated using Change Data Capture.
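In SQL terms, these write modes correspond roughly to the following statements (the table names are hypothetical, and the DDL/DML Matillion generates may differ):

```sql
-- Write to a new table
CREATE TABLE "daily_totals" AS
SELECT "customer_id", SUM("amount") AS "total"
FROM "payment" GROUP BY "customer_id";

-- Append to an existing table
INSERT INTO "daily_totals"
SELECT "customer_id", SUM("amount")
FROM "payment" GROUP BY "customer_id";

-- Update changed rows by key (CDC-style upsert)
MERGE INTO "daily_totals" t
USING "staged_totals" s
    ON t."customer_id" = s."customer_id"
WHEN MATCHED THEN UPDATE SET "total" = s."total"
WHEN NOT MATCHED THEN INSERT ("customer_id", "total")
    VALUES (s."customer_id", s."total");
```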
Test Components
The Assert View component validates the data in a view by comparing it to expected results; it is typically used in testing or quality assurance scenarios to ensure the view produces the expected output. Whenever a view is accessible in a Transformation Job, users can attach an Assert View component and assert against the metadata, values, and row count using comparison operators such as Equal To, Less Than or Equal To, Greater Than or Equal To, and Range. If an assertion fails, the query is stopped.
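Conceptually, an assertion amounts to running a check query against the view and comparing the result to the expected value. A sketch of the kind of query involved (the view name is hypothetical, and Matillion's actual checks are configured through the component rather than hand-written SQL):

```sql
-- Row-count assertion: compare "row_count" to the expected value,
-- e.g. "Greater than or equal to" 1; compare "null_keys" to "Equal to" 0
SELECT COUNT(*) AS "row_count",
       SUM(CASE WHEN "actor_id" IS NULL THEN 1 ELSE 0 END) AS "null_keys"
FROM "my_view";
```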
We will use some of these components in the examples below to convert SQL into Matillion jobs.
Example 1
SQL Statement: SELECT SUM(amount) FROM payment WHERE customer_id < 200;
To transform this statement into a Matillion ETL transformation job, we just have to put together three components: a Table Input, a Filter, and a Calculator.
Step 1: Table Input
This component reads data from the source table; a SQL sketch of the query it issues follows the property list. Here is a breakdown of the properties of the component:
- Name: Name of the component.
- Database: Source database of the table. This can either be a hardcoded database name, a variable, or “[Environment Default],” which is an environment variable that can be configured beforehand.
- Schema: Source schema of the table. This can either be a hardcoded schema name, a variable, or “[Environment Default],” which is an environment variable that can be configured beforehand.
- Target Table: The source table being read; in this example, payment.
- Column Names: The columns being read into this component.
- Offset: A property specific to the Snowflake Data Cloud’s Time Travel feature. It “offsets” the source table contents by a specified number of seconds, allowing a user to see the table as it was X seconds ago. We don’t need it for this example.
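For Example 1, this Table Input amounts to reading the two columns we need from payment. Roughly (the quoting style is illustrative):

```sql
SELECT "customer_id", "amount"
FROM "payment";
```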
Step 2: Filter
The Filter component takes input table data and filters out rows according to criteria set in the Filter Conditions property; a SQL sketch of the resulting clause follows the list. Here’s a breakdown of the properties:
- Name: The name of the component.
- Filter Conditions: Within this menu are several selections.
- Input Column: The column you want to filter on.
- Qualifier: Whether you want the filter to look for records that qualify or do not qualify. Imagine this as the “=” or “!=” portion of a statement.
- Comparator: The filter condition you want to implement. Some of these are meant for numeric or date values. Others are meant for text-type filtering.
- Value: The value being compared.
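With customer_id as the Input Column, a qualifying condition, a “less than” Comparator, and a Value of 200, the Filter contributes the WHERE clause of our statement. Roughly:

```sql
SELECT "customer_id", "amount"
FROM "payment"
WHERE "customer_id" < 200;  -- Input Column / Comparator / Value
```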
Step 3: Calculator
The final result is produced by the Calculator component, with SUM("amount") as the calculation expression. When configuring it, set the Include Input Columns property to “No”; otherwise the component will fail, because the input columns would return many rows while the aggregate returns only one.
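Putting the three steps together, the job pushes down something equivalent to the original statement; the component-to-clause mapping in the comments is illustrative:

```sql
SELECT SUM("amount")        -- Step 3: Calculator expression
FROM "payment"              -- Step 1: Table Input
WHERE "customer_id" < 200;  -- Step 2: Filter
```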
Example 2
SQL Statement: SELECT Actor.actor_id, Actor.first_name, Actor.last_name, Actor_info.film_info, Actor.last_update FROM Actor INNER JOIN Actor_info ON "Actor"."actor_id" = "Actor_info"."actor_id";
Here, output_actor and output_actor_info are the input tables feeding the Join component.
Here are the properties of the Join component:
- Name: The name of the component.
- Main Table: The first (left-hand) input table, here output_actor, along with an alias.
- Joins: The table(s) to join to the main table, here output_actor_info, and the join type, in this case Inner.
- Join Expressions: The condition relating the tables, here matching actor_id in both inputs.
- Output Column Mapping: The columns to include in the output.
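The resulting pushdown is equivalent to the original statement; roughly (the aliases are illustrative):

```sql
SELECT a."actor_id", a."first_name", a."last_name",
       i."film_info", a."last_update"
FROM "output_actor" a
INNER JOIN "output_actor_info" i
    ON a."actor_id" = i."actor_id";
```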
Summary
In this blog, we explored how to convert simple SQL queries into Matillion jobs using various components like Table Input, Filter, Join, and Calculator. Matillion’s drag-and-drop interface makes it easy to build complex ETL jobs without writing extensive code. By understanding the available components and their properties, you can seamlessly transform your SQL logic into scalable, cloud-based ETL pipelines.