
Overview
The purpose of this article is to showcase a new microbatch design in DPC that removes the dependency on AWS SQS and Lambda functions, resulting in simpler pipeline setup, less maintenance of AWS services, reduced cloud spend, and lower overhead.
The prerequisites for this design include:
- Required permissions in Matillion DPC
- Flex Connector to call the DPC API
- API credentials or connection details for the data source
- Working knowledge of Matillion DPC components (optional)
Previous Approach

The current approach uses an SQS message and a Lambda function to call the component itself, which continues to run in a loop (see below). This allows users to meet SLAs by creating a microbatch schedule that continuously calls a Matillion job subscribed to an SQS queue.
What we’ve found is that this setup is quite complex for our clients, requiring additional AWS infrastructure, permissions, task management, and knowledge of AWS configuration (just look at how long the documentation page for this setup is).
New Approach: Using the DPC Flex Connector

One of the key differentiators of Matillion’s Data Productivity Cloud is its concept of Flex Connectors. A Flex Connector allows users to create an API profile hosted within their Matillion Hub account and call that profile from within an orchestration pipeline.
To create a microbatch schedule job directly within DPC, the user will need to create a Flex Connector profile for the DPC API endpoints.
As an overview, the job uses a project variable, ‘microbatch flag’, that holds either a 1 or a 0. The flow control component checks the value of the variable and branches the pipeline accordingly: 1 = keep looping, 0 = stop. If the flag is set to 1, the pipeline inserts data into a table. The Flex Connector then calls the pipeline itself, which keeps the microbatch loop running until the variable is set to 0 (either manually or programmatically), as sketched below.
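To make the control flow concrete, here is a minimal Python sketch of the branching logic. The three callables are placeholders standing in for the project variable read, the SQL component, and the Flex Connector call; this is not generated by Matillion, it only mirrors the design.

```python
# Minimal sketch of the microbatch loop, assuming three placeholder callables
# that stand in for the corresponding pipeline components.

def run_microbatch(get_flag, insert_row, execute_published_pipeline):
    """One iteration of the microbatch pipeline.

    get_flag                   -> reads the 'microbatch flag' project variable (1 or 0)
    insert_row                 -> the SQL component inserting a row into Snowflake
    execute_published_pipeline -> the Flex Connector re-triggering this same pipeline
    """
    if get_flag() == 1:               # flow control branch: 1 = keep looping
        insert_row()                  # do the batch of work
        execute_published_pipeline()  # pipeline calls itself via the DPC API
    # flag == 0: the branch ends and the loop stops
```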
Step-by-step process
Here are the steps for creating this microbatch pipeline.
Step 1: Set up Flex Connector for DPC
See the documentation here, including the client ID, secret, and OAuth connection details.
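For reference, the token exchange the connection performs behind the scenes looks roughly like the sketch below. The token URL is an assumption; take the exact OAuth endpoint for your account from the documentation linked above.

```python
import requests

# Assumption: OAuth 2.0 client-credentials token endpoint for the DPC API.
# Replace with the token URL given in the Matillion documentation.
TOKEN_URL = "https://id.core.matillion.com/oauth/dpc/token"

def get_access_token(client_id: str, client_secret: str) -> str:
    # Exchange the client ID and secret for a bearer token.
    resp = requests.post(
        TOKEN_URL,
        data={
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["access_token"]
```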

Step 2: Create microbatch folder and root pipeline

Step 3: Set Project Variable


Step 4: Insert Row into Table
Using the SQL component, insert a row into a table in Snowflake.
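In the pipeline itself, only the INSERT statement goes into the SQL component. Outside of Matillion, the equivalent call looks like the sketch below; the table name and connection details are hypothetical placeholders for whatever target table your SQL component writes to.

```python
import snowflake.connector  # pip install snowflake-connector-python

# Hypothetical target table -- substitute the table your SQL component writes to.
INSERT_SQL = (
    "INSERT INTO MICROBATCH_DEMO.PUBLIC.HEARTBEAT (LOADED_AT) "
    "VALUES (CURRENT_TIMESTAMP)"
)

def insert_row(account: str, user: str, password: str) -> None:
    # Equivalent of the SQL component: run one INSERT per loop iteration.
    conn = snowflake.connector.connect(account=account, user=user, password=password)
    try:
        cur = conn.cursor()
        try:
            cur.execute(INSERT_SQL)
        finally:
            cur.close()
    finally:
        conn.close()
```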
Step 5: Call the DPC API and call ‘microbatch job’
Leverage the Execute Published Pipeline endpoint.

Configure the pipeline, including the URI parameters and POST body.
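The request the Flex Connector sends looks roughly like the sketch below. The base URL, path, and body fields are assumptions based on the Execute Published Pipeline endpoint; confirm the exact values against the DPC API documentation. It mirrors the URI parameter and POST body configuration described above.

```python
import requests

# Assumption: region-specific API host; replace with the one for your account.
BASE_URL = "https://eu1.api.matillion.com/dpc"

def execute_published_pipeline(token: str, project_id: str,
                               pipeline_name: str, environment_name: str) -> dict:
    # URI parameter: projectId identifies the DPC project that owns the pipeline.
    resp = requests.post(
        f"{BASE_URL}/v1/projects/{project_id}/pipeline-executions",
        headers={"Authorization": f"Bearer {token}"},
        json={  # POST body: which published pipeline to run, and in which environment
            "pipelineName": pipeline_name,
            "environmentName": environment_name,
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()  # typically includes the id of the new pipeline execution
```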

Step 6: Validate the pipeline run by viewing Pipeline Activity in Matillion Hub

As you can see, the loop is running and executing every ~2 seconds (note: for security reasons, the project and environment names have been removed).
Future considerations
- Set the variable programmatically by leveraging scalar variables
- Leverage Query Result To Scalar to call additional pipelines with this same design
Conclusion
This article explains a simpler way to set up microbatch schedules in Matillion’s Data Productivity Cloud (DPC) by removing the need for AWS SQS and Lambda. With the help of Matillion’s Flex Connector, users can now trigger microbatch jobs directly within DPC, making the process faster and easier. This approach cuts down on AWS dependencies, lowers costs, and simplifies the setup, saving both time and effort.