Matillion ETL 1.70.0 Release

This article provides an overview of the Matillion ETL (METL) 1.70.0 release: what new features were added, who should upgrade, how to move to the new version, and whether upgrading is a good decision.

Features added

Here’s a list of all of the new features that are included in the 1.70 release:  

  • MariaDB component: Added to the list of available database types when using the Database Query component. 
  • Assign Tag component (Snowflake): Assign existing tags to Snowflake objects, enabling you to better track object usage within your Snowflake account. 
  • Anaplan Bulk component: Leverage the Anaplan API to retrieve bulk data to load into a table via an Anaplan “export” or custom “view”. 
  • SAP ODP Extract component: Enables Matillion ETL users to connect directly through SAP ODP to access available data sources in SAPI and ABAP CDS views. This component includes many connection options that you may need to set (see below). 
  • Run dbt Command: Runs dbt commands as part of an orchestration job. 
  • Sync File Source: Fetches the latest commit for a specified file source. 
  • Manage External File Sources: Connect to (clone) and sync (pull) with a source code repository hosting service (Bitbucket, GitHub, GitLab, AWS CodeCommit). 

Let’s dive deeper into some of the features.

Assign Tag

Tags empower your data stewards and administrators to monitor sensitive data with regard to compliance, discovery, security, and resource usage, all via a centralized or decentralized data governance management practice. 

Tip: You must create tags within your Snowflake database.  

Tags can be applied within a schema on the following objects: 

  • External tables 
  • Materialized views 
  • Pipes 
  • Procedures 
  • Stages 
  • Streams
  • Tables 
  • Tasks
  • Views

Below are the properties of the Assign Tag component: 

  • Name = string: A human-readable name for the component. 
  • Database = drop-down: The Snowflake database. The special value, [Environment Default], will use the database defined in the environment. Read Database, Schema, and Share DDL to learn more. 
  • Schema = drop-down: The Snowflake schema. The special value, [Environment Default], will use the schema defined in the environment. Read Database, Schema, and Share DDL to learn more. 
  • Object Type = drop-down: The type of object to assign a tag to. 
  • Object Name = drop-down: The name of the object you wish to assign a tag to. 
  • Object Details = column editor: 
    • Tag Name: Select a tag from the drop-down menu. Read CREATE TAG if you have not created any tags yet. 
    • Tag Value: A string value. Must be unique for your Snowflake schema. Accepts variables. 
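Under the hood, assigning a tag in Snowflake comes down to an ALTER ... SET TAG statement. Here is a minimal Python sketch of how the component's settings map onto that SQL (a hypothetical helper for illustration; Matillion's actual implementation may differ):

```python
def build_set_tag_sql(object_type, database, schema, object_name,
                      tag_name, tag_value):
    """Build a Snowflake ALTER ... SET TAG statement from Assign Tag settings.

    Hypothetical helper for illustration only; Matillion's internals may differ.
    """
    fqn = f'"{database}"."{schema}"."{object_name}"'
    # Escape single quotes so the tag value is a valid SQL string literal.
    value = tag_value.replace("'", "''")
    return f"ALTER {object_type.upper()} {fqn} SET TAG {tag_name} = '{value}';"

sql = build_set_tag_sql("table", "ANALYTICS", "PUBLIC", "ORDERS",
                        "cost_center", "finance")
print(sql)
```

Running this prints `ALTER TABLE "ANALYTICS"."PUBLIC"."ORDERS" SET TAG cost_center = 'finance';`, which you could equally run by hand in a Snowflake worksheet.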

SAP ODP

Matillion’s support for SAP ODP is its fourth way to connect to SAP. Historically, Matillion has been able to connect to SAP NetWeaver and SAP HANA (via the JDBC driver), as well as OData. The ODP connector strips away much of the complexity of extracting data from SAP, letting METL users do so efficiently and easily.

Using the SAP ODP Extract component requires three additional files: two from SAP and one from Matillion. These files must be added to the location /usr/share/java/sap/, which you can access via SSH. Here is a step-by-step guide to adding the SAP libraries:

SAP files
  1. Log in to SAP and access the JCo download software. If necessary, select the Tools and Services page to display the download page. 
  2. Download the most recent version of SAP JCo 3.x for Linux. 
  3. Ensure the downloaded files are named as follows: 
    • libsapjco3.so: the “JCo” connection library. 
    • sapjco3.jar: a Java wrapper. 
Matillion ETL file 
  1. Download this matillion-jco-wrapper.jar file. 
  2. An older version of this file is also available: matillion-jco-wrapper_S4H.jar. 

Once you have downloaded the three files, add them to /usr/share/java/sap/. The final step is to restart the Tomcat server.
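If you want to sanity-check the installation before restarting Tomcat, a small script like the following can confirm all three files are present (a hypothetical check, assuming the default /usr/share/java/sap/ location):

```python
from pathlib import Path

REQUIRED_FILES = ["libsapjco3.so", "sapjco3.jar", "matillion-jco-wrapper.jar"]

def missing_sap_files(lib_dir="/usr/share/java/sap/"):
    """Return the names of any required SAP ODP libraries missing from lib_dir."""
    base = Path(lib_dir)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]

if __name__ == "__main__":
    missing = missing_sap_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All SAP ODP libraries are in place; restart Tomcat to load them.")
```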

The Run dbt Command

This new component allows users to orchestrate a dbt command from within a Matillion orchestration job, bridging the gap between low-code and high-code developers. The Run dbt Command component runs dbt models stored in the Git repository you have cloned and synced. If you’re interested in learning more about the Matillion and dbt integration, I recommend reading our article: ENTER ARTICLE NAME.

Below are the properties of the Run dbt Command component: 

  • Name = string: A human-readable name for the component. 
  • External File Source = drop-down: Drop-down menu for an external file source. Users can set up external file source connections in the Manage External File Sources modal. 
  • Command = string: A dbt command to execute. Currently, a single Run dbt Command component can run a single dbt command. To learn more about dbt commands, read dbt Command Reference. The Run command will run all models stored in your repository unless you specify a particular model. You can specify a model to run with dbt run --select {name_of_dbt_model}. 
  • Config = string: Add a configuration parameter from the profiles.yml file. Specify parameters inside single quote marks. Accepts grid variables. Users can also toggle Text Mode. All values are Boolean, except PARTIAL_PARSE, which is an integer. 
  • Map Environment Variables = column editor: Specify any dbt environment variables and their values. 
  • Folder Path = filepath: An exact folder path within the file source structure. Accepts job variables. 
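Conceptually, the component takes the Command string, splits it into arguments, and runs it with the mapped environment variables. A rough Python sketch of that behaviour (an illustration of the idea, not Matillion's actual code; the run_dbt_command name is invented):

```python
import os
import shlex
import subprocess

def run_dbt_command(command, env_vars=None):
    """Run a single dbt command string with extra environment variables mapped.

    Hypothetical sketch of what a Run dbt Command step does; the real
    implementation inside Matillion ETL may differ.
    """
    argv = shlex.split(command)  # e.g. "dbt run --select my_model"
    if argv[:1] != ["dbt"]:
        raise ValueError("only dbt commands are allowed")
    env = {**os.environ, **(env_vars or {})}
    return subprocess.run(argv, env=env, check=True)

# Example (requires dbt on PATH and a configured project):
# run_dbt_command("dbt run --select my_model", {"DBT_TARGET": "dev"})
```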

Sync File Source

The Sync File Source component fetches the latest commit for a specified file source. To connect to your Git file source, use Manage Passwords to store your source code repository hosting service (Bitbucket, GitHub, GitLab, AWS CodeCommit, etc.) credentials. 

Here is an overview of properties:

  • Name = string: A human-readable name for the component. 
  • External File Source = drop-down: A file source to sync. Use Manage External File Sources from the Project menu to add a new external file source. Check Use Variable if you wish to set a variable as the file source. Currently, Git is the only supported file source host. 
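"Fetching the latest commit" corresponds to what git ls-remote reports for a branch on the remote. As an illustration (a hypothetical helper, not Matillion code), here is how the latest commit hash for a branch can be picked out of git ls-remote output:

```python
def latest_commit(ls_remote_output, branch):
    """Return the commit hash for `branch` from `git ls-remote` output.

    Each output line has the form "<hash>\t<ref>". Hypothetical helper
    illustrating what a file-source sync resolves to.
    """
    ref = f"refs/heads/{branch}"
    for line in ls_remote_output.splitlines():
        sha, _, name = line.partition("\t")
        if name == ref:
            return sha
    return None

sample = (
    "1a2b3c4d\tHEAD\n"
    "1a2b3c4d\trefs/heads/main\n"
    "9f8e7d6c\trefs/heads/develop\n"
)
print(latest_commit(sample, "main"))  # prints 1a2b3c4d
```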

Manage External File Sources

Manage External File Sources lets you connect to (clone) and sync (pull) with a source code repository hosting service (Bitbucket, GitHub, GitLab, AWS CodeCommit). A connected file source can then be selected in the Sync File Source component. 

Here are the steps to clone a file source: 

  • Click Project → Manage External File Sources. 
  • In the Manage External File Sources overlay, click +. 
  • Complete the fields in the Connect External File Source from Git Repository overlay. Each field is described below. 
  • Click OK to finish. Manage External File Sources will now list the newly cloned file source. From here, you can click the Sync and Delete buttons if you wish to manage your file source. 

  • Source Name: A name for the external file source. This can be the name of your remote repository, if you wish. 
  • Remote URI: The connection URL (HTTPS) of your remote repository. 
  • Username: The connection username. Matillion ETL can autopopulate this field by reading the Remote URI. 
  • Password: Select a password entry that obfuscates your connection password. Click Manage to add your connection password to Manage Passwords. Certain source code repository hosting services require you to create an in-service app, with a password generated and assigned by the service, instead of using a login password to connect to the repository. See the Hosting Service Passwords section for more information. 
  • Branch: Specify an existing branch, for example main or master. 
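The username autopopulation works because an HTTPS remote URI can embed the username directly. A quick sketch of that parsing (a hypothetical illustration of the behaviour; the example URI and account name are made up):

```python
from urllib.parse import urlparse

def username_from_remote_uri(uri):
    """Extract the embedded username from an HTTPS remote URI, if present.

    e.g. https://alice@github.com/org/repo.git -> "alice"
    """
    return urlparse(uri).username

print(username_from_remote_uri("https://alice@github.com/org/repo.git"))  # alice
```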

Issues Resolved by 1.70 Release

Below are the issues resolved in the 1.70 release, which are also a reason to upgrade from a previous Matillion version to 1.70.

  • Fixed an issue where the Google BigQuery component would not generate correct SQL to run when configured in Basic Mode and where a table had a column of type struct. 
  • Fixed an issue where the Zendesk Support Query component was not retrieving data from column custom_field_options_ in the ticket_fields table. 
  • Fixed an issue where API Query Profile parameters were being duplicated during paging, causing the API Query component to fail. 
  • Fixed an issue where an incorrect number of concurrent users were shown when applying licences on multiple environments. 

Requirements to upgrade to 1.70

To upgrade, you need to follow three simple steps, which are also best practices for any METL upgrade:

  • Backup
  • Update
  • Roll-back

Let’s dive deeper into these steps.

Backup  

Here are the steps to taking a backup of your METL instance:

  • In the EC2 Management Console locate the Instance running Matillion ETL and select it. 
  • Find Root devices and then select the EBS ID of the volume. 
  • Right click on the volume and select Create Snapshot. 
  • Enter a Name and Description for your snapshot. 
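The console steps above can also be scripted against the EC2 API. The sketch below takes the client as a parameter so it can be exercised without AWS; in practice you would pass boto3.client("ec2"). The volume ID and names shown are placeholders:

```python
def snapshot_root_volume(ec2_client, volume_id, name, description):
    """Create an EBS snapshot of the METL root volume.

    `ec2_client` is any object exposing the EC2 create_snapshot API,
    e.g. boto3.client("ec2"). Sketch only; adapt to your environment.
    """
    return ec2_client.create_snapshot(
        VolumeId=volume_id,
        Description=description,
        TagSpecifications=[{
            "ResourceType": "snapshot",
            "Tags": [{"Key": "Name", "Value": name}],
        }],
    )

# snapshot_root_volume(boto3.client("ec2"), "vol-0123456789abcdef0",
#                      "metl-pre-1.70", "Backup before upgrading to 1.70")
```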

Update Matillion

Users have two options for updating to the new version: 

  • Updating from the Admin Menu: Select the Matillion ETL Updates option from the Admin menu to open a new dialog box. You may check for software updates using the Check for Updates button. Available updates are listed in the console, and the Update button will become available if they can be installed. Selecting Update will download any updated packages and apply them. Once applied, the server will be restarted, which will disconnect any users and abort any running tasks. 
  • Updating manually via SSH: The AMI comes preconfigured with a software repository where updates are published, so a standard sudo yum update is all that is required to update the software on the AMI. 
    • In the EC2 Management Console, locate the instance running Matillion ETL and select it. 
    • Right-click and choose Connect. Follow the on-screen instructions.
    • Once connected, issue the following: sudo yum update matillion-*. Follow the on-screen instructions. 
    • Restart Apache Tomcat with: sudo service tomcat8 restart. 
    • Log out of your session with: exit. 

Roll-back

If you notice any problems after updating, shut down the instance, create a new volume from the backup taken prior to the update, and make this the instance’s root volume. This will effectively roll back the update. 
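Scripted with the same EC2 API calls, the roll-back looks roughly like this (a simplified sketch: it assumes the instance is already stopped and its old root volume detached, and /dev/xvda is a common but not universal root device name):

```python
def restore_from_snapshot(ec2_client, instance_id, snapshot_id,
                          availability_zone, device="/dev/xvda"):
    """Create a volume from a snapshot and attach it as the root device.

    `ec2_client` is any object exposing the EC2 create_volume and
    attach_volume APIs, e.g. boto3.client("ec2"). Simplified sketch:
    in practice you must stop the instance, detach the old root volume,
    and wait for the new volume to become available before attaching.
    """
    volume = ec2_client.create_volume(SnapshotId=snapshot_id,
                                      AvailabilityZone=availability_zone)
    ec2_client.attach_volume(InstanceId=instance_id,
                             VolumeId=volume["VolumeId"],
                             Device=device)
    return volume["VolumeId"]
```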

Who should upgrade to 1.70

The new version of Matillion ETL is specifically designed for companies that require the latest features in data integration and transformation. This version provides significant improvements in terms of functionality, user experience, and performance. The Assign Tag component allows users to assign metadata to their resources, making it easier to track and manage data sets. The Anaplan Bulk component enables users to import and export large amounts of data from Anaplan, a cloud-based business planning and performance management platform. The SAP ODP Extract (Preview) component allows users to extract data from SAP’s Operational Data Provisioning (ODP) framework. 

Moreover, MariaDB support in the Database Query component gives users a seamless experience when working with MariaDB data. The Run dbt Command feature allows users to run dbt commands within Matillion, eliminating the need to switch between different applications. The Sync File Source feature keeps a connected file source up to date with its remote repository, while the Manage External File Sources feature allows users to connect to and manage external file sources easily. 

Overall, companies that require enhanced data integration and transformation capabilities can benefit from upgrading to the new version of Matillion ETL. The new features and improvements in this version provide users with a more streamlined and efficient experience when working with data. 

Consequences of upgrading to new version

Upgrading to the new version of Matillion can have both positive and negative consequences, depending on the specific circumstances of the upgrade. Here are some of the potential consequences of upgrading to Matillion 1.70: 

Positive Consequences: 

  • Access to new features: Upgrading to the new version of Matillion ETL can provide access to the latest features and enhancements, which can help improve data integration and transformation processes. 
  • Improved performance: The new version may offer improvements in performance, such as faster data processing or more efficient use of system resources.  

Negative Consequences: 

  • Compatibility issues: Upgrading to a new version of any software can sometimes result in compatibility issues with existing applications or systems. It’s possible that upgrading to Matillion 1.70 may cause problems with existing data sources, integrations, or workflows. 
  • Downtime and disruption: Upgrading to a new version of Matillion ETL can require system downtime, which can impact business operations and disrupt data processing and integrations. 

It’s important to carefully evaluate the potential consequences of upgrading to Matillion 1.70 and ensure that the benefits outweigh any potential risks or negative impacts. It’s also recommended to perform thorough testing and backup procedures before upgrading to minimize any potential risks or issues. 

Who should stay on the old version

The decision to upgrade to the new version of Matillion depends on specific organizational requirements. Companies may want to stay on the older version if it meets all their business needs, they have a highly customized deployment, or they are not experiencing performance issues. Organizations should evaluate the benefits and potential drawbacks of upgrading to the new version and determine whether it aligns with their business goals. 

Summary 

Matillion’s release 1.70 introduces several new features, including the MariaDB component for database queries, the Assign Tag component for Snowflake, and new data extraction capabilities with the Anaplan Bulk and SAP ODP Extract components. Additionally, it enhances integration with dbt and external file sources, improving data management and workflow efficiency for users of Snowflake, Amazon Redshift, and Delta Lake. The update also addresses bugs and offers better performance, but requires careful planning due to potential compatibility issues and system downtime during the upgrade process.
