As cloud data warehouses gain popularity, new tools are emerging to interact with them. One such tool is DBT (Data Build Tool). DBT is a data transformation tool that focuses on the Transform part of ELT (Extract, Load, Transform) processes. It supports data warehouses such as Snowflake, Redshift, BigQuery, Databricks, etc. In this article, we will check how to export your Snowflake table to an S3 bucket using DBT.
What is DBT and How Does it Work?
Before jumping into the approach for exporting a Snowflake table to an AWS S3 bucket from DBT, let us check what DBT is and how it works.
DBT (Data Build Tool) focuses mainly on the data transformation part of ELT (Extract, Load, Transform) processes. The tool basically acts as an orchestration layer on top of your data warehouse. It performs various transformations using SELECT statements, which it turns into tables and views, making the transformation process simple and effective. Anybody who can write SELECT statements can build models, add tests, and schedule those jobs.
DBT works on the concept of ELT. It performs all the calculations at the database level, making the entire transformation process secure, reliable, and easy to maintain.
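In practice, a dbt model is just a SQL file containing a SELECT statement, and dbt wraps it in the DDL needed to build a table or view. Here is a minimal sketch; the model name and the sales.orders source are hypothetical and assume a source is already configured in the project.

-- models/customer_orders.sql (hypothetical model)
-- Aggregate orders per customer; dbt materializes this SELECT as a view or table
select
    customer_id,
    count(*)    as order_count,
    sum(amount) as total_amount
from {{ source('sales', 'orders') }}
group by customer_id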
How to Export Snowflake Table to AWS S3 Bucket using DBT?
DBT performs transformations using SELECT statements, and there is no option to create a flat file with a SELECT statement. Many databases, such as Redshift, support external tables that can export a table into CSV or another flat file format. Snowflake, however, supports table unloads only through the COPY INTO command.
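For reference, a standalone Snowflake unload statement looks like the example below; the stage and table names are placeholders, and this is essentially the command that the DBT macro later in this article wraps.

-- Unload a Snowflake table to an external stage as CSV (stage and table names are placeholders)
COPY INTO @S3_STAGE/TEST
FROM TEST
FILE_FORMAT = (TYPE = CSV)
OVERWRITE = TRUE;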
The best part about DBT is that it provides an option to write macros. You can use those macros to perform various transformations, such as complex calculations, executing multiple SQL statements, or running custom SQL statements. You can also call a custom macro from a pre_hook or post_hook to run merge, update, delete, and other SQL statements.
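For example, here is a minimal sketch of a model configuration that runs a custom SQL statement through a post_hook; the model name, the referenced table, and the hook statement are illustrative assumptions.

-- models/orders_clean.sql (hypothetical model)
{{
  config(
    materialized='table',
    post_hook="delete from {{ this }} where order_amount is null"
  )
}}

-- The model itself is a plain SELECT; the post_hook runs after the table is built
select * from {{ ref('raw_orders') }}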
The following steps help you export a Snowflake table to an AWS S3 bucket using DBT.
- Create a Snowflake External Stage
- Write a Macro to Execute the COPY INTO Command
- Export the Snowflake Table using the DBT Macro
Let us check the above steps in detail with an example.
Create a Snowflake External Stage
This is one of the important steps: create an external stage that points to the Amazon S3 bucket where you will unload the Snowflake table in CSV format. Snowflake external stages are database and schema specific, so choose the appropriate database and schema in which to create the external stage.
You can use the following SQL to create an external stage in Snowflake.
CREATE OR REPLACE STAGE S3_STAGE
  url = 's3://[YOUR_S3_PATH]'
  credentials = (
    aws_key_id = '###'
    aws_secret_key = '###')
  file_format = (type = csv);
Don’t forget to replace YOUR_S3_PATH with your AWS S3 bucket URI. Once created, the stage will be available in the specified database and schema. You can verify it using the SHOW STAGES command.
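For example, you can run the following from a Snowflake worksheet to confirm that the stage was created.

-- List stages matching the name in the current database and schema
SHOW STAGES LIKE 'S3_STAGE';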
Write a Macro to Execute the COPY INTO Command
Macros in Jinja are pieces of code similar to functions in other programming languages such as Python, and they are extremely useful for writing reusable pieces of code. Macros are defined in .sql files, usually in the macros folder of your project.
You can use the following macro to execute the Snowflake COPY INTO command from dbt.
{% macro export_to_s3(tb) %}
  {{ log("Unloading data", True) }}
  {% call statement('unload_test', fetch_result=true, auto_begin=true) %}
    copy into @S3_STAGE/{{ tb }}
    from {{ tb }}
    overwrite = true
  {% endcall %}
  {{ log("Unloaded data", True) }}
{% endmacro %}
This macro is borrowed from a GitHub issue.
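A quick note on how the macro works: the statement block is dbt's mechanism for running arbitrary SQL against the warehouse during a run, so the COPY INTO statement is executed on Snowflake when the macro is called, and the log calls simply print progress messages to the dbt console. Because no FILE_FORMAT is specified in the COPY INTO, the unloaded files use the CSV file format defined on the S3_STAGE created earlier.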
Export Snowflake Table to S3 Bucket using DBT Macro
Finally, you need to use the macro created above in your model to perform the Snowflake table export.
In the following example, we call the above macro from a model. We also keep a dummy SQL statement in the model; otherwise, the run will fail with the error “Tried to run an empty query on model”.
{{
  config(
    materialized='view'
  )
}}

select * from {{ ref('TEST') }}

{{ export_to_s3('TEST') }}
The above SQL file creates a dummy view and executes the COPY INTO command to export the Snowflake table in CSV format.
Hope this helps 🙂