Snowflake offers a number of ways to ingest data, ranging from batch processing to stream processing. The SnowPipe feature offers micro-batch processing and is a powerful way to ingest your data without having to provision compute resources. Once configured for the right use cases, it offers a scalable, cost-optimized, and serverless means of loading your files from Azure storage into Snowflake.
The following scenarios best describe when SnowPipe data ingestion is suitable:
This architecture shows the components and data flow involved in running SnowPipe on the Azure platform.
This is the input file we are going to ingest into Snowflake using SnowPipe. Typically these files are stored in blob storage and follow a specific file-naming pattern.
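Before the pipeline can load such files, a target table and a matching CSV file format are needed in Snowflake. A minimal sketch follows; the database, table, column, and format names are placeholders for this demo, not taken from the original files.

-- Hypothetical target table for the incoming CSV files
CREATE OR REPLACE TABLE demo_db.public.customer_data (
  id          NUMBER,
  name        STRING,
  signup_date DATE
);

-- CSV file format matching the incoming files (header row skipped)
CREATE OR REPLACE FILE FORMAT demo_db.public.csv_format
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1;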
The Azure platform (dotted lines) marks the Azure infrastructure boundary.
We created this resource group to keep related resources together and to track their usage for accounting purposes.
All of our Azure storage resources are attached to this storage account.
Blobs are stored in containers, which are created within storage accounts. This container receives and stores the individual CSV files uploaded by clients.
"Blob created" events are subscribed so that Event grid triggers an action when a file arrives in blob storage and notifies Snowflake stage using the storage queue.
Snowflake's external stage holds the details of the files that are ready to be ingested into Snowflake.
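A minimal sketch of such an external stage, assuming a storage account named mystorageaccount, a container named csvfiles, and a SAS token generated in Azure (all of these are placeholder assumptions):

-- External stage pointing at the Azure blob container (placeholder names and SAS token)
CREATE OR REPLACE STAGE demo_db.public.azure_csv_stage
  URL = 'azure://mystorageaccount.blob.core.windows.net/csvfiles'
  CREDENTIALS = (AZURE_SAS_TOKEN = '<sas-token>');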
This defines an integration between Azure storage and Snowflake.
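For SnowPipe auto-ingest on Azure, this integration is typically a notification integration over the storage queue that Event Grid writes to. A minimal sketch, assuming a queue named snowpipe-queue on the same storage account and a placeholder tenant ID:

-- Notification integration that lets Snowflake read events from the Azure storage queue
CREATE OR REPLACE NOTIFICATION INTEGRATION azure_queue_int
  ENABLED = TRUE
  TYPE = QUEUE
  NOTIFICATION_PROVIDER = AZURE_STORAGE_QUEUE
  AZURE_STORAGE_QUEUE_PRIMARY_URI = 'https://mystorageaccount.queue.core.windows.net/snowpipe-queue'
  AZURE_TENANT_ID = '<azure-tenant-id>';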
A new user is created and granted permission on the storage account. The Azure integration generates a consent URL that must be opened and authenticated to permit Snowflake to use Azure's storage resources.
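The consent URL can be retrieved by describing the integration (azure_queue_int is the placeholder name from the sketch above); the AZURE_CONSENT_URL property in the output is the link to open and authenticate against:

-- Output includes AZURE_CONSENT_URL and the multi-tenant app name to grant access to
DESC NOTIFICATION INTEGRATION azure_queue_int;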
This command creates the SnowPipe, i.e., a pipe defined over the external stage.
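A minimal sketch of the pipe, reusing the placeholder table, stage, file format, and integration names from the sketches above:

-- Pipe that auto-ingests new files announced on the Azure storage queue
CREATE OR REPLACE PIPE demo_db.public.csv_pipe
  AUTO_INGEST = TRUE
  INTEGRATION = 'AZURE_QUEUE_INT'
  AS
  COPY INTO demo_db.public.customer_data
  FROM @demo_db.public.azure_csv_stage
  FILE_FORMAT = (FORMAT_NAME = 'demo_db.public.csv_format');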
Since compute for SnowPipe is provisioned by Snowflake as needed, its billing appears as a separate line item listed as "SNOWPIPE". You can also query the billing details; the following query lists the last 14 days of SnowPipe billing.
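The exact query from the original post is not reproduced here; a sketch using Snowflake's PIPE_USAGE_HISTORY Information Schema table function would look like this:

-- Credits consumed by SnowPipe over the last 14 days
SELECT start_time, end_time, pipe_name, credits_used, bytes_inserted, files_inserted
FROM TABLE(INFORMATION_SCHEMA.PIPE_USAGE_HISTORY(
  DATE_RANGE_START => DATEADD('day', -14, CURRENT_DATE()),
  DATE_RANGE_END   => CURRENT_DATE()))
ORDER BY start_time;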
Absolutely! Once we build this data pipeline, ingestion is fully scalable and serverless, and can receive any number of data files to load into Snowflake.
Most of the detailed steps mentioned here are reusable, and doing them a second time around is usually effortless. It looks like a lot of small steps, but we could also write PowerShell, Python, and SnowSQL scripts to automate the whole SnowPipe creation process, with all parameters fed in from a config file.
The GitHub link has all files related to this demo.
SnowPipe is a great way to ingest files and minimizes cost when used with appropriate measures and planning.
Tags : #Snowflake #SnowflakeDB #SnowPipe