Writing Stored Procedures in Snowpark (Python)

To Nha Notes | Feb. 21, 2023, 11:57 a.m.

This series shows you the various ways you can use Python within Snowflake.

Snowpark for Python is the name for the new Python functionality integration that Snowflake has recently developed. At the Snowflake Summit in June 2022, Snowpark for Python was officially released into Public Preview, which means anybody is able to get started using Python in their own Snowflake environment.

I would understand if you wanted to skip this post and cut directly to the Snowflake’s monolithic documentation page; however, I would strongly recommend reading through this post first as it will give you an introduction to both the possibilities and the limitations of this new functionality. We will also walk through the steps you can take to create your own Python stored procedure from within Snowflake itself. There’s no need to set up anything on your local machine for this; all you need is a Snowflake account and the relevant privileges, most notably the CREATE PROCEDURE privilege at the schema level.

If you are interested in finding out about Python UDFs instead of stored procedures, be sure to check out my matching  post, “A Definitive Guide to Creating Python UDFs Directly with the Snowflake User Interface.”

Why Use a Python Stored Procedure Instead of SQL Scripting?

Most Snowflake users are far more comfortable with SQL than they are with Python or other advanced programming languages, simply because SQL is the core language required to leverage the Snowflake platform. With that in mind, it is easy to consider using SQL-based stored procedures instead of other languages; such as Python. However, there are several strong reasons to leverage a non-SQL language instead, especially when compared to Python:

  • Anything you can achieve with a SQL stored procedure can also be achieved with a Python stored procedure.
  • A SQL stored procedure would have a harder time defining and executing functions within itself, especially when this is paired with looping. SQL is fantastic, but I would not argue for it to be a first class programming language like JavaScript and Python.
  • Python is a high-level language capable of far more than standard SQL, including the ability to import and leverage functionality from a wide number of modules.

To summarise, Python is simply more versatile than SQL and unlocks a wider range of functionality. Here are a few examples of powerful functionality that is possible with Python and not with SQL:

  • Construct and execute SQL queries with greater ease and flexibility than with SQL scripting
  • Access supporting files in cloud storage and read their contents to contribute to the overall Python script
  • Perform powerful data transformations built out of multiple components using Pandas
  • Leverage the map() function or list comprehension to apply a function to each value of a list; a simple way to support looping and iteration
  • Apply machine learning models to generate new forecasts using libraries such as PyTorch or scikit-learn
  • Generate authentication key pairs and apply them to users

How to execute multiple SQL statements in Snowflake stored procedure in python language?

- Load SQLs which has multiple statements separated by ; from staged S3 files via a Snowflake UDF function as blow.

create or replace function my_udf()
  returns string
  language python
  runtime_version=3.8
  imports=('@my_stage/file.txt')
  handler='compute'
as
$$
import sys
import os

with open(os.path.join(sys._xoptions["snowflake_import_directory"], 'file.txt'), "r") as f:
    s = f.read()

def compute():
    return s
$$;

- Create a common stored procedure in python language to split loaded SQL statements and iterative each SQL to execute it.

CREATE OR REPLACE PROCEDURE <PROCEDURE_NAME> (input_param STRING)

  RETURNS STRING

  LANGUAGE PYTHON

  RUNTIME_VERSION = '3.8'

  PACKAGES = ('snowflake-snowpark-python')

  HANDLER = 'run'

  EXECUTE AS CALLER

AS

$$


def execute_sql(session, sql):

    if sql:

        session.sql(sql.strip()).collect()


def run(session, output_bucket):

    sqls_str = """

        <STATEMENT_1>;

       <STAGEMENT_2>;

    """

    sqls = sqls_str.split(';')

    for sql in sqls:

        execute_sql(session, sql.strip())

    return "SUCCESS"

$$;

 

  

References

https://interworks.com/blog/2022/08/16/a-definitive-guide-to-python-stored-procedures-in-the-snowflake-ui/

https://pkg.go.dev/github.com/snowflakedb/gosnowflake

https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-creating