Skip to main content

Data Utilities

See on Snowflake

Overview

Data Utilities is a toolbox of functions for helping you call various services such ChatGPT, Claude, and FRED in Snowflake SQL. These utilities also include tools for parsing and standardizing data such as URLs and phone numbers. Whether you're working with external AI models or enhancing data quality, Snowflake Data Utilities provides flexible functions to support a range of analytical tasks.

AI UDFs

FunctionDescription
OpenAI FunctionsFunctions to interact with OpenAI models (e.g., ChatGPT) in Snowflake SQL for generating insights from text.
Anthropic FunctionsFunctions to integrate with Anthropic models (e.g., Claude) in Snowflake SQL.
Token Counter FunctionsUtility functions to count tokens for OpenAI and Anthropic models, useful for estimating cost and performance.
FRED FunctionsUtility FRED functions allow you to download and process economic data series directly from the Federal Reserve Economic Data (FRED) API within Snowflake
Note

The user is required to have an API key for the OpenAI, Anthropic and FRED functions

Utility UDFs

Unlike the AI functions, the phone number and URL parsing functions do not call external AI models.

FunctionDescription
Phone Number Parsing FunctionsFunctions to standardize and parse phone numbers for better data consistency.
URL Parsing FunctionsFunctions to clean and parse URLs, making web data processing easier within Snowflake SQL.

Internally, Snowflake uses these user defined functions to call out to AI tools and more accurately and consistently parse and match data across tables.


OpenAI Functions

The OpenAI functions allow you to call OpenAI models (e.g. ChatGPT) from Snowflake.

FunctionDescription
EVALUATE_OPENAI_PROMPT(model, system_prompt, user_prompt)

Returns string response from OpenAI.

Parameters:

  • model: OpenAI model used (full list of models can be found here).
  • system_prompt: contents of the system message.
  • user_prompt: contents of the user message.

Example

SELECT CybersynFunctionsAppAI.cybersyn.evaluate_openai_prompt(
'gpt-3.5-turbo',
'You are financial market expert',
'What is the name of the company with symbol SNOW. Just the name in json format.'
);

Returns:

{   "company_name": "Snowflake Inc." }

Set Up Instructions

  1. Create network rule to allow for external access:
CREATE OR REPLACE NETWORK RULE DB.SCHEMA.OPENAI_NETWORK_RULE
MODE = EGRESS
TYPE = HOST_PORT
VALUE_LIST = ('api.openai.com');
  1. Create secret with OpenAI API key:
CREATE OR REPLACE SECRET DB.SCHEMA.OPENAI_API_KEY
TYPE = GENERIC_STRING
SECRET_STRING = 'sk-';
  1. Create external access integration:
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION OPENAI_EXTERNAL_ACCESS_INTEGRATION
ALLOWED_NETWORK_RULES = (DB.SCHEMA.OPENAI_NETWORK_RULE)
ALLOWED_AUTHENTICATION_SECRETS = (DB.SCHEMA.OPENAI_API_KEY)
ENABLED = true;
  1. Grant access for the application to use the api key and external integration (replace CybersynFunctionsAppAI with the installed app name):
GRANT USAGE ON DATABASE DB TO APPLICATION CybersynFunctionsAppAI;
GRANT USAGE ON SCHEMA DB.SCHEMA TO APPLICATION CybersynFunctionsAppAI;
GRANT USAGE ON INTEGRATION OPENAI_EXTERNAL_ACCESS_INTEGRATION TO APPLICATION CybersynFunctionsAppAI;
GRANT READ ON SECRET DB.SCHEMA.OPENAI_API_KEY TO APPLICATION CybersynFunctionsAppAI;
  1. Call stored procedure to enable native app to have external access to OpenAI API:
CALL CybersynFunctionsAppAI.cybersyn.init_openai(
api_key => 'DB.SCHEMA.OPENAI_API_KEY',
external_integration => 'OPENAI_EXTERNAL_ACCESS_INTEGRATION');

Anthropic Functions

The Anthropic functions allow you to call Anthropic AI models (e.g. Claude) from Snowflake.

FunctionDescription
EVALUATE_ANTHROPIC_PROMPT(model, max_tokens, system_prompt, user_prompt)

Returns string response from Anthropic.

Parameters:

  • max_tokens: number representing maximum tokens to return.
  • system_prompt: contents of the system message.
  • user_prompt: contents of the user message.
  • model: Anthropic model used (full list of models can be found here).

Example

SELECT CybersynFunctionsAppAI.cybersyn.evaluate_anthropic_prompt(
'claude-3-sonnet-20240229',
100,
'You are financial market expert',
'What is the name of the company with symbol SNOW. Just the name in json format.'
);

Returns:

{"name": "Snowflake Inc."}

Set Up Instructions

  1. Create network rule to allow for external access:
CREATE OR REPLACE NETWORK RULE DB.SCHEMA.ANTHROPIC_NETWORK_RULE
MODE = EGRESS
TYPE = HOST_PORT
VALUE_LIST = ('api.anthropic.com');
  1. Create secret with Anthropic API key:
CREATE OR REPLACE SECRET DB.SCHEMA.ANTHROPIC_API_KEY
TYPE = GENERIC_STRING
SECRET_STRING = 'sk-';
  1. Create external access integration
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION ANTHROPIC_EXTERNAL_ACCESS_INTEGRATION
ALLOWED_NETWORK_RULES = (DB.SCHEMA.ANTHROPIC_NETWORK_RULE)
ALLOWED_AUTHENTICATION_SECRETS = (DB.SCHEMA.ANTHROPIC_API_KEY)
ENABLED = true;
  1. Grant access for the application to use the api key and external integration (replace CybersynFunctionsAppAI with the installed app name):
GRANT USAGE ON DATABASE DB TO APPLICATION CybersynFunctionsAppAI;
GRANT USAGE ON SCHEMA DB.SCHEMA TO APPLICATION CybersynFunctionsAppAI;
GRANT USAGE ON INTEGRATION ANTHROPIC_EXTERNAL_ACCESS_INTEGRATION TO APPLICATION CybersynFunctionsAppAI;
GRANT READ ON SECRET DB.SCHEMA.ANTHROPIC_API_KEY TO APPLICATION CybersynFunctionsAppAI;
  1. Call stored procedure to enable native app to have external access to Anthropic API
CALL CybersynFunctionsAppAI.cybersyn.init_anthropic(
api_key => 'DB.SCHEMA.ANTHROPIC_API_KEY',
external_integration => 'ANTHROPIC_EXTERNAL_ACCESS_INTEGRATION');

Token Counter Functions

Functions that call external API to find number of tokens in the string. This can be used to estimate cost of the prompt.

FunctionDescription
TOKEN_COUNT(prompt, model)

Returns number of tokens in the prompt

prompt: contents of the message

model: is OpenAI model to use, list of models can be found here

Example

SELECT CybersynFunctionsAppAI.cybersyn.token_count(
'You are financial market expert. What is the name of the company with symbol SNOW. Just the name in json format.',
'gpt-4'
);

Returns:

25

Set Up Instructions

  1. Create network rule to allow for external access:
CREATE OR REPLACE NETWORK RULE DB.SCHEMA.TOKEN_NETWORK_RULE
MODE = EGRESS
TYPE = HOST_PORT
VALUE_LIST = ('openaipublic.blob.core.windows.net');
  1. Create external access integration
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION TOKEN_EXTERNAL_ACCESS_INTEGRATION
ALLOWED_NETWORK_RULES = (DB.SCHEMA.OPENAI_NETWORK_RULE)
ENABLED = true;

3. Grant access for the application to use external integration(replace CybersynFunctionsAppAI with the installed app name):

GRANT USAGE ON DATABASE DB TO APPLICATION CybersynFunctionsAppAI;
GRANT USAGE ON SCHEMA DB.SCHEMA TO APPLICATION CybersynFunctionsAppAI;
GRANT USAGE ON INTEGRATION TOKEN_EXTERNAL_ACCESS_INTEGRATION TO APPLICATION CybersynFunctionsAppAI;

4. Call stored procedure to enable native app to have external access to OpenAI API:

CALL CybersynFunctionsAppAI.cybersyn.init_token_count(
external_integration => 'TOKEN_EXTERNAL_ACCESS_INTEGRATION');

Phone Number Parsing Functions

Clean and process phone numbers directly in Snowflake SQL. The user-defined functions (UDFs) allow users to easily extract phone number from a string directly within Snowflake queries, which can be particularly useful for data cleaning and analysis tasks related to contact information.

The UDFs are all callable in Snowflake SQL and take two parameters:

  • String with a phone number
  • Country code (optional parameter, default value is set to 'US')
FunctionDescription
GETPHONENUMBER(phone, country)Returns phone number in a national format if it is possible to extract, otherwise result is null

Examples

SELECT CybersynFunctionsAppAI.CYBERSYN.GETPHONENUMBER('My phone number is 619-582-9502'); -- returns (619) 582-9502

SELECT CybersynFunctionsAppAI.CYBERSYN.GETPHONENUMBER('My phone number is 619-582-950'); -- returns null

SELECT CybersynFunctionsAppAI.CYBERSYN.GETPHONENUMBER('020-8366-9179', 'GB'); -- returns 020 8366 9179

URL Parsing Functions

Clean and process URLs directly in Snowflake SQL. The user-defined functions (UDFs) allow users to easily extract website parts from URLs within Snowflake queries, which is particularly useful for data cleaning and analysis tasks related to clickstream and web traffic data.

Each UDF takes a single parameter, URL, which should be a string.

FunctionDescription
GETDOMAIN(``url``)Returns the domain with the suffix or top level domain without any URL parameters or subdomains.
GETDOMAINONLY(``url``)Returns only the core domain text without any subdomain, parameters, suffix or top level domain.
GETPUBLICSUFFIXDOMAINONLY(``url``)Returns only the public suffix domain. Public suffixes include suffixes under which users or companies can register domains.
GETSUBDOMAIN(``url``)Returns the domain in addition to the subdomain. The subdomain “www” is excluded from results.
GETSUBDOMAINONLY(``url``)Returns only the subdomain portion of the domain.

Internally, Cybersyn uses these functions to extract portions of URLs to more accurately and consistently parse and match domains across datasets. Use these functions to clean your own data or to easily join your internal data to Cybersyn’s web data.

Examples

Parse the domain details for the following url: https://chat.openai.com/?model=gpt-4-browsing

SELECT CybersynFunctionsAppAI.CYBERSYN.GETDOMAIN('https://chat.openai.com/?model=gpt-4-browsing');  -- returns openai.com

SELECT CybersynFunctionsAppAICYBERSYN.GETDOMAINONLY('https://chat.openai.com/?model=gpt-4-browsing'); -- returns openai

SELECT CybersynFunctionsAppAI.CYBERSYN.GETPUBLICSUFFIXDOMAINONLY('https://chat.openai.com/?model=gpt-4-browsing'); -- returns com

SELECT CybersynFunctionsAppAI.CYBERSYN.GETSUBDOMAIN('https://chat.openai.com/?model=gpt-4-browsing'); -- returns chat.openai.com

SELECT CybersynFunctionsAppAI.CYBERSYN.GETSUBDOMAINONLY('https://chat.openai.com/?model=gpt-4-browsing'); -- returns chat

FRED API Functions

The FRED functions allow you to download and process economic data series directly from the Federal Reserve Economic Data (FRED) API within Snowflake. The FRED data and FRED API are an external offering made available by FRED, subject to FRED's terms of use. Your download and use, as applicable, of FRED data and the FRED API, including in connection with this product, are subject to FRED's terms.

FunctionDescription
FRED_GET_SERIES(series_id)

UDTF that returns FRED series data as a table.

Parameters:

  • series_id: The ID of the FRED series (e.g., 'GDP', 'CPIAUCSL').
FRED_GET_SERIES_METADATA(series_id)

UDTF that returns metadata for a FRED series as a table.

Parameters:

  • series_id: The ID of the FRED series (e.g., 'GDP', 'CPIAUCSL').
FRED_GENERATE_FOUNDATION_TABLES(input_series_data_table, input_series_metadata_table, output_timeseries_table, output_attributes_table)

Stored Procedure to combine raw series data and metadata, process it, and output time series and attribute tables.

Parameters:

  • input_series_data_table: Fully qualified name of the table containing the raw series data (output from FRED_GET_SERIES).
  • input_series_metadata_table: Fully qualified name of the table containing the series metadata (output from FRED_GET_SERIES_METADATA).
  • output_timeseries_table: Name of the target table for the processed time series data.
  • output_attributes_table: Name of the target table for the processed attribute data.
FRED_DOWNLOAD_AND_GENERATE_FOUNDATION_TABLES_FROM_SERIES(series_ids_string, output_timeseries_table, output_attributes_table)

Stored Procedure that combines all steps: downloading data and metadata for multiple series, then generating foundation tables in one operation.

Parameters:

  • series_ids_string: Comma-separated list of FRED series IDs to process (e.g., 'GDP,UNRATE,CPIAUCSL').
  • output_timeseries_table: Name of the target table for the processed time series data.
  • output_attributes_table: Name of the target table for the processed attribute data.

Set up instructions

  1. Create network rule to allow for external access to the FRED API:

    CREATE OR REPLACE NETWORK RULE DB.SCHEMA.FRED_NETWORK_RULE -- Replace DB.SCHEMA with your database and schema names.
    MODE = EGRESS
    TYPE = HOST_PORT
    VALUE_LIST = ('api.stlouisfed.org');
  2. Create a secret to securely store your FRED API key. Replace 'secret' with your actual FRED API key. You can obtain a key from the FRED website.

    CREATE OR REPLACE SECRET DB.SCHEMA.FRED_API_KEY -- Replace DB.SCHEMA with your database and schema names
    TYPE = GENERIC_STRING
    SECRET_STRING = 'secret'; -- Replace with your actual FRED API Key
  3. Create an external access integration linking the network rule and the secret:

    CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION FRED_EXTERNAL_ACCESS_INTEGRATION
    ALLOWED_NETWORK_RULES = (DB.SCHEMA.FRED_NETWORK_RULE) -- Replace DB.SCHEMA with your database and schema names.
    ALLOWED_AUTHENTICATION_SECRETS = (DB.SCHEMA.FRED_API_KEY) -- Replace DB.SCHEMA with your database and schema names.
    ENABLED = true;
  4. Grant access for the application to use the secret and external integration. Replace SnowflakeDataUtilitiesApp with the actual name of your installed application if different. Replace DB and DB.SCHEMA with your database and schema names.

    GRANT USAGE ON DATABASE DB TO APPLICATION SnowflakeDataUtilitiesApp;
    GRANT USAGE ON SCHEMA DB.SCHEMA TO APPLICATION SnowflakeDataUtilitiesApp;
    GRANT USAGE ON INTEGRATION FRED_EXTERNAL_ACCESS_INTEGRATION TO APPLICATION SnowflakeDataUtilitiesApp;
    GRANT READ ON SECRET DB.SCHEMA.FRED_API_KEY TO APPLICATION SnowflakeDataUtilitiesApp;
  5. Call the initialization stored procedure to enable the native app to use the specified secret and integration for FRED API access. Replace DB.SCHEMA.FRED_API_KEY with the fully qualified name of your secret object.

    CALL SnowflakeDataUtilitiesApp.utils.init_fred(
    api_key => 'DB.SCHEMA.FRED_API_KEY',
    external_integration => 'FRED_EXTERNAL_ACCESS_INTEGRATION');

Example using single step to get all data

Replace DB.SCHEMA with your specific database and schema. Replace the example API key with your actual FRED API key.

  1. Ensure you've followed all of the instructions in the Set up Instructions.

  2. One-step Download and Process FRED Data:

    -- Download and process multiple series in a single step
    CALL SnowflakeDataUtilitiesApp.utils.FRED_DOWNLOAD_AND_GENERATE_FOUNDATION_TABLES_FROM_SERIES(
    series_ids_string => 'GDP,CPIAUCSL,UNRATE', -- Comma-separated list of series IDs to download
    output_timeseries_table => 'FRED_TS', -- Table will be created in SnowflakeDataUtilitiesApp.utils.FRED_TS
    output_attributes_table => 'FRED_ATT' -- Table will be created in SnowflakeDataUtilitiesApp.utils.FRED_ATT
    );
  3. Query the Generated Tables:

    -- Query the time series data
    SELECT * FROM SnowflakeDataUtilitiesApp.utils.FRED_TS LIMIT 10;
    -- Query the attributes data
    SELECT * FROM SnowflakeDataUtilitiesApp.utils.FRED_ATT;

This single-step approach eliminates the need to manually create intermediate tables and load data into them.

Example using separate download and process functions

We also provide you a way to separate download and process steps in case you want more granular control over the process. Replace DB.SCHEMA with your specific database and schema. Replace the example API key with your actual FRED API key.

  1. Ensure you've followed all of the instructions in the Set up Instructions.

  2. Fetch FRED Series Data:

    CREATE OR REPLACE TABLE DB.SCHEMA.fred_series_table AS (
    SELECT * FROM TABLE(SnowflakeDataUtilitiesApp.utils.fred_get_series('GDP'))
    );
    -- add another series
    INSERT INTO DB.SCHEMA.fred_series_table (
    SELECT * FROM TABLE(SnowflakeDataUtilitiesApp.utils.fred_get_series('CPIAUCSL'))
    );
  3. Fetch FRED Series Metadata:

    CREATE OR REPLACE TABLE DB.SCHEMA.fred_series_metadata_table AS (
    select * from TABLE(SnowflakeDataUtilitiesApp.utils.fred_get_series_metadata('GDP'))
    );
    -- add another series
    INSERT INTO DB.SCHEMA.fred_series_metadata_table (
    SELECT * FROM TABLE(SnowflakeDataUtilitiesApp.utils.fred_get_series_metadata('CPIAUCSL'))
    );
  4. Grant Select on Tables to the App: (Required for the procedure to read the tables)

    GRANT SELECT ON TABLE DB.SCHEMA.fred_series_table TO APPLICATION SnowflakeDataUtilitiesApp;
    GRANT SELECT ON TABLE DB.SCHEMA.fred_series_metadata_table TO APPLICATION SnowflakeDataUtilitiesApp;
  5. Combine and Process Data:

    CALL SnowflakeDataUtilitiesApp.utils.FRED_GENERATE_FOUNDATION_TABLES(
    input_series_data_table => 'DB.SCHEMA.fred_series_table',
    input_series_metadata_table => 'DB.SCHEMA.fred_series_metadata_table',
    output_timeseries_table => 'FRED_TS', -- Replace FRED_TS with your preferred name. Table will be created/re-created in SnowflakeDataUtilitiesApp.utils.FRED_TS
    output_attributes_table => 'FRED_ATT' -- Replace FRED_ATT with your preferred name. Table will be created/re-created in SnowflakeDataUtilitiesApp.utils.FRED_ATT
    );

    This will create two new tables, FRED_TS and FRED_ATT, containing the processed data with 2 FRED series we added above.

Notes

  • Data Restatements: FRED data can be restated periodically. It is recommended to perform full refreshes of your required series regularly to ensure you have the most up-to-date information.

  • Processing Multiple Series: To process multiple FRED series together:

    1. Insert the data for all desired series into a single series data table (e.g., DB.SCHEMA.fred_series_table).
    2. Insert the metadata for all corresponding series into a single metadata table (e.g., DB.SCHEMA.fred_series_metadata_table) using get_fred_series_metadata.
    3. Run the FRED_GENERATE_FOUNDATION_TABLES procedure once using these consolidated tables.

    Example of adding another series ('CPIAUCSL') to existing tables:

    -- Add CPI data
    INSERT INTO DB.SCHEMA.fred_series_table
    SELECT * FROM TABLE(SnowflakeDataUtilitiesApp.utils.get_fred_series('CPIAUCSL'));
    -- Add CPI metadata
    INSERT INTO DB.SCHEMA.fred_series_metadata_table
    SELECT * FROM TABLE(SnowflakeDataUtilitiesApp.utils.get_fred_series_metadata('CPIAUCSL'));

    After inserting data and metadata for all required series, call FRED_GENERATE_FOUNDATION_TABLES on the populated tables.

  • Rate Limits: The FRED API enforces rate limits. The separation of data download (get_fred_series, get_fred_series_metadata) and processing (FRED_GENERATE_FOUNDATION_TABLES) allows you to manage API calls more effectively. You can implement your own download logic (e.g., adding delays between calls) to respect these limits. If you create a custom downloader, you can still leverage the FRED_GENERATE_FOUNDATION_TABLES procedure, provided your data and metadata tables match the expected structure.

Disclaimer

Your use of OpenAI is subject in all respects to the terms and conditions of OpenAI set forth here. Your use of Anthropic is subject in all respects to the terms and conditions of Anthropic set forth here and here. The URL function is built off of a library licensed from here. The FRED data and FRED API are an external offering made available by FRED, subject to FRED's terms of use. Your download and use, as applicable, of FRED data and the FRED API, including in connection with this product, are subject to FRED's terms.

Snowflake is not responsible for any use you may make of these services or any result derived therefrom. Contact snowflake-public-data@snowflake.com for questions.