Snowflake-Labs/nf-snowflake

nf-snowflake plugin

Overview

nf-snowflake is a Nextflow plugin that enables Nextflow pipelines to run inside Snowpark Container Services (SPCS).

Each Nextflow task is translated into a Snowflake job service and executed as an SPCS job. The Nextflow main/driver program can run in two modes:

  1. Locally - running on your local machine or in a CI/CD environment, connecting to Snowflake via JDBC
  2. Inside SPCS - running as a separate job service within Snowpark Container Services

These two execution modes correspond to the two authentication methods supported by the plugin. When the main/driver program runs inside an SPCS job, Snowflake automatically injects the required environment variables (such as SNOWFLAKE_ACCOUNT, SNOWFLAKE_HOST, etc.) and the session token file (/snowflake/session/token). The plugin automatically discovers and uses these credentials for authentication.

Intermediate results between different Nextflow processes are shared via Snowflake stages, which must be configured as the working directory.

Prerequisites

Before using this plugin, you should have:

  • Nextflow (version 23.04.0 or later)
  • Snowflake account with access to:
    • Snowpark Container Services (Compute Pools/Image Registries)
    • Internal stages
  • Familiarity with:
    • Nextflow pipelines and configuration
    • Docker/container images
    • Snowflake authentication methods

Authentication

The plugin supports two authentication methods, corresponding to the two execution modes for the main/driver program:

1. Session Token Authentication (Main/Driver Running Inside SPCS)

When the Nextflow main/driver program runs inside an SPCS job, Snowflake automatically injects the session token file at /snowflake/session/token and the following environment variables:

  • SNOWFLAKE_ACCOUNT
  • SNOWFLAKE_HOST
  • SNOWFLAKE_DATABASE
  • SNOWFLAKE_SCHEMA
  • SNOWFLAKE_WAREHOUSE (optional)

The plugin automatically discovers and uses these credentials for authentication. No additional configuration is required.
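As an illustration only (this is not the plugin's actual code), the discovery logic can be sketched in Python; the token path and environment variable names are the ones listed above:

```python
import os
from typing import Optional

TOKEN_PATH = "/snowflake/session/token"

def discover_spcs_credentials(token_path: str = TOKEN_PATH) -> Optional[dict]:
    """Return SPCS-injected credentials when running inside an SPCS job, else None."""
    if not os.path.exists(token_path):
        return None  # not inside SPCS; fall back to connections.toml
    with open(token_path) as f:
        token = f.read().strip()
    creds = {
        "account": os.environ["SNOWFLAKE_ACCOUNT"],
        "host": os.environ["SNOWFLAKE_HOST"],
        "database": os.environ["SNOWFLAKE_DATABASE"],
        "schema": os.environ["SNOWFLAKE_SCHEMA"],
        "token": token,
    }
    if "SNOWFLAKE_WAREHOUSE" in os.environ:  # warehouse is optional
        creds["warehouse"] = os.environ["SNOWFLAKE_WAREHOUSE"]
    return creds
```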

2. Connections.toml Authentication (Main/Driver Running Locally)

When the Nextflow main/driver program runs locally (on your machine or in CI/CD), the plugin uses the Snowflake connections.toml configuration file for authentication.

File Locations (searched in order):

  1. ~/.snowflake/connections.toml (if directory exists)
  2. Location specified in SNOWFLAKE_HOME environment variable
  3. OS-specific defaults:
    • Linux: ~/.config/snowflake/connections.toml
    • macOS: ~/Library/Application Support/snowflake/connections.toml
    • Windows: %USERPROFILE%\AppData\Local\snowflake\connections.toml
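The search order above can be illustrated with a small sketch (assumptions: `home`, `env`, and `platform` are passed in explicitly for clarity; the real lookup reads them from the process environment):

```python
import os

def connections_toml_candidates(home: str, env: dict, platform: str) -> list:
    """Candidate locations for connections.toml, in the search order listed above."""
    candidates = [os.path.join(home, ".snowflake", "connections.toml")]
    if "SNOWFLAKE_HOME" in env:
        candidates.append(os.path.join(env["SNOWFLAKE_HOME"], "connections.toml"))
    os_defaults = {
        "linux": os.path.join(home, ".config", "snowflake", "connections.toml"),
        "darwin": os.path.join(home, "Library", "Application Support",
                               "snowflake", "connections.toml"),
        "windows": os.path.join(home, "AppData", "Local",
                                "snowflake", "connections.toml"),
    }
    candidates.append(os_defaults[platform])
    return candidates

def resolve_connections_toml(home: str, env: dict, platform: str):
    """Return the first candidate that exists on disk, or None."""
    for path in connections_toml_candidates(home, env, platform):
        if os.path.exists(path):
            return path
    return None
```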

Example connections.toml:

[default]
account = "myaccount"
user = "myuser"
password = "mypassword"
database = "mydb"
schema = "myschema"
warehouse = "mywh"

[production]
account = "prodaccount"
authenticator = "externalbrowser"
database = "proddb"
schema = "public"

Specify a connection in nextflow.config:

snowflake {
    connectionName = 'production'
    computePool = 'MY_COMPUTE_POOL'
}

If no connectionName is specified, the plugin will use:

  1. Connection name from SNOWFLAKE_DEFAULT_CONNECTION_NAME environment variable
  2. The default connection from connections.toml
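The fallback order amounts to the following (a minimal sketch, not the plugin's code; `"default"` stands for the `[default]` section of connections.toml):

```python
def resolve_connection_name(config_connection_name=None, env=None) -> str:
    """Pick a connection name using the fallback order described above."""
    env = env or {}
    if config_connection_name:  # explicit nextflow.config setting wins
        return config_connection_name
    if "SNOWFLAKE_DEFAULT_CONNECTION_NAME" in env:
        return env["SNOWFLAKE_DEFAULT_CONNECTION_NAME"]
    return "default"  # the [default] section in connections.toml
```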

Configuration Reference

All plugin configurations are defined under the snowflake scope in your nextflow.config:

computePool

The name of the Snowflake compute pool to use for executing jobs.

snowflake {
    computePool = 'MY_COMPUTE_POOL'
}

registryMappings

Docker registry mappings for container images. Snowflake does not support pulling images directly from arbitrary external registries. Instead, you must first replicate container images from external registries (such as Docker Hub, GitHub Container Registry, etc.) to Snowflake image repositories.

The registryMappings configuration allows you to automatically replace external registry hostnames with Snowflake image repository names in your pipeline's container specifications.

Format: Comma-separated list of mappings in the form external_registry:snowflake_repository

snowflake {
    registryMappings = 'docker.io:my_registry,ghcr.io:github_registry'
}

How it works:

  1. First, replicate images to your Snowflake image repository:

    docker pull docker.io/alpine:latest
    docker tag docker.io/alpine:latest <snowflake_repo_url>/alpine:latest
    docker push <snowflake_repo_url>/alpine:latest
  2. Then, when your process uses container 'docker.io/alpine:latest', the plugin automatically replaces docker.io with your Snowflake image repository URL, resulting in the correct Snowflake image reference.
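The substitution rule can be sketched as follows (an illustration of the mapping format, not the plugin's implementation; in practice the replacement value is the full Snowflake image repository path):

```python
def rewrite_container(image: str, registry_mappings: str) -> str:
    """Replace an external registry hostname with its mapped Snowflake repository.

    registry_mappings uses the plugin's format:
    'external_registry:snowflake_repository[,...]'.
    """
    mappings = dict(pair.split(":", 1) for pair in registry_mappings.split(","))
    registry, sep, rest = image.partition("/")
    if sep and registry in mappings:
        return mappings[registry] + "/" + rest
    return image  # no mapping found: leave the reference untouched
```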

connectionName

The name of the connection to use from the connections.toml file. When specified, the JDBC driver will use the connection configuration defined under this name.

snowflake {
    connectionName = 'production'
}

Note: This is only used when the session token file is not available (i.e., when running outside Snowpark Container Services).

Quick Start

This guide assumes you are familiar with both Nextflow and Snowpark Container Services.

1. Create a Compute Pool

CREATE COMPUTE POOL my_compute_pool
MIN_NODES = 2
MAX_NODES = 5
INSTANCE_FAMILY = CPU_X64_M
AUTO_SUSPEND_SECS = 3600;

2. Create a Snowflake Internal Stage for Working Directory

CREATE OR REPLACE STAGE nxf_workdir
ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE');

3. Set Up Image Repository

CREATE IMAGE REPOSITORY IF NOT EXISTS my_images;

4. Build and Upload Container Images

Build a container image for each Nextflow process, push it to your Snowflake image repository, and update each process's container directive accordingly.

Example process definition:

process INDEX {
    tag "$transcriptome.simpleName"
    container '/mydb/myschema/my_images/salmon:1.10.0'

    input:
    path transcriptome

    output:
    path 'index'

    script:
    """
    salmon index --threads $task.cpus -t $transcriptome -i index
    """
}

5. Configure Nextflow

Add a Snowflake profile to your nextflow.config file and enable the nf-snowflake plugin:

plugins {
    id 'nf-snowflake@1.0.0'
}

profiles {
    snowflake {
        process.executor = 'snowflake'

        snowflake {
            computePool = 'my_compute_pool'
            registryMappings = 'docker.io:my_images'
        }
    }
}

6. Run Your Pipeline

Execute the Nextflow pipeline with the Snowflake profile:

nextflow run . -profile snowflake -work-dir snowflake://stage/nxf_workdir/

Snowflake Filesystem and Working Directory

Snowflake Stage URI

The plugin uses a custom URI scheme to access Snowflake internal stages:

snowflake://stage/<stage_name>/<path>

Components:

  • snowflake:// - URI scheme identifier
  • stage/ - Literal prefix indicating a Snowflake stage
  • <stage_name> - The name of your Snowflake internal stage
  • <path> - Optional path within the stage
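Parsing a stage URI into its components can be sketched like this (an illustrative helper, not part of the plugin's API):

```python
from urllib.parse import urlparse

def parse_stage_uri(uri: str):
    """Split a snowflake://stage/<stage_name>/<path> URI into (stage_name, path).

    Raises ValueError for URIs that don't follow the scheme described above.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "snowflake" or parsed.netloc != "stage":
        raise ValueError(f"not a Snowflake stage URI: {uri}")
    parts = parsed.path.lstrip("/").split("/", 1)
    if not parts[0]:
        raise ValueError(f"missing stage name: {uri}")
    stage_name = parts[0]
    path = parts[1] if len(parts) > 1 else ""
    return stage_name, path
```

Note that `snowflake://my_stage/work/` fails here because the literal `stage/` prefix is missing, matching the invalid example shown later.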

Examples:

// Access root of a stage
workDir = 'snowflake://stage/my_stage/'

// Access a subdirectory within a stage
workDir = 'snowflake://stage/my_stage/workflows/pipeline1/'

Working Directory Requirement

IMPORTANT: The Nextflow working directory (workDir) must be a Snowflake stage using the snowflake:// URI scheme. This is a strict requirement for the plugin to function correctly.

The working directory is used to:

  • Store intermediate task results
  • Share data between pipeline processes
  • Store task execution metadata and logs

Correct configuration:

profiles {
    snowflake {
        process.executor = 'snowflake'
        workDir = 'snowflake://stage/nxf_workdir/'  // ✓ Valid

        snowflake {
            computePool = 'my_compute_pool'
        }
    }
}

Or specify on the command line:

nextflow run . -profile snowflake -work-dir snowflake://stage/nxf_workdir/

Invalid configurations:

workDir = 's3://my-bucket/work/'              // ✗ Invalid - not a Snowflake stage
workDir = '/local/path/work/'                 // ✗ Invalid - local filesystem
workDir = 'snowflake://my_stage/work/'        // ✗ Invalid - missing 'stage/' prefix

Stage Setup

Before running your pipeline, ensure your stage is properly configured:

-- Create an internal stage with encryption
CREATE OR REPLACE STAGE my_workdir
ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE');

-- Verify stage exists
SHOW STAGES LIKE 'my_workdir';

-- Optional: Test stage access
LIST @my_workdir;
