Snowflake-Labs/nf-snowflake

nf-snowflake plugin

Overview

nf-snowflake is a Nextflow plugin that enables Nextflow pipelines to run inside Snowpark Container Services (SPCS).

Each Nextflow task is translated into a Snowflake job service and executed as an SPCS job. The Nextflow main/driver program can run in two modes:

  1. Locally - running on your local machine or in a CI/CD environment, connecting to Snowflake via JDBC
  2. Inside SPCS - running as a separate job service within Snowpark Container Services

These two execution modes correspond to the two authentication methods supported by the plugin. When the main/driver program runs inside an SPCS job, Snowflake automatically injects the required environment variables (such as SNOWFLAKE_ACCOUNT, SNOWFLAKE_HOST, etc.) and the session token file (/snowflake/session/token). The plugin automatically discovers and uses these credentials for authentication.

Intermediate results between different Nextflow processes are shared via Snowflake stages, which must be configured as the working directory.

Prerequisites

Before using this plugin, you should have:

  • Nextflow (version 23.04.0 or later)
  • Snowflake account with access to:
    • Snowpark Container Services (Compute Pools/Image Registries)
    • Internal stages
  • Familiarity with:
    • Nextflow pipelines and configuration
    • Docker/container images
    • Snowflake authentication methods

Authentication

The plugin supports two authentication methods, corresponding to the two execution modes for the main/driver program:

1. Session Token Authentication (Main/Driver Running Inside SPCS)

When the Nextflow main/driver program runs inside an SPCS job, Snowflake automatically injects the session token file at /snowflake/session/token and the following environment variables:

  • SNOWFLAKE_ACCOUNT
  • SNOWFLAKE_HOST
  • SNOWFLAKE_DATABASE
  • SNOWFLAKE_SCHEMA
  • SNOWFLAKE_WAREHOUSE (optional)

The plugin automatically discovers and uses these credentials for authentication. No additional configuration is required.
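As an illustration only (this is not the plugin's actual code), the discovery logic can be sketched in Python; the token path and environment variable names are the ones listed above:

```python
import os
from typing import Optional

TOKEN_PATH = "/snowflake/session/token"

def discover_spcs_credentials(token_path: str = TOKEN_PATH) -> Optional[dict]:
    """Return SPCS-injected credentials when running inside an SPCS job, else None."""
    if not os.path.exists(token_path):
        return None  # not inside SPCS; fall back to connections.toml
    with open(token_path) as f:
        token = f.read().strip()
    creds = {
        "account": os.environ["SNOWFLAKE_ACCOUNT"],
        "host": os.environ["SNOWFLAKE_HOST"],
        "database": os.environ["SNOWFLAKE_DATABASE"],
        "schema": os.environ["SNOWFLAKE_SCHEMA"],
        "token": token,
    }
    if "SNOWFLAKE_WAREHOUSE" in os.environ:  # warehouse is optional
        creds["warehouse"] = os.environ["SNOWFLAKE_WAREHOUSE"]
    return creds
```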

2. Connections.toml Authentication (Main/Driver Running Locally)

When the Nextflow main/driver program runs locally (on your machine or in CI/CD), the plugin uses the Snowflake connections.toml configuration file for authentication.

File Locations (searched in order):

  1. ~/.snowflake/connections.toml (if directory exists)
  2. Location specified in SNOWFLAKE_HOME environment variable
  3. OS-specific defaults:
    • Linux: ~/.config/snowflake/connections.toml
    • macOS: ~/Library/Application Support/snowflake/connections.toml
    • Windows: %USERPROFILE%\AppData\Local\snowflake\connections.toml
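The search order above can be illustrated with a small sketch (assumptions: `home`, `env`, and `platform` are passed in explicitly for clarity; the real lookup reads them from the process environment):

```python
import os

def connections_toml_candidates(home: str, env: dict, platform: str) -> list:
    """Candidate locations for connections.toml, in the search order listed above."""
    candidates = [os.path.join(home, ".snowflake", "connections.toml")]
    if "SNOWFLAKE_HOME" in env:
        candidates.append(os.path.join(env["SNOWFLAKE_HOME"], "connections.toml"))
    os_defaults = {
        "linux": os.path.join(home, ".config", "snowflake", "connections.toml"),
        "darwin": os.path.join(home, "Library", "Application Support",
                               "snowflake", "connections.toml"),
        "windows": os.path.join(home, "AppData", "Local",
                                "snowflake", "connections.toml"),
    }
    candidates.append(os_defaults[platform])
    return candidates

def resolve_connections_toml(home: str, env: dict, platform: str):
    """Return the first candidate that exists on disk, or None."""
    for path in connections_toml_candidates(home, env, platform):
        if os.path.exists(path):
            return path
    return None
```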

Example connections.toml:

[default]
account = "myaccount"
user = "myuser"
password = "mypassword"
database = "mydb"
schema = "myschema"
warehouse = "mywh"

[production]
account = "prodaccount"
authenticator = "externalbrowser"
database = "proddb"
schema = "public"

Specify a connection in nextflow.config:

snowflake {
    connectionName = 'production'
    computePool = 'MY_COMPUTE_POOL'
}

If no connectionName is specified, the plugin will use:

  1. Connection name from SNOWFLAKE_DEFAULT_CONNECTION_NAME environment variable
  2. The default connection from connections.toml
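The fallback order amounts to the following (a minimal sketch, not the plugin's code; `"default"` stands for the `[default]` section of connections.toml):

```python
def resolve_connection_name(config_connection_name=None, env=None) -> str:
    """Pick a connection name using the fallback order described above."""
    env = env or {}
    if config_connection_name:  # explicit nextflow.config setting wins
        return config_connection_name
    if "SNOWFLAKE_DEFAULT_CONNECTION_NAME" in env:
        return env["SNOWFLAKE_DEFAULT_CONNECTION_NAME"]
    return "default"  # the [default] section in connections.toml
```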

Configuration Reference

All plugin configurations are defined under the snowflake scope in your nextflow.config:

computePool

The name of the Snowflake compute pool to use for executing jobs.

snowflake {
    computePool = 'MY_COMPUTE_POOL'
}

registryMappings

Docker registry mappings for container images. Snowflake does not support pulling images directly from arbitrary external registries. Instead, you must first replicate container images from external registries (such as Docker Hub, GitHub Container Registry, etc.) to Snowflake image repositories.

The registryMappings configuration allows you to automatically replace external registry hostnames with Snowflake image repository names in your pipeline's container specifications.

Format: Comma-separated list of mappings in the form external_registry:snowflake_repository

snowflake {
    registryMappings = 'docker.io:my_registry,ghcr.io:github_registry'
}

How it works:

  1. First, replicate images to your Snowflake image repository:

    docker pull docker.io/alpine:latest
    docker tag docker.io/alpine:latest <snowflake_repo_url>/alpine:latest
    docker push <snowflake_repo_url>/alpine:latest
  2. Then, when your process uses container 'docker.io/alpine:latest', the plugin automatically replaces docker.io with your Snowflake image repository URL, resulting in the correct Snowflake image reference.
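The substitution rule can be sketched as follows (an illustration of the mapping format, not the plugin's implementation; in practice the replacement value is the full Snowflake image repository path):

```python
def rewrite_container(image: str, registry_mappings: str) -> str:
    """Replace an external registry hostname with its mapped Snowflake repository.

    registry_mappings uses the plugin's format:
    'external_registry:snowflake_repository[,...]'.
    """
    mappings = dict(pair.split(":", 1) for pair in registry_mappings.split(","))
    registry, sep, rest = image.partition("/")
    if sep and registry in mappings:
        return mappings[registry] + "/" + rest
    return image  # no mapping found: leave the reference untouched
```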

connectionName

The name of the connection to use from the connections.toml file. When specified, the JDBC driver will use the connection configuration defined under this name.

snowflake {
    connectionName = 'production'
}

Note: This is only used when the session token file is not available (i.e., when running outside Snowpark Container Services).

Quick Start

This guide assumes you are familiar with both Nextflow and Snowpark Container Services.

1. Create a Compute Pool

CREATE COMPUTE POOL my_compute_pool
MIN_NODES = 2
MAX_NODES = 5
INSTANCE_FAMILY = CPU_X64_M
AUTO_SUSPEND_SECS = 3600;

2. Create a Snowflake Internal Stage for Working Directory

CREATE OR REPLACE STAGE nxf_workdir
ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE');

3. Set Up Image Repository

CREATE IMAGE REPOSITORY IF NOT EXISTS my_images;

4. Build and Upload Container Images

Build a container image for each Nextflow process, push it to your Snowflake image repository, and update each process's container directive accordingly.

Example process definition:

process INDEX {
    tag "$transcriptome.simpleName"
    container '/mydb/myschema/my_images/salmon:1.10.0'

    input:
    path transcriptome

    output:
    path 'index'

    script:
    """
    salmon index --threads $task.cpus -t $transcriptome -i index
    """
}

5. Configure Nextflow

Add a Snowflake profile to your nextflow.config file and enable the nf-snowflake plugin:

plugins {
    id 'nf-snowflake@1.0.0'
}

profiles {
    snowflake {
        process.executor = 'snowflake'

        snowflake {
            computePool = 'my_compute_pool'
            registryMappings = 'docker.io:my_images'
        }
    }
}

6. Run Your Pipeline

Execute the Nextflow pipeline with the Snowflake profile:

nextflow run . -profile snowflake -work-dir snowflake://stage/nxf_workdir/

Snowflake Filesystem and Working Directory

Snowflake Stage URI

The plugin uses a custom URI scheme to access Snowflake internal stages:

snowflake://stage/<stage_name>/<path>

Components:

  • snowflake:// - URI scheme identifier
  • stage/ - Literal prefix indicating a Snowflake stage
  • <stage_name> - The name of your Snowflake internal stage
  • <path> - Optional path within the stage
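Parsing a stage URI into its components can be sketched like this (an illustrative helper, not part of the plugin's API):

```python
from urllib.parse import urlparse

def parse_stage_uri(uri: str):
    """Split a snowflake://stage/<stage_name>/<path> URI into (stage_name, path).

    Raises ValueError for URIs that don't follow the scheme described above.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "snowflake" or parsed.netloc != "stage":
        raise ValueError(f"not a Snowflake stage URI: {uri}")
    parts = parsed.path.lstrip("/").split("/", 1)
    if not parts[0]:
        raise ValueError(f"missing stage name: {uri}")
    stage_name = parts[0]
    path = parts[1] if len(parts) > 1 else ""
    return stage_name, path
```

Note that `snowflake://my_stage/work/` fails here because the literal `stage/` prefix is missing, matching the invalid example shown later.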

Examples:

// Access root of a stage
workDir = 'snowflake://stage/my_stage/'

// Access a subdirectory within a stage
workDir = 'snowflake://stage/my_stage/workflows/pipeline1/'

Working Directory Requirement

IMPORTANT: The Nextflow working directory (workDir) must be a Snowflake stage using the snowflake:// URI scheme. This is a strict requirement for the plugin to function correctly.

The working directory is used to:

  • Store intermediate task results
  • Share data between pipeline processes
  • Store task execution metadata and logs

Correct configuration:

profiles {
    snowflake {
        process.executor = 'snowflake'
        workDir = 'snowflake://stage/nxf_workdir/'  // ✓ Valid

        snowflake {
            computePool = 'my_compute_pool'
        }
    }
}

Or specify on the command line:

nextflow run . -profile snowflake -work-dir snowflake://stage/nxf_workdir/

Invalid configurations:

workDir = 's3://my-bucket/work/'              // ✗ Invalid - not a Snowflake stage
workDir = '/local/path/work/'                 // ✗ Invalid - local filesystem
workDir = 'snowflake://my_stage/work/'        // ✗ Invalid - missing 'stage/' prefix

Stage Setup

Before running your pipeline, ensure your stage is properly configured:

-- Create an internal stage with encryption
CREATE OR REPLACE STAGE my_workdir
ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE');

-- Verify stage exists
SHOW STAGES LIKE 'my_workdir';

-- Optional: Test stage access
LIST @my_workdir;
