SPARQL Builder

A type-safe, fluent interface for building safe SPARQL queries in Python.

Overview

SPARQL Builder is a Python library that provides a fluent, chainable API for constructing SPARQL queries programmatically. It emphasizes type safety and input validation to help you build correct queries while preventing common vulnerabilities like SPARQL injection attacks.

Whether you're querying knowledge graphs, RDF databases, or semantic web data, SPARQL Builder makes it easy to construct complex queries with confidence.

Note on parameter handling: SPARQL Builder supports two parameter modes.

rdflib (default): keeps variables in query text and passes values via rdflib initBindings.

inline: renders bindings as SPARQL VALUES clauses in the final query string.
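As a rough, library-independent illustration of the two modes, the sketch below uses a hypothetical helper, build_toy, which is not part of SPARQL Builder and does not reflect its internals. It only shows the idea: rdflib mode hands the values back separately, while inline mode writes them into the query text and returns an empty bindings dict.

```python
def quote_literal(value: str) -> str:
    """Quote a plain string as a SPARQL literal (simplified escaping)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_toy(pattern: str, bindings: dict[str, str], param_mode: str = "rdflib"):
    """Hypothetical stand-in for illustration; not the library's build()."""
    if param_mode == "rdflib":
        # Variables stay in the text; values travel separately.
        return f"SELECT * WHERE {{ {pattern} }}", dict(bindings)
    if param_mode == "inline":
        # Values are written into the text as VALUES clauses.
        values = " ".join(
            f"VALUES ?{name} {{ {quote_literal(value)} }}"
            for name, value in bindings.items()
        )
        return f"SELECT * WHERE {{ {values} {pattern} }}", {}
    raise ValueError(f"unknown param_mode: {param_mode!r}")


text, params = build_toy("?s rdfs:label ?label .", {"label": "Alice"}, "inline")
print(text)
print(params)
```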

Key Features

Comprehensive: Supports all major SPARQL querying features (SELECT, OPTIONAL, UNION, FILTER, aggregations, etc.)
Type-Safe: Full type-hint support, with validation of all inputs
Fluent API: Chainable methods for intuitive query construction
Security Oriented: Built-in validation and sanitization to prevent SPARQL injection attacks
Minimal Dependencies: Only requires rdflib
Well-Documented: Complete API documentation and examples

Installation

Install via pip:

pip install sparql-builder

Or with optional development dependencies:

pip install sparql-builder[dev]

Quick Start

Here's a simple example to get you started:

from rdflib import Graph, Literal
from sparqlbuilder import select

g = Graph()

# Build a query
query = (
    select("?person", "?name")
    .prefix("foaf", "http://xmlns.com/foaf/0.1/")
    .where("?person", "a", "foaf:Person")
    .where("?person", "foaf:name", "?name")
    .limit(10)
)

# Build for rdflib (default mode)
query_text, init_bindings = query.bind("name", Literal("Alice")).build(param_mode="rdflib")
results = g.query(query_text, initBindings=init_bindings)

# Inline mode (for string-only endpoint clients)
query_text, _ = query.bind("name", "Alice").build(param_mode="inline")
# send query_text to your SPARQL endpoint client

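Note: In inline mode, build() still returns a pair, (query_text, {}). The bindings dictionary is empty because the values are already rendered into the query text.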

Output:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name
WHERE {
  ?person a foaf:Person .
  ?person foaf:name ?name .
}
LIMIT 10

Tutorial

Simple SELECT Query

Query for all subjects and their labels:

from sparqlbuilder import select

query = (
    select("?subject", "?label")
    .where("?subject", "rdfs:label", "?label")
)

query_text, _ = query.build(param_mode="inline")
print(query_text)

Output:

SELECT ?subject ?label
WHERE {
  ?subject rdfs:label ?label .
}

Parameter Binding Modes

Default rdflib mode (recommended for rdflib.Graph.query):

from rdflib import Graph, Literal
from sparqlbuilder import select

graph = Graph()

query = (
    select("?min", "?max")
    .where("?s", "rdfs:label", "?label")
    .where("?s", "mydb:min", "?min")
    .where("?s", "mydb:max", "?max")
    .bind("label", Literal("Alice"))
)

query_text, init_bindings = query.build(param_mode="rdflib")
rows = graph.query(query_text, initBindings=init_bindings)

Explicit inline mode (useful for remote endpoints expecting only a query string):

from sparqlbuilder import select

query_text, _ = (
    select("?min", "?max")
    .where("?s", "rdfs:label", "?label")
    .where("?s", "mydb:min", "?min")
    .where("?s", "mydb:max", "?max")
    .bind("label", "Alice")
    .build(param_mode="inline")
)

SELECT with Prefixes

Define namespace prefixes for cleaner queries:

query = (
    select("?person", "?name", "?email")
    .prefix("foaf", "http://xmlns.com/foaf/0.1/")
    .prefix("schema", "https://schema.org/")
    .where("?person", "a", "foaf:Person")
    .where("?person", "foaf:name", "?name")
    .where("?person", "schema:email", "?email")
)

query_text, _ = query.build(param_mode="inline")
print(query_text)

Output:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX schema: <https://schema.org/>
SELECT ?person ?name ?email
WHERE {
  ?person a foaf:Person .
  ?person foaf:name ?name .
  ?person schema:email ?email .
}

SELECT DISTINCT

Remove duplicate results:

from sparqlbuilder import select_distinct

query = (
    select_distinct("?type")
    .where("?subject", "a", "?type")
)

query_text, _ = query.build(param_mode="inline")
print(query_text)

Output:

SELECT DISTINCT ?type
WHERE {
  ?subject a ?type .
}

Using LIMIT and OFFSET

Paginate results:

query = (
    select("?item", "?label")
    .where("?item", "rdfs:label", "?label")
    .order_by("?label")
    .limit(20)
    .offset(40)  # Skip first 40 results (page 3)
)

query_text, _ = query.build(param_mode="inline")
print(query_text)

Output:

SELECT ?item ?label
WHERE {
  ?item rdfs:label ?label .
}
ORDER BY ?label
LIMIT 20
OFFSET 40
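The offset for a given page follows from the page size. Here is a small sketch of that arithmetic, assuming 1-based page numbers; page_window is a hypothetical helper, not part of the library. Its two return values are what you would pass to .limit() and .offset().

```python
def page_window(page: int, page_size: int) -> tuple[int, int]:
    """Return (limit, offset) for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return page_size, (page - 1) * page_size


limit, offset = page_window(page=3, page_size=20)
print(limit, offset)  # 20 40
```

Keep the ORDER BY clause when paginating: without a stable order, SPARQL does not guarantee that consecutive pages are consistent with each other.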

Filtering Results

Use filters to constrain results:

query = (
    select("?person", "?name", "?age")
    .prefix("foaf", "http://xmlns.com/foaf/0.1/")
    .where("?person", "foaf:name", "?name")
    .where("?person", "foaf:age", "?age")
    .filter("?age >= 18")
    .filter("?age < 65")
)

query_text, _ = query.build(param_mode="inline")
print(query_text)

Output:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name ?age
WHERE {
  ?person foaf:name ?name .
  ?person foaf:age ?age .
  FILTER (?age >= 18)
  FILTER (?age < 65)
}

Safe Filtering with filter_equals

For exact value matching (safe from injection):

query = (
    select("?person", "?name")
    .prefix("foaf", "http://xmlns.com/foaf/0.1/")
    .where("?person", "foaf:name", "?name")
    .filter_equals("?name", "Alice")
)

query_text, _ = query.build(param_mode="inline")
print(query_text)

Output:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name
WHERE {
  ?person foaf:name ?name .
  FILTER (?name = "Alice")
}
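To see why escaping matters for a value like the one passed to filter_equals, here is a minimal sketch of quoting a string as a double-quoted SPARQL literal. The function escape_string_literal is hypothetical and is not the library's implementation; it covers the characters the SPARQL grammar does not allow unescaped inside a double-quoted string (quote, backslash, line feed, carriage return), plus tab.

```python
_ESCAPES = {
    "\\": "\\\\",
    '"': '\\"',
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
}


def escape_string_literal(value: str) -> str:
    """Return value as a double-quoted SPARQL string literal."""
    return '"' + "".join(_ESCAPES.get(ch, ch) for ch in value) + '"'


# An input that tries to close the literal and extend the filter expression.
attack = 'Alice") || true || ("'
print(f"FILTER (?name = {escape_string_literal(attack)})")
```

After escaping, the embedded quotes stay inside the literal, so the whole input is compared as one string instead of altering the filter.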

Pattern Matching with filter_regex

Search with regular expressions:

query = (
    select("?person", "?name")
    .prefix("foaf", "http://xmlns.com/foaf/0.1/")
    .where("?person", "foaf:name", "?name")
    .filter_regex("?name", "^A", "i")  # Names starting with 'A' (case-insensitive)
)

query_text, _ = query.build(param_mode="inline")
print(query_text)

Output:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name
WHERE {
  ?person foaf:name ?name .
  FILTER (REGEX(?name, "^A", "i"))
}
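If you want to sanity-check a pattern before sending it to an endpoint, you can try it locally. The sketch below uses Python's re module on made-up names. SPARQL's REGEX follows XPath regular-expression syntax, which is not identical to Python's, so treat this as an approximation that holds for simple patterns such as this one.

```python
import re

# Sample data, invented for illustration.
names = ["Alice", "adam", "Bob", "Carla"]

# "^A" with the "i" flag: names starting with 'A' or 'a'.
pattern = re.compile(r"^A", re.IGNORECASE)
matches = [name for name in names if pattern.search(name)]
print(matches)  # ['Alice', 'adam']
```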

OPTIONAL Patterns

Include optional data when available:

query = (
    select("?person", "?name", "?email")
    .prefix("foaf", "http://xmlns.com/foaf/0.1/")
    .where("?person", "foaf:name", "?name")
    .optional(lambda q: q.where("?person", "foaf:mbox", "?email"))
)

query_text, _ = query.build(param_mode="inline")
print(query_text)

Output:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name ?email
WHERE {
  ?person foaf:name ?name .
  OPTIONAL {
    ?person foaf:mbox ?email .
  }
}
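OPTIONAL behaves like a left join: every person with a name appears in the results, and ?email is simply unbound where no mailbox exists. A plain-Python sketch of that behaviour follows; the data is invented and the sketch assumes at most one mailbox per person.

```python
# Required pattern: ?person foaf:name ?name
names = {"ex:alice": "Alice", "ex:bob": "Bob"}
# Optional pattern: ?person foaf:mbox ?email
mboxes = {"ex:alice": "mailto:alice@example.org"}

# Keep every row from the required pattern; use None where the optional part has no match.
rows = [(person, name, mboxes.get(person)) for person, name in names.items()]
for row in rows:
    print(row)
```

Bob is kept with None in the email position, which corresponds to an unbound ?email in the SPARQL results.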

VALUES Clause

Restrict query results to specific values:

query = (
    select("?person", "?name")
    .prefix("foaf", "http://xmlns.com/foaf/0.1/")
    .values("?person", [
        "http://example.org/alice",
        "http://example.org/bob"
    ])
    .where("?person", "foaf:name", "?name")
)

query_text, _ = query.build(param_mode="inline")
print(query_text)

Output:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?person ?name
WHERE {
  VALUES ?person { <http://example.org/alice> <http://example.org/bob> }
  ?person foaf:name ?name .
}

Bind multiple variables with tuples of values. Use None for SPARQL's UNDEF:

query = (
    select("?person", "?name", "?email")
    .prefix("foaf", "http://xmlns.com/foaf/0.1/")
    .values(["?name", "?email"], [
        ("Alice", "alice@example.org"),
        ("Bob", None),  # Bob has no email
    ])
    .where("?person", "foaf:name", "?name")
)

query_text, _ = query.build(param_mode="inline")
print(query_text)

Output:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?person ?name ?email
WHERE {
  VALUES (?name ?email) { ("Alice" "alice@example.org") ("Bob" UNDEF) }
  ?person foaf:name ?name .
}
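The mapping from Python tuples to the VALUES clause shown above can be sketched as follows. The helper render_values is hypothetical, handles only plain string values, and is not the library's code; it is meant to show how None becomes UNDEF.

```python
def render_values(variables, rows):
    """Render a multi-variable VALUES clause; None becomes UNDEF."""

    def term(value):
        if value is None:
            return "UNDEF"
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    for row in rows:
        if len(row) != len(variables):
            raise ValueError("each row needs one value per variable")

    header = "(" + " ".join(variables) + ")"
    body = " ".join("(" + " ".join(term(v) for v in row) + ")" for row in rows)
    return f"VALUES {header} {{ {body} }}"


clause = render_values(
    ["?name", "?email"],
    [("Alice", "alice@example.org"), ("Bob", None)],
)
print(clause)
```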

UNION Patterns

Match alternative patterns:

query = (
    select("?person", "?contact")
    .prefix("foaf", "http://xmlns.com/foaf/0.1/")
    .where("?person", "foaf:name", "?name")
    .union(
        lambda q: q.where("?person", "foaf:mbox", "?contact"),
        lambda q: q.where("?person", "foaf:phone", "?contact")
    )
)

query_text, _ = query.build(param_mode="inline")
print(query_text)

Output:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?contact
WHERE {
  ?person foaf:name ?name .
  {
    ?person foaf:mbox ?contact .
  }
  UNION
  {
    ?person foaf:phone ?contact .
  }
}

Aggregation with COUNT

Count results grouped by a variable:

query = (
    select("?type")
    .select_count("?item", "?count")
    .where("?item", "a", "?type")
    .group_by("?type")
    .order_by("?count", descending=True)
)

query_text, _ = query.build(param_mode="inline")
print(query_text)

Output:

SELECT ?type (COUNT(?item) AS ?count)
WHERE {
  ?item a ?type .
}
GROUP BY ?type
ORDER BY DESC(?count)

Multiple Aggregations

Combine different aggregation functions:

query = (
    select("?category")
    .select_count("?item", "?total")
    .select_avg("?price", "?avgPrice")
    .select_max("?price", "?maxPrice")
    .prefix("ex", "http://example.org/")
    .where("?item", "ex:category", "?category")
    .where("?item", "ex:price", "?price")
    .group_by("?category")
    .having("COUNT(?item) > 5")
)

query_text, _ = query.build(param_mode="inline")
print(query_text)

Output:

PREFIX ex: <http://example.org/>
SELECT ?category (COUNT(?item) AS ?total) (AVG(?price) AS ?avgPrice) (MAX(?price) AS ?maxPrice)
WHERE {
  ?item ex:category ?category .
  ?item ex:price ?price .
}
GROUP BY ?category
HAVING (COUNT(?item) > 5)

Property Paths

Use property paths for complex graph navigation:

query = (
    select("?person", "?connection")
    .prefix("foaf", "http://xmlns.com/foaf/0.1/")
    .property_path("?person", "foaf:knows+", "?connection")  # One or more knows relationships
)

query_text, _ = query.build(param_mode="inline")
print(query_text)

Output:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?connection
WHERE {
  ?person foaf:knows+ ?connection .
}
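The + operator means "one or more steps", so the query returns everyone reachable through a chain of foaf:knows links. The sketch below computes the same thing for a small, made-up graph using a breadth-first search; reachable is a hypothetical helper for illustration only.

```python
from collections import deque


def reachable(knows: dict[str, set[str]], start: str) -> set[str]:
    """Everyone reachable from start via one or more 'knows' edges."""
    seen: set[str] = set()
    queue = deque(knows.get(start, ()))
    while queue:
        person = queue.popleft()
        if person in seen:
            continue
        seen.add(person)
        queue.extend(knows.get(person, ()))
    return seen


knows = {"alice": {"bob"}, "bob": {"carol"}}
print(sorted(reachable(knows, "alice")))  # ['bob', 'carol']
```

The start node is only part of the result if a cycle leads back to it, which matches the "one or more" reading of +.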

Working with Named Graphs

Query specific named graphs:

query = (
    select("?subject", "?predicate", "?object")
    .from_named_graph("http://example.org/graph1")
    .graph(
        "http://example.org/graph1",
        lambda q: q.where("?subject", "?predicate", "?object")
    )
    .limit(100)
)

query_text, _ = query.build(param_mode="inline")
print(query_text)

Output:

SELECT ?subject ?predicate ?object
FROM NAMED <http://example.org/graph1>
WHERE {
  GRAPH <http://example.org/graph1> {
    ?subject ?predicate ?object .
  }
}
LIMIT 100

Complex Query Example

A real-world example combining multiple features:

query = (
    select("?researcher", "?name", "?institution", "?paperCount")
    .prefix("foaf", "http://xmlns.com/foaf/0.1/")
    .prefix("ex", "http://example.org/")
    .where("?researcher", "a", "ex:Researcher")
    .where("?researcher", "foaf:name", "?name")
    .where("?researcher", "ex:affiliation", "?institution")
    .optional(lambda q: q
        .where("?paper", "ex:author", "?researcher")
    )
    .filter_regex("?name", "^[A-M]", "i")  # Names A-M
    .select_count("?paper", "?paperCount")
    .group_by("?researcher", "?name", "?institution")
    .having("COUNT(?paper) > 2")
    .order_by("?paperCount", descending=True)
    .limit(10)
)

query_text, _ = query.build(param_mode="inline")
print(query_text)

Output:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX ex: <http://example.org/>
SELECT ?researcher ?name ?institution (COUNT(?paper) AS ?paperCount)
WHERE {
  ?researcher a ex:Researcher .
  ?researcher foaf:name ?name .
  ?researcher ex:affiliation ?institution .
  OPTIONAL {
    ?paper ex:author ?researcher .
  }
  FILTER (REGEX(?name, "^[A-M]", "i"))
}
GROUP BY ?researcher ?name ?institution
HAVING (COUNT(?paper) > 2)
ORDER BY DESC(?paperCount)
LIMIT 10

Security

SPARQL Builder supports two parameter modes:

rdflib (default): keeps variables in query text and passes values through rdflib initBindings.
inline: renders values directly into SPARQL as VALUES clauses.

Use rdflib mode when executing with rdflib.Graph.query(...) to rely on rdflib term handling. Use inline mode for endpoints that only accept a raw SPARQL string (for example, some HTTP endpoint clients).

In inline mode, the provided query-construction methods (unless otherwise specified) validate and sanitize inputs to prevent SPARQL injection attacks in the generated query text. In rdflib mode, variable values are passed separately via initBindings and are handled by rdflib.

from rdflib import Graph, Literal
from sparqlbuilder import select

graph = Graph()

query = (
    select("?s", "?o")
    .where("?s", "rdfs:label", "?o")
    .bind("o", Literal("Alice"))
)

query_text, init_bindings = query.build(param_mode="rdflib")
rows = graph.query(query_text, initBindings=init_bindings)

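from sparqlbuilder import select

# Inline mode: the bound value is validated and rendered into the query text.
user_input = "Alice"  # stands in for a value received from an untrusted source

query_text, _ = (
    select("?s", "?o")
    .where("?s", "rdfs:label", "?o")
    .bind("o", user_input)
    .build(param_mode="inline")
)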

To sanitize raw input yourself, or where raw input is unavoidable, use the provided formatting functions format_subject, format_predicate, and format_object:

from sparqlbuilder import format_subject

# User input (potentially malicious)
user_input = "http://example.org/resource"

# Safely format for use in query
safe_uri = format_subject(user_input)
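As an idea of what such a check involves, here is a minimal sketch that rejects characters the SPARQL grammar forbids inside an IRI written as <...>. The function looks_like_safe_iri is hypothetical; what format_subject actually checks may differ, so see API.md for its real behaviour.

```python
import re

# Forbidden inside <...> per the SPARQL grammar: control characters and space,
# plus < > " { } | ^ ` and backslash.
_FORBIDDEN = re.compile(r'[\x00-\x20<>"{}|^`\\]')


def looks_like_safe_iri(value: str) -> bool:
    """True if value is non-empty and free of characters that could end the IRI early."""
    return bool(value) and _FORBIDDEN.search(value) is None


print(looks_like_safe_iri("http://example.org/resource"))
print(looks_like_safe_iri("http://example.org/> . ?s ?p ?o . <http://x"))
```

The second input contains > and spaces, which would otherwise close the IRI and let the rest be parsed as query text.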

What's Protected

Variable names: Validated against injection attempts
URIs: Checked for dangerous characters and proper format
Literals: Properly escaped and quoted
Filter expressions: Safe methods provided for common patterns
Regex patterns: Automatically escaped in filter_regex()
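To illustrate the first item in the list above, a variable-name check can be as simple as a full-string regular-expression match. The sketch below is deliberately stricter than the SPARQL grammar (ASCII only, no leading digit) and is not the library's own validator; is_simple_variable is a hypothetical name.

```python
import re

# "?" or "$", then an ASCII identifier. SPARQL itself allows more characters.
_VARIABLE = re.compile(r"[?$][A-Za-z_][A-Za-z0-9_]*")


def is_simple_variable(name: str) -> bool:
    """True if name is a plain ASCII SPARQL variable such as ?name."""
    return _VARIABLE.fullmatch(name) is not None


print(is_simple_variable("?name"))
print(is_simple_variable("?name } DROP ALL"))
```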

Important Caveat

The raw filter() and having() methods accept trusted SPARQL expressions directly. Do not pass unsanitized user input into these methods.
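If a raw expression has to include a user-supplied number, one cautious approach is to convert the input to a Python number first and build the expression from the converted value, never from the original string. A minimal sketch, with a hypothetical helper age_at_least:

```python
def age_at_least(raw_value: str) -> str:
    """Build a numeric filter expression from untrusted input."""
    # int() raises ValueError for anything that is not an integer,
    # so SPARQL fragments in the input never reach the expression.
    minimum = int(raw_value)
    return f"?age >= {minimum}"


print(age_at_least("18"))  # ?age >= 18
```

The returned string could then be passed to filter(). Inputs such as "18) || true || (1" fail at the int() call instead of ending up in the query.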

API Reference

For complete API documentation, see API.md.

Query Construction:

select() - Create a new SELECT query
select_distinct() - Create a new SELECT DISTINCT query
select_reduced() - Create a new SELECT REDUCED query
SPARQLQuery - Main query builder class

Value Formatting (for external validation):

format_subject() - Format and validate a subject term
format_predicate() - Format and validate a predicate term
format_object() - Format and validate an object term

Missing features and TODO

Features planned for future releases:

Add more filter expression builders for common patterns (e.g., numeric comparisons, date comparisons)

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

See DEVELOPMENT.md for development setup and guidelines.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements

We gratefully acknowledge the following for supporting the development of this package:

PINK Project (2024-2027) funded by the European Union's Horizon 2020 Research and Innovation Programme, under Grant Agreement n. 101137809.
