Type-safe, fluent SPARQL builder for Python

PINK-project/SPARQL-builder

SPARQL Builder

A type-safe, fluent interface for building safe SPARQL queries in Python.

Overview

SPARQL Builder is a Python library that provides a fluent, chainable API for constructing SPARQL queries programmatically. It emphasizes type safety and input validation to help you build correct queries while preventing common vulnerabilities like SPARQL injection attacks.

Whether you're querying knowledge graphs, RDF databases, or semantic web data, SPARQL Builder makes it easy to construct complex queries with confidence.

Note on parameter handling: SPARQL Builder supports two parameter modes:

  • rdflib (default): keeps variables in the query text and passes values via rdflib initBindings.
  • inline: renders bindings as SPARQL VALUES clauses in the final query string.

Key Features

  • Comprehensive: Supports the major features of SPARQL SELECT queries (OPTIONAL, UNION, FILTER, VALUES, aggregations, property paths, named graphs, etc.)
  • Type-Safe: Full type hints, with validation of query inputs
  • Fluent API: Chainable methods for intuitive query construction
  • Security Oriented: Built-in validation and sanitization to prevent SPARQL injection attacks
  • Minimal Dependencies: Only requires rdflib
  • Well-Documented: Complete API documentation and examples

Installation

Install via pip:

pip install sparql-builder

Or, with the optional development dependencies (the quotes stop shells such as zsh from interpreting the brackets):

pip install "sparql-builder[dev]"

Quick Start

Here's a simple example to get you started:

from rdflib import Graph, Literal
from sparqlbuilder import select

g = Graph()

# Build a query
query = (
    select("?person", "?name")
    .prefix("foaf", "http://xmlns.com/foaf/0.1/")
    .where("?person", "a", "foaf:Person")
    .where("?person", "foaf:name", "?name")
    .limit(10)
)

# Build for rdflib (default mode)
query_text, init_bindings = query.bind("name", Literal("Alice")).build(param_mode="rdflib")
results = g.query(query_text, initBindings=init_bindings)

# Inline mode (for string-only endpoint clients)
query_text, _ = query.bind("name", "Alice").build(param_mode="inline")
# send query_text to your SPARQL endpoint client

Note: Inline mode returns (query_text, {}), because the bound values are rendered into the query text instead of being returned separately.
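The main difference between the two modes is what build() returns. The stdlib-only sketch below models that contract using a hypothetical build_query() helper, which is not part of sparqlbuilder. In "rdflib" mode it returns the query text together with a bindings dict. In "inline" mode it renders each binding as a VALUES clause and returns an empty dict. Two assumptions apply: the VALUES clause is placed at the start of the WHERE block, and values are plain strings, whereas the real rdflib mode uses rdflib terms such as Literal.

```python
from typing import Dict, List, Tuple


def _literal(value: str) -> str:
    """Quote a plain string as a SPARQL literal (escapes only backslashes and quotes)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_query(
    projection: List[str],
    patterns: List[str],
    bindings: Dict[str, str],
    param_mode: str = "rdflib",
) -> Tuple[str, Dict[str, str]]:
    """Hypothetical model of the two parameter modes; not sparqlbuilder's implementation."""
    body = [f"  {pattern} ." for pattern in patterns]
    returned: Dict[str, str]
    if param_mode == "rdflib":
        # Variables stay in the text; values are returned for initBindings.
        returned = dict(bindings)
    elif param_mode == "inline":
        # Values become VALUES clauses (assumed here to open the WHERE block).
        values = [f"  VALUES ?{name} {{ {_literal(v)} }}" for name, v in bindings.items()]
        body = values + body
        returned = {}
    else:
        raise ValueError(f"unknown param_mode: {param_mode!r}")
    text = "SELECT " + " ".join(projection) + "\nWHERE {\n" + "\n".join(body) + "\n}"
    return text, returned


for mode in ("rdflib", "inline"):
    text, extra = build_query(["?person", "?name"], ["?person foaf:name ?name"], {"name": "Alice"}, mode)
    print(mode, extra)
    print(text)
```

In this model, the rdflib-mode text contains no VALUES clause. The inline-mode text contains VALUES ?name { "Alice" } and comes with an empty dict.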

Generated query text in rdflib mode (bound variables stay in the text):

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name
WHERE {
  ?person a foaf:Person .
  ?person foaf:name ?name .
}
LIMIT 10

Tutorial

Simple SELECT Query

Query for all subjects and their labels:

from sparqlbuilder import select

query = (
    select("?subject", "?label")
    .where("?subject", "rdfs:label", "?label")
)

query_text, _ = query.build(param_mode="inline")
print(query_text)

Output:

SELECT ?subject ?label
WHERE {
  ?subject rdfs:label ?label .
}

Parameter Binding Modes

Default rdflib mode (recommended for rdflib.Graph.query):

from rdflib import Graph, Literal
from sparqlbuilder import select

graph = Graph()

query = (
    select("?min", "?max")
    .prefix("mydb", "http://example.org/mydb/")  # placeholder namespace for this example
    .where("?s", "rdfs:label", "?label")
    .where("?s", "mydb:min", "?min")
    .where("?s", "mydb:max", "?max")
    .bind("label", Literal("Alice"))
)

query_text, init_bindings = query.build(param_mode="rdflib")
rows = graph.query(query_text, initBindings=init_bindings)

Explicit inline mode (useful for remote endpoints expecting only a query string):

from sparqlbuilder import select

query_text, _ = (
    select("?min", "?max")
    .prefix("mydb", "http://example.org/mydb/")  # placeholder namespace for this example
    .where("?s", "rdfs:label", "?label")
    .where("?s", "mydb:min", "?min")
    .where("?s", "mydb:max", "?max")
    .bind("label", "Alice")
    .build(param_mode="inline")
)

SELECT with Prefixes

Define namespace prefixes for cleaner queries:

query = (
    select("?person", "?name", "?email")
    .prefix("foaf", "http://xmlns.com/foaf/0.1/")
    .prefix("schema", "https://schema.org/")
    .where("?person", "a", "foaf:Person")
    .where("?person", "foaf:name", "?name")
    .where("?person", "schema:email", "?email")
)

query_text, _ = query.build(param_mode="inline")
print(query_text)

Output:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX schema: <https://schema.org/>
SELECT ?person ?name ?email
WHERE {
  ?person a foaf:Person .
  ?person foaf:name ?name .
  ?person schema:email ?email .
}

SELECT DISTINCT

Remove duplicate results:

from sparqlbuilder import select_distinct

query = (
    select_distinct("?type")
    .where("?subject", "a", "?type")
)

query_text, _ = query.build(param_mode="inline")
print(query_text)

Output:

SELECT DISTINCT ?type
WHERE {
  ?subject a ?type .
}

Using LIMIT and OFFSET

Paginate results:

query = (
    select("?item", "?label")
    .where("?item", "rdfs:label", "?label")
    .order_by("?label")
    .limit(20)
    .offset(40)  # Skip first 40 results (page 3)
)

query_text, _ = query.build(param_mode="inline")
print(query_text)

Output:

SELECT ?item ?label
WHERE {
  ?item rdfs:label ?label .
}
ORDER BY ?label
LIMIT 20
OFFSET 40
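OFFSET counts how many rows to skip. For 1-based page numbers, the offset is therefore (page - 1) * page_size. With a page size of 20, page 3 starts at offset 40, which matches the example above. The sketch below is a small hypothetical helper, not part of sparqlbuilder, that computes the values you would pass to .limit() and .offset(). Keep an ORDER BY as the example does: without one, SPARQL does not guarantee a stable row order, so pages can overlap or skip rows.

```python
from typing import Tuple


def page_window(page: int, page_size: int) -> Tuple[int, int]:
    """Return (limit, offset) for a 1-based page number (hypothetical helper)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return page_size, (page - 1) * page_size


limit, offset = page_window(3, 20)
print(limit, offset)  # 20 40
```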

Filtering Results

Use filters to constrain results:

query = (
    select("?person", "?name", "?age")
    .prefix("foaf", "http://xmlns.com/foaf/0.1/")
    .where("?person", "foaf:name", "?name")
    .where("?person", "foaf:age", "?age")
    .filter("?age >= 18")
    .filter("?age < 65")
)

query_text, _ = query.build(param_mode="inline")
print(query_text)

Output:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name ?age
WHERE {
  ?person foaf:name ?name .
  ?person foaf:age ?age .
  FILTER (?age >= 18)
  FILTER (?age < 65)
}

Safe Filtering with filter_equals

For exact value matching (safe from injection):

query = (
    select("?person", "?name")
    .prefix("foaf", "http://xmlns.com/foaf/0.1/")
    .where("?person", "foaf:name", "?name")
    .filter_equals("?name", "Alice")
)

query_text, _ = query.build(param_mode="inline")
print(query_text)

Output:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name
WHERE {
  ?person foaf:name ?name .
  FILTER (?name = "Alice")
}

Pattern Matching with filter_regex

Search with regular expressions:

query = (
    select("?person", "?name")
    .prefix("foaf", "http://xmlns.com/foaf/0.1/")
    .where("?person", "foaf:name", "?name")
    .filter_regex("?name", "^A", "i")  # Names starting with 'A' (case-insensitive)
)

query_text, _ = query.build(param_mode="inline")
print(query_text)

Output:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name
WHERE {
  ?person foaf:name ?name .
  FILTER (REGEX(?name, "^A", "i"))
}
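The pattern passed to filter_regex() is a regular expression, so ^ keeps its anchor meaning. To match an untrusted substring literally, escape its regex metacharacters first. SPARQL REGEX follows the XPath/XML Schema regex syntax, so Python's re.escape() is not a safe substitute. It also escapes characters such as # and space, and XPath does not accept those escapes. The following is a minimal sketch built on a hypothetical helper, escape_xpath_regex(), which is not part of sparqlbuilder:

```python
# Characters with special meaning in XPath regular expressions.
XPATH_REGEX_METACHARS = set("\\|.?*+(){}[]^$")


def escape_xpath_regex(text: str) -> str:
    """Backslash-escape XPath regex metacharacters so `text` matches literally."""
    return "".join("\\" + ch if ch in XPATH_REGEX_METACHARS else ch for ch in text)


user_text = "C++ (v2)"
pattern = "^" + escape_xpath_regex(user_text)
print(pattern)  # ^C\+\+ \(v2\)
```

The resulting pattern could then be passed as .filter_regex("?name", pattern, "i"). This assumes that filter_regex() quotes the pattern as a string literal and escapes its backslashes, so inspect the generated query text to confirm.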

OPTIONAL Patterns

Include optional data when available:

query = (
    select("?person", "?name", "?email")
    .prefix("foaf", "http://xmlns.com/foaf/0.1/")
    .where("?person", "foaf:name", "?name")
    .optional(lambda q: q.where("?person", "foaf:mbox", "?email"))
)

query_text, _ = query.build(param_mode="inline")
print(query_text)

Output:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name ?email
WHERE {
  ?person foaf:name ?name .
  OPTIONAL {
    ?person foaf:mbox ?email .
  }
}
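OPTIONAL behaves like a left outer join. Every solution of the required pattern is kept, and ?email stays unbound for people without a foaf:mbox triple. The plain-Python sketch below mimics that behavior on toy, illustrative data and uses None to stand for an unbound variable. It does not call sparqlbuilder.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Toy data: Bob has no foaf:mbox triple.
triples: List[Triple] = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:bob", "foaf:name", "Bob"),
]


def optional_join(data: List[Triple]) -> List[Dict[str, Optional[str]]]:
    """Evaluate ?person foaf:name ?name . OPTIONAL { ?person foaf:mbox ?email . }"""
    rows: List[Dict[str, Optional[str]]] = []
    for person, pred, name in data:
        if pred != "foaf:name":
            continue
        emails = [o for s, p, o in data if s == person and p == "foaf:mbox"]
        # Left-join semantics: keep the row even when no email matches.
        for email in emails or [None]:
            rows.append({"person": person, "name": name, "email": email})
    return rows


for row in optional_join(triples):
    print(row)
```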

VALUES Clause

Restrict query results to specific values:

query = (
    select("?person", "?name")
    .prefix("foaf", "http://xmlns.com/foaf/0.1/")
    .values("?person", [
        "http://example.org/alice",
        "http://example.org/bob"
    ])
    .where("?person", "foaf:name", "?name")
)

query_text, _ = query.build(param_mode="inline")
print(query_text)

Output:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?person ?name
WHERE {
  VALUES ?person { <http://example.org/alice> <http://example.org/bob> }
  ?person foaf:name ?name .
}

To bind several variables at once, pass a list of variable names and one tuple of values per row. Use None for SPARQL's UNDEF (an unbound value):

query = (
    select("?person", "?name", "?email")
    .prefix("foaf", "http://xmlns.com/foaf/0.1/")
    .values(["?name", "?email"], [
        ("Alice", "alice@example.org"),
        ("Bob", None),  # Bob has no email
    ])
    .where("?person", "foaf:name", "?name")
)

query_text, _ = query.build(param_mode="inline")
print(query_text)

Output:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?person ?name ?email
WHERE {
  VALUES (?name ?email) { ("Alice" "alice@example.org") ("Bob" UNDEF) }
  ?person foaf:name ?name .
}
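In a multi-variable VALUES block, each row must supply one term per variable, in order, and UNDEF marks an unbound position. The stdlib-only sketch below shows that rendering. It uses a hypothetical render_values() helper and a hypothetical IRI marker class, and neither is part of sparqlbuilder. It does not model how the library decides whether a string is an IRI or a literal, and it does not validate IRIs.

```python
from typing import Optional, Sequence, Union


class IRI(str):
    """Hypothetical marker: render this value as <...> instead of a literal."""


Term = Optional[Union[str, IRI]]


def render_term(value: Term) -> str:
    if value is None:
        return "UNDEF"
    if isinstance(value, IRI):
        return f"<{value}>"  # no IRI validation in this sketch
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def render_values(variables: Sequence[str], rows: Sequence[Sequence[Term]]) -> str:
    """Render a VALUES clause in the single- or multi-variable form."""
    for row in rows:
        if len(row) != len(variables):
            raise ValueError(f"expected {len(variables)} values per row, got {len(row)}")
    if len(variables) == 1:
        terms = " ".join(render_term(row[0]) for row in rows)
        return f"VALUES {variables[0]} {{ {terms} }}"
    header = " ".join(variables)
    body = " ".join("(" + " ".join(render_term(v) for v in row) + ")" for row in rows)
    return f"VALUES ({header}) {{ {body} }}"


print(render_values(["?name", "?email"], [("Alice", "alice@example.org"), ("Bob", None)]))
print(render_values(["?person"], [(IRI("http://example.org/alice"),), (IRI("http://example.org/bob"),)]))
```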

UNION Patterns

Match alternative patterns:

query = (
    select("?person", "?contact")
    .prefix("foaf", "http://xmlns.com/foaf/0.1/")
    .where("?person", "foaf:name", "?name")
    .union(
        lambda q: q.where("?person", "foaf:mbox", "?contact"),
        lambda q: q.where("?person", "foaf:phone", "?contact")
    )
)

query_text, _ = query.build(param_mode="inline")
print(query_text)

Output:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?contact
WHERE {
  ?person foaf:name ?name .
  {
    ?person foaf:mbox ?contact .
  }
  UNION
  {
    ?person foaf:phone ?contact .
  }
}

Aggregation with COUNT

Count results grouped by a variable:

query = (
    select("?type")
    .select_count("?item", "?count")
    .where("?item", "a", "?type")
    .group_by("?type")
    .order_by("?count", descending=True)
)

query_text, _ = query.build(param_mode="inline")
print(query_text)

Output:

SELECT ?type (COUNT(?item) AS ?count)
WHERE {
  ?item a ?type .
}
GROUP BY ?type
ORDER BY DESC(?count)
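Conceptually, GROUP BY ?type partitions the matching solutions by ?type, and COUNT(?item) counts the solutions in each group. ORDER BY DESC(?count) then sorts the groups by that count. The plain-Python equivalent below runs over made-up (item, type) pairs and may help when you check results by hand. It does not call sparqlbuilder.

```python
from collections import Counter
from typing import List, Tuple

# Toy solutions for the pattern "?item a ?type" (illustrative data).
solutions: List[Tuple[str, str]] = [
    ("ex:a", "ex:Book"),
    ("ex:b", "ex:Book"),
    ("ex:c", "ex:Film"),
    ("ex:d", "ex:Book"),
]

# GROUP BY ?type with COUNT(?item) AS ?count
counts = Counter(item_type for _item, item_type in solutions)

# ORDER BY DESC(?count)
rows = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
print(rows)  # [('ex:Book', 3), ('ex:Film', 1)]
```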

Multiple Aggregations

Combine different aggregation functions:

query = (
    select("?category")
    .select_count("?item", "?total")
    .select_avg("?price", "?avgPrice")
    .select_max("?price", "?maxPrice")
    .prefix("ex", "http://example.org/")
    .where("?item", "ex:category", "?category")
    .where("?item", "ex:price", "?price")
    .group_by("?category")
    .having("COUNT(?item) > 5")
)

query_text, _ = query.build(param_mode="inline")
print(query_text)

Output:

PREFIX ex: <http://example.org/>
SELECT ?category (COUNT(?item) AS ?total) (AVG(?price) AS ?avgPrice) (MAX(?price) AS ?maxPrice)
WHERE {
  ?item ex:category ?category .
  ?item ex:price ?price .
}
GROUP BY ?category
HAVING (COUNT(?item) > 5)
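HAVING is evaluated after grouping, so it keeps or drops whole groups based on their aggregate values. FILTER, by contrast, removes individual solutions before any grouping takes place. The plain-Python sketch below mirrors the query above over toy (item, category, price) rows. It makes the HAVING threshold a min_count parameter so that the tiny data set can produce a result.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Toy solutions for "?item ex:category ?category . ?item ex:price ?price ."
solutions: List[Tuple[str, str, float]] = [
    ("ex:i1", "books", 10.0),
    ("ex:i2", "books", 20.0),
    ("ex:i3", "films", 15.0),
]


def aggregate(rows: List[Tuple[str, str, float]], min_count: int) -> Dict[str, Tuple[int, float, float]]:
    """GROUP BY ?category, then keep groups with COUNT(?item) > min_count (HAVING)."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for _item, category, price in rows:
        groups[category].append(price)
    return {
        category: (len(prices), mean(prices), max(prices))  # ?total, ?avgPrice, ?maxPrice
        for category, prices in groups.items()
        if len(prices) > min_count  # HAVING (COUNT(?item) > min_count)
    }


print(aggregate(solutions, min_count=1))  # {'books': (2, 15.0, 20.0)}
```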

Property Paths

Use property paths for complex graph navigation:

query = (
    select("?person", "?connection")
    .prefix("foaf", "http://xmlns.com/foaf/0.1/")
    .property_path("?person", "foaf:knows+", "?connection")  # One or more knows relationships
)

query_text, _ = query.build(param_mode="inline")
print(query_text)

Output:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?connection
WHERE {
  ?person foaf:knows+ ?connection .
}
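The + path modifier matches one or more foaf:knows steps. As a result, ?connection ranges over everyone reachable from ?person through a chain of at least one edge, which is the transitive closure. The breadth-first-search sketch below runs over toy, illustrative edges and collects the pairs that such a pattern would match. It is plain Python, not sparqlbuilder or an RDF engine.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Toy foaf:knows edges (illustrative data).
knows: Dict[str, List[str]] = {
    "ex:alice": ["ex:bob"],
    "ex:bob": ["ex:carol"],
    "ex:dave": ["ex:alice"],
}


def one_or_more(edges: Dict[str, List[str]], start: str) -> Set[str]:
    """Nodes reachable from `start` in at least one step (like ?start foaf:knows+ ?x)."""
    reached: Set[str] = set()
    queue = deque(edges.get(start, []))
    while queue:
        node = queue.popleft()
        if node in reached:
            continue  # already visited, which also keeps cycles from looping forever
        reached.add(node)
        queue.extend(edges.get(node, []))
    return reached


pairs: Set[Tuple[str, str]] = {
    (person, other) for person in knows for other in one_or_more(knows, person)
}
print(sorted(pairs))
```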

Working with Named Graphs

Query specific named graphs:

query = (
    select("?subject", "?predicate", "?object")
    .from_named_graph("http://example.org/graph1")
    .graph(
        "http://example.org/graph1",
        lambda q: q.where("?subject", "?predicate", "?object")
    )
    .limit(100)
)

query_text, _ = query.build(param_mode="inline")
print(query_text)

Output:

SELECT ?subject ?predicate ?object
FROM NAMED <http://example.org/graph1>
WHERE {
  GRAPH <http://example.org/graph1> {
    ?subject ?predicate ?object .
  }
}
LIMIT 100

Complex Query Example

A real-world example combining multiple features:

query = (
    select("?researcher", "?name", "?institution", "?paperCount")
    .prefix("foaf", "http://xmlns.com/foaf/0.1/")
    .prefix("ex", "http://example.org/")
    .where("?researcher", "a", "ex:Researcher")
    .where("?researcher", "foaf:name", "?name")
    .where("?researcher", "ex:affiliation", "?institution")
    .optional(lambda q: q
        .where("?paper", "ex:author", "?researcher")
    )
    .filter_regex("?name", "^[A-M]", "i")  # Names A-M
    .select_count("?paper", "?paperCount")
    .group_by("?researcher", "?name", "?institution")
    .having("COUNT(?paper) > 2")
    .order_by("?paperCount", descending=True)
    .limit(10)
)

query_text, _ = query.build(param_mode="inline")
print(query_text)

Output:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX ex: <http://example.org/>
SELECT ?researcher ?name ?institution (COUNT(?paper) AS ?paperCount)
WHERE {
  ?researcher a ex:Researcher .
  ?researcher foaf:name ?name .
  ?researcher ex:affiliation ?institution .
  OPTIONAL {
    ?paper ex:author ?researcher .
  }
  FILTER (REGEX(?name, "^[A-M]", "i"))
}
GROUP BY ?researcher ?name ?institution
HAVING (COUNT(?paper) > 2)
ORDER BY DESC(?paperCount)
LIMIT 10

Security

SPARQL Builder supports two parameter modes:

  • rdflib (default): keeps variables in query text and passes values through rdflib initBindings.
  • inline: renders values directly into SPARQL as VALUES clauses.

Use rdflib mode when executing with rdflib.Graph.query(...) to rely on rdflib term handling. Use inline mode for endpoints that only accept a raw SPARQL string (for example, some HTTP endpoint clients).

In inline mode, the provided query-construction methods (unless otherwise specified) validate and sanitize inputs to prevent SPARQL injection attacks in the generated query text. In rdflib mode, variable values are passed separately via initBindings and are handled by rdflib.

from rdflib import Graph, Literal
from sparqlbuilder import select

graph = Graph()

query = (
    select("?s", "?o")
    .where("?s", "rdfs:label", "?o")
    .bind("o", Literal("Alice"))
)

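query_text, init_bindings = query.build(param_mode="rdflib")
rows = graph.query(query_text, initBindings=init_bindings)

In inline mode, the bound value is instead validated and rendered into the query text:

from sparqlbuilder import select

user_input = "Alice"  # e.g. an untrusted value from a web form

query_text, _ = (
    select("?s", "?o")
    .where("?s", "rdfs:label", "?o")
    .bind("o", user_input)
    .build(param_mode="inline")
)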

If you want to sanitize raw input manually, or raw input is unavoidable, use the provided formatting functions format_subject, format_predicate and format_object:

from sparqlbuilder import format_subject

# User input (potentially malicious)
user_input = "http://example.org/resource"

# Safely format for use in query
safe_uri = format_subject(user_input)

What's Protected

  • Variable names: Validated against injection attempts
  • URIs: Checked for dangerous characters and proper format
  • Literals: Properly escaped and quoted
  • Filter expressions: Safe methods provided for common patterns
  • Regex patterns: Quoted and escaped as string literals in filter_regex() (regex syntax such as ^ keeps its meaning)
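The sketch below illustrates the kind of checks described in the list above, based on the SPARQL 1.1 grammar. The IRIREF production forbids the characters <, >, ", {, }, |, ^, `, \ and every character up to U+0020 (space) inside <...>. For variable names, this sketch accepts only an ASCII subset of VARNAME. The helpers format_iri() and check_variable() are hypothetical: they are not the library's format_subject() or its internal validators.

```python
import re

_IRI_FORBIDDEN = set('<>"{}|^`\\')
_VARIABLE = re.compile(r"[?$][A-Za-z0-9_]+")  # ASCII subset of SPARQL VARNAME


def format_iri(value: str) -> str:
    """Return `value` as <...>, rejecting characters the IRIREF production forbids."""
    for ch in value:
        if ch in _IRI_FORBIDDEN or ord(ch) <= 0x20:
            raise ValueError(f"character {ch!r} is not allowed in an IRI")
    return f"<{value}>"


def check_variable(name: str) -> str:
    """Return `name` unchanged if it is a plain ?var or $var, else raise."""
    if not _VARIABLE.fullmatch(name):
        raise ValueError(f"invalid variable name: {name!r}")
    return name


print(format_iri("http://example.org/resource"))  # <http://example.org/resource>
print(check_variable("?person"))                  # ?person
```

An injection attempt such as "http://example.org/x> . ?s ?p ?o" is rejected because it contains > and spaces.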

Important Caveat

The raw filter() and having() methods accept trusted SPARQL expressions directly. Do not pass unsanitized user input into these methods.
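This caveat matters because a raw expression is inserted as-is. The plain-Python sketch below makes no sparqlbuilder calls. It shows how an attacker-controlled string concatenated into a FILTER can close its quotes and make the condition always true. It also shows that quoting the same string as an escaped literal, the kind of handling the Security section describes for safe methods such as filter_equals(), keeps it inert. The escape_literal() helper is a hypothetical implementation of the SPARQL string escapes (ECHAR: \t \b \n \r \f \" \' \\), not the library's code.

```python
_ESCAPES = {
    "\\": "\\\\",
    '"': '\\"',
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
    "\b": "\\b",
    "\f": "\\f",
}


def escape_literal(value: str) -> str:
    """Quote `value` as a double-quoted SPARQL string literal using ECHAR escapes."""
    return '"' + "".join(_ESCAPES.get(ch, ch) for ch in value) + '"'


user_input = 'x" || true || "'

# Unsafe: the input closes the quote, so the FILTER becomes (?name = "x") || true || "".
unsafe = f'FILTER (?name = "{user_input}")'
# Safe: the quotes in the input are escaped and stay inside one literal.
safe = f"FILTER (?name = {escape_literal(user_input)})"

print(unsafe)  # FILTER (?name = "x" || true || "")
print(safe)    # FILTER (?name = "x\" || true || \"")
```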

API Reference

For complete API documentation, see API.md.

Query Construction:

  • select() - Create a new SELECT query
  • select_distinct() - Create a new SELECT DISTINCT query
  • select_reduced() - Create a new SELECT REDUCED query
  • SPARQLQuery - Main query builder class

Value Formatting (for external validation):

  • format_subject() - Format and validate a subject term
  • format_predicate() - Format and validate a predicate term
  • format_object() - Format and validate an object term

Missing features and TODO

Features planned for future releases:

  • Add more filter expression builders for common patterns (e.g., numeric comparisons, date comparisons)

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

See DEVELOPMENT.md for development setup and guidelines.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements

We gratefully acknowledge the following for supporting the development of this package:

  • PINK Project (2024-2027) funded by the European Union's Horizon 2020 Research and Innovation Programme, under Grant Agreement n. 101137809.
