A type-safe, fluent interface for building safe SPARQL queries in Python.
SPARQL Builder is a Python library that provides a fluent, chainable API for constructing SPARQL queries programmatically. It emphasizes type safety and input validation to help you build correct queries while preventing common vulnerabilities like SPARQL injection attacks.
Whether you're querying knowledge graphs, RDF databases, or semantic web data, SPARQL Builder makes it easy to construct complex queries with confidence.
Note on parameter handling: SPARQL Builder supports two parameter modes.
- rdflib (default): keeps variables in the query text and passes values via rdflib initBindings.
- inline: renders bindings as SPARQL VALUES clauses in the final query string.
- Comprehensive: Supports all major SPARQL querying features (SELECT, OPTIONAL, UNION, FILTER, aggregations, etc.)
- Type-Safe: Full type hints support with proper validation of all inputs
- Fluent API: Chainable methods for intuitive query construction
- Security Oriented: Built-in validation and sanitization to prevent SPARQL injection attacks
- Minimal Dependencies: Only requires rdflib
- Well-Documented: Complete API documentation and examples
Install via pip:
pip install sparql-builder
Or with optional development dependencies:
pip install sparql-builder[dev]
Here's a simple example to get you started:
from rdflib import Graph, Literal
from sparqlbuilder import select
g = Graph()
# Build a query
query = (
select("?person", "?name")
.prefix("foaf", "http://xmlns.com/foaf/0.1/")
.where("?person", "a", "foaf:Person")
.where("?person", "foaf:name", "?name")
.limit(10)
)
# Build for rdflib (default mode)
query_text, init_bindings = query.bind("name", Literal("Alice")).build(param_mode="rdflib")
results = g.query(query_text, initBindings=init_bindings)
# Inline mode (for string-only endpoint clients)
query_text, _ = query.bind("name", "Alice").build(param_mode="inline")
# send query_text to your SPARQL endpoint client
Note: Inline mode returns (query_text, {}); the bindings dictionary is empty because the values are already rendered into the query text.
Output:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name
WHERE {
?person a foaf:Person .
?person foaf:name ?name .
}
LIMIT 10
Query for all subjects and their labels:
from sparqlbuilder import select
query = (
select("?subject", "?label")
.where("?subject", "rdfs:label", "?label")
)
query_text, _ = query.build(param_mode="inline")
print(query_text)
Default rdflib mode (recommended for rdflib.Graph.query):
from rdflib import Graph, Literal
from sparqlbuilder import select
graph = Graph()
query = (
select("?min", "?max")
.where("?s", "rdfs:label", "?label")
.where("?s", "mydb:min", "?min")
.where("?s", "mydb:max", "?max")
.bind("label", Literal("Alice"))
)
query_text, init_bindings = query.build(param_mode="rdflib")
rows = graph.query(query_text, initBindings=init_bindings)
Explicit inline mode (useful for remote endpoints expecting only a query string):
from sparqlbuilder import select
query_text, _ = (
select("?min", "?max")
.where("?s", "rdfs:label", "?label")
.where("?s", "mydb:min", "?min")
.where("?s", "mydb:max", "?max")
.bind("label", "Alice")
.build(param_mode="inline")
)
Output of the first query above (all subjects and their labels):
SELECT ?subject ?label
WHERE {
?subject rdfs:label ?label .
}
Define namespace prefixes for cleaner queries:
query = (
select("?person", "?name", "?email")
.prefix("foaf", "http://xmlns.com/foaf/0.1/")
.prefix("schema", "https://schema.org/")
.where("?person", "a", "foaf:Person")
.where("?person", "foaf:name", "?name")
.where("?person", "schema:email", "?email")
)
query_text, _ = query.build(param_mode="inline")
print(query_text)
Output:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX schema: <https://schema.org/>
SELECT ?person ?name ?email
WHERE {
?person a foaf:Person .
?person foaf:name ?name .
?person schema:email ?email .
}
Remove duplicate results:
query = (
select_distinct("?type")
.where("?subject", "a", "?type")
)
query_text, _ = query.build(param_mode="inline")
print(query_text)
Output:
SELECT DISTINCT ?type
WHERE {
?subject a ?type .
}
Paginate results:
query = (
select("?item", "?label")
.where("?item", "rdfs:label", "?label")
.order_by("?label")
.limit(20)
.offset(40) # Skip first 40 results (page 3)
)
query_text, _ = query.build(param_mode="inline")
print(query_text)
Output:
SELECT ?item ?label
WHERE {
?item rdfs:label ?label .
}
ORDER BY ?label
LIMIT 20
OFFSET 40
Use filters to constrain results:
query = (
select("?person", "?name", "?age")
.prefix("foaf", "http://xmlns.com/foaf/0.1/")
.where("?person", "foaf:name", "?name")
.where("?person", "foaf:age", "?age")
.filter("?age >= 18")
.filter("?age < 65")
)
query_text, _ = query.build(param_mode="inline")
print(query_text)
Output:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name ?age
WHERE {
?person foaf:name ?name .
?person foaf:age ?age .
FILTER (?age >= 18)
FILTER (?age < 65)
}
For exact value matching (safe from injection):
query = (
select("?person", "?name")
.prefix("foaf", "http://xmlns.com/foaf/0.1/")
.where("?person", "foaf:name", "?name")
.filter_equals("?name", "Alice")
)
query_text, _ = query.build(param_mode="inline")
print(query_text)
Output:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name
WHERE {
?person foaf:name ?name .
FILTER (?name = "Alice")
}
Search with regular expressions:
query = (
select("?person", "?name")
.prefix("foaf", "http://xmlns.com/foaf/0.1/")
.where("?person", "foaf:name", "?name")
.filter_regex("?name", "^A", "i") # Names starting with 'A' (case-insensitive)
)
query_text, _ = query.build(param_mode="inline")
print(query_text)
Output:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name
WHERE {
?person foaf:name ?name .
FILTER (REGEX(?name, "^A", "i"))
}
Include optional data when available:
query = (
select("?person", "?name", "?email")
.prefix("foaf", "http://xmlns.com/foaf/0.1/")
.where("?person", "foaf:name", "?name")
.optional(lambda q: q.where("?person", "foaf:mbox", "?email"))
)
query_text, _ = query.build(param_mode="inline")
print(query_text)
Output:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name ?email
WHERE {
?person foaf:name ?name .
OPTIONAL {
?person foaf:mbox ?email .
}
}
Restrict query results to specific values:
query = (
select("?person", "?name")
.prefix("foaf", "http://xmlns.com/foaf/0.1/")
.values("?person", [
"http://example.org/alice",
"http://example.org/bob"
])
.where("?person", "foaf:name", "?name")
)
query_text, _ = query.build(param_mode="inline")
print(query_text)
Output:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name
WHERE {
VALUES ?person { <http://example.org/alice> <http://example.org/bob> }
?person foaf:name ?name .
}
Bind multiple variables with tuples of values. Use None for SPARQL's UNDEF:
query = (
select("?person", "?name", "?email")
.prefix("foaf", "http://xmlns.com/foaf/0.1/")
.values(["?name", "?email"], [
("Alice", "alice@example.org"),
("Bob", None), # Bob has no email
])
.where("?person", "foaf:name", "?name")
)
query_text, _ = query.build(param_mode="inline")
print(query_text)
Output:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name ?email
WHERE {
VALUES (?name ?email) { ("Alice" "alice@example.org") ("Bob" UNDEF) }
?person foaf:name ?name .
}
Match alternative patterns:
query = (
select("?person", "?contact")
.prefix("foaf", "http://xmlns.com/foaf/0.1/")
.where("?person", "foaf:name", "?name")
.union(
lambda q: q.where("?person", "foaf:mbox", "?contact"),
lambda q: q.where("?person", "foaf:phone", "?contact")
)
)
query_text, _ = query.build(param_mode="inline")
print(query_text)
Output:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?contact
WHERE {
?person foaf:name ?name .
{
?person foaf:mbox ?contact .
}
UNION
{
?person foaf:phone ?contact .
}
}
Count results grouped by a variable:
query = (
select("?type")
.select_count("?item", "?count")
.where("?item", "a", "?type")
.group_by("?type")
.order_by("?count", descending=True)
)
query_text, _ = query.build(param_mode="inline")
print(query_text)
Output:
SELECT ?type (COUNT(?item) AS ?count)
WHERE {
?item a ?type .
}
GROUP BY ?type
ORDER BY DESC(?count)
Combine different aggregation functions:
query = (
select("?category")
.select_count("?item", "?total")
.select_avg("?price", "?avgPrice")
.select_max("?price", "?maxPrice")
.prefix("ex", "http://example.org/")
.where("?item", "ex:category", "?category")
.where("?item", "ex:price", "?price")
.group_by("?category")
.having("COUNT(?item) > 5")
)
query_text, _ = query.build(param_mode="inline")
print(query_text)
Output:
PREFIX ex: <http://example.org/>
SELECT ?category (COUNT(?item) AS ?total) (AVG(?price) AS ?avgPrice) (MAX(?price) AS ?maxPrice)
WHERE {
?item ex:category ?category .
?item ex:price ?price .
}
GROUP BY ?category
HAVING (COUNT(?item) > 5)
Use property paths for complex graph navigation:
query = (
select("?person", "?connection")
.prefix("foaf", "http://xmlns.com/foaf/0.1/")
.property_path("?person", "foaf:knows+", "?connection") # One or more knows relationships
)
query_text, _ = query.build(param_mode="inline")
print(query_text)
Output:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?connection
WHERE {
?person foaf:knows+ ?connection .
}
Query specific named graphs:
query = (
select("?subject", "?predicate", "?object")
.from_named_graph("http://example.org/graph1")
.graph(
"http://example.org/graph1",
lambda q: q.where("?subject", "?predicate", "?object")
)
.limit(100)
)
query_text, _ = query.build(param_mode="inline")
print(query_text)
Output:
SELECT ?subject ?predicate ?object
FROM NAMED <http://example.org/graph1>
WHERE {
GRAPH <http://example.org/graph1> {
?subject ?predicate ?object .
}
}
LIMIT 100
A real-world example combining multiple features:
query = (
select("?researcher", "?name", "?institution", "?paperCount")
.prefix("foaf", "http://xmlns.com/foaf/0.1/")
.prefix("ex", "http://example.org/")
.where("?researcher", "a", "ex:Researcher")
.where("?researcher", "foaf:name", "?name")
.where("?researcher", "ex:affiliation", "?institution")
.optional(lambda q: q
.where("?paper", "ex:author", "?researcher")
)
.filter_regex("?name", "^[A-M]", "i") # Names A-M
.select_count("?paper", "?paperCount")
.group_by("?researcher", "?name", "?institution")
.having("COUNT(?paper) > 2")
.order_by("?paperCount", descending=True)
.limit(10)
)
query_text, _ = query.build(param_mode="inline")
print(query_text)
Output:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX ex: <http://example.org/>
SELECT ?researcher ?name ?institution (COUNT(?paper) AS ?paperCount)
WHERE {
?researcher a ex:Researcher .
?researcher foaf:name ?name .
?researcher ex:affiliation ?institution .
OPTIONAL {
?paper ex:author ?researcher .
}
FILTER (REGEX(?name, "^[A-M]", "i"))
}
GROUP BY ?researcher ?name ?institution
HAVING (COUNT(?paper) > 2)
ORDER BY DESC(?paperCount)
LIMIT 10
SPARQL Builder supports two parameter modes:
- rdflib (default): keeps variables in the query text and passes values through rdflib initBindings.
- inline: renders values directly into the SPARQL query as VALUES clauses.
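To make the difference between the two modes concrete, the following toy sketch mimics the shape of what each mode returns. The function build_sketch, its arguments, and the placement of the VALUES clause at the top of the WHERE block are assumptions made for illustration; this is not SPARQL Builder's implementation, and it handles plain string values only.

```python
from typing import Dict, Tuple


def build_sketch(
    where_body: str, bindings: Dict[str, str], param_mode: str = "rdflib"
) -> Tuple[str, Dict[str, str]]:
    """Toy stand-in for a build step (hypothetical, not the library's code)."""
    if param_mode == "rdflib":
        # Variables stay in the query text; values travel separately.
        return f"SELECT * WHERE {{ {where_body} }}", dict(bindings)
    if param_mode == "inline":
        clauses = []
        for name, value in bindings.items():
            # Escape backslashes first, then double quotes.
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            clauses.append(f'VALUES ?{name} {{ "{escaped}" }}')
        body = " ".join(clauses + [where_body])
        # Everything is in the text, so the bindings dictionary is empty.
        return f"SELECT * WHERE {{ {body} }}", {}
    raise ValueError(f"unknown param_mode: {param_mode!r}")


text_a, bound_a = build_sketch("?s rdfs:label ?o .", {"o": "Alice"}, "rdflib")
text_b, bound_b = build_sketch("?s rdfs:label ?o .", {"o": "Alice"}, "inline")
print(text_a, bound_a)
print(text_b, bound_b)
```

In the first call the value "Alice" never appears in the query text; in the second it appears only inside a quoted, escaped literal.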
Use rdflib mode when executing with rdflib.Graph.query(...) to rely on rdflib term handling.
Use inline mode for endpoints that only accept a raw SPARQL string (for example, some HTTP endpoint clients).
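For endpoints that take a raw query string, the SPARQL 1.1 Protocol allows the query to be sent as a URL-encoded POST body. The sketch below only prepares such a request with the Python standard library and does not send it. The endpoint URL is a placeholder, and the query text is hard-coded here; in practice it would be the string returned by build(param_mode="inline").

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint, assumed for illustration only.
ENDPOINT_URL = "http://example.org/sparql"

# Stand-in for the query string produced in inline mode.
query_text = "SELECT ?s ?o WHERE { ?s ?p ?o . } LIMIT 10"

# "query via URL-encoded POST": the query goes into the form field "query".
request = Request(
    ENDPOINT_URL,
    data=urlencode({"query": query_text}).encode("utf-8"),
    headers={
        "Content-Type": "application/x-www-form-urlencoded",
        "Accept": "application/sparql-results+json",
    },
)

# A Request with a body defaults to the POST method.
print(request.get_method())
```

Passing the prepared request to urllib.request.urlopen would send it; whether a given endpoint accepts this form and returns JSON results depends on the server.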
In inline mode, the provided query-construction methods (unless otherwise specified) validate and sanitize inputs to prevent SPARQL injection attacks in the generated query text. In rdflib mode, variable values are passed separately via initBindings and are handled by rdflib.
# rdflib mode: values are passed separately via initBindings
from rdflib import Graph, Literal
from sparqlbuilder import select
graph = Graph()
query = (
select("?s", "?o")
.where("?s", "rdfs:label", "?o")
.bind("o", Literal("Alice"))
)
query_text, init_bindings = query.build(param_mode="rdflib")
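rows = graph.query(query_text, initBindings=init_bindings)
# Inline mode: the bound value is validated and rendered into the query text
from sparqlbuilder import select
user_input = "Alice"  # stand-in for an untrusted value, e.g. from a web form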
query_text, _ = (
select("?s", "?o")
.where("?s", "rdfs:label", "?o")
.bind("o", user_input)
.build(param_mode="inline")
)
If you want to sanitize raw input manually, or when raw input is unavoidable, you can use the provided formatting functions format_subject, format_predicate, and format_object:
from sparqlbuilder import format_subject
# User input (potentially malicious)
user_input = "http://example.org/resource"
# Safely format for use in query
safe_uri = format_subject(user_input)
Validation and sanitization cover:
- Variable names: Validated against injection attempts
- URIs: Checked for dangerous characters and proper format
- Literals: Properly escaped and quoted
- Filter expressions: Safe methods provided for common patterns
- Regex patterns: Automatically escaped in filter_regex()
The raw filter() and having() methods accept trusted SPARQL expressions directly.
Do not pass unsanitized user input into these methods.
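To see why raw expressions need trusted input, consider what happens when a value is pasted into a filter string without escaping. The helper escape_sparql_string below is a hypothetical, minimal sketch of string-literal escaping based on the escape sequences in the SPARQL grammar; it is not the library's own sanitizer, and where possible the safe methods such as filter_equals() or the format_* functions are the better choice.

```python
def escape_sparql_string(value: str) -> str:
    """Return value as a double-quoted SPARQL string literal (hypothetical helper)."""
    replacements = {
        "\\": "\\\\",  # backslash
        '"': '\\"',    # double quote
        "\n": "\\n",   # newline
        "\r": "\\r",   # carriage return
        "\t": "\\t",   # tab
    }
    # Character-by-character mapping, so no replacement is applied twice.
    escaped = "".join(replacements.get(ch, ch) for ch in value)
    return f'"{escaped}"'


# A value crafted to break out of the quotes and widen the filter.
user_input = 'x" || true || "'

# Naive interpolation: the attacker's quotes end the literal early.
unsafe_filter = f'?name = "{user_input}"'

# Escaped interpolation: the whole input stays inside one literal.
safe_filter = f"?name = {escape_sparql_string(user_input)}"

print(unsafe_filter)
print(safe_filter)
```

The unsafe expression now contains `|| true`, which would match every row, while the escaped version compares ?name against a single literal string.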
For complete API documentation, see API.md.
Query Construction:
- select() - Create a new SELECT query
- select_distinct() - Create a new SELECT DISTINCT query
- select_reduced() - Create a new SELECT REDUCED query
- SPARQLQuery - Main query builder class
Value Formatting (for external validation):
- format_subject() - Format and validate a subject term
- format_predicate() - Format and validate a predicate term
- format_object() - Format and validate an object term
Features planned for future releases:
- Add more filter expression builders for common patterns (e.g., numeric comparisons, date comparisons)
Contributions are welcome! Please feel free to submit a Pull Request.
See DEVELOPMENT.md for development setup and guidelines.
This project is licensed under the MIT License - see the LICENSE file for details.
We gratefully acknowledge the following for supporting the development of this package:
- PINK Project (2024-2027) funded by the European Union's Horizon 2020 Research and Innovation Programme, under Grant Agreement n. 101137809.