π Production-Ready Quality: 9.2/10 βββββ
π Standards Compliant: YARRRML / RML / R2RML / W3C
π PyPI Package: semantic-rdf-mapper
π³ Docker Hub: rxcthefirst/rdfmap
Transform any tabular or structured data (CSV, Excel, JSON, XML) into high-quality RDF triples aligned with OWL ontologies using AI-powered semantic matching with 95% accuracy.
- What's New
- Key Features
- Features
- Standards & Compliance
- Dependencies & Licenses
- Commercial Support
- Quick Start
- Complete Workflow Guide
- CLI Command Reference
- Configuration Guide
- Output Formats
- Real-World Examples
- Performance & Scaling
- Troubleshooting
- Contributing
- β Generate RML in RDF/XML - Full support for XML-based RML mappings
- β Read RDF/XML Mappings - Parse and convert using RDF/XML RML files
- β Auto-Format Detection - File extension determines output format (.rdf, .ttl, .nt, .jsonld)
- β 100% Interoperable - Works with XML-based RDF toolchains
- β YARRRML Format Support - Read and write YARRRML (YAML-based RML) natively
- β RML/R2RML Generation - Export to W3C standard formats (Turtle, RDF/XML, N-Triples)
- π Multi-Format Support - Seamlessly works with YARRRML, RML, or internal format
- π€ Ecosystem Interoperability - Compatible with RMLMapper, RocketRML, Morph-KGC, SDM-RDFizer
- π― AI Metadata Preserved - x-alignment extensions for matcher confidence and evidence
- π Standards Compliant - Follows W3C RML and YARRRML specifications
- β YARRRML Format Support - Read and write YARRRML (YAML-based RML) natively
- π Auto-Format Detection - Seamlessly works with YARRRML or internal format
- π€ Ecosystem Interoperability - Compatible with RMLMapper, RocketRML, Morph-KGC, SDM-RDFizer
- π― AI Metadata Preserved - x-alignment extensions for matcher confidence and evidence
- π Standards Compliant - Follows W3C RML and YARRRML specifications
Major Intelligence Upgrade: 7.2 β 9.2 (+28%)
- π§ AI-Powered Semantic Matching - BERT embeddings catch 25% more matches
- π― 95% Automatic Success Rate - Up from 65%
- π Data Type Validation - OWL integration prevents type mismatches
- π Continuous Learning - System improves with every use via mapping history
- π Automatic FK Detection - Foreign key relationships mapped automatically
- π Enhanced Logging - Complete visibility into matching decisions
- π Confidence Calibration - Learns which matchers are most accurate
- β‘ 11 Intelligent Matchers - Working together in a plugin architecture
Result: 50% faster mappings, 71% fewer manual corrections, production-ready quality!
See FINAL_ACHIEVEMENT_REPORT.md for complete details.
AI-Powered Intelligence (Not available in other RML tools):
- π§ 95% Automatic Mapping Accuracy - BERT semantic embeddings
- π― 11 Intelligent Matchers - Plugin architecture for extensibility
- π Confidence Scoring - Know which mappings are reliable
- π Automatic Relationship Detection - Foreign keys discovered automatically
- π Continuous Learning - Improves with usage via mapping history
Standards Compliance:
- β W3C RML 1.0 specification
- β YARRRML 1.3.0 format
- β R2RML compatibility
- β SHACL validation
- β OWL 2 best practices
Interoperability:
- π€ Compatible with RMLMapper-Java, Morph-KGC, RocketRML, SDM-RDFizer
- π€ Export standard RML (Turtle, RDF/XML, N-Triples, JSON-LD)
- π₯ Import from YARRRML or RML
- π Round-trip conversion verified
- Local Files:
- CSV/TSV (configurable delimiters, CSVW support planned)
- Excel (.xlsx) - multi-sheet with automatic relationship detection
- JSON (JSONPath with
@for current object) - XML (XPath with namespace support)
- LibreOffice (.ods) - via pandas
- Data Characteristics:
- Streaming mode for large files (constant memory)
- Multi-sheet workbooks with FK detection
- Nested structures (JSON/XML)
- UTF-8 and other encodings
- Input: RML (Turtle, RDF/XML, N-Triples, JSON-LD), YARRRML, Internal YAML/JSON
- Output: RML (Turtle, RDF/XML, N-Triples, JSON-LD, N3), YARRRML, Internal YAML/JSON
- Auto-detection: File extension determines serialization format
- Turtle (.ttl) - default, human-readable
- N-Triples (.nt) - streaming, large datasets
- JSON-LD (.jsonld) - JSON-based tools
- RDF/XML (.rdf, .xml) - XML-based tools
- N3 (.n3) - Notation3
- Polars Engine: 10-100x faster than pandas
- Streaming Mode: Constant memory for any dataset size
- IRI Templates: Deterministic, idempotent generation
- Datatype Transforms: Built-in converters (decimal, date, boolean, etc.)
- Language Tags: Multi-language literal support
- Blank Nodes: Automatic for nested objects
- Join Conditions: Parent-child relationships via FK detection
- BERT Embeddings: Semantic similarity matching
- 11 Matcher Types:
- Exact label matching
- SKOS preferred/alt labels
- Fuzzy string matching
- Semantic embeddings (BERT)
- Datatype inference
- Pattern matching
- Graph structure reasoning
- Domain/range validation
- Historical learning
- Synonym expansion
- Abbreviation detection
- Confidence Scoring: 0-100% for each mapping decision
- Alignment Reports: JSON/HTML with evidence and suggestions
- Interactive Review: Approve/reject generated mappings
- SHACL Validation: Against custom shapes
- Ontology Validation: Class/property existence checks
- Datatype Validation: XSD type enforcement
- IRI Validation: Proper URI formatting
- Cardinality Checks: Min/max constraints
- Interactive Wizard: Guided setup for beginners
- Batch Processing:
--limitfor testing - Dry Run Mode: Validate without output
- Verbose Logging: Complete processing visibility
- Error Reports: Row-level error tracking
- Relational Databases:
- PostgreSQL
- MySQL/MariaDB
- SQLite
- Oracle
- SQL Server
- NoSQL Databases:
- MongoDB
- Cassandra
- Redis
- APIs:
- REST APIs with authentication
- GraphQL endpoints
- SPARQL endpoints (as source)
- Cloud Storage:
- S3
- Azure Blob
- Google Cloud Storage
- SPARQL Endpoints: Direct triple insertion
- Graph Databases: Neo4j, Neptune, ArangoDB
- Triple Stores: Virtuoso, Stardog, GraphDB
- Cloud RDF Services: AWS Neptune, Azure Cosmos DB
- Conditional Mappings: Rules engine for complex logic
- Functions: Custom transformation functions (FnO)
- Logical Views: SQL-like transformations before mapping
- Incremental Updates: Delta processing for changed data
- Federated Queries: Multi-source integration
- RDF Support*: RDF-star for provenance
- GPT-4 Integration: Natural language mapping descriptions
- Active Learning: User feedback improves suggestions
- Ontology Alignment: Auto-detect similar classes across ontologies
- Data Quality: Automated anomaly detection
RDFMap implements the following W3C specifications:
-
β RML (R2RML Extended): https://rml.io/specs/rml/
- Logical sources with CSV, JSON, XML
- Subject maps with IRI templates
- Predicate-object maps with datatypes
- Parent triples maps for joins
- Reference formulations (ql:CSV, ql:JSONPath, ql:XPath)
-
β YARRRML: https://rml.io/yarrrml/
- Human-friendly YAML-based RML
- Read and write support
- Full spec compliance
-
β R2RML Compatibility: https://www.w3.org/TR/r2rml/
- Core R2RML vocabulary
- TriplesMap structure
- SubjectMap, PredicateObjectMap patterns
-
β SHACL Validation: https://www.w3.org/TR/shacl/
- Shape-based validation
- Constraint checking
- Validation reports
-
β OWL 2: https://www.w3.org/TR/owl2-overview/
- NamedIndividual declarations
- Class and property hierarchies
- Domain/range constraints
RDFMap adds non-breaking extensions for AI features:
- x-alignment: Matcher confidence and evidence metadata
- x-matcher: Which matcher produced the mapping
- x-confidence: Numerical confidence score (0-100)
- x-evidence: Human-readable explanation
- x-suggestions: Alternative mappings considered
These extensions are ignored by standard RML processors, ensuring compatibility.
RDFMap-generated RML works with:
| Tool | Status | Notes |
|---|---|---|
| RMLMapper-Java | β Tested | Industry standard |
| Morph-KGC | β Compatible | R2RML + RML |
| RocketRML | β Compatible | Node.js implementation |
| SDM-RDFizer | β Compatible | Spark-based |
| CARML | β Compatible | Streaming RML |
Validation: We maintain cross-tool test suites in examples/ directory.
RDFMap includes test cases for:
- RML core features (
tests/test_rml_*.py) - YARRRML support (
tests/test_yarrrml_*.py) - Multi-format output (
tests/test_serialization.py) - Edge cases and compliance (
tests/test_compliance.py)
Run tests: pytest tests/
| Package | Version | License | Purpose |
|---|---|---|---|
| polars | β₯0.19.0 | MIT | High-performance data processing |
| rdflib | β₯7.0.0 | BSD-3-Clause | RDF graph construction |
| pydantic | β₯2.0.0 | MIT | Configuration validation |
| typer | β₯0.9.0 | MIT | CLI framework |
| sentence-transformers | β₯2.2.0 | Apache 2.0 | BERT embeddings |
| pyshacl | β₯0.23.0 | Apache 2.0 | SHACL validation |
| Package | Version | License | Purpose |
|---|---|---|---|
| openpyxl | β₯3.1.0 | MIT | Excel file support |
| lxml | β₯4.9.0 | BSD | XML parsing (faster) |
| pandas | β₯2.0.0 | BSD | Fallback data processing |
| Package | Version | License | Purpose |
|---|---|---|---|
| pytest | β₯7.4.0 | MIT | Testing framework |
| pytest-cov | β₯4.1.0 | MIT | Code coverage |
| mypy | β₯1.5.0 | MIT | Type checking |
| ruff | β₯0.1.0 | MIT | Linting |
Full dependency list: See pyproject.toml
License Compatibility: All dependencies are permissively licensed (MIT, BSD, Apache 2.0).
Do you need enterprise-grade support for RDFMap?
Available Services:
- π Training & Workshops - On-site or remote team training
- π§ Custom Development - Feature development for your use case
- π Integration Consulting - Integrate RDFMap into your pipeline
- β‘ Performance Optimization - Tune for your specific data/ontology
- π‘οΈ Support Contracts - SLA-backed support (24/7 available)
- π’ Enterprise Licensing - Commercial licensing options
Industries We Serve:
- Financial Services (mortgage, trading, compliance)
- Healthcare & Life Sciences (FHIR, clinical trials)
- E-Commerce (product catalogs, supply chain)
- Research & Academia (publications, grants)
- Government (open data, linked data)
Contact: For enterprise inquiries, please open a GitHub Discussion with tag [enterprise]
Get the complete RDFMap stack running in one command:
docker run -d -p 8080:8080 --name rdfmap rxcthefirst/rdfmap:latest
# Access the UI at http://localhost:8080That's it! Everything included:
- π¨ Web UI (React + Vite)
- βοΈ REST API (FastAPI)
- π Background workers (Celery)
- πΎ Built-in storage (SQLite)
With persistent data:
docker run -d -p 8080:8080 \
-v rdfmap-data:/app/data \
--name rdfmap \
rxcthefirst/rdfmap:latestAdvanced: Microservices deployment for production scaling - see Docker Guide
- Python 3.11+ (recommended: Python 3.13)
pip install rdfmap# Clone the repository
git clone https://github.com/rdfmap/rdfmap.git
cd rdfmap
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install in development mode
pip install -e ".[dev]"Get the complete RDFMap stack running in one command:
docker run -d -p 8080:8080 --name rdfmap rxcthefirst/rdfmap:latest
# Access the UI at http://localhost:8080That's it! Everything included:
- π¨ Web UI (React + Vite)
- βοΈ REST API (FastAPI)
- π Background workers (Celery)
- πΎ Built-in storage (SQLite)
With persistent data:
docker run -d -p 8080:8080 \
-v rdfmap-data:/app/data \
--name rdfmap \
rxcthefirst/rdfmap:latestAdvanced: Microservices deployment for production scaling - see Docker Guide
- Python 3.11+ (recommended: Python 3.13)
- pip or conda
# Install the package
pip install semantic-rdf-mapper
# Verify installation
rdfmap --help# Clone the repository
git clone https://github.com/yourusername/rdfmap.git
cd rdfmap
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install in development mode
pip install -e ".[dev]"RDFMap supports four main workflows, from beginner-friendly wizards to advanced enterprise scenarios.
Best for: First-time users, prototyping, learning the tool
The interactive wizard guides you through every step with smart defaults and data analysis.
# Start the interactive wizard
rdfmap initWhat the wizard does:
- Data Source Selection: Choose your CSV, Excel, JSON, or XML file
- Ontology Selection: Select your OWL ontology file
- Target Class: Auto-detect or manually select the main class
- Configuration Options: Set base IRI, output preferences, validation
- Smart Defaults: Analyzes your data and suggests optimal settings
- Auto-Generation: Runs AI-powered mapping generation at the end
Example Session:
π― RDFMap Configuration Wizard
ββββββββββββββββββββββββββββββββββββ
π Step 1: Data Source
? Enter path to data file: examples/mortgage/data/loans.csv
β Found CSV with 5 rows, 10 columns
ποΈ Step 2: Ontology
? Enter path to ontology file: examples/mortgage/ontology/mortgage_ontology.ttl
β Loaded ontology with 8 classes, 24 properties
π― Step 3: Target Class
? Select target class:
1. MortgageLoan (8 matches)
2. Borrower (2 matches)
3. Property (2 matches)
? Your choice [1]: 1
βοΈ Step 4: Configuration
? Base IRI [http://example.org/]: http://example.com/mortgage/
? Output format [rml]: rml
? Generate alignment report? [y/N]: y
π Generating mapping...
β Generated RML mapping: mortgage_mapping.rml.ttl
β Alignment report: mortgage_mapping_alignment.json
π Next steps:
1. Review the mapping: cat mortgage_mapping.rml.ttl
2. Convert data: rdfmap convert --mapping mortgage_mapping.rml.ttl --output data.ttl
Best for: Automated pipelines, batch processing, repeatable workflows
Generate mappings programmatically with full control over options.
# Generate RML mapping in Turtle format
rdfmap generate \
--ontology examples/mortgage/ontology/mortgage_ontology.ttl \
--data examples/mortgage/data/loans.csv \
--format rml \
--output mortgage_mapping.rml.ttl \
--base-iri "http://example.com/mortgage/" \
--alignment-reportOutput:
Analyzing ontology...
Found 8 classes
Found 24 properties
Analyzing data source...
Columns: 10
Identifier columns: ['LoanID']
Generating mappings with AI matchers...
β Matched 10/10 columns (100%)
β Detected 2 object properties
β Generated IRI template: http://example.com/mortgage/loan/{LoanID}
β RML mapping (Turtle) written to mortgage_mapping.rml.ttl
β Alignment report (JSON): mortgage_mapping_alignment.json
# Generate RML in RDF/XML format (for XML-based tools)
rdfmap generate \
--ontology examples/mortgage/ontology/mortgage_ontology.ttl \
--data examples/mortgage/data/loans.csv \
--format rml \
--output mortgage_mapping.rml.rdf \
--base-iri "http://example.com/mortgage/"Format auto-detected from .rdf extension! Produces XML-serialized RML.
# Generate human-friendly YARRRML
rdfmap generate \
--ontology examples/mortgage/ontology/mortgage_ontology.ttl \
--data examples/mortgage/data/loans.csv \
--format yarrrml \
--output mortgage_mapping.yaml \
--base-iri "http://example.com/mortgage/"Generated YARRRML (mortgage_mapping.yaml):
prefixes:
ex: https://example.com/mortgage#
xsd: http://www.w3.org/2001/XMLSchema#
sources:
loans:
access: examples/mortgage/data/loans.csv
referenceFormulation: csv
mappings:
loans:
sources:
- loans
s: http://example.com/mortgage/loan/$(LoanID)
po:
- [a, ex:MortgageLoan]
- [ex:loanNumber, $(LoanID), xsd:string]
- [ex:principalAmount, $(Principal), xsd:decimal]
- [ex:interestRate, $(InterestRate), xsd:decimal]
- [ex:originationDate, $(OriginationDate), xsd:date]
- p: ex:hasBorrower
o:
mapping: borrower
condition:
function: equal
parameters:
- [str1, $(BorrowerID)]
- [str2, $(BorrowerID)]Once you have a mapping, convert your data:
# Convert using RML (Turtle)
rdfmap convert \
--mapping mortgage_mapping.rml.ttl \
--format ttl \
--output mortgage_data.ttl
# Convert using RML (RDF/XML)
rdfmap convert \
--mapping mortgage_mapping.rml.rdf \
--format ttl \
--output mortgage_data.ttl
# Convert using YARRRML
rdfmap convert \
--mapping mortgage_mapping.yaml \
--format jsonld \
--output mortgage_data.jsonldOutput (mortgage_data.ttl):
@prefix ex: <https://example.com/mortgage#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
<http://example.com/mortgage/loan/L-1001> a owl:NamedIndividual, ex:MortgageLoan ;
ex:loanNumber "L-1001"^^xsd:string ;
ex:principalAmount 250000^^xsd:decimal ;
ex:interestRate 0.0525^^xsd:decimal ;
ex:originationDate "2023-06-15"^^xsd:date ;
ex:loanTerm 360^^xsd:integer ;
ex:loanStatus "Active"^^xsd:string ;
ex:hasBorrower <http://example.com/mortgage/borrower/B-9001> ;
ex:collateralProperty <http://example.com/mortgage/property/P-7001> .
<http://example.com/mortgage/borrower/B-9001> a owl:NamedIndividual, ex:Borrower ;
ex:borrowerName "Alex Morgan"^^xsd:string .
<http://example.com/mortgage/property/P-7001> a owl:NamedIndividual, ex:Property ;
ex:propertyAddress "12 Oak St"^^xsd:string .Best for: Custom requirements, complex transformations, fine-tuned control
Create mappings manually for complete control over every aspect.
File: custom_mapping.rml.ttl
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix ex: <https://example.com/mortgage#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<#LoansMapping> a rr:TriplesMap ;
rml:logicalSource [
rml:source "examples/mortgage/data/loans.csv" ;
rml:referenceFormulation ql:CSV
] ;
rr:subjectMap [
rr:template "http://example.com/mortgage/loan/{LoanID}" ;
rr:class ex:MortgageLoan
] ;
rr:predicateObjectMap [
rr:predicate ex:loanNumber ;
rr:objectMap [
rml:reference "LoanID" ;
rr:datatype xsd:string
]
] ;
rr:predicateObjectMap [
rr:predicate ex:principalAmount ;
rr:objectMap [
rml:reference "Principal" ;
rr:datatype xsd:decimal
]
] .rdfmap convert \
--mapping custom_mapping.rml.ttl \
--output custom_output.ttl \
--verboseBest for: Complex data models, normalized databases, multi-table exports
Handle Excel workbooks or multiple CSV files with relationships.
# Auto-detect relationships between sheets
rdfmap generate \
--ontology examples/comprehensive_test/ontology.ttl \
--data examples/comprehensive_test/multi_sheet.xlsx \
--format rml \
--output multi_sheet_mapping.rml.ttl \
--alignment-reportWhat happens:
- Analyzes all sheets in workbook
- Detects foreign key relationships (e.g.,
LoanIDreferences) - Creates separate TriplesMaps for each entity
- Generates
rr:parentTriplesMapfor relationships - Produces unified RML mapping
Convert multi-sheet data:
rdfmap convert \
--mapping multi_sheet_mapping.rml.ttl \
--output integrated_data.ttl# Namespace declarations
namespaces:
ex: https://example.com/mortgage#
xsd: http://www.w3.org/2001/XMLSchema#
# Default settings
defaults:
base_iri: https://data.example.com/
language: en # Optional default language tag
# Sheet/file mappings
sheets:
- name: loans
source: loans.csv # Relative to mapping file or absolute
# Main resource for each row
row_resource:
class: ex:MortgageLoan
iri_template: "{base_iri}loan/{LoanID}"
# Column mappings
columns:
LoanID:
as: ex:loanNumber
datatype: xsd:string
required: true
Principal:
as: ex:principalAmount
datatype: xsd:decimal
transform: to_decimal # Built-in transform
default: 0 # Optional default value
Notes:
as: rdfs:comment
datatype: xsd:string
language: en # Language tag for literal
# Linked objects (object properties)
objects:
borrower:
predicate: ex:hasBorrower
class: ex:Borrower
iri_template: "{base_iri}borrower/{BorrowerID}"
properties:
- column: BorrowerName
as: ex:borrowerName
datatype: xsd:string
# Validation configuration
validation:
shacl:
enabled: true
shapes_file: shapes/mortgage_shapes.ttl
# Processing options
options:
delimiter: ","
header: true
on_error: "report" # "report" or "fail-fast"
skip_empty_values: trueto_decimal: Convert to decimal numberto_integer: Convert to integerto_date: Parse date (ISO format)to_datetime: Parse datetime with timezone supportto_boolean: Convert to booleanuppercase: Convert string to uppercaselowercase: Convert string to lowercasestrip: Trim whitespace
Use Python-style string formatting with column names:
{base_iri}loan/{LoanID}βhttps://data.example.com/loan/L-1001{base_iri}{EntityType}/{ID}β Combine multiple columns
Start the interactive configuration wizard for guided setup.
rdfmap init [OPTIONS]Options:
--help- Show help message
Use when: You're new to RDFMap or want guided configuration
Example:
rdfmap init
# Follow the prompts for step-by-step setupAuto-generate mapping configuration from ontology and data using AI.
rdfmap generate --ontology FILE --data FILE --output FILE [OPTIONS]Required Options:
--ontology, -ont FILE- Path to ontology file (TTL, RDF/XML, etc.)--data, -d FILE- Path to data file (CSV, XLSX, JSON, XML)--output, -o FILE- Output path for generated mapping
Optional Flags:
--base-iri, -b TEXT- Base IRI for resources (default:http://example.org/)--class, -c TEXT- Target ontology class (auto-detects if omitted)--format, -f TEXT- Output format:rml,yarrrml,yaml,json(default:rml)--analyze-only- Show analysis without generating mapping--export-schema- Export JSON Schema for validation--import TEXT- Additional ontology files (can be specified multiple times)--alignment-report- Generate quality metrics report--verbose, -v- Enable detailed logging
Format Options:
rml- W3C RML standard (file extension determines serialization).ttlβ Turtle (default, human-readable).rdfor.xmlβ RDF/XML (XML-based tools).ntβ N-Triples (streaming-friendly).jsonldβ JSON-LD (JSON-based tools)
yarrrml- YARRRML format (human-friendly YAML)yaml- Internal format (backwards compatible)json- Internal format (machine-readable)
Examples:
# Generate RML in Turtle format (recommended)
rdfmap generate \
--ontology ontology.ttl \
--data data.csv \
-f rml \
-o mapping.rml.ttl
# Generate RML in RDF/XML format
rdfmap generate \
--ontology ontology.ttl \
--data data.csv \
-f rml \
-o mapping.rml.rdf
# Generate YARRRML with alignment report
rdfmap generate \
--ontology ontology.ttl \
--data data.xlsx \
-f yarrrml \
-o mapping.yaml \
--alignment-report
# With custom base IRI and target class
rdfmap generate \
--ontology ontology.ttl \
--data data.csv \
-f rml \
-o mapping.rml.ttl \
--base-iri "https://data.mycompany.com/" \
--class "MyMainClass"
# With ontology imports
rdfmap generate \
--ontology main_ontology.ttl \
--import imported_ontology1.ttl \
--import imported_ontology2.ttl \
--data data.csv \
-f rml \
-o mapping.rml.ttl
# Analyze only (no generation)
rdfmap generate \
--ontology ontology.ttl \
--data data.csv \
-o mapping.rml.ttl \
--analyze-onlyTransform data to RDF triples using a mapping configuration.
rdfmap convert --mapping FILE [OPTIONS]Required Options:
--mapping, -m FILE- Path to mapping file (RML, YARRRML, YAML, or JSON)
Optional Flags:
--ontology FILE- Path to ontology for validation--format, -f TEXT- Output format:ttl,xml,jsonld,nt(default:ttl)--output, -o FILE- Output file path (stdout if omitted)--validate- Run SHACL validation after conversion--report FILE- Write validation report to JSON file--limit N- Process only first N rows (for testing)--dry-run- Parse and validate without writing output--verbose, -v- Enable detailed logging--aggregate-duplicates/--no-aggregate-duplicates- Control IRI aggregation--log FILE- Write log to file
Examples:
# Basic conversion to Turtle
rdfmap convert --mapping mapping.rml.ttl --output data.ttl
# Convert to JSON-LD
rdfmap convert \
--mapping mapping.rml.ttl \
--format jsonld \
--output data.jsonld
# With SHACL validation
rdfmap convert \
--mapping mapping.rml.ttl \
--ontology ontology.ttl \
--validate \
--report validation_report.json \
--output data.ttl
# Test with limited rows (dry run)
rdfmap convert \
--mapping mapping.rml.ttl \
--limit 10 \
--dry-run \
--verbose
# Streaming output (N-Triples for large datasets)
rdfmap convert \
--mapping mapping.rml.ttl \
--format nt \
--output data.nt \
--no-aggregate-duplicates
# With logging
rdfmap convert \
--mapping mapping.rml.ttl \
--output data.ttl \
--log conversion.log \
--verboseInteractively review and approve/reject generated mappings.
rdfmap review --mapping FILE [OPTIONS]Options:
--mapping FILE- Path to mapping file to review--alignment-report FILE- Path to alignment report with confidence scores
Use when: You want to manually verify AI-generated mappings
Example:
rdfmap review \
--mapping generated_mapping.rml.ttl \
--alignment-report generated_mapping_alignment.jsonValidate RDF file against SHACL shapes.
rdfmap validate --rdf FILE --shapes FILE [OPTIONS]Options:
--rdf FILE- Path to RDF file to validate--shapes FILE- Path to SHACL shapes file--report FILE- Output validation report to JSON--inference TEXT- Inference mode:none,rdfs,owlrl(default:none)
Example:
rdfmap validate \
--rdf data.ttl \
--shapes shapes/mortgage_shapes.ttl \
--report validation_report.json \
--inference rdfsDisplay information about a mapping configuration.
rdfmap info --mapping FILEExample:
rdfmap info --mapping mapping.rml.ttlOutput:
Mapping Information
ββββββββββββββββββββββββββββββββββββ
Format: RML (Turtle)
Source: examples/mortgage/data/loans.csv
Base IRI: http://example.com/mortgage/
Classes: 3
- MortgageLoan
- Borrower
- Property
Properties: 10
Relationships: 2
Add SKOS labels to ontology based on alignment report suggestions.
rdfmap enrich --ontology FILE --report FILE --output FILEExample:
rdfmap enrich \
--ontology ontology.ttl \
--report alignment_report.json \
--output enriched_ontology.ttlAnalyze alignment reports to track trends and improvements.
rdfmap stats --reports DIR [OPTIONS]Example:
rdfmap stats --reports reports/ --output stats.jsonCheck SKOS label coverage in an ontology.
rdfmap validate-ontology --ontology FILEExample:
rdfmap validate-ontology --ontology ontology.ttlList available mapping templates for common domains.
rdfmap templatesOutput:
Available Mapping Templates
ββββββββββββββββββββββββββββββββββββ
1. mortgage - Mortgage loan data
2. healthcare - Patient and medical records
3. ecommerce - Product catalogs
4. finance - Financial transactions
Scenario: Convert mortgage loan data from legacy CSV to RDF for integration with a knowledge graph.
Data Structure (loans.csv):
LoanID,BorrowerID,BorrowerName,PropertyID,PropertyAddress,Principal,InterestRate,OriginationDate,LoanTerm,Status
L-1001,B-9001,Alex Morgan,P-7001,12 Oak St,250000,0.0525,2023-06-15,360,Active
L-1002,B-9002,Jamie Lee,P-7002,88 Pine Ave,475000,0.0475,2022-11-30,360,Active
Step 1: Generate mapping
rdfmap generate \
--ontology mortgage_ontology.ttl \
--data loans.csv \
-f rml \
-o mortgage_mapping.rml.ttl \
--base-iri "https://bank.example.com/data/" \
--alignment-reportStep 2: Convert data
rdfmap convert \
--mapping mortgage_mapping.rml.ttl \
--format ttl \
--output loans.ttl \
--validateResult: 3 entity types (Loan, Borrower, Property) with relationships, all properly typed and linked.
Scenario: Transform Excel patient records to FHIR-compatible RDF.
Data Structure (patients.xlsx):
- Sheet 1: Patients (ID, Name, DOB, Gender)
- Sheet 2: Visits (VisitID, PatientID, Date, Diagnosis)
- Sheet 3: Medications (MedicationID, VisitID, Drug, Dosage)
Generate with multi-sheet support:
rdfmap generate \
--ontology fhir_ontology.ttl \
--data patients.xlsx \
-f rml \
-o healthcare_mapping.rml.ttl \
--base-iri "https://hospital.example.org/data/" \
--import fhir_extensions.ttlConvert:
rdfmap convert \
--mapping healthcare_mapping.rml.ttl \
--ontology fhir_ontology.ttl \
--format jsonld \
--output patients.jsonld \
--validate \
--report validation_report.jsonFeatures demonstrated:
- Multi-sheet Excel processing
- Foreign key detection (PatientID, VisitID)
- Ontology imports
- JSON-LD output for FHIR compatibility
- SHACL validation
Scenario: Convert product data from JSON API to RDF for semantic search.
Data Structure (products.json):
{
"products": [
{
"sku": "PROD-001",
"name": "Wireless Mouse",
"category": "Electronics",
"price": 29.99,
"inStock": true,
"tags": ["wireless", "ergonomic", "bluetooth"]
}
]
}Generate with JSON support:
rdfmap generate \
--ontology ecommerce_ontology.ttl \
--data products.json \
-f yarrrml \
-o products_mapping.yaml \
--base-iri "https://shop.example.com/products/"Convert with streaming (large catalog):
rdfmap convert \
--mapping products_mapping.yaml \
--format nt \
--output products.nt \
--no-aggregate-duplicates # For better performanceFeatures demonstrated:
- JSON input with nested structures
- Array expansion (tags)
- YARRRML for human-friendly editing
- N-Triples streaming for large datasets
Scenario: Convert bibliographic data from CSV to schema.org-compatible RDF.
Workflow:
# 1. Analyze data first
rdfmap generate \
--ontology schema_org.ttl \
--data publications.csv \
-o mapping.rml.ttl \
--analyze-only
# 2. Generate with custom class
rdfmap generate \
--ontology schema_org.ttl \
--data publications.csv \
-f rml \
-o publications_mapping.rml.rdf \
--class "ScholarlyArticle" \
--base-iri "https://research.example.edu/pub/"
# 3. Convert with custom output format
rdfmap convert \
--mapping publications_mapping.rml.rdf \
--format xml \
--output publications.rdf \
--limit 100 # Test first
# 4. Full conversion after validation
rdfmap convert \
--mapping publications_mapping.rml.rdf \
--format ttl \
--output publications.ttl \
--log conversion.log \
--verboseFeatures demonstrated:
- Analyze-only mode for inspection
- RDF/XML mapping format
- Iterative workflow (test β full conversion)
- Detailed logging
Scenario: Stream large IoT sensor readings to RDF without loading all data into memory.
Data: 10 million rows of sensor readings
Generate mapping:
rdfmap generate \
--ontology iot_ontology.ttl \
--data sensor_readings.csv \
-f rml \
-o sensors_mapping.rml.ttl \
--base-iri "https://iot.example.com/sensors/" \
--class "SensorReading"Streaming conversion:
rdfmap convert \
--mapping sensors_mapping.rml.ttl \
--format nt \
--output sensors.nt \
--no-aggregate-duplicates \
--log streaming.logWhy this works:
- N-Triples format writes one triple at a time
--no-aggregate-duplicatesdisables in-memory grouping- Constant memory usage regardless of file size
- Can process TB-scale data
Scenario: Generate RML with RDFMap, execute with RMLMapper for validation.
Step 1: Generate RML in Turtle
rdfmap generate \
--ontology ontology.ttl \
--data data.csv \
-f rml \
-o mapping.rml.ttlStep 2: Convert with RDFMap
rdfmap convert \
--mapping mapping.rml.ttl \
--output output_rdfmap.ttlStep 3: Validate with RMLMapper (Java)
java -jar rmlmapper.jar \
-m mapping.rml.ttl \
-o output_rmlmapper.ttlStep 4: Compare outputs
# Both should produce identical RDF graphs
diff output_rdfmap.ttl output_rmlmapper.ttlInteroperability Features:
- Standards-compliant RML output
- Compatible with RMLMapper, Morph-KGC, RocketRML
- Consistent results across tools
- Vendor-neutral mapping format
| Dataset Size | Memory Usage | Processing Time | Recommended Format |
|---|---|---|---|
| < 10K rows | ~50 MB | < 1 second | TTL (aggregated) |
| 10K - 100K rows | ~200 MB | 1-10 seconds | TTL or JSON-LD |
| 100K - 1M rows | ~500 MB | 10-60 seconds | N-Triples (streaming) |
| 1M - 10M rows | ~1 GB | 1-10 minutes | N-Triples (streaming) |
| 10M+ rows | ~2 GB (constant) | 10+ minutes | N-Triples (no aggregation) |
# Disable aggregation for constant memory usage
rdfmap convert \
--mapping mapping.rml.ttl \
--format nt \
--output large_data.nt \
--no-aggregate-duplicates# Process first 1000 rows to validate mapping
rdfmap convert \
--mapping mapping.rml.ttl \
--limit 1000 \
--output test.ttl \
--verbose
# Then run full conversion
rdfmap convert \
--mapping mapping.rml.ttl \
--output full.nt \
--format nt# Validate without writing output
rdfmap convert \
--mapping mapping.rml.ttl \
--dry-run \
--limit 100# Split large file, process in parallel
split -l 1000000 large_data.csv chunk_
# Process chunks in parallel (bash)
for chunk in chunk_*; do
rdfmap convert \
--mapping mapping.rml.ttl \
--output ${chunk}.nt &
done
wait
# Concatenate results
cat chunk_*.nt > combined.nt# Fast: N-Triples (no parsing needed)
rdfmap convert --mapping map.rml.ttl --format nt -o data.nt
# Medium: Turtle (readable, compressed)
rdfmap convert --mapping map.rml.ttl --format ttl -o data.ttl
# Slow: RDF/XML (verbose, complex)
rdfmap convert --mapping map.rml.ttl --format xml -o data.rdfTested on M1 MacBook Pro (16GB RAM):
- CSV Processing: ~50,000 rows/second
- Excel Processing: ~30,000 rows/second
- JSON Processing: ~40,000 rows/second
- Triple Generation: ~100,000 triples/second
- SHACL Validation: ~10,000 triples/second
2M Row Dataset:
- Generation: 30 seconds
- Conversion (aggregated): 2.5 minutes
- Conversion (streaming): 1.5 minutes
- Memory: 1.8 GB (aggregated), 500 MB (streaming)
Error:
Error: Column 'CustomerID' not found in data source
Solutions:
- Check column name spelling (case-sensitive!)
- Verify CSV has headers
- Check for extra spaces in column names
- Use
--verboseto see all available columns
rdfmap convert --mapping map.rml.ttl --dry-run --verboseError:
Error: Invalid IRI generated: 'http://example.org/item/None'
Causes:
- Missing column in IRI template
- NULL values in identifier column
- Incorrect template syntax
Solutions:
# Bad:
iri_template: "{base_iri}item/{ItemID}" # ItemID is NULL
# Good - add fallback:
iri_template: "{base_iri}item/{ItemID|default_id}"
# Or skip rows with missing IDs:
options:
skip_empty_values: trueError:
Error: Cannot convert 'N/A' to xsd:decimal for column 'Price'
Solutions:
- Add transform to clean data:
columns:
Price:
as: ex:price
datatype: xsd:decimal
transform: to_decimal # Handles NULL, empty, "N/A"
default: 0 # Fallback value- Use string type for problematic columns:
columns:
Price:
as: ex:priceString
datatype: xsd:string # Store as-is- Skip validation temporarily:
rdfmap convert \
--mapping map.rml.ttl \
--output data.ttl # Remove --validate flagError:
MemoryError: Unable to allocate array
Solutions:
- Use streaming mode:
rdfmap convert \
--mapping map.rml.ttl \
--format nt \
--no-aggregate-duplicates \
--output data.nt- Process in chunks:
# Test with limit first
rdfmap convert --mapping map.rml.ttl --limit 100000 -o chunk1.nt
# Split file externally
split -l 100000 large_data.csv chunk_- Increase system memory or use cloud instance
Error:
Validation failed: 156 violations found
Solutions:
- Review validation report:
rdfmap convert \
--mapping map.rml.ttl \
--validate \
--report report.json \
--output data.ttl
# Examine report
cat report.json | jq '.results[] | {node: .focusNode, message: .resultMessage}'- Check common violations:
- Missing required properties
- Wrong datatypes
- Cardinality violations (min/max)
- Invalid IRI formats
- Fix mapping configuration:
# Add missing required properties
columns:
RequiredField:
as: ex:requiredProperty
datatype: xsd:string
required: true # Mark as requiredSymptoms: Conversion takes much longer than expected
Solutions:
- Profile with verbose logging:
rdfmap convert \
--mapping map.rml.ttl \
--verbose \
--log profile.log \
--output data.ttl- Disable aggregation:
rdfmap convert \
--mapping map.rml.ttl \
--no-aggregate-duplicates \
--output data.nt- Use simpler output format:
# Fast: N-Triples
rdfmap convert --mapping map.rml.ttl -f nt -o data.nt
# Slow: RDF/XML
rdfmap convert --mapping map.rml.ttl -f xml -o data.rdf- Check for complex transformations or joins
Symptoms: AI matcher confidence scores < 50%
Solutions:
- Add SKOS labels to ontology:
ex:customerIdentifier a owl:DatatypeProperty ;
rdfs:label "Customer ID"@en ;
skos:prefLabel "Customer Identifier"@en ;
skos:altLabel "CustID"@en, "Customer Number"@en .- Use
--classto specify target:
rdfmap generate \
--ontology ont.ttl \
--data data.csv \
--class "Customer" \ # Specify target class
-o map.rml.ttl- Review and approve mappings:
rdfmap review \
--mapping map.rml.ttl \
--alignment-report map_alignment.json- Enrich ontology with learned mappings:
rdfmap enrich \
--ontology ont.ttl \
--report map_alignment.json \
--output enriched_ont.ttlEnable maximum verbosity for troubleshooting:
rdfmap convert \
--mapping map.rml.ttl \
--verbose \
--log debug.log \
--dry-run \
--limit 10Check log for:
- Column mappings applied
- Datatype conversions
- IRI generation
- Triple counts per row
- Error messages with row numbers
We welcome contributions! Please see our contributing guidelines:
- Fork the repository
- Create a feature branch (
git checkout -b feature/amazing-feature) - Make your changes
- Add tests for new functionality
- Run tests (
pytest) - Commit your changes (
git commit -m 'Add amazing feature') - Push to branch (
git push origin feature/amazing-feature) - Open a Pull Request
# Clone repository
git clone https://github.com/yourusername/rdfmap.git
cd rdfmap
# Install in development mode
pip install -e ".[dev]"
# Run tests
pytest
# Run with coverage
pytest --cov=rdfmap --cov-report=html
# Type checking
mypy src/rdfmap
# Linting
ruff check src/MIT License - See LICENSE file for details
- Polars - High-performance data processing
- rdflib - RDF processing in Python
- pydantic - Data validation
- pyshacl - SHACL validation
- typer - Beautiful CLI framework
- Documentation: Full docs
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- PyPI: semantic-rdf-mapper
- Docker Hub: rxcthefirst/rdfmap
Built with β€οΈ by the RDFMap community