Literature-Guided Integration of Biomedical Ontologies for Cross-Domain Knowledge Discovery
OntoSemantics addresses a critical challenge in biomedical AI: isolated ontologies. While biomedical ontologies such as MONDO (diseases), ChEBI (chemicals), and the Gene Ontology (gene function) are rich in internal structure, they exist in silos with minimal cross-domain relationships. This fragmented knowledge landscape limits comprehensive biomedical reasoning. Our solution uses hybrid transformer-ontology architectures to automatically discover and validate cross-ontology relationships from biomedical literature, creating the first large-scale integrated biomedical knowledge graph derived from literature evidence.
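To make the goal concrete, a cross-ontology relationship can be thought of as a typed edge between terms from two different ontologies, backed by literature evidence. The sketch below is purely illustrative (the class name, relation label, and evidence sentence are not the project's actual data model):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CrossOntologyEdge:
    """One literature-derived link between terms from two different ontologies."""
    subject: str    # e.g. a ChEBI chemical term
    predicate: str  # relation type extracted from text
    obj: str        # e.g. a MONDO disease term
    evidence: str   # supporting sentence from the literature

edge = CrossOntologyEdge(
    subject="insulin (ChEBI)",
    predicate="treats",
    obj="type 2 diabetes (MONDO)",
    evidence="Insulin therapy improves glycemic control in type 2 diabetes.",
)
print(edge.subject, "->", edge.predicate, "->", edge.obj)
```

A set of such validated edges, layered on top of the individual ontologies, is what forms the integrated knowledge graph.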
- Self-Improving Architecture: Knowledge base learns from every query through ontology validation
- Biomedical Focus: Specialized for medical literature and research applications
- Multiple Ontologies: Integrated support for MONDO, Gene Ontology, and the Human Phenotype Ontology
- Real-time Validation: Live checking against authoritative knowledge sources
- Measurable Progress: Track knowledge graph growth and accuracy improvements over time
- Relationship Extraction: Advanced biomedical entity relationship discovery
- Python 3.8+
- Docker & Docker Compose
- Ollama (for local LLM inference)
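You can confirm the Python prerequisite from the list above before installing (Docker and Ollama availability still need separate checks):

```python
import sys

# The project requires Python 3.8 or newer
ok = sys.version_info >= (3, 8)
print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {'OK' if ok else 'upgrade needed'}")
```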
```bash
# Clone the repository
git clone https://github.com/mdrago98/ontosemantics.git
cd ontosemantics

# Install the package (editable local checkout)
pip install -e .[full]

# Or install directly from GitHub (ideal for Colab)
pip install "bioengine[full] @ git+https://github.com/mdrago98/bioengine.git"
```

Set the BIOENGINE_CONFIG_PATH environment variable if you want to load a custom YAML configuration instead of the packaged defaults.
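For example, to point bioengine at your own configuration (the file path below is a placeholder for wherever you keep your config):

```shell
# Use a custom YAML config instead of the packaged defaults
export BIOENGINE_CONFIG_PATH="$HOME/configs/bioengine.yaml"
echo "$BIOENGINE_CONFIG_PATH"
```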
- Start required services:

```bash
docker-compose up -d
```

- Download ontologies:

```python
from bioengine.knowledge_engine.ontology_manager import OntologyManager

om = OntologyManager()
# Top-level await works in a notebook; in a plain script, wrap the call in asyncio.run(...)
await om.download_and_load_ontologies()
```

or through the bash script:

```bash
sh scripts/download_ontologies.sh -a
```

- Initialize LLM extractor:
```python
from bioengine.nlp_processor.llm_extractor import LLMRelationshipExtractor

extractor = LLMRelationshipExtractor('gemma3:1b')

# Extract relationships without context
relationships = extractor.extract_relationships(text)

# Extract with entity context
relationships = extractor.extract_relationships(
    text,
    context={'entities': ['insulin', 'diabetes', 'glucose']}
)

# Extract with full ontological context
semantic_context = om.get_semantic_context(['insulin', 'diabetes'])
relationships = extractor.extract_relationships(
    text,
    context={'semantic_relationships': semantic_context}
)
```

Evaluate extracted relationships against a gold standard:

```python
from bioengine.utils.eval import RelationshipEvaluator

evaluator = RelationshipEvaluator(matching_strategy="fuzzy")
metrics = evaluator.evaluate(predicted_relationships, ground_truth)
print(f"F1-Score: {metrics.overall_metrics.f1_score}")
```

Validate and enrich entities with ontological knowledge:
```python
matches = om.validate_and_enrich_entity('type 2 diabetes')
for match in matches:
    print(f"Parents: {[p.name for p in match.parents]}")
    print(f"Children: {[c.name for c in match.children]}")
```

ontosemantics/
├── bioengine/knowledge_engine/    # Core ontology processing
│   ├── ontology_manager.py        # Ontology loading and management
│   └── models/                    # Data models for entities and relationships
├── bioengine/nlp_processor/       # LLM-based extraction
│   └── llm_extractor.py           # Relationship extraction with context
├── bioengine/utils/               # Evaluation and utilities
│   └── eval.py                    # Metrics calculation and evaluation
├── notebooks/                     # Jupyter notebooks and experiments
│   └── ontology.ipynb             # Main experiment notebook
├── data/                          # Datasets and ontologies
│   ├── BioRED/                    # BioRED challenge dataset
│   └── ontologies/                # Downloaded ontology files
└── docker-compose.yml             # Service orchestration
| Context Type | Precision | Recall | F1-Score |
|---|---|---|---|
| Entity Context | 0.083 | 0.333 | 0.133 |
| Semantic Context | 0.333 | 0.667 | 0.444 |
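As a sanity check, the F1 column is the harmonic mean of the precision and recall columns:

```python
def f1(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Rows from the table above
print(round(f1(0.083, 0.333), 3))  # entity context   -> 0.133
print(round(f1(0.333, 0.667), 3))  # semantic context -> 0.444
```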
- MONDO: Mondo Disease Ontology (56,695+ terms)
- Gene Ontology (GO): Biological processes and molecular functions (48,106+ terms)
- Human Phenotype Ontology (HP): Phenotypic abnormalities (19,653+ terms)
- UBERON: Anatomical structures (planned)
- ChEBI: Chemical entities (planned)
Currently supports Ollama-compatible models:
- `gemma3:1b` (default, lightweight)
- `llama3:8b` (better performance)
- `mistral:7b` (alternative option)
Ontologies are automatically downloaded from the OBO Foundry PURL server (http://purl.obolibrary.org/obo/).
Run the test suite:

```bash
pytest tests/
```

To support an additional ontology, register its download URL in ontology_manager.py:

```python
# Add to ontology_manager.py ONTOLOGY_URLS
ONTOLOGY_URLS = {
    'your_ontology': 'http://purl.obolibrary.org/obo/your_ontology.obo'
}
```

To contribute:

- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Embedding-Based Context: Pre-computed semantic embeddings for faster context selection
- Multi-Modal Integration: Combining structured graphs with LLM embeddings
- Hierarchical Embeddings: Preserving parent-child relationships in vector space
- Real-time Knowledge Graph Updates: Live integration of validated relationships
- Cross-Domain Transfer: Extending beyond biomedical to other knowledge domains
This project is licensed under the MIT License - see the LICENSE file for details.
- BioRED Challenge for the evaluation dataset
- Pronto for ontology processing
- Ollama for local LLM inference
- Open Biomedical Ontologies Foundry for ontology standards
- Author: Matthew Drago
- Blog: Here