feat: add JSON Schema definitions for programmatic access #100

Open
edonadei wants to merge 2 commits into safe-agentic-framework:main from edonadei:schema-for-programmatic-consumption

edonadei commented Nov 4, 2025

Summary

This PR establishes the foundational data structure contract for SAFE-MCP by adding comprehensive JSON Schema (Draft 7) definitions for techniques, mitigations, and tactics. These schemas enable programmatic access, validation, and automated tooling integration.

This is Part 1 of a multi-PR initiative to address issue #48.

Type of Contribution

  • Documentation improvement
  • Infrastructure/Tooling

What's Included

Schema Definitions (~1k lines total)

  1. schemas/technique-schema.json (~500 lines)

    • Comprehensive schema for attack techniques
    • Includes: metadata, attack vectors, impact assessment, detection methods, mitigations, MITRE ATT&CK mappings, version history
    • ID pattern: ^SAFE-T[0-9]{4}(\.[0-9]{3})?$
  2. schemas/mitigation-schema.json (~400 lines)

    • Complete schema for security controls
    • Includes: metadata, implementation details, benefits/limitations, deployment considerations, testing requirements
    • ID pattern: ^SAFE-M-[0-9]+$
  3. schemas/tactic-schema.json (~50 lines)

    • Schema for MITRE ATT&CK-aligned tactics
    • ID pattern: ^ATK-TA[0-9]{4}$
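The three ID patterns listed above can be exercised directly. The following is a minimal sketch, not part of the PR: the `classify_id` helper and the sample IDs are hypothetical and chosen only for illustration, while the regular expressions are copied from the list above.

```python
import re
from typing import Optional

# ID patterns copied verbatim from the three schemas described above.
ID_PATTERNS = {
    "technique": re.compile(r"^SAFE-T[0-9]{4}(\.[0-9]{3})?$"),
    "mitigation": re.compile(r"^SAFE-M-[0-9]+$"),
    "tactic": re.compile(r"^ATK-TA[0-9]{4}$"),
}


def classify_id(identifier: str) -> Optional[str]:
    """Return the entity kind whose ID pattern matches, or None.

    Hypothetical helper for illustration; fullmatch is used so that a
    trailing newline is rejected rather than silently accepted by `$`.
    """
    for kind, pattern in ID_PATTERNS.items():
        if pattern.fullmatch(identifier):
            return kind
    return None


if __name__ == "__main__":
    # Sample IDs are illustrative, not taken from the SAFE-MCP catalog.
    for sample in ["SAFE-T1001", "SAFE-T1001.001", "SAFE-M-12", "ATK-TA0001", "SAFE-T12"]:
        print(sample, "->", classify_id(sample))
```

Note that the technique pattern accepts an optional three-digit sub-technique suffix, while the mitigation pattern accepts any number of digits after `SAFE-M-`.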

Key Features

  • Required fields enforce core metadata presence
  • Enum values provide controlled vocabularies for consistency
  • Pattern matching validates ID formats
  • Extensible design allows future additions without breaking changes
  • JSON Schema Draft 7 standard compliance
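To make the `required`, `enum`, and `pattern` keywords concrete, here is a rough sketch using only the Python standard library. It is not the PR's schema: the `name` and `severity` fields and the enum values are assumptions made up for this example, and the hand-rolled `check` function covers only these three keywords, where a real integration would use a full Draft 7 validator.

```python
import re
from typing import Any, Dict, List

# Hypothetical fragment, NOT the PR's actual schema. The "name" and
# "severity" fields and the enum values are assumptions for illustration.
EXAMPLE_SCHEMA: Dict[str, Any] = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "required": ["id", "name", "severity"],
    "properties": {
        "id": {"type": "string", "pattern": r"^SAFE-T[0-9]{4}(\.[0-9]{3})?$"},
        "name": {"type": "string"},
        "severity": {"enum": ["low", "medium", "high", "critical"]},
    },
}


def check(doc: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Check only `required`, `enum`, and `pattern`; return error messages."""
    errors = [
        f"missing required field: {key}"
        for key in schema.get("required", [])
        if key not in doc
    ]
    for key, rules in schema.get("properties", {}).items():
        if key not in doc:
            continue
        value = doc[key]
        if "enum" in rules and value not in rules["enum"]:
            errors.append(f"{key}: {value!r} is not one of {rules['enum']}")
        # JSON Schema patterns are unanchored searches, hence re.search.
        if "pattern" in rules and isinstance(value, str):
            if not re.search(rules["pattern"], value):
                errors.append(f"{key}: {value!r} does not match {rules['pattern']}")
    return errors


if __name__ == "__main__":
    good = {"id": "SAFE-T1001", "name": "Example technique", "severity": "high"}
    bad = {"id": "T1001", "name": "Example technique"}
    print(check(good, EXAMPLE_SCHEMA))
    print(check(bad, EXAMPLE_SCHEMA))
```

The second document fails twice: once for the missing `severity` field and once for the malformed `id`.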

Benefits

This establishes a contract that enables:

  • Automated tooling integration (dashboards, SIEM systems)
  • Programmatic data access via scripts and APIs
  • Data validation and consistency checking
  • Type-safe development for integrations

Multi-PR Roadmap

Checklist

  • DCO sign-off included in commits (git commit -s)
  • Schemas follow JSON Schema Draft 7 specification
  • ID patterns match existing SAFE-MCP naming conventions
  • All required fields identified from markdown templates
  • Extensibility considered for future enhancements

Related Issues

Related: #48 - Create JSON/YAML index of SAFE-MCP techniques

Testing

The schemas can be validated with tools such as:

# Using online validators
https://www.jsonschemavalidator.net/

# Or CLI tools (will be automated in PR #2)
pip install jsonschema
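python -c "import json, jsonschema; jsonschema.Draft7Validator.check_schema(json.load(open('schemas/technique-schema.json'))); print('technique-schema.json is a valid Draft 7 schema')"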

Add comprehensive JSON Schema (Draft 7) definitions to establish the data
structure contract for SAFE-MCP techniques, mitigations, and tactics.

This enables:
- Automated tooling integration
- Programmatic data access
- Validation and consistency checking
- Type-safe development

Schemas added:
- schemas/technique-schema.json (557 lines)
  Covers attack techniques with metadata, impact assessment, detection
  methods, mitigations, and MITRE ATT&CK mappings

- schemas/mitigation-schema.json (399 lines)
  Covers security controls with implementation details, deployment
  considerations, and effectiveness ratings

- schemas/tactic-schema.json (45 lines)
  Covers MITRE ATT&CK-aligned tactics

Key features:
- Required fields enforce core metadata presence
- Enum values provide controlled vocabularies
- Pattern matching validates ID formats (SAFE-T####, SAFE-M-#)
- Extensible design allows future additions

Related: safe-agentic-framework#48

Next PRs will add:
- Parser tooling (markdown → JSON)
- CI automation via GitHub Actions
- TOON format for LLM optimization
- Documentation and integration guides

Signed-off-by: Emrick Donadei <emrick.donadei@gmail.com>

edonadei commented Nov 4, 2025

@fkautz @bochristopher PTAL. As discussed during the last safe-mcp community meeting, I decided to give it a try. The big idea would be:

Markdown as source of truth --> JSON schema --> "compiled" in TOON format for efficient LLM retrieval

I went in that direction because contributors seem to enjoy working with Markdown and are open to contributing in this format. It's only a suggestion, though, and I'm open to other ideas. We could also do the opposite, where contributors would write JSON (or YAML):

JSON as source of truth --> JSON schema --> compiled Markdown
YAML as source of truth --> JSON schema --> compiled Markdown

@edonadei edonadei marked this pull request as ready for review November 4, 2025 04:13
Signed-off-by: Emrick Donadei <emrick.donadei@gmail.com>
@edonadei edonadei force-pushed the schema-for-programmatic-consumption branch from 5e11908 to d1ed1c9 Compare November 4, 2025 04:16