feat: Add Markdown parser and CI validation workflow #101
Draft
edonadei wants to merge 2 commits into safe-agentic-framework:main
Add comprehensive JSON Schema (Draft 7) definitions to establish the data structure contract for SAFE-MCP techniques, mitigations, and tactics.

This enables:
- Automated tooling integration
- Programmatic data access
- Validation and consistency checking
- Type-safe development

Schemas added:
- schemas/technique-schema.json (557 lines)
  Covers attack techniques with metadata, impact assessment, detection methods, mitigations, and MITRE ATT&CK mappings
- schemas/mitigation-schema.json (399 lines)
  Covers security controls with implementation details, deployment considerations, and effectiveness ratings
- schemas/tactic-schema.json (45 lines)
  Covers MITRE ATT&CK-aligned tactics

Key features:
- Required fields enforce core metadata presence
- Enum values provide controlled vocabularies
- Pattern matching validates ID formats (SAFE-T####, SAFE-M-#)
- Extensible design allows future additions

Related: safe-agentic-framework#48

Next PRs will add:
- Parser tooling (markdown → JSON)
- CI automation via GitHub Actions
- TOON format for LLM optimization
- Documentation and integration guides

Signed-off-by: Emrick Donadei <emrick.donadei@gmail.com>
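The ID pattern matching mentioned in this commit (SAFE-T####, SAFE-M-#) can be illustrated with a small check. This is only a sketch: the actual regular expressions in the schema files are not shown in this PR, so the two patterns below (exactly four digits after SAFE-T, one or more digits after SAFE-M-) and the helper name `classify_id` are assumptions made for illustration.

```python
import re
from typing import Optional

# Assumed patterns; the real schemas may be stricter or looser than this.
TECHNIQUE_ID = re.compile(r"SAFE-T\d{4}")
MITIGATION_ID = re.compile(r"SAFE-M-\d+")


def classify_id(identifier: str) -> Optional[str]:
    """Return 'technique', 'mitigation', or None for an unrecognized ID."""
    # fullmatch rejects IDs with leading or trailing extra characters.
    if TECHNIQUE_ID.fullmatch(identifier):
        return "technique"
    if MITIGATION_ID.fullmatch(identifier):
        return "mitigation"
    return None
```

Using `fullmatch` rather than `search` mirrors what an anchored schema pattern would do: a string such as `xSAFE-T1001` is rejected instead of being accepted on a partial match.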
Add comprehensive tooling to parse markdown files into structured JSON format and automate generation via GitHub Actions.

Parser Features:
- scripts/parse_markdown.py (872 lines)
  Parses all techniques and mitigations from markdown to JSON
  Extracts complete data including metadata, descriptions, attack vectors, impact assessments, detection methods, mitigations, and version history
- scripts/validate_schema.py (107 lines)
  Validates generated JSON against schema definitions
  Provides detailed error reporting for data quality issues

CI/CD Automation:
- .github/workflows/generate-schema.yml
  Triggers on markdown changes, script updates, or manual dispatch
  Generates JSON index from markdown sources
  Validates against schemas (non-blocking)
  Uploads artifacts for PRs
  Publishes to GitHub Releases on main branch push
  Posts statistics as PR comments

Key Features:
- Markdown remains source of truth
- Generated JSON excluded from git (recreated by CI)
- Data distributed via GitHub Releases (tag: latest-data)
- Comprehensive parsing with error handling
- Support for Sigma detection rules, test files, and references

Generated data structure:
- 14 tactics parsed from README
- 31 techniques with full metadata
- 35 mitigations with implementation details
- ~1.3 MB JSON output

Updated .gitignore to exclude generated files while preserving directory structure with data/.gitkeep

Related: safe-agentic-framework#48

Next PR will add:
- TOON format generator for LLM optimization

Signed-off-by: Emrick Donadei <emrick.donadei@gmail.com>
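The "generate an index, then validate without blocking" flow described in this commit might look roughly like the sketch below. It is not the code from scripts/parse_markdown.py: the function names, the entry fields (`path`, `title`), and the rule that a file needs a level-1 heading are all hypothetical stand-ins chosen to keep the example small.

```python
import json
from pathlib import Path
from typing import Dict, List


def build_index(source_dir: Path) -> List[Dict[str, str]]:
    """Collect one entry per Markdown file, sorted for stable output."""
    entries = []
    for path in sorted(source_dir.rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        # Use the first level-1 heading as the title, or "" if there is none.
        title = next(
            (line[2:].strip() for line in text.splitlines() if line.startswith("# ")),
            "",
        )
        entries.append({"path": path.relative_to(source_dir).as_posix(), "title": title})
    return entries


def find_problems(entries: List[Dict[str, str]]) -> List[str]:
    """Return human-readable problems; an empty list means the index is clean."""
    return [f"{entry['path']}: missing title" for entry in entries if not entry["title"]]


def write_index(entries: List[Dict[str, str]], out_file: Path) -> None:
    """Write the index as pretty-printed JSON, creating parent directories."""
    out_file.parent.mkdir(parents=True, exist_ok=True)
    out_file.write_text(json.dumps(entries, indent=2) + "\n", encoding="utf-8")
```

In a non-blocking setup, a caller would print whatever `find_problems` returns and still call `write_index`, leaving the exit status at zero; a blocking variant would exit non-zero when the list is non-empty.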
Summary
This PR operationalizes the JSON schemas established in PR #<NUMBER_OF_FIRST_PR> by introducing a Python-based Markdown parser and an automated GitHub Actions CI workflow.
The core script (scripts/schema_parser.py) reads the YAML frontmatter from all technique and mitigation Markdown files, validates their structure against the corresponding JSON schemas, and (on success) generates complete techniques.json and mitigations.json index files.
The CI workflow (.github/workflows/main.yml) runs this parser on every pull request to main, ensuring that all contributed content is valid before it can be merged.
This is Part 2 of a multi-PR initiative to address issue #48.
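As a rough illustration of the frontmatter step in the summary above, the sketch below splits a Markdown document into its frontmatter and body and reports missing required keys. It deliberately avoids third-party packages, so it only understands flat `key: value` lines; the actual script presumably hands the frontmatter to PyYAML and validates with jsonschema. The names `split_frontmatter`, `missing_keys`, and the `REQUIRED_KEYS` tuple are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical required keys; the real list is defined by the JSON schemas.
REQUIRED_KEYS = ("id", "name")


def split_frontmatter(text: str) -> Tuple[Dict[str, str], str]:
    """Split a Markdown document into (frontmatter, body).

    Only flat `key: value` lines are handled; nested YAML is out of scope
    for this sketch.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block at all
    try:
        # Index of the closing '---' delimiter.
        end = next(i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---")
    except StopIteration:
        return {}, text  # unterminated frontmatter: treat as plain body
    meta: Dict[str, str] = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:])


def missing_keys(meta: Dict[str, str]) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not meta.get(key)]
```

A file would be written to the generated index only when `missing_keys` returns an empty list, which matches the "on success" condition in the summary.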
Type of Contribution
What's Included
scripts/schema_parser.py
- Reads the .md files in the techniques/ and mitigations/ directories.
- Generates output/techniques.json and output/mitigations.json.
.github/workflows/main.yml
- Runs on push and pull_request events to the main branch.
- Installs dependencies from requirements.txt.
- Runs the schema_parser.py script to validate all content.
requirements.txt
- Lists jsonschema (for validation) and PyYAML (for parsing frontmatter).
Key Features
Benefits
This PR provides the critical feedback loop for the data contract. It:
Multi-PR Roadmap
Checklist
- [...] git commit -s)
- [...] main.
- [...] requirements.txt is complete.
Related Issues
Related: #48 - Create JSON/YAML index of SAFE-MCP techniques