Skip to content

Add SAFE-T1603 System-Prompt Disclosure#170

Open
VikranthKumar wants to merge 1 commit intosafe-agentic-framework:mainfrom
VikranthKumar:technique-safe-t1603
Open

Add SAFE-T1603 System-Prompt Disclosure#170
VikranthKumar wants to merge 1 commit intosafe-agentic-framework:mainfrom
VikranthKumar:technique-safe-t1603

Conversation

@VikranthKumar
Copy link
Contributor

Summary

This PR adds SAFE-T1603: System-Prompt Disclosure to the SAFE-MCP framework, documenting how attackers can coerce an MCP-enabled assistant to leak hidden system/developer instructions and/or tool registry/schema details (e.g., inputSchema). This disclosure enables more reliable follow-on tool abuse and guardrail bypass.

Type of Contribution

  • New Technique

Checklist

Related Issues

New Technique

Signed-off-by: Vikranth Kumar Shivaa <srisnevisa@gmail.com>
)


class SystemPromptDisclosureDetector:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tested these on a real MCP connected to an LLM?
Since the MCP passes its information and details to the LLM to decide which tool to call, I think you need to update to more robust detection techniques.

Relying on LLM calls and regex matching is not a reliable or perfect approach for this.

Kindly update test_detection_rule.py and test-logs.json.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rockerritesh can you can contribute on this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants