A pluggable agent SDK for talking to Llama Stack through different agentic frameworks. Use the default backend (Llama Stack Responses API) or the lang-graph backend (currently a stub for future implementation).
From source with uv (recommended):

```bash
cd portazgo
uv sync --extra dev
```

With the optional LangGraph extra (once that backend is implemented):

```bash
uv sync --extra dev --extra langgraph
```

With pip (from source):

```bash
pip install -e .
```

From PyPI (once published):

```bash
pip install portazgo
```

```python
from portazgo import Agent

# Default: Llama Stack Responses API (same as ragas_pipeline / ragas_dataset_generator)
agent = Agent(type="default")
ragas_dataset = agent.generate_ragas_dataset(
    base_dataset=base_dataset,
    client=llama_stack_client,
    model_id="my-model",
    vector_store_id=vs_id,
    mcp_tools=mcp_tools,
    instructions="Optional system prompt",
)
```

Same parameter shape as `generate_ragas_dataset`, but for a single input. The name follows LangChain/LangGraph (`agent.invoke(input)`):
```python
from portazgo import Agent

agent = Agent(type="default")
result = agent.invoke(
    "What is the capital of France?",
    client=llama_stack_client,
    model_id="my-model",
    vector_store_id=vs_id,
    mcp_tools=[],  # or a list of MCP tool configs
    instructions="You are a helpful assistant.",
)
# result["answer"]     -> str
# result["contexts"]   -> list[str] (retrieved chunks + non-file_search tool responses)
# result["tool_calls"] -> list[dict]
```

Pass `messages` so the model sees previous turns. Each message is `{"role": "user" | "assistant" | "system", "content": str}`:
```python
history = [
    {"role": "user", "content": "My name is Alice."},
    {"role": "assistant", "content": "Nice to meet you, Alice!"},
]
result = agent.invoke(
    "What's my name?",
    client=client,
    model_id=model_id,
    vector_store_id=vs_id,
    mcp_tools=[],
    messages=history,
)
# result["answer"] can refer to the conversation (e.g. "Your name is Alice.")
```

For real-time display (e.g. Streamlit), use `invoke_stream`. It yields events: `content_delta` (a chunk of text), then `done` (final answer + contexts + tool_calls). If the backend does not support token-level streaming, the full answer is sent as a single delta followed by `done`.
```python
for event in agent.invoke_stream(
    "Explain RAG in one sentence.",
    client=client,
    model_id=model_id,
    vector_store_id=vs_id,
    mcp_tools=[],
    messages=st.session_state.messages,  # optional history
):
    if event["type"] == "content_delta":
        print(event["delta"], end="", flush=True)
    elif event["type"] == "done":
        answer, contexts, tool_calls = event["answer"], event["contexts"], event["tool_calls"]
```

In a Streamlit app:

```python
import streamlit as st

from portazgo import Agent

# Init session state
if "messages" not in st.session_state:
    st.session_state.messages = []

agent = Agent(type="default")
# client, model_id, vector_store_id from your config (e.g. sidebar)

# Display history
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

if prompt := st.chat_input("Your message"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)
    with st.chat_message("assistant"):
        placeholder = st.empty()
        full = ""
        for event in agent.invoke_stream(
            prompt,
            client=client,
            model_id=model_id,
            vector_store_id=vector_store_id,
            mcp_tools=[],
            messages=st.session_state.messages[:-1],  # history (exclude current)
        ):
            if event["type"] == "content_delta":
                full += event["delta"]
                placeholder.markdown(full + "▌")
        placeholder.markdown(full)
    st.session_state.messages.append({"role": "assistant", "content": full})
```

```python
# LangGraph backend (not yet implemented; will raise NotImplementedError)
agent = Agent(type="lang-graph")
# agent.invoke(...)  # NotImplementedError
```

The library also exposes helpers used by the default backend, useful for custom pipelines:
```python
from portazgo import strip_think_blocks, serialize_for_json, extract_tool_calls
```

- `strip_think_blocks(text)` – remove `<think>...</think>` blocks from model output.
- `serialize_for_json(val)` – convert objects to a JSON-serializable form.
- `extract_tool_calls(response)` – extract tool calls from a Llama Stack response.
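To illustrate the intended semantics of `strip_think_blocks` — this is a hypothetical regex-based sketch, not the library's actual implementation, which may handle edge cases differently:

```python
import re

def strip_think_blocks_sketch(text: str) -> str:
    """Drop <think>...</think> spans (including their contents) from model output.

    Hypothetical approximation of portazgo's strip_think_blocks helper.
    """
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

print(strip_think_blocks_sketch("<think>2+2 is basic arithmetic...</think>The answer is 4."))
# -> The answer is 4.
```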
Option 1: Unit tests (no Llama Stack server)

Runs `invoke` against a mock client so you can confirm the API shape:

```bash
cd portazgo
uv run pytest tests/test_agent.py -v -k invoke
```

Option 2: Real `invoke` against Llama Stack

Use the example script (requires a running Llama Stack and a vector store):

```bash
cd portazgo
export LLAMA_STACK_HOST=localhost
export LLAMA_STACK_PORT=8080
# optional: AGENT_VECTOR_STORE_NAME=rag-store, AGENT_MODEL_ID="your/model"
uv run python examples/simple_invoke.py "What is 2+2?"
```

You can pass any question as arguments; the default is "What is 2+2?".
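For reference, the environment-variable wiring a driver like `examples/simple_invoke.py` performs might look roughly like this — a sketch under the assumption that the script builds a base URL from `LLAMA_STACK_HOST`/`LLAMA_STACK_PORT` and takes the question from the command line; the real script may differ:

```python
import os
import sys

# Hypothetical sketch of the config assembly in examples/simple_invoke.py.
host = os.environ.get("LLAMA_STACK_HOST", "localhost")
port = os.environ.get("LLAMA_STACK_PORT", "8080")
base_url = f"http://{host}:{port}"

# Any CLI arguments form the question; fall back to the documented default.
question = " ".join(sys.argv[1:]) or "What is 2+2?"

print(f"asking {base_url!r}: {question}")
```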
Option 3: OpenShift (oc)

If Llama Stack is exposed on OpenShift, use the helper script to resolve APPS_DOMAIN and run the example:

```bash
cd portazgo
./scripts/run_invoke_oc.sh "What is 2+2?"
```

The script sources `.env` (for `PROJECT`, etc.), runs `oc get ingresses.config.openshift.io cluster` for the apps domain, sets `LLAMA_STACK_HOST` to `llama-stack-demo-route-${PROJECT}.${APPS_DOMAIN}`, then runs the example with any arguments you pass.
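The host construction described above can be sketched as follows — the values here are placeholders (the real script reads `PROJECT` from `.env` and queries `oc` for the apps domain):

```bash
# Sketch of the host naming used by scripts/run_invoke_oc.sh.
# PROJECT normally comes from .env; APPS_DOMAIN from the oc query above.
PROJECT="${PROJECT:-demo}"                      # placeholder default
APPS_DOMAIN="${APPS_DOMAIN:-apps.example.com}"  # placeholder default
export LLAMA_STACK_HOST="llama-stack-demo-route-${PROJECT}.${APPS_DOMAIN}"
echo "$LLAMA_STACK_HOST"
```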
Uses uv for the venv and for running tools. From the portazgo directory:

- Create venv and install deps: `make install-dev` (or `uv sync --extra dev`)
- Lock dependencies: `make lock` (or `uv lock`)
- Lint: `make lint` (ruff via `uv run`)
- Format: `make format`
- Tests: `make test` (or `uv run pytest tests`)
- Coverage: `make coverage`
- Build: `make build` (or `uv run python -m build`)
Apache-2.0. See LICENSE.