
Conversation

@davidwtf (Contributor) commented Jan 15, 2026

Summary by CodeRabbit

  • Documentation
    • Added a comprehensive guide for creating AI agents with LlamaStack, covering setup, prerequisites, quickstart, and resources.
    • Added an example stack configuration demonstrating providers, storage backends, and model registration for LlamaStack deployments.
    • Added an interactive Quickstart notebook with end-to-end examples: tool definition, agent creation, session handling, streaming responses, and FastAPI deployment.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai bot (Contributor) commented Jan 15, 2026

Walkthrough

Adds new Llama Stack documentation, a deployment configuration, and a Jupyter quickstart notebook that demonstrate creating an AI agent, defining tools, connecting clients, running an agent, and exposing a FastAPI chat endpoint for queries.

Changes

Documentation: docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md
New guide with Overview, Prereqs, Quickstart, and Additional Resources describing server setup, notebook usage, tool creation, client connection, model/agent configuration, session management, streaming, and result handling.

Stack configuration: docs/public/llama-stack/llama_stack_config.yaml
New YAML stack config (version: 2) declaring APIs (inference, agents, safety, tool_runtime, vector_io, files), providers (remote OpenAI/DeepSeek, inline backends), persistence backends, and a model entry (deepseek/deepseek-chat). Review env var fallbacks and persistence paths.

Notebook / Examples: docs/public/llama-stack/llama_stack_quickstart.ipynb
New comprehensive notebook showing server startup, dependency install, @client_tool weather tool (get_weather), Agent creation, session streaming via AgentEventLogger, and a FastAPI ChatRequest + chat endpoint exposing agent responses. Check executable cells, example creds/URLs, and background FastAPI usage.
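
For orientation, the parts of the config that the summary names explicitly would look roughly like this; provider configs and persistence paths are omitted here rather than guessed:

version: 2
apis:
  - inference
  - agents
  - safety
  - tool_runtime
  - vector_io
  - files
models:
  - model_id: deepseek/deepseek-chat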

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant FastAPI
  participant Agent
  participant Tool
  participant Model

  User->>FastAPI: POST /chat {message}
  FastAPI->>Agent: enqueue message / start session
  Agent->>Model: generate response (streaming)
  Agent->>Tool: call get_weather(...) (if needed)
  Tool-->>Agent: tool result
  Model-->>Agent: streaming tokens
  Agent-->>FastAPI: stream aggregated response
  FastAPI-->>User: stream final/partial responses

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • zhaomingkun1030
  • sinbadonline

Poem

🐰 I hopped through docs and YAML lines,
Tools and agents, neat designs,
A notebook sparkles, FastAPI sings,
DeepSeek winds up curious things,
I nibble bugs and bless these springs 🥕✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled.
Title check | ✅ Passed | The title accurately summarizes the main changes: adding a Llama Stack quickstart guide and notebook demo. It is concise, specific, and clearly reflects the primary additions in the PR (documentation file, configuration file, and Jupyter notebook).
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


🧹 Recent nitpick comments
docs/public/llama-stack/llama_stack_quickstart.ipynb (1)

326-344: Synchronous blocking calls inside async endpoint.

Both agent.create_turn() and the synchronous iteration over logger.log(response_stream) will block the async event loop. For this demo that's acceptable, since the notebook context is single-threaded, but consider adding a note for production users.

💡 Suggestion for production documentation

Add a comment noting this limitation:

 @api_app.post("/chat")
 async def chat(request: ChatRequest):
-    """Chat endpoint that uses the Llama Stack Agent"""
+    """Chat endpoint that uses the Llama Stack Agent.
+
+    Note: This demo uses synchronous agent calls. For production with high
+    concurrency, consider using run_in_executor or async-compatible APIs.
+    """
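
For readers who want the production-leaning variant the note hints at, a minimal sketch using run_in_executor follows. It assumes the same api_app, ChatRequest, agent, and AgentEventLogger objects from the notebook, and that joining the stringified log events is an acceptable way to aggregate the reply; the notebook's actual aggregation may differ.

import asyncio

@api_app.post("/chat")
async def chat(request: ChatRequest):
    """Chat endpoint that moves the blocking agent call off the event loop."""
    loop = asyncio.get_running_loop()

    def run_agent() -> str:
        # All blocking work (turn creation and stream draining) happens in a
        # worker thread, so the event loop stays free for other requests.
        session_id = agent.create_session('fastapi-weather-session')
        stream = agent.create_turn(
            messages=[{'role': 'user', 'content': request.message}],
            session_id=session_id,
        )
        return ''.join(str(event) for event in AgentEventLogger().log(stream))

    reply = await loop.run_in_executor(None, run_agent)
    return {'response': reply}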

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f51adf9 and 7f42896.

📒 Files selected for processing (1)
  • docs/public/llama-stack/llama_stack_quickstart.ipynb
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-30T08:43:22.275Z
Learnt from: davidwtf
Repo: alauda/knowledge PR: 95
File: docs/public/langchain/langchain_quickstart.ipynb:163-171
Timestamp: 2025-12-30T08:43:22.275Z
Learning: When reviewing LangChain notebooks, validate the agent creation pattern: ensure code uses create_agent(model=..., tools=[...], system_prompt="...") with model being a chat model instance (e.g., ChatOpenAI) and system_prompt provided as a string describing system instructions. This reflects a standard agent construction and can be applied across LangChain notebook files.

Applied to files:

  • docs/public/llama-stack/llama_stack_quickstart.ipynb
🪛 Ruff (0.14.11)
docs/public/llama-stack/llama_stack_quickstart.ipynb

42-42: Do not catch blind exception: Exception

(BLE001)


43-43: Use explicit conversion flag

Replace with conversion flag

(RUF010)


59-59: Abstract raise to an inner function

(TRY301)


59-59: Create your own exception

(TRY002)


59-59: Avoid specifying long messages outside the exception class

(TRY003)


73-73: Abstract raise to an inner function

(TRY301)


73-73: Create your own exception

(TRY002)


73-73: Avoid specifying long messages outside the exception class

(TRY003)


81-81: Use raise without specifying exception name

Remove exception name

(TRY201)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (5)
docs/public/llama-stack/llama_stack_quickstart.ipynb (5)

85-106: LGTM!

The sys.path manipulation for custom package location and imports are appropriate for a notebook demonstration environment.


164-210: LGTM! Exception re-raise properly guards agent creation.

The raise e at line 197 ensures the notebook execution stops if model listing fails, preventing the NameError concern from the past review. The agent creation pattern with model, instructions, and tools is correct.


226-294: LGTM!

The agent execution pattern with session creation, turn creation, and response streaming via AgentEventLogger is well-structured for demonstration purposes.


365-385: LGTM!

The daemon thread approach is appropriate for notebook demonstrations, and the comments clearly explain that process managers should be used for production deployments.


463-480: LGTM! Kernel name is now appropriate.

The kernel name has been updated from langchain-demo to llama-stack-demo, correctly reflecting the notebook content.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@docs/public/llama-stack/llama_stack_config.yaml`:
- Around line 1-60: The metadata_store block omits an explicit db_path; add a
db_path entry to metadata_store mirroring the pattern used for vector_io and
files so it reads metadata_store: type: sqlite and db_path:
${env.SQLITE_STORE_DIR:~/.llama/distributions/llama-stack-demo}/registry.db
(update the metadata_store section in the YAML to include this db_path key).
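
Spelled out, the fix that prompt describes is just three lines of YAML (paths copied verbatim from the prompt above; adjust SQLITE_STORE_DIR for your environment):

metadata_store:
  type: sqlite
  db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/llama-stack-demo}/registry.db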

In `@docs/public/llama-stack/llama_stack_quickstart.ipynb`:
- Around line 462-467: Update the notebook metadata kernelspec so the kernel
name and display_name reflect the Llama Stack quickstart (e.g., change
kernelspec.name from "langchain-demo" and kernelspec.display_name from "Python
(langchain-demo)" to a clearer identifier like "llama-stack" and "Python (Llama
Stack)" respectively) by editing the kernelspec block in the notebook metadata.
- Around line 122-148: The docstring for get_weather promises wind speed but the
returned dict only contains city, temperature, and humidity; update the function
to include wind speed by extracting it from the parsed API response (e.g.,
current['windspeedKmph'] or current['windspeedMiles'] depending on desired
units) and add a 'wind_speed' key to the returned dictionary, or alternatively
remove the "wind speed" mention from the docstring to make it match the existing
return value; a sketch of the first option follows this list.
- Around line 194-208: Agent creation uses model_id which may be undefined if
the model listing try block failed; move the Agent(...) creation (the Agent
instantiation that references model_id, client, get_weather and instructions)
inside the try block that sets model_id or add an early exit/conditional guard
after the except (e.g., return or raise) so Agent(...) is only called when
model_id is successfully set; ensure you reference the same Agent(...) call and
the model_id assignment to relocate or gate the creation.
🧹 Nitpick comments (2)
docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md (1)

41-44: Consider varying the link descriptions.

All four resource links begin with "Llama Stack", which creates repetition. You could vary the wording:

💡 Suggested rewording
-- [Llama Stack Documentation](https://llamastack.github.io/docs) - The official Llama Stack documentation covering all usage-related topics, API providers, and core concepts.
-- [Llama Stack Core Concepts](https://llamastack.github.io/docs/concepts) - Deep dive into Llama Stack architecture, API stability, and resource management.
-- [Llama Stack GitHub Repository](https://github.com/llamastack/llama-stack) - Source code, example applications, distribution configurations, and how to add new API providers.
-- [Llama Stack Example Apps](https://github.com/llamastack/llama-stack-apps/) - Official examples demonstrating how to use Llama Stack in various scenarios.
+- [Official Documentation](https://llamastack.github.io/docs) - Covers all usage-related topics, API providers, and core concepts.
+- [Core Concepts Guide](https://llamastack.github.io/docs/concepts) - Deep dive into architecture, API stability, and resource management.
+- [GitHub Repository](https://github.com/llamastack/llama-stack) - Source code, example applications, and distribution configurations.
+- [Example Applications](https://github.com/llamastack/llama-stack-apps/) - Official examples demonstrating various use cases.
docs/public/llama-stack/llama_stack_quickstart.ipynb (1)

325-343: Consider session management for the chat endpoint.

The /chat endpoint creates a new session for every request (line 328). For a demo this works, but in production:

  1. Sessions accumulate without cleanup
  2. Conversation context is lost between requests

For a production-ready version, consider reusing sessions or implementing session cleanup:

# Option 1: Single shared session (simple approach)
_session_id = None

@api_app.post("/chat")
async def chat(request: ChatRequest):
    global _session_id
    if _session_id is None:
        _session_id = agent.create_session('fastapi-weather-session')
    # ... rest of the code using _session_id
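
And a companion sketch for per-caller reuse, so conversation context persists across a given caller's requests; client_id is hypothetical and not part of the notebook's ChatRequest:

# Option 2 (hypothetical): one session per caller
sessions: dict[str, str] = {}

@api_app.post("/chat")
async def chat(request: ChatRequest):
    session_id = sessions.get(request.client_id)
    if session_id is None:
        session_id = agent.create_session(f'weather-{request.client_id}')
        sessions[request.client_id] = session_id
    # ... rest of the code using session_id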
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 98fc418 and f51adf9.

📒 Files selected for processing (3)
  • docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md
  • docs/public/llama-stack/llama_stack_config.yaml
  • docs/public/llama-stack/llama_stack_quickstart.ipynb
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-01-13T11:25:34.596Z
Learnt from: jing2uo
Repo: alauda/knowledge PR: 104
File: docs/en/solutions/How_to_Migrate_VirtualMachine_From_VMware.md:131-172
Timestamp: 2026-01-13T11:25:34.596Z
Learning: In VMware migration documentation (docs/en/solutions), when describing the Forklift Operator workflow for VMware, specify that the VMware provider secret should set insecureSkipVerify=true to accommodate self-signed certificates commonly used in enterprise vCenter/ESXi environments. Include a note on the security trade-offs and ensure readers understand this is for internal, controlled environments.

Applied to files:

  • docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md
📚 Learning: 2025-12-30T08:43:22.275Z
Learnt from: davidwtf
Repo: alauda/knowledge PR: 95
File: docs/public/langchain/langchain_quickstart.ipynb:163-171
Timestamp: 2025-12-30T08:43:22.275Z
Learning: When reviewing LangChain notebooks, validate the agent creation pattern: ensure code uses create_agent(model=..., tools=[...], system_prompt="...") with model being a chat model instance (e.g., ChatOpenAI) and system_prompt provided as a string describing system instructions. This reflects a standard agent construction and can be applied across LangChain notebook files.

Applied to files:

  • docs/public/llama-stack/llama_stack_quickstart.ipynb
🪛 LanguageTool
docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md

[style] ~43-~43: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... stability, and resource management. - [Llama Stack GitHub Repository](https://github...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~44-~44: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...s, and how to add new API providers. - [Llama Stack Example Apps](https://github.com/...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🪛 Ruff (0.14.11)
docs/public/llama-stack/llama_stack_quickstart.ipynb

42-42: Do not catch blind exception: Exception

(BLE001)


43-43: Use explicit conversion flag

Replace with conversion flag

(RUF010)


59-59: Abstract raise to an inner function

(TRY301)


59-59: Create your own exception

(TRY002)


59-59: Avoid specifying long messages outside the exception class

(TRY003)


73-73: Abstract raise to an inner function

(TRY301)


73-73: Create your own exception

(TRY002)


73-73: Avoid specifying long messages outside the exception class

(TRY003)


78-78: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (3)
docs/en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md (1)

1-44: Documentation is clear and comprehensive.

The guide covers all essential aspects: overview, prerequisites, quickstart instructions, and additional resources. The structure is logical and user-friendly.

docs/public/llama-stack/llama_stack_quickstart.ipynb (2)

85-106: LGTM! Import setup is appropriate for notebook environments.

The custom path manipulation to include ~/packages aligns with the --target ~/packages flag used in the pip install cell, which is a valid pattern for restricted notebook environments.


358-384: LGTM! Server startup approach is appropriate for notebook demo.

The daemon thread approach with clear comments about production alternatives is well-documented. Using daemon=True ensures cleanup when the kernel restarts.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines 194 to 208
"except Exception as e:\n",
" print(f'Failed to get model list: {e}')\n",
" print('Make sure the server is running')\n",
"\n",
"\n",
"# Create Agent\n",
"print('Creating Agent...')\n",
"agent = Agent(\n",
" client,\n",
" model=model_id,\n",
" instructions='You are a helpful weather assistant. When users ask about weather, use the get_weather tool to query weather information, then answer based on the query results.',\n",
" tools=[get_weather],\n",
")\n",
"\n",
"print('Agent created successfully')"

⚠️ Potential issue | 🟡 Minor

Agent creation may fail if model listing failed.

The agent creation at lines 199-208 uses model_id which is only defined inside the try block (line 191). If the model listing fails, model_id will be undefined and agent creation will raise a NameError.

🔧 Suggested fix: Move agent creation inside the try block or add early exit
 except Exception as e:
     print(f'Failed to get model list: {e}')
     print('Make sure the server is running')
+    raise  # Re-raise to prevent subsequent cells from failing


 # Create Agent

Or wrap agent creation in a conditional:

+if 'model_id' in dir():
     # Create Agent
     print('Creating Agent...')
     agent = Agent(
         client,
         model=model_id,
         instructions='You are a helpful weather assistant...',
         tools=[get_weather],
     )
     print('Agent created successfully')
+else:
+    print('Skipping agent creation - no model available')
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"except Exception as e:\n",
" print(f'Failed to get model list: {e}')\n",
" print('Make sure the server is running')\n",
"\n",
"\n",
"# Create Agent\n",
"print('Creating Agent...')\n",
"agent = Agent(\n",
" client,\n",
" model=model_id,\n",
" instructions='You are a helpful weather assistant. When users ask about weather, use the get_weather tool to query weather information, then answer based on the query results.',\n",
" tools=[get_weather],\n",
")\n",
"\n",
"print('Agent created successfully')"
except Exception as e:
print(f'Failed to get model list: {e}')
print('Make sure the server is running')
raise # Re-raise to prevent subsequent cells from failing
# Create Agent
print('Creating Agent...')
agent = Agent(
client,
model=model_id,
instructions='You are a helpful weather assistant. When users ask about weather, use the get_weather tool to query weather information, then answer based on the query results.',
tools=[get_weather],
)
print('Agent created successfully')
🤖 Prompt for AI Agents
In `@docs/public/llama-stack/llama_stack_quickstart.ipynb` around lines 194 - 208,
Agent creation uses model_id which may be undefined if the model listing try
block failed; move the Agent(...) creation (the Agent instantiation that
references model_id, client, get_weather and instructions) inside the try block
that sets model_id or add an early exit/conditional guard after the except
(e.g., return or raise) so Agent(...) is only called when model_id is
successfully set; ensure you reference the same Agent(...) call and the model_id
assignment to relocate or gate the creation.
