Add local Q&A CLI MVP for markdown notes #36
base: main
📝 Walkthrough
This pull request introduces a Smart Notes local Q&A (RAG MVP) application. It adds configuration to ignore a notes directory, provides documentation describing the MVP features and workflow, and implements a Python CLI tool that loads markdown notes, splits them into sentences, and returns the sentences that match a user query after common question words have been filtered out.
Sequence Diagram
sequenceDiagram
actor User
participant CLI as qa_cli.py
participant FS as File System
participant Logic as Search Logic
User->>CLI: Run application
CLI->>FS: Load notes from notes/
FS-->>CLI: Return markdown files & content
CLI->>Logic: split_sentences() on each note
Logic-->>CLI: Sentences per note
User->>CLI: Enter query
alt Query is "exit"
CLI->>User: Terminate
else Query is valid
CLI->>Logic: search_notes(query, notes)
Logic->>Logic: Filter question words
Logic->>Logic: Find matching sentences
Logic-->>CLI: Matching results
CLI->>User: Display results
end
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@smart-notes/rag_mvp/qa_cli.py`:
- Line 10: The NOTES_DIR constant is currently a relative path (NOTES_DIR) which
breaks when the script is run from a different CWD; change NOTES_DIR to be
computed relative to the script file by using the script's directory (via
__file__ and os.path.abspath/os.path.dirname) and joining the repository's notes
directory (e.g., two levels up then "notes") with os.path.join so the path
resolves regardless of working directory, or alternatively add a clear comment
documenting the required working directory if you intentionally keep a relative
path.
- Around line 34-53: The search_notes function currently does substring matching
using "word in sentence_lower" which yields false positives (e.g., "ai" matching
"said"); update the matching to use whole-word checks instead: for each sentence
from split_sentences(note["content"]) normalize/tokenize it into words (or use a
regex with word boundaries) and test membership against query_words (and respect
QUESTION_WORDS filtering already applied). Modify the inner loop where
sentence_lower is used and replace the substring check with either a compiled
word-boundary regex or a set-based word membership test so results.append still
uses note["filename"] and sentence.strip().
In `@smart-notes/rag_mvp/README.md`:
- Around line 28-42: Close the opening ```bash fence immediately after the run
command (python smart-notes/rag_mvp/qa_cli.py) and move the interactive example
into its own fenced block (e.g., ```text) so prompts and outputs are separated
from the shell instruction; in that example block ensure every user prompt is
prefixed with ">>" and the outputs are plain text lines (add missing ">>"
prefixes to the lines currently at the end of the file and format outputs like
"[1] From test.md: ..." on separate lines) to match the suggested "Example
session" structure.
🧹 Nitpick comments (3)
smart-notes/rag_mvp/qa_cli.py (3)
4-8: Consider naming this STOP_WORDS for clarity. The set contains not just question words but also articles, prepositions, and auxiliary verbs. STOP_WORDS is the standard NLP term for this kind of filter list and would better describe its purpose.
56-77: Unhandled EOFError/KeyboardInterrupt on input(). If stdin is piped or the user presses Ctrl+C / Ctrl+D, the CLI will crash with a traceback. Wrap the loop with basic exception handling:
Proposed fix
    -    while True:
    -        query = input(">> ").strip()
    +    try:
    +        while True:
    +            try:
    +                query = input(">> ").strip()
    +            except EOFError:
    +                print("\nGoodbye 👋")
    +                break
                 if query.lower() == "exit":
                     print("Goodbye 👋")
                     break
                 matches = search_notes(query, notes)
                 if not matches:
                     print("No relevant notes found.\n")
                 else:
                     print("\n--- Answers ---\n")
                     for i, m in enumerate(matches, 1):
                         print(f"[{i}] From {m['filename']}:")
                         print(m["sentence"])
                         print()
    +    except KeyboardInterrupt:
    +        print("\nGoodbye 👋")
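The same handling can also be expressed without the extra nesting by moving the prompt read into a helper. The following is only a sketch of that alternative, not the code in the PR: `read_query`, `run_loop`, and the `reader` parameter are hypothetical names introduced here, and the farewell message is reduced to plain text.

```python
def read_query(prompt=">> ", reader=input):
    """Return the stripped query, or None if input ended or was interrupted.

    `reader` is a hypothetical injection point so the helper can be
    exercised without a terminal; it defaults to the built-in input().
    """
    try:
        return reader(prompt).strip()
    except (EOFError, KeyboardInterrupt):
        # A closed pipe or Ctrl+D raises EOFError; Ctrl+C raises KeyboardInterrupt.
        return None


def run_loop(handle_query, reader=input):
    """Prompt until "exit" or end of input; return the number of queries handled."""
    handled = 0
    while True:
        query = read_query(reader=reader)
        if query is None or query.lower() == "exit":
            print("Goodbye")
            break
        handle_query(query)
        handled += 1
    return handled
```

Treating end of input the same way as an explicit "exit" keeps a single exit path in the loop, which is the design choice this sketch illustrates.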
30-31: Sentence splitting doesn't handle markdown structure.
re.split(r'(?<=[.!?])\s+', text) only splits on sentence-ending punctuation followed by whitespace. Markdown headings (# Heading), bullet points (- item), and blank-line-separated paragraphs will be lumped into single "sentences", producing noisy results. Consider also splitting on newlines:
    def split_sentences(text):
        lines = text.splitlines()
        sentences = []
        for line in lines:
            line = line.strip()
            if line:
                sentences.extend(re.split(r'(?<=[.!?])\s+', line))
        return sentences
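To make the difference concrete, the sketch below runs the punctuation-only split quoted in this comment next to the line-aware variant on a small made-up note. The sample text and the name `split_sentences_by_line` are assumptions chosen for the demonstration.

```python
import re

# Split after sentence-ending punctuation that is followed by whitespace.
SENTENCE_END = re.compile(r'(?<=[.!?])\s+')


def split_sentences(text):
    # Punctuation-only split, matching the expression quoted in the review.
    return SENTENCE_END.split(text)


def split_sentences_by_line(text):
    # Line-aware variant: break on newlines first, then on sentence endings.
    sentences = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            sentences.extend(SENTENCE_END.split(line))
    return sentences


# Made-up note with a heading and two bullets.
note = "# AI basics\n\n- AI is broad\n- ML is a subset of AI. It learns from data."

for label, parts in (("punctuation only", split_sentences(note)),
                     ("line aware", split_sentences_by_line(note))):
    print(label, len(parts), parts)
```

On this input the punctuation-only version keeps the heading and both bullets together in its first chunk, while the line-aware version returns each heading, bullet, and sentence separately. Note that the markdown markers (`#`, `-`) remain in the output in both cases.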
        "of", "to", "in", "on", "for"
    }

    NOTES_DIR = "notes"
NOTES_DIR is relative to CWD, not to the script location.
If a user runs the script from any directory other than the repository root, the notes/ path won't resolve correctly. Consider deriving the path relative to the script file:
    NOTES_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "..", "..", "notes")
Or at minimum, document the expected working directory clearly.
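An equivalent way to anchor the notes path to the script location uses pathlib, sketched below. The helper name `resolve_notes_dir` is hypothetical, and the "two levels up" layout is taken from the review comment (script at smart-notes/rag_mvp/qa_cli.py, notes/ at the repository root); adjust it if the actual layout differs.

```python
from pathlib import Path


def resolve_notes_dir(script_path):
    """Return the notes/ directory two levels above the script's folder.

    Hypothetical helper: assumes the script lives at
    <repo>/smart-notes/rag_mvp/qa_cli.py and notes/ sits at <repo>/notes.
    """
    script_dir = Path(script_path).resolve().parent  # <repo>/smart-notes/rag_mvp
    repo_root = script_dir.parent.parent             # two levels up
    return repo_root / "notes"


# Inside qa_cli.py this would be used as:
#     NOTES_DIR = resolve_notes_dir(__file__)
```

Because the path is derived from the script file rather than the current working directory, it resolves to the same location no matter where the command is launched from.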
    def search_notes(query, notes):
        results = []

        query_words = [
            word.lower()
            for word in query.split()
            if word.lower() not in QUESTION_WORDS
        ]

        for note in notes:
            sentences = split_sentences(note["content"])
            for sentence in sentences:
                sentence_lower = sentence.lower()
                if any(word in sentence_lower for word in query_words):
                    results.append({
                        "filename": note["filename"],
                        "sentence": sentence.strip()
                    })

        return results
Substring matching causes false positives — use word-boundary matching.
word in sentence_lower (line 47) performs a substring check, not a whole-word check. For example, the query "what is AI" filters to query_words = ["ai"], which then matches sentences containing "said", "explain", "brain", "aim", etc.
Use a regex word-boundary check or split the sentence into words and check set membership.
🐛 Proposed fix using word boundaries
     def search_notes(query, notes):
         results = []
         query_words = [
             word.lower()
             for word in query.split()
             if word.lower() not in QUESTION_WORDS
         ]
    +    if not query_words:
    +        return results
    +
         for note in notes:
             sentences = split_sentences(note["content"])
             for sentence in sentences:
                 sentence_lower = sentence.lower()
    -            if any(word in sentence_lower for word in query_words):
    +            if any(re.search(r'\b' + re.escape(word) + r'\b', sentence_lower) for word in query_words):
                     results.append({
                         "filename": note["filename"],
                         "sentence": sentence.strip()
                     })
         return results
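The other option mentioned in this comment, splitting the sentence into words and checking set membership, might look like the sketch below. It is an illustration rather than the PR's code: `tokenize` and `sentence_matches` are hypothetical names, and the `QUESTION_WORDS` set shown is an abbreviated stand-in, since only part of the real set is visible in the review.

```python
import re

# Abbreviated stand-in for the PR's QUESTION_WORDS set (assumed contents).
QUESTION_WORDS = {"what", "is", "how", "the", "a", "of", "to", "in", "on", "for"}

WORD_RE = re.compile(r"\w+")


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(WORD_RE.findall(text.lower()))


def sentence_matches(query, sentence):
    """True if any non-filtered query word occurs as a whole word in the sentence."""
    query_words = tokenize(query) - QUESTION_WORDS
    if not query_words:
        # Nothing left to search for once the filter words are removed.
        return False
    return not query_words.isdisjoint(tokenize(sentence))


print(sentence_matches("what is AI", "She said the brain can explain it."))
print(sentence_matches("what is AI", "Artificial Intelligence (AI) is the simulation of human intelligence in machines."))
```

Unlike `query.split()`, tokenizing the query with a word pattern also drops attached punctuation, so a query typed as "what is AI?" still yields the token "ai". Whether that behavior is wanted is a design decision for the author.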
    ```bash
    python smart-notes/rag_mvp/qa_cli.py

    >> what is AI

    [1] From test.md:
    Artificial Intelligence (AI) is the simulation of human intelligence in machines.

    >> what is machine learning
    how is machine learning used
    difference between AI and ML
The "How to run" code block is malformed and the example is confusing.
The ```bash block opened at line 28 is never closed—the remaining lines (example prompts, outputs, and follow-up queries) all run together inside it. Lines 39–41 also lack the >> prompt prefix, making it unclear whether they are user input or program output.
Consider closing the bash block after the run command and using a separate block for the example session:
📝 Suggested fix
     ## How to run
     ```bash
     python smart-notes/rag_mvp/qa_cli.py
    +```
    +### Example session
    -
    ->> what is AI
    -
    -[1] From test.md:
    -Artificial Intelligence (AI) is the simulation of human intelligence in machines.
    -
    -
    ->> what is machine learning
    -how is machine learning used
    -difference between AI and ML
    +```text
    +>> what is AI
    +[1] From test.md:
    +Artificial Intelligence (AI) is the simulation of human intelligence in machines.
    +
    +>> what is machine learning
    +[1] From test.md:
    +Machine learning is a subset of AI.
    +```
Summary
Adds a minimal local Q&A CLI MVP that allows users to query markdown notes.
What’s included
notes/ directory
.gitignore to exclude local test notes
Motivation
This MVP demonstrates the foundation for a future RAG-based system using embeddings and vector search.
Limitations / Future work