@tubajavedd tubajavedd commented Feb 8, 2026

Summary

Adds a minimal local Q&A CLI MVP that allows users to query markdown notes.

What’s included

  • CLI tool to load markdown notes from a local notes/ directory
  • Basic keyword-based search for relevant content
  • README documenting usage and limitations
  • .gitignore to exclude local test notes

Motivation

This MVP demonstrates the foundation for a future RAG-based system using embeddings and vector search.

Limitations / Future work

  • Current search is literal keyword matching only (no stemming, synonyms, or ranking)
  • Can be extended with stopword removal and embeddings

Summary by CodeRabbit

  • New Features

    • Added a local Q&A CLI tool that loads markdown notes from disk and enables searching with natural language queries, returning relevant content for each match
  • Documentation

    • Added detailed README with Smart Notes application overview, feature descriptions, current implementation limitations, and interactive CLI usage examples
  • Chores

    • Updated .gitignore configuration

coderabbitai bot commented Feb 8, 2026

Walkthrough

This pull request introduces a Smart Notes local Q&A (RAG MVP) application. It adds a .gitignore entry for the notes directory, a README describing the MVP's features and workflow, and a Python CLI tool that loads markdown notes, splits them into sentences, and returns the sentences that contain any query word remaining after common question words are filtered out.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Configuration | `.gitignore` | Adds notes/ directory to Git ignore list. |
| Documentation | `smart-notes/rag_mvp/README.md` | New README documenting the RAG MVP application, including features (local markdown loading, natural-language Q&A, sentence-level answers), limitations, workflow details, and CLI usage instructions with example interactions. |
| Implementation | `smart-notes/rag_mvp/qa_cli.py` | New Python CLI tool providing Q&A functionality over local markdown notes. Exports constants QUESTION_WORDS and NOTES_DIR, and functions load_notes(), split_sentences(), and search_notes(). Loads markdown files from the notes directory, splits content into sentences, filters query words to exclude question words, and returns matching sentences with source filenames. Includes an interactive loop with an exit condition. |
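
For orientation, the loading step summarized above might look roughly like the sketch below. The PR's actual load_notes() is not shown in this conversation, so this is an assumption rather than the author's implementation: the `notes_dir` parameter is hypothetical, and only the `filename` / `content` dictionary keys are taken from the search_notes() code quoted in the review.

```python
import os


def load_notes(notes_dir):
    """Read every .md file in notes_dir into {"filename", "content"} dicts.

    Hypothetical sketch: the real function in qa_cli.py may differ.
    """
    notes = []
    # A missing directory yields no notes instead of raising.
    if not os.path.isdir(notes_dir):
        return notes
    # Sort for a deterministic order; non-markdown files are skipped.
    for name in sorted(os.listdir(notes_dir)):
        if name.endswith(".md"):
            path = os.path.join(notes_dir, name)
            with open(path, encoding="utf-8") as f:
                notes.append({"filename": name, "content": f.read()})
    return notes
```

Returning an empty list for a missing directory lets the CLI print a friendly message instead of a traceback when no notes exist yet.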

Sequence Diagram

sequenceDiagram
    actor User
    participant CLI as qa_cli.py
    participant FS as File System
    participant Logic as Search Logic
    
    User->>CLI: Run application
    CLI->>FS: Load notes from notes/
    FS-->>CLI: Return markdown files & content
    CLI->>Logic: split_sentences() on each note
    Logic-->>CLI: Sentences per note
    User->>CLI: Enter query
    alt Query is "exit"
        CLI->>User: Terminate
    else Query is valid
        CLI->>Logic: search_notes(query, notes)
        Logic->>Logic: Filter question words
        Logic->>Logic: Find matching sentences
        Logic-->>CLI: Matching results
        CLI->>User: Display results
    end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A bundle of notes, now searchable and bright,
Questions are answered with sentence-light!
No fancy embeddings, just words plain and clear,
The RAG MVP's magic is finally here!

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Add local Q&A CLI MVP for markdown notes' directly and clearly describes the main change: introducing a new Q&A command-line interface tool for querying markdown notes locally. |

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@smart-notes/rag_mvp/qa_cli.py`:
- Line 10: The NOTES_DIR constant is currently a relative path (NOTES_DIR) which
breaks when the script is run from a different CWD; change NOTES_DIR to be
computed relative to the script file by using the script's directory (via
__file__ and os.path.abspath/os.path.dirname) and joining the repository's notes
directory (e.g., two levels up then "notes") with os.path.join so the path
resolves regardless of working directory, or alternatively add a clear comment
documenting the required working directory if you intentionally keep a relative
path.
- Around line 34-53: The search_notes function currently does substring matching
using "word in sentence_lower" which yields false positives (e.g., "ai" matching
"said"); update the matching to use whole-word checks instead: for each sentence
from split_sentences(note["content"]) normalize/tokenize it into words (or use a
regex with word boundaries) and test membership against query_words (and respect
QUESTION_WORDS filtering already applied). Modify the inner loop where
sentence_lower is used and replace the substring check with either a compiled
word-boundary regex or a set-based word membership test so results.append still
uses note["filename"] and sentence.strip().

In `@smart-notes/rag_mvp/README.md`:
- Around line 28-42: Close the opening ```bash fence immediately after the run
command (python smart-notes/rag_mvp/qa_cli.py) and move the interactive example
into its own fenced block (e.g., ```text) so prompts and outputs are separated
from the shell instruction; in that example block ensure every user prompt is
prefixed with ">>" and the outputs are plain text lines (add missing ">>"
prefixes to the lines currently at the end of the file and format outputs like
"[1] From test.md: ..." on separate lines) to match the suggested "Example
session" structure.
🧹 Nitpick comments (3)
smart-notes/rag_mvp/qa_cli.py (3)

4-8: Consider naming this STOP_WORDS for clarity.

The set contains not just question words but also articles, prepositions, and auxiliary verbs. STOP_WORDS is the standard NLP term for this kind of filter list and would better describe its purpose.


56-77: Unhandled EOFError / KeyboardInterrupt on input().

If stdin is piped or the user presses Ctrl+C / Ctrl+D, the CLI will crash with a traceback. Wrap the input() call with basic exception handling:

Proposed fix
     while True:
-        query = input(">> ").strip()
+        try:
+            query = input(">> ").strip()
+        except (EOFError, KeyboardInterrupt):
+            print("\nGoodbye 👋")
+            break
 
         if query.lower() == "exit":
             print("Goodbye 👋")
             break
 
         matches = search_notes(query, notes)
 
         if not matches:
             print("No relevant notes found.\n")
         else:
             print("\n--- Answers ---\n")
             for i, m in enumerate(matches, 1):
                 print(f"[{i}] From {m['filename']}:")
                 print(m["sentence"])
                 print()
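
The same handling can also be factored into a small helper, which makes the loop testable without a real terminal. This is only a sketch under stated assumptions: `read_query`, `run_loop`, and the `reader` / `writer` / `answer` parameters are hypothetical names that do not appear in the PR.

```python
def read_query(prompt=">> ", reader=input):
    """Return the stripped query, or None when input ends or is interrupted."""
    try:
        return reader(prompt).strip()
    except (EOFError, KeyboardInterrupt):
        # Ctrl+D, exhausted piped stdin, or Ctrl+C: signal a clean shutdown.
        return None


def run_loop(answer, reader=input, writer=print):
    """Repeatedly read a query and write answer(query) until exit or EOF."""
    while True:
        query = read_query(reader=reader)
        if query is None or query.lower() == "exit":
            writer("Goodbye")
            break
        writer(answer(query))
```

Injecting `reader` and `writer` keeps the default behavior (`input` and `print`) while letting a test feed canned queries and capture the output.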

30-31: Sentence splitting doesn't handle markdown structure.

re.split(r'(?<=[.!?])\s+', text) only splits on sentence-ending punctuation followed by whitespace. Markdown headings (# Heading), bullet points (- item), and blank-line-separated paragraphs will be lumped into single "sentences", producing noisy results. Consider also splitting on newlines:

def split_sentences(text):
    lines = text.splitlines()
    sentences = []
    for line in lines:
        line = line.strip()
        if line:
            sentences.extend(re.split(r'(?<=[.!?])\s+', line))
    return sentences
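
To make the difference concrete, the sketch below runs both approaches on a small made-up note. The sample text and the function names `split_whole_text` and `split_by_line` are illustrative assumptions; only the regular expression comes from the code under review.

```python
import re

SENTENCE_END = r'(?<=[.!?])\s+'


def split_whole_text(text):
    """Current approach: split only on sentence-ending punctuation."""
    return re.split(SENTENCE_END, text)


def split_by_line(text):
    """Suggested approach: split per non-empty line, then on punctuation."""
    sentences = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            sentences.extend(re.split(SENTENCE_END, line))
    return sentences


# Hypothetical note mixing a heading, bullets, and a paragraph.
note = (
    "# AI basics\n\n"
    "- AI simulates human intelligence\n"
    "- ML is a subset of AI\n\n"
    "It learns from data. It improves over time."
)

print(len(split_whole_text(note)))  # heading and bullets fused into one chunk
print(len(split_by_line(note)))     # heading, bullets, sentences kept apart
```

With the line-aware version, a query that matches one bullet returns just that bullet rather than the heading and every bullet around it.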

"of", "to", "in", "on", "for"
}

NOTES_DIR = "notes"

⚠️ Potential issue | 🟡 Minor

NOTES_DIR is relative to CWD, not to the script location.

If a user runs the script from any directory other than the repository root, the notes/ path won't resolve correctly. Consider deriving the path relative to the script file:

NOTES_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "..", "..", "notes")

Or at minimum, document the expected working directory clearly.
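
An equivalent pathlib-based variant is sketched below. It assumes the script stays at smart-notes/rag_mvp/qa_cli.py and that notes/ lives at the repository root; `notes_dir_for` is a hypothetical helper, introduced only so the path logic can be exercised without relying on `__file__`.

```python
from pathlib import Path


def notes_dir_for(script_path):
    """Return <repo root>/notes for a script at smart-notes/rag_mvp/<script>."""
    # parents[0] = rag_mvp, parents[1] = smart-notes, parents[2] = repo root
    return Path(script_path).resolve().parents[2] / "notes"


# In qa_cli.py this would be used as: NOTES_DIR = notes_dir_for(__file__)
print(notes_dir_for("/repo/smart-notes/rag_mvp/qa_cli.py"))
```

Because the path is derived from the script location, the result is the same regardless of the directory the CLI is launched from.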

Comment on lines +34 to +53
def search_notes(query, notes):
    results = []

    query_words = [
        word.lower()
        for word in query.split()
        if word.lower() not in QUESTION_WORDS
    ]

    for note in notes:
        sentences = split_sentences(note["content"])
        for sentence in sentences:
            sentence_lower = sentence.lower()
            if any(word in sentence_lower for word in query_words):
                results.append({
                    "filename": note["filename"],
                    "sentence": sentence.strip()
                })

    return results

⚠️ Potential issue | 🔴 Critical

Substring matching causes false positives — use word-boundary matching.

word in sentence_lower (line 47) performs a substring check, not a whole-word check. For example, the query "what is AI" filters to query_words = ["ai"], which then matches sentences containing "said", "explain", "brain", "aim", etc.

Use a regex word-boundary check or split the sentence into words and check set membership.

🐛 Proposed fix using word boundaries
 def search_notes(query, notes):
     results = []
 
     query_words = [
         word.lower()
         for word in query.split()
         if word.lower() not in QUESTION_WORDS
     ]
 
+    if not query_words:
+        return results
+
     for note in notes:
         sentences = split_sentences(note["content"])
         for sentence in sentences:
             sentence_lower = sentence.lower()
-            if any(word in sentence_lower for word in query_words):
+            if any(re.search(r'\b' + re.escape(word) + r'\b', sentence_lower) for word in query_words):
                 results.append({
                     "filename": note["filename"],
                     "sentence": sentence.strip()
                 })
 
     return results
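
The proposed fix takes the regex route. The set-membership alternative mentioned above could be sketched as follows. Note the assumptions: `STOP_WORDS` is a small stand-in for the PR's QUESTION_WORDS (whose full contents are not shown here), and `tokenize` and `sentence_matches` are hypothetical helper names.

```python
import re

# Stand-in for QUESTION_WORDS in qa_cli.py; the real set may differ.
STOP_WORDS = {"what", "is", "the", "a", "an", "how", "of", "to", "in", "on", "for"}

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase text and return its set of alphanumeric tokens."""
    return set(TOKEN_RE.findall(text.lower()))


def sentence_matches(query, sentence):
    """True if any non-stop-word query token occurs as a whole word."""
    query_words = tokenize(query) - STOP_WORDS
    # An empty keyword set (e.g. the query "what is") matches nothing.
    return bool(query_words & tokenize(sentence))


print(sentence_matches("what is AI", "She said the plan is to explain it."))
print(sentence_matches("what is AI", "Artificial Intelligence (AI) is the simulation of human intelligence in machines."))
```

Tokenizing both sides also strips punctuation such as the parentheses around "(AI)", and the set intersection avoids compiling a regex for every query word and sentence pair.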

Comment on lines +28 to +42
```bash
python smart-notes/rag_mvp/qa_cli.py



>> what is AI

[1] From test.md:
Artificial Intelligence (AI) is the simulation of human intelligence in machines.


>> what is machine learning
how is machine learning used
difference between AI and ML


⚠️ Potential issue | 🟡 Minor

The "How to run" code block is malformed and the example is confusing.

The ```bash block opened at line 28 is never closed—the remaining lines (example prompts, outputs, and follow-up queries) all run together inside it. Lines 39–41 also lack the >> prompt prefix, making it unclear whether they are user input or program output.

Consider closing the bash block after the run command and using a separate block for the example session:

📝 Suggested fix
 ## How to run
 
 ```bash
 python smart-notes/rag_mvp/qa_cli.py
+```
 
+### Example session
 
-
->> what is AI
-
-[1] From test.md:
-Artificial Intelligence (AI) is the simulation of human intelligence in machines.
-
-
->>  what is machine learning
-how is machine learning used
-difference between AI and ML
+```text
+>> what is AI
+[1] From test.md:
+Artificial Intelligence (AI) is the simulation of human intelligence in machines.
+
+>> what is machine learning
+[1] From test.md:
+Machine learning is a subset of AI.
+```
