Add local Q&A CLI MVP for markdown notes #36

tubajavedd · 2026-02-08T18:51:36Z

Summary

Adds a minimal local Q&A CLI MVP that allows users to query markdown notes.

What’s included

CLI tool to load markdown notes from a local notes/ directory
Basic keyword-based search for relevant content
README documenting usage and limitations
.gitignore to exclude local test notes

Motivation

This MVP demonstrates the foundation for a future RAG-based system using embeddings and vector search.

Limitations / Future work

Current search is literal keyword matching only (no stemming, synonyms, or semantic similarity)
Can be extended with stopword removal and embeddings

Summary by CodeRabbit

New Features
- Added a local Q&A CLI tool that loads markdown notes from disk and enables searching with natural language queries, returning relevant content for each match
Documentation
- Added detailed README with Smart Notes application overview, feature descriptions, current implementation limitations, and interactive CLI usage examples
Chores
- Updated .gitignore configuration

coderabbitai · 2026-02-08T18:51:53Z

📝 Walkthrough

Walkthrough

This pull request introduces a Smart Notes local Q&A (RAG MVP) application. It adds a `.gitignore` entry for the notes directory, a README describing the MVP's features and workflow, and a Python CLI tool that loads markdown notes, splits them into sentences, strips common question words from the user's query, and returns the sentences that contain any remaining query word.

Changes

Cohort / File(s)	Summary
Configuration `.gitignore`	Adds `notes/` directory to Git ignore list.
Documentation `smart-notes/rag_mvp/README.md`	New README documenting the RAG MVP application, including features (local markdown loading, natural-language Q&A, sentence-level answers), limitations, workflow details, and CLI usage instructions with example interactions.
Implementation `smart-notes/rag_mvp/qa_cli.py`	New Python CLI tool providing Q&A functionality over local markdown notes. Defines constants `QUESTION_WORDS` and `NOTES_DIR`, and functions `load_notes()`, `split_sentences()`, and `search_notes()`. Loads markdown files from the notes directory, splits content into sentences, filters question words out of the query, and returns matching sentences with their source filenames. Includes an interactive loop with an `exit` command.
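
The walkthrough names `load_notes()`, but its body does not appear in the diff excerpts quoted in this review. A minimal sketch of what such a loader might look like, assuming it returns a list of dicts with `filename` and `content` keys (the shape `search_notes()` reads), scans only the top level of the directory, and reads files as UTF-8. The `notes_dir` parameter is an assumption for illustration:

```python
import os


def load_notes(notes_dir="notes"):
    """Read every .md file in notes_dir into a list of dicts (hypothetical sketch)."""
    notes = []
    if not os.path.isdir(notes_dir):
        return notes  # missing directory: nothing to load
    for name in sorted(os.listdir(notes_dir)):
        if not name.endswith(".md"):
            continue  # skip non-markdown files
        path = os.path.join(notes_dir, name)
        with open(path, encoding="utf-8") as handle:
            notes.append({"filename": name, "content": handle.read()})
    return notes
```

Returning an empty list for a missing directory lets the caller print a friendly message instead of a traceback; the actual PR may handle this case differently.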

Sequence Diagram

sequenceDiagram
    actor User
    participant CLI as qa_cli.py
    participant FS as File System
    participant Logic as Search Logic
    
    User->>CLI: Run application
    CLI->>FS: Load notes from notes/
    FS-->>CLI: Return markdown files & content
    CLI->>Logic: split_sentences() on each note
    Logic-->>CLI: Sentences per note
    User->>CLI: Enter query
    alt Query is "exit"
        CLI->>User: Terminate
    else Query is valid
        CLI->>Logic: search_notes(query, notes)
        Logic->>Logic: Filter question words
        Logic->>Logic: Find matching sentences
        Logic-->>CLI: Matching results
        CLI->>User: Display results
    end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A bundle of notes, now searchable and bright,
Questions are answered with sentence-light!
No fancy embeddings, just words plain and clear,
The RAG MVP's magic is finally here! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Add local Q&A CLI MVP for markdown notes' directly and clearly describes the main change: introducing a new Q&A command-line interface tool for querying markdown notes locally.


coderabbitai

Actionable comments posted: 3

🤖 Fix all issues with AI agents

In `@smart-notes/rag_mvp/qa_cli.py`:
- Line 10: The NOTES_DIR constant is currently a relative path (NOTES_DIR) which
breaks when the script is run from a different CWD; change NOTES_DIR to be
computed relative to the script file by using the script's directory (via
__file__ and os.path.abspath/os.path.dirname) and joining the repository's notes
directory (e.g., two levels up then "notes") with os.path.join so the path
resolves regardless of working directory, or alternatively add a clear comment
documenting the required working directory if you intentionally keep a relative
path.
- Around line 34-53: The search_notes function currently does substring matching
using "word in sentence_lower" which yields false positives (e.g., "ai" matching
"said"); update the matching to use whole-word checks instead: for each sentence
from split_sentences(note["content"]) normalize/tokenize it into words (or use a
regex with word boundaries) and test membership against query_words (and respect
QUESTION_WORDS filtering already applied). Modify the inner loop where
sentence_lower is used and replace the substring check with either a compiled
word-boundary regex or a set-based word membership test so results.append still
uses note["filename"] and sentence.strip().

In `@smart-notes/rag_mvp/README.md`:
- Around line 28-42: Close the opening ```bash fence immediately after the run
command (python smart-notes/rag_mvp/qa_cli.py) and move the interactive example
into its own fenced block (e.g., ```text) so prompts and outputs are separated
from the shell instruction; in that example block ensure every user prompt is
prefixed with ">>" and the outputs are plain text lines (add missing ">>"
prefixes to the lines currently at the end of the file and format outputs like
"[1] From test.md: ..." on separate lines) to match the suggested "Example
session" structure.

🧹 Nitpick comments (3)

smart-notes/rag_mvp/qa_cli.py (3)
4-8: Consider naming this STOP_WORDS for clarity.

The set contains not just question words but also articles, prepositions, and auxiliary verbs. STOP_WORDS is the standard NLP term for this kind of filter list and would better describe its purpose.
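
A minimal sketch of the rename, assuming the set keeps the same role in query filtering. The word list below is illustrative only (the diff excerpt shows just `"of", "to", "in", "on", "for"`), and `content_words` is a hypothetical helper name:

```python
# Illustrative subset; the PR's full list is not shown in this review.
STOP_WORDS = {
    "what", "is", "how", "the", "a", "an",
    "of", "to", "in", "on", "for",
}


def content_words(query):
    """Lowercase the query and drop stop words, keeping word order."""
    return [word for word in query.lower().split() if word not in STOP_WORDS]
```

With this list, `content_words("What is AI")` keeps only `"ai"`, which is the behavior the review describes for the existing `QUESTION_WORDS` filter.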

56-77: Unhandled EOFError / KeyboardInterrupt on input().

If stdin is piped or the user presses Ctrl+C / Ctrl+D, the CLI will crash with a traceback. Wrap the loop with basic exception handling:
Proposed fix
-    while True:
-        query = input(">> ").strip()
+    try:
+      while True:
+        try:
+            query = input(">> ").strip()
+        except EOFError:
+            print("\nGoodbye 👋")
+            break
 
         if query.lower() == "exit":
             print("Goodbye 👋")
             break
 
         matches = search_notes(query, notes)
 
         if not matches:
             print("No relevant notes found.\n")
         else:
             print("\n--- Answers ---\n")
             for i, m in enumerate(matches, 1):
                 print(f"[{i}] From {m['filename']}:")
                 print(m["sentence"])
                 print()
+    except KeyboardInterrupt:
+        print("\nGoodbye 👋")
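
For readers who prefer a version with uniform indentation, here is a sketch of the same idea as a standalone function. It is not the PR's code: `run_cli`, `answer`, `read_line`, and `write` are hypothetical names, injected so the loop can be exercised without a terminal. Unlike the diff above, it catches `KeyboardInterrupt` only while waiting for input, not during the search itself:

```python
def run_cli(answer, read_line=input, write=print):
    """Prompt until the user types 'exit' or input ends (EOF or Ctrl+C)."""
    while True:
        try:
            query = read_line(">> ").strip()
        except (EOFError, KeyboardInterrupt):
            # Piped stdin ran out, or the user pressed Ctrl+D / Ctrl+C.
            write("\nGoodbye")
            break
        if query.lower() == "exit":
            write("Goodbye")
            break
        write(answer(query))
```

Passing `read_line` and `write` as parameters is a design choice for testability; calling `run_cli(some_answer_function)` falls back to the built-in `input` and `print`.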
30-31: Sentence splitting doesn't handle markdown structure.

re.split(r'(?<=[.!?])\s+', text) only splits on sentence-ending punctuation followed by whitespace. Markdown headings (# Heading), bullet points (- item), and blank-line-separated paragraphs will be lumped into single "sentences", producing noisy results. Consider also splitting on newlines:
def split_sentences(text):
    lines = text.splitlines()
    sentences = []
    for line in lines:
        line = line.strip()
        if line:
            sentences.extend(re.split(r'(?<=[.!?])\s+', line))
    return sentences
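
To make the difference concrete, the sketch below runs both strategies on a small markdown sample. The function names `split_whole_text` and `split_by_line` and the sample text are made up for illustration; the regex is the one quoted above:

```python
import re

SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


def split_whole_text(text):
    """Current behaviour: split only after ., ! or ? followed by whitespace."""
    return SENTENCE_BREAK.split(text)


def split_by_line(text):
    """Suggested behaviour: split on newlines first, then on sentence breaks."""
    sentences = []
    for line in text.splitlines():
        line = line.strip()
        if line:  # skip blank lines between paragraphs
            sentences.extend(SENTENCE_BREAK.split(line))
    return sentences


sample = "# AI\n- Narrow AI\n- General AI\n\nAI is broad. It evolves."
whole = split_whole_text(sample)  # heading and bullets stay glued to "AI is broad."
lines = split_by_line(sample)     # heading, each bullet, and each sentence separately
```

On this sample the first approach yields two chunks and the second yields five, so a query that matches one bullet no longer drags the heading and neighbouring bullets into the answer.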

coderabbitai · 2026-02-08T18:54:02Z

smart-notes/rag_mvp/qa_cli.py

+    "of", "to", "in", "on", "for"
+}
+
+NOTES_DIR = "notes"


⚠️ Potential issue | 🟡 Minor

NOTES_DIR is relative to CWD, not to the script location.

If a user runs the script from any directory other than the repository root, the notes/ path won't resolve correctly. Consider deriving the path relative to the script file:

NOTES_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "..", "..", "notes")

Or at minimum, document the expected working directory clearly.
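
An equivalent formulation with `pathlib`, offered as a sketch rather than the required fix. It assumes the layout implied above (the script lives at `smart-notes/rag_mvp/qa_cli.py` and `notes/` sits at the repository root); `notes_dir_for` is a hypothetical helper name:

```python
from pathlib import Path


def notes_dir_for(script_path):
    """Return <repo root>/notes for a script at <repo root>/smart-notes/rag_mvp/."""
    script = Path(script_path).resolve()
    # parents[0] is rag_mvp/, parents[1] is smart-notes/, parents[2] is the repo root.
    return script.parents[2] / "notes"


# In qa_cli.py this would be used as: NOTES_DIR = notes_dir_for(__file__)
```

Either form resolves the same directory regardless of the current working directory; the `pathlib` version avoids the literal `".."` segments in the resulting path.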


coderabbitai · 2026-02-08T18:54:02Z

smart-notes/rag_mvp/qa_cli.py

+def search_notes(query, notes):
+    results = []
+
+    query_words = [
+        word.lower()
+        for word in query.split()
+        if word.lower() not in QUESTION_WORDS
+    ]
+
+    for note in notes:
+        sentences = split_sentences(note["content"])
+        for sentence in sentences:
+            sentence_lower = sentence.lower()
+            if any(word in sentence_lower for word in query_words):
+                results.append({
+                    "filename": note["filename"],
+                    "sentence": sentence.strip()
+                })
+
+    return results


⚠️ Potential issue | 🔴 Critical

Substring matching causes false positives — use word-boundary matching.

word in sentence_lower (line 47) performs a substring check, not a whole-word check. For example, the query "what is AI" filters to query_words = ["ai"], which then matches sentences containing "said", "explain", "brain", "aim", etc.

Use a regex word-boundary check or split the sentence into words and check set membership.

🐛 Proposed fix using word boundaries

 def search_notes(query, notes):
     results = []
 
     query_words = [
         word.lower()
         for word in query.split()
         if word.lower() not in QUESTION_WORDS
     ]
 
+    if not query_words:
+        return results
+
     for note in notes:
         sentences = split_sentences(note["content"])
         for sentence in sentences:
             sentence_lower = sentence.lower()
-            if any(word in sentence_lower for word in query_words):
+            if any(re.search(r'\b' + re.escape(word) + r'\b', sentence_lower) for word in query_words):
                 results.append({
                     "filename": note["filename"],
                     "sentence": sentence.strip()
                 })
 
     return results
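
The other option mentioned above, splitting the sentence into words and checking set membership, could look like the sketch below. The function names are hypothetical, and tokenizing with `\w+` is an assumption: it means a query word carrying punctuation (for example `"ai?"`) would not match unless the query is tokenized the same way.

```python
import re


def substring_match(sentence, query_words):
    """Current check: true if any query word occurs anywhere in the sentence."""
    lowered = sentence.lower()
    return any(word in lowered for word in query_words)


def whole_word_match(sentence, query_words):
    """Set-based check: true only if a query word equals a whole token."""
    tokens = set(re.findall(r"\w+", sentence.lower()))
    return not tokens.isdisjoint(query_words)
```

For the query words `["ai"]`, `substring_match("She said hello.", ["ai"])` is true because "said" contains "ai", while `whole_word_match` rejects that sentence and still accepts "Artificial Intelligence (AI) is the simulation of human intelligence in machines."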


coderabbitai · 2026-02-08T18:54:02Z

smart-notes/rag_mvp/README.md

+```bash
+python smart-notes/rag_mvp/qa_cli.py
+
+
+
+>> what is AI
+
+[1] From test.md:
+Artificial Intelligence (AI) is the simulation of human intelligence in machines.
+
+
+>>  what is machine learning
+how is machine learning used
+difference between AI and ML
+


⚠️ Potential issue | 🟡 Minor

The "How to run" code block is malformed and the example is confusing.

The ```bash block opened at line 28 is never closed—the remaining lines (example prompts, outputs, and follow-up queries) all run together inside it. Lines 39–41 also lack the >> prompt prefix, making it unclear whether they are user input or program output.

Consider closing the bash block after the run command and using a separate block for the example session:

📝 Suggested fix

 ## How to run
 ```bash
 python smart-notes/rag_mvp/qa_cli.py
+```
+### Example session
-
->> what is AI
-
-[1] From test.md:
-Artificial Intelligence (AI) is the simulation of human intelligence in machines.
-
-
->> what is machine learning
-how is machine learning used
-difference between AI and ML
+```text
+>> what is AI
+[1] From test.md:
+Artificial Intelligence (AI) is the simulation of human intelligence in machines.
+
+>> what is machine learning
+[1] From test.md:
+Machine learning is a subset of AI.
+```


Add local Q&A CLI MVP for markdown notes

564741c

coderabbitai bot reviewed Feb 8, 2026
