Add markdown-to-blocks parser for proper Logseq block hierarchy by RobertoGongora · Pull Request #12 · ergut/mcp-logseq

RobertoGongora · 2026-01-07T10:18:49Z

Add markdown-to-blocks parser for proper Logseq block hierarchy

Fixes #7

This PR introduces intelligent markdown parsing that converts markdown content into Logseq's native block tree structure, fixing the issue where create_page and update_page would dump all content as a single block.

Key improvements:

✅ Markdown → proper Logseq block hierarchy (headings, lists, nesting)
✅ YAML frontmatter → page properties (no more MethodNotExist error)
✅ Code blocks preserved as single blocks (not split line-by-line)
✅ TODO/DONE checkboxes without duplicate markers
✅ Multi-line blockquotes join correctly
✅ All nested blocks returned by get_page_content

📖 Quick Examples

Example 1: Simple Page Creation - Proper Hierarchy

Before (v1.0.1) ❌

# All content dumped as a single flat block
create_page("Project Notes", """
# Project Notes
- Task 1
  - Subtask A
- Task 2
""")

# Result in Logseq:
# Single block containing:
# "# Project Notes\n- Task 1\n  - Subtask A\n- Task 2"

After (v1.1.0) ✅

# Properly parsed into hierarchical blocks
create_page("Project Notes", """
# Project Notes
- Task 1
  - Subtask A
- Task 2
""")

# Result in Logseq:
# ├─ # Project Notes
# │  ├─ Task 1
# │  │  └─ Subtask A
# │  └─ Task 2

Try it yourself:

from mcp_logseq.parser import parse_content

content = """# Project Notes
- Task 1
  - Subtask A
- Task 2"""

parsed = parse_content(content)
print(f"Blocks created: {len(parsed.blocks)}")
print(f"First block: {parsed.blocks[0].content}")
print(f"First block children: {len(parsed.blocks[0].children)}")
# Output:
# Blocks created: 1
# First block: # Project Notes
# First block children: 2

Example 2: Properties + TODO Lists - No More Errors

Before (v1.0.1) ❌

create_page("Tasks", """
---
priority: high
due-date: 2026-01-15
---

# Tasks
- [ ] TODO: Design feature
- [x] DONE: Initial setup
""")

# Result in Logseq:
# - Properties: error: MethodNotExist: get_page_properties
# - TODO markers duplicated: "TODO TODO: Design feature"
# - Date serialization error: "Object of type date is not JSON serializable"

After (v1.1.0) ✅

create_page("Tasks", """
---
priority: high
due-date: 2026-01-15
---

# Tasks
- [ ] TODO: Design feature
- [x] DONE: Initial setup
""")

# Result in Logseq:
# - Properties: {priority: "high", due-date: "2026-01-15"} ✅
# - Clean TODO markers: "TODO Design feature" ✅
# - Date serialized correctly to ISO string ✅

Try it yourself:

from mcp_logseq.parser import parse_content

content = """---
priority: high
due-date: 2026-01-15
---

# Tasks
- [ ] TODO: Design feature
- [x] DONE: Initial setup"""

parsed = parse_content(content)
print(f"Properties: {parsed.properties}")
print(f"First task: {parsed.blocks[0].children[0].content}")
print(f"Second task: {parsed.blocks[0].children[1].content}")
# Output:
# Properties: {'priority': 'high', 'due-date': '2026-01-15'}
# First task: TODO Design feature
# Second task: DONE Initial setup
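
The checkbox handling shown in this example can be sketched roughly as follows. This is a minimal illustration, not the code in parser.py: the helper name `checkbox_to_marker` and the rule for stripping a redundant `TODO:` / `DONE:` prefix are assumptions made for this sketch.

```python
import re
from typing import Optional

# Hypothetical helper, not part of mcp_logseq: one way to turn a markdown
# checkbox item into a Logseq marker without emitting the marker twice.
CHECKBOX = re.compile(r"^\s*[-*+]\s+\[( |x|X)\]\s+(.*)$")
REDUNDANT_PREFIX = re.compile(r"^(TODO|DONE):?\s+")


def checkbox_to_marker(line: str) -> Optional[str]:
    """Return 'TODO ...' or 'DONE ...' for a checkbox line, else None."""
    match = CHECKBOX.match(line)
    if match is None:
        return None
    state, text = match.groups()
    marker = "TODO" if state == " " else "DONE"
    # Drop a leading "TODO:" / "DONE:" so the marker is not duplicated.
    text = REDUNDANT_PREFIX.sub("", text, count=1)
    return f"{marker} {text}"


print(checkbox_to_marker("- [ ] TODO: Design feature"))
print(checkbox_to_marker("- [x] DONE: Initial setup"))
```

The marker is derived from the checkbox state rather than from the text, so an item such as `- [ ] Write docs` would still come out as `TODO Write docs`.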

Summary

Changes: 11 files (+2543, -682)
Tests: 103 passing with comprehensive coverage
Type checking: 0 errors

See commit message for full details on new features, API extensions, and behavioral changes.

Recommended version: v1.1.0 (feature release)

This PR introduces intelligent markdown parsing that converts markdown content into Logseq's native block tree structure, fixing the issue where create_page and update_page would dump all content as a single block.

## New Features

### Markdown Parser (src/mcp_logseq/parser.py)

- Parse YAML frontmatter into page properties
- Convert headings (H1-H6) into hierarchical block sections
- Handle nested bullet lists (-, *, +) with proper indentation
- Support numbered lists with nesting
- Convert checkboxes to Logseq TODO/DONE markers
- Preserve fenced code blocks as single blocks
- Join contiguous blockquote lines into single blocks
- Serialize date/datetime values to ISO strings for JSON compatibility

### Enhanced Tool Handlers

- CreatePageToolHandler: Now parses markdown and creates proper block hierarchy
- UpdatePageToolHandler: Added 'mode' parameter (append/replace)
- Both support YAML frontmatter for page properties

### API Extensions (src/mcp_logseq/logseq.py)

- insert_batch_block(): Use Logseq's insertBatchBlock for efficient bulk inserts
- create_page_with_blocks(): Create pages with proper block hierarchy
- update_page_with_blocks(): Update with append/replace modes
- clear_page_content(): Remove all blocks from a page
- Helper methods for block manipulation

## Tests

- 91 tests passing with comprehensive coverage
- 36 new parser tests for all markdown features
- Edge cases: deep nesting, special characters, empty content

## Behavioral Changes

- update_page now defaults to 'append' mode (adds content after existing blocks)
- Use mode='replace' to clear and replace all content

## Dependencies

- Added pyyaml>=6.0 for YAML frontmatter parsing

## Recommended Version

Consider bumping to v1.1.0 for this feature release.
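
As a rough illustration of the nested-list handling described in this commit, indentation can be mapped to a block tree with a stack of open ancestors. This is a sketch only: the `Block` dataclass and `parse_bullets` function are hypothetical stand-ins, and the real parser also covers headings, numbered lists, code blocks, and frontmatter, which this ignores.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Block:
    # Hypothetical stand-in for the parser's block type.
    content: str
    children: list[Block] = field(default_factory=list)


def parse_bullets(text: str) -> list[Block]:
    """Build a block tree from bullet lines, using leading spaces as depth."""
    roots: list[Block] = []
    stack: list[tuple[int, Block]] = []  # (indent, block) for open ancestors
    for line in text.splitlines():
        stripped = line.lstrip(" ")
        if not stripped.startswith(("- ", "* ", "+ ")):
            continue  # this sketch skips anything that is not a bullet
        indent = len(line) - len(stripped)
        block = Block(stripped[2:].strip())
        # Close every open item that is not a strict ancestor of this one.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        if stack:
            stack[-1][1].children.append(block)
        else:
            roots.append(block)
        stack.append((indent, block))
    return roots


tree = parse_bullets("- Task 1\n  - Subtask A\n- Task 2")
print([b.content for b in tree])
print([c.content for c in tree[0].children])
```

Comparing raw indentation widths, rather than assuming a fixed two spaces per level, keeps the sketch tolerant of inconsistent indentation.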

Adds support for Logseq's flexible marker system where any capitalized word (3+ characters) at the start of a line can be used as a task marker with nested children.

This enables users to use custom workflow markers like TODO, DONE, DOING, WAITING, LATER, NOW, IN-PROGRESS, etc., matching Logseq's native system.

## Pattern

- Matches: [A-Z][A-Z0-9_-]{2,} followed by space and text
- Minimum 3 total characters (e.g., NOW, TODO, DONE)
- Supports hyphens and underscores (IN-PROGRESS, PRIORITY_HIGH)
- Supports numbers (STEP1, PRIORITY2)
- Excludes: 2-char codes (CA, NY, US), lowercase, mixed-case

## Examples

    DONE Task completed
      - Detail 1
      - Detail 2
    NOW Urgent work
      - Immediate action needed
    IN-PROGRESS Current project
      - Phase 1 complete
      - Phase 2 in progress

## Changes

- Add CAPITALIZED_MARKER_PATTERN regex to parser
- Update _parse_list_item_content() to handle capitalized markers
- Update main parsing loop to detect capitalized markers
- Update list parsing logic to recognize markers at same level

## Tests

- 12 new tests covering markers, nesting, edge cases
- All 103 tests passing (+12 from previous 91)
- 0 type errors
- Verified with real journal entry format

This allows journal entries like 'DONE Task' with nested children to work correctly, matching common Logseq workflow patterns.
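
The pattern described in this commit can be tried out in isolation. The regex below is written from that description; the actual CAPITALIZED_MARKER_PATTERN in parser.py may differ in detail, and `MARKER` and `split_marker` are names invented for this sketch.

```python
import re

# Assumed shape of the marker regex, reconstructed from the commit message.
MARKER = re.compile(r"^([A-Z][A-Z0-9_-]{2,}) (.+)$")


def split_marker(line):
    """Return (marker, text) if the line starts with a marker, else None."""
    match = MARKER.match(line)
    return match.groups() if match else None


samples = [
    "DONE Task completed",
    "IN-PROGRESS Current project",
    "STEP1 Gather data",
    "CA residents only",  # two-character code: not a marker
    "Todo later",         # mixed case: not a marker
]
for sample in samples:
    print(f"{sample!r} -> {split_marker(sample)}")
```

Note that any all-caps word of three or more characters matches, so a line such as `NASA launched a rocket` would also be treated as a marker under this pattern.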

RobertoGongora · 2026-01-07T12:29:23Z

Fixed an issue with capitalized markers not being properly parsed. I'm using this and improving it as I find things, but it's already in a pretty usable state.

Edit: Additional improvements made to the get_page_content tool so it's friendlier to LLMs and easier to use across tools, e.g. for full-page replacements.

Fixes issue where get_page_content with format='text' only displayed top-level blocks and ignored nested children, causing incomplete page content display.

## Changes

### Core Fix

- Add _format_block_tree() recursive helper function to traverse and format block hierarchies with proper indentation
- Use 2-space indentation per nesting level (Logseq standard)
- Preserve TODO/DONE markers in content
- Display block-level properties inline (tags as #tag, others as key::value)

### New Feature

- Add max_depth parameter to control nesting display depth
- Default: -1 (unlimited, show all levels)
- max_depth=0: only top-level blocks
- max_depth=1: parent + immediate children
- Enables performance optimization for deeply nested pages

### API Changes

- GetPageContentToolHandler.get_tool_description(): Add max_depth input parameter
- GetPageContentToolHandler.run_tool(): Extract and use max_depth parameter
- New static method: _format_block_tree(block, indent_level, max_depth)

## Tests

- Add 5 comprehensive test cases for nested block formatting
- Test 2-level nesting with proper indentation
- Test 3+ level deep nesting
- Test max_depth limiting behavior
- Test markers and properties display
- Test multiple sibling blocks at same level
- All 108 tests passing (+5 from previous 103)

## Example Output

Before (broken):

```
# 2026-01-07
Content:
- DONE Parent task
```

After (fixed):

```
# 2026-01-07
Content:
- DONE Parent task #opensource #contribution priority::high
  - DONE Child task
    - Grandchild detail
```

This ensures users can see complete hierarchical content when reading pages via the MCP tool, matching their Logseq graph structure.
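
A minimal sketch of the recursive formatting with a depth limit follows. It is an illustration, not the handler's actual _format_block_tree: it assumes each block is a dict with "content" and "children" keys, returns a list of lines, and leaves out property rendering.

```python
def format_block_tree(block, indent_level=0, max_depth=-1):
    """Render a block dict and its children as indented '- ' lines."""
    # Two spaces per nesting level, as described in the commit message.
    lines = ["  " * indent_level + "- " + block["content"]]
    # max_depth=-1 means unlimited; otherwise stop descending at the limit.
    if max_depth == -1 or indent_level < max_depth:
        for child in block.get("children", []):
            lines.extend(format_block_tree(child, indent_level + 1, max_depth))
    return lines


page_block = {
    "content": "DONE Parent task",
    "children": [
        {
            "content": "DONE Child task",
            "children": [{"content": "Grandchild detail", "children": []}],
        }
    ],
}

print("\n".join(format_block_tree(page_block)))               # all levels
print("\n".join(format_block_tree(page_block, max_depth=1)))  # parent + children
```

With max_depth=0 the condition fails at the top level, so only the top-level block is emitted, which matches the behavior listed under New Feature.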

Remove confusing decorative headers from get_page_content tool output that were causing LLMs to include them as part of content during update operations.

Changes:

- Replace title header and property labels with YAML frontmatter format
- Remove "# Title" header (page name already passed to tool)
- Remove "Content:" label before blocks
- Convert page properties to YAML frontmatter (---...---)
- Change empty page message from text to "-" (valid Logseq syntax)
- Update tests to check for YAML frontmatter instead of old headers

Benefits:

- LLM-friendly output without ambiguous decorative headers
- Standard YAML frontmatter format (widely recognized)
- Direct usage for update_page operations without confusion
- Clean, parseable output that matches Logseq expectations

Breaking change: Text format output structure changed (acceptable for minor bump)

Properties are now properly stored on the first block of a page using upsertBlockProperty, which is how Logseq internally handles page properties.

Key changes:

- Set properties AFTER block insertion in create_page_with_blocks
- Set properties AFTER block operations in update_page_with_blocks
- Add property merge logic: append mode merges, replace mode replaces
- Add property value normalization for tags/aliases dicts
- Fix '[object Object]' issue when tags passed as dict

The normalization handles special cases where tags/aliases are passed as dicts (e.g., {"hello": true, "test": true}) and converts them to arrays (["hello", "test"]) which Logseq expects.

Fixes property persistence in UI and JSON retrieval.

Test coverage:

- 22 property persistence tests (12 original + 10 normalization)
- All 130 tests passing
- 0 type errors
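
The normalization and merge rules described in this commit can be sketched as below. The function names, the exact set of keys treated as list-valued ("tags", "alias", "aliases"), and the choice to drop entries whose value is false are assumptions for illustration, not the code in logseq.py.

```python
# Assumed set of property keys that Logseq expects as arrays.
LIST_KEYS = ("tags", "alias", "aliases")


def normalize_property(key, value):
    """Turn {"hello": True, "test": True} into ["hello", "test"] for list keys."""
    if key in LIST_KEYS and isinstance(value, dict):
        return [name for name, enabled in value.items() if enabled]
    return value


def merge_properties(existing, incoming, mode="append"):
    """Append mode merges into existing properties; replace mode discards them."""
    normalized = {k: normalize_property(k, v) for k, v in incoming.items()}
    if mode == "replace":
        return normalized
    return {**existing, **normalized}


current = {"priority": "high"}
update = {"tags": {"hello": True, "test": True}}
print(merge_properties(current, update))
print(merge_properties(current, update, mode="replace"))
```

In append mode, incoming keys override existing keys of the same name while untouched keys survive; that precedence is also an assumption of this sketch.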

Logseq API returns properties in two places:

1. In block.content field (as 'property:: value' lines)
2. In block.properties dict

Our code was adding properties a third time from the properties dict, causing triple duplication in the output.

Changes:

- Remove inline property rendering from _format_block_tree
- Properties now shown once, as they appear in block content
- Page-level properties still shown in YAML frontmatter
- Update test to reflect real Logseq API behavior

Example output now:

    ---
    heading: 1
    qa: true
    ---
    - # hello world
      heading:: 1 (shown once, from content)
      qa:: true (shown once, from content)

Instead of:

    - # hello world
      heading:: 1 (from content)
      qa:: true (from content)
      heading::1 qa::True (duplicate from our code)

All 130 tests passing.

Since page properties are already included in the first block's content (as Logseq stores them), we don't need to duplicate them in YAML frontmatter at the top of the output.

Changes:

- Remove YAML frontmatter rendering from get_page_content text format
- Remove unused yaml import
- Update tests to reflect that properties appear once in content
- Simplify output format

Before:

    ---
    heading: 1
    qa: true
    ---
    - # hello world
      heading:: 1
      qa:: true

After:

    - # hello world
      heading:: 1
      qa:: true

Properties shown once, as they naturally appear in Logseq's content.

All 130 tests passing.

RobertoGongora added 2 commits January 7, 2026 11:10

RobertoGongora added 5 commits January 7, 2026 14:03

RobertoGongora wants to merge 7 commits into ergut:main from RobertoGongora:feature/block-rendering
