Add markdown-to-blocks parser for proper Logseq block hierarchy #12
Open
RobertoGongora wants to merge 7 commits into ergut:main from …
This PR introduces intelligent markdown parsing that converts markdown content into Logseq's native block tree structure, fixing the issue where create_page and update_page would dump all content as a single block.
## New Features
### Markdown Parser (src/mcp_logseq/parser.py)
- Parse YAML frontmatter into page properties
- Convert headings (H1-H6) into hierarchical block sections
- Handle nested bullet lists (-, *, +) with proper indentation
- Support numbered lists with nesting
- Convert checkboxes to Logseq TODO/DONE markers
- Preserve fenced code blocks as single blocks
- Join contiguous blockquote lines into single blocks
- Serialize date/datetime values to ISO strings for JSON compatibility
### Enhanced Tool Handlers
- CreatePageToolHandler: Now parses markdown and creates proper block hierarchy
- UpdatePageToolHandler: Added 'mode' parameter (append/replace)
- Both support YAML frontmatter for page properties
### API Extensions (src/mcp_logseq/logseq.py)
- insert_batch_block(): Use Logseq's insertBatchBlock for efficient bulk inserts
- create_page_with_blocks(): Create pages with proper block hierarchy
- update_page_with_blocks(): Update with append/replace modes
- clear_page_content(): Remove all blocks from a page
- Helper methods for block manipulation
## Tests
- 91 tests passing with comprehensive coverage
- 36 new parser tests for all markdown features
- Edge cases: deep nesting, special characters, empty content
## Behavioral Changes
- update_page now defaults to 'append' mode (adds content after existing blocks)
- Use mode='replace' to clear and replace all content
## Dependencies
- Added pyyaml>=6.0 for YAML frontmatter parsing
## Recommended Version
Consider bumping to v1.1.0 for this feature release.
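To illustrate the nested-bullet and checkbox handling described above, here is a minimal sketch of how a bullet list could be turned into a block tree. It is not the code in src/mcp_logseq/parser.py: the names `parse_bullets` and `convert_checkbox`, the `{"content", "children"}` dict shape, and the fixed 2-space indent width are all assumptions made for illustration, and headings, frontmatter, code fences and blockquotes are ignored.

```python
import re

# Matches "-", "*" or "+" bullets with optional leading indentation.
BULLET_RE = re.compile(r"^(\s*)[-*+] (.*)$")


def convert_checkbox(text: str) -> str:
    """Map markdown checkboxes to TODO/DONE markers (hypothetical helper)."""
    if text.startswith("[ ] "):
        return "TODO " + text[4:]
    if text[:4].lower() == "[x] ":
        return "DONE " + text[4:]
    return text


def parse_bullets(markdown: str, indent_width: int = 2) -> list:
    """Build a tree of {"content", "children"} dicts from a bullet list."""
    roots = []
    stack = []  # (depth, block) pairs for the currently open ancestors
    for line in markdown.splitlines():
        match = BULLET_RE.match(line)
        if not match:
            continue  # this sketch skips anything that is not a bullet
        depth = len(match.group(1)) // indent_width
        block = {"content": convert_checkbox(match.group(2)), "children": []}
        # Close every block that is at the same depth or deeper.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        if stack:
            stack[-1][1]["children"].append(block)
        else:
            roots.append(block)
        stack.append((depth, block))
    return roots


if __name__ == "__main__":
    sample = "- Parent\n  - [ ] Child\n  - [x] Done child\n* Sibling"
    for root in parse_bullets(sample):
        print(root)
```

The stack keeps only the chain of open ancestors, so each new bullet attaches to the nearest shallower block regardless of how many levels were closed in between.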
Adds support for Logseq's flexible marker system where any capitalized
word (3+ characters) at the start of a line can be used as a task marker
with nested children.
This enables users to use custom workflow markers like TODO, DONE, DOING,
WAITING, LATER, NOW, IN-PROGRESS, etc., matching Logseq's native system.
## Pattern
- Matches: [A-Z][A-Z0-9_-]{2,} followed by space and text
- Minimum 3 total characters (e.g., NOW, TODO, DONE)
- Supports hyphens and underscores (IN-PROGRESS, PRIORITY_HIGH)
- Supports numbers (STEP1, PRIORITY2)
- Excludes: 2-char codes (CA, NY, US), lowercase, mixed-case
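The matching rules above can be checked with a small sketch. The commit names CAPITALIZED_MARKER_PATTERN but does not show its definition, so the anchored regex and the helper `split_marker` below are assumptions built only from the pattern text in this message.

```python
import re

# Assumed form of the pattern: one uppercase letter, then at least two
# characters from A-Z, 0-9, "_" or "-", then a space and the task text.
MARKER_RE = re.compile(r"^([A-Z][A-Z0-9_-]{2,}) (.+)$")


def split_marker(line: str):
    """Return (marker, text) when the line starts with a marker, else (None, line)."""
    match = MARKER_RE.match(line)
    if match:
        return match.group(1), match.group(2)
    return None, line


if __name__ == "__main__":
    for sample in ["DONE Task completed", "IN-PROGRESS Current project",
                   "STEP1 Prepare", "CA resident", "Todo item"]:
        print(sample, "->", split_marker(sample))
```

Two-letter codes fail because the quantifier needs two more characters after the first letter, and mixed-case words fail because the character class contains no lowercase letters.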
## Examples
DONE Task completed
- Detail 1
- Detail 2
NOW Urgent work
- Immediate action needed
IN-PROGRESS Current project
- Phase 1 complete
- Phase 2 in progress
## Changes
- Add CAPITALIZED_MARKER_PATTERN regex to parser
- Update _parse_list_item_content() to handle capitalized markers
- Update main parsing loop to detect capitalized markers
- Update list parsing logic to recognize markers at same level
## Tests
- 12 new tests covering markers, nesting, edge cases
- All 103 tests passing (+12 from previous 91)
- 0 type errors
- Verified with real journal entry format
This allows journal entries like 'DONE Task' with nested children to work
correctly, matching common Logseq workflow patterns.
Fixed an issue with capitalized markers not being properly parsed. I'm using this and improving as I find things but it's already in a pretty usable state. Edit: Additional improvements made on the |
Fixes issue where get_page_content with format='text' only displayed
top-level blocks and ignored nested children, causing incomplete page
content display.
## Changes
### Core Fix
- Add _format_block_tree() recursive helper function to traverse and
format block hierarchies with proper indentation
- Use 2-space indentation per nesting level (Logseq standard)
- Preserve TODO/DONE markers in content
- Display block-level properties inline (tags as #tag, others as key::value)
### New Feature
- Add max_depth parameter to control nesting display depth
- Default: -1 (unlimited, show all levels)
- max_depth=0: only top-level blocks
- max_depth=1: parent + immediate children
- Enables performance optimization for deeply nested pages
### API Changes
- GetPageContentToolHandler.get_tool_description(): Add max_depth input parameter
- GetPageContentToolHandler.run_tool(): Extract and use max_depth parameter
- New static method: _format_block_tree(block, indent_level, max_depth)
## Tests
- Add 5 comprehensive test cases for nested block formatting
- Test 2-level nesting with proper indentation
- Test 3+ level deep nesting
- Test max_depth limiting behavior
- Test markers and properties display
- Test multiple sibling blocks at same level
- All 108 tests passing (+5 from previous 103)
## Example Output
Before (broken):
```
# 2026-01-07
Content:
- DONE Parent task
```
After (fixed):
```
# 2026-01-07
Content:
- DONE Parent task #opensource #contribution priority::high
  - DONE Child task
    - Grandchild detail
```
This ensures users can see complete hierarchical content when reading
pages via the MCP tool, matching their Logseq graph structure.
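The recursive traversal and the max_depth semantics described in this commit can be sketched as follows. This is an illustration, not the handler's actual `_format_block_tree`: the free function `format_block_tree`, its list-of-lines return type, and the assumption that each block is a dict with `content` and `children` keys are all hypothetical, and property rendering is left out.

```python
def format_block_tree(block: dict, indent_level: int = 0, max_depth: int = -1) -> list:
    """Return the block and its descendants as indented "- content" lines."""
    # Two spaces per nesting level, as stated in the commit message.
    lines = ["  " * indent_level + "- " + block.get("content", "")]
    # max_depth == -1 means unlimited; otherwise stop descending at the limit.
    if max_depth != -1 and indent_level >= max_depth:
        return lines
    for child in block.get("children", []):
        lines.extend(format_block_tree(child, indent_level + 1, max_depth))
    return lines


if __name__ == "__main__":
    page = {
        "content": "DONE Parent task",
        "children": [
            {"content": "DONE Child task",
             "children": [{"content": "Grandchild detail", "children": []}]},
        ],
    }
    print("\n".join(format_block_tree(page)))
    print("\n".join(format_block_tree(page, max_depth=1)))
```

With max_depth=0 only the top-level line is returned, and with max_depth=1 the parent and its immediate children are returned, matching the list under "New Feature".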
Remove confusing decorative headers from get_page_content tool output that were causing LLMs to include them as part of content during update operations.
Changes:
- Replace title header and property labels with YAML frontmatter format
- Remove "# Title" header (page name already passed to tool)
- Remove "Content:" label before blocks
- Convert page properties to YAML frontmatter (---...---)
- Change empty page message from text to "-" (valid Logseq syntax)
- Update tests to check for YAML frontmatter instead of old headers
Benefits:
- LLM-friendly output without ambiguous decorative headers
- Standard YAML frontmatter format (widely recognized)
- Direct usage for update_page operations without confusion
- Clean, parseable output that matches Logseq expectations
Breaking change: Text format output structure changed (acceptable for minor bump)
Properties are now properly stored on the first block of a page using
upsertBlockProperty, which is how Logseq internally handles page properties.
Key changes:
- Set properties AFTER block insertion in create_page_with_blocks
- Set properties AFTER block operations in update_page_with_blocks
- Add property merge logic: append mode merges, replace mode replaces
- Add property value normalization for tags/aliases dicts
- Fix '[object Object]' issue when tags passed as dict
The normalization handles special cases where tags/aliases are passed
as dicts (e.g., {"hello": true, "test": true}) and converts them
to arrays (["hello", "test"]) which Logseq expects.
Fixes property persistence in UI and JSON retrieval.
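The normalization and merge rules listed above might look roughly like the sketch below. The function names `normalize_property_value` and `merge_properties` are hypothetical, and dropping dict entries whose value is false is an assumption; the commit only states that a dict such as {"hello": true, "test": true} becomes ["hello", "test"].

```python
def normalize_property_value(key: str, value):
    """Turn tag/alias dicts into lists of names; pass other values through."""
    if key in ("tags", "alias", "aliases") and isinstance(value, dict):
        # Assumption: only entries flagged true are kept.
        return [name for name, enabled in value.items() if enabled]
    return value


def merge_properties(existing: dict, new: dict, mode: str = "append") -> dict:
    """Append mode merges new keys over existing ones; replace mode keeps only new."""
    if mode == "replace":
        return dict(new)
    merged = dict(existing)
    merged.update(new)
    return merged


if __name__ == "__main__":
    print(normalize_property_value("tags", {"hello": True, "test": True}))
    print(merge_properties({"priority": "high"}, {"status": "open"}, mode="append"))
```

Passing a plain list instead of a dict is what avoids the '[object Object]' rendering mentioned in the key changes.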
Test coverage:
- 22 property persistence tests (12 original + 10 normalization)
- All 130 tests passing
- 0 type errors
Logseq API returns properties in two places:
1. In block.content field (as 'property:: value' lines)
2. In block.properties dict
Our code was adding properties a third time from the properties dict, causing triple duplication in the output.
Changes:
- Remove inline property rendering from _format_block_tree
- Properties now shown once, as they appear in block content
- Page-level properties still shown in YAML frontmatter
- Update test to reflect real Logseq API behavior
Example output now:
    ---
    heading: 1
    qa: true
    ---
    - # hello world
      heading:: 1 (shown once, from content)
      qa:: true (shown once, from content)
Instead of:
    - # hello world
      heading:: 1 (from content)
      qa:: true (from content)
      heading::1 qa::True (duplicate from our code)
All 130 tests passing.
Since page properties are already included in the first block's content (as Logseq stores them), we don't need to duplicate them in YAML frontmatter at the top of the output.
Changes:
- Remove YAML frontmatter rendering from get_page_content text format
- Remove unused yaml import
- Update tests to reflect that properties appear once in content
- Simplify output format
Before:
    ---
    heading: 1
    qa: true
    ---
    - # hello world
      heading:: 1
      qa:: true
After:
    - # hello world
      heading:: 1
      qa:: true
Properties shown once, as they naturally appear in Logseq's content.
All 130 tests passing.
Add markdown-to-blocks parser for proper Logseq block hierarchy
Fixes #7
This PR introduces intelligent markdown parsing that converts markdown content into Logseq's native block tree structure, fixing the issue where `create_page` and `update_page` would dump all content as a single block.
Key improvements:
- … `MethodNotExist` error)
- … `get_page_content`
📖 Quick Examples
Example 1: Simple Page Creation - Proper Hierarchy
Before (v1.0.1) ❌
After (v1.1.0) ✅
Try it yourself:
Example 2: Properties + TODO Lists - No More Errors
Before (v1.0.1) ❌
After (v1.1.0) ✅
Try it yourself:
Summary
Changes: 11 files (+2543, -682)
Tests: 103 passing with comprehensive coverage
Type checking: 0 errors
See commit message for full details on new features, API extensions, and behavioral changes.
Recommended version: `v1.1.0` (feature release)