Add configurable list formatting for CSV/TSV serialization #3134

Open
turbomam wants to merge 24 commits into main from issue-3041-annotation-based-csv-delimiters

@turbomam
Member

@turbomam turbomam commented Feb 4, 2026

New Summary from 2026-02-11

Adds configurable multivalued field formatting for CSV/TSV serialization, via schema-level annotations and CLI options.

Before: Multivalued fields always serialize with brackets: [value1|value2|value3]
After: With list_syntax: plaintext, fields serialize without brackets: value1|value2|value3

Closes #3041. Addresses the core of #2581 (filed by @matentzn as a blocker for supporting common delimited formats like pipe-separated, semicolon-separated, etc.).

Origin and design

This follows the design @cmungall and I agreed on in our Dec 15 rolling meeting notes:

annotations:
  list_syntax: plaintext   # python (default) | plaintext
  list_delimiter: "; "     # any string; space must be explicit

With mapping to json-flattener: list_syntax: plaintext → csv_list_markers=("", ""), list_delimiter → csv_inner_delimiter.
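
As a rough illustration of this mapping (not the PR's actual code), the sketch below turns the two annotation values into the pair of json-flattener-style settings named above and shows the resulting cell text. The helper names `list_config_to_flattener_args` and `format_list` are hypothetical, and the python-mode defaults (`[`/`]` markers, `|` delimiter) are assumed from the Before example.

```python
def list_config_to_flattener_args(list_syntax="python", list_delimiter="|"):
    """Hypothetical helper: map the two annotations to flattener-style settings."""
    if list_syntax == "plaintext":
        # No brackets; the configured delimiter is used as-is.
        return {"csv_list_markers": ("", ""), "csv_inner_delimiter": list_delimiter}
    # python syntax: assumed defaults (bracketed, pipe-delimited).
    return {"csv_list_markers": ("[", "]"), "csv_inner_delimiter": "|"}


def format_list(values, csv_list_markers, csv_inner_delimiter):
    """Render a list the way a serialized cell would look under these settings."""
    opening, closing = csv_list_markers
    return f"{opening}{csv_inner_delimiter.join(values)}{closing}"


plaintext_cfg = list_config_to_flattener_args("plaintext", "; ")
print(format_list(["a", "b", "c"], **plaintext_cfg))
print(format_list(["a", "b", "c"], **list_config_to_flattener_args()))
```

Note that the delimiter is only consulted in plaintext mode here, following the Dec 15 note that list_delimiter has no effect when list_syntax is python.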

Deviation from spec: schema-level only

The Dec 15 spec discussed slot-level annotations overriding schema-level defaults via SchemaView. The implementation is schema-level only. json-flattener's GlobalConfig defines csv_list_markers and csv_inner_delimiter at the top level with no per-column configuration path, so slot-level overrides would require extending json-flattener itself. The primary use case (MIxS-style "semicolon-delimited, no brackets") is uniform across all multivalued fields in a schema, so this felt like the right scope for now.

No changes to csvutils.py

Per Chris's guidance ("prefer no changes in csvutils.py"), configuration is handled in the loader and dumper rather than in the shared utility layer.

SSSOM alignment

@matentzn suggested checking how SSSOM handles multivalued field packing. With list_syntax: plaintext and list_delimiter: "|", our output matches SSSOM's TSV spec exactly (plain a|b|c, no brackets, strip whitespace). LinkML generalizes what SSSOM hardcodes — appropriate for a general-purpose modeling language where different schemas need different conventions.

The SSSOM ecosystem is actively working on delimiter-in-value escaping (sssom#507, sssom-java#17). This PR doesn't implement escaping either, but the annotation-based configuration provides the right foundation to add it later.

What's not in scope

  • linkml-validate loader: This PR modifies the linkml_runtime loader/dumper (used by linkml-convert), not the separate linkml.validator.loaders.delimited_file_loader. Filed #3147 ("linkml-validate CSV/TSV loader lacks schema-aware parsing (boolean coercion, list splitting)") to track unifying them.
  • Pandera / column-oriented data: @tfliss and @sneakers-the-rat raised broader tabular concerns in Discussion #1996. This is row-oriented only — those feel like follow-up work for the tabular data library discussion.
  • Delimiter-in-value escaping: Neither this PR nor SSSOM 1.0 implement escaping. Instead, the refuse_delimiter_in_data annotation/CLI flag raises a ValueError before serialization if any value contains the delimiter — preventing silent data corruption. Full escaping (e.g., SSSOM 1.1's backslash approach) can be added later.
  • RDF order preservation: @gouttegd clarified that multivalued slot order non-preservation is a LinkML-wide property (not SSSOM-specific), since the RDF translation rules use unstructured triples even when list_elements_ordered: true. Worth noting but orthogonal to this PR.
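
The delimiter-in-value guard described in the list above can be pictured as a check that runs before anything is written. This is a minimal sketch, assuming rows are plain dicts; `check_no_delimiter_in_values` and its parameters are hypothetical names, not the PR's API.

```python
def check_no_delimiter_in_values(rows, multivalued_keys, delimiter="|"):
    """Raise ValueError if any multivalued value contains the list delimiter.

    Hypothetical stand-in for the refuse_delimiter_in_data behaviour.
    """
    for index, row in enumerate(rows):
        for key in multivalued_keys:
            for value in row.get(key) or []:
                if delimiter in str(value):
                    raise ValueError(
                        f"row {index}, field {key!r}: value {value!r} "
                        f"contains the list delimiter {delimiter!r}"
                    )


rows = [{"id": "P1", "aliases": ["Bob", "Rob|Robert"]}]
try:
    check_no_delimiter_in_values(rows, ["aliases"])
except ValueError as err:
    print(f"refused: {err}")
```

Failing at write time is the point: joining `["Bob", "Rob|Robert"]` with `|` would load back as three values rather than two.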

Configuration reference

Schema annotations

id: https://example.org/myschema
name: myschema
annotations:
  list_syntax: plaintext
  list_delimiter: "|"
  list_strip_whitespace: "true"
  refuse_delimiter_in_data: "true"

| Annotation | Values | Default | Description |
|---|---|---|---|
| list_syntax | python, plaintext | python | python wraps lists in brackets ([a\|b\|c]); plaintext has no brackets (a\|b\|c) |
| list_delimiter | any string | \| (pipe) | Character(s) used to separate list items |
| list_strip_whitespace | true, false | true | Strip whitespace around delimiters when loading and dumping |
| refuse_delimiter_in_data | true, false | false | Raise ValueError if any multivalued field value contains the delimiter, preventing silent data corruption |

CLI options (override schema annotations)

linkml-convert -s schema.yaml -C Container -S items -t tsv \
  --list-syntax plaintext \
  --list-delimiter "|" \
  --list-strip-whitespace \
  --refuse-delimiter-in-data \
  input.yaml

| CLI Option | Default | Description |
|---|---|---|
| --list-syntax | None (use schema) | python or plaintext |
| --list-delimiter | None (use schema) | Delimiter string |
| --list-strip-whitespace / --no-list-strip-whitespace | None (use schema) | Strip whitespace from list values |
| --refuse-delimiter-in-data / --no-refuse-delimiter-in-data | None (use schema) | Raise error if any value contains the delimiter |
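
The "None (use schema)" default implies a simple precedence: an explicit CLI value wins, otherwise the schema annotation applies, otherwise the documented default. A minimal sketch of that resolution order, with a hypothetical `resolve_option` helper and plain dicts standing in for the schema annotations:

```python
# Documented defaults from the annotation table.
DEFAULTS = {
    "list_syntax": "python",
    "list_delimiter": "|",
    "list_strip_whitespace": True,
    "refuse_delimiter_in_data": False,
}


def resolve_option(name, cli_value, schema_annotations):
    """Hypothetical resolver: CLI value, then schema annotation, then default."""
    if cli_value is not None:
        # An explicit False (e.g. from a --no-... flag) still counts as set.
        return cli_value
    if name in schema_annotations:
        return schema_annotations[name]
    return DEFAULTS[name]


schema = {"list_syntax": "plaintext", "list_delimiter": ";"}
print(resolve_option("list_delimiter", "|", schema))
print(resolve_option("list_delimiter", None, schema))
print(resolve_option("list_strip_whitespace", False, schema))
```

Testing against `None` rather than truthiness matters for the boolean flags, since `--no-list-strip-whitespace` has to be able to override a schema that sets the annotation to true.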

Review feedback addressed

From @cmungall's review (Feb 5):

  • ✅ Converted all tests to pure idiomatic pytest (no unittest classes, no hybrid styles)
  • ✅ Removed verbose agent-conversation-style comments
  • ✅ Made helper functions public (dropped underscore prefix)

From Copilot:

  • ✅ Fixed schema-level vs slot-level annotation mismatch in tests
  • ✅ Removed unused variables
  • ✅ Added warning log for invalid list_syntax values

Coverage:

  • ✅ Added CLI integration tests — 25 converter tests + 29 CSV/TSV runtime tests pass

Files changed

  • docs/data/csvs.md — documentation
  • packages/linkml/src/linkml/converter/cli.py — CLI options
  • packages/linkml_runtime/src/linkml_runtime/dumpers/delimited_file_dumper.py — output formatting
  • packages/linkml_runtime/src/linkml_runtime/loaders/delimited_file_loader.py — input parsing
  • tests/linkml/test_utils/test_converter.py — CLI tests
  • tests/linkml_runtime/test_loaders_dumpers/test_csv_tsv_loader_dumper.py — runtime tests
  • tests/linkml_runtime/test_utils/test_csv_utils.py — utility tests

@turbomam turbomam changed the title Add annotation-based CSV delimiter configuration Add annotation-based xSV delimiter configuration Feb 4, 2026
@codecov

codecov bot commented Feb 4, 2026

Codecov Report

❌ Patch coverage is 61.11111% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.77%. Comparing base (d187949) to head (751a917).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| packages/linkml/src/linkml/converter/cli.py | 61.11% | 2 Missing and 5 partials ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3134      +/-   ##
==========================================
+ Coverage   79.92%   83.77%   +3.85%     
==========================================
  Files         144      144              
  Lines       16579    16597      +18     
  Branches     3421     3428       +7     
==========================================
+ Hits        13250    13904     +654     
+ Misses       2606     1918     -688     
- Partials      723      775      +52     

| Flag | Coverage Δ |
|---|---|
| linkml | 79.92% <61.11%> (+0.01%) ⬆️ |
| runtime | 79.92% <61.11%> (+0.01%) ⬆️ |


@turbomam turbomam force-pushed the issue-3041-annotation-based-csv-delimiters branch from e788b02 to 2264cc3 Compare February 4, 2026 17:49
@turbomam turbomam changed the title Add annotation-based xSV delimiter configuration Add configurable list formatting for CSV/TSV serialization Feb 5, 2026
@turbomam turbomam marked this pull request as ready for review February 5, 2026 15:23
Copilot AI review requested due to automatic review settings February 5, 2026 15:23
Contributor

Copilot AI left a comment

Pull request overview

This PR adds configurable list formatting for CSV/TSV serialization to address issue #3041, enabling users to control how multivalued fields are serialized (with or without brackets, custom delimiters, and whitespace handling).

Changes:

  • Adds schema-level annotations (list_syntax, list_delimiter, list_strip_whitespace) to control multivalued field formatting in CSV/TSV output
  • Implements CLI options (--list-syntax, --list-delimiter, --list-strip-whitespace) to override schema annotations
  • Extends CSV/TSV loaders and dumpers to handle plaintext-style lists (e.g., a|b|c) in addition to python-style lists (e.g., [a|b|c])

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| docs/data/csvs.md | Comprehensive documentation of the new configuration options with examples and usage instructions |
| packages/linkml/src/linkml/converter/cli.py | Adds three new CLI options for list formatting that apply to both input and output CSV/TSV operations |
| packages/linkml_runtime/src/linkml_runtime/dumpers/delimited_file_dumper.py | Implements list formatting configuration for CSV/TSV output, reading from schema annotations or CLI overrides |
| packages/linkml_runtime/src/linkml_runtime/loaders/delimited_file_loader.py | Implements list formatting configuration for CSV/TSV input, including helper functions for annotation reading and whitespace stripping |
| tests/linkml_runtime/test_loaders_dumpers/test_csv_tsv_loader_dumper.py | Comprehensive integration tests covering plaintext mode, custom delimiters, whitespace handling, and edge cases |
| tests/linkml_runtime/test_utils/test_csv_utils.py | Unit tests for annotation reading (contains a test schema that uses slot-level annotations inconsistently with implementation) |

@turbomam
Member Author

turbomam commented Feb 5, 2026

Re: patch coverage

The 12 uncovered lines are in cli.py where the new CLI options are passed through to the loader/dumper.

Update: Added CLI integration tests in commit a1db0e5. The CLI options (--list-syntax, --list-delimiter, --list-strip-whitespace) are now tested directly via CliRunner.

Test coverage includes:

  • 4 CLI tests in test_converter.py (linkml package)
  • 33 tests in test_csv_tsv_loader_dumper.py and test_csv_utils.py (linkml_runtime package)

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

# =============================================================================
# pytest-style unit tests for annotation-based CSV configuration (issue #3041)
Member

This test file is now a hybrid of 3 styles:

  1. UnitTest
  2. pure pytest
  3. non-idiomatic pytest (using classes)

The guidelines aren't clear on what to do when contributing a new test to an existing UnitTest file:

https://linkml.io/linkml/maintainers/contributing.html#unittest-to-pytest-conversion

I favor consistency here, and I'd just convert this all to pure pytest

Member Author

Fixed in 9b1e739 and 19b9006. Converted the entire file to pure pytest — both the old CsvUtilTestCase and the new annotation tests are now plain functions. Removed verbose comments and section headers.

# -----------------------------------------------------------------------------
# Inline test schemas for annotation testing
#
# We use inline schemas here because:
Member

This is a bit too much verbiage, sounds like the results of a conversation with an agent

Member Author

Fixed. Removed the verbose comment blocks.

# -----------------------------------------------------------------------------
# Note on KeyConfig generation for multivalued primitive slots
#
# The _get_key_config() function in csvutils.py is NOT modified per Chris's
Member

Adding comments to tests is good if they help future maintenance or explain the purpose or function of the test, but this looks like a conversation that has lost its context

Member Author

Fixed. Removed the KeyConfig note — it was stale context.

return SchemaView(SCHEMA_WHITESPACE_PRESERVE)


class TestWhitespaceStripping:
Member

this style of pytest seems non-idiomatic for this repo

Member Author

Fixed. Converted all test classes to plain test_* functions, including the pre-existing CsvAndTsvGenTestCase (mechanical conversion, no behavior change).
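
A mechanical conversion of this kind looks roughly like the following. The function under test, the test case name, and the assertion are made up for illustration; none of them are taken from the PR's test files.

```python
import unittest


def split_pipe(cell):
    """Toy function under test (hypothetical)."""
    return cell.split("|")


# Before: unittest style, with a class wrapper and self.assert* methods.
class SplitPipeTestCase(unittest.TestCase):
    def test_split(self):
        self.assertEqual(split_pipe("a|b"), ["a", "b"])


# After: plain pytest style, a module-level function with a bare assert.
def test_split_pipe():
    assert split_pipe("a|b") == ["a", "b"]
```

pytest collects both forms, so the conversion changes style rather than behavior.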

Member

@cmungall cmungall left a comment

make the tests more consistent with other tests

from linkml_runtime.dumpers import csv_dumper, json_dumper, tsv_dumper, yaml_dumper
from linkml_runtime.loaders import csv_loader, tsv_loader, yaml_loader
from linkml_runtime.loaders.delimited_file_loader import (
_get_list_config_from_annotations,
Member

consider making more clearly intended as public

(doctests might work better if it's intended as private)

Member Author

@turbomam turbomam Feb 5, 2026

Made them public by dropping the underscore prefix: get_list_config_from_annotations, enhance_configmap_for_multivalued_primitives, strip_whitespace_from_lists.

Considered doctests, but the repo has no doctest infrastructure — there's no --doctest-modules in pyproject.toml and no doctest step in CI. The existing >>> examples in ~9 source files are documentation-only and never executed as tests. Making the functions public with dedicated pytest coverage seemed more practical.
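
For reference, a doctest is a `>>>` example in a docstring that the standard `doctest` module (or `pytest --doctest-modules`) executes and compares against the expected output shown beneath it. A minimal sketch with a hypothetical function, `strip_list_values`, which is not one of the PR's helpers:

```python
import doctest


def strip_list_values(values):
    """Strip surrounding whitespace from each string in a list.

    >>> strip_list_values([" a", "b "])
    ['a', 'b']
    """
    return [value.strip() for value in values]


if __name__ == "__main__":
    # Runs every >>> example in this module and reports any mismatches.
    print(doctest.testmod())
```

Without a runner such as this in CI, the `>>>` lines are only documentation and can drift from the code unnoticed.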

@turbomam
Member Author

turbomam commented Feb 11, 2026

Consolidated unaddressed feedback — list formatting (PR #3134)

Gathering all outstanding feedback from multiple sources so nothing falls through the cracks.

1. Chris's CHANGES_REQUESTED review (Feb 5) — addressed in code, awaiting re-review

All 5 inline comments have been addressed:

  • ✅ Converted test file from hybrid unittest/pytest to pure pytest (9b1e739, 19b9006)
  • ✅ Removed verbose agent-conversation-style comments
  • ✅ Removed stale KeyConfig context note
  • ✅ Converted test classes to plain test_* functions
  • ✅ Made helper functions public (dropped underscore prefix)

Status: Code changes pushed, Chris has not re-reviewed yet.

2. Chris's design spec from Dec 15 rolling meeting notes

The agreed-upon design from our Dec 15 meeting (Chris & Mark rolling notes):

# Schema annotation spec
attributes:
  name_list:
    multivalued: true
    annotations:
      list_syntax: python  ## allowed: python | plaintext
      list_delimiter: "; "  ## must include space explicitly. No effect if list_syntax == 'python'

Mapping to json-flattener:

  • If list_syntax == "python" → use defaults
  • Else → csv_list_markers = ("", ""), csv_inner_delimiter = $list_delimiter

Cascading: Use SchemaView — default to schema-level annotations, slot-level annotations override.

Additional guidance from Chris:

  • "prefer no changes in packages/linkml_runtime/src/linkml_runtime/utils/csvutils.py"
  • "remember that json-flattener isn't really schema aware"
  • "pass csv list marker (a tuple) and inner delimiter (or a csv style syntax enum) in schema"

Need to verify: Does the current implementation match this spec exactly? Specifically:

  • Slot-level annotation override of schema-level annotations
  • Correct mapping to json-flattener's csv_list_markers and csv_inner_delimiter
  • No changes in csvutils.py

3. Chris's helper function visibility comment (PR inline)

Chris said "consider making more clearly intended as public (doctests might work better if it's intended as private)". I made them public and noted the repo has no doctest infrastructure — filed #3146 to track adding it. Chris hasn't responded to this.

4. Copilot review — invalid list_syntax values

Copilot flagged that invalid list_syntax values (e.g. "foobar") silently default to python style. Fixed in 971b3ec — added a warning log. ✅

5. Discussion #1996 context

My progress update in the "Improved ways of working with tabular data" discussion links this PR. Broader context from the discussion:

  • @tfliss raised interaction with Pandera generator for inlined-as-simple-dict and range classes
  • @sneakers-the-rat raised the two orientations of tabular data (column-oriented vs row-oriented)
  • Chris's original post discusses whether we should have a separate tabular data library with plugin architecture

6. Feb 9 rolling notes

Chris noted "Finish linkml PRs" as a current action item.

7. Related issues

Next steps

  1. Verify implementation matches Chris's Dec 15 spec (especially slot-level override cascade and json-flattener mapping)
  2. Rebase if needed
  3. Request re-review from @cmungall

🤖 Generated with Claude Code

@turbomam turbomam force-pushed the issue-3041-annotation-based-csv-delimiters branch from 9b1e739 to d0aa8d5 Compare February 11, 2026 16:58
@turbomam
Member Author

@cmungall Heads up on one deviation from our Dec 15 spec. We discussed slot-level annotations overriding schema-level defaults, but the current implementation only supports schema-level annotations.

Why: json-flattener's GlobalConfig defines csv_list_markers and csv_inner_delimiter at the top level, not on KeyConfig. These get applied uniformly to all columns — there's no per-column configuration path. So slot-level overrides would require extending json-flattener itself.

I think schema-level-only is the right call here. The main use case driving this (MIxS-style "semicolon-delimited, no brackets") is uniform across all multivalued fields in a given schema anyway. If a per-slot use case comes up later, we can add it via json-flattener at that point.

Also rebased onto current main and resolved the conflict with #3118's new converter tests. All 54 tests pass (25 converter + 29 CSV/TSV runtime). Ready for re-review when you get a chance.

@turbomam
Member Author

Scope and known limitations

What this PR does and doesn't touch

This PR modifies the linkml_runtime loader/dumper (used by linkml-convert). It does not touch the separate linkml.validator.loaders.delimited_file_loader (used by linkml-validate), which is a simpler 79-line loader built on bare csv.DictReader without json-flattener.

That means after this merges, linkml-convert will correctly split a|b into ['a', 'b'], but linkml-validate on the same CSV will still see it as the raw string a|b. I filed #3147 to track unifying these two loaders — that felt like a separate effort.
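
The gap can be reproduced with the standard library alone. This is a minimal sketch, assuming a two-column TSV with a hypothetical `aliases` column; it does not call either LinkML loader.

```python
import csv
import io

# A tiny TSV with one multivalued cell, pipe-delimited and without brackets.
tsv_text = "id\taliases\nP1\ta|b\n"

rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))

# Bare DictReader has no notion of multivalued columns: the cell stays a string.
raw = rows[0]["aliases"]

# A schema-aware loader would additionally split the cell on the list delimiter.
split = raw.split("|")

print(repr(raw), split)
```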

Schema-level only annotations

Our Dec 15 spec discussed slot-level annotation overrides via SchemaView. The implementation is schema-level only — see my earlier comment for why (json-flattener's GlobalConfig has no per-column delimiter support).

Known edge cases

  • Delimiter-in-value: If a value contains the delimiter character, round-tripping will break. No escaping mechanism yet. Tracked in a skipped test with a note.
  • Empty multivalued fields: Skipped test due to a json-flattener json_clean issue — empty lists don't roundtrip cleanly.
  • Pandera / column-oriented data: @tfliss and @sneakers-the-rat raised broader tabular concerns in Discussion #1996 ("Improved ways of working with tabular data"). This PR is row-oriented only and doesn't address those; they feel like follow-up work for the tabular data library discussion.

list_strip_whitespace accepts only true/false

Tightened in e4d955e to only accept case-insensitive "true" or "false". Previously accepted YAML 1.1 conventions (yes/no, 0/1). Changed to stay consistent with the direction in #3144 per Chris's feedback about not mixing boolean conventions.
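
A strict parser of this kind might look like the sketch below. The function name `parse_strict_bool` is hypothetical, and falling back to the default after the warning is an assumption; the commit message only says invalid values produce a warning.

```python
import logging

logger = logging.getLogger(__name__)


def parse_strict_bool(raw, default=True):
    """Accept only case-insensitive 'true'/'false'; warn on anything else.

    Hypothetical helper. Returning `default` for invalid input is an assumption.
    """
    text = str(raw).lower()
    if text == "true":
        return True
    if text == "false":
        return False
    logger.warning("Invalid boolean %r; expected 'true' or 'false'", raw)
    return default


print(parse_strict_bool("TRUE"), parse_strict_bool("false"), parse_strict_bool("yes"))
```

Under this scheme YAML 1.1 spellings such as `yes` or `0` are treated as invalid rather than silently coerced.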

@turbomam
Member Author

SSSOM alignment — context from today's linkml-dev meeting

Nico suggested looking at how SSSOM handles multivalued field packing in TSV. Turns out the SSSOM ecosystem is actively debating the same problems we're solving here, literally today.

How SSSOM does it

From the SSSOM/TSV spec:

"Multi-valued slots MUST be serialised as a list of values separated by | characters."

  • No brackets — plain value1|value2|value3
  • Pipe is hardcoded in the spec, not configurable
  • sssom-py strips whitespace: [s.strip() for s in v.split("|")]
  • Schema-driven — checks multivalued: true to know which columns to split
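
The split-and-strip behaviour in the list above, together with the optional whitespace handling this PR adds in both directions, can be sketched in a few lines. `split_cell` and `join_values` are hypothetical names; the input and output cases mirror the ones in this PR's commit messages.

```python
def split_cell(cell, delimiter="|", strip_whitespace=True):
    """Loading direction: split a cell into values, optionally stripping each."""
    parts = cell.split(delimiter)
    return [part.strip() for part in parts] if strip_whitespace else parts


def join_values(values, delimiter="|", strip_whitespace=True):
    """Dumping direction: optionally strip each value, then join."""
    if strip_whitespace:
        values = [value.strip() for value in values]
    return delimiter.join(values)


print(split_cell("a | b"))
print(split_cell("a | b", strip_whitespace=False))
print(join_values(["dog   ", "cat"]))
print(join_values(["dog   ", "cat"], strip_whitespace=False))
```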

Active spec work happening right now

  • mapping-commons/sssom#507 (opened Feb 10) — proposes backslash escaping (\| for literal pipe) within multivalued values, targeting SSSOM 1.1
  • mapping-commons/sssom#429 — the underlying debate about whether to forbid pipe in values, escape it, or percent-encode it
  • gouttegd/sssom-java#17 — concrete Java implementation of the escape mechanism
  • mapping-commons/sssom#504 — Nico is promoting sssom-java as the reference implementation, so its escaping approach will likely set the standard

How PR #3134 compares

| Aspect | SSSOM | LinkML PR #3134 |
|---|---|---|
| Format | No brackets, plain a + pipe + b + pipe + c | Configurable: brackets by default, no brackets with list_syntax: plaintext |
| Delimiter | Pipe, fixed by spec | Configurable via list_delimiter annotation |
| Whitespace | strip() on each value | Configurable via list_strip_whitespace |
| Escaping | None in 1.0; backslash escaping proposed for 1.1 | None (same gap) |
| Schema-driven | Checks multivalued: true | Same, via SchemaView + enhance_configmap_for_multivalued_primitives() |

With list_syntax: plaintext and list_delimiter: "|", our output matches SSSOM's format exactly. LinkML generalizes what SSSOM hardcodes — which makes sense for a general-purpose modeling language where different schemas need different conventions (SSSOM uses pipe, MIxS has historically used semicolons, commas, etc.).

Escaping

Neither SSSOM (1.0) nor this PR handle delimiter-in-value escaping. SSSOM is actively working on backslash escaping for 1.1 (backslash-pipe for literal pipe, double-backslash for literal backslash). If/when that lands and we want to support it in LinkML, it would be a follow-up — the annotation-based configuration in this PR provides the right foundation to build on.
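
To make the proposal concrete, here is one way backslash escaping could round-trip. It is an illustrative sketch only: it assumes a single-character delimiter, the function names are hypothetical, and it is not taken from the SSSOM spec text, sssom-java, or this PR.

```python
def escape_value(value, delimiter="|"):
    """Escape backslashes first, then the delimiter, so the two never collide."""
    return value.replace("\\", "\\\\").replace(delimiter, "\\" + delimiter)


def join_escaped(values, delimiter="|"):
    """Serialize a list into one cell with escaped values."""
    return delimiter.join(escape_value(value, delimiter) for value in values)


def split_escaped(cell, delimiter="|"):
    """Split on unescaped delimiters, undoing the escapes along the way."""
    values, current, i = [], [], 0
    while i < len(cell):
        char = cell[i]
        if char == "\\" and i + 1 < len(cell):
            current.append(cell[i + 1])  # keep the escaped character literally
            i += 2
        elif char == delimiter:
            values.append("".join(current))  # unescaped delimiter ends a value
            current = []
            i += 1
        else:
            current.append(char)
            i += 1
    values.append("".join(current))
    return values


original = ["a|b", "c\\d", "e"]
cell = join_escaped(original)
print(cell, split_escaped(cell) == original)
```

A naive `cell.split("|")` would break this cell into four pieces, which is why the reader has to scan character by character.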

Related

  • Nico filed #2581 (the origin issue for this work), rating configurable delimiters as a "blocker"
  • SSSOM discussion #428 found 4 of 5 published SSSOM datasets on Zenodo don't follow the spec — reinforcing why good serialization tooling matters
  • SSSOM issue #491 notes SSSOM does not guarantee order preservation in multivalued slots (for RDF simplicity) — something to be aware of for LinkML's semantics

turbomam and others added 22 commits February 11, 2026 14:11
Add unit and integration tests for configurable multivalued field
delimiters in CSV/TSV serialization. Tests follow Chris Mungall's
design guidance: logic should be in loader/dumper files, not csvutils.py.

Tests include:
- Annotation reading (list_syntax, list_delimiter) via SchemaView
- Integration tests using personinfo.yaml with dynamic alias injection
- Parametrized tests for different delimiter configurations
- Edge case tests (empty lists, single values, delimiter in values)

All new tests are skipped pending implementation, following TDD approach.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add support for customizing multivalued field formatting in CSV/TSV
serialization via slot annotations:
- list_syntax: "python" (default, with brackets) or "plaintext" (no brackets)
- list_delimiter: custom delimiter between list items (default "|")

Implementation:
- Add _get_list_config_from_annotations() to read annotations from schema
- Add _enhance_configmap_for_multivalued_primitives() for plaintext mode
- Update loader and dumper to use annotation-derived configuration
- Logic is in loader/dumper files per Chris Mungall's guidance

Tests:
- Enable plaintext roundtrip tests (now passing)
- Enable custom delimiter tests for |, ;, and , (now passing)
- 16 tests passing, 14 skipped (edge cases for future work)

Documentation:
- Add "Customizing multivalued field formatting" section to docs/data/csvs.md
- Document list_syntax and list_delimiter annotations with examples

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
)

json-flattener's GlobalConfig applies the same csv_list_markers and
csv_inner_delimiter to all columns, so slot-level overrides don't make
sense. Simplified implementation to only read schema-level annotations.

Updated docs and tests to reflect this design constraint.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Ruff UP006/UP035: Use lowercase tuple instead of typing.Tuple

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Use ordinal for temp filename to avoid Windows reserved chars like |

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove TestCsvConfigFromAnnotations and test_list_syntax_to_markers
  (tested helper functions we never implemented - logic is in loader/dumper)
- Remove TestPersoninfoAliasesIntegration (used schema without annotations)
- Remove unused fixtures and inline schemas
- Update comments to reflect schema-level only design

Tests: 17 passed, 3 skipped (pre-existing issues outside PR scope)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Test edge cases for _get_list_config_from_annotations and
_enhance_configmap_for_multivalued_primitives:
- None schemaview returns defaults
- Schema without annotations returns defaults
- plaintext_mode=False returns original configmap

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add YAML→TSV conversion example using existing test files
- Add TSV→YAML conversion example showing plaintext parsing
- Use markdown table to show sample TSV data
- Update terminology to "python style (bracketed)"

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add list_strip_whitespace annotation (default true) to control
  whether whitespace around delimiters is stripped when loading
- Add CLI options to linkml-convert to override schema annotations:
  --list-syntax, --list-delimiter, --list-strip-whitespace
- Update documentation with new annotation and CLI options
- Add tests for whitespace stripping functionality

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Whitespace stripping now works for both loading and dumping
- On input: "a | b" → ['a', 'b'] (stripped) or ['a ', ' b'] (preserved)
- On output: ['dog   ', 'cat'] → "dog|cat" (stripped) or "dog   |cat" (preserved)
- Add tests for output whitespace stripping
- Update documentation to clarify bidirectional behavior

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Move inline schemas to module-level constants (SCHEMA_WITHOUT_ANNOTATIONS,
  SCHEMA_WHITESPACE_STRIP, SCHEMA_WHITESPACE_PRESERVE)
- Add make_delimiter_schema() factory for parametrized delimiter tests
- Move fixtures to module level for reusability
- Convert loop-based annotation value tests to @pytest.mark.parametrize

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Wrap pipe characters in double backticks in --list-syntax and
--list-delimiter help strings. This prevents Sphinx's sphinx-click
extension from interpreting them as RST substitution references,
which was causing the docs build to fail with warnings.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Pipe characters inside markdown table cells were being interpreted as
column delimiters, causing truncated content in the rendered HTML.
Escaped with backslash (\|) to render as literal pipes.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update CLI help and docs to say "when loading and dumping" (not just loading)
- Simplify CLI help text to avoid formatting issues with special characters

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Move annotations from slot-level to schema-level in test schema to match
  actual implementation behavior
- Remove unused variables from skipped test

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Tests for --list-syntax, --list-delimiter, and --list-strip-whitespace
options in linkml-convert CLI.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Warn users if they provide an invalid list_syntax annotation value
(e.g., typo like "plainetxt" instead of "plaintext").

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Convert test_csv_utils.py from hybrid unittest/pytest to pure pytest
- Convert class-based tests to plain functions in test_csv_tsv_loader_dumper.py
- Remove verbose comment blocks and chatty docstrings
- Rename helper functions to public (drop underscore prefix):
  get_list_config_from_annotations, enhance_configmap_for_multivalued_primitives,
  strip_whitespace_from_lists

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Mechanical conversion: drop class wrapper, remove self, remove
import unittest. No behavior change.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Only accept case-insensitive "true" or "false" for the
list_strip_whitespace annotation, with a warning for invalid values.
Aligns with the direction in #3144 to avoid YAML 1.1 boolean
conventions (yes/no, on/off, 0/1) in CSV-related configuration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@turbomam turbomam force-pushed the issue-3041-annotation-based-csv-delimiters branch from ec4030c to 99696b2 Compare February 11, 2026 19:12
@gouttegd
Contributor

Not strictly related to the issue of list delimiters, but since this is noted here:

SSSOM issue #491 notes SSSOM does not guarantee order preservation in multivalued slots (for RDF simplicity) — something to be aware of for LinkML's semantics

This is not a SSSOM-specific limitation. SSSOM does not guarantee the order of the values in a multi-valued slot, because LinkML itself does not guarantee that. The behaviour of SSSOM here was directly taken from the behaviour of the LinkML runtime.

The rules for RDF translations do not cover the case of multi-valued slots, so I don’t know what the intention was here, but in effect the LinkML runtime translates multi-valued slots as a simple unstructured list of triples (even if the slot is defined with list_elements_ordered: true). Such a translation cannot ensure that the order of values is preserved. So LinkML may preserve the order of values in all other formats, but as soon as you convert to or from RDF the order of values cannot be expected to be preserved. Ergo, more generally, LinkML does not guarantee that the order of values in multi-valued slots is preserved.
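
The point about unstructured triples can be illustrated without an RDF library. In this sketch a graph is modelled as a plain Python set of (subject, predicate, object) tuples, which is a simplification; the identifiers are made up.

```python
def slot_to_triples(subject, predicate, values):
    """Translate a multivalued slot into one triple per value (no list structure)."""
    return {(subject, predicate, value) for value in values}


first = slot_to_triples("ex:m1", "ex:author_label", ["Alice", "Bob"])
second = slot_to_triples("ex:m1", "ex:author_label", ["Bob", "Alice"])

# Both orderings yield the same set of triples, so the order is unrecoverable.
print(first == second)
```

Preserving order would need an explicit ordered structure in the graph instead of one triple per value.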

turbomam and others added 2 commits February 11, 2026 15:13
When enabled (via schema annotation or CLI flag), raises ValueError
before serializing if any multivalued field value contains the list
delimiter character. This catches round-trip corruption at write time
rather than silently producing corrupt output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

CSV/TSV loader does not split brackets-free, multivalued primitive slots (with pipe delimiter)

3 participants