Version: 1.1.0
Last Updated: 2025
- 1. Introduction
- 2. Design Philosophy
- 3. Lexical Structure
- 4. Data Types
- 5. Syntax Rules
- 6. Header Syntax
- 7. Data Line Syntax
- 8. Nested Structures
- 9. Escape Sequences
- 10. Comments
- 11. Complete Examples
- 12. Parsing Rules
- 13. Best Practices
- 14. Changelog
Object Record Table (ORT) is a CSV-like structured data format designed specifically for token optimization in Large Language Model (LLM) contexts. Unlike JSON and YAML, which repeat field names in every record, ORT declares field names once per section and prioritizes token efficiency while remaining readable.
- Token Efficiency: Minimize the number of tokens required to represent structured data
- Structural Clarity: Maintain clear relationships between data elements
- Native Support: Support objects and arrays as first-class data structures
- Simplicity: Keep the syntax simple and predictable
ORT is ideal for:
- Data interchange with LLMs
- Uniform data structures with multiple records
- Configuration files for AI applications
- Structured data logging
ORT is NOT ideal for:
- Heterogeneous data structures
- Direct application I/O operations
- Human-editable configuration files requiring comments throughout
ORT achieves token efficiency through:
- Header-based field definitions: Field names declared once in header
- Positional value mapping: Data lines contain only values
- Minimal delimiters: Using only essential punctuation
- No redundant whitespace: Compact representation
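The effect of declaring field names once can be illustrated with a minimal sketch. The helper `to_flat_ort` is hypothetical and assumes flat, uniform records whose values contain no special characters; it compares character counts only, not tokens.

```python
import json

def to_flat_ort(key, records):
    """Emit one header line, then one positional data line per record.

    Assumes flat, uniform records and values that need no escaping.
    """
    fields = list(records[0].keys())
    header = f"{key}:{','.join(fields)}:"
    lines = [",".join(str(record[f]) for f in fields) for record in records]
    return "\n".join([header] + lines)

records = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
ort_text = to_flat_ort("users", records)
json_text = json.dumps({"users": records})

print(ort_text)
print(len(ort_text), "characters as ORT vs", len(json_text), "as JSON")
```

The gap grows with the number of records, because JSON repeats every key on every record while the ORT header is written once.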
# ORT Format (110 characters, 35 tokens)
users:id,profile(name,age,address(city,country)):
1,(John Doe,30,(New York,USA))
2,(Jane Smith,25,(London,UK))
// JSON Format (398 characters, 118 tokens)
{
"users": [
{
"id": 1,
"profile": {
"name": "John Doe",
"age": 30,
"address": {
"city": "New York",
"country": "USA"
}
}
},
{
"id": 2,
"profile": {
"name": "Jane Smith",
"age": 25,
"address": {
"city": "London",
"country": "UK"
}
}
}
]
}
ORT uses UTF-8 encoding and supports the full Unicode character set.
- Unix style: `\n` (LF)
- Windows style: `\r\n` (CRLF)
- Both are supported and normalized during parsing
- Spaces and tabs are trimmed from line beginnings and endings
- Whitespace within values is preserved
- Empty lines are ignored
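The line-handling rules above can be sketched as a small preprocessing step. The function name `logical_lines` is an assumption made for illustration, not part of any ORT implementation.

```python
def logical_lines(text):
    """Normalize CRLF to LF, trim spaces and tabs at line edges, drop empty lines."""
    normalized = text.replace("\r\n", "\n")
    lines = []
    for raw in normalized.split("\n"):
        line = raw.strip(" \t")
        if line:
            lines.append(line)
    return lines

print(logical_lines("users:id,name:\r\n  1,Alice\t\r\n\r\n2,Bob\n"))
```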
The following characters have special meaning in ORT:
| Character | Purpose | Escape Required |
|---|---|---|
| `:` | Header delimiter | No (only in headers) |
| `,` | Value separator | Yes (in values) |
| `(` | Object/nested field start | Yes (in string values) |
| `)` | Object/nested field end | Yes (in string values) |
| `[` | Array start | Yes (in string values) |
| `]` | Array end | Yes (in string values) |
| `\` | Escape character | Yes (always `\\`) |
| `#` | Comment marker | No (only at line start) |
ORT supports six primitive and composite data types:
Represents absence of a value.
Syntax: Empty string or no value between delimiters
Examples:
users:id,name,email:
1,John,
2,Jane,jane@example.com
JSON Equivalent:
{
"users": [
{"id": 1, "name": "John", "email": null},
{"id": 2, "name": "Jane", "email": "jane@example.com"}
]
}
Boolean values representing true or false.
Syntax:
- `true` - Boolean true
- `false` - Boolean false
Case Sensitivity: Lowercase only
Examples:
settings:enabled,verified:
true,false
false,true
JSON Equivalent:
{
"settings": [
{"enabled": true, "verified": false},
{"enabled": false, "verified": true}
]
}
Numeric values including integers and floating-point numbers.
Syntax:
- Integer: `42`, `-17`, `0`
- Float: `3.14`, `-0.5`, `999.99`
- Scientific notation: NOT supported in current version
Range:
- Integers: 64-bit signed (-2^63 to 2^63-1)
- Floats: 64-bit IEEE 754 double precision
Examples:
products:id,price:
101,999.99
102,29.99
103,79.99
Special Cases:
- Leading zeros are preserved for strings: `007` → `"007"`
- Pure numbers are parsed as numbers: `007` → `7`
- To force string interpretation, use escape: `\007` → `"007"` (if not parseable as number)
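A minimal sketch of the numeric syntax described above. The regular expression is an assumption: it accepts an optional minus sign, digits, and an optional fractional part, and rejects scientific notation. The 64-bit integer range is not enforced here, since Python integers are unbounded.

```python
import re

# Assumed grammar: optional '-', digits, optional '.digits'. No exponent part.
NUMBER_RE = re.compile(r"-?\d+(\.\d+)?")

def parse_number(text):
    """Return an int or float for numeric text, or None if the text is not numeric."""
    if NUMBER_RE.fullmatch(text) is None:
        return None
    return float(text) if "." in text else int(text)

for sample in ["42", "-0.5", "007", "1e5", "12abc"]:
    print(sample, "->", parse_number(sample))
```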
UTF-8 encoded text values.
Syntax: Raw text without quotes
Characteristics:
- No surrounding quotes required
- Whitespace is trimmed from beginning and end
- Internal whitespace is preserved
- Special characters must be escaped
Examples:
users:id,name:
1,John Doe
2,Jane Smith
Trimming Behavior:
data:value:
hello world
Parsed as: "hello world" (leading/trailing spaces removed)
Ordered collection of values.
Syntax: [value1,value2,value3]
Characteristics:
- Square brackets `[]` delimit arrays
- Values separated by commas
- Can contain any data type
- Can be nested
- Empty arrays: `[]`
Examples:
Simple Array:
colors:
[red,green,blue,yellow]
Nested Array:
matrix:
[[1,2,3],[4,5,6],[7,8,9]]
Mixed Type Array:
data:
[42,hello world,true,(id:100,active:false),[1,2,3]]
Array Field:
users:id,tags:
1,[admin,user]
2,[]
3,[guest]
Unordered collection of key-value pairs.
Syntax:
- Inline: `(key1:value1,key2:value2)`
- Header-based: Defined in header, values in data lines
Characteristics:
- Parentheses `()` delimit inline objects
- Key-value pairs separated by colons `:`
- Pairs separated by commas
- Can be nested
- Empty objects: `()`
Examples:
Inline Object:
data:
[(id:1,name:Alice),(id:2,name:Bob)]
Header-based Object (Preferred):
users:id,name:
1,Alice
2,Bob
Nested Object:
users:id,profile(name,age):
1,(Alice,30)
2,(Bob,25)
An ORT document consists of one or more sections. Each section has:
- Header Line: Defines structure and field names
- Data Lines: Contains actual values
Syntax: keyName:field1,field2,...:
Usage: Creating named arrays in the root object
Example:
users:id,name:
1,Alice
2,Bob
Result:
{
"users": [
{"id": 1, "name": "Alice"},
{"id": 2, "name": "Bob"}
]
}
Syntax: :field1,field2,...:
Usage: Creating root-level objects or arrays
Single Object:
:id,name,email:
1001,Alice Williams,alice@example.com
Result:
{
"id": 1001,
"name": "Alice Williams",
"email": "alice@example.com"
}
Multiple Objects (Array):
:id,name:
1,Alice
2,Bob
Result:
[
{"id": 1, "name": "Alice"},
{"id": 2, "name": "Bob"}
]
General Form: [keyName]:field1,field2,...:
Components:
- Optional Key Name: Identifier for the data section
- Colon: Separates key name from fields
- Field List: Comma-separated field names
- Trailing Colon: Marks end of header
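The components above can be separated with a short sketch. `split_header` is a hypothetical helper; it assumes the caller already knows the line is a header, and it treats a bare `keyName:` line (as in the `colors:` examples) as a header with no field list.

```python
def split_header(line):
    """Split a header line into (key_name, field_text).

    key_name is None when the key is omitted; field_text is None when the
    header carries no field list.
    """
    if not line.endswith(":"):
        raise ValueError(f"header must end with a colon: {line!r}")
    key, _, rest = line.partition(":")
    field_text = rest[:-1] if rest else None  # drop the trailing colon
    return (key or None), field_text

print(split_header("users:id,name:"))
print(split_header(":id,name:"))
print(split_header("colors:"))
```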
Rules:
- Must be valid identifiers
- Case-sensitive
- Can contain letters, numbers, underscores
- Cannot start with a number (by convention)
- No spaces allowed
Valid Examples:
- `id`
- `firstName`
- `user_name`
- `item2`
- `_private`
Invalid Examples:
- `first name` (contains space)
- `2ndItem` (starts with number - technically allowed but not recommended)
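These naming rules can be checked with a small sketch. It assumes "letters" means ASCII letters, and the three-way result (`ok`, `discouraged`, `invalid`) is an illustrative choice rather than anything the specification defines.

```python
import re

# Assumption: field names use ASCII letters, digits, and underscores only.
FIELD_NAME_RE = re.compile(r"[A-Za-z0-9_]+")

def check_field_name(name):
    """Classify a candidate field name as 'ok', 'discouraged', or 'invalid'."""
    if FIELD_NAME_RE.fullmatch(name) is None:
        return "invalid"
    if name[0].isdigit():
        return "discouraged"  # allowed, but starts with a number
    return "ok"

for name in ["firstName", "_private", "first name", "2ndItem"]:
    print(name, "->", check_field_name(name))
```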
Nested fields represent object structures.
Syntax: fieldName(nestedField1,nestedField2,...)
Example:
users:id,profile(name,age,email):
1,(John Doe,30,john@example.com)
2,(Jane Smith,25,jane@example.com)
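A field list with nested fields can be parsed recursively. The sketch below is one possible approach, not the reference parser: `parse_fields` and the `(name, children)` tuple representation are assumptions, and field names are not validated.

```python
def _parse_field_list(text, pos):
    """Parse comma-separated fields starting at pos; stop at ')' or end of text."""
    fields = []
    while True:
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        name = text[start:pos].strip()
        children = None
        if pos < len(text) and text[pos] == "(":
            children, pos = _parse_field_list(text, pos + 1)
            if pos >= len(text) or text[pos] != ")":
                raise ValueError("unbalanced parentheses in header")
            pos += 1  # skip the closing ')'
        fields.append((name, children))
        if pos < len(text) and text[pos] == ",":
            pos += 1
            continue
        return fields, pos

def parse_fields(text):
    """Turn 'id,profile(name,age)' into (name, children) pairs; children is None for plain fields."""
    fields, pos = _parse_field_list(text, 0)
    if pos != len(text):
        raise ValueError(f"unexpected ')' at position {pos}")
    return fields

print(parse_fields("id,profile(name,age,address(city,country))"))
```

Because the function recurses on every opening parenthesis, the same code handles any nesting depth.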
Nesting can be arbitrarily deep.
Example:
users:id,profile(name,age,address(city,country)):
1,(John Doe,30,(New York,USA))
2,(Jane Smith,25,(London,UK))
JSON Equivalent:
{
"users": [
{
"id": 1,
"profile": {
"name": "John Doe",
"age": 30,
"address": {
"city": "New York",
"country": "USA"
}
}
},
{
"id": 2,
"profile": {
"name": "Jane Smith",
"age": 25,
"address": {
"city": "London",
"country": "UK"
}
}
}
]
}
Data lines contain comma-separated values corresponding to fields in the header.
Rules:
- Values must match header field order
- Number of values must equal number of fields
- Values are separated by commas
- Leading/trailing whitespace is trimmed
Example:
users:id,name,age:
1,Alice,30
2,Bob,25
Values are parsed in the following order:
- Empty string → `null`
- `[]` → Empty array
- `()` → Empty object
- `[...]` → Array
- `(...)` → Object (if contains `:`) or Nested object (if in nested field)
- Numeric string → Number (if parseable)
- `true`/`false` → Boolean
- Everything else → String
For fields defined with nested structure in header, values should be wrapped in parentheses.
Example:
users:id,profile(name,age):
1,(Alice,30)
2,(Bob,25)
Invalid:
users:id,profile(name,age):
1,Alice,30 # WRONG: Not wrapped in parentheses
Note (v1.1.0+): The parser now supports dynamic field recognition. If a value doesn't match the expected nested format (e.g., an array [...] instead of an object (...)), the parser will treat it as a regular value instead of throwing an error. This allows for more flexible data structures while maintaining backward compatibility.
Multiple data lines create an array of objects.
Example:
users:id,name:
1,Alice
2,Bob
3,Charlie
Result:
{
"users": [
{"id": 1, "name": "Alice"},
{"id": 2, "name": "Bob"},
{"id": 3, "name": "Charlie"}
]
}
Header Definition:
users:id,profile(name,contact(email,phone)):
Data:
1,(John,(john@example.com,555-1234))
Result:
{
"users": [{
"id": 1,
"profile": {
"name": "John",
"contact": {
"email": "john@example.com",
"phone": "555-1234"
}
}
}]
}
Example:
users:id,name,tags:
1,Alice,[admin,user]
2,Bob,[guest]
3,Charlie,[]
Inline Objects:
data:
[(id:1,name:Alice),(id:2,name:Bob)]
Result:
{
"data": [
{"id": 1, "name": "Alice"},
{"id": 2, "name": "Bob"}
]
}
Combining all nesting types:
records:id,data(values,metadata(tags,settings(options))):
1,([1,2,3],([dev,test],((verbose:true,debug:false))))
Result:
{
"records": [{
"id": 1,
"data": {
"values": [1, 2, 3],
"metadata": {
"tags": ["dev", "test"],
"settings": {
"options": {
"verbose": true,
"debug": false
}
}
}
}
}]
}
Escape sequences allow special characters to be included in string values.
| Escape Sequence | Character | Description |
|---|---|---|
| `\\` | `\` | Backslash |
| `\,` | `,` | Comma |
| `\(` | `(` | Left parenthesis |
| `\)` | `)` | Right parenthesis |
| `\[` | `[` | Left square bracket |
| `\]` | `]` | Right square bracket |
| `\n` | Line Feed | Newline |
| `\t` | Tab | Horizontal tab |
| `\r` | Carriage Return | Carriage return |
messages:id,text:
1,\(Hello\, World!\)
2,Price: $99\,99
3,Use backslash: \\
4,Array syntax: \[1\,2\,3\]
Result:
{
"messages": [
{"id": 1, "text": "(Hello, World!)"},
{"id": 2, "text": "Price: $99,99"},
{"id": 3, "text": "Use backslash: \\"},
{"id": 4, "text": "Array syntax: [1,2,3]"}
]
}
texts:id,content:
1,First line\nSecond line\nThird line
2,Name:\tJohn\nAge:\t30
Result:
{
"texts": [
{"id": 1, "content": "First line\nSecond line\nThird line"},
{"id": 2, "content": "Name:\tJohn\nAge:\t30"}
]
}
Processing Rules:
- Backslash followed by recognized character → Replace with escaped character
- Backslash followed by unrecognized character → Keep the character, remove backslash
- Backslash at end of string → Keep backslash
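The three processing rules can be sketched as a single left-to-right pass. The `ESCAPES` table mirrors the escape sequence table in this section; the function name `unescape` matches the pseudocode used later in this document, but the body here is only an illustrative implementation.

```python
ESCAPES = {
    "\\": "\\", ",": ",", "(": "(", ")": ")", "[": "[", "]": "]",
    "n": "\n", "t": "\t", "r": "\r",
}

def unescape(text):
    """Apply the escape rules in one left-to-right pass."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            # Recognized escape: mapped character. Unrecognized: the character itself.
            out.append(ESCAPES.get(nxt, nxt))
            i += 2
        else:
            # Ordinary character, or a backslash at the end of the string (kept).
            out.append(ch)
            i += 1
    return "".join(out)

print(unescape(r"\(Hello\, World!\)"))
print(unescape(r"Array syntax: \[1\,2\,3\]"))
```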
Comments start with # at the beginning of a line.
Characteristics:
- Line comments only (no inline comments)
- `#` must be the first non-whitespace character
- Everything after `#` to end of line is ignored
- Comments can appear anywhere in the document
# This is a comment
users:id,name: # This is NOT a comment (not at line start)
1,Alice
# Another comment
2,Bob
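Comment handling as shown above could be sketched like this. `is_comment` and `drop_comments` are hypothetical names; a `#` later in a line is left untouched, matching the "no inline comments" rule.

```python
def is_comment(line):
    """A line is a comment when '#' is its first non-whitespace character."""
    return line.lstrip(" \t").startswith("#")

def drop_comments(lines):
    """Keep only the lines that are not comments."""
    return [line for line in lines if not is_comment(line)]

sample = [
    "# This is a comment",
    "users:id,name: # This is NOT a comment (not at line start)",
    "1,Alice",
    "  # Another comment",
    "2,Bob",
]
print(drop_comments(sample))
```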
# User Database
# Format: ID, Name, Email, Active Status
# Last Updated: 2025-01-15
users:id,name,email,active:
1001,Alice Williams,alice@example.com,true
1002,Bob Johnson,bob@example.com,false
users:age,id,name:
30,1,John Doe
25,2,Jane Smith
35,3,Bob Johnson
colors:
[red,green,blue,yellow]
:id,name,email,active:
1001,Alice Williams,alice@example.com,true
users:id,profile(name,age,address(city,country)):
1,(John Doe,30,(New York,USA))
2,(Jane Smith,25,(London,UK))
matrix:
[[1,2,3],[4,5,6],[7,8,9]]
data:
[42,hello world,true,(id:100,active:false),[1,2,3]]
products:id,name,price:
101,Laptop,999.99
102,Mouse,29.99
103,Keyboard,79.99
categories:id,name:
1,Electronics
2,Accessories
records:id,name,email,tags:
1,John Doe,,[]
2,Jane Smith,jane@example.com,()
3,Bob,bob@example.com,[admin,user]
messages:id,text:
1,\(Hello\, World!\)
2,Price: $99\,99
3,Use backslash: \\
4,Array syntax: \[1\,2\,3\]
texts:id,content:
1,First line\nSecond line\nThird line
2,Name:\tJohn\nAge:\t30
3,Multi\nLine\nText
settings:id,feature,enabled,verified:
1,notifications,true,false
2,dark_mode,false,true
3,auto_save,true,true
Values are parsed using the following algorithm:
function parse_value(string):
    trimmed = trim(string)
    if trimmed is empty:
        return null
    if trimmed == "[]":
        return empty_array
    if trimmed == "()":
        return empty_object
    if trimmed starts with '[' and ends with ']':
        return parse_array(trimmed)
    if trimmed starts with '(' and ends with ')':
        if contains ':' at depth 0:
            return parse_inline_object(trimmed)
        else:
            return parse_nested_object(trimmed)
    unescaped = unescape(trimmed)
    if unescaped is valid number:
        return number
    if unescaped == "true":
        return true
    if unescaped == "false":
        return false
    return string
When parsing values, track nesting depth to correctly identify delimiters:
depth = 0
bracket_depth = 0
for each character:
    if character == '(':
        depth++
    else if character == ')':
        depth--
    else if character == '[':
        bracket_depth++
    else if character == ']':
        bracket_depth--
    else if character == ',' and depth == 0 and bracket_depth == 0:
        # This comma is a value separator
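The value-parsing and depth-tracking pseudocode above can be combined into one runnable sketch. Several details are assumptions rather than specified behavior: escaped characters are skipped while splitting (the pseudocode does not show this, but an escaped comma would otherwise split a value), positional groups such as `(Alice,30)` are returned as tuples because mapping them to field names requires the header, and a parenthesized group counts as an inline object when any item has a top-level colon.

```python
import re

ESCAPES = {"n": "\n", "t": "\t", "r": "\r"}
NUMBER_RE = re.compile(r"-?\d+(\.\d+)?")

def unescape(text):
    # Mapped escapes come from ESCAPES; any other escaped character stands for itself.
    return re.sub(r"\\(.)", lambda m: ESCAPES.get(m.group(1), m.group(1)), text, flags=re.S)

def split_top_level(text, sep=","):
    """Split on sep only where both depth counters are zero."""
    parts, current = [], []
    depth = bracket_depth = 0
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            current.append(text[i:i + 2])  # keep the escape pair for unescape()
            i += 2
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "[":
            bracket_depth += 1
        elif ch == "]":
            bracket_depth -= 1
        if ch == sep and depth == 0 and bracket_depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    parts.append("".join(current))
    return parts

def parse_value(text):
    trimmed = text.strip(" \t")
    if trimmed == "":
        return None
    if trimmed == "[]":
        return []
    if trimmed == "()":
        return {}
    if trimmed.startswith("[") and trimmed.endswith("]"):
        return [parse_value(item) for item in split_top_level(trimmed[1:-1])]
    if trimmed.startswith("(") and trimmed.endswith(")"):
        items = split_top_level(trimmed[1:-1])
        pairs = [split_top_level(item, ":") for item in items]
        if any(len(pair) > 1 for pair in pairs):
            # Inline object: text before the first top-level colon is the key.
            return {pair[0].strip(): parse_value(":".join(pair[1:])) for pair in pairs}
        return tuple(parse_value(item) for item in items)  # positional group
    unescaped = unescape(trimmed)
    if NUMBER_RE.fullmatch(unescaped):
        return float(unescaped) if "." in unescaped else int(unescaped)
    if unescaped == "true":
        return True
    if unescaped == "false":
        return False
    return unescaped

print(parse_value("[42,hello world,true,(id:100,active:false),[1,2,3]]"))
```

Number detection runs before the boolean check, matching the order in the pseudocode.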
Rule: Number of values in data line must exactly match number of fields in header.
Example Error:
users:id,name,age:
1,Alice # ERROR: Expected 3 values, got 2
Rule: For nested fields, values should be wrapped in parentheses and contain correct number of nested values.
Example Error:
users:id,profile(name,age):
1,(Alice) # ERROR: Expected 2 nested values, got 1
Dynamic Field Recognition (v1.1.0+):
When a nested field receives a value that doesn't match the expected format, the parser will attempt to parse it as a regular value:
users:id,profile(name,age):
1,[x,y,z] # Parsed as array instead of nested object
This behavior allows the parser to handle non-uniform data structures where the same field may contain different types across records. The generator will detect such non-uniform arrays and output them using inline object format instead of tabular format.
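The validation rules and the fallback described above can be sketched together. Everything about the data shapes here is an assumption: header fields are `(name, children)` pairs, and already-parsed positional groups are Python tuples, so a nested field that receives a list is simply kept as a regular value.

```python
def build_record(fields, values):
    """Map positional values onto header fields, recursing into nested fields.

    fields: list of (name, children) pairs; children is None for plain fields.
    values: sequence of already-parsed values, with positional groups as tuples.
    """
    if len(values) != len(fields):
        raise ValueError(f"Expected {len(fields)} values but got {len(values)}")
    record = {}
    for (name, children), value in zip(fields, values):
        if children is not None and isinstance(value, tuple):
            record[name] = build_record(children, value)
        else:
            # Plain field, or a nested field holding another type (v1.1.0+ fallback).
            record[name] = value
    return record

fields = [("id", None), ("profile", [("name", None), ("age", None)])]
print(build_record(fields, (1, ("Alice", 30))))
print(build_record(fields, (1, ["x", "y", "z"])))
```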
Good Use Cases:
- Uniform data structures (same fields across records)
- Large datasets for LLM consumption
- Token-optimized data transfer
- Structured logging
Poor Use Cases:
- Heterogeneous data (different fields per record)
- Direct application configuration
- Human-primary editing scenarios
- Data with frequent schema changes
Field Names:
- Use camelCase or snake_case consistently
- Keep names concise but descriptive
- Avoid abbreviations unless widely understood
Examples:
# Good
users:id,firstName,lastName,emailAddress:
# Acceptable
users:id,first_name,last_name,email_address:
# Avoid
users:i,fn,ln,ea: # Too cryptic
Prefer Flat Over Nested (when reasonable):
# Better for token efficiency
users:id,name,city,country:
1,John,New York,USA
# More nested, but more tokens
users:id,name,address(city,country):
1,John,(New York,USA)
Use Nesting for Logical Grouping:
# Good use of nesting
users:id,profile(name,age,email),settings(theme,language):
Group Related Sections:
# Good: Related data together
users:id,name:
1,Alice
2,Bob
user_roles:user_id,role:
1,admin
2,user
Use Comments for Clarity:
# User Master Data
users:id,name,email:
1,Alice,alice@example.com
# User Permissions
permissions:user_id,resource,access:
1,/admin,read-write
Always Validate:
- Field count matches value count
- Nested structures are properly formed
- Escape sequences are valid
- Data types are appropriate
Provide Clear Error Messages:
Line 5: Expected 3 values but got 2
1,Alice
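An error in the style shown above could be produced by a check like the following. `check_value_count` and its parameters are hypothetical and serve only to show how the line number and the offending content might be carried into the message.

```python
def check_value_count(line_number, raw_line, fields, values):
    """Raise a ValueError naming the line, the mismatch, and the offending content."""
    if len(values) != len(fields):
        raise ValueError(
            f"Line {line_number}: Expected {len(fields)} values but got {len(values)}\n"
            f"  {raw_line}"
        )

try:
    check_value_count(5, "1,Alice", ["id", "name", "age"], ["1", "Alice"])
except ValueError as err:
    print(err)
```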
For Large Datasets:
- Stream parsing when possible
- Validate headers before processing data
- Use appropriate buffer sizes
- Consider memory constraints for nested structures
Token Optimization:
- Minimize field name lengths (while maintaining clarity)
- Use flat structures when appropriate
- Avoid redundant nesting
| ORT Value | Parsed Type | Notes |
|---|---|---|
| (empty) | null | Empty string |
| `true` | boolean | Lowercase only |
| `false` | boolean | Lowercase only |
| `42` | number | Integer |
| `3.14` | number | Float |
| `-17` | number | Negative number |
| `hello` | string | Raw text |
| `[]` | array | Empty array |
| `[1,2,3]` | array | Array of numbers |
| `()` | object | Empty object |
| `(a:1,b:2)` | object | Inline object |
| `(val1,val2)` | object | Nested object (context-dependent) |
ORT documents must be encoded in UTF-8. Parsers should:
- Accept UTF-8 with or without BOM
- Reject invalid UTF-8 sequences
- Preserve Unicode characters in string values
- Handle surrogate pairs correctly
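In Python, the first two requirements can be met with the standard `utf-8-sig` codec, which strips a leading BOM when present and raises on invalid byte sequences under the default strict error handling. The wrapper name `decode_ort` is an assumption made for this sketch.

```python
def decode_ort(data: bytes) -> str:
    """Decode ORT bytes as UTF-8, accepting an optional BOM and rejecting invalid input."""
    # 'utf-8-sig' removes a leading BOM if present; strict mode raises UnicodeDecodeError.
    return data.decode("utf-8-sig")

print(decode_ort(b"\xef\xbb\xbfusers:id:\n1"))
print(decode_ort("users:id,name:\n1,Zoë".encode("utf-8")))
```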
A compliant ORT parser must:
- Support all data types defined in Section 4
- Implement escape sequence processing (Section 9.2)
- Validate field/value count matching (Section 13.3)
- Handle arbitrary nesting depth (Section 8)
- Ignore comments and empty lines (Section 10)
A compliant ORT generator must:
- Escape special characters in string values
- Generate valid headers for object arrays
- Maintain field order consistency
- Output minimal whitespace
- Use UTF-8 encoding
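The escaping requirement for generators can be sketched as the inverse of the escape sequence table. `escape_string` is a hypothetical helper; it does not address strings that would otherwise be read back as numbers, booleans, or null.

```python
# Each special character maps to its escape sequence from the escape table.
SPECIAL = {
    "\\": "\\\\", ",": "\\,", "(": "\\(", ")": "\\)", "[": "\\[", "]": "\\]",
    "\n": "\\n", "\t": "\\t", "\r": "\\r",
}

def escape_string(value):
    """Escape every special character in a string value, one character at a time."""
    return "".join(SPECIAL.get(ch, ch) for ch in value)

print(escape_string("(Hello, World!)"))
print(escape_string("Array syntax: [1,2,3]"))
```

Escaping character by character means a backslash in the input is doubled exactly once and is never re-escaped.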
Implementations should provide clear error messages including:
- Line number
- Problematic content
- Description of the error
- Suggested fix (when applicable)
- ORT GitHub Repository
- ORT Playground
- TOON Format - Inspiration for ORT
Release Date: 2025
- Dynamic Field Recognition in Parser
  - Parser now handles cases where a field is defined as nested in the header but receives a different value type
  - Arrays can now be parsed even when the header defines a nested object structure
  - Fallback parsing for values that don't match the expected nested format
- Improved Uniform Array Detection in Generator
  - Generator now checks both key names AND value types when determining if an array is uniform
  - Arrays with same keys but different value types (e.g., object vs array) are now correctly identified as non-uniform
  - Non-uniform arrays are generated using inline object format instead of tabular format
- Fixed parsing error "Expected nested object in parentheses" when array values appear in nested field positions
- Fixed parsing error "Expected X values but got Y" for non-uniform object arrays
JSON with non-uniform array:
{
"test": [
{ "input": { "pairs": [["a","b"],["c,d","e:f",true]] } },
{ "input": ["x", "y", "true", true, 10] }
]
}
Previous behavior (v1.0.1): Generated invalid ORT that couldn't be parsed back
New behavior (v1.1.0): Generates valid inline object format
test:
[(input:(pairs:[[a,b],[c\,d,e:f,true]])),(input:[x,y,true,true,10])]
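A uniformity check along the lines described for v1.1.0 might look like the sketch below. It is an approximation under stated assumptions: value types are reduced to three coarse kinds (object, array, scalar), key order is ignored, and nested objects are not compared recursively. The actual generator may apply different rules.

```python
def kind(value):
    """Coarse value type used for comparison (assumed granularity)."""
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    return "scalar"

def is_uniform(records):
    """True when every record is an object with the same keys and the same value kinds."""
    if not records or not all(isinstance(r, dict) for r in records):
        return False
    signature = {key: kind(value) for key, value in records[0].items()}
    return all(
        {key: kind(value) for key, value in record.items()} == signature
        for record in records[1:]
    )

uniform = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
mixed = [
    {"input": {"pairs": [["a", "b"], ["c,d", "e:f", True]]}},
    {"input": ["x", "y", "true", True, 10]},
]
print(is_uniform(uniform), is_uniform(mixed))
```

Under these assumptions, the `test` array from the example above is non-uniform because `input` is an object in one record and an array in the other, so a generator would fall back to the inline object format.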
Release Date: 2025
- Initial stable release
- Full support for all data types (null, boolean, number, string, array, object)
- Nested field syntax in headers
- Escape sequence support
- Multi-language implementations (Rust, TypeScript, Python)
End of Specification