senojj/hl7

hl7

A Go library for parsing, transforming, and validating HL7 version 2.x messages in ER7 (pipe-delimited) format.

Zero external dependencies. Requires Go 1.23+.

msg, _ := hl7.ParseMessage(rawBytes)

fmt.Println(msg.Get("MSH-9.1"))  // "ADT"
fmt.Println(msg.Get("PID-5.1"))  // "Smith"

When to use this library

This library is designed for applications that route, filter, validate, or modify HL7v2 messages — integration engines, message brokers, audit loggers, and similar infrastructure. It parses structure (segments, fields, components) and optionally validates messages against user-provided schemas.

Choose this library when:

  • You need to read a few fields from each message and forward the rest untouched. Parsing is lazy: accessing MSH-9 never touches OBX segments.
  • You process high-throughput message streams and need predictable, low-allocation performance.
  • You want a single dependency-free package rather than a framework with code-generated message types.
  • You need to transform messages (change field values, move data between fields, convert delimiters) without building a full serialization layer.
  • You need to validate messages against custom schemas — checking segment structure, field presence, data type formats, and coded table values.

Choose a different library when:

  • You want strongly typed message structures with named fields for specific trigger events.

Parsing

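ParseMessage copies the input bytes and splits the message into segments. This is the only step that allocates during structural parsing. All deeper access (fields, components, subcomponents) scans raw bytes on demand with no caching. Converting a value to a string can still allocate when escape sequences have to be resolved (see Performance).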

raw := []byte("MSH|^~\\&|SEND|FAC|RECV|FAC|202401011200||ADT^A01|MSG001|P|2.5.1\rPID|1||12345^^^MRN||Smith^John||19800101|M")

msg, err := hl7.ParseMessage(raw)
if err != nil {
    log.Fatal(err)
}

Terser-style access

The Get method accepts location strings in the format SEG-Field.Component.SubComponent:

msg.Get("MSH-9")      // "ADT^A01"  (full field, unescaped)
msg.Get("MSH-9.1")    // "ADT"      (first component)
msg.Get("MSH-9.2")    // "A01"      (second component)
msg.Get("PID-5.1")    // "Smith"    (family name)
msg.Get("PID-3.1")    // "12345"    (patient ID)
msg.Get("PID-3.1.1")  // "12345"    (first subcomponent)

Segment occurrence and repetition indices are supported:

msg.Get("OBX(0)-5")   // first OBX, observation value
msg.Get("OBX(1)-5")   // second OBX, observation value
msg.Get("PID-3[0].1") // first repetition of PID-3, component 1
msg.Get("PID-3[1].1") // second repetition of PID-3, component 1

Missing values return an empty string — no error checking needed for chained reads.

Location parsing

Location strings can be parsed into a Location struct and converted back:

loc, err := hl7.ParseLocation("PID-3[1].4.2")
// loc.Segment = "PID", loc.Field = 3, loc.Repetition = 1,
// loc.Component = 4, loc.SubComponent = 2

fmt.Println(loc.String()) // "PID-3[1].4.2"
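To make the location grammar concrete, here is a minimal standalone sketch that parses the same syntax with a regular expression. It uses only the standard library and is not the library's implementation: the `location` type, the `SegmentIndex` field name, and `parseLocation` are hypothetical names chosen for illustration, and the sketch assumes a field number is always present.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// location is a hypothetical stand-in for the library's Location struct.
// A zero in Component or SubComponent means "not specified".
type location struct {
	Segment      string
	SegmentIndex int // OBX(1) -> 1
	Field        int
	Repetition   int // PID-3[1] -> 1
	Component    int
	SubComponent int
}

// locationPattern matches SEG(occurrence)-Field[repetition].Component.SubComponent.
// Only the segment name and field number are mandatory in this sketch.
var locationPattern = regexp.MustCompile(
	`^([A-Z][A-Z0-9]{2})(?:\((\d+)\))?-(\d+)(?:\[(\d+)\])?(?:\.(\d+))?(?:\.(\d+))?$`)

func parseLocation(s string) (location, error) {
	m := locationPattern.FindStringSubmatch(s)
	if m == nil {
		return location{}, fmt.Errorf("invalid location %q", s)
	}
	// Optional groups that did not match are empty strings; Atoi then
	// fails and n stays 0, which is the "not specified" value here.
	num := func(v string) int {
		n, _ := strconv.Atoi(v)
		return n
	}
	return location{
		Segment:      m[1],
		SegmentIndex: num(m[2]),
		Field:        num(m[3]),
		Repetition:   num(m[4]),
		Component:    num(m[5]),
		SubComponent: num(m[6]),
	}, nil
}

func main() {
	for _, s := range []string{"PID-3[1].4.2", "OBX(1)-5", "MSH-9.1", "PID"} {
		loc, err := parseLocation(s)
		fmt.Printf("%-14s %+v err=%v\n", s, loc, err)
	}
}
```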

Structural access

For iteration or when you need more control, walk the type hierarchy directly:

for _, seg := range msg.Segments() {
    fmt.Printf("%-3s  %d fields\n", seg.Type(), seg.FieldCount())
}

pid := msg.Segments()[1]
name := pid.Field(5)                      // Field is 0-indexed (0 = segment type)
fmt.Println(name.Rep(0).Component(1))     // Component is 1-indexed per HL7 convention
fmt.Println(name.RepetitionCount())

Null vs empty

HL7 distinguishes between omitted fields (||) and explicitly null fields (|""|):

f := seg.Field(7)
f.IsEmpty()    // true if field was omitted
f.IsNull()     // true if field is the HL7 null value ""
f.HasValue()   // true if neither empty nor null
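The distinction comes down to the raw bytes between two field separators. The following standalone sketch shows one way to classify them; `fieldState` and `classify` are hypothetical names, not part of the library.

```go
package main

import "fmt"

// fieldState classifies the raw bytes found between two field separators.
type fieldState int

const (
	stateEmpty fieldState = iota // || : nothing was sent
	stateNull                    // |""| : explicit HL7 null
	stateValue                   // anything else
)

func classify(raw []byte) fieldState {
	switch {
	case len(raw) == 0:
		return stateEmpty
	case len(raw) == 2 && raw[0] == '"' && raw[1] == '"':
		return stateNull
	default:
		return stateValue
	}
}

func main() {
	for _, raw := range []string{"", `""`, "19800101"} {
		fmt.Printf("%-10q -> %d\n", raw, classify([]byte(raw)))
	}
}
```

In HL7 practice the difference usually matters on update: an omitted field is read as "no change", while the null value asks the receiver to clear whatever it has stored.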

Transforming

Transform applies changes to a message and returns a new *Message. The original is never modified.

result, err := msg.Transform(
    hl7.Replace("PID-5.1", "Jones"),
    hl7.Replace("MSH-10", "NEW_CTRL_ID"),
    hl7.Null("PID-7"),              // set to HL7 null ("")
    hl7.Omit("PID-19"),             // remove value entirely
    hl7.Move("PID-3", "PID-4"),     // copy PID-4 to PID-3, clear PID-4
)

Values passed to Replace are plain text — delimiter characters are escaped automatically.

Delimiter conversion

TransformWith re-encodes the entire message to a different delimiter set:

newDelims := hl7.Delimiters{
    Field: '#', Component: '@', Repetition: '!',
    Escape: '$', SubComponent: '%',
}

result, err := msg.TransformWith(newDelims,
    hl7.Replace("MSH-10", "CONVERTED"),
)

Delimiter conversion correctly handles escape sequences: \F\ in the source (which represents the literal source field separator |) resolves to the character | in the output, and is re-escaped only if | happens to be a delimiter in the destination set.
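The rule described above can be sketched in a few lines of standalone Go. This is an illustration under stated assumptions, not the library's code: `delims` is a compact hypothetical stand-in for the Delimiters struct, only the five delimiter escapes (F, S, T, R, E) are handled, and the input is assumed to be a segment body rather than MSH-1/MSH-2.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// delims holds the five delimiter characters in the order of their HL7
// escape letters: F field, S component, T subcomponent, R repetition, E escape.
type delims [5]byte

const escapeLetters = "FSTRE"

func (d delims) index(c byte) int { return bytes.IndexByte(d[:], c) }

// writeLiteral emits a literal character, escaping it only if it collides
// with a delimiter of the destination set.
func writeLiteral(out *bytes.Buffer, c byte, dst delims) {
	if i := dst.index(c); i >= 0 {
		out.Write([]byte{dst[4], escapeLetters[i], dst[4]})
		return
	}
	out.WriteByte(c)
}

// convert re-encodes one segment body from the src set to the dst set.
func convert(in []byte, src, dst delims) []byte {
	var out bytes.Buffer
	for i := 0; i < len(in); i++ {
		c := in[i]
		// A three-byte sequence such as \F\ is resolved to the literal
		// character it names in the source set, then re-encoded.
		if c == src[4] && i+2 < len(in) && in[i+2] == src[4] {
			if k := strings.IndexByte(escapeLetters, in[i+1]); k >= 0 {
				writeLiteral(&out, src[k], dst)
				i += 2
				continue
			}
		}
		// Structural delimiters (all but the escape character) map
		// one-to-one onto the destination set.
		if k := src.index(c); k >= 0 && k < 4 {
			out.WriteByte(dst[k])
			continue
		}
		writeLiteral(&out, c, dst)
	}
	return out.Bytes()
}

func main() {
	src := delims{'|', '^', '&', '~', '\\'}
	dst := delims{'#', '@', '%', '!', '$'}
	fmt.Printf("%s\n", convert([]byte(`A^B\F\C#D`), src, dst)) // A@B|C$F$D
}
```

In the example, the escaped source separator becomes a plain `|` because `|` is not a delimiter in the destination set, while the literal `#` has to be escaped because it now is one.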

Extending messages

Changes that target fields or segments beyond the current message size automatically extend it:

// PID has 8 fields; this extends it to include field 30.
result, _ := msg.Transform(hl7.Replace("PID-30", "extended"))

// ZZZ segment doesn't exist; it will be created.
result, _ = msg.Transform(hl7.Replace("ZZZ-1", "custom"))
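At the byte level, extending a segment amounts to padding it with empty fields until the target index exists. A minimal standalone sketch of that idea follows; `setField` is a hypothetical helper that assumes the default `|` separator, a non-MSH segment, and a value that is already escaped.

```go
package main

import (
	"fmt"
	"strings"
)

// setField writes value into field n of a pipe-delimited segment, padding
// with empty fields when the segment is shorter than n. Index 0 is the
// segment type, so field n sits at index n after splitting.
func setField(segment string, n int, value string) string {
	fields := strings.Split(segment, "|")
	for len(fields) <= n {
		fields = append(fields, "")
	}
	fields[n] = value
	return strings.Join(fields, "|")
}

func main() {
	fmt.Println(setField("PID|1||12345", 5, "Smith")) // PID|1||12345||Smith
}
```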

Building messages

MessageBuilder constructs HL7 messages from scratch using terser-style field paths.

b, err := hl7.NewMessageBuilder()
if err != nil {
    log.Fatal(err)
}

b.Set("MSH-9.1", "ADT")
b.Set("MSH-9.2", "A01")
b.Set("MSH-10", "CTRL001")
b.Set("MSH-11", "P")
b.Set("MSH-12", "2.5.1")
b.Set("PID-3.1", "12345")
b.Set("PID-5.1", "Smith")
b.Set("PID-5.2", "John")
b.Set("PV1-2", "I")

msg, err := b.Build()

Set accepts the same location syntax as Get — segments, fields, components, subcomponents, and repetitions. Values are plain text and will be escaped automatically. Segments are created on first use.

Repetitions and subcomponents

b.Set("PID-3[0].1", "ID1")
b.Set("PID-3[1].1", "ID2")
b.Set("PID-3.4.1", "AUTH")
b.Set("PID-3.4.2", "SYSTEM")

Null values

b.SetNull("PID-7") // sets to HL7 null ("")

Custom delimiters

b, err := hl7.NewMessageBuilder(hl7.WithDelimiters(hl7.Delimiters{
    Field: '#', Component: '@', Repetition: '!',
    Escape: '$', SubComponent: '%',
}))

Reusability

The builder remains usable after Build. Subsequent Set calls modify the builder's state, and a new Build produces a new independent *Message:

b.Set("PID-3", "AAA")
msg1, _ := b.Build()

b.Set("PID-3", "BBB")
msg2, _ := b.Build() // msg1 is unchanged

Validation

Validate checks a parsed message against a user-provided Schema. The schema is composed of four optional maps (Messages, Segments, DataTypes, Tables) and an optional list of message-level Checks. Populate only the categories you need.

schema := &hl7.Schema{
    Messages: map[string]*hl7.MessageDef{
        "ADT_A01": {Elements: []hl7.Element{
            {Segment: "MSH", Min: 1, Max: 1},
            {Segment: "PID", Min: 1, Max: 1},
            {Segment: "PV1", Min: 1, Max: 1},
        }},
    },
    Segments: map[string]*hl7.SegmentDef{
        "PID": {Fields: []hl7.FieldDef{
            {Index: 3, Name: "Patient Identifier List", DataType: "CX", Required: true},
            {Index: 5, Name: "Patient Name", Required: true, MaxLength: 48},
            {Index: 8, Name: "Administrative Sex", Table: "0001"},
        }},
    },
    Tables: map[string]*hl7.TableDef{
        "0001": {Values: map[string]string{"F": "Female", "M": "Male", "U": "Unknown"}},
    },
}

result := msg.Validate(schema)
if !result.Valid {
    for _, issue := range result.Issues {
        fmt.Println(issue) // [error] PID-8: Value "X" at PID-8 is not in table 0001 (INVALID_TABLE_VALUE)
    }
}

What gets validated

Validation runs in three phases. Each phase runs only if the relevant definitions exist in the schema:

Structure (schema.Messages) — Checks segment presence, order, and cardinality against the message definition looked up from MSH-9. Supports nested groups with min/max repetition counts.

Content (schema.Segments, schema.DataTypes, schema.Tables) — For each segment with a definition, checks:

  • Exact value assertions (Value on fields and components)
  • Required fields are present
  • Field length does not exceed MaxLength
  • Non-repeating fields do not repeat
  • Primitive data types match expected format (DT, TM, DTM/TS, NM, SI)
  • Composite data types have required components with valid lengths and formats
  • Coded values exist in the referenced table
  • Custom field checks (FieldDef.Check) and segment checks (SegmentDef.Check)

Value assertions are checked first. When a value doesn't match, remaining checks (format, table, length) are skipped — they would be noise on a wrong value.

Custom checks (schema.Checks) — Runs message-level MessageCheckFunc functions for cross-segment business rules that cannot be expressed declaratively.
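To give a feel for the primitive format checks in the content phase, here is a standalone sketch for two of the listed types, DT and NM. It assumes the usual HL7 v2 shapes (DT as YYYY, YYYYMM, or YYYYMMDD; NM as an optionally signed decimal number). The library's exact rules may differ, and `validDT` and `validNM` are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

var (
	dtPattern = regexp.MustCompile(`^\d{4}(\d{2}(\d{2})?)?$`)
	nmPattern = regexp.MustCompile(`^[+-]?\d+(\.\d+)?$`)
)

// validDT accepts YYYY, YYYYMM, or YYYYMMDD with a real calendar date.
func validDT(v string) bool {
	if !dtPattern.MatchString(v) {
		return false
	}
	// Pad partial dates to a full YYYYMMDD so time.Parse can reject
	// impossible months and days such as 198013 or 19800230.
	switch len(v) {
	case 4:
		v += "0101"
	case 6:
		v += "01"
	}
	_, err := time.Parse("20060102", v)
	return err == nil
}

// validNM accepts an optional sign, digits, and an optional fraction.
func validNM(v string) bool { return nmPattern.MatchString(v) }

func main() {
	for _, v := range []string{"19800101", "198002", "19800230", "1980-01-01"} {
		fmt.Printf("DT %-12q %v\n", v, validDT(v))
	}
	for _, v := range []string{"12.5", "-3", "1e5"} {
		fmt.Printf("NM %-12q %v\n", v, validNM(v))
	}
}
```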

Composite data types

Define DataTypeDef entries to validate component structure within composite fields:

schema.DataTypes = map[string]*hl7.DataTypeDef{
    "CX": {Components: []hl7.ComponentDef{
        {Index: 1, Name: "ID Number", Required: true, MaxLength: 15},
        {Index: 4, Name: "Assigning Authority", MaxLength: 20},
        {Index: 5, Name: "Identifier Type Code", Table: "0203"},
    }},
}

Value assertions

Use the Value field on FieldDef or ComponentDef to assert that a field or component has an exact expected value:

schema.Segments = map[string]*hl7.SegmentDef{
    "MSH": {Fields: []hl7.FieldDef{
        {Index: 12, Name: "Version ID", Value: "2.5.1"},
        {Index: 18, Name: "Character Set", Value: "UNICODE UTF-8"},
    }},
}

This is useful for enforcing interface agreements — version IDs, processing modes, encoding declarations, and other fields that must have specific values. Empty fields are not checked against value assertions.

Custom check functions

For validation logic that cannot be expressed declaratively, attach check functions at three levels. All return []Issue and are tagged json:"-" so they don't interfere with schema serialization.

Field-level — runs once per field (after declarative checks), receives the Field value:

{Index: 5, Name: "Observation Value", Check: func(f hl7.Field) []hl7.Issue {
    // Custom range check on a numeric field.
    val, err := strconv.ParseFloat(f.String(), 64)
    if err != nil || val < 0 || val > 300 {
        return []hl7.Issue{{
            Severity: hl7.SeverityError, Location: "OBX-5",
            Code: "VALUE_RANGE", Description: "value out of range [0, 300]",
        }}
    }
    return nil
}}

Segment-level — runs once per segment occurrence (after all field checks), receives *Segment:

schema.Segments["PID"] = &hl7.SegmentDef{
    Check: func(seg *hl7.Segment) []hl7.Issue {
        if seg.Field(8).HasValue() && !seg.Field(7).HasValue() {
            return []hl7.Issue{{
                Severity: hl7.SeverityError, Location: "PID-7",
                Code: "CONDITIONAL_REQUIRED",
                Description: "PID-7 required when PID-8 is present",
            }}
        }
        return nil
    },
}

Message-level — runs once per Validate call (after all segments), receives *Message:

schema.Checks = []hl7.MessageCheckFunc{
    func(msg *hl7.Message) []hl7.Issue {
        if msg.Get("MSH-9.1") == "ORU" && msg.Get("OBX-1") == "" {
            return []hl7.Issue{{
                Severity: hl7.SeverityError, Location: "OBX",
                Code: "BUSINESS_RULE",
                Description: "ORU messages must contain at least one OBX",
            }}
        }
        return nil
    },
}

Custom checks are additive — they run alongside declarative checks, not instead of them. Empty or null fields skip FieldDef.Check entirely.

Incremental adoption

Each map in the schema is independent. A schema with only Messages performs structure validation without checking field contents. A schema with only Segments and Tables validates field values without checking segment order. This lets you adopt validation incrementally.

Loading schemas from files

All schema types have struct tags that support JSON, YAML, and TOML. Use any decoder that unmarshals into Go structs — no special loading function is needed.

// JSON (encoding/json — stdlib, zero dependencies)
f, _ := os.Open("schema.json")
var schema hl7.Schema
json.NewDecoder(f).Decode(&schema)

// YAML (gopkg.in/yaml.v3)
f, _ := os.Open("schema.yaml")
var schema hl7.Schema
yaml.NewDecoder(f).Decode(&schema)

// TOML (github.com/BurntSushi/toml)
f, _ := os.Open("schema.toml")
var schema hl7.Schema
toml.NewDecoder(f).Decode(&schema)

A schema file in JSON:

{
  "messages": {
    "ADT_A01": {
      "elements": [
        {"segment": "MSH", "min": 1, "max": 1},
        {"segment": "PID", "min": 1, "max": 1},
        {"segment": "PV1", "min": 1, "max": 1}
      ]
    }
  },
  "segments": {
    "MSH": {
      "fields": [
        {"index": 12, "name": "Version ID", "value": "2.5.1"}
      ]
    },
    "PID": {
      "fields": [
        {"index": 3, "name": "Patient Identifier List", "type": "CX", "required": true},
        {"index": 5, "name": "Patient Name", "required": true, "max_length": 48},
        {"index": 8, "name": "Administrative Sex", "table": "0001"}
      ]
    }
  },
  "tables": {
    "0001": {
      "values": {"F": "Female", "M": "Male", "U": "Unknown"}
    }
  }
}

The same schema in YAML:

messages:
  ADT_A01:
    elements:
      - segment: MSH
        min: 1
        max: 1
      - segment: PID
        min: 1
        max: 1
      - segment: PV1
        min: 1
        max: 1

segments:
  MSH:
    fields:
      - index: 12
        name: Version ID
        value: "2.5.1"
  PID:
    fields:
      - index: 3
        name: Patient Identifier List
        type: CX
        required: true
      - index: 5
        name: Patient Name
        required: true
        max_length: 48
      - index: 8
        name: Administrative Sex
        table: "0001"

tables:
  "0001":
    values:
      F: Female
      M: Male
      U: Unknown

Schemas can also be marshaled back to any of these formats for sharing or code generation:

data, _ := json.MarshalIndent(schema, "", "  ")
os.WriteFile("schema.json", data, 0644)

Reading streams

Reader reads messages from an io.Reader with support for MLLP framing and raw (MSH-boundary) detection.

MLLP (typical for TCP connections)

reader := hl7.NewReader(conn, hl7.WithMode(hl7.ModeMLLP))

err := reader.EachMessage(func(msg *hl7.Message) error {
    msgType := msg.Get("MSH-9.1")
    fmt.Println("received", msgType)
    return nil
})
if err != nil {
    log.Fatal(err)
}

Auto-detection

ModeAuto (the default) peeks at the first byte to decide between MLLP and raw mode:

reader := hl7.NewReader(conn) // auto-detect
msg, err := reader.ReadMessage()

Raw message bytes

If you need the unparsed bytes (for forwarding, logging, etc.):

raw, err := reader.ReadRawMessage()

Writing streams

Writer writes messages to an io.Writer with optional MLLP framing. It is the counterpart to Reader.

writer := hl7.NewWriter(conn, hl7.WithMLLP())
err := writer.WriteMessage(msg)

Without WithMLLP(), messages are written as raw bytes followed by a segment terminator.

Forwarding without parsing

WriteRawMessage writes raw bytes directly — useful for proxying or logging without the cost of parsing:

raw, _ := reader.ReadRawMessage()
writer.WriteRawMessage(raw)

Read-transform-write

A typical integration engine loop:

reader := hl7.NewReader(inConn, hl7.WithMode(hl7.ModeMLLP))
writer := hl7.NewWriter(outConn, hl7.WithMLLP())

err := reader.EachMessage(func(msg *hl7.Message) error {
    modified, err := msg.Transform(hl7.Replace("MSH-5", "DEST"))
    if err != nil {
        return err // propagate transform failures instead of dropping them
    }
    return writer.WriteMessage(modified)
})
if err != nil {
    log.Fatal(err)
}

Writes are zero-allocation when the Writer is reused. Each write flushes immediately — HL7 messages are request/response, so buffering across messages is not desirable.

ACK generation

Ack generates an ACK response message from a received message. It swaps sender/receiver fields, copies delimiters, and builds MSH + MSA segments:

ack, err := msg.Ack(hl7.AA, "ACK001")

The first argument is an acknowledgment code (AA, AE, AR, CA, CE, CR). The second is the control ID for the ACK's MSH-10.

Error responses

Use WithText to include an error description in MSA-3:

ack, err := msg.Ack(hl7.AE, "ACK002",
    hl7.WithText("PID-3 missing required patient identifier"))

Field mapping

The ACK copies and swaps fields from the original message:

| ACK field | Source |
|---|---|
| MSH-3 (Sending App) | Original MSH-5 (Receiving App) |
| MSH-4 (Sending Facility) | Original MSH-6 (Receiving Facility) |
| MSH-5 (Receiving App) | Original MSH-3 (Sending App) |
| MSH-6 (Receiving Facility) | Original MSH-4 (Sending Facility) |
| MSH-7 (Timestamp) | Current time (or WithTimestamp(t)) |
| MSH-9 (Message Type) | ACK^trigger^ACK |
| MSH-10 (Control ID) | The controlID argument |
| MSH-11 (Processing ID) | Original MSH-11 |
| MSH-12 (Version ID) | Original MSH-12 |
| MSA-1 (Ack Code) | The code argument |
| MSA-2 (Control ID) | Original MSH-10 |
| MSA-3 (Text) | WithText value (if provided) |

The result is raw []byte suitable for sending directly or writing via a Writer:

writer := hl7.NewWriter(conn, hl7.WithMLLP())
writer.WriteRawMessage(ack)

Batch and file parsing

HL7 defines batch (BHS/BTS) and file (FHS/FTS) wrapper segments for grouping messages:

batch, err := hl7.ParseBatch(data)
for _, msg := range batch.Messages {
    fmt.Println(msg.Get("MSH-10"))
}

file, err := hl7.ParseFile(data)
for _, batch := range file.Batches {
    for _, msg := range batch.Messages {
        fmt.Println(msg.Get("MSH-10"))
    }
}

Header and trailer segments are optional — messages without BHS/BTS wrappers are placed in an implicit batch.

Escape sequences

Escape processing is deferred until .String() is called. The Unescape and Escape functions are also available directly:

d := hl7.DefaultDelimiters()

// Unescape: resolve HL7 escape sequences to literal text
text := hl7.Unescape([]byte(`Dr\S\ Smith \F\ MD`), d)
// text = "Dr^ Smith | MD"

// Escape: encode delimiter characters for safe embedding in fields
encoded := hl7.Escape([]byte("value|with^delims"), d)
// encoded = `value\F\with\S\delims`

Both functions have a zero-allocation fast path when no escape or delimiter characters are present in the input.
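The fast path mentioned above is easy to picture: if the input contains no escape character, it can be returned as-is without building a new buffer. Below is a minimal standalone sketch of that idea for the default delimiter set. `unescape` and `literalFor` are hypothetical names, only the five delimiter escapes are handled, and the library's own Unescape takes a delimiter argument and may return a different type.

```go
package main

import (
	"bytes"
	"fmt"
)

// literalFor maps an escape letter to its character in the default set.
func literalFor(code byte) (byte, bool) {
	switch code {
	case 'F':
		return '|', true
	case 'S':
		return '^', true
	case 'T':
		return '&', true
	case 'R':
		return '~', true
	case 'E':
		return '\\', true
	}
	return 0, false
}

// unescape resolves \F\, \S\, \T\, \R\ and \E\ to literal characters.
func unescape(in []byte) []byte {
	// Fast path: without an escape character the input is already
	// literal text, so the same slice is returned and nothing is allocated.
	if bytes.IndexByte(in, '\\') < 0 {
		return in
	}
	out := make([]byte, 0, len(in))
	for i := 0; i < len(in); i++ {
		if in[i] == '\\' && i+2 < len(in) && in[i+2] == '\\' {
			if lit, ok := literalFor(in[i+1]); ok {
				out = append(out, lit)
				i += 2
				continue
			}
		}
		out = append(out, in[i])
	}
	return out
}

func main() {
	fmt.Printf("%s\n", unescape([]byte(`Dr\S\ Smith \F\ MD`))) // Dr^ Smith | MD
	fmt.Printf("%s\n", unescape([]byte("plain text")))
}
```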

Design tradeoffs

Scan-on-access instead of eager parsing

Every call to Field(n), Component(n), or SubComponent(n) re-scans raw bytes to find the n-th delimiter. Nothing is cached.

  • Benefit: Sub-message types are pure value types (~32 bytes each). ParseMessage allocates only 3 objects: the byte buffer copy, the segment slice, and the *Message struct. There are no per-field or per-component heap allocations.
  • Cost: Repeated access to the same field re-scans each time. For typical HL7 segments (<200 bytes), each scan costs ~10-20ns — negligible compared to the ~100-200ns per heap allocation it avoids.
  • Implication: If you access the same deeply nested value in a tight loop, extract it to a variable first.
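The scan-on-access tradeoff described above can be sketched as a function that walks the segment to the n-th separator and returns a sub-slice of the original bytes. This is a simplified standalone illustration, not the library's code: `nthField` is a hypothetical helper that assumes the default `|` separator and ignores the special numbering of MSH.

```go
package main

import (
	"bytes"
	"fmt"
)

// nthField returns the n-th '|'-separated field of a segment (0 is the
// segment type). The result is a sub-slice of seg, so nothing is copied
// or cached, and an out-of-range index yields nil instead of an error.
func nthField(seg []byte, n int) []byte {
	for i := 0; i < n; i++ {
		idx := bytes.IndexByte(seg, '|')
		if idx < 0 {
			return nil
		}
		seg = seg[idx+1:]
	}
	if idx := bytes.IndexByte(seg, '|'); idx >= 0 {
		return seg[:idx]
	}
	return seg
}

func main() {
	pid := []byte("PID|1||12345^^^MRN||Smith^John||19800101|M")
	fmt.Printf("%q\n", nthField(pid, 5))  // "Smith^John"
	fmt.Printf("%q\n", nthField(pid, 30)) // "" (out of range)

	// Hoisting a repeated lookup out of a loop avoids re-scanning.
	name := nthField(pid, 5)
	for i := 0; i < 3; i++ {
		_ = name
	}
}
```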

Immutable messages

ParseMessage copies the input buffer. All types hold read-only slices into this owned copy. There are no setters — use Transform to produce a modified copy. This makes messages safe for concurrent reads with no synchronization.

Zero values instead of errors for access

Out-of-range field, component, and subcomponent access returns empty zero values rather than errors. This enables chained access patterns like seg.Field(5).Rep(0).Component(2).String() without intermediate nil checks. The cost is that typos in field indices silently return empty strings.

Transform rebuilds message bytes

Transform applies changes by splicing raw bytes in a working buffer, then calls ParseMessage on the result. This means every transform pays the cost of a full re-parse (~800ns for a typical message). This is by design: the output is a fully independent *Message with its own buffer, and it is produced by the same parse path that handles every incoming message.

Schema validation is opt-in

The parser treats all segments generically — it does not require a schema to parse or access any message. Validation is a separate step via msg.Validate(schema) with a user-provided Schema. The library does not ship with built-in HL7v2 segment or table definitions. This keeps the parser small and avoids coupling to any particular HL7 version.

Performance

Benchmarked on Apple M3 Pro (arm64):

| Operation | Time | Allocs | Bytes |
|---|---|---|---|
| ParseMessage (695B ORU_R01) | 837 ns | 3 | 1,088 |
| Parse + access all fields | 8.1 us | 5 | 1,140 |
| ParseMessage (minimal MSH) | 93 ns | 3 | 144 |
| Get accessor (3 lookups) | 329 ns | 6 | 96 |
| Transform (3 changes) | 1.3 us | 6 | 2,200 |
| Builder (10 Set + Build) | 1.0 us | 13 | 784 |
| Validate structure only (ORU_R01) | 257 ns | 3 | 72 |
| Validate fields only (ORU_R01) | 2.8 us | 11 | 64 |
| Validate full (ORU_R01) | 3.1 us | 13 | 104 |
| WriteMessage MLLP | 12 ns | 0 | 0 |
| WriteMessage raw | 9 ns | 0 | 0 |
| Ack (ADT^A01) | 442 ns | 3 | 176 |

The 3 base allocations are the byte buffer copy, the segment slice, and the *Message struct. The 2 additional allocations in parse+access come from Unescape on fields containing the escape character (MSH-2 always contains \).

Validation operates on raw bytes (avoiding Unescape allocations) and defers location string construction to error paths only. On valid messages — the common case — no location strings are built, keeping allocations minimal.

Writes are zero-allocation when the Writer is reused. The bufio.Writer batches framing bytes and payload into a single syscall.