[Question] How to map input bytes to the functions that accessed them in v4.0.0? #6581

I'm analyzing parser behavior and need to determine which functions in an instrumented program accessed specific input bytes. This would help understand:
- Which parsing functions process which byte ranges
- How input bytes flow through the call stack
- Which code paths are triggered by specific input patterns

### Environment

- **PolyTracker version:** 4.0.0 (Docker: `trailofbits/polytracker:latest`)
- **Platform:** macOS ARM64 (using `--platform linux/amd64`)
- **Python version (in container):** 3.10

### What I've Successfully Extracted
I've made significant progress exploring the v4.0.0 API and can extract:

#### 1. Function Names
```python
from polytracker.taint_dag import TDFunctionsSection, TDStringSection

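# Extract function ID → name mapping.
# functions_section and string_section are the TDFunctionsSection and
# TDStringSection objects from the loaded .tdag file (loading not shown).
function_names = {}
for func_id, fn_header in enumerate(functions_section):
    func_name = string_section.read_string(fn_header.name_offset)
    function_names[func_id] = func_name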
```

**Results:** Successfully extracted all function names: `main`, `parse_expr`, `program`, `statement`, `expr`, `test`, `sum`, `term`, etc.

#### 2. Function Call Trace
```python
from polytracker.taint_dag import TDEventsSection

# Iterate through execution trace
for event in events_section:
    function_id = event.fnidx      # Function being called
    event_type = event.kind        # ENTRY (0) or EXIT (1)
```

**Results:** Complete function call trace with proper nesting:
```
ENTER program
  ENTER statement
    ENTER expr
      ENTER term
```
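
The nesting shown above can be rebuilt from the flat event stream with a simple depth counter. This is a minimal sketch, assuming the events have already been reduced to `(fnidx, kind)` pairs and the function names to a list indexed by function ID. `EventKind`, `render_call_trace`, and the sample data are hypothetical stand-ins for illustration, not PolyTracker API:

```python
from enum import IntEnum
from typing import Iterable, List, Sequence, Tuple


class EventKind(IntEnum):
    # Values follow the ENTRY (0) / EXIT (1) convention noted above.
    ENTRY = 0
    EXIT = 1


def render_call_trace(
    events: Iterable[Tuple[int, int]], names: Sequence[str]
) -> List[str]:
    """Return one indented 'ENTER <name>' line per entry event."""
    lines: List[str] = []
    depth = 0
    for fnidx, kind in events:
        if kind == EventKind.ENTRY:
            lines.append("  " * depth + "ENTER " + names[fnidx])
            depth += 1
        elif kind == EventKind.EXIT:
            # Guard against an unmatched EXIT in a truncated trace.
            depth = max(depth - 1, 0)
    return lines


# Hypothetical sample data, not taken from a real trace.
sample_names = ["program", "statement", "expr", "term"]
sample_events = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (2, 1), (1, 1), (0, 1)]
trace_lines = render_call_trace(sample_events, sample_names)
print("\n".join(trace_lines))
```

Only entry events produce output; exit events just reduce the indentation for whatever is entered next.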

#### 3. Input Byte Offsets
```python
# Extract byte offsets from taint forest
for node in trace.taint_forest.nodes():
    if node.source is not None:
        offset = trace.file_offset(node).offset  # Byte offset in input
        affected_cf = node.affected_control_flow # Whether byte influenced branching
```

**Results:** All input bytes tracked with control flow information.
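
For reference, this is roughly how I turn the extracted offsets back into input characters. It is a sketch only: `describe_offsets` is a hypothetical helper, and `sample_sources` contains made-up `(offset, affected_control_flow)` pairs standing in for the values read from the taint forest, not real results:

```python
from typing import Iterable, List, Tuple

# The example input used later in this issue.
INPUT_BYTES = b"{i=1;}\n"


def describe_offsets(
    data: bytes, sources: Iterable[Tuple[int, bool]]
) -> List[Tuple[int, str, bool]]:
    """Pair each tracked offset with the character stored at that offset."""
    rows = []
    for offset, affects_cf in sorted(sources):
        rows.append((offset, chr(data[offset]), affects_cf))
    return rows


# Made-up stand-in values; a real run would take these from the taint forest.
sample_sources = [(2, True), (0, True), (1, False)]
rows = describe_offsets(INPUT_BYTES, sample_sources)
for offset, char, affects_cf in rows:
    print(offset, repr(char), affects_cf)
```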

### What's Missing: The Correlation

I **cannot** find a way to correlate these pieces together to answer:
**"Which function accessed byte X?"**

For example, given:
- Input file: `{i=1;}\n`
- The `=` character at byte offset 2 (zero-based)

I need to determine: *"The `expr` function accessed byte 2"*
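
To make the desired result concrete, here is a sketch of the attribution I would do if the trace exposed byte reads interleaved with function entry and exit events. That interleaved stream is an assumption: as described below, I have not found it in v4.0.0. The record format, `map_bytes_to_functions`, and the sample records are all hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple, Union

Record = Tuple[str, Union[str, int]]


def map_bytes_to_functions(records: Iterable[Record]) -> Dict[int, Set[str]]:
    """Attribute each read offset to the function on top of the call stack."""
    stack: List[str] = []
    accessed_by: Dict[int, Set[str]] = defaultdict(set)
    for kind, value in records:
        if kind == "enter":
            stack.append(value)
        elif kind == "exit":
            if stack:
                stack.pop()
        elif kind == "read" and stack:
            # value is the input byte offset that was read.
            accessed_by[value].add(stack[-1])
    return dict(accessed_by)


# Hypothetical records for the input "{i=1;}\n"; not real PolyTracker output.
sample_records = [
    ("enter", "program"),
    ("read", 0),
    ("enter", "statement"),
    ("read", 1),
    ("enter", "expr"),
    ("read", 2),
    ("exit", "expr"),
    ("exit", "statement"),
    ("exit", "program"),
]
byte_to_functions = map_bytes_to_functions(sample_records)
print(byte_to_functions[2])
```

A mapping of this shape (offset to set of function names) is the output I am trying to obtain from a real trace.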

### What I've Tried

#### Approach 1: TDEvent attributes
```python
for event in events_section:
    # event has: fnidx, kind
    # event does NOT have: label, taints, bytes_accessed
    pass
```

**Result:** Events know which function, but not which bytes it accessed.

#### Approach 2: Taint nodes
```python
for node in trace.taint_forest.nodes():
    # node has: label, source, affected_control_flow
    # node does NOT have: function, event, accessed_by
    pass
```

**Result:** Nodes know which byte, but not which function accessed it.

#### Approach 3: Documented methods
```python
# Each of these raises NotImplementedError:
trace.access_sequence()  # NotImplementedError
trace.function_trace()   # NotImplementedError
for event in trace:      # NotImplementedError (via __iter__)
    pass
```

**Result:** Documented API methods are not implemented in v4.0.0.

#### Approach 4: Control Flow Log
```python
from polytracker.taint_dag import TDControlFlowLogSection

# CF log has function_id_mapping but unclear how to correlate with taints
```

**Result:** `function_id_mapping` exists, but it is a method rather than an attribute, and calling it returns an empty result.

### Questions

1. **Is there an API I'm missing?**  
   Is there a method/property that links taint labels to the events/functions that accessed them?

2. **Should I use a different trace format?**  
   Issue #6534 mentioned `DBProgramTrace` vs `TDProgramTrace`. Can I generate `.db` files where `access_sequence()` actually works?

3. **Is this data available internally but not exposed?**  
   If the correlation exists internally but isn't exposed via Python API, would you accept a PR to add it?

4. **Alternative approach?**  
   Is there a recommended way to achieve this byte-to-function mapping with the current v4.0.0 API?

### Minimal Reproduction
```bash
# Instrument a C program
docker run --rm --platform linux/amd64 \
    -v $(pwd):/workdir -w /workdir \
    trailofbits/polytracker bash -c \
    "polytracker build clang program.c -o program && \
     polytracker instrument-targets --taint --ftrace program"

# Execute with stdin tracking
docker run --rm --platform linux/amd64 \
    -v $(pwd):/workdir -w /workdir \
    -e POLYDB=polytracker.tdag \
    -e POLYTRACKER_STDIN_SOURCE=1 \
    trailofbits/polytracker \
    bash -c "./program.instrumented < input.txt"

# Analyze the trace
docker run --rm --platform linux/amd64 \
    -v $(pwd):/workdir -w /workdir \
    trailofbits/polytracker python3 -c "
from polytracker import PolyTrackerTrace
from polytracker.taint_dag import TDFunctionsSection, TDEventsSection, TDStringSection

trace = PolyTrackerTrace.load('polytracker.tdag')

# Can extract functions and events separately,
# but cannot correlate which functions accessed which bytes
"
```

### Use Case

This mapping would enable:
- **Parser debugging:** Identify which function mishandled a specific byte
- **Security analysis:** Find which code paths process attacker-controlled bytes
- **Execution visualization:** Create diagrams showing byte flow through functions
- **Performance analysis:** Identify hot paths for specific input patterns

---

**Related Issues:**
- #6534 - "Emitting and loading a DBProgramTrace instead of a TDProgramTrace" (similar access_sequence question)
