Copilot AI commented Dec 31, 2025

The split_predict pass generates invalid ONNX models when intermediate values lack explicit value_info: the new graph inputs it creates end up with an UNDEFINED elem_type (value 0), causing onnx.checker validation failures and ONNX Runtime load errors.

Changes

Core fix in onnxoptimizer/passes/split.h:

  • Added inferElemType() helper that infers type from producing node's inputs when elem_type is UNDEFINED
  • Modified split_predict to use type inference instead of blindly copying UNDEFINED elem_type via copyMetadata()
  • Modified split_init to ensure output values have valid elem_type before registration

Type inference heuristic:

// For operators where output type matches input type (Add, Sub, Mul, etc.)
// Check producing node's inputs for a known elem_type
for (const Value* input : producer->inputs()) {
  if (input->elemType() != TensorProto_DataType_UNDEFINED) {
    return input->elemType();
  }
}
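The same heuristic can be sketched in Python. This is illustrative only: the `infer_elem_type` function and the toy `Value`/`Node` classes below are hypothetical stand-ins for the IR types in `split.h`, not the actual onnxoptimizer API.

```python
from dataclasses import dataclass, field
from typing import List, Optional

UNDEFINED = 0  # mirrors TensorProto.UNDEFINED / TensorProto_DataType_UNDEFINED
FLOAT = 1      # mirrors TensorProto.FLOAT

@dataclass
class Value:
    elem_type: int = UNDEFINED
    producer: Optional["Node"] = None

@dataclass
class Node:
    inputs: List[Value] = field(default_factory=list)

def infer_elem_type(value: Optional[Value]) -> int:
    """Infer a missing elem_type from the producing node's inputs.

    Heuristic: for type-preserving operators (Add, Sub, Mul, ...), the
    output type equals the input type, so the first defined input type
    is a reasonable guess. Returns UNDEFINED when nothing can be inferred.
    """
    if value is None:                # guard against null values
        return UNDEFINED
    if value.elem_type != UNDEFINED:
        return value.elem_type       # already known, nothing to infer
    producer = value.producer
    if producer is None:             # no producing node to consult
        return UNDEFINED
    for inp in producer.inputs:
        if inp.elem_type != UNDEFINED:
            return inp.elem_type
    return UNDEFINED

# Example: z = Add(x, y) where x is FLOAT and y's type is missing
x = Value(elem_type=FLOAT)
y = Value()
z = Value(producer=Node(inputs=[x, y]))
print(infer_elem_type(z))  # 1 (FLOAT), borrowed from x
```

When inference fails, the value keeps its UNDEFINED elem_type and the original validation error would remain, which is why the limitation below matters.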

Tests:

  • Added Python test: test_split_predict_preserves_elem_type()
  • Added C++ test: SplitPredictPreservesElemType

Both tests verify optimized models pass ONNX validation and have valid elem_type for all inputs.

Limitations

Type inference works for common operators where output type matches input type. Operators with different output types (Shape, Cast) require proper value_info in the original model.
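For instance, Shape always produces int64 regardless of its input type, and Cast's output type comes from its `to` attribute, so borrowing the input's elem_type would be wrong for them. A sketch of guarding against such operators (a hypothetical helper for illustration, not the actual pass code):

```python
UNDEFINED = 0  # TensorProto.UNDEFINED
INT64 = 7      # TensorProto.INT64

# Operators whose output elem_type does NOT follow the input type.
# (Illustrative subset, not exhaustive.)
TYPE_CHANGING_OPS = {"Shape", "Size", "Cast"}

def infer_from_inputs(op_type: str, input_types: list) -> int:
    """Borrow an input type only for type-preserving operators."""
    if op_type in TYPE_CHANGING_OPS:
        if op_type in ("Shape", "Size"):
            return INT64   # these always produce int64 per the ONNX spec
        return UNDEFINED   # e.g. Cast: needs value_info or the `to` attribute
    for t in input_types:
        if t != UNDEFINED:
            return t
    return UNDEFINED

print(infer_from_inputs("Add", [1, 0]))  # 1: borrows FLOAT from an input
print(infer_from_inputs("Shape", [1]))   # 7: int64 regardless of input
print(infer_from_inputs("Cast", [1]))    # 0: cannot infer without `to`
```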

Original prompt

This section describes the original issue to resolve

<issue_title>[BUG] The pass "split_predict" generates an invalid optimized ONNX model</issue_title>
<issue_description>split_predict pass generates an invalid ONNX model (missing tensor elem_type) — fails onnx.checker and cannot be loaded by ONNX Runtime

Describe the bug
Hi ONNX Optimizer maintainers, thanks for the project!

When optimizing a valid ONNX model named model.onnx using only the split_predict pass, the resulting model.opt.onnx becomes invalid ONNX. It fails onnx.checker.check_model() with:

onnx.onnx_cpp2py_export.checker.ValidationError: Field 'elem_type' of 'type' is required but missing.

and ONNX Runtime fails to load it:

onnxruntime.capi.onnxruntime_pybind11_state.Fail: ... failed: Invalid tensor data type 0.

This suggests the pass may be producing incomplete/incorrect type information (elem_type) for some tensor(s) in the optimized graph.

Environment

OS: Ubuntu 20.04 LTS
Python: 3.9.6
onnx: 1.19.0
onnxruntime: 1.19.2
onnxoptimizer: 0.3.13

To Reproduce

  1. Download and unzip the attached archive (split_predict_repro.tar.gz), then cd into the extracted directory:

tar -xzvf split_predict_repro.tar.gz
cd split_predict_repro

  2. Create a Python environment (Python 3.9.6) and install dependencies:

python3.9 -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -r requirements.txt

  3. Run the split_predict pass on the case model and save model.opt.onnx:

python optimize_model.py --case ./case_00051_seed20654888

This script loads model.onnx and runs optimizer.optimize(model, ["split_predict"]), then saves model.opt.onnx.

  4. Validate the optimized model with the ONNX checker:

python ./check_model.py

check_model.py loads ./case_00051_seed20654888/model.opt.onnx and runs checker.check_model(m).

  5. Try to run the optimized model with ONNX Runtime using the stored oracle inputs:

python ./run_optimized_model.py

This script creates an InferenceSession from ./case_00051_seed20654888/model.opt.onnx and feeds inputs from oracle.pkl.

Expected behavior

  • model.opt.onnx should remain a valid ONNX model after split_predict optimization.
  • onnx.checker.check_model() should pass.
  • ONNX Runtime should be able to create an InferenceSession for model.opt.onnx.

Actual behavior

ONNX checker fails:

onnx.onnx_cpp2py_export.checker.ValidationError: Field 'elem_type' of 'type' is required but missing.

ONNX Runtime fails to load:

onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model ... failed: Invalid tensor data type 0.

Attachments

I’m attaching:

  • requirements.txt (dependency versions)
  • optimize_model.py (runs split_predict and saves model.opt.onnx)
  • check_model.py (runs ONNX checker on model.opt.onnx)
  • run_optimized_model.py (loads model.opt.onnx via ORT and runs with oracle.pkl)
  • the minimal failing folder ./case_00051_seed20654888 containing model.onnx, oracle.pkl, and the generated model.opt.onnx.

Thanks for your help!</issue_description>

Comments on the Issue (you are @copilot in this section)

@andife: I wonder, could you try the newest release, v0.3.19 (Mar 3, 2024), or even the current GitHub version?


Copilot AI and others added 7 commits December 31, 2025 21:09
- Added inferElemType() helper function to infer missing elem_type from inputs
- Modified split_predict to infer elem_type when creating new graph inputs/outputs
- This fixes the bug where split_predict generates invalid ONNX models with missing elem_type

Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>
- Add null check for producer node
- Improve documentation to explain limitations and supported operators
- Clarify that this is a heuristic for common cases

Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>
- Clarify what happens when elem_type inference fails
- Use TensorProto.UNDEFINED constant in Python test instead of magic number
- Add comment explaining UNDEFINED value in C++ test

Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>
- Use named constant throughout split.h instead of magic number 0
- Update C++ test to use ONNX_NAMESPACE::TensorProto_DataType_UNDEFINED
- Improves code readability and maintainability

Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>
- Add null pointer check for Value parameter to prevent segfaults
- Clarify comment in Python test about TensorProto.UNDEFINED

Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>

Copilot AI changed the title [WIP] Fix invalid optimized ONNX model generation by split_predict pass Fix split_predict generating invalid ONNX models with missing elem_type Dec 31, 2025
Copilot AI requested a review from justinchuby December 31, 2025 21:22

Development

Successfully merging this pull request may close these issues.

[BUG] The pass "split_predict" generates an invalid optimized ONNX model
