Copilot AI commented Dec 31, 2025

The split_predict pass generates invalid ONNX models when intermediate values lack explicit value_info: the new graph inputs it creates end up with an UNDEFINED elem_type (value 0), causing onnx.checker validation failures and ONNX Runtime load errors.

Changes

Core fix in onnxoptimizer/passes/split.h:

  • Added inferElemType() helper that infers type from producing node's inputs when elem_type is UNDEFINED
  • Modified split_predict to use type inference instead of blindly copying UNDEFINED elem_type via copyMetadata()
  • Modified split_init to ensure output values have valid elem_type before registration

Type inference heuristic:

// For operators where output type matches input type (Add, Sub, Mul, etc.)
// Check producing node's inputs for a known elem_type
for (const Value* input : producer->inputs()) {
  if (input->elemType() != TensorProto_DataType_UNDEFINED) {
    return input->elemType();
  }
}
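The same heuristic can be sketched in Python. This is illustrative only: the `infer_elem_type` function and the toy `Value`/`Node` classes below are hypothetical stand-ins for the IR types in `split.h`, not the actual onnxoptimizer API.

```python
from dataclasses import dataclass, field
from typing import List, Optional

UNDEFINED = 0  # mirrors TensorProto.UNDEFINED / TensorProto_DataType_UNDEFINED
FLOAT = 1      # mirrors TensorProto.FLOAT

@dataclass
class Value:
    elem_type: int = UNDEFINED
    producer: Optional["Node"] = None

@dataclass
class Node:
    inputs: List[Value] = field(default_factory=list)

def infer_elem_type(value: Optional[Value]) -> int:
    """Infer a missing elem_type from the producing node's inputs.

    Heuristic: for type-preserving operators (Add, Sub, Mul, ...), the
    output type equals the input type, so the first defined input type
    is a reasonable guess. Returns UNDEFINED when nothing can be inferred.
    """
    if value is None:                # guard against null values
        return UNDEFINED
    if value.elem_type != UNDEFINED:
        return value.elem_type       # already known, nothing to infer
    producer = value.producer
    if producer is None:             # no producing node to consult
        return UNDEFINED
    for inp in producer.inputs:
        if inp.elem_type != UNDEFINED:
            return inp.elem_type
    return UNDEFINED

# Example: z = Add(x, y) where x is FLOAT and y's type is missing
x = Value(elem_type=FLOAT)
y = Value()
z = Value(producer=Node(inputs=[x, y]))
print(infer_elem_type(z))  # 1 (FLOAT), borrowed from x
```

When inference fails, the value keeps its UNDEFINED elem_type and the original validation error would remain, which is why the limitation below matters.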

Tests:

  • Added Python test: test_split_predict_preserves_elem_type()
  • Added C++ test: SplitPredictPreservesElemType

Both tests verify optimized models pass ONNX validation and have valid elem_type for all inputs.

Limitations

Type inference works for common operators where output type matches input type. Operators with different output types (Shape, Cast) require proper value_info in the original model.
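For instance, Shape always produces int64 regardless of its input type, and Cast's output type comes from its `to` attribute, so borrowing the input's elem_type would be wrong for them. A sketch of guarding against such operators (a hypothetical helper for illustration, not the actual pass code):

```python
UNDEFINED = 0  # TensorProto.UNDEFINED
INT64 = 7      # TensorProto.INT64

# Operators whose output elem_type does NOT follow the input type.
# (Illustrative subset, not exhaustive.)
TYPE_CHANGING_OPS = {"Shape", "Size", "Cast"}

def infer_from_inputs(op_type: str, input_types: list) -> int:
    """Borrow an input type only for type-preserving operators."""
    if op_type in TYPE_CHANGING_OPS:
        if op_type in ("Shape", "Size"):
            return INT64   # these always produce int64 per the ONNX spec
        return UNDEFINED   # e.g. Cast: needs value_info or the `to` attribute
    for t in input_types:
        if t != UNDEFINED:
            return t
    return UNDEFINED

print(infer_from_inputs("Add", [1, 0]))  # 1: borrows FLOAT from an input
print(infer_from_inputs("Shape", [1]))   # 7: int64 regardless of input
print(infer_from_inputs("Cast", [1]))    # 0: cannot infer without `to`
```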

Original prompt

This section describes the original issue to resolve

<issue_title>[BUG] The pass "split_predict" generates an invalid optimized ONNX model</issue_title>
<issue_description>split_predict pass generates an invalid ONNX model (missing tensor elem_type) — fails onnx.checker and cannot be loaded by ONNX Runtime

Describe the bug
Hi ONNX Optimizer maintainers, thanks for the project!

When optimizing a valid ONNX model named model.onnx using only the split_predict pass, the resulting model.opt.onnx becomes invalid ONNX. It fails onnx.checker.check_model() with:

onnx.onnx_cpp2py_export.checker.ValidationError: Field 'elem_type' of 'type' is required but missing.

and ONNX Runtime fails to load it:

onnxruntime.capi.onnxruntime_pybind11_state.Fail: ... failed: Invalid tensor data type 0.

This suggests the pass may be producing incomplete/incorrect type information (elem_type) for some tensor(s) in the optimized graph.

Environment

OS: Ubuntu 20.04 LTS
Python: 3.9.6
onnx: 1.19.0
onnxruntime: 1.19.2
onnxoptimizer: 0.3.13

To Reproduce

  1. Download and unzip the attached archive (split_predict_repro.tar.gz), then cd into the extracted directory:

tar -xzvf split_predict_repro.tar.gz
cd split_predict_repro

  2. Create a Python environment (Python 3.9.6) and install dependencies:

python3.9 -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -r requirements.txt

  3. Run the split_predict pass on the case model and save model.opt.onnx:

python optimize_model.py --case ./case_00051_seed20654888

This script loads model.onnx and runs optimizer.optimize(model, ["split_predict"]), then saves model.opt.onnx.

  4. Validate the optimized model with the ONNX checker:

python ./check_model.py

check_model.py loads ./case_00051_seed20654888/model.opt.onnx and runs checker.check_model(m).

  5. Try to run the optimized model with ONNX Runtime using the stored oracle inputs:

python ./run_optimized_model.py

This script creates an InferenceSession from ./case_00051_seed20654888/model.opt.onnx and feeds inputs from oracle.pkl.

Expected behavior

  • model.opt.onnx should remain a valid ONNX model after split_predict optimization.
  • onnx.checker.check_model() should pass.
  • ONNX Runtime should be able to create an InferenceSession for model.opt.onnx.

Actual behavior

ONNX checker fails:

onnx.onnx_cpp2py_export.checker.ValidationError: Field 'elem_type' of 'type' is required but missing.

ONNX Runtime fails to load:

onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model ... failed: Invalid tensor data type 0.

Attachments

I’m attaching:

  • requirements.txt (dependency versions)
  • optimize_model.py (runs split_predict and saves model.opt.onnx)
  • check_model.py (runs ONNX checker on model.opt.onnx)
  • run_optimized_model.py (loads model.opt.onnx via ORT and runs with oracle.pkl)
  • the minimal failing folder ./case_00051_seed20654888 containing model.onnx, oracle.pkl, and the generated model.opt.onnx.

Thanks for your help!</issue_description>

Comments on the Issue (you are @copilot in this section)

@andife: I wonder, could you try the newest release, v0.3.19 (Mar 3, 2024), or even the current GitHub version?


Copilot AI and others added 7 commits December 31, 2025 21:09
- Added inferElemType() helper function to infer missing elem_type from inputs
- Modified split_predict to infer elem_type when creating new graph inputs/outputs
- This fixes the bug where split_predict generates invalid ONNX models with missing elem_type

Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>
- Add null check for producer node
- Improve documentation to explain limitations and supported operators
- Clarify that this is a heuristic for common cases

Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>
- Clarify what happens when elem_type inference fails
- Use TensorProto.UNDEFINED constant in Python test instead of magic number
- Add comment explaining UNDEFINED value in C++ test

Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>
- Use named constant throughout split.h instead of magic number 0
- Update C++ test to use ONNX_NAMESPACE::TensorProto_DataType_UNDEFINED
- Improves code readability and maintainability

Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>
- Add null pointer check for Value parameter to prevent segfaults
- Clarify comment in Python test about TensorProto.UNDEFINED

Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>

Copilot AI changed the title [WIP] Fix invalid optimized ONNX model generation by split_predict pass Fix split_predict generating invalid ONNX models with missing elem_type Dec 31, 2025
Copilot AI requested a review from justinchuby December 31, 2025 21:22

Development

Successfully merging this pull request may close these issues.

[BUG] The pass "split_predict" generates an invalid optimized ONNX model
