Hello 👋
I have tried several ways to convert a LightGBM LGBMRanker to ONNX, but in every case the converted model's predictions differ from the original model's.
Here is an example using Hummingbird:
import lightgbm as lgb
import pandas as pd
import numpy as np
from hummingbird.ml import convert, load
# Define the number of samples
num_samples = 100
# Generate random data for numerical features
num_data = {
    "col1": np.random.randint(1, 100, num_samples),
    "col2": np.random.randint(1, 100, num_samples),
    "col2": np.random.uniform(10.0, 100.0, num_samples),
    "col3": np.random.uniform(1.0, 10.0, num_samples),
    "col4": np.random.uniform(0.0, 0.5, num_samples),
}
# Generate random data for categorical features
cat_data = {
    "col5": np.random.choice(["dummy1", "dummy2", "dummy3"], num_samples),
    "col6": np.random.choice(["dummy1", "dummy2", "dummy3"], num_samples),
    "col7": np.random.choice(["dummy1", "dummy2", "dummy3"], num_samples),
}
target = {"target": [i % 2 for i in range(num_samples)]}
data = num_data | cat_data | target
df = pd.DataFrame(data)
cat_mapping = {
    col: {val: idx for idx, val in enumerate(df[col].unique())}
    for col in cat_data.keys()  # Iterate through the column names in cat_data
}
for cat_col in list(cat_data.keys()):
    df[cat_col] = df[cat_col].map(cat_mapping[cat_col])
# Copy so that the dtype changes below do not write into a view of df
X = df[list(num_data.keys()) + list(cat_data.keys())].copy()
Y = df[list(target.keys())]
for cat_col in cat_data.keys():
    X[cat_col] = X[cat_col].astype('category')
model = lgb.LGBMRanker(
    objective="lambdarank",
    metric=["ndcg", "map"],
    boosting_type="gbdt",
    categorical_feature=list(cat_data.keys()),
    n_estimators=100,
    # Auto-choosing col-wise multi-threading, the overhead of testing was 0.000241 seconds. You can set `force_col_wise=true` to remove the overhead.
    force_col_wise=True,
    random_state=42
)
# just train model as example
model.fit(
    X,
    Y,
    group=[2]*50,
)
test_input = np.array(X.loc[:10].values, dtype=np.float32)
onnx_model = convert(
    model,
    "onnx",
    test_input=test_input,
)
model.predict(test_input)
# array([-1.03617303,  1.31771085, -0.04840754,  1.2865519 ,  0.49542698,
#         1.49092876, -1.36244258,  0.15192526, -1.50055302, -0.34809177,
#        -0.31897834])
onnx_model.predict(test_input)
# array([-0.83041805,  1.2326361 , -0.518405  ,  0.79480076,  0.24598463,
#         0.7074484 , -1.706661  , -0.18985131, -0.98669803,  0.00681482,
#         0.1253777 ], dtype=float32)

Does anyone know why? Is there anything in the conversion that I am missing?
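To put a number on the mismatch, a check along these lines could be used. It is only a minimal sketch with NumPy: the two arrays are copied from the outputs shown in this post, the variable names are made up for illustration, and the tolerances are an arbitrary assumption rather than a value recommended by either library.

```python
import numpy as np

# Values copied from the two predict() outputs in this post.
lgbm_pred = np.array([-1.03617303, 1.31771085, -0.04840754, 1.2865519, 0.49542698,
                      1.49092876, -1.36244258, 0.15192526, -1.50055302, -0.34809177,
                      -0.31897834], dtype=np.float64)
onnx_pred = np.array([-0.83041805, 1.2326361, -0.518405, 0.79480076, 0.24598463,
                      0.7074484, -1.706661, -0.18985131, -0.98669803, 0.00681482,
                      0.1253777], dtype=np.float32)

# Element-wise absolute difference, computed in float64.
abs_diff = np.abs(lgbm_pred - onnx_pred.astype(np.float64))
max_abs_diff = float(abs_diff.max())
worst_index = int(abs_diff.argmax())

# Assumed tolerances; chosen loosely enough to absorb float32 round-off.
close = bool(np.allclose(lgbm_pred, onnx_pred, rtol=1e-4, atol=1e-4))

# For a ranker the induced ordering matters most, so compare that as well.
same_order = bool(np.array_equal(np.argsort(-lgbm_pred), np.argsort(-onnx_pred)))

print(f"max abs diff: {max_abs_diff:.4f} at index {worst_index}")
print(f"within tolerance: {close}, same ranking: {same_order}")
```

The differences here are far too large to be explained by float32 precision alone, and the ranking order of the rows changes as well, so this does not look like a pure rounding issue.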