Industrial Reliability Study: Predicting APS System Failures in Scania Trucks

abdibasidadan-byte/Predicting

The APS (Air Pressure System) is critical to the operation of Scania trucks. APS failures can lead to significant industrial costs and production downtime. Historical sensor data makes it possible to analyze system behavior and detect anomalies early, before they escalate into critical failures.

In this study, a Random Forest model was trained to detect APS failures. Because the dataset is highly imbalanced, class weighting was applied to the rare positive instances. Model performance was evaluated with precision, recall, F1-score, and the confusion matrix, and the associated industrial cost was computed from the number of false positives and false negatives. To further improve failure detection and reduce total cost, XGBoost was optionally employed. Finally, a feature importance analysis identified the sensors that most influence APS failure predictions; the top features are shown as a horizontal bar chart to give interpretable input for industrial decision-making.

Modeling and Prediction

pip install xgboost --break-system-packages

    from sklearn.preprocessing import StandardScaler
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import confusion_matrix, classification_report
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
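The steps that follow use `train_df` and `test_df`, but this README does not show how they are loaded. The sketch below is one possible way to do it, not the author's code. It assumes the CSV files mark missing values with the literal string `na`; the helper `load_aps_csv` and its `header_lines` parameter are hypothetical names. Some copies of the UCI files reportedly start with a block of license text, in which case `header_lines` would need to be set to the number of lines to skip; check your own copy.

```python
import io

import pandas as pd


def load_aps_csv(source, header_lines=0):
    """Read an APS CSV file, treating the string 'na' as a missing value.

    `header_lines` (an assumption) is the number of leading non-CSV lines
    to skip before the column header row.
    """
    return pd.read_csv(source, skiprows=header_lines, na_values="na")


# Tiny in-memory stand-in for a real file, used only to show the behaviour.
sample = io.StringIO("class,aa_000,ab_000\nneg,76698,na\npos,33058,0\n")
demo_df = load_aps_csv(sample)

print(demo_df.dtypes)        # sensor columns are numeric, not strings
print(demo_df.isna().sum())  # 'na' entries are counted as missing
```

Reading `na` as a missing value at load time keeps the sensor columns numeric, which the median imputation step relies on.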

1. Feature / Target Separation

    X_train = train_df.drop(columns=['class'])
    y_train = train_df['class'].map({'neg': 0, 'pos': 1})  # 1 = APS failure

    X_test = test_df.drop(columns=['class'])
    y_test = test_df['class'].map({'neg': 0, 'pos': 1})

2. Missing Value Imputation

    # Replace missing values using the median (robust to outliers).
    # The medians are computed once, on the training set only,
    # and reused for the test set.
    train_median = X_train.median()
    X_train = X_train.fillna(train_median)
    X_test = X_test.fillna(train_median)

3. Feature Scaling

    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)  # fit on train only
    X_test_scaled = scaler.transform(X_test)        # reuse train statistics
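Steps 2 and 3 both depend on using training-set statistics only. One way to make that harder to get wrong is to chain the two steps in a scikit-learn `Pipeline`. This is an alternative sketch, not the code used in this study; the toy frames and the names `preprocess`, `toy_train`, and `toy_test` are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Median imputation followed by standardisation, fitted as one unit.
preprocess = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Toy data: one sensor column with a missing value and an outlier.
toy_train = pd.DataFrame({"aa_000": [1.0, 2.0, np.nan, 100.0]})
toy_test = pd.DataFrame({"aa_000": [np.nan, 2.0]})

toy_train_scaled = preprocess.fit_transform(toy_train)  # learns train stats
toy_test_scaled = preprocess.transform(toy_test)        # applies them as-is

# The imputer stores the median it learned from the training column.
train_median = preprocess.named_steps["impute"].statistics_[0]
print(train_median)
```

In the toy data the outlier (100.0) does not move the median, so the missing test value is filled with the same value as the observed test value.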

4. Random Forest Model with Class Weighting

    # Strong weight on the positive class to reduce costly false negatives
    rf_model = RandomForestClassifier(
        n_estimators=200,
        class_weight={0: 1, 1: 50},
        random_state=42,
        n_jobs=-1
    )

    rf_model.fit(X_train_scaled, y_train)

5. Prediction and Evaluation

    y_pred = rf_model.predict(X_test_scaled)

    # Rows are actual classes, columns are predicted classes
    conf_matrix = confusion_matrix(y_test, y_pred)
    print("Confusion Matrix:\n", conf_matrix)

    print("\nClassification Report:\n")
    print(classification_report(y_test, y_pred))

6. Industrial Cost Evaluation

    # Cost definition (from APS challenge)
    COST_FP = 10   # unnecessary workshop inspection
    COST_FN = 500  # missed APS failure

    false_positives = conf_matrix[0, 1]  # actual negative, predicted positive
    false_negatives = conf_matrix[1, 0]  # actual positive, predicted negative

    total_industrial_cost = (
        false_positives * COST_FP +
        false_negatives * COST_FN
    )

    print("Total Industrial Cost:", total_industrial_cost)
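Because a missed failure costs 50 times more than an unnecessary inspection, the default 0.5 decision threshold is not necessarily the cheapest one. The sketch below shows one possible way to evaluate the same cost function at different thresholds on predicted probabilities. It is not part of the original study; `cost_at_threshold`, `best_threshold`, and the toy arrays are hypothetical. With the model above, the probabilities would come from `rf_model.predict_proba(X_test_scaled)[:, 1]`.

```python
import numpy as np

COST_FP = 10   # unnecessary workshop inspection
COST_FN = 500  # missed APS failure


def cost_at_threshold(y_true, proba_pos, threshold):
    """Total cost when predicting 'failure' for probabilities >= threshold."""
    y_true = np.asarray(y_true)
    y_hat = (np.asarray(proba_pos) >= threshold).astype(int)
    fp = int(np.sum((y_hat == 1) & (y_true == 0)))
    fn = int(np.sum((y_hat == 0) & (y_true == 1)))
    return fp * COST_FP + fn * COST_FN


def best_threshold(y_true, proba_pos, thresholds=None):
    """Return (threshold, cost) for the cheapest threshold on a grid."""
    if thresholds is None:
        thresholds = np.linspace(0.05, 0.95, 19)
    costs = [cost_at_threshold(y_true, proba_pos, t) for t in thresholds]
    i = int(np.argmin(costs))  # first grid point with the lowest cost
    return float(thresholds[i]), costs[i]


# Toy labels and probabilities, for illustration only.
y_toy = [0, 0, 0, 1, 1]
p_toy = [0.03, 0.22, 0.41, 0.33, 0.90]

cost_default = cost_at_threshold(y_toy, p_toy, 0.5)
thr_best, cost_best = best_threshold(y_toy, p_toy)
print(cost_default, thr_best, cost_best)
```

To avoid an optimistic estimate, the threshold should be chosen on a validation split held out from the training data, and only then applied to the test set.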

7. Confusion Matrix Visualization

    plt.figure(figsize=(6, 5))
    sns.heatmap(
        conf_matrix,
        annot=True,
        fmt='d',
        cmap='Blues',
        xticklabels=['Predicted Negative', 'Predicted Positive'],
        yticklabels=['Actual Negative', 'Actual Positive']
    )
    plt.title("Confusion Matrix – APS Failure Detection")
    plt.xlabel("Prediction")
    plt.ylabel("Ground Truth")
    plt.show()

8. Feature Importance Analysis

    # Impurity-based importances, labelled with the original column names
    feature_importance = pd.Series(
        rf_model.feature_importances_,
        index=X_train.columns
    )

    top_features = feature_importance.sort_values(ascending=False).head(15)

    print("\nTop 15 Most Important Features:")
    print(top_features)

9. Feature Importance Visualization

    plt.figure(figsize=(10, 6))
    top_features.plot(kind='barh')
    plt.gca().invert_yaxis()  # most important feature at the top
    plt.title("Top 15 Most Influential Sensors for APS Failure Prediction")
    plt.xlabel("Importance Score")
    plt.show()
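The ranking above uses the Random Forest's built-in, impurity-based importances. As an optional cross-check, permutation importance measures how much a score drops when one feature's values are shuffled. The sketch below illustrates the idea on synthetic data in which only the first column is informative. It is not part of the original analysis, and the toy arrays and the names `toy_model` and `ranking` are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Synthetic data: three features, only column 0 determines the label.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(300, 3))
y_toy = (X_toy[:, 0] > 0).astype(int)

toy_model = RandomForestClassifier(n_estimators=50, random_state=0)
toy_model.fit(X_toy, y_toy)

# Shuffle each column several times and record the mean drop in accuracy.
result = permutation_importance(
    toy_model, X_toy, y_toy, n_repeats=5, random_state=0
)

# Feature indices ordered from most to least important.
ranking = np.argsort(result.importances_mean)[::-1]
print(ranking, result.importances_mean)
```

On the APS data the same call would take `rf_model`, `X_test_scaled`, and `y_test`. Given the class imbalance, a scorer other than accuracy (for example `scoring="recall"`) may be more informative.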

Figure 1 shows the confusion matrix of the Random Forest model applied to APS failure detection under a highly imbalanced class distribution. The model correctly classifies the majority of non-failure cases, as reflected by the large number of true negatives and the very low number of false positives. The value 15,607 in the matrix is the number of true negatives, that is, trucks without an APS failure that the model correctly identifies as such. This indicates a strong ability to avoid unnecessary maintenance interventions.

However, a non-negligible number of APS failures are misclassified as normal operation (false negatives). From an industrial perspective these are the critical cases, because each one carries a high cost. This result underlines the importance of cost-sensitive learning and motivates the use of alternative models, such as XGBoost, to further reduce false negatives and lower the industrial cost.

Figure 2 presents the 15 most influential sensor variables used by the Random Forest model to predict APS failures. The prediction is driven by a limited subset of sensors: aa_000 is the dominant feature, followed by ci_000, ck_000, and dn_000. This suggests that APS failures are strongly associated with specific operational measurements rather than spread uniformly across all sensors. The concentration of importance in these variables makes them candidates for targeted monitoring and preventive maintenance, since focusing on key sensors could improve fault detection efficiency while reducing system complexity.

Conclusion

The experimental results indicate that the proposed machine learning approach is effective for detecting failures of the Air Pressure System (APS) in heavy-duty trucks. The Random Forest model achieved a high overall accuracy (≈99%) and a strong precision for the negative class, indicating reliable identification of non-failure cases. However, the recall for APS failures remained moderate (≈58–61%), which reflects the intrinsic difficulty of detecting rare failure events in highly imbalanced industrial datasets.

Class weighting significantly reduced the number of false negatives, which carry the highest industrial cost. Under the cost function defined above, the Random Forest model resulted in a total industrial cost of approximately 74,000–78,000 units, a meaningful improvement over non-cost-aware baselines. The XGBoost model substantially outperformed Random Forest on cost, reducing the total industrial cost to approximately 29,850 units, primarily by further decreasing the number of missed APS failures.

Feature importance analysis showed that a limited set of sensor variables (e.g., aa_000, ci_000, ck_000, dn_000) consistently contributed most to predictive performance. This suggests that APS degradation can be detected through specific operational patterns captured by onboard sensors.

Overall, these results support the relevance of cost-sensitive learning for industrial reliability studies and demonstrate the practical value of data-driven predictive maintenance in intelligent transportation systems.
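The conclusion reports an XGBoost result, but the XGBoost configuration itself is not shown in this README. The sketch below is therefore only an assumption about how a cost-sensitive XGBoost model could be set up. It uses `scale_pos_weight` with the negative-to-positive ratio, which is a common heuristic for imbalanced data. The helpers `pos_weight_ratio` and `build_xgb_model` and the hyperparameter values are hypothetical and are not claimed to reproduce the reported cost.

```python
import numpy as np


def pos_weight_ratio(y):
    """Ratio of negative to positive examples, used as scale_pos_weight."""
    y = np.asarray(y)
    n_pos = int((y == 1).sum())
    n_neg = int((y == 0).sum())
    if n_pos == 0:
        raise ValueError("y contains no positive examples")
    return n_neg / n_pos


def build_xgb_model(y_train, n_estimators=300):
    """Create an unfitted XGBClassifier weighted towards the rare class.

    xgboost is imported here so the helper above works without it installed.
    """
    from xgboost import XGBClassifier

    return XGBClassifier(
        n_estimators=n_estimators,            # illustrative value
        scale_pos_weight=pos_weight_ratio(y_train),
        eval_metric="logloss",
        random_state=42,
        n_jobs=-1,
    )


# Toy label vector: three negatives for every positive.
ratio = pos_weight_ratio([0, 0, 0, 1, 0, 0, 0, 1])
print(ratio)
```

With the variables from the sections above, the model would be fitted with `build_xgb_model(y_train).fit(X_train_scaled, y_train)` and evaluated with the same confusion matrix and cost computation as the Random Forest.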

APS Failure at Scania Trucks [Dataset]. (2016). UCI Machine Learning Repository. https://doi.org/10.24432/C51S51
