This project focuses on feature engineering and EDA of Delhivery’s logistics dataset. It covers missing value treatment, outlier detection, categorical encoding, normalization and creation of new features. The analysis provides insights into delivery patterns, busiest routes and operational efficiency to support data-driven decision-making.

Biplabini-1992/Logistics-Feature-Engineering

Logistics-Feature-Engineering

  • Delhivery was the largest and fastest-growing fully integrated logistics player in India by revenue in Fiscal 2021. The company aims to build the operating system for commerce through a combination of world-class infrastructure, high-quality logistics operations, and cutting-edge engineering and technology capabilities.
  • The Data team builds intelligence and capabilities from the company's data, helping it widen the gap in quality, efficiency, and profitability between its business and its competitors.

Problem Statement:

  • The company aims to optimize its data engineering pipelines by enhancing its capacity to clean, sanitize, and manipulate incoming data to derive valuable features from raw fields.
  • This initiative seeks to streamline the process of making sense of raw data, empowering the data science team to construct more accurate forecasting models.

Dataset Information:

Source:

Please check the dataset at: "Dataset Link"

Feature Information:

  • data: Indicates whether the row belongs to the training or the testing data
  • trip_creation_time: Timestamp of trip creation
  • route_schedule_uuid: Unique Id for a particular route schedule
  • route_type: Transportation type
    • FTL (Full Truck Load): FTL shipments reach the destination sooner, as the truck makes no other pickups or drop-offs along the way
    • Carting: Handling system consisting of small vehicles (carts)
  • trip_uuid: Unique ID given to a particular trip (A trip may include different source and destination centers)
  • source_center: Source ID of trip origin
  • source_name: Source Name of trip origin
  • destination_center: Destination ID
  • destination_name: Destination Name
  • od_start_time: Trip start time
  • od_end_time: Trip end time
  • start_scan_to_end_scan: Time taken to deliver from source to destination
  • is_cutoff: Unknown field
  • cutoff_factor: Unknown field
  • cutoff_timestamp: Unknown field
  • actual_distance_to_destination: Distance in km between source and destination warehouse
  • actual_time: Actual time taken to complete the delivery (Cumulative)
  • osrm_time: Travel time estimated by OSRM, an open-source routing engine that computes the shortest path between points on a given map (includes usual traffic and distance through major and minor roads) (Cumulative)
  • osrm_distance: Distance of the shortest path computed by the same open-source routing engine (includes usual traffic and distance through major and minor roads) (Cumulative)
  • factor: Unknown field
  • segment_actual_time: Actual time taken for one segment (a subset) of the package delivery
  • segment_osrm_time: OSRM time for one segment (a subset) of the package delivery
  • segment_osrm_distance: OSRM distance covered in one segment (a subset) of the package delivery
  • segment_factor: Unknown field

Exploratory Data Analysis (EDA):

Shape of Data

  • Observations on the number of rows and columns.
  • Dimensions of the dataset.

Data Types

  • Identifying data types of all attributes.
  • Conversion of categorical attributes to 'category' type where required.
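
The shape and data-type checks can be sketched with pandas as follows. This is a minimal illustration, not the notebook's actual code: the three sample rows are hypothetical, and which columns deserve the 'category' type is an assumption based on the feature descriptions above.

```python
import pandas as pd

# Hypothetical three-row sample; real values are not given in this README.
df = pd.DataFrame({
    "data": ["training", "training", "test"],
    "route_type": ["FTL", "Carting", "FTL"],
    "trip_creation_time": [
        "2018-09-20 02:35:36",
        "2018-09-20 02:35:36",
        "2018-09-23 06:42:06",
    ],
    "actual_time": [14.0, 24.0, 40.0],
})

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # data type of every attribute

# Assumption: low-cardinality text columns are stored as 'category',
# and timestamp strings are parsed to datetime64.
for col in ["data", "route_type"]:
    df[col] = df[col].astype("category")
df["trip_creation_time"] = pd.to_datetime(df["trip_creation_time"])
```

Converting repeated labels to 'category' usually reduces memory use, and parsed timestamps allow date arithmetic later on.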

Missing Value Detection

  • Detection and summary of missing values across the dataset.
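
One possible way to summarize missing values is a per-column count and percentage. The sample frame below is hypothetical and only illustrates the pattern; the README does not say which columns actually contain missing values.

```python
import pandas as pd

# Hypothetical sample with gaps in two columns.
df = pd.DataFrame({
    "source_name": ["A", None, "C", "D"],
    "destination_name": ["B", "C", None, None],
    "actual_time": [10.0, 20.0, 30.0, 40.0],
})

# Count and percentage of missing values per column.
missing = pd.DataFrame({
    "missing_count": df.isna().sum(),
    "missing_pct": (df.isna().mean() * 100).round(2),
})

# Keep only columns that actually have gaps, worst first.
missing = missing[missing["missing_count"] > 0].sort_values(
    "missing_count", ascending=False
)
print(missing)
```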

Statistical Summary

  • Descriptive statistics including mean, median, standard deviation, etc., for all attributes.
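
A short sketch of the statistical summary, assuming pandas and a hypothetical four-row sample. `describe()` does not label the median explicitly (it appears as the 50% quantile), so it is added as its own row here.

```python
import pandas as pd

# Hypothetical sample values.
df = pd.DataFrame({
    "actual_time": [10.0, 20.0, 30.0, 100.0],
    "route_type": ["FTL", "Carting", "FTL", "FTL"],
})

# count, mean, std, min, quartiles and max for numeric columns.
numeric_summary = df.describe()
numeric_summary.loc["median"] = df.median(numeric_only=True)

# Frequency table for a categorical attribute.
route_counts = df["route_type"].value_counts()

print(numeric_summary)
print(route_counts)
```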

Visual Analysis

  • Distribution Plots: Histograms or density plots for all continuous variables.
  • Boxplots: Continuous variables grouped by each categorical variable, to understand the spread and detect outliers.

Insights from EDA

  • Key observations and patterns identified during the exploratory analysis.

Comments on Attributes

  • Range and distribution of attributes.
  • Identification and handling of outliers.

Univariate and Bivariate Plots

  • Detailed comments on each univariate plot (individual variable analysis).
  • Detailed comments on each bivariate plot (analysis of relationships between pairs of variables).

Feature Creation

Merging of Rows and Aggregation of Fields

  • Methodology for merging rows and aggregating fields.
  • Justification for the chosen approach.
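
The README does not spell out the aggregation rules, so the following is only a sketch of one plausible approach. It assumes that rows are first merged per trip and source/destination pair, that cumulative fields such as actual_time are reduced with their maximum (the final cumulative value), and that segment fields are summed. The sample rows and the `segment_actual_time_sum` column name are hypothetical.

```python
import pandas as pd

# Hypothetical rows: trip t1 has two legs (A->B, B->C), trip t2 has one.
df = pd.DataFrame({
    "trip_uuid": ["t1", "t1", "t1", "t2"],
    "source_center": ["A", "A", "B", "C"],
    "destination_center": ["B", "B", "C", "D"],
    "actual_time": [10.0, 25.0, 30.0, 50.0],          # cumulative within a leg
    "segment_actual_time": [10.0, 15.0, 30.0, 50.0],  # per segment
})

# Step 1: one row per (trip, source, destination) leg.
leg = (
    df.groupby(["trip_uuid", "source_center", "destination_center"], sort=False)
      .agg(
          actual_time=("actual_time", "max"),
          segment_actual_time_sum=("segment_actual_time", "sum"),
      )
      .reset_index()
)

# Step 2: one row per trip, adding up the legs.
trip = (
    leg.groupby("trip_uuid", sort=False)
       .agg(
           actual_time=("actual_time", "sum"),
           segment_actual_time_sum=("segment_actual_time_sum", "sum"),
       )
       .reset_index()
)
print(trip)
```

Under these assumptions the cumulative total and the summed segments agree for each trip, which gives a quick consistency check on the merged data.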

Comparison & Visualization of Time and Distance Fields

  • Analysis of time and distance fields.
  • Visual comparisons and insights drawn from these comparisons.
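
A minimal sketch of how the time fields might be compared. It assumes that start_scan_to_end_scan, actual_time and osrm_time are all expressed in minutes, which the README does not state. The derived column names (`od_time_diff_min`, `scan_gap_min`, `actual_to_osrm_ratio`) are hypothetical, as are the sample values.

```python
import pandas as pd

# Hypothetical trip-level sample.
trips = pd.DataFrame({
    "od_start_time": pd.to_datetime(["2018-09-20 03:00:00", "2018-09-20 04:00:00"]),
    "od_end_time": pd.to_datetime(["2018-09-20 04:30:00", "2018-09-20 06:00:00"]),
    "start_scan_to_end_scan": [90.0, 125.0],
    "actual_time": [60.0, 100.0],
    "osrm_time": [40.0, 80.0],
})

# Elapsed minutes between trip start and trip end.
elapsed = trips["od_end_time"] - trips["od_start_time"]
trips["od_time_diff_min"] = elapsed.dt.total_seconds() / 60

# Gap between the recorded scan duration and the elapsed time.
trips["scan_gap_min"] = trips["start_scan_to_end_scan"] - trips["od_time_diff_min"]

# How much longer the actual delivery took than the OSRM estimate.
trips["actual_to_osrm_ratio"] = trips["actual_time"] / trips["osrm_time"]

print(trips[["od_time_diff_min", "scan_gap_min", "actual_to_osrm_ratio"]])
```

A ratio consistently above 1 would suggest that OSRM underestimates real delivery times on those routes.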

Missing Values Treatment & Outlier Treatment

  • Techniques used for handling missing values.
  • Methods for detecting and treating outliers.
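
The specific techniques are not named in this README. As an illustration only, the sketch below fills missing names with a placeholder and uses the common 1.5 × IQR rule to flag and clip outliers. The sample values, the "Unknown" placeholder, and the `is_outlier` and `actual_time_clipped` columns are assumptions.

```python
import pandas as pd

# Hypothetical sample with one missing name and one extreme value.
df = pd.DataFrame({
    "source_name": ["A", None, "C", "D", "E", "F"],
    "actual_time": [10.0, 12.0, 11.0, 13.0, 12.0, 200.0],
})

# Missing value treatment: placeholder label instead of dropping rows.
df["source_name"] = df["source_name"].fillna("Unknown")

# Outlier detection with the 1.5 * IQR rule (the boxplot whisker bounds).
q1, q3 = df["actual_time"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
df["is_outlier"] = ~df["actual_time"].between(lower, upper)

# Outlier treatment: clip extreme values to the bounds.
df["actual_time_clipped"] = df["actual_time"].clip(lower, upper)
print(df)
```

Clipping keeps every row, while dropping flagged rows is the alternative when the extreme values are believed to be recording errors.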

Checking Relationship Between Aggregated Fields

  • Analysis of relationships between newly aggregated fields.
  • Statistical and visual methods used to assess these relationships.
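
One simple statistical check is a correlation matrix over the aggregated fields. The sketch below uses hypothetical values and shows both Pearson correlation and a rank-based (Spearman) variant, computed here as the Pearson correlation of the ranks.

```python
import pandas as pd

# Hypothetical trip-level aggregates.
trips = pd.DataFrame({
    "actual_time": [60.0, 100.0, 150.0, 30.0],
    "osrm_time": [40.0, 80.0, 110.0, 25.0],
    "osrm_distance": [35.0, 70.0, 100.0, 20.0],
})

# Linear relationship between each pair of fields.
pearson = trips.corr(method="pearson")

# Spearman correlation: Pearson correlation applied to the ranks,
# which is less sensitive to outliers and to non-linear scaling.
spearman = trips.rank().corr(method="pearson")

print(pearson.round(3))
print(spearman.round(3))
```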

Handling Categorical Values

  • Strategies for encoding categorical variables (e.g., one-hot encoding, label encoding).
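
Both encoding strategies can be sketched with pandas alone. The sample below uses route_type with the two values listed in the feature information; the `route_type_code` column name is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"route_type": ["FTL", "Carting", "FTL"]})

# One-hot encoding: one 0/1 indicator column per category.
one_hot = pd.get_dummies(df, columns=["route_type"], dtype=int)

# Label encoding: one integer code per category (alphabetical order).
df["route_type_code"] = df["route_type"].astype("category").cat.codes

print(one_hot)
print(df)
```

With only two categories, either encoding carries the same information; one-hot encoding matters more for columns with many unordered categories.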

Column Normalization / Standardization

  • Processes for normalizing or standardizing columns.
  • Rationale behind choosing normalization or standardization.
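
The two options differ in what they preserve: min-max normalization maps a column to the range [0, 1], while standardization rescales it to zero mean and unit variance. A minimal sketch on hypothetical values, using plain pandas rather than any particular scaler class:

```python
import pandas as pd

# Hypothetical sample values.
trips = pd.DataFrame({"actual_time": [10.0, 20.0, 30.0, 40.0]})
col = trips["actual_time"]

# Min-max normalization: (x - min) / (max - min), result in [0, 1].
trips["actual_time_minmax"] = (col - col.min()) / (col.max() - col.min())

# Standardization (z-score): (x - mean) / std, using the population std.
trips["actual_time_zscore"] = (col - col.mean()) / col.std(ddof=0)

print(trips)
```

Min-max scaling is sensitive to outliers because a single extreme value stretches the range, so standardization is often preferred when outliers remain in the data.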

Business Insights

  • Patterns observed in the data with potential business implications.
  • For example: Analysis of order origins (state, corridor, etc.).
  • Identification of the busiest corridors, average distances, and time taken.
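
Busiest corridors can be identified by counting trips per source and destination pair. The sketch below assumes a trip-level frame with one row per trip; the sample values and the `n_trips`, `avg_distance_km` and `avg_time` column names are hypothetical.

```python
import pandas as pd

# Hypothetical trip-level sample.
trips = pd.DataFrame({
    "source_name": ["A", "A", "B", "A"],
    "destination_name": ["B", "B", "C", "C"],
    "actual_distance_to_destination": [100.0, 120.0, 50.0, 300.0],
    "actual_time": [200.0, 220.0, 90.0, 500.0],
})

# Trip count, average distance and average time per corridor.
corridors = (
    trips.groupby(["source_name", "destination_name"])
         .agg(
             n_trips=("actual_time", "size"),
             avg_distance_km=("actual_distance_to_destination", "mean"),
             avg_time=("actual_time", "mean"),
         )
         .sort_values("n_trips", ascending=False)
         .reset_index()
)
print(corridors.head())
```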

Recommendations

  • Actionable items for the business based on the analysis.
