Logistics-Feature-Engineering

Delhivery is the largest and fastest-growing fully integrated logistics player in India by revenue in Fiscal 2021. It aims to build the operating system for commerce through a combination of world-class infrastructure, logistics operations of the highest quality, and cutting-edge engineering and technology capabilities.
The Data team builds intelligence and capabilities from the company's data, helping Delhivery widen the gap in quality, efficiency, and profitability between its business and its competitors.

Problem Statement:

The company aims to optimize its data engineering pipelines by enhancing its capacity to clean, sanitize, and manipulate incoming data to derive valuable features from raw fields.
This initiative seeks to streamline the process of making sense of raw data, empowering the data science team to construct more accurate forecasting models.

Dataset Information:

Source:

Please check the dataset at: "Dataset Link"

Feature Information:

data: Indicates whether the row belongs to the training or the testing set
trip_creation_time: Timestamp of trip creation
route_schedule_uuid: Unique Id for a particular route schedule
route_type: Transportation type
- FTL(Full Truck Load): FTL shipments get to the destination sooner, as the truck is making no other pickups or drop-offs along the way
- Carting: Handling system consisting of small vehicles (carts)
trip_uuid: Unique ID given to a particular trip (A trip may include different source and destination centers)
source_center: Source ID of trip origin
source_name: Source Name of trip origin
destination_center: Destination ID
destination_name: Destination Name
od_start_time: Trip start time
od_end_time: Trip end time
start_scan_to_end_scan: Time taken to deliver from source to destination
is_cutoff: Unknown field
cutoff_factor: Unknown field
cutoff_timestamp: Unknown field
actual_distance_to_destination: Distance in km between source and destination warehouse
actual_time: Actual time taken to complete the delivery (Cumulative)
osrm_time: Travel time estimated by OSRM (Open Source Routing Machine), an open-source routing engine that computes the shortest path between points on a given map (includes usual traffic and distance through major and minor roads) (Cumulative)
osrm_distance: Distance of the shortest path computed by the same OSRM routing engine (includes usual traffic and distance through major and minor roads) (Cumulative)
factor: Unknown field
segment_actual_time: Actual time taken for one segment (a subset) of the package delivery
segment_osrm_time: OSRM time for one segment (a subset) of the package delivery
segment_osrm_distance: OSRM distance covered in one segment (a subset) of the package delivery
segment_factor: Unknown field

Exploratory Data Analysis (EDA):

Shape of Data

Observations on the number of rows and columns.
Dimensions of the dataset.

Data Types

Identifying data types of all attributes.
Conversion of categorical attributes to 'category' type if required.
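
A minimal sketch of the type conversion, assuming pandas is used. The sample rows are made up for illustration, and the choice of which columns to convert is an assumption rather than the notebook's confirmed approach.

```python
import pandas as pd

# Hypothetical sample rows; the values are illustrative only.
df = pd.DataFrame({
    "data": ["training", "test", "training"],
    "route_type": ["FTL", "Carting", "FTL"],
    "trip_creation_time": [
        "2018-09-20 02:35:36",
        "2018-09-20 02:40:00",
        "2018-09-21 11:00:00",
    ],
    "actual_time": [14.0, 24.0, 40.0],
})

# Low-cardinality text columns are stored as 'category'.
for col in ["data", "route_type"]:
    df[col] = df[col].astype("category")

# Timestamp strings are parsed into datetime values.
df["trip_creation_time"] = pd.to_datetime(df["trip_creation_time"])

print(df.dtypes)
```

The same pattern would apply to the other timestamp fields (od_start_time, od_end_time) if they arrive as strings.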

Missing Value Detection

Detection and summary of missing values across the dataset.
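
One possible way to summarise missing values, sketched with pandas on a made-up sample. The column selection and the names `missing_count` and `missing_pct` are illustrative assumptions.

```python
import pandas as pd

# Hypothetical sample with gaps in the name columns.
df = pd.DataFrame({
    "source_name": ["A", None, "C", "D"],
    "destination_name": ["X", "Y", None, None],
    "actual_time": [10.0, 20.0, 30.0, 40.0],
})

# Count and percentage of missing values per column.
missing = pd.DataFrame({
    "missing_count": df.isna().sum(),
    "missing_pct": (df.isna().mean() * 100).round(2),
})

# Keep only the columns that actually have gaps, worst first.
missing = missing[missing["missing_count"] > 0].sort_values(
    "missing_count", ascending=False
)
print(missing)
```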

Statistical Summary

Descriptive statistics including mean, median, standard deviation, etc., for all attributes.
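
A short sketch of the statistical summary, assuming pandas. `describe()` reports the median as the `50%` row. The sample values are made up.

```python
import pandas as pd

# Hypothetical sample; values are illustrative only.
df = pd.DataFrame({
    "actual_time": [10.0, 20.0, 30.0, 100.0],
    "osrm_time": [8.0, 15.0, 25.0, 60.0],
    "route_type": ["FTL", "Carting", "FTL", "FTL"],
})

# count, mean, std, min, quartiles (50% is the median), and max
# for the numeric columns.
numeric_summary = df[["actual_time", "osrm_time"]].describe()

# Frequency table for a categorical column.
route_counts = df["route_type"].value_counts()

print(numeric_summary)
print(route_counts)
```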

Visual Analysis

Distribution Plots: Histograms or density plots for all continuous variables.
Boxplots: For continuous variables, grouped by categorical variables where useful, to understand the spread and detect outliers.

Insights from EDA

Key observations and patterns identified during the exploratory analysis.

Comments on Attributes

Range and distribution of attributes.
Identification and handling of outliers.

Univariate and Bivariate Plots

Detailed comments on each univariate plot (individual variable analysis).
Detailed comments on each bivariate plot (analysis of relationships between pairs of variables).

Feature Creation

Merging of Rows and Aggregation of Fields

Methodology for merging rows and aggregating fields.
Justification for the chosen approach.
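
The merging step could be sketched as a two-stage groupby. This is an assumption about the method, not the notebook's confirmed approach: rows are first merged per (trip_uuid, source_center, destination_center) leg, taking the maximum of the cumulative fields and the sum of the segment fields, and the legs are then summed per trip. The sample rows and the output names (`legs`, `trips`, `n_legs`, the `_sum` suffixes) are hypothetical.

```python
import pandas as pd

# Hypothetical raw rows: several scans per leg, several legs per trip.
rows = pd.DataFrame({
    "trip_uuid": ["trip-1"] * 4 + ["trip-2"] * 2,
    "source_center": ["A", "A", "B", "B", "C", "C"],
    "destination_center": ["B", "B", "D", "D", "E", "E"],
    "actual_time": [10.0, 25.0, 12.0, 30.0, 8.0, 20.0],       # cumulative
    "osrm_time": [8.0, 20.0, 10.0, 24.0, 7.0, 15.0],          # cumulative
    "segment_actual_time": [10.0, 15.0, 12.0, 18.0, 8.0, 12.0],
    "segment_osrm_time": [8.0, 12.0, 10.0, 14.0, 7.0, 8.0],
})

leg_keys = ["trip_uuid", "source_center", "destination_center"]

# Stage 1: one row per leg. Cumulative fields keep their largest value,
# segment fields are added up.
legs = rows.groupby(leg_keys, as_index=False, sort=False).agg(
    actual_time=("actual_time", "max"),
    osrm_time=("osrm_time", "max"),
    segment_actual_time_sum=("segment_actual_time", "sum"),
    segment_osrm_time_sum=("segment_osrm_time", "sum"),
)

# Stage 2: one row per trip. Leg-level totals are summed.
trips = legs.groupby("trip_uuid", as_index=False, sort=False).agg(
    n_legs=("source_center", "count"),
    actual_time=("actual_time", "sum"),
    osrm_time=("osrm_time", "sum"),
    segment_actual_time_sum=("segment_actual_time_sum", "sum"),
    segment_osrm_time_sum=("segment_osrm_time_sum", "sum"),
)

print(trips)
```

Taking the maximum relies on the cumulative fields never decreasing within a leg; if that does not hold in the real data, sorting by a timestamp and taking the last row would be the safer choice.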

Comparison & Visualization of Time and Distance Fields

Analysis of time and distance fields.
Visual comparisons and insights drawn from these comparisons.
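
A minimal sketch of a numeric comparison between the aggregated time fields, assuming a trip-level table already exists. The `trips` values and the derived column names are hypothetical; the same pattern would apply to the distance fields.

```python
import pandas as pd

# Hypothetical trip-level aggregates; values are illustrative only.
trips = pd.DataFrame({
    "actual_time": [55.0, 20.0, 90.0],
    "osrm_time": [44.0, 15.0, 60.0],
    "segment_actual_time_sum": [55.0, 19.0, 88.0],
})

comparison = pd.DataFrame({
    # How far the actual time overshoots the OSRM estimate.
    "actual_minus_osrm": trips["actual_time"] - trips["osrm_time"],
    "actual_over_osrm": trips["actual_time"] / trips["osrm_time"],
    # Consistency check between cumulative and summed segment times.
    "actual_minus_segment_sum": (
        trips["actual_time"] - trips["segment_actual_time_sum"]
    ),
})

print(comparison.describe())
```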

Missing Values Treatment & Outlier Treatment

Techniques used for handling missing values.
Methods for detecting and treating outliers.
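
One common treatment, sketched here as an assumption rather than the notebook's confirmed method: missing names are filled with a placeholder, and outliers are flagged and clipped using the 1.5 x IQR rule. The sample values, the `"Unknown"` placeholder, and the new column names are illustrative.

```python
import pandas as pd

# Hypothetical sample with one missing name and one extreme time.
df = pd.DataFrame({
    "source_name": ["A", None, "C", "D", "E"],
    "actual_time": [10.0, 12.0, 11.0, 13.0, 200.0],
})

# Missing values: fill text gaps with an explicit placeholder.
df["source_name"] = df["source_name"].fillna("Unknown")

# Outliers: 1.5 x IQR fences around the middle 50% of the data.
q1 = df["actual_time"].quantile(0.25)
q3 = df["actual_time"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Flag values outside the fences, and keep a clipped copy.
df["is_outlier"] = ~df["actual_time"].between(lower, upper)
df["actual_time_clipped"] = df["actual_time"].clip(lower, upper)

print(df)
```

Clipping keeps every row, which matters when rows are later merged per trip; dropping flagged rows is the alternative when the extreme values are believed to be errors.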

Checking Relationship Between Aggregated Fields

Analysis of relationships between newly aggregated fields.
Statistical and visual methods used to assess these relationships.
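
A sketch of a correlation check between aggregated fields, assuming pandas. Pearson measures linear association and Spearman measures monotonic association; the `trips` values are made up.

```python
import pandas as pd

# Hypothetical trip-level aggregates; values are illustrative only.
trips = pd.DataFrame({
    "actual_time": [20.0, 55.0, 90.0, 130.0],
    "osrm_time": [15.0, 44.0, 60.0, 100.0],
    "osrm_distance": [12.0, 40.0, 70.0, 95.0],
})

# Pairwise correlation matrices over the numeric columns.
pearson = trips.corr(method="pearson")
spearman = trips.corr(method="spearman")

print(pearson.round(3))
print(spearman.round(3))
```

A heatmap of either matrix is a typical visual companion to these numbers.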

Handling Categorical Values

Strategies for encoding categorical variables (e.g., one-hot encoding, label encoding).
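
Both encodings can be sketched with pandas alone. The sample rows and the `route_type_code` column name are illustrative assumptions.

```python
import pandas as pd

# Hypothetical sample; values are illustrative only.
df = pd.DataFrame({
    "route_type": ["FTL", "Carting", "FTL"],
    "actual_time": [14.0, 24.0, 40.0],
})

# One-hot encoding: one 0/1 column per category.
encoded = pd.get_dummies(df, columns=["route_type"], dtype=int)

# Label-style encoding: one integer code per category
# (categories are ordered alphabetically, so Carting=0, FTL=1).
df["route_type_code"] = df["route_type"].astype("category").cat.codes

print(encoded)
print(df)
```

With only two route types, a single 0/1 column carries the same information as the pair of one-hot columns.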

Column Normalization / Standardization

Processes for normalizing or standardizing columns.
Rationale behind choosing normalization or standardization.
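
The two rescaling options, written out by hand as a sketch. Min-max normalization maps values to [0, 1]; standardization gives zero mean and unit variance. The sample values and new column names are illustrative.

```python
import pandas as pd

# Hypothetical sample; values are illustrative only.
df = pd.DataFrame({"actual_time": [10.0, 20.0, 30.0, 40.0, 50.0]})
col = df["actual_time"]

# Min-max normalization: (x - min) / (max - min)
df["actual_time_minmax"] = (col - col.min()) / (col.max() - col.min())

# Standardization (z-score): (x - mean) / std, using the population
# standard deviation (ddof=0).
df["actual_time_zscore"] = (col - col.mean()) / col.std(ddof=0)

print(df)
```

Min-max scaling is sensitive to extreme values, so it is usually applied after outlier treatment; standardization is the more common choice when the column is roughly bell-shaped.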

Business Insights

Patterns observed in the data with potential business implications.
For example: Analysis of order origins (state, corridor, etc.).
Identification of the busiest corridors, average distances, and time taken.
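
The corridor analysis could look like the sketch below, assuming a trip-level table with one row per trip. A corridor is taken here to be a (source_name, destination_name) pair, which is an assumption; the sample rows and the output names `n_trips`, `avg_distance_km`, and `avg_time` are hypothetical.

```python
import pandas as pd

# Hypothetical trip-level rows; values are illustrative only.
trips = pd.DataFrame({
    "source_name": ["A", "A", "B", "A", "C"],
    "destination_name": ["B", "B", "C", "B", "A"],
    "actual_distance_to_destination": [100.0, 110.0, 50.0, 90.0, 200.0],
    "actual_time": [120.0, 130.0, 60.0, 110.0, 260.0],
})

# Trip count, average distance, and average time per corridor,
# busiest corridor first.
corridors = (
    trips.groupby(["source_name", "destination_name"], as_index=False)
    .agg(
        n_trips=("actual_time", "count"),
        avg_distance_km=("actual_distance_to_destination", "mean"),
        avg_time=("actual_time", "mean"),
    )
    .sort_values("n_trips", ascending=False)
    .reset_index(drop=True)
)

print(corridors)
```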

Recommendations

Actionable items for the business based on the analysis.
