- Delhivery is the largest and fastest-growing fully integrated player in India by revenue in Fiscal 2021. They aim to build the operating system for commerce, through a combination of world-class infrastructure, logistics operations of the highest quality, and cutting-edge engineering and technology capabilities.
- The Data team builds intelligence and capabilities from the company's data, helping to widen the gap in quality, efficiency, and profitability between Delhivery's business and that of its competitors.
- The company aims to optimize its data engineering pipelines by enhancing its capacity to clean, sanitize, and manipulate incoming data to derive valuable features from raw fields.
- This initiative seeks to streamline the process of making sense of raw data, empowering the data science team to construct more accurate forecasting models.
Please check the dataset at: "Dataset Link"
- data: Indicates whether the row belongs to the testing or training data
- trip_creation_time: Timestamp of trip creation
- route_schedule_uuid: Unique Id for a particular route schedule
- route_type: Transportation type
  - FTL (Full Truck Load): FTL shipments get to the destination sooner, as the truck is making no other pickups or drop-offs along the way
  - Carting: Handling system consisting of small vehicles (carts)
- trip_uuid: Unique ID given to a particular trip (A trip may include different source and destination centers)
- source_center: Source ID of trip origin
- source_name: Source Name of trip origin
- destination_center: Destination ID
- destination_name: Destination Name
- od_start_time: Trip start time
- od_end_time: Trip end time
- start_scan_to_end_scan: Time taken to deliver from source to destination
- is_cutoff: Unknown field
- cutoff_factor: Unknown field
- cutoff_timestamp: Unknown field
- actual_distance_to_destination: Distance in Kms between source and destination warehouse
- actual_time: Actual time taken to complete the delivery (Cumulative)
- osrm_time: Travel time computed by OSRM (Open Source Routing Machine), an open-source routing engine that finds the shortest path between points on a given map (includes usual traffic and distance through major and minor roads) (Cumulative)
- osrm_distance: Distance computed by OSRM (Open Source Routing Machine) along the shortest path between points on a given map (includes usual traffic and distance through major and minor roads) (Cumulative)
- factor: Unknown field
- segment_actual_time: Segment time. Actual time taken for one segment (a subset) of the package delivery
- segment_osrm_time: OSRM segment time. OSRM-computed time for one segment (a subset) of the package delivery
- segment_osrm_distance: OSRM segment distance. OSRM-computed distance covered in one segment (a subset) of the package delivery
- segment_factor: Unknown field
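
The timestamp columns above are raw fields from which features can be derived. The following is a minimal sketch, assuming pandas and a handful of invented rows; the timestamp format, the derived column names (`creation_hour`, `creation_dayofweek`, `creation_month`, `od_duration_min`), and the choice of minutes as the unit are illustrative assumptions, not part of the dataset description.

```python
import pandas as pd

# Invented sample rows; the real dataset's values and timestamp format may differ.
df = pd.DataFrame({
    "trip_creation_time": ["2018-09-20 02:35:36", "2018-09-21 23:10:00"],
    "od_start_time": ["2018-09-20 03:21:32", "2018-09-21 23:30:00"],
    "od_end_time": ["2018-09-20 04:47:32", "2018-09-22 01:00:00"],
})

# Parse the raw strings into datetime64 columns.
time_cols = ["trip_creation_time", "od_start_time", "od_end_time"]
for col in time_cols:
    df[col] = pd.to_datetime(df[col])

# Calendar features from the trip creation timestamp (hypothetical names).
df["creation_hour"] = df["trip_creation_time"].dt.hour
df["creation_dayofweek"] = df["trip_creation_time"].dt.dayofweek  # Monday = 0
df["creation_month"] = df["trip_creation_time"].dt.month

# Elapsed time between trip start and trip end, expressed in minutes.
df["od_duration_min"] = (
    (df["od_end_time"] - df["od_start_time"]).dt.total_seconds() / 60
)

print(df[["creation_hour", "creation_dayofweek", "od_duration_min"]])
```

A derived duration like `od_duration_min` can then be compared against `start_scan_to_end_scan`, keeping in mind that the unit of that field is not stated in the column list.
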
- Observations on the number of rows and columns.
- Dimensions of the dataset.
- Identifying data types of all attributes.
- Conversion of categorical attributes to the 'category' type, if required.
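
As a rough illustration of the structural checks listed above, the sketch below inspects shape and dtypes and converts low-cardinality text columns to pandas' `category` type. The sample values (including the labels `"training"` and `"test"`) are invented, and which columns to treat as categorical is an assumption.

```python
import pandas as pd

# Invented sample rows for illustration only.
df = pd.DataFrame({
    "data": ["training", "training", "test"],
    "route_type": ["FTL", "Carting", "Carting"],
    "actual_time": [86.0, 40.0, 55.0],
})

# Dimensions of the dataset.
n_rows, n_cols = df.shape
print(f"{n_rows} rows x {n_cols} columns")

# Data types of all attributes before conversion.
print(df.dtypes)

# Convert the assumed categorical columns to the 'category' dtype.
categorical_cols = ["data", "route_type"]
for col in categorical_cols:
    df[col] = df[col].astype("category")

print(df.dtypes)
```
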
- Detection and summary of missing values across the dataset.
- Descriptive statistics including mean, median, standard deviation, etc., for all attributes.
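
One possible way to produce the missing-value summary and descriptive statistics, sketched with pandas on invented rows. The table layout and the names `missing_count` and `missing_pct` are illustrative choices.

```python
import numpy as np
import pandas as pd

# Invented sample rows with a few gaps.
df = pd.DataFrame({
    "source_name": ["A", None, "C", "D"],
    "actual_time": [10.0, 20.0, np.nan, 40.0],
    "osrm_time": [8.0, 15.0, 25.0, 30.0],
})

# Count and percentage of missing values per column.
missing = pd.DataFrame({
    "missing_count": df.isna().sum(),
    "missing_pct": df.isna().mean() * 100,
})
print(missing)

# describe() covers count, mean, std, min, quartiles, and max for numeric columns;
# the median is added explicitly for readability (it equals the 50% row).
stats = df.describe().T
stats["median"] = df.median(numeric_only=True)
print(stats)
```
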
- Distribution Plots: Histograms or density plots for all continuous variables.
- Boxplots: For continuous variables, grouped by categorical variables where useful, to understand the spread and detect outliers.
- Key observations and patterns identified during the exploratory analysis.
- Range and distribution of attributes.
- Identification and handling of outliers.
- Detailed comments on each univariate plot (individual variable analysis).
- Detailed comments on each bivariate plot (analysis of relationships between pairs of variables).
- Methodology for merging rows and aggregating fields.
- Justification for the chosen approach.
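
The merging step could be approached as in the sketch below, which is one plausible scheme rather than a prescribed method. It assumes that several rows share the same `trip_uuid`, source, and destination; that fields marked "(Cumulative)" should keep their last value per leg while segment fields are summed; that rows are already in chronological order within each leg; and that the destination ID column is named `destination_center`. The sample values are invented.

```python
import pandas as pd

# Invented sample rows: trip t1 has two legs (A->B with two segments, B->C).
df = pd.DataFrame({
    "trip_uuid": ["t1", "t1", "t1", "t2"],
    "source_center": ["A", "A", "B", "C"],
    "destination_center": ["B", "B", "C", "D"],
    "actual_time": [10.0, 25.0, 30.0, 12.0],          # cumulative within a leg
    "segment_actual_time": [10.0, 15.0, 30.0, 12.0],  # per segment
    "segment_osrm_distance": [5.0, 7.0, 20.0, 6.0],   # per segment
})

# Step 1: collapse segment rows into one row per leg.
leg_keys = ["trip_uuid", "source_center", "destination_center"]
legs = df.groupby(leg_keys, as_index=False, sort=False).agg(
    actual_time=("actual_time", "last"),  # cumulative: keep the final value
    segment_actual_time_sum=("segment_actual_time", "sum"),
    segment_osrm_distance_sum=("segment_osrm_distance", "sum"),
)

# Step 2: collapse legs into one row per trip.
trips = legs.groupby("trip_uuid", as_index=False, sort=False).agg(
    source_center=("source_center", "first"),
    destination_center=("destination_center", "last"),
    actual_time=("actual_time", "sum"),
    segment_actual_time_sum=("segment_actual_time_sum", "sum"),
    segment_osrm_distance_sum=("segment_osrm_distance_sum", "sum"),
)
print(trips)
```

Because `"last"` depends on row order, sorting by a time column before grouping is advisable if chronological order is not guaranteed.
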
- Analysis of time and distance fields.
- Visual comparisons and insights drawn from these comparisons.
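
A simple numeric comparison of the time fields might look like the following sketch. It assumes a trip-level table with a hypothetical `segment_osrm_time_sum` column (the per-trip sum of `segment_osrm_time`); the derived column names and sample values are invented for illustration.

```python
import pandas as pd

# Invented trip-level rows.
trips = pd.DataFrame({
    "actual_time": [120.0, 90.0, 200.0],
    "osrm_time": [60.0, 60.0, 100.0],
    "segment_osrm_time_sum": [66.0, 60.0, 110.0],
})

# How many times longer the actual delivery took than the OSRM estimate.
trips["actual_to_osrm_ratio"] = trips["actual_time"] / trips["osrm_time"]

# Gap between the summed segment estimates and the trip-level estimate.
trips["osrm_vs_segment_diff"] = trips["segment_osrm_time_sum"] - trips["osrm_time"]

# Mean and median of both comparison columns.
summary = trips[["actual_to_osrm_ratio", "osrm_vs_segment_diff"]].agg(["mean", "median"])
print(summary)
```
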
- Techniques used for handling missing values.
- Methods for detecting and treating outliers.
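
The sketch below shows one common combination of techniques: dropping rows with a missing identifier-like field, then flagging and clipping outliers with the 1.5 × IQR rule. Both choices are assumptions made for illustration (imputation or other thresholds may suit the data better), and the sample values are invented.

```python
import pandas as pd

# Invented sample rows: one missing name and one extreme time value.
df = pd.DataFrame({
    "source_name": ["A", None, "C", "D", "E", "F"],
    "actual_time": [10.0, 12.0, 11.0, 13.0, 12.0, 100.0],
})

# Missing values: drop rows where the source name is absent.
df = df.dropna(subset=["source_name"]).copy()

# Outlier detection with the interquartile range (IQR).
q1 = df["actual_time"].quantile(0.25)
q3 = df["actual_time"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
df["is_outlier"] = ~df["actual_time"].between(lower, upper)

# Outlier treatment: clip values to the IQR fences instead of deleting rows.
df["actual_time_clipped"] = df["actual_time"].clip(lower=lower, upper=upper)
print(df)
```
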
- Analysis of relationships between newly aggregated fields.
- Statistical and visual methods used to assess these relationships.
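
As one example of a statistical method for assessing these relationships, the sketch below computes the Pearson correlation between two aggregated fields and a paired t-statistic on their difference. The choice of test is an assumption, and the sample values are invented.

```python
import numpy as np
import pandas as pd

# Invented trip-level rows.
trips = pd.DataFrame({
    "actual_time": [10.0, 20.0, 30.0, 40.0],
    "osrm_time": [8.0, 18.0, 22.0, 36.0],
})

# Pearson correlation between the two fields.
r = trips["actual_time"].corr(trips["osrm_time"])

# Paired t-statistic: mean difference divided by its standard error.
diff = trips["actual_time"] - trips["osrm_time"]
t_stat = diff.mean() / (diff.std(ddof=1) / np.sqrt(len(diff)))

print(f"pearson r = {r:.3f}, paired t = {t_stat:.3f}")
```

Turning the t-statistic into a p-value requires the t distribution with `len(diff) - 1` degrees of freedom; `scipy.stats.ttest_rel` performs the whole paired test in one call if SciPy is available.
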
- Strategies for encoding categorical variables (e.g., one-hot encoding, label encoding).
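
Both encoding strategies mentioned above can be sketched with pandas alone. The sample values are invented, and `route_type_code` is a hypothetical column name.

```python
import pandas as pd

# Invented sample rows.
df = pd.DataFrame({"route_type": ["FTL", "Carting", "FTL"]})

# One-hot encoding: one 0/1 indicator column per category.
one_hot = pd.get_dummies(df, columns=["route_type"], dtype=int)
print(one_hot)

# Label encoding: map each category to an integer code
# (codes follow the sorted category order: Carting = 0, FTL = 1).
df["route_type_code"] = df["route_type"].astype("category").cat.codes
print(df)
```

One-hot encoding avoids implying an order between categories, while label encoding keeps a single column; with only two route types the practical difference is small.
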
- Processes for normalizing or standardizing columns.
- Rationale behind choosing normalization or standardization.
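
To make the distinction concrete, the following sketch applies min-max normalization and z-score standardization to a single column using plain pandas arithmetic. The sample values and the derived column names are invented for illustration.

```python
import pandas as pd

# Invented sample rows.
df = pd.DataFrame({"actual_time": [10.0, 20.0, 30.0]})
col = df["actual_time"]

# Normalization (min-max): rescale values into the range [0, 1].
df["actual_time_minmax"] = (col - col.min()) / (col.max() - col.min())

# Standardization (z-score): zero mean and unit variance,
# using the population standard deviation (ddof=0).
df["actual_time_zscore"] = (col - col.mean()) / col.std(ddof=0)

print(df)
```

Min-max scaling is sensitive to extreme values because the minimum and maximum define the range, so it is usually applied after outlier treatment; standardization is less affected but does not bound the output.
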
- Patterns observed in the data with potential business implications.
- For example: Analysis of order origins (state, corridor, etc.).
- Identification of the busiest corridors, average distances, and time taken.
- Actionable items for the business based on the analysis.
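
A corridor analysis along these lines could be sketched as follows, assuming a trip-level table and treating each source and destination pair as a corridor. The sample values and the output column names (`trip_count`, `avg_distance_km`, `avg_actual_time`) are invented for illustration.

```python
import pandas as pd

# Invented trip-level rows.
trips = pd.DataFrame({
    "trip_uuid": ["t1", "t2", "t3", "t4"],
    "source_name": ["A", "A", "B", "A"],
    "destination_name": ["B", "B", "C", "C"],
    "actual_distance_to_destination": [100.0, 120.0, 50.0, 300.0],
    "actual_time": [180.0, 220.0, 90.0, 500.0],
})

# Trips, average distance, and average time per corridor, busiest first.
corridors = (
    trips.groupby(["source_name", "destination_name"], as_index=False)
    .agg(
        trip_count=("trip_uuid", "nunique"),
        avg_distance_km=("actual_distance_to_destination", "mean"),
        avg_actual_time=("actual_time", "mean"),
    )
    .sort_values("trip_count", ascending=False)
    .reset_index(drop=True)
)
print(corridors)
```
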