Engineering-focused portfolio showcasing relational query optimization, distributed spatial analytics with Apache Spark, spatiotemporal hotspot detection, and embedded NoSQL storage using RocksDB.

JananyaPS/Data-Systems-Engineering

Data Systems Engineering

A curated portfolio of data systems projects focused on performance-aware design, scalability, and real-world data processing.
These projects explore how modern data platforms are built, from relational query optimization to distributed analytics and storage-engine internals.

The repository demonstrates hands-on experience with database design, query execution, distributed computation, spatial analytics, and NoSQL storage systems, using industry-relevant tools and frameworks.

Projects

Relational Query Engineering

Design and optimization of a relational database using PostgreSQL, emphasizing schema modeling, relationships, constraints, and efficient query execution. The project highlights practical aspects of SQL performance tuning and data organization.

Spark Spatial Analytics

Implementation of distributed spatial queries using Apache Spark and SparkSQL. Includes range queries, distance queries, and spatial joins implemented through custom UDFs, showcasing scalable geospatial data processing.
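
As an illustration of the predicates that range and distance queries rest on, here is a minimal standalone sketch in plain C++. It is not the project's Spark UDF code. The function names (`parseCoords`, `rectangleContains`, `withinDistance`), the comma-separated string encoding of points and rectangles, and the use of planar Euclidean distance are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cmath>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Parse a comma-separated list of numbers such as "1.0,2.5".
// (Hypothetical input encoding, chosen for this sketch.)
std::vector<double> parseCoords(const std::string& text) {
    std::vector<double> values;
    std::stringstream stream(text);
    std::string token;
    while (std::getline(stream, token, ',')) {
        values.push_back(std::stod(token));
    }
    return values;
}

// Range predicate: true when the point lies inside or on the boundary
// of the rectangle given by two opposite corners "x1,y1,x2,y2".
bool rectangleContains(const std::string& rectangle, const std::string& point) {
    const std::vector<double> r = parseCoords(rectangle);
    const std::vector<double> p = parseCoords(point);
    if (r.size() != 4 || p.size() != 2) {
        throw std::invalid_argument("expected \"x1,y1,x2,y2\" and \"x,y\"");
    }
    // Normalize the corners so either diagonal ordering is accepted.
    const double minX = std::min(r[0], r[2]);
    const double maxX = std::max(r[0], r[2]);
    const double minY = std::min(r[1], r[3]);
    const double maxY = std::max(r[1], r[3]);
    return p[0] >= minX && p[0] <= maxX && p[1] >= minY && p[1] <= maxY;
}

// Distance predicate: true when two points "x,y" are at most `distance`
// apart, using planar Euclidean distance (an assumption of this sketch).
bool withinDistance(const std::string& a, const std::string& b, double distance) {
    const std::vector<double> p = parseCoords(a);
    const std::vector<double> q = parseCoords(b);
    if (p.size() != 2 || q.size() != 2) {
        throw std::invalid_argument("expected two points of the form \"x,y\"");
    }
    return std::hypot(p[0] - q[0], p[1] - q[1]) <= distance;
}
```

In a Spark setting, predicates of this shape would typically be registered as UDFs and applied in a filter for range and distance queries, or as the join condition for spatial joins. For example, `rectangleContains("0,0,10,10", "5,5")` is true, and `withinDistance("0,0", "3,4", 5.0)` is true because the two points are exactly 5 units apart.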

Spatiotemporal Gi* Hotspots

A distributed spatiotemporal hotspot detection pipeline using Apache Spark. The project applies the Getis-Ord Gi* statistic over space-time grids to identify statistically significant activity clusters at scale.
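
To make the statistic concrete, the sketch below computes a Getis-Ord Gi* z-score for one cell in plain C++. It assumes binary weights, where each neighboring cell (including the cell itself) has weight 1, so the sum of weights and the sum of squared weights both equal the neighbor count. The names `GridStats`, `summarize`, and `giStar` are hypothetical and are not taken from the project, which runs this computation in Spark.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Global statistics over every cell in the space-time grid
// (cells with zero activity must be included in the counts).
struct GridStats {
    double mean;
    double stdDev;
    std::size_t cellCount;
};

GridStats summarize(const std::vector<double>& counts) {
    const std::size_t n = counts.size();
    double sum = 0.0;
    double sumSquares = 0.0;
    for (double c : counts) {
        sum += c;
        sumSquares += c * c;
    }
    GridStats stats;
    stats.cellCount = n;
    stats.mean = n > 0 ? sum / static_cast<double>(n) : 0.0;
    // Population variance: mean of squares minus square of the mean.
    const double variance =
        n > 0 ? sumSquares / static_cast<double>(n) - stats.mean * stats.mean : 0.0;
    stats.stdDev = std::sqrt(std::max(0.0, variance));
    return stats;
}

// Gi* z-score for one cell, assuming binary weights.
// neighborSum:   sum of counts over the cell and its neighbors
// neighborCount: number of cells in that neighborhood (self included)
double giStar(double neighborSum, std::size_t neighborCount, const GridStats& stats) {
    if (stats.cellCount < 2) {
        return 0.0;  // the statistic is undefined for fewer than two cells
    }
    const double n = static_cast<double>(stats.cellCount);
    const double w = static_cast<double>(neighborCount);
    // With binary weights: sum(w) = sum(w^2) = neighborCount.
    const double spread = (n * w - w * w) / (n - 1.0);
    const double denominator = stats.stdDev * std::sqrt(std::max(0.0, spread));
    if (denominator == 0.0) {
        return 0.0;  // sketch-level choice: treat degenerate cases as "no signal"
    }
    return (neighborSum - stats.mean * w) / denominator;
}
```

For a toy one-dimensional grid with counts `{1, 2, 3, 4, 5}`, the mean is 3 and the standard deviation is √2. The last cell has a two-cell neighborhood summing to 9, which gives a z-score of √3 (about 1.73). The middle cell has a three-cell neighborhood that also sums to 9, which gives 0. Larger positive scores indicate hotter cells.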

Embedded NoSQL Storage (RocksDB)

A C++-based embedded key-value storage layer built on RocksDB. Demonstrates core NoSQL storage concepts such as batch ingestion, multi-key retrieval, range scans, and persistent data management using an LSM-tree architecture.
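
The three access patterns named above can be sketched without any external dependency. The class below is an in-memory stand-in built on `std::map`, not RocksDB and not the project's code. `OrderedStore`, `Lookup`, and the method names are hypothetical. It only illustrates the semantics a sorted key-value store exposes: batched writes, multi-key lookups that report missing keys, and range scans over keys in sorted order.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// In-memory stand-in for a sorted key-value store (illustration only).
class OrderedStore {
public:
    typedef std::pair<std::string, std::string> Entry;

    // Result of a single lookup; `found` is false for missing keys.
    struct Lookup {
        bool found;
        std::string value;
    };

    // Batch ingestion: apply every key-value pair in the batch,
    // overwriting existing keys.
    void writeBatch(const std::vector<Entry>& batch) {
        for (const Entry& entry : batch) {
            data_[entry.first] = entry.second;
        }
    }

    // Multi-key retrieval: one result per requested key, in request order.
    std::vector<Lookup> multiGet(const std::vector<std::string>& keys) const {
        std::vector<Lookup> results;
        results.reserve(keys.size());
        for (const std::string& key : keys) {
            Lookup lookup;
            std::map<std::string, std::string>::const_iterator it = data_.find(key);
            lookup.found = it != data_.end();
            if (lookup.found) {
                lookup.value = it->second;
            }
            results.push_back(lookup);
        }
        return results;
    }

    // Range scan over the half-open key interval [startKey, endKey),
    // returned in sorted key order.
    std::vector<Entry> scan(const std::string& startKey, const std::string& endKey) const {
        std::vector<Entry> results;
        std::map<std::string, std::string>::const_iterator it = data_.lower_bound(startKey);
        for (; it != data_.end() && it->first < endKey; ++it) {
            results.push_back(*it);
        }
        return results;
    }

private:
    std::map<std::string, std::string> data_;  // keys kept in sorted order
};
```

With RocksDB itself, these operations would usually map to a `WriteBatch` applied through `DB::Write`, to `DB::MultiGet`, and to an iterator positioned with `Seek`. Persistence and the LSM-tree machinery are deliberately out of scope for this stand-in.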

Technology Stack

PostgreSQL • Apache Spark • SparkSQL • Scala • C++ • RocksDB • SQL • Distributed Systems

Datasets are intentionally not included due to size and licensing constraints. Each project README provides instructions on expected input formats and execution.
