  • Bayreuth, Germany

PS-O5/README.md

Pratik Suryawanshi (PS_O5)

SoC & HPC Engineer | C/C++ & CUDA | Linux Kernel, RTOS, ARM, RISC-V | Embedded AI

Specializing in Heterogeneous Computing, Hardware-Software Co-Design, and Scientific AI.




🚀 Professional Summary

I am an R&D Engineer bridging the gap between Scientific Computing and Embedded Hardware. My work focuses on running complex physical models (PDEs) and AI inference on specialized silicon with strict real-time constraints.

  • Master's Thesis: Real-Time Determinism of Equivariant GNNs on RISC-V (Bare-metal/FreeRTOS).
  • Key Interest: Eliminating the "Real-Time Gap" in AI for Scientific Computing using Vector Intrinsics (RVV/NEON) and Heterogeneous SoCs.
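
As a rough illustration of the vectorization style mentioned above: RVV kernels are usually written as strip-mined loops, where each iteration asks the hardware (via `vsetvl`) how many elements it can process and then advances by that amount. The sketch below emulates that loop shape in portable C++ so it can run anywhere. The constant `kAssumedVl` and the function `axpy_strip_mined` are hypothetical names chosen for this example, and the inner loop merely stands in for a single vector instruction. It is not code from the thesis.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical vector length in elements. Real RVV hardware reports the
// usable length at run time, so this constant is only an assumption.
constexpr std::size_t kAssumedVl = 8;

// Computes y[i] += a * x[i] in strips of at most kAssumedVl elements,
// mirroring the shape of a vector-length-agnostic loop.
void axpy_strip_mined(float a, const std::vector<float>& x, std::vector<float>& y) {
    const std::size_t n = std::min(x.size(), y.size());
    for (std::size_t i = 0; i < n;) {
        // Number of elements handled by this strip (the tail may be shorter).
        const std::size_t vl = std::min(kAssumedVl, n - i);
        // This scalar loop stands in for one vector multiply-accumulate.
        for (std::size_t j = 0; j < vl; ++j) {
            y[i + j] += a * x[i + j];
        }
        i += vl;
    }
}
```

Because the strip length is computed per iteration, the tail needs no separate clean-up loop, which is one reason this form tends to suit kernels with predictable latency requirements.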

🛠️ Technical Arsenal

| Domain | Technologies & Tools |
| --- | --- |
| HPC & Acceleration | CUDA, OpenMP, MPI, SIMD (AVX/RVV), AHMED Library |
| Embedded Systems | FreeRTOS, Embedded Linux (Yocto), Bare-metal C, UART/I2C/SPI Drivers |
| Hardware Design | RISC-V (T-Head TH1520), ARM Cortex-A/M, SystemVerilog, SoC Partitioning |
| Algorithms | Numerical Methods (Elliptic PDEs), FFT/IFFT, Graph Neural Networks (GNN) |
| DevOps & Tools | CMake, Docker, Git, NVIDIA Nsight, Verilator |

🔬 Featured Projects & Research

Note: Some repositories are private for IP reasons. Detailed documentation available upon request.

| Project | Tech Stack | Impact / Metric |
| --- | --- | --- |
| RISC-V GNN Accelerator (Ongoing) | C, RVV Intrinsics, FreeRTOS | Aiming for a 3.5x speedup on MACE kernels via vectorization, with <1 ms deterministic latency. |
| Automotive Simulation Engine | CUDA, C++, NVIDIA Nsight | Optimized GPU memory access patterns for ZF Group, enabling real-time vehicle dynamics solving. |
| SoC Partitioning Framework | Python, SystemC, Graph Theory | Reduced NoC traffic by 40% for Siemens multi-core architectures. |
| Radar/SAR Imaging Pipeline | CUDA, OpenMP, Jetson TX2 | Accelerated FFT/IFFT kernels by 600% for high-fidelity radar imaging. |
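
For context on how a figure such as the NoC traffic reduction could be quantified, here is a minimal sketch. It assumes the workload is modelled as a task graph whose weighted edges represent data exchanged between tasks, and that traffic crossing a partition boundary is what ends up on the NoC. The `Edge` struct and both function names are hypothetical, introduced only for this example. This is not the project's partitioning algorithm, only one plausible way to score a mapping.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical task-graph edge: tasks u and v exchange `traffic` units of data.
struct Edge {
    std::size_t u;
    std::size_t v;
    double traffic;
};

// Sums the traffic of edges whose endpoints sit in different partitions.
// part[i] is the partition index assigned to task i.
double cut_traffic(const std::vector<Edge>& edges, const std::vector<int>& part) {
    double total = 0.0;
    for (const Edge& e : edges) {
        if (part[e.u] != part[e.v]) {
            total += e.traffic;
        }
    }
    return total;
}

// Relative reduction in cut traffic when moving from a baseline mapping
// to an improved one (0.4 would correspond to a 40% reduction).
double traffic_reduction(const std::vector<Edge>& edges,
                         const std::vector<int>& baseline,
                         const std::vector<int>& improved) {
    const double before = cut_traffic(edges, baseline);
    if (before == 0.0) {
        return 0.0;  // nothing crossed a boundary, so nothing to reduce
    }
    return (before - cut_traffic(edges, improved)) / before;
}
```

A locality-aware partitioner would try to place heavily communicating tasks in the same partition while keeping partition sizes balanced; the metric above only measures the first of those two goals.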

🧘‍♂️ The Human Side

When I am not optimizing kernels or debugging Verilog:

  • Music: Classical Flute (Learning & Practice).
  • Philosophy: Student of Tantra (Science of Inner Transformation) & Dvait-Advait Philosophy.
  • Sport: Swimming & Endurance Training.

"Seeker of knowledge. Builder of ideas. Explorer of the inner and outer worlds."

Pinned

  1. th1520-accelerator-framework (Public)

    A minimal bare-metal development framework for the T-Head C910 (TH1520) RISC-V 64-bit processor running FreeRTOS. Designed as a baseline for developing and testing custom hardware accelerators usin…

    C

  2. SoC-Architecture-and-Performance_Siemens_Digital_Industries_Software (Public)

    A performance-aware partitioning project for multi-core SoC architectures. A custom partitioning algorithm emphasizing data locality and load balancing, achieving a ~40% reduction in inter-partitio…

    C++

  3. Simulation-Engine-ZF_Group (Public)

    A high-fidelity simulation engine for automotive validation, utilizing CUDA for massive parallel physics workloads. Optimized memory hierarchy usage (Shared Memory & Register tiling) to minimize gl…

    Makefile

  4. OFPrimer (Public)

    Fork of OpenFOAM Primer, used for implementing custom solvers and algorithms.


  5. HPC_Applications (Public)

    A few HPC implementation snippets, shared with permission. Feel free to use them. :)

    C++

  6. Embedded-Software (Public)

    Embedded Systems' Software/Firmware

    C++