dineshc227/Profile

Site Reliability Engineer • Observability • Production Support • Technical Support

Email Mobile Portfolio

🎯 Career Objective

  • 3+ years of overall experience in Site Reliability Engineering, observability platforms, IT infrastructure and application production support, and Java support engineering.
  • Experienced Observability Monitoring Engineer with over 3 years in administrative roles, specializing in providing 24/7 support for global customers in production environments.
  • Proficient in APM and monitoring tools such as Datadog, Grafana, Kibana, Dynatrace, Splunk, HP OMi, Tidal, and SiteScope. Skilled in managing SLOs, SLIs, and SLAs, and well-versed in ITIL practices including Incident, Change, Major Incident, and Problem Management. Proven ability in Datadog administration, dashboard creation, and monitoring services in production environments.
  • API Development: Engineered secure and robust API endpoints for CRUD operations, ensuring data integrity and correct performance.
  • Debugging & Maintenance: Adept at bug fixing and debugging complex applications to maintain system health.
  • Frameworks: Good working knowledge of developing and troubleshooting applications with Spring Boot and Spring MVC.
  • Timely Resolution: Committed to diagnosing and resolving system issues to minimize downtime and impact.

Monitoring & Observability

Proficient in the end-to-end administration of a comprehensive APM and monitoring stack, including:

Tools: Datadog | Grafana | Kibana | New Relic

  • Datadog Administration: Onboarding services, configuring agents, and tuning metrics collection.
  • Visualization: Designing and building insightful dashboards tailored to SLOs/SLIs and business KPIs.
  • Alerting: Implementing and managing alert policies to reduce noise and improve MTTR.

Process & Framework

  • Service Management: Skilled in managing SLOs, SLIs, and SLAs to align IT services with business goals.
  • ITIL Practices: Well-versed in ITIL frameworks for Incident, Change, Major Incident, and Problem Management.
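
As a rough illustration of how an SLI can be measured against an SLO target, the Python sketch below computes an availability SLI and the share of error budget still remaining. The function names and the numbers in the usage note are assumptions made for illustration only; they are not taken from any production service.

```python
def availability_sli(good_events: int, total_events: int) -> float:
    """Return the fraction of events that were good (1.0 when there is no traffic)."""
    if total_events == 0:
        return 1.0
    return good_events / total_events


def error_budget_remaining(sli: float, slo_target: float) -> float:
    """Return the unspent share of the error budget.

    1.0 means nothing has been spent, 0.0 means the budget is exhausted,
    and a negative value means the SLO has been breached.
    """
    allowed_failure = 1.0 - slo_target
    if allowed_failure <= 0:
        raise ValueError("slo_target must be below 1.0")
    actual_failure = 1.0 - sli
    return 1.0 - actual_failure / allowed_failure
```

For example, with a hypothetical 99.5% SLO and 9,990 good requests out of 10,000, `error_budget_remaining(availability_sli(9990, 10000), 0.995)` shows that roughly four fifths of the budget is still available.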

🔑 Key Skills

  • ITIL: Incident, Change, Major Incident, Problem Management; SLOs, SLIs, SLAs (metrics, traces, logs).
  • Alerting: success/error/composite alerts, threshold tuning, refinement, noise/toil reduction.
  • App monitoring: triage in production, dev collaboration via JIRA, runbooks, dashboards, reporting.
  • Tooling: Grafana (error insights), Kibana (log analysis), Datadog admin (monitors, dashboards), PagerDuty (on-call).
  • Process: onboarding services to monitoring, gap analysis, RCA participation, weekly/monthly reporting.
  • Programming: Java, Python (custom metrics, light instrumentation).
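
The idea behind composite alerts and noise reduction can be sketched in a few lines of Python: an alert fires only when the error rate is high and the traffic volume is large enough to be meaningful. The class name, function name, and default thresholds here are hypothetical and would need tuning per service; this is a minimal sketch, not the configuration of any specific monitoring tool.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Request counts observed in one evaluation window (hypothetical shape)."""
    successes: int
    errors: int

    @property
    def total(self) -> int:
        return self.successes + self.errors


def should_alert(stats: WindowStats,
                 max_error_rate: float = 0.05,
                 min_requests: int = 100) -> bool:
    """Composite condition: enough traffic AND error rate above the threshold."""
    if stats.total < min_requests:
        # Low-volume windows are skipped so a handful of failures cannot page anyone.
        return False
    error_rate = stats.errors / stats.total
    return error_rate > max_error_rate
```

Requiring a minimum request count is one simple way to suppress alerts from quiet periods, where a few failed requests would otherwise look like a large error rate.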

Professional Experience


DXC Technology, Bangalore – Site Reliability Engineer (Dec 2022 – Present)

Client: Qatar Airways – Payments Monitoring Group

  • Provided 24/7 support to global customers for payments applications in production environments.
  • Managed and administered the full observability stack: Datadog, Grafana, Kibana, Dynatrace, Splunk, HP OMi, Tidal, SiteScope.
  • Implemented SLOs, SLIs, and SLAs to ensure performance and reliability goals were met and measured.
  • Applied ITIL practices for Incident, Change, Major Incident, and Problem Management.
  • Created and maintained comprehensive Datadog dashboards and monitors for real-time application performance tracking.
  • Onboarded new application services into production environments and performed gap analysis to ensure monitoring coverage.
  • Developed and refined alerts for KPIs such as success rate, error rate, and composite metrics to reduce noise and improve MTTR.
  • Collaborated with development teams via JIRA for ticket creation, escalation, and resolution tracking.
  • Configured and monitored alerts with PagerDuty to ensure timely incident response and on-call rotations.
  • Performed advanced observability tasks: custom dashboards, widgets, and panels in Datadog; threshold tuning; alert noise reduction.
  • Analyzed and exported observability data from Datadog into Google Sheets, reporting key insights and trends to business stakeholders.
  • Monitored applications, services, and jobs across Datadog, Grafana, and Kibana.
  • Prepared detailed incident checklists and shared structured, client-facing updates.
  • Worked extensively on SLA & SLI definitions for critical payments services in production systems.
  • Configured JIRA dashboards as per project requirements for enhanced visibility and reporting.

Wipro, Bangalore – Site Reliability Engineer (Apr 2022 – Nov 2022)

Client: HSBC – Payments Monitoring

  • Provided 24/7 L1/L2 support to global customers for critical payments applications in production environments.
  • Managed and administered the APM/monitoring stack: Datadog, Grafana, Kibana, HP OMi, Tidal, SiteScope.
  • Configured and tuned alert thresholds, significantly reducing noise from ineffective alerts and improving signal clarity.
  • Monitored and supported applications, services, and batch jobs across multiple platforms to ensure system health.
  • Created and escalated JIRA tickets to development teams for faster incident resolution and tracking.
  • Prepared structured incident checklists and runbooks, sharing clear documentation with clients and business teams.
  • Defined and monitored SLA/SLI metrics for payment services using Datadog to uphold service quality agreements.
  • Built and customized JIRA dashboards based on project requirements to streamline workflow and visibility.
  • Configured PagerDuty alerting and implemented escalation workflows to ensure on-call responsiveness.
  • Performed detailed incident analysis and engaged with Root Cause Analysis (RCA) teams to drive long-term fixes.
  • Generated and shared daily, weekly, and monthly status reports with business stakeholders to communicate system health and incidents.
  • Conducted basic front-end troubleshooting of applications and engaged next-level support teams for complex issues.
  • Provided front-line and second-level IT operations support, ensuring outstanding client service delivery.
  • Supported weekend server patching activities, including comprehensive pre- and post-patching validation checks.

🛠️ Technical Stack

📊 Monitoring & Observability

Datadog Grafana Kibana New Relic Dynatrace Splunk HP OMi SiteScope

🎫 Ticketing Systems

JIRA

☁️ Cloud Platforms

AWS Azure

💻 Programming Languages

Python Java JavaScript

🗄️ Databases

MySQL PostgreSQL PL/SQL

🔄 CI/CD

Jenkins GitHub AWS DevOps Azure DevOps

📋 Practices & Frameworks

SLOs SLIs SLAs ITIL Agile Scrum SRE Incident Management Problem Management Change Management

🖥️ Operating Systems

Unix Linux Windows Ubuntu

⚠️ Alerting & Performance

Performance Metrics Alert Refinement Noise Reduction

🎯 Java Ecosystem

J2EE Spring Boot Spring Data JPA Spring Actuator Spring Cloud

Contact Me

If you'd like to collaborate, ask a question, or just say hello, feel free to drop a message!

Email Mobile Location

📊 GitHub Stats

Metric Details
🏆 Total Contributions Contributions
📂 Languages Used Languages
⭐ Total Stars Stars
