Hi, my name is

Amit Singh.

I build resilient and observable systems

I’m a Site Reliability and Observability Engineer with a passion for crafting scalable, data-driven platforms. I specialize in cloud-native tooling, automation, and making infrastructure work smarter and more transparently.

About Me

I’m a Site Reliability and Observability Engineer with 14+ years of experience spanning full-stack development, performance engineering, and platform reliability.

Currently, I’m driving greenfield observability adoption at McCain, where I architect scalable telemetry pipelines using Grafana Cloud, OpenTelemetry, and Azure-native tools.

My passion lies in building resilient systems, enabling proactive incident response, and simplifying the way engineering teams interact with production.

When I’m not optimizing platforms, I’m preparing for MAANG interviews and exploring financial literacy, cloud architecture, and technical storytelling.

Here are a few technologies I've been working with recently:
  • Azure (AKS, Functions, Cosmos DB)
  • OpenTelemetry
  • Grafana Cloud
  • Prometheus & Loki
  • .NET Core
  • Terraform & GitHub Actions

Experience

Observability Engineer - McCain Foods
Dec 2024 – Present
  • Lead observability initiatives at McCain, building a scalable telemetry platform from the ground up using Grafana Cloud, OpenTelemetry, and Azure-native services.
  • Architect and implement full-stack observability for cloud and on-prem environments, improving incident detection and giving engineering teams actionable insights across metrics, logs, and traces.
Expert Site Reliability Engineer - Finastra
Mar 2021 – Dec 2024
  • Spearheaded the transformation of Grafana Cloud observability management by automating infrastructure provisioning through Terraform and implementing label-based access control for secure, role-driven Grafana usage.
  • Defined and led the organizational strategy for adopting Grafana Cloud services while supporting 30+ product teams across Kubernetes, Terraform, ARO, Helm, Flux CD, and GitOps pipelines. Addressed a wide range of operational issues, including CI/CD failures, manifest updates, and observability onboarding.
  • Designed and implemented a self-service Azure DevOps pipeline for unified observability integration, using Ansible for configuration. The solution provided infrastructure, application, and log monitoring, as well as distributed tracing via OpenTelemetry.
  • Achieved a 100x reduction in turnaround time for onboarding observability across environments by automating Grafana Agent deployment and Prometheus/Loki/Tempo configuration management. The automation was rolled out to over 150 machines across 30 environments within 3 months.
  • Built a custom toggle tool to enable or disable OpenTelemetry tracing for Java-based applications, significantly accelerating rollout and configuration cycles across environments.
Senior Performance Engineer - Wolters Kluwer
Feb 2018 – Mar 2021
  • Led performance engineering initiatives across multiple enterprise applications, ensuring high system reliability and scalability through data-driven performance analysis and testing.
  • Actively engaged throughout the project lifecycle, collaborating with stakeholders to validate performance expectations and ensure functional stability post-deployment. Developed and executed end-to-end performance test plans, including test case design, scripting, and reporting across three global regions.
  • Automated the collection and monitoring of server logs and performance metrics using the ELK stack, streamlining observability for production-like environments. Delivered reusable, optimized performance testing frameworks—achieving up to 90% model alignment using real-world workload simulations.
  • Conducted capacity planning by analyzing production trends and identifying mismatches between infrastructure and application behavior, recommending architectural and code-level improvements.
  • Interpreted large volumes of performance data, diagnosed bottlenecks, and presented root cause analyses to business and engineering stakeholders with actionable mitigation strategies.
Infrastructure Technology Specialist - Cognizant Technology Solutions
May 2016 – Feb 2018
  • Specialized in end-to-end performance engineering and infrastructure optimization for enterprise-grade applications across multiple environments.

  • Performed in-depth JVM and garbage collection (GC) analysis to optimize memory usage and successfully scale down application servers in performance environments, improving infrastructure efficiency and cost-effectiveness.

  • Conducted thorough application diagnostics using tools such as Splunk, Grafana, Apache JMeter, Precise i3, and JDigger to analyze thread dumps and performance metrics. Built interactive Splunk dashboards for real-time test monitoring and insights.

  • Developed custom automation tools to streamline performance testing processes:

    • Shell utilities for rapid and controlled server restarts
    • A .NET (C#) utility to trace transactions across business layers
  • Managed full-spectrum performance testing activities—from requirements gathering, estimation, and planning to script development, workload modeling, test execution, and monitoring. Delivered detailed performance assessments and tuning recommendations across the tech stack.

Senior Project Engineer - Wipro Technologies
Dec 2015 – May 2016
  • Played a dual role in Performance Monitoring and Engineering, driving proactive performance analysis, JVM stability, and system tuning for large-scale enterprise applications.

  • As part of the Performance Monitoring team:

    • Led and monitored Load and Soak tests using tools such as CA Wily Introscope, BMC Performance Assurance, BMC Perceiver, and Oracle SQL Developer for ESB reporting.
    • Developed detailed monitoring reports and presented findings to clients by email and in live sessions.
    • Investigated portal JVM crashes and bottlenecks using tools like IBM Heap Analyzer, TMDA, Verbose GC Analyzer, Eclipse MAT, PMAT, IBM Support Assistant, and Log Analyzer.
    • Triaged and escalated performance defects, ensuring timely resolution in coordination with development teams.
  • As part of the Performance Engineering team:

    • Profiled high-latency transactions at the client, server, and JVM layers using tools like Dynatrace Ajax Edition, HTTP Watch, jProfiler, CA Wily, browser dev tools, and SonarQube.
    • Led defect triage meetings, provided root cause insights, and coordinated performance fixes to reduce turnaround time.
    • Delivered actionable performance tuning recommendations to improve responsiveness and system throughput across key services.
Project Engineer - Wipro Technologies
Sep 2011 – Dec 2015
  • Designed and developed a Global Performance Test Reporting Tool to standardize report generation and accelerate performance result analysis across teams—reducing manual reporting effort by 31 hours per test and generating estimated annual savings of $1.2 million.

  • The tool was implemented as a thin-client web application using ASP.NET (frontend) and Microsoft SQL Server 2014 (backend), accessible through a browser-based interface. Users could upload HTML reports (in ZIP format), from which the tool would automatically extract, parse, and insert performance data into a structured database.

  • Key achievements include:

    • Reduced reporting effort from 8,000 to 2,000 hours per year.
    • Automated report generation, producing main reports in 30 seconds and detailed reports in under 2 minutes.
    • Created a scalable and standardized framework for trend analysis and test result archival, enabling real-time performance insights and improved decision-making.
  • The tool became an integral part of the testing lifecycle, significantly enhancing operational efficiency and data reliability for global performance engineering teams.

Education

2015
Master of Technology (Software Engineering)
Birla Institute of Technology & Science, Pilani (Rajasthan, India).

Projects

CLI Productivity Toolkit – Alias Your Day
Shell
A curated collection of daily-use CLI commands consolidated into efficient and memorable aliases to supercharge developer productivity. This project is designed to streamline repetitive tasks, reduce typing overhead, and serve as a personal command-line assistant across Linux/macOS environments. Ideal for SREs, DevOps engineers, and developers who live in the terminal.

Achievements

Microsoft Azure Certified – AZ-900
Earned the Microsoft Azure Fundamentals (AZ-900) certification, demonstrating foundational knowledge of cloud concepts, Azure services, and governance principles.
Winner – iLab Code Games 2018
Secured 1st place in Wolters Kluwer’s internal coding competition, showcasing creativity, problem-solving, and full-stack development expertise.
Best Newcomer Award
Recognized by Cognizant Infrastructure Services – PACE Practice for outstanding contributions and quick ramp-up in a high-performance environment.
Best Practice Initiator Award
Honored by the Global Reporting Team for driving process innovation and initiating best practices that improved team efficiency.
Best Presentation – PASE Day
Received appreciation for delivering an exceptional technical presentation at the practice level during PASE Day.
Outstanding Contribution to Success
Awarded for demonstrating exceptional ownership, innovation, and consistent delivery beyond expectations.
Feathers in My Cap Award
Acknowledged for delivering innovative solutions and spearheading the development of the Global Performance Reporting Tool.
Star of the Quarter – NLI
Team-based recognition for delivering exceptional results and collaboration during critical project milestones.
Thanks, A Zillion – Client Appreciation
Received direct client recognition from Citi US for exemplary support and value-added contributions to the engagement.

Get in Touch

My inbox is always open. Whether you have a question or just want to say hi, I’ll try my best to get back to you!