Site Reliability Engineer

Details of the offer

Job Title: Site Reliability Engineer – Observability

Overview:

We are seeking a Site Reliability Engineer III to develop and maintain our observability platform. This role focuses on ensuring the reliability, performance, and scalability of microservices, Kubernetes clusters, and cloud infrastructure. You'll collaborate with cross-functional teams to deliver metrics, logs, and traces for system health and performance, enabling proactive monitoring and troubleshooting.

Responsibilities:

- Develop and maintain a resilient observability stack using tools like Prometheus, Grafana, Loki, InfluxDB, Telegraf, and OpenTelemetry.
- Partner with teams to identify monitoring needs and provide data-driven insights.
- Implement monitoring solutions across diverse environments, including Kubernetes, cloud, and on-premises setups.
- Aggregate logs, metrics, and traces for end-to-end system visibility.
- Set up alerts and thresholds for proactive performance monitoring.
- Create dashboards to track system health and resource utilization.
- Support incident response efforts and perform post-incident analyses for continuous improvement.
- Document observability best practices, setups, and troubleshooting techniques.
- Stay current on observability technologies and trends.

Preferred Qualifications:

- Bachelor's degree in a relevant field or equivalent experience.
- 3–5 years of experience in observability, SRE, DevOps, or platform engineering.
- Experience with observability solutions for complex infrastructure (e.g., Kubernetes, AWS, Azure, on-prem vSphere).
- Proficiency in Git and CI/CD workflows; familiarity with cloud platforms and containerized environments.
- Relevant certifications are a plus.

Skills:

- Deep knowledge of observability principles, monitoring tools, and cloud-native technologies.
- Strong scripting and automation skills (Python, Bash, or Go).
- Proficient in data visualization (Grafana, Kibana).
- Effective troubleshooting using logs, metrics, and traces.
- Collaborative and adaptable with a continuous improvement mindset.

This role is perfect for those passionate about reliable, seamless systems and proactive monitoring. Join us to drive innovation and resilience in our observability practices!

Nominal Salary: To be agreed

Source: Talent2_Ppc

Job Function:

Engineering
