Site Reliability Engineer II

Details of the offer

The Site Reliability Engineer II position will report to the Lead Cloud Engineer.

As an SRE II, you will:

- Set up and maintain comprehensive monitoring, create and refine playbooks, build dashboards, and adopt industry-standard practices to enhance the reliability and resilience of our site and systems.
- Develop and manage infrastructure as code (IaC) to ensure reliable, scalable, and high-performance systems, reducing configuration drift and enabling rapid recovery.
- Implement and maintain both in-house and SaaS-based tools to measure SLOs, SLAs, and SLIs, ensuring we meet our reliability targets and provide transparency into system health.
- Identify opportunities for automation across the infrastructure to minimize manual interventions, streamline operations, and improve response times.
- Participate in on-call rotations, respond to incidents, conduct root cause analyses, and contribute to post-incident reviews to drive improvements.
- Work closely with cross-functional teams to enhance system design, support code deployments, and optimize system performance.

About You:

- 3+ years of professional experience in Site Reliability Engineering or a similar role, with a focus on infrastructure, automation, and system reliability.
- Hands-on experience with cloud providers (AWS), containerization (Kubernetes, Docker), CI/CD pipelines, and observability tools (e.g., Prometheus, Grafana, New Relic, or Splunk).
- Willing to travel to the Oakland office monthly to engage with team members and strengthen collaboration.
- You enjoy learning new technologies, stay adaptable in a dynamic environment, and thrive in a team-oriented setting where shared goals are prioritized.

Even Better:

- Passionate about seeking opportunities to innovate and implement changes that enhance system reliability and client satisfaction.
- Champions self-service infrastructure solutions to empower development teams and accelerate deployment cycles.
- Embodies continuous improvement and is committed to driving projects beyond "good enough" toward operational excellence.
- Proactively identifies potential issues and implements preventive measures to ensure consistent system uptime.
- Able to clearly document processes and communicate with technical and non-technical stakeholders to ensure alignment.

Where:

This role will be based in the San Francisco Bay Area. While you'll enjoy the flexibility of remote work, we also love to see our Earnies face-to-face! We ask you to join us at our Oakland office for 3 consecutive days a month for team collaboration and some fun. It's a chance to connect, share ideas, and maybe even grab some coffee together!


Nominal Salary: To be agreed

Source: Jobleads
