The Site Reliability Engineer II position will report to the Lead Cloud Engineer.

As an SRE II, you will:
- Set up and maintain comprehensive monitoring, create and refine playbooks, build dashboards, and adopt industry-standard practices to enhance the reliability and resilience of our site and systems.
- Develop and manage infrastructure as code (IaC) to ensure reliable, scalable, and high-performance systems, reducing configuration drift and enabling rapid recovery.
- Implement and maintain both in-house and SaaS-based tools to measure SLIs, SLOs, and SLAs, ensuring we meet our reliability targets and provide transparency into system health.
- Identify opportunities for automation across the infrastructure to minimize manual intervention, streamline operations, and improve response times.
- Participate in on-call rotations, respond to incidents, conduct root cause analyses, and contribute to post-incident reviews to drive improvements.
- Work closely with cross-functional teams to enhance system design, support code deployments, and optimize system performance.

About You:
- 3+ years of professional experience in Site Reliability Engineering or a similar role, with a focus on infrastructure, automation, and system reliability.
- Hands-on experience with cloud providers (AWS), containerization (Kubernetes, Docker), CI/CD pipelines, and observability tools (e.g., Prometheus, Grafana, or New Relic/Splunk).
- Willing to travel to the Oakland office monthly to engage with team members and strengthen collaboration.
- You enjoy learning new technologies, stay adaptable in a dynamic environment, and thrive in a team-oriented setting where shared goals are prioritized.

Even Better:
- Passionate about seeking opportunities to innovate and implement changes that enhance system reliability and client satisfaction.
- Champions self-service infrastructure solutions to empower development teams and accelerate deployment cycles.
- Embodies continuous improvement and is committed to driving projects beyond "good enough" toward operational excellence.
- Proactively identifies potential issues and implements preventive measures to ensure consistent system uptime.
- Able to clearly document processes and communicate with technical and non-technical stakeholders to ensure alignment.

Where:
This role will be based in the San Francisco Bay Area. While you'll enjoy the flexibility of remote work, we also love to see our Earnies face-to-face! We ask you to join us at our Oakland office for 3 consecutive days a month for team collaboration and some fun. It's a chance to connect, share ideas, and maybe even grab some coffee together!