We're looking for a Site Reliability Engineer. Headquartered in Los Angeles, California, Right Balance provides top-tier technology talent for innovative companies in the US. We're in the top 50 companies to watch in LA.
Our client is a USA-based company producing video solutions with the mission to advance scientific research and education. Their institutional clients comprise over 1,000 universities, colleges, and biopharma companies, including such leaders as Harvard, MIT, Yale, and Stanford. As a rapidly growing company, with offices in the USA, UK, Australia, and India servicing clients in over 60 countries, our client is seeking talented individuals to join their company.
Our client is looking for an amazing Site Reliability Engineer who will be part of their centralized Site Reliability Team. You will play an integral role in the deployment of highly scalable systems and in the optimization, documentation, and support of the infrastructure components of their software products hosted on AWS. Cloud Infrastructure and Operations are critical in enabling our client to provide users with its technology offerings.
Responsibilities:
- Develop web applications with a keen focus on user experience.
- Work closely with our existing event-driven tech stack: Python, Elasticsearch, TypeScript, and AWS Cloud Native architecture.
- Build APIs to ensure seamless data flow and storage.
- Actively collaborate with designers, front-end experts, other engineers, stakeholders, and clients.
- Participate in code reviews, knowledge sharing sessions, and paired programming exercises.
- Assure application reliability and quality, especially in a production setting with heightened user traffic and data processing.
- Actively participate in project scoping, estimating, and planning.
- Learn and evolve your skills using the latest technology tools in a rapidly growing company.
- Work on challenging problems, innovate, and positively impact many people's lives while having fun doing it.

Minimum Requirements:
- Upper-intermediate to fluent spoken and written English. Able to have a real-time conversation.
- 3+ years of full-time hands-on Site Reliability Engineer experience.
- 3+ years of full-time hands-on DevOps experience.
- 2+ years of full-time hands-on AWS experience.
- 2+ years of full-time hands-on Docker experience.
- 2+ years of full-time hands-on Kubernetes experience.
- 2+ years of full-time hands-on IaC (Infrastructure as Code) experience.
- 1+ year of full-time hands-on Software Development or experience with Crossplane.
- 1+ year of full-time hands-on Terraform experience.
- Extensive in-depth experience with cloud-based provisioning, monitoring, troubleshooting, and related SRE and DevOps technologies, in addition to networking knowledge.
- MUST have working experience with cloud-native infrastructure such as AWS or GCP (ideally AWS).
- MUST understand AWS VPC, subnets, Network ACLs, Security Groups, IAM Roles, and EKS.
- Experience configuring Kubernetes RBAC Authorization, Ingress controllers, ServiceAccounts, and AWS role annotations.
- Strong experience with CI/CD automation and configuration management.
- Experience with monitoring and observability systems such as New Relic, Datadog, Grafana, Kibana, CloudWatch, and Kafka.
- Ability to triage and resolve incidents and lead incident investigations.
- Experience with security practices, credential rotations, and secrets management systems like the Vault project.
- Must be able to ensure Agile/Scrum concepts and principles are adhered to and be a voice of reason.
- Experience working in a 24/7 on-call, highly transactional, or streaming production environment.
- Working knowledge of GitOps, FluxCD, or ArgoCD.
- Building Kubernetes Operators is a plus.
- Go (programming language) expertise.
- Crossplane experience.
- Bachelor's degree in Computer Science or equivalent demonstrated ability.

The majority of our clients are venture-backed startups at the growth stage. Usually, at this stage, the company has already achieved product-market fit and is looking to expand rapidly. That's where we bring the best engineering practices, strong architecture, the latest technologies, and consistent processes to help companies scale.