Site Reliability Engineer (Internal Engineering) (Remote)

Details of the offer

The Internal SRE ensures the reliability, scalability, and performance of internal systems and infrastructure. This role involves monitoring, automation, incident management, and maintaining self-hosted platforms to support smooth development operations. The Internal SRE works closely with cross-functional teams to manage GitLab CI/CD workflows and cloud infrastructure on AWS. The position emphasizes proactive problem-solving, automation, and collaboration to continuously improve system stability and efficiency.
Responsibilities:
Manage and maintain GitLab environments to ensure high availability and security.
Design and implement CI/CD pipelines to automate software delivery.
Monitor and troubleshoot system performance issues, using observability tools like Prometheus, Grafana, or Datadog.
Collaborate with development teams to align infrastructure efforts with project needs and timelines.
Build and maintain infrastructure as code (IaC) solutions using tools like Terraform and Ansible.
Manage AWS services, including ECS, S3, API Gateway, DynamoDB, RDS, IAM, and VPC.
Participate in incident response, conducting root cause analysis and post-incident reviews.
Automate manual tasks to improve operational efficiency and reduce technical debt.

Minimum Qualifications:
Bachelor's degree in Computer Science, Information Technology, or a related field.
Equivalent work experience in SRE, DevOps, or infrastructure management may substitute for formal education.
GitLab Administration: Experience managing and securing self-hosted GitLab environments.
CI/CD Workflows: Expertise in designing and maintaining automated pipelines for continuous delivery.
AWS Cloud Expertise: Strong knowledge of AWS services, including ECS, S3, API Gateway, DynamoDB, RDS, IAM, VPC, and Lambda.
Infrastructure-as-Code: Proficiency in Terraform, Ansible, or similar tools.
Monitoring and Observability: Experience with Prometheus, Grafana, Datadog, or other observability platforms.
Automation and Scripting: Proficiency in Python, Bash, or other scripting languages to automate tasks.
Incident Management: Ability to lead incident response efforts and conduct root cause analysis.
Collaboration and Communication: Strong interpersonal skills to work effectively across teams and with stakeholders.

The base pay for this position ranges from $110,000 to $125,000 and will vary depending on how well an applicant's skills and experience align with the job description above.
We will accept applications until 2/18/2025.

Source: Greenhouse