CTP Reliability and Monitoring Engineer

Details of the offer

Role: CTP Reliability and Monitoring Engineer
Location: Plano, TX
Duration: Long Term Contract

Responsible for ensuring the availability, performance, and reliability of our cloud-based infrastructure and services.
The primary focus of this role is designing, implementing, and managing robust monitoring and alerting systems that proactively identify issues and support timely incident response.
This resource will work closely with the CTP Platform Engineering and Development teams to optimize services and maintain service uptime.


Duties include:
- Develop and maintain comprehensive monitoring solutions for cloud-based services and applications.
- Configure monitoring tools and systems to collect relevant metrics, logs, and traces.
- Create custom monitoring dashboards and reports using DataDog or other tools to provide real-time insights into system performance and health.
- Continuously monitor the cloud infrastructure's performance and capacity, anticipating and addressing potential scalability issues.
- Proactively suggest and implement improvements to enhance the system's reliability, resilience, and fault tolerance.
- Automate tasks to streamline operational processes and reduce manual intervention.
- Collaborate with cross-functional teams to investigate and resolve critical incidents, ensuring minimal impact on end users.
- Work with the Problem Management team to complete post-mortem analysis of incidents, identify root causes, and implement preventive measures.

Ideal Qualifications:
- 3+ years' experience working with cloud platforms and services (AWS, Azure, GCP, etc.) in a production environment.
- Solid understanding of monitoring and logging tools, such as Prometheus, Grafana, ELK Stack, Splunk, etc.
- Experience with infrastructure as code (IaC) tools, such as Terraform, CloudFormation, or Ansible.
- Strong scripting and automation skills (e.g., Python, Bash) to facilitate operational tasks.
- Knowledge of containerization technologies (Docker, Kubernetes) and microservices architecture.
- Familiarity with DevOps practices and Agile methodologies.

Key Skills: DataDog, Grafana, Prometheus, Zabbix, Nagios, ELK Stack, Splunk, AWS, DevOps, Terraform, CloudFormation, Ansible, Python, Bash, Docker, Kubernetes.



