Reliability Engineer - Ansible And Datadog - Wfh - 1099 / C2C Ok

Company:

Datamanagementgroup



Job Function:

Engineering

Details of the offer

Looking for an experienced Reliability Engineer to support critical projects for our Technology, Infrastructure & Operations teams. Work from home; work is to be done primarily in the US Eastern time zone.
Minimum of 7 years of performance engineering and performance testing experience
MUST HAVE 3+ years of recent work with Ansible
MUST HAVE 4+ years of work with DataDog
Excellent English communication skills, verbal and written (idiomatic English)
Experience managing performance engineering efforts for applications strongly preferred
Knowledge of developing scripts for monitoring using PowerShell, Python, and Shell scripting
5 years of Splunk programming proficiency is highly preferred
5-6 years of experience with .NET and Java applications and application monitoring tools such as AppDynamics or DataDog is highly preferred
Proficiency in performance tuning is preferred
Good understanding of the UI, middleware, and backend databases
BA/BS degree in Information Technology, Computer Science, or related field of study

Duties include:
Develop and maintain comprehensive monitoring solutions for cloud-based services and applications
Configure monitoring tools and systems to collect relevant metrics, logs, and traces
Create custom monitoring dashboards and reports using Splunk, DataDog, DynaTrace, or other tools to provide real-time insights into system performance and health
Continuously monitor the cloud infrastructure's performance and capacity, anticipating and addressing potential scalability issues
Proactively suggest and implement improvements to enhance the system's reliability, resilience, and fault tolerance
Automate tasks to streamline operational processes and reduce manual intervention
Collaborate with cross-functional teams to investigate and resolve critical incidents, ensuring minimal impact on end users
Work with the Problem Management team to complete post-mortem analysis of incidents, identify root causes, and implement preventive measures
Understand the overall architecture of our systems to identify gaps in monitoring and troubleshoot issues
Configure and maintain custom dashboards and alerts in various monitoring tools
Create custom reports and deliver report presentations to various stakeholders
Develop monitoring scripts using PowerShell, Python, and Shell scripting
Develop metrics for both the business and technical teams to determine the health of systems
Provide on-call support as needed
Lead and coordinate performance engineering for medium to large initiatives
Collect and document expected system performance and operational characteristics
Collect and/or prepare test data for test execution
Develop and execute performance tests including load, stress, endurance, fail-over, and interoperability
Conduct technical analysis of performance test results and production systems, and provide recommendations on performance tuning, systems, and infrastructure
Identify, report, and review defects in assessing system performance and stability
Define the strategy for enabling performance diagnostics and monitoring using an Application Performance Management (APM) tool, other monitoring tools, and diagnostic techniques
Collaborate with developers to promote the concept of performance engineering during all phases of the SDLC to detect and correct performance issues earlier in the lifecycle
Lead peer reviews to ensure the completeness of all test assets created
Resolve performance and stability issues in the performance test environment
Develop a performance engineering work plan structure and project schedule
Review architectural design for performance risks and potential issues
Prepare capacity analysis when applicable



Source: Grabsjobs_Co


