HPC System Administrator

Details of the offer

This job was posted by https://illinoisjoblink.illinois.gov. For more information, please see: https://illinoisjoblink.illinois.gov/jobs/12269679

Department

Provost Research Computing Center

About the Department

The University of Chicago Research Computing Center (RCC), a unit in the Office of Research, provides high-end research computing resources to researchers at the University of Chicago.
It is dedicated to enabling research by providing access to centrally managed High-Performance Computing (HPC), storage, and visualization resources.
These resources include hardware, software, high-level scientific and technical user support, and the education and training required to help researchers make full use of modern HPC technology and local and national supercomputing resources.
The Office of Research oversees the conduct of sponsored research, research program development, and contract management functions.

Job Summary

The job designs automated, scalable, and rapidly deployable solutions to infrastructure development and server configuration.
Works independently to install, configure, and maintain operating systems.
Uses best practices and systems knowledge to monitor and alert systems, utility software, and firewalls.
Guides maintenance for production servers as well as Windows and Linux servers.
The University of Chicago Research Computing Center (RCC) is seeking a highly qualified HPC system engineer to join its system and operation team that builds and manages RCC HPC systems and facility operations.
The individual in this position will be involved in the management and administration of RCC hardware and software.

Responsibilities

Installing, configuring, and maintaining large computer clusters/servers and software.
Day-to-day operations of the systems including systems administration, monitoring and storage performance up to and including network components.
Management of the system's network switch, parallel file system and HPC software stack and tools.
10% Configuration of the scheduling and queuing system.
Diagnosing and resolving system operational problems quickly and effectively.
Coordinating with vendors to resolve hardware and software problems.
Assisting users with access and other help desk ticket requests or issues.
Building and deploying open source software and software from vendors/partners.
Providing reliable and efficient backups/restores for all managed systems.
Maintaining and monitoring the security of the HPC systems and servers.
Documenting system administration procedures for routine and complex tasks.
Plans and installs necessary patches and upgrades for servers and their associated storage, network, communications, and peripheral sub-systems.
Installs and maintains an appropriate level of intrusion detection, monitoring, and auditing software as required.
Tracks compliance and maintains documentation for hardware, software, and service inventories for management reports.
Performs other related work as needed.

Minimum Qualifications

Education: Minimum requirements include a college or university degree in a related field.
Work Experience: Minimum requirements include knowledge and skills developed through 5-7 years of work experience in a related job discipline.
Certifications:

Preferred Qualifications

Education: Bachelor's degree in Computer Science or a closely related field.
Experience: A minimum of three years of Linux system administration experience in a large distributed computing environment.
At least two years of experience in HPC system administration or managing large HPC clusters.
Technical Skills or Knowledge: Knowledge of Linux.
Experience scripting with one or more languages such as Python, Shell, or Perl.
Experience with Linux build automation tools such as Puppet, Ansible, Git, and Docker highly preferred.
Experience implementing automation and monitoring using shell scripting and other related tools strongly preferred.
Experience installing, configuring, and maintaining job management tools (such as SLURM, Moab, TORQUE, PBS, etc.) strongly preferred.
Experience with operating system deployment tools (e.g. xCAT, Rocks) strongly preferred.
Experience configuring, administering, and supporting network storage subsystems (e.g. IBM, NetApp, DataDirect Networks, LSI, etc.) strongly preferred.
Experience with one or more distributed file systems (GPFS, Lustre, Gluster, etc.) strongly preferred.
Experience configuring, installing, tuning and maintaining scientific application software strongly preferred.
Experience configuring, installing, maintaining and/or using performance monitoring and optimization tools strongly preferred.
Experienc


Nominal Salary: To be agreed

Source: Appcast_Ppc
