HPC Systems Administrator - Center for Research Computing (Fixed term for 2 years)

Posting Number 2024-13565
Posted Date 3 weeks ago (6/27/2024 6:46 AM)
Department
Center for Research Computing
School/Division
NYU Abu Dhabi (AD00001)
Compensation Grade
Band 52
Is relocation available for this job?
Yes
FT/PT
Full-Time
Category
Business/Professional Administrative

Position Summary

UAE nationals are encouraged to apply 

 

New York University Abu Dhabi seeks to appoint an HPC Systems Administrator reporting to the Senior Director, Center for Research Computing.

 

This position is part of a team of high-performance computing (HPC) systems administrators that operates, manages, and maintains the HPC infrastructure and oversees the system for performance and security. This individual will participate in the evaluation of the latest hardware technologies and cluster management tools and make recommendations.

 

Key Responsibilities: 

  • Oversee smooth operation of hardware and software on HPC systems in support of university research computing systems. Perform installation, testing, maintenance, upgrades and administration of operating system and application software; perform account maintenance
  • Fine-tune system configuration for reliability and performance. Perform file management and administration tasks; troubleshoot problems; ensure system remains operational and assist with access to the system
  • Monitor systems for performance and security; analyze malfunctions; troubleshoot and resolve problems in response to system/security
  • Plan, implement, debug, document and maintain HPC systems in conjunction with fellow HPC Systems Administrators to support users' applications and to maintain and enhance the University's research computing environment
  • Implement system policies that adhere to HPC and NYU policies and standards; recommend policies where applicable
  • Research and recommend configurations for new systems based on vendor and industry trends and contacts. Maintain up-to-date knowledge of the HPC hardware and management tools

Qualifications

Required Education: 

  • Bachelor’s degree + 5 years of experience (or educational equivalent) in a relevant discipline
  • RHCSA or RHCE certification or equivalent
  • ITIL v3 certification or equivalent

Preferred Education: 

  • M.Sc. degree or higher + 3 years of experience (or educational equivalent) in a relevant discipline
  • Relevant certification/training in cluster management tools
  • Relevant certification/training in cluster provisioning tools
  • Relevant certification/training in high-performance networking systems such as InfiniBand
  • Relevant certification/training in parallel file systems (e.g., Lustre)

Required Experience: 

  • Experience with Linux systems administration
  • Experience with network and security administration
  • Experience with cluster design and system tuning
  • Experience with client-server applications
  • Programming experience with modern languages
  • Experience with parallel application software, protocols, tools and utilities
  • Management of Linux systems, fundamentals of networking, Linux kernel design and OS implementations (e.g., Red Hat), software and network security
  • Provisioning and configuration tools (e.g., Ansible)
  • Understanding of Linux file systems (e.g., ext3)
  • Schedulers and workload managers
  • Excellent problem identification, troubleshooting, and system performance tuning skills
  • Excellent organizational and communication skills; ability to clearly communicate technical concepts to non-technical audiences

Preferred Experience: 

  • Experience with hardware maintenance of HPC systems
  • Experience with deploying and managing virtual environments (e.g., VMware)
  • Management and design of HPC systems, fundamentals of InfiniBand networking
  • Cluster provisioning and configuration tools (e.g., Warewulf, SaltStack)
  • Understanding of parallel file systems (e.g., Lustre)
  • HPC Schedulers and Workload Managers (e.g., Slurm)
  • Application Parallelization (MPI and OpenMP), working knowledge of core programming languages (C, Python, Perl)
  • Working knowledge of parallel applications installation, debugging, and support
  • Ability to provide technical

Additional Information

About NYUAD

NYU Abu Dhabi is a degree-granting research university with a fully integrated liberal arts and science undergraduate program in the Arts, Sciences, Social Sciences, Humanities, and Engineering. NYU Abu Dhabi, NYU New York, and NYU Shanghai form the backbone of NYU’s global network university, an interconnected network of portal campuses and academic centers across six continents that enable seamless international mobility of students and faculty in their pursuit of academic and scholarly activity. This global university represents a transformative shift in higher education, one in which the intellectual and creative endeavors of academia are shaped and examined through an international and multicultural perspective. As a major intellectual hub at the crossroads of the Arab world, NYUAD serves as a center for scholarly thought, advanced research, knowledge creation, and sharing through its academic, research, and creative activities.
