Peopledecode Solutions Pvt Ltd

AWS HPC Subject Matter Expert - Cluster Management


Job Location

Bangalore, India

Job Description

Role: AWS High Performance Computing (HPC) Subject Matter Expert

Position Overview: We are seeking an experienced Subject Matter Expert in AWS High Performance Computing to architect, implement, and optimize HPC solutions on AWS. This role will provide technical leadership in designing and managing large-scale computational workloads, parallel computing environments, and HPC clusters in the AWS cloud.

Key Responsibilities:
- Design and implement scalable HPC architectures on AWS for complex computational workloads
- Provide technical leadership in HPC solution architecture, including cluster management, job scheduling, and workflow optimization
- Guide teams in implementing best practices for AWS HPC services, including AWS ParallelCluster, AWS Batch, Amazon FSx for Lustre, and Elastic Fabric Adapter (EFA)
- Optimize the cost and performance of HPC workloads on AWS
- Develop automation solutions for HPC infrastructure deployment and management
- Lead technical discussions with stakeholders to understand computational requirements and propose solutions

Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related technical field
- 7 years of experience in HPC systems administration or architecture
- 5 years of hands-on experience with AWS services
- AWS Professional-level certification (Solutions Architect or DevOps Engineer)
- Strong experience with Linux/Unix systems administration
- Expertise in HPC schedulers (Slurm, AWS Batch, GridEngine)
- Proficiency in scripting languages (Python, Bash, etc.)

Preferred Qualifications:
- Master's degree or Ph.D. in a related field
- Experience with container technologies (Docker, Singularity)
- Knowledge of ML/AI frameworks and their HPC requirements
- Expertise in parallel programming (MPI, OpenMP)
- Experience with CFD, FEA, or other scientific computing applications
- Background in research computing or scientific domains

Technical Skills:

AWS Services Expertise:
- AWS ParallelCluster
- AWS Batch
- Amazon FSx for Lustre
- Elastic Fabric Adapter (EFA)
- EC2 instance types (especially HPC-optimized)
- S3 and storage solutions
- CloudFormation/CDK
- AWS Identity and Access Management (IAM)

HPC & Computing Skills:
- Cluster management and orchestration
- Job scheduling and workload management
- Parallel file systems
- Performance optimization
- Network optimization
- Queue management
- Resource monitoring and metrics
- Infrastructure as Code (IaC)
- HPC architecture documentation
- Performance optimization reports
- Best practices guides
- Technical training materials
- Implementation playbooks
- Cost optimization strategies

(ref:hirist.tech)

Location: Bangalore, IN

Posted Date: 11/27/2024

Contact Information

Contact Human Resources
Peopledecode Solutions Pvt Ltd

Posted

November 27, 2024
UID: 4943654164

AboutJobs.com does not guarantee the validity or accuracy of the job information posted in this database. It is the job seeker's responsibility to independently review all posting companies, contracts and job offers.