Aqilea

GCP Site Reliability Engineer

Job Location

Bangalore, India

Job Description

Role: Senior Site Reliability Engineer
Experience: 6 to 9 years
Work Location: Bangalore

Responsibilities:
- Design, implement, and maintain highly available and scalable systems and infrastructure.
- Define and monitor Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs).
- Proactively identify and address potential reliability risks.
- Develop and maintain automation tools and scripts to streamline operational tasks.
- Implement Infrastructure as Code (IaC) using tools like Terraform or CloudFormation.
- Automate deployment pipelines and configuration management.
- Design and implement comprehensive monitoring and alerting systems.
- Utilize monitoring tools such as Prometheus, Grafana, ELK stack, or similar.
- Develop dashboards and alerts to detect and respond to system issues.
- Lead incident response efforts, including troubleshooting, root cause analysis, and post-incident reviews.
- Develop and maintain incident response procedures and playbooks.
- Participate in on-call rotations.
- Perform capacity planning and performance analysis to ensure systems can handle future growth.
- Identify and implement performance optimizations to improve system efficiency.
- Conduct load testing and performance benchmarking.
- Collaborate with development teams to integrate SRE practices into the software development lifecycle.
- Communicate effectively with technical and non-technical stakeholders.
- Document operational procedures and best practices.
- Implement security best practices and ensure compliance with relevant regulations.
- Conduct security audits and vulnerability assessments.
- Manage access control and authentication.

Required Skills and Expertise:
- Strong experience with cloud platforms (AWS, Azure, GCP).
- Understanding of cloud-native architectures and services.
- Proficiency in Linux system administration.
- Understanding of Windows server environments.
- Strong scripting skills in languages like Python, Bash, or Go.
- Experience with configuration management tools (Ansible, Chef, Puppet).
- Experience with Infrastructure as Code tools (Terraform, CloudFormation).
- Experience with monitoring and logging tools (Prometheus, Grafana, ELK stack, etc.).
- Ability to design and implement effective monitoring solutions.
- Experience with containerization technologies (Docker, Kubernetes).
- Understanding of container orchestration principles.
- Strong understanding of networking concepts and protocols.
- Experience with network troubleshooting and analysis.
- Experience with incident response and management procedures.
- Strong troubleshooting and problem-solving skills.
- Excellent communication and collaboration skills.

Preferred Skills:
- Experience with database administration.
- Knowledge of security best practices.
- Experience with CI/CD pipelines.
- Experience with service mesh technologies.

Qualifications:
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- 6 to 9 years of experience in Site Reliability Engineering or a related role.
- Proven experience in managing and maintaining highly available and scalable systems.
- Strong understanding of cloud computing and automation principles.

(ref:hirist.tech)

Location: Bangalore, IN

Posted Date: 4/18/2025

Contact Information

Contact Human Resources
Aqilea

UID: 5111452993