Aqilea

GCP Site Reliability Engineer

Job Location

Bangalore, India

Job Description

Role: Senior Site Reliability Engineer
Experience: 6 to 9 years
Work Location: Bangalore

Responsibilities:
- Design, implement, and maintain highly available and scalable systems and infrastructure.
- Define and monitor Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs).
- Proactively identify and address potential reliability risks.
- Develop and maintain automation tools and scripts to streamline operational tasks.
- Implement Infrastructure as Code (IaC) using tools like Terraform or CloudFormation.
- Automate deployment pipelines and configuration management.
- Design and implement comprehensive monitoring and alerting systems.
- Utilize monitoring tools such as Prometheus, Grafana, ELK stack, or similar.
- Develop dashboards and alerts to detect and respond to system issues.
- Lead incident response efforts, including troubleshooting, root cause analysis, and post-incident reviews.
- Develop and maintain incident response procedures and playbooks.
- Participate in on-call rotations.
- Perform capacity planning and performance analysis to ensure systems can handle future growth.
- Identify and implement performance optimizations to improve system efficiency.
- Conduct load testing and performance benchmarking.
- Collaborate with development teams to integrate SRE practices into the software development lifecycle.
- Communicate effectively with technical and non-technical stakeholders.
- Document operational procedures and best practices.
- Implement security best practices and ensure compliance with relevant regulations.
- Conduct security audits and vulnerability assessments.
- Manage access control and authentication.

Required Skills and Expertise:
- Strong experience with cloud platforms (AWS, Azure, GCP).
- Understanding of cloud-native architectures and services.
- Proficiency in Linux system administration.
- Understanding of Windows server environments.
- Strong scripting skills in languages like Python, Bash, or Go.
- Experience with configuration management tools (Ansible, Chef, Puppet).
- Experience with Infrastructure as Code tools (Terraform, CloudFormation).
- Experience with monitoring and logging tools (Prometheus, Grafana, ELK stack, etc.).
- Ability to design and implement effective monitoring solutions.
- Experience with containerization technologies (Docker, Kubernetes).
- Understanding of container orchestration principles.
- Strong understanding of networking concepts and protocols.
- Experience with network troubleshooting and analysis.
- Experience with incident response and management procedures.
- Strong troubleshooting and problem-solving skills.
- Excellent communication and collaboration skills.

Preferred Skills:
- Experience with database administration.
- Knowledge of security best practices.
- Experience with CI/CD pipelines.
- Experience with service mesh technologies.

Qualifications:
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- 6 to 9 years of experience in Site Reliability Engineering or a related role.
- Proven experience in managing and maintaining highly available and scalable systems.
- Strong understanding of cloud computing and automation principles.

(ref:hirist.tech)

Location: Bangalore, IN

Posted Date: 4/18/2025

Contact Information

Contact: Human Resources Aqilea