U-SET
Senior Site Reliability Engineer - DevOps
Job Location
in, India
Job Description
- Deep understanding of SRE principles and experience in anomaly detection, root cause analysis, and predictive maintenance.
- Working knowledge of an automation-first approach and of defining SLIs, SLOs, and error budgets.
- Experience leading an operations team in an application production environment.
- Experience in programming and scripting languages (Java, Python, PowerShell, VBScript).
- Working knowledge of Kubernetes and OpenTelemetry.
- Knowledge of Generative AI concepts, LLM fundamentals, and Responsible AI concepts.
- Knowledge of DevOps methodologies, tools, and automation: CI/CD pipelines, tools (GitHub, Terraform, ArgoCD, Helm, etc.), and infrastructure automation.
- Experience working with public and private clouds (AWS, Azure, GCP, Rancher, etc.).
- Proficiency in incident response, change and release processes, application monitoring, and platform optimisation.
- Define and implement effective observability solutions to proactively identify and resolve issues and drive optimisation.
- Define and manage the incident, change and release management, deployment, on-call, and escalation processes.
- Develop automation (IaC, alerts as code, dashboards as code, etc.) to increase efficiency and reduce toil.
- Conduct POCs of tools and solutions to support the Generative AI application platform.
- Analyse operational performance (incident, problem, and alert trends) and drive optimisation.
- Follow and implement SRE best practices and standards within the team.
- Document SOPs, processes, critical system information, KB articles, POCs, standards, and best practices for current and future reference.
- Provide technical guidance and mentorship to junior SRE team members.
- Stay up to date with the latest advancements in the Generative AI space.

What you bring to the team:
- Experience applying SRE principles and best practices to manage on-premises and cloud applications.
- Working knowledge of Generative AI applications.
- Ability to lead the team toward continuous improvement, estimate work, and escalate issues in a timely manner.
- Strong analytical skills to identify and resolve complex technical issues, ensuring system reliability and minimising downtime.
- Strong communication and interpersonal skills to collaborate effectively with cross-functional teams.
(ref:hirist.tech)
Location: in, IN
Posted Date: 11/27/2024
Contact Information
Contact: Human Resources, U-SET