Operations Engineer (Senior) 1918
Job Location
Menlyn, South Africa
Job Description
Essential Skills Required:
- Strong understanding of IT service management principles and practices.
- Proficiency in monitoring and management tools (e.g., dashboards, alerting systems).
- Strong analytical and problem-solving abilities, particularly in IT service management.
- Experience in conducting root cause analysis (RCA) and managing known issues.
- Experience in performing regular and sporadic operational tasks to ensure optimal performance of IT services.
- Ability to manage IT service continuity, availability, and capacity effectively.
- Experience with change management processes, including creating and syncing changes with teams.
- Ability to plan and execute capacity extensions and backup/restore processes.
- Any additional responsibilities assigned in the Agile Working Model (AWM) Charter.

Advantageous Skills:
- Experience with IT service management frameworks (e.g., ITIL, SRE practices).
- Familiarity with cloud platforms (e.g., Azure) and their operational management.
- Experience with automation tools (e.g., Ansible, Puppet, Terraform) and scripting languages (e.g., Python, Bash) to streamline operational tasks.
- Understanding of DevOps methodologies and practices, including CI/CD processes.
- Knowledge of network protocols, configurations, and troubleshooting to support IT infrastructure.
- Understanding of IT security best practices and compliance requirements to ensure secure operations.
- Skills in data analysis and visualization tools (e.g., Splunk, Grafana) to interpret operational metrics and trends.
- Willing and able to travel internationally (twice a year).
- Above-board work ethics.

Qualifications/Experience:
- Minimum of 6 years of experience in IT operations or a similar role.

Role and Responsibilities:
- Monitor and Operate IT Products: Perform regular and sporadic operational tasks to ensure optimal performance of IT services. Own and maintain the Regular OPS Tasks list, refining sporadic tasks based on input from the Operations Experts (OE) network.
- Manage IT Service Continuity: Prepare for and attend emergency exercises (EE), reviewing outcomes and deriving follow-up tasks. Communicate findings and improvements to the OE network.
- Manage Availability: Participate in "Gamedays" and backup/restore test sessions, practicing and executing backup and restore processes. Own the recovery and backup plan, reviewing success and identifying follow-up tasks.
- Manage Capacity: Monitor cluster capacity using prepared dashboards and coordinate with the DevOps team on any issues. Plan and execute capacity extensions as needed.
- Manage Service Configuration: Oversee service configuration management using ITSM tools.
- Manage Events: Observe dashboards and alerts, taking action for root cause analysis (RCA) and creating tasks for the DevOps team. Provide proactive feedback and maintain monitoring and alerting solutions.
- Manage Problems: Conduct root cause analysis and manage known issues, creating Jira defects for further assistance if required.
- Enable Changes: Create and sync changes with the team, assisting with releases and deployment plans.
- Manage Service Requests and Incidents: Observe and resolve service requests and incidents, creating Jira tasks for the DevOps team as necessary.
- Manage Knowledge: Create, use, and extend knowledge articles, ensuring availability and consistency.
- On-call Rotations: Participate in 24/7 on-call rotations with teams around the world and restore systems efficiently.

If you are passionate about IT operations and ready to take on new challenges, we would love to hear from you. Apply now to join our dynamic team.
Location: Menlyn, ZA
Posted Date: 4/18/2025
Contact Information
Contact: Human Resources