Senior Director, Site Reliability Engineering (SRE) (SDSE) Job Vacancy at Yatra Online, Delhi, Delhi

Full Details:
Company Name: Yatra Online
Location: Delhi, Delhi
Position: Senior Director, Site Reliability Engineering (SRE) (SDSE)

Job Description:

What you will be responsible for: The Senior Director, Site Reliability Engineering role is for proven winners in SRE practice. You are expected to bring to the table diverse and valuable experience of building and running an SRE team on your own. You will be responsible for running highly scalable, modular, high-performance software applications built with complex, modern programming languages and frameworks, so the tools you will use to operate, monitor and maintain these services will be equally or more complex. As a highly experienced leader of our SRE practice, you are expected to lead one or more of the teams that make up the practice.

This is a dual-intent role: you will do hands-on development yourself while also leading teams of engineers who, under your leadership, develop all SRE automation scripts, monitoring dashboards, maintenance systems, etc. These systems enable online bookings, payment transactions and personalized messaging for the millions of customers who book their travel with Yatra.com. You will expose your team members to complex programming skills, design patterns, and SRE and DevOps practices, and guide junior members of the team through coaching and exemplary work.

As a senior director, you must have in-depth knowledge of the design patterns, frameworks and architectural traits that make a product suitable for the SRE practice required to run a high-volume transactional environment. The role also requires strong analytical skills and the ability to debug problems, perform root cause analyses (RCAs), prepare environments for holiday traffic spikes, and handle disaster management and recovery. You will administer and govern the entire SLI, SLO and SLA framework that drives day-to-day SRE operations, which in turn requires you to engage with stakeholders to define those SLIs, SLOs and SLAs.
The role requires you to provide continuous, constructive feedback to team members, evaluate their performance and guide them in the right direction. You will also hire, nurture and retain talent in the organization.

Profile: What you need to succeed in the role
- Take responsibility for availability, latency, performance, efficiency, change management, monitoring, emergency response and capacity planning.
- Bridge development and operations by applying a software engineering mindset to system administration topics.
- Split your time between operations/on-call duties and developing systems and software that improve site reliability and performance.
- Build self-service tools for user groups that rely on SRE, for example automatic provisioning of test environments, log and statistics visualization, monitoring dashboards, developer portals and repository access management.
- Collaborate closely with product developers to ensure that designed solutions meet non-functional requirements such as availability, performance, security and maintainability.
- Contribute to SLI, SLO and SLA definition, monitoring, alerting and reporting efforts.
- Stay abreast of new trends in application and infrastructure monitoring, provisioning, maintenance and uptime.
- Learn, prototype and apply the newest tools and best practices in real life to meet the goals of the SRE practice.
Core Skills:
- Any two APM tools: PinPoint, Dynatrace, Datadog, AppDynamics, Splunk Infrastructure Monitoring, Sumo Logic, Prometheus, Instana, LogicMonitor
- F5, NGINX, Azure Application Gateway, Kemp LoadMaster, HAProxy, Varnish Software, Amazon Lightsail, AWS Elastic Load Balancing, A10 Thunder ADC, Apache Web Server, Apache Tomcat, Netty, GlassFish
- GCP/AWS/Azure/Terraform/Ansible administration; SAN and NAS administration; LDAP/AD administration; identity services administration such as Keycloak
- Elasticsearch, Logstash, Kibana, Kiali, Grafana, Nagios, PagerDuty, Twilio
- Istio, Envoy, Kubernetes, Docker, DockerHub, Docker Swarm, EKR, GKE, Spinnaker

Additional Skills:
- Monitoring dashboards
- Instrumentation
- Alerting systems setup
- Log aggregation
- Log analysis
- Log management
- Tracing systems
- Application Performance Monitoring (APM)
- APM analysis
- JVM monitoring

How to apply: Email your latest resume to jobs@yatra.com and mention the job title “Senior Director, Site Reliability Engineering (SRE) (SDSE)” in the subject line for quick consideration.

Disclaimer: Hugeshout works to publish the latest job information only and is not responsible for any errors. Users must do their own research before joining any company.
