SRE Manager
The Site Reliability Engineering (SRE) Manager will work with the development and operations teams, focusing on
cloud provider infrastructure and automation. This role is responsible for the day-to-day operations of the DevOps team
and combines project management, team management, and engineering duties. As an SRE Manager at Binary Fountain,
you'll have the opportunity to work with like-minded, capable engineers and managers from across the organization
to help drive innovation and build capability.
Duties & Responsibilities:
- Should have more than 5 years of experience managing SRE teams and systems
- Act as the primary point of contact (POC) on all cloud infrastructure projects
- Should have experience managing large production systems with 100+ nodes and microservices
- Work collaboratively with software engineering to define infrastructure and deployment requirements; act as a sounding board and provide recommendations to engineering on AWS services
- Be the driving force behind our automation and observability initiatives
- Train and mentor the SRE team
- Build and maintain operational tools for deployment, monitoring, and analysis of cloud provider infrastructure and systems
- Perform infrastructure cost analysis and optimization
- Provide project management, sprint planning, and road-mapping support to the DevOps team
Qualifications:
- Insatiable desire to learn and grow; curiosity about all things technology, development, operations, and cloud
- Hands-on experience managing large continuous delivery (CD) systems using various deployment strategies and architectures, such as Blue/Green and Active/Active
- Should have experience with disaster recovery (DR) systems
- Hands-on experience deploying and managing infrastructure with Terraform and Helm
- Hands-on experience with configuration management tools like Ansible
- Hands-on experience with DevOps tools and practices such as Docker, Git, Jenkins, Argo CD, and GitOps
- Hands-on experience with monitoring and logging tools such as Dynatrace, Prometheus, Grafana, and the ELK stack
- Experience with and knowledge of cloud-native architectures; ability to design highly available, resilient, multi-region systems in Azure or AWS
- Experience with and deep knowledge of Linux systems
- Strong bias for action and ownership
- Basic understanding of Istio is a plus
- Strong experience with observability (tracing, monitoring, and logging) using open-source tools is a must
- Basic understanding of Linux fundamentals and tools such as tcpdump, Wireshark, and kernel-level commands