Location: Singapore
Salary:
Industry: IT - Software
Site Reliability Engineer - Observability
As a Site Reliability Engineer (SRE) at Grab, you will be responsible for the stable operation of Grab's core systems. You will also review and integrate new services and prepare them for large-scale usage. You will build and maintain large-scale observability systems that help engineers debug and troubleshoot issues in their services.
As part of the SRE team, you will:
Engage in and improve the whole lifecycle of our monitoring services - from design, through deployment, operation and refinement.
Work with the engineering teams to identify the needs and challenges of our monitoring systems.
Help engineering teams address reliability, stability and scalability challenges.
Get involved in deep diagnosis of incidents, and engage with multiple highly skilled engineering teams on resolutions.
Mentor other engineers, define our technical culture, and help build a fast-growing team.
Requirements
Minimum 5 years of experience in an SRE role.
BS degree in computer science, software engineering, information technology or related technical field involving coding, or equivalent practical experience.
Experience with cloud-based, large-scale infrastructure from vendors such as Amazon Web Services, Azure or Google Cloud Platform.
Experience in one or more of the following: Go, C, C++, Java, Python, Perl or Ruby.
Expertise in running monitoring systems at scale, ideally with one or more of ELK, Prometheus, Grafana or Zipkin.
Highly accountable and takes ownership. Outstanding work ethic, high integrity, a team player and a lifelong learner.
Really Nice to Haves
Contributions to open source projects.
Experience with performance analysis and debugging tools.
Ability to debug and optimize code and automate routine tasks.
Grab Vietnam