About This Role
Secure Every Identity, from AI to Human. Identity is the key to unlocking the potential of AI. Okta secures AI by building the trusted, neutral infrastructure that enables organizations to safely embrace this new era. This work requires a relentless drive to solve complex challenges with real-world stakes. We are looking for builders and owners who operate with speed and urgency and execute with excellence. This is an opportunity to do career-defining work. We're all in on this mission. If you are too, let's talk.
Key Responsibilities
- Design and build custom software in Go to enhance the platform's reliability, resiliency, and redundancy.
- Partner with engineering teams to embed reliability principles, improving the availability, performance, and observability of our services.
- Use your deep understanding of infrastructure and observability principles to identify opportunities for improvement within the product and implement solutions.
- Contribute to our on-call rotation, providing rapid, effective response to critical incidents and using your expertise to troubleshoot, mitigate, or accurately escalate production issues.
- Develop and refine our SRE tooling and processes, focusing on automation and operational efficiency.
- Define, document, and champion reliability best practices across the organisation.
Requirements
- A proactive and systematic approach to problem-solving, with a high degree of ownership.
- Proven experience in a production environment supporting large-scale, mission-critical applications with a high degree of autonomy.
- Proficiency in at least one programming language, with a strong preference for Go. You should be comfortable writing custom applications, not just scripts.
- Experience with infrastructure as code (Terraform), containers and container orchestration (Docker, Kubernetes), and GitOps (ArgoCD).
- Demonstrable expertise in a major cloud provider (Azure, AWS, or GCP).
- A strong grasp of microservices architecture, databases (SQL, NoSQL), and networking fundamentals, so you can see where custom code can solve platform-level issues.
- An understanding of core SRE principles, including SLIs, SLOs, and error budgets.
- Experience in an on-call rotation for a 24/7 cloud-based environment.
- Exceptional communication and collaboration skills, with a proven ability to work effectively in a remote, distributed team, where tasks may be self-driven.