Staff Software Engineer, GraphQL

Airbnb · 270 open roles

Remote · Full-time · 2 months ago

Salary: $204,000 - $255,000

Experience: Mid

Job Type: Full-time

Posted: 2 months ago

About This Role

Airbnb was born in 2007 when two hosts welcomed three guests to their San Francisco home, and has since grown to over 5 million hosts who have welcomed over 2 billion guest arrivals in almost every country across the globe. Every day, hosts offer unique stays and experiences that make it possible for guests to connect with communities in a more authentic way.

Key Responsibilities

1. Embrace an AI-first engineering approach, using LLM-powered agents to generate and iterate on code while you focus on problem-solving, system design, and quality oversight.
2. Investigate and resolve complex production issues by analyzing distributed traces, resource utilization patterns, and system metrics to identify root causes and implement durable fixes.
3. Design and implement observability features including span instrumentation, SLO dashboards, and fine-grained attribution for blocking time, memory, and CPU across tenant workloads.
4. Develop and iterate on tooling for deployment triage, service health monitoring, and incident response automation using LLM capabilities.
5. Lead technical design discussions and RFCs for initiatives like performance regression testing pipelines, emergency deployment workflows, and runtime resiliency improvements.
6. Partner with tenant teams to debug performance issues, provide guidance on GraphQL best practices, and enable self-service capabilities for common operational tasks.
7. Contribute to open-source Viaduct by ensuring platform improvements are generalizable and well-documented for the broader engineering community.

Requirements

9+ years of software engineering experience, with significant depth in backend systems, distributed architectures, and platform engineering.
Deep expertise in observability and monitoring, including experience designing SLO frameworks, distributed tracing systems, and metrics pipelines at scale.
Proven track record in reliability engineering, with hands-on experience in incident response, root cause analysis, and building systems that maintain high availability (99.99%+).
Strong experience with performance tuning and resource management in JVM-based systems, including profiling, garbage collection optimization, and understanding of concurrency models (blocking I/O, thread pools, coroutines in Kotlin).
Experience operating critical, high-traffic systems with a focus on deployment safety, automated rollbacks, and progressive delivery strategies.
Familiarity with GraphQL or similar API gateway/data access layer technologies.
Experience building developer tooling and platforms, with a product mindset focused on developer experience and self-service capabilities.
Strong leadership and communication skills with the ability to partner effectively across infrastructure and product engineering teams.