(Senior) Site Reliability Engineer (m/f/d) - Platform & Agentic Operations
About This Role
At 1KOMMA5°, we pursue a clear vision: Living on wind and sunlight forever for free. To make this a reality, we are building the energy system of the future with Heartbeat AI. Want to be part of it?
We bring together regional craftsmanship and scalable software: We don't think of solar, batteries, heat pumps, and e-mobility as isolated components, but control them as an intelligent, integrated overall system in our virtual power plant. Directly connected to the electricity market – in real time, fully automated. This way, energy is used when it is available from renewables and partic
Key Responsibilities
- Implement and improve monitoring, alerting, and incident response systems and processes to ensure high reliability for our customers and meet defined SLOs
- Design, build, and maintain resilient, scalable infrastructure utilizing SRE principles and best practices
- Attend post-incident reviews, detect patterns and contribute to continuous improvement efforts
- Execute performance testing, analyze system bottlenecks, and formulate strategies for capacity planning to ensure our systems meet current and future demands effectively
- Build systems where CI/CD test failures serve as immediate, real-time context for agents, enabling them to analyze logs, trace dependencies, and suggest or apply instant code fixes
Requirements
- 6+ years in SRE, DevOps, or Platform Engineering
- Strong understanding and practical application of Site Reliability Engineering (SRE) principles, methodologies, and best practices
- Proficiency in programming/scripting languages such as Python, Go, or TypeScript
- Practical understanding of integrating LLMs into automated workflows. You know how to feed live system state (like a fresh CI test failure) into an agent as actionable context.
- Prior experience in incident management, post-incident reviews, and implementing improvements to prevent future incidents
- Ability to troubleshoot complex technical issues systematically and effectively
- Good experience working with a public cloud provider, ideally Google Cloud Platform (GCP), and a solid understanding of its observability services
- A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
- Excellent communication skills to convey technical concepts and collaborate effectively with diverse teams
- Very good command of spoken and written English; German is a plus
- Residency in Germany
- Interest in the climate tech industry
- Prior experience with IoT applications
- Experience working in a scale-up environment at a company of similar size
Perks & Benefits