Director of Site Reliability Engineering at Stellar | United States | Full-Time | cryptojobs.com

Director of Site Reliability Engineering

This job is no longer available.

Summary

You will lead an experienced Site Reliability Engineering team, ensuring our services and tooling are available, building infrastructure to make our team's production and testing environments available, and greasing the rails of our systems and processes to ensure they're robust, efficient, and easy to deploy.

SDF has a robust career path for both individual contributors and managers.

In this role, you will:

Establish a clear vision and mandate for the Site Reliability Engineering team
Define the SRE team's quarterly OKRs to best align with the company's goals
Define processes of collaboration between SREs and development teams throughout the software development lifecycle
Define a career growth path for the SRE team, as well as coach and mentor individual contributors on the team
Define and track metrics across engineering and help hold engineering teams accountable for their KPIs
Coordinate priorities with other teams and areas of the organization
Participate in sprint planning and execution, track progress and oversee day-to-day tactical decisions
Design and build reliable systems and infrastructure that are easy for software engineers to use
Monitor and troubleshoot systems in production
Define and participate in 24/7 on-call rotations alongside the team
Mediate technical discussions and review PRs
Jump in as needed with code fixes, troubleshooting and hands-on contributions
Collaborate across the Stellar ecosystem, engaging with key partners and advising on their integration to set them up for success

You have:

3+ years of experience working as a Site Reliability Engineer
3+ years of experience managing an SRE team
Site Reliability Engineering experience:
  Strong track record of collaborating with dev teams at all stages of product development (design, development/CI, beta testing, production)
  Strong track record of collaborating on defining, measuring and driving improvements in KPIs
  Strong track record of assisting teams during Root Cause Analysis and postmortems
Infrastructure and Operations experience:
  Designing and building out the infrastructure for large distributed systems
  Maintaining highly available infrastructure
  Troubleshooting and understanding complex technical problems
  Using configuration management or IaC tooling such as Terraform, Ansible, Puppet
  Building and maintaining infrastructure using Kubernetes
Highly autonomous; able to find clarity in ambiguous circumstances
Excellent communicator; comfortable working with remote team members

Bonus points (optional):

3+ years of experience writing code in a major programming language
You have worked on an open source project
You have managed a distributed team
You build things for fun in your spare time

We offer competitive pay with a base salary range for this position of $210,000 - $310,000 depending on job-related knowledge, skills, experience, and location.

Skills

Communications Skills
Development
Problem Solving
Software Engineering
Team Collaboration
