Job Description
Summary
Lead the people who make mainnet boring. You'll own a team of 5–10 Technical Operations Engineers (M3), set the bar for 99.99%+ uptime, and drive the operational excellence that lets us ship chains fast and keep them running. You run point on SEV-0/1 escalations, translate org priorities into quarterly goals, and build a team culture where engineers grow, toil shrinks, and incidents don't repeat.
What We Actually Need
- People leadership: 3+ years (M3) or 5+ years (M4) managing SRE/DevOps/TechOps teams; you run effective 1:1s, grow careers, and keep on-call humane.
- Technical credibility: hands-on background in Linux, Kubernetes, cloud (AWS/GCP/OCI/Azure), and observability (Prometheus/Grafana/Datadog). You can still read a flame graph.
- Incident command: you've led SEV-0/1 responses, written RCAs that drove real fixes, and built processes that lower MTTR.
- Operational rigor: SLO/SLA ownership, on-call scheduling, workload distribution—teams you run hit targets without burning out.
- Cross-functional fluency: you partner with Product, Engineering, Security, and Support; you represent TechOps in customer escalations and audits.
- Communication: you articulate complex failures simply—to engineers, execs, and customers.
What You'll Do
- Own team delivery: set quarterly goals, run sprints, ship infra projects and upgrades on time.
- Drive reliability: maintain SLAs/SLOs, reduce incident recurrence, and keep platform uptime at 99.99%+.
- Lead escalations: coordinate SEV-0/1 response, stakeholder comms, post-mortems, and corrective actions.
- Grow the team: hire, mentor, run performance reviews, and build a culture of accountability and continuous improvement.
- Reduce toil: champion automation initiatives that free engineers for higher-leverage work.
- Coordinate cross-functionally: align with Product, Engineering, Security, and Support on priorities and rapid issue resolution.
Why This Role Stands Out
- Leverage: multiply impact through a high-caliber team shipping real infrastructure at scale.
- Ownership: you set the strategy, run the incidents, and see the results in production metrics.
- Growth: deep exposure to blockchain protocol operations while leveling up distributed-systems leadership.
- Remote-first: async culture, follow-the-sun coverage, sustainable on-call.
- Compensation: region-aligned, bonus-eligible, transparent from the start.
The Bar: Signals We Care About
- Teams you've led hit reliability targets and kept attrition low.
- Post-mortems you wrote drove systemic fixes, not blame.
- You've scaled a team through a growth phase—hiring, onboarding, leveling.
- You balance technical depth with people leadership; engineers trust your judgment.
- You simplify processes and reduce toil; your teams ship faster over time.
- Prior blockchain/Web3 infrastructure experience (not required, but we'll notice).
- Community leadership: conference talks, published writing, open-source contributions.
Skills
- Communication Skills
- Leadership
- Operations
- Team Collaboration

