Job Description
Summary
BitGo is seeking a talented DevOps/SRE Engineer to join our Infrastructure team. This role is crucial for architecting and automating our highly available digital asset infrastructure on Kubernetes, guaranteeing robust performance and reliability. The team's main objective is to proactively monitor and enhance security, ensuring network integrity, minimizing operational overhead, and delivering a stable, cost-efficient platform that supports our developers and fosters user trust. This position blends web2 and web3 technologies, directly contributing to the security and scalability of over $100 billion in digital assets.
Responsibilities:
- Design, develop, and drive adoption of IaC tooling and automation solutions.
- Collaborate cross-functionally with engineering and business teams to understand and address infrastructure requirements, ensuring scalable and reliable solutions.
- Evaluate, integrate, and deploy cutting-edge open-source and commercial tools to enhance our security and infrastructure capabilities and to meet evolving business needs.
- Define and own the technical roadmaps for key system components, ensuring alignment with strategic objectives.
- Ensure the operational excellence, reliability, and performance of critical client and internal systems through proactive project work, incident response, and participation in on-call rotations.
Required:
- Proven experience securing and operating multiple environments on Kubernetes, along with associated tooling (ArgoCD, GitOps, Grafana), using Terraform
- Operational experience with relational and NoSQL databases (connection maintenance, slow query analysis, index management) as well as object storage.
- Familiarity with GitHub Actions and building CI/CD pipelines
- Experience with major public cloud providers (e.g., AWS, GCP, Azure) and advanced container orchestration/fleet management, including cost optimization and modeling
- At least three years of experience building, securing, and maintaining complex production systems.
- Strong analytical and communication skills along with an inquisitive disposition
Preferred:
- Experience with compliance frameworks and audit lifecycles (SOC, ISO, regulators, etc.)
- Experience in observability, web security threats / CSP / SIEM tools, and uptime-critical infrastructure.
- Demonstrated experience delivering internal-facing tools, automation, or infrastructure that improved developer or operator workflows.
Skills
- AWS
- Communication Skills
- Development
- Operations
- Software Engineering
- Team Collaboration