Job Description

Summary

Role Overview

As a member of the Platform Engineering team, you will be responsible for managing and supporting the infrastructure that drives our platform. The reliability and scalability of our technology are key to our success, and in this position you will work with our development and security teams to help design highly available and fault-tolerant systems.

In particular, you will focus on monitoring and optimizing our network performance to support the low-latency, high-throughput operation of our trading exchange.

Key Responsibilities

  • Continuously improve the resiliency, throughput and latency profiles of our trading systems by working hand in hand with our trading technology teams
  • Manage and support our AWS cloud infrastructure, EC2 instances and physical servers
  • Develop and manage IaC to ensure consistency of our infrastructure
  • Ensure security hardening of our OS builds and configurations
  • Manage and maintain config management tooling to ensure consistency
  • Integrate our stack with Kubernetes
  • Ensure SRE best practices for design and operation of the stack
  • Design, implement and test disaster recovery capabilities to ensure our business can continue to operate in the event of a technology failure
  • Participate in an on-call rota for escalations

Qualifications

  • Theoretical and practical networking knowledge, including but not limited to unicast and multicast routing protocols, the Linux kernel's TCP stack implementation, congestion avoidance/control (e.g. BBR), traffic control, network simulation, AWS VPC / TGW and the Kubernetes VPC CNI. DPDK experience is a plus.
  • Professional experience with kernel troubleshooting: strace, bpftrace, perf profiling/tracing, navigating / reading / building the relevant kernel code.
  • Professional experience with userland monitoring (e.g. Thanos/Prometheus/Alertmanager), logging (e.g. Splunk/Loki), alerting, troubleshooting, profiling/tracing, etc.
  • Strong practical AWS knowledge, with a minimum of 5 years of SRE / DevOps experience supporting and managing Linux-based systems. A computer science or engineering degree is preferred; a strong understanding of fundamental computer science principles is required.
  • Familiarity with Kubernetes / Ansible / Chef, and with one or more programming languages: Python, Golang, C, NodeJS.

Skills
  • AWS
  • C++
  • Networking
  • Python