Job Description
Summary
Responsibilities
- Be part of a trading systems engineering team, dedicated to building out the core trading platforms.
- Work closely with cross-functional teams to improve system reliability, scalability and security.
- Engage in and improve the quality of support for the platform.
- Build and manage systems, infrastructure and applications through automation.
- Provide operational support to internal teams working on the platform.
- Work on improvements that increase efficiency, reduce latency and speed up system deployment.
- Practice sustainable incident response and blameless postmortems.
- Share an on-call rotation with your engineering team and act as an escalation contact for service incidents.
- Implement and maintain rigorous security best practices across all infrastructure, with a focus on minimizing attack surface and ensuring data integrity.
- Monitor system health and performance with a keen eye for identifying and resolving issues before they affect trading activity.
- Manage user queries and service requests (often requiring in-depth analysis of the technical and/or business logic of our systems).
- Take a proactive approach to problem analysis and the resolution of production incidents.
- Manage issue tracking and prioritisation of day-to-day production incidents.
- Manage platform trading support initiatives (e.g. platform monitoring, reporting, release automation).
Minimum Qualification
- Bachelor's degree with 5+ years of working experience as a Site Reliability Engineer (SRE) or DevOps Engineer.
- Strong knowledge of Linux system internals (kernel, memory management, file systems and network I/O optimization) as well as Bash scripting.
- Minimum 3-5 years of front-office trading floor or related application support experience.
- Good product knowledge of financial instruments (Equities, FX, Crypto, etc.), including experience managing the front-to-back trading lifecycle and familiarity with the order lifecycle.
- Experience with electronic trading platforms and order and execution management systems (OMS/EMS).
- Knowledge of ITIL practices including Incident, Problem and Change management.
- Experience with system monitoring platforms.
- Proven expertise optimizing systems for lowest-latency performance and productionizing trading systems, from initial concept to full deployment, including on-prem and colocation.
- Strong working knowledge of network communication fundamentals, including TCP/IP and UDP.
- Knowledge of the FIX protocol, as well as familiarity with low-latency multicast with Aeron.
- Strong skills in observability, debugging and performance tuning, with a willingness to dive deep to determine the root cause.
- Effective communicator, with the ability to explain root causes in plain language, outline remediation plans and recommendations, and execute plans that systematically resolve issues.
- Demonstrated ability to build tools that enable rapid operational triage, using logs, performance telemetry metrics and traces to build Grafana dashboards (or similar).
- Strong experience deploying and managing cloud infrastructure with providers like AWS, Azure and/or GCP (accounts, profiles, IAM roles, policies, service classes, instance types, etc.).
- Strong experience adapting workloads to use cloud-native technologies such as Kubernetes and Docker (Dockerfile, docker-compose, Helm charts, deployment manifests, ingress wiring, service mesh, etc.).
- Expertise using DevOps deployment tooling, like GitHub Actions, Jenkins, Argo CD and Ansible.
- Hands-on experience developing in one or more programming languages: C, C++, Java, Python, or Go.
- Proficiency with C++ build systems such as CMake and in building CI/CD pipelines for trading systems.
- Working knowledge of key backing services like MySQL, PostgreSQL, Redis, MongoDB, DynamoDB, Kafka, RabbitMQ, etc.
Key attributes critical to success:
- Strong interpersonal skills and the ability to communicate confidently and clearly with internal teams and external counterparts.
- Ability to analyse complex scenarios from both a technical and business perspective and deliver solutions.
- A sense of urgency, and an ability to deliver to deadlines in an environment where multiple competing requirements are a part of the job.
- Proactivity and a strong ownership ethos, ensuring that queries are resolved through to completion.
- Desire and ability to deliver improvements to the platform.
Preferred Qualification
- Extensive experience in DevOps for deployment automation and application-level support, as well as in SRE and/or Linux systems administration for systems- and infrastructure-level support.
- Extensive experience in a low-latency electronic trading environment, where high performance is critical at all stages of the infrastructure, network and application design and implementation.
- Expertise in developing advanced observability capabilities (USE and RED signals, tracing, front-end observability, etc.).
Skills
- C++
- Communication Skills
- Development
- Java
- Python
- Software Engineering
- Team Collaboration