Job Description
Summary
The Trading Infrastructure team is building a high-performance, front-to-back trading platform that supports multi-asset trading. The platform is designed to handle financial instruments with low-latency execution, robust risk controls, and seamless integration across trading, risk, operations, and finance workflows.
The system has a modular architecture whose core components include market data feeds, order gateways, execution algorithms, risk engines, UI dashboards, middle-office reconciliation, and account infrastructure. We emphasize event-driven, deterministic system design, real-time observability, and strong security.
Our tech stack includes low-latency Java, Python, a web UI (React/AG Grid), Aeron, ClickHouse, Kubernetes, and modern CI/CD tooling, with a strong focus on automation, scalability, and performance. We also use AI-assisted development tools to improve productivity and quality across the team.
Responsibilities
- Design, provision, and maintain scalable infrastructure for our trading systems, including CI/CD pipelines, observability stack, and runtime environments.
- Administer and tune databases such as AWS Aurora, PostgreSQL, and ClickHouse.
- Automate provisioning and configuration of EC2 instances and related resources using Ansible and infrastructure-as-code tools (e.g., Terraform).
- Ensure secure, stable environments through proper VPC design, IAM governance, and secret management.
- Build and maintain system metrics, log aggregation, and alerting using Prometheus, Grafana, and Loki.
- Enforce GitHub repository and branching standards across development teams.
- Keep infrastructure cost-effective across AWS and containerized deployments through continuous monitoring, resource optimization, and cost-control strategies.
- Manage backup and disaster recovery procedures for all critical systems.
- Collaborate with engineering teams to containerize services and fine-tune runtime performance.
- Evaluate and integrate AI/LLM tools to improve automation, diagnostics, and operational efficiency.
Requirements
- 5+ years of DevOps or SRE experience in high-availability, real-time systems.
- Strong hands-on experience with AWS services (EC2, VPC, IAM, CloudWatch, RDS) and cost monitoring tools (e.g., AWS Budgets, Cost Explorer).
- Skilled in provisioning and configuration management using Ansible; Terraform experience is a plus.
- Proficient in ClickHouse and PostgreSQL; knowledge of kdb+ or InfluxDB is a bonus.
- Solid scripting skills (Python, Bash, or equivalent).
- Hands-on experience with Docker, Kubernetes, Helm, and deployment automation.
- Familiar with monitoring and logging stacks; experience with Prometheus/Grafana is expected.
- Security-conscious and experienced in IAM, encryption, and secure system design.
- Able to monitor and optimize computing resources to maintain performance within budget constraints.
- Comfortable using AI tools for automation, diagnostics, or efficiency gains.
Communication & Collaboration
- Proficient in English (spoken and written); Chinese or other languages are a plus but not required.
- Comfortable working in a global team with colleagues across APAC, EMEA, and North America.
- Strong communication skills; able to work with people at all levels, from senior leadership to engineers and cross-functional stakeholders.
- Able to explain complex systems to both technical and non-technical audiences and coordinate effectively across teams with diverse backgrounds.
Skills
- AWS
- Communication Skills
- Development
- Python
- Software Engineering
- Team Collaboration