Job Description
Summary
Site Reliability Engineering at MoonPay is responsible for providing a resilient, secure, production-ready platform that enables MoonPay to safely deploy applications and services in a self-serve, repeatable manner. We believe that SRE should support both our product delivery and operational teams by surfacing data from our production environment and driving meaningful change based upon what we learn from it.
🚀 What you will do
In the short term, we need to increase the resiliency and reliability of our current PaaS solution by:
· Improving the maintainability of our infrastructure as code
· Building dashboards, monitoring & alerting mechanisms with Datadog
· Load testing and performance tuning our production services
· Managing the lifecycle and maintenance of our Kubernetes clusters
In the medium to long term you’ll get to:
· Implement new and shiny technologies on top of Kubernetes as you see fit to ensure our tech can scale with the business.
· Develop and integrate solutions with a bias for automation in order to improve and maintain reliability across the production estate and make recovery easier.
· Design and track metrics for site uptime and performance, ensuring high levels of visibility are maintained.
· Own the deployment pipelines and continuously improve our monitoring and alerting capabilities.
· Collaborate closely with all other engineering functions to provide timely feedback from our environments.
· Support Engineering on their journey to deliver better software, faster and more safely (think “It’s OK to deploy on Fridays” 😎).
💻 What you will be working with
· TypeScript
· Node.js
· TypeORM, TypeDI, TypeGraphQL and routing-controllers
· React and Next.js hosted on Vercel
· Google Cloud Platform
· Postgres
· Redis
· Bull, BullMQ
· Datadog
· ArgoCD
· Kubernetes
· GitHub
· Jest
🧑‍🚀 About You
· Strong systems administration skills: you know the difference between a container and a virtual machine, and you know your way around a Linux terminal
· Platform engineering/SRE experience at leading startups or fast-growing tech companies
· Experience with some of our tech stack, or confidence that you can cross-train and upskill quickly
· Experience working in a regulated industry
· Confident working with and guiding developers on monitoring and logging of complex systems at scale
· Experience working on complex projects
· Work collaboratively with different teams, e.g. Security, Data, and Engineering
· Want to forge and own MoonPay's reliability & recovery processes
· Have at least a basic understanding of complex reliability structures, theories, principles, and best practices
· Experience working with JavaScript codebases and frameworks, e.g. TypeScript, Node.js, and React
Skills
- Communication Skills
- Development
- Software Engineering
- Team Collaboration
- TypeScript

