Site Reliability Engineer at Zero Hash | Remote | Full-Time | cryptojobs.com

Zero Hash is looking for an experienced and passionate Site Reliability Engineer to join our Platform team.

What you will do:

Take an active role as co-owner of production services to ensure they are built, maintained, and operated in a reliable and scalable way.
Be part of the successful delivery of new features and services, as well as the day-to-day operations of existing services.
Collaborate with Software Engineering to identify and help drive operational improvements through metric-driven collection and analysis.
Develop and maintain performance benchmarks for our applications to ensure a consistent customer experience.
Help drive operational efficiencies in releasing code and monitoring performance.
Provide traditional SRE/operational support such as tooling and automation, monitoring, workflow management, and maintaining and improving CI/CD.
Participate in our weekly on-call rotation to investigate and resolve potential system issues.
Get your hands dirty managing and scaling our various infrastructure systems.

Desired Skills:

You have extensive experience deploying, managing and troubleshooting infrastructure in AWS.
You have managed the full lifecycle of deploying a container to a production environment using self-managed Kubernetes, ECS, or EKS.
When the perfect tool wasn’t available, you wrote one yourself and taught others how to use it.
You understand CI/CD and have built custom tooling to deploy code to production environments.
You are able to solve problems in distributed Linux systems and are comfortable tracing requests across applications, systems and networks.
You hold a Certified Kubernetes Security Specialist (CKS) certification.
You can automate routine tasks and are proficient in at least two programming languages.
You have fantastic communication skills in both spoken and written forms to explain complex ideas to various audiences.
You thrive in an environment where collaboration and communication are paramount but are able to solve problems on your own.

Projects you might work on:

Creating and maintaining application performance benchmarks so we know when our applications are not performing well.
Improving our CI/CD pipeline to reduce the time it takes from development merge to production deployment.
Continuing to improve and scale our AWS and application infrastructure.
Working closely with software development to help optimize the local development workflow.
Identifying common issues and devising solutions to reduce their impact or remove them altogether.
Helping implement a scalable solution for blue/green and canary deployments.
