Job Description

Summary

Responsibilities

  1. Be part of a DevOps team dedicated to building internal platforms.
  2. Work closely with internal teams to improve system reliability, scalability, and developer productivity.
  3. Contribute to and improve the quality of the infrastructure supporting the platform.
  4. Build and manage systems, infrastructure and applications through automation.
  5. Provide operational support to internal teams working on the platform.
  6. Drive improvements that increase efficiency, reduce latency, and speed up system deployments.
  7. Practice sustainable incident response and blameless postmortems.
  8. Share an on-call rotation with your engineering team and serve as an escalation contact for service incidents.


Minimum Qualifications

  1. Bachelor's degree with 5+ years of experience as a Site Reliability Engineer (SRE) or DevOps Engineer.
  2. Programming experience, preferably in Python or Go.
  3. Knowledge of Linux internals and Bash scripting.
  4. Strong skills in observability, debugging, and performance tuning, and a willingness to dive into understanding, debugging, and improving any layer of the stack.
  5. Strong experience managing infrastructure on cloud providers such as AWS.
  6. Experience with container orchestration systems such as Kubernetes.
  7. Strong experience with observability platforms such as Prometheus and Grafana.
  8. Experience with standard DevOps tools for infrastructure management (e.g., Terraform, OpenTofu) and CI/CD (e.g., ArgoCD, Jenkins).

Preferred Qualifications

  1. Expertise in automation tools such as Ansible and Terraform.
  2. Expertise in DevOps tooling such as Jenkins, ArgoCD, and GitHub Actions.
  3. Expertise in advanced observability (USE and RED signals, tracing, frontend observability, etc.) and monitoring stacks, preferably the Grafana stack.
  4. Expertise in designing, analyzing, and troubleshooting large-scale distributed systems.
  5. Systematic problem-solving approach, coupled with effective communication skills and a sense of drive.
  6. Extensive experience supporting production systems as an SRE.
  7. Experience setting up monitoring stacks for process-based and Docker-based environments.

Skills
  • AWS
  • Communication Skills
  • Development
  • Problem Solving
  • Software Engineering
© 2025 cryptojobs.com. All rights reserved.