Job Description

Responsibilities

  1. Handle production incidents and conduct post-mortem analysis to enhance system stability.
  2. Design, deploy, monitor, and troubleshoot Kafka and Redis clusters in production environments, ensuring optimal performance and reliability.
  3. Collaborate with development teams to ensure seamless deployment of applications and systems.
  4. Manage and optimize cloud infrastructure (AWS, AliCloud) for performance, cost efficiency, and reliability.
  5. Develop DevOps platforms, including online load testing and change management systems.
  6. Increase automation in infrastructure operations management using LLMs or other AI techniques.

Requirements

  1. At least 5 years of hands-on experience in Kafka and Redis operations in large-scale production environments, with the ability to collaborate with developers to optimize code.
  2. Proficiency in at least one of Python, Go, or Java, along with strong SQL skills.
  3. Hands-on experience with containerization and orchestration technologies, including Docker and Kubernetes.
  4. Strong experience with CI/CD and infrastructure-as-code tools such as GitHub Actions, Ansible, and Terraform.
  5. At least 3 years of experience with the AWS cloud platform; experience with GCP, Azure, or AliCloud is a plus.
  6. Excellent problem-solving and troubleshooting skills.
  7. Strong team collaboration skills and the ability to build partnerships with other teams and business units.
  8. Practical experience in AIOps is preferred.

Skills
  • AWS
  • Database Management
  • Development
  • Java
  • Problem Solving
  • Python
  • SQL
  • Team Collaboration
© 2025 cryptojobs.com. All rights reserved.