Site Reliability Engineer at Ripple | India | Full-Time | cryptojobs.com

Site Reliability Engineer

This job is no longer available.

We are seeking a Site Reliability Engineer (SRE) to join our team in India.

WHAT YOU'LL DO:

Keep your assigned site or service up and running, and restore it quickly when a failure occurs,
Actively troubleshoot any issues that arise during testing and production, catching and solving issues before launch,
Automate work, including infrastructure needs, testing, failover solutions, failure mitigation, and much more,
Monitor and troubleshoot highly scalable and distributed server clusters that perform various functions, from web-servers to machine learning processing,
Be on a PagerDuty rotation to respond to availability incidents and provide support for service engineers with customer incidents,
Participate in and help establish best practices in Site Reliability Engineering,
Manage code deployments, fixes, updates, and related processes,
Work with a close-knit team and brainstorm on the best ways to tackle complex problems in infrastructure, security and monitoring,
Provide technical guidance and educate team members and coworkers on monitoring and logging (have an interesting idea or solution? Present it!),
Automate any software maintenance processes that previously required a manual procedure.

WHAT WE'RE LOOKING FOR:

3+ years of experience in software engineering, software development, or system operations in highly available, high-traffic environments,
Strong experience with Linux-based infrastructures, Linux/Unix administration, and Azure,
Experience with databases such as PostgreSQL,
Experience administering Linux servers as well as Docker-based infrastructure (such as Kubernetes, AKS, etc.) in a highly available environment,
Experience with scripting languages such as Python and Bash,
Experience with message broker/queue technologies like RabbitMQ,
Experience with modern monitoring, logging, and observability tools for complex distributed systems, such as Application Insights, Grafana, New Relic, Splunk, Elastic Stack, Datadog, Prometheus, etc.,
Practical experience with infrastructure-as-code (with tools like Terraform, Chef, Ansible, etc.),
Good understanding of cybersecurity fundamentals and best practices,
Experience with containerization and clustering (Dockerfiles, docker-compose, Helm, Kubernetes, etc.),
Stellar problem-solving and troubleshooting skills with the ability to spot issues before they become problems,
Fluency in English,
Excellent oral and written communication skills,
Process-oriented with great documentation skills,
Solid team player!
