Job Description
Summary
As a Senior Infrastructure Engineer at Notabene, you will play a key role in managing and maintaining our core infrastructure, including our AWS and Kubernetes environments. Your expertise will ensure the reliability, scalability, and security of our platform, directly supporting the development teams and enabling seamless deployment and operation of applications. By providing critical infrastructure support and collaborating across teams, you will help drive the stability and growth of our technology foundation.
What You'll Do
- Design, develop, and optimize scalable, secure infrastructure using AWS, Kubernetes, and infrastructure-as-code (IaC) tools such as Terraform, Helm, and Kustomize to ensure robust and reliable cloud environments.
- Implement and manage CI/CD pipelines with a focus on GitOps practices (e.g., GitLab, ArgoCD), driving automation and efficiency across development and deployment workflows.
- Collaborate closely with Development, Solutions Engineering, and Product teams to implement solutions that align with business requirements and meet regulatory standards.
- Leverage observability tools such as Datadog for comprehensive monitoring, tracing, and alerting across infrastructure and applications, ensuring high availability and rapid incident response.
- Act as a consultative resource to other teams, providing guidance and support on infrastructure-related challenges.
- Optimize cloud resources to balance cost and performance, applying cost-saving and scaling strategies with tools such as Kubecost, OpenCost, or Datadog and best practices such as rightsizing and reserved or spot instances.
- Contribute to security policy implementation and compliance efforts (e.g., ISO27001, SOC2), supporting robust data protection and operational security practices.
- Participate in on-call rotations to ensure 24/7 availability and rapid incident response.
- Contribute to infrastructure governance and best practices, ensuring alignment with company standards and security requirements.
- Support business continuity planning and disaster recovery efforts related to infrastructure systems.
- Continuously research and assess emerging technologies to enhance infrastructure efficiency, resilience, and innovation.
Must Haves
- At least 8 years of overall experience, including a minimum of 5 years in an infrastructure-focused role managing AWS and Kubernetes cloud environments, plus expertise in microservices architectures.
- Hands-on experience with CI/CD and GitOps (preferably GitLab and ArgoCD) and with infrastructure-as-code setups (Terraform, Helm, Kustomize), combined with an automation mindset (DRY). Experience with Bash is required; Go or Python is a plus.
- Proven experience in advanced monitoring, tracing, and alerting with observability tools such as Datadog.
- Strong governance and cost management experience related to infrastructure and deployment processes, including familiarity with cost optimization tools like Kubecost, OpenCost, or Datadog.
- Excellent communication skills, with the ability to collaborate effectively across teams and support other engineers.
- Demonstrated ability to identify critical issues and propose sustainable solutions at the infrastructure level in a high-impact, fast-paced environment.
- Familiarity with security frameworks such as ISO27001 and SOC2, and skill in applying security practices to infrastructure environments.
- Comfort working in a dynamic, collaborative environment with a proactive attitude toward continuous improvement and innovation.
- Willingness to participate in on-call rotations and support cross-team infrastructure needs.
- No specific location requirement: remote candidates are welcome, though some timezone overlap with the team is preferred for collaboration.
Nice-to-Haves
- Relevant security certifications (CISSP, CISM, or equivalent)
- Knowledge of cloud security and infrastructure (AWS, GCP, Azure)
- Experience with automation and security tooling implementation
- Background in risk management or IT audit
Skills
- Communication Skills
- Leadership
- Software Engineering
- Team Collaboration