Infrastructure Engineer Lead

Build resilient, scalable, cost-efficient infrastructure

We need someone who thinks of infrastructure as a product: reliable, observable, and optimized. They will use AI to predict failures, automate right-sizing, and ensure availability, with the speed and pragmatism that a startup demands.

Key responsibilities

  • Design, implement, and maintain cloud infrastructure (AWS/GCP/Azure) using Infrastructure as Code (Terraform, Pulumi).

  • Implement intelligent monitoring and predictive alerts with AI-powered tools (Datadog, Grafana with ML, etc.).

  • Run full infrastructure inventories, document the architecture, identify single points of failure, and continuously optimize costs.

  • Configure automated backups with restore verification, disaster recovery plans with defined RTO/RPO, and auto-scaling.

  • Create runbooks for common incidents and document everything so the team can operate autonomously.

  • Lead right-sizing and cost allocation by project/team, always seeking efficiency without sacrificing performance.

What we're looking for

  • 5+ years managing cloud infrastructure at scale, with mastery of at least one major cloud provider.

  • Experience using AI/ML for AIOps: anomaly detection, predictive capacity planning, auto-remediation.

  • Natural problem solver: when an incident occurs, you implement the permanent fix, not just a workaround.

  • Comfortable operating in high-uncertainty, fast-changing environments (startup stage).

  • Experience with IaC (Terraform/Pulumi), containers (Docker/K8s), and CI/CD pipelines.

  • Preferred certifications: AWS Solutions Architect, GCP Professional Cloud Architect, CKA.

Horizon DNA

What defines us and what we expect from everyone on the team:

  • AI as a superpower: We don't use AI for the sake of it. We integrate it into every process to multiply our capacity and speed.

  • Radical proactivity: We don't wait to be told what to do. We see the problem, propose the solution, and execute.

  • Resolution over perfection: We prefer a working solution today over a perfect one in three months. We iterate fast.

  • Constant adaptation: Change doesn't scare us; it motivates us. We pivot with data, not fear.

  • Total ownership: Everyone owns their area. No excuses, only solutions.

Ready to apply?

Send your resume and a short note about why you're a fit to hello@usehorizon.ai.