Build resilient, scalable, cost-efficient infrastructure
We need someone who thinks of infrastructure as a product: reliable, observable, and optimized. They will use AI to predict failures, automate right-sizing, and ensure availability, with the speed and pragmatism that a startup demands.
Design, implement, and maintain cloud infrastructure (AWS/GCP/Azure) using Infrastructure as Code (Terraform, Pulumi).
Implement intelligent monitoring and predictive alerts with AI-powered tools (Datadog, Grafana with ML, etc.).
Run full infra inventories, document architecture, identify single points of failure, and continuously optimize costs.
Configure automated backups with restore verification, disaster recovery plans with defined RTO/RPO, and auto-scaling.
Create runbooks for common incidents and document everything so the team can operate autonomously.
Lead right-sizing and cost allocation by project/team, always seeking efficiency without sacrificing performance.
5+ years managing cloud infrastructure at scale, with mastery of at least one major cloud provider.
Experience using AI/ML for AIOps: anomaly detection, predictive capacity planning, auto-remediation.
Natural problem solver: when an incident occurs, you implement the permanent fix, not just a workaround.
Comfortable operating in high-uncertainty, fast-changing environments (startup stage).
Experience with IaC (Terraform/Pulumi), containers (Docker/K8s), and CI/CD pipelines.
Preferred certifications: AWS Solutions Architect, GCP Professional Cloud Architect, CKA.
What defines us and what we expect from everyone on the team:
AI as a superpower: We don't use AI for the sake of it. We integrate it into every process to multiply our capacity and speed.
Radical proactivity: We don't wait to be told what to do. We see the problem, propose the solution, and execute.
Resolution over perfection: We prefer a working solution today over a perfect one in three months. We iterate fast.
Constant adaptation: Change doesn't scare us; it motivates us. We pivot with data, not fear.
Total ownership: Everyone owns their area. No excuses, only solutions.
Ready to apply?
Send your resume and a short note about why you're a fit to hello@usehorizon.ai.