At Klaviyo, we value the unique backgrounds, experiences and perspectives each Klaviyo (we call ourselves Klaviyos) brings to our workplace each and every day. We believe everyone deserves a fair shot at success and appreciate the experiences each person brings beyond the traditional job requirements. If you’re a close but not exact match with the description, we hope you’ll still consider applying. Want to learn more about life at Klaviyo? Visit klaviyo.com/careers to see how we empower creators to own their own destiny.
Klaviyo’s mission is to empower businesses to independently drive their growth, and the Engineering Department's contribution to this mission is crucial. As Director of Production Infrastructure, you'll helm the creation and management of a high-performance platform designed to support the rapid innovation demanded by our R&D teams. This role is all about defining the pillars of our infrastructure (compute, storage, networking, and observability) and setting a robust set of principles that guide their use.
In this position, you'll develop and maintain platform primitives that empower our engineering teams to bring ideas to life seamlessly. Collaborating with industry leaders across engineering, security, and finance, you'll make decisions that shape the infrastructure blueprint underpinning our scalable, secure, and cost-effective operations. As a leader, your mission is to foster a culture of ownership, innovation, and productivity while steering teams toward critical reliability and performance metrics. Your role will span defining clear service contracts, instituting capacity plans, and honing our developer enablement strategy to reduce friction and increase developer velocity.
How You’ll Make a Difference
- Lead the definition of platform primitives such as compute runtimes, storage options, and service networking, ensuring they are scalable, secure, and aligned with Klaviyo's standards for operational excellence.
- Create and disseminate golden paths and decision trees that simplify the technological choices for R&D teams, enhancing consistency and self-sufficiency across engineering efforts.
- Drive initiatives that enhance the reliability of production systems, focusing on incident prevention, transparent response protocols, and proactive capacity planning.
- Coordinate with product teams to identify and eliminate infrastructure bottlenecks, improving time-to-market for new services and increasing developer satisfaction.
- Establish frameworks for cost-effective infrastructure management, balancing financial discipline with flexibility and efficiency to maximize value delivery.
- Mentor and develop high-performing teams, fostering a culture of inclusivity and ownership, while setting clear, impactful goals that align with business priorities.
- Collaborate with cross-functional partners to manage platform investments, clarify ownership, and safely implement infrastructure changes that drive strategic outcomes.
- Track and report critical performance metrics, such as system reliability, developer productivity, and infrastructure costs, enabling data-driven decision-making and accountability.
- Optimize the use of AI to enhance infrastructure management and development processes, pioneering innovative workflows that keep Klaviyo at the forefront of technological advancement.
- Champion operational readiness by establishing robust SLAs and SLIs, ensuring all infrastructure components meet defined performance thresholds conforming to Klaviyo's quality standards.
- Facilitate a culture of continuous learning and experimentation with AI tools, deploying enhancements that intelligently streamline engineering workflows.
- Lead a disciplined approach to incident management and postmortems, establishing a blameless culture of learning and innovation to minimize future disruptions.
Who You Are