
Principal Site Reliability Engineer

Accela · Remote Based - US · posted 1 day ago
Remote · Software / IT
Principal · Python · Azure · Kubernetes · Terraform

ABOUT THE ROLE

As a Principal Site Reliability Engineer, you will serve as a technical leader responsible for the reliability, scalability, performance, and operational excellence of Accela's Civic Platform. You will partner closely with Engineering, DevOps, Database Engineering, Security, and Architecture teams to evolve our cloud platform, modernize infrastructure, and ensure our SaaS offerings remain highly available, secure, and cost-effective at scale.

This role combines deep technical expertise with strategic influence. You will drive reliability initiatives, define operational standards, mentor engineers, and lead complex technical efforts that improve the resiliency and efficiency of our platform. Your focus is simple: keep systems resilient, scalable, secure, and continuously improving.

SPECIFIC RESPONSIBILITIES

  • Serve as a technical leader for reliability engineering, operational excellence, and platform modernization across the Civic Platform.
  • Drive platform modernization initiatives, including the continued evolution from VM-based architectures toward containerized and cloud-native services, in partnership with DevOps Engineering, Database Engineering, Security, and Development teams.
  • Lead efforts that improve and sustain the availability, performance, scalability, security, and cost efficiency of Accela's SaaS offerings.
  • Define, implement, and operate service level objectives (SLOs), service level agreements (SLAs), and error budgets for critical platform services, using data to drive prioritization and risk-based decision making.
  • Lead observability initiatives across metrics, distributed tracing, logging, and monitoring platforms to improve system visibility and accelerate issue detection and resolution.
  • Drive Root Cause Analysis (RCA) efforts for complex production incidents, facilitate blameless postmortems, and ensure corrective actions are implemented and tracked to completion.
  • Design, develop, and maintain automation, tooling, and software solutions that improve reliability, operational efficiency, scalability, and developer productivity.
  • Serve as a senior technical escalation point during production incidents and for platform changes that impact availability, performance, security, or compliance.
  • Partner with Security and Compliance teams to ensure platform operations meet regulatory and compliance requirements, including SOC 2, HIPAA, FedRAMP, StateRAMP, and PCI-DSS.
  • Translate operational metrics, reliability trends, and platform health data into actionable insights for engineering leadership and executive stakeholders.
  • Mentor engineers across the Cloud Engineering organization and influence engineering best practices through technical leadership and collaboration.

REQUIRED QUALIFICATIONS

  • 8+ years of experience in Site Reliability Engineering, Software Engineering, Cloud Infrastructure, or related disciplines within a SaaS environment, including experience leading complex technical initiatives.
  • Demonstrated technical leadership driving platform modernization in containerized and orchestrated environments, including Kubernetes or equivalent technologies.
  • Hands-on experience operating and supporting large-scale SaaS platforms on Microsoft Azure.
  • Experience developing automation and operational tooling using Python, PowerShell, Bash, or similar scripting languages.
  • Deep expertise designing, operating, analyzing, and troubleshooting complex distributed systems across the application, infrastructure, networking, and operating system layers.
  • Strong experience with modern observability platforms, including monitoring, logging, metrics, and distributed tracing.
  • Demonstrated success leading incident response, Root Cause Analysis, and continuous improvement initiatives.
  • Experience establishing and maturing Incident, Problem, and Change Management practices.
  • Strong written and verbal communication skills with the ability to effectively communicate technical concepts to engineering leadership and executive stakeholders.
  • Experience using Git and GitHub-based development workflows.

DESIRED QUALIFICATIONS

  • Experience with Infrastructure-as-Code practices and tooling, particularly Terraform.
  • Experience with configuration management platforms such as Ansible.
  • Experience supporting SaaS platforms subject to public-sector compliance frameworks, including SOC 2, HIPAA, FedRAMP, StateRAMP, and PCI-DSS.
  • Experience implementing GitOps deployment methodologies using tools such as Argo CD or Flux.
  • Experience implementing and operating OpenTelemetry-based observability solutions.
  • Cloud FinOps experience, including cost optimization and resource efficiency initiatives within Microsoft Azure environments.
  • Strong Linux systems administration experience alongside Microsoft Windows expertise.
  • Experience leveraging AI-assisted engineering tools such as GitHub Copilot, Claude Code, or similar technologies to improve engineering productivity, incident response, automation, and operational efficiency.

TRAVEL

  • Up to 10% travel for team collaboration, strategic planning sessions, industry conferences, and critical business initiatives.

ABOUT ACCELA

For nearly 20 years, Accela has been an industry leader in designing and delivering government software to improve efficiency, increase citizen engagement, and enable the development of thriving communities. Today, citizens are savvy about how services should be delivered and expect a consistently convenient, transparent view into their local government. As government agencies struggle to do more with less, our mission has never been more critical. Accela provides a robust, cloud-based platform of government software solutions that accelerate growth, efficiency, and transparency in communities of all sizes. From planning to building to service request management and more, Accela’s SaaS offerings level the playing field for small and medium governments and enable smaller agencies to leverage the technologies that larger cities use. Our open and flexible technology helps agencies address specific needs today while ensuring they are well prepared for the emerging challenges of the future.