ScalePad

Site Reliability Engineer

In-Office or Remote
Hiring Remotely in Vancouver, BC
Mid level
As a Site Reliability Engineer, you'll ensure system reliability, scalability, and efficiency, supporting developer experience, automating tasks, and optimizing performance.
The summary above was generated by AI

We're Hiring!

We're looking for brilliant thinkers to join our #Rocketeers. If you've ever wondered what it's like to work in a place where people enjoy their work and where talent is more important than the title, then keep reading.

What is ScalePad?

ScalePad is a market-leading software-as-a-service (SaaS) company with headquarters in Vancouver, Toronto, Montreal, and Phoenix, AZ. Our team now spans the globe so that we can best serve our partners all over the world.


Our success is no accident: ScalePad provides MSPs of every size with the knowledge, technology, and community they need to deliver increased client value while navigating the continuously changing terrain of the IT landscape. With a suite of integrated products that automate and standardize MSPs’ operations, analyze and uncover new opportunities, and expand value to clients, ScalePad is equipping the MSP adventure.

ScalePad has received awards such as MSP Today’s Product of the Year, G2’s 2024 Fastest Growing Product, and 2024 Best IT Management Product. In 2023, it was named a Best Workplace in Canada by Great Place to Work™. ScalePad is a privately held company serving over 12,000 MSPs across the globe.

You can contribute to our innovation and appreciate how your work is helping take this company to a higher level of operational maturity. More on that here.

Your mission, should you choose to accept it.

As a Site Reliability Engineer (SRE) at ScalePad, you play a crucial role in ensuring the reliability, scalability, and efficiency of our infrastructure and development platforms. You support developer experience, automate operational tasks, and optimize system performance to maintain high availability and seamless deployments. Your expertise in monitoring, incident management, and automation helps ensure that our applications run smoothly and meet reliability targets.

Qualifications.

  • Strong proficiency in system operations, observability, and infrastructure monitoring
  • Full understanding of AWS offerings, including core compute, networking, storage, and IAM
  • Experience with Infrastructure as Code (IaC) tools such as Terraform
  • Proficiency in scripting and automation using Python, Bash, or equivalent languages
  • Base knowledge of Java, Go, and Python is a strong plus
  • Knowledge of CI/CD pipelines and best practices for continuous integration and delivery
  • Experience with containerization and orchestration technologies such as Kubernetes and Docker
  • Strong understanding of SLOs, SLAs, and incident management best practices
  • Ability to troubleshoot and resolve complex system issues in a high-availability environment
  • Familiarity with Agile methodologies and DevOps culture

Responsibilities.

1. System Operations and Reliability

  • Maintain and improve system uptime and reliability according to established Service Level Objectives (SLOs)
  • Monitor and optimize system performance using observability tools like Prometheus and Grafana
  • Implement and maintain alerting systems to proactively detect and resolve issues
  • Execute capacity planning and scaling activities, ensuring infrastructure efficiency
  • Participate in the 24/7 on-call rotation, responding to and resolving system outages

2. Incident Management

  • Respond to and resolve production incidents within defined Service Level Agreements (SLAs)
  • Document incident responses and contribute to post-mortem analysis to improve system resilience
  • Implement preventive measures based on insights from incidents
  • Manage escalations and coordinate with teams to resolve complex system issues

3. Development and Automation

  • Develop and maintain Infrastructure as Code (IaC) to enable automated infrastructure management
  • Create and optimize CI/CD pipelines, ensuring smooth and reliable software releases
  • Write automation scripts for routine operational tasks, reducing manual workload
  • Implement monitoring solutions and dashboards to provide real-time system visibility

4. Collaboration

  • Work closely with development teams, ensuring seamless integration of SRE principles into application design
  • Participate in team planning and retrospective meetings, contributing to continuous improvement
  • Document technical processes and procedures, making knowledge accessible across teams
  • Contribute to knowledge base maintenance, sharing best practices and troubleshooting insights

What You’ll Love Working As A Rocketeer:

  • Everyone’s an Owner: Through our Employee Stock Option Plan (ESOP), each team member has a stake in our success. As we scale, your contributions directly shape our future – and you share in the rewards.
  • Growth, Longevity and Stability: Benefit from insights and training from our leadership and founder, whose extensive experience in funding and scaling successful software companies creates a stable environment for your long-term career growth. Their proven track record fosters a culture of lasting success.
  • Annual Training & Development: Every employee receives an annual budget for professional development, empowering you to advance your skills and career on your terms.
  • Hybrid Flexibility: Enjoy world-class offices at our headquarters in downtown Vancouver, Toronto, and Montreal.
  • Cutting-Edge Gear: Whether in the office or at home, you’ll be set up for success with top-of-the-line hardware.
  • Wellness at Work: Our Vancouver office features a fitness facility and outdoor ping-pong tables.
  • Comprehensive Benefits: We’ve got you covered with an extensive benefits package, including 100% employer-paid medical and dental coverage, RRSP matching after one year of employment, and even a monthly stipend to help offset the costs of the hybrid experience.
  • Flexible Time Off: Our unlimited flex-time policy, in addition to all accrued vacation, allows you to take the time you need to recharge and thrive.

Dream jobs don’t knock on your door every day.  

ScalePad is not your typical software company. When we hire you, we aren’t just offering you a job, but rather we are committing to investing in both you and your long-term career. You'll help shape how this modern SaaS company operates and make a genuine impact on the future of our people, product, and partners.

We invite all qualified candidates to apply. Please note, you must be eligible to work in Canada to be considered for this role. We thank you for your interest. However, only successful applicants will be contacted.

At ScalePad, we believe in the power of Diversity, Equity, Inclusion, and Belonging (DEIB) to drive innovation, collaboration, and success. We are committed to fostering a workplace where every individual's unique experiences and perspectives are valued, and where employees from all backgrounds can thrive. Our dedication to DEIB is woven into the fabric of our culture, guiding our actions and decisions as we build a stronger and more inclusive future together.

Join us and be part of a team that celebrates differences, embraces fairness, and ensures that everyone has an equal opportunity to contribute and grow. Together, we're creating an environment where diverse voices are not only heard but also amplified, where everyone feels valued, and where we can all achieve our full potential.

Please no recruiters or phone calls.

Top Skills

AWS
Bash
CI/CD
Docker
Go
Grafana
Java
Kubernetes
Prometheus
Python
Terraform

ScalePad Vancouver, British Columbia, CAN Office

1021 West Hastings, 3200, Vancouver, BC, Canada, V6E 0C3
