ScalePad

Site Reliability Engineer

Reposted 21 Days Ago

In-Office or Remote

Hiring Remotely in Vancouver, BC

Mid level

We're Hiring!

We're looking for brilliant thinkers to join our #Rocketeers. If you've ever wondered what it's like to work in a place where people enjoy their work and where talent is more important than the title, then keep reading.

What is ScalePad?

ScalePad is a market-leading software-as-a-service (SaaS) company with headquarters in Vancouver, Toronto, Montreal and Phoenix, AZ. However, we are proud to say our employee reach is now global so we can best serve our partners all over the world.

Our success is no accident: ScalePad provides MSPs of every size with the knowledge, technology, and community they need to deliver increased client value while navigating the continuously changing terrain of the IT landscape. With a suite of integrated products that automate and standardize MSPs' operations, analyze and uncover new opportunities, and expand value to clients, ScalePad is equipping the MSP adventure.

ScalePad has received awards such as MSP Today’s Product of the Year, G2’s 2024 Fastest Growing Product, and 2024 Best IT Management Product. In 2023, it was named a Best Workplace in Canada by Great Place to Work™. ScalePad is a privately held company serving over 12,000 MSPs across the globe.

You can contribute to our innovation and appreciate how your work is helping take this company to a higher level of operational maturity.

Your mission, should you choose to accept it.

As a Site Reliability Engineer (SRE) at ScalePad, you play a crucial role in ensuring the reliability, scalability, and efficiency of our infrastructure and development platforms. You support developer experience, automate operational tasks, and optimize system performance to maintain high availability and seamless deployments. Your expertise in monitoring, incident management, and automation helps ensure that our applications run smoothly and meet reliability targets.

Responsibilities.

1. System Operations and Reliability

Maintain and improve system uptime and reliability according to established Service Level Objectives (SLOs)
Monitor and optimize system performance using observability tools like Prometheus and Grafana
Implement and maintain alerting systems to proactively detect and resolve issues
Execute capacity planning and scaling activities, ensuring infrastructure efficiency
Participate in the 24/7 on-call rotation, responding to and resolving system outages

2. Incident Management

Respond to and resolve production incidents within defined Service Level Agreements (SLAs)
Document incident responses and contribute to post-mortem analysis to improve system resilience
Implement preventive measures based on insights from incidents
Manage escalations and coordinate with teams to resolve complex system issues

3. Development and Automation

Develop and maintain Infrastructure as Code (IaC) to enable automated infrastructure management
Create and optimize CI/CD pipelines, ensuring smooth and reliable software releases
Write automation scripts for routine operational tasks, reducing manual workload
Implement monitoring solutions and dashboards to provide real-time system visibility

4. Collaboration

Work closely with development teams, ensuring seamless integration of SRE principles into application design
Participate in team planning and retrospective meetings, contributing to continuous improvement
Document technical processes and procedures, making knowledge accessible across teams
Contribute to knowledge base maintenance, sharing best practices and troubleshooting insights

Qualifications.

Strong proficiency in system operations, observability, and infrastructure monitoring
Full understanding of AWS offerings, including core compute, networking, storage, and IAM
Experience with Infrastructure as Code (IaC) tools such as Terraform
Proficiency in scripting and automation using Python, Bash, or equivalent languages
Basic knowledge of Java, Go, and Python is a strong plus
Knowledge of CI/CD pipelines and best practices for continuous integration and delivery
Experience with containerization and orchestration technologies such as Kubernetes and Docker
Strong understanding of SLOs, SLAs, and incident management best practices
Ability to troubleshoot and resolve complex system issues in a high-availability environment
Familiarity with Agile methodologies and DevOps culture

What You’ll Love Working As A Rocketeer:

Everyone’s an Owner: Through our Employee Stock Option Plan (ESOP), each team member has a stake in our success. As we scale, your contributions directly shape our future – and you share in the rewards.
Growth, Longevity and Stability: Benefit from insights and training from our leadership and founder, whose extensive experience in funding and scaling successful software companies creates a stable environment for your long-term career growth. Their proven track record fosters a culture of lasting success.
Annual Training & Development: Every employee receives an annual budget for professional development, empowering you to advance your skills and career on your terms.
Hybrid Flexibility: Enjoy world-class offices at our headquarters in downtown Vancouver, Toronto, and Montreal.
Cutting-Edge Gear: Whether in the office or at home, you’ll be set up for success with top-of-the-line hardware.
Wellness at Work: Our Vancouver office features a fitness facility and outdoor ping-pong tables.
Comprehensive Benefits: We’ve got you covered with an extensive benefits package that includes 100% employer-paid medical and dental coverage, RRSP matching after one year of employment, and even a monthly stipend to help offset the costs of the hybrid experience.
Flexible Time Off: Our unlimited flex-time policy, in addition to all accrued vacation, allows you to take the time you need to recharge and thrive.

Dream jobs don’t knock on your door every day.

ScalePad is not your typical software company. When we hire you, we aren’t just offering you a job, but rather we are committing to investing in both you and your long-term career. You'll help shape how this modern SaaS company operates and make a genuine impact on the future of our people, product, and partners.

We invite all qualified candidates to apply. Please note, you must be eligible to work in Canada to be considered for this role. We thank you for your interest. However, only successful applicants will be contacted.

At ScalePad, we believe in the power of Diversity, Equity, Inclusion, and Belonging (DEIB) to drive innovation, collaboration, and success. We are committed to fostering a workplace where every individual's unique experiences and perspectives are valued, and where employees from all backgrounds can thrive. Our dedication to DEIB is woven into the fabric of our culture, guiding our actions and decisions as we build a stronger and more inclusive future together.

Join us and be part of a team that celebrates differences, embraces fairness, and ensures that everyone has an equal opportunity to contribute and grow. Together, we're creating an environment where diverse voices are not only heard but also amplified, where everyone feels valued, and where we can all achieve our full potential.

Please, no recruiters or phone calls.

Top Skills

AWS

Bash

CI/CD

Docker

Grafana

Java

Kubernetes

Prometheus

Python

Terraform

1021 West Hastings, 3200, Vancouver, BC, Canada, V6E 0C3
