Guidepoint

Team Lead, Site Reliability Engineering

Posted 3 Days Ago
Toronto, ON
Senior level
As the Team Lead for Site Reliability Engineering, you will oversee the SRE team to enhance the reliability and performance of a SaaS product, ensuring automated monitoring and efficient process optimization while collaborating across departments to manage incidents and architect resilient systems.
The summary above was generated by AI

Overview: 

Guidepoint’s Engineering team thrives on problem-solving and creating happier users. As Guidepoint works to achieve its mission of making individuals, businesses, and the world smarter through personalized knowledge-sharing solutions, the engineering team is taking on challenges to improve our internal application architecture and create new products to optimize the seamless delivery of our services. 

The Site Reliability Engineering Team Lead is responsible for ensuring the reliability, scalability, and performance of a SaaS product running on Azure. The role involves leading a team of SREs to proactively monitor, automate, and optimize system performance while fostering a culture of innovation, continuous improvement, and collaboration with development teams. As the SRE lead, this person will act as the bridge between development and operations, driving best practices in reliability engineering and proactive management of environments through observability. Key areas of focus include maintaining uptime, monitoring performance, resolving incidents, optimizing capacity, managing error budgets, and collaborating with development teams to build resilient and maintainable systems.


This is a hybrid position based in Toronto. 

What You’ll Do:

  • Guide, mentor, and upskill the SRE team, ensuring alignment with organizational priorities
  • Design and implement monitoring strategies to ensure uptime and minimize failures
  • Automate manual processes to improve efficiency and reduce human error
  • Define, manage, and maintain SLOs and SLIs to ensure high availability of systems
  • Manage error budgets and trigger breach actions as per established policies
  • Enhance Datadog automated monitoring and alerting, ensuring critical events are managed through the Status Page
  • Lead incident response alongside engineering leads, support RCA efforts, and drive auto-remediation initiatives
  • Collaborate with Product, Support, Engineering, and Cloud Operations teams to deliver scalable and reliable solutions
  • Actively participate in cost optimization initiatives with Cloud Operations and Engineering
  • Handle escalated customer issues and ensure satisfactory resolution
  • Conduct regular team meetings and training sessions
  • Identify areas for process improvement and implement best practices
  • Provide insights and recommendations to enhance reliability and customer satisfaction

What You Have:

  • 8+ years of experience in software development and Site Reliability Engineering or Production Engineering
  • 3+ years of experience leading an SRE team with expertise in Infrastructure as Code (IaC) using Terraform and Ansible, managing and operating Kubernetes clusters, and implementing monitoring and observability solutions with Datadog
  • Comprehensive understanding of web application security
  • Strong system engineering background with Linux/Windows
  • Proficient in development with Python or Golang
  • Strong understanding of Azure libraries (Client, Management, Asset)
  • In-depth knowledge of web application SaaS platforms and architecture
  • Proficient in SQL and, ideally, other database operations
  • Strong communication skills
  • Expertise in technical writing and documentation
  • Ability to rapidly analyze issues, anticipate consequences, make decisions, and take action
  • Ability to work independently and as part of a team
  • Experience in presenting monthly reports and metrics to managers and stakeholders

What We Offer:

  • Paid Time Off
  • Comprehensive benefits plan
  • Company RRSP Match
  • Development opportunities through the LinkedIn Learning platform

About Guidepoint: 

Guidepoint is a leading research enablement platform designed to advance understanding and empower our clients’ decision-making process. Powered by innovative technology, real-time data, and hard-to-source expertise, we help our clients to turn answers into action.

Backed by a network of nearly 1.5 million experts and Guidepoint’s 1,300 employees worldwide, we inform leading organizations’ research by delivering on-demand intelligence and research on request. With Guidepoint, companies and investors can better navigate the abundance of information available today, making it both more useful and more powerful.

At Guidepoint, our success relies on the diversity of our employees, advisors, and client base, which allows us to create connections that offer a wealth of perspectives. We are committed to upholding policies that contribute to an equitable and welcoming environment for our community, regardless of background, identity, or experience.

Top Skills

Go
Python
