Clarifai

Senior Site Reliability Engineer

Reposted 18 Days Ago

Canada

Senior level

As a Senior Site Reliability Engineer, you will ensure the high availability of Clarifai's core services, monitor system performance, optimize reliability, develop resources for Kubernetes, and design scalable infrastructure solutions. You will collaborate with teams to resolve engineering challenges in both cloud and on-premise environments.

The summary above was generated by AI

About the Company

Clarifai is a leading full-lifecycle deep-learning AI platform for computer vision, natural language processing, LLMs, and audio recognition. We help organizations transform unstructured image, video, text, and audio data into structured data significantly faster and more accurately than humans could on their own. Founded in 2013 by Matt Zeiler, Ph.D., Clarifai has been a market leader in AI since winning the top five places in image classification at the 2013 ImageNet Challenge. Clarifai continues to grow, with remote employees based throughout the United States, Canada, Argentina, India, and Estonia.

We have raised $100M in funding to date, with $60M coming from our most recent Series C, and are backed by industry leaders like Menlo Ventures, Union Square Ventures, Lux Capital, New Enterprise Associates, LDV Capital, Corazon Capital, Google Ventures, NVIDIA, Qualcomm and Osage.

Clarifai is proud to be an equal opportunity workplace dedicated to pursuing, hiring, and retaining a diverse workforce.

Your Impact

Clarifai’s platform is a Kubernetes-native distributed system that requires the orchestration of many components. Efficiently serving and training large neural networks presents unique design and infrastructure challenges.

You will be critical to solving these challenges in both cloud and on-premise environments. Additionally, you will be responsible for our broader cloud infrastructure and our development tools and environments.

The Opportunity

Ensure the smooth operation and high availability of Clarifai's core services
Monitor system performance, identify bottlenecks, and implement optimizations to enhance reliability and efficiency
Develop Kubernetes resources and custom tooling for seamless cloud and on-premise deployments
Design and implement scalable, secure, and cost-effective infrastructure solutions
Partner with teams across the organization to identify and solve engineering challenges

Requirements

BS/BA in Computer Science or related degree
Good knowledge of cloud providers (AWS, GCP or similar)
Expertise with Kubernetes (EKS, GKE, self-hosted) and Infrastructure as Code using Terraform and Helm
Solid understanding of web and networking (HTTP, TLS, DNS, certificates, etc.)
Experience with CI/CD pipelines using tools such as GitHub Actions, ArgoCD, and Atlantis
Strong interpersonal skills working with teams across different time zones and regions

Great to Have

Knowledge of basic microservice architecture principles
Familiarity with security best practices for cloud-based systems
Experience with relational databases, message queues, and key-value stores
Experience writing Python, Go, or any other popular programming language
Familiarity with any RPC framework
Experience developing and building custom Kubernetes operators

Top Skills

ArgoCD

Atlantis

AWS

GCP

GitHub Actions

Helm

Key-Value Stores

Kubernetes

Message Queues

Microservice Architecture

Python

Relational Databases

Terraform
