Hiive

Site Reliability Engineer

Posted 3 Days Ago
Hybrid
Vancouver, BC, CAN
Mid level
As a Site Reliability Engineer at Hiive, you will enhance platform reliability and performance, support AI/ML workloads, automate processes, and assist with incident responses.
The summary above was generated by AI

Hiive is redefining how private companies and their shareholders access liquidity. Through its institutional-grade platform, Hiive brings together buyers, sellers, and issuers to facilitate secondary transactions in venture-backed, pre-IPO companies, introducing efficiency, transparency, and standardization to an otherwise opaque asset class.

Recognized as one of Canada’s fastest-growing companies and backed by leading U.S. investors, Hiive is profitable, well-capitalized, and building a high-performance team to meet growing demand and pursue new market opportunities.

Interested in learning more about life at Hiive? Check out our careers page to see how you can grow with us!

As a Site Reliability Engineer at Hiive, you will be responsible for ensuring the reliability, availability, and performance of our platform. You’ll join our small but growing infrastructure team, working closely with the DevOps team and engineering leadership. As a hands-on contributor, you will build scalable and resilient infrastructure, automate processes, and respond to incidents efficiently and effectively.

You will help implement security and compliance measures, act as a trusted resource to your colleagues, and collaborate across teams to continuously improve our platform’s performance and reliability. You’ll also contribute to fostering an excellent, supportive engineering culture.

As Hiive continues to expand its use of AI across the platform, you will play a key role in building and operating the infrastructure that powers these systems. This includes supporting AI/ML workloads, improving observability into model performance and system behavior, and ensuring these services are reliable, scalable, and cost-efficient in production.

In this role, your responsibilities will include:

  • Maintain and improve our platform's uptime and availability

  • Optimize and maintain our infrastructure to improve reliability, performance, and security

  • Proactively identify and resolve scaling and reliability issues before they impact users or business metrics

  • Partner with product engineers to troubleshoot performance issues and implement effective solutions

  • Configure and maintain monitoring, alerting, and observability systems across our stack

  • Assist with incident response, including investigation, mitigation, and postmortems; develop and maintain incident runbooks

  • Participate in an on-call rotation shared across the engineering organization

  • Support and scale infrastructure for AI/ML systems, including model-serving workloads, data pipelines, and batch/async processing

  • Improve observability for AI systems (latency, cost, drift, failures) and help define reliability standards for these workloads

Required Skills:

  • Experience in a Site Reliability Engineering or similar role

  • Experience working with (writing or deploying) Elixir, or a strong desire to learn

  • Experience operating production Kubernetes clusters

  • Proficiency building infrastructure with Terraform

  • Strong experience with AWS (especially EKS, RDS, and VPC) and Vercel

  • Experience working with and optimizing PostgreSQL

  • Experience with Datadog or similar observability tools

Preferred Skills:

  • Experience working in regulated or high-compliance environments

  • Experience with CI/CD systems such as GitHub Actions

  • Experience supporting SOC 2 or similar certifications

  • Experience working with Cloudflare

  • Hands-on development experience in one or more programming languages

  • Experience supporting AI/ML systems in production (e.g., model serving, vector databases, or data pipelines)

Compensation, benefits & perks:

  • Opportunity to participate in ownership of a rapidly growing early-stage startup through our employee stock option plan.

  • Comprehensive 100% employer-paid health and dental premiums, and a health spending account.

  • A dedicated desk in our Vancouver, BC HQ, in the heart of downtown, with a fridge stocked with healthy snacks and drinks, an onsite gym and a gorgeous rooftop amenity.

  • Preference is given to candidates willing to work in our Vancouver, BC HQ, with a first-class view of the mountains. We are open to Canadian or US-based remote candidates.

  • Enjoy a $20 per day commuter benefit for every day you work in our Vancouver HQ.

  • An engaging social calendar, including bi-weekly catered lunches, bi-weekly “Friday bar”, team workouts, an annual summer party and holiday party, two “onsite” all-team retreats each year, semi-annual team-building events, and Hiive Women’s Network events.

  • Significant opportunities for growth into team leadership and management roles.

  • Entrepreneurial culture, and a small and dynamic team.

  • Sponsorship, immigration, and relocation support for exceptional candidates.

Hiive is committed to fostering an inclusive workplace where all individuals have an opportunity to succeed.

HQ

Hiive Vancouver, British Columbia, CAN Office

34 W 8th Ave, Vancouver, British Columbia, Canada, V5Y 1M7

