Boson AI

Network Engineer, AI/ML Infrastructure

Posted 10 Days Ago
In-Office
Toronto, ON
Mid level
Design, build, and optimize networking infrastructure for AI/ML operations. Troubleshoot issues, monitor performance, and develop automation while collaborating with teams to ensure capacity and evaluate technologies.
The summary above was generated by AI
About The Role

We're seeking an experienced Network Engineer to design, build, and optimize the high-performance networking infrastructure powering our AI/ML operations in Toronto. You'll work at the cutting edge of network technology—managing InfiniBand and ultra-high-speed Ethernet fabrics that connect NVIDIA H100 and A100 GPUs, over 20PB of Ceph storage, and hundreds of servers.

You'll be hands-on with the full lifecycle of our network infrastructure: planning, building, testing, deploying, and keeping everything running at peak performance. That means troubleshooting issues as they arise, monitoring network performance and throughput, developing automation to streamline operations, and working closely with HPC and ML teams to ensure they have the bandwidth they need. You'll also help us plan for future capacity and evaluate emerging network technologies as we scale to meet increasingly demanding workloads.

Responsibilities

  • Configure and maintain InfiniBand and high-speed Ethernet fabrics
  • Optimize network performance for RDMA and GPU-to-GPU communication
  • Manage network switches (Mellanox, NVIDIA, Micas Networks)
  • Troubleshoot network bottlenecks and latency issues
  • Plan and execute network upgrades and expansions
  • Implement network security (firewalls, VLANs, ACLs)
  • Collaborate on storage network optimization
  • Monitor infrastructure

Minimum Qualifications

  • 4+ years of network engineering experience in production environments
  • Strong understanding of L2/L3 networking protocols (TCP/IP, BGP, OSPF, VLANs)
  • Hands-on experience with high-speed networking (100Gb+ Ethernet and InfiniBand)
  • Hands-on experience with network security (firewalls, ACLs, network segmentation)
  • Knowledge of HPC network topologies
  • Experience with InfiniBand fabrics and related technologies (RDMA, RoCE, IPoIB)
  • Strong troubleshooting and problem-solving skills

Preferred Qualifications

  • Experience in data center environments or AI/ML infrastructure
  • Hands-on experience with high-performance Ethernet switches (e.g., Broadcom Tomahawk) and the latest InfiniBand switches (e.g., NVIDIA/Mellanox)
  • Experience optimizing networks for GPU-to-GPU communication
  • Experience with open-source firewall solutions (OPNsense, pfSense, or similar)
  • Experience with network automation tools
  • Understanding of distributed storage networking (Ceph cluster networks)
  • Familiarity with network monitoring and observability tools (Prometheus, Grafana)
  • Knowledge of multi-site network connectivity and WAN optimization
  • Familiarity with cloud networking on at least one platform (AWS, GCP, or Azure), including VPC design, site-to-site VPN configuration, dedicated interconnects (Direct Connect, ExpressRoute, or Cloud Interconnect), hybrid cloud connectivity, and cloud-to-datacenter network integration

If you're a natural problem-solver with a passion for continuous learning, we'd love to hear from you.

Top Skills

A100
AWS
Azure
BGP
Ceph
Ethernet
GCP
Grafana
InfiniBand
Mellanox Switches
Network Automation Tools
NVIDIA H100
OSPF
Prometheus
TCP/IP
VLANs

