The AI Systems team owns the core recommendations engine and ML platform that power billions of AI-driven marketing decisions daily across some of the world's largest consumer brands. As a Senior Data Engineer, you will own the Spark-based data pipelines and data infrastructure at the heart of this system - building, scaling, and optimizing the data layer that feeds our production ML models. You will work alongside ML engineers and scientists in a collaborative environment, contributing data pipelines and products to power our core recommender systems and our DaVinci Personalization product. This is an opportunity to work end-to-end on large-scale data systems that touch millions of customers, on a team working at the intersection of data engineering and machine learning.
This role reports to the Director of Engineering (AI/ML).
Responsibilities:
- Build, maintain, and optimize production data pipelines that power AI-driven personalization at scale across content selection, send-time optimization, subject line personalization, and frequency capping
- Own and scale Spark-based batch pipelines, including cluster configuration, tuning, and performance optimization on GCP Dataproc
- Build and maintain our ML Data Lake, ensuring data quality, accessibility, and efficient storage
- Support the data needs of ML Engineers and Scientists for model development, training, and evaluation
- Identify and resolve performance bottlenecks and scaling limitations in data pipelines and infrastructure
- Collaborate with distributed systems engineers on the platform's architectural evolution, ensuring data layer continuity throughout
- Continuously improve data infrastructure for greater scalability and reliability
- Release features and data products that deliver measurable and tangible business value
Qualifications:
- 5+ years of data engineering experience
- Deep expertise with Apache Spark, including the PySpark DataFrame API and experience solving challenging scaling problems
- Experience with large-scale data processing, cluster configuration, optimization, and tuning (we use GCP Dataproc)
- Strong software development skills in Python (unit testing, git, code review, CI/CD)
- Experience with data storage formats (we use Parquet, Delta Lake)
- Experience with event streaming data (we use Kafka)
- Experience with cloud computing platforms (we use Google Cloud Platform)
- Experience with advanced query optimization
- Familiarity with software development lifecycle practices, such as continuous integration/continuous delivery and automated deployment (we use Docker, Kubernetes, and GitHub Actions)
- Ability to collaborate with technical partners - you'll be working closely with ML engineers, scientists, and other teams to determine requirements and make design decisions
- Enjoys working in a fast-paced, goal-driven environment
The base pay range for this position is $144K - $188K CAD/year, which can include an additional bonus depending on the position ultimately offered, in addition to a full range of medical, financial, and/or other benefits. The base pay offered may vary depending on job-related knowledge, skills, and experience.
Studies have shown that women, communities of color, and historically underrepresented people are less likely to apply to jobs unless they meet every single qualification. We are committed to building a diverse and inclusive culture where all Inkers can thrive. If you’re excited about the role but don’t meet all of the qualifications listed above, we encourage you to apply. Our differences bring a breadth of knowledge and perspectives that makes us collectively stronger.
We welcome and employ people regardless of race, color, gender identity or expression, religion, genetic information, parental or pregnancy status, national origin, sexual orientation, age, citizenship, ethnicity, family or marital status, physical and mental ability, political affiliation, disability, Veteran status, or other protected characteristics. We are proud to be an equal opportunity employer.

