The Site Reliability Engineer will enhance cloud services by overseeing caching infrastructure and automation, ensuring high availability and performance. The role involves monitoring, debugging, and improving code while scaling distributed software in production environments. Responsibilities include communication across technical levels and implementing best practices in service reliability.
We are looking for an engineer who is passionate about scaling cloud services to join our growing SRE team. The SRE team owns the caching infrastructure, tooling, and automation that support Atlassian's suite of Cloud products.
We'd love it if you had an understanding of modern cloud infrastructure, programming expertise, operational experience and a desire to change the status quo. We're looking for an engineer who can analyze and help improve our services and processes to get us to an even higher level of reliability, performance, scalability, and cost efficiency.
On your first day, we'll expect you to have:
- 1+ years of experience operating high-availability, fault-tolerant, scalable, distributed software in production: building monitoring into your code, tweaking dashboards, defining alerts, writing runbooks, etc.
- 1+ years of hands-on experience with public cloud offerings (AWS components like EC2, CloudFormation, RDS / Aurora, Caches, SQS - or equivalents, e.g. in GCP / Azure).
- Familiarity with Unix / Linux operating systems.
- A strong drive to debug, improve code, and automate routine tasks.
- Backend engineering experience in one or more prominent languages such as Java, Go or Python.
- Strong written and verbal communication skills, and an ability to explain complex technical issues to a range of technical and non-technical audiences (management, peers, clients).
It would be great, but not mandatory, if you had:
- Experience implementing caching solutions, strategies, and best practices.
- Experience in microservice architecture.
- Experience building web-services and clients using REST/GraphQL.