Educational requirements: Bachelor's degree
English requirements: Competent English
Skilled employment experience required: 3-5 years
Required residence status: Temporary visa, Permanent resident, Citizen
Remote work: Not accepted
What you should know about our Engineering team:
We make room to do things the right way
You own your projects: you build it, you ship it, you run it
We were born in the cloud: practices and principles such as CI/CD, IaC, O11y and CyberSec are part of everything we do
When it comes to Learning and Development:
Learning is part of our fabric. We have a world-class Engineering Learning and Development program and we are passionate about career development.
Thought leaders regularly give tailored talks and workshops for our team
We are active in the community, attending many local and international conferences, and when COVID hit we held our own day-long conference, Techtonic
We have both a technical and management track for career progression and promotions
We have a library of books and videos with good content and encourage staying on top of new practices.
In terms of our tech stack, we're container-based, running on ECS Fargate and supported by Amazon Aurora, with CloudFront and S3 serving our front-ends. Our infrastructure is provisioned with Terraform, and we use a mix of CircleCI, GitHub Actions and Terraform Workspaces for our CI/CD.
Key responsibilities will include:
Investigating production incidents and conducting blameless postmortems to identify areas of improvement
Providing cool and calm guidance as an Incident Commander during major incidents
Providing guidance and hands-on support in building VGW’s cloud infrastructure
Using your unique voice and technical skills to drive improvements in processes and policies with a focus on reliability and stability
Working collaboratively with SRE teams to foster a culture of continuous improvement and drive the maturity of the SRE discipline forward
Identifying new approaches and solutions to eliminate toil in engineering teams
Participating in project kick-off meetings, code reviews and post-incident review meetings
What you will bring to the role:
You have experience working as a DevOps Engineer, Systems Administrator, SRE or in a related field, and are ready to take the next step
You have strong knowledge of Google's SRE practices
You have experience working with Infrastructure as Code tools
You have experience in and knowledge of CI/CD tools and techniques
You have cloud experience (AWS, Azure or GCP)
You have experience in, or a desire to learn, OpenTelemetry
Nice to have:
Experience working with a microservice architecture
Understanding of Unix/Linux operating systems
Knowledge of networking protocols such as TCP, HTTP/2 and WebSockets
Knowledge of Service Level Objectives, Service Level Indicators and Error Budgets