Site Reliability Engineer – Team Lead


This is a full-time position in Hamilton, ON, posted March 1, 2021.

About Q4
Q4’s objectives are simple. Hire smart, diverse and capable people to build the best platforms possible and provide exceptional client experiences. 
For the past 14 years, we’ve been revolutionizing the Investor Relations space, connecting over 2,300 public companies, including Nike, Amazon, Shopify and Apple, with investors through our cloud-based, full-stack IR solutions.
If you’re looking for a career opportunity with a fast-growing tech company and have a ‘get it done’ attitude, then we want to hear from you.
The gig.
The Cloud Site Reliability Engineering Team Lead is an extension of our Infrastructure team, where you will be a valued member helping to identify, quantify and ultimately overcome diverse technical challenges. You will lead the Site Reliability Engineering team, providing both technical leadership and capable execution of assigned projects. You will actively promote and implement service efficiency, reliability and scalability, and develop automation so that repetitive tasks can be managed at scale.

Key Responsibilities:

  • Provide technical and thought leadership across all levels of Infrastructure services. Mentor other team members and assign tasks so that infrastructure projects are completed on time.
  • Coach teams on process and practices. Assist the director with roadmap planning and development goals. Anticipate and manage technology evolution and map it to business solutions.
  • Develop and improve application and infrastructure monitoring and alerting standards.
  • Apply strong programming skills to automation, to the design of new and existing systems, and to building resilience into our systems, using software so the team does not have to fix the same problems repeatedly.
  • Write and automate reliability-oriented software.
  • Improve service reliability through root cause analysis, blameless postmortems, and code that prevents or responds to problem recurrence.
  • Lead on-call escalations and outage recovery efforts, including investigation and support of AWS cloud infrastructure.
  • Optimize the application monitoring tool stack, including fine-tuning actionable alerts, improving processes, and reviewing and acting on major incidents to reduce recovery time.
  • Guide teams in SRE best practices, including improving scalability, performance, reliability and speed to market.
  • Design and develop high-availability and business-continuity solutions using self-healing architecture, fail-over routing policies, auto scaling and other disaster recovery models.
  • Set the technical vision and innovate to stay at the forefront of self-healing SaaS services.

Qualifications:

  • 3+ years of experience in software development and an understanding of the SRE operational model
  • Strong experience designing complex SaaS applications for cloud reliability and scalability
  • Strong knowledge of AWS CloudFormation, AWS Lambda, AWS Systems Manager and Docker
  • Strong knowledge of AWS Security Hub, GuardDuty, AWS Shield and AWS firewall services
  • Strong knowledge of AWS Config, AWS Trusted Advisor, the AWS Well-Architected Tool, AWS IAM, Active Directory (AD) and AWS Secrets Manager
  • Experience with GitHub, Jira, TeamCity and Jenkins
  • Experience with open-source databases (MySQL, Postgres, Redis and others)
  • General knowledge of relational databases (e.g. Microsoft SQL Server) and non-relational (NoSQL) databases (e.g. DynamoDB)
  • Expert-level knowledge of Infrastructure as Code (IaC) practices and one or more related DSLs, such as Terraform, Ansible or Chef
  • Ability to write code that solves problems and automates repetitive tasks in a common scripting or programming language, such as C, C++, Perl, PowerShell, Python, Java, Ruby or shell (ksh, bash), and to work with APIs (REST, SOAP)
  • Expert in operational monitoring and management tools (Nagios, New Relic, ELK, etc.)
  • General knowledge of storage technologies (e.g. SAN, NFS, iSCSI)
  • General knowledge of networking technologies (e.g. TCP, load balancing, routing, switching, firewalls)
  • General knowledge of designing, troubleshooting and tuning web application environments
  • General knowledge of application and OS performance tuning in Windows and Linux environments
  • Knowledge of clustering technologies and configurations
  • Experience with sensitive data environments (e.g. SOC 2)
Why Q4?
We are motivated by solving complex problems in unorthodox ways. Our emphasis on your well-being helps you reach your true potential. We offer a variety of benefits to ensure you can always work hard and have fun:
– 360 Support.  Leverage our lifestyle benefit and employee assistance program to spruce up your workspace, invest in personal wellness or simply spoil yourself!  
– Unlimited paid time off and flexible working hours. Rest is important. Enough said.   
– Flexible working environment.  Choose your home, one of our trendy offices or mix it up. 
– Generous health and lifestyle benefits.  You are in charge of your benefit dollars.  
– Virtual team building and socials. Keeping people connected is important.  
– Invest in your development. We’ll help you with your tuition.
Join #Q4orce
Q4’s diverse culture fosters a friendly, open-minded workplace. As a member of a dynamic, high-performing team, each Q4 employee is hungry to learn, valued for their contribution, and approaches each day excited to make an impact. With so many reasons to work here, take advantage of them and submit your application to join our growing team.
Q4 values diversity and people of all backgrounds and abilities. Should you require any accommodations before or during the interview process, please let us know.