Senior Site Reliability Engineer

Blip is a cutting-edge Portuguese IT company, focused on software engineering solutions for sports betting and gaming. As part of the Flutter Entertainment group, we are an essential piece of the business, delivering safe and entertaining websites, mobile apps, and retail systems for over 7.6 million monthly customers around the globe.

We bet on people first. That’s why employer branding and flexible practices are cornerstones of our working culture. And our working culture is more than job benefits: it empowers you to come as you are and find the perfect balance between your life and your working challenges. We focus on autonomy, diversity, lifelong learning, and work-life balance.
The Role

We are seeking a motivated and experienced senior engineer to join our dynamic organisation. As a Senior Site Reliability Engineer in our UK&I division, you will be responsible for overseeing a group of employees, providing direction and support to ensure goals are met and operations run smoothly. If you have a strong background in team management and are ready to take on a new challenge, we want to hear from you. Come be a part of our team and make a positive impact on our organisation’s success.
 
What will you be doing?
  • Engage in and improve the whole lifecycle of services, from design through deployment and operation to refinement.
  • Take an active part in investigating, identifying, and (where necessary) resolving the root causes of production problems.
  • Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews.
  • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
  • Take an active part in performance and capacity testing.
  • Optimize reliability monitoring and alerting.
  • Scale systems sustainably through mechanisms like automation; evolve systems by pushing for changes that improve reliability and velocity.
  • Iteratively audit performance and reliability vulnerabilities.
  • Define and revise Service Level Indicators (SLIs).
  • Practice sustainable incident response and blameless postmortems.
What we are looking for:
  • Deep familiarity with building and troubleshooting release and build pipelines (e.g. Jenkins, Buildkite, GitHub Actions)
  • Experience implementing creative approaches to monitoring distributed systems while leveraging industry best practices (e.g. instrumenting a tagging taxonomy across disparate systems)
  • Experience building, managing, and deploying applications built on containerized microservices in a distributed infrastructure (e.g. AWS, GCP, self-hosted cloud)
  • Experience leveraging new technologies when they best serve a business need
  • Comprehensive understanding of incident management best practices
  • Opinionated and knowledgeable approach to implementing industry best practices
  • Demonstrated experience developing teams, encouraging growth, serving as a technical mentor and leader
  • Shows strength and comprehension in at least one programming language (e.g. Java, Python, Scala, Kotlin)
  • Experience making large directional technical decisions (e.g. deciding which technology or pattern to create or leverage)
  • Experience being “on-call” for a service, and familiarity with incident notification tooling (e.g. PagerDuty, Opsgenie)
  • Comprehensive understanding of SRE principles (e.g. working knowledge of the Google SRE book)
  • Demonstrated strength in leading a project in an agile/Scrum environment
  • Thrives in a diverse work environment
Excel in at least one area below:
  • Experience managing complex telemetry solutions which directly contributed to overall reliability
  • Designed greenfield solutions leveraging Configuration Management/Infrastructure as Code tools (e.g. Chef, Puppet, Terraform)
  • Created automated tooling that contributed to multiple teams' velocity
  • Demonstrated experience with project management best practices
  • Shows the ability to break down large technical concepts into effective communication with stakeholders from across the organization
  • Extensive knowledge of networking best practices, tools, and observability
  • Experience developing and deploying automated service configuration at the edge (e.g. CDN configuration, certificate renewal)
  • Experience consulting with teams and advising on their technology, workflows, dev tooling, monitoring, and alerting best practices
  • Identified the need for and led development of automation that significantly reduced toil (e.g. deployment pipelines, distributed dev environments)
  • Built and maintained a system and culture that supported and implemented SLOs
  • Has proven to be a thought leader contributing to the broader industry conversation about SRE principles and topics (e.g. speaking at conferences)

This is what you should have. What do we have, you ask? Well... you can check our amazing perks & benefits right here! So... are you in?
