
This article explores our preparations and strategic initiatives for major sports events, focusing on how we ensure peak performance and customer satisfaction. We discuss the meticulous planning behind our technical operations, highlight key initiatives such as optimising platform performance and decoupling cashout options, and share insights from our teams on managing the high-demand periods of the Spring Racing season, which includes Cheltenham and the Grand National.
Why Event Readiness and Strategic Initiatives Matter
In the dynamic landscape of technical operations, preparing for major events is essential to ensure our systems operate flawlessly under high demand. Months before events like Cheltenham and the Grand National, our technical teams meticulously analyse past metrics to identify areas for improvement and optimise our platform's performance.
One significant initiative was the decoupling of cashout features from the Bet Reporting Flow during peak sports events. This decision aimed to streamline our architecture, reduce infrastructure costs, and enhance cashout success rates. By evaluating the impact of these changes during critical events, we aimed to achieve efficiency and improve customer satisfaction.
Throughout these preparations, our teams work in war rooms, actively monitoring platform performance and swiftly addressing any issues. This proactive approach not only ensures smooth event operations but also highlights our commitment to delivering exceptional technical solutions that meet the rigorous demands of our industry.

Project Overview: Key Facts, Figures, and Milestones
In the rapidly evolving field of data processing, a system that handles large volumes of data must be both efficient and reliable. Our project focused on optimising the processing and comparison of betting information across multiple data streams. It involved 12 Sportsbook (SBK) Platforms teams, 24 engineering managers and tech leads, and 71 engineers actively monitoring.
Preparation began on January 9th with the leadership team, followed by the first load test session on January 19th. Key collaboration dates included meetings with tech leads and service lifecycle managers throughout January. The project targeted two major events, Cheltenham (March 12th to March 15th) and the Grand National (April 13th), which required extensive coordination:
- 12 sync sessions with the leadership team, engineering managers, and tech leads;
- 10 additional sessions involving service lifecycle managers;
- 8 overall load test sessions and 10 flow-specific load test sessions.
During event days, the 12 teams performed over 200 operational tasks early each day, plus operational checks before each race, to keep the data flow running smoothly and efficiently. The following sections explore the core methods that drive our system, detailing their functions and interactions.
Case Study: Optimising Performance During Major Sports Events
The bet information is continuously updated and published across multiple data streams. The bet reporting flow consumes these streams and stores the updated bet information. Previously, the cashout flow had to consult the bet reporting flow to obtain bet information. With this new project, the cashout flow now directly consumes information from the data streams.
This change brought several advantages. Firstly, it completely isolated the bet reporting and cashout flows from each other. Secondly, the load on the bet reporting flow was significantly reduced. Thirdly, the cashout flow became more efficient and faster, gaining the capacity to handle more requests more frequently. This improvement allowed us to increase the success rate of cashouts for our customers.
The methods described in the following sections work in concert so that the cashout flow handles large volumes of data quickly and reliably. Parallel processing keeps throughput high while preserving data integrity, and each method contributes to the robustness of a system that must meet the demands of a dynamic data environment.
General Operation Cycle of the Consumer
With this project, cashout now consumes multiple data streams of bet information. These streams can send thousands of events per second, so it was vital to have a robust, high-performance consumer for them.
The general operation cycle of the consumer is encapsulated in the run method, the consumer's "base" method that contains all of its logic. The method starts the consumer (handleStartup) and then loops: it polls the stream for messages, processes them (converting them and storing them in an internal database), and commits them to signal that they have been processed. When the consumer is stopped or an error occurs, the method handles the shutdown gracefully (handleShutdown). In this way, run orchestrates the consumer's core activities: it sets up the necessary configuration, establishes connections, manages the flow of data through the system, and keeps the consumer operational and responsive to new data.
    public void run() {
        try {
            handleStartup();
            while (true) {
                // Poll the stream, waiting up to 100 ms for new records.
                ConsumerRecords<K, V> consumerRecords = kafkaConsumer.poll(Duration.ofMillis(100));
                if (!consumerRecords.isEmpty()) {
                    processRecords(consumerRecords);
                    processCommit();
                }
                checkHealth();
            }
        } catch (Exception exception) {
            // Any exception ends the loop: shut down, then delegate to the exception handler.
            handleShutdown();
            KafkaConsumerExceptionHandler.handleKafkaConsumerException(exception, kafkaConsumer);
        }
    }
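Because records are processed on worker threads (see the next section), the commit step has to avoid committing past a record that is still in flight. The sketch below shows one way a tracker could compute a safe offset for a single partition. InFlightOffsetsTracker and its methods are hypothetical names, not the actual offsets tracker used by the consumer, and a real consumer would need one such structure per partition. It relies on the Kafka convention that a committed offset is the offset of the next record to consume.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentSkipListSet;

/** Hypothetical single-partition tracker of offsets that are still being processed. */
class InFlightOffsetsTracker {
    private final ConcurrentSkipListSet<Long> inFlight = new ConcurrentSkipListSet<>();
    private long highestSeen = -1L; // only touched by the polling thread

    /** Called by the polling thread before a record is handed to a worker. */
    void track(long offset) {
        inFlight.add(offset);
        highestSeen = Math.max(highestSeen, offset);
    }

    /** Called by a worker thread once the record has been fully processed. */
    void remove(long offset) {
        inFlight.remove(offset);
    }

    /** Called by the polling thread: the offset that is safe to commit, if any. */
    Optional<Long> safeCommitOffset() {
        if (highestSeen < 0) {
            return Optional.empty(); // nothing polled yet
        }
        // Lowest offset still in flight (offsets are never negative), or null if none.
        Long lowestInFlight = inFlight.ceiling(0L);
        // Everything below the lowest in-flight offset is done; if nothing is in
        // flight, everything up to highestSeen is done.
        return Optional.of(lowestInFlight != null ? lowestInFlight : highestSeen + 1);
    }
}
```

With offsets 10, 11 and 12 tracked and only 11 finished, the safe offset stays at 10, so a restart would reprocess from 10 rather than skip an unfinished record.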
Parallel Processing of Records by the Processor
Parallel processing of records is a significant feature of our system, managed by the processRecords method. This method enables the processor to handle multiple records simultaneously, optimising efficiency and performance. By processing records in parallel, we significantly reduce the time required to handle large volumes of data, making our system more scalable and responsive.
    protected void processRecords(ConsumerRecords<K, V> consumerRecords) {
        updateOffsetsTracker(consumerRecords);
        consumerRecords.forEach(consumerRecord ->
            // Route each record to an executor chosen by its key.
            Optional.ofNullable(shardedExecutor.getExecutorByKey(consumerRecord.key()))
                .filter(executorService -> !executorService.isShutdown())
                .ifPresent(executorService -> executorService.submit(() -> {
                    try {
                        eventProcessor.processEvent(consumerRecord);
                        kafkaConsumerOffsetsTracker.remove(consumerRecord.offset());
                    } catch (Exception exception) {
                        // Record the failure instead of throwing inside the worker thread.
                        thrownException.set(exception);
                    }
                })));
    }
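The code above relies on shardedExecutor.getExecutorByKey, whose implementation is not shown. As a minimal sketch of the general idea, assuming each shard is a single-threaded executor, a key-sharded executor could look like the following. KeyShardedExecutor, the shard count and shutdownAndAwait are hypothetical; unlike the original, this version never returns null for a key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sharded executor: a given key always maps to the same single thread. */
class KeyShardedExecutor {
    private final List<ExecutorService> shards = new ArrayList<>();

    KeyShardedExecutor(int shardCount) {
        for (int i = 0; i < shardCount; i++) {
            shards.add(Executors.newSingleThreadExecutor());
        }
    }

    /** Same key -> same shard, so tasks for one key run in submission order. */
    ExecutorService getExecutorByKey(Object key) {
        int hash = (key == null) ? 0 : key.hashCode();
        // floorMod keeps the index non-negative even for negative hash codes.
        return shards.get(Math.floorMod(hash, shards.size()));
    }

    /** Stops accepting tasks and waits for queued ones; returns false on timeout. */
    boolean shutdownAndAwait(long timeoutMillis) {
        shards.forEach(ExecutorService::shutdown);
        try {
            for (ExecutorService shard : shards) {
                if (!shard.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS)) {
                    return false;
                }
            }
            return true;
        } catch (InterruptedException interrupted) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The design choice here is that records with different keys can be processed in parallel, while updates to the same key (for example, the same bet) are still applied one after another.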
Converting and Persisting Bets to the Database
When it comes to data persistence, particularly the conversion and storage of betting information, the persistBet method is employed. This method is responsible for converting bet data into a suitable format and then persisting it into our database. Ensuring the accuracy and integrity of this data is critical, as it underpins many of our analytical and operational processes.
    private void persistBet(ConsumerRecord<String, BetDomainEventOuterClass.BetDomainEvent> consumerRecord) {
        try {
            Bet convertedBet = convertBet(consumerRecord.value());
            // Persist with retries, blocking until the write completes.
            retryableOperationHandler.handleOperationWithRetries(() ->
                dataStoreClient.persistBet(
                    consumerRecord.value().getBet().getBetId(),
                    convertedBet,
                    getStreamCreationTime(consumerRecord.value()).orElseThrow()
                ).join());
        } catch (WakeupException | MaxRetriesExceededException exception) {
            // Let these propagate to the caller unchanged.
            throw exception;
        } catch (Exception exception) {
            handleException(exception, consumerRecord, PERSISTING_ACTION);
        }
    }
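The retry handler used above is not shown in this article. The following is a minimal sketch of how such a handler could work, assuming a fixed number of retries with a linearly growing pause between attempts. SimpleRetryHandler and RetriesExhaustedException are hypothetical stand-ins for retryableOperationHandler and MaxRetriesExceededException, and the retry policy itself is an assumption.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for the MaxRetriesExceededException seen above. */
class RetriesExhaustedException extends RuntimeException {
    RetriesExhaustedException(String message, Throwable cause) {
        super(message, cause);
    }
}

/** Hypothetical retry helper: one initial attempt plus up to maxRetries retries. */
class SimpleRetryHandler {
    private final int maxRetries;
    private final long backoffMillis;

    SimpleRetryHandler(int maxRetries, long backoffMillis) {
        this.maxRetries = maxRetries;
        this.backoffMillis = backoffMillis;
    }

    <T> T handleOperationWithRetries(Supplier<T> operation) {
        RuntimeException lastFailure = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException failure) {
                lastFailure = failure;
                if (attempt < maxRetries) {
                    pause(backoffMillis * (attempt + 1)); // wait longer after each failure
                }
            }
        }
        throw new RetriesExhaustedException(
            "Operation failed after " + maxRetries + " retries", lastFailure);
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException interrupted) {
            // Restore the interrupt flag and stop retrying.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting to retry", interrupted);
        }
    }
}
```

Throwing a dedicated exception once the retries run out lets the caller tell "the store is unavailable" apart from a failure tied to a single record, which matches the way persistBet rethrows that case instead of handling it per record.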
Bets Comparison Between Two Streams
The cashout process used to obtain bet information from the bet reporting flow, but it now connects directly to the streams that send the bet information. To ensure we didn't introduce any bugs, we needed a mechanism to guarantee that a given bet, after being transformed into the internal cashout format, was the same whether it was consumed directly from the bet stream or obtained from the bet reporting flow.
The comparison is handled by the compareGetBets method. It fetches the same bets from the other source, compares them with the bets already obtained, and logs any discrepancies so that we can address them promptly.
Since our logs did not record any differences, we concluded that there were no errors in consuming and converting the information from the bet streams. This gave us the confidence needed to move this project into production.
    public void compareGetBets(Set<String> betIds, List<Bet> betList) {
        executorService.submit(() -> {
            if (useDatastoreToggle.isEnabled()) {
                // Toggle on: fetch the same bets from the bet reporting client and compare.
                List<Bet> betReportingBets = betReportClientDelegate.getBets(betIds).join();
                compareBets(betReportingBets, betList);
            } else {
                // Toggle off: fetch the same bets from the bet datastore and compare.
                betDatastoreDelegate.getBets(betIds)
                    .thenAccept(betDatastoreBets -> compareBets(betList, betDatastoreBets));
            }
        });
    }
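The compareBets call itself is not shown. A minimal sketch of such a comparison, assuming bets can be matched by identifier and compared with equals, might look like this. BetComparator and findDiscrepancies are hypothetical names, and the sketch returns the discrepancies as strings where the real method logs them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical comparison of the same bets obtained from two sources. */
class BetComparator {
    static <T> List<String> findDiscrepancies(List<T> first, List<T> second, Function<T, String> idOf) {
        // Index the second source by bet id for direct lookup.
        Map<String, T> secondById = new LinkedHashMap<>();
        for (T bet : second) {
            secondById.put(idOf.apply(bet), bet);
        }
        List<String> discrepancies = new ArrayList<>();
        for (T bet : first) {
            String id = idOf.apply(bet);
            T other = secondById.remove(id);
            if (other == null) {
                discrepancies.add("only in first source: " + id);
            } else if (!bet.equals(other)) {
                discrepancies.add("mismatch for " + id + ": " + bet + " vs " + other);
            }
        }
        // Whatever is left was never matched by a bet from the first source.
        for (String id : secondById.keySet()) {
            discrepancies.add("only in second source: " + id);
        }
        return discrepancies;
    }
}
```

An empty result corresponds to the outcome described above, where no differences were logged between the two sources.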
Cashout Success Rate Comparison
The figures below compare cashout success rates in 2023 and 2024 during Cheltenham and the Grand National, and show the measurable impact of our preparations and operational enhancements on customer satisfaction and performance.
In 2023, during Cheltenham, Betfair (BF) had a cashout success rate of 54%, while Paddy Power (PP) led with 68%. Similarly, during the Grand National, BF achieved a 59% success rate, while PP reached 73%. In 2024, both brands improved significantly. At Cheltenham, BF rose to 79%, an increase of 25 percentage points, while PP saw a more moderate rise to 77%, up 9 percentage points. At the Grand National, BF achieved a 78% success rate, up 19 percentage points, and PP reached 91%, up 18 percentage points. These statistics underscore the tangible impact of our decision to decouple cashout features, resulting in enhanced customer satisfaction and operational efficiency during peak events.
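The year-on-year changes quoted above are differences between two success rates, that is, percentage points. The short sketch below, with the hypothetical class name SuccessRateChange, makes the arithmetic explicit and also derives the relative improvement from the same published figures. For example, BF at Cheltenham moved from 54% to 79%, which is 25 percentage points or roughly a 46% relative improvement.

```java
/** Hypothetical helper for comparing success rates expressed in percent. */
class SuccessRateChange {
    /** Absolute change in percentage points, e.g. 54 -> 79 gives 25. */
    static double percentagePoints(double before, double after) {
        return after - before;
    }

    /** Relative change in percent of the starting rate. */
    static double relativePercent(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        // Figures taken from the paragraph above: {2023 rate, 2024 rate}.
        String[] labels = {"Cheltenham BF", "Cheltenham PP", "Grand National BF", "Grand National PP"};
        double[][] rates = {{54, 79}, {68, 77}, {59, 78}, {73, 91}};
        for (int i = 0; i < labels.length; i++) {
            System.out.printf("%s: +%.0f pp, +%.1f%% relative%n",
                labels[i],
                percentagePoints(rates[i][0], rates[i][1]),
                relativePercent(rates[i][0], rates[i][1]));
        }
    }
}
```

By the same calculation, the relative improvements for the other three cases are approximately 13.2% (PP, Cheltenham), 32.2% (BF, Grand National) and 24.7% (PP, Grand National).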

Team Contributions and Experiences
To bring our project to life and prepare for important events with significant customer flow, the collective expertise and dedication of our team members have been invaluable. Below are some testimonials and experiences from colleagues who have been instrumental in the success of this project.
Rui Santos, Associate Software Engineer:
"Spring Racing represents a significant challenge for all UK&I teams, due to the considerable increase in traffic on our services before, during and after the races. To ensure the quality of the services, it was imperative to identify possible risks and perform multiple performance tests. These performance tests were supported by specific targets, which needed to be aligned with other teams. To facilitate these processes, regular working group meetings were organised to discuss the tests and the associated risks. During the Cheltenham/Grand National days, in addition to the spectacular surroundings and collaboration in the Blip office, it was essential to implement active monitoring to detect possible failures in our services, particularly regarding performance (response times to our clients) and lag (on our streams). The main objectives were achieved, namely the absence of P1 or P2 incidents in our team's services, and the absence of reported problems. For the next few years, it's important to continue working to make Spring Racing smoother and improve certain aspects, such as automating some performance tests."
Filipe Lemos, Software Engineer:
"I work with Rich Content and Live Data, which makes the Spring Racing season always a very challenging event. But this year was special: a new brand, SBG (Sky Betting & Gaming), had started using our services. This meant more load and more eyes, so we had to be on our A-game. Additionally, we were handing over our services to a new team, which had never seen or touched this flow. Pressure was high, but that is also where we as a company thrive. Prior to the event, we ran weekly load tests to confirm that our services performed well under the load expected for those days. During the event, we had all our focus on the races. We had to make sure that every runner had their silks, tips or other relevant datapoints. Every day a different team member was responsible for the sanity checks and for keeping an eye on the production channels for any possible issue. Thankfully, everything went well, and we are ready for the next big event - Euro 2024."
Narciso Caldas, Software Engineer:
"I'm a software engineer at Blip, working in the Bet Building and Placement flow for the Betfair and Paddy Power brands. This flow is responsible for providing betting opportunities and allowing customers to place them. I was the tech lead responsible for the services of this flow during the Spring Racing season (Cheltenham and Grand National). From the beginning, the main objective for this season was to have the smoothest possible events, without any major incidents and with 0% downtime in the team's services, providing the best customer experience possible. To achieve it, we needed to ensure that the following topics were aligned with the working group and completed before the first event:
- Have realistic targets defined for the services;
- Service capacity assessed and capable of handling the targets;
- Risks and action plans defined to act in case of issues;
- Alignment with the teams of dependent services about the previous topics.
During the Cheltenham and Grand National events, the office has a different atmosphere since people from all the teams go there to monitor the services. There is a lot of communication and exchange of information between the teams to help each other and have fun. During racing days, the monitoring activities consisted of looking at and analysing metrics and logs from the services to act rapidly and minimise the impact on the customers if we found any issues. Thankfully, we had a smooth Spring Racing season without any major issues, which paid off all the effort we put into the preparation and monitoring and was our objective for the festival. Having a clear plan of action on what to do daily and if something goes wrong and good communication with all the working group was essential to achieve a smooth festival and great experience from our services."

Conclusion
Our strategic initiatives and meticulous preparations have proven instrumental in ensuring peak performance and customer satisfaction during major sports events like Cheltenham and the Grand National. By optimising platform performance and decoupling cashout options, we streamlined our operations, reduced costs, and significantly enhanced our cashout success rates. Through collaborative efforts across our technical teams, we maintained flawless system operations, swiftly addressing challenges in real-time, and ultimately delivering exceptional technical solutions that meet the rigorous demands of our industry. Looking ahead, we remain committed to refining our processes and leveraging our learnings to further elevate the experience for our customers in future events.
We acknowledge the expertise and input of João Marques (Associate Engineering Manager), Daniel Silva (Associate Engineering Manager), João Basto (Software Engineer), and the other colleagues mentioned in the testimonials throughout the development of this article.



