Real-Time Exactly-Once Event Processing with Apache Flink, Kafka, and Pinot // Jacob Tsafatinos // MLOps Coffee Sessions #97

MLOps Coffee Sessions #97 with Jacob Tsafatinos, Real-Time Exactly-Once Event Processing with Apache Flink, Kafka, and Pinot, co-hosted by Mihail Eric.

// Abstract
A few years ago, Uber set out to create an ads platform for the Uber Eats app that relied heavily on three pillars: Speed, Reliability, and Accuracy. One of the technical challenges they faced was achieving exactly-once semantics in real time. To accomplish this goal, they designed the architecture shown in the diagram above, with lots of love from Flink, Kafka, Hive, and Pinot. You can dig into the whole paper (https://go.mlops.community/k8gzZd) to see all the reasoning behind their design decisions.

// Bio
Jacob Tsafatinos is a Staff Software Engineer at Elemy. He led the efforts of the Ad Events Processing system at Uber and has previously worked on a range of problems, including data ingestion for search and machine learning recommendation pipelines. In his spare time, he can be found playing lead guitar in his band Good Kid.

// MLOps Jobs board
https://mlops.pallet.xyz/jobs

// Related Links
Uber blog
https://eng.uber.com/author/jacob-tsafatinos/
https://eng.uber.com/real-time-exactly-once-ad-event-processing/

--------------- ✌️Connect With Us ✌️ -------------
Join our slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/

Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Mihail on LinkedIn: https://www.linkedin.com/in/mihaileric/
Connect with Jacob on LinkedIn: https://www.linkedin.com/in/jacobtsaf/

Timestamps:
[00:00] Introduction to Jacob Tsafatinos
[00:40] Takeaways
[04:25] Jacob's band
[05:29] Lyrics about software engineers or artistic stuff
[06:20] Connection of hobby and real-time system
[08:43] How to game Spotify Algorithm?
[10:00] Data stack for analytics
[13:28] Uber blog
[16:28] Video mess up
[17:04] Considerations and importance of the Uber System
[21:22] Challenges encountered through the Uber System journey
[26:06] Crucial to building the system
[28:13] Not exactly real-time
[30:22] Design decisions main questions
[34:23] Testament to OSS
[36:58] Real-time processing systems for analytical use cases vs Real-time processing systems for predictive use cases
[38:46] Real-time systems necessity
[41:04] Potential that opens up new doors
[41:40] Runaway or learn it?
[46:09] Real-time use case target
[49:31] Resource constrained
[50:48] ML Oops stories
[52:45] Wrap up

About the Podcast

Weekly talks and fireside chats about everything that has to do with the new space emerging around DevOps for Machine Learning, aka MLOps, aka Machine Learning Operations.