Realtime Store Visit Predictions at Scale -- Luca Giovagnoli (Yelp)
This talk aims to inspire attendees with a multidisciplinary Flink application that brings several fields together. You will hear about geospatial clustering algorithms, a gradient boosting ML model, and cutting-edge stream-processing technology, all in the same talk! And, in case you are wondering, you can incorporate all of this into your service-oriented architecture (SOA) using Async I/O!
After introducing our product use-case (real-time notifications for nearby local businesses), we’ll dive into the big data challenges. The talk describes a Visit Detection algorithm we built to cluster raw GPS pings into Visits, using Flink state management and custom processing constructs (custom Windows, Triggers and Evictors). Finally, we will discuss a real-time machine learning model that predicts the correct nearby business, leveraging Flink’s Async I/O at scale.
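To make the idea of clustering raw GPS pings into Visits concrete, here is a minimal sketch in plain Python. It is not the algorithm presented in the talk and does not use the Flink API: it is a simple stay-point heuristic, and the names `Ping`, `Visit`, `detect_visits`, and the thresholds `radius_m` and `min_dwell_s` are all assumptions made for illustration. In a Flink job, the per-user cluster would presumably live in keyed state, with custom windows, triggers and evictors deciding when a Visit is emitted.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Ping:
    ts: float   # seconds since epoch
    lat: float  # degrees
    lon: float  # degrees


@dataclass(frozen=True)
class Visit:
    start: float
    end: float
    lat: float  # mean latitude of the clustered pings
    lon: float  # mean longitude of the clustered pings


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two (lat, lon) points."""
    earth_radius_m = 6_371_000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def detect_visits(
    pings: Iterable[Ping],
    radius_m: float = 50.0,      # hypothetical threshold
    min_dwell_s: float = 300.0,  # hypothetical threshold
) -> List[Visit]:
    """Group time-ordered pings that stay within radius_m of the first ping
    of the current cluster; keep clusters lasting at least min_dwell_s."""
    visits: List[Visit] = []
    cluster: List[Ping] = []

    def flush() -> None:
        # Emit the current cluster as a Visit only if the dwell time is long enough.
        if cluster and cluster[-1].ts - cluster[0].ts >= min_dwell_s:
            visits.append(
                Visit(
                    start=cluster[0].ts,
                    end=cluster[-1].ts,
                    lat=sum(p.lat for p in cluster) / len(cluster),
                    lon=sum(p.lon for p in cluster) / len(cluster),
                )
            )

    for ping in sorted(pings, key=lambda p: p.ts):
        anchor = cluster[0] if cluster else None
        if anchor and haversine_m(anchor.lat, anchor.lon, ping.lat, ping.lon) > radius_m:
            # The user moved away: close the current cluster and start a new one.
            flush()
            cluster.clear()
        cluster.append(ping)

    flush()  # close the trailing cluster
    return visits
```

Each detected Visit could then be handed to a downstream step, such as the model that picks the most likely nearby business. Anchoring the cluster on its first ping keeps the sketch short; a production version would likely need to handle GPS noise, out-of-order pings, and gaps in the signal.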
Flink enabled us to scale complex algorithms to thousands of operations per second and to power hundreds of thousands of daily push notifications. It proved to be a clearly superior alternative: its performance brought Yelp substantial cost savings and allowed us to move away from Python alternatives that scaled poorly.