Deploying ONNX models on Flink - Isaac Mckillen-Godfried (AI Stream)
The Open Neural Network Exchange (ONNX) format is a popular way to export models from a variety of frameworks. It supports the more popular frameworks such as PyTorch and MXNet, but also lesser-known ones like Chainer and PaddlePaddle. To this point there have been few attempts to integrate deep learning models into the Flink ecosystem, and those that exist have focused entirely on TensorFlow models. However, the number of deep learning models written in PyTorch continues to grow, and many companies prefer other frameworks. This talk will focus on different strategies for using ONNX models in Flink applications for real-time inference. Specifically, it will compare calling an external microservice with AsyncIO, Java Embedded Python, and Lantern (a new backend for deep learning in Scala). The talk will weigh these approaches, examining which setups run faster in practice and which are easier to set up. It will also feature a demonstration in which we take a recent PyTorch natural language processing model, convert it to ONNX, and integrate it into a Flink application. Finally, it will look at a set of open-source tools aimed at making it easy to take models to production and monitor their performance.
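For the external-microservice strategy, one hypothetical setup is to serve the exported model behind a small HTTP endpoint that the Flink job queries via AsyncIO (`AsyncDataStream` with an `AsyncFunction`). The sketch below uses ONNX Runtime, a common engine for executing ONNX models; the endpoint, port, and JSON schema are assumptions for illustration, not part of the talk:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

import numpy as np
import onnxruntime as ort

# Load the exported model once at startup; sessions are reusable
# across requests.
session = ort.InferenceSession("classifier.onnx")

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Expected request body (assumed schema):
        #   {"token_ids": [[3, 17, 42, ...]]}
        body = self.rfile.read(int(self.headers["Content-Length"]))
        token_ids = np.array(json.loads(body)["token_ids"], dtype=np.int64)
        logits = session.run(["logits"], {"token_ids": token_ids})[0]
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps({"logits": logits.tolist()}).encode())

if __name__ == "__main__":
    # Single-threaded server for illustration only; a production
    # deployment would batch requests and run multiple workers.
    HTTPServer(("0.0.0.0", 8080), InferenceHandler).serve_forever()
```

In this setup, the Flink operator issues non-blocking HTTP requests to the service and completes each record's `ResultFuture` when the response arrives, which keeps inference latency from stalling the stream.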