Towards Flink 2.0: Rethinking the stack and APIs to unify Batch & Stream
Flink currently features different APIs for bounded/batch (DataSet) and streaming (DataStream) programs. While the DataStream API can handle batch use cases, it does so much less efficiently than the DataSet API. The Table API was built as a unified API on top of both: it covers batch and streaming with the same API and, under the hood, delegates to either DataSet or DataStream.
In this talk, we present the latest on the Flink community's efforts to rework the APIs and the stack for a better unified batch & streaming experience. We will discuss:
- The future roles and interplay of DataSet, DataStream, and Table API
- The new Flink stack and the abstractions on which these APIs will build
- The new unified batch/streaming sources
- How batch and streaming optimizations differ in the runtime, and what the future interplay of batch and streaming execution could look like