Years ago, on the ifeve concurrency site (http://ifeve.com/disruptor), I came across Disruptor, a concurrency framework I considered black magic at the time, and I wondered why Netty had not integrated it. Later I saw log4j2 and JStorm gradually adopt it, yet I never had occasion to use it or dig into the details; most of the time Doug Lea's JDK concurrency package was enough. Recently, for business reasons, I wrapped a simple vert.x-like server on top of Netty, luoying-server (https://github.com/zealzeng/luoying-server). Business handling should not run on the event loop, so I used a bounded ThreadPoolExecutor as the worker pool and considered integrating Disruptor; after two days of reading I realized I had been misunderstanding Disruptor's use cases.
If you need an introduction, see the ifeve site or the official repo: https://github.com/LMAX-Exchange/disruptor
1. Official performance tests
1.1 JDK BlockingQueue throughput test: start with one producer, one consumer
https://github.com/zealzeng/fabric-samples/blob/master/disruptor-demo/src/main/java/com/lmax/disruptor/queue/OneToOneQueueThroughputTest.java
package com.lmax.disruptor.queue;

import com.lmax.disruptor.AbstractPerfTestQueue;
import com.lmax.disruptor.support.ValueAdditionQueueProcessor;
import com.lmax.disruptor.util.DaemonThreadFactory;

import java.util.concurrent.*;

import static com.lmax.disruptor.support.PerfTestUtil.failIf;

/**
 * <pre>
 * UniCast a series of items between 1 publisher and 1 event processor.
 *
 * +----+    +-----+
 * | P1 |--->| EP1 |
 * +----+    +-----+
 *
 * Queue Based:
 * ============
 *
 *        put      take
 * +----+    +====+    +-----+
 * | P1 |--->| Q1 |<---| EP1 |
 * +----+    +====+    +-----+
 *
 * P1  - Publisher 1
 * Q1  - Queue 1
 * EP1 - EventProcessor 1
 * </pre>
 */
public final class OneToOneQueueThroughputTest extends AbstractPerfTestQueue
{
    private static final int BUFFER_SIZE = 1024 * 64;
    private static final long ITERATIONS = 1000L * 1000L * 10L;
    private final ExecutorService executor = Executors.newSingleThreadExecutor(DaemonThreadFactory.INSTANCE);
    private final long expectedResult = ITERATIONS * 3L;

    ///////////////////////////////////////////////////////////////////////////////////////////////

    private final BlockingQueue<Long> blockingQueue = new LinkedBlockingQueue<Long>(BUFFER_SIZE);
    private final ValueAdditionQueueProcessor queueProcessor =
        new ValueAdditionQueueProcessor(blockingQueue, ITERATIONS - 1);

    ///////////////////////////////////////////////////////////////////////////////////////////////

    @Override
    protected int getRequiredProcessorCount()
    {
        return 2;
    }

    @Override
    protected long runQueuePass() throws InterruptedException
    {
        final CountDownLatch latch = new CountDownLatch(1);
        queueProcessor.reset(latch);
        Future<?> future = executor.submit(queueProcessor);
        long start = System.currentTimeMillis();

        for (long i = 0; i < ITERATIONS; i++)
        {
            blockingQueue.put(3L);
        }

        latch.await();
        long opsPerSecond = (ITERATIONS * 1000L) / (System.currentTimeMillis() - start);
        queueProcessor.halt();
        future.cancel(true);

        failIf(expectedResult, 0);

        return opsPerSecond;
    }

    public static void main(String[] args) throws Exception
    {
        OneToOneQueueThroughputTest test = new OneToOneQueueThroughputTest();
        test.testImplementations();
    }
}
The main thread puts elements onto the queue, while one consumer thread runs ValueAdditionQueueProcessor, a Runnable whose logic simply takes an element and adds it to a running total, so the per-event cost is close to an empty loop. Even on an old 4th-generation i5 the throughput is in the millions of ops per second. In a real system the processor would spend tens to hundreds of milliseconds of business logic per task, and a single consumer thread would no longer keep up, so the conventional approach is a ThreadPoolExecutor that hands each dequeued task to a worker thread (a minimal sketch of that approach follows the results below).
Starting Queue tests
Run 0, BlockingQueue=4,539,264 ops/sec
Run 1, BlockingQueue=5,414,185 ops/sec
Run 2, BlockingQueue=4,657,661 ops/sec
Run 3, BlockingQueue=5,288,207 ops/sec
Run 4, BlockingQueue=5,339,028 ops/sec
Run 5, BlockingQueue=5,246,589 ops/sec
Run 6, BlockingQueue=5,197,505 ops/sec
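For contrast, here is a minimal sketch of that conventional approach: a bounded queue feeding a ThreadPoolExecutor, with each dequeued task handled on a worker thread. The pool sizes, queue capacity, and the simulated 50 ms of business work are illustrative assumptions, not values from the test.

import java.util.concurrent.*;

public class ConventionalWorkerPoolSketch {
    public static void main(String[] args) throws InterruptedException {
        // A bounded queue plus CallerRunsPolicy gives natural back-pressure
        // when producers outrun the workers.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                8, 8, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(1024),
                new ThreadPoolExecutor.CallerRunsPolicy());

        for (int i = 0; i < 100; i++) {
            final int taskId = i;
            pool.execute(() -> {
                try {
                    Thread.sleep(50L); // simulate tens of ms of business logic
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                System.out.println("task " + taskId + " done on " + Thread.currentThread().getName());
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}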
1.2 Disruptor: one producer, one consumer
https://github.com/zealzeng/fabric-samples/blob/master/disruptor-demo/src/main/java/com/lmax/disruptor/sequenced/OneToOneSequencedThroughputTest.java
package com.lmax.disruptor.sequenced;

import static com.lmax.disruptor.RingBuffer.createSingleProducer;
import static com.lmax.disruptor.support.PerfTestUtil.failIfNot;

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import com.lmax.disruptor.*;
import com.lmax.disruptor.support.PerfTestUtil;
import com.lmax.disruptor.support.ValueAdditionEventHandler;
import com.lmax.disruptor.support.ValueEvent;
import com.lmax.disruptor.util.DaemonThreadFactory;

/**
 * <pre>
 * UniCast a series of items between 1 publisher and 1 event processor.
 *
 * +----+    +-----+
 * | P1 |--->| EP1 |
 * +----+    +-----+
 *
 * Disruptor:
 * ==========
 *              track to prevent wrap
 *              +------------------+
 *              |                  |
 *              |                  v
 * +----+    +====+    +====+   +-----+
 * | P1 |--->| RB |<---| SB |   | EP1 |
 * +----+    +====+    +====+   +-----+
 *      claim      get    ^        |
 *                        |        |
 *                        +--------+
 *                          waitFor
 *
 * P1  - Publisher 1
 * RB  - RingBuffer
 * SB  - SequenceBarrier
 * EP1 - EventProcessor 1
 * </pre>
 */
public final class OneToOneSequencedThroughputTest extends AbstractPerfTestDisruptor
{
    private static final int BUFFER_SIZE = 1024 * 64;
    private static final long ITERATIONS = 1000L * 1000L * 100L;
    private final ExecutorService executor = Executors.newSingleThreadExecutor(DaemonThreadFactory.INSTANCE);
    private final long expectedResult = PerfTestUtil.accumulatedAddition(ITERATIONS);

    ///////////////////////////////////////////////////////////////////////////////////////////////

    private final RingBuffer<ValueEvent> ringBuffer =
        createSingleProducer(ValueEvent.EVENT_FACTORY, BUFFER_SIZE, new YieldingWaitStrategy());
    private final SequenceBarrier sequenceBarrier = ringBuffer.newBarrier();
    private final ValueAdditionEventHandler handler = new ValueAdditionEventHandler();
    private final BatchEventProcessor<ValueEvent> batchEventProcessor =
        new BatchEventProcessor<ValueEvent>(ringBuffer, sequenceBarrier, handler);

    {
        ringBuffer.addGatingSequences(batchEventProcessor.getSequence());
    }

    ///////////////////////////////////////////////////////////////////////////////////////////////

    @Override
    protected int getRequiredProcessorCount()
    {
        return 2;
    }

    @Override
    protected PerfTestContext runDisruptorPass() throws InterruptedException
    {
        PerfTestContext perfTestContext = new PerfTestContext();
        final CountDownLatch latch = new CountDownLatch(1);
        long expectedCount = batchEventProcessor.getSequence().get() + ITERATIONS;
        handler.reset(latch, expectedCount);
        executor.submit(batchEventProcessor);
        long start = System.currentTimeMillis();

        final RingBuffer<ValueEvent> rb = ringBuffer;

        for (long i = 0; i < ITERATIONS; i++)
        {
            long next = rb.next();
            rb.get(next).setValue(i);
            rb.publish(next);
        }

        latch.await();
        perfTestContext.setDisruptorOps((ITERATIONS * 1000L) / (System.currentTimeMillis() - start));
        perfTestContext.setBatchData(handler.getBatchesProcessed(), ITERATIONS);
        waitForEventProcessorSequence(expectedCount);
        batchEventProcessor.halt();

        failIfNot(expectedResult, handler.getValue());

        return perfTestContext;
    }

    private void waitForEventProcessorSequence(long expectedCount) throws InterruptedException
    {
        while (batchEventProcessor.getSequence().get() != expectedCount)
        {
            Thread.sleep(1);
        }
    }

    public static void main(String[] args) throws Exception
    {
        OneToOneSequencedThroughputTest test = new OneToOneSequencedThroughputTest();
        test.testImplementations();
    }
}
The test does not use the fully wrapped Disruptor class; it drives RingBuffer and BatchEventProcessor directly. With the same handler logic, throughput reaches tens of millions of ops/sec. But the earlier caveat still applies: if ValueAdditionEventHandler took tens to hundreds of milliseconds per event, the ring buffer could be as lock-free and efficient as it likes and it would not matter. So next let us look at Disruptor's two consumption modes.
Starting Disruptor tests
Run 0, Disruptor=32,701,111 ops/sec BatchPercent=95.16% AverageBatchSize=20
Run 1, Disruptor=36,805,299 ops/sec BatchPercent=62.61% AverageBatchSize=2
Run 2, Disruptor=69,348,127 ops/sec BatchPercent=86.93% AverageBatchSize=7
Run 3, Disruptor=69,396,252 ops/sec BatchPercent=87.21% AverageBatchSize=7
Run 4, Disruptor=67,430,883 ops/sec BatchPercent=86.10% AverageBatchSize=7
Run 5, Disruptor=69,108,500 ops/sec BatchPercent=86.49% AverageBatchSize=7
Run 6, Disruptor=66,979,236 ops/sec BatchPercent=86.42% AverageBatchSize=7
2. Disruptor event consumption modes
2.1 Multicast: broadcasting events
Most of the official getting-started examples use this mode, i.e. Disruptor.handleEventsWith(EventHandler... handlers). Under the hood, each EventHandler you pass in is wrapped in its own BatchEventProcessor, and each BatchEventProcessor runs on one thread, pulling events off the ring buffer and invoking its handler on that same thread.
Calling Disruptor.handleEventsWith() repeatedly just adds more BatchEventProcessor consumer threads, but this mode is a broadcast: every BatchEventProcessor sees every published Event.
// Construct the Disruptor
Disruptor<LongEvent> disruptor = new Disruptor<>(factory, bufferSize, DaemonThreadFactory.INSTANCE);

// Connect the handler
disruptor.handleEventsWith(new LongEventHandler());
Multiple EventProcessors do not compete for a single Event; each one receives it, a broadcast, as the sketch below illustrates. If the EventHandlers do heavy work, this mode gains little, and you would end up starting a thread pool to hand the work off asynchronously anyway.
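To make the broadcast semantics concrete, here is a minimal runnable sketch (my own, not from the official repo), following the LongEvent naming of the getting-started guide: both handlers print every published value, i.e. events are not split between them.

import com.lmax.disruptor.EventHandler;
import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.util.DaemonThreadFactory;

public class MulticastDemo {
    // Mutable event, reused in place by the ring buffer.
    static class LongEvent {
        long value;
    }

    public static void main(String[] args) {
        Disruptor<LongEvent> disruptor = new Disruptor<>(
                LongEvent::new, 1024, DaemonThreadFactory.INSTANCE);

        // Two handlers in the same group: each receives EVERY event (broadcast),
        // each on its own BatchEventProcessor thread.
        EventHandler<LongEvent> h1 = (event, seq, endOfBatch) ->
                System.out.println("handler-1 got " + event.value);
        EventHandler<LongEvent> h2 = (event, seq, endOfBatch) ->
                System.out.println("handler-2 got " + event.value);
        disruptor.handleEventsWith(h1, h2);

        disruptor.start();
        RingBuffer<LongEvent> rb = disruptor.getRingBuffer();
        for (long i = 0; i < 3; i++) {
            long next = rb.next();
            try {
                rb.get(next).value = i;
            } finally {
                rb.publish(next);
            }
        }
        disruptor.shutdown(); // waits for the backlog to drain, then halts the processors
    }
}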
2.2 Worker pool mode
That is, call Disruptor.handleEventsWithWorkerPool: each WorkHandler runs on its own thread and processes only the events it claims, so usage resembles an ordinary thread pool (see the sketch after the snippet below).
public final EventHandlerGroup<T> handleEventsWithWorkerPool(final WorkHandler<T>... workHandlers)
{
    return createWorkerPool(new Sequence[0], workHandlers);
}
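A usage sketch of the worker-pool mode, reusing the same hypothetical LongEvent as above: each published event is claimed by exactly one WorkHandler, which is what makes this mode behave like an ordinary thread pool.

import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.WorkHandler;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.util.DaemonThreadFactory;

public class WorkerPoolDemo {
    static class LongEvent {
        long value;
    }

    public static void main(String[] args) {
        Disruptor<LongEvent> disruptor = new Disruptor<>(
                LongEvent::new, 1024, DaemonThreadFactory.INSTANCE);

        // Three workers: each event goes to exactly ONE of them. Passing the same
        // stateless handler three times still creates three WorkProcessor threads.
        WorkHandler<LongEvent> worker = event ->
                System.out.println(Thread.currentThread().getName() + " took " + event.value);
        disruptor.handleEventsWithWorkerPool(worker, worker, worker);

        disruptor.start();
        RingBuffer<LongEvent> rb = disruptor.getRingBuffer();
        for (long i = 0; i < 9; i++) {
            long next = rb.next();
            try {
                rb.get(next).value = i;
            } finally {
                rb.publish(next);
            }
        }
        disruptor.shutdown(); // drains remaining events, then halts the workers
    }
}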
The official repo has several examples of this mode; throughput is in the millions of ops/sec, and when WorkHandlers are slow there is little practical difference from a thread pool.
https://github.com/zealzeng/fabric-samples/blob/master/disruptor-demo/src/main/java/com/lmax/disruptor/workhandler/OneToThreeWorkerPoolThroughputTest.java
Starting Disruptor tests
Run 0, Disruptor=4,349,906 ops/sec BatchPercent=0.00% AverageBatchSize=-1
Run 1, Disruptor=4,591,579 ops/sec BatchPercent=0.00% AverageBatchSize=-1
Run 2, Disruptor=4,590,946 ops/sec BatchPercent=0.00% AverageBatchSize=-1
Run 3, Disruptor=4,662,222 ops/sec BatchPercent=0.00% AverageBatchSize=-1
Run 4, Disruptor=4,695,276 ops/sec BatchPercent=0.00% AverageBatchSize=-1
Run 5, Disruptor=4,690,211 ops/sec BatchPercent=0.00% AverageBatchSize=-1
Run 6, Disruptor=4,713,201 ops/sec BatchPercent=0.00% AverageBatchSize=-1
3. Disruptor use cases
Consider a few frameworks that make use of Disruptor.
3.1 Log4j2
Log4j2's asynchronous loggers are built on Disruptor. Logging generally goes through a buffer that is flushed to file when full, and incremental appends combined with NIO should be fast, so whether the consumer is an EventHandler or a WorkHandler the per-event latency stays small, and only a handful of files are written. This is a scenario Disruptor fits well.
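For reference, all-async loggers in Log4j2 are switched on through the context-selector system property. As far as I know this is standard Log4j2 configuration, but verify against the docs of the version you use.

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class AsyncLoggerDemo {
    static {
        // Equivalent to -DLog4jContextSelector=... on the command line;
        // must be set before the first Logger is created.
        System.setProperty("Log4jContextSelector",
                "org.apache.logging.log4j.core.async.AsyncLoggerContextSelector");
    }

    private static final Logger LOG = LogManager.getLogger(AsyncLoggerDemo.class);

    public static void main(String[] args) {
        // With async loggers this call only publishes an event onto
        // Disruptor's ring buffer; a background thread does the appending.
        LOG.info("hello from an async logger");
    }
}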
3.2 JStorm
In stream processing, data is exchanged between threads and much of the computation happens in memory; streams flow in and out quickly, so Disruptor should be a good choice there.
3.3 Baidu uid-generator
It uses a ring buffer, false-sharing avoidance, and similar ideas to cache pre-generated UIDs, presumably borrowing in part from Disruptor.
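To make the false-sharing point concrete, here is an illustrative sketch in the spirit of Disruptor's Sequence (Disruptor itself pads around an Unsafe-based field; I use AtomicLongFieldUpdater as a safe stand-in, and the field names are my own): the unused longs on both sides keep a hot counter from sharing a 64-byte cache line with a neighbouring hot variable.

import java.util.concurrent.atomic.AtomicLongFieldUpdater;

// Padding split across a class hierarchy so the JVM cannot reorder the
// filler fields away from the hot value they are protecting.
class LhsPadding { protected long p1, p2, p3, p4, p5, p6, p7; }
class Value extends LhsPadding { protected volatile long value; }
class RhsPadding extends Value { protected long p9, p10, p11, p12, p13, p14, p15; }

public class PaddedCounter extends RhsPadding {
    private static final AtomicLongFieldUpdater<Value> UPDATER =
            AtomicLongFieldUpdater.newUpdater(Value.class, "value");

    public long get() { return value; }

    public long incrementAndGet() { return UPDATER.addAndGet(this, 1L); }

    public static void main(String[] args) {
        PaddedCounter counter = new PaddedCounter();
        counter.incrementAndGet();
        System.out.println(counter.get()); // prints 1
    }
}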
3.4 Summary
Plugging Disruptor in as the worker pool behind luoying-framework's event loop would not improve performance, because the business logic inside the server involves database queries and other slow operations; Disruptor only speeds up the hand-off of data, and slow business logic stays slow.
It is like couriering a contract to the northeast: some couriers really are fast and deliver in two days, others take three or four, but the courier's job ends when the package reaches the recipient. If the recipient needs a week or two to process the contract, no courier, however fast, fixes that.
Scenarios where data must be exchanged between threads quickly and also processed quickly are rare in my own field; Disruptor probably earns its keep in big data and middleware development.
I have not read Disruptor's source in depth, but the codebase looks small; please point out anything I got wrong. The RingBuffer, Sequencer, and related code are well worth studying.