聊聊PowerJob的MapReduceProcessor

2024-01-26 09:21:14 浏览数 (2)

本文主要研究一下PowerJob的MapReduceProcessor

MapReduceProcessor

代码语言:javascript复制
public interface MapReduceProcessor extends MapProcessor {

    /**
     * reduce方法将在所有任务结束后调用
     * @param context 任务上下文
     * @param taskResults 保存了各个子Task的执行结果
     * @return reduce产生的结果将作为任务最终的返回结果
     */
    ProcessResult reduce(TaskContext context, List<TaskResult> taskResults);
}

MapReduceProcessor继承了MapProcessor,它新增了reduce方法

TaskResult

tech/powerjob/worker/core/processor/TaskResult.java

代码语言:javascript复制
@Data
public class TaskResult {

    private String taskId;
    private boolean success;
    private String result;

}

TaskResult定义了taskId、success、result属性

handleLastTask

tech/powerjob/worker/core/processor/runnable/HeavyProcessorRunnable.java

代码语言:javascript复制
    private void handleLastTask(String taskId, Long instanceId, TaskContext taskContext, ExecuteType executeType) {
        final BasicProcessor processor = processorBean.getProcessor();
        ProcessResult processResult;
        Stopwatch stopwatch = Stopwatch.createStarted();
        log.debug("[ProcessorRunnable-{}] the last task(taskId={}) start to process.", instanceId, taskId);

        List<TaskResult> taskResults = workerRuntime.getTaskPersistenceService().getAllTaskResult(instanceId, task.getSubInstanceId());
        try {
            switch (executeType) {
                case BROADCAST:

                    if (processor instanceof BroadcastProcessor) {
                        BroadcastProcessor broadcastProcessor = (BroadcastProcessor) processor;
                        processResult = broadcastProcessor.postProcess(taskContext, taskResults);
                    } else {
                        processResult = BroadcastProcessor.defaultResult(taskResults);
                    }
                    break;
                case MAP_REDUCE:

                    if (processor instanceof MapReduceProcessor) {
                        MapReduceProcessor mapReduceProcessor = (MapReduceProcessor) processor;
                        processResult = mapReduceProcessor.reduce(taskContext, taskResults);
                    } else {
                        processResult = new ProcessResult(false, "not implement the MapReduceProcessor");
                    }
                    break;
                default:
                    processResult = new ProcessResult(false, "IMPOSSIBLE OR BUG");
            }
        } catch (Throwable e) {
            processResult = new ProcessResult(false, e.toString());
            log.warn("[ProcessorRunnable-{}] execute last task(taskId={}) failed.", instanceId, taskId, e);
        }

        TaskStatus status = processResult.isSuccess() ? TaskStatus.WORKER_PROCESS_SUCCESS : TaskStatus.WORKER_PROCESS_FAILED;
        reportStatus(status, suit(processResult.getMsg()), null, taskContext.getWorkflowContext().getAppendedContextData());

        log.info("[ProcessorRunnable-{}] the last task execute successfully, using time: {}", instanceId, stopwatch);
    }

HeavyProcessorRunnable的handleLastTask方法先通过workerRuntime.getTaskPersistenceService().getAllTaskResult获取taskResults,然后对于MapReduceProcessor则回调mapReduceProcessor.reduce方法

getAllTaskResult

tech/powerjob/worker/persistence/TaskPersistenceService.java

代码语言:javascript复制
    public List<TaskResult> getAllTaskResult(Long instanceId, Long subInstanceId) {
        try {
            return execute(() -> taskDAO.getAllTaskResult(instanceId, subInstanceId));
        }catch (Exception e) {
            log.error("[TaskPersistenceService] getTaskId2ResultMap for instance(id={}) failed.", instanceId, e);
        }
        return Lists.newLinkedList();
    }

TaskPersistenceService的getAllTaskResult方法根据instanceId, subInstanceId查询task_info表select task_id, status, result from task_info where instance_id = ? and sub_instance_id = ?,最后只返回状态是WORKER_PROCESS_SUCCESS或者WORKER_PROCESS_FAILED的任务信息

小结

MapReduceProcessor继承了MapProcessor,它新增了reduce方法;HeavyProcessorRunnable的handleLastTask方法先通过workerRuntime.getTaskPersistenceService().getAllTaskResult获取taskResults,然后对于MapReduceProcessor则回调mapReduceProcessor.reduce方法;getAllTaskResult方法根据instanceId, subInstanceId查询task_info表返回状态是WORKER_PROCESS_SUCCESS或者WORKER_PROCESS_FAILED的任务信息(task_info表只在worker节点上),默认是h2(~/powerjob/worker/h2/{uuid}/powerjob_worker_db.mv.db)

0 人点赞