聊聊PowerJob的HeavyTaskTracker的dispatchTask

2023-12-30 21:32:27 浏览数 (2)

本文主要研究一下PowerJob的HeavyTaskTracker的dispatchTask

dispatchTask

tech/powerjob/worker/core/tracker/task/heavy/HeavyTaskTracker.java

代码语言:javascript复制
    protected void dispatchTask(TaskDO task, String processorTrackerAddress) {

        // 1. 持久化,更新数据库(如果更新数据库失败,可能导致重复执行,先不处理)
        TaskDO updateEntity = new TaskDO();
        updateEntity.setStatus(TaskStatus.DISPATCH_SUCCESS_WORKER_UNCHECK.getValue());
        // 写入处理该任务的 ProcessorTracker
        updateEntity.setAddress(processorTrackerAddress);
        boolean success = taskPersistenceService.updateTask(instanceId, task.getTaskId(), updateEntity);
        if (!success) {
            log.warn("[TaskTracker-{}] dispatch task(taskId={},taskName={}) failed due to update task status failed.", instanceId, task.getTaskId(), task.getTaskName());
            return;
        }

        // 2. 更新 ProcessorTrackerStatus 状态
        ptStatusHolder.getProcessorTrackerStatus(processorTrackerAddress).setDispatched(true);
        // 3. 初始化缓存
        taskId2BriefInfo.put(task.getTaskId(), new TaskBriefInfo(task.getTaskId(), TaskStatus.DISPATCH_SUCCESS_WORKER_UNCHECK, -1L));

        // 4. 任务派发
        TaskTrackerStartTaskReq startTaskReq = new TaskTrackerStartTaskReq(instanceInfo, task, workerRuntime.getWorkerAddress());
        TransportUtils.ttStartPtTask(startTaskReq, processorTrackerAddress, workerRuntime.getTransporter());

        log.debug("[TaskTracker-{}] dispatch task(taskId={},taskName={}) successfully.", instanceId, task.getTaskId(), task.getTaskName());
    }

HeavyTaskTracker的dispatchTask方法通过TransportUtils.ttStartPtTask将任务派发到ProcessorTracker

ttStartPtTask

tech/powerjob/worker/common/utils/TransportUtils.java

代码语言:javascript复制
    public static void ttStartPtTask(TaskTrackerStartTaskReq req, String address, Transporter transporter) {
        final URL url = easyBuildUrl(ServerType.WORKER, WPT_PATH, WPT_HANDLER_START_TASK, address);
        transporter.tell(url, req);
    }

ttStartPtTask方法往rootPath为processorTracker,handlerPath为startTask的接口发送TaskTrackerStartTaskReq

TaskTrackerStartTaskReq

tech/powerjob/worker/pojo/request/TaskTrackerStartTaskReq.java

代码语言:javascript复制
@Getter
@Setter
@NoArgsConstructor
public class TaskTrackerStartTaskReq implements PowerSerializable {

    // TaskTracker 地址
    private String taskTrackerAddress;
    private InstanceInfo instanceInfo;

    private String taskId;
    private String taskName;
    private byte[] taskContent;
    // 子任务当前重试次数
    private int taskCurrentRetryNums;

    // 秒级任务专用
    private long subInstanceId;

    private String logConfig;


    /**
     * 创建 TaskTrackerStartTaskReq,该构造方法必须在 TaskTracker 节点调用
     */
    public TaskTrackerStartTaskReq(InstanceInfo instanceInfo, TaskDO task, String taskTrackerAddress) {

        this.taskTrackerAddress = taskTrackerAddress;
        this.instanceInfo = instanceInfo;

        this.taskId = task.getTaskId();
        this.taskName = task.getTaskName();
        this.taskContent = task.getTaskContent();

        this.taskCurrentRetryNums = task.getFailedCnt();
        this.subInstanceId = task.getSubInstanceId();

        this.logConfig = instanceInfo.getLogConfig();
    }
}

TaskTrackerStartTaskReq包含了taskTrackerAddress、instanceInfo、taskId、taskName、taskContent等信息

onReceiveTaskTrackerStartTaskReq

tech/powerjob/worker/actors/ProcessorTrackerActor.java

代码语言:javascript复制
@Slf4j
@Actor(path = RemoteConstant.WPT_PATH)
public class ProcessorTrackerActor {

    private final WorkerRuntime workerRuntime;

    public ProcessorTrackerActor(WorkerRuntime workerRuntime) {
        this.workerRuntime = workerRuntime;
    }

    /**
     * 处理来自TaskTracker的task执行请求
     * @param req 请求
     */
    @Handler(path = RemoteConstant.WPT_HANDLER_START_TASK, processType = ProcessType.NO_BLOCKING)
    public void onReceiveTaskTrackerStartTaskReq(TaskTrackerStartTaskReq req) {

        Long instanceId = req.getInstanceInfo().getInstanceId();

        // 创建 ProcessorTracker 一定能成功
        ProcessorTracker processorTracker = ProcessorTrackerManager.getProcessorTracker(
                instanceId,
                req.getTaskTrackerAddress(),
                () -> new ProcessorTracker(req, workerRuntime));

        TaskDO task = new TaskDO();

        task.setTaskId(req.getTaskId());
        task.setTaskName(req.getTaskName());
        task.setTaskContent(req.getTaskContent());
        task.setFailedCnt(req.getTaskCurrentRetryNums());
        task.setSubInstanceId(req.getSubInstanceId());

        processorTracker.submitTask(task);
    }

    //......
}    

ProcessorTrackerActor的onReceiveTaskTrackerStartTaskReq方法用于处理TaskTrackerStartTaskReq,它主要是创建ProcessorTracker,然后执行性submit方法

ProcessorTracker

tech/powerjob/worker/core/tracker/processor/ProcessorTracker.java

代码语言:javascript复制
    public ProcessorTracker(TaskTrackerStartTaskReq request, WorkerRuntime workerRuntime) {
        try {
            // 赋值
            this.startTime = System.currentTimeMillis();
            this.workerRuntime = workerRuntime;
            this.instanceInfo = request.getInstanceInfo();
            this.instanceId = request.getInstanceInfo().getInstanceId();
            this.taskTrackerAddress = request.getTaskTrackerAddress();

            this.omsLogger = OmsLoggerFactory.build(instanceId, request.getLogConfig(), workerRuntime);
            this.statusReportRetryQueue = Queues.newLinkedBlockingQueue();
            this.lastIdleTime = -1L;
            this.lastCompletedTaskCount = 0L;

            // 初始化 线程池,TimingPool 启动的任务会检查 ThreadPool,所以必须先初始化线程池,否则NPE
            initThreadPool();
            // 初始化定时任务
            initTimingJob();
            // 初始化 Processor
            processorBean = workerRuntime.getProcessorLoader().load(new ProcessorDefinition().setProcessorType(instanceInfo.getProcessorType()).setProcessorInfo(instanceInfo.getProcessorInfo()));
            log.info("[ProcessorTracker-{}] ProcessorTracker was successfully created!", instanceId);
        } catch (Throwable t) {
            log.warn("[ProcessorTracker-{}] create ProcessorTracker failed, all tasks submitted here will fail.", instanceId, t);
            lethal = true;
            lethalReason = ExceptionUtils.getMessage(t);
        }
    }

ProcessorTracker的构造器会initThreadPool、initTimingJob、加载processorBean

initThreadPool

代码语言:javascript复制
    private static final int THREAD_POOL_QUEUE_MAX_SIZE = 128;

    private void initThreadPool() {

        int poolSize = calThreadPoolSize();
        // 待执行队列,为了防止对内存造成较大压力,内存队列不能太大
        BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(THREAD_POOL_QUEUE_MAX_SIZE);
        // 自定义线程池中线程名称 (PowerJob Processor Pool -> PPP)
        ThreadFactory threadFactory = new ThreadFactoryBuilder().setNameFormat("PPP-%d").build();
        // 拒绝策略:直接抛出异常
        RejectedExecutionHandler rejectionHandler = new ThreadPoolExecutor.AbortPolicy();

        threadPool = new ThreadPoolExecutor(poolSize, poolSize, 60L, TimeUnit.SECONDS, queue, threadFactory, rejectionHandler);

        // 当没有任务执行时,允许销毁核心线程(即线程池最终存活线程个数可能为0)
        threadPool.allowCoreThreadTimeOut(true);
    }

    private int calThreadPoolSize() {
        ExecuteType executeType = ExecuteType.valueOf(instanceInfo.getExecuteType());
        ProcessorType processorType = ProcessorType.valueOf(instanceInfo.getProcessorType());

        // 脚本类自带线程池,不过为了少一点逻辑判断,还是象征性分配一个线程
        if (processorType == ProcessorType.PYTHON || processorType == ProcessorType.SHELL) {
            return 1;
        }

        if (executeType == ExecuteType.MAP_REDUCE || executeType == ExecuteType.MAP) {
            return instanceInfo.getThreadConcurrency();
        }
        if (TimeExpressionType.FREQUENT_TYPES.contains(instanceInfo.getTimeExpressionType())) {
            return instanceInfo.getThreadConcurrency();
        }
        return 2;
    }    

initThreadPool先根据不同的processorType获取线程池大小,然后创建大小为128的ArrayBlockingQueue,然后设定线程名称前缀PPP-%d",设定rejectionHandler为AbortPolicy,最后创建ThreadPoolExecutor,设置allowCoreThreadTimeOut为true

initTimingJob

代码语言:javascript复制
    private void initTimingJob() {

        // PowerJob Processor TimingPool
        ThreadFactory threadFactory = new ThreadFactoryBuilder().setNameFormat("PPT-%d").build();
        timingPool = Executors.newSingleThreadScheduledExecutor(threadFactory);

        timingPool.scheduleAtFixedRate(new CheckerAndReporter(), 0, 10, TimeUnit.SECONDS);
    }

initTimingJob方法设定线程名称前缀为PPT-%d",然后通过Executors.newSingleThreadScheduledExecutor创建timingPool,调度CheckerAndReporter

submitTask

代码语言:javascript复制
    public void submitTask(TaskDO newTask) {

        // 一旦 ProcessorTracker 出现异常,所有提交到此处的任务直接返回失败,防止形成死锁
        // 死锁分析:TT创建PT,PT创建失败,无法定期汇报心跳,TT长时间未收到PT心跳,认为PT宕机(确实宕机了),无法选择可用的PT再次派发任务,死锁形成,GG斯密达 T_T
        if (lethal) {
            ProcessorReportTaskStatusReq report = new ProcessorReportTaskStatusReq()
                    .setInstanceId(instanceId)
                    .setSubInstanceId(newTask.getSubInstanceId())
                    .setTaskId(newTask.getTaskId())
                    .setStatus(TaskStatus.WORKER_PROCESS_FAILED.getValue())
                    .setResult(lethalReason)
                    .setReportTime(System.currentTimeMillis());

            TransportUtils.ptReportTask(report, taskTrackerAddress, workerRuntime);
            return;
        }

        boolean success = false;
        // 1. 设置值并提交执行
        newTask.setInstanceId(instanceInfo.getInstanceId());
        newTask.setAddress(taskTrackerAddress);

        HeavyProcessorRunnable heavyProcessorRunnable = new HeavyProcessorRunnable(instanceInfo, taskTrackerAddress, newTask, processorBean, omsLogger, statusReportRetryQueue, workerRuntime);
        try {
            threadPool.submit(heavyProcessorRunnable);
            success = true;
        } catch (RejectedExecutionException ignore) {
            log.warn("[ProcessorTracker-{}] submit task(taskId={},taskName={}) to ThreadPool failed due to ThreadPool has too much task waiting to process, this task will dispatch to other ProcessorTracker.",
                    instanceId, newTask.getTaskId(), newTask.getTaskName());
        } catch (Exception e) {
            log.error("[ProcessorTracker-{}] submit task(taskId={},taskName={}) to ThreadPool failed.", instanceId, newTask.getTaskId(), newTask.getTaskName(), e);
        }

        // 2. 回复接收成功
        if (success) {
            ProcessorReportTaskStatusReq reportReq = new ProcessorReportTaskStatusReq();
            reportReq.setInstanceId(instanceId);
            reportReq.setSubInstanceId(newTask.getSubInstanceId());
            reportReq.setTaskId(newTask.getTaskId());
            reportReq.setStatus(TaskStatus.WORKER_RECEIVED.getValue());
            reportReq.setReportTime(System.currentTimeMillis());

            TransportUtils.ptReportTask(reportReq, taskTrackerAddress, workerRuntime);

            log.debug("[ProcessorTracker-{}] submit task(taskId={}, taskName={}) success, current queue size: {}.",
                    instanceId, newTask.getTaskId(), newTask.getTaskName(), threadPool.getQueue().size());
        }
    }

submitTask方法主要是创建HeavyProcessorRunnable,然后通过threadPool提交

HeavyProcessorRunnable

tech/powerjob/worker/core/processor/runnable/HeavyProcessorRunnable.java

代码语言:javascript复制
@Slf4j
@AllArgsConstructor
@SuppressWarnings("squid:S1181")
public class HeavyProcessorRunnable implements Runnable {


    private final InstanceInfo instanceInfo;
    private final String taskTrackerAddress;
    private final TaskDO task;
    private final ProcessorBean processorBean;
    private final OmsLogger omsLogger;
    /**
     * 重试队列,ProcessorTracker 将会定期重新上报处理结果
     */
    private final Queue<ProcessorReportTaskStatusReq> statusReportRetryQueue;
    private final WorkerRuntime workerRuntime;

    //......

    public void run() {
        // 切换线程上下文类加载器(否则用的是 Worker 类加载器,不存在容器类,在序列化/反序列化时会报 ClassNotFoundException)
        Thread.currentThread().setContextClassLoader(processorBean.getClassLoader());
        try {
            innerRun();
        } catch (InterruptedException ignore) {
            // ignore
        } catch (Throwable e) {
            reportStatus(TaskStatus.WORKER_PROCESS_FAILED, e.toString(), null, null);
            log.error("[ProcessorRunnable-{}] execute failed, please contact the author(@KFCFans) to fix the bug!", task.getInstanceId(), e);
        } finally {
            ThreadLocalStore.clear();
        }
    }

    //......
}    

HeavyProcessorRunnable的run方法主要是执行innerRun方法

innerRun

代码语言:javascript复制
    public void innerRun() throws InterruptedException {

        final BasicProcessor processor = processorBean.getProcessor();

        String taskId = task.getTaskId();
        Long instanceId = task.getInstanceId();

        log.debug("[ProcessorRunnable-{}] start to run task(taskId={}&taskName={})", instanceId, taskId, task.getTaskName());
        ThreadLocalStore.setTask(task);
        ThreadLocalStore.setRuntimeMeta(workerRuntime);

        // 0. 构造任务上下文
        WorkflowContext workflowContext = constructWorkflowContext();
        TaskContext taskContext = constructTaskContext();
        taskContext.setWorkflowContext(workflowContext);
        // 1. 上报执行信息
        reportStatus(TaskStatus.WORKER_PROCESSING, null, null, null);

        ProcessResult processResult;
        ExecuteType executeType = ExecuteType.valueOf(instanceInfo.getExecuteType());

        // 2. 根任务 & 广播执行 特殊处理
        if (TaskConstant.ROOT_TASK_NAME.equals(task.getTaskName()) && executeType == ExecuteType.BROADCAST) {
            // 广播执行:先选本机执行 preProcess,完成后 TaskTracker 再为所有 Worker 生成子 Task
            handleBroadcastRootTask(instanceId, taskContext);
            return;
        }

        // 3. 最终任务特殊处理(一定和 TaskTracker 处于相同的机器)
        if (TaskConstant.LAST_TASK_NAME.equals(task.getTaskName())) {
            handleLastTask(taskId, instanceId, taskContext, executeType);
            return;
        }

        // 4. 正式提交运行
        try {
            processResult = processor.process(taskContext);
            if (processResult == null) {
                processResult = new ProcessResult(false, "ProcessResult can't be null");
            }
        } catch (Throwable e) {
            log.warn("[ProcessorRunnable-{}] task(id={},name={}) process failed.", instanceId, taskContext.getTaskId(), taskContext.getTaskName(), e);
            processResult = new ProcessResult(false, e.toString());
        }
        reportStatus(processResult.isSuccess() ? TaskStatus.WORKER_PROCESS_SUCCESS : TaskStatus.WORKER_PROCESS_FAILED, suit(processResult.getMsg()), null, workflowContext.getAppendedContextData());
    }

innerRun先构建taskContext,然后reportStatus为WORKER_PROCESSING,之后针对广播的root任务执行handleBroadcastRootTask,最后通过processor.process(taskContext)执行任务,这里catch了Throwable,出现异常则返回ProcessResult(false, e.toString()),最后根据处理结果reportStatus为WORKER_PROCESS_SUCCESS或者WORKER_PROCESS_FAILED

小结

HeavyTaskTracker的dispatchTask方法通过TransportUtils.ttStartPtTask将任务派发到ProcessorTracker;ProcessorTrackerActor的onReceiveTaskTrackerStartTaskReq方法用于处理TaskTrackerStartTaskReq,它主要是创建ProcessorTracker,然后执行性submit方法;ProcessorTracker的构造器会initThreadPool、initTimingJob、加载processorBean;submitTask方法主要是创建HeavyProcessorRunnable,然后通过threadPool提交;HeavyProcessorRunnable的run方法主要是执行innerRun方法,innerRun主要是执行processor.process(taskContext),这里catch了Throwable,出现异常则返回ProcessResult(false, e.toString())。

0 人点赞