Android System Boot Process -- Study Notes

2022-08-28 08:58:32

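Introduction

Android receives a major version update every year. Android applications are written in Java, while the lower layers are built on the Linux kernel. The boot process covers the whole path from the kernel --> runtime --> Java world, so understanding how Android boots is key to understanding the Android architecture as a whole.

Boot-time optimization also requires a solid grasp of the Android boot process.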


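Android System Architecture

First, here is the classic layered architecture diagram of Android provided by Google:

For the purpose of studying how things work, the Android architecture can be divided into five layers, from bottom to top: Linux Kernel, HAL, Native libraries & Runtime, Framework, and App. Each layer contains a large number of modules and subsystems.

The diagram above was published by Google several years ago. It is a classic, but to understand Android more deeply, the next diagram presents the system in layers from the perspective of its key processes:

The rest of this section follows that diagram to walk through the Android boot process.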


Analysis of the Android Boot Process

As the diagram above shows, Android boots from the bottom up: Bootloader --> Linux Kernel --> Native --> Framework --> App. In detail:

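Bootloader: Android is built on the Linux kernel, so this stage is the same as booting an ordinary Linux system (although, unlike a PC, an embedded system usually has no BIOS-like firmware, so the Bootloader is responsible for loading the entire system). When the phone is powered on, the Bootloader runs first. It is a small program that executes before the Linux Kernel and mainly initializes hardware, checks RAM, sets up the memory map, and so on, bringing the device's hardware and software into a suitable state so that the Linux Kernel can run in the next stage.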

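Linux Kernel: The Linux Kernel image usually contains two parts, real-mode code and protected-mode code (a split that applies to x86). After the Bootloader loads the kernel into memory, it places the two parts at different memory addresses and runs the real-mode code first, then the protected-mode code. The kernel first starts the swapper process (pid=0), also called the idle process, which initializes the kernel's functional modules and drivers. It then starts the init process (pid=1), followed by kthreadd (pid=2), a kernel thread that is the ancestor of all kernel threads.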

init: The init process is created during the Linux Kernel stage above. It is the first user-space process in Android and plays a vital role during boot. It parses init.rc along with other init.<xxx>.rc files. Its main tasks are:

(2) Starts ServiceManager, the central registry for binder services;

(4) Provides the property service, for example:

on property:sys.boot_completed=1
  start myCode

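The example above starts the myCode service once the system property sys.boot_completed is set to 1, that is, once Android has finished booting.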
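The property-trigger idea can be sketched in a few lines of Java. This is only an illustrative model, not init's real (C++) implementation: the class PropertyTriggerSketch and its on/set/get methods are hypothetical names invented for this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of init's "on property:<name>=<value>" triggers.
// It is NOT how init is actually implemented; it only shows the idea.
class PropertyTriggerSketch {
    private final Map<String, String> properties = new HashMap<>();
    private final Map<String, List<Runnable>> triggers = new HashMap<>();

    // Register an action for a condition such as sys.boot_completed=1.
    void on(String name, String value, Runnable action) {
        triggers.computeIfAbsent(name + "=" + value, k -> new ArrayList<>()).add(action);
    }

    // Store the property, then run every action whose condition now matches.
    void set(String name, String value) {
        properties.put(name, value);
        List<Runnable> matching =
                triggers.getOrDefault(name + "=" + value, Collections.<Runnable>emptyList());
        for (Runnable action : matching) {
            action.run();
        }
    }

    // Read back a property; returns null if it was never set.
    String get(String name) {
        return properties.get(name);
    }

    public static void main(String[] args) {
        PropertyTriggerSketch init = new PropertyTriggerSketch();
        init.on("sys.boot_completed", "1", () -> System.out.println("start myCode"));
        init.set("sys.boot_completed", "0"); // no match, nothing runs
        init.set("sys.boot_completed", "1"); // prints "start myCode"
    }
}
```

The sketch only handles a single name=value condition per action, which is enough to mirror the on property:sys.boot_completed=1 example.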

(5) Forks the Zygote process. Zygote is the first Java process in Android and the parent of all Java processes;

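  • Zygote: Zygote is forked by init after init parses init.rc, and its role is just as vital. Its main tasks are:
    (1) load the ZygoteInit class and register the Zygote socket;
    (2) load the Dalvik/ART virtual machine;
    (3) preloadClass and preloadResource, which load system classes and system resources ahead of time;
    (4) fork the System_server process, the first process Zygote forks;
    (5) once these tasks are done, Zygote goes to sleep and stands by; when a later request to create a new process arrives, it wakes up and does the work;
  • System_server: System_server starts and manages the whole Android Framework. Its main job is to start Android's system services, including AMS, PMS, and WMS;
  • Home Launcher: Once System_server has started all the system services, the system is ready, and System_server notifies the services by calling their systemReady() methods. When AMS is notified, it looks up the Activity that declares <category android:name = "android.intent.category.HOME" /> and starts it, asking Zygote to fork a new process for it. The Launcher app among the system apps declares this category, so Launcher is started;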

Launcher is Android's home-screen app. It provides the first UI the user sees and handles interaction with the user.

That is the complete Android boot process.

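A Closer Look at the Internals

Three stages are key in the Android boot process: init, Zygote, and System_server. The diagram below shows how these heavyweight processes relate to one another:

Next, we walk through the Android source code to explain how the boot process works.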

init

[init/init.cpp]

int main(int argc, char** argv) {
    if (!strcmp(basename(argv[0]), "ueventd")) {
        return ueventd_main(argc, argv);
    }

    if (!strcmp(basename(argv[0]), "watchdogd")) {
        return watchdogd_main(argc, argv);
    }

    if (argc > 1 && !strcmp(argv[1], "subcontext")) {
        InitKernelLogging(argv);
        const BuiltinFunctionMap function_map;
        return SubcontextMain(argc, argv, &function_map);
    }

    if (REBOOT_BOOTLOADER_ON_PANIC) {
        InstallRebootSignalHandlers();
    }

    bool is_first_stage = (getenv("INIT_SECOND_STAGE") == nullptr);

    if (is_first_stage) {
        boot_clock::time_point start_time = boot_clock::now();

        // Clear the umask.
        umask(0);

        clearenv();
        setenv("PATH", _PATH_DEFPATH, 1);
        mkdir("/dev/socket", 0755);

        // Mount staging areas for devices managed by vold
        // See storage config details at http://source.android.com/devices/storage/
        mount("tmpfs", "/mnt", "tmpfs", MS_NOEXEC | MS_NOSUID | MS_NODEV,
              "mode=0755,uid=0,gid=1000");
        // /mnt/vendor is used to mount vendor-specific partitions that can not be
        // part of the vendor partition, e.g. because they are mounted read-write.
        mkdir("/mnt/vendor", 0755);

        // Now that tmpfs is mounted on /dev and we have /dev/kmsg, we can actually
        // talk to the outside world...
        InitKernelLogging(argv);

        LOG(INFO) << "init first stage started!";

        if (!DoFirstStageMount()) {
            LOG(FATAL) << "Failed to mount required partitions early ...";
        }

        SetInitAvbVersionInRecovery();

        // Enable seccomp if global boot option was passed (otherwise it is enabled in zygote).
        global_seccomp();

        // Set up SELinux, loading the SELinux policy.
        SelinuxSetupKernelLogging();
        SelinuxInitialize();

        // We're in the kernel domain, so re-exec init to transition to the init domain now
        // that the SELinux policy has been loaded.
        if (selinux_android_restorecon("/init", 0) == -1) {
            PLOG(FATAL) << "restorecon failed of /init failed";
        }

        setenv("INIT_SECOND_STAGE", "true", 1);

        static constexpr uint32_t kNanosecondsPerMillisecond = 1e6;
        uint64_t start_ms = start_time.time_since_epoch().count() / kNanosecondsPerMillisecond;
        setenv("INIT_STARTED_AT", std::to_string(start_ms).c_str(), 1);

        char* path = argv[0];
        char* args[] = { path, nullptr };
        execv(path, args);

        // execv() only returns if an error happened, in which case we
        // panic and never fall through this conditional.
        PLOG(FATAL) << "execv(\"" << path << "\") failed";
    }

    // At this point we're in the second stage of init.
    InitKernelLogging(argv);
    LOG(INFO) << "init second stage started!";

    // Set up a session keyring that all processes will have access to. It
    // will hold things like FBE encryption keys. No process should override
    // its session keyring.
    keyctl_get_keyring_ID(KEY_SPEC_SESSION_KEYRING, 1);

    // Indicate that booting is in progress to background fw loaders, etc.
    close(open("/dev/.booting", O_WRONLY | O_CREAT | O_CLOEXEC, 0000));

    property_init();

    // If arguments are passed both on the command line and in DT,
    // properties set in DT always have priority over the command-line ones.
    process_kernel_dt();
    process_kernel_cmdline();

    // Propagate the kernel variables to internal variables
    // used by init as well as the current required properties.
    export_kernel_boot_props();

    // Make the time that init started available for bootstat to log.
    property_set("ro.boottime.init", getenv("INIT_STARTED_AT"));
    property_set("ro.boottime.init.selinux", getenv("INIT_SELINUX_TOOK"));

    // Set libavb version for Framework-only OTA match in Treble build.
    const char* avb_version = getenv("INIT_AVB_VERSION");
    if (avb_version) property_set("ro.boot.avb_version", avb_version);

    // Clean up our environment.
    unsetenv("INIT_SECOND_STAGE");
    unsetenv("INIT_STARTED_AT");
    unsetenv("INIT_SELINUX_TOOK");
    unsetenv("INIT_AVB_VERSION");

    // Now set up SELinux for second stage.
    SelinuxSetupKernelLogging();
    SelabelInitialize();
    SelinuxRestoreContext();

    epoll_fd = epoll_create1(EPOLL_CLOEXEC);
    if (epoll_fd == -1) {
        PLOG(FATAL) << "epoll_create1 failed";
    }

    sigchld_handler_init();

    if (!IsRebootCapable()) {
        // If init does not have the CAP_SYS_BOOT capability, it is running in a container.
        // In that case, receiving SIGTERM will cause the system to shut down.
        InstallSigtermHandler();
    }

    property_load_boot_defaults();
    export_oem_lock_status();
    start_property_service();
    set_usb_controller();

    const BuiltinFunctionMap function_map;
    Action::set_function_map(&function_map);

    subcontexts = InitializeSubcontexts();

    ActionManager& am = ActionManager::GetInstance();
    ServiceList& sm = ServiceList::GetInstance();

    LoadBootScripts(am, sm);

    // Turning this on and letting the INFO logging be discarded adds 0.2s to
    // Nexus 9 boot time, so it's disabled by default.
    if (false) DumpState();

    am.QueueEventTrigger("early-init");

    // Queue an action that waits for coldboot done so we know ueventd has set up all of /dev...
    am.QueueBuiltinAction(wait_for_coldboot_done_action, "wait_for_coldboot_done");
    // ... so that we can start queuing up actions that require stuff from /dev.
    am.QueueBuiltinAction(MixHwrngIntoLinuxRngAction, "MixHwrngIntoLinuxRng");
    am.QueueBuiltinAction(SetMmapRndBitsAction, "SetMmapRndBits");
    am.QueueBuiltinAction(SetKptrRestrictAction, "SetKptrRestrict");
    am.QueueBuiltinAction(keychord_init_action, "keychord_init");
    am.QueueBuiltinAction(console_init_action, "console_init");

    // Trigger all the boot actions to get us started.
    am.QueueEventTrigger("init");

    // Repeat mix_hwrng_into_linux_rng in case /dev/hw_random or /dev/random
    // wasn't ready immediately after wait_for_coldboot_done
    am.QueueBuiltinAction(MixHwrngIntoLinuxRngAction, "MixHwrngIntoLinuxRng");

    // Don't mount filesystems or start core system services in charger mode.
    std::string bootmode = GetProperty("ro.bootmode", "");
    if (bootmode == "charger") {
        am.QueueEventTrigger("charger");
    } else {
        am.QueueEventTrigger("late-init");
    }

    // Run all property triggers based on current state of the properties.
    am.QueueBuiltinAction(queue_property_triggers_action, "queue_property_triggers");

    while (true) {
        // By default, sleep until something happens.
        int epoll_timeout_ms = -1;

        if (do_shutdown && !shutting_down) {
            do_shutdown = false;
            if (HandlePowerctlMessage(shutdown_command)) {
                shutting_down = true;
            }
        }

        if (!(waiting_for_prop || Service::is_exec_service_running())) {
            am.ExecuteOneCommand();
        }
        if (!(waiting_for_prop || Service::is_exec_service_running())) {
            if (!shutting_down) {
                auto next_process_restart_time = RestartProcesses();

                // If there's a process that needs restarting, wake up in time for that.
                if (next_process_restart_time) {
                    epoll_timeout_ms = std::chrono::ceil<std::chrono::milliseconds>(
                                           *next_process_restart_time - boot_clock::now())
                                           .count();
                    if (epoll_timeout_ms < 0) epoll_timeout_ms = 0;
                }
            }

            // If there's more work to do, wake up again immediately.
            if (am.HasMoreCommands()) epoll_timeout_ms = 0;
        }

        epoll_event ev;
        int nr = TEMP_FAILURE_RETRY(epoll_wait(epoll_fd, &ev, 1, epoll_timeout_ms));
        if (nr == -1) {
            PLOG(ERROR) << "epoll_wait failed";
        } else if (nr == 1) {
            ((void (*)()) ev.data.ptr)();
        }
    }

    return 0;
}

The LoadBootScripts function is as follows:

static void LoadBootScripts(ActionManager& action_manager, ServiceList& service_list) {
    Parser parser = CreateParser(action_manager, service_list);

    std::string bootscript = GetProperty("ro.boot.init_rc", "");
    if (bootscript.empty()) {
        parser.ParseConfig("/init.rc");
        if (!parser.ParseConfig("/system/etc/init")) {
            late_import_paths.emplace_back("/system/etc/init");
        }
        if (!parser.ParseConfig("/product/etc/init")) {
            late_import_paths.emplace_back("/product/etc/init");
        }
        if (!parser.ParseConfig("/odm/etc/init")) {
            late_import_paths.emplace_back("/odm/etc/init");
        }
        if (!parser.ParseConfig("/vendor/etc/init")) {
            late_import_paths.emplace_back("/vendor/etc/init");
        }
    } else {
        parser.ParseConfig(bootscript);
    }
}

From the code above, the main responsibilities of the init process are:

  • parsing and running all the init.rc files;
  • creating device nodes;
  • providing the property service;
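The script-loading order in LoadBootScripts can be modeled as a small Java sketch. It is a simplified illustration under stated assumptions, not a port of init: BootScriptSketch, load, and the canParse predicate (standing in for the result of Parser::ParseConfig) are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of the path selection in LoadBootScripts.
class BootScriptSketch {
    // Directories tried after /init.rc, in the order shown in the source.
    static final String[] DEFAULT_DIRS = {
        "/system/etc/init", "/product/etc/init", "/odm/etc/init", "/vendor/etc/init"
    };

    final List<String> parsed = new ArrayList<>();      // paths handed to the parser
    final List<String> lateImports = new ArrayList<>(); // paths deferred for later

    // initRcProperty models the value of ro.boot.init_rc ("" when unset).
    // canParse models whether ParseConfig succeeds for a directory.
    void load(String initRcProperty, Predicate<String> canParse) {
        if (initRcProperty.isEmpty()) {
            parsed.add("/init.rc"); // always attempted first
            for (String dir : DEFAULT_DIRS) {
                if (canParse.test(dir)) {
                    parsed.add(dir);
                } else {
                    lateImports.add(dir); // retried once the partition is mounted
                }
            }
        } else {
            parsed.add(initRcProperty); // a custom script replaces the defaults
        }
    }

    public static void main(String[] args) {
        BootScriptSketch sketch = new BootScriptSketch();
        // Assume /odm is not mounted yet, so its directory cannot be parsed.
        sketch.load("", dir -> !dir.startsWith("/odm"));
        System.out.println("parsed: " + sketch.parsed);
        System.out.println("late imports: " + sketch.lateImports);
    }
}
```

The point of the deferred list is that a directory whose partition is not available yet is not dropped; it is remembered and imported later.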

Zygote

When init parses the following service definition, it starts Zygote:

service zygote /system/bin/app_process -Xzygote /system/bin --zygote --start-system-server
    # started together with the "main" class
    class main
    # create the zygote socket
    socket zygote stream 660 root system
    onrestart write /sys/android_power/request_state wake
    onrestart write /sys/power/state on
    onrestart restart media
    onrestart restart netd

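As you can see, Zygote is really just app_process under a different name. From here on, execution is inside the Zygote process.

After the Zygote process starts, it runs the main function in frameworks/base/cmds/app_process/app_main.cpp.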

int main(int argc, char* const argv[])
{
     ..... // condition checks
    AppRuntime runtime(argv[0], computeArgBlockSize(argc, argv));

    // parse the command line
    .....
    while (i < argc) {
       ..... // parse each argument
    }


    .....
    if (!niceName.isEmpty()) {
        runtime.setArgv0(niceName.string(), true /* setProcName */);
    }

    // two branches follow
    if (zygote) {
        runtime.start("com.android.internal.os.ZygoteInit", args, zygote);
    } else if (className) {
        runtime.start("com.android.internal.os.RuntimeInit", args, zygote);
    } else {
        fprintf(stderr, "Error: no class name or --zygote supplied.\n");
        app_usage();
        LOG_ALWAYS_FATAL("app_process: no class name or --zygote supplied.");
    }
}

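When the app_process process starts, there are two branches:

  • if zygote is true, it runs ZygoteInit
  • if zygote is false and a class name was supplied, it runs RuntimeInit

During Android system boot, zygote is always true, so execution reaches ZygoteInit.main().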
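The branch selection can be illustrated with a simplified Java sketch of the argument handling. It is an assumption-laden model, not the real app_main.cpp: AppProcessSketch and chooseStartClass are hypothetical names, argv here excludes the program name, and flags such as --start-system-server are simply skipped.

```java
// Hypothetical, simplified model of how app_process picks its start class.
class AppProcessSketch {
    static String chooseStartClass(String[] argv) {
        int i = 0;
        // Leading arguments that start with '-' are runtime options (e.g. -Xzygote).
        while (i < argv.length && argv[i].startsWith("-")) {
            i++;
        }
        i++; // skip the "parent dir" argument (e.g. /system/bin)

        boolean zygote = false;
        String className = null;
        while (i < argv.length) {
            String arg = argv[i++];
            if (arg.equals("--zygote")) {
                zygote = true;
            } else if (!arg.startsWith("--")) {
                className = arg; // first non-flag argument is the class to run
                break;
            }
            // Other "--" flags are ignored in this sketch.
        }

        if (zygote) {
            return "com.android.internal.os.ZygoteInit";
        } else if (className != null) {
            return "com.android.internal.os.RuntimeInit";
        }
        throw new IllegalArgumentException("no class name or --zygote supplied");
    }

    public static void main(String[] args) {
        String[] zygoteArgs = {"-Xzygote", "/system/bin", "--zygote", "--start-system-server"};
        // prints "com.android.internal.os.ZygoteInit"
        System.out.println(chooseStartClass(zygoteArgs));
    }
}
```

Feeding it the arguments from the zygote service line in init.rc selects the ZygoteInit branch, which matches the boot-time path described above.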

public static void main(String argv[]) {
        ZygoteServer zygoteServer = new ZygoteServer();
        .....
        try {
            .....
            // register the socket
            zygoteServer.registerServerSocketFromEnv(socketName);

            if (!enableLazyPreload) {
                .....
                preload(bootTimingsTraceLog); // preload classes and resources
                .....
            } 
  
              .....
            if (startSystemServer) {
                // fork System_server
                Runnable r = forkSystemServer(abiList, socketName, zygoteServer);
                // {@code r == null} in the parent (zygote) process, and {@code r != null} in the
                // child (system_server) process.
                if (r != null) {
                    r.run();
                    return;
                }
            }
            .....
            // job done; wait for requests
            caller = zygoteServer.runSelectLoop(abiList);
        } 
        .....
    }

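Zygote creates the Java virtual machine and registers the JNI methods, becoming the true parent of Java processes from which they are forked. After creating the system_server process, Zygote calls runSelectLoop() and stands by; whenever a request to create a new process arrives, it wakes up and does the work.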

System_server

Zygote creates the System_server process by forking, specifically through the forkSystemServer function:

private static Runnable forkSystemServer(String abiList, String socketName,
        ZygoteServer zygoteServer) {
    ...

    // fork the system_server child process
    pid = Zygote.forkSystemServer(
            parsedArgs.uid, parsedArgs.gid,
            parsedArgs.gids,
            parsedArgs.runtimeFlags,
            null,
            parsedArgs.permittedCapabilities,
            parsedArgs.effectiveCapabilities);
    ...

    if (pid == 0) {
        if (hasSecondZygote(abiList)) {
            waitForSecondaryZygote(socketName);
        }

        zygoteServer.closeServerSocket();
        // now running in the System_server process
        return handleSystemServerProcess(parsedArgs);
    }

        return null;
}

Next, look at the handleSystemServerProcess function:

private static Runnable handleSystemServerProcess(ZygoteConnection.Arguments parsedArgs) {
    ...
    if (parsedArgs.niceName != null) {
         // set the current process name to "system_server"
        Process.setArgV0(parsedArgs.niceName);
    }

    final String systemServerClasspath = Os.getenv("SYSTEMSERVERCLASSPATH");
    if (systemServerClasspath != null) {
        // run dex optimization, e.g. on services.jar
        performSystemServerDexOpt(systemServerClasspath);
        ......
    }

    if (parsedArgs.invokeWith != null) {
        ...
    } else {
        ClassLoader cl = null;
        if (systemServerClasspath != null) {
            cl = createPathClassLoader(systemServerClasspath, parsedArgs.targetSdkVersion);

            Thread.currentThread().setContextClassLoader(cl);
        }

        /*
         * Pass the remaining arguments to SystemServer.
         */
        return ZygoteInit.zygoteInit(parsedArgs.targetSdkVersion, parsedArgs.remainingArgs, cl);
    }
}

handleSystemServerProcess sets the process name, performs the dexopt work, and then calls ZygoteInit.zygoteInit:

public static final Runnable zygoteInit(int targetSdkVersion, String[] argv, ClassLoader classLoader) {

    Trace.traceBegin(Trace.TRACE_TAG_ACTIVITY_MANAGER, "RuntimeInit");
    RuntimeInit.redirectLogStreams(); // redirect log output

    RuntimeInit.commonInit(); // common initialization
    ZygoteInit.nativeZygoteInit(); // native zygote initialization
    return RuntimeInit.applicationInit(targetSdkVersion, argv, classLoader); 
}

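Here, after several layers of calls, applicationInit returns a MethodAndArgsCaller(m, argv) object. In the code shown it is returned as a Runnable rather than thrown as an exception. The code is: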

protected static Runnable applicationInit(int targetSdkVersion, String[] argv,
        ClassLoader classLoader) {
    ...
    VMRuntime.getRuntime().setTargetHeapUtilization(0.75f);
    VMRuntime.getRuntime().setTargetSdkVersion(targetSdkVersion);
    final Arguments args = new Arguments(argv);
    // find the static main() method of the target class
    return findStaticMain(args.startClass, args.startArgs, classLoader);
}

private static Runnable findStaticMain(String className, String[] argv,
            ClassLoader classLoader)  {
    // className here is SystemServer
    Class<?> cl = Class.forName(className, true, classLoader);
    Method  m = cl.getMethod("main", new Class[] { String[].class });
    // create the MethodAndArgsCaller object
    return new MethodAndArgsCaller(m, argv);
}

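This sets the virtual machine's target heap utilization to 0.75 and sets the target SDK version, and finally creates a MethodAndArgsCaller object. The class is defined as follows: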

static class MethodAndArgsCaller implements Runnable {
        /** method to call */
        private final Method mMethod;

        /** argument array */
        private final String[] mArgs;

        public MethodAndArgsCaller(Method method, String[] args) {
            mMethod = method;
            mArgs = args;
        }

        public void run() {
            try {
                mMethod.invoke(null, new Object[] { mArgs });
            } catch (IllegalAccessException ex) {
                throw new RuntimeException(ex);
            } catch (InvocationTargetException ex) {
                Throwable cause = ex.getCause();
                if (cause instanceof RuntimeException) {
                    throw (RuntimeException) cause;
                } else if (cause instanceof Error) {
                    throw (Error) cause;
                }
                throw new RuntimeException(ex);
            }
        }
    }

Here, run() calls mMethod.invoke, which uses reflection to invoke SystemServer.main().
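The same find-then-invoke pattern can be tried outside Android with a self-contained Java sketch. ReflectiveMainSketch, its nested Target class (a stand-in for SystemServer), and lastArg are hypothetical names used only for illustration.

```java
import java.lang.reflect.Method;

// Minimal sketch of "look up a static main() by class name, wrap it in a
// Runnable, run it later". It is not the Android implementation.
class ReflectiveMainSketch {
    static String lastArg; // records what the target main() received

    // Stand-in for SystemServer: any class with a public static main(String[]).
    public static class Target {
        public static void main(String[] args) {
            lastArg = args[0];
        }
    }

    // Resolve className.main(String[]) now; defer the actual call to run().
    static Runnable findStaticMain(String className, String[] argv) {
        final Method m;
        try {
            Class<?> cl = Class.forName(className);
            m = cl.getMethod("main", String[].class);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("Missing class or main(): " + className, e);
        }
        return () -> {
            try {
                // Cast to Object so the array is passed as ONE argument.
                m.invoke(null, (Object) argv);
            } catch (ReflectiveOperationException e) {
                throw new RuntimeException(e);
            }
        };
    }

    public static void main(String[] args) {
        Runnable caller = findStaticMain(Target.class.getName(), new String[] {"hello"});
        caller.run();
        System.out.println(lastArg); // prints "hello"
    }
}
```

Separating the lookup from the call means the caller can finish its own setup and unwind its stack before the target main() actually starts.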

public final class SystemServer {
    ...
    public static void main(String[] args) {
        // create the SystemServer object first, then call its run() method
        new SystemServer().run();
    }
}

private void run() {
  .....
  // load libandroid_servers.so; its sources live under frameworks/base/services/
  System.loadLibrary("android_servers");

  // check whether the previous shutdown failed; this call may not return
  performPendingShutdown();
  createSystemContext(); // initialize the system context

  // create the system service manager
  mSystemServiceManager = new SystemServiceManager(mSystemContext);
  LocalServices.addService(SystemServiceManager.class, mSystemServiceManager);

  // start the system services
  try {
      startBootstrapServices(); // start bootstrap services
      startCoreServices();      // start core services
      startOtherServices();     // start other services
  } catch (Throwable ex) {
      Slog.e("System", "************ Failure starting system services", ex);
      throw ex;
  }

  // loop forever
  Looper.loop();
  throw new RuntimeException("Main thread loop unexpectedly exited");  
}

The key calls here are startBootstrapServices(), startCoreServices(), and startOtherServices(); together these three functions start all of Android's system services.
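As a rough illustration of the three-phase ordering, here is a minimal Java sketch. PhasedStartupSketch and its Service interface are hypothetical; the service names are only examples of what each phase starts, and the real SystemServer starts far more services with real dependencies between them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of SystemServer's phased startup; not the real code.
class PhasedStartupSketch {
    interface Service {
        void onStart();
    }

    final List<String> started = new ArrayList<>(); // start order, for inspection

    // Start one service and record its name once onStart() has returned.
    void startService(String name, Service service) {
        service.onStart();
        started.add(name);
    }

    // Phases run strictly in this order; an exception aborts the sequence.
    void run() {
        startBootstrapServices();
        startCoreServices();
        startOtherServices();
    }

    void startBootstrapServices() {
        // Services that everything else depends on.
        startService("ActivityManagerService", () -> { });
        startService("PackageManagerService", () -> { });
    }

    void startCoreServices() {
        startService("BatteryService", () -> { });
    }

    void startOtherServices() {
        startService("WindowManagerService", () -> { });
    }

    public static void main(String[] args) {
        PhasedStartupSketch server = new PhasedStartupSketch();
        server.run();
        System.out.println(server.started);
    }
}
```

Grouping services into ordered phases lets later services assume that the ones they depend on are already running.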

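Once the system services have started, System_server signals that the system is ready (through the services' systemReady() calls), and AMS then launches the Home Launcher system app.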

This completes the analysis of how the Android boot process works.


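Summary

This article records my own study of the Android boot process: it starts from the architecture, breaks the boot sequence into its stages, and then examines the whole process from the perspective of the source code.

A follow-up article will analyze the Android App launch process.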


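A Note from the Author

I enjoy computer technology, mainly in these areas: Android system, Linux kernel, embedded software and hardware, robotics, and smart hardware. I am also interested in other technology stacks.

I also love life, enjoy it, and like recording it in photos and videos.

If you also love learning and technology, you are welcome to discuss with me; we may well become good friends. If you find anything wrong in this article, please point it out so that we can all learn and improve together.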
