Class Loading (Part 1)

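In the previous article, "Application Loading: How the dyld Dynamic Linker Works", we looked at the dynamic linker dyld and saw that dyld eventually reaches _objc_init, the initialization function of the objc library. This article analyzes that function.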

I. _objc_init

Let's start with the implementation of _objc_init:

/***********************************************************************
* _objc_init
* Bootstrap initialization. Registers our image notifier with dyld.
* Called by libSystem BEFORE library initialization time
**********************************************************************/

void _objc_init(void)
{
    static bool initialized = false;
    if (initialized) return;
    initialized = true;
    
    // fixme defer initialization until an objc-using image is found?
    environ_init();
    tls_init();
    static_init();
    lock_init();
    exception_init();

    _dyld_objc_notify_register(&map_images, load_images, unmap_image);
}

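As you can see, several other functions run before _dyld_objc_notify_register is called. Each is introduced briefly below.

<1> environ_init

environ_init reads the environment variables that affect the runtime. When asked to, it can also print every supported variable.

The code boxed in red in the screenshot above does the printing, but only under certain conditions. Here we manually move the printing ahead of those checks.

Running the project now prints every environment variable.

What are these environment variables for? Mainly debugging.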

As mentioned in an earlier article, an isa is either a nonpointer isa or a raw (non-nonpointer) isa.

isa is defined as follows:

union isa_t {
    isa_t() { }
    isa_t(uintptr_t value) : bits(value) { }

    Class cls;
    uintptr_t bits;
#if defined(ISA_BITFIELD)
    struct {
        ISA_BITFIELD;  // defined in isa.h
    };
#endif
};

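A raw isa is not optimized: it only stores the address of the class, through the cls member.

A nonpointer isa is optimized: it packs the class address together with other information into bits.

Note that cls and bits are mutually exclusive, because they share the same storage in the union: a raw isa uses only cls, and a nonpointer isa uses only bits.

So for an object with a raw isa, the first pointer-sized word at the object's address is equal to the address of its class.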
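The difference between the two kinds of isa can be sketched with a toy model. The bit layout below (one flag bit, one example info bit, class address in the remaining bits) is invented for illustration and is not Apple's real ISA_BITFIELD; ToyClass and ToyIsa are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Toy class record; alignas(8) keeps the low 3 bits of its address zero.
struct alignas(8) ToyClass { const char *name; };

// Toy isa word. The layout is an assumption made for this example only.
struct ToyIsa {
    static constexpr uintptr_t kNonpointerBit = 0x1;  // marks a packed isa
    static constexpr uintptr_t kHasAssocBit   = 0x2;  // one example of extra info
    static constexpr uintptr_t kClassMask     = ~static_cast<uintptr_t>(0x7);

    uintptr_t bits = 0;

    // Raw isa: the word is nothing but the class address.
    static ToyIsa raw(const ToyClass *cls) {
        ToyIsa isa;
        isa.bits = reinterpret_cast<uintptr_t>(cls);
        return isa;
    }

    // Packed isa: class address plus flag bits in the same word.
    static ToyIsa packed(const ToyClass *cls, bool hasAssoc) {
        ToyIsa isa;
        isa.bits = reinterpret_cast<uintptr_t>(cls) | kNonpointerBit
                 | (hasAssoc ? kHasAssocBit : static_cast<uintptr_t>(0));
        return isa;
    }

    bool isNonpointer() const { return (bits & kNonpointerBit) != 0; }

    // Works for both kinds: mask off the flag bits to recover the class.
    const ToyClass *getClass() const {
        return reinterpret_cast<const ToyClass *>(bits & kClassMask);
    }
};
```

With ToyIsa::raw(&cls) the stored word compares equal to the class address, while with ToyIsa::packed(&cls, true) it does not, and the class has to be recovered through getClass().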

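Whether nonpointer isa is used can be controlled with an environment variable. Since the environment variables were printed to the console earlier, we can simply search that output for "nonpointer".

Then go to Edit Scheme -> Run -> Arguments -> Environment Variables and set OBJC_DISABLE_NONPOINTER_ISA there to control whether the isa optimization is applied.

One more environment variable deserves special mention.

Setting OBJC_PRINT_LOAD_METHODS to YES prints every class that implements a +load method.

+load methods affect launch time, so when optimizing later we may need to know where +load is implemented. Setting the OBJC_PRINT_LOAD_METHODS environment variable gives us that list.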
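As a rough illustration of how environment variables can act as debugging switches, the sketch below reads a variable with getenv and treats the value YES as "on". The names toy_env_enabled, ToyOptions, and toy_read_options are made up for this example; this is not the runtime's own code.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper (not part of libobjc): returns true when the
// environment variable `name` is set to exactly "YES".
static bool toy_env_enabled(const char *name) {
    const char *value = std::getenv(name);
    return value != nullptr && std::strcmp(value, "YES") == 0;
}

// Toy switches, read once at startup.
struct ToyOptions {
    bool disableNonpointerIsa;
    bool printLoadMethods;
};

static ToyOptions toy_read_options() {
    return ToyOptions{
        toy_env_enabled("OBJC_DISABLE_NONPOINTER_ISA"),
        toy_env_enabled("OBJC_PRINT_LOAD_METHODS"),
    };
}
```

Reading the switches once and keeping them in plain booleans means later code only has to test a flag, not parse the environment again.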

<2> tls_init

void tls_init(void)
{
#if SUPPORT_DIRECT_THREAD_KEYS
    _objc_pthread_key = TLS_DIRECT_KEY;
    pthread_key_init_np(TLS_DIRECT_KEY, &_objc_pthread_destroyspecific);
#else
    _objc_pthread_key = tls_create(&_objc_pthread_destroyspecific);
#endif
}

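This function sets up thread-local storage (TLS) for the runtime, not a thread pool: it initializes the pthread key under which the runtime keeps its per-thread data, with _objc_pthread_destroyspecific registered as the destructor.

<3> static_init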

/***********************************************************************
* static_init
* Run C++ static constructor functions.
* libc calls _objc_init() before dyld would call our static constructors, 
* so we have to do it ourselves.
**********************************************************************/
static void static_init()
{
    size_t count;
    auto inits = getLibobjcInitializers(&_mh_dylib_header, &count);
    for (size_t i = 0; i < count; i++) {
        inits[i]();
    }
}

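static_init runs the C++ static constructors that belong to libobjc itself, that is, the system-level ones. C++ constructors defined in our own code are not run here.

As the source comment explains, _objc_init() is called before dyld would run these static constructors, so the runtime has to run them itself.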
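The loop in static_init is just "fetch a table of function pointers and call each one in order". A minimal sketch of that pattern, with toy_get_initializers as a hypothetical stand-in for getLibobjcInitializers and two made-up initializers:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using ToyInitializer = void (*)();

// Records the order in which the toy initializers ran.
static std::vector<int> toy_init_log;

static void toy_init_a() { toy_init_log.push_back(1); }
static void toy_init_b() { toy_init_log.push_back(2); }

// Hypothetical stand-in for getLibobjcInitializers: hands back a table of
// function pointers and its length.
static const ToyInitializer *toy_get_initializers(std::size_t *outCount) {
    static const ToyInitializer table[] = { toy_init_a, toy_init_b };
    *outCount = sizeof(table) / sizeof(table[0]);
    return table;
}

// Same loop shape as static_init: call each initializer in table order.
static void toy_static_init() {
    std::size_t count = 0;
    const ToyInitializer *inits = toy_get_initializers(&count);
    for (std::size_t i = 0; i < count; i++) {
        inits[i]();
    }
}
```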

<4> lock_init

void lock_init(void)
{
}

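The body of lock_init is empty in this version of the source. The runtime's locks appear to be set up elsewhere rather than in this function, so we won't dig into it.

<5> exception_init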

/***********************************************************************
* exception_init
* Initialize libobjc's exception handling system.
* Called by map_images().
**********************************************************************/
void exception_init(void)
{
    old_terminate = std::set_terminate(&_objc_terminate);
}

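exception_init initializes libobjc's exception handling system. In practice this means registering a callback that is invoked when an exception goes unhandled.

When a system method hits a problem during execution it raises an exception. The runtime catches it and exposes an interface to the upper layers so that developers can handle these low-level exceptions.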
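The registration relies on std::set_terminate, which installs a new terminate handler and returns the previous one so that the new handler can chain to it. A minimal sketch of that pattern follows; toy_terminate and toy_exception_init are hypothetical names, and the handler here does no exception inspection at all.

```cpp
#include <cassert>
#include <cstdlib>
#include <exception>

// Previously installed handler, saved so the new one can chain to it.
static std::terminate_handler toy_old_terminate = nullptr;

// Hypothetical handler: a real one would inspect the in-flight exception
// first; this one only forwards to the previous handler.
static void toy_terminate() {
    if (toy_old_terminate) toy_old_terminate();
    std::abort();  // a terminate handler must not return
}

// Install our handler and remember the old one.
static void toy_exception_init() {
    toy_old_terminate = std::set_terminate(&toy_terminate);
}
```

After toy_exception_init() runs, std::get_terminate() returns toy_terminate, and the handler that was active before is still reachable through toy_old_terminate.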

exception_init does not call _objc_terminate directly; it installs it as the terminate handler:

static void (*old_terminate)(void) = nil;
static void _objc_terminate(void)
{
    if (PrintExceptions) {
        _objc_inform("EXCEPTIONS: terminating");
    }

    if (! __cxa_current_exception_type()) {
        // No current exception.
        (*old_terminate)();
    }
    else {
        // There is a current exception. Check if it's an objc exception.
        @try {
            __cxa_rethrow();
        } @catch (id e) {
            // It's an objc object. Call Foundation's handler, if any.
            (*uncaught_handler)((id)e);
            (*old_terminate)();
        } @catch (...) {
            // It's not an objc object. Continue to C++ terminate.
            (*old_terminate)();
        }
    }
}

The key line here is the call in the @catch (id e) branch: (*uncaught_handler)((id)e);

It passes the caught Objective-C exception to a callback.

Here is how uncaught_handler is defined:

static void _objc_default_uncaught_exception_handler(id exception)
{
}
static objc_uncaught_exception_handler uncaught_handler = _objc_default_uncaught_exception_handler;

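As we can see, uncaught_handler is given a default empty implementation.

Next, let's do a global search for uncaught_handler.

The search shows that objc_setUncaughtExceptionHandler can assign a new value to uncaught_handler. So from the outside we can pass our own function fn to objc_setUncaughtExceptionHandler; when the runtime catches an exception internally, it hands the exception to fn, which lets us deal with it in our own code.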
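The "default no-op handler plus a setter" arrangement described above can be sketched as follows. Everything prefixed toy_ or Toy is hypothetical; in particular the exception is modeled as a C string, whereas the real handler receives an Objective-C object.

```cpp
#include <cassert>
#include <cstring>

// Toy exception payload; the real handler receives an Objective-C object (id).
using ToyException = const char *;
using ToyUncaughtHandler = void (*)(ToyException);

// Default handler does nothing.
static void toy_default_handler(ToyException) {}

static ToyUncaughtHandler toy_uncaught_handler = toy_default_handler;

// Hypothetical setter: installs fn and returns the handler it replaced.
static ToyUncaughtHandler toy_set_uncaught_handler(ToyUncaughtHandler fn) {
    ToyUncaughtHandler previous = toy_uncaught_handler;
    toy_uncaught_handler = fn;
    return previous;
}

// Runtime side: forward an exception nobody handled to the current handler.
static void toy_report_uncaught(ToyException e) {
    (*toy_uncaught_handler)(e);
}

// Client side: remember the last exception that was reported.
static ToyException toy_last_reported = nullptr;
static void toy_client_handler(ToyException e) { toy_last_reported = e; }
```

Because the default handler is a real function rather than a null pointer, the runtime side can always call through the pointer without a null check.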

<6> _dyld_objc_notify_register

_dyld_objc_notify_register is the main subject of this article.

//
// Note: only for use by objc runtime
// Register handlers to be called when objc images are mapped, unmapped, and initialized.
// Dyld will call back the "mapped" function with an array of images that contain an objc-image-info section.
// Those images that are dylibs will have the ref-counts automatically bumped, so objc will no longer need to
// call dlopen() on them to keep them from being unloaded.  During the call to _dyld_objc_notify_register(),
// dyld will call the "mapped" function with already loaded objc images.  During any later dlopen() call,
// dyld will also call the "mapped" function.  Dyld will call the "init" function when dyld would be called
// initializers in that image.  This is when objc calls any +load methods in that image.
//
void _dyld_objc_notify_register(_dyld_objc_notify_mapped    mapped,
                                _dyld_objc_notify_init      init,
                                _dyld_objc_notify_unmapped  unmapped);

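Note that _dyld_objc_notify_register is meant to be used only by the Objective-C runtime.

It takes three parameters:

mapped (&map_images is passed in): called when dyld maps an image into memory
init (load_images is passed in): called when dyld initializes an image
unmapped (unmap_image is passed in): called when dyld removes an image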

Let's recap the key steps in dyld:

recursiveInitialization calls bool hasInitializers = this->doInitialization(context); which runs the image's initializers and reports whether the image had any.
doInitialization calls doModInitFunctions(context). For libSystem this runs libSystem_initializer, which eventually reaches _objc_init.
_objc_init calls _dyld_objc_notify_register, which passes map_images, load_images, and unmap_image on to dyld's registerObjCNotifiers.
registerObjCNotifiers stores map_images in sNotifyObjCMapped, load_images in sNotifyObjCInit, and unmap_image in sNotifyObjCUnmapped.
After storing them, registerObjCNotifiers calls notifyBatchPartial().
notifyBatchPartial calls (*sNotifyObjCMapped)(objcImageCount, paths, mhs); which triggers map_images.
In recursiveInitialization, dyld also calls notifySingle() for the image being initialized.
notifySingle() calls (*sNotifyObjCInit)(image->getRealPath(), image->machHeader()); and since load_images was stored in sNotifyObjCInit, this triggers load_images.
sNotifyObjCUnmapped is invoked from removeImage, which, as the name suggests, removes an image (a mapped image file).
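The registration flow recapped above can be modeled in a few lines: one side stores three callbacks and immediately reports the images that are already loaded, then fires the other callbacks as images are initialized or removed. This is a sketch under simplifying assumptions; ToyDyld and all toy_ functions are placeholders, and images are reduced to path strings.

```cpp
#include <cassert>
#include <string>
#include <vector>

using ToyMapped   = void (*)(const std::vector<std::string> &paths);
using ToyInit     = void (*)(const std::string &path);
using ToyUnmapped = void (*)(const std::string &path);

// Toy "dyld" side. All names here are placeholders for this sketch.
struct ToyDyld {
    std::vector<std::string> loaded;   // images already mapped
    ToyMapped   notifyMapped   = nullptr;
    ToyInit     notifyInit     = nullptr;
    ToyUnmapped notifyUnmapped = nullptr;

    // Store the callbacks, then report the images that are already loaded.
    void registerNotifiers(ToyMapped m, ToyInit i, ToyUnmapped u) {
        notifyMapped = m;
        notifyInit = i;
        notifyUnmapped = u;
        if (!loaded.empty()) notifyMapped(loaded);
    }

    // Called as part of initializing one image.
    void initializeImage(const std::string &path) {
        if (notifyInit) notifyInit(path);
    }

    // Called when an image is removed.
    void removeImage(const std::string &path) {
        if (notifyUnmapped) notifyUnmapped(path);
    }
};

// Toy "objc" side: log which callback fired.
static std::vector<std::string> toy_events;

static void toy_map_images(const std::vector<std::string> &paths) {
    toy_events.push_back("map:" + std::to_string(paths.size()));
}
static void toy_load_images(const std::string &path) {
    toy_events.push_back("load:" + path);
}
static void toy_unmap_image(const std::string &path) {
    toy_events.push_back("unmap:" + path);
}
```

Registering with two images already loaded, then initializing and removing one of them, produces the events map, load, unmap in that order.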

II. map_images

First, the implementation of map_images:

void
map_images(unsigned count, const char * const paths[],
           const struct mach_header * const mhdrs[])
{
    mutex_locker_t lock(runtimeLock);
    return map_images_nolock(count, paths, mhdrs);
}

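It calls map_images_nolock.

The key call there is _read_images. As its name suggests, this function reads the images. It is an important function and worth remembering by name, so we continue into its source.

_read_images is more than 400 lines long, so it is not listed in full here; download the objc source and search for _read_images to find it. The more important parts are pulled out and analyzed separately below.

<1> Initializing the tables

The doneOnce variable ensures that the table initialization here runs only once.

Information read from the image, such as classes, protocols, methods, and categories, is stored in memory in table structures.

NXCreateMapTable is used here to create a table; the tables mainly hold the structures read from the Mach-O image as described above.

Two tables are actually created here, gdb_objc_realized_classes and allocatedClasses. Let's compare them: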

// This is a misnomer: gdb_objc_realized_classes is actually a list of 
// named classes not in the dyld shared cache, whether realized or not.
NXMapTable *gdb_objc_realized_classes;  // exported for debuggers in objc-gdb.h

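gdb_objc_realized_classes holds every named class that is not in the dyld shared cache, whether or not the class has been realized yet. As the source comment notes, the name is therefore a misnomer.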

/***********************************************************************
* allocatedClasses
* A table of all classes (and metaclasses) which have been allocated
* with objc_allocateClassPair.
**********************************************************************/
static NXHashTable *allocatedClasses = nil;

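allocatedClasses, according to its source comment, holds every class (and metaclass) that has been allocated with objc_allocateClassPair. It is about class structures being allocated, not about instances of a class having been created.

<2> The overall structure of _read_images

From the first step we know that the tables are created the first time through. What happens next? The source is very long, so I wrote a simplified version of it.

In broad strokes, it does the following: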
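A rough sketch of that overall shape, limited to the phases this article discusses and in the order it discusses them. Every function here is a placeholder that only logs its own name; the real phases are far longer and there are more of them.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Log of the phases that ran, in order.
static std::vector<std::string> toy_phases;
static int toy_table_inits = 0;

// Placeholder phases for this sketch.
static void toy_init_tables()     { toy_table_inits++; toy_phases.push_back("tables"); }
static void toy_read_classes()    { toy_phases.push_back("classes"); }
static void toy_remap_classes()   { toy_phases.push_back("remap"); }
static void toy_fixup_selectors() { toy_phases.push_back("selectors"); }
static void toy_read_protocols()  { toy_phases.push_back("protocols"); }
static void toy_realize_nonlazy() { toy_phases.push_back("realize"); }

static void toy_read_images() {
    // One-time setup, guarded by a static flag in the style of doneOnce.
    static bool doneOnce = false;
    if (!doneOnce) {
        doneOnce = true;
        toy_init_tables();
    }

    // These phases run on every call, once per batch of images.
    toy_read_classes();
    toy_remap_classes();
    toy_fixup_selectors();
    toy_read_protocols();
    toy_realize_nonlazy();
}
```

Calling toy_read_images() twice initializes the tables once but runs the per-image phases twice.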

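<3> Reading classes

The first step above created the tables; the next step is to fill them.

Classes are inserted first.

First, _getObjc2ClassList returns the class list.

Then the list is traversed. Each iteration takes the class address at the current index and reads the class's information through that address.

The reading is done by readClass.

Inside readClass, the most important operation is inserting the class into the table.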
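Reduced to its core, the class-reading step is "walk the list by index and register each class under its name". A toy version, assuming a plain name-to-class map in place of the runtime's table; ToyNamedClass, toy_read_class, and toy_read_class_list are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

struct ToyNamedClass { std::string name; };

// Toy name -> class table, standing in for the table that gets filled.
static std::unordered_map<std::string, ToyNamedClass *> toy_named_classes;

// Hypothetical, heavily reduced read step: register the class under its name.
static ToyNamedClass *toy_read_class(ToyNamedClass *cls) {
    toy_named_classes[cls->name] = cls;
    return cls;
}

// Walk the class list by index and read each entry.
static void toy_read_class_list(const std::vector<ToyNamedClass *> &classList) {
    for (std::size_t i = 0; i < classList.size(); i++) {
        toy_read_class(classList[i]);
    }
}
```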

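After the classes are read, they may be remapped. This branch is normally not taken, so we won't go into it here.

<4> Fixing up SELs

Here every SEL read from the Mach-O static section is registered in another hash table. We will study this in detail later; for now it is enough to know that it happens.

When writing code, we call a method like this: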

[norman play];

Here play is really just a string. After compilation, the hash table described above is what ties the string play to its corresponding SEL.
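The idea of tying a name to one unique selector can be sketched with a set of uniqued strings: registering the same name twice yields the same pointer, so selectors can then be compared by address. ToySel and toy_register_name are hypothetical and only model the idea, not the runtime's actual table.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// A toy selector is a pointer to a uniqued name.
using ToySel = const std::string *;

// Hypothetical stand-in for a selector table.
static ToySel toy_register_name(const std::string &name) {
    static std::unordered_set<std::string> table;
    // insert() returns the existing element when the name is already present;
    // element addresses in an unordered_set stay valid across rehashing.
    return &*table.insert(name).first;
}
```

Two registrations of "play" return the same ToySel, while "stop" gets a different one.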

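<5> Reading protocols

<6> Realizing (i.e. initializing) non-lazy classes

First, _getObjc2NonlazyClassList reads the non-lazy class list from the corresponding static section of the Mach-O binary image; the result is received as a classref_t pointer.

Then the list is traversed and each class in it is realized with realizeClassWithoutSwift.

So the heart of class realization is realizeClassWithoutSwift. Let's study it, starting with the source: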

/***********************************************************************
* realizeClassWithoutSwift
* Performs first-time initialization on class cls, 
* including allocating its read-write data.
* Does not perform any Swift-side initialization.
* Returns the real class structure for the class. 
* Locking: runtimeLock must be write-locked by the caller
**********************************************************************/
static Class realizeClassWithoutSwift(Class cls)
{
    runtimeLock.assertLocked();

    const class_ro_t *ro;
    class_rw_t *rw;
    Class supercls;
    Class metacls;
    bool isMeta;

    if (!cls) return nil;
    if (cls->isRealized()) return cls;
    assert(cls == remapClass(cls));

    // fixme verify class is not in an un-dlopened part of the shared cache?

    ro = (const class_ro_t *)cls->data();
    if (ro->flags & RO_FUTURE) {
        // This was a future class. rw data is already allocated.
        rw = cls->data();
        ro = cls->data()->ro;
        cls->changeInfo(RW_REALIZED|RW_REALIZING, RW_FUTURE);
    } else {
        // Normal class. Allocate writeable class data.
        // rw is allocated and zeroed here; apart from ro and flags, its fields are not filled in yet
        rw = (class_rw_t *)calloc(sizeof(class_rw_t), 1);
        rw->ro = ro;
        rw->flags = RW_REALIZED|RW_REALIZING;
        cls->setData(rw);
    }

    isMeta = ro->flags & RO_META;

    rw->version = isMeta ? 7 : 0;  // old runtime went up to 6


    // Choose an index for this class.
    // Sets cls->instancesRequireRawIsa if indexes no more indexes are available
    cls->chooseClassArrayIndex();

    if (PrintConnecting) {
        _objc_inform("CLASS: realizing class '%s'%s %p %p #%u %s%s",
                     cls->nameForLogging(), isMeta ? " (meta)" : "", 
                     (void*)cls, ro, cls->classArrayIndex(),
                     cls->isSwiftStable() ? "(swift)" : "",
                     cls->isSwiftLegacy() ? "(pre-stable swift)" : "");
    }

    // Realize the superclass and metaclass recursively
    // Realize superclass and metaclass, if they aren't already.
    // This needs to be done after RW_REALIZED is set above, for root classes.
    // This needs to be done after class index is chosen, for root metaclasses.
    // This assumes that none of those classes have Swift contents,
    //   or that Swift's initializers have already been called.
    //   fixme that assumption will be wrong if we add support
    //   for ObjC subclasses of Swift classes.
    supercls = realizeClassWithoutSwift(remapClass(cls->superclass));
    metacls = realizeClassWithoutSwift(remapClass(cls->ISA()));

#if SUPPORT_NONPOINTER_ISA
    // Disable non-pointer isa for some classes and/or platforms.
    // Set instancesRequireRawIsa.
    bool instancesRequireRawIsa = cls->instancesRequireRawIsa();
    bool rawIsaIsInherited = false;
    static bool hackedDispatch = false;

    if (DisableNonpointerIsa) {
        // Non-pointer isa disabled by environment or app SDK version
        instancesRequireRawIsa = true;
    }
    else if (!hackedDispatch  &&  !(ro->flags & RO_META)  &&  
             0 == strcmp(ro->name, "OS_object")) 
    {
        // hack for libdispatch et al - isa also acts as vtable pointer
        hackedDispatch = true;
        instancesRequireRawIsa = true;
    }
    else if (supercls  &&  supercls->superclass  &&  
             supercls->instancesRequireRawIsa()) 
    {
        // This is also propagated by addSubclass() 
        // but nonpointer isa setup needs it earlier.
        // Special case: instancesRequireRawIsa does not propagate 
        // from root class to root metaclass
        instancesRequireRawIsa = true;
        rawIsaIsInherited = true;
    }
    
    if (instancesRequireRawIsa) {
        cls->setInstancesRequireRawIsa(rawIsaIsInherited);
    }
// SUPPORT_NONPOINTER_ISA
#endif

    // Write the (possibly remapped) superclass and metaclass back into cls
    // Update superclass and metaclass in case of remapping
    cls->superclass = supercls;
    cls->initClassIsa(metacls);

    // Reconcile instance variable offsets / layout.
    // This may reallocate class_ro_t, updating our ro variable.
    if (supercls  &&  !isMeta) reconcileInstanceVariables(cls, supercls, ro);

    // Set fastInstanceSize if it wasn't set already.
    cls->setInstanceSize(ro->instanceSize);

    // Copy some flags from ro to rw
    if (ro->flags & RO_HAS_CXX_STRUCTORS) {
        cls->setHasCxxDtor();
        if (! (ro->flags & RO_HAS_CXX_DTOR_ONLY)) {
            cls->setHasCxxCtor();
        }
    }
    
    // Propagate the associated objects forbidden flag from ro or from
    // the superclass.
    if ((ro->flags & RO_FORBIDS_ASSOCIATED_OBJECTS) ||
        (supercls && supercls->forbidsAssociatedObjects()))
    {
        rw->flags |= RW_FORBIDS_ASSOCIATED_OBJECTS;
    }

    // Connect this class to its superclass's subclass lists
    if (supercls) {
        addSubclass(supercls, cls);
    } else {
        addRootClass(cls);
    }

    // Attach categories
    methodizeClass(cls);

    return cls;
}

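In the realizeClassWithoutSwift source above:

The else branch for a normal class (from calloc to cls->setData(rw)) allocates rw and initializes only its ro and flags fields. Everything else in rw, such as properties and protocols, is still unset at this point.

The two recursive calls, realizeClassWithoutSwift(remapClass(cls->superclass)) and realizeClassWithoutSwift(remapClass(cls->ISA())), realize the superclass and the metaclass.

The two checks near the top, if (!cls) return nil; and if (cls->isRealized()) return cls;, are the exit conditions of the recursion.

The assignments cls->superclass = supercls; and cls->initClassIsa(metacls); write the superclass and metaclass back into the class. Why is that needed? supercls and metacls are the results of realizing remapClass(...) of the original pointers, so they can differ from what the class stored before. The class must end up pointing at the realized, possibly remapped, classes.

The if (supercls) ... else ... block near the end links the class into its superclass's subclass list, or registers it as a root class when it has no superclass.

The final call, methodizeClass(cls), is also very important. This is where rw really starts to be filled in.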
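The recursion and its exit conditions can be isolated in a toy model: mark the class as realized first, then recurse into the superclass and the metaclass, and stop on a null pointer or an already realized class. ToyRealizableClass and toy_realize are hypothetical; none of the rw/ro work is modeled.

```cpp
#include <cassert>

struct ToyRealizableClass {
    const char *name;
    ToyRealizableClass *superclass;  // nullptr for a root class
    ToyRealizableClass *metaclass;   // a root metaclass points at itself
    bool realized;
};

// Counts how many classes were actually realized (not merely visited).
static int toy_realize_count = 0;

static ToyRealizableClass *toy_realize(ToyRealizableClass *cls) {
    if (!cls) return nullptr;        // exit: no class
    if (cls->realized) return cls;   // exit: already realized

    // Mark before recursing, so cycles such as a root metaclass
    // whose metaclass is itself terminate.
    cls->realized = true;
    toy_realize_count++;

    toy_realize(cls->superclass);
    toy_realize(cls->metaclass);
    return cls;
}
```

Realizing one subclass pulls in its metaclass, its superclass, and the root metaclass, each exactly once; a second call on the same class does no further work.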

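Let's step into the source of methodizeClass.

From the source we can see that methodizeClass takes the contents of ro, which were read from the Mach-O data, and attaches them to rw.

A natural question at this point: why not just use ro directly? Why is rw needed, along with this extra copying step?

Next, let's do a global search for attachLists, the function that performs the copy.

Besides methodizeClass, attachLists is called from other places as well, which means content is added to rw elsewhere too. When does that happen? Apart from class initialization as just described, it happens for protocols, for categories, and for additions made dynamically at runtime. In other words, ro (read-only) is the original data read from the Mach-O image; it is clean, and it is read-only precisely so that it cannot be polluted. rw (read-write) starts from ro and can have further content added to it dynamically.

Another question: do methods, properties, and protocols share the same data structure? They must, or else how could all three be extended through the same attachLists function? Different data structures would need different handling.

The explanation is that the method, property, and protocol containers have different names but share one underlying structure: each is a list of lists (method_list_t, property_list_t, and protocol_list_t being the inner lists), in effect a two-dimensional array, and attachLists is written once for that shared structure.

Content is added to rw through attachLists. Here is its implementation: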

    void attachLists(List* const * addedLists, uint32_t addedCount) {
        if (addedCount == 0) return;

        if (hasArray()) {
            // There were already several lists; more are being added
            uint32_t oldCount = array()->count;
            uint32_t newCount = oldCount + addedCount;
            setArray((array_t *)realloc(array(), array_t::byteSize(newCount)));
            array()->count = newCount;
            memmove(array()->lists + addedCount, array()->lists, 
                    oldCount * sizeof(array()->lists[0]));
            memcpy(array()->lists, addedLists, 
                   addedCount * sizeof(array()->lists[0]));
        }
        else if (!list  &&  addedCount == 1) {
            // There was nothing before; exactly one list is being added
            list = addedLists[0];
        } 
        else {
            // There was at most one list before; switch to an array
            List* oldList = list;
            uint32_t oldCount = oldList ? 1 : 0;
            uint32_t newCount = oldCount + addedCount;
            setArray((array_t *)malloc(array_t::byteSize(newCount)));
            array()->count = newCount;
            if (oldList) array()->lists[addedCount] = oldList;
            memcpy(array()->lists, addedLists, 
                   addedCount * sizeof(array()->lists[0]));
        }
    }

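attachLists handles three cases:

There were already several lists and more are being added: the array is first grown with realloc, the existing lists are moved (memmove) to the end of the enlarged array, and the new lists are copied (memcpy) to the front.
There was nothing before and exactly one list is being added: it is stored directly in list, with no array involved.
There was at most one list before and the result needs an array: an array is allocated with malloc, the old list (if any) is placed at index addedCount, right after the slots reserved for the new lists, and the new lists are copied (memcpy) to the front.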
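The common outcome of all three cases is that newly attached lists end up in front of the existing ones. The toy model below reproduces only that ordering, using std::vector instead of manual realloc/memmove and without the single-pointer optimization; ToyList and ToyListArray are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

using ToyList = std::vector<std::string>;   // one list, e.g. one method list

// Toy model of a list of lists. It always uses a vector and does not
// reproduce the single-pointer case of the real code.
struct ToyListArray {
    std::vector<const ToyList *> lists;

    // New lists go in front of the existing ones, keeping their own order.
    void attachLists(const std::vector<const ToyList *> &added) {
        if (added.empty()) return;
        lists.insert(lists.begin(), added.begin(), added.end());
    }

    // Lookup walks front to back, so the most recently attached list wins.
    const ToyList *findListContaining(const std::string &name) const {
        for (const ToyList *list : lists) {
            for (const std::string &entry : *list) {
                if (entry == name) return list;
            }
        }
        return nullptr;
    }
};
```

Attaching a base list and then two later lists leaves the later ones at indexes 0 and 1 and the base list at index 2, so a front-to-back search finds a later entry named play before the base list's entry of the same name.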

That's all.
