mold源码阅读九 未解析符号的处理

2023-10-22 16:29:51 浏览数 (2)

本期内容主要是claim_unresolved_symbols的部分,其次是其他一些简单的处理

claim_unresolved_symbols

代码语言:javascript复制
// If we are linking a .so file, remaining undefined symbols does
// not cause a linker error. Instead, they are treated as if they
// were imported symbols.
//
// If we are linking an executable, weak undefs are converted to
// weakly imported symbols so that they'll have another chance to be
// resolved.
claim_unresolved_symbols(ctx);
代码语言:javascript复制
template <typename E>
void claim_unresolved_symbols(Context<E> &ctx) {
  Timer t(ctx, "claim_unresolved_symbols");
  tbb::parallel_for_each(ctx.objs, [&](ObjectFile<E> *file) {
    file->claim_unresolved_symbols(ctx);
  });
}

这个函数主要还是针对需要在链接期就确定定义的符号进行检查,针对部分符号产生一些修改,在这个过程之后,不会再有符号发生新的变动了

对so来说undef是可以存在的,因此将避免报错,将undef的符号转换为imported,并且修改相关信息。

但是如果是protected或者hidden的符号即便链接了运行时也无法访问到,此时即便是undef也无法再在运行时找到定义,因此需要在链接时确定定义。也正因为这些条件,这里只需要对global符号做检查即可。

以下是具体处理过程

代码语言:javascript复制
template <typename E>
void ObjectFile<E>::claim_unresolved_symbols(Context<E> &ctx) {
  if (!this->is_alive)
    return;

	...
  for (i64 i = this->first_global; i < this->elf_syms.size(); i  ) {
    const ElfSym<E> &esym = this->elf_syms[i];
    Symbol<E> &sym = *this->symbols[i];
    if (!esym.is_undef())
      continue;

    std::scoped_lock lock(sym.mu);

    // If a protected/hidden undefined symbol is resolved to an
    // imported symbol, it's handled as if no symbols were found.
    if (sym.file && sym.file->is_dso &&
        (sym.visibility == STV_PROTECTED || sym.visibility == STV_HIDDEN)) {
      report_undef(sym);
      continue;
    }

    if (sym.file &&
        (!sym.esym().is_undef() || sym.file->priority <= this->priority))
      continue;

    // If a symbol name is in the form of "foo@version", search for
    // symbol "foo" and check if the symbol has version "version".
    std::string_view key = this->symbol_strtab.data()   esym.st_name;
    if (i64 pos = key.find('@'); pos != key.npos) {
      Symbol<E> *sym2 = get_symbol(ctx, key.substr(0, pos));
      if (sym2->file && sym2->file->is_dso &&
          sym2->get_version() == key.substr(pos   1)) {
        this->symbols[i] = sym2;
        continue;
      }
    }
		
		...
    if (esym.is_undef_weak()) {
      if (ctx.arg.shared && sym.visibility != STV_HIDDEN &&
          ctx.arg.z_dynamic_undefined_weak) {
        // Global weak undefined symbols are promoted to dynamic symbols
        // when when linking a DSO, unless `-z nodynamic_undefined_weak`
        // was given.
        claim(true);
      } else {
        // Otherwise, weak undefs are converted to absolute symbols with value 0.
        claim(false);
      }
      continue;
    }

    if (ctx.arg.unresolved_symbols == UNRESOLVED_WARN)
      report_undef(sym);

    // Traditionally, remaining undefined symbols cause a link failure
    // only when we are creating an executable. Undefined symbols in
    // shared objects are promoted to dynamic symbols, so that they'll
    // get another chance to be resolved at run-time. You can change the
    // behavior by passing `-z defs` to the linker.
    //
    // Even if `-z defs` is given, weak undefined symbols are still
    // promoted to dynamic symbols for compatibility with other linkers.
    // Some major programs, notably Firefox, depend on the behavior
    // (they use this loophole to export symbols from libxul.so).
    if (ctx.arg.shared && sym.visibility != STV_HIDDEN &&
        (!ctx.arg.z_defs || ctx.arg.unresolved_symbols != UNRESOLVED_ERROR)) {
      claim(true);
      continue;
    }

    // Convert remaining undefined symbols to absolute symbols with value 0.
    if (ctx.arg.unresolved_symbols != UNRESOLVED_ERROR || ctx.arg.noinhibit_exec)
      claim(false);
  }
}

如同上面所说,整个过程描述如下

从全局符号开始,先跳过了已经有定义的esym

将protected和hidden的符号进行报错

对esym对应位置的sym进行判断,如果sym所对应的esym是有定义的也跳过。

这种情况是esym实际的定义在其他位置,sym是esym resolve的结果

解析符号名,如果带有版本信息则再次尝试进行重新将esym和sym进行关联。这个关联体现在esym对应index的symbols重新设置值

代码语言:javascript复制
if (sym2->file && sym2->file->is_dso &&
    sym2->get_version() == key.substr(pos   1)) {
  this->symbols[i] = sym2;
  continue;
}

针对undef_weak进行claim

剩下的undef的符号在创建executable的时候导致链接失败,但在dso中会被提升为dynamic symbols

claim和report_undef的实现

代码语言:javascript复制
auto report_undef = [&](Symbol<E> &sym) {
  std::stringstream ss;
  if (std::string_view source = this->get_source_name(); !source.empty())
    ss << ">>> referenced by " << source << "n";
  else
    ss << ">>> referenced by " << *this << "n";

  typename decltype(ctx.undef_errors)::accessor acc;
  ctx.undef_errors.insert(acc, {sym.name(), {}});
  acc->second.push_back(ss.str());
};

// tbb::concurrent_hash_map<std::string_view, std::vector<std::string>> undef_errors;
代码语言:javascript复制
auto claim = [&](bool is_imported) {
  if (sym.traced)
    SyncOut(ctx) << "trace-symbol: " << *this << ": unresolved"
                 << (esym.is_weak() ? " weak" : "")
                 << " symbol " << sym;

  sym.file = this;
  sym.origin = 0;
  sym.value = 0;
  sym.sym_idx = i;
  sym.is_weak = false;
  sym.is_imported = is_imported;
  sym.is_exported = false;
  sym.ver_idx = is_imported ? 0 : ctx.default_version;
};

print dependencies

代码语言:javascript复制
// Handle --print-dependencies
if (ctx.arg.print_dependencies == 1)
  print_dependencies(ctx);
else if (ctx.arg.print_dependencies == 2)
  print_dependencies_full(ctx);

针对所有的obj和dso打印其依赖,那么具体怎么样才算依赖呢?在一个obj a里面,有一个未定义的符号,链接的时候另一个obj b包含了这个符号的定义,那么这就算是a依赖b。

代码语言:javascript复制
template <typename E>
void print_dependencies(Context<E> &ctx) {
  SyncOut(ctx) <<
R"(# This is an output of the mold linker's --print-dependencies option.
#
# Each line consists of three fields, <file1>, <file2> and <symbol>
# separated by tab characters. It indicates that <file1> depends on
# <file2> to use <symbol>.)";

  auto print = [&](InputFile<E> *file) {
    for (i64 i = file->first_global; i < file->elf_syms.size(); i  ) {
      ElfSym<E> &esym = file->elf_syms[i];
      Symbol<E> &sym = *file->symbols[i];
      if (esym.is_undef() && sym.file && sym.file != file)
        SyncOut(ctx) << *file << "t" << *sym.file << "t" << sym;
    }
  };

  for (InputFile<E> *file : ctx.objs)
    print(file);
  for (InputFile<E> *file : ctx.dsos)
    print(file);
}

这种是最简单的遍历所有文件打印其依赖,包含了obj a,obj b以及对应符号的名字

代码语言:javascript复制
template <typename E>
void print_dependencies_full(Context<E> &ctx) {
  SyncOut(ctx) <<
R"(# This is an output of the mold linker's --print-dependencies=full option.
#
# Each line consists of 4 fields, <section1>, <section2>, <symbol-type> and
# <symbol>, separated by tab characters. It indicates that <section1> depends
# on <section2> to use <symbol>. <symbol-type> is either "u" or "w" for
# regular undefined or weak undefined, respectively.
#
# If you want to obtain dependency information per function granularity,
# compile source files with the -ffunction-sections compiler flag.)";

  auto println = [&](auto &src, Symbol<E> &sym, ElfSym<E> &esym) {
    if (InputSection<E> *isec = sym.get_input_section())
      SyncOut(ctx) << src << "t" << *isec
                   << "t" << (esym.is_weak() ? 'w' : 'u')
                   << "t" << sym;
    else
      SyncOut(ctx) << src << "t" << *sym.file
                   << "t" << (esym.is_weak() ? 'w' : 'u')
                   << "t" << sym;
  };

  for (ObjectFile<E> *file : ctx.objs) {
    for (std::unique_ptr<InputSection<E>> &isec : file->sections) {
      if (!isec)
        continue;

      std::unordered_set<void *> visited;

      for (const ElfRel<E> &r : isec->get_rels(ctx)) {
        if (r.r_type == R_NONE)
          continue;

        ElfSym<E> &esym = file->elf_syms[r.r_sym];
        Symbol<E> &sym = *file->symbols[r.r_sym];

        if (esym.is_undef() && sym.file && sym.file != file &&
            visited.insert((void *)&sym).second)
          println(*isec, sym, esym);
      }
    }
  }

  for (SharedFile<E> *file : ctx.dsos) {
    for (i64 i = file->first_global; i < file->symbols.size(); i  ) {
      ElfSym<E> &esym = file->elf_syms[i];
      Symbol<E> &sym = *file->symbols[i];
      if (esym.is_undef() && sym.file && sym.file != file)
        println(*file, sym, esym);
    }
  }
}

这种更复杂一些,不仅打印依赖,还包含了符号到底是undefined还是weak这一信息。

另外遍历objs的时候还针对每个obj遍历InputSection及其包含的rel,根据这些信息来进行打印。

遍历dsos的判断条件则是和上面最简单的打印是相同的。

write_repro_file

代码语言:javascript复制
// Handle -repro
if (ctx.arg.repro)
  write_repro_file(ctx);

–repro Embed input files to .repro section

repro file是Reproducible Example Routine file的简称,包含最小可复现用例,用于调试。具体写的过程并非这里关注的重点,有兴趣可以自行查看更多细节,这里只简单看一下由哪些部分组成。

代码语言:javascript复制
template <typename E>
void write_repro_file(Context<E> &ctx) {
  std::string path = ctx.arg.output   ".repro.tar";

  std::unique_ptr<TarWriter> tar =
    TarWriter::open(path, filepath(ctx.arg.output).filename().string()   ".repro");
  if (!tar)
    Fatal(ctx) << "cannot open " << path << ": " << errno_string();

  tar->append("response.txt", save_string(ctx, create_response_file(ctx)));
  tar->append("version.txt", save_string(ctx, mold_version   "n"));

  std::unordered_set<std::string> seen;
  for (std::unique_ptr<MappedFile<Context<E>>> &mf : ctx.mf_pool) {
    if (!mf->parent) {
      std::string path = to_abs_path(mf->name).string();
      if (seen.insert(path).second) {
        // We reopen a file because we may have modified the contents of mf
        // in memory, which is mapped with PROT_WRITE and MAP_PRIVATE.
        MappedFile<Context<E>> *mf2 = MappedFile<Context<E>>::must_open(ctx, path);
        tar->append(path, mf2->get_contents());
        mf2->unmap();
      }
    }
  }
}

template <typename E>
static std::string create_response_file(Context<E> &ctx) {
  std::string buf;
  std::stringstream out;

  std::string cwd = std::filesystem::current_path().string();
  out << "-C " << cwd.substr(1) << "n";

  if (cwd != "/") {
    out << "--chroot ..";
    i64 depth = std::count(cwd.begin(), cwd.end(), '/');
    for (i64 i = 1; i < depth; i  )
      out << "/..";
    out << "n";
  }

  for (i64 i = 1; i < ctx.cmdline_args.size(); i  ) {
    std::string_view arg = ctx.cmdline_args[i];
    if (arg != "-repro" && arg != "--repro")
      out << arg << "n";
  }
  return out.str();
}

根据代码我们得知,主要分为三部分

  1. response_file,本质上是编译命令以及参数
  2. mold的version info
  3. 所有的输入文件

也就表示这三者就是确定问题的必要条件,另外还可以认为执行到这里之后符号不会再发生什么改动,也不会产生新的用户引发的问题(比如说少链接文件,或者什么参数错了导致符号决议出问题等)

required-defined

代码语言:javascript复制
// Handle --require-defined
for (std::string_view name : ctx.arg.require_defined)
  if (!get_symbol(ctx, name)->file)
    Error(ctx) << "--require-defined: undefined symbol: " << name;

强制要求某些符号是必须在链接时就包含定义的,对这些符号进行检查并且进行报错。

0 人点赞