golang源码分析:boltdb(3)

2023-09-06 19:29:54 浏览数 (1)

可用的空闲页号信息存储在freelist中,具体位于freelist.go文件中,定义如下:

代码语言:javascript复制
// freelist represents a list of all pages that are available for allocation.
// It also tracks pages that have been freed but are still in use by open transactions.
type freelist struct {
  ids     []pgid          // all free and available free page ids.
  pending map[txid][]pgid // mapping of soon-to-be free page ids by tx.
  cache   map[pgid]bool   // fast lookup of all free and pending page ids.
}

它的三个属性分别表示所有空闲可用的页,刚刚事务中被释放的可用页,但是事务还没提交,可能还被读事务在用的空闲页,以及每个页id到是否空闲的映射关系。

代码语言:javascript复制
// newFreelist returns an empty, initialized freelist.
func newFreelist() *freelist {
  return &freelist{
    pending: make(map[txid][]pgid),
    cache:   make(map[pgid]bool),
  }
}

加载一个磁盘页后会进行reindex操作,过程中会对空闲页id进行排序,并建立是否可用的映射关系

代码语言:javascript复制
    func (f *freelist) read(p *page) {
            ids := ((*[maxAllocSize]pgid)(unsafe.Pointer(&p.ptr)))[idx:count]
    f.ids = make([]pgid, len(ids))
    copy(f.ids, ids)


    // Make sure they're sorted.
    sort.Sort(pgids(f.ids))
        f.reindex()
代码语言:javascript复制
// reindex rebuilds the free cache based on available and pending free lists.
func (f *freelist) reindex() {
  f.cache = make(map[pgid]bool, len(f.ids))
  for _, id := range f.ids {
    f.cache[id] = true
  }
  for _, pendingIDs := range f.pending {
    for _, pendingID := range pendingIDs {
      f.cache[pendingID] = true
    }
  }
}

bolt_amd64.go中可以看到最大可以分配空间是2G

代码语言:javascript复制
// maxAllocSize is the size used when creating array pointers.
const maxAllocSize = 0x7FFFFFFF

逻辑视图上是通过Bucket管理数据的关联关系的,关系是一个树形结构,具体的定义和数据解析位于bucket.go

代码语言:javascript复制
func (b *Bucket) dereference() {
  if b.rootNode != nil {
    b.rootNode.root().dereference()
  }


  for _, child := range b.buckets {
    child.dereference()
  }
}
代码语言:javascript复制
// Bucket represents a collection of key/value pairs inside the database.
type Bucket struct {
  *bucket
  tx       *Tx                // the associated transaction
  buckets  map[string]*Bucket // subbucket cache
  page     *page              // inline page reference
  rootNode *node              // materialized node for the root page.
  nodes    map[pgid]*node     // node cache


  // Sets the threshold for filling nodes when they split. By default,
  // the bucket will fill to 50% but it can be useful to increase this
  // amount if you know that your write workloads are mostly append-only.
  //
  // This is non-persisted across transactions so it must be set in every Tx.
  FillPercent float64
}

每一个Bucket在物理存储上都是一个B 树,它包含了root节点 pageid和全局递增的sequence, 意味着每一个Bucket的都不一样。Bucket上存了关联事务的指针,用map存储了key到子Bucket的映射关联。满足内联的条件有两个

  • 该子Bucket再没有嵌套的子Bucket了。
  • 整个子Bucket的大小不能超过page size/4。

如果上述两个条件都符合,就不会为这个Bucket单独申请一个页,而是通过内联的方式,将多个Bucket存储到一个页里面,以节省空间。所以Bucket里面缓存了内联页的指针。

它同时也存储了根页面的指针,以及pageid到node页的映射缓存。

代码语言:javascript复制
// bucket represents the on-file representation of a bucket.
// This is stored as the "value" of a bucket key. If the bucket is small enough,
// then its root page can be stored inline in the "value", after the bucket
// header. In the case of inline buckets, the "root" will be 0.
type bucket struct {
  root     pgid   // page id of the bucket's root-level page
  sequence uint64 // monotonically incrementing, used by NextSequence()
}

其中node代表了一个具体的磁盘页,它的定义位于node.go

代码语言:javascript复制
// node represents an in-memory, deserialized page.

type node struct {

  bucket     *Bucket

  isLeaf     bool

  unbalanced bool

  spilled    bool

  key        []byte

  pgid       pgid

  parent     *node

  children   nodes

  inodes     inodes

}

其中key []byte和bucket *Bucket,分别代表存储的key和对应Bucket,parent指向了父亲页,children指向了孩子页列表,共同构成了一个B 树,inodes指向了页内存储的数据列表。每一个项都可以表示为

代码语言:javascript复制
type inode struct {
  flags uint32
  pgid  pgid
  key   []byte
  value []byte
}
        / inode represents an internal node inside of a node.
// It can be used to point to elements in a page or point
// to an element which hasn't been added to a page yet.
代码语言:javascript复制
type inodes []inode

key和value分别代表了页中存储的键值对信息。node的解析过程如下:

代码语言:javascript复制
// dereference causes the node to copy all its inode key/value references to heap memory.
// This is required when the mmap is reallocated so inodes are not pointing to stale data.
func (n *node) dereference() {
  if n.key != nil {
    key := make([]byte, len(n.key))
    copy(key, n.key)
    n.key = key
    _assert(n.pgid == 0 || len(n.key) > 0, "dereference: zero-length node key on existing node")
  }
          for i := range n.inodes {
    inode := &n.inodes[i]


    key := make([]byte, len(inode.key))
    copy(key, inode.key)
    inode.key = key
    _assert(len(inode.key) > 0, "dereference: zero-length inode key")


    value := make([]byte, len(inode.value))
    copy(value, inode.value)
    inode.value = value
  }
          for _, child := range n.children {
    child.dereference()
  }


  // Update statistics.
  n.bucket.tx.stats.NodeDeref  

整个过程是一个递归的过程,获取祖先节点的过程也是一样的:

代码语言:javascript复制
func (n *node) root() *node {
  if n.parent == nil {
    return n
  }
  return n.parent.root()
}

0 人点赞