Golang source code analysis: go-json (1)

2023-09-06 19:19:15

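https://github.com/goccy/go-json got a relatively late start, but it borrows heavily from the ideas in json-iterator/go and adds a series of optimizations of its own. What exactly does it optimize? Let's look at serialization first: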

1. Buffer reuse: small objects are cached in a sync.Pool, so that during a whole run of json.Marshal(interface{}) ([]byte, error) the function allocates only a single []byte.
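The buffer-reuse idea can be sketched as follows. This is a minimal illustration, not go-json's actual code: `bufPool` and `marshalInts` are hypothetical names, and the encoder only handles a slice of ints. The scratch buffer comes from a `sync.Pool`, and the only allocation that outlives the call is the copied result.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
)

// bufPool is a hypothetical pool of reusable scratch buffers.
// Pointers to slices are stored so that Put does not allocate.
var bufPool = sync.Pool{
	New: func() any {
		b := make([]byte, 0, 1024)
		return &b
	},
}

// marshalInts encodes vals as a JSON array using a pooled scratch buffer.
func marshalInts(vals []int) []byte {
	bp := bufPool.Get().(*[]byte)
	buf := (*bp)[:0] // reuse the capacity, discard old contents

	buf = append(buf, '[')
	for i, v := range vals {
		if i > 0 {
			buf = append(buf, ',')
		}
		buf = strconv.AppendInt(buf, int64(v), 10)
	}
	buf = append(buf, ']')

	// Copy the result out: this is the one []byte handed to the caller.
	out := make([]byte, len(buf))
	copy(out, buf)

	*bp = buf // keep any capacity growth for the next caller
	bufPool.Put(bp)
	return out
}

func main() {
	fmt.Println(string(marshalInts([]int{1, 2, 3})))
}
```

The result is copied because the pooled buffer will be overwritten by the next caller; returning it directly would hand out memory that is about to be reused.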

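2. Eliminating reflection: the idea is that the address of a type's information is fixed, so that address can be used to prepare and cache work ahead of time. An interface value consists of two pointers, one to the type and one to the data, so a struct with the same two pointer fields can be defined to stand in for an interface: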

type emptyInterface struct {
    typ unsafe.Pointer
    ptr unsafe.Pointer
}

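With that, we can build a mapping from type to encoding function, where the type key is taken from the first field (typ) of the emptyInterface struct above.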

var typeToEncoder = map[uintptr]func(unsafe.Pointer)([]byte, error){}
func Marshal(v interface{}) ([]byte, error) {
    iface := (*emptyInterface)(unsafe.Pointer(&v))
    typeptr := uintptr(iface.typ)
    if enc, exists := typeToEncoder[typeptr]; exists {
        return enc(iface.ptr)
    }
    ...
}

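The type lookup relies on the address conversion through the emptyInterface type we defined. typeToEncoder can be reused across goroutines. Through this conversion we obtain the type information from an address, which avoids using reflection.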
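A runnable sketch of the lookup described above is shown here. It assumes the two-word interface layout that the current Go runtime uses, and `register`, `typePtr`, and `marshal` are illustrative names rather than go-json's API. Only `int` and `string` encoders are registered.

```go
package main

import (
	"fmt"
	"strconv"
	"unsafe"
)

// emptyInterface mirrors the runtime layout of interface{} (an assumption
// about the current Go runtime, not a documented guarantee).
type emptyInterface struct {
	typ unsafe.Pointer
	ptr unsafe.Pointer
}

type encoder func(p unsafe.Pointer) []byte

// typeToEncoder is filled during init and only read afterwards.
var typeToEncoder = map[uintptr]encoder{}

// typePtr returns the address of v's type information.
func typePtr(v interface{}) uintptr {
	return uintptr((*emptyInterface)(unsafe.Pointer(&v)).typ)
}

// register associates the dynamic type of sample with enc (hypothetical helper).
func register(sample interface{}, enc encoder) {
	typeToEncoder[typePtr(sample)] = enc
}

func init() {
	register(0, func(p unsafe.Pointer) []byte {
		return strconv.AppendInt(nil, int64(*(*int)(p)), 10)
	})
	register("", func(p unsafe.Pointer) []byte {
		return strconv.AppendQuote(nil, *(*string)(p))
	})
}

// marshal dispatches on the type address without calling into reflect.
func marshal(v interface{}) ([]byte, error) {
	iface := (*emptyInterface)(unsafe.Pointer(&v))
	if enc, ok := typeToEncoder[uintptr(iface.typ)]; ok {
		return enc(iface.ptr), nil
	}
	return nil, fmt.Errorf("no encoder registered for %T", v)
}

func main() {
	b, err := marshal(42)
	fmt.Println(string(b), err)
	b, err = marshal("hi")
	fmt.Println(string(b), err)
	_, err = marshal(1.5)
	fmt.Println(err)
}
```

The map here is written only during `init`, so concurrent reads afterwards are safe; a cache that grows at run time would need synchronization.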

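3. Avoiding escape to the heap: normally we would use the reflect package to obtain the type dynamically, but reflect.Type is an interface, and when reflect.TypeOf is called to obtain the type, its argument escapes.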

type Type interface {
  // Methods applicable to all types.

  // Align returns the alignment in bytes of a value of
  // this type when allocated in memory.
  Align() int

  // FieldAlign returns the alignment in bytes of a value of
  // this type when used as a field in a struct.
  FieldAlign() int

  // Method returns the i'th method in the type's method set.
  // It panics if i is not in the range [0, NumMethod()).
  //
  // For a non-interface type T or *T, the returned Method's Type and Func
  // fields describe a function whose first argument is the receiver,
  // and only exported methods are accessible.
  //
  // For an interface type, the returned Method's Type field gives the
  // method signature, without a receiver, and the Func field is nil.
  //
  // Methods are sorted in lexicographic order.
  Method(int) Method

  // MethodByName returns the method with that name in the type's
  // method set and a boolean indicating if the method was found.
  //
  // For a non-interface type T or *T, the returned Method's Type and Func
  // fields describe a function whose first argument is the receiver.
  //
  // For an interface type, the returned Method's Type field gives the
  // method signature, without a receiver, and the Func field is nil.
  MethodByName(string) (Method, bool)

  // NumMethod returns the number of methods accessible using Method.
  //
  // For a non-interface type, it returns the number of exported methods.
  //
  // For an interface type, it returns the number of exported and unexported methods.
  NumMethod() int

  // Name returns the type's name within its package for a defined type.
  // For other (non-defined) types it returns the empty string.
  Name() string

  // PkgPath returns a defined type's package path, that is, the import path
  // that uniquely identifies the package, such as "encoding/base64".
  // If the type was predeclared (string, error) or not defined (*T, struct{},
  // []int, or A where A is an alias for a non-defined type), the package path
  // will be the empty string.
  PkgPath() string

  // Size returns the number of bytes needed to store
  // a value of the given type; it is analogous to unsafe.Sizeof.
  Size() uintptr

  // String returns a string representation of the type.
  // The string representation may use shortened package names
  // (e.g., base64 instead of "encoding/base64") and is not
  // guaranteed to be unique among types. To test for type identity,
  // compare the Types directly.
  String() string

  // Kind returns the specific kind of this type.
  Kind() Kind

  // Implements reports whether the type implements the interface type u.
  Implements(u Type) bool

  // AssignableTo reports whether a value of the type is assignable to type u.
  AssignableTo(u Type) bool

  // ConvertibleTo reports whether a value of the type is convertible to type u.
  // Even if ConvertibleTo returns true, the conversion may still panic.
  // For example, a slice of type []T is convertible to *[N]T,
  // but the conversion will panic if its length is less than N.
  ConvertibleTo(u Type) bool

  // Comparable reports whether values of this type are comparable.
  // Even if Comparable returns true, the comparison may still panic.
  // For example, values of interface type are comparable,
  // but the comparison will panic if their dynamic type is not comparable.
  Comparable() bool

  // Methods applicable only to some types, depending on Kind.
  // The methods allowed for each kind are:
  //
  //  Int*, Uint*, Float*, Complex*: Bits
  //  Array: Elem, Len
  //  Chan: ChanDir, Elem
  //  Func: In, NumIn, Out, NumOut, IsVariadic.
  //  Map: Key, Elem
  //  Pointer: Elem
  //  Slice: Elem
  //  Struct: Field, FieldByIndex, FieldByName, FieldByNameFunc, NumField

  // Bits returns the size of the type in bits.
  // It panics if the type's Kind is not one of the
  // sized or unsized Int, Uint, Float, or Complex kinds.
  Bits() int

  // ChanDir returns a channel type's direction.
  // It panics if the type's Kind is not Chan.
  ChanDir() ChanDir

  // IsVariadic reports whether a function type's final input parameter
  // is a "..." parameter. If so, t.In(t.NumIn() - 1) returns the parameter's
  // implicit actual type []T.
  //
  // For concreteness, if t represents func(x int, y ... float64), then
  //
  //  t.NumIn() == 2
  //  t.In(0) is the reflect.Type for "int"
  //  t.In(1) is the reflect.Type for "[]float64"
  //  t.IsVariadic() == true
  //
  // IsVariadic panics if the type's Kind is not Func.
  IsVariadic() bool

  // Elem returns a type's element type.
  // It panics if the type's Kind is not Array, Chan, Map, Pointer, or Slice.
  Elem() Type

  // Field returns a struct type's i'th field.
  // It panics if the type's Kind is not Struct.
  // It panics if i is not in the range [0, NumField()).
  Field(i int) StructField

  // FieldByIndex returns the nested field corresponding
  // to the index sequence. It is equivalent to calling Field
  // successively for each index i.
  // It panics if the type's Kind is not Struct.
  FieldByIndex(index []int) StructField

  // FieldByName returns the struct field with the given name
  // and a boolean indicating if the field was found.
  FieldByName(name string) (StructField, bool)

  // FieldByNameFunc returns the struct field with a name
  // that satisfies the match function and a boolean indicating if
  // the field was found.
  //
  // FieldByNameFunc considers the fields in the struct itself
  // and then the fields in any embedded structs, in breadth first order,
  // stopping at the shallowest nesting depth containing one or more
  // fields satisfying the match function. If multiple fields at that depth
  // satisfy the match function, they cancel each other
  // and FieldByNameFunc returns no match.
  // This behavior mirrors Go's handling of name lookup in
  // structs containing embedded fields.
  FieldByNameFunc(match func(string) bool) (StructField, bool)

  // In returns the type of a function type's i'th input parameter.
  // It panics if the type's Kind is not Func.
  // It panics if i is not in the range [0, NumIn()).
  In(i int) Type

  // Key returns a map type's key type.
  // It panics if the type's Kind is not Map.
  Key() Type

  // Len returns an array type's length.
  // It panics if the type's Kind is not Array.
  Len() int

  // NumField returns a struct type's field count.
  // It panics if the type's Kind is not Struct.
  NumField() int

  // NumIn returns a function type's input parameter count.
  // It panics if the type's Kind is not Func.
  NumIn() int

  // NumOut returns a function type's output parameter count.
  // It panics if the type's Kind is not Func.
  NumOut() int

  // Out returns the type of a function type's i'th output parameter.
  // It panics if the type's Kind is not Func.
  // It panics if i is not in the range [0, NumOut()).
  Out(i int) Type

  common() *rtype
  uncommon() *uncommonType
}
func TypeOf(i any) Type {
  eface := *(*emptyInterface)(unsafe.Pointer(&i))
  return toType(eface.typ)
}

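As a result, the arguments of Marshal and Unmarshal always escape to the heap. go-json exploits a property of reflect.Type to avoid this. reflect.Type is defined as an interface, but in practice it is implemented only by the rtype struct. For that reason, a reflect.Type and a *reflect.rtype point to the same data at the same address.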

// rtype is the common implementation of most values.
// It is embedded in other struct types.
//
// rtype must be kept in sync with ../runtime/type.go:/^type._type.
type rtype struct {
  size       uintptr
  ptrdata    uintptr // number of bytes in the type that can contain pointers
  hash       uint32  // hash of type; avoids computation in hash tables
  tflag      tflag   // extra type information flags
  align      uint8   // alignment of variable with this type
  fieldAlign uint8   // alignment of struct field with this type
  kind       uint8   // enumeration for C
  // function for comparing objects of this type
  // (ptr to object A, ptr to object B) -> ==?
  equal     func(unsafe.Pointer, unsafe.Pointer) bool
  gcdata    *byte   // garbage collection data
  str       nameOff // string form
  ptrToThis typeOff // type for pointer to this type, may be zero
}
type nameOff int32 // offset to a name
type typeOff int32 // offset to an *rtype
type textOff int32 // offset from top of text section

So by using *reflect.rtype directly in place of reflect.Type, the escape can be avoided, because a struct pointer is used instead of an interface.

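However, when the data being marshaled is large and the argument cannot be placed on the stack, it may still escape to the heap. To use this feature, call the NoEscape variants, for example MarshalNoEscape().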

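4. Encoding through an opcode sequence: the typeptr is used to invoke a procedure that was prepared in advance. In other packages this step is a function call, for example a call to an anonymous function. But function calls are inherently slow and should be avoided. go-json adopts an instruction-based execution system, the same kind of system used to implement virtual machines for programming languages. The first time a type is processed, the opcode sequence for encoding or decoding it is built. From the second time on, the cached opcode sequence that typeptr points to is used. For example, when encoding a struct, its opcode sequence looks like this: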

- opStructFieldHead ( `{` )
- opStructFieldInt ( `"x": 1,` )
- opStructFieldString ( `"y": "hello"` )
- opStructEnd ( `}` )
- opEnd

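Each opcode corresponds directly to an inlined sequence of instructions, which avoids function calls and yields a large performance gain. The encoding function therefore ends up looking like this: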

func encode(code *opcode, b []byte, p unsafe.Pointer) ([]byte, error) {
    for {
        switch code.op {
        case opStructFieldHead:
            b = append(b, '{')
            code = code.next
        case opStructFieldInt:
            b = append(b, code.key...)
            b = appendInt((*int)(unsafe.Pointer(uintptr(p)+code.offset)))
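            code = code.next
        // ... cases for the remaining opcodes (opStructFieldString, opStructEnd) follow the same pattern
        case opEnd:
            return b, nil
        }
    }
}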

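5. Optimizing the opcode sequence: one benefit of using opcodes is that opcode elimination can be applied to reduce their number. For example, the opcode sequence above can be simplified to: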

- opStructFieldHeadInt ( `{"x": 1,` )
- opStructEndString ( `"y": "hello"}` )
- opEnd

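Reducing the number of opcodes means reducing the number of switch-case branches that have to be taken. In other words, the closer the number of opcodes is to 1, the faster the processing.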
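The opcode loop and the fused sequence can be put side by side in a small runnable sketch. Everything here is a simplification under stated assumptions: the `opcode` struct, the `link` helper, and the `point` type are hypothetical, and `strconv.AppendQuote` stands in for real JSON string escaping. The plain program uses five opcodes and the fused one three, yet both produce the same bytes.

```go
package main

import (
	"fmt"
	"strconv"
	"unsafe"
)

type opType int

const (
	opStructFieldHead opType = iota
	opStructFieldInt
	opStructFieldString
	opStructEnd
	opStructFieldHeadInt // fused: opStructFieldHead + opStructFieldInt
	opStructEndString    // fused: opStructFieldString + opStructEnd
	opEnd
)

// opcode is a hypothetical, simplified instruction.
type opcode struct {
	op     opType
	key    string  // pre-rendered key, e.g. `"x":`
	offset uintptr // field offset inside the struct
	next   *opcode
}

type point struct {
	X int
	Y string
}

// link chains the opcodes into a sequence and returns its head.
func link(codes ...*opcode) *opcode {
	for i := 0; i+1 < len(codes); i++ {
		codes[i].next = codes[i+1]
	}
	return codes[0]
}

// encode runs the opcode sequence against the struct at p in a single loop.
func encode(code *opcode, b []byte, p unsafe.Pointer) []byte {
	for {
		switch code.op {
		case opStructFieldHead:
			b = append(b, '{')
		case opStructFieldInt:
			b = append(b, code.key...)
			b = strconv.AppendInt(b, int64(*(*int)(unsafe.Add(p, code.offset))), 10)
			b = append(b, ',')
		case opStructFieldString:
			b = append(b, code.key...)
			b = strconv.AppendQuote(b, *(*string)(unsafe.Add(p, code.offset)))
		case opStructEnd:
			b = append(b, '}')
		case opStructFieldHeadInt:
			b = append(b, '{')
			b = append(b, code.key...)
			b = strconv.AppendInt(b, int64(*(*int)(unsafe.Add(p, code.offset))), 10)
			b = append(b, ',')
		case opStructEndString:
			b = append(b, code.key...)
			b = strconv.AppendQuote(b, *(*string)(unsafe.Add(p, code.offset)))
			b = append(b, '}')
		case opEnd:
			return b
		}
		code = code.next
	}
}

func plainProgram() *opcode {
	return link(
		&opcode{op: opStructFieldHead},
		&opcode{op: opStructFieldInt, key: `"x":`, offset: unsafe.Offsetof(point{}.X)},
		&opcode{op: opStructFieldString, key: `"y":`, offset: unsafe.Offsetof(point{}.Y)},
		&opcode{op: opStructEnd},
		&opcode{op: opEnd},
	)
}

func fusedProgram() *opcode {
	return link(
		&opcode{op: opStructFieldHeadInt, key: `"x":`, offset: unsafe.Offsetof(point{}.X)},
		&opcode{op: opStructEndString, key: `"y":`, offset: unsafe.Offsetof(point{}.Y)},
		&opcode{op: opEnd},
	)
}

func encodePoint(prog *opcode, pt *point) string {
	return string(encode(prog, nil, unsafe.Pointer(pt)))
}

func main() {
	pt := &point{X: 1, Y: "hello"}
	fmt.Println(encodePoint(plainProgram(), pt))
	fmt.Println(encodePoint(fusedProgram(), pt))
}
```

The fused cases repeat the bodies of the cases they replace on purpose: the saving comes from taking fewer trips through the loop and the switch, not from sharing code.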

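6. Changing recursive calls from CALL to JMP: in the go-json package, recursive processing is carried out by the opStructFieldRecursive operation. Once the opcode sequence to be executed recursively has been obtained, the function is not called recursively. Instead, the necessary values are saved in the opcode itself, and the function call is emulated by moving to the next instruction that should be executed.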

Implementing recursion with JMP in order to avoid the CALL instruction is a well-known technique for building a high-speed virtual machine.
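The CALL-to-JMP idea can be illustrated with a self-referential type. The following is a sketch under assumptions, not go-json's implementation: the opcode names other than the concept of a recursive opcode are hypothetical, the program is a plain slice indexed by a program counter, and the saved state lives on an explicit stack rather than in the opcode. When the recursive opcode is reached, the encoder saves where to resume and jumps back to the start of the program instead of calling itself.

```go
package main

import (
	"fmt"
	"strconv"
)

type node struct {
	Val  int
	Next *node
}

type op int

const (
	opNodeHead     op = iota // writes `{"val":N,"next":`
	opRecursive              // encodes Next by jumping back to the start
	opStructEnd              // writes `}`
	opRecursiveEnd           // returns to the saved position, or finishes
)

// program is the (hypothetical) opcode sequence for the node type.
var program = []op{opNodeHead, opRecursive, opStructEnd, opRecursiveEnd}

// frame is the state saved in place of a real call frame.
type frame struct {
	retPC int
	n     *node
}

// encodeList encodes a linked list without any recursive Go call.
func encodeList(n *node) string {
	if n == nil {
		return "null"
	}
	var b []byte
	var stack []frame
	pc := 0
	for {
		switch program[pc] {
		case opNodeHead:
			b = append(b, `{"val":`...)
			b = strconv.AppendInt(b, int64(n.Val), 10)
			b = append(b, `,"next":`...)
			pc++
		case opRecursive:
			if n.Next == nil {
				b = append(b, "null"...)
				pc++
			} else {
				// "JMP": remember where to resume, then restart the program.
				stack = append(stack, frame{retPC: pc + 1, n: n})
				n = n.Next
				pc = 0
			}
		case opStructEnd:
			b = append(b, '}')
			pc++
		case opRecursiveEnd:
			if len(stack) == 0 {
				return string(b)
			}
			top := stack[len(stack)-1]
			stack = stack[:len(stack)-1]
			n, pc = top.n, top.retPC
		}
	}
}

func main() {
	list := &node{Val: 1, Next: &node{Val: 2}}
	fmt.Println(encodeList(list))
}
```

The nesting depth is bounded only by the explicit stack slice, and each level costs one small frame instead of a full function call.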

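7. Dispatching from typeptr to the concrete operation through a slice instead of a map: to fetch the cached operation for a typeptr, we would normally use a map. But a map is not safe for concurrent use, so sync.Map is used in its place. sync.Map is still fairly slow, though, and it is better to do this storage with the atomic package, as the segmentio/encoding/json package does. Profiling with pprof showed that runtime.mapaccess2 is a hot spot, so the decision was made to replace the map with a slice. By determining the range covered by the type information in advance and allocating the slice up front, lookups can be done by typeptr without worrying about going out of bounds.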
