Golang source analysis: encoding/json (2)

2023-09-06 19:15:25

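Having covered the serialization path in "Golang source analysis: encoding/json (1)", we now turn to the Unmarshal function, whose source lives in encoding/json/decode.go. As before, let's start with the function's doc comment:

1. The second parameter v is an interface value. If v is nil or not a pointer, Unmarshal returns an InvalidUnmarshalError.

2. Along the way, Unmarshal allocates maps, slices, and pointers as needed.

3. JSON is decoded according to the following rules:

A. When decoding into a pointer, JSON null sets the pointer to nil. Otherwise the JSON is unmarshaled into the value the pointer points to; if the pointer itself is nil, a new value is allocated for it first.

B. If the type implements the Unmarshaler interface, its UnmarshalJSON method is used, and it is called even when the input is JSON null. If the type implements encoding.TextUnmarshaler and the input is a quoted JSON string, its UnmarshalText method is called.

C. When decoding into a struct, the keys of the JSON object are matched against the struct's field names or tags: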

// To unmarshal JSON into a struct, Unmarshal matches incoming object
// keys to the keys used by Marshal (either the struct field name or its tag),

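An exact match is preferred, but a case-insensitive match is also accepted. Object keys that have no corresponding struct field are ignored by default.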

see Decoder.DisallowUnknownFields for an alternative

D. When decoding into an interface value, Unmarshal stores one of the following in the interface:

//  bool, for JSON booleans
//  float64, for JSON numbers
//  string, for JSON strings
//  []interface{}, for JSON arrays
//  map[string]interface{}, for JSON objects
//  nil for JSON null

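E. JSON makes no distinction between integers and floating-point numbers; both are simply "number". When JSON is decoded into a map[string]interface{}, every number becomes a float64, so large integers can lose precision and may print in scientific notation. To handle numbers more faithfully, decode with a Decoder and have it store numbers as json.Number.

F. Decoding a JSON array into a slice resets the slice length to zero and then appends each element. As a special case, an empty JSON array is replaced by a new empty slice.

G. When decoding a JSON array into a Go array, if the Go array is shorter than the JSON array the extra JSON elements are discarded; if it is longer, the remaining Go elements are set to their zero value.

H. When decoding a JSON object into a map, Unmarshal first establishes the map to use: if the map is nil it allocates a new one, otherwise it reuses the existing map. It then stores the key-value pairs from the JSON object into that map. The key type must satisfy the following: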

// The map's key type must
// either be any string type, an integer, implement json.Unmarshaler, or
// implement encoding.TextUnmarshaler.

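4. If the JSON is syntactically invalid, Unmarshal returns a SyntaxError.

5. If a JSON value does not fit the target type, or a JSON number overflows the target type, Unmarshal skips that field and completes the unmarshaling as best it can. If no more serious error is encountered, it returns an UnmarshalTypeError. It does not guarantee that all the remaining fields are unmarshaled.

6. JSON null is unmarshaled into an interface, map, pointer, or slice by setting that Go value to nil.

7. When unmarshaling quoted strings, invalid characters are not treated as an error; they are replaced by the Unicode replacement character U+FFFD.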

func Unmarshal(data []byte, v any) error {
  // Check for well-formedness.
  // Avoids filling out half a data structure
  // before discovering a JSON syntax error.
  var d decodeState
  err := checkValid(data, &d.scan)
  if err != nil {
    return err
  }

  d.init(data)
  return d.unmarshal(v)
}
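
The comment in Unmarshal says the up-front check avoids filling out half a data structure. A small sketch of what that means for callers, using a hypothetical config type and helper: a truncated document fails with a *json.SyntaxError and the pre-populated destination is left as it was.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// config is a hypothetical destination type for this illustration.
type config struct {
	Name string
	Port int
}

// tryUnmarshal decodes data into a pre-populated config. It returns the
// resulting value, whether the error was a *json.SyntaxError, and the
// byte offset recorded in that error.
func tryUnmarshal(data []byte) (config, bool, int64) {
	c := config{Name: "default", Port: 80}
	err := json.Unmarshal(data, &c)
	var se *json.SyntaxError
	if errors.As(err, &se) {
		return c, true, se.Offset
	}
	return c, false, 0
}

func main() {
	// The input is truncated: the closing brace is missing.
	c, isSyntax, off := tryUnmarshal([]byte(`{"Name":"api","Port":8080`))
	fmt.Println(c, isSyntax, off)
}
```

Because the syntax error is detected before any value is stored, Name and Port keep their defaults even though both keys appear in the input.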

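Unmarshal declares a decodeState, the decoder's state machine, then checks that the JSON is well-formed, and finally hands the data to the state machine to do the actual decoding. decodeState is defined as follows: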

type decodeState struct {
  data                  []byte
  off                   int // next read offset in data
  opcode                int // last read result
  scan                  scanner
  errorContext          *errorContext
  savedError            error
  useNumber             bool
  disallowUnknownFields bool
}

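Its central field is scan, the JSON scanner, and the heart of the scanner is its step function. step is fed the input one byte at a time and reports where each JSON lexical element begins and ends; the decoder uses those results to assign values to Go objects.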

type scanner struct {
  // The step is a func to be called to execute the next transition.
  // Also tried using an integer constant and a single func
  // with a switch, but using the func directly was 10% faster
  // on a 64-bit Mac Mini, and it's nicer to read.
  step func(*scanner, byte) int
  // Reached end of top-level value.
  endTop bool
  // Stack of what we're in the middle of - array values, object keys, object values.
  parseState []int
  // Error that happened, if any.
  err error
  // total bytes consumed, updated by decoder.Decode (and deliberately
  // not set to zero by scan.reset)
  bytes int64
}
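
The "step is a function value" design is easier to see in miniature. The following is a toy sketch, not the library's code: miniScanner, its states, and accepts are all invented here, and the scanner only recognizes the literal true. Each state function inspects one byte and installs the next state, which is the same technique the real scanner uses.

```go
package main

import "fmt"

// Result codes, loosely modeled on the scanner's scan* constants.
const (
	scanContinue = iota // keep feeding bytes
	scanEnd             // the literal is complete
	scanError           // unexpected byte
)

// miniScanner is a hypothetical, much-reduced scanner.
type miniScanner struct {
	step func(*miniScanner, byte) int // next transition to execute
}

func stateBegin(s *miniScanner, c byte) int { return expect(s, c, 't', stateT) }
func stateT(s *miniScanner, c byte) int     { return expect(s, c, 'r', stateTr) }
func stateTr(s *miniScanner, c byte) int    { return expect(s, c, 'u', stateTru) }

// stateTru is the state after reading "tru".
func stateTru(s *miniScanner, c byte) int {
	if c == 'e' {
		s.step = stateDone
		return scanEnd
	}
	return scanError
}

// stateDone rejects any byte after the literal has ended.
func stateDone(s *miniScanner, c byte) int { return scanError }

// expect installs next as the new state when c is the wanted byte.
func expect(s *miniScanner, c, want byte, next func(*miniScanner, byte) int) int {
	if c != want {
		return scanError
	}
	s.step = next
	return scanContinue
}

// accepts feeds input to the scanner byte by byte, like checkValid does.
func accepts(input string) bool {
	s := &miniScanner{step: stateBegin}
	last := scanContinue
	for i := 0; i < len(input); i++ {
		last = s.step(s, input[i])
		if last == scanError {
			return false
		}
	}
	return last == scanEnd
}

func main() {
	for _, in := range []string{"true", "tru", "truex"} {
		fmt.Println(in, accepts(in))
	}
}
```

The driver loop never needs to know which state it is in; all of that knowledge lives in whichever function step currently points to.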

The validity check is where step comes into play: if the end of the input is reached without hitting an error, the JSON is valid.

func checkValid(data []byte, scan *scanner) error {
  scan.reset()
  for _, c := range data {
    scan.bytes++
    if scan.step(scan, c) == scanError {
      return scan.err
    }
  }
  if scan.eof() == scanError {
    return scan.err
  }
  return nil
}

The reset method sets the step function to stateBeginValue:

func (s *scanner) reset() {
  s.step = stateBeginValue
  s.parseState = s.parseState[0:0]
  s.err = nil
  s.endTop = false
}

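stateBeginValue looks at the first non-space byte and infers from it what kind of value follows: { starts a JSON object, [ starts an array, " starts a string, and in the same way it recognizes true, false, null, and numbers.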

func stateBeginValue(s *scanner, c byte) int {
  if isSpace(c) {
    return scanSkipSpace
  }
  switch c {
  case '{':
    s.step = stateBeginStringOrEmpty
    return s.pushParseState(c, parseObjectKey, scanBeginObject)
  case '[':
    s.step = stateBeginValueOrEmpty
    return s.pushParseState(c, parseArrayValue, scanBeginArray)
  case '"':
    s.step = stateInString
    return scanBeginLiteral
  case '-':
    s.step = stateNeg
    return scanBeginLiteral
  case '0': // beginning of 0.123
    s.step = state0
    return scanBeginLiteral
  case 't': // beginning of true
    s.step = stateT
    return scanBeginLiteral
  case 'f': // beginning of false
    s.step = stateF
    return scanBeginLiteral
  case 'n': // beginning of null
    s.step = stateN
    return scanBeginLiteral
  }
  if '1' <= c && c <= '9' { // beginning of 1234.5
    s.step = state1
    return scanBeginLiteral
  }
  return s.error(c, "looking for beginning of value")
}

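It then points step at the state function that fits what may come next. For example, once an object has been opened, the next token must be either } (an empty object) or a string key: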

 // stateBeginStringOrEmpty is the state after reading `{`.
func stateBeginStringOrEmpty(s *scanner, c byte) int {
  if isSpace(c) {
    return scanSkipSpace
  }
  if c == '}' {
    n := len(s.parseState)
    s.parseState[n-1] = parseObjectValue
    return stateEndValue(s, c)
  }
  return stateBeginString(s, c)
}

If the object is not empty, the next token must be a string key:

// stateBeginString is the state after reading `{"key": value,`.
func stateBeginString(s *scanner, c byte) int {
  if isSpace(c) {
    return scanSkipSpace
  }
  if c == '"' {
    s.step = stateInString
    return scanBeginLiteral
  }
  return s.error(c, "looking for beginning of object key string")
}

The step function can report the following results:

const (
  // Continue.
  scanContinue     = iota // uninteresting byte
  scanBeginLiteral        // end implied by next result != scanContinue
  scanBeginObject         // begin object
  scanObjectKey           // just finished object key (string)
  scanObjectValue         // just finished non-last object value
  scanEndObject           // end object (implies scanObjectValue if possible)
  scanBeginArray          // begin array
  scanArrayValue          // just finished array value
  scanEndArray            // end array (implies scanArrayValue if possible)
  scanSkipSpace           // space byte; can skip; known to be last "continue" result


  // Stop.
  scanEnd   // top-level value ended *before* this byte; known to be first "stop" result
  scanError // hit an error, scanner.err.
)

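When an object or array is opened, the scanner pushes a parse state onto a stack, which lets it match the closing delimiter later and enforce the nesting limit: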

func (s *scanner) pushParseState(c byte, newParseState int, successState int) int {
  s.parseState = append(s.parseState, newParseState)
  if len(s.parseState) <= maxNestingDepth {
    return successState
  }
  return s.error(c, "exceeded max depth")
}

Once the JSON passes the check, the data is handed to the state machine:

func (d *decodeState) init(data []byte) *decodeState {
  d.data = data
  // ... (the rest of init is omitted in this excerpt)
}

Then the actual decoding begins:

func (d *decodeState) unmarshal(v any) error {
  rv := reflect.ValueOf(v)
  if rv.Kind() != reflect.Pointer || rv.IsNil() {
    return &InvalidUnmarshalError{reflect.TypeOf(v)}
  }
  d.scan.reset()
  d.scanWhile(scanSkipSpace)
  err := d.value(rv)
  // ... (error handling omitted in this excerpt)
}
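
The check at the top of unmarshal is what produces InvalidUnmarshalError. A minimal sketch of the three ways to trigger it, using a hypothetical user type and an isInvalidTarget helper written for this example:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// user is a hypothetical destination type.
type user struct {
	Name string
}

// isInvalidTarget reports whether unmarshaling into v fails with
// *json.InvalidUnmarshalError.
func isInvalidTarget(v any) bool {
	err := json.Unmarshal([]byte(`{"Name":"gopher"}`), v)
	var iue *json.InvalidUnmarshalError
	return errors.As(err, &iue)
}

func main() {
	var u user
	var nilPtr *user
	fmt.Println(isInvalidTarget(u))      // struct passed by value
	fmt.Println(isInvalidTarget(nilPtr)) // typed nil pointer
	fmt.Println(isInvalidTarget(nil))    // nil interface
	fmt.Println(isInvalidTarget(&u))     // valid: non-nil pointer
}
```

Only the last call passes the rv.Kind() and rv.IsNil() test, so only it goes on to decode anything.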

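unmarshal checks that v is a non-nil pointer, skips leading whitespace, and then uses reflection to store the scanned value into v. Scanning works exactly as it did during the validity check. scanWhile steps forward through the input and returns at the first byte whose result differs from the given opcode, which here means the first non-space byte: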

func (d *decodeState) scanWhile(op int) {
  s, data, i := &d.scan, d.data, d.off
  for i < len(data) {
    newOp := s.step(s, data[i])
    i++
    if newOp != op {
      d.opcode = newOp
      d.off = i
      return
    }
  }
  // ... (end-of-input handling omitted in this excerpt)
}

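The core function is value, which stores a value according to the opcode that was just scanned. There are three cases: JSON arrays, JSON objects, and literals (strings, numbers, booleans, and null).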

func (d *decodeState) value(v reflect.Value) error {
  switch d.opcode {
  default:
    panic(phasePanicMsg)
  case scanBeginArray:
    if v.IsValid() {
      if err := d.array(v); err != nil {
        return err
      }
    } else {
      d.skip()
    }
    d.scanNext()
  case scanBeginObject:
    if v.IsValid() {
      if err := d.object(v); err != nil {
        return err
      }
    } else {
      d.skip()
    }
    d.scanNext()
  case scanBeginLiteral:
    // All bytes inside literal return scanContinue op code.
    start := d.readIndex()
    d.rescanLiteral()
    if v.IsValid() {
      if err := d.literalStore(d.data[start:d.readIndex()], v, false); err != nil {
        return err
      }
    }
  }
  return nil
}

Let's first look at how arrays are stored:

func (d *decodeState) array(v reflect.Value) error {
  // Check for unmarshaler.
  u, ut, pv := indirect(v, false)
  if u != nil {
    start := d.readIndex()
    d.skip()
    return u.UnmarshalJSON(d.data[start:d.off])
  }
  if ut != nil {
    d.saveError(&UnmarshalTypeError{Value: "array", Type: v.Type(), Offset: int64(d.off)})
    d.skip()
    return nil
  }
  v = pv

  // Check type of target.
  switch v.Kind() {
  case reflect.Interface:
    if v.NumMethod() == 0 {
      // Decoding into nil interface? Switch to non-reflect code.
      ai := d.arrayInterface()
      v.Set(reflect.ValueOf(ai))
      return nil
    }
    // Otherwise it's invalid.
    fallthrough
  default:
    d.saveError(&UnmarshalTypeError{Value: "array", Type: v.Type(), Offset: int64(d.off)})
    d.skip()
    return nil
  case reflect.Array, reflect.Slice:
    break
  }

  i := 0
  for {
    // Look ahead for ] - can only happen on first iteration.
    d.scanWhile(scanSkipSpace)
    if d.opcode == scanEndArray {
      break
    }

    // Get element of array, growing if necessary.
    if v.Kind() == reflect.Slice {
      // Grow slice if necessary
      if i >= v.Cap() {
        newcap := v.Cap() + v.Cap()/2
        if newcap < 4 {
          newcap = 4
        }
        newv := reflect.MakeSlice(v.Type(), v.Len(), newcap)
        reflect.Copy(newv, v)
        v.Set(newv)
      }
      if i >= v.Len() {
        v.SetLen(i + 1)
      }
    }

    if i < v.Len() {
      // Decode into element.
      if err := d.value(v.Index(i)); err != nil {
        return err
      }
    } else {
      // Ran out of fixed array: skip.
      if err := d.value(reflect.Value{}); err != nil {
        return err
      }
    }
    i++

    // Next token must be , or ].
    if d.opcode == scanSkipSpace {
      d.scanWhile(scanSkipSpace)
    }
    if d.opcode == scanEndArray {
      break
    }
    if d.opcode != scanArrayValue {
      panic(phasePanicMsg)
    }
  }

  if i < v.Len() {
    if v.Kind() == reflect.Array {
      // Array. Zero the rest.
      z := reflect.Zero(v.Type().Elem())
      for ; i < v.Len(); i++ {
        v.Index(i).Set(z)
      }
    } else {
      v.SetLen(i)
    }
  }
  if i == 0 && v.Kind() == reflect.Slice {
    v.Set(reflect.MakeSlice(v.Type(), 0, 0))
  }
  return nil
}

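array first calls indirect to check whether the target type implements a custom unmarshaling method. If it does, UnmarshalJSON is called with the raw JSON of the array. Otherwise array chooses a decoding path based on the concrete kind of the target and recurses into value for each element.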

func indirect(v reflect.Value, decodingNull bool) (Unmarshaler, encoding.TextUnmarshaler, reflect.Value) {
  // ...
  for {
    // ...
    if v.Type().NumMethod() > 0 && v.CanInterface() {
      if u, ok := v.Interface().(Unmarshaler); ok {
        return u, nil, reflect.Value{}
      }
      if !decodingNull {
        if u, ok := v.Interface().(encoding.TextUnmarshaler); ok {
          return nil, u, reflect.Value{}
        }
      }
    }
    // ...
  }
  // ... (rest of indirect omitted in this excerpt)
}
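
To see the Unmarshaler branch in action, here is a small sketch with a made-up point type that is encoded as a two-element JSON array. Because array consults indirect before looking at the kind of the target, a struct type can still accept array input through its own UnmarshalJSON. The segment type and decodeSegment helper are likewise hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// point is a hypothetical type encoded as a two-element JSON array.
type point struct {
	X, Y int
}

// UnmarshalJSON implements json.Unmarshaler. It receives the raw JSON of
// the value and decodes it through a fixed-size Go array.
func (p *point) UnmarshalJSON(data []byte) error {
	var pair [2]int
	if err := json.Unmarshal(data, &pair); err != nil {
		return err
	}
	p.X, p.Y = pair[0], pair[1]
	return nil
}

// segment holds one point by value and one by pointer.
type segment struct {
	From point  `json:"from"`
	To   *point `json:"to"`
}

// decodeSegment unmarshals data into a fresh segment.
func decodeSegment(data []byte) (segment, error) {
	var s segment
	err := json.Unmarshal(data, &s)
	return s, err
}

func main() {
	s, err := decodeSegment([]byte(`{"from":[1,2],"to":[3,4]}`))
	fmt.Println(s.From, *s.To, err)
}
```

The pointer-receiver method is found for both fields: the value field is addressable inside the struct, and the nil pointer field is allocated before the method is called.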

If the target is an interface type with no methods, array calls arrayInterface() to decode the JSON and then stores the result into v via reflection:

v.Set(reflect.ValueOf(ai))

If the target is an array or a slice, decoding continues with the element loop that follows the switch:

case reflect.Array, reflect.Slice:
    break

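A JSON array can only be decoded into a Go interface, array, or slice; any other target is reported as an UnmarshalTypeError. arrayInterface decodes every element into an interface value: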

func (d *decodeState) arrayInterface() []any {
  // ... (loop setup omitted in this excerpt)
  v = append(v, d.valueInterface())
  // ... (remainder omitted in this excerpt)
}

Decoding each element with valueInterface is itself recursive:

func (d *decodeState) valueInterface() (val any) {
  switch d.opcode {
  default:
    panic(phasePanicMsg)
  case scanBeginArray:
    val = d.arrayInterface()
    d.scanNext()
  case scanBeginObject:
    val = d.objectInterface()
    d.scanNext()
  case scanBeginLiteral:
    val = d.literalInterface()
  }
  return
}

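A literal is decoded into a plain interface value: literalInterface looks at the first byte of the literal to decide whether it is null, a boolean, a string, or a number.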

func (d *decodeState) literalInterface() any {
  start := d.readIndex()
  d.rescanLiteral()
  item := d.data[start:d.readIndex()]
  switch c := item[0]; c {
  case 'n': // null
    return nil
  case 't', 'f': // true, false
    return c == 't'
  case '"': // string
    s, ok := unquote(item)
    if !ok {
      panic(phasePanicMsg)
    }
    return s

  default: // number
    if c != '-' && (c < '0' || c > '9') {
      panic(phasePanicMsg)
    }
    n, err := d.convertNumber(string(item))
    if err != nil {
      d.saveError(err)
    }
    return n
  }
}
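
The combined effect of valueInterface, arrayInterface, objectInterface, and literalInterface is the type table from the doc comment. A short sketch that makes the chosen dynamic types visible; kindOf is a helper written for this example, and it assumes the default settings (no UseNumber):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// kindOf decodes a JSON document into an empty interface and names the
// dynamic Go type that Unmarshal chose for it.
func kindOf(doc string) string {
	var v any
	if err := json.Unmarshal([]byte(doc), &v); err != nil {
		return "error"
	}
	switch v.(type) {
	case nil:
		return "nil"
	case bool:
		return "bool"
	case float64:
		return "float64"
	case string:
		return "string"
	case []any:
		return "[]any"
	case map[string]any:
		return "map[string]any"
	default:
		return "unexpected"
	}
}

func main() {
	for _, doc := range []string{`null`, `true`, `1`, `"x"`, `[1]`, `{"a":1}`} {
		fmt.Println(doc, "=>", kindOf(doc))
	}
}
```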

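When the target is an array or a slice, array decodes the elements one by one in a for loop and fills the target according to the slice and array rules described earlier. For a slice, the backing storage may have to be reallocated along the way as it grows.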

for {
  // Look ahead for ] - can only happen on first iteration.
  d.scanWhile(scanSkipSpace)
  if d.opcode == scanEndArray {
    break
  }

  // Get element of array, growing if necessary.
  if v.Kind() == reflect.Slice {
    // Grow slice if necessary
    if i >= v.Cap() {
      newcap := v.Cap() + v.Cap()/2
      if newcap < 4 {
        newcap = 4
      }
      newv := reflect.MakeSlice(v.Type(), v.Len(), newcap)
      reflect.Copy(newv, v)
      v.Set(newv)
    }
    if i >= v.Len() {
      v.SetLen(i + 1)
    }
  }

  if i < v.Len() {
    // Decode into element.
    if err := d.value(v.Index(i)); err != nil {
      return err
    }
  } else {
    // Ran out of fixed array: skip.
    if err := d.value(reflect.Value{}); err != nil {
      return err
    }
  }
  i++

  // ... (check for the next , or ] omitted in this excerpt)
}

if i < v.Len() {
  if v.Kind() == reflect.Array {
    // Array. Zero the rest.
    z := reflect.Zero(v.Type().Elem())
    for ; i < v.Len(); i++ {
      v.Index(i).Set(z)
    }
  } else {
    v.SetLen(i)
  }
}
if i == 0 && v.Kind() == reflect.Slice {
  v.Set(reflect.MakeSlice(v.Type(), 0, 0))
}
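
The four outcomes of this loop (extra elements skipped, remaining array slots zeroed, slice truncated, empty array replaced by an empty slice) can be checked with a few small helpers. The helper names are invented for this sketch; errors are ignored because every input is valid.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// intoShortArray decodes three JSON elements into a two-element Go array.
func intoShortArray() [2]int {
	var a [2]int
	_ = json.Unmarshal([]byte(`[1,2,3]`), &a)
	return a
}

// intoLongArray decodes two JSON elements into a pre-filled four-element array.
func intoLongArray() [4]int {
	a := [4]int{9, 9, 9, 9}
	_ = json.Unmarshal([]byte(`[1,2]`), &a)
	return a
}

// intoSlice decodes one JSON element into a slice that already holds three.
func intoSlice() []int {
	s := []int{7, 8, 9}
	_ = json.Unmarshal([]byte(`[1]`), &s)
	return s
}

// intoNilSlice decodes an empty JSON array into a nil slice.
func intoNilSlice() []int {
	var s []int
	_ = json.Unmarshal([]byte(`[]`), &s)
	return s
}

func main() {
	fmt.Println(intoShortArray())
	fmt.Println(intoLongArray())
	fmt.Println(intoSlice())
	e := intoNilSlice()
	fmt.Println(e != nil, len(e))
}
```

Note the last case: after decoding `[]` the slice is empty but no longer nil, which is the effect of the final MakeSlice call.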

With arrays covered, let's look at how literals are decoded:

func (d *decodeState) rescanLiteral() {
  data, i := d.data, d.off
Switch:
  switch data[i-1] {
  case '"': // string
    for ; i < len(data); i++ {
      switch data[i] {
      case '\\':
        i++ // escaped char
      case '"':
        i++ // tokenize the closing quote too
        break Switch
      }
    }
  case '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '-': // number
    for ; i < len(data); i++ {
      switch data[i] {
      case '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
        '.', 'e', 'E', '+', '-':
      default:
        break Switch
      }
    }
  case 't': // true
    i += len("rue")
  case 'f': // false
    i += len("alse")
  case 'n': // null
    i += len("ull")
  }
  if i < len(data) {
    d.opcode = stateEndValue(&d.scan, data[i])
  } else {
    d.opcode = scanEnd
  }
  d.off = i + 1
}

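rescanLiteral simply walks forward through the input to find the end of the literal. Once the literal has been delimited, and provided the target value is valid, it is stored into v: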

if v.IsValid() {
    if err := d.literalStore(d.data[start:d.readIndex()], v, false); err != nil {

How the literal is stored depends on the type of the target:

func (d *decodeState) literalStore(item []byte, v reflect.Value, fromQuoted bool) error {
  // Check for unmarshaler.
  if len(item) == 0 {
    //Empty string given
    d.saveError(fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type()))
    return nil
  }
  isNull := item[0] == 'n' // null
  u, ut, pv := indirect(v, isNull)
  if u != nil {
    return u.UnmarshalJSON(item)
  }
  if ut != nil {
    if item[0] != '"' {
      if fromQuoted {
        d.saveError(fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type()))
        return nil
      }
      val := "number"
      switch item[0] {
      case 'n':
        val = "null"
      case 't', 'f':
        val = "bool"
      }
      d.saveError(&UnmarshalTypeError{Value: val, Type: v.Type(), Offset: int64(d.readIndex())})
      return nil
    }
    s, ok := unquoteBytes(item)
    if !ok {
      if fromQuoted {
        return fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type())
      }
      panic(phasePanicMsg)
    }
    return ut.UnmarshalText(s)
  }

  v = pv

  switch c := item[0]; c {
  case 'n': // null
    // The main parser checks that only true and false can reach here,
    // but if this was a quoted string input, it could be anything.
    if fromQuoted && string(item) != "null" {
      d.saveError(fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type()))
      break
    }
    switch v.Kind() {
    case reflect.Interface, reflect.Pointer, reflect.Map, reflect.Slice:
      v.Set(reflect.Zero(v.Type()))
      // otherwise, ignore null for primitives/string
    }
  case 't', 'f': // true, false
    value := item[0] == 't'
    // The main parser checks that only true and false can reach here,
    // but if this was a quoted string input, it could be anything.
    if fromQuoted && string(item) != "true" && string(item) != "false" {
      d.saveError(fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type()))
      break
    }
    switch v.Kind() {
    default:
      if fromQuoted {
        d.saveError(fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type()))
      } else {
        d.saveError(&UnmarshalTypeError{Value: "bool", Type: v.Type(), Offset: int64(d.readIndex())})
      }
    case reflect.Bool:
      v.SetBool(value)
    case reflect.Interface:
      if v.NumMethod() == 0 {
        v.Set(reflect.ValueOf(value))
      } else {
        d.saveError(&UnmarshalTypeError{Value: "bool", Type: v.Type(), Offset: int64(d.readIndex())})
      }
    }

  case '"': // string
    s, ok := unquoteBytes(item)
    if !ok {
      if fromQuoted {
        return fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type())
      }
      panic(phasePanicMsg)
    }
    switch v.Kind() {
    default:
      d.saveError(&UnmarshalTypeError{Value: "string", Type: v.Type(), Offset: int64(d.readIndex())})
    case reflect.Slice:
      if v.Type().Elem().Kind() != reflect.Uint8 {
        d.saveError(&UnmarshalTypeError{Value: "string", Type: v.Type(), Offset: int64(d.readIndex())})
        break
      }
      b := make([]byte, base64.StdEncoding.DecodedLen(len(s)))
      n, err := base64.StdEncoding.Decode(b, s)
      if err != nil {
        d.saveError(err)
        break
      }
      v.SetBytes(b[:n])
    case reflect.String:
      if v.Type() == numberType && !isValidNumber(string(s)) {
        return fmt.Errorf("json: invalid number literal, trying to unmarshal %q into Number", item)
      }
      v.SetString(string(s))
    case reflect.Interface:
      if v.NumMethod() == 0 {
        v.Set(reflect.ValueOf(string(s)))
      } else {
        d.saveError(&UnmarshalTypeError{Value: "string", Type: v.Type(), Offset: int64(d.readIndex())})
      }
    }

  default: // number
    if c != '-' && (c < '0' || c > '9') {
      if fromQuoted {
        return fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type())
      }
      panic(phasePanicMsg)
    }
    s := string(item)
    switch v.Kind() {
    default:
      if v.Kind() == reflect.String && v.Type() == numberType {
        // s must be a valid number, because it's
        // already been tokenized.
        v.SetString(s)
        break
      }
      if fromQuoted {
        return fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type())
      }
      d.saveError(&UnmarshalTypeError{Value: "number", Type: v.Type(), Offset: int64(d.readIndex())})
    case reflect.Interface:
      n, err := d.convertNumber(s)
      if err != nil {
        d.saveError(err)
        break
      }
      if v.NumMethod() != 0 {
        d.saveError(&UnmarshalTypeError{Value: "number", Type: v.Type(), Offset: int64(d.readIndex())})
        break
      }
      v.Set(reflect.ValueOf(n))

    case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
      n, err := strconv.ParseInt(s, 10, 64)
      if err != nil || v.OverflowInt(n) {
        d.saveError(&UnmarshalTypeError{Value: "number " + s, Type: v.Type(), Offset: int64(d.readIndex())})
        break
      }
      v.SetInt(n)

    case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64, reflect.Uintptr:
      n, err := strconv.ParseUint(s, 10, 64)
      if err != nil || v.OverflowUint(n) {
        d.saveError(&UnmarshalTypeError{Value: "number " + s, Type: v.Type(), Offset: int64(d.readIndex())})
        break
      }
      v.SetUint(n)

    case reflect.Float32, reflect.Float64:
      n, err := strconv.ParseFloat(s, v.Type().Bits())
      if err != nil || v.OverflowFloat(n) {
        d.saveError(&UnmarshalTypeError{Value: "number " + s, Type: v.Type(), Offset: int64(d.readIndex())})
        break
      }
      v.SetFloat(n)
    }
  }
  return nil
}

For integer kinds, for example, it ends up calling strconv.ParseInt:

    case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
      n, err := strconv.ParseInt(s, 10, 64)
      if err != nil || v.OverflowInt(n) {
        d.saveError(&UnmarshalTypeError{Value: "number " + s, Type: v.Type(), Offset: int64(d.readIndex())})
        break
      }
      v.SetInt(n)

That completes the decoding and storing of literals. What remains is the most complex case: objects.

func (d *decodeState) object(v reflect.Value) error {
  // Check for unmarshaler.
  u, ut, pv := indirect(v, false)
  if u != nil {
    start := d.readIndex()
    d.skip()
    return u.UnmarshalJSON(d.data[start:d.off])
  }
  if ut != nil {
    d.saveError(&UnmarshalTypeError{Value: "object", Type: v.Type(), Offset: int64(d.off)})
    d.skip()
    return nil
  }
  v = pv
  t := v.Type()

  // Decoding into nil interface? Switch to non-reflect code.
  if v.Kind() == reflect.Interface && v.NumMethod() == 0 {
    oi := d.objectInterface()
    v.Set(reflect.ValueOf(oi))
    return nil
  }

  var fields structFields

  // Check type of target:
  //   struct or
  //   map[T1]T2 where T1 is string, an integer type,
  //             or an encoding.TextUnmarshaler
  switch v.Kind() {
  case reflect.Map:
    // Map key must either have string kind, have an integer kind,
    // or be an encoding.TextUnmarshaler.
    switch t.Key().Kind() {
    case reflect.String,
      reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64,
      reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64, reflect.Uintptr:
    default:
      if !reflect.PointerTo(t.Key()).Implements(textUnmarshalerType) {
        d.saveError(&UnmarshalTypeError{Value: "object", Type: t, Offset: int64(d.off)})
        d.skip()
        return nil
      }
    }
    if v.IsNil() {
      v.Set(reflect.MakeMap(t))
    }
  case reflect.Struct:
    fields = cachedTypeFields(t)
    // ok
  default:
    d.saveError(&UnmarshalTypeError{Value: "object", Type: t, Offset: int64(d.off)})
    d.skip()
    return nil
  }

  var mapElem reflect.Value
  var origErrorContext errorContext
  if d.errorContext != nil {
    origErrorContext = *d.errorContext
  }

  for {
    // Read opening " of string key or closing }.
    d.scanWhile(scanSkipSpace)
    if d.opcode == scanEndObject {
      // closing } - can only happen on first iteration.
      break
    }
    if d.opcode != scanBeginLiteral {
      panic(phasePanicMsg)
    }

    // Read key.
    start := d.readIndex()
    d.rescanLiteral()
    item := d.data[start:d.readIndex()]
    key, ok := unquoteBytes(item)
    if !ok {
      panic(phasePanicMsg)
    }

    // Figure out field corresponding to key.
    var subv reflect.Value
    destring := false // whether the value is wrapped in a string to be decoded first

    if v.Kind() == reflect.Map {
      elemType := t.Elem()
      if !mapElem.IsValid() {
        mapElem = reflect.New(elemType).Elem()
      } else {
        mapElem.Set(reflect.Zero(elemType))
      }
      subv = mapElem
    } else {
      var f *field
      if i, ok := fields.nameIndex[string(key)]; ok {
        // Found an exact name match.
        f = &fields.list[i]
      } else {
        // Fall back to the expensive case-insensitive
        // linear search.
        for i := range fields.list {
          ff := &fields.list[i]
          if ff.equalFold(ff.nameBytes, key) {
            f = ff
            break
          }
        }
      }
      if f != nil {
        subv = v
        destring = f.quoted
        for _, i := range f.index {
          if subv.Kind() == reflect.Pointer {
            if subv.IsNil() {
              // If a struct embeds a pointer to an unexported type,
              // it is not possible to set a newly allocated value
              // since the field is unexported.
              //
              // See https://golang.org/issue/21357
              if !subv.CanSet() {
                d.saveError(fmt.Errorf("json: cannot set embedded pointer to unexported struct: %v", subv.Type().Elem()))
                // Invalidate subv to ensure d.value(subv) skips over
                // the JSON value without assigning it to subv.
                subv = reflect.Value{}
                destring = false
                break
              }
              subv.Set(reflect.New(subv.Type().Elem()))
            }
            subv = subv.Elem()
          }
          subv = subv.Field(i)
        }
        if d.errorContext == nil {
          d.errorContext = new(errorContext)
        }
        d.errorContext.FieldStack = append(d.errorContext.FieldStack, f.name)
        d.errorContext.Struct = t
      } else if d.disallowUnknownFields {
        d.saveError(fmt.Errorf("json: unknown field %q", key))
      }
    }

    // Read : before value.
    if d.opcode == scanSkipSpace {
      d.scanWhile(scanSkipSpace)
    }
    if d.opcode != scanObjectKey {
      panic(phasePanicMsg)
    }
    d.scanWhile(scanSkipSpace)

    if destring {
      switch qv := d.valueQuoted().(type) {
      case nil:
        if err := d.literalStore(nullLiteral, subv, false); err != nil {
          return err
        }
      case string:
        if err := d.literalStore([]byte(qv), subv, true); err != nil {
          return err
        }
      default:
        d.saveError(fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal unquoted value into %v", subv.Type()))
      }
    } else {
      if err := d.value(subv); err != nil {
        return err
      }
    }

    // Write value back to map;
    // if using struct, subv points into struct already.
    if v.Kind() == reflect.Map {
      kt := t.Key()
      var kv reflect.Value
      switch {
      case reflect.PointerTo(kt).Implements(textUnmarshalerType):
        kv = reflect.New(kt)
        if err := d.literalStore(item, kv, true); err != nil {
          return err
        }
        kv = kv.Elem()
      case kt.Kind() == reflect.String:
        kv = reflect.ValueOf(key).Convert(kt)
      default:
        switch kt.Kind() {
        case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
          s := string(key)
          n, err := strconv.ParseInt(s, 10, 64)
          if err != nil || reflect.Zero(kt).OverflowInt(n) {
            d.saveError(&UnmarshalTypeError{Value: "number " + s, Type: kt, Offset: int64(start + 1)})
            break
          }
          kv = reflect.ValueOf(n).Convert(kt)
        case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64, reflect.Uintptr:
          s := string(key)
          n, err := strconv.ParseUint(s, 10, 64)
          if err != nil || reflect.Zero(kt).OverflowUint(n) {
            d.saveError(&UnmarshalTypeError{Value: "number " + s, Type: kt, Offset: int64(start + 1)})
            break
          }
          kv = reflect.ValueOf(n).Convert(kt)
        default:
          panic("json: Unexpected key type") // should never occur
        }
      }
      if kv.IsValid() {
        v.SetMapIndex(kv, subv)
      }
    }

    // Next token must be , or }.
    if d.opcode == scanSkipSpace {
      d.scanWhile(scanSkipSpace)
    }
    if d.errorContext != nil {
      // Reset errorContext to its original state.
      // Keep the same underlying array for FieldStack, to reuse the
      // space and avoid unnecessary allocs.
      d.errorContext.FieldStack = d.errorContext.FieldStack[:len(origErrorContext.FieldStack)]
      d.errorContext.Struct = origErrorContext.Struct
    }
    if d.opcode == scanEndObject {
      break
    }
    if d.opcode != scanObjectValue {
      panic(phasePanicMsg)
    }
  }
  return nil
}
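
The map branch of object is worth a quick experiment: JSON keys are always strings, yet they can land in an integer-keyed Go map because the key text goes through strconv.ParseInt. In this sketch decodeIntKeys is a helper invented for the example, and the map is pre-populated to show that an existing non-nil map is reused.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeIntKeys unmarshals a JSON object into an existing map[int]string.
func decodeIntKeys(data []byte) (map[int]string, error) {
	m := map[int]string{0: "zero"} // non-nil, so object() reuses it
	err := json.Unmarshal(data, &m)
	return m, err
}

func main() {
	m, err := decodeIntKeys([]byte(`{"1":"one","2":"two"}`))
	fmt.Println(len(m), m[0], m[1], m[2], err)

	// A key that is not an integer is reported as an error and not stored.
	m, err = decodeIntKeys([]byte(`{"x":"y"}`))
	fmt.Println(len(m), err != nil)
}
```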

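Like array, object first checks for a custom unmarshaling method and falls back to the built-in logic if there is none. It then checks whether the target is an empty interface, in which case it decodes with objectInterface. Otherwise it checks whether the target is a map; if so, the key type must have a string or integer kind or implement encoding.TextUnmarshaler. Only then does it check whether the target is a struct. For a struct it calls cachedTypeFields, which uses reflection to build a description of every field of the struct and caches the result.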

func cachedTypeFields(t reflect.Type) structFields {
  if f, ok := fieldCache.Load(t); ok {
    return f.(structFields)
  }
  f, _ := fieldCache.LoadOrStore(t, typeFields(t))
  return f.(structFields)
}

As on the encoding side, a sync.Map serves as the cache:

var fieldCache sync.Map

// Excerpt: each "// ..." marks code that is not shown here.
func typeFields(t reflect.Type) structFields {
  // ...
  next := []field{{typ: t}}
  // ...
  for len(next) > 0 {
    // ...
    for _, f := range current {
      if visited[f.typ] {
        continue
      }
      // ...
      for i := 0; i < f.typ.NumField(); i++ {
        sf := f.typ.Field(i)
        // ...
        tag := sf.Tag.Get("json")
        // ...
        name, opts := parseTag(tag)
        // ...
        index[len(f.index)] = i
        // ...
        if opts.Contains("string") {
          // ...
        }

        // Record found field and index sequence.
        if name != "" || !sf.Anonymous || ft.Kind() != reflect.Struct {
          // ...
          field := field{
            name:      name,
            tag:       tagged,
            index:     index,
            typ:       ft,
            omitEmpty: opts.Contains("omitempty"),
            quoted:    quoted,
          }
          // ...
          HTMLEscape(&nameEscBuf, field.nameBytes)
          // ...
          fields = append(fields, field)
          // ...
        }
        // ...
      }
    }
  }

  sort.Slice(fields, func(i, j int) bool {
    x := fields
    if x[i].name != x[j].name {
      return x[i].name < x[j].name
    }
    // ...
  })

  // ...
  for advance, i := 0, 0; i < len(fields); i += advance {
    // ...
    for advance = 1; i+advance < len(fields); advance++ {
      fj := fields[i+advance]
      // ...
    }
    // ...
  }

  // ...
  sort.Sort(byIndex(fields))
  // ...
  return structFields{fields, nameIndex}
}

typeFields walks the struct definition and its tags, builds a description of every field, stores the descriptions in a slice, and sorts them by field name.

// A field represents a single field found in a struct.
type field struct {
  name      string
  nameBytes []byte                 // []byte(name)
  equalFold func(s, t []byte) bool // bytes.EqualFold or equivalent


  nameNonEsc  string // `"` + name + `":`
  nameEscHTML string // `"` + HTMLEscape(name) + `":`


  tag       bool
  index     []int
  typ       reflect.Type
  omitEmpty bool
  quoted    bool


  encoder encoderFunc
}

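With that preparation done, object enters its for loop and decodes the JSON object key by key, obtaining one item at a time and storing it according to the type of the target: for a map the value goes into a map element, and for a struct it goes into the matching field.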

for {
  d.scanWhile(scanSkipSpace)
  // ...
  item := d.data[start:d.readIndex()]
  key, ok := unquoteBytes(item)
  // ...
  if v.Kind() == reflect.Map {
    elemType := t.Elem()
    if !mapElem.IsValid() {
      mapElem = reflect.New(elemType).Elem()
    } else {
      mapElem.Set(reflect.Zero(elemType))
    }
    subv = mapElem

Which field to set is determined by matching on the field name:

} else {
  var f *field
  if i, ok := fields.nameIndex[string(key)]; ok {
    // Found an exact name match.
    f = &fields.list[i]
  } else {
    // Fall back to the expensive case-insensitive
    // linear search.
    for i := range fields.list {
      ff := &fields.list[i]
      if ff.equalFold(ff.nameBytes, key) {
        f = ff
        break
      }
    }
  }
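
Both outcomes of this lookup are visible from user code: a key that differs only in case still finds its field, and a key that matches nothing is silently dropped unless the decoder was told otherwise. A sketch with a hypothetical account type; decodeLenient and decodeStrict are helpers written for the example, while Decoder.DisallowUnknownFields is the standard library switch behind d.disallowUnknownFields.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// account is a hypothetical type with a tagged field name.
type account struct {
	UserName string `json:"user_name"`
}

// decodeLenient uses json.Unmarshal, which ignores unknown keys.
func decodeLenient(data []byte) (account, error) {
	var a account
	err := json.Unmarshal(data, &a)
	return a, err
}

// decodeStrict uses a Decoder that rejects keys with no matching field.
func decodeStrict(data []byte) (account, error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	var a account
	err := dec.Decode(&a)
	return a, err
}

func main() {
	data := []byte(`{"USER_NAME":"gopher","extra":true}`)

	a, err := decodeLenient(data)
	fmt.Println(a.UserName, err)

	_, err = decodeStrict(data)
	fmt.Println(err)
}
```

The upper-case key misses the exact-match index and is found by the case-insensitive scan; the extra key matches nothing in either pass and only the strict decoder reports it.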

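If there is no exact match on the field name, the decoder falls back to case-insensitive matching, a linear scan over the struct's fields that costs O(n) per key. Once a field is matched, the value is assigned through reflection: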

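if f != nil {
  subv = v
  destring = f.quoted
  for _, i := range f.index {
    if subv.Kind() == reflect.Pointer {
      if subv.IsNil() {
        // ...
        subv.Set(reflect.New(subv.Type().Elem()))
      }
      subv = subv.Elem()
    }
    subv = subv.Field(i)
  }
  // ...
}
// ...
if destring {
  switch qv := d.valueQuoted().(type) {
  case nil:
    if err := d.literalStore(nullLiteral, subv, false); err != nil {
      return err
    }
  case string:
    if err := d.literalStore([]byte(qv), subv, true); err != nil {
      return err
    }
  default:
    d.saveError(fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal unquoted value into %v", subv.Type()))
  }
} else {
  if err := d.value(subv); err != nil {
    return err
  }
}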
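
The destring branch corresponds to the `,string` option in a struct tag, which sets f.quoted. A minimal sketch with a hypothetical order type: a quoted number is unwrapped and stored, while an unquoted one takes the default case above and is reported as an error.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// order is a hypothetical type whose ID travels as a JSON string.
type order struct {
	ID int64 `json:"id,string"`
}

// decodeOrder unmarshals data into a fresh order.
func decodeOrder(data []byte) (order, error) {
	var o order
	err := json.Unmarshal(data, &o)
	return o, err
}

func main() {
	o, err := decodeOrder([]byte(`{"id":"123"}`))
	fmt.Println(o.ID, err)

	_, err = decodeOrder([]byte(`{"id":123}`))
	fmt.Println(err)
}
```

This option is commonly used for 64-bit identifiers exchanged with clients that would otherwise read them as floating-point numbers.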

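That is the core of the decoding code. Two further points are worth noting. First, Go limits JSON nesting to a maximum depth of 10000. Second, json.RawMessage is a raw encoded JSON value: a struct field of type json.RawMessage receives the corresponding JSON text verbatim as a []byte during unmarshaling. This makes it possible to delay decoding until the value is actually used and its concrete type is known; until then, the JSON fragment simply stays a byte slice.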
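
A typical use of json.RawMessage is a two-stage decode, where an outer envelope says what kind of payload it carries. The envelope and click types and the decodeClick helper below are invented for this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// envelope defers decoding of Payload by keeping its raw JSON bytes.
type envelope struct {
	Type    string          `json:"type"`
	Payload json.RawMessage `json:"payload"`
}

// click is one hypothetical payload type.
type click struct {
	X, Y int
}

// decodeClick decodes the envelope first and the payload second.
// It also returns the raw payload text captured in the first stage.
func decodeClick(data []byte) (click, string, error) {
	var env envelope
	if err := json.Unmarshal(data, &env); err != nil {
		return click{}, "", err
	}
	var c click
	if env.Type == "click" {
		if err := json.Unmarshal(env.Payload, &c); err != nil {
			return click{}, string(env.Payload), err
		}
	}
	return c, string(env.Payload), nil
}

func main() {
	c, raw, err := decodeClick([]byte(`{"type":"click","payload":{"X":1,"Y":2}}`))
	fmt.Println(c, raw, err)
}
```

The first Unmarshal only copies the bytes of the payload; the cost of interpreting them is paid in the second call, once the Type field has told us which Go type to use.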

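As we have seen, literals and arrays can be decoded in a single front-to-back pass over the JSON text, storing into the Go object along the way. JSON objects are harder. Object members are unordered, so without optimization, matching n keys against the fields of a Go struct costs O(n^2). Decoding into a map or an interface involves no matching, so it performs reasonably well. For structs, Go optimizes the match: it uses reflection to build a description of the struct's fields, caches it per type, and indexes the fields by name, so an exact key match is a lookup in nameIndex rather than a scan. If a key does not match any field name exactly, however, the decoder falls back to the case-insensitive linear search, and the overall cost degrades to O(n^2) again. When the types are known in advance, this run-time work could in principle be moved to compile time.

The other cost is that Unmarshal traverses the input twice: once to check that the JSON is valid, and once more while storing the values. The up-front pass does save allocation and reflection work when the input is invalid, but the double traversal is wasteful, because in production most JSON is valid and the first pass is overhead that could be optimized away.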
