Go source code analysis: json-iterator/go (1)

2023-09-06 19:18:26

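https://github.com/json-iterator/go is a very good Go JSON library that is fully compatible with the standard library's JSON package. Compared with the standard decoder, its optimizations are: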

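1. Single-pass scanning: all parsing is done directly on the byte stream in one pass. readInt or readString finishes in one go: it does not first split the JSON into tokens, but reads characters directly and converts them to the target type. readFloat and readDouble are implemented the same way. Besides avoiding repeated scans, this keeps memory allocation and release to a minimum.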

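2. It does not parse a token and then branch on it. Instead, it first works out the decoder for the Go type being bound and caches it. Then, while walking the JSON text, for each key it reads it looks up the matching decoder in a map, taking the current JSON context into account, and uses that decoder to parse and bind the value.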

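3. Fields that do not need to be parsed are skipped together with all of their nested objects. Because no decoder matches them, no unnecessary parsing happens. When a whole object is skipped, its nested field names are not examined.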

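4. Binding to an object does not use the reflection API. Instead, it takes the raw pointer out of the interface{} and casts it to the correct pointer type to set the value, for example: *((*int)(ptr)) = iter.ReadInt()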

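5. It avoids map allocation and lookup where possible. For structs with 10 or fewer fields, it computes a hash of each key and prepares, per field, the field to bind and the decode function that goes with it. When a key is parsed, it is matched directly by its hash value, which avoids string comparison as well as allocating and probing a map.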
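
Points 4 and 5 can be illustrated with the standard library alone. The following is a minimal sketch, not json-iterator's actual code: the Point type, the hashKey function (an FNV-1a style hash chosen here for illustration) and setField are all hypothetical names, and the real library may hash and dispatch differently. It shows a key being matched by a precomputed hash and the value being written through a typed pointer rather than through the reflect package.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Point is a hypothetical target type used only for this illustration.
type Point struct {
	X int
	Y int
}

// hashKey is an FNV-1a style hash over the key bytes. It is an assumption
// for this sketch; json-iterator's own hash may differ.
func hashKey(key string) uint32 {
	h := uint32(2166136261)
	for i := 0; i < len(key); i++ {
		h ^= uint32(key[i])
		h *= 16777619
	}
	return h
}

// Field hashes are computed once up front, not once per document.
var (
	hashX = hashKey("X")
	hashY = hashKey("Y")
)

// setField binds value to the field whose name hashes to keyHash.
// It writes through a typed pointer derived from the struct's address,
// without calling into the reflect package.
func setField(p *Point, keyHash uint32, value int) bool {
	base := unsafe.Pointer(p)
	switch keyHash {
	case hashX:
		*(*int)(unsafe.Add(base, unsafe.Offsetof(p.X))) = value
	case hashY:
		*(*int)(unsafe.Add(base, unsafe.Offsetof(p.Y))) = value
	default:
		return false // unknown key: a real decoder would skip its value
	}
	return true
}

func main() {
	var p Point
	// Pretend these key/value pairs were just read from a JSON object.
	setField(&p, hashKey("X"), 3)
	setField(&p, hashKey("Y"), 4)
	known := setField(&p, hashKey("Z"), 5)
	fmt.Println(p.X, p.Y, known)
}
```

A switch over a handful of integers needs no allocation, which is the motivation the list gives for treating small structs specially. The sketch ignores hash collisions, which a production decoder would have to handle.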

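In short, through this series of optimizations, its deserialization can be up to 10 times faster than the standard library in specific scenarios. Many people have questioned that figure, so before analyzing the source I ran the benchmark it provides: https://github.com/json-iterator/go-benchmark/blob/master/src/github.com/json-iterator/go-benchmark/benchmark_medium_payload_test.go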

Without changing the sample data that json-iterator provides, the results are striking.

var mediumFixture []byte = []byte(`{
  "person": {
    "id": "d50887ca-a6ce-4e59-b89f-14f0b5d03b03",
    "name": {
      "fullName": "Leonid Bugaev",
      "givenName": "Leonid",
      "familyName": "Bugaev"
    },
    "email": "leonsbox@gmail.com",
    "gender": "male",
    "location": "Saint Petersburg, Saint Petersburg, RU",
    "geo": {
      "city": "Saint Petersburg",
      "state": "Saint Petersburg",
      "country": "Russia",
      "lat": 59.9342802,
      "lng": 30.3350986
    },
    "bio": "Senior engineer at Granify.com",
    "site": "http://flickfaver.com",
    "avatar": "https://d1ts43dypk8bqh.cloudfront.net/v1/avatars/d50887ca-a6ce-4e59-b89f-14f0b5d03b03",
    "employment": {
      "name": "www.latera.ru",
      "title": "Software Engineer",
      "domain": "gmail.com"
    },
    "facebook": {
      "handle": "leonid.bugaev"
    },
    "github": {
      "handle": "buger",
      "id": 14009,
      "avatar": "https://avatars.githubusercontent.com/u/14009?v=3",
      "company": "Granify",
      "blog": "http://leonsbox.com",
      "followers": 95,
      "following": 10
    },
    "twitter": {
      "handle": "flickfaver",
      "id": 77004410,
      "bio": null,
      "followers": 2,
      "following": 1,
      "statuses": 5,
      "favorites": 0,
      "location": "",
      "site": "http://flickfaver.com",
      "avatar": null
    },
    "linkedin": {
      "handle": "in/leonidbugaev"
    },
    "googleplus": {
      "handle": null
    },
    "angellist": {
      "handle": "leonid-bugaev",
      "id": 61541,
      "bio": "Senior engineer at Granify.com",
      "blog": "http://buger.github.com",
      "site": "http://buger.github.com",
      "followers": 41,
      "avatar": "https://d1qb2nb5cznatu.cloudfront.net/users/61541-medium_jpg?1405474390"
    },
    "klout": {
      "handle": null,
      "score": null
    },
    "foursquare": {
      "handle": null
    },
    "aboutme": {
      "handle": "leonid.bugaev",
      "bio": null,
      "avatar": null
    },
    "gravatar": {
      "handle": "buger",
      "urls": [
      ],
      "avatar": "http://1.gravatar.com/avatar/f7c8edd577d13b8930d5522f28123510",
      "avatars": [
        {
          "url": "http://1.gravatar.com/avatar/f7c8edd577d13b8930d5522f28123510",
          "type": "thumbnail"
        }
      ]
    },
    "fuzzy": false
  },
  "company": null
}`)

It contains many fields that do not need to be parsed, and the standard library is relatively slow on them. Here are the results:

lib       decode            encode
std       156737 ns/op       2392 ns/op
jsoniter  18733 ns/op        2435 ns/op
easyjson  45686 ns/op        1793 ns/op
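
Most of the original fixture is made up of objects that a small target struct never binds, which is where point 3 (skipping unneeded values) matters. As a rough sketch of that idea, assuming well-formed input, a skipper only has to track nesting depth and string boundaries. It never needs to interpret keys or convert values. The names skipValue and skipString are hypothetical and this is not json-iterator's implementation.

```go
package main

import "fmt"

// skipString returns the index just past the closing quote of the string
// that starts at data[i], stepping over backslash escapes.
func skipString(data []byte, i int) int {
	i++ // opening quote
	for i < len(data) {
		switch data[i] {
		case '\\':
			i += 2 // skip the escape and the escaped byte
			continue
		case '"':
			return i + 1
		}
		i++
	}
	return i
}

// skipValue returns the index just past the JSON value starting at data[i].
// It assumes well-formed input and does not validate anything.
func skipValue(data []byte, i int) int {
	switch data[i] {
	case '"':
		return skipString(data, i)
	case '{', '[':
		depth := 0
		for i < len(data) {
			switch data[i] {
			case '"':
				// Brackets inside strings must not affect the depth.
				i = skipString(data, i)
				continue
			case '{', '[':
				depth++
			case '}', ']':
				depth--
				if depth == 0 {
					return i + 1
				}
			}
			i++
		}
		return i
	default:
		// Number, true, false or null: advance to the next delimiter.
		for i < len(data) && data[i] != ',' && data[i] != '}' && data[i] != ']' {
			i++
		}
		return i
	}
}

func main() {
	data := []byte(`{"a":{"b":[1,"}",{"c":null}]},"d":2}`)
	// data[5] is the '{' that opens the value of "a".
	end := skipValue(data, 5)
	fmt.Println(string(data[5:end]))
}
```

The nested key names "b" and "c" are never compared against anything, which matches the remark that nested field names are ignored when a whole object is skipped.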

However, after trimming the benchmark data slightly, the picture changes considerably:

var mediumFixture1 []byte = []byte(`{
  "person": {
    "id": "d50887ca-a6ce-4e59-b89f-14f0b5d03b03",
    "name": {
    "fullName": "Leonid Bugaev",
    "givenName": "Leonid",
    "familyName": "Bugaev"
    },
    "github": {
    "handle": "buger",
    "id": 14009,
    "avatar": "https://avatars.githubusercontent.com/u/14009?v=3",
    "company": "Granify",
    "blog": "http://leonsbox.com",
    "followers": 95,
    "following": 10
    },
    "gravatar": {
    "handle": "buger",
    "urls": [
    ],
    "avatar": "http://1.gravatar.com/avatar/f7c8edd577d13b8930d5522f28123510",
    "avatars": [
      {
      "url": "http://1.gravatar.com/avatar/f7c8edd577d13b8930d5522f28123510",
      "type": "thumbnail"
      }
    ]
    },
    "fuzzy": false
  },
  "company": null
  }`)

The results with the adjusted data:

lib       decode            encode
std       9301 ns/op        953.3 ns/op
jsoniter  2262 ns/op        913.9 ns/op
easyjson  2757 ns/op        733.2 ns/op

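The results are also a reminder not to trust benchmarks blindly. Every library has things it is good at and things it is not, and whoever publishes a benchmark is biased, intentionally or not. You still need to load-test against your own workload and run your own benchmarks before drawing conclusions.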

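That said, although the advantage shrank considerably once the benchmark was changed, the improvement over the standard library is still substantial. Next, let's look at how to use it.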

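package main

import (
  "fmt"

  jsoniter "github.com/json-iterator/go"
)

type ColorGroup struct {
  ID     int
  Name   string
  Colors []string
}

func main() {
  // Configuration intended as a drop-in replacement for encoding/json.
  var json = jsoniter.ConfigCompatibleWithStandardLibrary

  group := ColorGroup{
    ID:     1,
    Name:   "Reds",
    Colors: []string{"Crimson", "Red", "Ruby", "Maroon"},
  }
  b, err := json.Marshal(group)

  fmt.Println(string(b), err)

  // Read a single value by path without binding to a struct.
  val := []byte(`{"ID":1,"Name":"Reds","Colors":["Crimson","Red","Ruby","Maroon"]}`)
  fmt.Println(jsoniter.Get(val, "Colors", 0).ToString())

  // Print the struct before and after decoding. Separate statements are used
  // because Go does not specify when a variable passed as an argument is read
  // relative to a function call in the same argument list.
  data := ColorGroup{}
  fmt.Println(data)
  err = json.Unmarshal(b, &data)
  fmt.Println(data, err)
}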

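By default its API matches the standard library's: once you initialize var json = jsoniter.ConfigCompatibleWithStandardLibrary, you can use it just like the standard API. We will continue with its source code in the next part.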
