Overview
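Protocol Buffers provide a language-neutral, platform-neutral, extensible mechanism for serializing structured data in a forward- and backward-compatible way. They are like JSON, but smaller and faster.
Data structures are defined in .proto files. Protocol Buffers consist of that definition language, the interface code generated from it, language-specific runtime libraries, and the serialization format of the data.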
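What problem it solves
It provides a serialization format for network packets and for structured data of up to a few megabytes, suitable for both network transmission and long-term data storage. When the data structure changes, existing code does not need to be modified.
The programmer only writes a .proto file: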
message Person {
optional string name = 1;
optional int32 id = 2;
optional string email = 3;
}
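From the .proto file, code can be generated for many languages, including accessors for each field and methods to serialize and deserialize the whole message. In Java, for example: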
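import java.io.FileOutputStream;

// Person is the class generated from the .proto definition above
Person john = Person.newBuilder()
    .setId(1234)
    .setName("John Doe")
    .setEmail("jdoe@example.com")
    .build();
// Write the serialized message to the file named by the first command-line argument
try (FileOutputStream output = new FileOutputStream(args[0])) {
    john.writeTo(output);
}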
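To make the `writeTo` call less of a black box, here is a minimal Python sketch of the bytes a `Person` with these three values encodes to, assuming only the standard wire-format rules (a varint key per field, varint integers, length-delimited strings). `encode_varint`, `tag`, `encode_string` and `encode_person` are hypothetical helpers written for illustration, not part of any protobuf library; real generated code does all of this for you.

```python
# Hand-rolled encoder for the Person message above (illustration only; hypothetical helpers).

VARINT, LEN = 0, 2  # wire types: 0 for int32, 2 for length-delimited string


def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set on every byte except the last."""
    if value < 0:
        value += 1 << 64  # negative int32/int64 values are sign-extended to 64 bits
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def tag(field_number: int, wire_type: int) -> bytes:
    """Every field starts with a key: (field_number << 3) | wire_type, encoded as a varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_string(field_number: int, text: str) -> bytes:
    data = text.encode("utf-8")
    return tag(field_number, LEN) + encode_varint(len(data)) + data


def encode_person(name: str, person_id: int, email: str) -> bytes:
    # Fields are written in field-number order: name = 1, id = 2, email = 3.
    return (encode_string(1, name)
            + tag(2, VARINT) + encode_varint(person_id)
            + encode_string(3, email))


data = encode_person("John Doe", 1234, "jdoe@example.com")
print(data.hex(" "))
print(len(data))  # 31
```

The id 1234 shows up as `10 d2 09`: the key 0x10 (field 2, wire type 0) followed by the two-byte varint for 1234. Field names never appear in the output, only field numbers.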
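Because protocol buffers are also used for persistence, backward compatibility is essential. Protocol buffers let you modify, add, and delete fields without breaking existing services; this is covered in more detail below.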
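Benefits of protocol buffers
Protocol buffers provide:
- serialization of structured data
- record-like data
- language-neutral, platform-neutral data types
- extensibility
They are mainly used to define communication protocols (together with gRPC) and for data storage.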
Advantages:
- Compact data storage
- Fast parsing
- Available in many languages
- Automatically generated code
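Cross-language support
Data serialized in one language can be deserialized in another.
Cross-project support
A single .proto file can be used by multiple projects, which makes it suitable for defining the interfaces between projects.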
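Updating the .proto file without updating the code
Because one .proto file is shared across projects, both backward and forward compatibility matter.
- Backward compatibility: new code reads data written with the old .proto. Fields added in the new version are absent from that data, so protocol buffers supply their default values.
- Forward compatibility: old code reads data written with the updated .proto. It ignores the newly added fields; fields that were deleted read as their default values, and deleted repeated fields are empty.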
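Both cases can be illustrated with a small Python sketch that mimics what generated parsers do. It assumes a toy schema format, `{field_number: (name, default)}`, invented for this example: fields absent from the input keep their defaults, and field numbers the schema does not know are skipped. Only wire types 0 (varint) and 2 (length-delimited) are handled, and all helper names are hypothetical.

```python
# Toy encoder/decoder; the schema format and helper names are hypothetical.


def encode_varint(value):
    # Non-negative values only in this sketch.
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf, pos):
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def encode(fields):
    """fields: list of (field_number, value); int -> varint, str -> length-delimited."""
    out = bytearray()
    for number, value in fields:
        if isinstance(value, int):
            out += encode_varint(number << 3 | 0) + encode_varint(value)
        else:
            data = value.encode("utf-8")
            out += encode_varint(number << 3 | 2) + encode_varint(len(data)) + data
    return bytes(out)


def decode(buf, schema):
    """Start from every field's default, then overwrite with the fields found in buf."""
    msg = {name: default for name, default in schema.values()}
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        number, wire_type = key >> 3, key & 7
        if wire_type == 0:
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:
            length, pos = decode_varint(buf, pos)
            value = bytes(buf[pos:pos + length])
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if number not in schema:
            continue  # unknown field: this sketch drops it (real runtimes keep it as an unknown field)
        name, default = schema[number]
        msg[name] = value.decode("utf-8") if isinstance(default, str) else value
    return msg


OLD_SCHEMA = {1: ("name", ""), 2: ("id", 0)}
NEW_SCHEMA = {1: ("name", ""), 2: ("id", 0), 3: ("email", "")}

old_data = encode([(1, "John Doe"), (2, 1234)])
new_data = encode([(1, "John Doe"), (2, 1234), (3, "jdoe@example.com")])

# New code reading old data: email is missing, so it keeps its default "".
print(decode(old_data, NEW_SCHEMA))  # {'name': 'John Doe', 'id': 1234, 'email': ''}
# Old code reading new data: field 3 is unknown and ignored.
print(decode(new_data, OLD_SCHEMA))  # {'name': 'John Doe', 'id': 1234}
```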
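When protocol buffers are not a good fit
- Protocol buffers assume that an entire message can be loaded into memory at once. For data larger than a few megabytes, processing can end up holding several copies of the data (for example, serialized and parsed forms), which causes sudden spikes in memory usage.
- Serialized output cannot be compared byte for byte: the same data is not guaranteed to serialize to the same bytes. To compare two messages, parse them first.
- Messages are not compressed (the serialized bytes can still be gzipped like any other file).
- For many scientific and engineering applications that involve large, multi-dimensional arrays of floating-point numbers, protocol buffer messages are not maximally efficient in either size or speed.
- Non-object-oriented languages are not well supported.
- Protocol buffer messages are not self-describing: without the corresponding .proto definition, you cannot fully interpret one.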
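The point about comparing serialized output can be seen with two hand-written encodings of the same message (field 1 = 150, field 2 = 1). Parsers accept fields in any order, so both byte strings are valid and parse to the same values, yet they differ byte for byte. `parse_varint_fields` is a hypothetical minimal parser that only understands varint fields.

```python
def decode_varint(buf, pos):
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def parse_varint_fields(buf):
    """Parse a message made only of varint (wire type 0) fields into {field_number: value}."""
    fields = {}
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        if key & 7 != 0:
            raise ValueError("this sketch only handles varint fields")
        value, pos = decode_varint(buf, pos)
        fields[key >> 3] = value
    return fields


a = bytes([0x08, 0x96, 0x01, 0x10, 0x01])  # field 1 = 150, then field 2 = 1
b = bytes([0x10, 0x01, 0x08, 0x96, 0x01])  # the same two fields in the opposite order

print(a == b)                                            # False: the bytes differ
print(parse_varint_fields(a) == parse_varint_fields(b))  # True: the messages are equal
```

Comparing the parsed messages is therefore the reliable check.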
Who uses protocol buffers
- gRPC
- Envoy Proxy
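Syntax of .proto definition files
Field labels
- optional: when an optional field is absent, reading it returns the default value of its type; in proto2 you can set a custom default: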
optional int32 result_per_page = 3 [default = 10];
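- repeated: a list; element order is preserved. In proto3, repeated fields of scalar numeric types use packed encoding by default
- singular: in proto3, the default label when none is written; a message contains zero or one of this field
- required (not recommended):
  - If a required field is later changed to optional but some project has not updated its code, that project throws an error whenever the field is not sent.
  - For a required enum field, after a new enum value is added, projects that have not updated their code do not recognize the new value and drop it, so the message fails the required check.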
Scalar types
.proto Type | Notes | C++ Type | Java Type | Python Type[2] | Go Type |
---|---|---|---|---|---|
double | | double | double | float | *float64 |
float | | float | float | float | *float32 |
int32 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. | int32 | int | int | *int32 |
int64 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. | int64 | long | int/long[3] | *int64 |
uint32 | Uses variable-length encoding. | uint32 | int[1] | int/long[3] | *uint32 |
uint64 | Uses variable-length encoding. | uint64 | long[1] | int/long[3] | *uint64 |
sint32 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. | int32 | int | int | *int32 |
sint64 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. | int64 | long | int/long[3] | *int64 |
fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int[1] | int/long[3] | *uint32 |
fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | long[1] | int/long[3] | *uint64 |
sfixed32 | Always four bytes. | int32 | int | int | *int32 |
sfixed64 | Always eight bytes. | int64 | long | int/long[3] | *int64 |
bool | | bool | boolean | bool | *bool |
string | A string must always contain UTF-8 encoded text. | string | String | unicode (Python 2) or str (Python 3) | *string |
bytes | May contain any arbitrary sequence of bytes. | string | ByteString | bytes | []byte |
[1] Java has no unsigned integer types; unsigned 32-bit and 64-bit values are stored in the signed types, with the top bit held in the sign bit.
[2] Setting a field performs type checking to make sure the value is valid.
[3] In Python 2, 64-bit and unsigned 32-bit integers are decoded as long, but can be an int if an int is given when setting the field.
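The notes on int32, sint32 and fixed32 follow from how varints are sized. This Python sketch, with hypothetical helpers `varint_size` and `zigzag32`, counts encoded bytes: a negative int32 is sign-extended to 64 bits and always takes 10 bytes, ZigZag encoding (used by sint32) maps small negative numbers to small unsigned ones, and a uint32 needs 5 bytes once the value reaches 2^28, which is where fixed32's constant 4 bytes wins.

```python
def varint_size(value: int) -> int:
    """Bytes needed to encode value as a base-128 varint (7 bits per byte)."""
    if value < 0:
        value += 1 << 64  # int32/int64 negatives are sign-extended to 64 bits
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def zigzag32(n: int) -> int:
    """ZigZag mapping used by sint32: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ..."""
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


print(varint_size(-1))             # 10  (int32 -1)
print(varint_size(zigzag32(-1)))   # 1   (sint32 -1)
print(varint_size(2**28 - 1))      # 4   (uint32 below 2^28)
print(varint_size(2**28))          # 5   (uint32 from 2^28 on; fixed32 is always 4)
```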
Composite types
- message
- enum
- oneof: use when a message has several fields of which at most one is set at any time
- map
Additional data types
Duration
Timestamp
Interval
Date
DayOfWeek
TimeOfDay
LatLng
Money
PostalAddress
Color
Month
Basic usage
syntax = "proto3";
message SearchRequest {
string query = 1;
int32 page_number = 2;
int32 result_per_page = 3;
}
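The first non-empty, non-comment line of the file specifies the syntax version; without it, the compiler assumes proto2.
Assigning field numbers: the field number identifies the field in the binary encoding. Once the message is in use, it should never be changed or reused, because that breaks compatibility.
To avoid this problem, especially when several systems exchange the message, mark the numbers or names of deleted fields as reserved: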
message Foo {
reserved 2, 15, 9 to 11;
reserved "foo", "bar";
}
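protoc rejects a message whose fields reuse a reserved number or name. The check itself is simple; the Python sketch below reimplements it for illustration, with hypothetical helpers `parse_reserved` and `check_field`. Here `max` is taken to be 536,870,911 (2^29 - 1), the largest field number; for enums it would be the int32 maximum instead.

```python
FIELD_NUMBER_MAX = 2**29 - 1  # largest valid field number, used for "max"


def parse_reserved(spec, max_value=FIELD_NUMBER_MAX):
    """Turn a reserved list such as "2, 15, 9 to 11" into a list of ranges."""
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if " to " in part:
            low, high = (p.strip() for p in part.split(" to "))
            upper = max_value if high == "max" else int(high)
            ranges.append(range(int(low), upper + 1))
        else:
            ranges.append(range(int(part), int(part) + 1))
    return ranges


def check_field(name, number, reserved_ranges, reserved_names):
    """Raise ValueError if a field reuses a reserved number or name."""
    if any(number in r for r in reserved_ranges):
        raise ValueError(f"field {name!r} uses reserved number {number}")
    if name in reserved_names:
        raise ValueError(f"field name {name!r} is reserved")


# The reserved statements from message Foo above.
ranges = parse_reserved("2, 15, 9 to 11")
names = {"foo", "bar"}

check_field("query", 1, ranges, names)        # accepted
try:
    check_field("page", 10, ranges, names)    # 10 falls inside "9 to 11"
except ValueError as err:
    print(err)                                # field 'page' uses reserved number 10
```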
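Field numbers 1-15 take one byte to encode (that byte also holds the wire type); numbers 16-2047 take two bytes. Keep 1-15 for frequently used fields.
Several related messages can be defined in one .proto file.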
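The one-byte/two-byte boundary comes from the field key being the varint `(field_number << 3) | wire_type`: three bits go to the wire type, so only field numbers up to 15 keep the key below 128. A small Python check, using a hypothetical `key_size` helper:

```python
def key_size(field_number: int, wire_type: int = 0) -> int:
    """Bytes taken by a field's key, which is the varint (field_number << 3) | wire_type."""
    key = (field_number << 3) | wire_type
    size = 1
    while key >= 0x80:  # each varint byte carries 7 bits
        key >>= 7
        size += 1
    return size


for number in (1, 15, 16, 2047, 2048):
    print(number, key_size(number, wire_type=2))
# 1 1
# 15 1
# 16 2
# 2047 2
# 2048 3
```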
Comments
/* SearchRequest represents a search query, with pagination options to
* indicate which results to include in the response. */
message SearchRequest {
string query = 1;
int32 page_number = 2; // Which page number do we want?
int32 result_per_page = 3; // Number of results to return per page.
}
- Put single-line comments at the end of the line (Google's examples put them on a line of their own).
What the .proto file generates
- Read/write accessors for each field
- Serialization and deserialization methods
- For Go, a .pb.go file
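Default values
The default value of an enum field is the first value defined in the enum, which must be 0.
The default value of a repeated field is an empty list.
In practice, be careful to distinguish a default value from an explicitly set one. For example, a bool that reads false may have been explicitly set to false, or may be absent and defaulted to false. In that case, use a wrapper type: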
import "google/protobuf/wrappers.proto";

message Example {  // hypothetical message name, for illustration
  google.protobuf.Int32Value status = 2;
}
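Why the wrapper helps is visible on the wire. In proto3, a plain `bool` field (declared without `optional`) that equals its default is not serialized at all, so "set to false" and "never set" produce identical bytes. A wrapper such as `google.protobuf.BoolValue` (from the same wrappers.proto) is a message, and a set message field is always written, even when its inner value is the default. The Python sketch below hand-encodes both cases; the helper names are hypothetical, and the field number is assumed to be below 16 so that its key fits in one byte.

```python
def encode_bool(field_number, value):
    """Proto3 implicit-presence bool: nothing is written for None (unset) or False (default)."""
    if not value:
        return b""
    return bytes([field_number << 3 | 0, 1])  # key (wire type 0) + varint 1


def encode_bool_wrapper(field_number, value):
    """google.protobuf.BoolValue is a message whose field 1 holds the bool."""
    if value is None:
        return b""  # wrapper not set: field absent
    inner = encode_bool(1, value)  # empty when value is False
    return bytes([field_number << 3 | 2, len(inner)]) + inner  # key (wire type 2) + length + body


print(encode_bool(2, False) == encode_bool(2, None))  # True: indistinguishable on the wire
print(encode_bool_wrapper(2, False).hex())  # 1200: present, with an empty body
print(encode_bool_wrapper(2, None).hex())   # prints an empty line: absent
```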
Enums
message SearchRequest {
string query = 1;
int32 page_number = 2;
int32 result_per_page = 3;
enum Corpus {
UNIVERSAL = 0;
WEB = 1;
IMAGES = 2;
LOCAL = 3;
NEWS = 4;
PRODUCTS = 5;
VIDEO = 6;
}
Corpus corpus = 4;
}
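The first enum value must be 0 so that it can serve as the default.
To give two enum constants the same value, enable the allow_alias option; otherwise compilation fails:
message MyMessage1 {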
enum EnumAllowingAlias {
option allow_alias = true;
UNKNOWN = 0;
STARTED = 1;
RUNNING = 1;
}
}
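In proto3, unrecognized enum values are kept: they are deserialized into the message and written back when it is serialized again.
Removing enum values causes the same compatibility problems as removing fields; reserve the removed values and names so they cannot be reused:
enum Foo {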
reserved 2, 15, 9 to 11, 40 to max;
reserved "FOO", "BAR";
}
Importing messages from other .proto files
import "myproject/other_protos.proto";
Nested types
message SearchResponse {
message Result {
string url = 1;
string title = 2;
repeated string snippets = 3;
}
repeated Result results = 1;
}
Using a nested type from outside its parent:
message SomeOtherMessage {
SearchResponse.Result result = 1;
}
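Updating the .proto without updating code
When parsing, protocol buffers convert between compatible types, so some type changes remain compatible. For example, code that expects a string can read a value written as bytes, as long as the bytes are valid UTF-8; and an int32 field that reads an int64 value keeps only the low 32 bits, like a C++ cast.
These rules show how compatibility is preserved; do not rely on them on purpose.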
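The int64-to-int32 case behaves like a C++ cast: only the low 32 bits survive, reinterpreted as a signed two's-complement number. A Python sketch of that rule (the helper `as_int32` is hypothetical):

```python
def as_int32(value: int) -> int:
    """Keep the low 32 bits of an integer and reinterpret them as a signed int32."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


print(as_int32(42))           # 42: small values survive unchanged
print(as_int32(2**31))        # -2147483648: the high bit becomes the sign bit
print(as_int32(2**32 + 5))    # 5: everything above bit 31 is discarded
print(as_int32(-1))           # -1
```

A value that only fits in 64 bits therefore silently changes meaning when older int32 code reads it.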
Unknown Fields
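When old code parses binary data produced with a newer schema, the new fields it does not recognize become unknown fields. Since version 3.5, proto3 keeps unknown fields when parsing and includes them in the serialized output, as proto2 does.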
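A Python sketch of that round trip, assuming old code that only knows fields 1 (`name`) and 2 (`id`) of the `Person` message from earlier: it splits the input into raw fields, keeps the ones it recognizes, stores the rest as opaque bytes, and appends them unchanged when writing the message back. The helper names are hypothetical, and only wire types 0 and 2 are handled.

```python
def decode_varint(buf, pos):
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def split_fields(buf):
    """Split an encoded message into (field_number, raw_field_bytes) pairs."""
    fields, pos = [], 0
    while pos < len(buf):
        start = pos
        key, pos = decode_varint(buf, pos)
        wire_type = key & 7
        if wire_type == 0:
            _, pos = decode_varint(buf, pos)       # skip a varint value
        elif wire_type == 2:
            length, pos = decode_varint(buf, pos)  # skip a length-delimited value
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        fields.append((key >> 3, buf[start:pos]))
    return fields


KNOWN_FIELDS = {1, 2}  # the old schema: name = 1, id = 2


def parse_keep_unknown(buf):
    known, unknown = [], bytearray()
    for number, raw in split_fields(buf):
        if number in KNOWN_FIELDS:
            known.append(raw)
        else:
            unknown += raw  # not understood, but kept byte for byte
    return known, bytes(unknown)


def serialize(known, unknown):
    return b"".join(known) + unknown  # unknown fields are written back unchanged


# Person{name="John Doe", id=1234, email="jdoe@example.com"} written by newer code.
new_data = b"\x0a\x08John Doe\x10\xd2\x09\x1a\x10jdoe@example.com"

known, unknown = parse_keep_unknown(new_data)
print(unknown)                                # b'\x1a\x10jdoe@example.com'
print(serialize(known, unknown) == new_data)  # True: the email field survives
```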
Any message type
import "google/protobuf/any.proto";
message ErrorStatus {
string message = 1;
repeated google.protobuf.Any details = 2;
}
- Any holds a message of any type, similar to Object in Java or interface{} in Go
Oneof
message SampleMessage {
oneof test_oneof {
string name = 4;
SubMessage sub_message = 9;
}
}
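- At most one member has a value at a time; setting a member automatically clears any member set earlier
- Map fields and repeated fields cannot be members of a oneof
- Generated code provides a way to check which member, if any, is set
- Watch out for compatibility: if the oneof reports None/NOT_SET, you cannot tell whether nothing was set or a member was set that this version of the code does not know about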
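The clearing behaviour can be modelled in a few lines. This Python class is a hand-written stand-in for the generated `SampleMessage`, not real generated code: it stores a single (case, value) pair, so assigning one member necessarily replaces the other, and `which_oneof` plays the role of the check method that generated code provides (the Python runtime, for example, offers `WhichOneof`).

```python
class SampleMessage:
    """Toy model of message SampleMessage { oneof test_oneof { name = 4; sub_message = 9; } }."""

    MEMBERS = ("name", "sub_message")

    def __init__(self):
        self._case = None   # None stands for NOT_SET
        self._value = None

    def set(self, member, value):
        if member not in self.MEMBERS:
            raise AttributeError(member)
        # One slot for the whole oneof: storing a new member overwrites the old one.
        self._case, self._value = member, value

    def get(self, member, default=None):
        return self._value if self._case == member else default

    def which_oneof(self):
        return self._case


msg = SampleMessage()
print(msg.which_oneof())        # None: nothing set yet
msg.set("name", "John")
msg.set("sub_message", {"id": 1})
print(msg.which_oneof())        # sub_message
print(msg.get("name"))          # None: setting sub_message cleared name
```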
map
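map<string, Project> projects = 3;
- Keys must be an integer or string type; floating-point types, bytes, enums and messages cannot be keys
- Values can be any type except another map
- Map fields cannot be repeated
- Wire-format order and iteration order are undefined, so do not rely on any ordering (text-format output is sorted by key)
- When parsing the binary format, a duplicate key keeps the last value seen; parsing text format with duplicate keys may fail
- If a map entry has a key but no value, C++, Java, Kotlin, and Python serialize the value type's default value, while other languages serialize nothing for that entry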
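On the wire a map field is not a special type: `map<string, int32> counts = 1;` is encoded exactly like a repeated message with `key = 1` and `value = 2`. The Python sketch below, with hypothetical helpers and the example field `counts`, hand-encodes such entries and decodes them back into a dict; when a key appears twice in the binary data, the later entry overwrites the earlier one, matching the wire-format rule above.

```python
# map<string, int32> counts = 1;   is wire-equivalent to
#   message CountsEntry { string key = 1; int32 value = 2; }
#   repeated CountsEntry counts = 1;


def encode_varint(value):
    if value < 0:
        value += 1 << 64  # int32 negatives are sign-extended to 64 bits
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf, pos):
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def encode_entry(field_number, key, value):
    """One map entry: a length-delimited message holding key (field 1) and value (field 2)."""
    key_bytes = key.encode("utf-8")
    body = b"\x0a" + encode_varint(len(key_bytes)) + key_bytes  # key: field 1, wire type 2
    body += b"\x10" + encode_varint(value)                       # value: field 2, wire type 0
    return encode_varint(field_number << 3 | 2) + encode_varint(len(body)) + body


def decode_map(buf):
    """Decode consecutive entries of one map<string, int32> field (non-negative values only)."""
    result, pos = {}, 0
    while pos < len(buf):
        _, pos = decode_varint(buf, pos)        # outer key; assumed to be the map field
        length, pos = decode_varint(buf, pos)
        entry = buf[pos:pos + length]
        pos += length
        key, value, i = "", 0, 0                # a missing key or value keeps its default
        while i < len(entry):
            tag, i = decode_varint(entry, i)
            if tag == 0x0A:
                n, i = decode_varint(entry, i)
                key = entry[i:i + n].decode("utf-8")
                i += n
            elif tag == 0x10:
                value, i = decode_varint(entry, i)
            else:
                raise ValueError("unexpected field inside map entry")
        result[key] = value                     # a repeated key overwrites: last one wins
    return result


data = encode_entry(1, "a", 1) + encode_entry(1, "b", 2) + encode_entry(1, "a", 3)
print(decode_map(data))  # {'a': 3, 'b': 2}
```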
packages
package foo.bar;
message Open { ... }
message Foo {
...
foo.bar.Open open = 1;
...
}
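- Use packages to avoid name conflicts between message types
- In Go, the generated code's import path and package name are determined by the go_package option rather than by this package declaration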
service
service SearchService {
rpc Search(SearchRequest) returns (SearchResponse);
}
- Defines RPC services (for example, for gRPC)
json
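When JSON is converted to protocol buffers, missing or null fields get their default values. When protocol buffers are converted to JSON, fields with default values are omitted by default; this can be configured.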
proto3 | JSON | JSON example | Notes |
---|---|---|---|
message | object | {"fooBar": v, "g": null, …} | Generates JSON objects. Message field names are mapped to lowerCamelCase and become JSON object keys. If the json_name field option is specified, the specified value will be used as the key instead. Parsers accept both the lowerCamelCase name (or the one specified by the json_name option) and the original proto field name. null is an accepted value for all field types and treated as the default value of the corresponding field type. |
enum | string | "FOO_BAR" | The name of the enum value as specified in proto is used. Parsers accept both enum names and integer values. |
map<K,V> | object | {"k": v, …} | All keys are converted to strings. |
repeated V | array | [v, …] | null is accepted as the empty list []. |
bool | true, false | true, false | |
string | string | "Hello World!" | |
bytes | base64 string | "YWJjMTIzIT8kKiYoKSctPUB+" | JSON value will be the data encoded as a string using standard base64 encoding with paddings. Either standard or URL-safe base64 encoding with/without paddings are accepted. |
int32, fixed32, uint32 | number | 1, -10, 0 | JSON value will be a decimal number. Either numbers or strings are accepted. |
int64, fixed64, uint64 | string | "1", "-10" | JSON value will be a decimal string. Either numbers or strings are accepted. |
float, double | number | 1.1, -10.0, 0, "NaN", "Infinity" | JSON value will be a number or one of the special string values "NaN", "Infinity", and "-Infinity". Either numbers or strings are accepted. Exponent notation is also accepted. -0 is considered equivalent to 0. |
Any | object | {"@type": "url", "f": v, … } | If the Any contains a value that has a special JSON mapping, it will be converted as follows: {"@type": xxx, "value": yyy}. Otherwise, the value will be converted into a JSON object, and the "@type" field will be inserted to indicate the actual data type. |
Timestamp | string | "1972-01-01T10:00:20.021Z" | Uses RFC 3339, where generated output will always be Z-normalized and uses 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. |
Duration | string | "1.000340012s", "1s" | Generated output always contains 0, 3, 6, or 9 fractional digits, depending on required precision, followed by the suffix "s". Accepted are any fractional digits (also none) as long as they fit into nano-seconds precision and the suffix "s" is required. |
Struct | object | { … } | Any JSON object. See struct.proto. |
Wrapper types | various types | 2, "2", "foo", true, "true", null, 0, … | Wrappers use the same representation in JSON as the wrapped primitive type, except that null is allowed and preserved during data conversion and transfer. |
FieldMask | string | "f.fooBar,h" | See field_mask.proto. |
ListValue | array | [foo, bar, …] | |
Value | value | | Any JSON value. Check google.protobuf.Value for details. |
NullValue | null | | JSON null |
Empty | object | {} | An empty JSON object |
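A few rows of the table, expressed as a Python sketch: the default JSON key is the field name converted to lowerCamelCase, 64-bit integers are written as strings, and bytes as standard base64. `default_json_name` and `to_json_value` are hypothetical helpers covering only these cases; they are not the protobuf JSON printer. The bytes value is the one from the table's example.

```python
import base64
import json


def default_json_name(field_name: str) -> str:
    """snake_case -> lowerCamelCase: drop each underscore and upper-case the following letter."""
    first, *rest = field_name.split("_")
    return first + "".join(word[:1].upper() + word[1:] for word in rest)


def to_json_value(proto_type: str, value):
    """Cover three rows of the table; every other type is passed through unchanged."""
    if proto_type in ("int64", "fixed64", "uint64"):
        return str(value)  # 64-bit integers become decimal strings
    if proto_type == "bytes":
        return base64.b64encode(value).decode("ascii")  # standard base64 with padding
    return value


fields = {
    "page_number": ("int32", 2),
    "user_id": ("int64", 1234567890123),
    "payload": ("bytes", b"abc123!?$*&()'-=@~"),
}
doc = {default_json_name(name): to_json_value(t, v) for name, (t, v) in fields.items()}
print(json.dumps(doc))
# {"pageNumber": 2, "userId": "1234567890123", "payload": "YWJjMTIzIT8kKiYoKSctPUB+"}
```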
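JSON options
- Emit fields that have default values
- Ignore unknown fields: by default the proto3 JSON parser rejects them
- Use the field names from the .proto file as keys instead of the default lowerCamelCase names (proto field names are conventionally snake_case)
- Print enum values as integers instead of the default enum value name strings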
Options
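Options can be declared at different levels: file-level, message-level, field-level, and on enum types, enum values, oneof fields, service types, and service methods...
The level is determined by where the option is written.
Custom options can also be defined.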
go_package: specifies the import path of the generated Go code; the last path element is used as the package name.
option go_package = "github.com/protocolbuffers/protobuf/examples/go/tutorialpb";