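1. Overview
Protocol Buffers (protobuf) is a structured-data serialization format that Google uses for data exchange. It is cross-platform, cross-language, and extensible. It resembles the widely used XML and JSON formats, but produces smaller payloads and encodes and decodes faster, which makes it a good fit for data storage, network transmission, and other areas with tight constraints on size and latency. Definitions live in files with the .proto suffix and are compiled by a dedicated compiler, protoc. This article covers protocol buffers version 3 (proto3). Supported languages include C++, Java, Python, Go, Ruby, JavaNano, JavaScript, Objective-C, C#, and PHP.
- protobuf source repository: github.com/google/proto
- protocol compiler download: github.com/google/proto
- Official English documentation:
- developers.google.com/p
- developers.google.com/p
If the English documentation site is hard to reach, a direct copy is included at the end of this article.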
2. Introduction
- Scalar types
- Field rules
- Reserved fields
- Enumerations
- Defining a proto file
3. Main Content
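Scalar types
Numeric: double, float, int32, int64, uint32, uint64, sint32, sint64, fixed32, fixed64, sfixed32, sfixed64.
Boolean: bool, whose value is true or false.
String: string holds text of arbitrary length, but the text must be UTF-8 encoded or 7-bit ASCII, and its length cannot exceed 2^32.
Bytes: bytes holds an arbitrary sequence of bytes, also no longer than 2^32. How those bytes are interpreted is up to the developer; for example, you could use this type to hold an image.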
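The difference between string and bytes can be checked with a few lines of Python. This is a minimal sketch that does not use any protobuf library: the helper name is_valid_proto_string is hypothetical, and it only tests the UTF-8 requirement described above, not the length limit.

```python
def is_valid_proto_string(raw: bytes) -> bool:
    """Return True if raw could be the payload of a proto3 string field.

    Hypothetical helper for illustration: it only checks that the bytes
    are valid UTF-8 (7-bit ASCII is a subset of UTF-8).
    """
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


text_payload = "protobuf 协议".encode("utf-8")   # valid UTF-8 text
image_payload = bytes([0xFF, 0xD8, 0xFF, 0xE0])  # first bytes of a JPEG file

print(is_valid_proto_string(text_payload))   # fits a string field
print(is_valid_proto_string(image_payload))  # needs a bytes field instead
```

Binary data such as the image header here is not valid UTF-8, which is why it belongs in a bytes field.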
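Field rules
Every protobuf field follows one of two rules:
Singular: the field can appear zero or one time (never more than once). This is the default field rule in proto3.
Repeated: use a repeated field when you need a list or an array. The list can hold any number of elements (including zero), and the order of its values is preserved.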
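Reserved fields
If you update a message type by deleting a field or commenting it out, another developer who later changes the same message type may reuse the field number (tag) you removed. If an older version of the .proto file is still in use somewhere, this can cause serious problems such as data corruption or privacy leaks. One way to prevent this is to mark the numbers of the deleted or commented-out fields as reserved (and/or their names, since field names can cause problems with JSON serialization). If anyone later uses one of those numbers as a field identifier, the compiler reports an error.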
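The check that the compiler performs for reserved fields can be imitated in a few lines. The sketch below is only an illustration of the rule, not how protoc is implemented: check_fields and ReservedFieldError are hypothetical names, and fields are modeled as plain (name, number) pairs. The docstring shows how a reservation is written in an actual .proto file.

```python
class ReservedFieldError(ValueError):
    """Raised when a field reuses a reserved number or name."""


def check_fields(fields, reserved_numbers, reserved_names):
    """Reject any (name, number) pair that collides with a reservation.

    In a .proto file the reservation itself is written inside the message,
    for example:
        reserved 2, 15, 9 to 11;
        reserved "foo", "bar";
    """
    for name, number in fields:
        if number in reserved_numbers:
            raise ReservedFieldError(f"field number {number} is reserved")
        if name in reserved_names:
            raise ReservedFieldError(f"field name {name!r} is reserved")


# Numbers 2, 15 and 9 through 11 are reserved, as are the names foo and bar.
reserved_numbers = {2, 15, *range(9, 12)}
reserved_names = {"foo", "bar"}

# Fresh numbers and names pass the check silently.
check_fields([("id", 1), ("name", 3)], reserved_numbers, reserved_names)

try:
    # Reusing the deleted field number 2 is rejected.
    check_fields([("id", 1), ("nickname", 2)], reserved_numbers, reserved_names)
except ReservedFieldError as error:
    print(error)
```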
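Enumerations
Enum values can be given aliases, which lets two enum constants share the same numeric value. To define an alias, first set the allow_alias option to true.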
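Python's standard enum module has a comparable notion of aliases, so it can serve as a rough analogy for what an alias means. This sketch is not protobuf code: the Status enum and its members are made up for illustration, and the comment shows where the allow_alias option would go in a .proto enum.

```python
import enum


class Status(enum.IntEnum):
    # Proto equivalent (first line inside the enum): option allow_alias = true;
    UNKNOWN = 0
    STARTED = 1
    RUNNING = 1  # alias: same numeric value as STARTED


# Both names refer to the same member, and the value maps back to the first name.
print(Status.RUNNING is Status.STARTED)
print(Status(1).name)
```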
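Defining a proto file
- syntax = "proto3"; // must be declared to select proto version 3 (protoc otherwise assumes proto2)
- import "date.proto"; // imports another proto file
- package my.project; // acts like a namespace
- option csharp_namespace = "my.service"; // overrides the namespace of the generated C# code
- message MyMessage{} // similar to a class in C#, with similar syntax
- service MyService{} // defines a service; its concrete methods are declared inside
Reading the gRPC service definition syntax:
rpc GetByNo(GetMsgByNoRequest) returns (MsgResponse);
- rpc is the keyword that marks an RPC call, something the client asks the server to do
- GetByNo is the method name
- (GetMsgByNoRequest) is the input parameter (defined in the proto file)
- returns introduces the return type
- (MsgResponse) is the return type (defined in the proto file)
- stream marks a data stream. It can be placed on the input parameter, on the return value, on both, or on neither. One use case is transferring images.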
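syntax = "proto3";
option csharp_namespace = "DemoGRPC.Protos";
message JusterMessage{
  int32 id = 1;
  string name = 2;
  string msg = 3;
}
// Request to read a message
message GetMsgByNoRequest{
  int32 id = 1;
}
message MsgRequest{
  JusterMessage msg = 1;
}
// Response for reading a message
message MsgResponse{
  JusterMessage msg = 1;
}
// Request to read all messages
message GetAllMsgRequest{}
// Request to add a photo
message AddPhotoRequest{
  bytes data = 1;
}
// Response for adding a photo
message AddPhotoResponse{
  bool isOk = 1;
}
service MessageService{
  /* Unary: the client sends one request to the server and
     gets one response back, just like an ordinary function call. */
  rpc GetByNo(GetMsgByNoRequest) returns (MsgResponse);
  /* Server streaming: the client sends one request and gets back a stream from which it reads a sequence of messages.
     The client keeps reading from the returned stream until no messages remain. */
  rpc GetAll(GetAllMsgRequest) returns (stream MsgResponse);
  /* Client streaming: the client writes a sequence of messages to a stream and sends them to the server.
     Once it has finished writing, it waits for the server to read them and return a response. */
  rpc AddPhoto(stream AddPhotoRequest) returns (AddPhotoResponse);
  /* Bidirectional streaming: both sides send a sequence of messages over a read-write stream.
     The two streams operate independently, so client and server can read and write in whatever order they like.
     For example, the server could wait for all client messages before writing its responses, or read one message and then write one message,
     or use some other combination of reads and writes. The order of messages within each stream is preserved. */
  rpc SaveAll(stream MsgRequest) returns (stream MsgResponse);
}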
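The four call shapes in MessageService can be modeled with ordinary Python functions and generators. This is only a sketch of the shapes, under the assumption that messages are plain dicts; it does not use the gRPC library, and InMemoryMessageService and its snake_case method names are hypothetical stand-ins for the generated service.

```python
from typing import Dict, Iterable, Iterator


class InMemoryMessageService:
    """Hypothetical stand-in for MessageService; messages are plain dicts."""

    def __init__(self) -> None:
        self.messages: Dict[int, dict] = {}

    def get_by_no(self, request: dict) -> dict:
        # Unary: one request in, one response out.
        return {"msg": self.messages.get(request["id"])}

    def get_all(self, request: dict) -> Iterator[dict]:
        # Server streaming: one request in, a stream of responses out.
        for message in list(self.messages.values()):
            yield {"msg": message}

    def add_photo(self, requests: Iterable[dict]) -> dict:
        # Client streaming: a stream of chunks in, one response out.
        data = b"".join(request["data"] for request in requests)
        return {"isOk": len(data) > 0}

    def save_all(self, requests: Iterable[dict]) -> Iterator[dict]:
        # Bidirectional streaming: answer each request as it arrives.
        for request in requests:
            message = request["msg"]
            self.messages[message["id"]] = message
            yield {"msg": message}


service = InMemoryMessageService()
incoming = [{"msg": {"id": i, "name": f"user{i}", "msg": "hi"}} for i in (1, 2)]
saved = list(service.save_all(iter(incoming)))
print(len(saved))
print(service.get_by_no({"id": 2})["msg"]["name"])
print([reply["msg"]["id"] for reply in service.get_all({})])
print(service.add_photo(iter([{"data": b"\x89PNG"}, {"data": b"..."}])))
```

A parameter or return value marked stream in the .proto file corresponds here to an iterator; everything else is a single value.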
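gRPC provides a protocol buffer compiler plugin that generates client and server code from a service definition in a .proto file. gRPC users typically implement these APIs on the server side and call them from the client side.
- On the server side, the server implements the service interface and runs a gRPC server to handle client calls. The gRPC infrastructure decodes incoming requests, executes the service methods, and encodes the service responses.
- On the client side, the client has a stub that implements the same methods as the server. The client calls those methods on the local stub, wrapping the arguments in the appropriate protocol buffer message types; gRPC takes care of sending the request to the server and returning the server's protocol buffer response.
Synchronous vs. asynchronous
Synchronous RPC calls block until a response arrives from the server, which is the closest match to the abstraction RPC aims for. On the other hand, networks are inherently asynchronous, and in many scenarios it is useful to be able to start an RPC without blocking the current thread.
In most languages, the gRPC programming API comes in both synchronous and asynchronous flavors.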
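The contrast between a blocking call and a non-blocking one can be sketched with Python's standard library. Nothing here is gRPC API: get_by_no_blocking, get_by_no_async, and fetch_many are hypothetical functions, and the sleeps merely stand in for network latency.

```python
import asyncio
import time


def get_by_no_blocking(message_id: int) -> str:
    """Hypothetical synchronous call: the caller waits for the reply."""
    time.sleep(0.01)  # stands in for network latency
    return f"message {message_id}"


async def get_by_no_async(message_id: int) -> str:
    """Hypothetical asynchronous call: the event loop stays free while waiting."""
    await asyncio.sleep(0.01)  # stands in for network latency
    return f"message {message_id}"


async def fetch_many(ids):
    # Start all calls without blocking, then collect replies in request order.
    return await asyncio.gather(*(get_by_no_async(i) for i in ids))


print(get_by_no_blocking(1))
print(asyncio.run(fetch_many([1, 2, 3])))
```

With the blocking version, three calls would wait one after another; with the asynchronous version, the three waits overlap.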
-------------------------------------------------------------------------------
Protocol Buffer Basics: C#
This tutorial provides a basic C# programmer's introduction to working with protocol buffers, using the proto3 version of the protocol buffers language. By walking through creating a simple example application, it shows you how to:
- Define message formats in a .proto file.
- Use the protocol buffer compiler.
- Use the C# protocol buffer API to write and read messages.
This isn't a comprehensive guide to using protocol buffers in C#. For more detailed reference information, see the Protocol Buffer Language Guide, the C# API Reference, the C# Generated Code Guide, and the Encoding Reference.
Why use protocol buffers?
The example we're going to use is a very simple "address book" application that can read and write people's contact details to and from a file. Each person in the address book has a name, an ID, an email address, and a contact phone number.
How do you serialize and retrieve structured data like this? There are a few ways to solve this problem:
- Use .NET binary serialization with System.Runtime.Serialization.Formatters.Binary.BinaryFormatter and associated classes. This ends up being very fragile in the face of changes and, in some cases, expensive in terms of data size. It also doesn't work very well if you need to share data with applications written for other platforms.
- You can invent an ad-hoc way to encode the data items into a single string, such as encoding 4 ints as "12:3:-23:67". This is a simple and flexible approach, although it does require writing one-off encoding and parsing code, and the parsing imposes a small run-time cost. This works best for encoding very simple data.
- Serialize the data to XML. This approach can be very attractive since XML is (sort of) human readable and there are binding libraries for lots of languages. This can be a good choice if you want to share data with other applications/projects. However, XML is notoriously space intensive, and encoding/decoding it can impose a huge performance penalty on applications. Also, navigating an XML DOM tree is considerably more complicated than navigating simple fields in a class normally would be.
Protocol buffers are the flexible, efficient, automated solution to solve exactly this problem. With protocol buffers, you write a .proto description of the data structure you wish to store. From that, the protocol buffer compiler creates a class that implements automatic encoding and parsing of the protocol buffer data with an efficient binary format. The generated class provides getters and setters for the fields that make up a protocol buffer and takes care of the details of reading and writing the protocol buffer as a unit. Importantly, the protocol buffer format supports the idea of extending the format over time in such a way that the code can still read data encoded with the old format.
Where to find the example code
Our example is a command-line application for managing an address book data file, encoded using protocol buffers. The command AddressBook (see: Program.cs) can add a new entry to the data file or parse the data file and print the data to the console.
You can find the complete example in the examples directory and csharp/src/AddressBook directory of the GitHub repository.
Defining your protocol format
To create your address book application, you'll need to start with a .proto file. The definitions in a .proto file are simple: you add a message for each data structure you want to serialize, then specify a name and a type for each field in the message. In our example, the .proto file that defines the messages is addressbook.proto.
The .proto file starts with a package declaration, which helps to prevent naming conflicts between different projects.
syntax = "proto3";
package tutorial;
import "google/protobuf/timestamp.proto";
In C#, your generated classes will be placed in a namespace matching the package name if csharp_namespace is not specified. In our example, the csharp_namespace option has been specified to override the default, so the generated code uses a namespace of Google.Protobuf.Examples.AddressBook instead of Tutorial.
option csharp_namespace = "Google.Protobuf.Examples.AddressBook";
Next, you have your message definitions. A message is just an aggregate containing a set of typed fields. Many standard simple data types are available as field types, including bool, int32, float, double, and string. You can also add further structure to your messages by using other message types as field types.
message Person {
  string name = 1;
  int32 id = 2; // Unique ID number for this person.
  string email = 3;
  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }
  message PhoneNumber {
    string number = 1;
    PhoneType type = 2;
  }
  repeated PhoneNumber phones = 4;
  google.protobuf.Timestamp last_updated = 5;
}
// Our address book file is just one of these.
message AddressBook {
  repeated Person people = 1;
}
In the above example, the Person message contains PhoneNumber messages, while the AddressBook message contains Person messages. You can even define message types nested inside other messages – as you can see, the PhoneNumber type is defined inside Person. You can also define enum types if you want one of your fields to have one of a predefined list of values – here you want to specify that a phone number can be one of MOBILE, HOME, or WORK.
The " = 1", " = 2" markers on each element identify the unique "tag" that field uses in the binary encoding. Tag numbers 1-15 require one less byte to encode than higher numbers, so as an optimization you can decide to use those tags for the commonly used or repeated elements, leaving tags 16 and higher for less-commonly used optional elements. Each element in a repeated field requires re-encoding the tag number, so repeated fields are particularly good candidates for this optimization.
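The one-byte saving for tags 1-15 follows from how the wire format builds a field's key: the field number is shifted left by three bits, combined with a 3-bit wire type, and the result is written as a base-128 varint. The Python sketch below works through that arithmetic; the helper names encode_varint, encode_key, and key_size are made up for illustration and are not part of any protobuf library.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


WIRE_TYPE_VARINT = 0  # wire type used by int32, bool, enum, ...


def encode_key(field_number: int, wire_type: int = WIRE_TYPE_VARINT) -> bytes:
    """Build the key that precedes every encoded field value."""
    return encode_varint((field_number << 3) | wire_type)


def key_size(field_number: int) -> int:
    return len(encode_key(field_number))


for number in (1, 15, 16, 2047, 2048):
    print(number, key_size(number))
```

Since only four bits of the first byte are left for the field number, 15 is the largest number whose key fits in a single byte; numbers 16 through 2047 take two bytes.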
If a field value isn't set, a default value is used: zero for numeric types, the empty string for strings, false for bools. For embedded messages, the default value is always the "default instance" or "prototype" of the message, which has none of its fields set. Calling the accessor to get the value of a field which has not been explicitly set always returns that field's default value.
If a field is repeated, the field may be repeated any number of times (including zero). The order of the repeated values will be preserved in the protocol buffer. Think of repeated fields as dynamically sized arrays.
You'll find a complete guide to writing .proto files, including all the possible field types, in the Protocol Buffer Language Guide. Don't go looking for facilities similar to class inheritance, though: protocol buffers don't do that.
Compiling your protocol buffers
Now that you have a .proto, the next thing you need to do is generate the classes you'll need to read and write AddressBook (and hence Person and PhoneNumber) messages. To do this, you need to run the protocol buffer compiler protoc on your .proto:
- If you haven't installed the compiler, download the package and follow the instructions in the README.
- Now run the compiler, specifying the source directory (where your application's source code lives; the current directory is used if you don't provide a value), the destination directory (where you want the generated code to go; often the same as SRC_DIR), and the path to your .proto. In this case, you would invoke:
protoc -I=SRC_DIR --csharp_out=DST_DIR SRC_DIR/addressbook.proto
Because you want C# code, you use the --csharp_out option – similar options are provided for other supported languages. This generates Addressbook.cs in your specified destination directory. To compile this code, you'll need a project with a reference to the Google.Protobuf assembly.
The addressbook classes
Generating Addressbook.cs gives you five useful types:
- A static Addressbook class that contains metadata about the protocol buffer messages.
- An AddressBook class with a read-only People property.
- A Person class with properties for Name, Id, Email and Phones.
- A PhoneNumber class, nested in a static Person.Types class.
- A PhoneType enum, also nested in Person.Types.
You can read more about the details of exactly what's generated in the C# Generated Code guide, but for the most part you can treat these as perfectly ordinary C# types. One point to highlight is that any properties corresponding to repeated fields are read-only. You can add items to the collection or remove items from it, but you can't replace it with an entirely separate collection. The collection type for repeated fields is always RepeatedField<T>. This type is like List<T> but with a few extra convenience methods, such as an Add overload accepting a collection of items, for use in collection initializers.
Here's an example of how you might create an instance of Person:
Person john = new Person
{
    Id = 1234,
    Name = "John Doe",
    Email = "jdoe@example.com",
    Phones = { new Person.Types.PhoneNumber { Number = "555-4321", Type = Person.Types.PhoneType.Home } }
};
Note that with C# 6, you can use using static to remove the Person.Types ugliness:
// Add this to the other using directives
using static Google.Protobuf.Examples.AddressBook.Person.Types;
...
// The earlier Phones assignment can now be simplified to:
Phones = { new PhoneNumber { Number = "555-4321", Type = PhoneType.Home } }
Parsing and serialization
The whole purpose of using protocol buffers is to serialize your data so that it can be parsed elsewhere. Every generated class has a WriteTo(CodedOutputStream) method, where CodedOutputStream is a class in the protocol buffer runtime library. However, usually you'll use one of the extension methods to write to a regular System.IO.Stream or convert the message to a byte array or ByteString. These extension methods are in the Google.Protobuf.MessageExtensions class, so when you want to serialize you'll usually want a using directive for the Google.Protobuf namespace. For example:
using Google.Protobuf;
...
Person john = ...; // Code as before
using (var output = File.Create("john.dat"))
{
    john.WriteTo(output);
}
Parsing is also simple. Each generated class has a static Parser property which returns a MessageParser for that type. That in turn has methods to parse streams, byte arrays and ByteStrings. So to parse the file we've just created, we can use:
Person john;
using (var input = File.OpenRead("john.dat"))
{
    john = Person.Parser.ParseFrom(input);
}
A full example program to maintain an address book (adding new entries and listing existing ones) using these messages is available in the GitHub repository.
Extending a Protocol Buffer
Sooner or later after you release the code that uses your protocol buffer, you will undoubtedly want to "improve" the protocol buffer's definition. If you want your new buffers to be backwards-compatible, and your old buffers to be forward-compatible (and you almost certainly do want this), then there are some rules you need to follow. In the new version of the protocol buffer:
- you must not change the tag numbers of any existing fields.
- you may delete fields.
- you may add new fields but you must use fresh tag numbers (i.e. tag numbers that were never used in this protocol buffer, not even by deleted fields).
(There are some exceptions to these rules, but they are rarely used.)
If you follow these rules, old code will happily read new messages and simply ignore any new fields. To the old code, singular fields that were deleted will simply have their default value, and deleted repeated fields will be empty. New code will also transparently read old messages.
However, keep in mind that new fields will not be present in old messages, so you will need to do something reasonable with the default value. A type-specific default value is used: for strings, the default value is the empty string. For booleans, the default value is false. For numeric types, the default value is zero.
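The compatibility behavior described above can be seen in a hand-written reader of the wire format. The following Python sketch is an illustration under simplifying assumptions, not the generated C# code: it handles only varint and length-delimited fields, old_reader is a hypothetical reader that knows just name (tag 1) and id (tag 2) from Person, and all helper names are made up.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(buf: bytes, pos: int):
    """Decode a varint starting at pos; return (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


def encode_int_field(number: int, value: int) -> bytes:
    return encode_varint((number << 3) | 0) + encode_varint(value)


def encode_string_field(number: int, text: str) -> bytes:
    payload = text.encode("utf-8")
    return encode_varint((number << 3) | 2) + encode_varint(len(payload)) + payload


def old_reader(buf: bytes) -> dict:
    """Reader built before 'email' existed: knows only name (1) and id (2)."""
    person = {"name": "", "id": 0}  # type-specific defaults for absent fields
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        number, wire_type = key >> 3, key & 7
        if wire_type == 0:    # varint value
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:  # length-delimited value
            length, pos = decode_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError("wire type not handled in this sketch")
        if number == 1:
            person["name"] = value.decode("utf-8")
        elif number == 2:
            person["id"] = value
        # Any other field number is unknown to this reader and is skipped.
    return person


new_message = (encode_string_field(1, "John Doe")
               + encode_int_field(2, 1234)
               + encode_string_field(3, "jdoe@example.com"))  # field added later
print(old_reader(new_message))
print(old_reader(encode_string_field(1, "Jane")))  # id missing, default applies
```

The reader can skip field 3 without knowing its meaning because the wire type in the key tells it how many bytes the value occupies.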
Reflection
Message descriptors (the information in the .proto file) and instances of messages can be examined programmatically using the reflection API. This can be useful when writing generic code such as a different text format or a smart diff tool. Each generated class has a static Descriptor property, and the descriptor for any instance can be retrieved using the IMessage.Descriptor property. As a quick example of how these can be used, here is a short method to print the top-level fields of any message.
public void PrintMessage(IMessage message)
{
    var descriptor = message.Descriptor;
    // Walk the top-level fields in the order they are declared in the .proto file.
    foreach (var field in descriptor.Fields.InDeclarationOrder())
    {
        Console.WriteLine(
            "Field {0} ({1}): {2}",
            field.FieldNumber,
            field.Name,
            field.Accessor.GetValue(message));
    }
}