protobuf Handbook

License: CC 4.0 BY-SA

Tags: #c++ #protobuf

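This document is a handbook for Protocol Buffers (protobuf). It covers the protobuf style guide, techniques, proto3 syntax, and the C++ API reference, including naming rules, serialization behavior, things to avoid, message streaming, and strategies for large data sets. It also briefly introduces proto3 features such as Any, Oneof, and Maps. For C++ developers, it mentions commonly used methods for initialization checks, debug strings, and copying, clearing, serializing, and parsing messages.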

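Style Guide

Style guide for .proto files

Stay consistent with the style of the existing .proto file you are editing.
Standard file formatting
- Keep lines to 80 characters.
- Indent with 2 spaces.
File structure
- Name files in lower_snake_case.proto.
- Order the contents of a file as follows:
  - License header (if applicable)
  - File overview (a comment describing what the file contains)
  - Syntax
  - Package
  - Imports (sorted)
  - File options
  - Everything else
Packages
- The package name should ideally match the file path: if a file is in my/package/, then the package name should be my.package.
Message and field name
- Message names use CamelCase and field names use lower_snake_case, as in the example below.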
```
message SongServerRequest {
  required string song_name = 1;
}
```
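- If a field name contains a number, put the number directly after the letters rather than after an underscore, e.g., use song_name1 instead of song_name_1.
Repeated fields
- Use a pluralized name for repeated fields, e.g., repeated MyMessage accounts = 17;

Enums

Enum type names use CamelCase, and value names use CAPITALS_WITH_UNDERSCORES: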

  enum Foo {
    FOO_UNSPECIFIED = 0;
    FOO_FIRST_VALUE = 1;
    FOO_SECOND_VALUE = 2;
  }

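Services
- If your .proto defines an RPC service, use CamelCase (with an initial capital) for both the service name and the method names, and give them the service's prefix.
- It looks like this:
```
service FooService {
  rpc GetSomething(GetSomethingRequest) returns (GetSomethingResponse);
  rpc ListSomething(ListSomethingRequest) returns (ListSomethingResponse);
}
```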
Things to avoid
- Required fields (only for proto2)
- Groups (only for proto2)

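Encoding

This part is mainly about how messages are serialized into the binary wire format. I did not read it closely; the guide itself says up front that you can skip it.

Serialization order is independent of the field order in the .proto file, so never rely on a particular order; it can break at any time.
Even serializing the same message twice in a row may produce different bytes.
Deterministic serialization only guarantees identical output for one particular binary.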
The following checks may fail for a protocol buffer message instance foo.

foo.SerializeAsString() == foo.SerializeAsString()
Hash(foo.SerializeAsString()) == Hash(foo.SerializeAsString())
CRC(foo.SerializeAsString()) == CRC(foo.SerializeAsString())
FingerPrint(foo.SerializeAsString()) == FingerPrint(foo.SerializeAsString())

Even two logically equivalent messages, foo and bar, often serialize to different byte outputs. Here are a few example scenarios:

bar is serialized by an old server that treats some fields as unknown.
bar is serialized by a server that is implemented in a different programming language and serializes fields in different order.
bar has a field that serializes in non-deterministic manner.
bar has a field that stores a serialized byte output of a protocol buffer message which is serialized differently.
bar is serialized by a new server that serializes fields in different order due to an implementation change.
Both foo and bar are concatenation of individual messages but with different order.
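
To make the field-order scenario concrete, here is a minimal hand-rolled sketch that writes the same two varint fields in two different orders. It does not use the protobuf library: AppendVarint, AppendVarintField and EncodePaging are hypothetical helpers, and field numbers 2 and 3 are chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Appends v as a base-128 varint: 7 payload bits per byte, low bits first,
// with the high bit set on every byte except the last.
void AppendVarint(std::string& out, std::uint64_t v) {
  while (v >= 0x80) {
    out.push_back(static_cast<char>((v & 0x7F) | 0x80));
    v >>= 7;
  }
  out.push_back(static_cast<char>(v));
}

// Appends one varint-typed field: tag = (field_number << 3) | wire type 0,
// followed by the value. Hypothetical helper, not a protobuf API.
void AppendVarintField(std::string& out, std::uint32_t field_number,
                       std::uint64_t value) {
  assert(field_number >= 1);
  AppendVarint(out, static_cast<std::uint64_t>(field_number) << 3);
  AppendVarint(out, value);
}

// Encodes the same two logical fields (numbers 2 and 3, assumed here)
// either in ascending or in reversed field order.
std::string EncodePaging(std::uint32_t page_number,
                         std::uint32_t result_per_page, bool reverse_order) {
  std::string out;
  if (!reverse_order) {
    AppendVarintField(out, 2, page_number);
    AppendVarintField(out, 3, result_per_page);
  } else {
    AppendVarintField(out, 3, result_per_page);
    AppendVarintField(out, 2, page_number);
  }
  return out;
}
```

EncodePaging(1, 10, false) and EncodePaging(1, 10, true) carry the same field values, yet the byte strings differ, so any equality, hash, or CRC check on the raw bytes would treat them as different messages.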

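Techniques

Streaming Multiple Messages
- The usual technique is to write the size of each message before the message itself, so the reader knows where each one ends.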
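
A minimal sketch of that size-prefix idea, using only the standard library. The 4-byte little-endian length prefix and the helper names AppendFrame and SplitFrames are assumptions made for this illustration, not a format defined by protobuf; each payload stands in for the bytes of one serialized message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Appends one frame: a 4-byte little-endian length, then the payload bytes.
// The payload is treated as opaque bytes (e.g. one serialized message).
void AppendFrame(std::string& stream, const std::string& payload) {
  assert(payload.size() <= UINT32_MAX);
  const std::uint32_t size = static_cast<std::uint32_t>(payload.size());
  for (int shift = 0; shift < 32; shift += 8) {
    stream.push_back(static_cast<char>((size >> shift) & 0xFF));
  }
  stream += payload;
}

// Splits a stream back into its payloads, appending them to out.
// Returns false if a length prefix or a payload is truncated.
bool SplitFrames(const std::string& stream, std::vector<std::string>& out) {
  std::size_t pos = 0;
  while (pos < stream.size()) {
    if (stream.size() - pos < 4) return false;  // incomplete length prefix
    std::uint32_t size = 0;
    for (int i = 0; i < 4; ++i) {
      const auto byte = static_cast<unsigned char>(stream[pos + i]);
      size |= static_cast<std::uint32_t>(byte) << (8 * i);
    }
    pos += 4;
    if (stream.size() - pos < size) return false;  // incomplete payload
    out.push_back(stream.substr(pos, size));
    pos += size;
  }
  return true;
}
```

In real code, each payload would be the output of SerializeAsString() on the writer side and would be handed to the message's parse method on the reader side.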
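Large Data Sets
- Protobuf is not designed for large data. When a single message gets too big (say, more than 1 MB), consider splitting it into independent messages.
- Now all you need is to handle a set of byte strings rather than a set of structures.
- In other words, design sensible, well-factored messages instead of throwing everything into one pot.
Self-describing Messages
- Self-describing messages are supported, but Google says it has never had a use for them internally, so read up on them yourself if you need them; I did not look closely either.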

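proto3 syntax

This must be the first non-empty, non-comment line of the file; otherwise the compiler assumes proto2.
syntax = "proto3";

Notice the numbers 1, 2, 3 assigned in the SearchRequest example in this section: every field has a unique number.
Field numbers 1-15 take one byte to encode, covering the field number and the field's type.
Field numbers 16-2047 take two bytes.
Field numbers can range from 1 to 2^29 - 1 (536,870,911); 19000-19999 are reserved for the protobuf implementation itself.

So reserve 1-15 for the most frequently occurring elements of a message.
It is also worth leaving some of 1-15 unused for frequently occurring elements you may add later.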
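
The one-byte and two-byte boundaries follow from how the tag is laid out: the field number is shifted left by 3 bits to make room for the wire type, and the result is written as a varint with 7 payload bits per byte. The sketch below computes the tag size under that layout; TagSizeBytes is a hypothetical helper, not a protobuf library function.

```cpp
#include <cassert>
#include <cstdint>

// The wire type occupies the low 3 bits of a tag; the field number fills
// the remaining bits.
constexpr std::uint32_t kWireTypeBits = 3;

// Returns how many bytes the varint-encoded tag (field number plus wire
// type) occupies. Illustrative helper only.
int TagSizeBytes(std::uint32_t field_number) {
  assert(field_number >= 1 && field_number <= 536870911u);  // 2^29 - 1
  // The wire type bits never change the byte count, so they are left as 0.
  std::uint32_t tag = field_number << kWireTypeBits;
  int bytes = 1;
  while (tag >= 0x80) {  // each varint byte carries 7 payload bits
    tag >>= 7;
    ++bytes;
  }
  return bytes;
}
```

With 3 of the first byte's 7 payload bits taken by the wire type, only 4 bits remain for the field number, which is why 15 is the largest field number that fits in a single-byte tag; two bytes give 14 - 3 = 11 bits, hence the limit of 2047.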

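singular: the field can appear zero or one time; this is the proto3 default.
repeated: the field can be repeated zero or more times.

A single .proto file can define multiple messages.

Both C++-style single-line (//) and multi-line (/* ... */) comments are supported.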

/* SearchRequest represents a search query, with pagination options to
 * indicate which results to include in the response. */

message SearchRequest {
  string query = 1;
  int32 page_number = 2;  // Which page number do we want?
  int32 result_per_page = 3;  // Number of results to return per page.
}


message SearchResponse {
 ...
}

The first value of an enum must map to zero.

Enum values may share the same number if you set the allow_alias option, as follows:

message MyMessage1 {
  enum EnumAllowingAlias {
    option allow_alias = true;
    UNKNOWN = 0;
    STARTED = 1;
    RUNNING = 1;
  }
}
message MyMessage2 {
  enum EnumNotAllowingAlias {
    UNKNOWN = 0;
    STARTED = 1;
    // RUNNING = 1;  // Uncommenting this line will cause a compile error inside Google and a warning message outside.
  }
}

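The reserved keyword
Simply deleting or commenting out a field you no longer use is an easy way to get burned:
1. Code that reads old serialized data may misbehave.
2. After some time, someone who does not know the field was deleted may accidentally reuse its name or its field number.
3. Old data read against that reused field is then interpreted wrongly, which is even worse.

Listing the deleted field numbers and names as reserved makes the compiler reject any reuse. Note that field names and field numbers cannot be mixed in the same reserved statement.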

message Foo {
  reserved 2, 15, 9 to 11;
  reserved "foo", "bar";
}

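How to import .proto files
Take the import below as an example:
1. Use the -I/--proto_path option; --proto_path should point to the root directory of your project.
2. For the import shown here, assume the file lives at /home/user/myproject/other_protos.proto.
3. The import path starts at myproject, so --proto_path must be /home/user/.
4. If the option is not given, the directory where protoc is invoked becomes the starting point, so protoc would have to be run from /home/user/.
5. For example: protoc --proto_path=IMPORT_PATH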
import "myproject/other_protos.proto";

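Handling a .proto file that has moved
The approach below avoids having to update the import in every .proto file that referenced the old location.

The public keyword makes an import transitive.
So a .proto file can be moved as follows:
1. Keep the old .proto file in place.
2. Add import public "new.proto"; to the old .proto file.
3. Everything that imports the old file keeps working.

Pay attention, though, to the last comment in the example below:
client.proto cannot use anything from other.proto,
because other.proto is imported without the public keyword, so the dependency is not passed on.
old.proto itself can still use the contents of other.proto.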

------------------------------------------------------------<