Proto Best Practices
Clients and servers are never updated at exactly the same time - even when you try to update them at the same time. One or the other may get rolled back. Don’t assume that you can make a breaking change and it’ll be okay because the client and server are in sync.
Don’t Re-use a Tag Number
Never re-use a tag number. It messes up deserialization. Even if you think no one is using the field, don’t re-use a tag number. If the change was ever live, there could be serialized versions of your proto in a log somewhere. Or there could be old code in another server that will break.
Do Reserve Tag Numbers for Deleted Fields
When you delete a field that’s no longer used, reserve its tag number so that no
one accidentally re-uses it in the future. Just reserved 2, 3; is enough. No
type required (lets you trim dependencies!). You can also reserve names to avoid
recycling now-deleted field names: reserved "foo", "bar";.
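As a minimal sketch of the above, the message below reserves the tag numbers and names of two deleted fields. The package, message, and remaining field names are hypothetical and only illustrate the pattern.

```proto
syntax = "proto3";

package example.reserved;  // hypothetical package name

message UserProfile {
  // Tags 2 and 3 and the names "foo" and "bar" belonged to deleted fields.
  // Reserving them makes the compiler reject any attempt to re-use them.
  reserved 2, 3;
  reserved "foo", "bar";

  string display_name = 1;
  string email = 4;  // new fields take fresh, never-used tag numbers
}
```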
Do Reserve Numbers for Deleted Enum Values
When you delete an enum value that’s no longer used, reserve its number so that
no one accidentally re-uses it in the future. Just reserved 2, 3; is enough.
You can also reserve names to avoid recycling now-deleted value names: reserved "FOO", "BAR";.
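The same pattern for an enum might look like the sketch below. The enum and value names are hypothetical; "FOO" and "BAR" stand in for the deleted value names, as in the text above.

```proto
syntax = "proto3";

package example.reserved;  // hypothetical package name

enum PhotoType {
  // Numbers 2 and 3 and the names "FOO" and "BAR" belonged to deleted values.
  reserved 2, 3;
  reserved "FOO", "BAR";

  PHOTO_TYPE_UNSPECIFIED = 0;
  PHOTO_TYPE_GIF = 1;
  PHOTO_TYPE_PNG = 4;  // new values take numbers that were never used
}
```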
Do Put New Enum Aliases Last
When you add a new enum alias, put the new name last to give services time to pick it up.
To safely remove the original name (if it’s being used for interchange, which it shouldn’t be), you must do the following:
Add the new name below the old name and deprecate the old one (serializers will continue to use the old name).
After every parser has the schema rolled out, swap the order of the two names (serializers will begin using the new name, parsers accept both).
After every serializer has that version of the schema, you can delete the deprecated name.
Note: While in theory clients shouldn’t be using the old name for interchange, it’s still polite to follow the above steps, especially for widely-used enum names.
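The first step of the sequence above can be sketched as follows, with the later steps described in comments. The enum and value names are hypothetical; allow_alias is the enum option that permits two names to share one number.

```proto
syntax = "proto3";

package example.alias;  // hypothetical package name

enum TaskState {
  option allow_alias = true;

  TASK_STATE_UNSPECIFIED = 0;

  // Step 1: the old name stays first and is deprecated; the new alias is
  // added below it.
  TASK_STATE_STARTED = 1 [deprecated = true];
  TASK_STATE_RUNNING = 1;

  // Step 2 (later, once every parser has this schema): swap the two lines
  // above so that TASK_STATE_RUNNING is declared first.
  // Step 3 (later still, once every serializer has the swapped schema):
  // delete TASK_STATE_STARTED.
}
```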
Don’t Change the Type of a Field
Almost never change the type of a field; it’ll mess up deserialization, same as
re-using a tag number. The protobuf docs outline a small number of cases that
are okay (for example, going between int32, uint32, int64 and bool). However,
changing a field’s message type will break unless the new message is a superset
of the old one.
Don’t Add a Required Field
Never add a required field; instead, add a // required comment to document the
API contract. Required fields are considered harmful by so many that they were
removed from proto3 completely. Make all fields
optional or repeated. You never know how long a message type is going to last
and whether someone will be forced to fill in your required field with an empty
string or zero in four years when it’s no longer logically required but the
proto still says it is.
For proto3 there are no required fields, so this advice does not apply.
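For proto2 schemas, documenting the contract in a comment might look like the sketch below. The message and field names are hypothetical.

```proto
syntax = "proto2";

package example.accounts;  // hypothetical package name

message CreateAccountRequest {
  // required
  // The API contract expects callers to set this field, but the schema does
  // not enforce it, so the requirement can be relaxed later without breaking
  // existing readers.
  optional string email = 1;

  optional string referral_code = 2;
}
```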
Don’t Make a Message with Lots of Fields
Don’t make a message with “lots” (think: hundreds) of fields. In C++ every field adds roughly 65 bits to the in-memory object size whether it’s populated or not (8 bytes for the pointer and, if the field is declared as optional, another bit in a bitfield that keeps track of whether the field is set). When your proto grows too large, the generated code may not even compile (for example, in Java there is a hard limit on the size of a method).
Do Include an Unspecified Value in an Enum
Enums should include a default FOO_UNSPECIFIED value as the first value in the
declaration.
When new values are added to an enum, old clients will see the field as unset
and the getter will return the default value or the first-declared value if no
default exists. For consistent behavior with proto enums,
the first declared enum value should be a default FOO_UNSPECIFIED value and
should use tag 0. It may be tempting to declare this default as a semantically
meaningful value but as a general rule, do not, to aid in the evolution of your
protocol as new enum values are added over time. All enum values declared under
a container message are in the same C++ namespace, so prefix the unspecified
value with the enum’s name to avoid compilation errors. If you’ll never need
cross-language constants, an int32 will preserve unknown values and generates
less code. Note that proto enums require the first value to be
zero and can round-trip (deserialize, serialize) an unknown enum value.
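A minimal sketch of this convention, using hypothetical message, enum, and value names: the first value is the unspecified default, it uses number 0, and every value carries the enum's name as a prefix.

```proto
syntax = "proto3";

package example.enums;  // hypothetical package name

message Shipment {
  enum Status {
    // Default with no semantic meaning; readers see this when the field is
    // unset.
    STATUS_UNSPECIFIED = 0;
    STATUS_PENDING = 1;
    STATUS_DELIVERED = 2;
  }

  Status status = 1;
}
```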
Don’t Use C/C++ Macro Constants for Enum Values
Using words that have already been defined by the C++ language - specifically,
in its headers such as math.h, may cause compilation errors if the #include
statement for one of those headers appears before the one for .proto.h. Avoid
using macro constants such as “NULL,” “NAN,” and “DOMAIN” as enum values.
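One way to sidestep the problem is to prefix each value with the enum's name, so that no value is spelled the same as a macro. The enum below is a hypothetical illustration.

```proto
syntax = "proto3";

package example.macros;  // hypothetical package name

enum ResultKind {
  RESULT_KIND_UNSPECIFIED = 0;
  RESULT_KIND_NULL = 1;    // rather than a bare NULL
  RESULT_KIND_NAN = 2;     // rather than a bare NAN
  RESULT_KIND_DOMAIN = 3;  // rather than a bare DOMAIN
}
```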
Do Use Well-Known Types and Common Types
Using the following common, shared types is strongly encouraged. E.g., do not
use int32 timestamp_seconds_since_epoch or int64 timeout_millis in your code
when a perfectly suitable common type already exists!
duration is a signed, fixed-length span of time (for example, 42s).
timestamp is a point in time independent of any time zone or calendar (for example, 2017-01-15T01:30:15.01Z).
interval is a time interval independent of time zone or calendar (for example, 2017-01-15T01:30:15.01Z - 2017-01-16T02:30:15.01Z).
date is a whole calendar date (for example, 2005-09-19).
month is a month of year (for example, April).
dayofweek is a day of week (for example, Monday).
timeofday is a time of day (for example, 10:42:23).
field_mask is a set of symbolic field paths (for example, f.b.d).
postal_address is a postal address (for example, 1600 Amphitheatre Parkway Mountain View, CA 94043 USA).
money is an amount of money with its currency type (for example, 42 USD).
latlng is a latitude/longitude pair (for example, 37.386051 latitude and -122.083855 longitude).
color is a color in the RGBA color space.
Note: While the “Well-Known Types” (such as Duration and Timestamp) are
included with the Protocol Buffers compiler, the “Common Types” (such as Date
and Money) are not. To use the Common Types, you may need to add a dependency
on the googleapis repository.
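As a sketch of how these types might be used, the hypothetical message below replaces hand-rolled integer fields with Timestamp, Duration, and Money. The message and field names are assumptions; the Money import compiles only if the googleapis protos are on the import path.

```proto
syntax = "proto3";

package example.jobs;  // hypothetical package name

import "google/protobuf/duration.proto";   // Well-Known Type
import "google/protobuf/timestamp.proto";  // Well-Known Type
import "google/type/money.proto";          // Common Type, from googleapis

message Job {
  // Instead of: int32 create_time_seconds_since_epoch
  google.protobuf.Timestamp create_time = 1;

  // Instead of: int64 timeout_millis
  google.protobuf.Duration timeout = 2;

  // Amount together with its currency.
  google.type.Money estimated_cost = 3;
}
```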
Do Define Message Types in Separate Files
When defining a proto schema, you should have a single message, enum, extension, service, or group of cyclic dependencies per file. This makes refactoring easier. Moving files when they’re separated is much easier than extracting messages from a file with other messages. Following this practice also helps to keep the proto schema files smaller, which enhances maintainability.
If your message types or enums will be widely used outside of your project, consider putting them in their own file with no dependencies. Then it’s easy for anyone to use those types without introducing the transitive dependencies in your other proto files.
For more on this topic, see 1-1-1 Rule.
Don’t Change the Default Value of a Field
Almost never change the default value of a proto field. This causes version skew between clients and servers. A client reading an unset value will see a different result than a server reading the same unset value when their builds straddle the proto change. Proto3 removed the ability to set default values.
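To make the skew concrete, consider the hypothetical proto2 field below. The message name, field name, and values are assumptions chosen for illustration.

```proto
syntax = "proto2";

package example.defaults;  // hypothetical package name

message RetryPolicy {
  // Readers built from this schema see 3 when max_attempts is unset.
  // Changing the default to 5 would make newer builds read 5 from the very
  // same serialized bytes, while older builds keep reading 3.
  optional int32 max_attempts = 1 [default = 3];
}
```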
Don’t Go from Repeated to Scalar
Although it won’t cause crashes, you’ll lose data. For JSON, a mismatch in
repeatedness will lose the whole message. For numeric proto3 fields and proto2
packed fields, going from repeated to scalar will lose all data in that
field. For non-numeric proto3 fields and un-annotated proto2 fields, going
from repeated to scalar will result in the last deserialized value “winning.”
Going from scalar to repeated is OK in proto2 and in proto3 with
[packed=false] because for binary serialization the scalar value becomes a
one-element list.
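A sketch of the scalar-to-repeated case described above, for a numeric proto3 field. The message and field names are hypothetical.

```proto
syntax = "proto3";

package example.repeated;  // hypothetical package name

message Measurement {
  // Previously declared as: int32 sample = 1;
  // With packed = false, each element is written the same way the old
  // scalar value was, so data written with the old schema parses as a
  // one-element list.
  repeated int32 sample = 1 [packed = false];
}
```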
Do Follow the Style Guide for Generated Code
Proto generated code is referred to in normal code. Ensure that options in the
.proto file do not result in the generation of code that violates the style guide.
For example:
java_outer_classname should follow https://google.github.io/styleguide/javaguide.html#s5.2.2-class-names
java_package and java_alt_package should follow https://google.github.io/styleguide/javaguide.html#s5.2.1-package-names
package, although used for Java when java_package is not present, always directly corresponds to C++ namespace and thus should follow https://google.github.io/styleguide/cppguide.html#Namespace_Names. If these style guides conflict, use java_package for Java.
ruby_package should be in the form Foo::Bar::Baz rather than Foo.Bar.Baz.
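Put together, a file header that follows these conventions might look like the sketch below. The package, class, and message names are hypothetical.

```proto
syntax = "proto3";

// Lower-case, underscore-free components suit a C++ namespace.
package example.billing.v1;

// Java package names are all lower-case.
option java_package = "com.example.billing.v1";

// Java class names are UpperCamelCase.
option java_outer_classname = "InvoiceProto";

// Ruby modules are separated with "::" rather than ".".
option ruby_package = "Example::Billing::V1";

message Invoice {
  string id = 1;
}
```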
Don’t Use Text Format Messages for Interchange
Text-based serialization formats like text format and JSON represent fields and enum values as strings. As a result, deserialization of protocol buffers in these formats using old code will fail when a field or enum value is renamed, or when a new field or enum value or extension is added. Use binary serialization when possible for data interchange, and use text format for human editing and debugging only.
If you use protos converted to JSON in your API or for storing data, you may not be able to safely rename fields or enums at all.
Never Rely on Serialization Stability Across Builds
The stability of proto serialization is not guaranteed across binaries or across builds of the same binary. Do not rely on it when, for example, building cache keys.
Don’t Generate Java Protos in the Same Java Package as Other Code
Generate Java proto sources into a separate package from your hand-written Java
sources. The package, java_package and java_alt_api_package options control
where the generated Java sources are emitted.
Make sure hand-written Java source code does not also live in that same package.
A common practice is to generate your protos into a proto subpackage in your
project that only contains those protos (that is, no hand-written source
code).
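A minimal sketch of the proto-subpackage layout, assuming a hypothetical project whose hand-written Java code lives under com.example.inventory:

```proto
syntax = "proto3";

package example.inventory;  // hypothetical package name

// Generated Java classes land in com.example.inventory.proto. Hand-written
// code stays in com.example.inventory and its other subpackages.
option java_package = "com.example.inventory.proto";

message StockItem {
  string sku = 1;
  int32 quantity = 2;
}
```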
Do Derive Java Package from the .proto Package (if overridden)
Setting the java_package can introduce fully-qualified name collisions in
generated code that did not exist in the .proto semantics. For example, these
two files may create collisions in the generated code despite the
fully-qualified names not colliding in the original schema:
// First file (hypothetical name: x.proto)
package x;
option java_package = "com.example.proto";
message Abc {}

// Second file (hypothetical name: y.proto)
package y;
option java_package = "com.example.proto";
message Abc {}
To avoid these problems, you should never set the same java_package in two
files that have different .proto packages set.
The best practice is to establish a local naming pattern where the package name
is derived from the .proto package. For example, a best practice file with
package y might consistently set option java_package = "com.example.proto.y".
This guidance also applies to any other language-specific options where package overrides are possible.
Avoid Using Language Keywords for Field Names
If the name of a message, field, enum, or enum value is a keyword in the language that reads from/writes to that field, then protobuf may change the field name, and may provide different ways to access it than it does for normal fields. For example, see this warning about Python.
You should also avoid using keywords in your file paths, as this can also cause problems.
Do Use Different Messages For RPC APIs and Storage
Reusing the same messages for APIs and long-term storage may seem convenient, reducing boilerplate and the overhead of conversion between messages.
However, the needs of long-term storage and live RPC services tend to later diverge. Using separate types even if they are largely duplicative initially gives freedom to change your storage format without impacting your external clients. Layer your code so that modules deal either with client protos, storage protos, or translation.
There is a cost in maintaining the translation layer, but it quickly pays off once you have clients and have to do your first storage changes.
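As an illustration, the two hypothetical messages below start out nearly identical but are free to evolve separately. All names are assumptions, and the translation between them would live in hand-written code.

```proto
syntax = "proto3";

package example.photos;  // hypothetical package name

// API message: the shape external clients see over RPC.
message Photo {
  string id = 1;
  string caption = 2;
}

// Storage message: the shape written to long-term storage. It can gain or
// drop fields without affecting external clients.
message StoredPhoto {
  string id = 1;
  string caption = 2;
  string blob_path = 3;  // storage-only detail, never exposed to clients
}
```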
Don’t Use Booleans for Something That Has Two States Now, but Might Have More Later
If you are using a boolean for a field, make sure that the field is indeed describing just two possible states (for all time, not just now and the near future). The future flexibility of using an enum is often worth it, even if it only has two values when it is first introduced.
message Photo {
  // Bad: True if it's a GIF.
  optional bool gif = 1;
  // Good: File format of the referenced photo (for example, GIF, WebP, PNG).
  optional PhotoType type = 2;
}
Do Use java_outer_classname
Every proto schema definition file should set option java_outer_classname to
the .proto file name converted to TitleCase with the ‘.’ removed. For example,
the file student_record_request.proto should set:
option java_outer_classname = "StudentRecordRequestProto";