Rust Generated Code Guide

Describes the API of message objects that the protocol buffer compiler generates for any given protocol definition.

This page describes exactly what Rust code the protocol buffer compiler generates for any given protocol definition.

Any differences between proto2 and proto3 generated code are highlighted. You should read the proto2 language guide and/or proto3 language guide before reading this document.

Protobuf Rust

Protobuf Rust is an implementation of protocol buffers designed to be able to sit on top of other existing protocol buffer implementations that we refer to as ‘kernels’.

The decision to support multiple non-Rust kernels has significantly influenced our public API, including the choice to use custom types like ProtoStr over Rust std types like str. See Rust Proto Design Decisions for more on this topic.

Generated Filenames

Each rust_proto_library will be compiled as one crate. Most importantly, for every .proto file in the srcs of the corresponding proto_library, one Rust file is emitted, and all these files form a single crate.

Files generated by the compiler vary between kernels. In general, the names of the output files are computed by taking the name of the .proto file and replacing the extension.

Generated files:

  • C++ kernel:
    • .c.pb.rs - generated Rust code
    • .pb.thunks.cc - generated C++ thunks (glue code that Rust code calls, and that delegates to the C++ Protobuf APIs).
  • C++ Lite kernel:
    • Same files as the C++ kernel.
  • UPB kernel:
    • .u.pb.rs - generated Rust code.
      (However, rust_proto_library relies on the .thunks.c file produced by upb_proto_aspect.)
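The naming rule for the generated Rust file can be sketched as a small helper. This is only an illustration of the extension replacement described above: the `Kernel` enum and the `generated_rust_filename` function are hypothetical names, not part of the compiler or of any Protobuf crate.

```rust
/// The kernels named in the list above (hypothetical enum, for illustration).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Kernel {
    Cpp,
    CppLite,
    Upb,
}

/// Returns the name of the generated Rust file for `proto_file`, or `None`
/// if the input does not end in `.proto`.
pub fn generated_rust_filename(proto_file: &str, kernel: Kernel) -> Option<String> {
    let stem = proto_file.strip_suffix(".proto")?;
    let extension = match kernel {
        // The C++ Lite kernel emits the same files as the C++ kernel.
        Kernel::Cpp | Kernel::CppLite => ".c.pb.rs",
        Kernel::Upb => ".u.pb.rs",
    };
    Some(format!("{}{}", stem, extension))
}
```

Under these assumptions, `foo.proto` maps to `foo.c.pb.rs` on the C++ kernels and to `foo.u.pb.rs` on the UPB kernel.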

If the proto_library contains more than one file, the first file is declared a “primary” file and is treated as the entry point for the crate; that file will contain both the gencode corresponding to the .proto file, and also re-exports for all symbols defined in the files corresponding to all “secondary” files.

Packages

Unlike in most other languages, the package declarations in the .proto files are not used in Rust codegen. Instead, each rust_proto_library(name = "some_rust_proto") target emits a crate named some_rust_proto which contains the generated code for all .proto files in the target.

Messages

Given the message declaration:

message Foo {}

The compiler generates a struct named Foo. The Foo struct defines the following methods:

  • fn new() -> Self: Creates a new instance of Foo.
  • fn parse(data: &[u8]) -> Result<Self, protobuf::ParseError>: Parses data into an instance of Foo if data holds a valid wire format representation of Foo. Otherwise, the function returns an error.
  • fn clear_and_parse(&mut self, data: &[u8]) -> Result<(), ParseError>: Like calling .clear() and parse() in sequence.
  • fn serialize(&self) -> Result<Vec<u8>, SerializeError>: Serializes the message to Protobuf wire format. Serialization can fail but rarely will. Failure reasons include exceeding the maximum message size, insufficient memory, and required fields (proto2) that are unset.
  • fn merge_from(&mut self, other): Merges self with other.
  • fn as_view(&self) -> FooView<'_>: Returns an immutable handle (view) to Foo. This is further covered in the section on proxy types.
  • fn as_mut(&mut self) -> FooMut<'_>: Returns a mutable handle (mut) to Foo. This is further covered in the section on proxy types.

Foo implements the following traits:

  • std::fmt::Debug
  • std::default::Default
  • std::clone::Clone
  • std::ops::Drop
  • std::marker::Send
  • std::marker::Sync
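
To make the contract of parse, clear_and_parse, and serialize concrete, here is a hand-written stand-in, not actual compiler output. It assumes a hypothetical field `optional int32 value = 1;`, uses local `ParseError` and `SerializeError` types in place of the runtime's, and encodes the field directly instead of delegating to a kernel.

```rust
/// Local stand-ins for the runtime's error types.
#[derive(Debug, PartialEq, Eq)]
pub struct ParseError;
#[derive(Debug, PartialEq, Eq)]
pub struct SerializeError;

/// Stand-in for a message with the hypothetical field `optional int32 value = 1;`.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct Foo {
    value: Option<i32>,
}

/// Reads one base-128 varint and returns it with the remaining bytes.
fn read_varint(data: &[u8]) -> Result<(u64, &[u8]), ParseError> {
    let mut result: u64 = 0;
    for (i, &byte) in data.iter().enumerate().take(10) {
        result |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Ok((result, &data[i + 1..]));
        }
    }
    Err(ParseError) // truncated or over-long varint
}

/// Appends `v` to `out` as a base-128 varint.
fn write_varint(mut v: u64, out: &mut Vec<u8>) {
    while v >= 0x80 {
        out.push((v as u8 & 0x7f) | 0x80);
        v >>= 7;
    }
    out.push(v as u8);
}

impl Foo {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn parse(data: &[u8]) -> Result<Self, ParseError> {
        let mut msg = Self::new();
        msg.clear_and_parse(data)?;
        Ok(msg)
    }

    pub fn clear_and_parse(&mut self, data: &[u8]) -> Result<(), ParseError> {
        self.value = None; // the "clear" half
        let mut rest = data;
        while let Some((&tag, after_tag)) = rest.split_first() {
            // This sketch only understands field 1 with the varint wire type.
            if tag != 0x08 {
                return Err(ParseError);
            }
            let (raw, after_value) = read_varint(after_tag)?;
            self.value = Some(raw as i32);
            rest = after_value;
        }
        Ok(())
    }

    pub fn serialize(&self) -> Result<Vec<u8>, SerializeError> {
        let mut out = Vec::new();
        if let Some(v) = self.value {
            out.push(0x08); // field number 1, wire type 0 (varint)
            write_varint(v as i64 as u64, &mut out);
        }
        Ok(out)
    }

    pub fn value(&self) -> i32 {
        self.value.unwrap_or(0)
    }

    pub fn set_value(&mut self, val: i32) {
        self.value = Some(val);
    }
}
```

In this sketch a message serializes and parses back to an equal value, and a truncated buffer produces an error rather than a partially trusted message.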

Message Proxy Types

As a consequence of the requirement to support multiple kernels with a single Rust API, we cannot in some situations use native Rust references (&T and &mut T), but instead, we need to express these concepts using types - Views and Muts. These situations are shared and mutable references to:

  • Messages
  • Repeated fields
  • Map fields

For example, the compiler emits structs FooView<'msg> and FooMut<'msg> alongside Foo. These types are used in place of &Foo and &mut Foo, and they behave the same as native Rust references in terms of borrow checker behavior. Just like native borrows, Views are Copy and the borrow checker will enforce that you can either have any number of Views or at most one Mut live at a given time.

For the purposes of this documentation, we focus on describing all methods emitted for the owned message type (Foo). A subset of these functions with &self receiver will also be included on the FooView<'msg>. A subset of these functions with either &self or &mut self will also be included on the FooMut<'msg>.

To create an owned message from a View or Mut, call to_owned(), which makes a deep copy.
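
The borrowing rules for proxy types can be sketched with plain Rust. The types below are simplified stand-ins that wrap native references, which real kernel-backed proxies cannot do. The `count` field and the `demo` function are hypothetical and exist only for this illustration.

```rust
/// Stand-in for an owned message with one hypothetical field.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct Foo {
    count: i32,
}

/// Stand-in for the generated view type: cheap to copy, read-only.
#[derive(Clone, Copy, Debug)]
pub struct FooView<'msg> {
    msg: &'msg Foo,
}

/// Stand-in for the generated mut type: exclusive, not Copy.
#[derive(Debug)]
pub struct FooMut<'msg> {
    msg: &'msg mut Foo,
}

impl Foo {
    pub fn as_view(&self) -> FooView<'_> {
        FooView { msg: self }
    }
    pub fn as_mut(&mut self) -> FooMut<'_> {
        FooMut { msg: self }
    }
}

impl<'msg> FooView<'msg> {
    pub fn count(self) -> i32 {
        self.msg.count
    }
    /// Deep-copies the viewed message into an owned `Foo`.
    pub fn to_owned(self) -> Foo {
        self.msg.clone()
    }
}

impl<'msg> FooMut<'msg> {
    pub fn set_count(&mut self, val: i32) {
        self.msg.count = val;
    }
}

pub fn demo() -> (i32, i32) {
    let mut foo = Foo::default();
    {
        let mut handle = foo.as_mut();
        handle.set_count(7);
        // Calling foo.as_view() here would be rejected by the borrow checker
        // if `handle` were used again afterwards.
    }
    let first = foo.as_view();
    let second = first; // Views are Copy, so `first` stays usable.
    let snapshot = second.to_owned();
    assert_eq!(first.count(), second.count());
    // The views are no longer used, so a Mut can be taken again.
    foo.as_mut().set_count(9);
    (snapshot.as_view().count(), foo.as_view().count())
}
```

The owned snapshot keeps the value it had when to_owned() was called, independent of later mutation through a Mut.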

Nested Types

Given the message declaration:

message Foo {
  message Bar {
      enum Baz { ... }
  }
}

In addition to the struct named Foo, the compiler creates a module named foo to contain the struct for Bar and, inside it, a nested module named bar to contain the deeply nested enum Baz:

pub struct Foo {}

pub mod foo {
   pub struct Bar {}
   pub mod bar {
      pub struct Baz { ... }
   }
}
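
From the caller's side, nested types are reached through those module paths. The following sketch repeats the layout with placeholder bodies; the `describe` function is hypothetical, and the inner field of Baz is made public only so the example can construct a value.

```rust
pub struct Foo {}

pub mod foo {
    pub struct Bar {}
    pub mod bar {
        // Enums are emitted as structs; the body here is a placeholder.
        #[derive(Clone, Copy, Debug, PartialEq, Eq)]
        pub struct Baz(pub i32);
    }
}

/// Hypothetical caller code that names all three types by path.
pub fn describe(_outer: &Foo, _inner: &foo::Bar, value: foo::bar::Baz) -> String {
    format!("Baz({})", value.0)
}
```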

Fields

In addition to the methods described in the previous section, the protocol buffer compiler generates a set of accessor methods for each field defined within the message in the .proto file.

Following Rust style, the methods are in lower-case/snake-case, such as has_foo() and clear_foo(). Note that the capitalization of the field name portion of the accessor maintains the style from the original .proto file, which in turn should be lower-case/snake-case per the .proto file style guide.

Optional Numeric Fields (proto2 and proto3)

For either of these field definitions:

optional int32 foo = 1;
required int32 foo = 1;

The compiler will generate the following accessor methods:

  • fn has_foo(&self) -> bool: Returns true if the field is set.
  • fn foo(&self) -> i32: Returns the current value of the field. If the field is not set, it returns the default value.
  • fn foo_opt(&self) -> protobuf::Optional<i32>: Returns an optional with the variant Set(value) if the field is set or Unset(default value) if it’s unset.
  • fn set_foo(&mut self, val: i32): Sets the value of the field. After calling this, has_foo() will return true and foo() will return val.
  • fn clear_foo(&mut self): Clears the value of the field. After calling this, has_foo() will return false and foo() will return the default value.

For other numeric field types (including bool), i32 is replaced with the corresponding Rust type according to the scalar value types table.
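
The accessor semantics above can be modeled in a few lines of plain Rust. This is a sketch, not generated code: `Optional` is a local stand-in for protobuf::Optional with the Set and Unset variants described above, and the message stores the field as a native Option, which a real kernel does not.

```rust
/// Local stand-in for `protobuf::Optional<T>`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Optional<T> {
    Set(T),
    Unset(T),
}

/// Stand-in for a message with `optional int32 foo = 1;`.
#[derive(Debug, Default)]
pub struct Foo {
    foo: Option<i32>,
}

impl Foo {
    /// Default value of the field (0 unless the .proto declares another).
    const FOO_DEFAULT: i32 = 0;

    pub fn has_foo(&self) -> bool {
        self.foo.is_some()
    }
    pub fn foo(&self) -> i32 {
        self.foo.unwrap_or(Self::FOO_DEFAULT)
    }
    pub fn foo_opt(&self) -> Optional<i32> {
        match self.foo {
            Some(val) => Optional::Set(val),
            None => Optional::Unset(Self::FOO_DEFAULT),
        }
    }
    pub fn set_foo(&mut self, val: i32) {
        self.foo = Some(val);
    }
    pub fn clear_foo(&mut self) {
        self.foo = None;
    }
}
```

Note that explicitly setting the field to its default value still makes has_foo() return true; only clear_foo() returns the field to the unset state.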

Implicit Presence Numeric Fields (proto3)

For this field definition:

int32 foo = 1;

The compiler will generate the following accessor methods:

  • fn foo(&self) -> i32: Returns the current value of the field. If the field is not set, returns 0.
  • fn set_foo(&mut self, val: i32): Sets the value of the field. After calling this, foo() will return val.

For other numeric field types (including bool), i32 is replaced with the corresponding Rust type according to the scalar value types table.
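
A minimal model of an implicit presence field, assuming the value is stored as a plain i32 (a simplification of what a kernel does), shows why no has_foo() accessor exists: a field set to 0 is indistinguishable from one that was never set.

```rust
/// Stand-in for a proto3 message with `int32 foo = 1;`.
#[derive(Debug, Default, PartialEq)]
pub struct Foo {
    foo: i32,
}

impl Foo {
    pub fn foo(&self) -> i32 {
        self.foo
    }
    pub fn set_foo(&mut self, val: i32) {
        self.foo = val;
    }
}
```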

Optional String/Bytes Fields (proto2 and proto3)

For any of these field definitions:

optional string foo = 1;
required string foo = 1;
optional bytes foo = 1;
required bytes foo = 1;

The compiler will generate the following accessor methods:

  • fn has_foo(&self) -> bool: Returns true if the field is set.
  • fn foo(&self) -> &protobuf::ProtoStr: Returns the current value of the field. If the field is not set, it returns the default value.
  • fn foo_opt(&self) -> protobuf::Optional<&ProtoStr>: Returns an optional with the variant Set(value) if the field is set or Unset(default value) if it’s unset.
  • fn set_foo(&mut self, value: impl IntoProxied<ProtoString>): Sets the field to value. After calling this, has_foo() will return true and foo() will return value.
  • fn clear_foo(&mut self): Clears the value of the field. After calling this, has_foo() will return false and foo() will return the default value.

For fields of type bytes the compiler will generate the ProtoBytes type instead.

Implicit Presence String/Bytes Fields (proto3)

For these field definitions:

string foo = 1;
bytes foo = 1;

The compiler will generate the following accessor methods:

  • fn foo(&self) -> &ProtoStr: Returns the current value of the field. If the field is not set, returns the empty string/empty bytes.
  • fn set_foo(&mut self, value: impl IntoProxied<ProtoString>): Sets the field to value. After calling this function, foo() will return value.

For fields of type bytes the compiler will generate the ProtoBytes type instead.

Singular String and Bytes Fields with Cord Support

[ctype = CORD] enables bytes and strings to be stored as an absl::Cord in C++ Protobufs. absl::Cord currently does not have an equivalent type in Rust. Protobuf Rust uses an enum to represent a cord field:

enum ProtoStringCow<'a> {
  Owned(ProtoString),
  Borrowed(&'a ProtoStr)
}

In the common case, for small strings, an absl::Cord stores its data as a contiguous string. In this case cord accessors return ProtoStringCow::Borrowed. If the underlying absl::Cord is non-contiguous, the accessor copies the data from the cord into an owned ProtoString and returns ProtoStringCow::Owned. The ProtoStringCow implements Deref<Target=ProtoStr>.
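
The borrow-or-copy behavior can be sketched with standard library types. In this illustration `StrCow` stands in for ProtoStringCow (with str and String in place of ProtoStr and ProtoString), and `Cord` is a hypothetical chunked string, not a binding to absl::Cord.

```rust
use std::ops::Deref;

/// Stand-in for `ProtoStringCow`, built on std string types.
#[derive(Debug)]
pub enum StrCow<'a> {
    Owned(String),
    Borrowed(&'a str),
}

impl<'a> Deref for StrCow<'a> {
    type Target = str;
    fn deref(&self) -> &str {
        match self {
            StrCow::Owned(s) => s.as_str(),
            StrCow::Borrowed(s) => *s,
        }
    }
}

/// Hypothetical cord: text stored as zero or more chunks.
pub struct Cord {
    pub chunks: Vec<String>,
}

impl Cord {
    /// Borrows when the data is contiguous, copies when it is not.
    pub fn get(&self) -> StrCow<'_> {
        match self.chunks.as_slice() {
            [] => StrCow::Borrowed(""),
            [only] => StrCow::Borrowed(only.as_str()),
            many => StrCow::Owned(many.concat()),
        }
    }
}
```

Because of the Deref implementation, callers can use either variant wherever a string slice is expected and only need to match on the enum when the distinction between borrowed and copied data matters.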

For any of these field definitions:

optional string foo = 1 [ctype = CORD];
string foo = 1 [ctype = CORD];
optional bytes foo = 1 [ctype = CORD];
bytes foo = 1 [ctype = CORD];

The compiler generates the following accessor methods:

  • fn foo(&self) -> ProtoStringCow<'_>: Returns the current value of the field. If the field is not set, returns the empty string/empty bytes.
  • fn set_foo(&mut self, value: impl IntoProxied<ProtoString>): Sets the field to value. After calling this function, foo() returns value and has_foo() returns true.
  • fn has_foo(&self) -> bool: Returns true if the field is set.
  • fn clear_foo(&mut self): Clears the value of the field. After calling this, has_foo() returns false and foo() returns the default value.

Cords have not been implemented yet.

For fields of type bytes the compiler generates the ProtoBytesCow type instead.

Optional Enum Fields (proto2 and proto3)

Given the enum type:

enum Bar {
  BAR_UNSPECIFIED = 0;
  BAR_VALUE = 1;
  BAR_OTHER_VALUE = 2;
}

The compiler generates a struct where each variant is an associated constant:

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
#[repr(transparent)]
pub struct Bar(i32);

impl Bar {
  pub const Unspecified: Bar = Bar(0);
  pub const Value: Bar = Bar(1);
  pub const OtherValue: Bar = Bar(2);
}

For either of these field definitions:

optional Bar foo = 1;
required Bar foo = 1;

The compiler will generate the following accessor methods:

  • fn has_foo(&self) -> bool: Returns true if the field is set.
  • fn foo(&self) -> Bar: Returns the current value of the field. If the field is not set, it returns the default value.
  • fn foo_opt(&self) -> Optional<Bar>: Returns an optional with the variant Set(value) if the field is set or Unset(default value) if it’s unset.
  • fn set_foo(&mut self, val: Bar): Sets the value of the field. After calling this, has_foo() will return true and foo() will return val.
  • fn clear_foo(&mut self): Clears the value of the field. After calling this, has_foo() will return false and foo() will return the default value.
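
Because the enum is a struct with associated constants rather than a native Rust enum, values are compared and matched as constants, and a match needs a wildcard arm. The sketch below copies the struct shape shown above. Making the inner i32 public and adding the `name_of` function are assumptions for this illustration only.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
#[repr(transparent)]
pub struct Bar(pub i32); // inner value made public only for this sketch

#[allow(non_upper_case_globals)]
impl Bar {
    pub const Unspecified: Bar = Bar(0);
    pub const Value: Bar = Bar(1);
    pub const OtherValue: Bar = Bar(2);
}

/// Hypothetical helper: maps a value to the name of its constant.
pub fn name_of(bar: Bar) -> &'static str {
    match bar {
        Bar::Unspecified => "Unspecified",
        Bar::Value => "Value",
        Bar::OtherValue => "OtherValue",
        // A newtype over i32 can hold numbers that have no named constant.
        _ => "unrecognized",
    }
}
```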

Implicit Presence Enum Fields (proto3)

Given the enum type:

enum Bar {
  BAR_UNSPECIFIED = 0;
  BAR_VALUE = 1;
  BAR_OTHER_VALUE = 2;
}

The compiler generates a struct where each variant is an associated constant:

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
#[repr(transparent)]
pub struct Bar(i32);

impl Bar {
  pub const Unspecified: Bar = Bar(0);
  pub const Value: Bar = Bar(1);
  pub const OtherValue: Bar = Bar(2);
}

For these field definitions:

Bar foo = 1;

The compiler will generate the following accessor methods:

  • fn foo(&self) -> Bar: Returns the current value of the field. If the field is not set, it returns the default value.
  • fn set_foo(&mut self, value: Bar): Sets the value of the field. After calling this, foo() will return value.

Optional Embedded Message Fields (proto2 and proto3)

Given the message type:

message Bar {}

For any of these field definitions:

//proto2
optional Bar foo = 1;

//proto3
Bar foo = 1;
optional Bar foo = 1;

The compiler will generate the following accessor methods:

  • fn foo(&self) -> BarView<'_>: Returns a view of the current value of the field. If the field is not set, it returns an empty message.
  • fn foo_mut(&mut self) -> BarMut<'_>: Returns a mutable handle to the current value of the field. Sets the field if it is not set. After calling this method, has_foo() returns true.
  • fn foo_opt(&self) -> protobuf::Optional<BarView<'_>>: If the field is set, returns the variant Set with its value. Otherwise, returns the variant Unset with the default value.
  • fn set_foo(&mut self, value: impl protobuf::IntoProxied<Bar>): Sets the field to value. After calling this method, has_foo() returns true.
  • fn has_foo(&self) -> bool: Returns true if the field is set.
  • fn clear_foo(&mut self): Clears the field. After calling this method has_foo() returns false.
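
The difference between reading through foo() and writing through foo_mut() can be modeled as follows. This sketch uses native references where the generated code uses BarView and BarMut, stores the submessage in an Option, and gives Bar a hypothetical field `x`; none of this reflects how a kernel actually stores the data.

```rust
/// Stand-in for the embedded message, with one hypothetical field.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct Bar {
    pub x: i32,
}

/// Shared empty instance returned when the field is unset.
static DEFAULT_BAR: Bar = Bar { x: 0 };

/// Stand-in for a message with `optional Bar foo = 1;`.
#[derive(Debug, Default)]
pub struct Foo {
    foo: Option<Bar>,
}

impl Foo {
    pub fn has_foo(&self) -> bool {
        self.foo.is_some()
    }
    /// Reading never changes presence; an unset field reads as empty.
    pub fn foo(&self) -> &Bar {
        self.foo.as_ref().unwrap_or(&DEFAULT_BAR)
    }
    /// Asking for a mutable handle sets the field if it was unset.
    pub fn foo_mut(&mut self) -> &mut Bar {
        self.foo.get_or_insert_with(Bar::default)
    }
    pub fn set_foo(&mut self, value: Bar) {
        self.foo = Some(value);
    }
    pub fn clear_foo(&mut self) {
        self.foo = None;
    }
}
```

In practice this means that code which only inspects a submessage should use foo() or foo_opt(), since foo_mut() marks the field as present even if nothing is written.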

Repeated Fields

For any repeated field definition the compiler will generate the same three accessor methods that deviate only in the field type.

For example, given the below field definition:

repeated int32 foo = 1;

The compiler will generate the following accessor methods:

  • fn foo(&self) -> RepeatedView<'_, i32>: Returns a view of the underlying repeated field.
  • fn foo_mut(&mut self) -> RepeatedMut<'_, i32>: Returns a mutable handle to the underlying repeated field.
  • fn set_foo(&mut self, src: impl IntoProxied<Repeated<i32>>): Sets the underlying repeated field to a new repeated field provided in src.

For different field types only the respective generic types of the RepeatedView, RepeatedMut and Repeated types will change. For example, given a field of type string the foo() accessor would return a RepeatedView<'_, ProtoString>.

Map Fields

For this map field definition:

map<int32, int32> weight = 1;

The compiler will generate the following 3 accessor methods:

  • fn weight(&self) -> protobuf::MapView<'_, i32, i32>: Returns an immutable view of the underlying map.
  • fn weight_mut(&mut self) -> protobuf::MapMut<'_, i32, i32>: Returns a mutable handle to the underlying map.
  • fn set_weight(&mut self, src: impl protobuf::IntoProxied<Map<i32, i32>>): Sets the underlying map to src.

For different field types only the respective generic types of the MapView, MapMut and Map types will change. For example, given a map with values of type string, the weight() accessor would return a MapView<'_, i32, ProtoString>.

Any

Any is not special-cased by Rust Protobuf at this time; it will behave as though it was a simple message with this definition:

message Any {
  string type_url = 1;
  bytes value = 2 [ctype = CORD];
}

Oneof

Given a oneof definition like this:

oneof example_name {
    int32 foo_int = 4;
    string foo_string = 9;
    ...
}

The compiler will generate accessors (getters, setters, hazzers) for every field as if the same field was declared as an optional field outside of the oneof. So you can work with oneof fields like regular fields, but setting one will clear the other fields in the oneof block. In addition, the following types are emitted for the oneof block:

  #[non_exhaustive]
  #[derive(Debug, Clone, Copy)]
  pub enum ExampleName<'msg> {
    FooInt(i32) = 4,
    FooString(&'msg protobuf::ProtoStr) = 9,
    not_set(std::marker::PhantomData<&'msg ()>) = 0
  }

  #[derive(Debug, Copy, Clone, PartialEq, Eq)]
  pub enum ExampleNameCase {
    FooInt = 4,
    FooString = 9,
    not_set = 0
  }

Additionally, it will generate the two accessors:

  • fn example_name(&self) -> ExampleName<'_>: Returns the enum variant indicating which field is set and the field’s value. Returns not_set if no field is set.
  • fn example_name_case(&self) -> ExampleNameCase: Returns the enum variant indicating which field is set. Returns not_set if no field is set.
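
The interaction between the per-field accessors and the oneof enums can be sketched in plain Rust. This is a simplified model: `Msg` is a hypothetical containing message, str stands in for ProtoStr, the not_set variant carries no PhantomData, and the data-carrying enum omits the explicit discriminants.

```rust
#[allow(non_camel_case_types)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ExampleName<'msg> {
    FooInt(i32),
    FooString(&'msg str),
    not_set,
}

#[allow(non_camel_case_types)]
#[derive(Debug, Copy, Clone, PartialEq, Eq)]
pub enum ExampleNameCase {
    FooInt = 4,
    FooString = 9,
    not_set = 0,
}

/// Hypothetical message that contains the oneof.
#[derive(Debug, Default)]
pub struct Msg {
    foo_int: Option<i32>,
    foo_string: Option<String>,
}

impl Msg {
    pub fn set_foo_int(&mut self, val: i32) {
        self.foo_string = None; // setting one member clears the others
        self.foo_int = Some(val);
    }
    pub fn set_foo_string(&mut self, val: &str) {
        self.foo_int = None;
        self.foo_string = Some(val.to_string());
    }
    pub fn has_foo_int(&self) -> bool {
        self.foo_int.is_some()
    }
    pub fn has_foo_string(&self) -> bool {
        self.foo_string.is_some()
    }
    pub fn example_name(&self) -> ExampleName<'_> {
        match (self.foo_int, self.foo_string.as_deref()) {
            (Some(val), _) => ExampleName::FooInt(val),
            (None, Some(s)) => ExampleName::FooString(s),
            (None, None) => ExampleName::not_set,
        }
    }
    pub fn example_name_case(&self) -> ExampleNameCase {
        match self.example_name() {
            ExampleName::FooInt(_) => ExampleNameCase::FooInt,
            ExampleName::FooString(_) => ExampleNameCase::FooString,
            ExampleName::not_set => ExampleNameCase::not_set,
        }
    }
}
```

Matching on example_name() gives access to the active value in one step, while example_name_case() is enough when only the identity of the active field matters.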

Enumerations

Given an enum definition like:

enum FooBar {
  FOO_BAR_UNKNOWN = 0;
  FOO_BAR_A = 1;
  FOO_B = 5;
  VALUE_C = 1234;
}

The compiler will generate:

  #[derive(Clone, Copy, PartialEq, Eq, Hash)]
  #[repr(transparent)]
  pub struct FooBar(i32);

  impl FooBar {
    pub const Unknown: FooBar = FooBar(0);
    pub const A: FooBar = FooBar(1);
    pub const FooB: FooBar = FooBar(5);
    pub const ValueC: FooBar = FooBar(1234);
  }

Note that for values with a prefix that matches the enum, the prefix will be stripped; this is done to improve ergonomics. Enum values are commonly prefixed with the enum name to avoid name collisions between sibling enums (which follow the semantics of C++ enums where the values are not scoped by their containing enum). Since the generated Rust consts are scoped within the impl, the additional prefix, which is beneficial to add in .proto files, would be redundant in Rust.
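
The prefix stripping can be approximated by the sketch below. It is an illustration that reproduces the examples in this section, not the compiler's actual algorithm, which may treat case, underscores, and edge cases differently. All three function names are hypothetical.

```rust
/// Converts an UpperCamelCase enum name such as `FooBar` to `FOO_BAR`.
fn camel_to_screaming_snake(name: &str) -> String {
    let mut out = String::new();
    for (i, c) in name.chars().enumerate() {
        if c.is_ascii_uppercase() && i > 0 {
            out.push('_');
        }
        out.push(c.to_ascii_uppercase());
    }
    out
}

/// Converts a SCREAMING_SNAKE_CASE name such as `VALUE_C` to `ValueC`.
fn screaming_snake_to_camel(name: &str) -> String {
    name.split('_')
        .filter(|part| !part.is_empty())
        .map(|part| {
            let mut chars = part.chars();
            match chars.next() {
                Some(first) => {
                    let mut word = first.to_ascii_uppercase().to_string();
                    word.push_str(&chars.as_str().to_ascii_lowercase());
                    word
                }
                None => String::new(),
            }
        })
        .collect()
}

/// Approximates the Rust constant name for `value_name` in `enum_name`.
pub fn rust_const_name(enum_name: &str, value_name: &str) -> String {
    let prefix = format!("{}_", camel_to_screaming_snake(enum_name));
    // Strip the enum-name prefix only when the value actually carries it.
    let stripped = value_name
        .strip_prefix(prefix.as_str())
        .unwrap_or(value_name);
    screaming_snake_to_camel(stripped)
}
```

With this sketch, FOO_BAR_UNKNOWN in enum FooBar becomes Unknown, while FOO_B keeps its full name as FooB because FOO_ alone does not match the enum name.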

Extensions (proto2 only)

A Rust API for extensions is currently a work in progress. Extension fields will be maintained through parse/serialize, and in a C++ interop case any extensions set will be retained if the message is accessed from Rust (and propagated in the case of a message copy or merge).

Arena Allocation

A Rust API for arena allocated messages has not yet been implemented.

Internally, Protobuf Rust uses arenas on the upb kernel but not on the C++ kernels. However, references (both const and mutable) to messages that were arena allocated in C++ can be safely passed to Rust to be accessed or mutated.

Services

A Rust API for services has not yet been implemented.