Implementing Editions Support
This topic explains how to implement editions in new runtimes and generators.
Overview
Edition 2023
The first edition released is Edition 2023, which is designed to unify proto2 and proto3 syntax. The features we’ve added to cover the difference in behaviors are detailed in Feature Settings for Editions.
Feature Definition
In addition to supporting editions and the global features we’ve defined, you
may want to define your own features to leverage the infrastructure. This
allows you to define arbitrary features that your generators and runtimes can
use to gate new behaviors. The first step is to claim an extension number above
9999 for the FeatureSet message in descriptor.proto. You can send us a pull
request on GitHub, and it will be included in our next release (see, for
example, #15439).
Once you have your extension number, you can create your features proto (similar to cpp_features.proto). These will typically look something like:
edition = "2023";

package foo;

import "google/protobuf/descriptor.proto";

extend google.protobuf.FeatureSet {
  MyFeatures features = <extension #>;
}

message MyFeatures {
  enum FeatureValue {
    FEATURE_VALUE_UNKNOWN = 0;
    VALUE1 = 1;
    VALUE2 = 2;
  }

  FeatureValue feature_value = 1 [
    targets = TARGET_TYPE_FIELD,
    targets = TARGET_TYPE_FILE,
    feature_support = {
      edition_introduced: EDITION_2023,
      edition_deprecated: EDITION_2024,
      deprecation_warning: "Feature will be removed in 2025",
      edition_removed: EDITION_2025,
    },
    edition_defaults = { edition: EDITION_LEGACY, value: "VALUE1" },
    edition_defaults = { edition: EDITION_2024, value: "VALUE2" }
  ];
}
Here we’ve defined a new enum feature foo.feature_value (currently only
boolean and enum types are supported). In addition to defining the values it can
take, you also need to specify how it can be used:
- Targets - specifies the type of proto descriptors this feature can be attached to. This controls where users can explicitly specify the feature. Every type must be explicitly listed.
- Feature support - specifies the lifetime of this feature relative to edition. You must specify the edition it was introduced in, and it will not be allowed before then. You can optionally deprecate or remove the feature in later editions.
- Edition defaults - specifies any changes to the default value of the feature. This must cover every supported edition, but you can leave out any edition where the default didn’t change. Note that EDITION_PROTO2 and EDITION_PROTO3 can be specified here to provide defaults for the “legacy” editions (see Legacy Editions).
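As a rough illustration of how a sparse list of edition defaults can still cover every edition, the following Python sketch picks the latest entry at or before a given edition. The Edition enum and its ordinal values are stand-ins invented for this example (only their relative order matters), and default_for is a hypothetical helper, not part of any Protobuf API.

```python
from enum import IntEnum


class Edition(IntEnum):
    # Illustrative ordinals only; the relative order is what matters here.
    LEGACY = 0
    PROTO2 = 1
    PROTO3 = 2
    E2023 = 3
    E2024 = 4
    E2025 = 5


# The edition_defaults from the MyFeatures example, sorted by edition.
EDITION_DEFAULTS = [
    (Edition.LEGACY, "VALUE1"),
    (Edition.E2024, "VALUE2"),
]


def default_for(edition: Edition) -> str:
    """Return the default from the latest entry at or before `edition`."""
    chosen = None
    for entry_edition, value in EDITION_DEFAULTS:
        if entry_edition <= edition:
            chosen = value
    if chosen is None:
        raise ValueError(f"no default covers {edition.name}")
    return chosen


print(default_for(Edition.PROTO2))  # VALUE1
print(default_for(Edition.E2023))   # VALUE1
print(default_for(Edition.E2025))   # VALUE2
```

Because the EDITION_LEGACY entry sorts before every other edition, it acts as the baseline that later entries override.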
What is a Feature?
Features are designed to provide a mechanism for ratcheting down bad behavior over time, on edition boundaries. While the timeline for actually removing a feature may be years (or decades) in the future, the desired goal of any feature should be eventual removal. When a bad behavior is identified, you can introduce a new feature that guards the fix. In the next edition (or possibly later), you would flip the default value while still allowing users to retain their old behavior when upgrading. At some point in the future you would mark the feature deprecated, which would trigger a custom warning to any users overriding it. In a later edition, you would then mark it removed, preventing users from overriding it anymore (but the default value would still apply). Until support for that last edition is dropped in a breaking release, the feature will remain usable for protos stuck on older editions, giving them time to migrate.
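The lifecycle described above can be sketched as a small check on whether a user may override a feature in a given edition. This is a simplified model, not protoc’s actual validation: FeatureSupport here is a plain Python stand-in for the feature_support option, check_override is a hypothetical helper, and editions are represented as plain year integers purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeatureSupport:
    # Stand-in for the feature_support option; editions are plain ints here.
    edition_introduced: int
    edition_deprecated: Optional[int] = None
    deprecation_warning: str = ""
    edition_removed: Optional[int] = None


def check_override(support: FeatureSupport, edition: int) -> Optional[str]:
    """Return a deprecation warning (or None) if overriding is allowed.

    Raises ValueError when the feature cannot be overridden in `edition`.
    """
    if edition < support.edition_introduced:
        raise ValueError("feature has not been introduced in this edition")
    if support.edition_removed is not None and edition >= support.edition_removed:
        raise ValueError("feature was removed and can no longer be overridden")
    if support.edition_deprecated is not None and edition >= support.edition_deprecated:
        return support.deprecation_warning
    return None


support = FeatureSupport(
    edition_introduced=2023,
    edition_deprecated=2024,
    deprecation_warning="Feature will be removed in 2025",
    edition_removed=2025,
)
print(check_override(support, 2023))  # None
print(check_override(support, 2024))  # Feature will be removed in 2025
```

Note that removal only blocks overrides; the default value for the edition still applies.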
Flags that control optional behaviors you have no intention of removing are better implemented as custom options. This is related to the reason we’ve restricted features to be either boolean or enum types. Any behavior controlled by a (relatively) unbounded number of values probably isn’t a good fit for the editions framework, since it’s not realistic to eventually turn down that many different behaviors.
One caveat to this is behaviors related to wire boundaries. Using
language-specific features to control serialization or parsing behavior can be
dangerous, since any other language could be on the other side. Wire-format
changes should always be controlled by global features in descriptor.proto,
which can be respected by every runtime uniformly.
Generators
Generators written in C++ get a lot for free, because they use the C++ runtime.
They don’t need to handle Feature Resolution themselves, and if they need any
feature extensions they can register them in GetFeatureExtensions in their
CodeGenerator. They can generally use GetResolvedSourceFeatures for access to
resolved features for a descriptor in codegen and GetUnresolvedSourceFeatures
for access to their own unresolved features.
Plugins written in the same language as the runtime they generate code for may need some custom bootstrapping for their feature definitions.
Explicit Support
Generators must specify exactly which editions they support. This allows you to
safely add support for an edition after it’s been released, on your own
schedule. Protoc will reject any editions protos sent to generators that don’t
include FEATURE_SUPPORTS_EDITIONS in the supported_features field of their
CodeGeneratorResponse. Additionally, we have minimum_edition and
maximum_edition fields for specifying your precise support window. Once you’ve
defined all of the code and feature changes for a new edition, you can bump
maximum_edition to advertise this support.
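To make the support window concrete, here is a minimal Python model of the acceptance check described above. It is only a sketch: GeneratorSupport and accepts are hypothetical names, the feature bit values are assumed to mirror CodeGeneratorResponse.Feature in plugin.proto, and the edition numbers are assumed values where only the ordering matters.

```python
from dataclasses import dataclass

# Assumed to mirror CodeGeneratorResponse.Feature in plugin.proto.
FEATURE_PROTO3_OPTIONAL = 1
FEATURE_SUPPORTS_EDITIONS = 2

# Assumed edition numbers; only their ordering matters for this sketch.
EDITION_2023 = 1000
EDITION_2024 = 1001


@dataclass
class GeneratorSupport:
    # Hypothetical stand-in for the relevant CodeGeneratorResponse fields.
    supported_features: int  # bitmask of FEATURE_* values
    minimum_edition: int
    maximum_edition: int


def accepts(support: GeneratorSupport, file_edition: int) -> bool:
    """Return True if an editions file may be sent to this generator."""
    if not support.supported_features & FEATURE_SUPPORTS_EDITIONS:
        return False
    return support.minimum_edition <= file_edition <= support.maximum_edition


plugin = GeneratorSupport(
    supported_features=FEATURE_PROTO3_OPTIONAL | FEATURE_SUPPORTS_EDITIONS,
    minimum_edition=EDITION_2023,
    maximum_edition=EDITION_2023,
)
print(accepts(plugin, EDITION_2023))  # True
print(accepts(plugin, EDITION_2024))  # False
```

Bumping maximum_edition to EDITION_2024 in this model is what would make the second call succeed.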
Codegen Tests
We have a set of codegen tests that can be used to verify that Edition 2023 produces no unexpected functional changes. These have been very useful in languages like C++ and Java, where a significant amount of the functionality is in gencode. On the other hand, in languages like Python, where the gencode is basically just a collection of serialized descriptors, these are not quite as useful.
This infrastructure is not reusable yet, but we plan to make it reusable in a future release. At that point you will be able to use these tests to verify that migrating to editions doesn’t cause any unexpected codegen changes.
Runtimes
Runtimes without reflection or dynamic messages should not need to do anything to implement Editions. All of that logic should be handled by the code generator.
Languages with reflection but without dynamic messages need resolved features, but may optionally choose to handle it in their generator only. This can be done by passing both resolved and unresolved feature sets to the runtime during codegen. This avoids re-implementing Feature Resolution in the runtime with the main downside being efficiency, since it will create a unique feature set for every descriptor.
Languages with dynamic messages must fully implement Editions, because they need to be able to build descriptors at runtime.
Syntax Reflection
The first step in implementing Editions in a runtime with reflection is to
remove all direct checks of the syntax keyword. All of these should be moved
to finer-grained feature helpers, which can continue to use syntax if
necessary.
The following feature helpers should be implemented on descriptors, with language-appropriate naming:
- FieldDescriptor::has_presence - Whether or not a field has explicit presence
  - Repeated fields never have presence
  - Message, extension, and oneof fields always have explicit presence
  - Everything else has presence iff field_presence is not IMPLICIT
- FieldDescriptor::is_required - Whether or not a field is required
- FieldDescriptor::requires_utf8_validation - Whether or not a field should be checked for UTF-8 validity
- FieldDescriptor::is_packed - Whether or not a repeated field has packed encoding
- FieldDescriptor::is_delimited - Whether or not a message field has delimited encoding
- EnumDescriptor::is_closed - Whether or not an enum is closed
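The presence rules listed above could be expressed roughly as follows. This is a sketch over a simplified, hypothetical Field model (not a real descriptor class), with the resolved field_presence feature represented as a string.

```python
from dataclasses import dataclass


@dataclass
class Field:
    # Hypothetical, simplified stand-in for a field descriptor.
    is_repeated: bool = False
    is_message: bool = False
    is_extension: bool = False
    in_oneof: bool = False
    # Resolved feature value: "EXPLICIT", "IMPLICIT", or "LEGACY_REQUIRED".
    field_presence: str = "EXPLICIT"


def has_presence(field: Field) -> bool:
    """Apply the presence rules in order of precedence."""
    if field.is_repeated:
        return False  # Repeated fields never have presence.
    if field.is_message or field.is_extension or field.in_oneof:
        return True  # These always have explicit presence.
    return field.field_presence != "IMPLICIT"


def is_required(field: Field) -> bool:
    """A field is required when its resolved presence is LEGACY_REQUIRED."""
    return field.field_presence == "LEGACY_REQUIRED"


print(has_presence(Field(is_repeated=True, is_message=True)))  # False
print(has_presence(Field(in_oneof=True)))                      # True
print(has_presence(Field(field_presence="IMPLICIT")))          # False
print(is_required(Field(field_presence="LEGACY_REQUIRED")))    # True
```

Checking is_repeated first matters, since a repeated message field would otherwise be reported as having presence.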
Note: In most languages, the message encoding feature is still currently
signaled by TYPE_GROUP and required fields still have LABEL_REQUIRED set.
This is not ideal, and was done to make downstream migrations easier.
Eventually, these should be migrated to the appropriate helpers and
TYPE_MESSAGE/LABEL_OPTIONAL.
Downstream users should migrate to these new helpers instead of using syntax directly. The following class of existing descriptor APIs should ideally be deprecated and eventually removed, since they leak syntax information:
- FileDescriptor syntax
- Proto3 optional APIs
  - FieldDescriptor::has_optional_keyword
  - OneofDescriptor::is_synthetic
  - Descriptor::*real_oneof* - should be renamed to just “oneof” and the existing “oneof” helpers should be removed, since they leak information about synthetic oneofs (which don’t exist in editions).
- Group type
  - The TYPE_GROUP enum value should be removed, replaced with the is_delimited helper.
- Required label
  - The LABEL_REQUIRED enum value should be removed, replaced with the is_required helper.
There are many classes of user code where these checks exist but aren’t
hostile to editions. For example, code that needs to handle proto3 optional
specially because of its synthetic oneof implementation won’t be hostile to
editions as long as the polarity is something like syntax == "proto3" (rather
than checking syntax != "proto2").
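The difference in polarity can be seen in a small Python sketch. Both function names are hypothetical, and the "editions" syntax string is an assumption about how a runtime might report an editions file.

```python
def needs_synthetic_oneof_handling(syntax: str) -> bool:
    """Editions-friendly polarity: opt in only for proto3."""
    return syntax == "proto3"


def needs_synthetic_oneof_handling_fragile(syntax: str) -> bool:
    """Hostile polarity: treats anything that is not proto2 as proto3."""
    return syntax != "proto2"


for syntax in ("proto2", "proto3", "editions"):
    safe = needs_synthetic_oneof_handling(syntax)
    fragile = needs_synthetic_oneof_handling_fragile(syntax)
    print(f"{syntax}: safe={safe} fragile={fragile}")
# proto2: safe=False fragile=False
# proto3: safe=True fragile=True
# editions: safe=False fragile=True
```

The two checks agree for proto2 and proto3, and only diverge once an editions file shows up, which is why the fragile form tends to go unnoticed until migration.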
If it’s not possible to remove these APIs entirely, they should be deprecated and discouraged.
Feature Visibility
As discussed in editions-feature-visibility, feature protos should remain an
internal detail of any Protobuf implementation. The behaviors they control
should be exposed via descriptor methods, but the protos themselves should not.
Notably, this means that any options that are exposed to the users need to have
their features fields stripped out.
The one case where we permit features to leak out is when serializing descriptors. The resulting descriptor protos should be a faithful representation of the original proto files, and should contain unresolved features inside of the options.
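One way to picture the two views of options is the following Python sketch, which models an options message as a plain dictionary. Both helper names are hypothetical, and a real runtime would operate on options messages rather than dictionaries.

```python
import copy


def options_for_users(options: dict) -> dict:
    """Return a copy of the options with the features field stripped out."""
    visible = copy.deepcopy(options)
    visible.pop("features", None)
    return visible


def options_for_serialization(options: dict) -> dict:
    """Return a faithful copy, keeping the unresolved features as written."""
    return copy.deepcopy(options)


field_options = {
    "deprecated": True,
    "features": {"field_presence": "IMPLICIT"},  # unresolved, as in the file
}
print(options_for_users(field_options))
# {'deprecated': True}
print(options_for_serialization(field_options))
# {'deprecated': True, 'features': {'field_presence': 'IMPLICIT'}}
```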
Legacy Editions
As discussed more in legacy-syntax-editions, a great way to get early coverage of your editions implementation is to unify proto2, proto3, and editions. This effectively migrates proto2 and proto3 to editions under the hood, and makes all of the helpers implemented in Syntax Reflection use features exclusively (instead of branching on syntax). This can be done by inserting a feature inference phase into Feature Resolution, where various aspects of the proto file can inform what features are appropriate. These features can then be merged into the parent’s features to get the resolved feature set.
While we provide reasonable defaults for proto2/proto3 already, for edition 2023 the following additional inferences are required:
- required - we infer LEGACY_REQUIRED presence when a field has LABEL_REQUIRED
- groups - we infer DELIMITED message encoding when a field has TYPE_GROUP
- packed - we infer PACKED encoding when the packed option is true
- expanded - we infer EXPANDED encoding when a proto3 field has packed explicitly set to false
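The inferences above might look like this in a feature inference phase. This is a sketch under simplifying assumptions: LegacyField is a hypothetical stand-in for a proto2/proto3 field descriptor, and feature sets are modeled as flat dictionaries keyed by the global feature names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LegacyField:
    # Hypothetical, simplified view of a proto2/proto3 field.
    syntax: str  # "proto2" or "proto3"
    label: str = "LABEL_OPTIONAL"
    field_type: str = "TYPE_INT32"
    packed: Optional[bool] = None  # None means the option was not set


def infer_features(field: LegacyField) -> Dict[str, str]:
    """Infer features from legacy syntax constructs on a single field."""
    inferred: Dict[str, str] = {}
    if field.label == "LABEL_REQUIRED":
        inferred["field_presence"] = "LEGACY_REQUIRED"
    if field.field_type == "TYPE_GROUP":
        inferred["message_encoding"] = "DELIMITED"
    if field.packed is True:
        inferred["repeated_field_encoding"] = "PACKED"
    elif field.syntax == "proto3" and field.packed is False:
        inferred["repeated_field_encoding"] = "EXPANDED"
    return inferred


# Inferred features are merged over the parent's resolved features.
parent_features = {"field_presence": "EXPLICIT", "message_encoding": "LENGTH_PREFIXED"}
required_field = LegacyField(syntax="proto2", label="LABEL_REQUIRED")
resolved = {**parent_features, **infer_features(required_field)}
print(resolved["field_presence"])  # LEGACY_REQUIRED
```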
Conformance Tests
Editions-specific conformance tests have been added, but they are opt-in. Pass
the --maximum_edition 2023 flag to the runner to enable them. You will need to
configure your testee binary to handle the following new message types:
- protobuf_test_messages.editions.proto2.TestAllTypesProto2 - Identical to the old proto2 message, but transformed to edition 2023
- protobuf_test_messages.editions.proto3.TestAllTypesProto3 - Identical to the old proto3 message, but transformed to edition 2023
- protobuf_test_messages.editions.TestAllTypesEdition2023 - Used to cover edition-2023-specific test cases
Feature Resolution
Editions use lexical scoping to define features, meaning that any non-C++ code
that needs to implement editions support will need to reimplement our feature
resolution algorithm. However, the bulk of the work is handled by protoc
itself, which can be configured to output an intermediate FeatureSetDefaults
message. This message contains a “compilation” of a set of feature definition
files, laying out the default feature values in every edition.
For example, the feature definition above would compile to the following defaults between proto2 and edition 2025 (in text-format notation):
defaults {
  edition: EDITION_PROTO2
  overridable_features { [foo.features] {} }
  fixed_features {
    # Global feature defaults…
    [foo.features] { feature_value: VALUE1 }
  }
}
defaults {
  edition: EDITION_PROTO3
  overridable_features { [foo.features] {} }
  fixed_features {
    # Global feature defaults…
    [foo.features] { feature_value: VALUE1 }
  }
}
defaults {
  edition: EDITION_2023
  overridable_features {
    # Global feature defaults…
    [foo.features] { feature_value: VALUE1 }
  }
}
defaults {
  edition: EDITION_2024
  overridable_features {
    # Global feature defaults…
    [foo.features] { feature_value: VALUE2 }
  }
}
defaults {
  edition: EDITION_2025
  overridable_features {
    # Global feature defaults…
  }
  fixed_features { [foo.features] { feature_value: VALUE2 } }
}
minimum_edition: EDITION_PROTO2
maximum_edition: EDITION_2025
Global feature defaults are left out for compactness, but they would also be present. This object contains an ordered list of every edition with a unique set of defaults (some editions may end up not being present) within the specified range. Each set of defaults is split into overridable and fixed features. The former are supported features for the edition that can be freely overridden by users. The fixed features are those which haven’t yet been introduced or have been removed, and can’t be overridden by users.
We provide a Bazel rule for compiling these intermediate objects:
load("@com_google_protobuf//editions:defaults.bzl", "compile_edition_defaults")

compile_edition_defaults(
    name = "my_defaults",
    srcs = ["//some/path:lang_features_proto"],
    minimum_edition = "PROTO2",
    maximum_edition = "2024",
)
The output FeatureSetDefaults can be embedded into a raw string literal in
whatever language you need to do feature resolution in. We also provide an
embed_edition_defaults macro to do this:
embed_edition_defaults(
    name = "embed_my_defaults",
    defaults = ":my_defaults",
    output = "my_defaults.h",
    placeholder = "DEFAULTS_DATA",
    template = "my_defaults.h.template",
)
Alternatively, you can invoke protoc directly (outside of Bazel) to generate this data:
protoc --edition_defaults_out=defaults.binpb --edition_defaults_minimum=PROTO2 --edition_defaults_maximum=2023 <feature files...>
Once the defaults message is hooked up and parsed by your code, feature resolution for a file descriptor at a given edition follows a simple algorithm:
- Validate that the edition is in the appropriate range [minimum_edition, maximum_edition]
- Binary-search the ordered defaults field for the highest entry less than or equal to the edition
- Merge overridable_features into fixed_features from the selected defaults
- Merge any explicit features set on the descriptor (the features field in the file options)
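The file-level steps above can be sketched in Python as follows. This is a simplified model rather than a real implementation: feature sets are flattened into dictionaries instead of FeatureSet messages, EditionDefault and FeatureSetDefaults are local stand-ins for the compiled defaults message, the edition numbers are assumed values where only ordering matters, and validation that users only override overridable features is omitted.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EditionDefault:
    edition: int
    overridable_features: Dict[str, str] = field(default_factory=dict)
    fixed_features: Dict[str, str] = field(default_factory=dict)


@dataclass
class FeatureSetDefaults:
    defaults: List[EditionDefault]  # Ordered by edition.
    minimum_edition: int
    maximum_edition: int


def resolve_file_features(
    compiled: FeatureSetDefaults, edition: int, explicit: Dict[str, str]
) -> Dict[str, str]:
    # Step 1: validate the edition range.
    if not compiled.minimum_edition <= edition <= compiled.maximum_edition:
        raise ValueError(f"edition {edition} is outside the supported range")
    # Step 2: binary-search for the highest entry <= edition.
    editions = [entry.edition for entry in compiled.defaults]
    index = bisect_right(editions, edition) - 1
    if index < 0:
        raise ValueError(f"no defaults found for edition {edition}")
    selected = compiled.defaults[index]
    # Step 3: merge overridable features into the fixed features.
    resolved = dict(selected.fixed_features)
    resolved.update(selected.overridable_features)
    # Step 4: merge the explicit features from the file options.
    resolved.update(explicit)
    return resolved


PROTO2, PROTO3, E2023, E2024 = 998, 999, 1000, 1001  # assumed ordering
compiled = FeatureSetDefaults(
    defaults=[
        EditionDefault(PROTO2, fixed_features={"foo.feature_value": "VALUE1"}),
        EditionDefault(PROTO3, fixed_features={"foo.feature_value": "VALUE1"}),
        EditionDefault(E2023, overridable_features={"foo.feature_value": "VALUE1"}),
        EditionDefault(E2024, overridable_features={"foo.feature_value": "VALUE2"}),
    ],
    minimum_edition=PROTO2,
    maximum_edition=E2024,
)
print(resolve_file_features(compiled, E2023, {}))
# {'foo.feature_value': 'VALUE1'}
print(resolve_file_features(compiled, E2023, {"foo.feature_value": "VALUE2"}))
# {'foo.feature_value': 'VALUE2'}
```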
From there, you can recursively resolve features for all other descriptors:
- Initialize to the parent descriptor’s feature set
- Merge any explicit features set on the descriptor (the features field in the options)
For determining the “parent” descriptor, you can reference our C++ implementation. This is straightforward in most cases, but extensions are a bit surprising because their parent is the enclosing scope rather than the extendee. Oneofs also need to be considered as the parent of their fields.
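A minimal Python sketch of the recursive step, assuming a hypothetical Node type in which each descriptor already knows its children according to the parent rules just described (for example, a oneof sits between a message and the fields it contains). Feature sets are again modeled as flat dictionaries.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    # Hypothetical stand-in for any descriptor (file, message, oneof, field).
    name: str
    explicit_features: Dict[str, str] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)
    resolved_features: Dict[str, str] = field(default_factory=dict)


def resolve_tree(node: Node, parent_features: Dict[str, str]) -> None:
    """Start from the parent's features, then merge the explicit ones."""
    node.resolved_features = {**parent_features, **node.explicit_features}
    for child in node.children:
        resolve_tree(child, node.resolved_features)


oneof_field = Node("choice_a")
plain_field = Node("count", explicit_features={"field_presence": "EXPLICIT"})
oneof = Node("choice", children=[oneof_field])
message = Node(
    "Msg",
    explicit_features={"field_presence": "IMPLICIT"},
    children=[oneof, plain_field],
)
file_node = Node("file.proto", children=[message])

# The edition defaults for the file would come from the compiled defaults.
resolve_tree(file_node, {"field_presence": "EXPLICIT", "enum_type": "OPEN"})
print(oneof_field.resolved_features["field_presence"])  # IMPLICIT
print(plain_field.resolved_features["field_presence"])  # EXPLICIT
```

Here the field inside the oneof inherits through the oneof from the message, while the explicitly annotated field overrides what it inherited.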
Conformance Tests
In a future release, we plan to add conformance tests to verify feature resolution cross-language. Until then, our regular conformance tests do give partial coverage, and our example inheritance unit tests can be ported to provide more comprehensive coverage.
Examples
Below are some real examples of how we implemented editions support in our runtimes and plugins.
Java
- #14138 - Bootstrap compiler with C++ gencode for Java features proto
- #14377 - Use features in Java, Kotlin, and Java Lite code generators, including codegen tests
- #15210 - Use features in Java full runtimes covering Java features bootstrap, feature resolution, and legacy editions, along with unit-tests and conformance testing
Pure Python
- #14546 - Set up codegen tests in advance
- #14547 - Fully implements editions in one shot, along with unit-tests and conformance testing
𝛍pb
- #14638 - First pass at editions implementation covering feature resolution and legacy editions
- #14667 - Added more complete handling of field label/type, support for upb’s code generator, and some tests
- #14678 - Hooks up upb to the Python runtime, with more unit tests and conformance tests
Ruby
- #16132 - Hook up upb/Java to all four Ruby runtimes for full editions support