Deserializing Debug Proto Representations

How to log debugging information in Protocol Buffers.

From version 29.x, the DebugString APIs (proto2::DebugString, proto2::ShortDebugString, proto2::Utf8DebugString) are deprecated. DebugString users should migrate to Abseil string functions (such as absl::StrCat, absl::StrFormat, absl::StrAppend, and absl::Substitute), the Abseil logging API, and the Protobuf APIs proto2::ShortFormat and proto2::Utf8Format, all of which automatically convert proto arguments into a new debugging format.

Unlike the Protobuf DebugString output format, the new debugging format automatically redacts sensitive fields by replacing their values with the string “[REDACTED]” (without the quotation marks). In addition, to ensure that the new output format cannot be deserialized by Protobuf TextFormat parsers, regardless of whether the underlying proto contains SPII (sensitive personally identifiable information) fields, we add a set of randomized links pointing to this article and a randomized-length whitespace sequence. The new debugging format looks as follows:

go/nodeserialize
spii_field: [REDACTED]
normal_field: "value"

Note that the new debugging format differs from the DebugString output format in only two ways:

  • The URL prefix
  • The values of SPII fields are replaced by “[REDACTED]” (without the quotes)

The new debugging format never removes any field names; it only replaces the value with “[REDACTED]” if the field is considered sensitive. If you don’t see certain fields in the output, it is because those fields are not set in the proto.
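The rules above can be illustrated with a toy model. This is not Protobuf's implementation: ToyField and ToyDebugFormat are hypothetical names, the fields are plain strings rather than a real message, and the prefix and padding only imitate the idea of a marker followed by randomized whitespace.

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical stand-in for one *set* field of a message; not a Protobuf type.
struct ToyField {
  std::string name;
  std::string value;
  bool sensitive;
};

// Toy formatter: emits a marker line with random-length trailing whitespace,
// then one line per set field. Field names are always kept; only the values
// of sensitive fields are replaced. Unset fields never reach this function,
// which is why they do not appear in the output.
std::string ToyDebugFormat(const std::vector<ToyField>& set_fields,
                           std::mt19937& rng) {
  std::uniform_int_distribution<int> pad(1, 3);
  std::string out = "go/nodeserialize";
  out += std::string(static_cast<std::size_t>(pad(rng)), ' ');
  out += "\n";
  for (const ToyField& field : set_fields) {
    out += field.name + ": ";
    out += field.sensitive ? std::string("[REDACTED]")
                           : "\"" + field.value + "\"";
    out += "\n";
  }
  return out;
}
```

Passing an empty vector yields only the marker line, which mirrors the empty-proto case.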

Tip: If you see only the URL and nothing else, your proto is empty!

Why is this URL here?

We want to make sure nobody deserializes human-readable representations of a protobuf message intended for humans debugging a system. Historically, .DebugString() and TextFormat were interchangeable, and existing systems use DebugString to transport and store data.

We want to make sure sensitive data does not accidentally end up in logs. Therefore, we transparently redact some field values, replacing them with "[REDACTED]", when protobuf messages are turned into strings. This reduces the security and privacy risk of accidental logging, but risks data loss if other systems deserialize your message. To address this risk, we are intentionally splitting the machine-readable TextFormat from the human-readable debug format to be used in log messages.

The URL is there intentionally: it makes the “debug representation” of your protos (produced, for example, by logging) incompatible with TextFormat. We want to prevent anyone from depending on debugging mechanisms to transport data between programs. Historically, the debug format (generated by the DebugString APIs) and TextFormat have been incorrectly used interchangeably. We hope this intentional effort will prevent that going forward.

We intentionally picked a link over less visible format changes to get an opportunity to provide context. The link might stand out in UIs, for example if you display status information in a table on a web page. In such cases you may use TextFormat::PrintToString instead, which does not redact any information and preserves formatting. However, use this API cautiously, because it has no built-in protections. As a rule of thumb, if you are writing data to debug logs or producing status messages, you should continue to use the debug format with the link. Even if you are currently not handling sensitive data, keep in mind that systems can change and code gets reused.

I tried converting this message into TextFormat, but I noticed the format changes every time my process restarts.

This is intentional. Don’t attempt to parse the output of this debug format. We reserve the right to change the syntax without notice. The debug format syntax randomly changes per process to prevent inadvertent dependencies. If a syntactic change in the debug format would break your system, chances are you shouldn’t use the debug representation of a proto.

FAQ

Can I Just Use TextFormat Everywhere?

Don’t use TextFormat for producing log messages. This will bypass all built-in protections, and you risk accidentally logging sensitive information. Even if your systems are currently not handling any sensitive data, this can change in the future.

Keep logs distinct from information that’s meant for further processing by other systems: use the debug representation for the former and TextFormat for the latter.

I Want to Write Configuration Files That Need to Be Both Human-Readable and Machine-Readable

For this use case, you can use TextFormat explicitly. You are responsible for making sure your configuration files don’t contain any PII.

I Am Writing a Unit Test, and Want to Compare DebugString in a Test Assertion

If you want to compare protobuf values, use MessageDifferencer, as in the following example:

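#include "google/protobuf/util/message_differencer.h"

using google::protobuf::util::MessageDifferencer;
...
MessageDifferencer diff;
...
// Compare() returns true if foo and bar are equal.
diff.Compare(foo, bar);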

Besides ignoring formatting and field-order differences, MessageDifferencer also gives you better error messages.