proposal: encoding/{base32,base64}: constant-time implementations of RFC 4648 codecs for use with cryptography keys #73901


Open
tob-scott-a opened this issue May 28, 2025 · 22 comments · May be fixed by #73909
Labels
LibraryProposal Issues describing a requested change to the Go standard library or x/ libraries, but not to a tool Proposal Proposal-Crypto Proposal related to crypto packages or other security issues
tob-scott-a commented May 28, 2025

Proposal Details

Background

The current implementations of the base32 and base64 codecs are not ideal for cryptographic use cases. The linked document describes practical timing side channels against general-purpose RFC 4648 codecs when they are used to encode or decode cryptographic secrets. The current implementation uses a table look-up in both directions, which introduces the risk of cache-timing attacks.

Specialized implementations of these codecs exist. @Sc00bz wrote a constant-time implementation of base64 in C, which I used as a basis for both base32 and base64 in PHP.
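For illustration only, a simplified stand-in (not the stdlib's actual implementation) shows why the table-driven approach is a concern: each secret input byte indexes a lookup table, so the memory address touched depends on the secret, and on machines with caches that address can leak through timing.

```go
package main

import "fmt"

// Simplified sketch of a variable-time RFC 4648 decoder. This is NOT the
// stdlib code; it just illustrates the secret-dependent table index.
const stdAlphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

var decodeMap [256]byte

func init() {
	for i := range decodeMap {
		decodeMap[i] = 0xFF // marker for "not in the alphabet"
	}
	for i := 0; i < len(stdAlphabet); i++ {
		decodeMap[stdAlphabet[i]] = byte(i)
	}
}

func decodeChar(c byte) byte {
	return decodeMap[c] // secret-dependent table index: potential cache-timing leak
}

func main() {
	fmt.Println(decodeChar('A'), decodeChar('z'), decodeChar('/')) // 0 51 63
}
```

The constant-time variants discussed below replace that single load with a fixed sequence of arithmetic whose memory access pattern does not depend on the input.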

Proposed Change

New functions in encoding/base32 and encoding/base64:

  • EncodeToStringConstTime as an alternative to EncodeToString
  • DecodeStringConstTime as an alternative to DecodeString

These functions could then be used in other packages that handle cryptographic secrets, so that auditors can be certain no timing leaks are present.

(I'm happy to provide a patch for this, if accepted.)

@gopherbot gopherbot added this to the Proposal milestone May 28, 2025
@mateusz834 (Member)

CC @golang/security

@mateusz834 mateusz834 added the Proposal-Crypto Proposal related to crypto packages or other security issues label May 28, 2025
@tob-scott-a (Author)

Just in case there's any confusion: I do not think this qualifies as a security vulnerability (or I would have reported it as one, rather than filing an issue with the intent to follow up with a patch).

@gabyhelp gabyhelp added the LibraryProposal Issues describing a requested change to the Go standard library or x/ libraries, but not to a tool label May 28, 2025
Jorropo (Member) commented May 28, 2025

👍 However, do we want new functions? It would be simpler for everyone if we made the current functions constant-time.
If you could send your patch already, that would be nice; I would like to benchmark it.

@tob-scott-a (Author)

A constant-time alternative will certainly be slower than the general-purpose functions: it replaces a map look-up with many arithmetic operations.

In the context of handling cryptographic secrets, this trade-off is worth it. However, making all use-cases for a base64 codec constant-time would slow most Go programs down unnecessarily.

It is for this reason that my opening offer was to implement new functions rather than change the current behavior.

Jorropo (Member) commented May 28, 2025

A single map lookup is far from cheap. I don't want to risk guessing the performance impact; a benchmark would really help 🙂.
Also, in Go we often do not mind trading a few percent of performance if it means making the API easier to use.

@mateusz834 (Member)

Also, in Go we often do not mind trading a few percent of performance if it means making the API easier to use.

We can always do this in reverse: make the current API constant-time, but add unsafe variants (if needed) for speed. Then folks would actually use the constant-time APIs; otherwise, such new APIs might see low usage.

renthraysk commented May 28, 2025

Doesn't the fact that the stdlib has configurable alphabets for base64/base32 rule out replacing them?

@tob-scott-a (Author)

@renthraysk It's not actually a total show-stopper (although it does make the patch a bit trickier to write). See #73909 for a first pass at a patch that adheres to the suggestions provided in this discussion so far.

@gopherbot (Contributor)

Change https://go.dev/cl/676917 mentions this issue: encoding/base64: add constant-time behavior, enabled by default

rolandshoemaker (Member) commented May 29, 2025

I don't think we'd want to turn on constant-time behavior by default for all uses of encoding/{base32,base64}. I suspect the vast majority of their usage is for entirely benign inputs. That said, we probably would want to use it by default for encoding/pem, where the contents are much more likely to be sensitive (of course, we should probably still add some way to disable this, for people who, e.g., need to parse large numbers of certificates).

I haven't thought through this too deeply, but based on my understanding of how people typically do constant-time base64, could we add a generic decoder/encoder that generates the required bit-shift offsets based on the alphabet, rather than needing to implement per-alphabet encoders and decoders? We'd then just need to add a WithConstantTime (or something) method to Encoding.
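One way to sketch that generic, alphabet-derived approach (the names rangeCT, buildRanges, and decodeCT are invented here for illustration; this is not the playground or CL code): scan the public alphabet once into contiguous ranges, then decode each byte with branchless range checks.

```go
package main

import "fmt"

// rangeCT describes one contiguous run of an alphabet: bytes strictly
// between lo and hi map to values starting at base. (Invented name.)
type rangeCT struct{ lo, hi, base int }

// buildRanges scans an alphabet once at setup time. The alphabet itself is
// not secret, so this scan is allowed to branch.
func buildRanges(alphabet string) []rangeCT {
	var rs []rangeCT
	for i := 0; i < len(alphabet); {
		j := i + 1
		for j < len(alphabet) && alphabet[j] == alphabet[j-1]+1 {
			j++
		}
		rs = append(rs, rangeCT{lo: int(alphabet[i]) - 1, hi: int(alphabet[j-1]) + 1, base: i})
		i = j
	}
	return rs
}

// decodeCT maps ch to its alphabet index, or -1 if invalid, with no
// secret-dependent branches or table lookups. The trick: (lo-ch)&(ch-hi)
// is negative iff lo < ch < hi, so an arithmetic shift right turns it into
// an all-ones or all-zeros mask.
func decodeCT(rs []rangeCT, ch int) int {
	value := -1
	for _, r := range rs { // loop shape depends only on the public alphabet
		mask := ((r.lo - ch) & (ch - r.hi)) >> 8
		value += mask & (r.base + ch - r.lo)
	}
	return value
}

func main() {
	rs := buildRanges("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/")
	fmt.Println(decodeCT(rs, 'A'), decodeCT(rs, '5'), decodeCT(rs, '!')) // 0 57 -1
}
```

The per-range work is a handful of arithmetic ops, so a precomputed range list handles arbitrary (non-secret) alphabets without per-alphabet hand-written decoders.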

@rolandshoemaker (Member)

I'm not sure this is actually correct (threw it together very quickly), but something along the lines of this https://go.dev/play/p/4toQNnegoOh?

tob-scott-a (Author) commented May 29, 2025

I'm not sure this is actually correct (threw it together very quickly), but something along the lines of this https://go.dev/play/p/4toQNnegoOh?

The problem with that sort of check is that the RFC 4648 alphabets for base64 are not contiguous, which requires arithmetic like https://go-review.googlesource.com/c/go/+/676917/1/src/encoding/base64/mapping.go to implement without branches.

I don't think we'd want to turn on constant-time behavior by default for all uses of encoding/{base32,base64}. I suspect the vast majority of their usage is for entirely benign inputs.

I'm of the same opinion here.

In this case, these lines can be omitted and the corresponding logic moved into other places (e.g., encoding/pem) as needed.

@rolandshoemaker (Member)

The problem with that sort of check is that the RFC 4648 alphabets for base64 are not contiguous, which requires arithmetic like https://go-review.googlesource.com/c/go/+/676917/1/src/encoding/base64/mapping.go to implement without branches.

Can this arithmetic not be precomputed for the discontiguous ranges in the alphabet such that there don't need to be branches? Or is there something obvious I'm missing.

tob-scott-a (Author) commented May 29, 2025

Sorry, I misread the code. It looks like your approach will work:

Compare https://go.dev/play/p/EEz9V8fgsCC with https://go.dev/play/p/VHwr1LZpst9

EDIT: Looks like there's a bit of a bug with the logic for handling ret and valid. This fixes it: https://go.dev/play/p/vQeJ3QP2VWf

Sc00bz commented May 30, 2025

I wrote this a while ago. I think it's easier to read, and you can edit the constants for different character sets. (These decode functions include an optimization for a single-character range, which is obvious to change.) If you want Crockford's base32 decode, or a case-insensitive hex or base32 decode, a few modifications are needed, such as adding ch |= 'A' ^ 'a' // convert to lowercase (wait, unless EBCDIC... see footer) after any non-letter ranges. Then, for Crockford's base32 decode, also check for o, i, and l mapping to the values 0, 1, and 1.

Note that base64StandardDecodeCharFaster should save 4 operations on x86 vs base64StandardDecodeChar. Due to instruction latency and throughput it should run at the same speed, but with slightly smaller code output.

For custom character sets there will be a point at which scanning the character set becomes faster, but I don't know where that is, and I assume it's higher than for common (ASCII) character sets. Basically, unless the character set is randomized "for security", doing ranges will leak info about the character set; but cases like that don't really matter, because such schemes will always find a way to break themselves.

// Note: base64StandardDecodeCharFaster below requires `import "math/bits"`.
func base64StandardDecodeChar(ch int) int {
	const LO1 = int('A') - 1
	const HI1 = int('Z') + 1
	const LO2 = int('a') - 1
	const HI2 = int('z') + 1
	const LO3 = int('0') - 1
	const HI3 = int('9') + 1
	const LO4 = int('+') - 1
	const HI4 = int('+') + 1
	const LO5 = int('/') - 1
	const HI5 = int('/') + 1
	const COUNT1 = 0
	const COUNT2 = COUNT1 + HI1 - LO1 - 1
	const COUNT3 = COUNT2 + HI2 - LO2 - 1
	const COUNT4 = COUNT3 + HI3 - LO3 - 1
	const COUNT5 = COUNT4 + HI4 - LO4 - 1

	value := -1

	//    if (  LO  < ch  && ch < HI        )   { value += COUNT + ch - LO }
	value += (((LO1 - ch) & (ch - HI1)) >> 8) & (COUNT1 + ch - LO1)
	value += (((LO2 - ch) & (ch - HI2)) >> 8) & (COUNT2 + ch - LO2)
	value += (((LO3 - ch) & (ch - HI3)) >> 8) & (COUNT3 + ch - LO3)
	value += (((LO4 - ch) & (ch - HI4)) >> 8) & (COUNT4 + 1) // ch - LO4 == 1
	value += (((LO5 - ch) & (ch - HI5)) >> 8) & (COUNT5 + 1) // ch - LO5 == 1

	return value
}

func base64StandardDecodeCharFaster(ch int) int {
	const LO1 = int('A') - 1
	const HI1 = int('Z') + 1
	const LO2 = int('a') - 1
	const HI2 = int('z') + 1
	const LO3 = int('0') - 1
	const HI3 = int('9') + 1
	const CHAR4 = int('+')
	const CHAR5 = int('/')
	const COUNT1 = 0
	const COUNT2 = COUNT1 + HI1 - LO1 - 1
	const COUNT3 = COUNT2 + HI2 - LO2 - 1
	const COUNT4 = COUNT3 + HI3 - LO3 - 1
	const COUNT5 = COUNT4 + 1

	var borrow uint

	value := -1

	//    if (  LO  < ch  && ch < HI        )   { value += ch - LO + COUNT }
	value += (((LO1 - ch) & (ch - HI1)) >> 8) & (ch - LO1 + COUNT1)
	value += (((LO2 - ch) & (ch - HI2)) >> 8) & (ch - LO2 + COUNT2)
	value += (((LO3 - ch) & (ch - HI3)) >> 8) & (ch - LO3 + COUNT3)

	// if (                   ch == CHAR      ) { value +=  COUNT + 1 }
	_, borrow = bits.Sub(uint(ch ^ CHAR4), 1, 0); value += (COUNT4 + 1) & -int(borrow)
	_, borrow = bits.Sub(uint(ch ^ CHAR5), 1, 0); value += (COUNT5 + 1) & -int(borrow)

	return value
}

func base64StandardEncodeChar(src int) byte {
	const LO1 = int('A')
	const HI1 = int('Z')
	const LO2 = int('a')
	const HI2 = int('z')
	const LO3 = int('0')
	const HI3 = int('9')
	const LO4 = int('+')
	const HI4 = int('+')
	const LO5 = int('/')

	src += LO1

	//  if ( HI  < src      )   { src += LO_NEXT - HI - 1 }
	src += ((HI1 - src) >> 8) & (LO2 - HI1 - 1)
	src += ((HI2 - src) >> 8) & (LO3 - HI2 - 1)
	src += ((HI3 - src) >> 8) & (LO4 - HI3 - 1)
	src += ((HI4 - src) >> 8) & (LO5 - HI4 - 1)

	return byte(src)
}

This all assumes ASCII. If Go can compile to IBM mainframes, then strings are likely in EBCDIC, which will break things.
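The case-folding modification mentioned above can be sketched for a case-insensitive hex decoder (my own illustration of the ch |= 'A' ^ 'a' trick, assuming ASCII; hexDecodeCharCI is an invented name, not code from any patch in this thread):

```go
package main

import "fmt"

// hexDecodeCharCI decodes one hex character case-insensitively, branchless,
// returning its value or -1 if invalid. ASCII only.
func hexDecodeCharCI(ch int) int {
	value := -1
	// Check the digit range against the ORIGINAL byte, before case-folding:
	// folding first would alias bytes like 0x10 onto '0' (0x10 | 0x20 == 0x30).
	lo1, hi1 := int('0')-1, int('9')+1
	value += (((lo1 - ch) & (ch - hi1)) >> 8) & (ch - lo1)
	// Fold to lowercase only after the non-letter range(s), as noted above.
	ch |= 'A' ^ 'a' // sets bit 0x20, mapping 'A'..'F' onto 'a'..'f'
	lo2, hi2 := int('a')-1, int('f')+1
	value += (((lo2 - ch) & (ch - hi2)) >> 8) & (ch - lo2 + 10)
	return value
}

func main() {
	fmt.Println(hexDecodeCharCI('F'), hexDecodeCharCI('f'),
		hexDecodeCharCI('7'), hexDecodeCharCI('g')) // 15 15 7 -1
}
```

The ordering constraint (digits before the fold) is exactly why the comment above says to apply the fold "after any non-letter ranges".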

doggedOwl commented May 30, 2025

however do we want new functions ? It would be simpler for everyone if we made the current functions constant time.
If you could send your patch already it would be nice, I would like to benchmark it.

No, actually it would not be simpler for everyone. Base64 encoding is used extensively in various formats (JSON, XML, etc.) without any need to be cryptographically secure. It would be a big regression if all those cases were slowed down for no good reason.

@rolandshoemaker (Member)

Ignoring the actual implementation details for a moment, I think a reasonable proposal here is to make the following change to encoding/base{32,64}:

// WithConstantTime creates a new encoding identical to enc except that the
// Encode/Decode operations happen in constant time. This is useful when the
// inputs to these operations are sensitive, and measuring the timing of the
// operations could leak information about them.  Note that Encode/Decode
// operations will still leak the length of the inputs.
func (enc Encoding) WithConstantTime() *Encoding

and the following changes to the existing encoding/pem functions:

// ...
// Encode uses constant-time base64 encoding, and will only leak the length of
// b.Bytes. To restore the old variable-time behavior, use the
// GODEBUG=vartimepem=0 environment variable.
func Encode(out io.Writer, b *Block) error

// ...
// EncodeToMemory uses constant-time base64 encoding, and will only leak the
// length of b.Bytes. To restore the old variable-time behavior, use the
// GODEBUG=vartimepem=0 environment variable.
func EncodeToMemory(b *Block) []byte

// ...
// Decode uses constant-time base64 decoding, and will only leak the length of
// p.Bytes. To restore the old variable-time behavior, use the
// GODEBUG=vartimepem=0 environment variable.
func Decode(data []byte) (p *Block, rest []byte)

and add these new functions to encoding/pem:

// VariableTimeEncode provides the same functionality as Encode, but uses
// variable-time base64 encoding. VariableTimeEncode should only be used when
// encoding non-sensitive values, or when the timing of VariableTimeEncode is
// not exposed.
func VariableTimeEncode(out io.Writer, b *Block) error

// VariableTimeEncodeToMemory provides the same functionality as EncodeToMemory,
// but uses variable-time base64 encoding. VariableTimeEncodeToMemory should
// only be used when encoding non-sensitive values, or when the timing of
// VariableTimeEncodeToMemory is not exposed.
func VariableTimeEncodeToMemory(b *Block) []byte

// VariableTimeDecode provides the same functionality as Decode, but uses
// variable-time base64 decoding. VariableTimeDecode should only be used when
// decoding non-sensitive values, or when the timing of VariableTimeDecode is
// not exposed.
func VariableTimeDecode(data []byte) (p *Block, rest []byte)

Another option would be to add a new field to Block that allows setting variable-time behavior for encoding, but we'd still need a VariableTimeDecode function. I think I like adding the new functions more, since it provides better symmetry in the API.

@tob-scott-a (Author)

@rolandshoemaker That does seem like a better user experience than the API changes in my hasty patch.

neild (Contributor) commented May 30, 2025

What's the use case for variable-time encoding in encoding/pem?

encoding/base{32,64} can't be constant-time by default (performance regression, most users don't need constant-time encoding). If encoding/pem.{Encode,Decode} become constant-time by default, then we're saying that most programs either need constant-time encoding or don't care about the performance penalty. That seems plausible to me. Are there programs that use PEM, don't need constant-time encoding, and care about the (hopefully small) performance cost of constant-time base64 encoding?

@tob-scott-a (Author)

Are there programs that use PEM, don't need constant-time encoding, and care about the (hopefully small) performance cost of constant-time base64 encoding?

Consider the use case of a Key and/or Certificate Transparency system that accepts PEM-encoded public keys (which are not secret) or certificates (which likewise contain only public information) in order to validate them before committing them to a Merkle tree.

In such a hypothetical system, slowing down the base64 decoding by default would add a performance bottleneck with no real upside for the system in question.

This is the sort of consideration I had in mind, and why I opened with not enabling it by default.

neild (Contributor) commented May 30, 2025

That's an example of a system that doesn't care about constant-time encoding. Is PEM encoding/decoding going to be in the critical performance path in a system like that, though? How much of a cost are we talking about here?

If we're going to double the API surface of the encoding/pem package and require all future users to spend time deciding whether they need constant-time encoding or not, then I think we should quantify what the actual performance difference is.
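As a rough starting point for putting numbers on this (not a definitive measurement), a throwaway harness along these lines could compare the two per-byte strategies. Here decodeTable and decodeBranchless are simplified stand-ins, after Sc00bz's sketch above, not the stdlib or CL implementations:

```go
package main

import (
	"fmt"
	"testing"
)

const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

var table [256]int8

func init() {
	for i := range table {
		table[i] = -1
	}
	for i := 0; i < len(alphabet); i++ {
		table[alphabet[i]] = int8(i)
	}
}

// decodeTable is the variable-time strategy: one secret-indexed load.
func decodeTable(ch int) int { return int(table[ch]) }

// decodeBranchless is the constant-time strategy: a fixed sequence of
// arithmetic range checks for the standard base64 alphabet.
func decodeBranchless(ch int) int {
	value := -1
	value += (((64 - ch) & (ch - 91)) >> 8) & (ch - 64)       // 'A'..'Z' -> 0..25
	value += (((96 - ch) & (ch - 123)) >> 8) & (ch - 96 + 26) // 'a'..'z' -> 26..51
	value += (((47 - ch) & (ch - 58)) >> 8) & (ch - 47 + 52)  // '0'..'9' -> 52..61
	value += (((42 - ch) & (ch - 44)) >> 8) & 63              // '+' -> 62
	value += (((46 - ch) & (ch - 48)) >> 8) & 64              // '/' -> 63
	return value
}

func main() {
	input := []byte(alphabet)
	bench := func(decode func(int) int) testing.BenchmarkResult {
		return testing.Benchmark(func(b *testing.B) {
			sum := 0
			for i := 0; i < b.N; i++ {
				for _, c := range input {
					sum += decode(int(c))
				}
			}
			_ = sum
		})
	}
	fmt.Println("table:     ", bench(decodeTable))
	fmt.Println("branchless:", bench(decodeBranchless))
}
```

A real decision would of course need benchmarks of full Encode/Decode over realistic inputs (and of PEM round-trips), not just the per-byte mapping, but this shape of harness is enough to see whether the gap is percents or multiples.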

rolandshoemaker (Member) commented May 30, 2025

I think we need benchmarks to make reasonable decisions here about the trade-offs, but I will note that encoding/pem currently has only three functions. Doubling the API surface is perhaps not ideal, but concretely that comes down to adding three new functions, which isn't particularly horrific.

The point about requiring users to decide is valid, but I think the combination of naming and reasonable documentation makes it less scary. Additionally, the fact that the existing Encode/Decode will become "safe" means it's unlikely people will unintentionally begin using the "unsafe" permutations unless they are explicitly looking for them for performance reasons.

Of course, all of this is moot if the benchmarks show that the performance degradation is relatively low, which is why we need numbers before we can make a decision.
