Fernet tokens and key distribution
In a previous post, I attempted to shed light on the key rotation method used in Keystone for Fernet tokens, shortly after the implementation landed. This post focuses on how the key rotation mechanism benefits deployments with multiple Keystone servers.
In a multi-Keystone OpenStack deployment, we must address replication. When using a persistent token format like UUID, the user should understand there is some latency between the token creation on one Keystone node and the token’s replication to all other Keystone servers in the deployment. This means it is possible for a user to authenticate against one Keystone server and then, when validating that token on another Keystone server, get a 404 NotFound, because the token hasn’t been replicated yet. Granted, this whole topic depends on the backend as well as the replication technology, but part of this headache subsides with Fernet tokens.
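To make the race concrete, here is a toy sketch of the persistent-token problem, with two “Keystone nodes” modeled as plain dicts and replication that hasn’t caught up yet. The node names, the `authenticate`/`validate` helpers, and the status codes are illustrative only, not Keystone’s actual API:

```python
import uuid

# Two hypothetical Keystone nodes, each with its own token store.
node_a, node_b = {}, {}

def authenticate(node):
    # Issue a UUID token and persist it only on the node that handled
    # the request; replication to other nodes happens later.
    token = uuid.uuid4().hex
    node[token] = {"user": "demo"}
    return token

def validate(node, token):
    # A node that hasn't received the token yet answers 404 NotFound.
    return 200 if token in node else 404

token = authenticate(node_a)
print(validate(node_a, token))  # 200: the issuing node knows the token
print(validate(node_b, token))  # 404: replication hasn't delivered it yet
node_b.update(node_a)           # replication eventually catches up
print(validate(node_b, token))  # 200
```

The window between the second and third calls is exactly the latency users can hit in a real persistent-token deployment.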
When using Fernet tokens, it’s safe to assume that you need a key repository of some kind. In the current implementation of Fernet, the key with the highest index, or primary key, is used to encrypt tokens (this is detailed in the previous post). Every other key in the repository can be used to decrypt. In addition to the primary key, there is also the key with the lowest index, 0 in this case, which is considered the “staged” key. This means the next time keys are rotated, the staged key will be promoted to the primary key, allowing it to encrypt tokens. Having a staged key helps solve the problem where Keystone fails to validate a token because the key used to encrypt the token doesn’t exist on the validating server yet (the same 404 NotFound case above applies to Fernet tokens!). It gives us the ability to distribute all keys to each Keystone server before rotating the keys. Once we ensure the staged key exists on every Keystone endpoint, we can promote it to primary on any Keystone server, and tokens encrypted with that key can be validated by any other Keystone server sharing that staged key. As long as we verify that the staged key exists on each Keystone endpoint, we effectively remove the latency of “token replication”.
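The staged-key mechanics can be sketched with the `cryptography` library’s `Fernet` and `MultiFernet`. This is a minimal illustration, not Keystone’s actual code: the repository is a dict keyed by index (0 is the staged key, the highest index is the primary), and the `encrypt`, `decrypt`, and `rotate` helpers are assumptions made for the example:

```python
from cryptography.fernet import Fernet, MultiFernet

def encrypt(repo, payload):
    # Only the primary key (highest index) ever encrypts.
    return Fernet(repo[max(repo)]).encrypt(payload)

def decrypt(repo, token):
    # Any key in the repository may decrypt; MultiFernet tries each in turn.
    return MultiFernet([Fernet(k) for k in repo.values()]).decrypt(token)

def rotate(repo):
    # Promote the staged key (index 0) to primary (new highest index)
    # and stage a freshly generated key at index 0.
    rotated = {i: k for i, k in repo.items() if i != 0}
    rotated[max(rotated) + 1] = repo[0]
    rotated[0] = Fernet.generate_key()
    return rotated

# Two hypothetical Keystone nodes that share the same three-key repository.
node_a = {i: Fernet.generate_key() for i in range(3)}
node_b = dict(node_a)            # keys were distributed before any rotation

node_a = rotate(node_a)          # rotate on node A only
token = encrypt(node_a, b"payload")

# Node B hasn't rotated yet, but A's new primary was B's staged key,
# so node B can still validate the token.
assert decrypt(node_b, token) == b"payload"
```

The final assertion is the whole point of the staged key: because every node already held key 0 before the rotation, a token encrypted by the freshly promoted primary on one node validates everywhere.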
This doesn’t mean we’ve removed the replication problem altogether, but we have made it easier, for a couple of reasons. First, we no longer need to replicate each token, only the keys used to encrypt and decrypt tokens. The key repository should be much easier to replicate than thousands, possibly millions, of tokens. Second, this gives the deployer flexibility to replicate according to their deployment’s security needs (i.e. how often should I rotate my keys?), rather than every time a token is created. In a persistent token setup, replication has to be robust enough to handle high traffic volumes. These high traffic periods may or may not be something the deployer can predict, and once response times start creeping past acceptable thresholds, solving the problem is more reactive than proactive. In a Keystone deployment where the majority of calls are authentications and validations, Fernet tokens with smart key rotation and distribution lower the probability of replication being a factor in poor user experience.