Migrating token formats without downtime
Since Fernet tokens were introduced in Kilo, we have regularly been asked whether upstream will provide a way to migrate token formats without downtime. Operators see the benefits of Fernet tokens, especially their non-persistent nature, but it's hard to migrate gracefully if you have strict SLAs. The problem lies with tokens issued right before you make the switch in your keystone configuration file. If a user gets a fresh token from keystone and starts doing something in another service, that token will eventually be rejected because the new token provider doesn't recognize the old format. The goal of this post is to share code that will help you move your deployment to the Fernet token format gracefully.
An advantage we have with the current architecture is that the token provider bits are pluggable. This means we can write a couple of providers that inherit the existing token provider interfaces and wrap them with the functionality that we need for a graceful migration. But before we get into the code, let's lay out some assumptions about our theoretical deployment and describe how we want the migration to work.
Let's assume our deployment is supported by three keystone nodes all issuing tokens in the UUID format. The tokens are kept in SQL and replicated. Let's also say all tokens are valid for one hour. If we want to move to the Fernet format gracefully, our first step is to get these nodes to understand the Fernet format; otherwise, they'll choke whenever a user attempts to pass a Fernet token to them for validation.
So, our first "hybrid" provider is going to issue and validate UUID tokens, but also understand what Fernet tokens are and attempt to validate them. Once all the keystone API nodes understand the Fernet format, we can do the inverse of the first hybrid provider. The second hybrid provider will issue and validate Fernet tokens by default, but it will know what UUID tokens are and how to validate them. Each hybrid provider complements the other during the migration. To recap, our configuration changes throughout the migration will appear in the following order:
1. Switch each keystone node to use the `hybrid_uuid` token provider, which issues UUID tokens but can validate Fernet tokens
2. Switch each keystone node to use the `hybrid_fernet` token provider, which issues Fernet tokens but can validate UUID tokens
3. After one hour, switch each keystone node to use the `fernet` token provider
During steps 1 and 2, each keystone node should validate both UUID and Fernet tokens, which allows for a smooth migration. We hold the deployment in step 2 for the maximum token lifetime, which is one hour in this example. After that hour, any remaining UUID tokens in the deployment will have expired and be invalid. This means we're safe to make the final switch, which ditches both hybrid providers for the upstream `fernet` provider.
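One way to convince yourself that this ordering is safe is to model it. The sketch below is a toy model and not keystone code: the `PROVIDERS` table and the `rollout_is_safe` helper are names invented here for illustration, and the model simply assumes that during a node-by-node switch some nodes still run the old provider while others already run the new one.

```python
# Toy model of the migration order; nothing here talks to keystone.
# For each provider: the format it issues and the formats it can validate.
PROVIDERS = {
    "uuid": {"issues": "uuid", "accepts": {"uuid"}},
    "hybrid_uuid": {"issues": "uuid", "accepts": {"uuid", "fernet"}},
    "hybrid_fernet": {"issues": "fernet", "accepts": {"uuid", "fernet"}},
    "fernet": {"issues": "fernet", "accepts": {"fernet"}},
}


def rollout_is_safe(old, new, live_formats=frozenset()):
    """Return True if switching nodes from `old` to `new` rejects no live token.

    While the switch is rolling out, a token may be issued by a node running
    either provider and validated by a node running the other one. So every
    format issued by `old` or `new`, plus any unexpired formats passed in
    `live_formats`, must be accepted by both providers.
    """
    in_flight = {PROVIDERS[old]["issues"], PROVIDERS[new]["issues"]}
    in_flight |= set(live_formats)
    return all(in_flight <= PROVIDERS[name]["accepts"] for name in (old, new))


if __name__ == "__main__":
    print(rollout_is_safe("uuid", "fernet"))                    # False
    print(rollout_is_safe("uuid", "hybrid_uuid"))               # True
    print(rollout_is_safe("hybrid_uuid", "hybrid_fernet"))      # True
    print(rollout_is_safe("hybrid_fernet", "fernet", {"uuid"}))  # False
    print(rollout_is_safe("hybrid_fernet", "fernet"))           # True
```

The fourth call models making the final switch too early, while unexpired UUID tokens are still in circulation. That is the case the one-hour hold in step 2 exists to avoid.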
How about that?! You've gracefully swapped out a fundamental piece of OpenStack's most used API without disrupting end users. The following is how I implemented the hybrid UUID provider:
And here is how I implemented the hybrid Fernet provider:
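To make the wrapping idea concrete without depending on keystone itself, here is a toy sketch of the dispatch logic. It is not the providers referenced above: `ToyUUIDProvider`, `ToyFernetProvider`, and `HybridProvider` are hypothetical stand-ins, the toy "Fernet" token is not encrypted at all, and the format check assumes a UUID token is exactly 32 lowercase hex characters.

```python
import re
import uuid

# Assumption: a UUID token looks like uuid4().hex, i.e. 32 lowercase hex chars.
_UUID_TOKEN = re.compile(r"[0-9a-f]{32}")


class ToyUUIDProvider:
    """Stand-in for a persistent provider: it must remember what it issued."""

    def __init__(self):
        self._store = {}  # plays the role of the replicated SQL backend

    def issue_token(self, user_id):
        token_id = uuid.uuid4().hex
        self._store[token_id] = user_id
        return token_id

    def validate_token(self, token_id):
        return self._store[token_id]  # KeyError for unknown tokens


class ToyFernetProvider:
    """Stand-in for a non-persistent provider: the token carries its payload."""

    PREFIX = "gAAAAA"  # toy marker only; no real encryption happens here

    def issue_token(self, user_id):
        return self.PREFIX + user_id

    def validate_token(self, token_id):
        if not token_id.startswith(self.PREFIX):
            raise KeyError(token_id)
        return token_id[len(self.PREFIX):]


class HybridProvider:
    """Issue with one provider, validate with whichever matches the token."""

    def __init__(self, issuer, uuid_provider, fernet_provider):
        self._issuer = issuer
        self._uuid = uuid_provider
        self._fernet = fernet_provider

    def issue_token(self, user_id):
        return self._issuer.issue_token(user_id)

    def validate_token(self, token_id):
        # Dispatch on the token's shape rather than on configuration.
        if _UUID_TOKEN.fullmatch(token_id):
            return self._uuid.validate_token(token_id)
        return self._fernet.validate_token(token_id)


# Both hybrids share one UUID store, mimicking the replicated token table.
uuid_provider, fernet_provider = ToyUUIDProvider(), ToyFernetProvider()
hybrid_uuid = HybridProvider(uuid_provider, uuid_provider, fernet_provider)
hybrid_fernet = HybridProvider(fernet_provider, uuid_provider, fernet_provider)
```

In this toy setup, a token issued by `hybrid_uuid` validates on `hybrid_fernet` and vice versa, which is the property the two-step switch relies on.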
Getting this to work was a trial-and-error process. The upstream keystone token provider has a property called `needs_persistence` which returns a boolean to the token manager. If True, the token manager knows that the token provider in use requires some sort of storage mechanism and that it should fetch references from it. If False, no storage is required, so talking to a persistence manager is useless.
The trouble I encountered is that the hybrid providers needed a way to determine persistence based upon the token being validated. If the provider is dealing with a UUID token, it should return True. It should return False if it is dealing with a Fernet token. The upstream keystone implementation doesn't have the plumbing required to expose that decision to the drivers. I made those changes and pushed them to a fork of keystone on my GitHub account in case you want to see how I worked around that specific issue. Otherwise, the only thing I had to do was create entry points for each hybrid provider that I wanted to use, which are also included in that change.
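The shape of that per-token decision can be sketched in a few lines. This is not the change from the fork, only a guess at the general idea: `needs_persistence` is written here as a plain function that takes the token, `ToyTokenManager` is a hypothetical stand-in for the token manager, and the check again assumes UUID tokens are 32 lowercase hex characters.

```python
import re

# Assumption: a UUID token looks like uuid4().hex, i.e. 32 lowercase hex chars.
_UUID_TOKEN = re.compile(r"[0-9a-f]{32}")


def needs_persistence(token_id):
    """Per-token version of the flag: True for UUID tokens, False otherwise."""
    return bool(_UUID_TOKEN.fullmatch(token_id))


class ToyTokenManager:
    """Hypothetical manager that only touches storage when the token needs it."""

    def __init__(self, store):
        self.store = store  # dict standing in for the persistence backend
        self.lookups = 0    # counts how often storage was consulted

    def validate(self, token_id):
        if needs_persistence(token_id):
            self.lookups += 1
            return self.store[token_id]
        # A non-persistent token would be decoded here; this is a placeholder.
        return {"decoded_from": token_id}
```

With a UUID-shaped token the manager consults its store; with anything else it skips the lookup entirely, which matches the True/False behavior described above.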
In summary, it is possible to change the token format of your deployment without impacting your customers. It just takes a couple of custom providers and some careful testing. As much as I hate rolling my own providers, I think doing so has a special place in this type of application. It's relatively easy to do if you extend the existing code and wrap it with the functionality you need. The best part is that the custom providers are only in your deployment for the duration of the migration. After that, you're back to using 100% upstream, grass-fed code. But there's nothing stopping you from printing the hybrid token provider source and sticking it to the refrigerator in the break room!