OpenStack Summit Vancouver Recap
Three years after planning the Liberty release, the OpenStack Summit returned to Vancouver. A lot in OpenStack has changed since then, but you'd still catch people drifting away from discussions only to be captivated by the views of Grouse Mountain and Stanley Park. When I wasn't distracted by scenery, I was taking notes, listening, and participating in sessions. The following is a concentrated version of my week in Vancouver.
We were fortunate to give the project update on Monday where Harry Rybacki and I covered what was accomplished in Queens, the work we're doing in Rocky, and some things we're looking forward to in Stein. Not only was it a great way to start the week, a Monday presentation gave us the opportunity to plug other keystone-related talks. The update was recorded and the slides are available on SlideShare.
The team proposed a handful of sessions, mostly related to cross-project efforts and collecting feedback. Other sessions focused on edge computing use cases and multi-region support.
It was roughly three years ago when we tried to introduce a basic set of default roles across OpenStack. Unfortunately, it was tough to get buy-in from developers and operators without a migration path that didn't require operators to track commits, read code, or hope a release note was written.
We spent the Queens release making enhancements to the oslo.policy library, which Ben shared in his project update. These changes make it easier for developers to evolve their default policies in ways that make sense for their APIs. Maybe we're in a better place to discuss those changes now that we have proper tooling to make graceful changes and a construct designed to protect system-level APIs?
We spent this session introducing two new default roles, which will be available after keystone is bootstrapped or installed. The keystone team also expressed interest in proposing a community goal for the Stein or T release to incorporate these three default roles across OpenStack services. By the end of the session, I had collected two important pieces of feedback.
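As a rough illustration of how the proposed defaults are meant to relate to each other, assuming the commonly described hierarchy where admin implies member and member implies reader, the authorization check reduces to something like the following sketch. This is illustrative only, not keystone or oslo.policy code:

```python
# Illustrative sketch (not keystone's implementation): the three default
# roles form a hierarchy where admin implies member, and member implies
# reader, so a role satisfies any role it implies.
ROLE_IMPLIES = {
    "admin": {"admin", "member", "reader"},
    "member": {"member", "reader"},
    "reader": {"reader"},
}


def authorize(user_role: str, required_role: str) -> bool:
    """Return True if user_role satisfies required_role."""
    return required_role in ROLE_IMPLIES.get(user_role, set())
```

Under this model, an admin can call read-only APIs without needing an explicit reader assignment, which is what makes the defaults usable out of the box.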
The first is that there wasn't opposition to proposing a community goal, especially since a couple of projects were interested in being early adopters. Second, there were a handful of operators in the room with very specific policy requirements, meaning they wanted even more customization beyond the two new defaults. I think this is a good thing, and some of the work services have to do to incorporate those defaults should make policy management clearer. But eventually we should consider the next step toward addressing use cases that require extreme customization.
Doug Hellmann mentioned this session during the keynote Monday morning, which helped drive attendance - thanks, Doug! Melanie and Matt also mentioned the work we've been doing with the nova team during their project update to improve their defaults, too. It was great to see other projects socialize the work we've been doing for the last couple releases.
In conclusion, the session was well attended and I think we've overcome some of the challenges that stumped us three years ago. Progress feels good.
The forum session and discussion notes can be found on this etherpad.
Edge, Multi-Region, & Generalized Federation
In preparation for the summit and forum, I expected to spend a lot of time discussing major themes like policy and unified limits. What I wasn't expecting was the amount of productive conversation we had around edge deployments and what that meant for federation.
Edge deployments are a challenging use case for keystone. They usually require tens or hundreds of smaller regions deployed over a large area. In order to mesh a large number of regions, keystone needs to be deployed in a way that makes it seem as though everything operates as a single deployment. This can be done in a couple of ways.
The first, and probably the more common route, is to deploy keystone globally and replicate data between regions. In theory, this makes sense, but we've heard that keystone struggles after a certain number of regions have been deployed. We expect this is due to database replication issues, especially if keystone data is writable in every region. Technical issues aside, there are growing concerns about data replication and data regulations (think GDPR), where user data might not be able to cross a geographic boundary.
A second option would be to treat each region as an independent cloud and tie them together using federated identity. If users are managed by something like ADFS, then each region would act as an independent service provider. If users live within keystone's MySQL database, or an LDAP deployment, then something like keystone-to-keystone federation might be useful. Ideally, this would mean user information is kept in a single place and some sort of federated document is used to prove a user's identity in other regions.
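To make the federated option more concrete, keystone's federation support uses mapping rules to translate attributes asserted by the identity provider into local users and group memberships. The fragment below follows keystone's mapping-rule format; the attribute name and group ID are placeholders, not values from any real deployment:

```json
{
    "rules": [
        {
            "local": [
                {"user": {"name": "{0}"}},
                {"group": {"id": "FEDERATED_GROUP_ID"}}
            ],
            "remote": [
                {"type": "REMOTE_USER"}
            ]
        }
    ]
}
```

In an edge scenario, each region acting as a service provider would carry a mapping like this, so the user's identity lives in one place while role assignments in each region flow from the mapped group.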
Unfortunately, I'm not aware of anyone outside of CERN and the MOC dealing with federation at this level. Regardless, there were a number of us in various sessions who thought federation might have an important role to play for edge deployments. It might be the only option for areas subject to strict data regulation and laws. Next steps include writing down and prototyping typical edge deployments using federated identity, which has the opportunity to flush out usability and performance issues in keystone's federation implementation. These findings can also tie into some of the work Craig Lee and crew are doing to document and generalize federation.
All-in-all, I think a couple focused use cases leveraging existing federated identity work in keystone can help improve the usability and hopefully performance for deployments interested in federated identity.
You can find the notes from the federated edge session in this etherpad.
Operator & User Feedback
As usual, we held a session dedicated to collecting operator and user feedback, all of which was captured in this etherpad.
I touched on this in the previous section, but the first thing we discussed was performance issues with multiple regions and replication. There have been several reports of performance issues when dealing with multiple regions and having keystone replicate data between all of them. To address this, we can test for performance degradation while scaling out multiple regions. A secondary option would be to look into specific federated deployments that allow each region to be its own deployment but hook them together using federation.
Another operator raised concerns about the time it takes to run specific keystone database migrations. This was specific to a migration that looped through every user in the deployment and made changes to each reference, serially. For clouds with many users, this could result in a migration that runs for a long time, lengthening downtime. The operator who raised the concern did have a fix that improved the migration, which we're going to try to land upstream and backport. In the future, we'll have to be aware of changes like this and the impact they have depending on the size of the data set. Unfortunately, we don't have an automated way to flag changes like this, but adding one would be an improvement to our test infrastructure.
Finally, John Garbutt raised some longstanding concerns about using federation and other methods of delegation like trusts or application credentials. The historical context is captured in a bug report, but it ultimately boils down to keystone not wanting to duplicate authoritative information from the source of identity. What happens when a federated user creates a trust for a role assignment they have based on a SAML attribute, which is later removed? The trust continues to work even though the source of the user's identity has changed to reflect a different, and possibly less permissive, authoritative scope. The discussion concluded with the keystone team meeting in the middle, where we will explore "refreshable" application credentials or trusts. John and some others interested in this are going to write up the use cases as a specification, which we will iterate on and possibly implement as a future enhancement to application credentials.
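One way to picture the "refreshable" idea is that a delegated credential would be re-validated against the user's current role assignments at the identity source, rather than trusting the snapshot taken when it was created. The sketch below is hypothetical, under assumptions about what a refreshable check might look like; it is not keystone's API and the field names are invented for illustration:

```python
import datetime

# Hypothetical sketch of "refreshable" application-credential validation:
# instead of honoring the roles captured at creation time, re-check them
# against the user's current assignments at the identity source.


def validate_app_credential(cred: dict, current_roles: set) -> bool:
    """Reject the credential if it has expired or if any role it was
    created with has since been revoked at the identity source."""
    expires_at = cred.get("expires_at")
    if expires_at is not None and expires_at < datetime.datetime.utcnow():
        return False
    return set(cred["roles"]).issubset(current_roles)
```

This captures the failure mode John described: when a SAML attribute granting a role is removed, the next validation against `current_roles` fails instead of the delegation silently continuing to work.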
Unified Limits & Hierarchical Enforcement Models
The last session we hosted for the week was focused on unified limits and walking through additional enforcement models. We started the discussion with a Queens recap and allowed time for Q&A. The rest of our time was spent walking through the example behaviors of the strict two-level enforcement model we derived from use cases presented by CERN. This allowed us to analyze how other projects can consume these changes to calculate usage without the burden of understanding project trees. It also led to improvements within the oslo.limit library, and resulted in design decisions that make the developer experience more pythonic.
Toward the end of the session, some operators raised concerns about having trees with hundreds of children, essentially a flat, wide project tree. This can be problematic because calculating usage would get really expensive and result in poor performance. Conversely, not calculating the usage means there is an opportunity for users to game the system and exceed their allotted limit. Adam and Morgan found a few alternatives that make usage calculation less expensive and wrote them down in this blog post. The team is going to work through these ideas and incorporate them into the existing specification for Rocky before we merge, if possible.
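To see why flat, wide trees hurt, here is a minimal sketch of strict two-level enforcement (illustrative only, not oslo.limit's API): a claim against the parent's limit has to account for the usage of every child, so the cost of each check grows with the number of children.

```python
# Illustrative sketch of strict two-level enforcement (not oslo.limit):
# the parent's usage plus the sum of all children's usage must stay
# within the parent's limit, so every claim walks the whole child set.


def can_claim(parent_limit: int, parent_usage: int,
              child_usage: dict, requested: int) -> bool:
    """Return True if the requested amount fits under the parent limit."""
    total = parent_usage + sum(child_usage.values())
    return total + requested <= parent_limit
```

With hundreds of children, `sum(child_usage.values())` stands in for hundreds of usage calls to the consuming service, which is exactly the expense the operators were worried about.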
All-in-all, I thought the session went really well, considering how contentious the topic has been in the past. We also had folks from CERN sitting in the room, who were able to sanity check possible changes to the model during the discussion.
All notes and todo's were captured in this etherpad.
There were a host of other identity-related presentations and sessions. These were a few that I attended and found really valuable.
Keystone Federated Swift
Matt Treinish and Matthew Oliver walked through various deployment options using federated keystone and swift in a lightning talk.