OpenStack Summit Berlin Keystone Recap
Keystone Project Update
Keystone lucked out again and we were able to give the project update early in the week. Colleen and I shared what the team accomplished in Rocky, gave an idea of what to expect in Stein, and explained how that work helps us set up initiatives for Train. As usual, the session was recorded in case you missed it and I’ve published the slides.
Operator & User Feedback
The team participated in a session with operators and users to collect feedback on what we can improve. Most of the topics already had fixes merged or underway, which was reassuring. An operator raised a question about hierarchical multitenancy and how it can be inconsistent across services (e.g., I believe the example used was DNS names). We took time as a group to highlight how efforts like this span across services and that it ultimately boils down to assisting other OpenStack developers by helping them understand keystone-related concepts and applying them. Unified limits and improving default policies are subject to similar patterns.
Someone mentioned it would be nice to restrict access to projects based on IP address ranges. We've heard similar requests in the past, but developers in the room thought this would be a good reason to implement resource options for projects, which is something we already have for users. In addition to this work, we made a note to finally finish documenting what these options are and how to use them, since there isn't any documentation for it today. Colleen opened a placeholder bug to make sure we keep tabs on the request.
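To make the request concrete, here is a minimal sketch of what an IP-range check driven by a project resource option might look like. Nothing here exists in keystone today: the option name `allowed_cidrs` and the function `is_request_allowed` are assumptions made up for illustration, and the sketch treats a project without the option as unrestricted.

```python
import ipaddress

# Hypothetical option name; keystone does not define this option today.
ALLOWED_CIDRS_OPTION = "allowed_cidrs"


def is_request_allowed(project_options, remote_addr):
    """Return True if remote_addr is inside any CIDR allowed for the project.

    project_options is a plain dict standing in for a project's resource
    options. A project without the option is treated as unrestricted.
    """
    cidrs = project_options.get(ALLOWED_CIDRS_OPTION)
    if not cidrs:
        return True
    address = ipaddress.ip_address(remote_addr)
    # An IPv4 address is never "in" an IPv6 network (and vice versa), so
    # mixed-version lists are safe to check.
    return any(address in ipaddress.ip_network(cidr) for cidr in cidrs)


options = {ALLOWED_CIDRS_OPTION: ["10.0.0.0/8", "192.168.1.0/24"]}
print(is_request_allowed(options, "10.1.2.3"))
print(is_request_allowed(options, "172.16.0.1"))
```

Storing the ranges as an option on the project, rather than as a new column or API attribute, mirrors how user options work and keeps the project model itself unchanged.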
Full notes from the feedback session can be found in etherpad.
Identity Provider Proxy
Thursday we had a forum session dedicated to a new idea that came out of the PTG in Denver. The idea is to make keystone an identity provider proxy that knows how to handle identities from multiple providers and standardized protocols. Ultimately, this should make it easier for keystone to be a service provider without duplicating user identities by supporting more protocols.
Morgan (kmalloc) spent a good portion of time going through the high-level concepts and some of the implementation details required. Operators and developers in the room had the opportunity to question aspects of the approach or ask for further clarification. I do think we're going to need to find better tools for documenting the architecture we're shooting for and how it works. A few people in the room were eager to see things laid out visually, which we currently don't have. Despite being familiar with the concepts and existing implementation, I admit that diagrams would help describe the bigger picture. I can relate to folks who might not have the historical context and who are just jumping into these discussions.
I think we are socializing this at the right time, given the majority of the work to enable this won't start until the next release. The team is focusing on cleaning things up for Stein, making it easier to start the proxy work in Train. Full notes from the session are in the etherpad.
Edge Computing & Multi-Region Support
With edge computing being a hot topic recently, the edge working group hosted forum sessions to continue finding use cases and discussing reference architectures. The notes below are specific to the edge architecture of keystone, which is different from the generalized reference architectures documented by the working group, though there were interesting discussions on those architectures as well.
The problem for keystone and edge deployments is figuring out how to scale the deployment. Ultimately, we want keystone to be available globally, but that means deploying keystone in hundreds or thousands of small regions (e.g., a data center in a cell tower). Having a single global identity deployment means each region has to share replicated data, which raises concerns about the latency of reaching consensus for database writes and about split-brain issues across the database cluster. Treating each region as a separate deployment means duplicating user identities across many places, and keeping that data in sync can be problematic (e.g., password changes, multiple IDs for a single user). The edge computing working group has documented two proposed architectures attempting to solve the problem.
The first architecture is to optimize the layer that marshals data between keystone and the database. Ideally, the layer quickly and gracefully handles global write operations, allowing a single identity deployment to span the globe. According to the StarlingX developers, this is an architecture they are investigating, and they are working on implementing a database replication tool.
The second architecture is to lean on federated identity to keep each deployment isolated, but provide the facade that all regions are a single deployment. Each region has a keystone deployment set up as a federated service provider and configured to accept federated identities. A single identity provider holds all user identities and issues documents used by keystone service providers to validate users across regions. This approach is in line with how Oath is deploying OpenStack.
Oath developers have had to make some changes to keystone to get it to work correctly. The keystone team sat down with James Penick, and he walked us through the changes Oath made to keystone for them to deploy it this way. They needed to teach keystone how to federate users from their identity provider, which is Athenz. Like keystone, Athenz issues tokens when a user authenticates. In their deployment, a user provides keystone with an Athenz token, which contains information about their identity, their role assignments, projects, and domain. They implemented a pluggable authentication method for keystone that validates an Athenz token and then provisions those resources in keystone if they don't exist. The latter part is very similar to keystone's auto-provisioning feature, which populates resources based on SAML attributes.
To incorporate what Oath has done upstream, we could generalize keystone's auto-provisioning feature to accept arguments from multiple sources, not just SAML assertions, making auto-provisioning easier to use with other protocols. James also mentioned they are considering moving away from Athenz tokens and using X.509 certificates directly. This would mean we might be able to use the web server to validate the certificate used for the request, pull the user properties out of the certificate (possibly using a plugin), and pump them into the auto-provisioner. Keystone wouldn't need to understand a token format specific to Athenz, and we'd have better functionality through X.509 certificates.
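The shape of that generalization can be sketched in a few lines. This is not keystone's or Oath's implementation: `IdentityStore`, `attributes_from_certificate`, and `auto_provision` are hypothetical names, the store is an in-memory stand-in for keystone's backends, and the mapping from certificate subject fields (`CN`, `O`, `OU`) to user, domain, and project is an assumption picked only for illustration. The point is that the provisioner consumes a neutral attribute dictionary, so a SAML assertion, an Athenz token, or a certificate each only needs its own small extractor.

```python
from dataclasses import dataclass, field


@dataclass
class IdentityStore:
    """In-memory stand-in for keystone's resource and assignment backends."""
    domains: set = field(default_factory=set)
    projects: set = field(default_factory=set)     # (domain, project)
    assignments: set = field(default_factory=set)  # (user, domain, project, role)


def attributes_from_certificate(subject):
    """Hypothetical extractor: turn certificate subject fields into attributes.

    The CN/O/OU mapping and the default "member" role are assumptions.
    """
    return {
        "user": subject["CN"],
        "domain": subject["O"],
        "projects": [{"name": subject["OU"], "roles": ["member"]}],
    }


def auto_provision(store, attrs):
    """Create any missing domain, projects, and role assignments.

    Safe to call on every authentication: existing resources are left alone.
    """
    store.domains.add(attrs["domain"])
    for project in attrs["projects"]:
        store.projects.add((attrs["domain"], project["name"]))
        for role in project["roles"]:
            store.assignments.add(
                (attrs["user"], attrs["domain"], project["name"], role)
            )
    return store


store = IdentityStore()
attrs = attributes_from_certificate({"CN": "alice", "O": "example", "OU": "web"})
auto_provision(store, attrs)
print(sorted(store.assignments))
```

Supporting another protocol would then mean writing another extractor that returns the same dictionary shape, leaving `auto_provision` untouched.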
After that, we could identify gaps between the auto-provisioning code Oath has open-sourced and keystone's native implementation and try upstreaming those differences. Both of these changes should help us generalize parts of keystone federation to be more accepting of other identity protocols or at least more standardized formats.
Unified Limits
I gave a lightning talk on unified limits (slides), but the topic cropped up in several sessions throughout the week. The team spent a good portion of time socializing the interaction between keystone and each service, really just going over which code was responsible for various parts of the quota system (e.g., resource usage versus resource limits).
The interesting theme was how to handle more complicated use cases. For example, today the unified limits implementation in keystone allows users to set limits on resources within a service, optionally providing a project or region. What people started asking for was a way to limit resources by availability zones, especially if their deployment has multiple availability zones in a single region. Another example is using limits in keystone to limit the number of floating IPs associated with a specific neutron network. The tough part about this is that given the current model, keystone might suffer from attribute bloat for things that are specific to services. Doing this would also couple attributes of an API in keystone to implementation specifics in other services, whereas today limit properties are general purpose and applicable to nearly every service.
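A small model helps show where the bloat comes from. The sketch below is not keystone's code or API: `Limit` and `effective_limit` are hypothetical names, and the precedence rule (a project-specific limit beats a region-specific one, which beats a service-wide default) is an assumption chosen for the example. It only captures the attributes mentioned above: a service, a resource, and an optional project or region.

```python
from typing import NamedTuple, Optional


class Limit(NamedTuple):
    """A limit on one resource of one service, optionally narrowed."""
    service: str
    resource: str
    value: int
    region: Optional[str] = None
    project: Optional[str] = None


def effective_limit(limits, service, resource, project, region=None):
    """Return the value of the most specific matching limit, or None.

    Assumed precedence: project-specific > region-specific > default.
    """
    best = None
    best_score = -1
    for limit in limits:
        if limit.service != service or limit.resource != resource:
            continue
        if limit.project not in (None, project):
            continue
        if limit.region not in (None, region):
            continue
        score = 2 * (limit.project is not None) + (limit.region is not None)
        if score > best_score:
            best, best_score = limit, score
    return best.value if best else None


limits = [
    Limit("nova", "cores", 20),
    Limit("nova", "cores", 10, region="RegionOne"),
    Limit("nova", "cores", 40, project="project-a"),
]
print(effective_limit(limits, "nova", "cores", "project-a"))
print(effective_limit(limits, "nova", "cores", "project-b", "RegionOne"))
```

Supporting availability zones or per-network floating IP limits in this model would mean adding fields such as an availability zone or a network ID to `Limit` and another term to the matching logic, which is exactly the service-specific coupling described above.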
I have an action item to summarize these asks from the summit and start a thread on the openstack-discuss mailing list. We also received plenty of positive feedback on our plan to add domain support for limits.
Resource Ownership & Clean-Up
Various folks from the public cloud working group led sessions trying to move the needle on two long-standing issues across OpenStack services. One is being able to transfer resources between projects (etherpad), and the other is cleaning up resources after deleting a project (etherpad).
Resource transfer between projects is tricky. Take servers for example. They can be attached to volumes, networks, floating IP addresses, and other resources controlled by services that are not nova. Each service needs to update the project ID for things attached to or used by that instance. What should process the resource dependencies for the server? What happens in the failure scenario where something isn't updated correctly? Should this be something that lives in the client, or do we expect services to handle the dependency graph? The discussion didn't result in an answer, but it did produce action items to start documenting solutions to the problem along with the advantages and disadvantages of each. At least this way we start covering ground and spend less time rehashing the same details every time the topic comes up.
The resource deletion portion of the topic was a little more fruitful, in my opinion. The resource dependency bit was still an aspect of the discussion, but there were two proposals to solve the issue. The first was to implement the dependency between resources as a plugin to the os-purge tool. This only works if the project is the last thing deleted. Otherwise, resources are going to be orphaned and harder to clean up (e.g., you can't get a token scoped to a non-existent project to clean up an orphaned instance) unless services expose forcible API options that bypass checking token scope against the tenancy of the resource. The second suggestion was to implement a system-scoped API in each service that deletes all resources within a project. Only system administrators would be allowed to call this API by default. Since it would be a system-scoped operation, it wouldn't require a token scoped to the deleted project, so there isn't a requirement to delete all the resources before deleting the project. An operator could delete the project, then iterate the service catalog to clean up all resources matching the deleted project ID. Next steps for the community are to write up plugin points for os-purge and an example REST API for each service to implement for cleaning up resources. This topic got traction in the forum session for grooming community goals for Train. Full notes can be found in etherpad.
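The second proposal can be sketched as follows. No service exposes such an API today, so everything here is hypothetical: `FakeService` stands in for a real service, `purge_project` stands in for the proposed system-scoped call, and the service catalog is modeled as a plain dictionary rather than real endpoints.

```python
class FakeService:
    """Stand-in for a service holding resources keyed by owning project ID."""

    def __init__(self, resources):
        self.resources = dict(resources)  # resource_id -> project_id

    def purge_project(self, project_id):
        """Hypothetical system-scoped API: delete all resources of a project."""
        doomed = [rid for rid, owner in self.resources.items()
                  if owner == project_id]
        for rid in doomed:
            del self.resources[rid]
        return len(doomed)


def purge_deleted_project(catalog, project_id, token_scope):
    """Walk the catalog and ask each service to purge one project's resources.

    Only a system-scoped caller may do this, so the project itself does not
    need to exist anymore. Returns the number of resources removed per service.
    """
    if token_scope != "system":
        raise PermissionError("purging a project requires a system-scoped token")
    return {name: service.purge_project(project_id)
            for name, service in catalog.items()}


catalog = {
    "compute": FakeService({"server-1": "project-a", "server-2": "project-b"}),
    "volume": FakeService({"volume-1": "project-a"}),
}
print(purge_deleted_project(catalog, "project-a", "system"))
```

Because authorization depends on system scope rather than on a token scoped to the project, the order of operations stops mattering: the project can be deleted first and the leftovers swept up afterwards.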
OpenStack Policy 101
Ozz and Adam gave a nice introductory talk on how policy works in OpenStack.
Dynamic Policy for OpenStack using Open Policy Agent
Ozz gave another talk on how to integrate Open Policy Agent into OpenStack deployments to manage policies.
Pushing Keystone over the Edge
Kristi, Adam, and Morgan gave a presentation on how keystone is working to fill edge-specific use cases.
The following certainly isn't inclusive of all details pertaining to the Technical Committee. I wanted to briefly summarize some of the high-level points from the week.
Board Meeting & Joint Leadership
The board and TC meet prior to the summit in a joint leadership meeting, which is open to the public. We spent the morning observing as the board reviewed and finalized a bunch of changes to the Foundation bylaws, updating them to reflect the current state of the Foundation and granting a bit more flexibility (due to relaxed wording) for the Foundation to move forward.
The TC had the opportunity to share details about Rocky, what's being worked on in Stein, and some interesting trends across the community including review mentorship as well as the new technical vision statement. We received positive feedback on almost all fronts, especially with respect to the technical vision statement. As far as I know, this was the first time the TC has approached a joint leadership meeting in this fashion and it will likely be how we structure our content from a technical perspective in future meetings. Doug put up a nice summary on the mailing list.
Technical Vision Retrospective
There was plenty of dialog related to OpenStack's technical vision statement leading up to the summit. By the time we were in Berlin, there had been several iterations on the proposed technical vision. For the most part, there weren't many adjustments to make that couldn't be done in a follow-on patch, so nothing major. The technical vision was approved and is available for reading. The hope is that it will remain a living document that we can use to evaluate projects and provide direction to OpenStack as a whole. Full notes can be found in etherpad.
Train Community Goals
The last session of the week was dedicated to discussing community goal candidates for the Train release. We parsed a long list of possible goals and categorized each. We ran across a few that didn't seem like they would be great fits across the entire community, but we noted the teams that would be impacted and suggested a more focused effort between those teams instead. Others that would impact the majority of the community are going to be fleshed out in more detail, either by the person who proposed the goal or a goal champion. In my opinion, three goals stuck out as possible candidates for Train:
Cleaning up resources when deleting a project
Service side health checks
Moving legacy clients to python-openstackclient
The full list of proposals and notes are in etherpad.
Photo: I was lucky enough to have some time to look around Berlin before and after the Summit. The cover photo of this post is a mural on the Berlin Wall from the East Side Gallery.