Keystone Rocky PTG Summary
Keystone's theme for the Rocky PTG, hosted in Dublin, was identity integration. The Queens release delivered several long-running initiatives like system role assignments and scoping, application credentials, and unified limits. The team's main focus in Dublin was to get feedback from projects consuming these initiatives and socialize the benefits of doing so.
System Scope

One of the big efforts we delivered in Queens was the ability to grant users and groups role assignments on a new target called the system. The system represents the OpenStack APIs that manage the deployment's infrastructure. A couple of good examples are keystone endpoints or nova hypervisors.
Ideally, these are things that should only be managed by people operating the deployment. On Tuesday we had a session dedicated to socializing the state of this work and how projects can use it to separate project-level APIs, typically used by end users, from elevated system-level APIs. Nova specifically stepped up to the plate to discuss implementing scope_types in their policies. This would be a huge step forward in refactoring out hard-coded admin role checks and working towards operator-driven policy. If everything goes well, we should be able to seriously consider enforcing scope_types as a community goal for the S release.
Some folks asked if there would ever be the possibility of allowing system-scoped tokens to assume a project, kind of like an "on behalf of project" sort of thing. For example, operators could use system-scoped tokens to create instances for a specific project, which maintains backwards compatibility with how things work today. It would also make the policy check consistent across services, instead of each service having a different way to obtain a project ID from a system-scoped context (e.g. some deployments have patched nova to pull it out of user-supplied metadata if the role associated with the token is named admin). The result of this discussion was that if we ever support project operations on behalf of a system administrator, it should be configurable. Since this wouldn't really be an end-user thing, end-user interoperability isn't much of a concern. There are also a bunch of questions about audit trails and informing users of actions performed on their account without direct authorization. This needs more discussion, but ultimately we found people who think it would be useful, and they also recognize the security implications of it.
Default Roles

Now that oslo.policy supports deprecating policies, it's easier for projects to evolve their default policies so long as they are registered in code. This was the main driver behind one of the community goals last release. Since then, we've taken a shot at trying to define a basic set of default roles that OpenStack should provide out-of-the-box.
The original proposal only focused on introducing admin, writer, and reader roles, which are not project specific. Surprisingly, several developers from other projects wanted more granularity, including default roles like network_admin, network_writer, and network_reader. They actually wanted to write policies specific to their project, which is great because it gives operators more flexibility. The tough part is that it requires keystone to know which services are going to be deployed at bootstrap time (e.g. before the service catalog exists) in order to create those roles. An alternative would be to write default policy rules that consist of OR checks. For example, keystone's identity:create_endpoint policy should be accessible by role:admin or role:identity_admin. This can be expressed easily enough in oslo.policy's check-string syntax, and the beauty of it is that it doesn't require additional roles when keystone is bootstrapped.
To summarize, by default a new installation will have admin, writer, and reader roles. If an operator wants to grant an identity_admin role to a user, all they have to do is create the role in keystone and assign it. The default policy will just work with the three additional roles out-of-the-box. Granted, this does make assumptions about the defaults being used, but it's also backwards compatible for deployments that are currently maintaining custom policy. Additionally, operators with custom policy will have a chance to migrate to the new defaults and simplify their overrides if they choose to.
To see how we broke this work up for the release, check out the epic in the Rocky roadmap.
Application Credentials

Application credentials open up a bunch of new possibilities for folks writing applications that consume OpenStack APIs. This is especially true for LDAP-backed deployments or deployments with PCI requirements. This topic fell under the identity integration track so that we could socialize how application credentials work and talk about what we can do to improve them.
The most notable discussion around application credentials was the proposal of a white list resource. This would give users the ability to optionally set a white list of APIs on their application credential. We could then re-use a previous implementation in keystonemiddleware to filter requests based on the application white list in the token reference. The white list will always be subject to the authorization a user has on a project via role assignments, though. Just because a user specifies a white list in an application credential doesn't necessarily mean the application has the authorization to execute those APIs. This highlights some usability issues we can smooth out in the future. For now, it lets end users be super restrictive in the things an application credential can do on their behalf. There are also some gaps in client support, mainly in shade, python-openstackclient, and horizon. We documented these areas and targeted them for the Rocky release.
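The exact API shape was still being designed at the time, but the idea can be sketched as a request body along these lines. All of the field names below (including access_rules) are placeholders for illustration, not the final design:

```python
import json

# Hypothetical create request for a white-listed application credential.
body = {
    "application_credential": {
        "name": "monitoring-agent",
        "description": "Read-only credential for a monitoring app",
        # Even if an API is listed here, each call is still subject to
        # the role assignments the owning user has on the project.
        "access_rules": [
            {"service": "compute", "path": "/v2.1/servers", "method": "GET"},
        ],
    }
}
print(json.dumps(body, indent=2))
```

The key property is that the white list can only ever narrow what the credential may do; it can never grant more than the owner's role assignments allow.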
From there, the keystone team sat back for the rest of the session and listened to conversations from other projects looking to consume application credentials in some form or fashion. We're going to follow up with them around milestone 2 to see how things are going. Details about the work we're targeting for Rocky can be found in the epic.
Unified Limits

This was another monolithic topic that spanned multiple sessions, some of which were cross-project. Again, the main purpose was to socialize what was completed in Queens with services looking to implement unified quota enforcement.
Four significant things came out of these sessions. First, we found an email and blog post from Tim Bell documenting CERN's use cases for hierarchical quotas. Since this is some of the only hierarchical project feedback we have, we're going to try and work it into an enforcement model. Second, we realized we're missing a significant amount of information from deployments using hierarchical projects, which makes developing enforcement models hard. In short, we need more deployments to do what Tim Bell did and describe how they expect this system to behave. We have action items to scrub old hierarchical multitenancy etherpads for use cases and to ask the User Committee if they can help collect feedback for future enforcement models. Third, there was unanimous agreement that services should be able to rely on a library to enforce usage when given a quota and a resource type. This resulted in an action item to create a new oslo library for limits, which should make unified limits easier for services to consume. Lastly, we started to come up with a list of things in the unified limit API we might consider changing before making it stable. We have a soft deadline of milestone 3 for marking the API as stable and providing full support, which can be coordinated with a 1.0 release of oslo.limits.
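As a sketch of the kind of interface such a library might expose, consider the following. Every name here (check_limit, ProjectOverLimit) is hypothetical, since the oslo limits library didn't exist yet:

```python
# Hypothetical enforcement helper: a service hands over the limit it
# fetched from keystone, its current usage, and the requested amount.
class ProjectOverLimit(Exception):
    """Raised when a request would push usage over the limit."""

def check_limit(limit: int, current_usage: int, requested: int) -> None:
    """Fail if granting the request would exceed the given limit."""
    if current_usage + requested > limit:
        raise ProjectOverLimit(
            f"requested {requested}, but only {limit - current_usage} "
            f"of {limit} remain")

check_limit(limit=10, current_usage=8, requested=2)   # exactly at the limit: fine
try:
    check_limit(limit=10, current_usage=8, requested=3)
except ProjectOverLimit as exc:
    print(exc)  # → requested 3, but only 2 of 10 remain
```

Centralizing this check in a library means each service only has to supply a usage-counting callback rather than re-implementing enforcement logic.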
Unified limits should be an exciting area of development during the Rocky release. The epic can be found in the roadmap.
JSON Web Tokens
I touched on this topic after the Queens PTG, but we took some time to revisit it as a group in Dublin. The team agreed that it makes sense to pursue a JSON Web Token provider. We found owners and even started discussing various aspects of the implementation, like encryption versus signing of the token. One important bit about JWT is that it's used nearly everywhere, and our discussions wandered off-topic when taking that into account. Despite its growing support in other communities, we'll save the "OpenStack integration with JWT" topic for another day, or for when we have a clear use case and goal. Otherwise, it feels like we're walking around with a hammer just looking for a nail.
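For background on the signing-versus-encryption question: a signed JWT (JWS) is just three base64url-encoded segments, header.payload.signature. Here is a stdlib-only sketch of the signing step, using a symmetric HS256 key purely for illustration; an asymmetric algorithm would be the more likely choice so that keystone servers could validate each other's tokens without sharing a secret:

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

key = b"not-a-real-key"  # throwaway symmetric key for the sketch
header = {"alg": "HS256", "typ": "JWT"}
payload = {"sub": "some-user-id", "project_id": "some-project-id"}

signing_input = (b64url(json.dumps(header).encode()) + "." +
                 b64url(json.dumps(payload).encode()))
signature = hmac.new(key, signing_input.encode(), hashlib.sha256).digest()
token = signing_input + "." + b64url(signature)
print(token)  # three dot-separated segments: header.payload.signature
```

Unlike fernet tokens, the payload here is only encoded, not encrypted, which is exactly why the sign-versus-encrypt decision matters for what a provider is willing to put in the token body.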
The epic for implementing JWT in keystone can be found in the roadmap.
External Policy Enforcement
By default, policy is enforced at the service using the oslo.policy library. The library does support the ability to pass decisions off to an external system if configured to do so. In doing that, it supplies the external system with as much data about the check as it can (e.g. user information, the type of resource, the policy name, role information, et cetera). The problem is that services don't necessarily pass the same information to oslo.policy at enforcement time. This makes it harder for people using external policy systems to rely on a consistent set of attributes always being present.
The solution we came up with was to make oslo.policy expect certain attributes during enforcement. Since this would be a backwards-incompatible change, we'd have to cut a new major version of the library. Once that is done, we should be able to patch projects that break with the new major version. After that, external systems will be able to rely on those attributes consistently. Details regarding this work can be found in the epic targeted to Rocky.
Deprecations & Removals
The list of removals and deprecations for Rocky is short but contains sizable parts of keystone.
UUID Token Provider & SQL Token Driver
These two pieces of code will finally be removed this release. In fact, the patch to remove them already landed. The removal of these parts of keystone will open up the ability to refactor nearly the entire token provider API. We can make the implementation and interfaces between authentication and tokens much simpler. In turn, it will be easier to implement new token providers, or maintain custom token providers out-of-tree. In general, the maintenance overhead will be less because the entire system should be simpler and easier for everyone to understand.
Templated Catalog Backend
This backend would be nice to deprecate in favor of something that is easier to use and maintain. We have a patch for implementing a YAML-based catalog backend, but we still need to smooth out a couple things before we can consider merging it. If we get around to that this release, we'll likely deprecate the templated catalog backend in favor of it.
Performance

At every PTG or Forum we try to set aside time to talk about performance. There seems to be an uptick in the number of people dropping by #openstack-keystone and asking if X ms per authentication or validation request is normal. While individual response times can vary greatly depending on configuration, we decided to take a look at how we can help people get more out of their deployment. Caching seemed to be the short-term answer until we can get actionable results from some sort of performance testing initiative.
Keystone supports caching, but there are a bunch of different ways to deploy it. You can do request-local caching, you can set up memcached alongside every instance of keystone, you can set up a ring of memcached servers and shard data across them, et cetera. The problem is that our documentation doesn't clearly describe the benefits and drawbacks of each. We have an action item to write a caching guide for keystone this release that describes how to set up caching for production-like deployments.
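As one example, the memcached-backed setup boils down to a few oslo.cache options in keystone.conf; the server addresses below are placeholders:

```ini
[cache]
# Turn caching on; keystone caches token validation, catalog, and
# identity data through oslo.cache.
enabled = true
# dogpile.cache backend that talks to memcached.
backend = dogpile.cache.memcached
# Comma-separated list of memcached servers to shard data across.
memcache_servers = 192.0.2.10:11211,192.0.2.11:11211
```

The guide would need to cover which backend fits which topology, since a shared memcached ring and per-node memcached instances have very different consistency and failure characteristics.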
The toughest part about performance testing is having consistent hardware that can give you results that don't vary wildly from run to run. We sat down with the infrastructure team to try and figure out whether there is a way we can get consistent hardware. Matt Treinish offered up some resources and we discussed hooking into those. We'll be continuing this discussion on the mailing list so that resource providers for CI can weigh in on how to help with dedicated resources.
From a keystone-specific perspective, there are assumptions about the application that lead to performance discussions. For example, how many times does keystone need to read a user from the backend when authenticating? We know keystone pulls these references more than once in a single API call, but the thing we don't know is how much that impacts total response time. It's also safe to assume we don't necessarily use Python in the most efficient ways. We need an objective way to determine these things before we go off and start implementing dependency injection or refactoring Python code.
Office Hours

As a team, we came up with a few ways to improve the efficiency of office hours. Everyone agreed it would be nice to establish workflows that are friendlier for peer code review and pair-programming. We also want to be more specific about exactly what we target for each session, even if it is closing just one or two bugs. Expect that feedback to work its way into future office hour sessions.
Retrospective

We stuck to the same format that we used in Denver. Instead of having a session in the middle of the day, we held off until later in the week to hold our team retrospective. This was also the day we were kicked out of the convention center due to bad weather, but the hotel came through big by letting us use the hotel conference rooms from 6:00 - 9:00 PM (they also brought us food, which was awesome). Since we didn't feel pressed for time, people could leisurely add topics, which were time-boxed to ~5 minutes.
You can find everything we talked about in Trello, including the action items we have for improving this release.