Keystone Pike PTG Summary

Our full schedule from the Pike PTG is documented in an etherpad. There are going to be some minor revisions to all of the keystone etherpads as we follow up on action items and fill in discussions from the PTG.

Policy & RBAC

I’m realizing now more than ever how hard policy is, specifically because it’s not just a keystone problem. Traditionally, it’s been very hard to solve policy problems from a keystone-only perspective because it’s never been a keystone-only problem. It’s certainly not that other projects don’t see the challenges here, but I think as a community we’re accepting the fact we have to do something about them. As a result, I found the discussions in Atlanta to be very productive and promising.

This might be old news for some, but in Austin, nova proposed moving policy into code, essentially treating policy exactly the same way we treat configuration values. This allows operators to slim down their policy files to only the policies they want to customize, which override the defaults specified in code. It also allows for a smooth upgrade path for operators because new policies are assigned a default, versus having to diff through new and old versions of policy files manually to see what was added and what changed. In the weeks leading up to the PTG, there was some cross-project discussion that highlighted these points and continued building on the idea of registering policy in code.

The majority of the discussion revolved around the importance of breaking the role check (i.e. does the user have the necessary permissions to execute the operation) and the scope check (i.e. does the resource being operated on have the same scope as the token used to initiate the operation) into separate pieces. The main reason was that the role check can be done with information supplied by keystone, but the scope check is currently done by the project. In order for keystone to do both, it would need a lot of information from other projects about the resources they own in order to make a decision about scope. Nova actually has a specification in-flight that we reviewed during the session. This topic specifically is going to be something we continue to discuss as a larger group.
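
To make the distinction concrete, here is a minimal sketch using oslo.policy; the rule name, role, and check string are illustrative and not actual keystone defaults. The role check only needs the roles keystone puts in the token, while the scope check needs the owning project of the target resource, which only the service knows.

```python
from oslo_config import cfg
from oslo_policy import policy

cfg.CONF([])  # no config files needed for this illustration
enforcer = policy.Enforcer(cfg.CONF)

# The check string combines both pieces: 'role:reader' is the role check,
# 'project_id:%(project_id)s' is the scope check against the target resource.
enforcer.register_default(
    policy.RuleDefault('example:get_widget',
                       'role:reader and project_id:%(project_id)s'))

creds = {'roles': ['reader'], 'project_id': 'abc123'}   # comes from the token
target = {'project_id': 'abc123'}                       # comes from the service's own data

assert enforcer.enforce('example:get_widget', target, creds)
```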

Regardless of the scope check discussion, there were two things we could do immediately based on the path blazed by nova. The first was moving policy into code, which we have a specification for and an implementation in flight. The second was adding support to the oslo.policy library for in-code descriptions as part of the oslo.policy objects. This is the exact same pattern we use for help text when registering default configuration options. The general consensus was that the description should contain as much information about the API call as possible. For example, policy readers should be able to easily associate identity:get_user with GET /v3/users/{user_id} from the provided description.
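
As a rough sketch of what that looks like, here is a policy registered in code carrying its own description and the API operation it protects; the check string shown is illustrative rather than keystone's final default.

```python
from oslo_policy import policy

user_policies = [
    policy.DocumentedRuleDefault(
        name='identity:get_user',
        check_str='rule:admin_required',   # illustrative default
        description='Show details for a user.',
        # The operations list is what lets a reader map the policy name
        # back to the HTTP call it protects.
        operations=[{'path': '/v3/users/{user_id}', 'method': 'GET'}],
    ),
]


def list_rules():
    # Returned to the enforcer (and to oslopolicy-sample-generator) so the
    # defaults live in code instead of a hand-maintained policy file.
    return user_policies
```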

I think coming to agreement on moving policy into code and providing clear, consistent documentation was a huge step forward. It helps projects be more consistent with each other, and it forces developers through an exercise that requires them to understand the warts in the existing policy definitions. We can use all of that information later to improve policy, building on a common library that is already in place as a prerequisite.

OUTCOME: Keystone is going to pursue moving policy into code and provide well written descriptions of each policy. As a group, we’re going to continue meeting and discussing the next steps to improve more granular policy enforcement. Project teams are going to continue sifting through ideas that lead to better policy management and support. I expect this to be another big topic at the next PTG.

Testing

The testing etherpad grouped all testing topics together, instead of breaking them out individually.

Functional testing

There has been on-going work over the last couple releases to get true functional testing into keystone. As of Ocata, the framework for functional testing is done. The biggest thing that we use it for today is testing federation. This is great because federated functional testing has been a keystone goal for a long time. Even though there are examples in code, we still need developer documentation for adding functional tests, especially if other developers want to hop in and start improving functional test coverage.

Another improvement that can be made is scripting the federation setup through Devstack so that we don't rely on TestShib; instead, it would be ideal to have some sort of known identity provider. This is something that the OpenStack-Ansible folks also mentioned, since they want to loop in functional testing for federation. Kristi (@knikolla) has a patch to get that work rolling. Not only should we do that for identity providers, but we also need the same type of testing for LDAP. The OpenStack-Ansible folks specifically asked for documentation that would help them build a role that integrates keystone with LDAP.
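
For the LDAP side, below is a minimal sketch of the kind of configuration such a role would need to template out; the hostname, DNs, and attributes are placeholders, and a real deployment would more likely manage this through domain-specific configuration files.

```ini
# Placeholder values; a deployment role would template these out.
[identity]
driver = ldap

[ldap]
url = ldap://ldap.example.com
user = cn=admin,dc=example,dc=com
password = secret
suffix = dc=example,dc=com
user_tree_dn = ou=Users,dc=example,dc=com
user_objectclass = inetOrgPerson
user_id_attribute = cn
user_name_attribute = sn
```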

OUTCOME: We need to write developer documentation that explains how to use the existing Devstack/Tempest functional tests. This will help improve functional test coverage by enabling other developers. We also need to work with the OpenStack-Ansible community to setup a basic identity provider and LDAP role so they can start leveraging functional testing for both federated and LDAP deployments in OpenStack-Ansible.

Performance testing

Performance testing was another key testing topic. I personally have a vested interest in performance testing that resulted in a couple pet projects to hook performance testing into keystone’s review process.

Ever since the initial implementation of Fernet, there seems to be more focus on keystone's performance. I doubt Fernet adoption was the sole reason, but maybe the initial performance issues with Fernet pushed us to think differently (I know they did for me!). Optimizing for performance is a hard problem. The fact that people are looking for feedback is a good sign that we're becoming more aware of our performance footprint and are willing to improve it. Luckily, the OpenStack-Ansible community has integrated benchmarking support into the testing of the os_keystone role.

Outside of testing infrastructure and deployment, there was some dialog about maintaining performance tests. Various members of the keystone community have used a couple of different tools. We talked about what we liked and disliked about each, and ultimately came up with a list of things we need:

  • Must be easy to add test cases
  • Must be flexible enough to handle cases consisting of multiple steps
  • Must provide detailed test results and metrics in a parseable format
  • Must be something that is not deployment specific

Locust and apachebench are the primary tools used today. Locust is a Python framework, so writing flexible test cases consisting of multiple steps is easy, but it lacks the detailed metrics we are used to seeing from apachebench. The last bullet is mostly a given, but it ensures reusability if other deployment projects want to provide their results using the same tests; ultimately, a list of performance results filtered by deployment project would be an awesome tab in OpenStack Health.
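
To give a sense of why Locust makes multi-step cases easy, here is a minimal sketch of a test that issues a token and then validates it; the credentials and endpoint are placeholders, and the Locust API shown is from a recent release of the framework.

```python
from locust import HttpUser, task, between

# Placeholder credentials for a demo user in the Default domain.
AUTH_BODY = {
    "auth": {
        "identity": {
            "methods": ["password"],
            "password": {
                "user": {
                    "name": "demo",
                    "domain": {"name": "Default"},
                    "password": "secret",
                }
            },
        },
        "scope": {"project": {"name": "demo", "domain": {"name": "Default"}}},
    }
}


class KeystoneUser(HttpUser):
    wait_time = between(1, 3)

    @task
    def issue_and_validate_token(self):
        # Step 1: authenticate and grab the token from the response header.
        resp = self.client.post("/v3/auth/tokens", json=AUTH_BODY)
        token = resp.headers.get("X-Subject-Token")
        # Step 2: validate the token we just issued.
        self.client.get(
            "/v3/auth/tokens",
            headers={"X-Auth-Token": token, "X-Subject-Token": token},
        )
```

Pointing this at a keystone endpoint with something like `locust -f keystone_perf.py --host http://keystone.example.com:5000` would drive both steps for every simulated user.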

OUTCOME: We need to transition the OSIC Performance Bot (and the dedicated hardware it runs on) to using OpenStack-Ansible’s benchmarking approach. We should also keep vocalizing the things we need in a performance testing platform.

Rolling upgrade testing

We’re in a really good place to move forward with rolling upgrade testing. OpenStack-Ansible has a dedicated testing environment for it (which is something I talked about in a previous post). OSA already gates on keystone’s rolling upgrade process. For the keystone community, this means we can piggyback on that approach once support lands in OSA to install keystone source from the deployment host, which there is a patch for. From there, we will need to add a job so that keystone starts gating on rolling upgrades. Once that’s done, we should be able to assert the rolling upgrade tag.

API Keys

We stayed true to our tradition of crazy-idea Friday by talking about API keys (the inception of shadow users started the same way during the Tokyo summit). We’ve been asked numerous times if keystone currently supports, or has a plan to support API keys. Our answers have typically leaned on trusts and OAuth, which keystone already has. With all the work that has been done with shadow users, we’re clearly moving towards decoupling the way in which a user authenticates from the actual user reference. I think that was the big difference in our ability to have a productive discussion around API keys now versus in the past.

The basic rules we came up with were that API keys:

  • Must be immutable
  • Must have quota applied on a per user basis (i.e. 5 API keys per user)
  • Must only carry a role on a project that the user creating the API key already has
  • Must only be revealed to the user once, at creation time
  • Must include the API key ID and value at authentication time (a hypothetical example follows this list)
  • Must not allow new API keys to be created using tokens authenticated by an API key
  • Must not allow API keys to be deleted using tokens authenticated by an API key
  • Should support the ability to expire
  • Should have description fields

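Purely as an illustration of the ID-plus-secret shape implied by the rules above, the sketch below shows what authenticating with an API key might look like; the auth method name, request format, and values are hypothetical and not taken from the proposed specification.

```python
# Hypothetical request shape; nothing here is part of the proposed spec.
import requests

KEYSTONE = 'http://keystone.example.com:5000'  # placeholder endpoint

auth_body = {
    'auth': {
        'identity': {
            'methods': ['api_key'],    # hypothetical auth method name
            'api_key': {
                'id': '3f1c...',       # API key ID, safe to keep in config files
                'secret': 'xGb9...',   # key value, shown only once at creation
            },
        }
    }
}

resp = requests.post(KEYSTONE + '/v3/auth/tokens', json=auth_body)
token = resp.headers['X-Subject-Token']
```
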
The biggest improvement the group saw with API keys was in the usability of service users, for the following reasons:

  • Instead of having a service user for each service, a single service user could be created and all service actions would tie back to that user (improved auditing)
  • Each service’s configuration file would contain an API key specific to that service from the service user, instead of a username and password
  • API keys could be rotated through service configurations without service interruption (this would be harder with password rotation)
  • If/when we get to the point of having service roles (i.e. cinder_service, nova_service, neutron_service, each only containing the operations required by that service), API keys could be created and scoped to specific roles, which limits the service to only the operations it needs (the concept of least privilege)

In addition, we have a better user experience for folks building applications that consume or monitor services in OpenStack. Below is our whiteboard diagram, and so far we have a specification proposed.

Deprecations & Removals

Full etherpad can be found here.

UUID token format

With the work that was done in Ocata to make UUID token validation and Fernet token validation utilize the same code path, we contemplated the deprecation and removal of the UUID format. The only strong case we had for keeping it around was as a fallback in the event a massive security flaw is exposed in Fernet. While that is unlikely, it was decided that the best path forward would be to modify the UUID format to write the same values to SQL that Fernet carries in its payload, making the code paths even more alike, especially since we validate all tokens the same way regardless of persistence. If we need to extend the deprecation period in the future we can, but for now we are going to consider the UUID token format deprecated.

SQL token driver

If we deprecate the UUID token format, we should also deprecate the SQL token driver. This makes sense if we actually end up removing the UUID format, because the SQL token driver wouldn't be used otherwise. It would also be subject to deprecation extensions depending on the state of the UUID token format.

v2.0 API

The last time we talked about the deprecation and removal of the v2.0 API was in Tokyo. The outcome was that we should deprecate everything except authenticate and validate, which we did. Morgan alluded to pursuing an exception from the TC that would allow us to completely remove all of the v2.0 API, but not without going through the final rounds and processes. We need to come up with a testing strategy for removing the v2.0 API and work that into the gate. We also need to run this by DefCore and RefStack again, since there are v2.0 calls in DefCore.

Policy API

The policy API in keystone was designed for use with centralized policy storage, but it was never adopted. The consensus was to deprecate it, but never remove it. We would just not include a policy API if we develop a keystone v4 API.

Credential API

This is pretty much the exact same argument as the policy API. While it is useful and we use it for TOTP authentication secrets, there are better solutions for secret storage. This is going to be deprecated in the Pike release, but never removed. If we ever do a keystone v4 API, we’ll make a hard requirement on using proper secret storage that’s not homegrown in keystone.

python-keystoneclient

The truth is that python-keystoneclient is just a super thin wrapper. It’s another library that we have to maintain for minimal functionality. The general consensus was that we need to have a discussion with the maintainers of shade and python-openstackclient before deprecating python-keystoneclient.

Federation

Shadow users have been pushing better federated usability across OpenStack, but we still have to overcome the account-linking hurdle. We do have a patch that helps us link accounts between federated and local users, and we're expecting that specification to land in Pike. There is also a considerable amount of work we're going to carry over from Ocata, including native SAML support and allowing versioned mappings. We did have a cross-project session specifically to talk about the remaining issues with ephemeral group memberships and federated users. With the current shadow user implementation, a user reference is created after successful federated authentication. Operators then have the ability to manually assign various roles to the user or place the user into specific groups. The problem is that group placement isn't done automatically. For example, if a federated user is placed into a privileged group based only on values from their assertion, they won't be able to use things like trusts because there is no concrete role assignment or concrete group membership at trust creation time.

As a community, we hesitated to place users into groups automatically on authentication because, if the values of their identity ever change in their identity provider, it is up to the administrator to clean things up manually. If the administrator has to do the cleanup manually anyway, then we are inclined to let them do the setup, too, as opposed to exposing the possibility for group assignments to continue to grow over time as a byproduct of authentication. While the workflow might be undesirable, we found it better than violating the principle of least privilege from the source of truth.
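
For reference, this is the kind of mapping-driven group placement being discussed; in the sketch below the assertion attribute, group, and domain names are placeholders, and membership granted this way is ephemeral rather than a concrete group membership.

```json
{
    "rules": [
        {
            "remote": [
                {"type": "REMOTE_USER"},
                {"type": "REMOTE_GROUP", "any_one_of": ["developers"]}
            ],
            "local": [
                {"user": {"name": "{0}"}},
                {"group": {"name": "federated_developers",
                           "domain": {"name": "Default"}}}
            ]
        }
    ]
}
```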

OUTCOME: Continue pushing native SAML support, as well as enhancing mapping to include a version. Include better documentation for deployments that require federated users to use trusts. We need to clearly state the responsibility the operator has to place federated users into groups after they authenticate, or use auto-provisioning.

Summary

Overall, the PTG was well-received, and I'm excited to see how the foundation will incorporate attendee feedback for the upcoming one. Our keystone sessions felt exactly like a mid-cycle meetup. We were given the flexibility to adjust our schedule according to topic prioritization. On the other hand, cross-project discussions felt like smaller design sessions. We lacked key operator participation, which left gaps in some of our design sessions. At previous summits, this wasn't an issue because operator attendance was higher.

Throughout the week, we made great progress in key areas like testing, RBAC/policy, and federated usability. I personally found this liberating because it was another example of a community working together to solve hard problems. As Major would say, “let’s crush the hard stuff”.

 

Photo Credit: tpsdave via pixabay
