Addressing a common misconception regarding OpenStack Trove security

Since my first OpenStack Summit in Atlanta (mid-2014), I have been to a number of OpenStack-related events, meetups, and summits. At every one of these events, as well as at numerous customer and prospect meetings, I’ve been asked some variant of the question:

Isn’t Trove insecure because the guestagent has RabbitMQ credentials?

A bug was entered in 2015 with the ominous (and factually inaccurate) description that reads “Guestagent config leaks rabbit password”.

And while I’ve tried to explain to people that this is not at all the case, this misconception has persisted.

At the Summit in Barcelona, I was asked about this yet again, and I realized that whatever we on the Trove team had been doing to communicate the reality was clearly insufficient. So, in preparation for the upcoming Summit in Boston, I’m writing this post as a handy resource.

What is the problem?

Shown here is a simplified representation of a Trove system with a single guest database instance. The control plane components (Trove API, Trove Task Manager, and Trove Conductor) and the Guest Agent communicate via oslo.messaging, which typically runs over a messaging transport such as RabbitMQ.

To connect to the underlying transport, each of these four components needs to store credentials; for RabbitMQ this is a username and password.

The contention is that if a guest instance is somehow compromised (and there are many ways to do this) and a bad actor gains access to the RabbitMQ credentials, then the OpenStack deployment is compromised.

Why is this not really a problem?

Here are some reasons this is not really an issue on a properly configured production system.

Nothing requires Trove to use the same RabbitMQ servers as the rest of OpenStack, so at the very least, a compromise can be confined to the RabbitMQ servers used by Trove.
The guest instance is not intended to be a general-purpose instance that a user has access to; in the intended deployment, the only connectivity to the guest instance is to the database ports, for queries. These ports are configurable for each database (datastore) and enforced by Neutron. Shell access (ssh, port 22) is a no-no; no deployer would use images and configurations that allowed this kind of access.
On the guest instance, other database-specific best practices are used to prevent shell escapes and other exploits that could give a user access to the RabbitMQ credentials.
Guest instances can be spawned by Trove using service credentials, or the credentials of a shadow tenant, so that an end user cannot directly access the underlying Nova instance. Similarly, Cinder volumes can be provisioned in a different tenant so that an end user cannot directly access the underlying volume.
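As a rough illustration of the second point, a deployer could audit a guest’s security-group rules for anything beyond the datastore’s own port (3306 for MySQL, for example). The helper below, find_unexpected_ingress, is hypothetical and not part of Trove or Neutron. It takes rule dicts that use Neutron’s security-group-rule field names (direction, protocol, port_range_min, port_range_max), assumes the protocol is spelled "tcp" rather than given as a number, and ignores remote_ip_prefix and remote_group_id. Fetching the rules from Neutron is left out.

```python
"""Sketch: flag security-group rules that expose more than the datastore port.

Rule dicts use Neutron security-group-rule field names (direction, protocol,
port_range_min, port_range_max); how they are fetched is out of scope.
find_unexpected_ingress is a hypothetical helper, not a Trove/Neutron API.
"""
from typing import Iterable, List, Set


def find_unexpected_ingress(rules: Iterable[dict],
                            allowed_tcp_ports: Set[int]) -> List[dict]:
    """Return ingress rules that open anything beyond allowed_tcp_ports."""
    flagged = []
    for rule in rules:
        if rule.get("direction") != "ingress":
            continue  # egress rules do not expose the guest
        lo, hi = rule.get("port_range_min"), rule.get("port_range_max")
        if rule.get("protocol") != "tcp" or lo is None or hi is None:
            # Any protocol, a non-TCP protocol, or "all ports": nothing a
            # database guest should need, so flag it.
            flagged.append(rule)
        elif not set(range(lo, hi + 1)) <= allowed_tcp_ports:
            flagged.append(rule)  # e.g. port 22 opened for ssh
    return flagged
```

Run against a rule set that opens 3306 and also 22, the audit returns just the ssh rule. Treating a missing port range or protocol as "everything" errs on the side of flagging.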

All of this notwithstanding, the urban legend was that Trove was a security risk. The reasoning invariably involved a system configured by devstack: a single RabbitMQ server, open access to port 22 on the guest, and the guest running in the same tenant as the user who requested the database.

Yet one can safely say that no one in their right mind would run OpenStack in production as configured by devstack. And certainly, one would not deploy Trove in production with the development images whose elements are part of the source tree.

Proposed security-related improvements in Ocata

For the Ocata release, an additional set of changes has been proposed to further secure the system: the parameters of all RPC calls on the oslo.messaging bus are encrypted. Furthermore, different conversations are encrypted using different keys.

Trove’s traffic on oslo.messaging consists solely of oslo_messaging.rpc calls, the OpenStack Remote Procedure Call mechanism: the API service makes calls into the Task Manager, the Task Manager makes calls into the Guest Agent, and the Guest Agent makes calls into the Conductor.

The picture above shows these different conversations and the encryption keys used for each. When the API service makes an RPC call to the Task Manager, all parameters to the call are encrypted using K1, which is stored securely on the control plane.

A unique encryption key is created for each guest instance and is used for all communication with that instance. When the Task Manager wishes to make a call to Guest Agent 1, it uses the instance-specific key K2, and when it wants to make a call to Guest Agent 2, it uses the instance-specific key K3. When a guest agent makes a call to the Conductor, the traffic is encrypted with that guest’s instance-specific key, and the Conductor decrypts the parameters with the same key.
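To make the key separation concrete, here is a minimal, self-contained sketch in Python. It is not Trove’s implementation: the cipher is a toy HMAC-SHA256 counter-mode keystream with an encrypt-then-MAC tag, used only because it needs nothing beyond the standard library; a real deployment should rely on a vetted authenticated cipher instead. The names KeyRing, seal, unseal, guest_to_conductor, and conductor_receive are hypothetical, as is the assumption that a guest’s instance id travels in clear next to the encrypted payload (bound into the MAC) so the Conductor knows which key to use.

```python
"""Toy sketch of per-conversation RPC encryption with per-instance keys.

NOT Trove's code and NOT production crypto: an HMAC-SHA256 keystream in
counter mode plus an encrypt-then-MAC tag, stdlib only. All names are
hypothetical.
"""
import hashlib
import hmac
import json
import os

NONCE_LEN, TAG_LEN = 16, 32


def _subkeys(key: bytes):
    """Derive separate encryption and MAC keys from one conversation key."""
    enc = hmac.new(key, b"enc", hashlib.sha256).digest()
    mac = hmac.new(key, b"mac", hashlib.sha256).digest()
    return enc, mac


def _keystream(enc_key: bytes, nonce: bytes, length: int) -> bytes:
    """HMAC-SHA256 over (nonce, counter) used as a pseudorandom keystream."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(enc_key, nonce + counter.to_bytes(8, "big"),
                        hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def _tag(mac_key: bytes, aad: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    # Length-prefix the associated data so the MAC input is unambiguous.
    msg = len(aad).to_bytes(8, "big") + aad + nonce + ciphertext
    return hmac.new(mac_key, msg, hashlib.sha256).digest()


def seal(key: bytes, params: dict, aad: bytes = b"") -> bytes:
    """Encrypt the JSON-encoded RPC parameters, then MAC the result."""
    enc_key, mac_key = _subkeys(key)
    nonce = os.urandom(NONCE_LEN)
    plaintext = json.dumps(params, sort_keys=True).encode()
    stream = _keystream(enc_key, nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    return nonce + ciphertext + _tag(mac_key, aad, nonce, ciphertext)


def unseal(key: bytes, blob: bytes, aad: bytes = b"") -> dict:
    """Verify the tag, then decrypt; ValueError on a wrong key or tampering."""
    enc_key, mac_key = _subkeys(key)
    nonce, ciphertext, tag = blob[:NONCE_LEN], blob[NONCE_LEN:-TAG_LEN], blob[-TAG_LEN:]
    if not hmac.compare_digest(tag, _tag(mac_key, aad, nonce, ciphertext)):
        raise ValueError("message was not sealed with this key")
    stream = _keystream(enc_key, nonce, len(ciphertext))
    return json.loads(bytes(c ^ s for c, s in zip(ciphertext, stream)))


class KeyRing:
    """Control-plane key store: K1 for API -> Task Manager, one key per guest."""

    def __init__(self):
        self.control_key = os.urandom(32)  # "K1" in the post
        self.instance_keys = {}            # instance_id -> "K2", "K3", ...

    def new_instance(self, instance_id: str) -> bytes:
        key = os.urandom(32)
        self.instance_keys[instance_id] = key
        return key  # hypothetically delivered to the guest when provisioned


def guest_to_conductor(instance_id: str, key: bytes, params: dict) -> dict:
    """Guest side: the instance id travels in clear but is bound into the MAC."""
    return {"instance_id": instance_id,
            "payload": seal(key, params, aad=instance_id.encode())}


def conductor_receive(ring: KeyRing, envelope: dict) -> dict:
    """Conductor side: look up the claimed instance's key, verify, decrypt."""
    iid = envelope["instance_id"]
    return unseal(ring.instance_keys[iid], envelope["payload"], aad=iid.encode())
```

The failure case shows the point of the design: a compromised Guest Agent 1 holding K2 can still seal messages, but if it claims to be Guest Agent 2, conductor_receive looks up K3, the tag check fails, and unseal raises ValueError. It is the authentication tag, not secrecy alone, that stops the forgery.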

In a well-configured production deployment, one that takes steps to secure the system, if a bad actor were to compromise a guest instance (say, Guest Agent 1) and obtain K2 and the RabbitMQ credentials, they could access RabbitMQ but would not be able to affect either another guest instance (they wouldn’t have K3) or the Task Manager (they wouldn’t have K1).

Code that implements this capability is currently in upstream review.
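oslo.messaging lets RPC clients and servers be constructed with a serializer object, which is the pluggable hook discussed in the comments below. The sketch shows the general shape of an encrypting serializer; it is an illustration, not the code under review. Its method names mirror oslo_messaging.Serializer (serialize_entity, deserialize_entity, serialize_context, deserialize_context), but the class is duck-typed and does not import oslo.messaging, and the encrypt/decrypt callables are hypothetical hooks for whatever cipher and per-conversation key a deployment uses. Only the call arguments are encrypted here; the request context passes through unchanged.

```python
"""Sketch: an encrypting serializer wrapping a base serializer.

Method names mirror oslo_messaging.Serializer, but nothing here imports
oslo.messaging. The encrypt/decrypt callables are hypothetical hooks.
"""
import base64
import json
from typing import Any, Callable


class PassThroughSerializer:
    """Stand-in base serializer, similar in spirit to oslo_messaging.NoOpSerializer."""

    def serialize_entity(self, ctxt, entity):
        return entity

    def deserialize_entity(self, ctxt, entity):
        return entity

    def serialize_context(self, ctxt):
        return ctxt

    def deserialize_context(self, ctxt):
        return ctxt


class EncryptingSerializer:
    """Encrypts each RPC argument; the request context passes through unchanged."""

    def __init__(self, base, encrypt: Callable[[bytes], bytes],
                 decrypt: Callable[[bytes], bytes]):
        self._base = base
        self._encrypt = encrypt  # e.g. bound to one guest instance's key
        self._decrypt = decrypt

    def serialize_entity(self, ctxt, entity: Any) -> str:
        raw = json.dumps(self._base.serialize_entity(ctxt, entity)).encode()
        # base64 keeps the ciphertext safe to embed in a JSON message body.
        return base64.b64encode(self._encrypt(raw)).decode("ascii")

    def deserialize_entity(self, ctxt, entity: str) -> Any:
        raw = self._decrypt(base64.b64decode(entity))
        return self._base.deserialize_entity(ctxt, json.loads(raw))

    def serialize_context(self, ctxt):
        return self._base.serialize_context(ctxt)

    def deserialize_context(self, ctxt):
        return self._base.deserialize_context(ctxt)
```

Wiring one instance of this, with a cipher bound to K2, into the Task Manager’s client for Guest Agent 1 and into that guest’s server would give that conversation its own key; the transport only ever sees base64 strings.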

This blog post resulted in a brief Twitter exchange with Adam Young (@admiyoung):

@amrithkumar broker provides encryption.
What oslomsg needs is authentication and authorization: https://t.co/U110mZNtUR

— Adam Young (@admiyoung) January 9, 2017

@admiyoung agreed, been there, and couldn’t get that; hence this implementation is all that’s left … https://t.co/uOS6CTnrxl

— amrith (@amrithkumar) January 9, 2017

@amrithkumar You should be able to get there for Trove, though: Separate user, domain etc. No?

— Adam Young (@admiyoung) January 9, 2017

Unfortunately, a single RabbitMQ user for Trove isn’t the answer. Should a guest be compromised, those credentials are sufficient to post messages to RabbitMQ and cause some amount of damage.

Avoiding this would require per-guest-instance credentials, or one of the many other solutions (such as shadow tenants).
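For the per-guest-credential approach, RabbitMQ’s permission model gives each user three regular expressions per vhost (configure, write, read) that are matched against resource names, so isolation depends on a naming convention that ties resource names to users. The sketch below derives such a triple for one guest under a hypothetical convention: the guest’s resources are named guestagent.<id>, optionally followed by a dotted suffix, and the Conductor’s is trove-conductor. The resulting strings could then be applied with rabbitmqctl set_permissions -p <vhost> <user> <conf> <write> <read>. This is not a complete answer on its own: in a typical oslo.messaging setup, messages for many topics pass through a shared exchange, and with basic permissions RabbitMQ checks write permission on the exchange rather than on the routing key, so name-based ACLs alone may not stop a guest from addressing another guest’s topic.

```python
"""Sketch: per-guest RabbitMQ permission regexes from a naming convention.

The convention ('guestagent.<id>[.suffix]' for a guest's resources,
'trove-conductor' for the Conductor's) is hypothetical.
"""
import re
from typing import NamedTuple


class Permissions(NamedTuple):
    """RabbitMQ's per-user, per-vhost permission triple (all regexes)."""
    configure: str
    write: str
    read: str


def guest_permissions(instance_id: str,
                      conductor_name: str = "trove-conductor") -> Permissions:
    """Build anchored regexes scoped to one guest instance's resources."""
    own = r"guestagent\." + re.escape(instance_id) + r"(\..+)?"
    return Permissions(
        configure=f"^{own}$",                                # declare only its own resources
        write=f"^({own}|{re.escape(conductor_name)})$",     # its own resources plus the Conductor's
        read=f"^{own}$",                                     # consume only from its own resources
    )


def permits(pattern: str, resource: str) -> bool:
    """True if the resource name matches; patterns are anchored with ^ and $."""
    return re.search(pattern, resource) is not None
```

A name-based check like this is only as strong as the convention behind it. For example, guestagent.abcd does not match guest abc’s patterns, because the id must be followed by a dot or the end of the name.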

4 thoughts on “Addressing a common misconception regarding OpenStack Trove security”

Adam Young says:

January 9, 2017 at 12:55 pm

The basic Rabbit config leaves a lot to be desired. The LDAP-based one is much more powerful and provides a better model. It lets you run an LDAP query when an (authenticated) user attaches to a queue, and it provides read, write, and enumerate granularity.

https://www.rabbitmq.com/access-control.html

https://www.rabbitmq.com/ldap.html

But let’s assume we’re stuck with the basics. Start by creating a new vhost for Trove, separate from the OpenStack default one. Credentials for all the Trove users should only be able to see this one, not the default.

As you said, each guest gets a credential. The trick to using the ACLs is that they are regex-based, so there needs to be some naming convention between the names of the exchanges and the names of the users.

1. amrith says:
  
  January 9, 2017 at 1:10 pm
  
  I did some simple experiments with multiple user names and ACLs. They dramatically increase RabbitMQ’s CPU consumption, especially when you have complex ACLs. Also, given the way most projects (Trove included) use Rabbit, message queue credentials and ACLs are not sufficient. And there’s the minor issue of how a project (like Trove) manages credentials when it isn’t clear what the underlying transport is.
  
  So after much discussion with the Oslo team, we concluded that either the old project to implement encryption within oslo_messaging should be resurrected, or the consumer (in this case Trove) has to fend for itself.
  
  I pushed up a change to do the encryption in Oslo, but that didn’t pass muster in review; I proposed a second one that would allow consumers to change the dispatcher, and that didn’t fly either. So finally, the pluggable serializer is my only recourse.
  
Mark Kirkwood says:

February 4, 2017 at 1:07 am

As the originator of the bug in question, I beg to differ with your assessment. It seems many of you folks deploying Trove are using private clouds, so security and billing are vastly less of a concern. For instance, even using a separate Rabbit still leaves wide open the ability to cripple the Trove service for everyone, which is hardly desirable. Now, the shadow tenant business sounds promising. However, how to bill Trove instances in this tenant to several different ‘real’ tenants is also not immediately clear (if this has been sorted out, then great, but docs are required). Also unclear is how to actually set up this shadow business. You are absolutely right that the Trove team has not communicated well enough around these issues. I note there is yet another question on the openstack list about this. It is unacceptable to bemoan people looking at devstack as a guide when you do not actually provide a valid alternative.

A clear, step-by-step discussion of how to set this up would put an end to your being asked about Trove security.

Pingback: Architectural nuances of OpenStack. Databases as a Service, Trove Implementation - TechBurst Magazine