Wednesday, October 14, 2015

Multi-Site Management in OpenStack

Managing multiple OpenStack clouds as a single resource pool


In this series of posts, we will dive into "Tricircle", an open source project that promises "single pane of glass" management over multiple OpenStack environments, using a cascading approach.

As more and more companies are deploying OpenStack, it is becoming clear that there is a need to be able to manage multiple cloud installations. The reasons range from application lifecycle management, through spill-over scenarios and all the way to multi-site Disaster Recovery orchestration.


So why would one care to deploy the same service over multiple environments?


There are multiple reasons, and here are a few:

  • Service continuity and geo-redundancy (in case of one site going awry)
  • Geo-based load balancing and service locality (in case traffic comes from various places around the globe, or there are strict quality/latency requirements, or there are regulatory constraints, etc.)
  • Cost optimization (in case some resources are cheaper in another place)
  • Growth (in case one environment cannot grow enough)
  • Resource Utilization (in case you have multiple sites and want to aggregate their resources for better value or easier management)
  • Single Configuration (instead of continually synchronizing multiple sub-instances of the service)
  • ... (feel free to share additional incentives in the comments).

OpenStack Tricircle



Managing multiple OpenStack instances can be done in several ways. One is to introduce multi-site awareness into each OpenStack project, which we ruled out because of the complexity of evolving every project to support it.

The approach we took in Tricircle was to add a "Top" management OpenStack instance over multiple "Bottom" OpenStack instances.

The "Top" introduces a cascading service layer to delegate APIs downwards, and injects itself into several OpenStack components.


So, how does it feel to use such a "Top" OpenStack instance?


Well, first of all let's define the different users:
  1. The Multi-site Tenant Admin (the "User") - Uses the multi-site OpenStack cloud (creates VMs, networks, etc.)
  2. The Multi-site Admin (the "Admin") - Adds new sites, and needs the necessary credentials on each of them to put the environment together

User







For the "User", it is pretty straightforward: when you launch a VM, you choose from a list of Data Centers (a new drop-down box in Horizon), then from a list of Availability Zones based on your Data Center selection, and that's basically it.

Admin


For the "Admin", you get a new "Add Site" API (in CLI only, at this point), and you need to have substantial knowledge about the "Bottom" sites you are adding (credentials and network-related information which we will cover in the next post).

Some High-Level Architecture



The "Top" Instance


The design principle we followed was to reuse OpenStack components in the "Top" and "Bottom" layers as much as possible, and to manage any OpenStack deployment without additional requirements (OpenStack API compatible).

For the "Top" layer we used an unmodified OpenStack API layer to intercept operational requests and handle them in the cascading service.

Doing this required integrating with the different OpenStack core components:

Neutron 


  • We introduced a custom core plugin
  • Updates are written to the database 
  • Operational requests are forwarded to our OpenStack Adaptor service
  • Reads and Reports are served directly from the database
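
The split described in the list above (updates go to the database, operational requests go to the adaptor, reads come straight from the database) can be sketched in a few lines of Python. This is only an illustration of the pattern, not Tricircle or Neutron code: `TopNetworkPlugin`, `FakeAdaptor`, and the in-memory dictionary standing in for the database are all hypothetical names.

```python
import uuid


class FakeAdaptor:
    """Hypothetical stand-in for the OpenStack Adaptor service."""

    def __init__(self):
        self.forwarded = []  # operational requests received so far

    def forward(self, operation, resource):
        # A real adaptor would hand the request on to the cascading service.
        self.forwarded.append((operation, resource["id"]))


class TopNetworkPlugin:
    """Hypothetical sketch of a core plugin running on the "Top" instance."""

    def __init__(self, adaptor):
        self.db = {}  # in-memory dict standing in for the "Top" database
        self.adaptor = adaptor

    def create_network(self, name):
        network = {"id": str(uuid.uuid4()), "name": name}
        # 1. The update is written to the database.
        self.db[network["id"]] = network
        # 2. The operational request is forwarded to the adaptor.
        self.adaptor.forward("create_network", network)
        return network

    def get_network(self, network_id):
        # Reads are served from the database; the adaptor is not involved.
        return self.db[network_id]


adaptor = FakeAdaptor()
plugin = TopNetworkPlugin(adaptor)
net = plugin.create_network("multi-site-net")
```

In this sketch a read never touches the adaptor, which is why the "Top" can keep answering queries without contacting a "Bottom" site for every request.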

Nova


  • We implemented a custom Nova Scheduler that runs inside our OpenStack Adaptor service
  • We created a Compute Node emulation that runs in the Adaptor service and listens to the Nova Compute service queues
  • Our current working assumption is to map "Bottom" sites as "Compute Nodes" that reside on different logical AZs
  • The Compute Node emulation instance for each "Bottom" site also aggregates information and statistics that represent the site
  • At some point, we plan to let the admin decide how to expose the "Bottom" sites, e.g. different AZs on the "Bottom" site, or all the actual "Compute Nodes", etc. This will create a complete decoupling between the Adaptor and the "Cascading" service.
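
To make the "one Compute Node per Bottom site" assumption concrete, here is a minimal Python sketch of how a site's hosts could be summed into a single emulated node that lives in its own logical AZ. The class and function names (`HostStats`, `EmulatedComputeNode`, `emulate_site`, `nodes_in_zone`) and the sample numbers are hypothetical, and the real emulation reports through the Nova Compute queues rather than plain function calls.

```python
from dataclasses import dataclass


@dataclass
class HostStats:
    """Capacity figures for one real host, or the sum over a whole site."""
    vcpus: int
    vcpus_used: int
    memory_mb: int
    memory_mb_used: int


@dataclass
class EmulatedComputeNode:
    """What the "Top" scheduler sees: one node per "Bottom" site."""
    site: str
    availability_zone: str
    stats: HostStats


def emulate_site(site, availability_zone, hosts):
    """Aggregate the hosts of a site into a single emulated compute node."""
    total = HostStats(
        vcpus=sum(h.vcpus for h in hosts),
        vcpus_used=sum(h.vcpus_used for h in hosts),
        memory_mb=sum(h.memory_mb for h in hosts),
        memory_mb_used=sum(h.memory_mb_used for h in hosts),
    )
    return EmulatedComputeNode(site, availability_zone, total)


def nodes_in_zone(nodes, availability_zone):
    """Return the emulated nodes that belong to the requested logical AZ."""
    return [n for n in nodes if n.availability_zone == availability_zone]


# Made-up sample data: two sites, each exposed as its own logical AZ.
site_a = emulate_site("site-a", "az-site-a", [
    HostStats(16, 4, 65536, 8192),
    HostStats(32, 10, 131072, 16384),
])
site_b = emulate_site("site-b", "az-site-b", [HostStats(8, 8, 32768, 30000)])
candidates = nodes_in_zone([site_a, site_b], "az-site-a")
```

Because each site collapses into one node, the "Top" scheduler only ever chooses between sites; placement on a real host is left to the "Bottom" site's own scheduler.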

The "Bottom" Instances


We assume that the "Bottom" sites are unmodified and potentially heterogeneous (in terms of network, configuration and version).

At this stage, our design assumes a centralized Keystone service running on the "Top" (we are planning "Federated Keystone" in the future).

In order to add a "Bottom" site to the multi-site environment, the admin needs to deploy a "Cascaded" service, and register it in the "Top" site, using a special "add site" API.  Then, configure the "Bottom" site to use the "Top" Keystone.


The Full Picture


Here is how the entire system looks:


Is it really that simple?


From the user experience point of view, we hope it is.
But in order to get there, we needed to handle quite a few obstacles:
  • Resource Synchronization across the Multi-site
  • Cross-site network
  • Image synchronization
  • Metadata synchronization (e.g. flavors)
  • Resource Status monitoring and propagation (i.e. so that you can see what's happening from the "Top" dashboard) 

Coming next


In our coming posts, we will dive into resource synchronization and cross-site networking, explain how we tackled status and notification updates, and share our approach to supporting large-scale deployments.

Please share your thoughts about this in the comments.
We will be talking about this project at the upcoming OpenStack Tokyo summit, so if you're coming, be sure to attend our talk.

To join the development effort: 

7 comments:

  1. First of all, very well explained description of Tricircle especially the pictures are very informative. I think this can be a solution for the vendor lock-in problem also.

    So in the bottom layer openstack there will be no admin and complete administration will be done from the top layer openstack. If my understanding is correct, that will create a single point of failure. When my sites are distributed do we really want that kind of situation to arise? Please correct me if my understanding is wrong.

    However, it would be nice to have a workflow description of the 'full picture' in the coming post. I am interested to know about the communication between the OpenStack adaptor and the top layer OpenStack.

  2. Hi Mohammad thank you for your comment,
    This is actually a very good point that we considered in the design, and it was one of the reasons we introduced the bottom cascading service.
    We are planning to allow local bottom OpenStack management, and actually got it as a requirement.
    In the first phase we will allow limited management in the local site: Local Tenant, Resource life cycle without creation. In a later phase we are planning to allow full local management by orchestrating the local IPAM of the shared networks to site segments and synchronizing the bottom site changes via the notifier.

  3. Thank you very much for this very good explanation.
    Actually, I am working on resource allocation within a distributed cloud which can be modeled with the Tricircle architecture (top layer for a global orchestrator and bottom layer for different datacenters or POPs).
    I already know how the Nova scheduler decides where to place instances on the different compute nodes, but my question is how this decision is made for the bottom layer within this type of architecture (the Tricircle project). I mean, if we don't choose a datacenter from a list (from Horizon, for example), how is this decision made? Is the Nova scheduler on the top layer still based on the same algorithm (sending requests to the compute node with the most free resources), but applied to datacenters?

    Thank you

  4. I am not up to date with the latest development in Tricircle, but at the time the decision on the top layer was made using the availability zone parameter entered by the user; each bottom DC was mapped to an availability zone.

  5. Thank you for your answer.

    And what if the datacenter from the availability zone entered by the user does not have enough resources to accept the request? Which criteria are used to send the request to another datacenter?

    Thank you again

    1. Hello, more than one bottom OpenStack instance can be mapped to the same availability zone. If one OpenStack instance reaches its resource utilization limit, requests will switch to another OpenStack instance. One availability zone can include several groups of OpenStack instances, each group containing multiple OpenStack instances. You can refer to https://github.com/openstack/tricircle/blob/master/specs/dynamic-pod-binding.rst The feature implementation has not been finished yet.

  6. Hi Farah, I have asked Joe Huang, the PTL of Tricircle, to join this thread and answer; he is the most up to date with the current status of the project.
    To my understanding, the Launch VM API call on the top layer will fail and return an error to the user.
