Tuesday, December 15, 2015

Smaug - Application Data Protection for OpenStack


During the recent OpenStack Tokyo 2015 summit, we introduced a new project we have been working on, which will provide Application Data Protection as a Service (video) (slides).

What is Smaug?

Not to be confused with Application Security or DLP (Data Loss Prevention), Smaug deals with protecting the data that comprises an OpenStack-deployed application (what Keystone refers to as a "Project") against loss or damage, using mechanisms such as backup and replication.
It does so by providing a standard framework of APIs and services that lets vendors plug various data protection services into a single, coherent flow for the user.

We named it Smaug after the famous dragon from J.R.R. Tolkien’s “The Hobbit”, which guarded the treasures of the kingdom of Erebor and knew every single item in its hoard. Unlike its namesake, our Smaug is designed to give a simple, user-friendly experience, and not to burn users to a crisp when they want to recover a protected item.

The main idea behind Smaug is to protect an entire OpenStack project, either across OpenStack sites or within a single local site.

Let's take a typical 3-tier cloud app:



To fully protect such a deployment (e.g. for Disaster Recovery), we would have to protect many resources that depend on one another.

The following diagram shows how such a dependency tree might look:


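As a rough textual stand-in for such a dependency tree, here is a minimal Python sketch (Python 3.9+, for the standard-library graphlib module). The resource names are hypothetical, and the edge direction (a VM depends on its image, networks and volumes; the project depends on its VMs) is our assumption for illustration. From that graph it derives one possible order in which the resources could be protected, dependencies first.

```python
from graphlib import TopologicalSorter

# Hypothetical 3-tier application: each resource maps to the set of
# resources it depends on (names and edge direction are illustrative only).
dependencies = {
    "project:shop": {"vm:web", "vm:app", "vm:db"},
    "vm:web": {"image:web", "net:frontend"},
    "vm:app": {"image:app", "net:frontend", "net:backend"},
    "vm:db": {"image:db", "net:backend", "volume:db-data"},
    "image:web": set(),
    "image:app": set(),
    "image:db": set(),
    "net:frontend": set(),
    "net:backend": set(),
    "volume:db-data": set(),
}


def protection_order(deps):
    """Return all resources ordered so that every dependency precedes its dependents."""
    return list(TopologicalSorter(deps).static_order())


order = protection_order(dependencies)
print(order)  # images, networks and the volume first, then the VMs, then the project
```

Ties (e.g. between the three images) may come out in any order; only the dependency constraints are guaranteed.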
In Smaug, we defined a plugin engine that loads a protection plugin for each resource type.
Then, we let the user create a Protection Plan, which consists of all the resources she wants to protect.
   
These resources can be divided into groups, each of which is handled by a different plugin in Smaug:
  • Volume - Typically, a block of data that is mapped/attached to the VM and used for reading/writing
  • VM - A deployed workload unit, usually composed of some metadata (configuration, preferences) and connected resources (dependencies)
  • Virtual Network - The virtual network overlay where the VM runs
  • Project - A group of VMs and their shared resources (e.g. networks, volumes, images)
  • Image - A software distribution package that is used to launch a VM
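The plugin engine idea can be sketched in a few lines. This is a minimal illustration, not Smaug's actual code: the ProtectionPlugin base class, the PluginEngine registry and the two sample plugins are hypothetical, and "loading" here simply means registering a class in-process. Each resource type from the list above maps to exactly one plugin, and a Protection Plan is just the list of resources the user selected.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Resource:
    type: str  # one of the resource groups: "volume", "vm", "network", "project", "image"
    id: str


class ProtectionPlugin:
    """Hypothetical base class: one plugin per resource type."""
    resource_type = None

    def protect(self, resource):
        raise NotImplementedError


class VolumePlugin(ProtectionPlugin):
    resource_type = "volume"

    def protect(self, resource):
        return f"snapshot of {resource.id}"


class VmPlugin(ProtectionPlugin):
    resource_type = "vm"

    def protect(self, resource):
        return f"metadata of {resource.id}"


class PluginEngine:
    """Loads (registers) exactly one protection plugin per resource type."""

    def __init__(self):
        self._plugins = {}

    def load(self, plugin_cls):
        plugin = plugin_cls()
        if plugin.resource_type in self._plugins:
            raise ValueError(f"duplicate plugin for {plugin.resource_type}")
        self._plugins[plugin.resource_type] = plugin

    def plugin_for(self, resource):
        # Dispatch on the resource type; KeyError if no plugin handles it.
        return self._plugins[resource.type]


@dataclass
class ProtectionPlan:
    name: str
    resources: list = field(default_factory=list)


engine = PluginEngine()
engine.load(VolumePlugin)
engine.load(VmPlugin)

plan = ProtectionPlan("db-tier", [Resource("vm", "db-1"), Resource("volume", "db-data")])
results = {r.id: engine.plugin_for(r).protect(r) for r in plan.resources}
print(results)  # {'db-1': 'metadata of db-1', 'db-data': 'snapshot of db-data'}
```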

Smaug Highlights 

Open Architecture

Vendors create plugins that implement Protection mechanisms for different OpenStack resources.

User perspective: Protect Application Deployment

Users configure and manage custom protection plans on the deployed resources (topology, VMs, volumes, images, …).
The user selects a "Protection Provider" from the list of available Protection Providers, which is maintained and managed by the admin.

Admin perspective: Configure Protection Providers  

The Admin defines which Protection Providers are available to the users.  
A "Protection Provider" is essentially a bundle of per-resource protection plugins and a bank, curated by the admin from all the available protection plugins and bank plugins.
In addition, the Admin configures a Bank Account for each user (tenant).
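Here is a minimal sketch of the admin-side flow, assuming a hypothetical in-memory Admin catalog (the class, method and plugin names are illustrative, not Smaug's API): the admin curates a Provider from the installed protection plugins and bank plugins, opens a Bank Account per tenant, and users then pick from the providers the admin has defined.

```python
from dataclasses import dataclass


@dataclass
class ProtectionProvider:
    """A curated bundle: one protection plugin per resource type, plus one bank."""
    name: str
    plugins: dict  # resource type -> protection plugin name
    bank: str      # bank plugin name


class Admin:
    """Hypothetical admin-side catalog; names are illustrative only."""

    def __init__(self, available_plugins, available_banks):
        self.available_plugins = available_plugins  # resource type -> set of plugin names
        self.available_banks = set(available_banks)
        self.providers = {}
        self.bank_accounts = {}  # tenant -> (bank, account id)

    def define_provider(self, name, plugins, bank):
        # Only allow plugins and banks that are actually installed.
        for rtype, plugin in plugins.items():
            if plugin not in self.available_plugins.get(rtype, set()):
                raise ValueError(f"{plugin!r} is not an available {rtype} plugin")
        if bank not in self.available_banks:
            raise ValueError(f"{bank!r} is not an available bank plugin")
        self.providers[name] = ProtectionProvider(name, dict(plugins), bank)
        return self.providers[name]

    def open_bank_account(self, tenant, bank):
        # Hypothetical account id format: "<bank>/<tenant>".
        self.bank_accounts[tenant] = (bank, f"{bank}/{tenant}")
        return self.bank_accounts[tenant]


admin = Admin(
    available_plugins={"volume": {"cinder-backup", "vendor-x-replication"}, "vm": {"vm-metadata"}},
    available_banks={"swift-bank"},
)
gold = admin.define_provider("gold", {"volume": "vendor-x-replication", "vm": "vm-metadata"}, "swift-bank")
admin.open_bank_account("tenant-a", "swift-bank")

# Users simply choose among the providers the admin has defined.
print(sorted(admin.providers))  # ['gold']
```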




Smaug APIs

We are currently in the process of defining Smaug's set of User Service APIs:


Resource (Protectable) API

Enables the Smaug user to access information about which resource types are protectable (i.e. can be protected by Smaug).
It also lets the user get further information on each resource type, such as a list of actual instances and their dependencies.
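Since the API was still being defined at the time of writing, the following is only an in-memory stand-in for the kind of queries the Protectable API answers; the ProtectableRegistry class and its method names are hypothetical, not Smaug's endpoints. It lists protectable types, the instances of a type, and the dependencies of an instance.

```python
class ProtectableRegistry:
    """Hypothetical in-memory stand-in for the Protectable API."""

    def __init__(self):
        # resource type -> {instance id: [(dependency type, dependency id), ...]}
        self._instances = {}

    def register_type(self, rtype):
        self._instances.setdefault(rtype, {})

    def add_instance(self, rtype, instance_id, depends_on=()):
        self._instances[rtype][instance_id] = list(depends_on)

    def list_types(self):
        """Which resource types are protectable."""
        return sorted(self._instances)

    def list_instances(self, rtype):
        """Actual instances of one protectable type."""
        return sorted(self._instances[rtype])

    def dependencies(self, rtype, instance_id):
        """Resources that one instance depends on."""
        return list(self._instances[rtype][instance_id])


registry = ProtectableRegistry()
for rtype in ("image", "network", "project", "vm", "volume"):
    registry.register_type(rtype)
registry.add_instance("volume", "db-data")
registry.add_instance("image", "db-image")
registry.add_instance("vm", "db-1", depends_on=[("image", "db-image"), ("volume", "db-data")])

print(registry.list_types())                # ['image', 'network', 'project', 'vm', 'volume']
print(registry.list_instances("vm"))        # ['db-1']
print(registry.dependencies("vm", "db-1"))  # [('image', 'db-image'), ('volume', 'db-data')]
```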

Plan API

Enables the Smaug user to create or edit Protection Plans using a selected Protection Provider, as well as access all the parameters of the plan.
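A minimal sketch of what creating and editing a plan against a selected provider could involve, assuming a hypothetical PlanService where each provider is described only by the set of resource types its plugins support. The one rule illustrated is that a plan may only contain resources its provider can protect.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    name: str
    provider: str
    resources: list                                 # (resource type, resource id) pairs
    parameters: dict = field(default_factory=dict)  # per-resource-type plugin parameters


class PlanService:
    """Hypothetical Plan API sketch; providers map name -> set of supported resource types."""

    def __init__(self, providers):
        self.providers = providers
        self.plans = {}

    def _check(self, provider, resources):
        supported = self.providers[provider]
        unsupported = sorted({rtype for rtype, _ in resources} - supported)
        if unsupported:
            raise ValueError(f"provider {provider!r} cannot protect: {unsupported}")

    def create(self, name, provider, resources, parameters=None):
        self._check(provider, resources)
        self.plans[name] = Plan(name, provider, list(resources), dict(parameters or {}))
        return self.plans[name]

    def update(self, name, *, resources=None, parameters=None):
        plan = self.plans[name]
        if resources is not None:
            self._check(plan.provider, resources)  # validate before changing anything
            plan.resources = list(resources)
        if parameters is not None:
            plan.parameters.update(parameters)
        return plan


service = PlanService({"gold": {"vm", "volume"}})
plan = service.create(
    "db-tier", "gold",
    [("vm", "db-1"), ("volume", "db-data")],
    parameters={"volume": {"backup_mode": "incremental"}},  # hypothetical parameter
)
service.update("db-tier", parameters={"vm": {"quiesce": True}})  # hypothetical parameter
print(sorted(plan.parameters))  # ['vm', 'volume']
```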

Provider API 

Enables the Smaug user to list the available providers and get the super-set of parameter and result schemas across all plugins of a specific Provider.
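One plausible way to build such a super-set, sketched under the assumption (ours, not the post's) that each plugin publishes a JSON-Schema-style object and that the provider schema nests each plugin's schema under its resource type. The same function would work for result schemas. The sample schemas are hypothetical.

```python
import json

# Hypothetical per-plugin parameter schemas (JSON-Schema style), keyed by resource type.
plugin_schemas = {
    "volume": {
        "type": "object",
        "properties": {"backup_mode": {"type": "string", "enum": ["full", "incremental"]}},
    },
    "vm": {
        "type": "object",
        "properties": {"include_console_log": {"type": "boolean"}},
    },
}


def provider_schema(schemas):
    """Combine per-plugin schemas into one super-set schema for the whole provider."""
    return {
        "type": "object",
        "properties": dict(sorted(schemas.items())),  # one sub-schema per resource type
        "additionalProperties": False,
    }


schema = provider_schema(plugin_schemas)
print(json.dumps(schema, indent=2))
```

Nesting by resource type keeps plugins from clashing when two of them happen to use the same parameter name.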

Checkpoints API

Enables the Smaug user to access and manage Protection Checkpoints, as well as list and query the existing Checkpoints in a Provider. In addition, it provides Checkpoint read access to the Restore API when recovering a protected application's data.
Calling the Checkpoint Create (POST) API starts a protection process that creates a Vault in the user's Bank Account, on the Bank assigned to the Provider.
The process then passes the Vault to the Protect action of each Protection Plugin assigned to the Provider, and each plugin writes its metadata into the Vault.
It is left up to the Plugin implementation to decide where to store the actual data (i.e. in the Vault or somewhere else).
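The Checkpoint Create flow can be sketched as follows, assuming a hypothetical in-memory Bank and two toy plugins (none of these names are Smaug's). A vault is opened in the tenant's bank account, and each plugin's protect() receives it and writes only metadata; the volume plugin merely records where its data would live elsewhere.

```python
import uuid


class Bank:
    """Hypothetical in-memory bank: account -> vault id -> {key: metadata}."""

    def __init__(self):
        self.accounts = {}

    def create_vault(self, account):
        vault_id = str(uuid.uuid4())
        self.accounts.setdefault(account, {})[vault_id] = {}
        return vault_id, self.accounts[account][vault_id]


class VolumeProtection:
    resource_type = "volume"

    def protect(self, resource_id, vault):
        # Metadata only: the actual data could be stored outside the vault
        # (the "external-backup://" location is a made-up placeholder).
        vault[f"volume/{resource_id}"] = {"data_location": f"external-backup://{resource_id}"}


class VmProtection:
    resource_type = "vm"

    def protect(self, resource_id, vault):
        vault[f"vm/{resource_id}"] = {"resource_id": resource_id}


def create_checkpoint(bank, account, plugins, plan_resources):
    """Sketch of Checkpoint Create: open a vault, then hand it to each plugin's protect()."""
    vault_id, vault = bank.create_vault(account)
    by_type = {p.resource_type: p for p in plugins}
    for rtype, rid in plan_resources:
        by_type[rtype].protect(rid, vault)
    return vault_id


bank = Bank()
checkpoint = create_checkpoint(
    bank, "tenant-a", [VolumeProtection(), VmProtection()],
    [("vm", "db-1"), ("volume", "db-data")],
)
print(sorted(bank.accounts["tenant-a"][checkpoint]))  # ['vm/db-1', 'volume/db-data']
```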

Schedule Operation API

Enables the Smaug user to create a mapping between a Trigger definition and one or more Operation definitions.
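A minimal sketch of a Trigger-to-Operations mapping, assuming a hypothetical periodic time trigger (the post does not define trigger types). ScheduledOperations keeps the mapping and, when asked at a given time, returns the operations whose trigger is due. For simplicity a trigger fires at most once per call, however many periods were missed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Trigger:
    """Hypothetical time-based trigger: fires every `interval`, starting at `start`."""
    name: str
    start: datetime
    interval: timedelta


@dataclass(frozen=True)
class Operation:
    kind: str  # e.g. "protect"
    plan: str  # protection plan name


class ScheduledOperations:
    """Maps each trigger to the operation definitions it should start."""

    def __init__(self):
        self._triggers = {}     # trigger name -> Trigger
        self._mapping = {}      # trigger name -> [Operation, ...]
        self._last_fired = {}   # trigger name -> datetime of last firing

    def add(self, trigger, *operations):
        self._triggers[trigger.name] = trigger
        self._mapping.setdefault(trigger.name, []).extend(operations)

    def due(self, now):
        """Return operations whose trigger is due at `now`, and mark those triggers fired."""
        ready = []
        for name, trigger in self._triggers.items():
            last = self._last_fired.get(name)
            next_time = trigger.start if last is None else last + trigger.interval
            if now >= next_time:
                ready.extend(self._mapping[name])
                self._last_fired[name] = now
        return ready


sched = ScheduledOperations()
t0 = datetime(2015, 12, 15, 2, 0)
sched.add(Trigger("nightly", t0, timedelta(days=1)),
          Operation("protect", "db-tier"), Operation("protect", "web-tier"))

first = sched.due(t0)                         # both operations
second = sched.due(t0 + timedelta(hours=1))   # [] (not due yet)
third = sched.due(t0 + timedelta(days=1))     # both operations again
```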

Smaug Architecture


We defined three services for Smaug:

Smaug API service

These top-level northbound APIs expose Application Data Protection services to the Smaug user.
The purpose of the services is to maximize flexibility and accommodate (hopefully) any kind of protection for any type of resource, whether it is a basic OpenStack resource (such as a VM, Volume, Image, etc.) or some ancillary resource within an application system that is not managed in OpenStack (such as a hardware device, an external database, etc.).

Smaug Schedule Service

This subsystem is responsible for scheduling and orchestrating the execution of Protection Plans.
The implementation can be replaced by any other external solution.
All actual Protection-related activities are managed via the Operation northbound APIs, in order to support:
  • Maintaining a record of all operations in the Smaug database (to drive the Operation Status APIs)
  • Decoupling the implementation of the Scheduler from the implementation of the Protection Service
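The two goals above can be illustrated with a small sketch, assuming a hypothetical OperationAPI facade where a dict stands in for the Smaug database and a plain function stands in for the Protection Service. Any scheduler, built-in or external, only calls run(); every call is recorded with a status that a status query can later read.

```python
import itertools
from dataclasses import dataclass


@dataclass
class OperationRecord:
    id: int
    kind: str
    plan: str
    status: str = "pending"


class OperationAPI:
    """Hypothetical northbound Operation API: the scheduler's only entry point."""

    def __init__(self, protection_service):
        self._protection = protection_service
        self._ids = itertools.count(1)
        self.records = {}  # stand-in for the operations table in the Smaug database

    def run(self, kind, plan):
        record = OperationRecord(next(self._ids), kind, plan)
        self.records[record.id] = record
        try:
            self._protection(kind, plan)
            record.status = "success"
        except Exception:  # broad on purpose: any failure is recorded, not propagated
            record.status = "error"
        return record.id

    def status(self, op_id):
        return self.records[op_id].status


def fake_protection_service(kind, plan):
    """Stand-in for the Protection Service; fails for one hypothetical plan."""
    if plan == "broken-plan":
        raise RuntimeError("plugin failure")


api = OperationAPI(fake_protection_service)

# A scheduler only needs to call the Operation API, never the protection service directly.
ok = api.run("protect", "db-tier")
bad = api.run("protect", "broken-plan")
print(api.status(ok), api.status(bad))  # success error
```

Because the scheduler depends only on run(), swapping it for an external solution leaves the operation records and status reporting untouched.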

Smaug Protection Service

This subsystem is responsible for handling the following tasks:
  • Operation Execution
  • Protectable (resource) plugin management
  • Protection provider management
  • Protection Plugin management
  • Bank Plugin management
  • Bank checkpoints sub-service

Join Smaug

We are currently in the process of reviewing the API definition.
  • Our IRC channel (we are always there): #openstack-smaug