Tuesday, May 5, 2015

DragonFlow SDN based Distributed Virtual Router for OpenStack Neutron

In my previous posts I presented a PoC implementation of an embedded L3 controller in Neutron that solves the network node bottleneck the SDN way, and overcomes some of the DVR limitations.
In this post I am going to cover the first release of DragonFlow - an SDN based Distributed Virtual Router L3 Service Plugin (aligned with OpenStack Kilo), now an official sub-project of Neutron.
The main purpose of DragonFlow is to simplify the management of the virtual router while improving performance and scale, and eliminating both the single point of failure and the network node bottleneck.
The DragonFlow solution is based on separating the routing control plane from the data plane. This is accomplished by implementing the routing logic in distributed forwarding rules on the virtual switches (called "flows" in OpenFlow terminology). Put simply, the virtual router is implemented using OpenFlow flows only.
DragonFlow eliminates the software stack that otherwise acts as the virtual router (the Linux network namespaces in DVR and in the legacy L3 architecture); it uses only OVS flows to act as a virtual router. A diagram showing the DragonFlow components and overall architecture can be seen here:
DragonFlow High-level Architecture

DragonFlow Features for the Kilo Release:

  • East-West traffic is fully distributed using direct flows, installed reactively on the first VM-to-VM connection.
  • Support for all ML2 type drivers (GRE/VXLAN/VLAN)
  • Support for a centralized shared public network (SNAT) based on the legacy L3 implementation
  • Support for centralized floating IP (DNAT) based on the legacy L3 implementation
  • Support for HA: if the connection to the Controller is lost, traffic falls back to the legacy L3 implementation until recovery. All of the legacy L3 HA is reused. (Controller HA will be supported in the next release.)

Key Advantages:

  • Performance improvements for inter-subnet network communication by reducing the number of kernel layers (namespaces and their TCP stack overhead)
  • Scalability improvements for inter-subnet network communication by offloading L3 East-West routing from the Network Node to all Compute Nodes
  • Reliability improvements for inter-subnet network communication
  • Simplified virtual routing management: manage only active flows, not all possible flows
  • Non-intrusive solution that does not rely on ML2 modification

How it works

  1. On bootstrap, the L3 service plugin sends an RPC message to the L2 service plugin, setting the L3 Controller Agent as the controller of the integration bridge.
  2. The Controller queries the OVS for its port configuration via OpenFlow and matches the actual ports configured on the OVS to the Neutron tenant networks data model.
  3. Then, it installs the bootstrap flow pipeline that offloads all L2 traffic and local subnet L3 traffic into the NORMAL pipeline, while sending all unmatched VM-to-VM inter-subnet traffic to the controller.
DragonFlow L3 Service Bootstrap
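
The decision made by the bootstrap pipeline in step 3 can be sketched in plain Python. This is only an illustration of the classification logic, not DragonFlow code: the `Packet` class, the `ROUTER_SUBNETS` list, the example addresses and the `classify` function are all hypothetical. It also assumes that traffic to destinations outside the router's subnets stays on the NORMAL path (where the legacy L3 implementation takes over), and it leaves out the check for already installed East-West flows.

```python
import ipaddress
from dataclasses import dataclass


# Hypothetical packet model, invented for this illustration only.
@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str


# Assumed example: two tenant subnets attached to the same virtual router.
ROUTER_SUBNETS = [
    ipaddress.ip_network("10.0.1.0/24"),
    ipaddress.ip_network("10.0.2.0/24"),
]


def subnet_of(ip):
    """Return the router subnet containing ip, or None if there is none."""
    addr = ipaddress.ip_address(ip)
    for net in ROUTER_SUBNETS:
        if addr in net:
            return net
    return None


def classify(pkt):
    """Mimic the bootstrap decision: NORMAL pipeline or send to controller."""
    src_net = subnet_of(pkt.src_ip)
    dst_net = subnet_of(pkt.dst_ip)
    # VM-to-VM traffic between two different subnets of the router goes to
    # the controller (assuming no East-West flow has been installed yet).
    if src_net is not None and dst_net is not None and src_net != dst_net:
        return "CONTROLLER"
    # Same-subnet traffic and everything else (assumption) stays on NORMAL.
    return "NORMAL"


if __name__ == "__main__":
    print(classify(Packet("10.0.1.5", "10.0.1.9")))  # same subnet
    print(classify(Packet("10.0.1.5", "10.0.2.9")))  # inter-subnet
```

In the real pipeline this decision is expressed as OpenFlow match rules spread over several tables rather than as a function call.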

The following diagram shows the multi-table OpenFlow pipeline installed onto the OVS integration bridge (br-int) in order to represent the virtual router using flows:

bootstrap flows pipeline

The base table pipeline is installed proactively on bootstrap, while the East-West rules in the L3 Forwarding table are installed reactively on the first communication between each pair of VMs.
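
To make the reactive part concrete, here is a minimal Python simulation of an L3 Forwarding table that starts empty and gets one entry per VM pair when the first packet misses. It is a sketch under assumptions, not the DragonFlow implementation: `L3ForwardingTable`, `Port`, `PORTS` and all addresses are hypothetical, and the controller round trip is reduced to a method call. The rewrite actions follow the ones listed in the comments below (source MAC, destination MAC, TTL, segmentation ID); using the router port on the destination subnet as the new source MAC is my assumption here.

```python
from dataclasses import dataclass, replace


# Hypothetical models, invented for this illustration only.
@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    src_mac: str
    dst_mac: str
    ttl: int
    segmentation_id: int


@dataclass(frozen=True)
class Port:
    mac: str              # MAC address of the VM port
    segmentation_id: int  # segmentation ID of the VM's subnet
    router_port_mac: str  # MAC of the virtual router port on that subnet


# Assumed example data standing in for the Neutron data model.
PORTS = {
    "10.0.1.5": Port("fa:16:3e:00:01:05", 101, "fa:16:3e:00:01:01"),
    "10.0.2.9": Port("fa:16:3e:00:02:09", 102, "fa:16:3e:00:02:01"),
}


class L3ForwardingTable:
    def __init__(self):
        self.flows = {}      # (src_ip, dst_ip) -> destination Port
        self.packet_ins = 0  # number of packets that reached the controller

    def _controller_install(self, pkt):
        """Stand-in for the controller: install a flow for this VM pair."""
        self.packet_ins += 1
        self.flows[(pkt.src_ip, pkt.dst_ip)] = PORTS[pkt.dst_ip]

    def forward(self, pkt):
        key = (pkt.src_ip, pkt.dst_ip)
        if key not in self.flows:
            # Table miss: only the first packet of a VM pair takes this path.
            self._controller_install(pkt)
        dst = self.flows[key]
        # Apply the rewrite actions carried by the installed flow.
        return replace(
            pkt,
            src_mac=dst.router_port_mac,
            dst_mac=dst.mac,
            ttl=pkt.ttl - 1,
            segmentation_id=dst.segmentation_id,
        )


if __name__ == "__main__":
    table = L3ForwardingTable()
    pkt = Packet("10.0.1.5", "10.0.2.9",
                 "fa:16:3e:00:01:05", "fa:16:3e:00:01:01", 64, 101)
    print(table.forward(pkt))
    table.forward(pkt)
    print("packets sent to controller:", table.packet_ins)
```

The point of the sketch is the counter: the second packet of the same VM pair is handled entirely by the installed flow and never reaches the controller.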

If you would like to try it yourself, an install guide is available here.
To join the development effort: 
My next post will cover the L3 reactive OpenFlow application, and how we install the East-West reactive flows. 


  1. But how is TTL handled?

  2. This is handled by the L3 Forwarding flow (OpenFlow) installed reactively on the first VM-to-VM communication.
    This flow does the following (all standard OpenFlow actions):
    1) Change the source MAC to the virtual router port MAC
    2) Change the destination MAC to the target VM MAC address
    3) Decrease TTL by one
    4) Change the segmentation ID to the destination subnet segmentation ID
    You can find more details in these slides: http://www.slideshare.net/gampel/dragonflow-sdn-based-distributed-virtual-router-for-openstack-neutron.

  3. Nice article, what's the latency of the first packet? Does the latency increase when the dragonflow controller is very busy?

  4. Nice article! What is the latency of the first packet of a new flow? Will the latency increase while the dragonflow controller is very busy?

  5. We are currently using this flow module (first-packet path establishment) with a distributed SDN architecture in Dragonflow, so each controller handles ONLY traffic from locally hosted VMs/containers.
    The only added latency is the trip to the user space of the local Dragonflow controller.
    From our tests so far, the latency added by the local controller is minimal, considering that OVS uses the slow path between kernel space and user space for the first packet anyway.
    The load on the controller is minimal, as it handles only locally hosted VMs and only first-packet path establishment between two L3-connected VMs.