Thursday, January 15, 2015

Common Misconceptions about SDN Controller Management and Scalability

Here are a few common misconceptions about SDN controller management and scalability.

#1: Either Proactive mode or Reactive mode

When an SDN controller wants to enforce a set of rules or a policy on a forwarding element, it uses a southbound API such as OVSDB, OpFlex or OpenFlow. In OpenFlow this process is called flow installation, and it can be done in one of three ways: proactive, reactive or mixed (aka the hybrid approach).




Proactive

During bootstrap, the controller installs all flows and pipelines (multi-table entries) into all the forwarding elements.  The flows must cover all possible scenarios.
Whenever there is a change in the network, the controller removes or installs flows where necessary.


Reactive

When a packet arrives in a switch, a look-up is performed on the flow tables.
If there is no match and the switch is connected to a controller, it will forward the packet (either with or without its payload) to the controller.
In OpenFlow, this is called a PACKET_IN message.

It is also possible to create rules that forward the packet to the controller when matched and we call that...


Mixed Proactive / Reactive

With the mixed approach, you can benefit from the best of both worlds, achieving a balance between dynamic management and performance.

For frequently-used and rarely-modified flows, you install a pipeline and flows proactively.

For unmatched flows, or for flows that you want to handle in-line in your SDN application, you install a flow to forward the packet to the controller.

This mixed approach lets the controller make real-time dynamic decisions only for the traffic that requires them (reactive), while leaving the heavy lifting for the majority of the traffic to the real-time forwarding element (proactive).

In addition, with this approach you can avoid over-provisioning the forwarding element with flows that are rare, thus dramatically reducing the number of entries in the flow tables. 
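
The mixed approach can be sketched with a toy flow table. This is only an illustration in plain Python, not an OpenFlow implementation: the `ToySwitch` and `Flow` names, the match fields and the port numbers are all made up for the example. It shows one proactive flow, one flow that deliberately sends matching traffic to the controller, and a lowest-priority catch-all for everything else.

```python
from dataclasses import dataclass, field

CONTROLLER = "CONTROLLER"  # stand-in for the OpenFlow reserved controller port


@dataclass
class Flow:
    priority: int
    match: dict   # field name -> required value; an empty dict matches everything
    action: str   # "output:<port>" or CONTROLLER


@dataclass
class ToySwitch:
    """Hypothetical single-table switch, for illustration only."""
    flows: list = field(default_factory=list)
    packet_ins: list = field(default_factory=list)  # packets sent to the controller

    def install(self, flow):
        self.flows.append(flow)
        # Highest priority is checked first.
        self.flows.sort(key=lambda f: f.priority, reverse=True)

    def process(self, packet):
        for flow in self.flows:
            if all(packet.get(k) == v for k, v in flow.match.items()):
                if flow.action == CONTROLLER:
                    self.packet_ins.append(packet)
                return flow.action
        return "drop"  # no flow matched at all


sw = ToySwitch()
# Proactive: a frequently used, rarely modified flow.
sw.install(Flow(100, {"ip_dst": "10.0.0.2"}, "output:2"))
# Reactive by design: traffic the application wants to handle in-line.
sw.install(Flow(50, {"tcp_dst": 80}, CONTROLLER))
# Catch-all: anything unmatched goes to the controller.
sw.install(Flow(0, {}, CONTROLLER))

fast = sw.process({"ip_dst": "10.0.0.2", "tcp_dst": 22})
web = sw.process({"ip_dst": "10.0.0.9", "tcp_dst": 80})
miss = sw.process({"ip_dst": "10.0.0.7", "tcp_dst": 22})
```

Only the second and third packets reach the controller. The first one is forwarded by the switch alone.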

#2: Lack of Scalability and High Availability


In SDN, the control plane (i.e. the SDN Controller) is separated from the data plane (i.e. the Forwarding Elements).

Due to the centralized nature of control in SDN, we now need to support high availability and redundancy of the Control Plane.

In OpenFlow, the Forwarding Element (the Switch) connects to the controller via TCP or via TLS for secure channels.

Up until OpenFlow v1.2, whenever the connection to the controller was lost, the switch lost the ability to send PACKET_IN messages to the controller. It then had to either drop all unmatched traffic or hand it to the NORMAL pipeline (only on switches that implemented the dual nature).

This non-deterministic approach would yield unpredictable network behavior while the controller was unavailable.

OpenFlow v1.2 introduced the capability of working with multiple controllers.
The Master and Slave modes provide a mechanism for Active-Passive high availability, whereas the Equal mode provides an Active-Active model.

OpenFlow multiple controllers


With the multi-controller capability, we gained the control plane high availability we needed, improved reliability, fast recovery from failure and controller load balancing.
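
As a rough sketch of how the roles interact, the toy model below tracks the role of each connected controller. It is an assumption-laden simplification: the `ToySwitchRoles` class is hypothetical, it ignores details such as the generation ID, and it assumes the default behavior in which Slave controllers do not receive PACKET_IN messages.

```python
MASTER, SLAVE, EQUAL = "master", "slave", "equal"


class ToySwitchRoles:
    """Hypothetical model of controller roles on one switch."""

    def __init__(self):
        self.roles = {}  # controller name -> role

    def connect(self, name):
        # Controllers start in the Equal role.
        self.roles[name] = EQUAL

    def request_role(self, name, role):
        if role == MASTER:
            # Only one Master at a time: demote the current one to Slave.
            for other, current in list(self.roles.items()):
                if other != name and current == MASTER:
                    self.roles[other] = SLAVE
        self.roles[name] = role

    def disconnect(self, name):
        self.roles.pop(name, None)

    def packet_in_receivers(self):
        # Assumed default: Master and Equal controllers get PACKET_IN, Slaves do not.
        return sorted(n for n, r in self.roles.items() if r in (MASTER, EQUAL))


sw = ToySwitchRoles()
sw.connect("c1")
sw.connect("c2")

# Active-Passive: c1 is Master, c2 is a standby Slave.
sw.request_role("c1", MASTER)
sw.request_role("c2", SLAVE)
active_passive = sw.packet_in_receivers()

# Failover: c1 goes away and c2 promotes itself.
sw.disconnect("c1")
sw.request_role("c2", MASTER)
after_failover = sw.packet_in_receivers()

# Active-Active: a new controller joins in the Equal role.
sw.connect("c3")
active_active = sw.packet_in_receivers()
```

In this model the switch never elects a Master on its own. The controllers have to coordinate the takeover among themselves.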

#3: You cannot design SDN Applications for Really Big Scale


Utilizing the two mechanisms I covered - Mixed-Mode and Multi-Controller - is the key to designing really big scale SDN applications.

If you take care to design your application to be stateless (whenever possible) and to share nothing (or little) with other controller instances, you can benefit from the Reactive mode, where any instance can handle any flow.
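
One way to picture the stateless, share-nothing idea is a PACKET_IN handler that is a pure function of the packet and of read-only configuration. The sketch below is hypothetical: `handle_packet_in`, `NEXT_HOPS` and the packet fields are invented for the example. Because the decision depends on nothing that lives inside a particular controller instance, any instance returns the same answer for the same packet.

```python
import hashlib

# Read-only configuration that every controller instance loads identically
# (made-up port names, for illustration).
NEXT_HOPS = ("port-1", "port-2", "port-3")


def handle_packet_in(packet):
    """Decide which flow to install, using only the packet and static config."""
    key = f"{packet['ip_src']}>{packet['ip_dst']}".encode()
    # hashlib is used instead of the built-in hash(), which is randomized per
    # process for strings and would give different instances different answers.
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(NEXT_HOPS)
    return {
        "match": {"ip_src": packet["ip_src"], "ip_dst": packet["ip_dst"]},
        "action": f"output:{NEXT_HOPS[index]}",
    }


pkt = {"ip_src": "10.0.0.1", "ip_dst": "10.0.0.2"}
decision_on_instance_a = handle_packet_in(pkt)
decision_on_instance_b = handle_packet_in(dict(pkt))  # a different instance, same input
```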

In my opinion, the best way to achieve minimal controller response latency and maximal bandwidth, while keeping the dynamic allocation of flows, is to use all of these mechanisms together.

When you add to that the OVS OpenFlow extensions and the dual-nature Hybrid OpenFlow capability (which I covered in my previous post), you can really gain a dramatic performance and management boost.


In my next post I will demonstrate how we utilized these design guidelines and capabilities in a prototype for an alternative to Neutron's L3 Distributed Virtual Router.

Tuesday, January 13, 2015

Hybrid OpenFlow Switch


In my last post I summarized the DVR solution and tried to explain the motivation for yet another L3 implementation in Neutron that I am going to present in the coming posts.

This 2-post series is intended to cover basic SDN and OpenFlow mechanisms that we used in the L3 controller:
  • Hybrid OpenFlow Switch 
  • SDN models for managing the forwarding elements (switches)

Hybrid OpenFlow Switch

The hybrid OpenFlow switch was introduced in OpenFlow 1.1. Hybrid switches support both the OpenFlow pipeline and normal (legacy) Ethernet switching functionality.

The hybrid switch allows forwarding of packets from the OpenFlow pipeline to the normal pipeline through the NORMAL and FLOOD reserved ports.

The main reason for introducing the hybrid switch was to optimize the handling of operations like MAC learning, where a reactive approach was just not efficient: doing MAC learning in the OpenFlow controller imposes a significant cost in terms of network bandwidth and latency, and does not scale for large networks.

The NORMAL action comes to the rescue and lets us offload the legacy, non-OpenFlow pipeline (the MAC learning mechanism, VLAN, ACL, QoS and other base features) to the forwarding element's kernel module, which is optimized to handle such operations at near line rate.
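
To make the offloaded work concrete, here is a minimal sketch of classic MAC learning, the kind of logic a legacy switching pipeline runs locally without involving a controller. It is a toy in plain Python (the `ToyLearningSwitch` class and the short MAC strings are invented for the example) and it ignores VLANs, aging and everything else a real switch does.

```python
class ToyLearningSwitch:
    """Hypothetical L2 learning switch, for illustration only."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.mac_table = {}  # MAC address -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame is sent out of."""
        # Learn: the source MAC is reachable through the ingress port.
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every port except the ingress one.
            return sorted(self.ports - {in_port})
        if out_port == in_port:
            return []  # destination is behind the ingress port, nothing to forward
        return [out_port]


sw = ToyLearningSwitch([1, 2, 3])
first = sw.receive(1, "aa", "bb")   # "bb" is unknown, so the frame is flooded
reply = sw.receive(2, "bb", "aa")   # "aa" was learned on port 1
again = sw.receive(1, "aa", "bb")   # "bb" is now known on port 2
```

In a purely reactive design, every one of these learning steps would cost a round trip to the controller.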

But what happens when the NORMAL action is used in an OpenFlow flow?

Basically, the traffic is redirected to a completely separate processing pipeline. This is illustrated in the diagram below.


OVS Hybrid OpenFlow Switch Pipelines
The OpenFlow pipeline and the NORMAL pipeline each act as a completely isolated switch.

There are, however, some issues with this hybrid approach.
  • The NORMAL pipeline is not standardized, so it behaves differently on switches from different vendors. The supported features vary, and there is no standard way to configure them.
  • The NORMAL pipeline does not play well with some OpenFlow actions. For example, if a port is tagged for the NORMAL pipeline (using ovs-vsctl), you cannot tag its traffic using OpenFlow actions and then forward it to the NORMAL path, because it will end up dropped due to a double-tagging error.

The Open vSwitch extensions to OpenFlow were developed to support these extra features using flows, for example the LEARN action (an Open vSwitch extension to OpenFlow) for MAC learning.

In my next post I will cover the SDN models for managing the forwarding elements in reactive, proactive and mixed modes.